Big Data is not foreign to security analytics professionals. In fact, it is very common to see organizations dumping logs and events into a Hadoop system or another such Big Data platform.
But many of them struggle to reap the benefits, and some are stuck in the “collection phase,” building “data lakes.” Landing the data is always the first phase, and that phase tends to succeed because it’s easy. It’s the next phase, the analytics phase, that’s hard.
While all this is happening on the other “side” of the data management arena, the NoSQL world has perfected the ability to represent complex, changing data while continuously running useful analytics. The NoSQL market has been growing like wildfire and is taking over many domains, and it is now starting to be used in the security analytics realm as well.
First – what exactly is NoSQL? The definition on Wikipedia is:
Both the name and this definition are terrible: they define NoSQL as a negation of relational databases but say nothing about what it actually is. Digging in, NoSQL is a class of databases that emphasizes simplicity of design, usage, scaling and implementation, and that can easily support flexible, rich data. These properties make NoSQL databases much better suited to modern data applications than their relational brethren.
These databases have actually been around for a very long time (some predate relational databases), but their modern reincarnation came from companies such as Google, Facebook and Amazon. These modern databases are really taking over: they are by far the fastest-growing data platforms. (Our SonarG analytic warehouse is also an example of an ultra-efficient NoSQL Big Data warehouse and analytics platform for security events.)
At this point you may be wondering what this has to do with you: why should you care that the data management world is in a “NoSQL upheaval”? Well, if you’ve been working in Security Analytics for any length of time, you know that Security Analytics is really all about Big Data. So here are the top five reasons you should care about NoSQL:
Reason #1. JSON is everywhere and a key component of NoSQL databases
Reason #2. Flexible data is what security needs – use first, clean later
One of the biggest advantages of NoSQL is its support for flexible data (the “variety” in Big Data). Not only is JSON a format that makes it easy to represent diverse and complex data sets (through hierarchies and arrays, all within a single document), but a collection of JSON documents also does not have to be uniform. Each document can have a different structure, and this poses no problem for the data store. Events can be stored even when they come from different sources or different versions of a system, and they do not need to be “fixed” first. The whole data management cycle is reversed: you can use data as soon as it is created, without first normalizing it, cleaning it, or throwing away large pieces of it because they have a different structure or different fields. Reversing the pattern so that you can use the data (all of it) before doing the “hard work” of cleansing and normalizing is key to deriving value from your investment quickly.
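To make “use first, clean later” concrete, here is a minimal Python sketch, assuming a store that simply keeps parsed JSON documents in one list. The event sources, field names and the helpers tables_touched_by and client_ip are hypothetical and not taken from any product; the point is only that differently shaped documents, including nested objects and arrays, sit side by side and can be queried immediately, while cleanup is an optional later step.

```python
import json

# Hypothetical raw events from three different sources (illustrative only).
# No two records share the same shape, and none has been normalized.
RAW_EVENTS = [
    '{"source": "firewall", "src_ip": "10.0.0.5", "action": "deny", "port": 22}',
    '{"source": "db_audit", "user": "alice", "query": {"verb": "SELECT", "tables": ["payroll"]}}',
    '{"source": "db_audit", "user": "bob", "query": {"verb": "DELETE", "tables": ["orders", "audit_log"]}, "client": {"ip": "10.0.0.9"}}',
    '{"source": "vpn", "user": "alice", "geo": "US", "mfa": true}',
]

# "Use first": keep every document as-is in one collection, with no
# up-front schema, no normalization and no dropped fields.
collection = [json.loads(line) for line in RAW_EVENTS]


def tables_touched_by(user):
    """Return every table a user touched, reaching into the nested object
    and array only in documents that actually contain them."""
    tables = []
    for doc in collection:
        if doc.get("user") != user:
            continue
        query = doc.get("query")  # absent in VPN events, which is fine
        if isinstance(query, dict):
            tables.extend(query.get("tables", []))
    return tables


# "Clean later": a hypothetical normalization rule, applied only when needed,
# that unifies the two different ways an IP address appears in the events.
def client_ip(doc):
    return doc.get("src_ip") or doc.get("client", {}).get("ip")


print(tables_touched_by("alice"))
print(tables_touched_by("bob"))
print([client_ip(d) for d in collection])
```

Here tables_touched_by("alice") finds ["payroll"] even though one of Alice’s events has no query at all, and client_ip reconciles the two IP layouts only at the moment you decide that step is worth doing.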
Reason #3. The NoSQL “query language” is more suited to security analytics than anything else
The NoSQL query language is usually a data-flow and aggregation pipeline language: a query is a sequence of stages (filter, reshape, group, sort), each feeding its output to the next. This makes it easy to run complex security analytics over large data sets, and it is far more powerful at combining querying, ETL and analytics in a single operation. For example, a complex query that might take hours on a standard relational database might take minutes on a NoSQL, columnar-based database.
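The text does not name a specific query language; MongoDB’s aggregation framework, with stages such as $match, $project, $group and $sort, is one well-known example of the style. The sketch below imitates that staged data-flow approach in plain Python so it runs without a database. It is not SonarG’s or MongoDB’s API: the login events and the helpers match, project, group, sort and run are all hypothetical.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical login and query events (illustrative data only).
EVENTS = [
    {"user": "alice", "action": "login", "ok": False, "ip": "10.0.0.5"},
    {"user": "alice", "action": "login", "ok": False, "ip": "10.0.0.6"},
    {"user": "bob", "action": "login", "ok": True, "ip": "10.0.0.7"},
    {"user": "alice", "action": "login", "ok": False, "ip": "10.0.0.5"},
    {"user": "carol", "action": "login", "ok": False, "ip": "10.0.0.8"},
    {"user": "bob", "action": "query", "ok": True, "ip": "10.0.0.7"},
]


# Each stage takes a list of documents and returns a new list, so stages
# chain the way the stages of an aggregation pipeline do.
def match(pred):
    return lambda docs: [d for d in docs if pred(d)]


def project(fn):
    # ETL-style reshaping performed in the middle of the query.
    return lambda docs: [fn(d) for d in docs]


def group(key, **accumulators):
    def stage(docs):
        buckets = defaultdict(list)
        for d in docs:
            buckets[d[key]].append(d)
        # One output document per group, with one field per accumulator.
        return [
            {"_id": k, **{name: acc(rows) for name, acc in accumulators.items()}}
            for k, rows in buckets.items()
        ]
    return stage


def sort(field, descending=False):
    return lambda docs: sorted(docs, key=lambda d: d[field], reverse=descending)


def run(pipeline, docs):
    # Feed the output of each stage into the next one.
    return reduce(lambda acc, stage: stage(acc), pipeline, docs)


# Failed logins per user, plus the number of distinct source IPs.
failed_logins = [
    match(lambda d: d["action"] == "login" and not d["ok"]),
    project(lambda d: {"user": d["user"], "ip": d["ip"]}),
    group("user", attempts=len, distinct_ips=lambda rows: len({r["ip"] for r in rows})),
    sort("attempts", descending=True),
]

for row in run(failed_logins, EVENTS):
    print(row)
```

Because the query is just a list of stages, adding a transformation or another aggregation means inserting one more stage rather than rewriting a monolithic query, which is the property the pipeline style trades on.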
Reason #4. NoSQL warehousing can “Drain” the security big data lakes
While Big Data initiatives have often started by creating “big data lakes,” many organizations have learned that landing the data is easy but, on its own, sometimes worthless. NoSQL makes data access and analysis easy and very often focuses on putting the data to use quickly rather than just storing it.
Reason #5. Managing big data must be cost-effective and efficient
Everything can scale given enough hardware (well, not everything, but still…). NoSQL analytic warehouses focus on efficiency, not just scale. Being able to run analytics on 50TB of data with five nodes is much better than needing 25 nodes for the same job, since every extra node adds hardware and operational cost. NoSQL stresses efficiency and small clusters alongside low latency.
Security Analytics is hard, but it is not harder than other types of analytics. That is why it is important to learn from data management trends, and specifically from the “marriage” between NoSQL and Big Data, which can help make Security Analytics less of a journey and more of a destination.