Big Data Analytics for Cybersecurity

Big data is all the rage. Big data is purported to be the next big thing in information technology, and it promises to solve problems that have defied solutions until now.

According to IBM’s definition of big data, it’s about capturing every bit of unstructured information flowing over computer networks, organizing it in ways that make sense for you through analytics, and then performing actions based on the results (IBM).

Perhaps no industry needs big data more than cybersecurity.

The volume of attacks is growing rapidly; research shows that malware attacks grew by 650% between 2010 and 2012 (McAfee).

Furthermore, companies are getting better at catching cybersecurity threats, but this has led malicious actors to innovate even faster (Marsh).

There is a pressing need, therefore, to use big data analytics in cybersecurity. However, before exploring how big data analytics can be used in cybersecurity, here is a primer on what actually constitutes big data and the technologies that make it possible.

What is Big Data?

The definition of ‘big’ varies by context, but there’s no doubt that we’re dealing with ‘bigger than before’ types of information.

For example, the World Wide Web Consortium (W3C) says that Big Data refers to datasets whose size or type is beyond the ability of commonly used software tools to capture, store and process them easily (W3C).

They add: “However, there are many other dimensions along which large datasets differ from smaller ones; for example, the rate of data generation and the diversity of data formats” (W3C).

The two broad categories of big data technologies are Hadoop-based technologies and non-Hadoop-based technologies. Closely associated with the Hadoop ecosystem is Apache Spark, which is often used to get near-real-time insights from frequently accessed (‘hot’) data.

There are several components that make up these technologies but I will focus on Apache Hadoop since it is an open-source framework for storing and processing large datasets in a distributed computing environment.

The fact that it’s open-source also means that there’s excellent community support for building upon it or tweaking it to suit specific requirements; this further adds to its popularity.

An important concept in big data analytics is MapReduce, a programming model that processes data in parallel using map and reduce functions.

The map function is applied to each input record and emits intermediate key-value pairs; the framework then groups those pairs by key, and the reduce function aggregates all the values for a given key into a single output for that key. This continues until all data has been processed (Apache).
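To make the map, group and reduce steps concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and does not use the Hadoop API; the log format, the sample lines and the function names (denied_by_source, total, run_job) are assumptions made up for illustration. It counts denied connections per source IP.

```python
from collections import defaultdict

# Hypothetical firewall log lines in the form "<source_ip> <action>".
LOG_LINES = [
    "10.0.0.5 DENY",
    "10.0.0.7 ALLOW",
    "10.0.0.5 DENY",
    "10.0.0.9 DENY",
    "10.0.0.5 ALLOW",
]


def denied_by_source(line):
    """Map function: emit (source_ip, 1) for every denied connection."""
    ip, action = line.split()
    if action == "DENY":
        yield (ip, 1)


def total(key, values):
    """Reduce function: collapse all values for one key into a single count."""
    return sum(values)


def map_phase(records, map_fn):
    """Apply map_fn to every record and collect the emitted key-value pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def shuffle(pairs):
    """Group emitted values by key (the step a framework performs between map and reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, reduce_fn):
    """Call reduce_fn once per key with all of that key's values."""
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def run_job(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(records, denied_by_source)), total)


if __name__ == "__main__":
    print(run_job(LOG_LINES))
```

In a real cluster the map and reduce calls would run on many machines at once; the sketch only shows the data flow.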

What is Data Analytics in Cybersecurity?

Endpoint and user behavior data, business applications, operating system event logs, firewalls, routers, virus scanners, external threat intelligence sources, and contextual analysis are just a few of the data sources that data analytics in cybersecurity solutions gather.

Combining and comparing this data creates a single data set for organizations to work with, enabling security experts to apply appropriate algorithms and run fast searches to identify early signs of an attack.
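As a rough illustration of combining sources into one searchable data set, the sketch below merges events from two hypothetical feeds into a single timeline and builds a per-host index. The field names, host names and sample events are assumptions, not the schema of any particular product.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events from two separate sources, sharing a minimal set of fields.
SOURCES = {
    "firewall": [
        {"time": "2024-01-01T10:00:05", "host": "ws-12", "detail": "outbound connection blocked"},
        {"time": "2024-01-01T09:58:00", "host": "ws-40", "detail": "port scan detected"},
    ],
    "antivirus": [
        {"time": "2024-01-01T10:00:01", "host": "ws-12", "detail": "suspicious file quarantined"},
    ],
}


def normalize(source, event):
    """Convert a raw event into a common shape with a parsed timestamp."""
    return {
        "time": datetime.fromisoformat(event["time"]),
        "source": source,
        "host": event["host"],
        "detail": event["detail"],
    }


def build_dataset(sources):
    """Merge every source into one list ordered by time."""
    merged = [
        normalize(name, event)
        for name, events in sources.items()
        for event in events
    ]
    merged.sort(key=lambda e: e["time"])
    return merged


def index_by_host(dataset):
    """Group events by host so one machine's history can be looked up directly."""
    index = defaultdict(list)
    for event in dataset:
        index[event["host"]].append(event)
    return index


if __name__ == "__main__":
    for event in index_by_host(build_dataset(SOURCES))["ws-12"]:
        print(event["time"], event["source"], event["detail"])
```

Looking up a single host now shows the antivirus alert and the firewall block side by side, which is the kind of correlation that is hard to see when each source is reviewed on its own.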

Machine learning methods can also be used to conduct threat and data analysis in near real-time, allowing for more accurate detection.

This post examines the characteristics and advantages of a security analytics platform, the most serious threats to your business, several security solutions, and how security analytics can help you prevent attacks and keep your environment secure.

Why use Big Data Analytics in Cybersecurity?

Cyber attacks are an increasing menace to companies, governments, militaries and even individuals around the world. The statistics speak for themselves: there were over 1 million cyber attacks every day in 2012.

These incidents cost organizations hundreds of millions of dollars annually (Akamai). With so much at stake, it would be foolish to rely on traditional methods for catching malicious actors.

Big data offers greater insight into system behavior and can therefore be used to catch threats. If big data analytics were at the forefront of cyber security, malicious actors could be caught before they do damage; here’s how:

Imitating users: One of the biggest challenges in cybersecurity is that attackers imitate ordinary user behavior.

For example, deceptive emails that lure people into clicking on suspicious links are among the most common tactics for tricking people into revealing sensitive information or installing malware.

Big data offers a way around this challenge; if we had access to massive amounts of information such as web links visited, logins, timestamps and device IDs, it might become possible to create profiles of individual users.

This would mean that hackers would have a harder time mimicking different types of user behavior, making it easier to catch them.
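A very small version of this idea can be sketched as follows. The profile here only remembers which devices and login hours have been seen for each user, which is a deliberate simplification; the user names, device IDs and the functions build_profiles and score_login are hypothetical.

```python
from collections import defaultdict

# Hypothetical login history: (user, device_id, hour_of_day).
HISTORY = [
    ("alice", "laptop-01", 9),
    ("alice", "laptop-01", 10),
    ("alice", "phone-07", 18),
    ("bob", "desktop-22", 14),
]


def build_profiles(history):
    """Record the devices and login hours previously seen for each user."""
    profiles = defaultdict(lambda: {"devices": set(), "hours": set()})
    for user, device_id, hour in history:
        profiles[user]["devices"].add(device_id)
        profiles[user]["hours"].add(hour)
    return profiles


def score_login(profiles, user, device_id, hour):
    """Return the reasons a login deviates from the user's profile (empty if none)."""
    profile = profiles.get(user)
    if profile is None:
        return ["unknown user"]
    reasons = []
    if device_id not in profile["devices"]:
        reasons.append("new device")
    if hour not in profile["hours"]:
        reasons.append("unusual hour")
    return reasons


if __name__ == "__main__":
    profiles = build_profiles(HISTORY)
    print(score_login(profiles, "alice", "laptop-01", 9))
    print(score_login(profiles, "alice", "tablet-99", 3))
```

A production system would use far richer features and tolerate normal variation, but the principle is the same: the more history there is per user, the harder it is for an impostor to blend in.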

Imitating devices: In addition to user behavior, attackers also mimic specific types of device behavior. For example, a smart TV is a very different type of device from, say, a smartphone or a tablet.

Each device has unique characteristics such as screen size, processing power and camera resolution, and these may be missed entirely by traditional security methods, which focus on known malicious signatures for individual types of threats (e.g. ransomware).

Big data could thus provide insights into commonalities across different devices, even if they have never been exposed to the same threat before, thereby helping to patch up weak points in security that could otherwise be exploited by hackers.

Increasing productivity: According to the World Economic Forum, big data is expected to increase productivity by around $300 billion per year.

With so much at stake, it’s surprising how few companies today are using big data analytics despite its far-reaching benefits such as reduced operating costs and increased revenue (Scribe Software).

While the security industry tends to lag behind other industries when it comes to adopting new technologies, I expect this field will soon catch up with trends such as cloud computing, blockchain technology and edge devices which are already being used in many sectors today.

Using big data for responsible disclosure of cyber vulnerabilities: Using human power alone, researchers would take decades or more to find out about every single vulnerability that exists within a large piece of software.

This is why information on vulnerabilities must be shared responsibly among relevant parties.

Because big data offers unprecedented insight into system behavior, it could be used for responsible disclosure by researchers to quickly pinpoint areas of the software that are more at risk for cyber attacks due to vulnerabilities.

Reduce overall cost of cyber security: At the moment there is no consensus on how much money is being spent globally on cyber security each year; however, reports suggest that the number exceeds hundreds of billions of dollars (NetDiligence).

One main reason why it is so difficult to get an accurate estimate is that products on the market are constantly changing and developers use different tools and techniques to build them. Big data could help overcome this challenge in two ways:

First, collecting basic information such as device IDs and timestamps for all access logs and files would enable any organization to record and analyze all changes made to their systems over the years.

This could help pinpoint malicious activity much faster than traditional methods such as searching through thousands of different apps and environments for signs of compromise.

Second, big data analysis may assist in discovering commonalities across different types of cyberattacks, enabling researchers to create signatures that can be used by security products on the market today (e.g. Fortinet FortiGate).
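One simple way to look for such commonalities is to count how many separate incidents share the same indicator. The sketch below assumes each incident has already been reduced to a list of indicators; the incident names, domains and addresses are invented, and the output is only a list of candidates for a signature, not a signature in any vendor's format.

```python
from collections import Counter

# Hypothetical incidents, each reduced to the indicators observed during it.
INCIDENTS = {
    "incident-A": ["evil.example", "dropper.exe"],
    "incident-B": ["evil.example", "198.51.100.7"],
    "incident-C": ["198.51.100.7", "evil.example", "evil.example"],
}


def common_indicators(incidents, min_incidents=2):
    """Return indicators seen in at least min_incidents distinct incidents."""
    counts = Counter()
    for indicators in incidents.values():
        # Use a set so a repeated indicator counts once per incident.
        counts.update(set(indicators))
    return sorted(ind for ind, n in counts.items() if n >= min_incidents)


if __name__ == "__main__":
    print(common_indicators(INCIDENTS))
```

Raising min_incidents makes the candidate list shorter and more conservative.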

Provide real-time visibility into network traffic: At the moment there are very few ways to know what is going on inside a system.

Unknown threats are detected almost exclusively after an attack has taken place, giving hackers time to cover their tracks and making it difficult to identify and stop them.

Big data techniques such as machine learning (ML) and artificial intelligence (AI) could change this by helping security companies detect unknown threats in real time even if they have never been seen before.

For example, AI could be used to scan network traffic for unique patterns that are characteristic of an attack.
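As a stand-in for a trained model, the sketch below uses a plain statistical check: it learns the mean and standard deviation of traffic volume from a baseline period and flags any new measurement that lies more than a chosen number of standard deviations away. This is a z-score test, not machine learning in the full sense, and the traffic figures and the threshold of 3.0 are assumptions for illustration.

```python
from statistics import mean, stdev

# Hypothetical traffic volume (megabytes per minute) during normal operation.
BASELINE = [100, 110, 95, 105, 90, 100, 108, 92]


def find_anomalies(baseline, observed, threshold=3.0):
    """Return (index, value) for observations far from the baseline mean.

    An observation is flagged when its distance from the mean exceeds
    `threshold` standard deviations of the baseline.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: anything different is unusual.
        return [(i, v) for i, v in enumerate(observed) if v != mu]
    return [
        (i, v)
        for i, v in enumerate(observed)
        if abs(v - mu) / sigma > threshold
    ]


if __name__ == "__main__":
    print(find_anomalies(BASELINE, [98, 104, 400, 101]))
```

The check needs no prior knowledge of what an attack looks like, only of what normal looks like, which is why approaches of this kind can surface threats that have never been seen before.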

Real-time mapping of cyber threats: Companies that provide computer security services face the daunting task of constantly adapting to an ever-changing landscape of threats, and it is becoming increasingly difficult to stay a step ahead.

Big data could help by providing a real-time map of all cyber threats in the world. This would allow security analysts to quickly create intelligence reports that accurately reflect what is going on in the cyber landscape today.