The concept of a Security Data Lake, a type of Data Lake designed explicitly for information security, has not yet received much attention. Yet it is more than an ordinary data storage solution, and it can potentially bring a company's security to the next level. A Security Data Lake accumulates the indicators gathered by the Security Operations Center, and analyzing this data can form the basis for robust information security within an organization.
This article delves into the Security Data Lake concept, highlights how it differs from conventional cloud storage, and discusses the key vendors operating in this field.
The Security Data Lake (SDL) is rooted in the traditional concept of a Data Lake, which emerged in response to the exponential growth of data and declining storage costs. Gartner no longer views the Data Lake as a game-changing technology but rather as a trend in the advancement of storage solutions (such as the Cloud Data Warehouse).
A Data Lake is a storage repository that keeps vast amounts of data in its original format. According to Gartner: "A Data Lake is a concept consisting of a collection of storage instances of various data assets. These assets are stored in a near-exact, or even exact, copy of the source format and are in addition to the originating data stores."
The data in the "Lake" can be structured, semi-structured, or unstructured, and can include tables, text files, system logs, and much more.
Data Lakes differ from conventional data stores in several ways. In particular, data stores are usually designed for structured, pre-prepared data, whereas a Data Lake does not force the original material into any particular "convenient" structure. This allows the same data to be processed in multiple ways, which is the primary value of Data Lakes.
The concept's popularity grew when data scientists noticed that traditional data stores made it hard to solve novel problems. Pre-processing discards some of the material that may at first be considered "junk" but may later turn out to have value. This problem becomes even more pronounced when dealing with vast amounts of data.
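To make the idea of processing the same data in multiple ways more concrete, here is a minimal Python sketch. It assumes the raw records are JSON lines kept exactly as they arrived; the sample records, field names, and function names are hypothetical and chosen only for illustration. Two different "readings" are applied to the same untouched data at query time.

```python
import json
from collections import Counter

# Hypothetical raw records, stored exactly as they arrived (no upfront schema).
RAW_LAKE = [
    '{"ts": "2024-01-01T10:00:00", "user": "alice", "action": "login", "ip": "10.0.0.5"}',
    '{"ts": "2024-01-01T10:05:00", "user": "bob", "action": "login", "ip": "10.0.0.9"}',
    '{"ts": "2024-01-01T10:07:00", "user": "alice", "action": "download", "ip": "10.0.0.5"}',
]


def events_per_user(raw_lines):
    """One reading of the raw data: count events per user."""
    return Counter(json.loads(line)["user"] for line in raw_lines)


def ips_per_action(raw_lines):
    """A different reading of the same raw data: source IPs grouped by action."""
    result = {}
    for line in raw_lines:
        record = json.loads(line)
        result.setdefault(record["action"], set()).add(record["ip"])
    return result


if __name__ == "__main__":
    print(events_per_user(RAW_LAKE))
    print(ips_per_action(RAW_LAKE))
```

Because nothing was discarded or reshaped when the records were stored, a third or fourth reading can be added later without re-collecting the data.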
Corporate Data Lakes usually store unstructured data, including details about the company's products, financial metrics, customer data, marketing materials, etc. They are easily expandable and secure, relying on established security measures like other conventional storage systems.
A Security Data Lake encompasses more than just security logs and alerts. It includes a range of other security-related information, such as:
So, the Security Data Lake is a centralized repository for logs and other information directly related to information security. The data collected from various sources is then analyzed with a range of tools. The purpose of an SDL is to make the original logs easier to access, thereby making security operations more efficient.
Centralizing all data in an SDL streamlines investigations, reduces the effort required to gather logs from multiple systems, and helps ensure that the data is complete.
A security officer can theoretically access and examine all log sources without an SDL. In practice, however, this seemingly simple task is difficult because there are hundreds of different solutions for storing security logs and numerous types of network devices. An SDL simplifies processes such as automated data retrieval through APIs or other means, data parsing, and information accumulation.
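The parsing and accumulation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two log formats, the regular expression, the field names, and the function names are all hypothetical, and real sources vary far more widely. Each source gets its own parser, every parser emits the same record shape, and the original line is kept alongside the parsed fields.

```python
import json
import re

# Hypothetical pattern for a syslog-style SSH failure line.
SSH_PATTERN = re.compile(r"Failed password for (?P<user>\S+) from (?P<ip>\S+)")


def parse_ssh_line(line):
    """Parse an SSH failure line; return None if the line does not match."""
    match = SSH_PATTERN.search(line)
    if match is None:
        return None
    return {
        "source": "ssh",
        "event": "failed_login",
        "user": match.group("user"),
        "src_ip": match.group("ip"),
        "raw": line,  # keep the original line next to the parsed fields
    }


def parse_firewall_json(line):
    """Parse a JSON firewall event (assumed keys: 'action' and 'src')."""
    data = json.loads(line)
    return {
        "source": "firewall",
        "event": data["action"],
        "user": None,
        "src_ip": data["src"],
        "raw": line,
    }


PARSERS = {"ssh": parse_ssh_line, "firewall": parse_firewall_json}


def ingest(lines_by_source):
    """Accumulate records from several sources into one common list."""
    records = []
    for source, lines in lines_by_source.items():
        for line in lines:
            record = PARSERS[source](line)
            if record is not None:
                records.append(record)
    return records


if __name__ == "__main__":
    sample = {
        "ssh": ["Jan  1 10:00:00 host sshd[123]: Failed password for root from 203.0.113.7 port 22 ssh2"],
        "firewall": ['{"action": "deny", "src": "203.0.113.7", "dst": "10.0.0.5"}'],
    }
    for rec in ingest(sample):
        print(rec["source"], rec["event"], rec["src_ip"])
```

Keeping the `raw` field means the normalized view never replaces the original log, which matches the Data Lake principle of storing data in its source format.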
With large amounts of security data being generated, traditional security information and event management systems (SIEMs) can struggle to gather the data effectively. To extract valuable information, the information security team must collect data across on-premises, cloud, and SaaS environments and then analyze it. Many tasks in this process are manual and time-consuming. Implementing an SDL can address these challenges through automation.
There are five key features that SDL should have:
Vendors are trying to expand the tooling of their products and offer features that can improve SDL. Here are some of them:
Unlike traditional IT systems, where log collection is an auxiliary function, in information security tools logs are part of the core functionality. At the same time, licensing models for information security tools are based on the number of users, which leads to a steep rise in license costs. As a result, security teams sometimes deliberately do not collect all the available data that could help protect against cyberattacks. If the logs are absent, an attack may go undetected.
Additionally, the growing amount of unstructured data, which some experts predict will make up 80% of the world's information by 2025, makes searching and analysis significantly harder.
Finally, SDL implementation is often hampered by a lack of qualified personnel: companies may not have staff with the skills and expertise needed to implement and manage an SDL effectively.
The main distinction between SDL and SIEM lies in their approach to proactive threat detection.
SIEM is an information system that lets you identify the causes of alerts and helps eliminate them. SDL is viewed more as a standalone system. It collects data about protected objects and stores information about possible cyberattack vectors. Because of this, it can be used for machine learning.
SIEM helps analyze notifications and flag certain events for further investigation. But all further operations are carried out outside of this tool.
SDL can be used to search for threats and identify their signs by querying the accumulated information. The task of SDL is to identify possible threats, provide context for them, and predict the signals that should be expected in the event of a cyberattack.
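As a rough illustration of searching accumulated data for signs of a threat, the following Python sketch looks for a successful login that follows several failed attempts from the same address for the same user. It is a simplified example, not a description of any vendor's query interface: the event fields (`ts`, `src_ip`, `user`, `event`), the event names, and the threshold are assumptions made for this example.

```python
from collections import defaultdict


def find_failures_then_success(events, min_failures=3):
    """Return logins preceded by at least `min_failures` failed attempts
    from the same (src_ip, user) pair. Timestamps are assumed to be ISO 8601
    strings, which sort correctly as plain text."""
    failures = defaultdict(int)
    findings = []
    for event in sorted(events, key=lambda e: e["ts"]):
        key = (event["src_ip"], event["user"])
        if event["event"] == "failed_login":
            failures[key] += 1
        elif event["event"] == "login":
            if failures[key] >= min_failures:
                findings.append(
                    {
                        "ts": event["ts"],
                        "src_ip": event["src_ip"],
                        "user": event["user"],
                        "failed_attempts": failures[key],
                    }
                )
            failures[key] = 0  # a successful login resets the counter
    return findings


if __name__ == "__main__":
    history = [
        {"ts": "2024-01-01T10:00:00", "src_ip": "203.0.113.7", "user": "alice", "event": "failed_login"},
        {"ts": "2024-01-01T10:00:05", "src_ip": "203.0.113.7", "user": "alice", "event": "failed_login"},
        {"ts": "2024-01-01T10:00:09", "src_ip": "203.0.113.7", "user": "alice", "event": "failed_login"},
        {"ts": "2024-01-01T10:00:15", "src_ip": "203.0.113.7", "user": "alice", "event": "login"},
        {"ts": "2024-01-01T10:01:00", "src_ip": "10.0.0.9", "user": "bob", "event": "login"},
    ]
    for finding in find_failures_then_success(history):
        print(finding)
```

The same stored history can be re-queried later with a different threshold or a different pattern, which is where retaining the original events pays off.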
Conclusion
Security Data Lakes, a specialized type of Data Lake designed for information security, are in the early stages of development. Even so, some products are already available to enhance organizational security. SDLs can be a valuable tool that helps security officers quickly detect attackers within an organization.