Data Lake Administration Guide


Exabeam Data Lake Architecture Overview

Data Lake is one of three elements in Exabeam's Security Management Platform (SMP). The data ingested by Data Lake can be used by Advanced Analytics for analysis and by Incident Responder when investigating incidents.

At a high level, Exabeam involves three main processes:

  • Log collection

  • Log parsing, enrichment, ingestion, and indexing

  • Data presentation (searching, visualizing, reporting, dashboards, etc.)

The system flow begins with the log collectors, which are either agent-based, running locally on users' machines, or server-side (for example, the DB collector or eStreamer collector). They collect operational data, system metrics, and more, and then send those logs to the Log Ingestor.

The log ingestor can consume events from log collectors, syslog sources, or an existing SIEM. The log ingestor provides flow control and pushes the logs to the log indexer.

The log indexer is responsible for parsing and enriching log events before indexing and storing them in a distributed search cluster.
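The collect, ingest, and index stages described above can be sketched as follows. This is a hypothetical minimal illustration of the flow, not Exabeam's actual implementation; all function and field names are invented:

```python
# Hypothetical sketch of the three-stage flow: collectors yield raw
# events, the ingestor buffers them, and the indexer parses, enriches,
# and stores each one.
from collections import deque

def collect(raw_sources):
    """Collectors: yield raw log lines from any source."""
    for source in raw_sources:
        yield from source

def ingest(events, queue):
    """Ingestor: provide flow control by buffering events."""
    for event in events:
        queue.append(event)

def index(queue, store):
    """Indexer: parse and enrich each event, then store it."""
    while queue:
        raw = queue.popleft()
        doc = {
            "raw": raw,                       # original log line
            "parsed": raw.split(),            # stand-in for real parsing
            "enriched": {"len": len(raw)},    # stand-in for enrichment
        }
        store.append(doc)

queue, store = deque(), []
ingest(collect([["user login ok", "user logout"]]), queue)
index(queue, store)
```

In the real product the buffer between the ingestor and the indexer is a Kafka message queue and the store is a distributed search cluster; the deque and list here only stand in for those roles.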

A set of common services is shared across all three SMP products.

Data Lake is offered for hardware and virtual (Amazon Web Services, Google Cloud Platform) deployments as well as for SaaS.

Exabeam Data Lake Hardware and Virtual Architecture
Figure 1. Exabeam Data Lake Hardware and Virtual Architecture


Exabeam Data Lake SaaS Architecture
Figure 2. Exabeam Data Lake SaaS Architecture


How Exabeam Data Lake Works

Data Lake indexes data from the servers, applications, databases, network devices, virtual machines, and so on, that make up your IT infrastructure. Data Lake can collect the data from machines located anywhere, whether local, remote, or cloud. Most users connect to Data Lake through a web browser to run searches and create dashboards. Other ways to connect to Data Lake include API streams from log collectors and ingestors. Additionally, Data Lake can push parsed incidents to Exabeam Advanced Analytics or your SIEM.

  • Exabeam Log Collectors – Agent-based log collectors, server-side collectors, and cloud connectors

  • Exabeam Log Ingestor – Consumes events from syslog sources and connectors, providing flow control before pushing to the Log Indexer

  • Exabeam Log Indexer – Responsible for parsing, enriching, and indexing log events, which are then stored in a distributed cluster

  • Exabeam Data Lake UI – The web interface used for searching log events, creating charts, and viewing dashboards

Table 1. Exabeam Data Lake Components


Exabeam Log Collectors in Data Lake

Data Lake can collect the data from machines located anywhere, whether it is local, remote, or cloud. It provides an out-of-the-box, file-based collector and Windows event collector. It also supports organizations that collect:

  • Data from devices communicating via the Cisco eStreamer protocol

  • Logs via cloud applications (PaaS, IaaS, and SaaS)

  • Logs via databases

Most customer environments will utilize a combination of server-side and agent connectors.

We can deploy and run local agents on machines from which logs must be collected and aggregated. We can also receive syslog sent to our Log Ingestor from your SIEM or another third-party security service, such as FireEye, Symantec, and many others.

Regardless of the method by which Data Lake collects logs, once they are accepted by the Log Ingestor they are treated exactly the same.

Note

Data Lake is optimized to support up to 1,500 collectors for clusters with two or more hosts. For single-host clusters, up to 700 collectors are supported. There may be up to a 10% EPS performance degradation and up to a 20% increase in search latency, depending on the number of collectors.

Exabeam Data Lake Agent Collector

Exabeam supports three types of agent connectors for log collection:

  • Windows Log Collectors – Installed on Windows machines.

  • File Log Collectors – Installed on Windows or Linux machines.

  • Gzip Log Collectors – Installed on Windows or Linux machines.

These are lightweight processes installed on machines (for example, workstations and servers) to capture operational data such as hardware events, security system events, application events, operating system metrics, network packets, and health metrics. The connectors read from one or more event logs or Gzipped logs and filter the events based on user-configured criteria. The connectors watch the event logs and send any new events in real time. The read position is persisted so that the connectors can resume after restarts.
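The persisted read position mentioned above can be sketched as follows. This is a hypothetical minimal collector, not Exabeam's implementation; the file paths and function name are invented:

```python
# Hypothetical sketch: read new lines from a log file and persist the
# byte offset, so collection resumes where it left off after a restart.
import os

def collect_new_events(log_path, offset_path):
    # Load the last persisted read position, defaulting to the start.
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            offset = int(f.read() or 0)
    # Read only the events appended since the last run.
    with open(log_path) as f:
        f.seek(offset)
        events = f.read().splitlines()
        offset = f.tell()
    # Persist the new read position for the next run (or a restart).
    with open(offset_path, "w") as f:
        f.write(str(offset))
    return events
```

Calling the function twice against the same log file returns only the lines appended between the calls, which is the behavior that lets an agent survive restarts without re-sending old events.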

While file log collectors can be installed on Windows machines, they will only collect file inputs and will not collect Windows event logs. To capture Windows event logs, you must install Windows Log Collectors.

Gzip file collectors process Gzipped files and publish them to Exabeam Data Lake.
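The Gzip collector's core job, reading a Gzipped file line by line, can be sketched like this. It is a hypothetical minimal example (the function name is invented), with the "publish" step reduced to returning the lines:

```python
# Hypothetical sketch of a Gzip file collector: decompress a Gzipped
# log file and return its lines, where a real collector would publish
# them to Data Lake.
import gzip

def publish_gzip_log(path):
    # "rt" opens the compressed file in text mode, decoding as it reads.
    with gzip.open(path, "rt") as f:
        return [line.rstrip("\n") for line in f]
```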

Exabeam Data Lake Server-Side Collector

Direct log collection is supported on Data Lake. Essentially, as long as a device (such as a Windows or Unix server) or a security solution (such as a DLP product or firewall) can send syslog, Data Lake can ingest its logs. Alternatively, Data Lake can remotely connect to databases and to Cisco eStreamer to fetch logs.

Data Lake supports data pushes from the following log sources:

  • Syslog

  • DB Collectors for MySQL, MS-SQL, Oracle, PostgreSQL

  • eStreamer

Through Cisco eStreamer Collectors, Data Lake enables organizations to collect data from their Cisco FireSIGHT systems. Like the collectors mentioned above, the eStreamer collector is a service that runs on the Data Lake Site Collector appliance and connects to the remote servers communicating over the Cisco eStreamer protocol.
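A DB collector of the kind listed above can be sketched as a simple poll for rows newer than the last one seen. This is a hypothetical illustration: sqlite3 stands in for the supported databases (MySQL, MS-SQL, Oracle, PostgreSQL), and the table and column names are invented:

```python
# Hypothetical sketch of a DB collector: poll a table for rows with an
# id greater than the last-seen id, and remember where we left off.
import sqlite3

def fetch_new_rows(conn, last_id):
    cur = conn.execute(
        "SELECT id, message FROM audit_log WHERE id > ? ORDER BY id",
        (last_id,),
    )
    rows = cur.fetchall()
    # Advance the watermark only if new rows arrived.
    new_last = rows[-1][0] if rows else last_id
    return rows, new_last
```

Each polling cycle passes the watermark returned by the previous cycle, so every row is fetched exactly once even across repeated polls.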

Exabeam Data Lake Ingestion

The Data Lake Ingestion Engine serves as an aggregator, accepting logs via Syslog or via Log Collectors. It supports a variety of inputs that simultaneously pull in events from a multitude of common sources, unifying your data regardless of format or schema.

Kafka processes streams of records as they occur and builds real-time streaming data pipelines that reliably move data between systems. It organizes all the incoming logs and builds a message queue to the Indexer, buffering and controlling the volume of logs coming into the Indexer.
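The buffering and volume-control role that Kafka plays between the Ingestor and the Indexer can be illustrated with a bounded queue. This is a hypothetical stand-in, not how Kafka itself works internally; the names and the tiny queue size are invented for illustration:

```python
# Hypothetical sketch of flow control between ingestor and indexer:
# a bounded queue caps how much the producer can push ahead of the
# consumer.
import queue

buffer = queue.Queue(maxsize=3)  # small bound, for illustration only

def produce(events, buf):
    accepted, dropped = [], []
    for e in events:
        try:
            buf.put_nowait(e)
            accepted.append(e)
        except queue.Full:
            # A real pipeline would block or apply backpressure here
            # rather than drop; dropping keeps this sketch simple.
            dropped.append(e)
    return accepted, dropped
```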

Warning

Data Lake architecture is optimized to ingest log events that are less than 1 MB per event. This is a high safety limit that many customers will never hit. Please contact Exabeam Customer Success for assistance in fine-tuning this value.

  • Syslog – The Ingestor accepts syslog via a syslog ingestor instance listening on multiple ports and protocols; for example, it accepts syslog via TLS on port TCP/515. The messages are written to a Kafka message queue. You can also use a load balancer to distribute your syslog data across the various nodes in the Data Lake cluster.

  • Collectors – These are deployed on customer systems and send messages to Kafka directly.
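As a concrete example of the syslog path above, a sender targeting the ingestor's TLS listener on TCP/515 might construct its messages in the standard RFC 5424 layout. The helper below is hypothetical (the document does not specify which syslog format the ingestor expects), and the transport itself is omitted:

```python
# Hypothetical sketch: build an RFC 5424-style syslog message of the
# kind a sender might transmit to the ingestor's TLS listener (TCP/515).
def rfc5424(pri, host, app, msg, ts="2024-01-01T00:00:00Z"):
    # <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
    # PROCID, MSGID, and structured data are left as "-" (nil) here.
    return f"<{pri}>1 {ts} {host} {app} - - - {msg}"
```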

Exabeam Data Lake Indexer

The Data Lake Indexer accepts raw logs from the Ingestor. It then parses relevant information from each log, enriches the data with contextual information, then indexes each log for full-text searching in near real time. The indexer dynamically transforms and prepares your data regardless of format or complexity.

Parsing

One of the purposes of indexing data is to turn verbose messages into user-readable data structures. Data Lake extracts pre-defined fields from the logs by running them through a series of parsers. Log events are “typed” as defined by the parsers. For example, a Windows 4624 event from any collector such as Splunk or Exabeam Cloud Connector would be “typed” as windows-4624.

The original log data, along with the extracted fields, are searchable.
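The "series of parsers" approach described above can be sketched as follows. This is a hypothetical illustration, not Exabeam's parser format: the regular expressions and field names are invented, and only the event-typing idea (e.g. a Windows 4624 event becoming windows-4624) comes from the text:

```python
# Hypothetical sketch: run a raw log through an ordered series of
# parsers; the first one that matches assigns the event type and
# extracts named fields. The original raw log is kept alongside them.
import re

PARSERS = [
    ("windows-4624", re.compile(r"EventID=(?P<event_id>4624).*User=(?P<user>\S+)")),
    ("windows-4625", re.compile(r"EventID=(?P<event_id>4625).*User=(?P<user>\S+)")),
]

def parse(raw):
    for event_type, pattern in PARSERS:
        m = pattern.search(raw)
        if m:
            # Extracted fields plus the raw log are both searchable.
            return {"type": event_type, "raw": raw, **m.groupdict()}
    return {"type": "unparsed", "raw": raw}
```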