Security Content: Exabeam How Content Works Guide

Exabeam Parsers

A parser is a configuration in the parser.conf file that defines:

  • The logs to extract values from

  • Which values should be extracted from the log

  • The Exabeam fields these values are mapped to

Note

The Exabeam Advanced Analytics pipeline is as follows:

parsing > event building > enrichment > session building > modeling > rule triggering

Once you have determined that a log event is valuable for security analytics, the next phase is to extract the values of interest from the log and map them to Exabeam fields.

This extraction happens in the parsing stage, the very first stage of the analytics engine pipeline. Every log that eventually scores on a timeline passes through parsing first, which makes parsing the entryway into Advanced Analytics.

Associate a Log with a Parser

The parsing engine associates a log with the correct parser by using one or more unique strings that appear only in that specific type of log. These strings are specified in the Conditions parameter of the parser. If multiple conditions are specified, all of them must exist in the log for the parser to take effect.

Parser conditions are evaluated in the order the parsers appear in the parser list. A log entering the ingestion engine is first checked against the conditions of the parser at the top of the file. If they do not match, the log moves on to the next parser in the file, and so on.

Note

Once a log is caught by a parser, no other parser conditions will be evaluated. The parser with the matched condition will be used to parse the event.

If parsers have similar conditions, you must place the parser with the broader condition below the parser with the more specific condition. Otherwise, the parser with the broader condition will catch the more specific logs first.
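The first-match rule can be sketched in Python. This is a simplified model, not Exabeam's ingestion engine: the Parser class, the select_parser function, and the parser names o365-set-mailbox and o365-any-operation are hypothetical, and conditions are treated as plain substrings that must all appear in the log.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Parser:
    """Hypothetical stand-in for a parser entry: a name plus its Conditions."""

    name: str
    conditions: list[str] = field(default_factory=list)

    def matches(self, log: str) -> bool:
        # Every condition string must appear somewhere in the log.
        return all(cond in log for cond in self.conditions)


def select_parser(parsers: list[Parser], log: str) -> Parser | None:
    """Return the first parser, in file order, whose conditions all match."""
    for parser in parsers:
        if parser.matches(log):
            return parser  # first match wins; later parsers are not checked
    return None


specific = Parser("o365-set-mailbox", ['"Operation":"Set-Mailbox"'])
broad = Parser("o365-any-operation", ['"Operation":'])
log = '{"Operation":"Set-Mailbox","ResultStatus":"True"}'

print(select_parser([specific, broad], log).name)  # o365-set-mailbox
print(select_parser([broad, specific], log).name)  # o365-any-operation
```

Swapping the order shows why the broader parser must sit lower in the file: placed first, it catches the Set-Mailbox log before the specific parser is ever checked.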

Extracting and Mapping Values

Regular expressions (regexes) let Exabeam extract specific patterns from logs and map the extracted values to fields, applying the regexes in order. Each value of interest is captured by a group enclosed in parentheses. The group begins with the name of the target field in curly braces, followed by the regular expression that identifies the value.

For example, the expression ABC({my_field}...) captures the three characters immediately after the string "ABC" in the log and maps them to a field called my_field. If the received log is "ABC123XYZ", the field my_field will contain the value "123".

If the string "ABC" does not exist in the log, the field my_field will not be created.

All regular expression statements are evaluated in consecutive order against the entire log. If a value is mapped to a certain field in one expression and then a different value is mapped to the same field, the second mapping will overwrite the first.
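To see extraction and overwrite order concretely, the following sketch approximates the ({field}regex) syntax with Python named groups. It is a simplified model, not Exabeam's parser: to_python_regex and extract_fields are hypothetical helpers, and the plain rewrite of "({name}" to "(?P<name>" does not cover every edge case (for example, a field name repeated inside one regex, which Python rejects).

```python
from __future__ import annotations

import re

# Matches the opening "({field_name}" of an Exabeam-style capture group.
_FIELD_GROUP = re.compile(r"\(\{(\w+)\}")


def to_python_regex(exabeam_regex: str) -> re.Pattern[str]:
    """Rewrite ({name}...) groups as Python named groups (?P<name>...)."""
    return re.compile(_FIELD_GROUP.sub(r"(?P<\1>", exabeam_regex))


def extract_fields(log: str, regexes: list[str]) -> dict[str, str]:
    """Apply each regex in order against the whole log; later values win."""
    fields: dict[str, str] = {}
    for raw in regexes:
        match = to_python_regex(raw).search(log)
        if match is None:
            continue  # no match: this regex creates no fields
        for name, value in match.groupdict().items():
            if value is not None:  # skip optional groups that did not participate
                fields[name] = value  # overwrites any earlier value for this field
    return fields


print(extract_fields("ABC123XYZ", [r"ABC({my_field}...)"]))  # {'my_field': '123'}
print(extract_fields("XYZ", [r"ABC({my_field}...)"]))  # {}
print(extract_fields("ABC123XYZ", [r"ABC({my_field}...)", r"({my_field}XYZ)"]))
# {'my_field': 'XYZ'}
```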

Parser Parameter Definition

The following is an example parser parameter definition that contains common fields, such as Name, Vendor, and Product.

{
  Name = o365-inbox-rules-2
  Vendor = Microsoft
  Product = Office 365
  Lms = Direct
  DataType = "app-activity"
  TimeFormat = "yyyy-MM-dd'T'HH:mm:ss"
  Conditions = ["""Operation":"Set-Mailbox""" ]
  Fields = [
    """"CreationTime":"({time}\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d)"""",
    """Forward.+?Value":"(smtp:)?({target}[^"]+@({target_domain}[^"]+))"""",
    """"ResultStatus":"({outcome}[^"]+)"""",
    """"ClientIP":"\[?({src_ip}[^"]+?)\]?:({src_port}\d+)"""",
    """({activity}Set-Mailbox)""",
    """cs1=(\[\{"additional-properties"\:)?\{"({activity}[^"]+)""",
    """msg=({additional_info}.+?)\s\w+=""",
    """"Value":"(?:smtp:)?.+?@({target_domain}[^"]+)"""",
    """UserId":"({user_email}[^"\\]+@({user_domain}[^"]+))""",
    """destinationServiceName=({app}.+?)\s*filePath""",
    """({app}Office 365)"""
  ]
  DupFields = ["app->resource"]
}

Parser Field Descriptions

The following table lists and describes the parser fields. Where a field behaves differently in Data Lake and in Advanced Analytics, its entry describes each product separately.

Name

The name of the parser. You use this name when creating event builders, and it appears in evt.gz logs as the value of exa-msg-type.

Each parser name must be unique. If two parsers share a name, the parser read later from the configuration files overwrites the one read earlier.

Vendor

The name of the company or vendor that builds or sells the logging source. In the Parser Parameter Definition example, Office 365 is the log source that generates the activity logs, and Microsoft is the company that builds the product.

In Data Lake: The value of this parameter is stored in the vendor field, which is indexed and searchable.

In Advanced Analytics: The value is searchable in Threat Hunter.

Product

The name of the product that generates these logs.

In Data Lake: The value of this parameter is stored in the product field, which is indexed and searchable.

In Advanced Analytics: The value is searchable in Threat Hunter.

Lms

This is an optional field used for parser management. It does not have any effect on the parsed log. In the previous example, Direct means the logs are ingested via syslog directly from the log source, rather than through a log management system. Other possible values are DataLake, Splunk, Qradar, and Arcsight, used when that log management system is the one forwarding logs to Advanced Analytics.

In both Data Lake and Advanced Analytics, this field has no effect.

TimeFormat

A pattern that defines the structure of the parsed time field. The example uses Java-style date pattern letters (yyyy for the year, MM for the month, HH for the hour, and so on), with literal characters such as T enclosed in single quotes. Exabeam supports Unix timestamp formats for parsers, as well as any format that is Unix-readable. If the time field is parsed as a 10-digit number, such as epoch time in seconds, set TimeFormat to epoch. In the previous example, the parsed time looks like 2019-10-10T10:12:50.

Conditions

A set of strings that must be included in the log for the parser to evaluate it. The regexes are compared against the log only if all conditions are met.

Fields

All the regexes for this parser; this is where the field values are actually extracted. A single regex can parse as many fields as you want. In the previous example, some regexes parse multiple fields, such as the regex that parses user_email and user_domain. Other fields are parsed by their own regex for performance reasons.

ISHVF

Ishvf (IsHighVolumeFeed)

This field is deprecated as of Advanced Analytics i46. For versions earlier than i46, set it to true (Ishvf = true) if the logs caught by this parser arrive at the ingestion engine in large volume.

DupFields

This is an array that duplicates fields into new field names. Duplicating a field this way is much more efficient than duplicating the regex. In the previous example, app is already parsed by the regexes, and DupFields = ["app->resource"] creates a duplicate field called resource that holds the same value as app.

Table 1. 
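The TimeFormat and DupFields behavior described in the table can be approximated in Python as follows. This is a sketch, not Exabeam code: parse_time, apply_dup_fields, and the TIME_FORMATS mapping (which translates only the pattern from the example into Python strptime directives) are hypothetical, and skipping a DupFields entry whose source field was not parsed is an assumption.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Hypothetical mapping from a TimeFormat pattern to Python strptime directives.
# Only the pattern used in the Parser Parameter Definition example is covered.
TIME_FORMATS = {
    "yyyy-MM-dd'T'HH:mm:ss": "%Y-%m-%dT%H:%M:%S",
}


def parse_time(value: str, time_format: str) -> datetime:
    if time_format == "epoch":
        # A 10-digit value is seconds since 1970-01-01 00:00:00 UTC.
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    return datetime.strptime(value, TIME_FORMATS[time_format])


def apply_dup_fields(fields: dict[str, str], dup_fields: list[str]) -> dict[str, str]:
    """Copy the value of each "source->target" entry into the target field."""
    result = dict(fields)
    for entry in dup_fields:
        source, target = entry.split("->")
        if source in result:  # assumption: nothing is created if source is missing
            result[target] = result[source]
    return result


print(parse_time("2019-10-10T10:12:50", "yyyy-MM-dd'T'HH:mm:ss"))
print(parse_time("1570702370", "epoch"))
print(apply_dup_fields({"app": "Office 365"}, ["app->resource"]))
```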


Test a Parser on Advanced Analytics

To test a parser in Advanced Analytics, run the following command on the Advanced Analytics deployment:

(.env)$ exa-fetch-parse --config-file /opt/exabeam/config/custom/custom_lime_config.conf --request "(2019-06-03,2019-06-05,syslog)" --status ParseOnly

The command runs the ingestion engine over the supplied log fetch type (in this case, logs that were sent by syslog) and parses only the specified date range. The directory that the ingestion engine reads is set in the configuration file passed with the --config-file parameter; this is typically the log storage directory. The output is written to the same directory.
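If you run this test often, a small helper can assemble the --request value from a date range. This is a hypothetical convenience wrapper (build_parse_test_command is not part of Exabeam); it uses only the flags shown above and prints the command rather than running it.

```python
from __future__ import annotations

import shlex
from datetime import date


def build_parse_test_command(
    config_file: str, start: date, end: date, fetch_type: str = "syslog"
) -> list[str]:
    """Hypothetical helper: build the exa-fetch-parse argument list shown above."""
    request = f"({start.isoformat()},{end.isoformat()},{fetch_type})"
    return [
        "exa-fetch-parse",
        "--config-file", config_file,
        "--request", request,
        "--status", "ParseOnly",
    ]


cmd = build_parse_test_command(
    "/opt/exabeam/config/custom/custom_lime_config.conf",
    date(2019, 6, 3),
    date(2019, 6, 5),
)
print(shlex.join(cmd))
# On the Advanced Analytics host, inside the (.env) environment, you could run it with:
#   import subprocess; subprocess.run(cmd, check=True)
```

shlex.join wraps the --request value in single quotes, which the shell treats the same way as the double quotes in the original command.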

Troubleshooting Regexes

Regexes extract data to ingest into the Exabeam platform. Creating the correct regex is crucial to getting all the value that Advanced Analytics offers, such as rule scoring and modeling.

You can use multiple regexes for a single field name. This is typically done when the format of a field varies from log to log, so that at least one of the regexes parses the field correctly. If more than one of these regexes matches the log, the regex that appears later (further down in the Fields array) takes precedence, and its value is used for the field.

In the Parser Parameter Definition example, two regexes can parse the app field. If both of them match, the value parsed by the second regex overwrites the value parsed by the first.

Note

You can use regex101.com to help you create and test regex syntax.

Regexes Misparsing

Design your regexes to capture all possible variations of your data. By carefully creating and testing your regexes, you can make sure that Advanced Analytics doesn't miss data it needs to model a specific field.

Design your regexes to capture edge cases in how a value might appear in the log. Regexes are often built to stop capturing at a space or at a delimiter character such as ':', '[', or '}'. You need to balance two requirements: the regex must be broad enough to capture the complete value, yet limited in how far it is allowed to capture. For example, a regex may need to change because filenames can contain spaces, or because the log management system appends forward slashes or backslashes.
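The filename case can be reproduced with Python's re module; the log line and field name below are made up for illustration. A \S+ capture stops at the first space inside the path, while a lazy capture bounded by the next key=value pair (the same approach as the msg=({additional_info}.+?)\s\w+= regex in the Parser Parameter Definition example) keeps the whole filename.

```python
import re

# Made-up log line: the filename contains a space.
log = r"act=read filePath=C:\Users\bob\My Documents\report.docx suser=bob"

# Too narrow: \S+ stops at the first space inside the filename.
narrow = re.search(r"filePath=(?P<file_path>\S+)", log)

# Bounded by the next key=value pair rather than by the first space.
bounded = re.search(r"filePath=(?P<file_path>.+?)\s+\w+=", log)

print(narrow.group("file_path"))   # C:\Users\bob\My
print(bounded.group("file_path"))  # C:\Users\bob\My Documents\report.docx
```

The trade-off is that the bounded version needs another key after filePath; if filePath can be the last field in the log, the regex must also allow for the end of the line.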

Performance Tuning

The speed of a regex is crucial for the stability of the ingestion engine. A single high-volume log source that hits a parser that takes 70 ms to parse each log will severely degrade performance. Starting with Advanced Analytics i48, parsers that impact the ingestion process as a whole are automatically disabled.
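A quick back-of-the-envelope calculation shows why 70 ms per log is a problem. Assuming, as a simplification, that one worker thread parses one log at a time, the throughput ceiling is 1000 / 70, or roughly 14 logs per second per thread. The 1,000 events per second rate below is a hypothetical feed, and both helper functions are illustrative only.

```python
import math


def max_logs_per_second(parse_ms: float, threads: int = 1) -> float:
    """Throughput ceiling if every log takes parse_ms milliseconds to parse."""
    return threads * 1000.0 / parse_ms


def threads_needed(events_per_second: float, parse_ms: float) -> int:
    """Worker threads kept fully busy by this one parser at the given feed rate."""
    return math.ceil(events_per_second * parse_ms / 1000.0)


print(round(max_logs_per_second(70), 1))  # 14.3
print(threads_needed(1000, 70))  # 70
```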

In many cases, this happens because the regex was designed to be as broadly tuned as possible and performs several lookaheads in the log. If a log line is large, a single regex that tries to look through most or all of the log will slow down the ingestion engine and eventually cause the parser to be disabled.
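The cost of scanning the whole log can be observed with Python's re and timeit modules. This is an illustration with a synthetic log line, not a measurement of Exabeam's engine. Both patterns extract the same value when the field is present, but on a long line without the field, the pattern with a leading .* can end up rescanning the line from many starting positions before it fails, while the targeted pattern only has to look for the literal key. Timings vary by machine, so compare them rather than relying on specific numbers.

```python
import re
import timeit

# Synthetic, made-up log line that does NOT contain "UserId".
log = '{"Operation":"Set-Mailbox","Parameters":"' + "x" * 3_000 + '"}'
sample = '{"UserId":"bob@corp.com"}'

# Broad: the leading .* lets a match attempt start anywhere and run to the end.
broad = re.compile(r'.*"UserId":"(?P<user_email>[^"]+)"')
# Targeted: the engine only needs to find the literal key.
targeted = re.compile(r'"UserId":"(?P<user_email>[^"]+)"')

# Both patterns extract the same value when the field is present.
print(broad.search(sample).group("user_email"))     # bob@corp.com
print(targeted.search(sample).group("user_email"))  # bob@corp.com

# On the long non-matching line, compare how long each takes to give up.
for name, pattern in (("broad", broad), ("targeted", targeted)):
    seconds = timeit.timeit(lambda: pattern.search(log), number=5)
    print(f"{name}: {seconds:.4f} s for 5 searches")
```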

Additional Parser Guidelines

Keep the following important guidelines in mind when working with parsers:

  • If time is not available in the raw log, use the syslog field headers.

  • If a parser does not extract at least one of user, src_ip, dest_ip, dest_host, or src_host, Advanced Analytics cannot process the event, and the log will be of no value to you.

  • Parsers are organized by major vendor. For example, parsers for logs generated by Carbon Black products can be found in config/default/parsers_carbonblack.conf.
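The first two guidelines can be sketched in Python. Both helpers are hypothetical: has_entity tests whether at least one of the entity fields listed above was parsed, and syslog_header_time assumes a BSD-style (RFC 3164) syslog header such as "Jun  3 10:12:50 host ...". Because that header carries no year, the year must be supplied separately.

```python
from __future__ import annotations

import re
from datetime import datetime

# Entity fields named in the guideline above.
ENTITY_FIELDS = ("user", "src_ip", "dest_ip", "dest_host", "src_host")

# Assumed BSD-style (RFC 3164) header: optional <PRI>, then "Mmm dd hh:mm:ss".
SYSLOG_TIME = re.compile(r"^(?:<\d+>)?(?P<time>[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d)")


def has_entity(fields: dict[str, str]) -> bool:
    """True if at least one entity field was parsed with a non-empty value."""
    return any(fields.get(name) for name in ENTITY_FIELDS)


def syslog_header_time(log: str, year: int) -> datetime | None:
    """Fallback timestamp from the syslog header, which carries no year."""
    match = SYSLOG_TIME.search(log)
    if match is None:
        return None
    return datetime.strptime(f"{year} {match.group('time')}", "%Y %b %d %H:%M:%S")


print(has_entity({"src_ip": "10.0.0.1"}))  # True
print(has_entity({"activity": "Set-Mailbox"}))  # False
print(syslog_header_time("<13>Jun  3 10:12:50 host app: msg", 2019))
```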