Security Content - Exabeam How Content Works Guide

Exabeam Models

Exabeam Advanced Analytics performs anomaly detection using models. Without models, rules can score only on fact-based logic, such as looking for specific things in the logs or counting specific values over an entire session. Models track historical values (features) for a given item (scope). For example, a model can track the hosts (feature values) that a user (scope) has logged into. If the current value is deemed abnormal compared with the historical values in the model, a rule can associate a score with this anomaly. Anomaly detection works by calculating a number of statistics about the features in a given model to check whether the feature value seen in the event being evaluated is unusual.
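The scope/feature idea can be illustrated with a small Python sketch. This is not how Advanced Analytics computes anomalies internally; the class name, the `min_share` threshold, and the host names are all hypothetical, chosen only to show a per-scope histogram of feature values being compared against a new value.

```python
from collections import Counter, defaultdict


class CategoricalModel:
    """Toy stand-in for a categorical model: one histogram of feature values per scope."""

    def __init__(self):
        # scope value (e.g. a user) -> Counter of feature values (e.g. hosts)
        self.histograms = defaultdict(Counter)

    def train(self, scope, feature_value):
        self.histograms[scope][feature_value] += 1

    def is_unusual(self, scope, feature_value, min_share=0.05):
        """Hypothetical test: unusual if the value is under min_share of the history."""
        histogram = self.histograms.get(scope)
        if not histogram:
            return False  # no history yet, so there is nothing to compare against
        total = sum(histogram.values())
        return histogram[feature_value] / total < min_share


model = CategoricalModel()
for host in ["hr-laptop-01"] * 30 + ["file-server-02"] * 10:
    model.train("alice", host)

print(model.is_unusual("alice", "hr-laptop-01"))  # False: seen in 30 of 40 logins
print(model.is_unusual("alice", "dc-prod-07"))    # True: never seen for this user
```

A rule would then attach a score when `is_unusual` returns True; the real product uses its own statistics rather than a fixed share threshold.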

Note

The Advanced Analytics pipeline is as follows:

parsing > event building > enrichment > session building > modeling > rule triggering

Advanced Analytics statistical profiling is not only about user-level data. Exabeam also profiles other entities, including hosts and peer groups. RAM and performance permitting, just about anything can be modeled: if a field is parsed, the parsed or enriched field can be used as either the scope or the feature in a model. When deciding what to make a scope or feature, consider how large the model might grow, what values may populate it in the future, and how those values will affect anomaly detection.

Types of Models

There are three types of models:

  • "CATEGORICAL" – As the name suggests, this type of model is used to train on values that are strings such as host or user names.

  • "NUMERICAL_CLUSTERED" – This type of model is used to train on numerical values such as the number of hosts a user logs into in a session.

  • "NUMERICAL_TIME_OF_WEEK" – This type of model is used to train on the time when events occur.

Categorical Models

Let's compare how endpoint entities such as processes are modeled from the UBA and EA perspectives.

UBA (user based) model

EPA-HP {
  ModelTemplate = "Processes for the user"
  Description = "Models processes for this user"
  Category = "End Point Activity"
  IconName = ""
  ScopeType = "USER"
  Scope = """user"""
  Feature = """process_name"""
  FeatureName = "process"
  FeatureType = "process_name"
  TrainIf = """sequenceCount(process_name,'process-created','process-alert')=1"""
  ModelType = "CATEGORICAL"
  AgingWindow = "32"
  CutOff = "10"
  Alpha = "2"
  MaxNumberOfBins = "10000000"
  ConvergenceFilter = "confidence_factor>=0.8"
  HistogramEventTypes = [
    "process-created",
    "process-alert"
  ]
  Disabled = "FALSE"
}
  • EPA-HP models all the process names for this user. This is evident by inspecting Scope and Feature values. Since we are modeling process names, the ModelType is CATEGORICAL.

  • Scope is user, a parsed field in process-created/process-alert events.

  • Feature is process_name, a parsed field that holds the name of the process seen in process-created/process-alert events.

  • Category is End Point Activity as process related activity is categorized as endpoint activity.

  • sequenceCount(process_name,'process-created','process-alert')=1 expression makes sure that the model trains when it notices different values of process_name for the user in process-created/process-alert events.

  • Expressions in such models generally use Count/sequenceCount/DistinctCount/sequenceDistinctCount.

  • Histogram for this model displays process names for the user in a specific range of time.
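The effect of the TrainIf expression in EPA-HP can be sketched in Python. The sketch assumes that `sequenceCount(process_name, ...) = 1` is true only the first time a given process_name appears in the listed event types within a session; that reading, the helper name `train_on_session`, and the sample events are illustrative assumptions, not the product's implementation.

```python
from collections import Counter

# Event types listed in the TrainIf expression of EPA-HP
TRAIN_EVENT_TYPES = {"process-created", "process-alert"}


def train_on_session(events, histogram):
    """Add each distinct process_name in one session to the histogram once.

    Assumed reading of TrainIf: the count of the current process_name within
    the session's matching events equals 1, i.e. this is its first occurrence.
    """
    seen_in_session = Counter()
    for event in events:
        if event["event_type"] not in TRAIN_EVENT_TYPES:
            continue  # other event types never train this model
        name = event["process_name"]
        seen_in_session[name] += 1
        if seen_in_session[name] == 1:
            histogram[name] += 1
    return histogram


session = [
    {"event_type": "process-created", "process_name": "chrome.exe"},
    {"event_type": "process-created", "process_name": "chrome.exe"},
    {"event_type": "process-alert", "process_name": "powershell.exe"},
    {"event_type": "web-activity-allowed", "process_name": "chrome.exe"},
]
user_histogram = train_on_session(session, Counter())
print(dict(user_histogram))  # {'chrome.exe': 1, 'powershell.exe': 1}
```

Under this reading, a process that runs hundreds of times in one session counts once, so the histogram reflects how many sessions contained the process rather than raw event volume.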

EA (asset based) model

A-EPA-HP {
  ModelTemplate = "Processes on this asset"
  Description = "Models processes on this asset"
  Category = "End Point Activity"
  IconName = ""
  ScopeType = "DEVICE"
  Scope = """dest_host"""
  Feature = """process_name"""
  FeatureName = "process"
  FeatureType = "process_name"
  TrainIf = """CountBy(process_name,dest_host,'process-created','process-alert','process-network')=1"""
  ModelType = "CATEGORICAL"
  AgingWindow = ""
  CutOff = "10"
  Alpha = "3"
  MaxNumberOfBins = "5000000"
  ConvergenceFilter = "confidence_factor>=0.8"
  HistogramEventTypes = [
    "process-created",
    "process-alert",
    "process-network"
  ]
  SequenceTypes = [asset]
  Disabled = "FALSE"
}
  • SequenceTypes = [asset] and the Model ID A-EPA-HP (which starts with A-) specify that this model is asset (EA) based.

  • A-EPA-HP models all the process names on an asset (asset is dest_host in this case). This is evident by inspecting Scope and Feature values. Since we are modeling process names, the ModelType is CATEGORICAL.

  • Scope is dest_host (asset), a parsed field that holds the destination host on which the process-created/process-alert/process-network events took place.

  • Feature is process_name, a parsed field that holds the name of the process seen in process-created/process-alert/process-network events.

  • Category is End Point Activity as process related activity is categorized as endpoint activity.

  • CountBy(process_name,dest_host,'process-created','process-alert','process-network')=1 expression makes sure that the model trains when it notices different values of process_name with regard to dest_host in process-created/process-alert/process-network events.

  • Expressions in such models generally use CountBy/CountByIf/DistinctCountBy/DistinctCountByIf.

  • Histogram for this model displays process names on a host in a specific range of time.

Numerical Clustered Models

Let's compare how the amount of data uploaded to the web per day is modeled from the UBA and EA perspectives.

UBA (user based) model

WEB-UBytesSum-Out {
  ModelTemplate = "Sum of bytes written/uploaded to the web in a day by the user"
  Description = "Models the amount of data (in bytes) that were uploaded to the web in a day by the user"
  Category = "Web Activity"
  IconName = ""
  ScopeType = "USER"
  Scope = "user"
  Feature = "sequenceSum(bytes_in_post,'web-activity-allowed')"
  FeatureName = "bytes"
  FeatureType = "quantity"
  TrainIf = """sequenceSum(bytes_in_post,'web-activity-allowed')>0"""
  ModelType = "NUMERICAL_CLUSTERED"
  BinWidth = "5"
  AgingWindow = ""
  CutOff = "10"
  Alpha = "1"
  ConvergenceFilter = "confidence_factor>=0.8"
  HistogramEventTypes = [
    "sequence-end"
  ]
  Disabled = "FALSE"
}
  • WEB-UBytesSum-Out models the amount of data (in bytes) uploaded to the web per day by the user. Since we are modeling a quantity of data, the ModelType is NUMERICAL_CLUSTERED.

  • Scope is user, a parsed field in web-activity-allowed events.

  • Feature is sequenceSum(bytes_in_post,'web-activity-allowed'), where bytes_in_post is an enriched field which makes sure only bytes uploaded are tracked in web-activity-allowed events.

  • The sequenceSum(bytes_in_post,'web-activity-allowed')>0 expression makes sure that the model trains only when the sum of bytes uploaded to the web by the user in web-activity-allowed events during the sequence is greater than 0.

  • The sequence-end event type in HistogramEventTypes signifies that the histogram for this model is generated at the end of the sequence.

  • Expressions in such models generally use sum/sequenceSum/DistinctCount/sequenceDistinctCount.

  • Histogram for this model displays the amount of data (in bytes) uploaded to the web per day by the user in a specific range of time.
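The Feature and TrainIf expressions of WEB-UBytesSum-Out can be approximated in Python as follows. This is a minimal sketch under the assumption that sequenceSum adds up a field over the events of one type in a single sequence; the function names, the sample byte counts, and the non-matching event type are illustrative only.

```python
def sequence_sum(events, field, event_type):
    """Sum a numeric field over the events of one type in a single sequence."""
    return sum(e.get(field, 0) for e in events if e["event_type"] == event_type)


def train_at_sequence_end(events, trained_values):
    """Append the sequence total to the training data when the TrainIf check holds."""
    total = sequence_sum(events, "bytes_in_post", "web-activity-allowed")
    if total > 0:  # mirrors TrainIf: sequenceSum(...) > 0
        trained_values.append(total)
    return trained_values


busy_day = [
    {"event_type": "web-activity-allowed", "bytes_in_post": 1200},
    {"event_type": "web-activity-allowed", "bytes_in_post": 800},
    {"event_type": "web-activity-denied", "bytes_in_post": 5000},  # ignored: wrong type
]
quiet_day = [{"event_type": "web-activity-allowed", "bytes_in_post": 0}]

history = []
train_at_sequence_end(busy_day, history)
train_at_sequence_end(quiet_day, history)
print(history)  # [2000]
```

Skipping zero totals keeps days with no uploads from dominating the numerical model, which is presumably why the TrainIf check exists.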

EA (asset based) model

A-WEB-BytesSum-Out {
  ModelTemplate = "Sum of bytes written/uploaded to the web in a day by the asset"
  Description = "Models the amount of data (in bytes) that were uploaded to the web in a day by the asset"
  Category = "Web Activity"
  IconName = ""
  ScopeType = "DEVICE"
  Scope = "src_host"
  Feature = "sumBy(bytes_in_post,src_host,'web-activity-allowed')"
  FeatureName = "bytes"
  FeatureType = "quantity"
  TrainIf = """sumBy(bytes_in_post,'web-activity-allowed')>0"""
  ModelType = "NUMERICAL_CLUSTERED"
  BinWidth = "5"
  AgingWindow = ""
  CutOff = "10"
  Alpha = "1"
  ConvergenceFilter = "confidence_factor>=0.8"
  HistogramEventTypes = [
    "sequence-end"
  ]
  SequenceTypes = [asset]
  Disabled = "FALSE"
}
  • SequenceTypes = [asset] and the Model ID A-WEB-BytesSum-Out (which starts with A-) specify that this model is asset (EA) based.

  • A-WEB-BytesSum-Out models the amount of data (in bytes) uploaded to the web per day by the asset (asset is src_host in this case). Since we are modeling a quantity of data, the ModelType is NUMERICAL_CLUSTERED.

  • Scope is src_host (asset), a parsed field that holds the source host that uploaded data in web-activity-allowed events.

  • Feature is sumBy(bytes_in_post,src_host,'web-activity-allowed') where bytes_in_post is an enriched field which makes sure only bytes uploaded are tracked in web-activity-allowed events.

  • The sumBy(bytes_in_post,'web-activity-allowed')>0 expression makes sure that the model trains only when the sum of bytes uploaded to the web by src_host in web-activity-allowed events during the sequence is greater than 0.

  • The sequence-end event type in HistogramEventTypes signifies that the histogram for this model is generated at the end of the sequence.

  • Expressions in such models generally use sumBy/sumByIf/DistinctCountBy/DistinctCountByIf.

  • Histogram for this model displays the amount of data (in bytes) uploaded to the web per day by the src_host (asset) in a specific range of time.
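The asset-based variant differs mainly in that the sum is grouped by the asset field. The Python sketch below assumes sumBy(field, group field, event type) produces one total per distinct src_host; the helper name `sum_by` and the host names are hypothetical.

```python
from collections import defaultdict


def sum_by(events, field, group_field, event_type):
    """Sum a numeric field per value of group_field over events of one type."""
    totals = defaultdict(int)
    for e in events:
        if e["event_type"] == event_type:
            totals[e[group_field]] += e.get(field, 0)
    return dict(totals)


events = [
    {"event_type": "web-activity-allowed", "src_host": "ws-01", "bytes_in_post": 500},
    {"event_type": "web-activity-allowed", "src_host": "ws-01", "bytes_in_post": 700},
    {"event_type": "web-activity-allowed", "src_host": "ws-02", "bytes_in_post": 0},
]

per_asset = sum_by(events, "bytes_in_post", "src_host", "web-activity-allowed")
# Keep only assets whose total passes the "> 0" training condition
to_train = {host: total for host, total in per_asset.items() if total > 0}
print(to_train)  # {'ws-01': 1200}
```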

Numerical Time of Week Models

Let's have a look at modeling time of print activity for a user.

UBA (user based) model

PR-UT-TOW {
  ModelTemplate = "Print activity time for user"
  Description = "Models the times of day that this user performs print activity"
  Category = "Print Activity"
  IconName = "user"
  ScopeType = "USER"
  Scope = "user"
  Feature = "TimeOfWeek()"
  FeatureName = "Time"
  FeatureType = "Time"
  TrainIf = """TRUE"""
  ModelType = "NUMERICAL_TIME_OF_WEEK"
  AgingWindow = ""
  CutOff = "10"
  Alpha = "1"
  ConvergenceFilter = "confidence_factor>=0.8"
  HistogramEventTypes = [
    "print-activity"
  ]
  Disabled = "FALSE"
} // End of PR-UT-TOW
  • PR-UT-TOW models the time at which print activity took place for the user. Since we are modeling time, the ModelType is NUMERICAL_TIME_OF_WEEK.

  • Scope is user, a parsed field in print-activity events.

  • Feature is TimeOfWeek(), which returns the day of the week of event.time as a number in the range 0..6, including the fractional part of the day.

  • Expressions in such models generally use count/sequenceCount.

  • Histogram for this model displays print activity times by the user in a specific range of time.
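A fractional day-of-week value like the one TimeOfWeek() is described as returning can be computed as in the Python sketch below. Which weekday maps to 0 and which time zone applies are not stated here, so the sketch assumes Python's convention (Monday = 0) and uses the timestamp as given; the function name `time_of_week` is a hypothetical stand-in.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400


def time_of_week(event_time):
    """Return the day of week plus the elapsed fraction of that day.

    Assumption: Monday is day 0, following datetime.weekday().
    """
    seconds_into_day = (
        event_time.hour * 3600 + event_time.minute * 60 + event_time.second
    )
    return event_time.weekday() + seconds_into_day / SECONDS_PER_DAY


# 3 January 2024 was a Wednesday (weekday 2); noon is half way through the day
print(time_of_week(datetime(2024, 1, 3, 12, 0)))  # 2.5
```

Representing time this way lets one numerical model capture both the day and the hour, so a print job at 03:00 on a Sunday lands far from a user's usual weekday-morning cluster.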

Model Categories

The following are some model categories used in default content:

  • Alerts

  • Applications

  • Asset Activity Monitoring

  • Assets

  • Critical Server Login

  • DLP

  • Database Activity

  • Devices

  • Directory Service

  • Domain Controller Login

  • Email

  • End Point Activity

  • File Access

  • Groups

  • Identities

  • Locations

  • Network

  • Network Alert

  • Other

  • Physical Access

  • Print Activity

  • Privilege Access

  • Time

  • TopGroups

  • TopUsers

  • Users

  • VPN

  • Web Activity

  • Windows Audit Change

  • Workstations

  • Zones