Monitor Exabeam Processes Using the System Health Page

System Health monitors Exabeam’s various processes and assists Exabeam engineers with troubleshooting. You can navigate to the System Health page from the menu icon at the top right corner of the homepage.

System Health is broken down into two sections: Health Status and System Activity.

System Activity shows each stage of the Exabeam pipeline and its current status. Expand any section to see more details about the state of a particular procedure.

Health Status is an on-demand assessment of the Exabeam pipeline. It is broken down into three categories:

General Health – Tests that all of the back-end services are running: database storage, log feeds, snapshots, CPU, and memory.
Connectivity – Checks that Exabeam is able to connect to external systems, such as LDAP and LMS.
Log Feeds – This section reports on the health of the DC, VPN, Security Alerts, Windows Servers, and session management logs.

In all of the above areas, GREEN indicates that the status is good, YELLOW indicates a warning, and RED indicates a critical status.

If there is a critical status on this page, we recommend contacting Exabeam support.

Health Check

Advanced Analytics has improved the robustness of health checks by providing visibility into the back-end data pipeline. All of the health checks below are configurable; see the Advanced Analytics Administration Guide for more details.

New proactive health checks include:

In a multi-node environment processing current logs, when the worker node is lagging more than 6 hours behind the master node, a proactive notification will appear.
In a multi-node environment processing historical logs, when the worker node is lagging more than 48 hours behind the master node, a proactive notification will appear.
If an environment has been configured to receive syslog messages but has not received any for 1 hour, a proactive notification will appear.

In addition to new health checks, the health notifications are machine parseable and formatted. The format can be defined via configuration (e.g., JSON) and each notification type can have its own format configuration. For example, you can define a different configuration for an email alert versus a syslog notification. Each health check has a clearly defined description of what is being measured, the corresponding value, as well as the alert severity.
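As a rough illustration of per-channel formatting, the sketch below renders one health notification in two ways. It is not Exabeam's implementation: the field names, the `format_notification` helper, and the `keyvalue` format are all hypothetical and chosen only to mirror the description, value, and severity mentioned above.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class HealthNotification:
    # All field names are hypothetical; they mirror the elements the text
    # says every health check carries (description, value, severity).
    check: str
    description: str
    value: float
    severity: str


def format_notification(notification: HealthNotification, fmt: str) -> str:
    """Render a notification in the format configured for one channel."""
    fields = asdict(notification)
    if fmt == "json":
        return json.dumps(fields, sort_keys=True)
    if fmt == "keyvalue":
        # A flat key="value" line, as might suit a syslog receiver.
        return " ".join(f'{key}="{value}"' for key, value in fields.items())
    raise ValueError(f"unsupported format: {fmt}")


lag = HealthNotification(
    check="worker_lag",
    description="Hours the worker node is behind the master node",
    value=7.5,
    severity="critical",
)
email_body = format_notification(lag, "json")
syslog_line = format_notification(lag, "keyvalue")
```

Keeping the notification data separate from its rendering is what lets each notification type carry its own format configuration.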

Configure Alerts for Worker Node Lag

When processing current or historical logs, an alert is triggered when the worker node falls too far behind the master node. The lag threshold is configured in /opt/exabeam/config/tequila/custom/health.conf. The parameters are defined below:

RTModeTimeLagHours - During real-time processing the default setting is 6 hours.
HistoricalModeTimeLagHours - During historical processing the default setting is 48 hours.
syslogIngestionDelayHour - If processing syslogs, the default setting is 2 hours.
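The threshold logic described by these parameters can be sketched as follows. This is an illustration only, not the product's code: the parameter names and default values come from the list above, while the function names, the `mode` strings, and the use of a plain dictionary for settings are assumptions. The sketch does not show the syntax of health.conf itself.

```python
from datetime import datetime, timedelta

# Parameter names and defaults as listed above; storing them in a dict
# is an assumption made for this sketch.
DEFAULTS = {
    "RTModeTimeLagHours": 6,
    "HistoricalModeTimeLagHours": 48,
    "syslogIngestionDelayHour": 2,
}


def worker_lag_alert(master_time, worker_time, mode, settings=DEFAULTS):
    """Return True when the worker trails the master by more than the limit."""
    if mode == "realtime":
        limit_hours = settings["RTModeTimeLagHours"]
    elif mode == "historical":
        limit_hours = settings["HistoricalModeTimeLagHours"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return master_time - worker_time > timedelta(hours=limit_hours)


def syslog_silence_alert(now, last_received, settings=DEFAULTS):
    """Return True when no syslog has arrived within the configured delay."""
    limit = timedelta(hours=settings["syslogIngestionDelayHour"])
    return now - last_received > limit


master = datetime(2024, 1, 1, 12, 0)
worker = master - timedelta(hours=7)
realtime_alert = worker_lag_alert(master, worker, "realtime")
historical_alert = worker_lag_alert(master, worker, "historical")
```

With the defaults, a worker that is 7 hours behind exceeds the real-time limit but stays well within the historical limit.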

Disaster Recovery Health Alerts

For organizations that employ a disaster recovery configuration, on-demand and proactive health alerts are provided in the Health Page of Advanced Analytics.

Health Checks:

Progress of the replication between the primary and secondary clusters.
Status of the replication service and the most recent timestamp of replication for the different replication components.
The Disaster Recovery mode that the cluster is running in. The possible modes are Normal Mode and Failover Mode.

Health Alerts:

Alert notification to administrators if the replication service is not running.

Alerts for Storage Use

Available on the System Health page, the Storage Usage tab provides details regarding the current data retention settings for your Advanced Analytics deployment. Advanced Analytics sends notifications when available storage capacity dwindles to a critical level. Admins have the option to enable and configure automatic data retention and data purging for both HDFS and MongoDB usage.

Data Retention in Advanced Analytics

Advanced Analytics retains both event log and session data for limited periods of time. Retention times depend on the retention categories and the time periods defined in your purchased license.

Data in Advanced Analytics is divided into the following retention categories:

Raw logs. The original event logs sent to Advanced Analytics.
Note: Your Event Selection policy determines which event logs are sent to Advanced Analytics.
Enriched events. The event logs created by Advanced Analytics when the raw logs are received and enriched with contextual data.
Note: Until a raw event log is purged from the system, you can view the event in both its original and enriched forms.
Events that triggered rules. Enriched events that have triggered or helped to trigger one or more rules.
User and Asset Sessions. The containers that Advanced Analytics creates for both users and assets to represent the different timeframes of the enriched events attributed to them. Sessions are retained for the same amount of time as the enriched events that comprise them.
If a session includes one or more events that were involved in triggering rules, the session is retained for as long as the event(s) that triggered the rules are retained; however, any events in the session that did not trigger rules are removed from the session when their retention period expires.

When the date of an event log exceeds the retention period of its category, the event is purged from the system. Likewise, when all the event logs associated with a session have been purged, the session is purged.
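The purge rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the retention periods in `RETENTION_DAYS` are invented placeholders (actual periods depend on your license), and the category keys, the event dictionaries, and the `prune_session` helper are hypothetical.

```python
from datetime import date, timedelta

# Placeholder retention periods in days; real values come from your license.
RETENTION_DAYS = {"raw": 30, "enriched": 90, "triggered": 365}


def is_expired(event_date, category, today):
    """An event expires once its age exceeds its category's retention period."""
    return today - event_date > timedelta(days=RETENTION_DAYS[category])


def prune_session(events, today):
    """Drop expired events; return None when the whole session is purged."""
    kept = [
        event for event in events
        if not is_expired(event["date"], event["category"], today)
    ]
    return kept if kept else None


today = date(2024, 6, 1)
session = [
    {"id": "e1", "category": "enriched", "date": today - timedelta(days=100)},
    {"id": "e2", "category": "triggered", "date": today - timedelta(days=100)},
]
remaining = prune_session(session, today)
```

In this example the session survives because it still holds an event that triggered a rule, while the ordinary enriched event of the same age has already been removed from it.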

For details on the retention periods included with your license, see the Product Entitlement page on the Community site.

System Optimization

This tab is a single aggregated page for auditing and viewing disabled data types, including:

Disabled Models – When a model takes up too much memory, it is disabled and listed here. Re-enabling these models can cause the system to suffer performance issues.
Disabled Event Types – When a high-volume user or asset amasses a large number of events of a certain event type, and that event type accounts for a large portion of the overall event count for that user or asset, the event type is automatically disabled and listed here.
Disabled Parsers – Advanced Analytics automatically identifies poor parser performance and disables such parsers in order to preserve the system health.
System Load Redistribution – Advanced Analytics automatically identifies overloaded worker nodes, and then takes corrective action by evenly redistributing the load across the cluster.

Critical Alerts, Warnings, and Error Messages

Although all notifications appear on the System Health page, there are two additional ways the Advanced Analytics UI provides better visibility on critical alerts, warnings, and error messages.

When a critical notification is generated, a banner will appear at the top of the UI. It contains specific information about the source of the warning or error, what the user and/or admin should do to correct the potential problem, and any helpful links to relevant knowledge base articles.

Depending on the level of the warning and user type (either administrator or user), the banner includes buttons to Close (i.e., dismiss) the banner and/or read an article containing important information about the message.

Additionally, a message box for critical notifications that require administrator decisions or multiple tasks to fix will appear upon admin login.

These message boxes include buttons to Close (i.e., dismiss) the message box and/or read an article containing important information about the message.
