Exabeam SOC Platform Administration Guide

Exabeam Cloud Telemetry Service

The Exabeam telemetry service collects quality and health metrics and transmits them to Exabeam Cloud. The transmitted data, which includes system events, metrics, and environment health data, provides insight into system issues and application availability. Examples of system issues include processing downtime (such as processing delays) and storage issues.

Prerequisites

To enable Exabeam to collect telemetry data, ensure the following prerequisites are met:

  • Advanced Analytics i56.7 or later with a valid license

  • Data Lake i40 or later with a valid license

  • Cloud Connectors 2.5.319 or later with a valid license

  • Access to *.cloud.exabeam.com over HTTPS port 443
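
The last prerequisite can be checked from a node with a short connectivity probe. The following is a minimal sketch, not an Exabeam tool: the function names are hypothetical, and because this guide specifies only the wildcard *.cloud.exabeam.com, you must supply the concrete hostname that applies to your deployment.

```python
import fnmatch
import socket
import ssl

# Wildcard taken from the prerequisite list above.
ALLOWED_PATTERN = "*.cloud.exabeam.com"


def matches_exabeam_cloud(hostname: str) -> bool:
    """Return True if hostname falls under *.cloud.exabeam.com."""
    return fnmatch.fnmatchcase(hostname.lower(), ALLOWED_PATTERN)


def can_reach_over_https(hostname: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Attempt a TLS handshake with hostname:port; return False on any network or TLS error."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return True
    except OSError:
        # Covers DNS failures, timeouts, refused connections, and TLS errors.
        return False
```

A result of False from can_reach_over_https usually points to a firewall or proxy rule that blocks outbound HTTPS traffic on port 443.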

Types of Telemetry Data in Exabeam Cloud Telemetry Service

The following use cases are examples of what the telemetry health service covers:

  • Infrastructure – Low volume storage

  • Platform – Context pull success/failure

  • Cloud Connectors – Log source connection failure/volume drop

  • Data Lake – Abnormal ingestion lag (red health)

  • Advanced Analytics – Worker nodes lagging behind the primary node in ingesting logs

  • Advanced Analytics – Anomalies (spikes or drops) in event parsing

  • Advanced Analytics – Primary node not processing in near real time

  • Incident Responder – Excessive delays in getting cases from Advanced Analytics or loading cases

  • Incident Responder – Unable to run playbooks, or playbooks take too long to run

  • Incident Responder – Unable to detect phishing incidents or send e-mails

At a high level, telemetry data falls into one of three categories:

  • Metrics – for example, CPU, events-per-second, and processing delays

  • Events – for example, machine restart, user login, and configuration changes

  • Environment – for example, versions, products, nodes, and configuration

IP addresses and hostnames are masked before being sent to Exabeam Cloud. For example, {"host": "*.*.0.24"}.
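
To illustrate the masked form shown above, the sketch below replaces the leading octets of an IPv4 address with asterisks. This is an assumption based solely on the {"host": "*.*.0.24"} example; the actual masking logic used by the telemetry service is not documented here, and the function name and the sample address are hypothetical.

```python
import ipaddress


def mask_ipv4(address: str, masked_octets: int = 2) -> str:
    """Replace the leading octets of an IPv4 address with '*'."""
    if not 0 <= masked_octets <= 4:
        raise ValueError("masked_octets must be between 0 and 4")
    # Validate the input; raises ValueError for anything that is not IPv4.
    octets = str(ipaddress.IPv4Address(address)).split(".")
    return ".".join(["*"] * masked_octets + octets[masked_octets:])


print(mask_ipv4("10.10.0.24"))  # prints *.*.0.24
```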

Metrics

The example below shows the metrics data sent from the master node to the telemetry service in Exabeam Cloud:

Note

The example below is partial and does not show the full payload.

{ "metrics": [ {"points":[[1558614965, 0.29]], "name": "tm.plt.service_cpu.exabeam-web-common-host1"}, {"points": [[1558614965, 0.3457]], "name": "tm.plt.service_memory.exabeam-web-common-host1"}, {"points": [[1558614965, 0.77]], "name": "tm.plt.service_cpu.mongodb-shard-host1"}, {"points": [[1558614965, 0.04947]], "name": "tm.plt.service_memory.mongodb-shard-host1"} ] }
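
If you want to inspect a metrics payload of this shape, for example one copied from the audit log, the following sketch flattens it into one row per data point. It assumes that the first element of each point is a Unix timestamp in seconds, which the sample values suggest but this guide does not state; the function name is hypothetical.

```python
import json
from datetime import datetime, timezone

# Two entries copied from the partial example above.
payload = json.loads("""
{ "metrics": [
  {"points": [[1558614965, 0.29]], "name": "tm.plt.service_cpu.exabeam-web-common-host1"},
  {"points": [[1558614965, 0.3457]], "name": "tm.plt.service_memory.exabeam-web-common-host1"}
] }
""")


def flatten_metrics(doc: dict) -> list:
    """Return one (name, UTC timestamp, value) tuple per data point."""
    rows = []
    for metric in doc["metrics"]:
        for epoch_seconds, value in metric["points"]:
            # Assumption: timestamps are Unix epoch seconds.
            when = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
            rows.append((metric["name"], when, value))
    return rows


for name, when, value in flatten_metrics(payload):
    print(f"{when.isoformat()}  {name} = {value}")
```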

Events

The example below shows the events data sent from the master node to the telemetry service in Exabeam Cloud:

Note

The example below is partial and does not show the full payload.

{ "events": [ { "dateHappened": 1558614965, "title": "Device /dev/shm S.M.A.R.T health check: FAIL", "text": "S.M.A.R.T non-compatible device" } ] }

Environment

The example below shows the environment data sent from the master node to the telemetry service in Exabeam Cloud:

Note

The example below is partial and does not show the full payload.

{"environment": { "versions": { "uba": { "build": "4", "branch": "I46.2"}, "common": { "build": "7", "branch": "PLT-i12.5"}, "exa_security": { "build": "33", "branch": "c180815.1"} }, "hosts": { "host3": { "host": "*.*.0.24","roles": ["oar","cm"]}, "host2": {"host": "*.*.0.72","roles": ["uba_slave"]}, "host1": {"host": "*.*.0.70","roles": ["uba_master"]} }, "licenseInfo": { "customer": "EXA-1234567", "gracePeriod": 60, "expiryDate": "10-11-2021", "version": "3", "products": ["User Analytics","Entity Analytics"], "uploadedAt": 1557740839325 } } }
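
As a reading aid for the environment payload, the sketch below indexes the hosts section by role, which gives a quick view of the inventory that is reported. It uses only the hosts portion of the example above; the function name is hypothetical, and the full payload may contain fields not shown here.

```python
import json
from collections import defaultdict

# Hosts section copied from the partial example above.
environment_doc = json.loads("""
{"environment": {
  "hosts": {
    "host3": {"host": "*.*.0.24", "roles": ["oar", "cm"]},
    "host2": {"host": "*.*.0.72", "roles": ["uba_slave"]},
    "host1": {"host": "*.*.0.70", "roles": ["uba_master"]}
  }
}}
""")


def hosts_by_role(doc: dict) -> dict:
    """Map each role to the sorted list of host keys that carry it."""
    index = defaultdict(list)
    for host_key, details in doc["environment"]["hosts"].items():
        for role in details["roles"]:
            index[role].append(host_key)
    return {role: sorted(keys) for role, keys in index.items()}


for role, keys in sorted(hosts_by_role(environment_doc).items()):
    print(f"{role}: {', '.join(keys)}")
```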

Data Collected by Exabeam Cloud Telemetry Service

The Exabeam telemetry service collects general and application-specific metrics from the applications in your deployment. For specifics on telemetry data collection, see the metrics tables in the sections that follow.

Note

You can also view the full list of product metrics and events sent to Exabeam Cloud (including when each request was made and its full payload) in the audit log file located at /opt/exabeam/data/logs/common/cloud-connection-service/telemetry.log.
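
To find entries for a particular metric in the audit log, a plain substring search is often enough. The sketch below makes no assumption about the log line format, which this guide does not describe; it returns the most recent lines that contain a given string. The function name is hypothetical, and reading the file may require elevated permissions on the node.

```python
from collections import deque
from pathlib import Path

# Path given in the note above.
TELEMETRY_LOG = Path("/opt/exabeam/data/logs/common/cloud-connection-service/telemetry.log")


def lines_mentioning(log_path: Path, needle: str, limit: int = 20) -> list:
    """Return up to `limit` of the most recent lines that contain `needle`."""
    # A bounded deque keeps only the last `limit` matches in memory.
    matches = deque(maxlen=limit)
    with log_path.open("r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if needle in line:
                matches.append(line.rstrip("\n"))
    return list(matches)


if TELEMETRY_LOG.exists():
    for entry in lines_mentioning(TELEMETRY_LOG, "tm.plt.ssh_logins"):
        print(entry)
```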

General Environment Telemetry Metrics

The following table lists the metrics that are collected for your environment.

| Name | Description | Frequency |
|---|---|---|
| Inventory | Nodes, masked IP addresses, and roles of each node. | Once a day |
| Product Version | Versions of each product in your deployment. | Once a day |
| License information | License information for each product in your deployment. | Once a day |

Advanced Analytics Telemetry Metrics

The following table lists the metrics that are collected for Advanced Analytics.

| Name | Description | Frequency |
|---|---|---|
| tm.aa.processing_delay_sec | An Advanced Analytics processing delay (if applicable) in seconds. | 5 min |
| tm.plt.service_status.<service-name> | Per-service status. | 5 min |
| tm.plt.ssh_logins | Number of SSH logins. | 5 min |
| tm.plt.service_memory.<service-name> | Per-service memory. | 5 min |
| tm.plt.service_cpu.<service-name> | Per-service CPU. | 5 min |
| tm.plt.load_avg_1m, tm.plt.load_avg_5m, tm.plt.load_avg_10m | Load average (CPU) per 1-minute, 5-minute, and 10-minute period. | 5 min |
| tm.aa.compressed_logs_bytes | Log volume of the last hour. | 1 hour |
| tm.aa.compressed_events_bytes | Events volume of the last hour. | 1 hour |
| tm.aa.notable_users | Notable users. | 5 min |
| tm.plt.disk_usage.mongo, tm.plt.disk_usage.data, tm.plt.disk_usage.root | Disk usage per partition. | 5 min |
| tm.plt.total_users | Total users. | 1 hour |
| tm.plt.total_assets | Total assets. | 1 hour |

Data Lake Telemetry Metrics

The following table lists the metrics that are collected for Data Lake.

| Name | Description | Frequency |
|---|---|---|
| tm.plt.service_status.<service-name> | Per-service status. | 5 min |
| tm.plt.ssh_logins | Number of SSH logins. | 5 min |
| tm.plt.service_memory.<service-name> | Per-service memory. | 5 min |
| tm.plt.service_cpu.<service-name> | Per-service CPU. | 5 min |
| tm.plt.load_avg_1m, tm.plt.load_avg_5m, tm.plt.load_avg_10m | Load average (CPU) per 1-minute, 5-minute, and 10-minute period. | 5 min |
| tm.plt.disk_usage.mongo, tm.plt.disk_usage.data, tm.plt.disk_usage.root, tm.plt.disk_usage.es_hot, tm.plt.disk_usage.kafka | Disk usage per partition. | 5 min |
| tm.plt.total_users | Total users. | 1 hour |
| tm.plt.total_assets | Total assets. | 1 hour |
| tm.dl.es.cluster_status, tm.dl.es.number_of_nodes, tm.dl.es.number_of_data_nodes, tm.dl.es.active_shards, tm.dl.es.active_primary_shards | Elasticsearch cluster status. | 5 min |
| tm.dl.kafka.total_lag | A Kafka delay if detected. | 5 min |
| tm.dl.kafka.connectors_lag | A Kafka connector lag if detected. | 5 min |
| tm.dl.avg_doc_size_bytes | Average document size. | 15 min |
| tm.dl.avg_msg_size_bytes | Average message size. | 5 min |
| tm.dl.index_delay | Index delay if detected. | 5 min |
| tm.dl.connectors_send_rate_bytes | Total connector ingestion rate in bytes. | 5 min |
| tm.dl.ingestion_queue | Kafka topic delay if detected. | 5 min |
| tm.dl.indexing_rate | Average indexing rate. | 5 min |
| tm.dl.shards_today | Elasticsearch shards today. | 5 min |
| tm.dl.shards_total | Elasticsearch shards total. | 5 min |

Cloud Connectors Telemetry Metrics

The following table lists the metrics that are collected for Cloud Connectors.

| Name | Description | Frequency |
|---|---|---|
| cc.total_cpu_usage | Total CPU usage %. Not monitored in multi-instance environments due to a platform issue that will be resolved in future Advanced Analytics and Data Lake versions. | 30 seconds |
| cc.volume_used_space | Total disk usage %. Not monitored in multi-instance environments due to a platform issue that will be resolved in future Advanced Analytics and Data Lake versions. | 30 seconds |
| cc.syslog_write_failure_cnt | Failures to forward events to Advanced Analytics or Data Lake. Not monitored in multi-instance environments due to a platform issue that will be resolved in future Advanced Analytics and Data Lake versions. | Every 10K events or 1 minute |
| cc.total_eps | Combined EPS of all configured Cloud Connectors. | 30 seconds |
| cc.account_eps | EPS per Cloud Connector (per account configuration). | Varies depending on the account configuration |
| cc.account_lag | Ingested data lag in seconds (in comparison to “now”). Not monitored in Custom Connector and Azure Event Hub. | Varies depending on the account configuration |
| cc.account_running_tasks | Number of currently executing fetches. | Varies depending on the account configuration |
| cc.account_status_v2 | The connector status, as seen in the UI, grouped into categories based on the error message. This categorization is not visible in the UI and is reported only as part of telemetry. Categories: Internal issue; Misconfiguration; Internal issues fixed in newer versions; Recoverable intermittent error; Vendor (log source issue); Downstream issue (Advanced Analytics or Data Lake); Resources constrained; Vendor quota is exhausted; Unknown (error does not fall under other categories). | Varies depending on the account configuration |

Exabeam SOC Platform Status Page

Health metrics data is collected, through the telemetry service, for each of the products in your organization including:

  • Advanced Analytics

  • Cloud Connectors

  • Data Lake

  • Incident Responder

This health summary status is available to you through the Exabeam SOC Platform Status page. Here you can view the current and historical status of each of your products.

Access the Exabeam SOC Platform Status Page from the Community

To access the Exabeam SOC Platform Status page:

  1. Log in to the Exabeam Community portal.

  2. Click your name at the top right of the page.

  3. Select My Account from the drop-down menu.

  4. In the SaaS Environments section of the page, click Open Status Page on the line of the environment for which you want to view the status.

    The System Status page displays any notifications for your deployment above the application status.

Subscribe to Notifications about Your Application Health

By monitoring changes to your application health, you can proactively prevent and address issues with your deployment. To stay up to date on health changes, you can also sign up to receive notifications through e-mail or Slack.

To subscribe to status change notifications:

  1. Click Get Updates at the top right of the status page.

  2. Select Email or Slack from the drop-down menu.