Exabeam Cloud Telemetry Service
The Exabeam telemetry service collects quality and health metrics and transmits them to Exabeam Cloud. The transmitted data, which includes system events, metrics, and environment health data, provides insight into system issues and application availability. Examples of system issues include processing downtime (such as processing delays) and storage issues.
Prerequisites
To enable Exabeam to collect telemetry data, ensure the following prerequisites are met:
Advanced Analytics i56.7 or later with a valid license
Data Lake i40 or later with a valid license
Cloud Connectors 2.5.319 or later with a valid license
Access to *.cloud.exabeam.com over HTTPS port 443
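The network prerequisite can be spot-checked from a node before enabling telemetry. The following is a minimal sketch, not an Exabeam tool: the function name `can_connect` is hypothetical, and because the prerequisite is a wildcard domain, you must supply a concrete hostname for your deployment. It only confirms that a TCP connection to port 443 opens; it does not validate TLS or account for proxies.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_connect("<your-endpoint>.cloud.exabeam.com")` returns `False` if a firewall blocks outbound traffic on port 443 or the name does not resolve.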
Types of Telemetry Data in Exabeam Cloud Telemetry Service
The following examples illustrate the use cases that the telemetry health service covers:
Infrastructure – Low volume storage
Platform – Context pull success/failure
Cloud Connectors – Log source connection failure/volume drop
Data Lake – Abnormal ingestion lag (red health)
Advanced Analytics – Worker nodes lagging behind the primary node in ingesting logs
Advanced Analytics – Anomalies (spikes or drops) in event parsing
Advanced Analytics – Primary node not processing in near real time
Incident Responder – Excessive delays in getting cases from Advanced Analytics or loading cases
Incident Responder – Unable to run playbooks, or playbooks take too long to run
Incident Responder – Unable to detect phishing incidents or send e-mails
At a high level, telemetry data falls into one of three categories:
Metrics: for example, CPU, events per second, and processing delays
Events: for example, machine restart, user login, and configuration changes
Environment: for example, versions, products, nodes, and configuration
IP addresses and hostnames are masked before being sent to Exabeam Cloud. For example, {"host": "*.*.0.24"}.
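To illustrate the masked format, the following sketch reproduces the pattern in the example above, where the first two octets of an IPv4 address are replaced with asterisks. The function name `mask_ipv4` and the input address are hypothetical, and this is not the actual masking routine the telemetry service uses; how hostnames are masked is not shown here.

```python
import ipaddress


def mask_ipv4(address: str) -> str:
    """Replace the first two octets of an IPv4 address with '*'."""
    # Validate the input first; raises ValueError on malformed addresses.
    ipaddress.IPv4Address(address)
    octets = address.split(".")
    return ".".join(["*", "*"] + octets[2:])


print(mask_ipv4("10.20.0.24"))  # *.*.0.24
```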
Metrics
The example below shows the metrics data sent from the master node to the telemetry service in Exabeam Cloud:
Note
The example below is only a partial example and does not show the full payload.
{ "metrics": [
  {"points": [[1558614965, 0.29]], "name": "tm.plt.service_cpu.exabeam-web-common-host1"},
  {"points": [[1558614965, 0.3457]], "name": "tm.plt.service_memory.exabeam-web-common-host1"},
  {"points": [[1558614965, 0.77]], "name": "tm.plt.service_cpu.mongodb-shard-host1"},
  {"points": [[1558614965, 0.04947]], "name": "tm.plt.service_memory.mongodb-shard-host1"}
] }
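When reviewing a metrics payload like the one above, it can help to flatten it into one row per data point. The sketch below assumes that the first element of each point is a Unix timestamp in seconds and the second is the metric value; that reading is inferred from the sample values and is not stated by the payload itself. The function name `flatten_metrics` is hypothetical.

```python
import json
from datetime import datetime, timezone

# Two entries copied from the partial metrics example.
payload = json.loads("""
{"metrics": [
  {"points": [[1558614965, 0.29]], "name": "tm.plt.service_cpu.exabeam-web-common-host1"},
  {"points": [[1558614965, 0.3457]], "name": "tm.plt.service_memory.exabeam-web-common-host1"}
]}
""")


def flatten_metrics(data: dict) -> list:
    """Return (name, UTC datetime, value) tuples, one per data point."""
    rows = []
    for metric in data["metrics"]:
        for epoch_seconds, value in metric["points"]:
            # Assumption: timestamps are Unix epoch seconds.
            when = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
            rows.append((metric["name"], when, value))
    return rows


for name, when, value in flatten_metrics(payload):
    print(when.isoformat(), name, value)
```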
Events
The example below shows the events data sent from the master node to the telemetry service in Exabeam Cloud:
Note
The example below is only a partial example and does not show the full payload.
{ "events": [
  {
    "dateHappened": 1558614965,
    "title": "Device /dev/shm S.M.A.R.T health check: FAIL",
    "text": "S.M.A.R.T non-compatible device"
  }
] }
Environment
The example below shows the environment data sent from the master node to the telemetry service in Exabeam Cloud:
Note
The example below is only a partial example and does not show the full payload.
{"environment": {
  "versions": {
    "uba": {"build": "4", "branch": "I46.2"},
    "common": {"build": "7", "branch": "PLT-i12.5"},
    "exa_security": {"build": "33", "branch": "c180815.1"}
  },
  "hosts": {
    "host3": {"host": "*.*.0.24", "roles": ["oar", "cm"]},
    "host2": {"host": "*.*.0.72", "roles": ["uba_slave"]},
    "host1": {"host": "*.*.0.70", "roles": ["uba_master"]}
  },
  "licenseInfo": {
    "customer": "EXA-1234567",
    "gracePeriod": 60,
    "expiryDate": "10-11-2021",
    "version": "3",
    "products": ["User Analytics", "Entity Analytics"],
    "uploadedAt": 1557740839325
  }
}}
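The hosts section of the environment payload above maps each node to its masked address and roles. As a rough illustration of how that structure can be read, the sketch below groups host identifiers by role. The function name `hosts_by_role` is hypothetical, and the input is only the hosts portion of the partial example.

```python
import json
from collections import defaultdict

# The "hosts" portion of the partial environment example.
environment = json.loads("""
{"environment": {"hosts": {
  "host3": {"host": "*.*.0.24", "roles": ["oar", "cm"]},
  "host2": {"host": "*.*.0.72", "roles": ["uba_slave"]},
  "host1": {"host": "*.*.0.70", "roles": ["uba_master"]}
}}}
""")


def hosts_by_role(data: dict) -> dict:
    """Map each role to the sorted list of host identifiers that carry it."""
    grouped = defaultdict(list)
    for host_id, info in data["environment"]["hosts"].items():
        for role in info["roles"]:
            grouped[role].append(host_id)
    return {role: sorted(ids) for role, ids in grouped.items()}


for role, ids in sorted(hosts_by_role(environment).items()):
    print(role, ids)
```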
Data Collected by Exabeam Cloud Telemetry Service
The Exabeam telemetry service collects general and application-specific metrics from the applications in your deployment. For specifics on telemetry data collection, see the following sections.
Note
You can also view a full list of product metrics and events sent to Exabeam Cloud (including when the requests were made and the full payload) by accessing the audit log file located at /opt/exabeam/data/logs/common/cloud-connection-service/telemetry.log.
General Environment Telemetry Metrics
The following table lists the metrics that are collected for your environment.
Name | Description | Frequency |
---|---|---|
Inventory | Nodes, masked IP addresses, and roles of each node. | Once a day |
Product Version | Versions of each product in your deployment. | Once a day |
License information | License information for each product in your deployment. | Once a day |
Advanced Analytics Telemetry Metrics
The following table lists the metrics that are collected for Advanced Analytics.
Name | Description | Frequency |
---|---|---|
tm.aa.processing_delay_sec | An Advanced Analytics processing delay (if applicable) in seconds. | 5 min |
tm.plt.service_status.<service-name> | Per-service status. | 5 min |
tm.plt.ssh_logins | Number of SSH logins. | 5 min |
tm.plt.service_memory.<service-name> | Per-service memory. | 5 min |
tm.plt.service_cpu.<service-name> | Per-service CPU. | 5 min |
tm.plt.load_avg_1m tm.plt.load_avg_5m tm.plt.load_avg_10m | Load average (CPU) per 1-minute, 5-minute, and 10-minute period. | 5 min |
tm.aa.compressed_logs_bytes | Log volume of the last hour. | 1 hour |
tm.aa.compressed_events_bytes | Events volume of the last hour. | 1 hour |
tm.aa.notable_users | Notable users. | 5 min |
tm.plt.disk_usage.mongo tm.plt.disk_usage.data tm.plt.disk_usage.root | Disk usage per partition. | 5 min |
tm.plt.total_users | Total users. | 1 hour |
tm.plt.total_assets | Total assets. | 1 hour |
Data Lake Telemetry Metrics
The following table lists the metrics that are collected for Data Lake.
Name | Description | Frequency |
---|---|---|
tm.plt.service_status.<service-name> | Per-service status. | 5 min |
tm.plt.ssh_logins | Number of SSH logins. | 5 min |
tm.plt.service_memory.<service-name> | Per-service memory. | 5 min |
tm.plt.service_cpu.<service-name> | Per-service CPU. | 5 min |
tm.plt.load_avg_1m tm.plt.load_avg_5m tm.plt.load_avg_10m | Load average (CPU) per 1-minute, 5-minute, and 10-minute period. | 5 min |
tm.plt.disk_usage.mongo tm.plt.disk_usage.data tm.plt.disk_usage.root tm.plt.disk_usage.es_hot tm.plt.disk_usage.kafka | Disk usage per partition. | 5 min |
tm.plt.total_users | Total users. | 1 hour |
tm.plt.total_assets | Total assets. | 1 hour |
tm.dl.es.cluster_status tm.dl.es.number_of_nodes tm.dl.es.number_of_data_nodes tm.dl.es.active_shards tm.dl.es.active_primary_shards | Elasticsearch cluster status. | 5 min |
tm.dl.kafka.total_lag | A Kafka delay if detected. | 5 min |
tm.dl.kafka.connectors_lag | A Kafka connector lag if detected. | 5 min |
tm.dl.avg_doc_size_bytes | Average document size. | 15 min |
tm.dl.avg_msg_size_bytes | Average message size. | 5 min |
tm.dl.index_delay | Index delay if detected. | 5 min |
tm.dl.connectors_send_rate_bytes | Total connector ingestion rate in bytes. | 5 min |
tm.dl.ingestion_queue | Kafka topic delay if detected. | 5 min |
tm.dl.indexing_rate | Average indexing rate. | 5 min |
tm.dl.shards_today | Elasticsearch shards today. | 5 min |
tm.dl.shards_total | Elasticsearch shards total. | 5 min |
Cloud Connectors Telemetry Metrics
The following table lists the metrics that are collected for Cloud Connectors.
Name | Description | Frequency |
---|---|---|
cc.total_cpu_usage | Total CPU usage %. Not monitored in multi-instance environments due to a platform issue that will be resolved in future Advanced Analytics and Data Lake versions. | 30 seconds |
cc.volume_used_space | Total disk usage %. Not monitored in multi-instance environments due to a platform issue that will be resolved in future Advanced Analytics and Data Lake versions. | 30 seconds |
cc.syslog_write_failure_cnt | Failures to forward events to Advanced Analytics or Data Lake. Not monitored in multi-instance environments due to a platform issue that will be resolved in future Advanced Analytics and Data Lake versions. | Every 10K events or 1 minute |
cc.total_eps | Combined EPS of all configured Cloud Connectors | 30 seconds |
cc.account_eps | EPS per Cloud Connector (per account configuration) | Varies depending on the account configuration |
cc.account_lag | Ingested data lag in seconds (in comparison to “now”). Not monitored in Custom Connector and Azure Event Hub. | Varies depending on the account configuration |
cc.account_running_tasks | Number of currently executing fetches | Varies depending on the account configuration |
cc.account_status_v2 | The connector status, as seen in the UI, grouped into categories based on the error message. This categorization is not visible in the UI and is reported only as part of telemetry. | Varies depending on the account configuration |
Exabeam SOC Platform Status Page
The telemetry service collects health metrics data for each of the products in your organization, including:
Advanced Analytics
Cloud Connectors
Data Lake
Incident Responder
This health summary status is available to you through the Exabeam SOC Platform Status page. Here you can view the current and historical status of each of your products.
Access the Exabeam SOC Platform Status Page from the Community
To access the Exabeam SOC Platform Status page:
Log in to the Exabeam Community portal.
Click your name at the top right of the page.
Select My Account from the drop-down menu.
In the SaaS Environments section of the page, click Open Status Page on the line of the environment for which you want to view the status.
The System Status page displays any notifications for your deployment above the application status.
Subscribe to Notifications about Your Application Health
By monitoring changes to your application health, you can proactively prevent and address issues with your deployment. To stay up to date on health changes, you can also sign up to receive notifications through e-mail or Slack.
To subscribe to status change notifications:
Click Get Updates at the top right of the status page.
Select Email or Slack from the drop-down menu.