This article covers the technical details of how the agent collects process telemetry on Linux. It is intended for readers who want greater transparency into how the agent collects data, whether for comparison with other tools or for internal buy-in.
Estimated reading time: 5 minutes
Background
The agent uses multiple datasources on an endpoint to capture all of the fields needed to build a complete process event. This article outlines those datasources for Linux.
Datasources for Linux
The agent consumes raw, unfiltered process data from the system's audit netlink socket.
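As a rough illustration of what consuming the audit netlink socket involves, the Python sketch below registers the calling process as the audit daemon with an `AUDIT_SET` request and then reads records as the kernel sends them. The constants come from the Linux kernel headers; the function names are hypothetical, and this is not how Red Canary's agent is implemented. Calling `listen()` requires root and would displace any running `auditd`, because in this mode the kernel delivers records to a single registered process. Which events are generated at all is controlled by audit rules (for example, ones added with `auditctl`).

```python
import os
import socket
import struct
from typing import Tuple

# Values from the Linux kernel headers <linux/netlink.h> and <linux/audit.h>.
NETLINK_AUDIT = 9
NLMSG_ERROR = 2
NLM_F_REQUEST = 0x1
NLM_F_ACK = 0x4
AUDIT_SET = 1001
AUDIT_STATUS_PID = 0x0004

# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid (host byte order)
NLMSG_HDR = struct.Struct("=IHHII")


def build_set_pid_message(pid: int, seq: int = 1) -> bytes:
    """Build an AUDIT_SET request asking the kernel to send audit records to `pid`."""
    # First eight u32 fields of struct audit_status: mask, enabled, failure, pid,
    # rate_limit, backlog_limit, lost, backlog. Only `pid` is applied because
    # mask == AUDIT_STATUS_PID.
    status = struct.pack("=8I", AUDIT_STATUS_PID, 0, 0, pid, 0, 0, 0, 0)
    header = NLMSG_HDR.pack(NLMSG_HDR.size + len(status), AUDIT_SET,
                            NLM_F_REQUEST | NLM_F_ACK, seq, 0)
    return header + status


def decode_datagram(data: bytes) -> Tuple[int, str]:
    """Split one received datagram into (record type, record text)."""
    _length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data)
    # Assumption: one audit message per datagram, so take everything after the
    # header instead of relying on nlmsg_len.
    text = data[NLMSG_HDR.size:].rstrip(b"\x00").decode("utf-8", "replace")
    return msg_type, text


def listen(max_records: int = 10) -> None:
    """Print raw audit records. Needs root, and displaces any running auditd."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_AUDIT)
    try:
        sock.bind((0, 0))  # let the kernel assign our netlink port ID
        sock.sendto(build_set_pid_message(os.getpid()), (0, 0))  # (0, 0) is the kernel
        received = 0
        while received < max_records:
            data = sock.recv(65536)
            msg_type, text = decode_datagram(data)
            if msg_type == NLMSG_ERROR:
                # Payload starts with a signed errno; 0 is a plain acknowledgement.
                (error,) = struct.unpack_from("=i", data, NLMSG_HDR.size)
                if error:
                    raise OSError(-error, os.strerror(-error))
                continue
            # e.g. 1300 (SYSCALL) 'audit(1364481363.243:24287): arch=c000003e ...'
            print(msg_type, text)
            received += 1
    finally:
        sock.close()
```

The record type arrives as the netlink message type, and the payload is the record text without the `type=... msg=` prefix that `auditd` adds when it writes its log file.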
On its own, this raw data isn't particularly helpful, as the sample below shows:
type=SYSCALL msg=audit(1364481363.243:24287): arch=c000003e syscall=2 success=no exit=-13 a0=7fffd19c5592 a1=0 a2=7fffd19c4b50 a3=a items=1 ppid=2686 pid=3538 auid=500 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=pts0 ses=1 comm="cat" exe="/bin/cat" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="sshd_config"
type=CWD msg=audit(1364481363.243:24287): cwd="/home/shadowman"
type=PATH msg=audit(1364481363.243:24287): item=0 name="/etc/ssh/sshd_config" inode=409248 dev=fd:00 mode=0100600 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0
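Each line in the sample is one record: a `type=` and `msg=audit(<seconds>.<milliseconds>:<serial>):` header followed by space-separated `key=value` pairs, some of them double-quoted. A minimal Python parser for that layout might look like the following. `parse_record` is a hypothetical helper that assumes the auditd log format shown above; it does not cover every record type (hex-encoded values, for instance, are left undecoded).

```python
import re
import shlex

# Record header as written by auditd: type=<TYPE> msg=audit(<seconds>.<millis>:<serial>):
HEADER = re.compile(r"^type=(\w+) msg=audit\((\d+\.\d+):(\d+)\):\s*")


def parse_record(line: str) -> dict:
    """Parse one auditd log line into its header and key=value fields."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError(f"not an audit record: {line[:40]!r}")
    record = {
        "type": match.group(1),
        "timestamp": float(match.group(2)),
        "serial": int(match.group(3)),
        "fields": {},
    }
    # shlex strips the double quotes around values such as comm="cat" and exe="/bin/cat".
    for token in shlex.split(line[match.end():]):
        key, sep, value = token.partition("=")
        if sep:
            record["fields"][key] = value
    return record
```

Applied to the CWD record above, it returns `{"type": "CWD", "timestamp": 1364481363.243, "serial": 24287, "fields": {"cwd": "/home/shadowman"}}`.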
Challenges
- Events can span multiple lines and records, requiring correlation
- Data is in a space-delimited, key=value format
- Many fields are raw values (for example, numeric architecture, syscall, and exit codes) that benefit from translation into something human-readable
- Many desired fields, including file hashes, are not available from audit
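The first and third challenges can be sketched briefly. Every record belonging to one event shares the same `audit(<time>:<serial>)` identifier, so correlation can start by grouping on it, and raw codes can be translated with lookup tables. The tables below are deliberately tiny excerpts (`c000003e` is the audit architecture value for x86_64, and x86_64 syscall 2 is `open`); a real implementation needs complete per-architecture tables. The helper names are hypothetical.

```python
import errno
import os
import re
from typing import Dict, List

# Every record of one event carries the same audit(<seconds>.<millis>:<serial>) ID.
EVENT_ID = re.compile(r"msg=audit\((\d+\.\d+:\d+)\):")

# Tiny excerpts of the real lookup tables, for illustration only.
AUDIT_ARCHES = {"c000003e": "x86_64", "c00000b7": "aarch64"}
X86_64_SYSCALLS = {0: "read", 1: "write", 2: "open", 59: "execve", 257: "openat"}


def correlate(lines: List[str]) -> Dict[str, List[str]]:
    """Group record lines by event ID, preserving first-seen order.

    A streaming correlator would also need to decide when an event is complete
    (multi-record syscall events end with an EOE record); this batch version
    simply groups whatever it is given.
    """
    events: Dict[str, List[str]] = {}
    for line in lines:
        match = EVENT_ID.search(line)
        if match:
            events.setdefault(match.group(1), []).append(line)
    return events


def translate(fields: Dict[str, str]) -> Dict[str, object]:
    """Return a copy of SYSCALL record fields with readable versions added."""
    readable: Dict[str, object] = dict(fields)
    arch = AUDIT_ARCHES.get(fields.get("arch", ""))
    readable["arch_name"] = arch
    if arch == "x86_64" and "syscall" in fields:
        readable["syscall_name"] = X86_64_SYSCALLS.get(int(fields["syscall"]))
    exit_code = int(fields.get("exit", "0"))
    if exit_code < 0:  # failed syscalls report -errno
        readable["exit_errno"] = errno.errorcode.get(-exit_code)
        readable["exit_message"] = os.strerror(-exit_code)
    return readable
```

For the sample SYSCALL record, `translate({"arch": "c000003e", "syscall": "2", "exit": "-13"})` adds `arch_name="x86_64"`, `syscall_name="open"`, and `exit_errno="EACCES"`: the `cat` process was denied permission to open `/etc/ssh/sshd_config`.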
Red Canary Linux EDR addresses all of these challenges:
- Events are correlated and provided in an easily understandable, consumable JSON format
- Additional metadata and fields are obtained from other datasources, utilizing a plug-n-play architecture we call the Event Model
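The plug-n-play idea can be illustrated with a small registry: each datasource is a function that receives the partially built event and contributes fields, and the finished event is serialized as JSON. This is a hypothetical Python sketch of the general pattern, not the actual Event Model; `EventModel`, `register`, and the two example datasources are made up for illustration.

```python
import json
import socket
from typing import Callable, List

# Hypothetical: a datasource takes the partially built event and returns extra fields.
Datasource = Callable[[dict], dict]


class EventModel:
    def __init__(self) -> None:
        self._datasources: List[Datasource] = []

    def register(self, datasource: Datasource) -> Datasource:
        """Add a datasource; returning it lets this method be used as a decorator."""
        self._datasources.append(datasource)
        return datasource

    def build(self, raw_fields: dict) -> str:
        """Run every registered datasource over the raw fields and emit JSON."""
        event = dict(raw_fields)
        for datasource in self._datasources:
            for key, value in datasource(event).items():
                event.setdefault(key, value)  # never overwrite an existing field
        return json.dumps(event, sort_keys=True)


model = EventModel()


@model.register
def host_info(event: dict) -> dict:
    return {"host_name": socket.gethostname()}


@model.register
def audit_fields(event: dict) -> dict:
    # Map a raw audit key onto the normalized field name.
    return {"process_path": event.get("exe")}
```

Adding a datasource means registering one more function; the code that builds and emits events does not change. Using `setdefault` means a datasource never overwrites a field that an earlier one already supplied, which is only one of several possible precedence rules.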
Here's a comparison of process data collected from Audit (alone) vs. Red Canary's Linux EDR (utilizing Audit):
Field | Audit | Red Canary Linux EDR |
---|---|---|
timestamp | ✅ | ✅ |
host_name | ❌ | ✅ |
user_uid | ✅ | ✅ |
user_name | ❌ | ✅ |
user_domain | ❌ | ✅ |
user_username | ❌ | ✅ |
login_user_uid | ✅ | ✅ |
login_user_name | ❌ | ✅ |
login_user_domain | ❌ | ✅ |
process_md5 | ❌ | ✅ |
process_sha256 | ❌ | ✅ |
process_pid | ✅ | ✅ |
process_name | ✅ | ✅ |
process_path | ✅ | ✅ |
process_command_line | ✅ | ✅ |
parent_process_timestamp | ✅ | ✅ |
parent_process_pid | ✅ | ✅ |
parent_process_name | ❌ | ✅ |
parent_process_path | ❌ | ✅ |
parent_process_md5 | ❌ | ✅ |
parent_process_sha256 | ❌ | ✅ |
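Several of the ❌ rows are fields that can be looked up once the PIDs and UIDs reported by audit are known. A minimal sketch, assuming the processes are still running and procfs is mounted at `/proc`, could resolve the user name from the passwd database, read executable paths from `/proc/<pid>/exe`, and hash the binaries with `hashlib`. The function names are hypothetical, and this does not describe how Red Canary Linux EDR obtains these fields. Short-lived processes may exit before a lookup like this runs, so a real collector has to handle missing data.

```python
import hashlib
import os
import pwd
from typing import Optional


def file_hashes(path: str) -> dict:
    """MD5 and SHA-256 of a file, read in 1 MiB chunks to bound memory use."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


def user_name(uid: int) -> Optional[str]:
    """Resolve a numeric UID such as audit's uid=500 to a name, if one exists."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:  # no passwd entry (common in containers)
        return None


def enrich(pid: int, ppid: int, uid: int) -> dict:
    """Fill some fields audit lacks. Other users' processes require root."""
    with open(f"/proc/{ppid}/comm") as handle:
        parent_name = handle.read().strip()
    # Hashing /proc/<pid>/exe reads the binary the process is actually running.
    hashes = file_hashes(f"/proc/{pid}/exe")
    return {
        "user_name": user_name(uid),
        "process_path": os.readlink(f"/proc/{pid}/exe"),
        "process_md5": hashes["md5"],
        "process_sha256": hashes["sha256"],
        "parent_process_name": parent_name,
        "parent_process_path": os.readlink(f"/proc/{ppid}/exe"),
    }
```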
What this means to you
- The depth and breadth of the telemetry collected often exceeds that of commercial or open source solutions that rely primarily on audit (e.g., osquery, go-audit, auditbeat, zeek-agent).
- We aren't married to, or held hostage by, any one datasource.
- If operating system APIs change or new subsystems are introduced as operating systems evolve, we can easily incorporate these changes rather than rewriting the product from scratch.
- Our plug-n-play architecture lets us use any number of datasources on an endpoint to construct an event, giving us flexibility and the ability to create the richest telemetry possible.