This article dives into the technical details of how we collect process telemetry on Linux. It is intended for readers who want greater transparency into how the agent collects data, whether for comparison or for buy-in.
Background
The agent utilizes multiple datasources on an endpoint to capture all of the fields necessary to create a complete process event. This article outlines those datasources for Linux.
Datasources for Linux
The agent consumes raw, unfiltered process data from the system's audit netlink socket.
This raw data isn't particularly helpful on its own, as you can see below:
```
type=SYSCALL msg=audit(1364481363.243:24287): arch=c000003e syscall=2 success=no exit=-13 a0=7fffd19c5592 a1=0 a2=7fffd19c4b50 a3=a items=1 ppid=2686 pid=3538 auid=500 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=pts0 ses=1 comm="cat" exe="/bin/cat" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="sshd_config"
type=CWD msg=audit(1364481363.243:24287): cwd="/home/shadowman"
type=PATH msg=audit(1364481363.243:24287): item=0 name="/etc/ssh/sshd_config" inode=409248 dev=fd:00 mode=0100600 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0
```
Challenges:
- Events can span multiple lines and records, requiring correlation
- Data is in a flat, key=value format rather than a structured one
- Many fields are raw and would benefit from translation into something human-readable
- Many desired fields are not available from audit, like file hashes
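To make the correlation challenge concrete, here is a minimal sketch (hypothetical code, not the agent's actual implementation) that parses raw audit lines into dictionaries and groups the SYSCALL, CWD, and PATH records that share an event ID (the `1364481363.243:24287` portion of `msg=audit(...)`):

```python
import re
from collections import defaultdict

# Matches the record type and the audit event ID, e.g.
# type=SYSCALL msg=audit(1364481363.243:24287): ...
HEADER = re.compile(r"type=(\w+) msg=audit\(([\d.]+:\d+)\):\s*(.*)")
# Matches key=value pairs; values may be bare or double-quoted
KV = re.compile(r'(\w+)=("[^"]*"|\S+)')

def parse_record(line):
    """Parse one raw audit line into (event_id, record_type, fields)."""
    m = HEADER.match(line)
    if not m:
        return None
    rtype, event_id, body = m.groups()
    fields = {k: v.strip('"') for k, v in KV.findall(body)}
    return event_id, rtype, fields

def correlate(lines):
    """Group records that share an event ID into one event."""
    events = defaultdict(dict)
    for line in lines:
        parsed = parse_record(line)
        if parsed:
            event_id, rtype, fields = parsed
            events[event_id][rtype] = fields
    return dict(events)

raw = [
    'type=SYSCALL msg=audit(1364481363.243:24287): syscall=2 ppid=2686 pid=3538 comm="cat" exe="/bin/cat"',
    'type=CWD msg=audit(1364481363.243:24287): cwd="/home/shadowman"',
    'type=PATH msg=audit(1364481363.243:24287): item=0 name="/etc/ssh/sshd_config"',
]
events = correlate(raw)
```

A real collector must also handle records arriving out of order, interleaved events, and the `EOE` (end-of-event) marker, which this sketch omits.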
Red Canary Cloud Workload Protection addresses all of these challenges:
- Events are correlated and provided in an easily understood, consumable JSON format
- Additional metadata and fields are obtained from other datasources, utilizing a plug-n-play architecture we call the Event Model.
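For example, one field audit cannot provide is the executable's hash; a separate datasource can fill that gap by hashing the binary on disk. A minimal sketch (the helper name is illustrative, not the Event Model's real API):

```python
import hashlib

def hash_executable(path):
    """Compute MD5 and SHA-256 of an executable on disk,
    reading in chunks to avoid loading large binaries into memory."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    try:
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                md5.update(chunk)
                sha256.update(chunk)
    except OSError:
        return None  # the binary may have been deleted after exec
    return {"process_md5": md5.hexdigest(),
            "process_sha256": sha256.hexdigest()}
```

The result can then be merged into the correlated event alongside the fields audit already supplies, e.g. keyed off the `exe` value from the SYSCALL record.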
Here's a comparison of process data collected from Audit (alone) vs. Red Canary's CWP (utilizing Audit):
Field | Audit | Red Canary CWP
---|---|---
timestamp | ✅ | ✅
host_name | ❌ | ✅
user_uid | ✅ | ✅
user_name | ❌ | ✅
user_domain | ❌ | ✅
user_username | ❌ | ✅
login_user_uid | ✅ | ✅
login_user_name | ❌ | ✅
login_user_domain | ❌ | ✅
process_md5 | ❌ | ✅
process_sha256 | ❌ | ✅
process_pid | ✅ | ✅
process_name | ✅ | ✅
process_path | ✅ | ✅
process_command_line | ✅ | ✅
parent_process_timestamp | ✅ | ✅
parent_process_pid | ✅ | ✅
parent_process_name | ❌ | ✅
parent_process_path | ❌ | ✅
parent_process_md5 | ❌ | ✅
parent_process_sha256 | ❌ | ✅
What this means to you
- The depth and breadth of telemetry collected will most often exceed that of commercial or open source solutions that rely primarily on audit alone (e.g. osquery, go-audit, auditbeat, zeek-agent, ...)
- We aren't married to, or held hostage by, any one datasource.
- In the event operating system APIs change, or new subsystems are introduced as operating systems evolve, we can easily incorporate those changes rather than requiring a complete rewrite of the product.
- Our plug-n-play architecture allows us to combine any number of datasources on an endpoint to construct an event, giving us flexibility and, most importantly, the ability to create the richest telemetry possible.
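The plug-n-play idea described above can be sketched as a set of datasources that each contribute fields to an event under construction. All class and field names here are illustrative assumptions, not the Event Model's real interfaces:

```python
class Datasource:
    """Hypothetical base interface: each datasource enriches a partial event."""
    def enrich(self, event: dict) -> dict:
        raise NotImplementedError

class AuditSource(Datasource):
    """Contributes fields parsed from the audit netlink socket."""
    def enrich(self, event):
        event.update({"process_pid": 3538, "process_path": "/bin/cat"})
        return event

class UserLookupSource(Datasource):
    """Resolves numeric UIDs to human-readable names."""
    def enrich(self, event):
        event["user_name"] = "shadowman"  # e.g. via pwd.getpwuid()
        return event

def build_event(sources):
    """Run every registered datasource over a fresh event."""
    event = {}
    for source in sources:
        event = source.enrich(event)
    return event

event = build_event([AuditSource(), UserLookupSource()])
```

The design benefit is that adding or swapping a datasource (say, when an OS subsystem changes) means registering a new class rather than rewriting the pipeline.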