Distributed Process Monitoring System

Lightweight, extensible, container‑friendly visibility into per‑host processes, resources and anomalies.

Features · Quick Start · API · Architecture · Chinese Documentation

0. Preview

Process Table	Alert List / Tree (placeholder)

Screenshots are placeholders – replace images in docs/ with real captures (screenshot_process_list.png, screenshot_alerts.png).

1. Overview

This project provides a minimal yet extensible stack for distributed process monitoring:

Go Agent: periodically samples local process metadata & metrics using gopsutil.
Go Server: receives batches, evaluates alert rules, offers REST & Prometheus endpoints, optional PostgreSQL persistence.
React Web UI: lists processes and active alerts (charts and advanced UX are extension points).
Docker / Compose: single‑command sandbox deployment.

Target use cases: lightweight capacity insight, anomaly detection (runaway CPU / memory), baseline for custom SRE tooling, or educational reference architecture.

2. Key Features

Current capabilities (✔ implemented / 🔧 partial / 🚧 planned):

Area	Status	Notes
Process snapshot (pid, name, cmdline, user, status, tree)	✔	Tree endpoint `/processes/tree`
Resource metrics (CPU %, RSS, Mem %, Threads)	✔	gopsutil sampling
Extended metrics (FD count, net conns, ports, IO bytes)	✔	`open_fds`, `net_conns`, `ports`, `read_bytes`, `write_bytes`
Configurable scrape interval	✔	`agent.yaml`
Advanced filtering	✔	name / pid / status / cpu_gt / mem_gt / port
Trend series API	✔	Bucketed averages `/processes/{pid}/series`
Alert engine (threshold + duration)	✔	In‑memory + persisted events
Alert rule CRUD	✔	Requires persistence enabled
Alert event persistence	✔	Upsert `alert_events`
Prometheus metrics	✔	`/metrics` endpoint
PostgreSQL persistence	✔	Toggle in `server.yaml`
Start / exit events	🚧	PID diffing queued
Auth / RBAC	🚧	JWT / OIDC middleware
Grafana dashboards	🚧	Provide sample JSON
Multi‑tenant isolation	🚧	Add tenant_id columns

3. Architecture

Agent (gopsutil) --> batched JSON --> Server (ingest)
                                        |-- In-memory ring buffer
                                        |-- PostgreSQL (optional)
                                        |-- Alert Engine (rules, window eval)
                                        |-- REST API + /metrics
                                              |-- Web UI (React)
                                              |-- Prometheus / Grafana

Data path: Agent collects → sends batch → Server buffers & optionally persists → scheduled evaluation updates alerts → consumers query snapshots, history, or aggregated series.

4. Quick Start

4.1 Docker Compose

docker compose up --build

Services:

API: http://localhost:8080
UI: http://localhost:3000

Test:

curl 'http://localhost:8080/api/v1/processes?agent_id=agent1'

4.2 Local (no containers)

cd backend
go run ./cmd/server -config ../server.yaml

Separate terminal:

cd backend
go run ./cmd/agent -config ../agent.yaml

4.3 Frontend Dev

cd web
npm install
npm run dev

Configure a dev proxy, or change the fetch base URL, so the UI can reach the API on :8080.

5. Configuration

agent.yaml

agent_id: agent1
server_url: http://server:8080
interval: 5s

server.yaml

bind_addr: :8080
retention: 1h
eval_interval: 15s
max_snapshots: 7200
persistence: false
db_dsn: postgres://user:pass@postgres:5432/procmon?sslmode=disable

6. API Summary

Method	Endpoint	Purpose
POST	/api/v1/agents/{agentID}/processes	Agent batch upload
GET	/api/v1/processes?agent_id=...&name=&pid=&status=&cpu_gt=&mem_gt=&port=	Current snapshot list
GET	/api/v1/processes/{pid}/history?agent_id=...&minutes=10	Raw history points
GET	/api/v1/processes/{pid}/series?agent_id=...&from=&to=&step=10s	Bucketed (averaged) trend
GET	/api/v1/processes/tree?agent_id=...&include_zombies=0	Process tree (sorted)
GET	/api/v1/alerts	Active firing alerts
GET	/api/v1/alert-rules	List alert rules (persistence)
POST	/api/v1/alert-rules	Create / upsert rule
PUT	/api/v1/alert-rules/{id}	Update rule
DELETE	/api/v1/alert-rules/{id}	Delete rule
GET	/metrics	Prometheus metrics

6.1 Minimal Usage Examples

Upload Batch (Agent simulation)

curl -X POST http://localhost:8080/api/v1/agents/agent1/processes \
	-H 'Content-Type: application/json' \
	-d '{"agent_id":"agent1","interval_s":5,"samples":[{"pid":1234,"ppid":1,"name":"demo","cmdline":"/usr/bin/demo","username":"root","status":"R","cpu_percent":12.5,"memory_rss":2048000,"memory_percent":0.4,"num_threads":5}]}'

List Processes (filter CPU > 50%)

curl 'http://localhost:8080/api/v1/processes?agent_id=agent1&cpu_gt=50'
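For Go clients, the filter query can be assembled with `net/url` instead of hand-written strings. The `ProcessFilter` type and its `URL` method are hypothetical helpers; the parameter names come from the API table. Treating a zero value as "filter not set" is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// ProcessFilter holds a subset of the documented query parameters.
// Zero values mean "do not send this filter" in this sketch.
type ProcessFilter struct {
	AgentID string
	Name    string
	Status  string
	CPUGt   float64
	MemGt   float64
	Port    int
}

// URL returns the full list-processes URL for the given API base.
func (f ProcessFilter) URL(base string) string {
	q := url.Values{}
	q.Set("agent_id", f.AgentID)
	if f.Name != "" {
		q.Set("name", f.Name)
	}
	if f.Status != "" {
		q.Set("status", f.Status)
	}
	if f.CPUGt > 0 {
		q.Set("cpu_gt", strconv.FormatFloat(f.CPUGt, 'f', -1, 64))
	}
	if f.MemGt > 0 {
		q.Set("mem_gt", strconv.FormatFloat(f.MemGt, 'f', -1, 64))
	}
	if f.Port > 0 {
		q.Set("port", strconv.Itoa(f.Port))
	}
	// Encode escapes values and sorts parameters by key.
	return base + "/api/v1/processes?" + q.Encode()
}

func main() {
	f := ProcessFilter{AgentID: "agent1", CPUGt: 50}
	fmt.Println(f.URL("http://localhost:8080"))
}
```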

Create Alert Rule

curl -X POST http://localhost:8080/api/v1/alert-rules \
	-H 'Content-Type: application/json' \
	-d '{"id":"rule_high_cpu","name":"High CPU","metric":"cpu_percent","operator":">","threshold":80,"duration":"60s","enabled":true}'

Fetch Trend Series

curl "http://localhost:8080/api/v1/processes/1234/series?agent_id=agent1&step=15s&from=$(date -u -d '5 min ago' +%Y-%m-%dT%H:%M:%SZ)"

Alert Rule JSON

{
	"id": "rule_high_cpu",
	"name": "High CPU",
	"process_name": "",         
	"pid": 1234,                 
	"metric": "cpu_percent",    
	"operator": ">",             
	"threshold": 80,
	"duration": "60s",
	"enabled": true
}

Series API

Returns one averaged point per bucket; empty buckets are omitted.

7. Extending

Goal	Hook / File	Approach
Add metric	collector/collector.go	Append field to `ProcessSnapshot`
Persist field	sqlstore/sqlstore.go	ALTER TABLE + CopyFrom columns
New alert metric	alerts.go	Extend `metricValue`
Auth	api.go router	Add JWT / OIDC middleware
Start/exit events	ingestion diff	Track prior PID set per agent
Grafana	/metrics or SQL	Export gauges / build dashboards
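For the start/exit events row, the "track prior PID set per agent" approach could look like the following sketch. `diffPIDs` is a hypothetical helper, not a function from this repository. Comparing PIDs alone misses a PID that is reused between two samples; also comparing process start times would cover that case.

```go
package main

import (
	"fmt"
	"sort"
)

// diffPIDs compares the previous and current PID sets of one agent and
// returns the PIDs that appeared (started) and disappeared (exited),
// each sorted in ascending order.
func diffPIDs(prev, curr map[int]struct{}) (started, exited []int) {
	for pid := range curr {
		if _, ok := prev[pid]; !ok {
			started = append(started, pid)
		}
	}
	for pid := range prev {
		if _, ok := curr[pid]; !ok {
			exited = append(exited, pid)
		}
	}
	sort.Ints(started)
	sort.Ints(exited)
	return started, exited
}

// toSet builds a PID set from a slice.
func toSet(pids ...int) map[int]struct{} {
	set := make(map[int]struct{}, len(pids))
	for _, pid := range pids {
		set[pid] = struct{}{}
	}
	return set
}

func main() {
	prev := toSet(1, 100, 200)
	curr := toSet(1, 200, 300, 400)
	started, exited := diffPIDs(prev, curr)
	fmt.Println("started:", started)
	fmt.Println("exited:", exited)
}
```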

8. Operational Notes

max_snapshots bounds memory (ring buffer). Reduce for very high scrape rates.
CPU percent depends on sampling cadence; taking two samples per scrape (dual-sample) is an optional accuracy improvement.
The series endpoint is intentionally simple; integrate a TSDB for richer queries.
Avoid high-cardinality Prometheus labels (per-PID) unless filtered.
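To illustrate how a fixed-capacity ring buffer bounds memory, here is a minimal Go sketch. It is not the server's actual buffer: `Ring`, `Snapshot`, and the method names are hypothetical, and `capacity` plays the role that `max_snapshots` is described as playing.

```go
package main

import "fmt"

// Snapshot stands in for one stored process snapshot.
type Snapshot struct {
	Seq int
}

// Ring keeps at most cap(buf) snapshots; the oldest entry is overwritten
// once the buffer is full, so memory use stays bounded.
type Ring struct {
	buf   []Snapshot
	next  int // index the next Push writes to
	count int // number of valid entries
}

// NewRing allocates a ring with the given capacity.
func NewRing(capacity int) *Ring {
	return &Ring{buf: make([]Snapshot, capacity)}
}

// Push stores s, evicting the oldest snapshot when the ring is full.
func (r *Ring) Push(s Snapshot) {
	if len(r.buf) == 0 {
		return
	}
	r.buf[r.next] = s
	r.next = (r.next + 1) % len(r.buf)
	if r.count < len(r.buf) {
		r.count++
	}
}

// Items returns the stored snapshots, oldest first.
func (r *Ring) Items() []Snapshot {
	if r.count == 0 {
		return nil
	}
	out := make([]Snapshot, 0, r.count)
	start := (r.next - r.count + len(r.buf)) % len(r.buf)
	for i := 0; i < r.count; i++ {
		out = append(out, r.buf[(start+i)%len(r.buf)])
	}
	return out
}

func main() {
	ring := NewRing(3)
	for seq := 1; seq <= 5; seq++ {
		ring.Push(Snapshot{Seq: seq})
	}
	for _, s := range ring.Items() {
		fmt.Println("kept snapshot", s.Seq)
	}
}
```

A smaller capacity lowers memory use at the cost of a shorter in-memory history window.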

9. Roadmap

Query endpoint for persisted alert events.
PID start/exit event stream & webhook.
UI charts & rule management views.
Advanced rule types (rate, absence, zombie detection).
Multi-tenant + auth.

10. License

MIT (add LICENSE before publishing publicly).

Chinese version: see README.zh-CN.md.
