Threshold Alerting
Threshold alerting creates an alert only when a rule matches a set number of times within a time window. It’s essential for detecting brute force attacks, scanning, and other high-volume attack patterns.
Why Thresholds?
Without thresholds:
- 1 failed login → alert
- 1000 failed logins → 1000 alerts (alert fatigue!)
With thresholds:
- 10 failed logins in 5 minutes → 1 alert
- Meaningful detection without noise
Use Cases
| Pattern | Configuration |
|---|---|
| Brute force | 10 failed logins in 5 minutes |
| Port scanning | 100 connection attempts in 1 minute |
| Enumeration | 50 404 errors in 5 minutes |
| Credential spraying | 5 failed logins per user in 10 minutes |
Configuring Thresholds
Enable on a Rule
- Open the rule in the editor
- Click Threshold Settings
- Enable threshold alerting
- Configure parameters
- Save and redeploy
Parameters
| Setting | Description | Example |
|---|---|---|
| Threshold Count | Minimum matches to alert | 10 |
| Time Window | Aggregation window | 5 minutes |
| Group By | Field to group counts | source.ip |
Example: Brute Force Detection
- Count: 10
- Window: 5 minutes
- Group By: source.ip
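The three parameters above can be pictured as a small settings record. The sketch below is illustrative only: the `ThresholdSettings` class and its field names are hypothetical and do not represent CHAD's actual rule format or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThresholdSettings:
    """Hypothetical container for the three threshold parameters."""
    count: int                      # Threshold Count: minimum matches to alert
    window_seconds: int             # Time Window, expressed in seconds
    group_by: Optional[str] = None  # Group By field; None means count globally

    def __post_init__(self):
        # Reject settings that could never produce a meaningful alert.
        if self.count < 1:
            raise ValueError("count must be at least 1")
        if self.window_seconds <= 0:
            raise ValueError("window_seconds must be positive")


# The brute force example: 10 matches in 5 minutes, counted per source IP.
brute_force = ThresholdSettings(count=10, window_seconds=5 * 60, group_by="source.ip")
```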
Group By Field
The group_by field determines how matches are aggregated:
No Group By
Count all matching events globally:
- 10 failed logins from any source → alert
- Less specific, but catches distributed attacks
Group By Source IP
Count per source IP:
- 10 failed logins from 192.168.1.50 → alert
- 9 failed logins from 192.168.1.51 → no alert (yet)
Group By User
Count per username:
- 10 failed logins for admin → alert
- Catches attacks targeting specific accounts
Multiple Group By
Some systems support multiple fields:
- Group by source.ip AND user.name
- 10 failed logins from same IP for same user → alert
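The effect of the group-by choice can be illustrated with a short sketch. It assumes events are flat dictionaries keyed by dotted field names; `group_key` and `count_by` are hypothetical helpers, not part of CHAD.

```python
from collections import Counter


def group_key(event, group_by):
    """Build the counting key for an event.

    An empty group_by list yields the same key, (), for every event,
    which is equivalent to counting globally.
    """
    return tuple(event.get(field) for field in group_by)


def count_by(events, group_by):
    """Count events per group key."""
    return Counter(group_key(event, group_by) for event in events)


events = [
    {"source.ip": "192.168.1.50", "user.name": "admin"},
    {"source.ip": "192.168.1.50", "user.name": "root"},
    {"source.ip": "192.168.1.51", "user.name": "admin"},
]

global_counts = count_by(events, [])                        # no group by
ip_counts = count_by(events, ["source.ip"])                 # per source IP
pair_counts = count_by(events, ["source.ip", "user.name"])  # per IP and user
```

The same three events produce one counter globally, two counters per source IP, and three counters per IP and user pair, so finer grouping needs more events per group to reach the same threshold.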
How Counting Works
Time Windows
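Windows are rolling, not fixed: each new match is counted together with the matches from the preceding window length, rather than against fixed, clock-aligned buckets.
Threshold State
CHAD tracks:
- Current count per group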
- Timestamps of events in window
- When last alert fired
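Rolling-window counting with per-group state can be sketched as below. This is a minimal illustration, not CHAD's implementation: the `RollingThreshold` class is hypothetical, events are assumed to arrive in timestamp order, and treating an event exactly one window old as expired is an arbitrary choice.

```python
from collections import defaultdict, deque


class RollingThreshold:
    """Hypothetical per-group rolling window counter."""

    def __init__(self, count, window_seconds):
        self.count = count
        self.window = window_seconds
        self.events = defaultdict(deque)  # group -> timestamps inside the window
        self.last_alert = {}              # group -> timestamp the threshold was last met

    def observe(self, group, ts):
        """Record one match; return True if the group is at or above threshold."""
        timestamps = self.events[group]
        timestamps.append(ts)
        # Drop timestamps that have slid out of the window ending at ts.
        while timestamps and timestamps[0] <= ts - self.window:
            timestamps.popleft()
        reached = len(timestamps) >= self.count
        if reached:
            self.last_alert[group] = ts
        return reached


# Threshold of 3 matches in 60 seconds, with matches at t = 0, 30, 70, 80.
tracker = RollingThreshold(count=3, window_seconds=60)
results = [tracker.observe("192.168.1.50", ts) for ts in (0, 30, 70, 80)]
# results == [False, False, False, True]
```

Fixed buckets of 0 to 60 and 60 to 120 seconds would hold two matches each and never reach the threshold, while the rolling window sees three matches between t = 30 and t = 80.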
Alert Behavior
First Alert
When the threshold is reached:
- Alert created with count information
- State continues tracking
- Further events don’t create new alerts immediately
Subsequent Alerts
After the initial alert:
- Optional cool-down period
- Alert again if count resets and re-exceeds threshold
- Or alert on continued activity (configurable)
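One way to picture the cool-down option is the sketch below. The `should_alert` function and its parameters are hypothetical, and it models only the variant that alerts again on continued activity once the cool-down has elapsed; CHAD's actual behavior depends on how the rule is configured.

```python
def should_alert(count, threshold, now, last_alert, cooldown_seconds):
    """Decide whether a group should produce an alert at time `now`.

    last_alert is the timestamp of the group's previous alert, or None
    if the group has not alerted yet.
    """
    if count < threshold:
        return False  # threshold not reached, nothing to alert on
    if last_alert is None:
        return True   # first alert for this group
    # Stay quiet until the cool-down since the last alert has elapsed.
    return now - last_alert >= cooldown_seconds


first = should_alert(count=10, threshold=10, now=100, last_alert=None, cooldown_seconds=300)
during = should_alert(count=12, threshold=10, now=200, last_alert=100, cooldown_seconds=300)
after = should_alert(count=12, threshold=10, now=400, last_alert=100, cooldown_seconds=300)
# first is True, during is False, after is True
```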
Threshold vs Exception Rules
| Feature | Threshold | Exception |
|---|---|---|
| Purpose | Reduce noise from volume | Suppress false positives |
| Logic | Count-based | Condition-based |
| Applies to | All matches of rule | Specific field values |
Performance Considerations
Thresholds require state tracking:
- Memory: One state entry per group value
- Storage: Persisted to database
- Cleanup: Old state automatically expires
To keep state tracking manageable:
- Monitor state table size
- Consider shorter time windows
- Use specific index patterns
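The cleanup step can be pictured with the sketch below, which assumes state is held as a dictionary mapping each group value to a list of timestamps. `expire_state` is a hypothetical helper; CHAD's own expiry logic and storage layout are not shown here.

```python
def expire_state(state, now, window_seconds):
    """Drop timestamps outside the window and remove groups left empty."""
    cutoff = now - window_seconds
    for group in list(state):  # copy the keys so entries can be deleted safely
        fresh = [ts for ts in state[group] if ts > cutoff]
        if fresh:
            state[group] = fresh
        else:
            del state[group]   # no recent matches: free the state entry
    return state


state = {"192.168.1.50": [0, 10], "192.168.1.51": [290, 310]}
expire_state(state, now=320, window_seconds=300)
# Only 192.168.1.51 remains, with both of its timestamps.
```

Because each group's entry lives for at most one window after its last match, shorter windows and lower-cardinality group-by fields both reduce the number of entries held at once.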
Best Practices
Start with reasonable thresholds
10 events in 5 minutes is a good starting point for most attacks.
Choose meaningful group-by
Group by the attacker-controlled value (usually source IP).
Match threshold to attack
Brute force: lower threshold. Scanning: higher threshold.
Test with historical data
Use dry-run to estimate alert volume with different thresholds.
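Where historical matches can be exported, the same estimate can be approximated offline. The sketch below is not CHAD's dry-run feature: `estimate_alerts` is a hypothetical helper that replays (timestamp, group) pairs sorted by timestamp and counts one alert each time a group newly reaches the threshold.

```python
from collections import defaultdict, deque


def estimate_alerts(events, threshold, window_seconds):
    """Replay sorted (timestamp, group) pairs and count threshold crossings."""
    windows = defaultdict(deque)  # group -> timestamps inside the rolling window
    alerting = set()              # groups currently at or above threshold
    alerts = 0
    for ts, group in events:
        timestamps = windows[group]
        timestamps.append(ts)
        while timestamps and timestamps[0] <= ts - window_seconds:
            timestamps.popleft()
        if len(timestamps) >= threshold:
            if group not in alerting:
                alerts += 1       # count only the crossing, not every later match
                alerting.add(group)
        else:
            alerting.discard(group)
    return alerts


# Synthetic history: one noisy source and one quiet source.
history = sorted(
    [(t, "10.0.0.1") for t in range(0, 120, 10)]
    + [(5, "10.0.0.2"), (500, "10.0.0.2")]
)
strict = estimate_alerts(history, threshold=10, window_seconds=300)  # 1 alert
loose = estimate_alerts(history, threshold=1, window_seconds=300)    # 2 alerts
```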
Document the reasoning
Note why you chose specific threshold values.
Examples
SSH Brute Force
source.ip
Web Scanning
source.ip
Account Lockout Storm
Troubleshooting
Alert never fires
- Threshold too high for your traffic
- Time window too short
- Group-by field missing from some logs
- Rule not matching at all (test without threshold first)
Too many threshold alerts
- Threshold too low
- Time window too long
- Normal traffic exceeds threshold
- Consider exception rules for known sources
Inconsistent counting
- Check timestamp field accuracy
- Verify group-by field is consistent
- Review state expiration settings