Threshold Alerting

Threshold alerting creates an alert only when a rule matches a minimum number of times within a time window. It is essential for detecting brute force attacks, scanning, and other high-volume attack patterns.

Why Thresholds?

Without thresholds:

1 failed login → alert
1000 failed logins → 1000 alerts (alert fatigue!)

With thresholds:

10 failed logins in 5 minutes → 1 alert
Meaningful detection without noise

Use Cases

Pattern	Configuration
Brute force	10 failed logins in 5 minutes
Port scanning	100 connection attempts in 1 minute
Enumeration	50 HTTP 404 responses in 5 minutes
Credential spraying	5 failed logins per user in 10 minutes

Configuring Thresholds

Enable on a Rule

Open the rule in the editor
Click Threshold Settings
Enable threshold alerting
Configure parameters
Save and redeploy

Parameters

Setting	Description	Example
Threshold Count	Minimum matches to alert	`10`
Time Window	Aggregation window	`5 minutes`
Group By	Field to group counts	`source.ip`

Example: Brute Force Detection

title: Brute Force Login Attempt
logsource:
  product: windows
  service: security
detection:
  selection:
    EventID: 4625
  condition: selection
level: medium

Threshold settings:

Count: 10
Window: 5 minutes
Group By: source.ip

Result: an alert fires when the same source IP fails to log in 10 or more times within 5 minutes.

Group By Field

The group_by field determines how matching events are grouped before they are counted:

No Group By

Count all matching events globally:

10 failed logins from any source → alert
Less specific, catches distributed attacks

Group By Source IP

Count per source IP:

10 failed logins from 192.168.1.50 → alert
9 failed logins from 192.168.1.51 → no alert (yet)

Group By User

Count per username:

10 failed logins for admin → alert
Catches attacks targeting specific accounts

Multiple Group By

Some systems support multiple fields:

Group by source.ip AND user.name
10 failed logins from same IP for same user → alert
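
The grouping modes above can be sketched as a function that turns an event into the key it is counted under. This is an illustration only: `get_field`, `group_key`, and the nested-dict event shape are assumptions made for the example, not CHAD's internal representation.

```python
from typing import Any, Optional, Sequence, Tuple


def get_field(event: dict, dotted_path: str) -> Any:
    """Resolve a dotted path such as 'source.ip' in a nested dict; None if absent."""
    current: Any = event
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def group_key(event: dict, group_by: Sequence[str]) -> Optional[Tuple[Any, ...]]:
    """Return the key an event is counted under (hypothetical helper).

    An empty group_by yields the single global key (). If any group-by field
    is missing, the event cannot be assigned to a group and None is returned.
    """
    values = tuple(get_field(event, path) for path in group_by)
    if any(value is None for value in values):
        return None
    return values


event = {"source": {"ip": "192.168.1.50"}, "user": {"name": "admin"}}
print(group_key(event, []))                          # global counting
print(group_key(event, ["source.ip"]))               # per source IP
print(group_key(event, ["source.ip", "user.name"]))  # per IP and user
```

Returning None for events that lack a group-by field makes the gap explicit. How such events are actually handled is not stated here, so treat that choice as an assumption.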

How Counting Works

Time Windows

Windows are rolling, not fixed. With a 5-minute window:

14:00 - Event 1
14:02 - Event 2
14:03 - Event 3
14:06 - Event 4 (Event 1 expires, window is now 14:02-14:07)
14:07 - Event 5
...

The window moves forward with each new event; events older than the window length expire and stop counting.
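
A rolling window of this kind can be sketched with one queue of timestamps per group. The `RollingCounter` class below is a hypothetical illustration, not CHAD's implementation. It assumes events arrive in timestamp order and that an event exactly one window length old still counts. It replays five events at minutes 0, 2, 3, 6, and 7 past 14:00 with an assumed 5-minute window.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class RollingCounter:
    """Counts events per group inside a trailing time window (illustrative)."""

    def __init__(self, window: timedelta):
        self.window = window
        self.events = defaultdict(deque)  # group key -> timestamps, oldest first

    def add(self, key, ts: datetime) -> int:
        """Record an event and return the group's count within the window."""
        timestamps = self.events[key]
        timestamps.append(ts)
        # Expire events strictly older than the window that ends at ts.
        while timestamps[0] < ts - self.window:
            timestamps.popleft()
        return len(timestamps)


counter = RollingCounter(window=timedelta(minutes=5))
counts = [
    counter.add("192.168.1.50", datetime(2024, 1, 1, 14, minute))
    for minute in (0, 2, 3, 6, 7)
]
print(counts)  # [1, 2, 3, 3, 4]
```

The count stays at 3 when the fourth event arrives because the first event has dropped out of the window by then.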

Threshold State

CHAD tracks:

Current count per group
Timestamps of events in window
When last alert fired

State is stored in PostgreSQL for persistence.

Alert Behavior

First Alert

When threshold is reached:

Alert created with count information
State continues tracking
Further events don’t create new alerts immediately

Subsequent Alerts

After initial alert:

Optional cool-down period
Alert again if count resets and re-exceeds threshold
Or alert on continued activity (configurable)
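
One of the policies listed above, alerting again on continued activity once a cool-down has elapsed, might look like the sketch below. `AlertGate`, its parameters, and the 15-minute cool-down are hypothetical choices for illustration; the actual options and defaults are whatever your deployment exposes.

```python
from datetime import datetime, timedelta
from typing import Dict, Hashable, Optional


class AlertGate:
    """Decides whether a group at or above threshold should alert now."""

    def __init__(self, threshold: int, cooldown: timedelta):
        self.threshold = threshold
        self.cooldown = cooldown
        self.last_alert: Dict[Hashable, datetime] = {}  # group key -> last alert time

    def should_alert(self, key: Hashable, count: int, now: datetime) -> bool:
        if count < self.threshold:
            return False
        last: Optional[datetime] = self.last_alert.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down: suppress the duplicate alert
        self.last_alert[key] = now
        return True


gate = AlertGate(threshold=10, cooldown=timedelta(minutes=15))
start = datetime(2024, 1, 1, 14, 0)
ip = "192.168.1.50"
decisions = [
    gate.should_alert(ip, 9, start),                           # below threshold
    gate.should_alert(ip, 10, start + timedelta(minutes=1)),   # first alert
    gate.should_alert(ip, 11, start + timedelta(minutes=2)),   # suppressed
    gate.should_alert(ip, 12, start + timedelta(minutes=20)),  # cool-down over
]
print(decisions)  # [False, True, False, True]
```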

Threshold vs Exception Rules

Feature	Threshold	Exception
Purpose	Reduce noise from volume	Suppress false positives
Logic	Count-based	Condition-based
Applies to	All matches of rule	Specific field values

Use them together: thresholds reduce alert volume, and exceptions tune out known-good activity.

Performance Considerations

Thresholds require state tracking:

Memory: One state entry per group value
Storage: Persisted to database
Cleanup: Old state automatically expires

For high-cardinality group-by fields (e.g., source IP on a busy network):

Monitor state table size
Consider shorter time windows
Use specific index patterns
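
To get a rough feel for how the time window affects state size, you can replay a sample of exported events offline and measure how many groups are live at once. The function below is a hypothetical sketch, not a CHAD tool. It assumes `(timestamp, group_value)` pairs sorted by timestamp and one state entry per group that has at least one unexpired event.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def peak_active_groups(events: Iterable[Tuple[datetime, str]], window: timedelta) -> int:
    """Largest number of groups holding at least one unexpired event at once."""
    last_seen = {}  # group value -> timestamp of its most recent event
    peak = 0
    for ts, group in events:
        last_seen[group] = ts
        # Drop groups whose newest event has left the window ending at ts.
        expired = [g for g, seen in last_seen.items() if seen < ts - window]
        for g in expired:
            del last_seen[g]
        peak = max(peak, len(last_seen))
    return peak


# Ten different source IPs, one new IP per minute.
sample = [(datetime(2024, 1, 1, 14, m), f"10.0.0.{m}") for m in range(10)]
print(peak_active_groups(sample, timedelta(minutes=5)))  # 6
print(peak_active_groups(sample, timedelta(minutes=1)))  # 2
```

In this sample, shortening the window from 5 minutes to 1 minute cuts the peak number of tracked groups from 6 to 2.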

Best Practices

Start with reasonable thresholds

10 events in 5 minutes is a reasonable starting point for many rules; tune it against the alert volume you observe.

Choose meaningful group-by

Group by the attacker-controlled value (usually source IP).

Match threshold to attack

Brute force: lower threshold. Scanning: higher threshold.

Test with historical data

Use dry-run to estimate alert volume with different thresholds.
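
If you also want to compare thresholds offline, a replay over exported events gives a rough estimate. The sketch below is not the dry-run feature itself. `estimate_alerts` is a hypothetical helper that assumes time-sorted `(timestamp, group_value)` pairs and assumes a group's count resets once an alert fires.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def estimate_alerts(events: Iterable[Tuple[datetime, str]],
                    threshold: int, window: timedelta) -> int:
    """Count the alerts a threshold would produce over historical events."""
    recent = defaultdict(deque)  # group value -> timestamps inside the window
    alerts = 0
    for ts, group in events:
        queue = recent[group]
        queue.append(ts)
        while queue[0] < ts - window:
            queue.popleft()
        if len(queue) >= threshold:
            alerts += 1
            queue.clear()  # assumption: the count resets once an alert fires
    return alerts


base = datetime(2024, 1, 1, 14, 0)
# A noisy source (one event every 10 seconds) and a quiet one (three events).
noisy = [(base + timedelta(seconds=10 * i), "10.0.0.5") for i in range(30)]
quiet = [(base + timedelta(minutes=i), "10.0.0.9") for i in range(3)]
history = sorted(noisy + quiet)

estimates = {t: estimate_alerts(history, t, timedelta(minutes=5)) for t in (5, 10, 20)}
print(estimates)  # {5: 6, 10: 3, 20: 1}
```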

Document the reasoning

Note why you chose specific threshold values.

Examples

SSH Brute Force

title: SSH Brute Force
logsource:
  product: linux
  service: sshd
detection:
  selection:
    message|contains: 'Failed password'
  condition: selection

Threshold: 5 in 2 minutes, group by source.ip

Web Scanning

title: Web Directory Scanning
logsource:
  product: apache
  service: access
detection:
  selection:
    response_code: 404
  condition: selection

Threshold: 50 in 1 minute, group by source.ip

Account Lockout Storm

title: Mass Account Lockout
logsource:
  product: windows
  service: security
detection:
  selection:
    EventID: 4740
  condition: selection

Threshold: 10 in 5 minutes, no group by (global)

Troubleshooting

Alert never fires

Threshold too high for your traffic
Time window too short
Group-by field missing from some logs
Rule not matching at all (test without threshold first)
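
To check whether a group-by field is missing from some logs, you can measure how often it is absent in a sample of exported events. `missing_ratio` is a hypothetical helper written for this example, and the nested-dict event shape is an assumption.

```python
from typing import Any, Iterable


def missing_ratio(events: Iterable[dict], dotted_path: str) -> float:
    """Fraction of events where the dotted field (e.g. 'source.ip') is absent."""
    total = 0
    missing = 0
    for event in events:
        total += 1
        value: Any = event
        for part in dotted_path.split("."):
            value = value.get(part) if isinstance(value, dict) else None
        if value is None:
            missing += 1
    return missing / total if total else 0.0


sample_events = [
    {"source": {"ip": "192.168.1.50"}},
    {"source": {"ip": "192.168.1.51"}},
    {"source": {}},              # field absent
    {"src_ip": "192.168.1.52"},  # same data under a different field name
]
print(missing_ratio(sample_events, "source.ip"))  # 0.5
```

A high ratio usually means some log sources use a different field name, which also produces inconsistent counts.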

Too many threshold alerts

Threshold too low
Time window too long
Normal traffic exceeds threshold
Consider exception rules for known sources

Inconsistent counting

Check timestamp field accuracy
Verify group-by field is consistent
Review state expiration settings

Next Steps

Correlation Rules

Link multiple detections

Exception Rules

Tune out false positives
