
Threshold Alerting

Threshold alerting creates alerts only when events occur multiple times within a time window. It’s essential for detecting brute force attacks, scanning, and other high-volume attack patterns.

Why Thresholds?

Without thresholds:
  • 1 failed login → alert
  • 1000 failed logins → 1000 alerts (alert fatigue!)
With thresholds:
  • 10 failed logins in 5 minutes → 1 alert
  • Meaningful detection without noise

Use Cases

| Pattern | Configuration |
|---|---|
| Brute force | 10 failed logins in 5 minutes |
| Port scanning | 100 connection attempts in 1 minute |
| Enumeration | 50 404 errors in 5 minutes |
| Credential spraying | 5 failed logins per user in 10 minutes |

Configuring Thresholds

Enable on a Rule

  1. Open the rule in the editor
  2. Click Threshold Settings
  3. Enable threshold alerting
  4. Configure parameters
  5. Save and redeploy

Parameters

| Setting | Description | Example |
|---|---|---|
| Threshold Count | Minimum matches to alert | 10 |
| Time Window | Aggregation window | 5 minutes |
| Group By | Field to group counts | source.ip |
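
As a rough mental model, the three parameters can be pictured as one small configuration record. The Python sketch below is illustrative only: `ThresholdConfig` and its field names are hypothetical and are not CHAD's actual schema or API.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class ThresholdConfig:
    """Hypothetical container for the three threshold parameters."""

    count: int                      # Threshold Count: minimum matches to alert
    window: timedelta               # Time Window: aggregation window
    group_by: Optional[str] = None  # Group By: field to group counts (None = global)

    def __post_init__(self) -> None:
        # Reject values that could never produce a meaningful threshold.
        if self.count < 1:
            raise ValueError("count must be at least 1")
        if self.window <= timedelta(0):
            raise ValueError("window must be positive")


# The example values from the parameter table.
brute_force = ThresholdConfig(count=10, window=timedelta(minutes=5), group_by="source.ip")
```

Keeping the three values together makes it easier to see that changing any one of them (count, window, or grouping) changes what the rule detects.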

Example: Brute Force Detection

title: Brute Force Login Attempt
logsource:
  product: windows
  service: security
detection:
  selection:
    EventID: 4625
  condition: selection
level: medium
Threshold settings:
  • Count: 10
  • Window: 5 minutes
  • Group By: source.ip
Result: one alert fires when the same source IP fails to log in 10 or more times within 5 minutes.

Group By Field

The group_by field determines which events are counted together:

No Group By

Count all matching events globally:
  • 10 failed logins from any source → alert
  • Less specific, catches distributed attacks

Group By Source IP

Count per source IP:
  • 10 failed logins from 192.168.1.50 → alert
  • 9 failed logins from 192.168.1.51 → no alert (yet)

Group By User

Count per username:
  • 10 failed logins for admin → alert
  • Catches attacks targeting specific accounts

Multiple Group By

Some systems support multiple fields:
  • Group by source.ip AND user.name
  • 10 failed logins from same IP for same user → alert
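
One way to picture grouping: each event is reduced to a key built from the group-by fields, and a separate counter is kept per key. The sketch below assumes events are flat dictionaries keyed by dotted field names. `group_key` is a hypothetical helper, not a CHAD function, and the time window is ignored so that only the effect of the key is visible.

```python
from collections import Counter
from typing import Mapping, Optional, Sequence, Tuple


def group_key(event: Mapping[str, str], fields: Sequence[str]) -> Optional[Tuple[str, ...]]:
    """Build the grouping key for an event.

    An empty `fields` sequence yields the empty tuple, so every event
    lands in one global group. Returns None when a group-by field is
    missing, so the caller can decide how to treat such events.
    """
    try:
        return tuple(event[f] for f in fields)
    except KeyError:
        return None


# 10 failed logins from one IP and 9 from another, all for the same user.
events = (
    [{"source.ip": "192.168.1.50", "user.name": "admin"}] * 10
    + [{"source.ip": "192.168.1.51", "user.name": "admin"}] * 9
)


def count_by(fields: Sequence[str]) -> Counter:
    """Count events per grouping key."""
    return Counter(group_key(e, fields) for e in events)


global_counts = count_by([])                          # no group by
per_ip = count_by(["source.ip"])                      # group by source IP
per_user = count_by(["user.name"])                    # group by user
per_ip_user = count_by(["source.ip", "user.name"])    # multiple group by
```

With a threshold of 10, the global and per-user groupings both see 19 events and would alert, while per-IP grouping alerts only for 192.168.1.50, whose count is 10; 192.168.1.51 stays at 9.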

How Counting Works

Time Windows

Windows are rolling, not fixed. With a 5-minute window:
14:00 - Event 1
14:02 - Event 2
14:03 - Event 3
14:06 - Event 4 (Event 1 expires; the window now covers 14:01-14:06)
14:07 - Event 5 (the window now covers 14:02-14:07)
...
Each new event moves the window forward, and events older than the window expire.
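
A rolling window can be sketched with a queue of timestamps: append each new event, then drop entries that have fallen out of the window. The code below is a minimal illustration, not CHAD's implementation. `RollingCounter` is a hypothetical name, events are assumed to arrive in timestamp order, and the boundary rule (an event exactly one window old still counts) is an assumption.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingCounter:
    """Count events inside a rolling time window (illustrative only)."""

    def __init__(self, window: timedelta) -> None:
        self.window = window
        self.timestamps: deque = deque()

    def add(self, ts: datetime) -> int:
        """Record an event at `ts` and return the count in the window.

        An event is treated as expired once it is strictly older than
        the window, measured from the newest event.
        """
        self.timestamps.append(ts)
        cutoff = ts - self.window
        # The newest event is never older than the cutoff, so the
        # deque cannot become empty inside this loop.
        while self.timestamps[0] < cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)


def at(hhmm: str) -> datetime:
    """Helper: build a timestamp on an arbitrary example date."""
    return datetime.strptime("2024-01-01 " + hhmm, "%Y-%m-%d %H:%M")


counter = RollingCounter(window=timedelta(minutes=5))
counts = [counter.add(at(t)) for t in ["14:00", "14:02", "14:03", "14:06", "14:07"]]
# counts == [1, 2, 3, 3, 4]: the 14:00 event drops out when the 14:06 event arrives.
```

A fixed (tumbling) window would instead reset the count at set boundaries, which can split a burst across two windows and miss it.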

Threshold State

CHAD tracks:
  • Current count per group
  • Timestamps of events in window
  • When last alert fired
State is stored in PostgreSQL for persistence.
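
To make the tracked items concrete, here is a hypothetical per-group state record. The class and field names are assumptions made for illustration and do not reflect CHAD's real database schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class GroupState:
    """Hypothetical per-group state record; not CHAD's real schema."""

    group_value: str                                           # e.g. a source IP
    event_times: List[datetime] = field(default_factory=list)  # timestamps in window
    last_alert_at: Optional[datetime] = None                   # when last alert fired

    @property
    def count(self) -> int:
        """Current count, derived from the timestamps still in the window."""
        return len(self.event_times)


state = GroupState(group_value="192.168.1.50")
state.event_times.append(datetime(2024, 1, 1, 14, 0))
state.event_times.append(datetime(2024, 1, 1, 14, 2))
```

Deriving the count from the stored timestamps, as this sketch does, avoids the two values drifting apart when old events expire.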

Alert Behavior

First Alert

When threshold is reached:
  1. Alert created with count information
  2. State continues tracking
  3. Further events don’t create new alerts immediately

Subsequent Alerts

After the initial alert:
  • An optional cool-down period suppresses repeat alerts
  • A new alert fires if the count resets and then reaches the threshold again
  • Alternatively, alerts can repeat while activity continues (configurable)
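
One possible reading of the cool-down behavior is sketched below: fire when the count has reached the threshold, then stay quiet until the cool-down has elapsed. `should_alert` and its parameters are hypothetical, and the 15-minute cool-down is an arbitrary example value; CHAD's actual policy may differ.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_alert(
    count: int,
    threshold: int,
    now: datetime,
    last_alert_at: Optional[datetime],
    cooldown: timedelta,
) -> bool:
    """Decide whether to fire an alert for one group.

    Fires when the count has reached the threshold and either no alert
    has fired yet or the cool-down since the last alert has elapsed.
    """
    if count < threshold:
        return False
    if last_alert_at is None:
        return True
    return now - last_alert_at >= cooldown


t0 = datetime(2024, 1, 1, 14, 0)
cooldown = timedelta(minutes=15)

first = should_alert(10, 10, t0, None, cooldown)                        # threshold reached
repeat = should_alert(11, 10, t0 + timedelta(minutes=1), t0, cooldown)  # still cooling down
later = should_alert(12, 10, t0 + timedelta(minutes=20), t0, cooldown)  # cool-down elapsed
```

Here `first` and `later` are True and `repeat` is False, so sustained activity produces a periodic reminder rather than one alert per event.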

Threshold vs Exception Rules

| Feature | Threshold | Exception |
|---|---|---|
| Purpose | Reduce noise from volume | Suppress false positives |
| Logic | Count-based | Condition-based |
| Applies to | All matches of rule | Specific field values |
Use them together: thresholds reduce alert volume, and exceptions tune out known-good activity.
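
The two mechanisms can be pictured as consecutive steps: a condition-based filter first, then a count-based check. The sketch below is an assumption about how they might combine, not CHAD's actual evaluation order. The function name and IP addresses are hypothetical, and all events are assumed to fall inside one time window.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def groups_over_threshold(
    events: Iterable[Dict[str, str]],
    known_good_ips: Set[str],
    threshold: int,
) -> List[str]:
    """Return source IPs whose match count reaches the threshold.

    Exception step: drop events from known-good sources (condition-based).
    Threshold step: count what remains per source IP (count-based).
    """
    remaining = (e for e in events if e["source.ip"] not in known_good_ips)
    per_ip = Counter(e["source.ip"] for e in remaining)
    return sorted(ip for ip, n in per_ip.items() if n >= threshold)


events = (
    [{"source.ip": "10.0.0.5"}] * 40        # hypothetical internal scanner (known-good)
    + [{"source.ip": "203.0.113.9"}] * 12   # unknown source, above threshold
    + [{"source.ip": "203.0.113.10"}] * 3   # unknown source, below threshold
)
alerts = groups_over_threshold(events, known_good_ips={"10.0.0.5"}, threshold=10)
# alerts == ["203.0.113.9"]
```

Without the exception, the known-good scanner at 10.0.0.5 would also cross the threshold and alert.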

Performance Considerations

Thresholds require state tracking:
  • Memory: One state entry per group value
  • Storage: Persisted to database
  • Cleanup: Old state automatically expires
For high-cardinality group-by fields (e.g., source IP on a busy network):
  • Monitor state table size
  • Consider shorter time windows
  • Use specific index patterns
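
To see why shorter windows help with high-cardinality fields, consider a cleanup pass over in-memory state. The sketch below uses a hypothetical layout (a dictionary mapping each group value to its event timestamps) and a hypothetical `expire_state` function; CHAD's own cleanup is automatic and may work differently.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def expire_state(
    state: Dict[str, List[datetime]],
    now: datetime,
    window: timedelta,
) -> int:
    """Drop expired timestamps and empty groups; return groups removed."""
    cutoff = now - window
    removed = 0
    for key in list(state):  # copy keys so entries can be deleted while looping
        state[key] = [ts for ts in state[key] if ts >= cutoff]
        if not state[key]:
            del state[key]
            removed += 1
    return removed


now = datetime(2024, 1, 1, 14, 10)
state = {
    "192.168.1.50": [now - timedelta(minutes=2)],   # still inside the window
    "192.168.1.51": [now - timedelta(minutes=30)],  # long expired
}
removed = expire_state(state, now, window=timedelta(minutes=5))
# removed == 1; only 192.168.1.50 remains in state
```

The number of live entries is roughly the number of distinct group values seen within one window, so a shorter window lets entries expire sooner.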

Best Practices

  • Start with 10 events in 5 minutes; it is a reasonable starting point for most attacks.
  • Group by the attacker-controlled value (usually source IP).
  • Match the threshold to the attack type: lower for brute force, higher for scanning.
  • Use dry-run to estimate alert volume with different thresholds.
  • Document why you chose specific threshold values.
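
The idea behind estimating alert volume can also be tried offline against exported historical events. The sketch below is not CHAD's dry-run feature. It is a hypothetical helper that counts how many groups would have reached a given threshold inside some rolling window, using made-up sample data.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def groups_that_would_alert(
    events: Iterable[Tuple[str, datetime]],
    threshold: int,
    window: timedelta,
) -> int:
    """Count groups with at least `threshold` events inside some window.

    `events` is an iterable of (group value, timestamp) pairs.
    """
    by_group: Dict[str, List[datetime]] = defaultdict(list)
    for group, ts in events:
        by_group[group].append(ts)

    alerting = 0
    for times in by_group.values():
        times.sort()
        # N events fit in the window if the first and last of any N
        # consecutive events are at most `window` apart.
        if any(
            times[i + threshold - 1] - times[i] <= window
            for i in range(len(times) - threshold + 1)
        ):
            alerting += 1
    return alerting


start = datetime(2024, 1, 1, 14, 0)
history = (
    [("192.168.1.50", start + timedelta(seconds=20 * i)) for i in range(12)]    # fast burst
    + [("192.168.1.51", start + timedelta(seconds=40 * i)) for i in range(6)]   # small burst
    + [("192.168.1.52", start + timedelta(minutes=10 * i)) for i in range(12)]  # slow trickle
)
estimate = {
    n: groups_that_would_alert(history, threshold=n, window=timedelta(minutes=5))
    for n in (5, 10, 20)
}
# estimate == {5: 2, 10: 1, 20: 0}
```

Comparing several candidate thresholds side by side shows where the alert count drops off, which is a useful input when choosing a value.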

Examples

SSH Brute Force

title: SSH Brute Force
logsource:
  product: linux
  service: sshd
detection:
  selection:
    message|contains: 'Failed password'
  condition: selection
Threshold: 5 in 2 minutes, group by source.ip

Web Scanning

title: Web Directory Scanning
logsource:
  product: apache
  service: access
detection:
  selection:
    response_code: 404
  condition: selection
Threshold: 50 in 1 minute, group by source.ip

Account Lockout Storm

title: Mass Account Lockout
logsource:
  product: windows
  service: security
detection:
  selection:
    EventID: 4740
  condition: selection
Threshold: 10 in 5 minutes, no group by (global)

Troubleshooting

Alert never fires

  1. Threshold too high for your traffic
  2. Time window too short
  3. Group-by field missing from some logs
  4. Rule not matching at all (test without threshold first)
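
For cause 3, a quick check on a sample of matching events can show how often the group-by field is absent. The helper below is a hypothetical sketch that assumes events are flat dictionaries. How CHAD itself treats events with a missing group-by field is not covered here.

```python
from typing import Iterable, Mapping


def missing_field_ratio(events: Iterable[Mapping[str, object]], field: str) -> float:
    """Return the fraction of events where `field` is absent or empty."""
    total = 0
    missing = 0
    for event in events:
        total += 1
        if event.get(field) in (None, ""):
            missing += 1
    return missing / total if total else 0.0


sample = [
    {"source.ip": "192.168.1.50", "EventID": 4625},
    {"source.ip": "", "EventID": 4625},   # field present but empty
    {"EventID": 4625},                    # field absent
    {"source.ip": "192.168.1.51", "EventID": 4625},
]
ratio = missing_field_ratio(sample, "source.ip")
# ratio == 0.5
```

A high ratio suggests that many matches may never be attributed to a group, which can keep every per-group count under the threshold.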

Too many threshold alerts

  1. Threshold too low
  2. Time window too long
  3. Normal traffic exceeds threshold
  4. Consider exception rules for known sources

Inconsistent counting

  1. Check timestamp field accuracy
  2. Verify group-by field is consistent
  3. Review state expiration settings
