FIM, Auditing, and Configuration Compliance in Active-Active and Active-Passive Designs
In a lot of environments I walk into, clustering is treated as a resilience problem - but not always as a security one. We're busy designing for uptime, failover, scale, etc... but we don’t always design for consistency of trust across the cluster (especially for on premise deployments where there's a lot more variables at play in my experience). In doing so we, we expose ourselves to risk
The Misconception
There’s a common assumption:
-
Active-active clusters: “everything is live, so everything must be secured”
-
Active-passive clusters: “only the active node matters right now”
That second assumption is where risk creeps in because, from a security standpoint, a passive node is not inactive - it’s just "waiting". Waiting to take over your production traffic, to execute the same critical workloads you were running before seamlessly, to become the new “trusted” state (a term we use a lot with Tripwire Enterprise and change monitoring.
That why if your passive infrastructure is not continuously monitored and validated (consistently and alongside your active nodes), you’ve effectively built a pre-positioned attack vector into your architecture (and one that will often expose you to the most risk when you're already potentially fighting another fire that started a failover too).
The Real Risk: Drift + Failover = Exposure
In real terms, you want to consider what happens during a failover to justify your monitoring stance:
The Scenario: Active-Passive Cluster Failover
-
Node A (active) is hardened, monitored, compliant
-
Node B (passive) is:
-
Missing patches
-
Has config drift
-
Has unauthorised changes (or worse, persistence mechanisms)
-
Everything looks fine… until failover.
Now Node B becomes active.
And suddenly:
-
Your compliance posture changes instantly
-
Your monitoring baselines are invalid
-
Potentially compromised state becomes production reality
This is not theoretical - this is a common failure mode I see a lot of the time in the real world.
Why FIM and Auditing Must Be Cluster-Wide
1. Trust Must Be Symmetrical
Clusters are built on the assumption of equivalence between nodes.
If Node B cannot be trusted to behave identically to Node A:
-
Your failover is not safe
-
Your resilience is compromised
-
Your security model is inconsistent
FIM (File Integrity Monitoring) ensures:
-
Critical binaries, configs, and system files remain consistent
-
Drift is detected before failover happens
2. Drift Doesn’t Care About Node State
Configuration drift occurs regardless of:
-
Node role (active/passive)
-
Workload assignment
-
Traffic patterns
Common causes:
-
Manual changes during maintenance
-
Automation gaps
-
Patch inconsistencies
-
“temporary” fixes that become permanent
Without continuous monitoring:
-
Passive nodes silently diverge
-
Baselines become meaningless
3. Attackers Target the Quiet Parts of Your Estate
If I were attacking your environment, I wouldn’t go for the most monitored system.
I’d go for:
-
The standby node
-
The DR environment
-
The “not currently in use” infrastructure
Why?
Because:
-
It’s often less monitored
-
Changes go unnoticed
-
It provides a clean pivot point into production
A compromised passive node is effectively a time-delayed breach. And it's not just me - compromised backup infrastructure has become an increasingly common attack vector and we need to remember that the backup of your crown jewels is, in effect, just a set of crown jewels.
4. Compliance Doesn’t Pause for Failover
From an audit and regulatory standpoint:
There is no distinction between:
-
“active system”
-
“standby system”
If it can process production workloads, it must:
-
Meet configuration standards
-
Be continuously audited
-
Provide evidential integrity
Otherwise, you end up in a position where:
-
You pass audits during normal operation
-
You fail them the moment failover occurs
Active-Active Isn’t Immune Either
It’s tempting to assume active-active setups are safer because everything is “live”.
But the same issues apply:
-
Nodes can still drift independently
-
Load distribution can mask inconsistent states
-
Partial compromise can spread laterally
Without FIM and configuration compliance:
-
You lose the guarantee that nodes are equivalent
-
You risk inconsistent behaviour under load
-
You introduce non-deterministic failure modes
What “Good” Looks Like
A properly secured cluster treats every node as production-ready at all times.
Minimum baseline:
-
FIM applied to all nodes
-
Critical OS paths
-
Application configs
-
Cluster configuration files
-
-
Continuous auditing
-
Not scheduled “once a day” checks
-
Near real-time or frequent validation
-
-
Configuration compliance enforcement
-
CIS / internal hardening baselines
-
Drift detection + remediation workflows
-
-
Consistent policy application
-
No “reduced monitoring” for passive nodes
-
No exceptions for DR environments
-
The Design Principle to Take Away
If a node can ever become active, it must be treated as active all the time.
That means:
-
Monitored
-
Audited
-
Validated
-
Trusted
Anything less introduces a gap between:
-
Resilience architecture
-
Security architecture
And that gap is exactly where failures - and breaches - tend to occur.
Final Thought
Clustering is about removing single points of failure.
But if your security controls only apply to part of the cluster, you’ve just moved the single point of failure somewhere less visible.
And that’s significantly worse.
Because it won’t fail loudly.
It will fail quietly - right at the moment you need it most.