Cyber Resilience
Key Control Areas
Critical Service Identification and Impact Tolerance
Resilient Architecture and Graceful Degradation
Immutable Backup and Data Protection
Isolated Recovery Environment
Identity and Access Resilience
Testing, Exercising, and Continuous Improvement
Regulatory Alignment and Governance
When to Use
This pattern is essential for any organisation where a prolonged technology outage would cause significant financial, reputational, or safety harm. It is particularly critical for:
- Financial services organisations subject to DORA or other operational resilience regulation.
- Healthcare organisations where system availability affects patient safety.
- Critical national infrastructure operators.
- Organisations that have experienced (or closely watched peers experience) destructive cyber attacks.
- Any organisation holding high-value data that is likely to be targeted by ransomware.
- Organisations with complex technology estates where recovery sequences and dependencies are not well understood.
When NOT to Use
Very small organisations with simple technology environments and a high tolerance for downtime may find the full resilience architecture disproportionate, though basic backup and recovery capability is always appropriate. Organisations in the early startup phase, where the entire technology estate can be rebuilt from code repositories in hours, may prioritise development speed over resilience architecture. Environments that are entirely ephemeral and stateless (pure functions with no persistent data) have different resilience characteristics and may not need traditional backup and recovery.
Typical Challenges
Cost is the primary barrier: resilient architecture requires investment in redundancy, backup infrastructure, recovery environments, and testing programmes that deliver no visible benefit until an incident occurs. Recovery testing can disrupt production operations and requires dedicated time from teams already under pressure. Organisations often discover circular dependencies during recovery exercises: the backup system depends on DNS, which depends on Active Directory, which is itself the system being recovered. Cloud provider resilience is often assumed but not verified: multi-region deployments may share control-plane dependencies that create correlated failures. Immutable backup storage adds cost and complexity compared with traditional backup approaches. The human factor is critical and often underestimated: recovering under the stress of an active incident, with exhausted teams and executive pressure, is fundamentally different from executing a planned test. Shadow IT and undocumented systems create recovery gaps, because you cannot recover what you do not know exists. Third-party dependencies (SaaS providers, cloud platforms) may have their own resilience limitations that constrain your recovery timeline. Keeping recovery environments current and tested requires ongoing operational investment that competes with feature development.
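The circular-dependency trap described above can be caught before an incident by modelling recovery order as a dependency graph and attempting a topological sort. The sketch below uses Python's standard-library `graphlib` (3.9+). The system names, the `DEPENDENCIES` map, and `RecoveryCycleError` are hypothetical, standing in for whatever a real dependency inventory or CMDB would provide.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency inventory: each system maps to the systems that must
# already be running before it can be restored.
DEPENDENCIES = {
    "backup_server": {"dns"},
    "dns": {"active_directory"},
    "active_directory": {"backup_server"},  # AD itself is restored from backup
    "database": {"backup_server"},
    "erp": {"active_directory", "database"},
}


class RecoveryCycleError(Exception):
    """Raised when no valid restore order exists because of a dependency cycle."""

    def __init__(self, cycle):
        self.cycle = cycle
        super().__init__("circular recovery dependency: " + " -> ".join(cycle))


def recovery_order(deps):
    """Return a restore sequence in which every system follows its dependencies."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # graphlib reports the cycle as args[1], with the first node repeated at the end.
        raise RecoveryCycleError(exc.args[1]) from exc
```

Running `recovery_order(DEPENDENCIES)` raises `RecoveryCycleError` naming the backup server, DNS, and Active Directory loop. Giving DNS an independent bootstrap path, modelled here as `recovery_order(dict(DEPENDENCIES, dns=set()))`, breaks the cycle and yields a valid sequence. `graphlib` does not fix the relative order of independent systems, so only the dependency constraints themselves should be relied on.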
Threat Resistance
Cyber Resilience addresses the threats that preventive controls alone cannot fully mitigate. Ransomware that encrypts production data is neutralised by immutable backups in isolated storage that the attacker cannot reach or corrupt (CP-09, CP-06, SC-28). Destructive wiper malware that destroys systems is countered by the ability to reconstitute from immutable images in an isolated recovery environment (CP-07, CP-10). Supply chain attacks that compromise trusted software are contained by verified software repositories and integrity checking during recovery (SI-07, SR-10). Advanced persistent threats that compromise the domain controller are addressed by independent identity recovery capability and break-glass access (IA-02, AC-02, CP-02). Insider threats that sabotage backup infrastructure are mitigated by separation of duties, immutable storage, and multi-party authorisation for backup administration (AC-05, CP-09, AC-02). Cloud provider outages are addressed by multi-region architecture and provider-independent recovery capability (SC-36, CP-07). Coordinated attacks targeting both production and backup simultaneously are countered by air-gapped recovery environments with independent credentials (SC-07, AC-02). The fundamental architectural principle is that recovery infrastructure must exist in a separate trust domain that does not share credentials, network paths, or administrative access with the production environment.
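The separate-trust-domain principle can be turned into a simple automated check: record what each environment trusts and flag any overlap. This is a minimal sketch that assumes a hypothetical `Environment` inventory with three dimensions mirroring the principle above (credentials, network paths, administrative access). Populating the inventory from real identity and network tooling is out of scope, and matching is by exact label only.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical inventory of what one trust domain relies on."""
    name: str
    credentials: set[str] = field(default_factory=set)    # identity stores, service accounts
    network_paths: set[str] = field(default_factory=set)  # labelled links, peerings, VPNs
    admins: set[str] = field(default_factory=set)         # people or groups with admin rights


def shared_trust(prod: Environment, recovery: Environment) -> dict[str, set[str]]:
    """Return each dimension on which the recovery domain overlaps production.

    An empty result means no shared credentials, network paths, or admins were
    found (by exact label match) in the supplied inventories.
    """
    overlaps = {
        "credentials": prod.credentials & recovery.credentials,
        "network_paths": prod.network_paths & recovery.network_paths,
        "admins": prod.admins & recovery.admins,
    }
    # Keep only dimensions where something is actually shared.
    return {dim: items for dim, items in overlaps.items() if items}
```

For example, an isolated recovery environment with its own identity provider and break-glass admins, but still reachable over a production network link, would be reported as `{"network_paths": {...}}`. That is exactly the kind of residual path an attacker moving laterally could use to reach recovery infrastructure.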
Assumptions
The organisation has identified its critical business services and their technology dependencies. Executive sponsorship exists for resilience investment (resilience costs money upfront and pays off during incidents). IT operations teams have the skills to implement and maintain backup and recovery infrastructure. Network architecture supports the creation of isolated recovery environments. The organisation can tolerate the operational overhead of maintaining and testing recovery capabilities. Regulatory requirements for operational resilience are understood and prioritised.
Developing Areas
- Immutable backup testing automation is improving, but most organisations still rely on manual quarterly restoration exercises. The gap between backup completion (automated, measured) and backup recoverability (rarely tested, poorly measured) means that many organisations discover corruption or incompatibility only during real incidents. Emerging solutions such as automated daily restore-and-validate pipelines are available from major vendors, but adoption remains below 20% even in financial services.
- Cross-border disaster recovery coordination is increasingly complex as data residency requirements multiply. GDPR, DORA, and equivalent regulations create conflicting requirements where backup data must be geographically distributed for resilience but confined to specific jurisdictions for compliance. Organisations with operations spanning EU, UK, US, and APAC face recovery architectures constrained by legal geography rather than optimal network topology.
- DORA compliance measurement lacks standardised metrics. The regulation mandates operational resilience testing but does not define pass/fail criteria for recovery time objectives, backup integrity verification, or third-party resilience assessment. Regulatory supervisors are developing expectations through supervisory dialogue rather than published benchmarks, creating uncertainty for organisations attempting to demonstrate compliance.
- Chaos engineering applied to security resilience -- intentionally injecting security failures to test detection and recovery -- is gaining traction but remains controversial. Netflix-style failure injection for availability is well-understood, but deliberately simulating credential compromise, backup corruption, or identity infrastructure failure in production carries risks that most organisations are not willing to accept without mature safety mechanisms.
- Ransomware recovery time objectives are being tested against increasingly sophisticated adversary tactics. Modern ransomware groups specifically target backup infrastructure, and average dwell times of 10-14 days before encryption give them time for thorough reconnaissance of recovery capabilities. The emerging response -- isolated recovery environments with independent identity and network infrastructure -- is architecturally sound but operationally expensive and rarely tested at the frequency needed to maintain confidence.
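The gap between backup completion and backup recoverability, raised under immutable backup testing automation, comes down to checking that a test restore actually reproduces the data. At the core of a restore-and-validate pipeline is a comparison between the restored files and a manifest of hashes captured when the backup was taken. This is a minimal sketch that assumes a hypothetical manifest format (relative path mapped to a SHA-256 hex digest). Commercial products use their own catalogue formats, and a real pipeline would also check application-level consistency, not just file bytes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large restores do not exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def validate_restore(restore_root: Path, manifest: dict[str, str]) -> list[str]:
    """Compare a test restore against hashes recorded at backup time.

    Returns human-readable problems in path order; an empty list means every
    file in the manifest was restored and matches byte-for-byte.
    """
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        restored = restore_root / rel_path
        if not restored.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(restored) != expected:
            problems.append(f"hash mismatch: {rel_path}")
    return problems
```

Scheduling this against a fresh restore every day, and alerting on a non-empty result, turns recoverability into a measured quantity rather than an assumption.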
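The tension in cross-border disaster recovery, where backups must be spread for resilience yet confined for compliance, can be expressed as a placement policy check on backup replicas. The region names and the `REGION_JURISDICTION` mapping below are hypothetical and do not correspond to any particular cloud provider. The minimum-region rule is an assumed resilience requirement, not a regulatory one.

```python
# Hypothetical region -> jurisdiction mapping; names are illustrative only.
REGION_JURISDICTION = {
    "eu-west": "EU",
    "eu-central": "EU",
    "uk-south": "UK",
    "uk-west": "UK",
    "us-east": "US",
}


def check_backup_placement(replica_regions, allowed_jurisdictions, min_regions=2):
    """Return a list of policy violations for one dataset's backup replicas.

    Two rules are checked: every replica must sit in a permitted jurisdiction
    (residency), and there must be at least `min_regions` distinct compliant
    regions (resilience).
    """
    issues = []
    compliant = set()
    for region in replica_regions:
        jurisdiction = REGION_JURISDICTION.get(region)
        if jurisdiction is None:
            issues.append(f"unknown region: {region}")
        elif jurisdiction not in allowed_jurisdictions:
            issues.append(f"{region} is outside permitted jurisdictions ({jurisdiction})")
        else:
            compliant.add(region)
    if len(compliant) < min_regions:
        issues.append(f"only {len(compliant)} compliant region(s); {min_regions} required")
    return issues
```

A dataset permitted in both the UK and the EU can satisfy both rules with one replica in each. An EU-only dataset replicated to `us-east` fails residency and is also left with a single compliant region, which illustrates how legal geography, rather than network topology, ends up constraining the design.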
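Dwell time matters because any restore point taken after the intruder arrived may already contain their persistence mechanisms. The following is a minimal sketch of the resulting arithmetic, using hypothetical function names and an assumed two-day safety margin. It chooses the newest restore point that predates the estimated intrusion, and checks whether retention is long enough for such a point to exist at all.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def last_clean_restore_point(
    restore_points: Iterable[datetime],
    estimated_intrusion: datetime,
    safety_margin: timedelta = timedelta(days=2),
) -> Optional[datetime]:
    """Return the newest restore point older than the intrusion estimate minus
    a safety margin, or None if no retained point is old enough."""
    cutoff = estimated_intrusion - safety_margin
    candidates = [p for p in restore_points if p < cutoff]
    return max(candidates, default=None)


def retention_covers_dwell(
    retention: timedelta,
    dwell_time: timedelta,
    detection_lag: timedelta,
    safety_margin: timedelta = timedelta(0),
) -> bool:
    """True if backups are kept long enough that one predating the intrusion
    (plus margin) still exists when the attack is finally detected."""
    return retention > dwell_time + detection_lag + safety_margin
```

As an example, take a 14-day dwell time (the upper end of the range cited above) and a one-day detection lag. With a 7-day retention window, every surviving backup postdates the intrusion, so `retention_covers_dwell(timedelta(days=7), timedelta(days=14), timedelta(days=1))` is `False`.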
Related Patterns
Patterns that operate within or alongside this one.