Hope is not a (DR) strategy
Stephen Kelly, Cirata, CEO
When we think about business disasters, we often picture dramatic events like fires or floods. But in our digital-first world, the most likely disasters are far quieter and often self-inflicted: a failed data system, a critical system outage, Domain Name System (DNS) errors, or a ransomware attack that brings operations to a standstill. We have seen this firsthand recently with high-profile outages at AWS and Microsoft, and even more recently with the still-fresh outage at Cloudflare.
Too many organizations still rely on a disaster recovery (DR) plan that is, frankly, based on hope. They have backups, most likely outdated, but have they ever tested a full restore? They have a plan in a binder, but does it account for the reality of today's complex cloud environments? And have they rehearsed all reasonable ‘fire drills’ and recovery scenarios?
Relying on hope isn't just risky; it's a direct threat to business continuity. In technology, we plan for failure in our code and our architecture, so why do we so often fail to apply that same rigor to our data? This is doubly important in regulated industries or with precious customer data.
The real cost of downtime
The consequences of a failed DR plan go far beyond immediate financial loss.
- Reputational damage: Trust is hard-won and lost in seconds. An extended outage or data loss event can permanently damage your relationship with customers. An estimated 61% of customers lose brand trust when directly affected by an outage.
- Operational paralysis: Your teams can't develop, sell, or support anything if their core systems are down. Every minute of downtime grinds productivity to a halt and costs you revenue.
- Competitive disadvantage: While you’re busy recovering, your competitors are busy innovating, and courting and serving your customers.
Recent industry analysis shows that the average cost of downtime is climbing into the thousands of dollars per minute for many enterprises. Yet, a surprising number of DR tests still fail. This isn't a hypothetical problem; it’s happening right now, and the gap between the plan and the reality can be devastating.
Building a resilient, recovery-ready future
A modern disaster recovery strategy isn't a document you dust off once a year. It's a living, breathing capability that must be automated, tested, aligned with your business objectives, and ready to change as the threats evolve.
- Automate your recovery: Manual recovery processes are slow, error-prone, and simply can't keep pace with the scale of modern data. True resilience comes from automating the process of moving data to your recovery site and ensuring it is ready for use. This significantly reduces recovery time, which puts a far tighter recovery time objective (RTO) within reach.
- Validate everything: A backup is useless if it's corrupted. Your DR strategy must include automated, continuous validation to ensure the data you've replicated is identical to the source. This ensures your recovery point objective (RPO) is not just a target, but a guarantee of data integrity. The key is to bring applications and systems back in seconds and minutes, not weeks and months.
- Test like you fight: The only way to know if your plan works is to test it regularly and realistically. This doesn't have to mean a full-scale, disruptive failover every time. Modern tools allow you to conduct non-disruptive tests that simulate a real event without impacting production workloads, turning DR from a high-stakes gamble into a predictable science.
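To make the "validate everything" point concrete, here is a minimal sketch of one way to check that a replica matches its source: hash every file on both sides and report any path that is missing, extra, or different. It assumes both copies are reachable as ordinary directories, and the function names (`file_digest`, `tree_digests`, `validate_replica`) are hypothetical, chosen for illustration rather than taken from any particular product. A real deployment at scale would more likely rely on checksums built into the replication tooling.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def validate_replica(source: Path, replica: Path) -> list[str]:
    """Return relative paths that are missing, extra, or different on the replica."""
    src = tree_digests(source)
    dst = tree_digests(replica)
    # A path is a problem if either side lacks it or the digests disagree.
    return sorted(
        name for name in src.keys() | dst.keys() if src.get(name) != dst.get(name)
    )
```

An empty list means the two trees are byte-for-byte identical. Run on a schedule, a check like this turns "we think the backup is good" into something you can measure.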
Take, for example, one of our largest clients: a top US bank that depends on the 24/7 availability of its data. The bank runs an on-premises Cloudera CDP cluster for one of its analytics platforms and a secondary CDP environment for disaster recovery. It has established a 15-minute service level agreement (SLA) for both the RTO and the RPO of the DR environment. Our solution supports complete and continuous replication of data sets at any scale. With zero disruption or impact to the existing system, we migrated the initial data sets in a single pass through the source storage, eliminating the overhead of repeated scans. Ongoing changes are replicated continuously as they occur, so they reach the DR environment within the 15-minute window.
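A 15-minute RPO like the one in this example is easy to monitor once you know when the most recent change reached the DR site. The sketch below is illustrative only: the names `RPO_TARGET`, `replication_lag`, and `within_rpo` are hypothetical, and how you obtain the timestamp of the last replicated change depends entirely on your replication tooling.

```python
from datetime import datetime, timedelta, timezone

# The 15-minute target from the SLA described above.
RPO_TARGET = timedelta(minutes=15)


def replication_lag(last_replicated_change: datetime, now: datetime) -> timedelta:
    """Time elapsed since the newest change known to be safe on the DR site."""
    return now - last_replicated_change


def within_rpo(
    last_replicated_change: datetime,
    now: datetime,
    target: timedelta = RPO_TARGET,
) -> bool:
    """True if a failure at `now` would lose no more than `target` worth of changes."""
    return replication_lag(last_replicated_change, now) <= target


# Usage: compare the last replicated change against the current time.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(within_rpo(now - timedelta(minutes=4), now))   # lag of 4 minutes
print(within_rpo(now - timedelta(minutes=40), now))  # lag of 40 minutes
```

Alerting whenever the second function returns False gives you early warning that the SLA is at risk, long before a real disaster puts it to the test.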
Confidence, not anxiety
Disaster recovery should be a source of confidence, not anxiety. It's about taking control and ensuring that when, not if, a disruption occurs, your business is ready.
When was the last time your organization tested its disaster recovery plan? What was the most surprising lesson you learned?
Having been caught out a decade ago by a serious cyber-attack as CEO of a FTSE 100 company, I never want to see executives suffer the pain, anxiety and stress that I lived through. What we advocate is common sense: proactive measures backed by a robust, resilient strategy for your data and core systems.
Best, Stephen Kelly, CEO, Cirata


