As more organizations use Software as a Service (SaaS) for mission-critical applications, the complexity of managing data loss incidents within these environments grows. Estimates of the hourly cost of downtime range from $336k to millions of dollars, depending on the company's size. Rapid restoration of SaaS data is key to minimizing disruption and cost, but many organizations are ill-prepared, in part because of the ‘InfoSec↔SaaS divide’.
Reliable and complete data is essential for AI applications to produce accurate results. This article helps bridge the divide by showing how SaaS fundamentally changes Business Continuity and Disaster Recovery (BCDR) planning and data repair, particularly in the era of agentic AI.
Layered Responsibility for SaaS Recovery
Traditionally, Information Security (InfoSec) teams have focused on the Confidentiality, Integrity and Availability (CIA) triad of their IT infrastructure and systems. To fulfill the data protection and recovery functions defined in the NIST Cybersecurity Framework (CSF), system administrators with direct access to IT systems and infrastructure would be delegated responsibility to back up and restore data.
The traditional approach could involve restoring a file server or rolling back database changes to a prior state, all within well-defined Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
However, SaaS fundamentally shifts BCDR by dividing recovery into two distinct components that require separate planning approaches.
- Infrastructure Resilience: The SaaS provider owns infrastructure-level redundancy and backups to maintain operational continuity during regional outages or major disruptions. InfoSec and SaaS teams are no longer responsible for infrastructure resilience. Instead, they are responsible for backing up and recovering the data and files stored in their SaaS instances. One reason this is significant is that the RTO and RPO for SaaS data become dependent on the vendor's capabilities, which are not within the control of the customer.
- Data Stewardship: A common misconception, even among mature InfoSec teams, is the assumption that SaaS data protection is fully managed by the vendor. This “set it and forget it” mindset, while understandable given the cloud promise, overlooks the need for organizations to back up their SaaS data. Common causes of data loss and corruption are human errors within the customer’s SaaS instance, including accidental deletion, integration issues, and migration mishaps, all of which fall under the customer’s responsibility. Due to the InfoSec↔SaaS divide, organizations that rely solely on the SaaS provider may not be prepared to recover quickly from such problems.
We have encountered highly regulated entities whose BCDR exercises for their SaaS applications only address issues impacting the infrastructure layer, potentially missing the more frequent occurrences of data loss within their own area of responsibility. Without SaaS data backup and recovery capabilities that can meet the data-layer RTO and RPO of your business, you risk substantial disruption and downtime.
SaaS Rolls Forward, Not Backward
Another fundamental shift in SaaS data recovery is that there is no traditional rollback. To understand this, compare it to the traditional approach of taking a server or database offline for the duration it takes to restore. Under these circumstances, the Recovery Time Actual (RTA) refers to how long it took to roll a system back to a known good state by undoing errors, allowing the system to function again, while also discarding valid transactions that coincided with the incident. SaaS differs in that it allows for precise repair of impacted data while the system is operational and users continue to use it.
Consider the release of a new application version from a development sandbox into production. Best practice is to perform a backup of the SaaS data in the production environment before and after deploying the new release. Performing a comparative analysis between these backups can show unintended changes to data or metadata. In such cases, it is no longer possible to roll back in full; it is necessary to fix the problems precisely and roll forward.
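The before-and-after comparison described above can be sketched in a few lines. This is a minimal illustration, not a description of any particular backup product: it assumes each backup has been exported as a mapping from record ID to field values, and the record IDs, field names, and the `diff_snapshots` helper are all hypothetical.

```python
from typing import Any

# Assumed shape of an exported backup: record ID -> {field name: value}.
Snapshot = dict[str, dict[str, Any]]


def diff_snapshots(before: Snapshot, after: Snapshot) -> dict[str, Any]:
    """Report records added, deleted, or modified between two backups."""
    added = sorted(after.keys() - before.keys())
    deleted = sorted(before.keys() - after.keys())
    modified: dict[str, dict[str, tuple[Any, Any]]] = {}
    for record_id in sorted(before.keys() & after.keys()):
        old, new = before[record_id], after[record_id]
        # Collect (old value, new value) for every field that differs.
        changed_fields = {
            field: (old.get(field), new.get(field))
            for field in sorted(old.keys() | new.keys())
            if old.get(field) != new.get(field)
        }
        if changed_fields:
            modified[record_id] = changed_fields
    return {"added": added, "deleted": deleted, "modified": modified}


# Hypothetical data: backups taken before and after a release.
pre_release = {
    "acct-1": {"name": "Acme", "tier": "gold"},
    "acct-2": {"name": "Globex", "tier": "silver"},
}
post_release = {
    "acct-1": {"name": "Acme", "tier": None},  # field blanked by the release
    "acct-3": {"name": "Initech", "tier": "bronze"},
}

report = diff_snapshots(pre_release, post_release)
print(report)
```

The output of such a comparison is the list of records to fix forward: only the deleted and modified records need attention, while everything else in production stays untouched.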
Consider another scenario in which SaaS data serves as an authoritative source for other business processes. When bad SaaS data has been introduced into the process, you need to consider the downstream impact and fix it forward. There is no rolling back.
The ability to rapidly roll forward ensures that data repair processes can keep pace with the increasing speed and scale of AI-driven applications.
SaaS Data Availability and Precision Repair
To maintain the availability and integrity of SaaS data, there are three steps that InfoSec and SaaS teams can take together:
- Determine the vendor’s actual RTO and RPO capabilities and consider adding data redundancy for high-availability systems
- Ensure that data backups are readily accessible and searchable, even when the SaaS platform is unavailable, for example because of an account lockout or regional outage. By maintaining data availability even during an incident, you can continue critical business functions while SaaS data is being repaired
- Use a backup and recovery solution that supports surgical repair of SaaS data and practice this operation periodically to ensure you can meet the data-layer RTO and RPO of your business
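One way to make the periodic practice in the third step measurable is to record three timestamps during each drill and compare them against the data-layer targets. The sketch below is illustrative only: the `evaluate_drill` function, its parameter names, and the sample timestamps and targets are assumptions, not values from any standard.

```python
from datetime import datetime, timedelta


def evaluate_drill(
    last_good_backup: datetime,
    incident_at: datetime,
    repaired_at: datetime,
    rpo_target: timedelta,
    rto_target: timedelta,
) -> dict[str, object]:
    """Compare what a recovery drill achieved against data-layer targets."""
    # Changes made after the last good backup cannot be recovered from it.
    data_loss_window = incident_at - last_good_backup
    # Recovery Time Actual: how long the repair took in practice.
    recovery_time_actual = repaired_at - incident_at
    return {
        "data_loss_window": data_loss_window,
        "recovery_time_actual": recovery_time_actual,
        "meets_rpo": data_loss_window <= rpo_target,
        "meets_rto": recovery_time_actual <= rto_target,
    }


# Hypothetical drill: the most recent backup is two and a half days old.
result = evaluate_drill(
    last_good_backup=datetime(2024, 1, 1, 0, 0),
    incident_at=datetime(2024, 1, 3, 12, 0),
    repaired_at=datetime(2024, 1, 3, 15, 0),
    rpo_target=timedelta(hours=1),
    rto_target=timedelta(hours=4),
)
print(result)
```

In this hypothetical drill the repair finishes within the RTO target, but the age of the backup means the RPO target is missed, which is the kind of gap a joint exercise is meant to surface.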
For the third step to be successful, InfoSec and SaaS teams must practice SaaS precision repair operations together. The ability to put data back quickly into good condition is harder than many organizations may realize. Repairing data and metadata in a SaaS system is constrained by the available APIs, functional or operational limits, and the referential integrity of the data and relationships between objects that must be maintained.
InfoSec and SaaS teams must combine their knowledge and experience to ensure that backups contain all necessary data, as well as metadata, which provides the necessary context, and can be restored reliably. SaaS administrators can prevent users from logging in, disable automations, block upstream data from being sent, or restrict data from being sent to downstream systems as needed. InfoSec needs to know that defined policies can be met and can ask SaaS administrators whether the data-layer RTO and RPO can be accomplished in practice.
To avoid added time, resources, and downtime, ensure that you can repair only the data you need, rather than everything. Remember that InfoSec and SaaS teams are responsible for the data within their SaaS instance, not the underlying infrastructure. Restoring everything is the traditional BCDR approach, which prepares for a catastrophic infrastructure failure that is the SaaS provider’s responsibility to manage. Restoring everything for what is typically a precise problem, such as accidental deletions or integration errors, means spending significant time restoring unaffected data, and the lengthy execution leaves the system in a compromised state for longer than necessary. The ability to promptly detect and fix problems, large or small, helps maintain reliable and error-free data for human and AI consumption.
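To illustrate repairing only what is needed while preserving referential integrity, the sketch below orders a set of affected records so that each parent is restored before its children. It is a simplified assumption of how such planning might work: the record IDs, the single-parent `parent_of` mapping, and the `plan_repair` helper are hypothetical, and real SaaS objects often have several relationships plus API limits to account for. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def plan_repair(affected: set[str], parent_of: dict[str, str]) -> list[str]:
    """Order affected record IDs so each parent precedes its children."""
    sorter: TopologicalSorter[str] = TopologicalSorter()
    for record_id in sorted(affected):
        parent = parent_of.get(record_id)
        if parent in affected:
            # The parent is also being restored, so it must come first.
            sorter.add(record_id, parent)
        else:
            # The parent is untouched (or absent), so no ordering is needed.
            sorter.add(record_id)
    return list(sorter.static_order())


# Hypothetical incident: one account, its contact, and a case were deleted.
affected = {"case-42", "contact-7", "account-1"}
parent_of = {
    "contact-7": "account-1",
    "case-42": "contact-7",
    "contact-8": "account-2",  # unaffected, so it is never restored
}

order = plan_repair(affected, parent_of)
print(order)
```

Only the three affected records appear in the plan; the unaffected contact is left alone, which is the difference between a precise repair and restoring everything.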
Practitioner’s Tip: To address the NIST CSF Detect function, InfoSec and SaaS teams can collaborate to configure alerts on SaaS data anomalies, including deletion and corruption. Such early warnings enable faster recovery, which can prevent weeks of downtime and a six- to seven-figure business interruption. Effective alerting concentrates on the highest risks, such as data that is sensitive (e.g., PII, PHI), financially relevant (e.g., SOX compliance), heavily used (e.g., customer objects), or not meant to be changed or deleted (e.g., metadata components).
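As a rough sketch of the alerting idea in the tip, the example below counts deletions per object type within a review window and flags any type that exceeds its threshold, with a threshold of zero for components that should never be deleted. The event format, object names, thresholds, and the `deletion_alerts` helper are all assumptions made for illustration; a real deployment would read events from whatever audit or change feed the SaaS platform and backup tooling provide.

```python
from collections import Counter


def deletion_alerts(
    events: list[dict[str, str]],
    thresholds: dict[str, int],
    default_threshold: int = 100,
) -> list[str]:
    """Return object types whose deletion count exceeds their threshold."""
    deletes = Counter(e["object"] for e in events if e["action"] == "delete")
    return sorted(
        obj
        for obj, count in deletes.items()
        if count > thresholds.get(obj, default_threshold)
    )


# Hypothetical change events collected during one review window.
events = (
    [{"object": "Contact", "action": "delete"}] * 60
    + [{"object": "Invoice", "action": "delete"}] * 2
    + [{"object": "ValidationRule", "action": "delete"}]
    + [{"object": "Contact", "action": "update"}] * 500
)

# Stricter limits for higher-risk data; zero means any deletion alerts.
thresholds = {"Contact": 50, "Invoice": 10, "ValidationRule": 0}

alerts = deletion_alerts(events, thresholds)
print(alerts)
```

Fixed thresholds are the simplest possible design choice here. Teams may prefer baselines derived from historical activity, but the principle of concentrating on the highest-risk objects is the same.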
SaaS Data Integrity and Zero Loss
InfoSec and SaaS administrators share an interest in maintaining the availability and integrity of data, but approach it from different perspectives. SaaS administrators think in terms of keeping the system running and maintaining data quality, focusing on the data layer and often relying on weekly or ad-hoc backups. InfoSec thinks in terms of detection, response and recovery processes to prevent harm to the business, and aims for little to no data loss and downtime. InfoSec and SaaS teams need to work together on an approach that addresses both needs, effectively meeting in between.
As organizations modernize and rely more heavily on SaaS data, particularly for agentic AI applications, any loss, corruption, or error can disrupt critical systems and services. Traditional approaches to maintaining data integrity relied on backups that provide a snapshot in time, which may miss key changes between snapshots. SaaS solutions utilizing change data capture capabilities can continuously preserve and inspect every modification, enabling proactive detection and correction.
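The gap between snapshots can be shown with a small example. In the sketch below, a field is changed and then changed back between two backups, so the snapshots are identical and the incident is invisible, while a change log still records both modifications and allows the state at any point to be reconstructed. The change-log format, sequence numbers, and the `replay` helper are hypothetical and not tied to any specific change data capture product.

```python
import copy
from typing import Any

Snapshot = dict[str, dict[str, Any]]


def replay(base: Snapshot, change_log: list[dict[str, Any]], up_to_seq: int) -> Snapshot:
    """Rebuild the data as of a given change sequence number."""
    state = copy.deepcopy(base)  # never mutate the stored backup
    for change in sorted(change_log, key=lambda c: c["seq"]):
        if change["seq"] > up_to_seq:
            break
        state.setdefault(change["record"], {})[change["field"]] = change["new"]
    return state


# Hypothetical data: the snapshot taken before the changes.
snapshot_before: Snapshot = {"acct-1": {"owner": "alice"}}

# Every modification captured between the two snapshots.
change_log = [
    {"seq": 1, "record": "acct-1", "field": "owner", "old": "alice", "new": "mallory"},
    {"seq": 2, "record": "acct-1", "field": "owner", "old": "mallory", "new": "alice"},
]

mid_incident = replay(snapshot_before, change_log, up_to_seq=1)
snapshot_after = replay(snapshot_before, change_log, up_to_seq=2)

print(mid_incident)                       # state while the bad value was live
print(snapshot_after == snapshot_before)  # the two snapshots look identical
```

Comparing only the two snapshots would report no change, even though any downstream process that read the record in between would have seen the wrong owner.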
Continuous backup of mission-critical data, combined with capabilities to repair data precisely, ensures that the integrity of SaaS data can be maintained after an incident. Such trusted and resilient data is essential for successful applications of AI, which require accessible, reliable, relevant, and error-free data. For certain high-value data, consider continuous data capture to achieve a data-layer RPO of zero.
United We Stand, Divided We Fall
The layered responsibility for SaaS recovery requires vendors, InfoSec, and application owners to combine forces to prevent SaaS data loss and disruption. By understanding the distinction between infrastructure-level RTO and RPO and data-layer RTO and RPO, organizations can update BCDR plans to ensure the availability and integrity of SaaS data. By understanding that SaaS rolls forward, not backward, organizations can move beyond outdated rollback strategies and embrace more precise repair capabilities.
This requires vendors, InfoSec, and SaaS teams to collaborate on determining infrastructure RTO/RPO capabilities, ensuring accessible and searchable data backups, and implementing backup and recovery solutions that support precision repair. Furthermore, maintaining SaaS data integrity and aiming for zero loss through continuous data capture and inspection empowers organizations to proactively detect and correct issues.
This collaborative approach, combining the perspectives of InfoSec (CIA processes) and SaaS administrators (system uptime and data quality), strengthens overall SaaS resilience. By working together, teams can establish effective alerting mechanisms, hone their technical skills, and adhere to robust data recovery processes, ultimately reducing downtime, minimizing costs, and preventing incidents from escalating. This unified strategy is essential for safeguarding critical data and ensuring uninterrupted business operations in the modern SaaS environment, enabling successful agentic AI applications.


