Demonstrating Scientific Data Integrity and Security in the Cloud

Companies in the life sciences space generate enormous volumes of data; that much is self-evident. How best to store it, however, may be far less straightforward for the organizations themselves.

While more and more organizations are shifting from on-premises to cloud-based storage systems (and will enjoy a competitive advantage by doing so), some hesitate, fearing an inability to interpret or comply with regulations, or that their data will not be secure. I'd argue that this resistance, while understandable, is largely unfounded, and that moving to the cloud has many advantages: it generally meets and exceeds regulatory requirements for data integrity and security, while also improving ease of access. In the time of Covid-19, these benefits have become all the more evident.

In my role at IDBS, a leading provider of advanced software for research and development organizations, a common concern I hear is the idea of the cloud itself: it conjures up visions of a diffuse, non-localized storage system, which may seem at first glance less secure than a computer terminal in one's own facility. Of course, this is not how it works; in reality, cloud storage is highly configurable, controlled, and locatable.

In fact, a company can lock down the geopolitical region, country, or state in which its data will reside. And since regulations are underpinned by the governing laws of that region, understanding those laws is essential.
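To make this concrete: with AWS (used here purely as one illustrative IaaS provider), pinning data to a chosen region is a one-line configuration decision. This is a hedged sketch, not a compliance recipe; the bucket name is hypothetical, and eu-west-2 (London) stands in for whichever jurisdiction your regulations require.

```shell
# Create an S3 bucket pinned to the London (eu-west-2) region.
# Objects written to this bucket reside in that region unless
# explicitly replicated elsewhere. Bucket name is hypothetical.
aws s3api create-bucket \
  --bucket example-gxp-study-data \
  --region eu-west-2 \
  --create-bucket-configuration LocationConstraint=eu-west-2

# Verify the region in which the bucket's data resides,
# which is the kind of evidence an auditor may ask for.
aws s3api get-bucket-location --bucket example-gxp-study-data
```

Equivalent region or location constraints exist on the other major IaaS platforms; the point is that data residency is an explicit, verifiable setting rather than something left to chance.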

However, precisely for security reasons, the street address of a high-security cloud data center is often not disclosed. Many large Infrastructure as a Service (IaaS) providers, such as Amazon Web Services (AWS), have decided not to publish the exact street addresses of their data centers in order to reduce security risks, and this has become best practice. When data is stored across multiple, highly confidential locations, it is actually more secure and less likely to be subject to integrity issues.

Since being able to “see” the data is a regulatory requirement, the cloud actually makes this much easier, as it allows access from multiple computer terminals and systems. The challenge for life sciences organizations is demonstrating to auditors that suitable due diligence on the infrastructure has been carried out without being able to visit the location, or perhaps even without knowing exactly where the data is stored.

Interpreting the regulations that govern data storage is clearly essential, but can be challenging, as some are open-ended or haven’t kept pace with technology. For instance, the Organisation for Economic Co-operation and Development (OECD) regulations require a study plan to specify: “The location(s) where the study plan, samples of test and reference items, specimens, raw data and the final report are to be stored.”

However, there is no definition of “location.” Once upon a time it would have meant a lab book or a specific computer terminal, but in the new world it could mean a facility site name, a geographical area, a URL, a cloud database, and so on. This wording can make using the latest technology challenging, but not impossible. The US Food and Drug Administration (FDA) draft GCP guidance goes further than others in allowing SaaS applications for clinical investigations, provided certain conditions are met.

A potential solution for showing that due diligence checks have been carried out is to rely on the compliance documentation of the supplier. Whilst certifications such as ISO 9001 and ISO 27001 do provide assurance, transparency is limited: a successfully audited supplier may not be able to share the full details of its audit, leaving the pharmaceutical organization unsure whether it can demonstrate that risk has been sufficiently mitigated.

I would argue that SOC 2 is a much more transparent and user-friendly framework, as it lists all of the controls and any known exceptions to them. For a SaaS organization to show it is compliant with SOC 2, it must demonstrate not only that the relevant controls are in place, but that they have operated effectively over a six- to 12-month period. This type of prolonged, multipoint audit provides a level of visibility and insight far superior to a one- or two-day on-site audit.

Veiga and Calnan (2018) offer a good example of a pharma company outsourcing its R&D to several sites of a clinical research organization (CRO), which in turn outsourced data collection to a hospital that gathered the trial participants’ data. The CRO also contracted a third-party IT company (a SaaS provider) for data management. The example demonstrates that the integrity of the data can be maintained in highly regulated situations, satisfying regulatory requirements while preserving the best-practice attributes of cloud security.

Covid-19 has been a wake-up call for some organizations, which have been forced to grapple with their past hesitance to make the shift. It has also highlighted the strong position of organizations that had already transferred to a cloud-based system and were fully capable of working from home during lockdown. They have also been able to avoid the difficult decision of whether to send employees in and put them at risk, or to keep them safe but reduce the level of support or service.

When you consider the security and flexibility (as well as benefits not discussed here, including cost-savings and the ability to integrate with new technologies), it’s hard to argue against migrating to the cloud.

From a market pressure point of view, customers who do the research and learn the regulatory considerations gain a competitive advantage, while those who don’t will miss out. The market is moving into the cloud; the question is how fast, and who will be left behind.