
HIPAA compliance with Snowflake


At Perficient, our Data Solutions team has worked closely with our Healthcare division to implement Snowflake for HIPAA and HITECH compliance. Snowflake offers healthcare organizations a secure data warehouse environment with many HIPAA compliance features, and Perficient’s implementation team includes Snowflake and health industry subject matter experts. In this post, we’ll look at Snowflake’s benefits for healthcare providers working to improve their HIPAA and HITECH compliance efforts, along with some specific strategies you can use.

Snowflake Edition and Cloud Provider

Snowflake is a cloud-based solution delivered as Software-as-a-Service (SaaS). This means that all three layers of Snowflake’s architecture (storage, compute, and cloud services) are deployed and maintained on a specific cloud platform. The level of attainable security depends on the Snowflake edition and the cloud provider region. The Snowflake edition your organization chooses determines the unit costs for credits and data storage as well as the level of security. The Business Critical Edition provides support for PHI data in compliance with HIPAA on any cloud provider in any region. HITRUST CSF certification for Amazon Web Services is available in seven regions across the US, EU, Canada, and Asia Pacific. Azure offers one region each in the US, Canada, and Western Europe. At this time, Google Cloud Platform does not support HITRUST CSF with Snowflake. There are also some support limitations for third-party applications on both Azure and GCP.

For US-based healthcare companies, deploying Snowflake Business Critical Edition on Amazon Web Services is currently the recommended approach.

Continuous Data Protection

Snowflake’s tools and architecture enable its Continuous Data Protection (CDP) functionality. CDP covers a wide range of capabilities that help safeguard data stored in Snowflake against human error, malicious acts, and hardware or software failure. If your data is accidentally or deliberately altered, deleted, or corrupted, Snowflake keeps it accessible and recoverable.

The Standard Edition of Snowflake enables CDP by providing:

  • Automatic encryption of all data
  • Object-level access control
  • Support for multi-factor authentication

The Enterprise Edition of Snowflake provides additional features that can be used to architect a compliant data solution:

  • Periodic rekeying of encrypted data
  • Column-level Security to apply masking policies to columns in tables or views
  • Row Access Policies to determine which rows are visible in a query result
  • Object Tagging to apply tags to Snowflake objects to facilitate tracking sensitive data and resource usage

Finally, the Business Critical Edition, in addition to providing support for PHI data in accordance with HIPAA and HITRUST CSF, offers the following advanced security features:

  • Customer-managed encryption keys through Tri-Secret Secure
  • Support for Private Connectivity to the Snowflake Service

Deliberate and thoughtful use of object tagging can help you implement the row- and column-level policies needed to achieve HIPAA compliance with Snowflake.

End-To-End Encryption

Snowflake is designed to minimize risk by encrypting data at rest and in motion. End-to-end encryption (E2EE) is a form of communication in which no one but the end users can read the data. The E2EE architecture minimizes attack surface exposure: even if a security breach affects the cloud platform’s infrastructure, the data remains protected by its encryption, whether the breach is caused by an internal or an external attacker.

Encryption

All data files are encrypted throughout each stage of a data movement pipeline. Snowflake has both internal and external stages for data files. Internal stages are managed within Snowflake and can be used to upload your data files before loading them into tables. External stages are in supported cloud storage services that you own and control.
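
As a minimal sketch (the stage, file path, and table names here are hypothetical), loading a file through an internal stage looks like this; the PUT command encrypts the file on the client before it is transmitted, and the data remains encrypted at rest once loaded:

  -- Hypothetical names; PUT runs from SnowSQL or a client driver, not the web UI.
  CREATE OR REPLACE STAGE claims_stage FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
  PUT file:///tmp/claims.csv @claims_stage;   -- file is encrypted client-side before upload
  COPY INTO claims FROM @claims_stage;        -- data stays encrypted at rest in the table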

Key Lifecycle

The National Institute of Standards and Technology (NIST) recommends limiting the lifetime of a key to enhance security. Snowflake’s encryption key rotation service changes keys automatically on a regular schedule: when Snowflake identifies that a table master key or account master key is more than 30 days old, it automatically rotates it. Data can also be re-encrypted (or “rekeyed”) on a regular basis. Active keys are retired and new ones are generated when necessary.

Periodic data rekeying completes the encryption key lifecycle. When a table’s retired encryption key is more than one year old, Snowflake automatically generates a new encryption key and uses it to re-encrypt the data previously protected by the old key. The new key is then used to decrypt the table’s data going forward.
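
Annual rekeying is not enabled by default. A minimal sketch of turning it on, assuming you hold the ACCOUNTADMIN role and are on Enterprise Edition or higher:

  -- Enable periodic (annual) rekeying of table data; requires ACCOUNTADMIN.
  ALTER ACCOUNT SET PERIODIC_DATA_REKEYING = TRUE;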

With Snowflake’s Tri-Secret Secure feature, you can restrict access to your data by using a master encryption key that you maintain in the cloud provider’s key management service. Snowflake combines your key with a Snowflake-maintained key to create a composite master key, which is then used to encrypt all data in your account. If either key in the composite is revoked, your data can no longer be decrypted, giving you more security and control than Snowflake’s standard encryption. This explicit control over the key provides safeguards aligned with your business processes throughout the entire lifecycle. It also comes with significant responsibility for safeguarding your key.

With Tri-Secret Secure enabled, you can halt all data operations in Snowflake in the event of a data breach by disabling access to your key.

Governance

Snowflake lets you restrict access to resources based on identity or role. Individual objects in the account (e.g., users, warehouses, databases, tables) can be accessed only with permission, through a hybrid model of discretionary access control (DAC) and role-based access control (RBAC). In the Snowflake model, access to securable objects is granted through privileges assigned to roles, which are in turn granted to other roles or to users. Furthermore, each securable object has an owner who may grant access to other roles. This differs from a user-based control system in which rights and privileges are assigned directly to individual users or groups. This object access model supports the data governance implementations needed for HIPAA compliance.
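
As a minimal sketch of the role-based approach (all object and user names here are hypothetical), read access to a PHI schema is granted to a role, and the role is granted to users:

  -- Run with a role that holds the necessary grants (e.g., SECURITYADMIN and SYSADMIN).
  CREATE ROLE phi_analyst;
  GRANT USAGE ON DATABASE clinical TO ROLE phi_analyst;
  GRANT USAGE ON SCHEMA clinical.ehr TO ROLE phi_analyst;
  GRANT SELECT ON ALL TABLES IN SCHEMA clinical.ehr TO ROLE phi_analyst;
  GRANT ROLE phi_analyst TO USER jdoe;   -- access flows through the role, not the individual user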

Governance in Snowflake is implemented using:

  • column-level security
  • row access policies
  • object tagging

Column-level Security

Column-level security is realized through masking policies. When users submit a query that touches a column with a masking policy, the policy’s conditions determine whether they see unmasked, partially masked, obfuscated, or tokenized data. This policy-driven approach allows security teams to define restrictions that limit sensitive data exposure, even for the owner of an object, who otherwise has unrestricted access to the underlying data. Because masking policies are schema-level objects, they also support a centralized, decentralized, or hybrid management approach.

Masking can be implemented either through dynamic data masking or through external tokenization. External tokenization uses an external function in the masking policy body to make a REST call to a third-party tokenization provider. With external tokenization, analytical value is preserved after de-identification: because tokenization produces a distinct token for each input value, records can still be grouped or joined on the token without disclosing the sensitive information.
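
A minimal sketch of a dynamic data masking policy (the role, table, and column names are hypothetical); for external tokenization, the branch that returns data to authorized roles would instead call an external function that detokenizes the stored token:

  -- Only the PHI_ADMIN role sees the raw value; everyone else sees a fixed mask.
  CREATE MASKING POLICY ssn_mask AS (val STRING) RETURNS STRING ->
    CASE
      WHEN CURRENT_ROLE() IN ('PHI_ADMIN') THEN val
      ELSE '***MASKED***'
    END;

  -- Attach the policy to the sensitive column.
  ALTER TABLE patients MODIFY COLUMN ssn SET MASKING POLICY ssn_mask;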

Snowflake also provides secure views to control access to sensitive data, but secure views create administrative difficulties owing to the large number of views and the business intelligence (BI) dashboards derived from each view. Masking policies solve this management challenge by avoiding an explosion of views and dashboards to manage. Masking policies also support segregation of duties (SoD) by separating the role of policy administrator from that of object owner, whereas secure views do not.

Use masking instead of secure views to maintain segregation of duties. Mask all sensitive data, and tokenize where distinct values are analytically meaningful (e.g., a patient diagnosis code versus a Social Security number).

Row Access Policies

Row access policies implement row-level security to determine which rows are visible in the query result. Snowflake supports nested row access policies, such as a row access policy on a table and another on a view over the same table. Like column-level security, row access policies can include conditions and functions that transform the data when certain criteria are met, potentially limiting sensitive data exposure. This segregation of duties at the row and column level is a powerful tool for achieving compliance.

A row access policy is a set of logic that controls which rows are visible under a specific condition. The simplest implementation specifies an attribute (e.g., member_id) and a role that is allowed to see rows matching that attribute. Mapping tables can be used for more elaborate or fine-grained restrictions, although mapping tables may reduce performance in some cases.

When possible, cluster tables by the attributes used for policy filtering, as in the sketch below.
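
A minimal sketch, assuming a hypothetical entitlements mapping table that pairs roles with the member_id values they may see:

  -- Rows are visible to PHI_ADMIN, or to roles mapped to the member_id in the entitlements table.
  CREATE ROW ACCESS POLICY member_filter AS (member_id VARCHAR) RETURNS BOOLEAN ->
    CURRENT_ROLE() = 'PHI_ADMIN'
    OR EXISTS (
      SELECT 1
      FROM entitlements e
      WHERE e.role_name = CURRENT_ROLE()
        AND e.member_id = member_id
    );

  ALTER TABLE claims ADD ROW ACCESS POLICY member_filter ON (member_id);
  ALTER TABLE claims CLUSTER BY (member_id);   -- cluster on the attribute the policy filters on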

Object Tagging

A tag is a schema-level object that can be attached to another Snowflake object such as a table, view, or column. When you apply a tag to a Snowflake object, you may assign it any string value; the tag and its string value are recorded as a key-value pair. Setting tags and then querying them allows you to discover the database objects and columns that hold sensitive information.

Tags enable data stewards to track sensitive data for compliance, discovery, protection, and resource usage use cases through either a centralized or a decentralized data governance management approach. By using tags during data discovery, information security professionals and data stewards can determine how the data should be made available, such as whether it should be filtered using row access policies, or tokenized, fully masked, partially masked, or left unmasked.

Create and implement a tagging policy BEFORE you create the column and row level policies.
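
A sketch of what that might look like, using a hypothetical classification tag and hypothetical table and column names:

  -- Define the classification scheme first, then apply it as sensitive columns are discovered.
  CREATE TAG phi_classification ALLOWED_VALUES 'PHI', 'PII', 'PUBLIC';
  ALTER TABLE patients MODIFY COLUMN ssn SET TAG phi_classification = 'PHI';

  -- Account-wide discovery of tagged columns (the ACCOUNT_USAGE view has some latency).
  SELECT object_name, column_name, tag_value
  FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES
  WHERE tag_name = 'PHI_CLASSIFICATION';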

Support for Private Connectivity to the Snowflake Service

Private connectivity allows you to bypass the public internet when working with Snowflake in the cloud. Snowflake does not provide private connectivity as a service itself; instead, it has partnered with Amazon Web Services, Microsoft Azure, and Google Cloud Platform to support private connectivity as it is implemented natively on each platform. Regardless of cloud provider, you will need the Business Critical Edition, and you will need to contact both Snowflake Support and the cloud provider to initiate and manage the process. Although each cloud provider implements private connectivity differently, the offerings are broadly similar.

AWS PrivateLink is an AWS service that allows you to connect your VPCs without crossing the public Internet. Because Snowflake on AWS runs in a VPC, PrivateLink allows you to establish a highly secure network between Snowflake and other AWS VPCs in the same region, fully protected against unwanted external access. PrivateLink with private endpoints supports external functions, which are used, among other things, to support external tokenization for column-level security. If you have an on-premises environment (e.g., a non-hosted data center), you can also use AWS Direct Connect to connect all of your virtual and physical environments in a single, secure network.
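
Once Snowflake Support has enabled PrivateLink for your account, the configuration values needed on the AWS side (the endpoint service name and private URLs) can be retrieved from within Snowflake; a minimal sketch:

  -- Requires the ACCOUNTADMIN role; returns JSON with the VPC endpoint service name and private URLs.
  SELECT SYSTEM$GET_PRIVATELINK_CONFIG();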

Azure Private Link allows for private connectivity to Snowflake by ensuring that access to Snowflake is via a private IP address. Only traffic from the customer virtual network (VNet) to the Snowflake VNet is allowed using the Microsoft backbone, avoiding public Internet access.

Google Cloud Private Service Connect allows you to access Snowflake privately by using a private IP address. Snowflake is represented in your network (i.e., the client network), but the data travels in one direction along the Google networking backbone from your VPC to the Snowflake VPC.

Private connectivity should be considered as part of your advanced security profile. All three major cloud providers support private connectivity at some level, but Amazon Web Services currently has a somewhat more comprehensive offering.

Conclusion

Snowflake is quickly becoming the data warehouse of choice for healthcare providers looking to ensure HIPAA compliance. Its cloud-based design makes it ideal for big data analytics, and its security features provide a safe environment for storing PHI. Snowflake’s governance features also make it well-suited for healthcare organizations. By selectively granting access to individual objects (e.g., users, warehouses, databases, tables), Snowflake enables organizations to precisely control who can view and work with PHI. To implement HIPAA compliance, healthcare providers will want Snowflake’s masking features and external tokenization support.

If you’re ready to move to the next level of your data-driven enterprise journey in the heavily regulated healthcare space, contact Juliet.Silver@perficient.com with Healthcare or Bill.Busch@perficient.com with Data Solutions.


David Callaghan, Solutions Architect

As a solutions architect with Perficient, I bring twenty years of development experience and I'm currently hands-on with Hadoop/Spark, blockchain, and cloud, coding in Java, Scala, and Go. I'm certified in and work extensively with Hadoop, Cassandra, Spark, AWS, MongoDB, and Pentaho. Most recently, I've been bringing integrated blockchain (particularly Hyperledger and Ethereum) and big data solutions to the cloud, with an emphasis on integrating modern data products such as HBase, Cassandra, and Neo4j as the off-blockchain repository.
