The importance of metadata
Metadata is best defined as data that characterizes data. When you query a database, it returns a specific piece of information. Metadata provides the who, what, where, when, why and how of that information. When companies have a properly engineered process to create, store and manage metadata, it benefits all focus areas of the business.
Security officers can log and alert on who accesses the data. Marketing teams can determine where and how a piece of data was generated. Purchasing departments can identify when an event occurred. Metadata enriches whatever it is associated with and allows the business to gain insights that might otherwise have been missed.
Data security has become a top priority in recent years because of data breaches and growing concern about privacy. Many businesses around the world are adopting their own guidelines and policies to help protect their data. In addition, governments are imposing regulations, such as the European Union’s General Data Protection Regulation (GDPR), that govern how organizations protect personal data.
The GDPR was created to protect the privacy rights of people in the EU, but its regulations and penalties apply to any organization that processes or stores their personal data, wherever that organization is based. In practice, that covers a large share of organizations around the world. It is also serving as a template for similar regulations that are starting to appear in the United States. For example, the California Consumer Privacy Act (CCPA) shares many of the same objectives and requirements as GDPR.
So how does metadata fit in? According to Article 24 of the GDPR, the controller (data owner) must be able to “demonstrate that processing is performed in accordance with this Regulation.”1
The easiest way to demonstrate compliance is by using metadata. With metadata, organizations can document all database activities, including but not limited to query history, login history, securable data objects, and data transfers. The ability to track the movement of all data through its entire life cycle protects both the business and its customers.
Introduction to the SNOWFLAKE database
Snowflake provides every customer with an object metadata database. The data is provided via Snowflake Sharing in a database called SNOWFLAKE. Every Snowflake account will have access to the shared database and two schemas.
The ACCOUNT_USAGE schema contains object metadata and usage metrics, which provide insight into the account. The INFORMATION_SCHEMA uses identical data objects and naming conventions but differs in a few ways:
- ACCOUNT_USAGE includes data for dropped objects; INFORMATION_SCHEMA does not.
- ACCOUNT_USAGE has data latency (45 minutes to three hours), while INFORMATION_SCHEMA does not.
- Data retention is finite: one year for ACCOUNT_USAGE and seven days to six months for INFORMATION_SCHEMA.
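As a rough illustration of how these differences might influence which schema you query, the Python sketch below builds a query-history statement for either source, depending on how far back you need to look. The helper name query_history_sql, the selected columns, and the seven-day threshold (the lower bound quoted above, assumed here to apply to query history) are illustrative assumptions. The function only produces SQL text and does not connect to Snowflake.

```python
from datetime import timedelta

# Figures taken from the list above. The seven-day window is the lower bound
# quoted for INFORMATION_SCHEMA and is ASSUMED here to apply to query history.
ACCOUNT_USAGE_RETENTION = timedelta(days=365)
INFORMATION_SCHEMA_QUERY_WINDOW = timedelta(days=7)

# Illustrative column choice; both sources expose these columns.
COLUMNS = "query_id, user_name, start_time, query_text"


def query_history_sql(lookback: timedelta) -> str:
    """Build a query-history statement for the schema that fits the lookback.

    Hypothetical helper: short lookbacks use INFORMATION_SCHEMA (no latency),
    longer ones use SNOWFLAKE.ACCOUNT_USAGE (up to one year, with latency).
    """
    if lookback <= timedelta(0):
        raise ValueError("lookback must be positive")
    if lookback > ACCOUNT_USAGE_RETENTION:
        raise ValueError("lookback exceeds the one-year ACCOUNT_USAGE retention")

    # Round down to whole hours, but never below one hour.
    hours = max(1, int(lookback.total_seconds() // 3600))
    since = f"DATEADD('hour', -{hours}, CURRENT_TIMESTAMP())"

    if lookback <= INFORMATION_SCHEMA_QUERY_WINDOW:
        # Table function; assumes a current database is set in the session.
        # RESULT_LIMIT is raised because the function returns few rows by default.
        return (
            f"SELECT {COLUMNS}\n"
            "FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(\n"
            f"    END_TIME_RANGE_START => {since},\n"
            "    RESULT_LIMIT => 10000))\n"
            "ORDER BY start_time"
        )

    # Shared view; rows can lag real activity by 45 minutes to three hours.
    return (
        f"SELECT {COLUMNS}\n"
        "FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY\n"
        f"WHERE start_time >= {since}\n"
        "ORDER BY start_time"
    )
```

For example, query_history_sql(timedelta(hours=6)) returns a statement against the INFORMATION_SCHEMA table function, while query_history_sql(timedelta(days=30)) returns one against the ACCOUNT_USAGE view. Note that the table function filters on query end time, whereas the view query in this sketch filters on start time.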
The purpose of this document is to show how Snowflake customers can retain this valuable data for longer periods.
NOTE: Compute and storage costs will be incurred for the procedures in this article.
Building the target database and warehouse
By following these instructions, you will be able to extend the life of your Snowflake metadata, enabling you to better track the usage of your data and demonstrate good security practices.