
How to Load Data from SQL Server to Snowflake

This post takes you through the intricacies of extracting data from Microsoft SQL Server and loading it into Snowflake. You can either work through the manual process described below or use a dedicated data integration tool to complete the task in a few clicks.

Microsoft SQL Server

Microsoft SQL Server is a relational database management system. It supports applications running on a single machine, across a local area network, or over the web. It integrates seamlessly into the Microsoft ecosystem and supports Microsoft’s .NET framework out of the box.

Snowflake

Snowflake is a relatively recent, cloud-based data warehousing solution. It has taken the business world by storm because it resolves many of the issues inherent in traditional data warehouse solutions, which is the primary reason organizations choose to load data from SQL Server to Snowflake.

Some of the benefits offered by Snowflake are –

  • High performance – Compute and storage are separated, so users can scale up and down without interruption, paying only for the resources actually used.
  • Supports multiple cloud vendors – Snowflake’s architecture runs on a range of cloud vendors, with new ones added over time. Users get more choice because the same tools can be used to analyze data regardless of which cloud it sits on.
  • Loading various types of data – Both structured and semi-structured data can be loaded natively into Snowflake and optimized automatically. It supports JSON, Avro, XML, and Parquet data.
  • Access to multiple workloads – The same data is made available to multiple workgroups and multiple workloads simultaneously, without performance degradation or contention roadblocks.
  • All-round support – Snowflake handles many operational concerns automatically, from scaling compute up and down to column encoding. Data is clustered automatically and no indexes need to be defined, although for very large tables users can define the clustering keys Snowflake uses to co-locate table data.

How to load data from SQL Server to Snowflake? 

There are several steps to be followed for carrying out the process.

# Extracting data from SQL Server – The preferred way for anyone working with databases to get data out is to query for it: SELECT statements let you sort, filter, and limit exactly the rows and columns to be retrieved, as in the sketch below. For exporting bulk data or entire databases in formats such as text, CSV, or SQL scripts, use Microsoft SQL Server Management Studio.
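
A minimal extraction sketch is below. It assumes the pyodbc package and an ODBC driver are installed; the server, credentials, table, and column names (sales_db, dbo.orders, and so on) are placeholders for your own objects.

```python
# Pull a filtered result set out of SQL Server and write it to a local CSV file.
import csv
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=sales_db;"
    "UID=etl_user;PWD=etl_password"
)
cursor = conn.cursor()

# Select only the columns and rows that actually need to move.
cursor.execute("SELECT order_id, customer_id, order_date, amount FROM dbo.orders")

with open("orders.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cursor.description])  # header row
    for row in cursor:
        writer.writerow(row)

conn.close()
```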

# Processing data for Snowflake – Before you can load data from SQL Server to Snowflake, the extracted data has to be prepared. How much work this takes depends largely on your data structures: check the data types supported by Snowflake and map each SQL Server column to one of them so that your data fits neatly (an illustrative mapping follows below). You do not, however, have to specify a schema beforehand when loading JSON or XML data into Snowflake.
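
The mapping below is an illustrative, non-exhaustive sketch of how common SQL Server column types translate to Snowflake types; adjust it to your own schema before generating the target CREATE TABLE statements.

```python
# Rough mapping of common SQL Server types to Snowflake types (not exhaustive).
SQLSERVER_TO_SNOWFLAKE = {
    "int":              "INTEGER",
    "bigint":           "BIGINT",
    "bit":              "BOOLEAN",
    "decimal":          "NUMBER",
    "money":            "NUMBER(19,4)",
    "float":            "FLOAT",
    "date":             "DATE",
    "datetime":         "TIMESTAMP_NTZ",
    "datetime2":        "TIMESTAMP_NTZ",
    "datetimeoffset":   "TIMESTAMP_TZ",
    "varchar":          "VARCHAR",
    "nvarchar":         "VARCHAR",
    "uniqueidentifier": "VARCHAR(36)",
    "varbinary":        "BINARY",
}

def to_snowflake_type(sqlserver_type: str) -> str:
    """Return the Snowflake type for a SQL Server type, defaulting to VARCHAR."""
    return SQLSERVER_TO_SNOWFLAKE.get(sqlserver_type.lower(), "VARCHAR")
```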

# Staging data files – Data files first have to be uploaded to an intermediate location before the SQL Server data can be inserted into a Snowflake table. This activity is called staging, and there are two kinds of stages.

  • Internal stage – Created explicitly by the user with SQL statements, a named internal stage offers a greater degree of flexibility when loading data from SQL Server. Assigning a file format and other options to the named stage makes loading easier (see the sketch after this list).
  • External stage – Presently, Amazon S3 and Microsoft Azure are the external staging locations supported by Snowflake. External stages can be created against these locations to load data from SQL Server to Snowflake, with the files uploaded through the respective cloud vendor’s own interfaces.
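
The sketch below creates a named internal stage with an attached CSV file format using the snowflake-connector-python package. The account, credentials, warehouse, database, and stage names are placeholders, not values prescribed by Snowflake.

```python
# Create a named internal stage whose file format matches the exported CSVs.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="etl_password",
    warehouse="load_wh", database="analytics", schema="public",
)
cur = conn.cursor()

# Attaching the file format here means COPY INTO needs no extra options later.
cur.execute("""
    CREATE STAGE IF NOT EXISTS orders_stage
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
""")
conn.close()
```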

# Loading data into Snowflake – Snowflake’s documentation includes a Data Loading Overview that guides you through the process of loading data. For smaller databases, Snowflake’s data loading wizard is sufficient, but it has its limitations where data volumes are very large. In such cases (a sketch follows the list) –

  • Use the PUT command to stage files.
  • Use the COPY INTO table command for loading processed data into an intended table.
  • Copy from your local drive (internal stage) or from Amazon S3, where the data sits in an external stage.
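
A minimal PUT and COPY INTO sketch is shown below, again using snowflake-connector-python. It assumes the orders.csv file exported earlier, the orders_stage internal stage created above, and an existing ORDERS table in Snowflake; all names and credentials are placeholders.

```python
# Stage the local CSV, then copy it from the stage into the target table.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="etl_password",
    warehouse="load_wh", database="analytics", schema="public",
)
cur = conn.cursor()

# 1. Upload (and compress) the local file into the named internal stage.
cur.execute("PUT file://orders.csv @orders_stage AUTO_COMPRESS = TRUE")

# 2. Load the staged file into the table; the stage's file format applies.
cur.execute("COPY INTO orders FROM @orders_stage/orders.csv.gz")

conn.close()
```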

An advantage when you load data from SQL Server to Snowflake is that you can create a dedicated virtual warehouse for the load, which greatly speeds up the insertion activity.
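
The load_wh warehouse referenced in the earlier sketches could be created along these lines; the name, size, and auto-suspend threshold are assumptions to tune for your own data volume.

```python
# Create (if needed) and switch to a dedicated warehouse for the load.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="etl_password",
)
cur = conn.cursor()
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS load_wh
        WAREHOUSE_SIZE = 'SMALL'
        AUTO_SUSPEND   = 60      -- suspend after 60 idle seconds to save credits
        AUTO_RESUME    = TRUE
""")
cur.execute("USE WAREHOUSE load_wh")
conn.close()
```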

Maintain data freshness

Loading data into Snowflake is not the end of the story: what happens when new data arrives in the source? Reloading the entire SQL Server database each time is tedious and introduces latency. A wiser approach is to build a script that recognizes new rows in the source database, using an auto-incrementing field as a key, and loads only that incremental data, as in the sketch below.
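
A sketch of such an incremental script is below, keyed on a hypothetical auto-incrementing order_id column and reusing the placeholder table, stage, and connection details from the earlier sketches.

```python
# Load only the SQL Server rows that are newer than what Snowflake already holds.
import csv
import pyodbc
import snowflake.connector

sf = snowflake.connector.connect(
    account="my_account", user="etl_user", password="etl_password",
    warehouse="load_wh", database="analytics", schema="public",
)
sf_cur = sf.cursor()

# 1. Find the highest key already loaded into Snowflake.
sf_cur.execute("SELECT COALESCE(MAX(order_id), 0) FROM orders")
last_loaded_id = sf_cur.fetchone()[0]

# 2. Pull only the newer rows from SQL Server.
mssql = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=sales_db;UID=etl_user;PWD=etl_password"
)
ms_cur = mssql.cursor()
ms_cur.execute(
    "SELECT order_id, customer_id, order_date, amount "
    "FROM dbo.orders WHERE order_id > ?",
    last_loaded_id,
)

with open("orders_delta.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in ms_cur.description])  # header row
    writer.writerows(ms_cur)

# 3. Stage and load just the delta file.
sf_cur.execute("PUT file://orders_delta.csv @orders_stage AUTO_COMPRESS = TRUE")
sf_cur.execute("COPY INTO orders FROM @orders_stage/orders_delta.csv.gz")

mssql.close()
sf.close()
```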

Initially, the whole procedure might look overwhelming, but with the right skills you can work through it with ease.

Zaraki Kenpachi