Azure Data Factory | Introduction to ETL in Azure

Shivani Mishra - Easy Azure for Everyone

Data Factory: A Brief Introduction

In this blog post, we will look at Azure Data Factory, a service that allows you to move data between different data stores. Whether you are working in the cloud or on-premises, data movement is an essential part of almost any project. Let's explore how Data Factory works and how you can use it in your own projects.



How does Data Factory work?

To understand how Data Factory works, let's use an analogy. Imagine you want to move a cabinet from a shop to your home. In this scenario, you need certain information:

·         A key and address to locate the shop and access the cabinet

·         Information about the cabinet, such as disassembly instructions or packaging details

·         Delivery details, including what to do with the package once you receive it

·         A key and address of your home to deliver the cabinet

In Data Factory, the shop can be replaced with a blob storage account and the cabinet with a customer CSV file. Here's how the pieces map:

·         Linked Service: This defines the connectivity to the blob storage and includes the authorization details.

·         Data Set: This defines the structure of the customer CSV file.

·         Copy Activity: This is the job in Data Factory that moves the data from the source (blob storage) to the destination (Azure SQL database).

·         Integration Runtime: This is the workforce of Data Factory, the compute infrastructure that executes all the jobs.

·         Pipeline: A pipeline consists of one or more activities (such as copy activity) and defines the flow of data.
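Under the hood, Data Factory stores each of these objects as a JSON definition. A minimal sketch of a pipeline containing a single copy activity might look like the following; the dataset names (`cars_csv`, `cars_table`) are illustrative placeholders, not names from an actual factory:

```json
{
  "name": "CopyCarsPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyCarsCsvToSql",
        "type": "Copy",
        "inputs":  [ { "referenceName": "cars_csv",   "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "cars_table", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink":   { "type": "AzureSqlSink" }
        }
      }
    ]
  }
}
```

The pipeline itself only references datasets; the connection details live in the linked services those datasets point to, which is what lets you reuse one connection across many activities.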

Importing CSV file from Blob to Azure SQL

Now, let's try importing a CSV file from a blob storage into Azure SQL using Data Factory. Here are the steps:

1.    Create a Data Factory in the Azure portal.

2.    Create an Azure SQL database.

3.    Create a storage account.

Once everything is set up, you can configure the necessary linked services, datasets, and copy activities in Data Factory to move the data from the blob storage to Azure SQL.

Using Blob Storage to Store Files

I will create a new container called "input" in the blob storage and set its access level to private, so it cannot be accessed anonymously. I will upload a file called "cars.csv" to the "input" container. The file contains data in CSV format, with headers in the first row and values separated by commas. Next, I will prepare an SQL query to create a table for importing the cars data. Once the table is created, I will go to the Data Factory and start working on the pipeline.
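The CREATE TABLE statement could look like the sketch below. The column names and types here are assumptions for illustration, since they depend on what cars.csv actually contains; match them to your file's header row:

```sql
-- Hypothetical schema for cars.csv; adjust columns to match the actual file.
CREATE TABLE dbo.Cars (
    Make  NVARCHAR(50),
    Model NVARCHAR(50),
    [Year] INT,
    Price DECIMAL(10, 2)
);
```

Keeping the column order the same as the CSV header makes the copy activity's default column mapping straightforward.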

Connecting Blob Storage

In order to connect to the blob storage, I need to create a linked service. I will go to the connections tab and create a new linked service called "input blob" for the blob storage. I will select the AutoResolve integration runtime and choose account key authentication for simplicity. I will select my subscription and storage account, then test the connection. Once the connection is successful, I will finish creating the linked service.
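Authored as JSON, a blob storage linked service using account key authentication looks roughly like this sketch; the placeholder account name and key are assumptions you would replace with your own values:

```json
{
  "name": "input blob",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<storage-account>;AccountKey=<account-key>"
    }
  }
}
```

In practice you would avoid embedding the raw key and instead reference a secret stored in Azure Key Vault.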

Defining a Data Set

Next, I need to define a dataset for the blob storage. I will go to the datasets tab and create a new dataset called "cars.csv" for the CSV file. I will select the "cars.csv" file from the "input" container and make sure the first-row-as-header checkbox is selected. I will leave the import schema option at its default and continue. A new dataset is created, with options to preview the data and make adjustments if needed.
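The resulting dataset, expressed in Data Factory's JSON format, would look something like the sketch below (the linked service name matches the one created above; everything else mirrors the choices made in the UI):

```json
{
  "name": "cars.csv",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": {
      "referenceName": "input blob",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "input",
        "fileName": "cars.csv"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```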

Connecting to SQL

In order to connect to SQL, I need to create another linked service. I will go to the connections tab and create a new linked service called "output sql" for the SQL database. I will use the same integration runtime and build the connection string by selecting my subscription, server, and database. I will choose SQL authentication and provide the username and password. Once the test connection is successful, I can proceed.
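A JSON sketch of such a linked service is shown below; the server, database, and credential values are placeholders you would substitute with your own:

```json
{
  "name": "output sql",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<server>.database.windows.net,1433;Database=<database>;User ID=<user>;Password=<password>;Encrypt=true;"
    }
  }
}
```

As with the storage key, a Key Vault reference is the safer choice for the password outside of a quick demo.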
