Azure Data Factory | Copy multiple tables in Bulk with Lookup Activity & ForEach

 

Shivani Mishra - Easy Azure for Everyone

Introduction to Azure Data Factory Lookup and ForEach Activities

In this blog, we will learn how to build dynamic pipelines in Azure Data Factory using the Lookup and ForEach activities. The Lookup activity retrieves a data set from a file or a table in your database, and you can use that data set to determine the list of objects you want to work with in your pipeline. The Lookup activity supports most of the data sources currently available in Data Factory, so you can use various types of databases or flat-file solutions. It is often paired with the ForEach activity, which executes a set of activities for each object in the list the Lookup returns. This blog walks through a step-by-step setup of a pipeline that lists the tables in a SQL database with a Lookup activity and then iterates over the results to perform a simple copy extraction.

Step-by-Step Setup

Initial Setup

Before we begin, let's quickly go over the initial setup of the services we will be using in this blog:

- Data Factory: the service in which we will build the pipeline.

- Storage Account: the destination for all the exported files.

- SQL Server: an Azure SQL database containing three tables to work with.

Now that we have our services set up, let's move on to the next steps.

Connecting to SQL Server

The first step is to connect to the SQL server. In the Data Factory authoring UI, go to the Connections section and create a new linked service. Select Azure SQL Database and provide the details needed to connect to your database, then test the connection to make sure everything works.

Setting Up the Data Set

Before setting up the data set, let's determine what data this data set will be returning. In the query editor, use a simple query to list all the tables within your database. Replace 'database_name' with the name of your database. This query will return a list of tables.
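The post does not reproduce the query itself; a common way to list the user tables in an Azure SQL database is to query INFORMATION_SCHEMA.TABLES (the exact query used in the original may differ slightly):

```sql
-- List all user tables in the current database.
-- Replace 'database_name' with the name of your database.
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
  AND TABLE_CATALOG = 'database_name';
```

Filtering on TABLE_TYPE = 'BASE TABLE' excludes views, so only real tables are copied later in the pipeline.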

Now, in the Data Factory, create a new data set by selecting Azure SQL as the data source. Choose the linked service you created for the database and name the data set 'table list'. Leave the table name empty for now, as we will be using a dynamic query. Save the data set.

Adding the Lookup Activity

Create a new pipeline and add a lookup activity to it. Rename the activity to 'list tables'. In the Settings tab, select the 'table list' dataset and change the Use query option from 'Table' to 'Query'. Enter the query you prepared in the previous step, then preview the data to confirm that it returns the list of tables.

It's important to understand the difference between returning a list and returning only the first row. If you select the 'First row only' option, the lookup activity returns an object with a single property, firstRow, which contains the data of the first row. If you uncheck this option, the activity instead returns an object with two properties: count, the number of rows returned, and value, an array containing all of the rows. You can access this data in expressions through the activity output, via the firstRow, count, and value properties.
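For example, with 'First row only' unchecked, the lookup output for a three-table database might look like this (the schema and table names here are illustrative):

```json
{
  "count": 3,
  "value": [
    { "TABLE_SCHEMA": "SalesLT", "TABLE_NAME": "Customer" },
    { "TABLE_SCHEMA": "SalesLT", "TABLE_NAME": "Product" },
    { "TABLE_SCHEMA": "SalesLT", "TABLE_NAME": "SalesOrderHeader" }
  ]
}
```

With 'First row only' checked, the same lookup would instead return a single object of the form { "firstRow": { "TABLE_SCHEMA": "SalesLT", "TABLE_NAME": "Customer" } }.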

Building the Copy Pipeline

That completes the initial setup and the configuration of the lookup activity. Next, we will continue building the pipeline and perform the copy extraction.

To set up the pipeline, follow these steps:

1.    Create a lookup activity to retrieve the list of tables from the SQL database.

2.    Add an iteration activity to loop through each table.

3.    Create a connection to Azure Blob Storage.

4.    Create a dataset for the SQL table.

5.    Create a dataset for the Blob Storage file.

6.    Add a copy activity to export the table to a CSV file in the Blob Storage.

Here are the details for each step:

Step 1: Create a Lookup Activity

Create a lookup activity to retrieve the list of tables from the SQL database; this is the 'list tables' activity configured earlier. Make sure the 'First row only' option is unchecked so the activity returns the full list of rows. The ForEach activity added in the next step will consume this output through its Items property, using the expression @activity('list tables').output.value.

Step 2: Add an Iteration Activity

Add an iteration activity to loop through each table. Drag the "ForEach" activity from the "Iteration and Conditionals" section onto the canvas and connect it after the lookup activity. In the ForEach Settings tab, set the Items property to the array returned by the lookup.
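In the pipeline's JSON view, this wiring looks roughly as follows; this is a sketch rather than a complete pipeline definition, assuming the lookup activity is named 'list tables' as above:

```json
{
  "name": "ForEachTable",
  "type": "ForEach",
  "dependsOn": [
    { "activity": "list tables", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "items": {
      "value": "@activity('list tables').output.value",
      "type": "Expression"
    },
    "activities": [ ]
  }
}
```

The activities array is empty here; it will hold the copy activity added in Step 6.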

Step 3: Create a Connection to Azure Blob Storage

Create a linked service for Azure Blob Storage. In Data Factory, go to "Connections" and create a new Azure Blob Storage connection, then test the connectivity to ensure it is set up correctly.

Step 4: Create a Dataset for the SQL Table

Create a dataset for the SQL table. In Data Factory, go to "Datasets" and create a new Azure SQL dataset. Add parameters for the schema name and table name, and reference them as dynamic content in the connection settings.
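A parameterized Azure SQL dataset can be sketched as follows; the dataset, linked-service, and parameter names are illustrative placeholders:

```json
{
  "name": "SqlTableDynamic",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureSqlDatabaseLinkedService",
      "type": "LinkedServiceReference"
    },
    "type": "AzureSqlTable",
    "parameters": {
      "schemaName": { "type": "String" },
      "tableName": { "type": "String" }
    },
    "typeProperties": {
      "schema": { "value": "@dataset().schemaName", "type": "Expression" },
      "table": { "value": "@dataset().tableName", "type": "Expression" }
    }
  }
}
```

Because schema and table are expressions over dataset parameters, one dataset definition can point at any table in the database.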

Step 5: Create a Dataset for the Blob Storage File

Create a dataset for the Blob Storage file. In Data Factory, go to "Datasets" and create a new Azure Blob Storage dataset. Select the DelimitedText (CSV) format and expose the file name as a dataset parameter.
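The CSV sink dataset follows the same parameterization pattern; a sketch, assuming an illustrative container named "exports" and placeholder names throughout:

```json
{
  "name": "BlobCsvDynamic",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "type": "DelimitedText",
    "parameters": {
      "fileName": { "type": "String" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "exports",
        "fileName": { "value": "@dataset().fileName", "type": "Expression" }
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```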

Step 6: Add a Copy Activity

Add a copy activity to export each table to a CSV file in Blob Storage. In the pipeline, edit the "ForEach" activity and add a Copy data activity inside it. Select the SQL dataset as the source and the Blob Storage dataset as the sink, and use dynamic content to pass the current item's schema and table name to the source dataset.
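Putting it together, the copy activity inside the ForEach passes the current item's columns to the dataset parameters and builds the output file name. This sketch assumes the lookup query returns TABLE_SCHEMA and TABLE_NAME columns, and the dataset names SqlTableDynamic and BlobCsvDynamic stand in for the datasets created in Steps 4 and 5:

```json
{
  "name": "CopyTableToCsv",
  "type": "Copy",
  "inputs": [
    {
      "referenceName": "SqlTableDynamic",
      "type": "DatasetReference",
      "parameters": {
        "schemaName": "@item().TABLE_SCHEMA",
        "tableName": "@item().TABLE_NAME"
      }
    }
  ],
  "outputs": [
    {
      "referenceName": "BlobCsvDynamic",
      "type": "DatasetReference",
      "parameters": {
        "fileName": "@concat(item().TABLE_NAME, '.csv')"
      }
    }
  ],
  "typeProperties": {
    "source": { "type": "AzureSqlSource" },
    "sink": { "type": "DelimitedTextSink" }
  }
}
```

With this in place, running the pipeline once copies every table returned by the lookup into its own CSV file, and adding a table to the database adds a file to the export with no pipeline changes.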

 
