Azure Data Factory | Copy multiple tables in Bulk with Lookup Activity & ForEach
Shivani Mishra - Easy Azure for Everyone
Introduction to Azure Data Factory Lookup and ForEach Actions
In this blog, we will learn how to build dynamic pipelines in Azure Data Factory using the Lookup and ForEach activities. The Lookup activity retrieves a data set from a file or a table in your database, and you can use that data set to determine the list of objects you want to work with in your pipeline. The Lookup activity supports most of the data sources currently available in Data Factory, so you can use various types of databases or flat-file solutions. It is often paired with the ForEach activity, which executes a set of activities once for each object in the list the Lookup returns. This blog walks you through a step-by-step setup of a pipeline that lists the tables in a SQL database using a Lookup activity, and then iterates over the results to perform a simple copy extraction.
Step-by-Step Setup
Initial Setup
Before we begin, let's quickly go over the services we will be using in this blog:
· Data Factory: where we will do the main work of building the pipeline.
· Storage Account: where we will upload all our exported files.
· SQL Server: an Azure SQL database with three tables that we will export.
Now that we have our services set up, let's move on to the next steps.
Connecting to SQL Server
The first step is to connect to the SQL server. In Data Factory, go to the Connections section and create a new linked service. Select Azure SQL Database and provide the details needed to connect to your database. Test the connection to ensure everything is working.
Setting Up the Data Set
Before setting up the data set, let's determine what data it will return. In the query editor, use a simple query to list all the tables within your database, replacing 'database_name' with the name of your database. This query returns one row per table.
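The exact query isn't shown above; a minimal sketch that lists user tables through the standard INFORMATION_SCHEMA views looks like this (the column choice and filter are assumptions, not the author's exact query):

```sql
-- List every user table in the current database,
-- one row per table with its schema and name.
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';
```

Filtering on TABLE_TYPE = 'BASE TABLE' keeps views out of the result, so the pipeline only copies real tables.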
Now, in Data Factory, create a new data set with Azure SQL Database as the source. Choose the linked service you created for the database and name the data set 'table list'. Leave the table name empty for now, since we will be using a dynamic query instead. Save the data set.
Adding the Lookup Activity
Create a new pipeline and add a Lookup activity to it. Rename the activity to 'list tables'. In the Settings tab, select the 'table list' data set and change the source option from 'Table' to 'Query'. Enter the query you prepared in the previous step, then preview the data to confirm that it returns the list of tables.
It's important to understand the difference between returning a list and returning only the first row. If you select the 'First row only' option, the Lookup activity returns an object with a single property, 'firstRow', which contains the data from the first row. If you uncheck this option, the Lookup activity returns an object with two properties: 'count', the number of rows returned, and 'value', an array of all the rows. You can access the data in expressions by referring to the 'firstRow', 'count', and 'value' properties.
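As an illustration, the two output shapes look roughly like this (shown as annotated JSON; the row contents here are made up for the example):

```json
// With 'First row only' checked:
{
  "firstRow": { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "Customers" }
}

// With 'First row only' unchecked:
{
  "count": 3,
  "value": [
    { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "Customers" },
    { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "Orders" },
    { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "Products" }
  ]
}
```

For this pipeline we want the unchecked shape, since the ForEach activity needs the whole 'value' array to iterate over.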
Conclusion
That's it for the initial setup and the configuration of the Lookup activity. Next, we continue building the pipeline and perform the copy extraction.
To set up the rest of the pipeline, follow these steps:
1. Create a lookup activity to retrieve the list of tables from the SQL database.
2. Add an iteration activity to loop through each table.
3. Create a connection to the Azure Blob Storage.
4. Create a dataset for the SQL table.
5. Create a dataset for the Blob Storage file.
6. Add a copy activity to export the table to a CSV file in the Blob Storage.
Here are the details for each step:
Step 1: Create a Lookup Activity
Create a lookup activity to retrieve the list of tables from the SQL database, and uncheck the 'First row only' option so the full list is returned. Its output can then be referenced as @activity('LookupTables').output.value, which is what the ForEach activity's Items property will use.
Step 2: Add an Iteration Activity
Add an iteration activity to loop through each table. Drag and drop the "ForEach" activity from the "Iteration & Conditionals" section, connect it after the lookup activity, and set its Items property to @activity('LookupTables').output.value.
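In JSON terms, the ForEach wiring looks roughly like this (a sketch; the activity names and the empty inner activities list are assumptions):

```json
{
  "name": "ForEachTable",
  "type": "ForEach",
  "dependsOn": [
    { "activity": "LookupTables", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "items": {
      "value": "@activity('LookupTables').output.value",
      "type": "Expression"
    },
    "activities": []
  }
}
```

Inside the loop, each row from the lookup is available as @item(), so its columns can be read as @item().TABLE_SCHEMA and so on.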
Step 3: Create a Connection to Azure Blob Storage
Create a connection to the Azure Blob Storage. In Data Factory, go to "Connections" and create a new linked service for Azure Blob Storage. Test the connectivity to ensure it is set up correctly.
Step 4: Create a Dataset for the SQL Table
Create a dataset for the SQL table. Under "Datasets", create a new dataset for Azure SQL Database. Instead of picking a fixed table, supply the schema name and table name as dynamic content in the connection settings, so a single dataset can point at any table in the loop.
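A sketch of how such a parameterized dataset might be defined (the dataset, linked-service, and parameter names SchemaName/TableName are assumptions for illustration):

```json
{
  "name": "SqlTableDynamic",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureSqlDatabase",
      "type": "LinkedServiceReference"
    },
    "type": "AzureSqlTable",
    "parameters": {
      "SchemaName": { "type": "string" },
      "TableName":  { "type": "string" }
    },
    "typeProperties": {
      "schema": { "value": "@dataset().SchemaName", "type": "Expression" },
      "table":  { "value": "@dataset().TableName",  "type": "Expression" }
    }
  }
}
```

Each activity that references this dataset must then supply values for SchemaName and TableName.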
Step 5: Create a Dataset for the Blob Storage File
Create a dataset for the Blob Storage file. Under "Datasets", create a new dataset for Azure Blob Storage. Select the CSV (DelimitedText) format and expose the file name as a parameter.
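A sketch of the sink dataset, again with assumed names (the container 'exports' and parameter FileName are placeholders, not from the original):

```json
{
  "name": "CsvOutput",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorage",
      "type": "LinkedServiceReference"
    },
    "type": "DelimitedText",
    "parameters": {
      "FileName": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "exports",
        "fileName": { "value": "@dataset().FileName", "type": "Expression" }
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```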
Step 6: Add a Copy Activity
Add a copy activity to export each table to a CSV file in the Blob Storage. In the pipeline, edit the "ForEach" activity and add a Copy Data activity inside it. Select the SQL table dataset as the source and the Blob Storage dataset as the sink, and use dynamic content to pass the current table's schema and name from @item() into the source dataset's parameters.
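Putting the pieces together, the copy activity inside the ForEach might look like this (a sketch assuming the lookup query returned TABLE_SCHEMA and TABLE_NAME columns, and assuming parameterized datasets named SqlTableDynamic and CsvOutput with SchemaName/TableName/FileName parameters):

```json
{
  "name": "CopyTableToCsv",
  "type": "Copy",
  "inputs": [
    {
      "referenceName": "SqlTableDynamic",
      "type": "DatasetReference",
      "parameters": {
        "SchemaName": "@item().TABLE_SCHEMA",
        "TableName":  "@item().TABLE_NAME"
      }
    }
  ],
  "outputs": [
    {
      "referenceName": "CsvOutput",
      "type": "DatasetReference",
      "parameters": {
        "FileName": "@concat(item().TABLE_NAME, '.csv')"
      }
    }
  ],
  "typeProperties": {
    "source": { "type": "AzureSqlSource" },
    "sink":   { "type": "DelimitedTextSink" }
  }
}
```

With this in place, running the pipeline produces one CSV file per table returned by the lookup.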