Scheduling SSIS packages to run in Azure
Migrating SSIS Packages in Azure can be a complex task. SSMS 18 comes with new features for helping us to start configuring SSIS packages to run in Azure.
SSMS 18 provides support for scheduling SSIS package, located in SSIS Catalog in Azure or File System, in Azure. There are three entries for launching the New Schedule dialog, New Schedule menu item is shown when right-clicking the SSIS package in SSIS Catalog in Azure, Schedule SSIS Package in Azure menu item under Migrate to Azure menu item under Tools menu item and “Schedule SSIS in Azure” shown when right-clicking Jobs folder under SQL Server agent of Azure SQL Database managed instance.
A new selection menu item has been added in Tools > Migrate to Azure > Configure Azure-enabled DTExec that will invoke SSIS package executions on Azure-SSIS Integration Runtime as Execute SSIS Package activities in ADF pipelines.
There is yet another option in the SSIS catalog menu – Try SSIS in Azure Data Factory.
How to avoid confusion having all those new items in SSMS 18 and where to start if you want to try to run and schedule your SSIS Packages in Azure. Let me try to get these in a better order of usage.
The main purpose of these tools is to force the Lift and Shift approach of migrating and running existing SSIS Packages in Azure. I wouldn’t say that this is the most effective approach of transferring the ETL to Azure, but it could be a good start on a road of a Modern Azure Datawarehouse Architecture. If you have already deployed SSIS packages in Azure SSIS Catalog, then SSMS 18 helps you to put them on schedule very quickly.
Running SSIS Packages in Azure requires provisioning of SSIS Runtime Engine, an Azure Data Factory instance and a SQL Database which hosts the SSIS catalog. Scheduling SSIS Packages in Azure requires creating a data flow pipeline in ADF which has a trigger defined for scheduled execution. While describing all those concepts is far above the scope of this chapter, a short description would be useful.
ADF is the main cloud-based service in Azure for performing data orchestration and operationalization of complex flows that integrate data of different sizes and from many sources, arriving at different intervals and speeds and to help to refine those data and use them for actionable insights. The ADF provides a way to create and manage data-driven workflows for orchestrating data movement and transforming data at scale.
Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
Additionally, you can publish your transformed data to data stores such as Azure SQL Data Warehouse for business intelligence (BI) applications to consume. ADF can help in organize raw data into meaningful data stores and data lakes for better business decisions.
The Integration Runtime (IR) is the computing infrastructure used by Azure Data Factory to provide data integration and data orchestration. IR is used for:
- Executing a data flow in the Azure environment
- Dispatching and monitoring transformation activities provided by different Azure data processing services such as Azure Databricks, Azure HDInsight, Azure Machine Learning, Azure SQL Database, SQL Server, and more
- Copying data between different networks and environments
- Running and natively executing SSIS Packages
To run SSIS Packages in Azure we need to have them deployed in an SSIS catalog just the same way we do in on-premises SQL Server environment. The SSIS catalog implemented as SSIS Database in an on-premises SQL Server, gives a manageable environment for deployment, running, versioning packages, and for troubleshooting their execution. In Azure, the SSIS Catalog is provided as an Azure SQL Database, created automatically by the ADF when we provision SSIS Integration Runtime. Once we have SSIS Catalog we can deploy our SSIS Packages there, and we need to schedule them to execute. Scheduling is not an easy task in Azure, but ADF Pipeline helps in this task. We use pipelines to encapsulate a task like Execute SSIS Package, and we need a trigger that will enforce the running.
After getting a slight overview of the concepts and their dependencies it became clear that running and scheduling packages in Azure are not a trivial task. Let’s see an example of it and how SSMS 18 helps us getting there.
Let’s use a very simple SSIS Package deployed in our SSIS Catalog on-premises, which copies data from a text file in a BLOB container in Azure Storage and transfer those data in a table in Azure SQL Database.
We need to run the package in Azure once a week.
Assuming we have the blob container with the file stored in it and we have the Azure database (2019testdb) already hosted in Azure SQL Server (sql209testsrv.database.windows.net) shown on the figure, what we will further need to achieve our goal is:
- Azure Data Factory Instance
- ADF Integration Runtime for SSIS Packages (SSIS IR)
- SSIS Catalog in Azure to deploy the package to
- Data Pipeline and a trigger into schedule the deployed package to run once a week
SSMS 18 can help us in performing all the above tasks including the deployment of the package itself!
Let’s start this journey in SSMS 18.3.1 by using the new Try SSIS in Azure Data Factory, which we can find in the context menu of Integration Service Catalog of our on-premises SQL Server.
The command runs the Integration Runtime Creation Wizard, which promises to create almost everything we need as prerequisites for reaching our goal of lifting and shifting our package in Azure:
After completing the wizard’s steps, we expect to have ADF, SSIS IR and even SSIS Catalog created, so the only remaining task for us will be to re-deploy the package and to schedule its execution. Sound promising! Let’s continue!
We will use our already provisioned Azure SQL Database Server which hosts our target database – sql2019testsrv.database.windows.net for our target SSIS catalog server:
And indeed, after providing the list of values the Wizard comes at the end with the expected results:
A resource group has been created, probably not in a location where I would like it to be, but you can potentially move it later to another resource group. An ADF Instance has been provisioned and an IR for running SSIS Packages:
We click START on Integration Runtime Setup in Portal where we have been forwarded by the Wizard, and we have IR Up and running.
We need to be careful in keeping the IR running because the billing for the services is performed per hour.
It seems that we have everything provisioned and ready to redeploy our package and then to schedule it to run – the ADF is provisioned, the IR is up and running, and the SSIS catalog is hosted by our SQL Instance in Azure:
Now its time to get back in SSMS and to perform the remaining steps of redeploying our sample package in our new catalog in Azure and to schedule it. We connect to our SSIS Catalog in Azure by using a new connection in SSMS
Note: To be able to connect to an Azure SQL Database the IP address should be added to the server
We create a 2019Sales folder for deploying the project and we start the Integration Services Deployment Wizard by right-clicking on Projects Folder. We specify the source of our Project, which is our on-premises server and catalog location:
Then we are provided with a choice where to deploy our project, and we check the SSIS in Azure Data Factory option since we have everything required for performing this, but note that even if we haven’t had it, we are still provided with a button to run Integration Runtime Wizard from here:
In the Select Destination step we need to provide our SSIS Catalog server name and the folder for the deployment.
The wizard will deploy the project and the package it contains into the new environment. Now its time for performing the final steps – scheduling the package to run once a week. Even for this final step we still use SSMS. We start from just right-clicking on the deployed package and choose Schedule:
Scheduling a package in Azure is not a trivial task since we don’t have SQL Server Agent. To perform a scheduled automated run of a package we need to get the package in an ADF pipeline and to define a trigger. Again, this is something the new version of SSMS can do for us by providing yet another Wizard covering the task.
Once we sign in the Wizard will be able to identify the runtime engine and the package, so we only need to define the schedule in the corresponding tab
Once we click OK the Wizard will create the ADF Pipeline, the execute SSIS Package activity within that pipeline and the trigger for running the pipeline on schedule. After completing the actions, we will be provided with the results description and the option to go directly to the ADF monitor:
We can see our Pipeline in ADF with an Execute SSIS Package activity deployed. We have a trigger for package runs which have been already configured according to the schedule we set in SSMS
We can edit the trigger to change the schedule in the Triggers menu
And we can perform a manual run of the pipeline just to test the package
It seems a lot has been included in the new version of SSMS to easy the process of migration to Azure. All the Wizards save us a lot of time in performing all those small steps to bring the configuration its desired end.