Integration runtime in Azure Data Factory
APPLIES TO: Azure Data Factory, Azure Synapse Analytics
The Integration Runtime (IR) is the compute infrastructure used by Azure Data Factory and Azure Synapse pipelines to provide the following data integration capabilities across different network environments:
In Data Factory and Synapse pipelines, an activity defines the action to be performed. A linked service defines a target data store or a compute service. An integration runtime provides the bridge between activities and linked services. It's referenced by the linked service or activity, and provides the compute environment where the activity is either run directly or dispatched. This allows the activity to be performed in the closest possible region to the target data store or compute service to maximize performance while also allowing flexibility to meet security and compliance requirements.
Integration runtimes can be created in the Azure Data Factory and Azure Synapse UI via the management hub directly, as well as from any activities, datasets, or data flows that reference them.
Integration runtime types
Data Factory offers three types of Integration Runtime (IR), and you should choose the type that best serves your data integration capabilities and network environment requirements. The three types of IR are:
Note: Synapse pipelines currently support only Azure and self-hosted integration runtimes.
The following table describes the capabilities and network support for each of the integration runtime types:
Note: Outbound controls vary by service for Azure IR. In Synapse, workspaces have options to limit outbound traffic from the managed virtual network when using Azure IR. In Data Factory, all ports are opened for outbound communications when using Azure IR. Azure-SSIS IR can be integrated with your virtual network to provide outbound communications controls.
Azure integration runtime
An Azure integration runtime can:
Azure IR network environment
Azure Integration Runtime supports connecting to data stores and compute services with publicly accessible endpoints. When Managed Virtual Network is enabled, Azure Integration Runtime also supports connecting to data stores through the private link service in a private network environment. In Synapse, workspaces have options to limit outbound traffic from the IR managed virtual network. In Data Factory, all ports are opened for outbound communications. The Azure-SSIS IR can be integrated with your virtual network to provide outbound communications controls.
Azure IR compute resource and scaling
Azure integration runtime provides fully managed, serverless compute in Azure. You don't have to worry about infrastructure provisioning, software installation, patching, or capacity scaling. In addition, you pay only for the duration of actual utilization.
Azure integration runtime provides the native compute to move data between cloud data stores in a secure, reliable, and high-performance manner. You can set how many data integration units to use on the copy activity, and the compute size of the Azure IR is elastically scaled up accordingly without requiring you to explicitly adjust the size of the Azure Integration Runtime.
Activity dispatch is a lightweight operation that routes the activity to the target compute service, so there's no need to scale up the compute size for this scenario.
For information about creating and configuring an Azure IR, see How to create and configure Azure Integration Runtime.
Note: Azure Integration Runtime has properties related to the Data Flow runtime, which define the underlying compute infrastructure used to run data flows.
Self-hosted integration runtime
A self-hosted IR is capable of:
Note: Use a self-hosted integration runtime to support data stores that require a bring-your-own driver, such as SAP HANA and MySQL. For more information, see supported data stores.
Note: The Java Runtime Environment (JRE) is a dependency of the self-hosted IR. Make sure the JRE is installed on the same host.
Self-hosted IR network environment
If you want to perform data integration securely in a private network environment that doesn't have a direct line of sight from the public cloud environment, you can install a self-hosted IR in your on-premises environment behind a firewall, or inside a virtual private network. The self-hosted integration runtime makes only outbound HTTP-based connections to the internet.
Self-hosted IR compute resource and scaling
Install a self-hosted IR on an on-premises machine or a virtual machine inside a private network. Currently, the self-hosted IR is supported only on a Windows operating system.
Azure-SSIS integration runtime
To lift and shift an existing SSIS workload, you can create an Azure-SSIS IR to natively execute SSIS packages.
Azure-SSIS IR network environment
The Azure-SSIS IR can be provisioned in either a public network or a private network. On-premises data access is supported by joining the Azure-SSIS IR to a virtual network that is connected to your on-premises network.
Azure-SSIS IR compute resource and scaling
The Azure-SSIS IR is a fully managed cluster of Azure VMs dedicated to running your SSIS packages. You can bring your own Azure SQL Database or SQL Managed Instance for the catalog of SSIS projects/packages (SSISDB). You can scale up the compute power by specifying the node size, and scale it out by specifying the number of nodes in the cluster. You can manage the cost of running your Azure-SSIS Integration Runtime by stopping and starting it as your requirements demand.
For more information, see How to create and configure the Azure-SSIS IR. Once it's created, you can deploy and manage your existing SSIS packages with little to no change using familiar tools such as SQL Server Data Tools (SSDT) and SQL Server Management Studio (SSMS), just like using SSIS on-premises.
For more information about the Azure-SSIS runtime, see the following articles:
Integration runtime location
Relationship between factory location and IR location
When you create an instance of Data Factory or a Synapse Workspace, you need to specify its location. The metadata for the instance is stored here, and triggering of the pipeline is initiated from here. Metadata is stored only in the chosen region and is not stored in other regions.
Meanwhile, a pipeline can access data stores and compute services in other Azure regions to move data between data stores or process data using compute services. This behavior is realized through the globally available IR to ensure data compliance, efficiency, and reduced network egress costs.
The IR location defines the location of its back-end compute, and where the data movement, activity dispatching, and SSIS package execution are performed. The IR location can be different from the location of the Data Factory it belongs to.
Azure IR location
You can set the location region of an Azure IR, in which case the activity execution or dispatch happens in the selected region.
The default is to auto-resolve the Azure IR in the public network. With this option:
If you enable Managed Virtual Network with auto-resolve for the Azure IR, the IR in the Data Factory or Synapse Workspace region is used.
You can monitor which IR location takes effect during activity execution in the pipeline activity monitoring view in Data Factory Studio or Synapse Studio, or in the activity monitoring payload.
Self-hosted IR location
The self-hosted IR is logically registered to the Data Factory or Synapse Workspace, and you provide the compute that supports its functionality. Therefore, there is no explicit location property for a self-hosted IR.
When used to perform data movement, the self-hosted IR extracts data from the source and writes it into the destination.
Azure-SSIS IR location
Note: Azure-SSIS integration runtimes are not currently supported in Synapse pipelines.
Selecting the right location for your Azure-SSIS IR is essential to achieve high performance in your extract-transform-load (ETL) workflows.
The following diagram shows the location settings for Data Factory and its integration runtimes:
Determining which IR to use
If an activity is associated with more than one type of integration runtime, it resolves to one of them. The self-hosted integration runtime takes precedence over the Azure integration runtime in Azure Data Factory or Synapse Workspace instances that use a managed virtual network. The latter, in turn, takes precedence over the global Azure integration runtime.
For example, suppose a copy activity copies data from a source to a sink. If the global Azure integration runtime is associated with the source linked service and an Azure integration runtime in an Azure Data Factory managed virtual network is associated with the sink linked service, both the source and sink linked services use the Azure integration runtime in the managed virtual network. But if a self-hosted integration runtime is associated with the source linked service, both the source and sink linked services use the self-hosted integration runtime.
Copy activity
The Copy activity requires both source and sink linked services to define the direction of data flow. The following logic is used to determine which integration runtime instance is used to perform the copy:
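The precedence rule stated under "Determining which IR to use" can be sketched in a few lines of Python. This is only an illustrative model, not part of any Data Factory SDK: the names IRKind and resolve_ir are hypothetical, and the sketch covers only the three-level precedence described above (self-hosted, then Azure IR in a managed virtual network, then global Azure IR).

```python
from enum import IntEnum


class IRKind(IntEnum):
    """Hypothetical IR kinds, ordered so that a higher value takes precedence."""
    GLOBAL_AZURE = 0        # global Azure IR (public network)
    MANAGED_VNET_AZURE = 1  # Azure IR in a managed virtual network
    SELF_HOSTED = 2         # self-hosted IR


def resolve_ir(source: IRKind, sink: IRKind) -> IRKind:
    """Return the IR kind that both source and sink linked services end up using."""
    # The kind with the highest precedence wins for the whole activity.
    return max(source, sink)


if __name__ == "__main__":
    # Global Azure IR on the source, managed virtual network Azure IR on the sink.
    print(resolve_ir(IRKind.GLOBAL_AZURE, IRKind.MANAGED_VNET_AZURE).name)
    # A self-hosted IR on the source overrides the managed virtual network IR.
    print(resolve_ir(IRKind.SELF_HOSTED, IRKind.MANAGED_VNET_AZURE).name)
```

Encoding the precedence as an ordered enum keeps the rule in one place, so the resolution itself reduces to taking the maximum of the two associated kinds.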
Lookup and GetMetadata activity
The Lookup and GetMetadata activities are executed on the integration runtime associated with the data store linked service.
External transformation activity
Each external transformation activity that uses an external compute engine has a target compute linked service, which points to an integration runtime. This IR instance determines the location from which that external hand-coded transformation activity is dispatched.
Data Flow activity
Data Flow activities are executed on their associated Azure integration runtime. The Spark compute used by Data Flows is determined by the data flow properties in your Azure IR, and is fully managed by the service.
Integration Runtime in CI/CD
Integration runtimes don't change often and are similar across all stages in your CI/CD. Data Factory requires you to have the same name and type of integration runtime across all stages of CI/CD. If you want to share integration runtimes across all stages, consider using a dedicated factory just to contain the shared integration runtimes. You can then use this shared factory in all of your environments as a linked integration runtime type.
Next steps
See the following articles: