Log Pipeline Executions to File in Azure Data Factory

Why Azure Data Factory?

Nowadays, the data our customers' applications generate is growing exponentially, especially when it comes from several different products. Organizations have data of many types, located in the cloud and on-premises, in structured, semi-structured, and unstructured formats, all arriving at different frequencies and speeds. Analyzing and storing all of this data is a critical task. Azure Data Factory (ADF) is a cloud-based data integration service built for exactly these complex scenarios.

ADF first lands the data in data lake storage. Once stored, the data is analyzed and then, with the help of pipelines, transformed and organized for publishing. Once published, the data can be visualized with applications like Power BI or Tableau.

For a more in-depth look at ADF and its basic functions, please check out my colleague’s blog post here.

Log Pipeline Executions to File in Azure Data Factory

Data integration solutions are complex, with many moving parts, and one of the main things our customers want is the ability to monitor their data integration workflows, or pipelines. However, Data Factory Monitor only retains pipeline run data for 45 days, and with very limited detail. By logging pipeline executions, we can store custom log data in Azure Data Lake Storage (ADLS) for as long as we need, with the help of a query.

How to Create a CSV Log File in Azure Data Lake Storage

For demonstration purposes, I have already created a pipeline with a copy tables activity that copies data from one folder to another within an ADLS container.

Now we will see how a copy data activity can generate custom logs in a .csv file. We will begin by adding a copy data activity after copy-tables on the canvas, connected to it so that it runs once copy-tables has completed and its output is available.

Copy Tables

For the source dataset, since we need to define a query in the source of the copy data activity, I will select an on-premises SQL Server dataset by choosing the on-premises SQL Server linked service.

New Dataset

After creating the source dataset, I will add a query to the source. This query will contain pipeline system variables and other metrics that I can retrieve for each individual activity.

Copy Data

Below is the current list of pipeline system variables, followed by an example of how they are referenced in a query.

@pipeline().DataFactory – Name of the data factory

@pipeline().Pipeline – Name of the pipeline

@pipeline().RunId – ID of the pipeline run

@pipeline().TriggerType – Type of the trigger that invoked the pipeline (Manual, Scheduled)

@pipeline().TriggerName – Name of the trigger that invoked the pipeline

@pipeline().TriggerTime – Time when the trigger invoked the pipeline
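
In dynamic content, these variables are referenced with ADF's string-interpolation syntax, @{...}. As a minimal sketch (the column alias RunId_Value is just an illustrative name), a source query such as the following would return the current run's ID as a literal value:

SELECT '@{pipeline().RunId}' as RunId_Value

Data Factory evaluates each @{...} token and substitutes its value before the statement is sent to SQL Server, so the quotes inside the token belong to the expression language, while the outer quotes are what SQL Server actually receives. The full logging query below uses the same pattern for every column.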

Dynamic Content

Query –

SELECT '@{pipeline().DataFactory}' as DataFactory_Name,
'@{pipeline().Pipeline}' as Pipeline_Name,
'@{activity('copytables').output.executionDetails[0].source.type}' as Source_Type,
'@{activity('copytables').output.executionDetails[0].sink.type}' as Sink_Type,
'@{activity('copytables').output.executionDetails[0].status}' as Execution_Status,
'@{activity('copytables').output.rowsRead}' as RowsRead,
'@{activity('copytables').output.rowsCopied}' as RowsWritten,
'@{activity('copytables').output.copyDuration}' as CopyDurationInSecs,
'@{activity('copytables').output.executionDetails[0].start}' as CopyActivity_Start_Time,
'@{utcnow()}' as CopyActivity_End_Time,
'@{pipeline().RunId}' as RunId,
'@{pipeline().TriggerType}' as TriggerType,
'@{pipeline().TriggerName}' as TriggerName,
'@{pipeline().TriggerTime}' as TriggerTime

The above query writes the event information to a .csv file. For that, we must define a sink dataset that will create a directory and the CSV log file in the ADLS container. The snapshot below shows that I have selected Azure Data Lake Storage Gen2 as the dataset type and CSV as the file format.

Data Lake

I have used the parameterized path below to make sure the log file is generated in the correct folder structure with the proper file name.

Dynamic Content in Filename –

@concat(formatDateTime(convertTimeZone(utcnow(),'UTC','Central Standard Time'),'dd'),'/',item().filename,'_',formatDateTime(convertTimeZone(utcnow(),'UTC','Central Standard Time'),'dd-MM-yy_hh-mm-ss'),'_log')
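
To make the expression concrete, here is how it would resolve for a hypothetical run in which item().filename is 'customers' and utcnow() returns 2022-03-08T14:25:30Z (08:25:30 AM Central Standard Time); the actual folder and file name will depend on the item being copied and the run time:

08/customers_08-03-22_08-25-30_log

The leading day segment groups log files into a folder per day of the month, and the timestamp suffix keeps the file name unique for every run.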

With this, we are done configuring the log pipeline. After saving the pipeline, I publish it and trigger a run. Now I can see the log file generated in the ADLS container.

Authentication Method

After downloading the file, we can see that, as per our query, all of the output is populated in the .csv file.

Output .csv File
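
Based on the column aliases in the query, and assuming the sink dataset is configured to write the first row as a header, the generated log file should start with a header row like the one below, followed by one data row per execution of the logging activity:

DataFactory_Name,Pipeline_Name,Source_Type,Sink_Type,Execution_Status,RowsRead,RowsWritten,CopyDurationInSecs,CopyActivity_Start_Time,CopyActivity_End_Time,RunId,TriggerType,TriggerName,TriggerTime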

In this way, we have configured the log pipeline with the help of a copy activity in ADF. The main advantage of this approach is that we capture the output of custom events as data in a .csv file, which helps a customer check the execution status, rows read, rows written, and so on.

Why Perficient?

Our more than 20 years of data experience across industries gives us a deep understanding of current data trends. As an award-winning, Gold-Certified Microsoft partner and one of just a handful of National Solution Providers, we are a recognized cloud expert with years of experience helping enterprises make the most out of the Microsoft cloud.

Ready to bring your data together to take advantage of advanced analytics with Azure? Contact our team about this solution.

Rohit Dhande, Senior Technical Consultant

Rohit Dhande is a Senior Technical Consultant at Perficient with over 6 years of experience in cloud and on-premises solutions. He has hands-on experience in DevOps, ETL tools, data warehousing, application configuration, source code management, patch management, and in building, automating, managing, and releasing code across different environments and deploying it to servers. He is a Certified ScrumMaster® and holds global certifications in Red Hat and Microsoft Azure. Rohit is a fun-loving, creative problem solver who enjoys working with his team to create exceptional outcomes.
