Data Profiling Task and Viewer

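  The Data Profiling task provides data profiling functionality inside the process of extracting, transforming, and loading data. By using the task, you can:

  • Analyze the source data more effectively.
  • Understand the source data better.
  • Prevent data quality problems before they are introduced into the data warehouse.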

 Data quality is important to every business. As enterprises build analytical and business intelligence systems on top of their transactional systems, the reliability of key performance indicators and of data mining predictions depends completely on the validity of the underlying data. While the importance of valid data for business decision-making is increasing, the challenge of making sure of this data’s validity is also increasing. Data streams into the enterprise from many systems and sources, and from large numbers of users.

 Metrics for data quality can be difficult to define because they are specific to the domain or the application. One common approach to defining data quality is data profiling.

A data profile is a collection of aggregate statistics about data, which might include the following:

  • The number of rows in the Customer table.
  • The number of distinct values in the State column.
  • The number of null or missing values in the Zip column.
  • The distribution of values in the City column.
  • The strength of the functional dependency of the State column on the Zip column. That is, the state should always be the same for a given zip value.

In this way, a data profile provides the information needed to reduce quality issues in the source data.
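As a rough illustration of the statistics listed above, the following Python sketch computes them over a handful of in-memory rows. It is not how the Data Profiling task works internally. The sample rows, the function name profile_customers, and the definition of dependency strength used here (the share of rows with a non-null Zip whose State matches the most common State for that Zip) are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows standing in for the Customer table.
CUSTOMERS = [
    {"City": "Seattle", "State": "WA", "Zip": "98101"},
    {"City": "Seattle", "State": "WA", "Zip": "98101"},
    {"City": "Portland", "State": "OR", "Zip": "97201"},
    {"City": "Portland", "State": "WA", "Zip": "97201"},  # conflicts with the row above
    {"City": "Spokane", "State": "WA", "Zip": None},      # missing Zip
]


def profile_customers(rows):
    """Compute a few aggregate statistics over a list of row dicts."""
    # Zip -> count of each State seen with that Zip (null Zips are skipped).
    states_by_zip = defaultdict(Counter)
    for row in rows:
        if row["Zip"] is not None:
            states_by_zip[row["Zip"]][row["State"]] += 1

    # Rows that agree with the most common State for their Zip.
    supporting = sum(c.most_common(1)[0][1] for c in states_by_zip.values())
    # All rows that have a Zip at all.
    keyed = sum(sum(c.values()) for c in states_by_zip.values())

    return {
        "row_count": len(rows),
        "distinct_states": len({r["State"] for r in rows}),
        "null_zip_count": sum(1 for r in rows if r["Zip"] is None),
        "city_distribution": dict(Counter(r["City"] for r in rows)),
        # Assumed definition: 1.0 means State is fully determined by Zip.
        "zip_to_state_strength": supporting / keyed if keyed else 1.0,
    }


PROFILE = profile_customers(CUSTOMERS)
```

A strength below 1.0, as the conflicting Portland rows produce here, is the kind of signal that points to a data quality problem worth investigating before loading.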

    Integration Services and Data Profiling

    In Integration Services, the data profiling process consists of the following steps:

    Step 1: Setting up the Data Profiling Task

      Use the Data Profiling task to configure which profiles to compute. Then run the package that contains the Data Profiling task to compute the profiles. The task saves its output in XML format to a file or a package variable.

    Step 2: Reviewing the Profiles that the Data Profiling Task Computes

      To view the data profiles that the Data Profiling task computes, send the output to a file and open it in the Data Profile Viewer. This viewer is a stand-alone utility that displays the profile output in both summary and detail format.

    Addition of Conditional Logic to the Data Profiling Workflow

     The Data Profiling task does not have built-in features that let you use conditional logic to connect downstream tasks based on the profile output. However, you can add this logic, with a small amount of programming, in a Script task. For example, the Script task can perform an XPath query against the output file of the Data Profiling task. The query can determine the percentage of null values in a particular column. If the percentage exceeds a chosen threshold, you can interrupt the package and resolve the problem.
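The threshold check described above can be sketched in Python as follows. An actual Script task is written in a .NET language, so this only illustrates the logic. The embedded XML is a hypothetical, trimmed-down stand-in for the task's output: its namespace, element names, and attributes are assumptions and should be checked against a real output file. The function names and the 10% threshold are also invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified profile output. The namespace and element names
# are assumptions; compare them with a real output file before relying on them.
SAMPLE_OUTPUT = """\
<DataProfile xmlns="http://schemas.microsoft.com/sqlserver/2008/DataDebugger/">
  <DataProfileOutput>
    <Profiles>
      <ColumnNullRatioProfile>
        <Table Table="Customer" RowCount="200"/>
        <Column Name="Zip"/>
        <NullCount>30</NullCount>
      </ColumnNullRatioProfile>
    </Profiles>
  </DataProfileOutput>
</DataProfile>
"""

NS = {"p": "http://schemas.microsoft.com/sqlserver/2008/DataDebugger/"}


def null_ratio(xml_text, column_name):
    """Return the fraction of null values recorded for column_name."""
    root = ET.fromstring(xml_text)
    # ElementTree supports only a subset of XPath, so the column filter
    # is applied in Python rather than in a nested predicate.
    for profile in root.iterfind(".//p:ColumnNullRatioProfile", NS):
        column = profile.find("p:Column", NS)
        if column is not None and column.get("Name") == column_name:
            null_count = int(profile.findtext("p:NullCount", namespaces=NS))
            row_count = int(profile.find("p:Table", NS).get("RowCount"))
            return null_count / row_count if row_count else 0.0
    raise KeyError(f"no null ratio profile for column {column_name!r}")


def should_interrupt(xml_text, column_name, threshold=0.10):
    """True when the null ratio exceeds the (assumed) threshold."""
    return null_ratio(xml_text, column_name) > threshold


ZIP_RATIO = null_ratio(SAMPLE_OUTPUT, "Zip")
STOP_PACKAGE = should_interrupt(SAMPLE_OUTPUT, "Zip")
```

In a package, the boolean result would typically be written to a package variable so that a precedence constraint can decide whether downstream tasks run.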

