(also data review)
Data profiling definition
Data profiling is the process of reviewing and analyzing data from various sources to understand its structure, content, and quality. It evaluates the condition of the data by assessing its completeness, accuracy, timeliness, consistency, and accessibility. Profiling helps companies better understand the data they own and make better data-informed decisions.
How data profiling works
- Data profiling involves data collection, analysis, cleaning, enrichment, and documentation.
- It can be performed manually using spreadsheets or automatically using specially designed software tools.
- The first step is collecting the data to be analyzed from various sources (e.g., databases, files, and data warehouses).
- The next step is analyzing the data (e.g., its values, patterns, and relationships).
- Any incorrect, inconsistent, or incomplete data that is identified is removed or replaced.
- The data is then enriched by adding supplementary information or filling in incomplete information.
- The analysis results are documented in a data profiling report, which includes information on the data’s structure, content, and quality.
- Finally, the results are validated to ensure they are complete and accurate.
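
The steps above can be illustrated with a minimal sketch in Python that uses only the standard library. It is not how any particular profiling tool works. The sample data, the set of values treated as missing, and the function names (collect, infer_type, profile, clean) are all assumptions made for illustration. Enrichment is left out because it depends on an external source of additional information.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for a real source (database, file, warehouse).
SAMPLE_CSV = """id,name,age,country
1,Alice,34,US
2,Bob,,DE
3,,29,US
4,Dana,abc,FR
"""

# Assumption: these strings count as "missing" in this data set.
MISSING = {"", "NULL", "N/A"}


def collect(text):
    """Collect rows from a source (here, CSV text held in memory)."""
    return list(csv.DictReader(io.StringIO(text)))


def infer_type(value):
    """Classify a single non-missing value as int, float, or str."""
    for caster, label in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return label
        except ValueError:
            continue
    return "str"


def profile(rows):
    """Analyze each column: completeness, distinct values, and value types."""
    report = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v.strip() not in MISSING]
        report[col] = {
            "completeness": len(present) / len(values),
            "distinct": len(set(present)),
            "types": dict(Counter(infer_type(v) for v in present)),
        }
    return report


def clean(rows, column, expected_type):
    """Drop rows whose value in `column` is missing or of the wrong type."""
    return [
        row for row in rows
        if row[column].strip() not in MISSING
        and infer_type(row[column]) == expected_type
    ]


rows = collect(SAMPLE_CSV)          # collection
report = profile(rows)              # analysis
cleaned = clean(rows, "age", "int") # cleaning (removal only)

# Documentation: print a simple per-column profiling report.
for col, stats in report.items():
    print(f"{col}: {stats}")
print(f"rows kept after cleaning 'age': {len(cleaned)} of {len(rows)}")

# Validation: re-profile the cleaned data and confirm 'age' is now complete.
validated = profile(cleaned)["age"]["completeness"] == 1.0
print(f"'age' fully populated after cleaning: {validated}")
```

Here the cleaning step only removes bad rows. Replacing values instead (for example with a default or a looked-up value) would be an equally valid choice, depending on how much data the analysis can afford to lose.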