Data Cleaning In Six Steps
The first step in any data cleaning project is to look at the big picture. Ask yourself: What are your goals and expectations?
To achieve the goals you've set, you must next plan a data cleanup strategy. A good guideline is to focus on your top metrics. Some questions to ask:
- What is the most important metric you are looking to improve?
- What is your company's overall goal, and what is each team member looking to achieve from it?
A good way to start is to get the key stakeholders together and brainstorm.
Here are some best practices for creating a data cleaning process:
Introduce Temporary Process Fixes
The existing supplier management process creates the data issues that the prepared data cleansing activities target. Digital transformation will possibly bring tighter data governance, preventing new data issues from appearing, but it will only come into effect at go-live. Approaching that deadline with almost 100 per cent vendor data quality therefore requires a temporary patch to the legacy processes to stop the leaking tap.
Introducing such temporary fixes requires an analysis of the existing process in order to understand precisely how incorrect data enters the system, or how it is corrupted or duplicated during some business-as-usual (BAU) process. It is not uncommon for the supplier evaluation and onboarding process to emerge and evolve with the organisation; therefore, processes are not always formalised well enough for analysis. Datanovel can help to discover, map, and analyse the existing process, evaluate the steps that should be temporarily fixed, and provide tools for minimising the impact on cycle time.
Process fix implementation for a legacy process is a change management challenge which should be avoided when data can be fixed at the ETL process level during data migration. Usually, a process fix is the only solution when the data becomes irreversibly changed, removed or added without a compliant review or verification.
An example
Although the solution from the AP is understandable, it poses several risks to the organisation:
The possible solution:
How To Implement A Data Cleansing Strategy Plan
We've discussed what data cleaning is as well as some of its potential benefits. Are you convinced yet that you need a solid data cleaning strategy plan?
Below, we’ll walk you through the steps for developing a solid execution plan.
When you're creating a data cleaning strategy plan, it's important to look at the big picture as well as your unique situation. What are your goals and expectations? What are your current struggles? How will you execute the plan?
An effective strategy will depend on your unique situation. However, let’s walk through the steps. The data cleansing strategy documentation below is a great starting point.
What Is Data Cleansing
Data cleansing is the process of resolving issues such as duplicate data and making sure that such issues don't occur in the first place. This includes double-checking the system's data quality and addressing all problems: starting with data extraction from SAP and satellite systems, followed by automatic and manual checks in Microsoft Excel or other data visualization programs, and completed with a system update during a business interruption.
It Greatly Improves Your Decision Making Capabilities

This one is a no-brainer, and it's one of the biggest benefits of data cleaning.
Clean, high-quality data supports better analytics and business intelligence. Consequently, it enables better decision making and better execution towards objectives. This is one of the most significant benefits of implementing a sophisticated data cleansing process.
Guide To Data Cleaning: Definition Benefits Components And How To Clean Your Data
When using data, most people agree that your insights and analysis are only as good as the data you are using. Essentially, garbage data in is garbage analysis out. Data cleaning, also referred to as data cleansing and data scrubbing, is one of the most important steps for your organization if you want to create a culture around quality data decision-making.
Why Is Clean Data Important
Business operations and decision-making are increasingly data-driven, as organizations look to use data analytics to help improve business performance and gain competitive advantages over rivals. As a result, clean data is a must for BI and data science teams, business executives, marketing managers, sales reps and operational workers. That’s particularly true in retail, financial services and other data-intensive industries, but it applies to organizations across the board, both large and small.
If data isn’t properly cleansed, customer records and other business data may not be accurate and analytics applications may provide faulty information. That can lead to flawed business decisions, misguided strategies, missed opportunities and operational problems, which ultimately may increase costs and reduce revenue and profits. IBM estimated that data quality issues cost organizations in the U.S. a total of $3.1 trillion in 2016, a figure that’s still widely cited.
Have Confidence In Your Data
You've done it! Your master data has been successfully migrated to SAP S/4HANA, and you can now make a fresh start with a clean database without legacy data.
A well-planned changeover with cleansed data brings several benefits. These include, for example:
- Improved performance of SAP S/4HANA
- Optimized business processes
- Less disk space required after cleanup
- Easy administration of the database
- Lower running costs for space in the main memory
- Fewer resources for the changeover and reduced overall operating costs
Your hard work and advance planning have therefore paid off. To prevent these benefits from fizzling out, however, it is advisable to ensure that data quality remains consistently high. Even if it seems arduous, take time to complete a regular quality check. Have duplicates crept in again? Are any data records incomplete or incorrect? This does not only affect the material master data. Remember that all master data can change regularly, due to relocations or changes in company name, for instance.
The time that you spend on quality control and maintenance of your material master data is therefore well invested if you want to work with it to optimum effect.
Data Profiling Vs Data Cleansing: What's The Key Difference
In a data quality system, data profiling is a powerful way to analyze millions of rows of data to identify errors, missing information, and any anomalies that may affect the quality of information. By profiling data, you get to see all the underlying problems with your data that you would otherwise not be able to see.
Data cleansing is the second step after profiling. Once you identify the flaws within your data, you can take the steps necessary to clean the flaws. For instance, in the profiling phase, you discover that more than 100 of your records have phone numbers that are missing country codes. You can then write a rule within your DQM platform to insert country codes in all phone numbers missing them.
The key difference between the two processes is simple: one checks for errors and the other lets you clean them up.
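The split between the two phases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, the rule "a number has a country code if it starts with +", and the default code "+1" are all hypothetical, and a real DQM platform would express the rule in its own rule language.

```python
# Hypothetical default; the right country code depends on your data.
DEFAULT_COUNTRY_CODE = "+1"

# Hypothetical sample records.
records = [
    {"name": "Acme", "phone": "+44 20 7946 0958"},
    {"name": "Globex", "phone": "202 555 0143"},
    {"name": "Initech", "phone": ""},
]


def has_country_code(phone: str) -> bool:
    """Assumption: a number carries a country code if it starts with '+'."""
    return phone.strip().startswith("+")


def profile(rows):
    """Profiling: count problems without changing any data."""
    missing = sum(1 for r in rows if not r["phone"].strip())
    no_code = sum(
        1 for r in rows
        if r["phone"].strip() and not has_country_code(r["phone"])
    )
    return {"missing_phone": missing, "missing_country_code": no_code}


def cleanse(rows, country_code=DEFAULT_COUNTRY_CODE):
    """Cleansing: apply a rule that prefixes the default country code."""
    cleaned = []
    for r in rows:
        phone = r["phone"].strip()
        if phone and not has_country_code(phone):
            phone = f"{country_code} {phone}"
        cleaned.append({**r, "phone": phone})
    return cleaned


report = profile(records)          # what is wrong?
cleaned_records = cleanse(records)  # fix what the rule can fix
```

Note that the empty phone number is reported by profiling but left untouched by the cleansing rule, since a rule cannot invent missing information.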
Data profiling and data cleansing aren't new concepts. However, they have largely been limited to manual processes within data management systems. For instance, data profiling has traditionally been done by IT and data experts using a combination of formulas and code to identify basic errors. The profiling process alone could take weeks, and even then critical errors would be missed. Data cleansing was another nightmare: it could take months to clean up a database, including removing duplicates. While these methods may have worked for simple data structures, it's next to impossible to apply them to modern data formats.
Get Rid Of Unwanted Observations
The first stage in any data cleaning process is to remove the observations you don't want. This includes irrelevant observations, i.e. those that don't fit the problem you're looking to solve. For instance, if we were running an analysis on vegetarian eating habits, we could remove any meat-related observations from our data set. This step of the process also involves removing duplicate data. Duplicate data commonly occurs when you combine multiple datasets, scrape data online, or receive it from third-party sources.
Remove Duplicate Or Irrelevant Observations
Remove unwanted observations from your dataset, including duplicate and irrelevant observations. Duplicate observations arise most often during data collection. When you combine data sets from multiple places, scrape data, or receive data from clients or multiple departments, there are opportunities to create duplicate data. De-duplication is one of the largest areas to consider in this process. Irrelevant observations are those that do not fit the specific problem you are trying to analyze. For example, if you want to analyze data regarding millennial customers, but your dataset includes older generations, you might remove those irrelevant observations. This can make analysis more efficient and minimize distraction from your primary target, as well as creating a more manageable and more performant dataset.
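As a minimal sketch of both removals, the snippet below de-duplicates a small customer list and then filters it down to millennial customers. The rows, the choice of email as the duplicate key, and the birth-year range used for "millennial" (1981 to 1996) are assumptions made for illustration.

```python
# Assumption: "millennial" means born 1981-1996 inclusive.
MILLENNIAL_YEARS = range(1981, 1997)

# Hypothetical customer rows; id 3 duplicates id 1.
customers = [
    {"id": 1, "email": "ana@example.com", "birth_year": 1990},
    {"id": 2, "email": "ben@example.com", "birth_year": 1962},
    {"id": 3, "email": "ana@example.com", "birth_year": 1990},
    {"id": 4, "email": "cy@example.com", "birth_year": 1985},
]


def drop_duplicates(rows, key):
    """Keep the first row seen for each value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            unique.append(row)
    return unique


def keep_relevant(rows):
    """Drop observations that do not fit the question being analyzed."""
    return [r for r in rows if r["birth_year"] in MILLENNIAL_YEARS]


deduplicated = drop_duplicates(customers, key="email")
relevant = keep_relevant(deduplicated)
```

De-duplicating before filtering keeps the two concerns separate, so the same de-duplicated set can be reused for a different analysis question.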
How Do You Work With The SAP Data Migration Cockpit
The SAP S/4HANA Migration Cockpit is a browser-based, automated migration tool that is embedded in and delivered with the S/4HANA system.
The tool comes preconfigured with content and mapping for migration objects covering all major operating areas, such as product, customer, cost center, profit center, GL and so on. Each of these migration objects has a corresponding, predefined template that allows source data to be mapped automatically to target structures.
The migration cockpit lets you download a predefined Excel sheet, fill it in with legacy data, and then upload it to the S/4HANA system.
The tool has some limitations: it has no built-in data cleansing or data extraction, and only the preset templates can be used.
Key Benefits Of Data Cleaning

As we've covered, data analysis requires effectively cleaned data to produce accurate and trustworthy insights. But clean data has a range of other benefits, too:
Key to data cleaning is the concept of data quality. Data quality measures the objective and subjective suitability of any dataset for its intended purpose. A number of characteristics affect the quality of data, including accuracy, completeness, consistency, timeliness, validity, and uniqueness.
Reevaluate Your Processes And Techniques
Expect to periodically reevaluate the data cleansing processes that you run in 2021. As you acquire other businesses, add new data systems, and redesign your services, your data needs will change.
You will want to repeat steps one through four so that your existing data is cleaned to fit your new goals, while keeping the data cleansing practices that already work well. At this point, update your strategy and get your data governance team involved.
Is There A Real Problem With My Data
Simple validation of the supplier master data is an excellent starting point for checking the data quality.
To carry out such an assessment, Datanovel offers a 'magic button': a self-serve Excel tool that takes your vendor file and validates the data, calculating the percentage of lines that are ready for migration. A line counts as ready when its format and structure are correct, which means the data will fit into the new system.
However, validation will not reveal every issue. Therefore, whenever the tool shows an overall readiness rate below 95 per cent, it is recommended that a complete health assessment of the vendor file is conducted by verifying all essential information against official databases. Datanovel is here to help you with such a health check, and you pay no fee until you request another report.
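The idea of a readiness rate can be illustrated with a short Python sketch. This is not Datanovel's tool and does not reproduce its checks: the required fields and sample vendor rows are hypothetical, and only the 95 per cent threshold comes from the text above.

```python
# Hypothetical set of fields that must be filled for a line to be "ready".
REQUIRED_FIELDS = ("name", "tax_number", "country")
# Threshold in per cent, taken from the text.
READINESS_THRESHOLD = 95.0


def is_ready(row):
    """A line is ready when every required field is present and non-blank."""
    return all(str(row.get(field, "")).strip() for field in REQUIRED_FIELDS)


def readiness_rate(rows):
    """Percentage of lines that pass validation (0.0 for an empty file)."""
    if not rows:
        return 0.0
    ready = sum(1 for row in rows if is_ready(row))
    return 100.0 * ready / len(rows)


# Hypothetical vendor file: the third line has a blank tax number.
vendors = [
    {"name": "Alpha Ltd", "tax_number": "7540335340", "country": "PL"},
    {"name": "Beta GmbH", "tax_number": "DE123456789", "country": "DE"},
    {"name": "Gamma SA", "tax_number": " ", "country": "FR"},
    {"name": "Delta BV", "tax_number": "NL001234567B01", "country": "NL"},
]

rate = readiness_rate(vendors)
needs_health_check = rate < READINESS_THRESHOLD
```

A format check like this only says whether a line will load into the new system; it cannot say whether the values are true, which is why the fuller health check against official sources is still needed.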
Data Hygiene: How Often To Do It
The hygiene of our clients' and potential clients' business databases is not a topic that we can leave alone. No good manager should assume that employees across departments have never made, and will never make, mistakes when entering new data, or that everyone will adhere to uniform recording standards. To err is human, so time simply has to be set aside for data cleansing in the enterprise. Either it should be performed by a properly trained employee, or the task should be outsourced to an external company specializing in this subject, preferably one that holds an ISO/IEC 27001 information security certificate.
How often should data hygiene be carried out in the company? It depends on the size of the database. Medium and large enterprises with a large number of records should repeat data cleaning every 3-6 months. For smaller companies, it is enough to do data hygiene about once a year.
Type Conversion And Syntax Errors
Once you've tackled other inconsistencies, the content of your spreadsheet or dataset might look good to go. However, you need to check that everything is in order behind the scenes, too. Type conversion refers to making sure each value is stored as the right category of data. A simple example is that plain numbers are stored as numerical data, whereas currency amounts use a currency type. You should ensure that numbers are stored as numerical data, text as text, dates as date objects, and so on. In case you missed any part of step two, you should also remove syntax errors and stray white space.
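A minimal Python sketch of this step is shown below. The column names, the sample rows and the ISO date format (YYYY-MM-DD) are assumptions for illustration; `Decimal` is used for the price simply because it avoids floating-point rounding on currency amounts.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical rows as they might arrive from a spreadsheet export:
# everything is text, some of it padded with white space.
raw_rows = [
    {"quantity": " 12 ", "price": "19.99", "order_date": "2021-03-05", "city": "  Berlin "},
    {"quantity": "7", "price": " 5.00", "order_date": "2021-11-30", "city": "Paris"},
]


def convert_row(row):
    """Strip white space and store each value as an appropriate type."""
    return {
        "quantity": int(row["quantity"].strip()),    # number -> int
        "price": Decimal(row["price"].strip()),      # currency -> Decimal
        # Assumption: dates are written as YYYY-MM-DD.
        "order_date": datetime.strptime(row["order_date"].strip(), "%Y-%m-%d").date(),
        "city": row["city"].strip(),                 # text stays text
    }


typed_rows = [convert_row(row) for row in raw_rows]
```

A value that cannot be converted (for example "twelve" in the quantity column) raises a `ValueError`, which is a useful signal that the row needs manual attention rather than silent coercion.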
Data Cleansing Step : Filling Missing Data Vs Erasing Incomplete Data
The next step in database hygiene is dealing with incomplete data. Anyone who works with data even a little knows that information, in addition to being reliable and up to date, should also be complete. Incomplete data contaminates the database, lowering its business value. As an example, let's take a database of B2B contractors' addresses, which are saved in the CRM in the following format: voivodship, commune, postal code, city and street.
Let's assume that we want to have only complete company addresses in our system, i.e. complete data sets. We can approach this topic in two ways:
Of course, we decide to clean the database the second way. To make this task easier and to perform it professionally, it is necessary to define some repeatable and exhaustive rules that will be applied to this data set in turn. They take the following form:
After applying the above set of rules, our cleaned database of company addresses looks like this:
What Kind Of Data Errors Does Data Scrubbing Fix
Data cleansing addresses a range of errors and issues in data sets, including inaccurate, invalid, incompatible and corrupt data. Some of those problems are caused by human error during the data entry process, while others result from the use of different data structures, formats and terminology in separate systems throughout an organization.
The types of issues that are commonly fixed as part of data cleansing projects include the following:
Data Cleansing Step : Cleaning Up Duplicates
After standardizing the data format, the next step in data cleaning is to check whether our database contains duplicates that could not be detected earlier because they were saved in different formats. After conducting such an analysis, we discover that our original database contained two records with the same tax number: 7540335340 and 754 033 53 40. Our table, after removing duplicates, looks as follows:
The above example is limited to finding duplicates by the values in one column. In practice, however, a unique record is often defined by a combination of values stored in different columns. For example, you can search for duplicated people by first name and last name, and in this case use two separate columns: one for the first name and the other for the last name.
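Both cases can be sketched in Python with one generic helper. The two tax-number spellings come from the text; the company names, the third tax number and the people records are hypothetical, and the normalization rule (keep digits only, compare names case-insensitively) is an assumption about what counts as "the same" record.

```python
import re


def normalize_tax_number(value):
    """Keep digits only, so '754 033 53 40' compares equal to '7540335340'."""
    return re.sub(r"\D", "", value)


def deduplicate(rows, key_func):
    """Keep the first row for each key returned by `key_func`."""
    seen = set()
    unique = []
    for row in rows:
        key = key_func(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Single-column duplicates hidden by a different save format.
companies = [
    {"company": "Alpha", "tax_number": "7540335340"},
    {"company": "Alpha", "tax_number": "754 033 53 40"},
    {"company": "Beta", "tax_number": "123-456-32-18"},
]
unique_companies = deduplicate(
    companies, lambda r: normalize_tax_number(r["tax_number"])
)

# Duplicates defined by a combination of two columns.
people = [
    {"first_name": "Jan", "last_name": "Kowalski"},
    {"first_name": "jan ", "last_name": "KOWALSKI"},
    {"first_name": "Jan", "last_name": "Nowak"},
]
unique_people = deduplicate(
    people,
    lambda r: (r["first_name"].strip().lower(), r["last_name"].strip().lower()),
)
```

Matching on name alone will merge different people who share a name, so in practice the key usually includes another column such as a date of birth or an email address.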
Build A Data Map For The Target Supplier Profile And Create An Instruction

The system's standard supplier profile is not a data map. The typical supplier profile in an ERP/S2C/P2P system provides hundreds of fields in which to store any type of supplier information, with the primary purpose of being flexible enough to fit any kind of process in any sector. In practice, only a small number of fields is typically needed to run a specific business successfully. The designer of the target data model ought to understand the company, its existing practices, and how these will improve after the transformation.
The target data map must explicitly indicate which field holds which data element and which fields are not used. Some fields should be used for the supplier's information, whereas others should be used internally during supplier onboarding and evaluation processes. The latter include risk level evaluation, approval process, pay priority, tax classification and others. Such business information should also be specified, although its cleansing is usually performed with input from various business functions, designed logic, and standard ETL tools. It is unnecessary to contact the supplier and enter business data manually into the system; therefore, cleansing of business information is beyond the scope of this article.
The target data map must be compatible with both the legacy system's supplier profile and the target system's supplier profile. Where field naming differs, the connection between the two must be specified.
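One simple way to make such a map explicit is a lookup table from legacy field names to target field names, with unused fields marked as such. The sketch below is an illustration only: every field name in it is hypothetical and does not correspond to any particular ERP system.

```python
# Hypothetical data map: legacy field name -> target field name.
# None marks a legacy field that is deliberately not used in the target.
DATA_MAP = {
    "VENDOR_NAME": "supplier_name",
    "VAT_NO": "tax_number",
    "PAY_PRIO": "pay_priority",
    "FAX": None,
}


def apply_data_map(legacy_row, data_map):
    """Rename legacy fields to target fields and drop the unused ones.

    Raises KeyError if the legacy row has a field the map does not cover,
    so that gaps in the map are found before migration, not after.
    """
    unmapped = [field for field in legacy_row if field not in data_map]
    if unmapped:
        raise KeyError(f"Fields missing from the data map: {unmapped}")
    return {
        target: legacy_row[source]
        for source, target in data_map.items()
        if target is not None and source in legacy_row
    }


legacy_supplier = {
    "VENDOR_NAME": "Alpha Ltd",
    "VAT_NO": "7540335340",
    "PAY_PRIO": "high",
    "FAX": "000-000",
}
target_supplier = apply_data_map(legacy_supplier, DATA_MAP)
```

Listing unused fields explicitly, rather than just omitting them, documents the decision and lets the function tell "intentionally dropped" apart from "forgotten".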
Making SAP Data Migration A Breeze: 10 Concepts To Ensure Success
- auritasmarketing
SAP ERP is a leader in business applications. This software contains the functions that a business needs to thrive in a digital world. If you want to switch your system to SAP, you'll first have to undergo a data migration.
SAP data migration isn't for the faint of heart: the process typically lasts eight months, and there is a range of methodologies for ensuring your data is transferred smoothly. This is why many customers have questions about SAP data migration services.
Here are 10 commonly asked SAP data migration questions.