Success: RWD cleaning and harmonization

Motivation

We receive thousands of files a year that must be cleaned and harmonized into a standardized format. Today, we write a custom script for each file. If we continue this way, we’ll burn out both our team and our budget.

Key Results

🖼️ No-code dashboard replaces thousands of lines of custom code
🧹 Automated cleaning and harmonization system delivers consistently high-quality output
📈 System is built to scale with future growth

Challenge

  1. Before a new customer can be onboarded to a client, the customer’s census data must be ingested into the client’s database.
  2. Census files are often messy, requiring extensive cleaning before they can be uploaded.
  3. The client’s data science team was on track to spend one FTE per year writing hundreds of custom cleaning scripts.
  4. Preparing for the January 2023 launches with the existing process would have consumed 100% of the data foundation team’s capacity, blocking any new investments.

Deliverables

  • No-code configuration and testing dashboard. Lets the implementation team define company-specific business logic and test outputs in a UI.
  • Smart cleaning package. An extensible package that automates file cleaning tasks, removing the need for company-specific scripts.
  • Daily cron job to process the latest census files. Deployed, scheduled ETL runs that need no manual triggering or babysitting.
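The deliverables above describe replacing company-specific scripts with configuration. As a minimal sketch, assuming per-company rules can be stored as plain data (the kind of settings a no-code dashboard might save), a single shared engine could apply them as shown here. The config format, column names, the `CLEANERS` registry, and the `clean_file` function are all hypothetical illustrations, not the client's actual package.

```python
# Hypothetical sketch: company-specific cleaning rules live in a config (data),
# and one shared engine applies them, so no per-company script is needed.
from typing import Callable

Row = dict[str, str]

# Hypothetical registry of reusable cleaning steps; the package is extended
# by adding entries here rather than writing new per-file scripts.
CLEANERS: dict[str, Callable[[str], str]] = {
    "strip": str.strip,
    "upper": str.upper,
    "digits_only": lambda s: "".join(ch for ch in s if ch.isdigit()),
}


def clean_file(rows: list[Row], config: dict) -> list[Row]:
    """Map source columns to the standard schema, then apply configured steps.

    Columns not listed in config["column_map"] are dropped.
    """
    column_map: dict[str, str] = config["column_map"]
    steps: dict[str, list[str]] = config.get("steps", {})
    cleaned: list[Row] = []
    for row in rows:
        # Rename source columns to standardized names (harmonization).
        std = {column_map[k]: v for k, v in row.items() if k in column_map}
        # Apply each configured cleaning step in order (cleaning).
        for col, step_names in steps.items():
            if col in std:
                for name in step_names:
                    std[col] = CLEANERS[name](std[col])
        cleaned.append(std)
    return cleaned


# Example config for one hypothetical company's census file.
example_config = {
    "column_map": {"First Name": "first_name", "State": "state", "Phone": "phone"},
    "steps": {
        "first_name": ["strip"],
        "state": ["strip", "upper"],
        "phone": ["digits_only"],
    },
}

example_rows = [
    {"First Name": " Ana ", "State": " ca", "Phone": "(555) 123-4567", "Notes": "x"},
]

if __name__ == "__main__":
    print(clean_file(example_rows, example_config))
```

Under this design, onboarding a new company means writing a new config entry instead of a new script, and new cleaning behavior is added once to the shared registry.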

Business Impact

  • All files for the January 2023 launches were processed with the new pipeline, and 100% passed the new quality-control checks.
  • A single implementation team member configured and monitored cleaning through the no-code interface.