Onboarding to Real-world Data Assets: Hurdles and Solutions

We are excited to kick off Plinth’s first blog series!

Unlike our success stories, which focus on specific collaborations with clients, this series will share our high-level insights at the intersection of analytics and real-world data (RWD), and we hope it will start new discussions in the community.

The theme of our first campaign, for Q1 2023, is “Onboarding to new RWD: Hurdles and Solutions.”

Why did we choose “Onboarding” as our first theme?

When we sat down as a team to choose our first theme, we started by asking ourselves: “What are the most common problems that users of RWD face that are both high in magnitude and under-discussed?”

As we started throwing ideas on a Miro board, we quickly saw a recurring theme: Onboarding.

What do we mean by getting onboarded? Essentially, we mean setting up the knowledge and technical processes that enable teams to actually start getting business value from RWD.

One metric we use to measure how quickly a team gets initial value from RWD is “days-to-first-insight”: the number of days that pass between receiving an RWD asset and sharing a trustworthy, actionable, business-relevant insight from that data with key stakeholders.

“Days-to-first-insight” = Number of days from receiving a new RWD asset to sharing actionable insights with key stakeholders

In our experience as both providers and users of RWD, the shorter a team’s days-to-first-insight, the more likely the team is to keep investing in RWD and realize its value. Conversely, the longer a team’s days-to-first-insight, the more likely the team is to abandon the asset.

Concretely, we have found that first-time RWD users whose days-to-first-insight is under 90 days are highly likely to embrace the asset and continue investing in it. By contrast, when the onboarding process for a new RWD asset pushes a team’s days-to-first-insight beyond 90 days, the team is likely to abandon the asset.

Days-to-first-insight values greater than 90 days often lead to RWD abandonment.
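For teams that want to track this metric, here is a minimal Python sketch that computes days-to-first-insight from two dates and flags timelines beyond the 90-day rule of thumb. The function names and example dates are hypothetical, and treating exactly 90 days as "not at risk" is our own assumption, since the rule of thumb only describes values below and above 90.

```python
from datetime import date

# Rule-of-thumb threshold: beyond ~90 days, abandonment becomes likely.
ABANDONMENT_RISK_DAYS = 90

def days_to_first_insight(received: date, first_insight_shared: date) -> int:
    """Days from receiving a new RWD asset to sharing the first actionable insight."""
    if first_insight_shared < received:
        raise ValueError("insight date cannot precede the asset receipt date")
    return (first_insight_shared - received).days

def at_risk_of_abandonment(days: int, threshold: int = ABANDONMENT_RISK_DAYS) -> bool:
    """Flag onboarding timelines that exceed the 90-day rule of thumb."""
    return days > threshold

# Hypothetical example: asset received Jan 9, first insight shared Mar 20, 2023.
d = days_to_first_insight(date(2023, 1, 9), date(2023, 3, 20))
print(d, at_risk_of_abandonment(d))  # prints "70 False"
```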

What are the common hurdles in the onboarding process that delay teams’ days-to-first-insight from real-world data assets? Below, we share our top 5.

5 Hurdles for Teams Onboarding to Real-World Data Assets

🧗🏾‍♀️1: Finding the data

If you work in a clinical research organization, you likely have a wealth of data available to you. This data can come from a variety of sources, including electronic health records, claims data, and clinical trials. However, simply having this data is not enough. To truly maximize its value, you need to make sure it is discoverable and usable for your organization's analysts and scientists.


In our experience, organizations often over-focus on, and overspend on, acquiring access to additional data assets before fully understanding the wealth of data already available to them.
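One lightweight way to start making existing assets discoverable is a searchable inventory. The sketch below is a minimal, hypothetical in-memory catalog: the DataAsset fields, the asset names, and the keyword search are illustrative assumptions, not a specific catalog product. It lets analysts find assets by name, source type, description, or tag.

```python
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    name: str
    source_type: str  # e.g. "EHR", "claims", "clinical trial"
    description: str
    tags: set[str] = field(default_factory=set)

def search_catalog(catalog: list[DataAsset], keyword: str) -> list[str]:
    """Return names of assets whose name, source type, description or tags mention the keyword."""
    kw = keyword.lower()
    hits = []
    for asset in catalog:
        # Concatenate all searchable text and do a case-insensitive substring match.
        haystack = " ".join([asset.name, asset.source_type, asset.description, *asset.tags]).lower()
        if kw in haystack:
            hits.append(asset.name)
    return hits

# Hypothetical inventory of assets an organization already holds.
catalog = [
    DataAsset("oncology_ehr_2022", "EHR", "Structured EHR extract for oncology patients", {"oncology", "labs"}),
    DataAsset("commercial_claims", "claims", "Pharmacy and medical claims", {"pharmacy"}),
    DataAsset("phase3_trial_xyz", "clinical trial", "Completed phase 3 trial data", {"oncology"}),
]
print(search_catalog(catalog, "oncology"))  # ['oncology_ehr_2022', 'phase3_trial_xyz']
```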


🧗🏾‍♀️2: Understanding the data model

Understanding the data dictionary and how data are distributed and related across tables can be a daunting task. A well-structured and documented data model will enable users to quickly isolate the necessary tables and fields to support an analysis. Conversely, a poorly structured or poorly documented data model will lead to confusion and frustration.
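A documented data model can also be made queryable. As an illustration, the following Python sketch encodes hypothetical foreign-key relationships between tables (the table names and join keys are invented for this example) and uses breadth-first search to find the shortest chain of joins between two tables.

```python
from collections import deque

# Hypothetical data model: (child_table, parent_table, join_key) foreign-key edges.
RELATIONSHIPS = [
    ("encounter", "patient", "patient_id"),
    ("diagnosis", "encounter", "encounter_id"),
    ("medication", "encounter", "encounter_id"),
    ("lab_result", "patient", "patient_id"),
]

def build_graph(relationships):
    """Undirected adjacency list: table -> list of (neighbor_table, join_key)."""
    graph = {}
    for child, parent, key in relationships:
        graph.setdefault(child, []).append((parent, key))
        graph.setdefault(parent, []).append((child, key))
    return graph

def join_path(graph, start, goal):
    """Breadth-first search for the shortest chain of joins between two tables."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for neighbor, key in graph.get(table, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(table, neighbor, key)]))
    return None  # the tables are not connected in the documented model

graph = build_graph(RELATIONSHIPS)
for step in join_path(graph, "diagnosis", "lab_result"):
    print(step)
```

With these example relationships, linking diagnoses to lab results requires going through encounter and patient, which is exactly the kind of path a well-documented model should make obvious.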


🧗🏾‍♀️3: Detecting and understanding data quality issues (quickly)

Real-world data is messy and complex. Even the highest-quality real-world data will often present data quality issues for specific use cases, such as substantial missingness or unexpected values. Some missingness may be entirely consistent with how the data were collected and easy to account for; other missingness may indicate serious data quality issues that fundamentally jeopardize the researcher’s ability to answer their research question (see for a commentary on characterizing real-world data relevance and quality). When RWD quality issues are not identified, understood, and addressed quickly, they can waste countless hours of effort.
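A first pass at detecting these issues can be automated. The minimal sketch below assumes records arrive as Python dictionaries with hypothetical field names and plausible ranges chosen only for illustration. It reports the fraction of missing values per field and flags values outside a plausible range. It only surfaces issues; deciding whether a given missingness rate is consistent with how the data were collected still requires domain judgment.

```python
# Hypothetical patient-level records; None or "" marks a missing value.
records = [
    {"patient_id": "P1", "birth_year": 1956, "weight_kg": 82.0, "smoking_status": "former"},
    {"patient_id": "P2", "birth_year": 1971, "weight_kg": None, "smoking_status": ""},
    {"patient_id": "P3", "birth_year": 1890, "weight_kg": 640.0, "smoking_status": None},
    {"patient_id": "P4", "birth_year": 1988, "weight_kg": 71.5, "smoking_status": "never"},
]

def is_missing(value):
    """Treat None and empty strings as missing (an assumption; sources encode gaps differently)."""
    return value is None or value == ""

def missingness(records, fields):
    """Fraction of records in which each field is missing."""
    n = len(records)
    return {f: (sum(is_missing(r.get(f)) for r in records) / n if n else 0.0) for f in fields}

def out_of_range(records, field, low, high):
    """Indices of records whose non-missing value lies outside [low, high]."""
    return [i for i, r in enumerate(records)
            if not is_missing(r.get(field)) and not (low <= r[field] <= high)]

print(missingness(records, ["birth_year", "weight_kg", "smoking_status"]))
# Illustrative plausibility ranges, not clinical standards.
print(out_of_range(records, "birth_year", 1900, 2023))
print(out_of_range(records, "weight_kg", 1, 400))
```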



🧗🏾‍♀️4: Generating analysis-ready data

Analysis-ready data (ARD) are datasets in a "tidy" format, with each row representing a patient and each column representing patient-level information relevant to a specific analysis use case. Creating ARDs involves summarizing events (for example, reducing tables with multiple rows per patient to single values), joining tables, deriving variables (for example, defining index dates and calculating follow-up time from index to outcomes), and filtering results, all while preserving data provenance and metadata.
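To make these steps concrete, here is a minimal Python sketch that builds a one-row-per-patient ARD from hypothetical patient, diagnosis, and outcome tables. The index-date rule (earliest qualifying diagnosis), the follow-up definition (index to first outcome, censored at the data cut-off), and all table and field names are illustrative assumptions, not a prescribed method.

```python
from datetime import date

# Hypothetical source tables; diagnoses has multiple rows per patient.
patients = [
    {"patient_id": "P1", "sex": "F"},
    {"patient_id": "P2", "sex": "M"},
    {"patient_id": "P3", "sex": "F"},
]
diagnoses = [
    {"patient_id": "P1", "code": "C50", "date": date(2021, 5, 10)},
    {"patient_id": "P1", "code": "C50", "date": date(2021, 3, 1)},
    {"patient_id": "P2", "code": "C50", "date": date(2022, 1, 15)},
    {"patient_id": "P3", "code": "E11", "date": date(2020, 7, 4)},
]
outcomes = [
    {"patient_id": "P1", "event": "progression", "date": date(2022, 2, 1)},
]
DATA_CUTOFF = date(2022, 12, 31)

def build_ard(patients, diagnoses, outcomes, index_code, cutoff):
    """Return one row per patient with an index date, outcome flag, and follow-up days."""
    # Summarize events: earliest qualifying diagnosis per patient becomes the index date.
    index_dates = {}
    for dx in diagnoses:
        if dx["code"] == index_code:
            pid = dx["patient_id"]
            index_dates[pid] = min(index_dates.get(pid, dx["date"]), dx["date"])

    ard = []
    for p in patients:  # join patient attributes to the summarized events
        pid = p["patient_id"]
        index = index_dates.get(pid)
        if index is None:
            continue  # filter: patient never meets the index definition
        # Outcome dates on or after index and no later than the data cut-off.
        after_index = [o["date"] for o in outcomes
                       if o["patient_id"] == pid and index <= o["date"] <= cutoff]
        end = min(after_index) if after_index else cutoff  # censor at cut-off
        ard.append({
            "patient_id": pid,
            "sex": p["sex"],
            "index_date": index,
            "outcome": bool(after_index),
            "followup_days": (end - index).days,  # derived variable
        })
    return ard

for row in build_ard(patients, diagnoses, outcomes, "C50", DATA_CUTOFF):
    print(row)
```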



🧗🏾‍♀️5: Capturing metadata and data provenance

Every analytic report generated from RWD should contain key metadata, such as data cut-off and capture dates, along with data provenance: information about when and how the data were processed or filtered on the way from the raw data to the final analysis dataset. When metadata and data provenance are not captured and surfaced correctly, users may face seemingly conflicting results that actually stem from differences in how the data were processed.
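One simple way to capture provenance is to record every filtering step as it is applied. The sketch below is a hypothetical helper (the ProvenanceLog class, its methods, and the source name are our own illustration, not an existing library) that logs the data source, the data cut-off date, and row counts before and after each filter, so the summary can be attached to a report.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ProvenanceLog:
    source: str
    data_cutoff: date
    steps: list = field(default_factory=list)

    def apply_filter(self, rows, description, predicate):
        """Filter rows and record the step with before/after row counts."""
        kept = [r for r in rows if predicate(r)]
        self.steps.append({"step": description, "rows_in": len(rows), "rows_out": len(kept)})
        return kept

    def summary(self):
        """Human-readable provenance summary suitable for a report appendix."""
        lines = [f"Source: {self.source} (data cut-off {self.data_cutoff.isoformat()})"]
        for i, s in enumerate(self.steps, start=1):
            lines.append(f"{i}. {s['step']} (rows: {s['rows_in']} -> {s['rows_out']})")
        return "\n".join(lines)

# Hypothetical patient rows and filters.
rows = [
    {"patient_id": "P1", "age": 67, "has_dx": True},
    {"patient_id": "P2", "age": 17, "has_dx": True},
    {"patient_id": "P3", "age": 45, "has_dx": False},
]
log = ProvenanceLog("hypothetical_ehr_extract_v3", date(2022, 12, 31))
adults = log.apply_filter(rows, "Age >= 18", lambda r: r["age"] >= 18)
cohort = log.apply_filter(adults, "Has qualifying diagnosis", lambda r: r["has_dx"])
print(log.summary())
```

Because each step records its own row counts, two reports that disagree can be compared step by step to see where their processing diverged.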


Solutions to these Hurdles

Over the next couple of weeks, we will reveal a few of our favorite tools and techniques for overcoming these hurdles. We hope that providing users with RWD “starter kits” will give them a much better chance of onboarding quickly onto new datasets and reducing their days-to-first-insight from months to days.

Solution 1: Tools to discover existing RWD assets

Solution 2: Entity Relationship Diagrams

Solution 3: Interactive Data Dictionaries

Solution 4: Mock Data

Solution 5: Use-case templates and analysis recipes

Solution 6: Functional code and packages
