Solution 1: Tools to discover existing RWD assets

This is part of an ongoing blog series on onboarding to real-world data (RWD). See Q1 2023 Plinth Blog Series | Onboarding to RWD for an overview!

Help your teams discover new RWD assets

Screenshot of CKAN

If you work in a clinical research organization, you likely have a wealth of data available to you. This data can come from a variety of sources, including electronic health records, claims data, and clinical trials. However, simply having this data is not enough. To truly maximize its value, you need to make sure it is discoverable and usable for your organization's analysts and scientists.

Data that isn’t easily discovered will never be used

In our experience, organizations often over-focus on, and overspend on, acquiring access to additional data assets before fully understanding the wealth of data already available to them.

Create a Centralized Data Catalog to maximize discoverability

The first step in leveraging your pre-existing clinical data assets is to make sure they are discoverable. This means that analysts and scientists should be able to easily find the appropriate data they need, without spending hours searching through different databases and systems.

One way to maximize discoverability is to create a centralized data catalog. This catalog should describe every data asset available within your organization, recording each asset's source, format, and any other relevant metadata.

By having a centralized catalog, analysts and scientists can quickly and easily find the data they need, without having to navigate through multiple systems and databases.
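To make the idea concrete, here is a minimal Python sketch of what a catalog entry and a keyword search over it might look like. It is an in-memory illustration under simple assumptions, not how CKAN or Apache Atlas actually store records; the class names `CatalogEntry` and `DataCatalog` and the example assets are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data asset in a hypothetical in-memory catalog."""
    name: str
    source: str        # e.g. "EHR", "claims", "clinical trial"
    data_format: str   # e.g. "CSV", "Parquet"
    metadata: dict[str, str] = field(default_factory=dict)

    def searchable_text(self) -> str:
        # Concatenate every descriptive field so one keyword search covers them all.
        parts = [self.name, self.source, self.data_format]
        parts += [f"{key} {value}" for key, value in self.metadata.items()]
        return " ".join(parts).lower()


class DataCatalog:
    """A single, hypothetical place to register and look up data assets."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Re-registering the same name replaces the earlier description.
        self._entries[entry.name] = entry

    def search(self, query: str) -> list[CatalogEntry]:
        # Case-insensitive substring match: every whitespace-separated term must appear.
        terms = query.lower().split()
        return [
            entry for entry in self._entries.values()
            if all(term in entry.searchable_text() for term in terms)
        ]


if __name__ == "__main__":
    catalog = DataCatalog()
    catalog.register(CatalogEntry("ehr_oncology_2022", "EHR", "Parquet",
                                  {"therapeutic_area": "oncology"}))
    catalog.register(CatalogEntry("claims_commercial", "claims", "CSV",
                                  {"coverage": "2018-2022"}))
    print([entry.name for entry in catalog.search("oncology ehr")])
```

Running the script prints `['ehr_oncology_2022']`. A real catalog tool adds what this sketch leaves out: persistence, access control, a web UI, and richer search.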

There are several open-source solutions for building a centralized data catalog; popular options include CKAN and Apache Atlas.

For a video introduction to CKAN, check out the video led by Steven De Costa, Co-Steward of CKAN.
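Once datasets are registered in CKAN, analysts can also search them programmatically through CKAN's Action API, whose `package_search` action accepts a Solr-style `q` query and a `rows` limit. The sketch below uses only the Python standard library; the instance URL `https://ckan.example.org` and the helper function names are hypothetical, and the response handling assumes the usual `{"success": ..., "result": {"results": [...]}}` envelope.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a package_search URL for a CKAN instance (hypothetical helper)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def parse_package_search(payload: dict) -> list[tuple[str, str]]:
    """Extract (name, title) pairs for each dataset in a package_search response."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN request failed: {payload.get('error')}")
    return [(pkg["name"], pkg.get("title", "")) for pkg in payload["result"]["results"]]


def search_ckan(base_url: str, query: str, rows: int = 10) -> list[tuple[str, str]]:
    """Query a live CKAN instance; needs network access, and HTTP errors raise urllib.error.HTTPError."""
    with urlopen(package_search_url(base_url, query, rows), timeout=30) as resp:
        return parse_package_search(json.load(resp))
```

For example, `search_ckan("https://ckan.example.org", "claims")` would list the matching datasets on that (hypothetical) instance. Splitting URL building and parsing from the network call keeps both easy to test without a server.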

For more tips on how to create and manage a centralized data catalog, check out this introduction to using Apache Atlas.
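Apache Atlas exposes a similar capability through its v2 REST API. The following minimal sketch assumes an Atlas server reachable at a hypothetical `http://localhost:21000` with HTTP basic authentication, and uses the basic-search endpoint `/api/atlas/v2/search/basic` with `query`, `typeName`, and `limit` parameters. Check the REST documentation for your Atlas version before relying on it. The helper names are hypothetical, and the type name `hive_table` is only an example.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def basic_search_request(base_url: str, query: str, type_name: str,
                         user: str, password: str, limit: int = 25) -> Request:
    """Build an authenticated Atlas basic-search request (hypothetical helper)."""
    params = urlencode({"query": query, "typeName": type_name, "limit": limit})
    url = f"{base_url.rstrip('/')}/api/atlas/v2/search/basic?{params}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}",
                                 "Accept": "application/json"})


def parse_basic_search(payload: dict) -> list[tuple[str, str, str]]:
    """Return (guid, typeName, displayText) for each matching entity."""
    # Assumption: the "entities" key may be absent when nothing matches.
    return [(e["guid"], e["typeName"], e.get("displayText", ""))
            for e in payload.get("entities", [])]


def search_atlas(base_url: str, query: str, type_name: str,
                 user: str, password: str) -> list[tuple[str, str, str]]:
    """Run the search against a live Atlas server (requires network access)."""
    req = basic_search_request(base_url, query, type_name, user, password)
    with urlopen(req, timeout=30) as resp:
        return parse_basic_search(json.load(resp))
```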


By maximizing discoverability and minimizing onboarding effort, you can effectively leverage your pre-existing clinical data assets. This not only helps you get more value from your data, but it also makes it easier for analysts and scientists to do their work, ultimately leading to better insights and outcomes.

Want more?

Check out the head post for our Q2 2023 blog campaign!