The OpenEDC Assistant

The OpenEDC Assistant is a privacy-centric AI support system where every suggestion must be reviewed and approved by a human.

Leonard Greulich

Co-Founder / CEO

Human Oversight and Privacy

1. Characteristics of a Medical AI Assistant

Medical research, and clinical trials in particular, is a highly sensitive and heavily regulated field in which every data operation requires the utmost care. For example, Good Clinical Practice (GCP) requires that computer systems used in a study be validated. In addition, an audit trail must log every change and make it traceable.

From a completely different perspective, today’s AI systems offer enormous potential to make repetitive processes more efficient and to reduce the risk of human error. Anyone who has used ChatGPT or Gemini is familiar with the impressive capabilities of this seemingly all-knowing technology.

However, AI systems have certain characteristics that, at first glance, seem to rule out their use in clinical trials. For example, they regularly hallucinate, producing answers that sound plausible but are false. In addition, potentially highly sensitive data often travels to international cloud providers, where it is processed and stored in non-transparent ways.

At OpenEDC Health, we’ve developed a medical AI assistant that makes the benefits of AI accessible to medical research while mitigating its risks as far as possible, in a traceable and transparent way.

1.1. Human Oversight

Although hallucinations in Large Language Models (LLMs) can probably never be completely prevented, human control mechanisms can be built in. In OpenEDC, the AI assistant can never change forms or clinical data on its own. Instead, it makes suggestions that a human must review and accept before they reach the database.

We’ve kept this manual review step as simple as possible: the system visually compares the current state of the database with the new suggestion, so that changes are immediately visible and easy to assess.
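How such a comparison could work at the data level can be sketched in a few lines of Python. This is a minimal illustration, not OpenEDC’s implementation: it assumes that a record is a plain dictionary mapping item names to values, and that a suggestion only lists the items it wants to set. The function name `diff_suggestion` and the sample items are hypothetical.

```python
from typing import Any


def diff_suggestion(current: dict[str, Any], suggestion: dict[str, Any]) -> list[tuple[str, Any, Any]]:
    """List (item, current value, suggested value) for every item the suggestion would change.

    Items missing from the suggestion are left untouched, so a suggestion
    can never silently delete data.
    """
    changes = []
    for item in sorted(suggestion):
        old = current.get(item)
        new = suggestion[item]
        if old != new:
            changes.append((item, old, new))
    return changes


# Hypothetical example: the assistant proposes values for three items.
current = {"body_height": 178, "body_weight": None, "smoker": "no"}
suggestion = {"body_height": 178, "body_weight": 82, "smoker": "yes"}

for item, old, new in diff_suggestion(current, suggestion):
    print(f"{item}: {old!r} -> {new!r}")
```

Only `body_weight` and `smoker` are reported; the unchanged `body_height` is not, which keeps the reviewer’s attention on what would actually change.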

Accepting a suggestion does not bypass the audit trail: the change is logged just like any other. Accepted changes can therefore be reversed later and remain traceable at all times. This also ensures that responsibility stays with a human.
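The properties described here (every accepted change is logged, can be reversed, and is tied to a responsible person) can be illustrated with a small append-only audit trail. The class and field names below are hypothetical, and the sketch makes no claim about how OpenEDC actually stores its audit trail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AuditEntry:
    item: str
    old_value: Any
    new_value: Any
    accepted_by: str  # the human responsible for the change
    source: str       # hypothetical labels: "assistant", "manual", "revert"
    timestamp: datetime


@dataclass
class Record:
    values: dict[str, Any] = field(default_factory=dict)
    audit_trail: list[AuditEntry] = field(default_factory=list)

    def apply(self, item: str, new_value: Any, accepted_by: str, source: str) -> None:
        """Apply one human-accepted change and append it to the audit trail."""
        if not accepted_by:
            raise ValueError("every change needs a responsible human")
        entry = AuditEntry(item, self.values.get(item), new_value,
                           accepted_by, source, datetime.now(timezone.utc))
        self.audit_trail.append(entry)
        self.values[item] = new_value

    def revert(self, index: int, reverted_by: str) -> None:
        """Undo an earlier change by logging a new, compensating entry.

        Existing entries are never modified or deleted, so the history stays complete.
        """
        entry = self.audit_trail[index]
        self.apply(entry.item, entry.old_value, reverted_by, source="revert")
```

For example, accepting an assistant suggestion with `record.apply("body_weight", 82, accepted_by="j.doe", source="assistant")` and later calling `record.revert(0, reverted_by="j.doe")` restores the previous value while leaving two entries in the trail: reversal is itself a logged change rather than an erasure.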

1.2. Privacy-Oriented

Another major problem with today’s AI systems is a lack of transparency about data privacy. Information is often transferred to international cloud providers, where it is processed and stored in ways that are hard to follow. With highly sensitive patient data from medical research, this is particularly dangerous and not an option for us.

Instead, we work together with Telekom: like all of our databases, the AI models are operated in local data centers by a certified European provider. Data is never logged, stored, or used for training.

We also offer the option to integrate local AI models on your own computer. This allows structured, pseudonymized information to be extracted from medical letters, for example, without the source data leaving the current computer or network. It’s also possible for clinics and companies to use their own, potentially specialized LLMs.
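A minimal sketch of what “structured, pseudonymized information” could mean in practice: direct identifiers are dropped, and the patient reference is replaced by a keyed hash (HMAC-SHA256) before anything is saved. The identifier list, the field names, and the choice of HMAC are assumptions made for illustration; the article does not describe OpenEDC’s pseudonymization method.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address", "insurance_number"}


def pseudonymize(extracted: dict[str, str], patient_ref: str, secret: bytes) -> dict[str, str]:
    """Drop direct identifiers and replace the patient reference with a keyed pseudonym.

    The same patient_ref and secret always yield the same pseudonym, but without
    the secret the pseudonym cannot be recomputed from the patient reference.
    """
    digest = hmac.new(secret, patient_ref.encode("utf-8"), hashlib.sha256).hexdigest()
    result = {key: value for key, value in extracted.items() if key not in DIRECT_IDENTIFIERS}
    result["pseudonym"] = digest[:16]  # shortened for readability
    return result


# Hypothetical fields extracted from a medical letter on the local machine.
letter_fields = {"name": "Jane Doe", "date_of_birth": "1970-01-01",
                 "diagnosis": "I10", "systolic_bp": "150"}
print(pseudonymize(letter_fields, patient_ref="clinic-4711", secret=b"keep-this-key-local"))
```

Because the key stays on the local machine, the stored pseudonym allows records of the same patient to be linked without exposing who the patient is.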

Create Forms, Extract Data, Proofread Documents

2. The Capabilities of the OpenEDC Assistant

Being traceable and secure is not enough: an AI system also has to be genuinely helpful to be used. An AI assistant isn’t an end in itself (as is often assumed); it only provides value when it makes repetitive and time-consuming processes more efficient or improves data quality.

Currently, we’re focusing on three use cases that we’d like to briefly introduce below. We’ve created videos for the first two use cases, with the first video linked at the beginning of this article and the second video following below.

2.1. Creating and Editing Forms

There are many Electronic Data Capture (EDC) systems, and each works differently. We’ve learned that a major pain point is creating high-quality electronic Case Report Forms (eCRFs), i.e. forms. We’ve therefore extended our form editor with AI functionality to simplify the repetitive work of creating and editing forms.

Below we’ll briefly introduce four typical use cases.

2.1.1. Creating Forms

One of the primary use cases, as shown in the video above, is creating new eCRFs. Thanks to the multimodal capabilities of the OpenEDC Assistant, even PDFs or scanned documents can be converted directly into digital forms in the CDISC ODM standard. This simplifies the conversion of self-created or standardized forms and saves time. Every draft from the assistant can be fully customized and extended after it has been transferred.
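To make the target format more concrete, the following sketch builds a minimal ODM 1.3 metadata fragment (a `FormDef` referencing one `ItemGroupDef` with its `ItemDef`s) using Python’s standard `xml.etree.ElementTree`. It only illustrates the structure a form draft might take; the OIDs and the `build_form` helper are hypothetical, and the fragment is not a complete, schema-validated ODM file.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", ODM_NS)  # serialize ODM elements without a prefix


def q(tag: str) -> str:
    """Qualify a tag name with the ODM namespace."""
    return f"{{{ODM_NS}}}{tag}"


def build_form(form_oid: str, form_name: str, group_oid: str,
               items: list[tuple[str, str, str, str]]) -> ET.Element:
    """Build a MetaDataVersion with one form, one item group, and one ItemDef per item.

    Each item is (OID, variable name, ODM data type, English question text).
    """
    mdv = ET.Element(q("MetaDataVersion"), OID="MDV.DRAFT", Name="Draft")
    form = ET.SubElement(mdv, q("FormDef"), OID=form_oid, Name=form_name, Repeating="No")
    ET.SubElement(form, q("ItemGroupRef"), ItemGroupOID=group_oid, Mandatory="Yes")
    group = ET.SubElement(mdv, q("ItemGroupDef"), OID=group_oid, Name=form_name, Repeating="No")
    for oid, name, data_type, question in items:
        ET.SubElement(group, q("ItemRef"), ItemOID=oid, Mandatory="No")
        item_def = ET.SubElement(mdv, q("ItemDef"), OID=oid, Name=name, DataType=data_type)
        text = ET.SubElement(ET.SubElement(item_def, q("Question")), q("TranslatedText"))
        text.set(XML_LANG, "en")
        text.text = question
    return mdv


draft = build_form("F.VITALS", "Vital signs", "IG.VITALS", [
    ("IT.BODY_HEIGHT", "body_height", "float", "Body height (cm)"),
    ("IT.BODY_WEIGHT", "body_weight", "float", "Body weight (kg)"),
])
print(ET.tostring(draft, encoding="unicode"))
```

Keeping the definitions (`ItemDef`) separate from their references (`ItemRef`) is how ODM lets the same item be reused in several groups; a draft in this shape can then be edited field by field before it is accepted.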

2.1.2. Extending Forms

Extending existing eCRFs can also be simplified with the assistant. If a new section needs to be added, for example, it’s enough to briefly describe the fields in natural language and the assistant will create an initial draft of the new section.
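Because the assistant’s output is only a draft, its structure can be checked before a human even sees it. The sketch below assumes, purely for illustration, that a new section comes back as JSON with a `fields` list whose entries carry a `question` and a `dataType`. That JSON shape and the `check_draft` helper are hypothetical; the allowed types are a subset of the ODM 1.3 data types.

```python
import json

# A subset of the ODM 1.3 data types, chosen for this sketch.
ALLOWED_TYPES = {"text", "string", "integer", "float", "date", "time", "datetime", "boolean"}


def check_draft(raw: str) -> list[str]:
    """Return the problems found in a draft section; an empty list means it can go to review."""
    try:
        draft = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    fields = draft.get("fields") if isinstance(draft, dict) else None
    if not isinstance(fields, list) or not fields:
        return ["draft contains no fields"]
    problems = []
    for index, field in enumerate(fields):
        if not isinstance(field, dict) or not field.get("question"):
            problems.append(f"field {index}: missing question text")
            continue
        if field.get("dataType") not in ALLOWED_TYPES:
            problems.append(f"field {index}: unsupported data type {field.get('dataType')!r}")
    return problems


# Hypothetical draft for a new "blood pressure" section.
draft = ('{"fields": [{"question": "Systolic blood pressure (mmHg)", "dataType": "integer"},'
         ' {"question": "Measured on", "dataType": "day"}]}')
print(check_draft(draft))
```

A check like this does not replace human review; it only filters out drafts that could not be turned into a valid form in the first place.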

2.1.3. Assigning Variable Names

Anyone who has ever created comprehensive eCRFs for clinical trials knows how repetitive assigning variable names can be. Often, short names such as “body_height” for the body height field are assigned according to predefined patterns. The OpenEDC Assistant can help here too: the entire form can first be created quickly and easily without variable names. The task of naming can then be handed over to the assistant, which processes the entire form at once while following the specified patterns.
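Whether suggested names actually follow the agreed pattern can be checked mechanically before they are accepted. The sketch below assumes a lowercase snake_case convention (as in “body_height”); the pattern, the helper names, and the naive rule-based fallback `to_variable_name` are illustrative assumptions, not the assistant’s own logic.

```python
import re
import unicodedata

# Hypothetical house style: lowercase snake_case, starting with a letter.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def to_variable_name(label: str) -> str:
    """Naive rule-based fallback: 'Heart rate (bpm)' -> 'heart_rate_bpm'."""
    ascii_label = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    return "_".join(re.findall(r"[a-z0-9]+", ascii_label.lower()))


def check_names(names: list[str]) -> list[str]:
    """Report names that break the pattern or are used more than once in a form."""
    problems = []
    seen: set[str] = set()
    for name in names:
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"does not match pattern: {name}")
        if name in seen:
            problems.append(f"duplicate name: {name}")
        seen.add(name)
    return problems


print(to_variable_name("Body height"))
print(check_names(["body_height", "body_height", "2nd_dose"]))
```

Running such a check over the assistant’s suggestions lets the reviewer focus on whether a name is meaningful rather than on whether it is well-formed and unique.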

2.1.4. Adding Translations

Also shown in the video above is translating a form into additional languages. The assistant natively supports translating an existing form into another language in just a few seconds. The translated form can, of course, be edited afterward to incorporate small corrections or preferred terminology.
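In ODM, translations live side by side: a `Question` can hold several `TranslatedText` elements, distinguished by their `xml:lang` attribute. The sketch below adds or replaces one language with `xml.etree.ElementTree`; the `add_translation` helper is hypothetical, and the ODM namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def add_translation(question: ET.Element, lang: str, text: str) -> None:
    """Add a TranslatedText for `lang`, or overwrite it if that language already exists."""
    for existing in question.findall("TranslatedText"):
        if existing.get(XML_LANG) == lang:
            existing.text = text
            return
    translated = ET.SubElement(question, "TranslatedText", {XML_LANG: lang})
    translated.text = text


question = ET.fromstring(
    '<Question><TranslatedText xml:lang="en">Body height (cm)</TranslatedText></Question>'
)
add_translation(question, "de", "Körpergröße (cm)")
print(ET.tostring(question, encoding="unicode"))
```

Overwriting an existing language instead of appending a second entry keeps exactly one text per language, so later terminology corrections replace the earlier wording.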

2.2. Extracting Clinical Data from Unstructured Input

Transferring unstructured or semi-structured data into a database has historically been one of the core tasks of EDC systems. Source data such as lab reports or manually completed forms were digitized, stored in a structured way, and then analyzed statistically. Today, data is increasingly collected digitally, but manual transfer is still widespread.

The OpenEDC Assistant can speed up such tasks. PDFs, scanned documents, or free text can be imported and automatically structured. A human still needs to review and accept all extracted data. Below we’ll briefly introduce typical use cases.

2.2.1. Structured Data from Free Text

The simplest use case is extracting structured data from free text. The text is passed to the assistant, which searches it for the requested information and proposes its findings. Once a human has reviewed them, they can be accepted and saved.
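One way to speed up such a review is to have the model return, alongside each value, the passage it relied on, and to flag any suggestion whose quoted passage cannot be found in the source text. This is a hedged sketch of that idea; the `evidence` field and the helper names are assumptions, and the article does not say the assistant works this way. A match does not prove that a value is correct; it only tells the reviewer where to look.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    item: str
    value: str
    evidence: str  # hypothetical: passage from the source text that supports the value


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase, so line breaks in the source do not matter."""
    return " ".join(text.split()).lower()


def flag_unsupported(source_text: str, suggestions: list[Suggestion]) -> list[Suggestion]:
    """Return suggestions whose evidence is empty or does not occur in the source text."""
    source = _normalize(source_text)
    return [s for s in suggestions
            if not s.evidence.strip() or _normalize(s.evidence) not in source]


text = "Patient is 178 cm tall and weighs 82 kg.\nNon-smoker."
suggestions = [
    Suggestion("body_height", "178", "178 cm tall"),
    Suggestion("body_weight", "82", "weighs 82 kg"),
    Suggestion("smoker", "yes", "smokes daily"),  # not backed by the text
]
for s in flag_unsupported(text, suggestions):
    print(f"check manually: {s.item} = {s.value}")
```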

2.2.2. Structured Data from Documents and Scans

As already shown in the video, files can be added instead of free text. Supported file formats are PDFs and images. This allows multi-page documents or scans to be converted into structured data with just a few clicks.
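Before a file is passed to the assistant, its type can be checked against the supported formats. The sketch below uses Python’s standard `mimetypes` module to guess the type from the file extension. Which image formats are accepted exactly is not stated in the article, so the PNG/JPEG set and the `check_upload` helper are assumptions.

```python
import mimetypes
from pathlib import Path

# Assumption: PDFs plus common image formats; the exact list is not specified.
SUPPORTED_TYPES = {"application/pdf", "image/png", "image/jpeg"}


def check_upload(path: str) -> str:
    """Return the guessed MIME type, or raise if the file is not a PDF or supported image.

    The guess is based on the file extension only; a real system might also
    inspect the file content.
    """
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported file type for {Path(path).name}: {mime_type}")
    return mime_type


print(check_upload("lab_report.pdf"))
print(check_upload("scanned_form.jpg"))
```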

2.2.3. Structured Data from Spoken Language

In the future, it will also be possible to analyze spoken language with the OpenEDC Assistant. The use cases here are also diverse: For example, unstructured data can be read aloud and extracted, medical conversations can be recorded and documented, or ward rounds can be clearly logged.

2.3. Proofreading and Translating Documents

OpenEDC offers many functions to create, edit, version, and share documents for informed consent forms, study protocols, or Standard Operating Procedures (SOPs). The assistant can help proofread these documents, find suggestions for clearer wording, and even translate entire documents.
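Proofreading suggestions fit the same review principle as form and data changes: the proposed wording is shown against the current text and only applied once a human accepts it. A minimal sketch of such a comparison, using the unified diff format from Python’s standard `difflib`; the `proofreading_diff` helper and the sample sentences are illustrative only.

```python
import difflib


def proofreading_diff(current: str, suggestion: str) -> list[str]:
    """Return a line-based unified diff between the current text and the suggested revision."""
    return list(difflib.unified_diff(
        current.splitlines(),
        suggestion.splitlines(),
        fromfile="current",
        tofile="suggestion",
        lineterm="",
    ))


current = "The participant must signs the form.\nAll data are stored securely."
suggestion = "The participant must sign the form.\nAll data are stored securely."
print("\n".join(proofreading_diff(current, suggestion)))
```

Lines prefixed with `-` and `+` show the removed and proposed wording; unchanged context lines start with a space, so a reviewer sees each correction in place.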

Local or Connected

Available Models

We place the highest value on data privacy and security. You can therefore choose which Large Language Model (LLM) processes your data. In addition, the project owner must first enable the assistant in the settings of each study before it can be used in that study.
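The opt-in rule can be expressed as a simple guard. In this sketch, which is an assumption about how such settings could be modeled rather than OpenEDC’s actual data model, the assistant is off by default, only the project owner can switch it on, and every assistant request first checks the project’s setting. The three provider values correspond to the options described in the following subsections; all names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ModelProvider(Enum):
    OPENEDC_CLOUD = "openedc_cloud"  # model hosted in the Telekom Cloud
    CUSTOM = "custom"                # your own model
    LOCAL = "local"                  # runs on your computer, e.g. Gemini Nano in Chrome


@dataclass
class ProjectSettings:
    owner: str
    assistant_enabled: bool = False  # disabled until the owner opts in
    provider: Optional[ModelProvider] = None

    def enable_assistant(self, user: str, provider: ModelProvider) -> None:
        """Only the project owner may enable the assistant and choose the model."""
        if user != self.owner:
            raise PermissionError("only the project owner can enable the assistant")
        self.assistant_enabled = True
        self.provider = provider


def require_assistant(settings: ProjectSettings) -> ModelProvider:
    """Guard to call before any assistant request; raises if the project has not opted in."""
    if not settings.assistant_enabled or settings.provider is None:
        raise RuntimeError("the assistant is not enabled for this project")
    return settings.provider
```

Defaulting to "disabled" means a forgotten setting fails safe: no data reaches any model until someone responsible for the study has made an explicit choice.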

OpenEDC Cloud

We offer all OpenEDC customers free access to a model in the Telekom Cloud. The model is not only hosted in local data centers but is also operated by a local provider. Data is never logged, stored, or used for training. This model can be activated directly for each project in the settings.

Your Own Model

As an alternative to our model in the Telekom Cloud, you can also configure your own model if you have one. This can be helpful, for example, if you’ve trained your own model for specific medical domains.

Gemini Nano

You can also use models that run completely offline on your own computer. For this purpose, we’ve integrated, among others, Gemini Nano, which is available in Google Chrome. This model is also multimodal and can therefore process PDFs and images in addition to text. This can be useful, for example, when reading PDFs that contain identifying patient data: the PDFs never leave your computer, because only the successfully extracted, pseudonymized data is saved.

We've Made a Start

A Look into the Future

We’ve made a start and integrated an expandable, multimodal assistant into OpenEDC that works in a guided and secure way. The three use cases introduced above are already available.

In the future, we’ll focus on additional capabilities of the assistant. Topics we’d like to explore further include:

  • Written Questions in the Help Module: OpenEDC’s help module already offers many written articles. It would be helpful if users could also ask questions in natural language and receive individual answers.
  • Automatic Queries: The assistant could regularly review new data in the background and automatically create queries when it notices something unusual. These would then be marked as Candidate Queries created by AI.
  • Voice or Audio Input: We’d like to expand the multimodal capabilities (text, PDF, and image) to include spoken language in the future. We’ve already briefly outlined this use case above.
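The idea of Candidate Queries implies a small lifecycle: the AI may only propose a query, and a human decides whether it is raised or dismissed. Since this feature is still being explored, the following is only a hypothetical sketch of how that lifecycle could be modeled; the states, names, and fields are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class QueryStatus(Enum):
    CANDIDATE = "candidate"  # proposed by the AI, awaiting human review
    OPEN = "open"            # confirmed by a human and raised
    DISMISSED = "dismissed"  # rejected by a human


@dataclass
class Query:
    item: str
    message: str
    created_by_ai: bool
    status: QueryStatus
    reviewed_by: Optional[str] = None


def propose_query(item: str, message: str) -> Query:
    """AI-generated queries always start as clearly marked candidates."""
    return Query(item, message, created_by_ai=True, status=QueryStatus.CANDIDATE)


def review_query(query: Query, reviewer: str, confirm: bool) -> None:
    """A human turns a candidate into an open query or dismisses it."""
    if query.status is not QueryStatus.CANDIDATE:
        raise ValueError("only candidate queries can be reviewed")
    if not reviewer:
        raise ValueError("a human reviewer is required")
    query.status = QueryStatus.OPEN if confirm else QueryStatus.DISMISSED
    query.reviewed_by = reviewer
```

Keeping the `created_by_ai` flag even after confirmation preserves where a query originated, in line with the traceability described in section 1.1.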

Thank you for reading this long article. We’re very excited to hear about your experiences. If you have questions or would like to try our system for free, please contact us anytime.