AI and Clinical Data processing

Doug Bain

Chief Technology Officer at KCR

Published May 17, 2024

(Edit: Mark Gáncsos pointed out that ChatGPT defaulted to the Veeva Vault API rather than the more specific API that we have with Veeva Vault CDMS. I have appended at the end of the article the output following a couple of additional prompts to get to the right syntax for CDMS. This is not a hallucination; I just didn't give it sufficient context to answer my question correctly.)

I spend quite a bit of time thinking and reading about artificial intelligence and all of the progress that has been made, over the last two years in particular.

I don't often blog on the topic, but I think the following example helps demonstrate the power of AI.

So, here is a series of questions and prompts that I put to ChatGPT 4 this morning over breakfast. I start by asking what the impact of AI might be on edit checking data in a clinical trial. Responses are in italics.

Is it likely that we will see artificial intelligence replacing edit checking in electronic data capture systems?

It is highly likely that artificial intelligence (AI) will increasingly replace traditional edit checking in electronic data capture (EDC) systems. Here are several reasons supporting this trend:

Efficiency and Accuracy:
Advanced Pattern Recognition:
Reduction of Manual Errors:
Real-Time Monitoring:
Cost Savings:
Scalability:
Compliance and Standardization:

Examples in Practice:

IBM Watson Health and other similar platforms are already leveraging AI to enhance EDC systems by incorporating advanced data validation and anomaly detection capabilities.
Medidata Solutions integrates AI to optimize data capture and validation processes, providing real-time insights and ensuring high data integrity.

Conclusion: The adoption of AI in EDC systems for edit checking is not just a possibility but an emerging reality. The benefits of AI in terms of efficiency, accuracy, and cost savings make it a compelling choice for organizations looking to enhance their data management practices. As AI technologies continue to evolve, their role in replacing traditional edit checking is expected to grow, leading to more reliable and efficient EDC systems.

So, next I wanted to see if ChatGPT could help me achieve this. Note that I spoke to ChatGPT rather than typing, hence the spelling mistakes, which are dealt with further down:

Would you be able to take all of the data and queries from VivaVault EDC, process the data for anomalies, raise queries, and then, using the VivaVault API, push the data and queries back into their product?

That is quite a big ask. Veeva is a big company, but compared with larger industry solutions such as accounting and ERP, it is lesser known... Let's see how ChatGPT got on:

Yes, it is feasible to take data and queries from VivaVault EDC, process the data for anomalies, raise queries, and then push the processed data and queries back into VivaVault using their API. Here’s how this can be accomplished:

Steps Involved

Extract Data from VivaVault EDC:
Process Data for Anomalies:
Raise Queries:
Push Processed Data and Queries Back into VivaVault:

Technical Implementation

1. Extract Data

2. Process Data for Anomalies

3. Raise Queries

4. Push Data and Queries Back

Considerations

API Documentation: Ensure you have access to the most recent VivaVault API documentation to understand the endpoints, authentication, and data formats required.
Data Privacy and Security: Handle all patient data in compliance with relevant regulations such as HIPAA or GDPR, ensuring secure data transmission and storage.
Scalability: Implement the solution in a way that it can handle large volumes of data
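As a rough illustration of steps 2 and 3 above (this is not ChatGPT's output), the sketch below applies simple range and missing-value checks to records held in memory and drafts a query for each violation. The record layout, the field names, the `RangeCheck` class and the limits are all hypothetical, chosen only to show the shape of the logic.

```python
from dataclasses import dataclass


@dataclass
class RangeCheck:
    """A hypothetical edit-check rule: the field must be present and within [low, high]."""
    field: str
    low: float
    high: float


def find_anomalies(records, checks):
    """Return one draft query (a plain dict) per rule violation found in the records."""
    draft_queries = []
    for record in records:
        for check in checks:
            value = record.get(check.field)
            if value is None:
                message = f"{check.field} is missing."
            elif not (check.low <= value <= check.high):
                message = (f"{check.field} value {value} is outside the expected "
                           f"range {check.low}-{check.high}. Please verify.")
            else:
                continue  # value is present and in range, nothing to raise
            draft_queries.append({
                "subject": record["subject"],
                "visit": record["visit"],
                "field": check.field,
                "message": message,
            })
    return draft_queries


# Made-up example data, not taken from any real study.
records = [
    {"subject": "1001", "visit": "Screening", "systolic_bp": 122, "heart_rate": 71},
    {"subject": "1002", "visit": "Screening", "systolic_bp": 410, "heart_rate": 68},
    {"subject": "1003", "visit": "Screening", "systolic_bp": 118, "heart_rate": None},
]
checks = [RangeCheck("systolic_bp", 60, 250), RangeCheck("heart_rate", 30, 220)]

draft_queries = find_anomalies(records, checks)
for query in draft_queries:
    print(query["subject"], query["field"], query["message"])
```

Keeping the checks as data rather than hard-coded `if` statements means they could, in principle, be loaded from an edit check specification instead of being written by hand.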

OK, wow... ChatGPT has just given us Python code that would perform the function. However, I know that this Python code is not as accurate or detailed as it could be. Now, I wonder: if I point ChatGPT to the developer API documentation that Veeva publishes, will it do better?

First of all though, that spelling mistake is annoying me...

Could you correct your response by changing VEVA to VEVA spelled V-E-E-V-A?

Yes, it is feasible to take data and queries from Veeva Vault EDC, process the data for anomalies, raise queries, and then push the processed data and queries back into Veeva Vault using their API. Here’s how this can be accomplished...

I will skip the rest as it is largely the same as before, although ChatGPT now has better context: it understands that I am referring to a clinical trials software company and what EDC really means.

Next, let's improve:

Can you improve the accuracy of the Python code by looking up the Veeva Vault API on the Veeva website?

To enhance the accuracy and functionality of your Python code for interacting with Veeva Vault's EDC (Electronic Data Capture) system, you can leverage the detailed API documentation provided by Veeva. Here's an updated version of your Python code, incorporating information from Veeva's REST API guidelines.

First, you'll need to authenticate with the Veeva Vault API to obtain a session ID, which is required for subsequent API calls. Then, you can retrieve data, process it for anomalies, and push queries back to Veeva Vault.

Here's an updated Python example:

This code example does the following:

Authentication: Authenticates to Veeva Vault using the /auth endpoint to get a session ID (Veeva Vault Developer Portal).
Data Retrieval: Retrieves data from Veeva Vault using the /vobjects/query endpoint, leveraging Veeva's Vault Query Language (VQL) (Veeva Vault Developer Portal).
Anomaly Detection: Processes the retrieved data to detect anomalies.
Raising Queries: Uses the /vobjects/queries endpoint to push detected anomalies back into Veeva Vault as queries (Veeva Vault Developer Portal).

Be sure to replace placeholders (e.g., VAULT_URL, USERNAME, PASSWORD, object_name__v) with your actual Veeva Vault details. For more comprehensive examples and detailed API usage, refer to the Veeva Vault Developer Documentation (Veeva Vault Developer Portal).
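The four steps listed in that response (authenticate, retrieve, detect, raise queries) can be sketched as a small orchestration function. To avoid guessing at endpoint details, this sketch deliberately contains no HTTP calls: `run_review_cycle` and the stand-in functions below are hypothetical names, and the real API requests would live inside whatever callables are passed in.

```python
def run_review_cycle(authenticate, fetch_records, detect_anomalies, raise_query):
    """Run one pass of the authenticate -> retrieve -> detect -> raise-queries flow.

    All four arguments are caller-supplied callables (hypothetical names);
    any real API calls would be made inside them.
    """
    session_id = authenticate()
    records = fetch_records(session_id)
    anomalies = detect_anomalies(records)
    for anomaly in anomalies:
        raise_query(session_id, anomaly)
    return len(records), len(anomalies)


# In-memory stand-ins so the flow can be exercised without a Vault.
raised = []


def fake_authenticate():
    return "dummy-session"


def fake_fetch(session_id):
    return [{"id": "rec-1", "weight_kg": 72.5}, {"id": "rec-2", "weight_kg": -4.0}]


def negative_weights(records):
    # Placeholder rule: a weight of zero or less cannot be right.
    return [r for r in records if r["weight_kg"] <= 0]


def fake_raise(session_id, anomaly):
    raised.append((session_id, anomaly["id"]))


counts = run_review_cycle(fake_authenticate, fake_fetch, negative_weights, fake_raise)
print(counts, raised)
```

Passing the steps in as functions keeps the anomaly logic testable offline and lets the API layer be swapped (for example, from the general Vault API to the CDMS API) without touching the rest.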

Conclusion

Now, the code above is valid Python. Given the correct credentials, it will log in to Veeva and make API calls. It does not actually perform the steps to detect anomalies (it inserts a placeholder for this), but the code was based on three sentences of requirements. I could easily provide further prompts asking it to read and process existing queries to avoid duplicates, or to call a data review service that detects inconsistent data.
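To give a feel for the "avoid duplicates" idea, here is a minimal sketch that filters draft queries against queries already raised. It assumes, purely for illustration, that a query can be identified by its subject, visit and field; the dictionary layout and function names are hypothetical and not taken from any Veeva API.

```python
def query_key(query):
    """Identify a query by where it points, ignoring the wording of the message."""
    return (query["subject"], query["visit"], query["field"])


def drop_duplicates(draft_queries, existing_queries):
    """Keep only drafts whose subject/visit/field combination has no query yet."""
    seen = {query_key(q) for q in existing_queries}
    new_queries = []
    for draft in draft_queries:
        key = query_key(draft)
        if key not in seen:
            new_queries.append(draft)
            seen.add(key)  # also suppresses repeats within the drafts themselves
    return new_queries


# Made-up example data.
existing = [
    {"subject": "1002", "visit": "Week 4", "field": "systolic_bp",
     "message": "Please confirm value."},
]
drafts = [
    {"subject": "1002", "visit": "Week 4", "field": "systolic_bp",
     "message": "Value out of range."},
    {"subject": "1003", "visit": "Week 4", "field": "heart_rate",
     "message": "Value missing."},
    {"subject": "1003", "visit": "Week 4", "field": "heart_rate",
     "message": "Value missing."},
]

to_raise = drop_duplicates(drafts, existing)
print(to_raise)
```

A real implementation would probably also look at the status of the earlier query and the site's response before deciding whether to re-raise.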

My point, though, is the speed at which ChatGPT is able to develop solutions with even very limited prompting, in a few minutes of me chatting via my phone... Imagine if we spent a few days working on this...

Addendum: Here is an extension to the original set of prompts asking ChatGPT to use the Veeva Vault CDMS API instead.

To correct the Python example for retrieving data using the Veeva Vault CDMS API, we will use the specific endpoints provided by the Veeva Vault CDMS API for pulling clinical data.

Here’s an updated version of the Python code:

Updated Python Code for Veeva Vault CDMS

1. Authenticate and Get a Session ID

2. Retrieve Data from Veeva Vault CDMS

3. Process Data for Anomalies

4. Raise Queries Using the Veeva CDMS API

Considerations

API Documentation: Refer to the Veeva Vault CDMS API Documentation for detailed information on endpoints and parameters.
Data Privacy and Security: Ensure compliance with relevant data protection regulations such as HIPAA or GDPR.
Scalability: Optimize the solution to handle large volumes of data efficiently.

By following these steps and using the specific CDMS API endpoints, you can build a robust solution for processing data and managing queries in Veeva Vault CDMS.
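Step 3 above leaves the anomaly detection itself open. One possible approach, swapped in here purely as an illustration and not taken from ChatGPT's output, is a robust outlier test based on the median absolute deviation (MAD), using the commonly cited modified z-score with a cut-off of 3.5. The function name and the sample values are hypothetical.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to judge against, so flag nothing
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Made-up body weights in kg; 700.0 looks like a misplaced decimal point.
weights_kg = [68.0, 71.5, 70.2, 69.8, 72.1, 700.0, 67.4]
flagged = mad_outliers(weights_kg)
print(flagged)
```

Unlike a mean and standard deviation test, the median-based version is not dragged towards the very outlier it is trying to find, which matters on small per-site or per-visit samples.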

Simon Wilson

Life sciences technology leader | Expert in clinical development

2mo

Good to speak to you last week, Doug. Having discussed your work with ChatGPT 4, I finally read the article - it's even more mind-blowing than I thought. Looks like AI may finally deliver on the promise to accelerate clinical development while reducing costs. And, as we were saying, that'll free up human resources to work on the more interesting, value recognition elements of the process - sounds like a win-win!

Elisabeth Frank

Director, Product Expert @Veeva #digitalHealtcare

2mo

Hi Doug, this is an interesting one. I play around a lot with ChatGPT4(o). German speakers: www.ainauten.com (not commercially interested, just great content). The creativeness of ChatGPT amazes me. I ask every question for advice on how to do things e.g. in Veeva CDMS based on CDMS help (as was during the AI/ML training) to not miss out on potential opportunities it is giving us. Many answers are fascinatingly correct, too many are, however, still pure fantasy. ChatGPT simply would never say, 'Sorry, I don't know'... except for the lottery results next week, unfortunately. When it comes to EDC data: sure, you can extract any data you want with ChatGPT as Python co-pilot. Then you have the data from say really hundreds of studies with decent data. And maybe you do get that data into some kind of common structure across studies, therapeutic areas, study phases. Then you feed it into ChatGPT to hunt the golden query... and then you get either amazing results or pure fantasy. I truly hope that AI/ML will help with improving our understanding of study data, querying, signal detection, RBQM.

Adersh (Sadasivan) Nair

Project & People Manager in CDM | Automation Enthusiastic | Masters in Clinical Research

2mo

Very interesting use case. Now I have a few questions. 1. Process Data for Anomalies: Can the Python code refer to a document where common anomaly data scenarios are specified (e.g. Edit Check Specifications, Listing Specifications, Data Review Plan, Protocol Deviation List etc.), and recognize anomalies according to it? 2. Pushing Anomalies back to EDC as Queries: Before raising queries, can it verify criteria such as which folder/form/field the query has to be raised against, whether a similar query was raised in the past, and if so, what the site's response was? 3. Are our existing guidelines for electronic systems (e.g. 21 CFR Part 11) good enough to cover the use of AI technologies in clinical trials, or does it call for a new set of guidelines?

Sverre Bengtsson

2mo

Great article, thanks! Really interesting to see how AI is pushing the boundaries. But why does your robot need a chair 😂?
