

EMRBots are experimental artificially generated electronic medical records (EMRs).[1] The aim of EMRBots is to allow non-
commercial entities (such as universities) to use the artificial patient repositories to practice statistical and machine-learning
algorithms. A letter published in Communications of the ACM emphasizes the importance of using synthetic medical data, "...
EMRBots can generate a synthetic patient population of any size, including demographics, admissions, comorbidities, and laboratory
values. A synthetic patient has no confidentiality restrictions and thus can be used by anyone to practice machine learning


EMRs contain sensitive personal information. For example, they may include details about infectious diseases, such as human
immunodeficiency virus (HIV), or they may contain information about a mental illness. They may also contain other sensitive
information such as medical details related to fertility treatments. Because EMRs are subject to confidentiality requirements,
accessing and analyzing EMR databases is a privilege given to only a small number of individuals. Individuals who work at
institutions that do not have access to EMR systems have no opportunity to gain hands-on experience with this valuable resource.
Simulated medical databases are currently available; however, they are difficult to configure and are limited in their resemblance to
real clinical databases. Generating highly accessible repositories of artificial patient EMRs while relying only minimally on real
patient data is expected to serve as a valuable resource to a broader audience of medical personnel, including those who reside in
underdeveloped countries.

Academic use
In April 2018 the journal Bioinformatics published a study that relied on EMRBots data to create a new R package named
"comoRbidity".[3] Co-authors of the study included scientists from Universitat Pompeu Fabra and Harvard University. Later that
month, researchers from Carnegie Mellon University used EMRBots data at the CMU HackAuton hackathon to create a prediction

The repositories have also been used to accelerate research. For example, researchers from Michigan State University, IBM Research,
and Cornell University published a study at the Knowledge Discovery and Data Mining (KDD) conference.[5][6] Their study describes a
novel neural network that performs better than the widely used long short-term memory neural network developed by Sepp Hochreiter
and Jürgen Schmidhuber in 1997.[7] In May 2018, scientists from IBM Research and Cornell University used the repositories to test
a new deep architecture named Health-ATM. To demonstrate superiority over traditional neural networks, they applied their
architecture to a congestive heart failure use case.[8]

Other users include researchers at the University of Chicago,[9] the University of California, Merced,[10][11] and the University
of Tampere, Finland.[12][13] Additional resources are listed in the references.[14][15][16][17][18][19][20][21][22][23][24][25]
Criticism by other creators of artificial repositories
"[EMRBots] are ... pregenerated datasets of synthetic EHR with an insufficient explanation of how the datasets were generated.
These datasets exhibit several inconsistencies between health problems, age, and gender."[26][27] An additional criticism is described
in a thesis, "Realism in Synthetic Data Generation", from Massey University.[28]

1. Kartoun, Uri (2016). "A methodology to generate virtual patient repositories". arXiv:1608.00570.
2. CACM Staff (2018). "A leap from artificial to intelligence". Communications of the ACM. 60 (1): 10–11.
3. Gutiérrez-Sacristán, Alba; Bravo, Àlex; Giannoula, Alexia; Mayer, Miguel A.; Sanz, Ferran; Furlong, Laura I. (2018).
"comoRbidity: an R package for the systematic analysis of disease comorbidities". Bioinformatics.
doi:10.1093/bioinformatics/bty315.
4. Gebert, Theresa; Jiang, Shuli; Sheng, Jiaxian (2018). "Characterizing Allegheny County opioid overdoses with an
interactive data explorer and synthetic prediction tool". arXiv:1804.08830 [stat.AP].
5. "Patient Subtyping via Time-Aware LSTM Networks". Retrieved 24 May 2018.
6. "SIGKDD". Retrieved 24 May 2018.
7. Hochreiter, Sepp; Schmidhuber, Jürgen (1997). "Long short-term memory". Neural Comput. 9 (8): 1735–1780.
9. "Statistical Modeling of Clinical Data" (PDF). Retrieved 24 May 2018.
10. "A dynamic cloud computing platform for eHealth systems - IEEE Conference Publication".
11. "Publication - UC Merced Cloud Lab".
12. "Fairness in Group Recommendations in the Health Domain" (PDF). Retrieved 24 May 2018.
13. "MLARAPP". Retrieved 24 May 2018.
14. "illidanlab/T-LSTM". GitHub. Retrieved 24 May 2018.
15. "FairGRecs: Fair Group Recommendations by Exploiting Personal Health Information". Retrieved 24 May 2018.
16. "Teaching data science fundamentals through realistic synthetic clinical cardiovascular data". bioRxiv 232611.
17. "PRIIME: A generic framework for interactive personalized interesting pattern discovery - IEEE Conference
Publication". Retrieved 24 May 2018.
(PDF). Retrieved 24 May 2018.
19. "How to get unique data sets on health system". Retrieved 24 May 2018.
20. "Exploratory Statistical Analysis of EMR data Or Where Angels Fear to tread…". 17 October 2015.
21. "Robot". 31 December 2015. Retrieved 24 May 2018.
22. "Obstacle Avoider Robotic Vehicle" (PDF). Retrieved 24 May 2018.
23. Nithya, M.; Sheela, T. (4 January 2018). "Predictive delimiter for multiple sensitive attribute publishing". Cluster Computing: 1–8. doi:10.1007/s10586-017-1612-y.
26. Walonoski, J; et al. (2017). "Synthea: An approach, method, and software mechanism for generating synthetic
patients and the synthetic electronic health care record". J Am Med Inform Assoc. 25 (3): 230.
doi:10.1093/jamia/ocx079. PMID 29025144.
27. "Corrigendum". Journal of the American Medical Informatics Association. 2017. doi:10.1093/jamia/ocx147.
28. "Realism in Synthetic Data Generation" (PDF). Retrieved 24 May 2018.



This page was last edited on 7 September 2018, at 18:31 (UTC).

