Design and Implementation of an Informatics Infrastructure for Standardized Data Acquisition, Transfer, Storage, and Export in Psychiatric Clinical Routine: Feasibility Study

Background: Empirically driven personalized diagnostic applications and treatment stratification are widely perceived as a major hallmark in psychiatry. However, data-based personalized decision making requires standardized data acquisition and data access, which are currently absent in psychiatric clinical routine. Objective: Here, we describe the informatics infrastructure implemented at the psychiatric Münster University Hospital, which allows standardized acquisition, transfer, storage, and export of clinical data for future real-time predictive modelling in psychiatric routine. Methods: We designed and implemented a technical architecture that includes an extension of the electronic health record (EHR) via scalable standardized data collection and data transfer between EHRs and research databases, thus allowing the pooling of EHRs and research data in a unified database, as well as technical solutions for the visual presentation of collected data and analysis results in the EHR. The Single-source Metadata ARchitecture Transformation (SMA:T) was used as the software architecture. SMA:T is an extension of the EHR system and uses module-driven engineering to generate standardized applications and interfaces. The operational data model was used as the standard. Standardized data were


Scientific Background
Psychiatric disorders represent one of the leading causes of disability worldwide. In the challenge to provide advanced treatment and prevention strategies for psychiatric disorders, previous research has focused on better understanding of the neurobiological basis of affective disorders [1]. However, the translation of such findings into clinical application remains an unresolved problem up to now. For this reason, the focus of psychiatric research has shifted from sole neurobiological characterization at the group level toward the application of multivariate machine learning methods trained on multimodal data for individualized prediction of clinical outcomes [2,3]. Multivariate machine learning applications have been proven to be innovative and powerful tools in translational psychiatric research. In this regard, the successful utilization of machine learning algorithms for individualized predictions of treatment response [4][5][6], depression severity [7], disease risk [8], differential diagnosis [9,10], and relapse risk [11] has yielded the first promising results. However, up to now, several obstacles have prevented the successful transfer of individual predictive modeling to clinical routine application, as discussed in recent reviews [12][13][14][15]. In this regard, the gap between homogeneous well-characterized samples acquired in experimental studies [16] and heterogeneous unvalidated data from day-to-day clinical routine has proven to be a major obstacle in the translation of predictive models to clinical application. Hence, ecologically valid predictive models would require access to standardized real world data collected at the point of care [17]. 
Importantly, large-scale studies have successfully applied multivariate models trained on data from electronic health records (EHRs), including features such as diagnoses and procedures, laboratory parameters, and medications, to the prediction of suicide risk or weight gain following antidepressant treatment. These studies have demonstrated the capacity and generalizability of predictive models trained on real-world data [18][19][20]. Extending EHRs via standardized collection of predictive variables such as known risk factors might further enhance the potential of this novel data entity for predictive analytics in psychiatry [21,22]. Standardized electronic collection of patient-reported outcomes, which has previously been shown to improve clinical outcomes such as survival in patients with cancer, represents another possibility to enrich EHR data. Similarly, combining data from EHRs with research data might provide new opportunities for the discovery and validation of psychiatric endophenotypes, as demonstrated via recent validation of a polygenic risk score in a Danish population study [23]. However, future application of predictive models for personalized diagnostics and treatment requires their validation via clinical trials that, in turn, critically depend on the availability of the informatics infrastructure for the application of predictive models in routine care. The required informatics infrastructure should facilitate the acquisition of standardized real-world data at the point of care, potential enrichment with patient-reported outcomes or research data, and subsequent access to data for clinicians and researchers. While these technical requirements are already met in selected clinical settings, for example, in the United States [24], they have so far been absent in the clinical working environment of psychiatric hospitals in many European countries.
More concretely, ORBIS, the EHR system that is the market leader in Germany, Austria, and Switzerland, does not currently support standardized form metadata, clinical data, or annotated data sets. Our approach thus addresses the currently unmet need to (1) implement the technical requirements for standardized data acquisition and analysis in one of the most widely used EHR systems in Europe and (2) to specifically design a technical solution, including appropriate data collection routines, for the domain of clinical psychiatry.
This study aims to present the design and implementation of the technical requirements to address the aforementioned challenges with the ultimate goal of providing the basis for a successful future translation of predictive models to clinical application in psychiatric disorders. The implementation of the outlined technical solution will ultimately allow the evaluation of the potential of predictive models for the clinical management of psychiatric disorders under real-world conditions. In detail, we present the design and implementation of the informatics infrastructure, including technical solutions for (1) extension of the EHR via standardized electronic collection of patient-reported outcomes, (2) data transfer between EHRs and research databases, (3) pooling of EHRs and research data in a unified database, and (4) visual presentation of the analyses results in the EHRs.

Objective of This Study
The main objective of this study was the design and successful implementation of the informatics infrastructure required to train and validate predictive models in day-to-day clinical application in psychiatry as part of the SEED 11/19 study [25]. Our study consisted of the following steps in detail:

System Details
The EHR system ORBIS by Dedalus Healthcare is used at Münster University Hospital in more than 40 clinics and is the market leader in Germany, Austria, and Switzerland with over 1300 installations [29]. At Münster University Hospital, the EHR system has an 8700 GB Oracle database, 7938 users, and 1927 user sessions per day (status as of July 2020). Standardized form metadata, clinical data, and annotated data sets are not supported.

Requirement Engineering
To address the study aims, the following requirements were identified through focus groups including physicians and researchers at Münster University Hospital in Germany.
1. Extension of the EHR via standardized data collection: At first sight, the widely established usage of electronic documentation systems in clinical routine might support the notion of a fast translation of predictive models. However, until now, the majority of clinical data is still acquired and stored in an unstructured way that cannot be directly used for predictive modeling. Extension of EHR data via standardized forms of data collection in routine care is therefore required to provide a sufficient database for the development of predictive models. Importantly, the technical solution should be flexible and allow the content of the collected EHR data to be updated. Content-wise, in an initial step, standardized extension of EHR data should include assessment of symptomatology in order to allow both patient stratification at baseline and outcome measurement following intervention. Furthermore, standardized assessment of known risk factors, including life events and sociodemographic data, appears meaningful.
2. Data transfer: Routine EHR data storage systems are usually strictly separated from research databases for safety reasons and hence are not directly accessible for predictive analyses. Training and validation of predictive models based on EHR data requires the set-up of interfaces and a database to which EHR data can be transferred and in which they can subsequently be stored in a standardized way. In line with our study aim, the technical solutions should be scalable and allow data transfer in real time. EHR data transferred and stored in the database must be accessible to researchers in order to allow the development of predictive models.
3. Combination of EHRs and research data: Again, since routine EHR data storage systems are strictly separated from research databases, pooling of EHR and research data is not possible within state-of-the-art EHR databases. Pooling EHR data with research data in a unified database would allow predictive models trained on EHR data to be enriched with already existing research data and, furthermore, allow EHR data to be validated against research data. To this end, in order to combine each patient's EHR and research data, a unified scalable research database is needed that allows the integration of EHR data and research data acquired via experimental studies.
4. Presentation of standardized data within the EHR: Once collected, clinically useful standardized data as well as results of any analysis must be transferred back to the main EHR system in real time and presented to the clinician at the point of care.

Solution Requirements
An informatics infrastructure enabling real-time clinical predictive modeling based on the single-source architecture was derived from the named requirements. Custom metadata must be supported. The Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM) (version 1.3.2) was used as a flexible standard for exchange and archiving of metadata within the framework of clinical studies [30,31]. Mobile apps must be able to communicate with the architecture. Automatic data transfer into the database of the EHR system and from the EHR system to a research database was carried out via a communication server. ODM files were transported automatically to the database of the EHR system with Health Level 7 (HL7) messages [32]. NextGen Connect [33] was used as a communication server. HL7 version 2.5 and message type ORU^R01 were used. The plausibility and completeness of form data were validated by the clinical users.
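The transport step described above (an ODM document carried in an HL7 version 2.5 message of type ORU^R01) can be illustrated with a minimal sketch. The source does not state how the ODM document is placed inside the message, so the layout below is an assumption: the ODM XML is Base64-encoded and carried in an OBX segment of value type ED. The application and facility names, the identifiers, and the function names build_oru_r01 and extract_odm are hypothetical and are not part of SMA:T, MoPat, or NextGen Connect.

```python
import base64
from datetime import datetime

SEGMENT_SEPARATOR = "\r"  # HL7 v2 segments are terminated by a carriage return


def build_oru_r01(odm_xml, patient_case_id, form_code, control_id, timestamp=None):
    """Wrap an ODM document in an HL7 v2.5 ORU^R01 message (illustrative layout)."""
    ts = (timestamp or datetime.now()).strftime("%Y%m%d%H%M%S")
    # Base64 output contains none of the HL7 delimiter characters | ^ ~ \ &
    payload = base64.b64encode(odm_xml.encode("utf-8")).decode("ascii")
    segments = [
        # Header: sender, receiver, time, message type, control ID, version
        f"MSH|^~\\&|MOPAT|CLINIC|EHR|HOSPITAL|{ts}||ORU^R01|{control_id}|P|2.5",
        # Assumption: the case ID is sent as the patient identifier (PID-3)
        f"PID|1||{patient_case_id}",
        # Assumption: the form code in OBR-4 selects the form to create
        f"OBR|1|||{form_code}",
        # Assumption: ODM XML travels as encapsulated data (ED) in OBX-5
        f"OBX|1|ED|{form_code}||^TEXT^XML^Base64^{payload}||||||F",
    ]
    return SEGMENT_SEPARATOR.join(segments) + SEGMENT_SEPARATOR


def extract_odm(message):
    """Recover the ODM XML from the first ED-typed OBX segment."""
    for segment in message.split(SEGMENT_SEPARATOR):
        fields = segment.split("|")
        if fields[0] == "OBX" and fields[2] == "ED":
            encoded = fields[5].split("^")[4]
            return base64.b64decode(encoded).decode("utf-8")
    raise ValueError("no encapsulated ODM document found")


SAMPLE_ODM = (
    '<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">'
    '<ClinicalData StudyOID="S.1" MetaDataVersionOID="MD.1"/></ODM>'
)
message = build_oru_r01(SAMPLE_ODM, "CASE-0001", "FORM.BDI", "MSG00001",
                        datetime(2020, 7, 1, 12, 0, 0))
```

A receiving channel on the communication server would parse the header to decide which form to create and then decode the payload, which is what extract_odm mimics here.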

Analysis of Technical and Clinical Feasibility
The technical feasibility was demonstrated by the implementation of an infrastructure that enables clinical predictive modeling in real time. Java

System Architecture
The Single-source Metadata ARchitecture Transformation (SMA:T) was used as the software architecture [45]. SMA:T is an extension of the EHR system of the Münster University Hospital and uses module-driven software development [46] to generate standardized applications and interfaces. Every SMA:T form has a generic built-in interface for exchanging standardized data. Embedded applications [45] were used as the application type. These are linked to an ODM file in the EHR database, from which a documentation form is generated (Figure 1). Data transfer to the research database takes place via the researcher module from SMA:T. This provides a front end to the EHR system and an extension of the communication server for data transfers. Both prospective and retrospective standardized data exports of EHR data points are supported, specifically, vital signs, laboratory data, medication data, and administrative data. Each data export can be customized by individual parameters. The following parameters are supported: name of data export, export interval, database query, and destination parameters for electronic data capture systems or research databases. MongoDB and REDCap [50] are currently provided as destination templates in the EHR system. The destination portfolio can easily be expanded with interface functions of SMA:T. The research database is embedded in a Docker container of a virtualized Red Hat Enterprise Linux server. The data flow from EHR to electronic data capture is shown in Figure 2. The software architecture is shown in Figure 3.

Figure 1.
In process steps 1-3, the patient completes the forms and sends data to the communication server. In process steps 4-8, the communication server sends data to the electronic health record system and creates a blank documentation form. This form is populated with imported data. In process steps 9-13, SMA:T creates the documentation form with metadata and imported data.
EHR: electronic health record; HL7: Health Level 7; MoPat: Mobile Patient Survey; ODM: operational data model; SMA:T: Single-source Metadata ARchitecture Transformation.

Figure 2.
Unified Modeling Language sequence diagram of the data extraction workflow. In process steps 1-8, a study query is created with SMA:T and a generic operational data model file is saved in the database of the electronic health record system. In process steps 9-18, a generic Mirth channel is created based on the study query. In process steps 19-20, data points are automatically extracted from the electronic health record system and transferred to the study database using the operational data model standard format.
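The export parameters named in this section (name of data export, export interval, database query, and destination parameters) can be pictured as a small configuration record. The following is a sketch only: the class name DataExport, its field names, the validation rules, and the example query with its table and column names are hypothetical and do not reproduce the SMA:T researcher module. The two destination names are taken from the destination templates mentioned in the text.

```python
from dataclasses import dataclass, field

# Destination templates named in the text; further ones could be added
SUPPORTED_DESTINATIONS = {"mongodb", "redcap"}


@dataclass
class DataExport:
    """One configurable export from the EHR to a research destination."""
    name: str
    interval_minutes: int      # how often the query is executed
    query: str                 # database query run against the EHR database
    destination: str           # key into SUPPORTED_DESTINATIONS
    destination_params: dict = field(default_factory=dict)

    def validate(self):
        """Return a list of human-readable problems (empty if none)."""
        errors = []
        if not self.name.strip():
            errors.append("name is required")
        if self.interval_minutes <= 0:
            errors.append("export interval must be positive")
        # Assumption: exports only read from the EHR database
        if not self.query.lstrip().lower().startswith("select"):
            errors.append("query must be a read-only SELECT statement")
        if self.destination not in SUPPORTED_DESTINATIONS:
            errors.append(f"unsupported destination: {self.destination}")
        return errors


vital_signs_export = DataExport(
    name="Vital signs",
    interval_minutes=60,
    # Hypothetical table and column names, for illustration only
    query="SELECT case_id, measured_at, systolic, diastolic FROM vital_signs",
    destination="mongodb",
    destination_params={"database": "research", "collection": "vital_signs"},
)
```

Keeping validation separate from construction allows a front end to show all configuration problems at once instead of failing on the first one.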

System Implementation
The implementation of the architecture is divided into 4 areas: data collection, data transfer, data storage, and data visualization. Agile methods were used for the project life cycle and the development cycle [51].

Data Collection
SMA:T provides 2 options for data collection, namely, the EHR system in clinical routine and dedicated web applications. Data input via web applications can be designed freely. In this study, EHR data generated as part of clinical routine documentation comprised, among others, laboratory data, medication, information on diagnosis, time of admission, and length of stay; these data are presented in detail in Table 1. MoPat [47,48] was selected for the collection of patient-reported outcomes. After input of the patient case ID, staff handed the patient an iPad with the MoPat app. Patients were then guided through a series of documentation forms comprising different questionnaires, and they entered data on the mobile device (Figure 4). The iPad was then returned to the medical staff. Further details regarding the collection of patient-reported outcomes during inpatient treatment have previously been described [25]. In brief, the self-reports applied in this study are based on well-established questionnaires and scales in the domain of psychiatry and clinical psychology. In addition to the retention of single-item information, sum scores were calculated based on the recommendations provided in the original manuals and references [52][53][54][55][56][57][58]. Furthermore, Mopat@home was used for the collection of patient-reported outcomes following discharge. To this end, patients were sent an email that provided a link to a website on which the above-referenced questionnaires were presented and could be filled out [49].

Data Transfer
SMA:T provides 2 types of data transfer in the present scenario, that is, data transfer into the EHR system and data transfer into the electronic data capture system. MoPat sends data to the EHR system via the communication server of the University Hospital. Data are saved in the ClinicalData structure of the ODM format. The ODM document is embedded in an HL7 message. Each HL7 message creates a form in the EHR system. The header of the HL7 message determines which form is automatically created. Data transfer to the electronic data capture system takes place via SMA:T interfaces. Both retrospective and prospective data exports in real time are supported. When a study query is activated via the EHR front end, the metadata and the corresponding structured query language statements are read by the SMA:T extension of the communication server. SMA:T uses its code library and channel framework to generate unique Mirth channels. These send a database query to the EHR system and transfer the output directly to the electronic data capture system. Both metadata (clinical documentation form) and clinical patient data are provided by SMA:T in the ODM format. Data records are combined into an ODM document. In this study, SMA:T converts the resulting XML-based ODM document into JavaScript Object Notation format [59] (JODM format [60]). The JavaScript Object Notation schema [61] of JODM [60] is open source and currently limited to the Study and ClinicalData nodes, including all subnodes, of the ODM in version 1.3.2.
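To make the conversion step concrete, the following sketch turns an XML-based ODM ClinicalData document into a JavaScript Object Notation string using only the Python standard library. It is a generic element-to-dictionary mapping chosen for illustration and is not the JODM schema [60,61]; the key layout, the function names, and the sample identifiers are assumptions.

```python
import json
import xml.etree.ElementTree as ET


def strip_namespace(tag):
    """Turn '{namespace}Name' into 'Name'."""
    return tag.split("}", 1)[-1]


def element_to_dict(element):
    """Map attributes to keys and child elements to lists keyed by tag name."""
    node = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        node["_text"] = text
    for child in element:
        node.setdefault(strip_namespace(child.tag), []).append(element_to_dict(child))
    return node


def odm_xml_to_json(odm_xml):
    """Convert an ODM XML string into a JSON string (illustrative layout)."""
    root = ET.fromstring(odm_xml)
    return json.dumps({strip_namespace(root.tag): element_to_dict(root)}, sort_keys=True)


# Hypothetical OIDs; the nesting follows the ODM 1.3.2 ClinicalData hierarchy
SAMPLE_CLINICAL_DATA = """
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3" FileOID="F.1">
  <ClinicalData StudyOID="S.SEED" MetaDataVersionOID="MD.1">
    <SubjectData SubjectKey="P001">
      <StudyEventData StudyEventOID="SE.1">
        <FormData FormOID="F.BDI">
          <ItemGroupData ItemGroupOID="IG.1">
            <ItemData ItemOID="I.BDI.1" Value="2"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>
"""

converted = json.loads(odm_xml_to_json(SAMPLE_CLINICAL_DATA))
```

A document of this shape can be stored directly in a document database, which is the practical motivation for converting the XML representation.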

Data Storage
Data storage addresses metadata and clinical data. Metadata of clinical documentation forms are stored centrally in the SMA:T database. The SMA:T database model is part of the EHR database model. Metadata and clinical data are available in the ODM format. MoPat also supports ODM format; therefore, the same data model can be used for both systems. Clinical data are clearly identified by unique object identifiers and the associated object identifier on the documentation form.

Data Presentation
Usability principles were applied to visualize data [62][63][64]. A one-column layout was implemented according to the requirements of the 10 web form design guidelines [65]. Those forms are displayed via SMA:T within the EHR system (see Multimedia Appendix 1 and Multimedia Appendix 2). SMA:T supports item-based real-time notifications as well as centralized notification services to display analysis results in real time.

Technical and Clinical Feasibility
As part of the study, 11 standardized documentation forms with 202 items were created for the clinic for psychiatry and psychotherapy (  [74], questions on individual disease course [75], and questions on somatic comorbidities [76]. Data models without license restrictions are available in the portal of medical data models. A documentation form is a document from the EHR system (see Multimedia Appendix 1) and consists of several items. An item consists of an input field and the associated label. For example, 1 item from Multimedia Appendix 1 is the drop-down box labeled A. An instance is a form created by a user. In total, 5866 instances were created by the patients and automatically transferred to the EHR system of the Münster University Hospital without errors. A case is defined as an inpatient stay or an outpatient visit to a hospital or clinic. In total, 412 cases from 317 patients were processed by 86 users (Table 3). Of the 123 medical staff members of the clinic for psychiatry and psychotherapy, 86 (69.9%) worked with those documentation forms. The use of the ODM improved data quality, and defining the metadata was a critical step in building a generic and automated workflow. All items are now provided with a unique object identifier, a defined data type, and a code list for converting text into numeric values. Automatic generation of documentation forms was accepted in routine clinical use. Standardized data transfer from the communication server into the EHR system was completed without error. It was possible to display all items (n=202) from ODM structures in full by using the generic workflow.
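The item properties described above (a unique object identifier, a data type, and a code list for converting text into numeric values) can be sketched as a small data structure. The class name ItemDefinition, the function to_numeric, the example object identifier, and the answer texts are hypothetical and serve only to illustrate the idea; they are not taken from the study's data models.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ItemDefinition:
    """Metadata for one form item, loosely modeled on an ODM item definition."""
    oid: str                  # unique object identifier
    data_type: str            # "integer", "float", or "text"
    code_list: dict = field(default_factory=dict)  # answer text -> numeric code


def to_numeric(item, raw_value):
    """Convert a raw answer into a number using the code list or the data type."""
    if raw_value in item.code_list:
        return item.code_list[raw_value]
    if item.data_type == "integer":
        return int(raw_value)
    if item.data_type == "float":
        return float(raw_value)
    raise ValueError(f"{item.oid}: cannot convert {raw_value!r} to a number")


# Hypothetical item with a 4-level answer scale
example_item = ItemDefinition(
    oid="I.EXAMPLE.1",
    data_type="integer",
    code_list={"not at all": 0, "mildly": 1, "moderately": 2, "severely": 3},
)
```

With typed items and code lists, free-text answers never reach the analysis stage unconverted, which is one way standardized metadata can improve data quality.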
Clinical data from 317 patients were stored in the EHR database; 96.7% (4360/4509) of the scores could be calculated and transferred into the EHR system (Beck's Depression Inventory [52,53], Big Five Inventory [54], Childhood Trauma Questionnaire [55], Hamilton Depression Scale [56], Narcissistic Admiration and Rivalry Questionnaire [57], and Symptom Checklist-90 Somatization Scale [58]) (Table 4). In total, 111,842 items were completed by patients on mobile devices. The validity of the acquired data on depressive symptomatology was already analyzed in a feasibility study [25]. Eight standardized documentation forms with 39 items were created for the retrospective data export (Table 1): SEED ClinicalData Admission Date & Time [77], SEED ClinicalData Classification [78], SEED ClinicalData Diagnosis-Related group/Diagnosis [79], SEED ClinicalData electroconvulsive therapy [80], SEED ClinicalData Laboratory Assessments [81], SEED ClinicalData Medication [82], SEED ClinicalData Patient [83], and SEED ClinicalData Vital Signs [84]. A total of 96,323 instances of vital signs, laboratory data, medication data, and administrative data could be automatically transferred from the EHR system to the research database (Table 6). Retrospective ODM-based data export worked correctly without technical errors. In addition, 585 instances were created by the patients with Mopat@home and transferred to the research database via SMA:T (Table 7).
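The reported proportion of calculable scores can be reproduced with simple arithmetic, and the reason why some scores cannot be calculated can be illustrated with a sum score that requires complete item responses. The completeness rule below is an assumption made for illustration; the actual scoring rules follow the original manuals of each questionnaire and are not reproduced here. The item identifiers are hypothetical.

```python
def sum_score(responses, item_oids):
    """Return the sum of all item values, or None if any item is missing.

    Assumption: a score is only calculable when every item was answered.
    """
    values = [responses.get(oid) for oid in item_oids]
    if any(value is None for value in values):
        return None
    return sum(values)


def percentage(part, whole):
    """Percentage rounded to 1 decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


# Hypothetical 3-item scale
scale_items = ["I.X.1", "I.X.2", "I.X.3"]
complete_form = {"I.X.1": 2, "I.X.2": 0, "I.X.3": 3}
incomplete_form = {"I.X.1": 2, "I.X.3": 3}

calculable_share = percentage(4360, 4509)  # scores calculated / scores expected
staff_share = percentage(86, 123)          # staff using the forms / all staff
```

Returning None instead of a partial sum keeps incomplete questionnaires from silently producing scores that look valid but are too low.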

Answers to the Study Questions
The aim of this study was the design and implementation of an informatics infrastructure enabling standardized data acquisition at the point of care and subsequent accessibility of clinical data for analytic purposes, which is required for future application of predictive models in day-to-day clinical routine in psychiatry. In this study, we have shown the overall technical feasibility of the implemented solution. Standardized documentation forms were implemented to extend EHR data domains and to improve data quality in the EHR system. An automated transfer of data into the EHR system and the research database was implemented, thus enabling the pooling of EHR data with already existing research data from ongoing cohort studies. This system was accepted by clinical staff from the Department of Psychiatry of Münster University Hospital in Germany. Widespread use of documentation forms could be demonstrated. Standardized electronic data collection in the EHR at the point of care was successfully implemented. The latter solution can similarly be applied for the presentation of results from predictive models.

Strengths and Weaknesses of This Study
The major strengths of this study are the standardized acquisition, transfer, storage, and export of data in real time with a generic informatics infrastructure. This system fulfills the prerequisites for future predictive modelling in clinical routine in psychiatry [85][86][87]. Standardized data transfer in ODM format provides scalability in the context of complex medical data structures. The Define-XML standard, an extension of the ODM standard, is mandated for metadata by regulatory authorities such as the US Food and Drug Administration [88]. Compliance with regulatory standards is a major advantage of our infrastructure regarding future clinical studies. One limitation is that the data format had to be converted from the XML-based ODM format into a JavaScript Object Notation representation for storage in the research database. MongoDB had been chosen as the research database in previous work for rapid analysis of large amounts of data [89]. Standardized automatic data transfer into research databases was possible for both retrospective and prospective research questions. The number of documentation forms for the retrospective export was determined by the data available in the EHR system. Data export can be configured centrally from the EHR system in compliance with local data protection regulations. Our approach is scalable because ORBIS EHR systems are used in more than 1300 hospitals in Germany, Austria, and Switzerland. The evaluation concentrated on technical and clinical feasibility. Limitations include the lack of an elaborate standardized evaluation of the user experience of the system by clinical staff. Moreover, further evaluation is necessary in order to assess the sustained benefit in everyday clinical practice.
Although the feasibility and acceptability of the implemented data input interface have been demonstrated in a recent publication [24] and the widespread use of the implemented data presentation format in the EHR indicates acceptability, it is important to note that no further feedback from clinicians (ie, in the form of structured interviews or questionnaires) has been acquired, which limits the informative value regarding user satisfaction. This important issue should therefore be addressed in future work based on detailed user feedback. Of note, the projected acquisition of data from several hundred cases per year based on our set-up results in a database of modest scale compared with those used by successfully established deep learning models in other fields of medicine [90]. Yet, it is important to take into account that the current state-of-the-art machine learning approaches in psychiatric research are based on cohorts with smaller sample sizes that were acquired over a period of multiple years [91][92][93]. The present initiative, which aims to train predictive models on data from clinical routine documentation, thus offers a perspective to significantly increase sample sizes in machine learning research in psychiatry. The training of predictive models as well as their validation in clinical applications is not within the scope of this study but will be the focus of subsequent work building on the technical infrastructure outlined here. Importantly, as our standardized data acquisition protocol covers established risk factors and symptom profiles that have in part already been successfully used for predictive analytics in psychiatric cohort studies [91,93], it appears reasonable to assume their predictive validity for the intended prediction of symptom trajectories and functional outcomes in future work.

Results in Relation to Other Studies
Through our study, we extend a previous line of research on predictive modeling based on EHR data. While previous studies have demonstrated empirical evidence for the predictive validity of EHR data in psychiatric use cases [18][19][20], to the best of our knowledge, our study is the first to not only report on the design but also on the successful implementation and technical feasibility of the informatics infrastructure for standardized acquisition, transfer, storage, and access of real world data for analytic purposes in psychiatric care, which is the basic requirement for the application and validation of predictive models in future clinical studies. Although we are not aware of any other study that has reported successful implementation of a comparable informatics infrastructure in psychiatric clinical routine, several preliminary reports should be taken into account. Complementary to the work presented in this study, Khalilia et al [94] described a Fast Healthcare Interoperability Resources (FHIR) web modeling service that was tested on a pilot intensive care unit dataset. A multi-source approach was used. No binding standard is used for clinical studies; instead, the standard Observational Medical Outcomes Partnership Common Data Model was applied [95] and an FHIR server and database are required for this system, which might limit potential implementations at multiple sites, considering that many EHR systems currently do not yet use an FHIR server. Of note, we are aware of several large-scale efforts aiming to translate predictive models into psychiatric practice [96] that, once implemented, might serve as a future base for comparison of system stability and performance. Importantly, the presented infrastructure represents a flexible solution that allows compatibility with existing initiatives and concepts of data standardization such as the Common Data Elements repository of the National Institutes of Health [97]. 
The choice of the ODM as the data standard implies the automatic provision of a metadata provider for each item. Thus, data points can be enriched with additional codes based on standards such as the Systematized Nomenclature of Medicine Clinical Terms or the Unified Medical Language System [98,99]. The integration takes place via the alias node or the SMA:T schema extension of the ODM. This makes it possible to enrich the survey data with additional metadata. International standardizations are hence compatible with the operational data model based on a 1-1 mapping of item definition nodes.
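The enrichment via the alias node can be sketched as follows. In ODM version 1.3.2, an ItemDef element may carry Alias child elements with the attributes Context and Name, which is what this example relies on. The function name add_alias, the sample object identifiers, and the code value 0000000 are placeholders for illustration, not real terminology codes, and the SMA:T schema extension is not shown.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
ET.register_namespace("", ODM_NS)  # serialize without a namespace prefix


def add_alias(metadata_xml, item_oid, context, name):
    """Attach an Alias (terminology code) to the ItemDef with the given OID."""
    root = ET.fromstring(metadata_xml)
    for item_def in root.iter(f"{{{ODM_NS}}}ItemDef"):
        if item_def.get("OID") == item_oid:
            # Alias is the last permitted child of ItemDef, so appending is safe
            ET.SubElement(item_def, f"{{{ODM_NS}}}Alias",
                          {"Context": context, "Name": name})
            return ET.tostring(root, encoding="unicode")
    raise KeyError(f"no ItemDef with OID {item_oid}")


# Hypothetical metadata fragment with a single item definition
SAMPLE_METADATA = (
    f'<ODM xmlns="{ODM_NS}"><Study OID="S.1"><MetaDataVersion OID="MD.1" Name="v1">'
    '<ItemDef OID="I.EXAMPLE.1" Name="Example item" DataType="integer"/>'
    "</MetaDataVersion></Study></ODM>"
)

# "0000000" is a placeholder, not an actual SNOMED CT concept identifier
enriched_xml = add_alias(SAMPLE_METADATA, "I.EXAMPLE.1", "SNOMED CT", "0000000")
```

Because each item definition receives its own alias, the 1-1 mapping between items and external codes stays explicit in the metadata itself.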

Generalizability of This Study
The informatics infrastructure for standardized data acquisition, transfer, storage, and export in real time for future predictive modelling outlined in this study is an important step in the complex process toward the implementation of machine learning and clinical decision support solutions in routine care. Our study shows that this approach is technically feasible. Owing to the standardization, this concept is also scalable for other medical areas. Data warehouse applications of a heterogeneous hospital landscape can be implemented with this software architecture. In addition to local artificial intelligence applications, multi-site implementations of the architecture could also transfer pseudonymized data points into a global predictive model. The implementation of national and international predictive models in medicine would be possible.

Future Work
Artificial intelligence systems rely on high-quality data. In the future, artificial intelligence applications might send real-time evaluations directly back into the EHR system. Clinical staff could access and respond to calculated predictions. Selected data will be provided in a modular dashboard. Medical device regulation needs to be taken into account for the implementation of such systems. Direct data transfer back from the clinic would be possible, which would in turn allow real-time adjustments of the prediction models. Standardization of clinical routine documentation via SMA:T can provide high-quality structured data points. It is planned to augment this database with further research data from existing cohort studies, for example, covering neuroimaging and genetic data. Specific prediction models can be trained in this way with the same architecture. Generic model pipelines can be set up. Model clusters can be set up to answer complex medical questions. SMA:T thus forms a solid technical infrastructure for the implementation of artificial intelligence solutions in medicine. Schema extensions of the ODM standard can be implemented to optimize communication between systems. Observational and interventional studies are warranted to evaluate the predictive validity of machine learning models in psychiatric routine. For multi-center studies, SMA:T needs to be reimplemented in the respective EHR environments to process CDISC ODM files. A software blueprint is available [45]. If SMA:T and MoPat are already in use, the architecture can be set up within a short time frame of approximately 1 week. The generic concept of the architecture enables the reuse of our data models, database queries, and server architecture. Retrospective database queries might have to be reimplemented in the EHR environments. The necessary data are available from our repository on GitHub [100].
Another important consideration is the potential future enrichment of EHR data with mobile assessments, including ecological momentary assessments and passive sensor data derived from smartphones. Recent reports on successful real-time prediction of depressive symptoms based on ecological momentary assessment data support this notion [101]. Thus, future studies should explore technical solutions that allow data transfer between EHRs and patients' smartphones. Future work will evaluate the predictive potential of the acquired data entities by training and validating machine learning models for individual-level prediction of treatment response, functional outcome, and depression relapse. In accordance with findings from previous machine learning approaches in psychiatric cohort studies, in a first step, well-established predictive algorithms such as support vector machines will be trained on features covering risk and symptom profiles, sociodemographic variables, medication, and treatment history [7,91,93]. Importantly, as opposed to previous cohort studies, the technical infrastructure outlined in this study will allow predictive models to be trained and validated in naturalistic patient samples in routine care.

Conclusions
The presented informatics infrastructure enabling standardized data acquisition, transfer, storage, and export in real time for future predictive modelling in clinical routine in psychiatry is technically feasible. The outlined architecture provides, first and foremost, a technical basis for the application and validation of clinical decision support systems and artificial intelligence applications in clinical studies.