This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mHealth and uHealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.
With a wide range of use cases in both research and clinical domains, collecting continuous mobile health (mHealth) streaming data from multiple sources in a secure, highly scalable, and extensible platform is of high interest to the open source mHealth community. The European Union Innovative Medicines Initiative Remote Assessment of Disease and Relapse-Central Nervous System (RADAR-CNS) program is an exemplary project whose requirements include supporting the collection of high-resolution data at scale; the Remote Assessment of Disease and Relapse (RADAR)-base platform is designed to meet these needs and, additionally, to facilitate a new generation of mHealth projects in this nascent field.
Wide-bandwidth networks, smartphone penetrance, and wearable sensors offer new possibilities for collecting near-real-time high-resolution datasets from large numbers of participants. The aim of this study was to build a platform that would cater for large-scale data collection for remote monitoring initiatives. The key criteria were scalability, extensibility, security, and privacy.
RADAR-base is developed as a modular application; the backend is built on a backbone of the highly successful Confluent/Apache Kafka framework for streaming data. To facilitate scaling and ease of deployment, we use Docker containers to package the components of the platform. RADAR-base provides 2 main mobile apps for data collection, a Passive App and an Active App. Other third-party apps and sensors can easily be integrated into the platform. Management user interfaces to support data collection and enrollment are also provided.
General principles of the platform components and design of RADAR-base are presented here, with examples of the types of data currently being collected from devices used in RADAR-CNS projects: Multiple Sclerosis, Epilepsy, and Depression cohorts.
RADAR-base is a fully functional, remote data collection platform built around Confluent/Apache Kafka and provides off-the-shelf components for projects interested in collecting mHealth datasets at scale.
The opportunity in health care for continuous monitoring of patients has grown steadily alongside the widespread availability of smartphones, higher-capacity mobile networks, and new wearable sensors that can continuously measure a growing set of physiological and phenomenological parameters. Many of these devices are currently in the lifestyle or fitness domain; however, vendors are increasingly developing them for medical-grade applications. If these data streams can be reliably collected, analyzed, and acted on, they open up the possibility of better understanding disease etiology, improving diagnosis and prognosis, and detecting relapse in most disease areas.
Existing mobile health (mHealth) platforms include some combination of questionnaires, phone (Android, iOS) sensor data collection, wearable integration, backend infrastructure, and a management user interface, but few presently include all of these or are scalable solutions. Two such examples are the open source AWARE Framework, an Android platform for mobile phone–based context sensing [
The €26 million Innovative Medicines Initiative (IMI) Remote Assessment of Disease and Relapse-Central Nervous System (RADAR-CNS) is a major international academic-industry research program aimed at developing novel methods and infrastructure for monitoring major depressive disorder (MDD), epilepsy (Epi), and multiple sclerosis (MS) using wearable devices and smartphone technology [
To facilitate adoption by the wider mHealth community, the RADAR-base platform was released under an open source Apache 2 license in January 2018. RADAR-base is composed of backend infrastructure and 2 mobile apps: a cross-platform Cordova app for active monitoring of participants (active remote monitoring technology, aRMT) through conscious action (eg, questionnaires, audio questions, timed tests) and a native Android app for passive monitoring via phone and wearable sensors (passive remote monitoring technology, pRMT). RADAR-base also includes capabilities for data aggregation, study management, and real-time visualization.
A key differentiator of the RADAR-base platform is its use of Confluent technologies (based around Apache Kafka) to provide an end-to-end solution for remote monitoring use cases (eg, participant management and data analysis) that scales horizontally through Kafka and the Confluent ecosystem. Other approaches to centralizing information flow and decoupling systems exist, in particular message queues and Enterprise Service Bus/Service Oriented Architecture (SOA) type architectures, which share some features with Confluent/Kafka and differ in others, in particular routing, scaling, performance, and ecosystem [
The RADAR-base platform can be deployed either locally, for example in a hospital for on-site data collection, or remotely as a centralized deployment for ambulatory studies. Both scenarios are used in RADAR-CNS. The RADAR-base backend has been deployed on various platforms (cloud, bare-metal, etc) as a set of microservices using Docker containers.
The RADAR-base platform is a scalable, secure, open source Internet of Things (IoT) platform for real-time remote sensor data collection in the context of mHealth clinical studies.
The RADAR-base platform consists of the following major categories of components:
Data Collection Ecosystem
Data sources
Data Processing and Visualization
Study management and Security
Key functionalities of the platform include:
High throughput, low latency data collection
Scalability
Generalized device integration for passive data sources
Abstracted and composable integration mechanism for sensor devices
Third-Party RESTful (Representational State Transfer) data source integration
Configurable data sources at runtime
Schema evolution
Real-time data processing and analytics
Hot and cold storage
Data access (Representational State Transfer [REST]-API)
Modular, extensible dashboards
Electronic Case Report Form (eCRF) integration (REDCap)
Remote configuration
Cohort Management Portal
Security
These functionalities are delivered through the following components:
Data ingestion: Recognizing and registering data sources (including smartphones and wearable devices), collecting the data via a direct Bluetooth connection or through a third-party application programming interface (API), and streaming it in near-real-time to the server (green box in
Data storage and management: Consists of 2 centralized storage systems behind a security layer that enforces authorization. The cold storage, based on the scalable and fault-tolerant Hadoop Distributed File System (HDFS), stores large volumes of raw data; the hot storage, based on MongoDB, stores aggregated data that provide a near-real-time overview of the raw data, principally for the data dashboards.
Data sharing: Visualizing aggregated data in a live dashboard and exporting raw data for further analyses in various formats including AVRO, JSON, and CSV.
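As a minimal sketch of the CSV export mentioned above, the Python functions below flatten nested key/value records into dotted column names and write them with the standard `csv` module. The record layout (a `key` object and a `value` object) and the helper names `flatten` and `records_to_csv` are illustrative assumptions, not the RADAR-base export tooling; a real export would first decode the AVRO files from cold storage.

```python
import csv
import io


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names, e.g. key.userId."""
    flat = {}
    for name, value in record.items():
        column = f"{prefix}{name}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat


def records_to_csv(records):
    """Write flattened records as CSV text with a header row."""
    rows = [flatten(r) for r in records]
    # Union of all columns in first-seen order; rows lacking a column get "".
    columns = list(dict.fromkeys(c for row in rows for c in row))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Taking the union of columns keeps the export usable when later records carry fields that earlier ones lack, which is what schema evolution tends to produce.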
Technical overview of the RADAR-base platform stack.
Current data sources: Empatica E4, Pebble 2, Fitbit, Biovotion, Faros, active Remote Monitoring Questionnaire app, and passive Remote Monitoring app.
The entire RADAR-base backend is deployable as a set of microservices based on Docker containers [
As the platform is based on Apache Kafka, data can be sent into Kafka either directly via a native producer or over HTTP via the Confluent REST Proxy.
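A minimal sketch of the HTTP path, assuming the Confluent REST Proxy v2 API: the request body references Schema Registry IDs for the key and value schemas and lists the records to produce. The proxy URL, topic name, schema IDs, record fields, and the `Authorization` header (on the assumption that an authenticating gateway sits in front of the proxy) are hypothetical; the function only builds the request and does not send it.

```python
import json
import urllib.request

# Content type defined by the Confluent REST Proxy v2 API for Avro records.
AVRO_V2 = "application/vnd.kafka.avro.v2+json"


def build_produce_request(proxy_url, topic, key_schema_id, value_schema_id,
                          records, access_token):
    """Build (but do not send) a POST that produces Avro records to `topic`.

    `records` is a list of (key, value) pairs. Schemas are referenced by
    their Schema Registry IDs instead of being embedded in every request.
    """
    body = {
        "key_schema_id": key_schema_id,
        "value_schema_id": value_schema_id,
        "records": [{"key": k, "value": v} for k, v in records],
    }
    return urllib.request.Request(
        url=f"{proxy_url.rstrip('/')}/topics/{topic}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": AVRO_V2,
            "Accept": "application/vnd.kafka.v2+json",
            # Hypothetical: an auth gateway in front of the proxy checks this.
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it; a production client would also batch records and retry on transient failures.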
Data sent into the platform from a data source are converted into AVRO format before being written to Kafka topics. To perform this conversion, the REST Proxy needs to know the schema (or format) of the data being sent. These schemas are stored in the Schema Registry, so each request can reference a schema by its ID instead of carrying the full schema, which reduces the payload size of each request.
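To illustrate why the schema is needed at all, the sketch below encodes one accelerometer sample in AVRO binary form and frames it the way Confluent serializers do: a zero magic byte followed by the 4-byte big-endian Schema Registry ID. The field list (`time`, `timeReceived` as doubles; `x`, `y`, `z` as floats) is an assumption modeled on a phone acceleration record. AVRO binary data carry no field names, so a reader needs the schema, which the ID lets it fetch from the registry.

```python
import struct


def encode_phone_acceleration(time, time_received, x, y, z):
    """AVRO binary encoding of a record assumed to have the fields
    time: double, timeReceived: double, x: float, y: float, z: float.

    AVRO writes record fields back to back in schema order, with no names
    or separators: doubles as 8-byte and floats as 4-byte little-endian
    IEEE 754 values.
    """
    return struct.pack("<ddfff", time, time_received, x, y, z)


def frame_with_schema_id(schema_id, avro_payload):
    """Prefix a payload with the Confluent wire-format header."""
    return struct.pack(">bI", 0, schema_id) + avro_payload


def read_schema_id(message):
    """Split a framed message into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != 0:
        raise ValueError(f"unknown magic byte {magic}")
    return schema_id, message[5:]
```

The encoded sample is 28 bytes plus the 5-byte header, whereas a JSON encoding would repeat every field name in every message.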
Event-by-event stream processing is built on top of Kafka Streams. It provides an abstract layer to monitor and analyze streams of data and write aggregated/transformed data into Kafka topics. The data are
Data arriving in Kafka are extracted into storage systems such as HDFS and MongoDB by Kafka sink connectors. The HDFS sink connector takes raw data coming into the system and deposits it into HDFS storage in AVRO format (this time with the schema embedded, so the data are self-describing), which can be used for archival storage and historical analysis. The MongoDB sink connector takes aggregated data from the processed topics written by the Kafka Streams application and deposits it into MongoDB storage.
An AVRO schema is a JSON specification of the fields of a record and the data types their values can hold. The schema can either be embedded in the message or referenced by an ID pointing to the schema stored in a Schema Registry. AVRO is a particularly convenient format for managing schema evolution in RADAR-base, as schema changes can occur frequently and without warning, especially where third-party data sources are concerned.
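Schema evolution can be checked mechanically. The simplified Python sketch below applies the core AVRO schema-resolution rules to flat records of primitive types: a field present in both schemas must have the same type or an allowed promotion, and a field the reader expects but the writer lacks needs a default. It ignores unions, nested records, aliases, and the Schema Registry's compatibility modes, and the function name `can_read` is hypothetical.

```python
# Type promotions permitted by AVRO schema resolution (writer -> reader).
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def can_read(reader_schema, writer_schema):
    """List the problems preventing `reader_schema` from reading data written
    with `writer_schema`; an empty list means the schemas are compatible.

    Simplified: flat records whose field types are primitive type names.
    """
    writer_fields = {f["name"]: f["type"] for f in writer_schema["fields"]}
    problems = []
    for field in reader_schema["fields"]:
        name, rtype = field["name"], field["type"]
        if name in writer_fields:
            wtype = writer_fields[name]
            allowed = PROMOTIONS.get(wtype, set()) if isinstance(wtype, str) else set()
            if wtype != rtype and not (isinstance(rtype, str) and rtype in allowed):
                problems.append(f"{name}: cannot read {wtype} as {rtype}")
        elif "default" not in field:
            problems.append(f"{name}: missing in writer schema and has no default")
    # Fields that exist only in the writer schema are skipped when reading.
    return problems
```

For example, adding a field with a default keeps old data readable, whereas adding a field without a default does not.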
For illustration purposes, the schema for the phone acceleration is shown in
Schema overview for the phone acceleration.
Data sources represent a wide variety of systems able to send data into the RADAR-base platform; these include devices containing sensors, mobile phones, questionnaires and digital games/assessments, and Web API data portals.
In RADAR-base, passive data sources are collected via the pRMT app, active data sources via the aRMT app, and third-party data sources via the THINC-it app in RADAR-CNS.
Another type of data source is middleware connecting a vendor’s Web API to the RADAR-base platform. For example, Fitbit does not provide a mobile Software Development Kit (SDK) to stream data to the pRMT app directly; instead, all the data are uploaded to the vendor data warehouse and provided to developers via a Web API. These data are brought into the RADAR system by a server-side Kafka Source Connector, which continuously queries the vendor’s Web API and writes the data into Kafka inside the RADAR-base platform; this approach can be used to integrate other Web API/OAuth2 data sources [
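The polling pattern behind such a connector can be sketched in a few lines of Python. `fetch` is a hypothetical stand-in for the vendor's Web API, and the `offsets` dictionary plays the role that source offsets play in Kafka Connect: it records the newest timestamp already forwarded for each user, so each poll emits only new records. OAuth2 token handling, rate limiting, and the actual Kafka write are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class PollingSource:
    """Stand-in for a source connector that repeatedly polls a Web API.

    `fetch(user_id, since)` (hypothetical) returns records that carry a
    numeric 'time' field; only records newer than the stored offset are
    forwarded.
    """
    fetch: Callable[[str, float], List[Record]]
    offsets: Dict[str, float] = field(default_factory=dict)

    def poll(self, user_ids: List[str]) -> List[Tuple[str, Record]]:
        out: List[Tuple[str, Record]] = []
        for user in user_ids:
            since = self.offsets.get(user, 0.0)
            # Filter again in case the API returns overlapping data.
            records = [r for r in self.fetch(user, since) if r["time"] > since]
            if records:
                # Advance the offset so the next poll resumes after the newest record.
                self.offsets[user] = max(r["time"] for r in records)
                out.extend((user, r) for r in records)
        return out
```

Persisting the offsets, as Kafka Connect does, is what lets a restarted connector resume without re-sending or skipping data.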
The native passive Android application (pRMT) has been designed to passively collect data from sensors on the user’s smartphone as well as to integrate wearable devices that offer SDKs. Its enhanced modularity (via pRMT plugins) allows easy integration of new devices/sensors. It currently supports Empatica E4 Wristband, Pebble 2 Smartwatch, and Biovotion VSM devices.
Passive Remote Monitoring app user interface. The Device column lists all the devices that are connected to the app and collecting data. Device connection and disconnection are shown by green and red icons, respectively. The 3 columns next to “Device” show the different values being measured on the connected devices. The last column shows the amount of data (or records) collected.
The primary goal of the aRMT mobile app is to allow users to complete questionnaires on their smartphone at scheduled times, prompted by notifications. The questionnaire definitions and their regimen are defined by simple JSON configuration files, which are Web-served and therefore remotely configurable. New questionnaire configuration files can easily be created either manually or by authoring them as REDCap data dictionaries and converting them with a simple script. The regimen or protocol configuration file defines the sequence of questionnaires delivered and the local notifications or Firebase Cloud Messaging push notifications used to alert the user.
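As a sketch of the conversion step, assuming a standard REDCap data dictionary CSV export, the Python function below maps the dictionary columns onto a simple list of questions, splitting REDCap choice strings such as `0, Not at all | 1, Several days` into coded options. The output layout and the function name `parse_redcap_dictionary` are hypothetical and do not reproduce the aRMT configuration schema.

```python
import csv
import io

CHOICES_COLUMN = "Choices, Calculations, OR Slider Labels"


def parse_redcap_dictionary(csv_text):
    """Convert a REDCap data dictionary (CSV text) into question dicts.

    The output layout is illustrative only; it shows how the dictionary's
    columns can map onto a questionnaire definition.
    """
    questions = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        choices = []
        raw = row.get(CHOICES_COLUMN) or ""
        if row["Field Type"] in ("radio", "dropdown", "checkbox") and raw:
            # REDCap separates options with "|" and code from label with ",".
            for item in raw.split("|"):
                code, _, label = item.partition(",")
                choices.append({"code": code.strip(), "label": label.strip()})
        questions.append({
            "id": row["Variable / Field Name"],
            "form": row["Form Name"],
            "type": row["Field Type"],
            "text": row["Field Label"],
            "choices": choices,
        })
    return questions
```

The resulting list could then be serialized with `json.dumps` into whatever configuration layout the app expects.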
The aRMT app was designed as a hybrid Cordova app usable on both iOS and Android. It also includes Cordova plugins to collect audio responses to questions, allowing active samples of raw speech audio to be collected for analysis. Finally, the aRMT app provides time markers for data collected in parallel by the pRMT app, such as start and end labels of the walking and balance tests used in the MS study in RADAR-CNS.
User interface of the active Remote Monitoring app.
THINC-it is a third-party mobile app that makes use of 5 quick interactive tests to assess memory, concentration, and attention [
The THINC-it app uses the RADAR-base platform backend infrastructure as part of the RADAR-CNS project. It also provides a reasonable paradigm for other third-party app integration into the RADAR-base platform.
A common task is the exploration of collected raw data. In addition to the near-real-time visualization provided by the dashboard, the RADAR platform includes a Python package for processing and visualizing historical data. The package provides standard tooling for exploratory visualization of RADAR-base data (see
Contiguity of phone sensor data over 6 months collected through RADAR-base for a participant in the major depressive disorder study. The red line corresponds to the enrollment date, whereas a colored segment on each row corresponds to recorded data at an hourly resolution.
The package can make use of the common structure of RADAR-base data through the defined AVRO
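A contiguity view like the one described in the figure caption reduces to bucketing each stream's timestamps by hour. The sketch below, with the hypothetical function names `hourly_coverage` and `longest_gap_hours`, returns the UTC hours containing at least one sample for each stream and measures the longest run of missing hours; it is an illustration and not the API of the RADAR-base Python package.

```python
from datetime import datetime, timezone


def _hour_bucket(ts):
    """Truncate a POSIX timestamp (seconds) to the start of its UTC hour."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).replace(
        minute=0, second=0, microsecond=0)


def hourly_coverage(timestamps_by_stream):
    """Map each stream name to the sorted UTC hours that contain data."""
    return {stream: sorted({_hour_bucket(ts) for ts in stamps})
            for stream, stamps in timestamps_by_stream.items()}


def longest_gap_hours(hours):
    """Longest run of hours without data between two covered hours."""
    gaps = [(b - a).total_seconds() / 3600 - 1 for a, b in zip(hours, hours[1:])]
    return max(gaps, default=0)
```

Plotting each stream's covered hours as colored segments on its own row, with a marker at the enrollment date, gives a chart of the same kind.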
The RADAR platform exposes RESTful services implemented using Jersey 2 and deployed on a Grizzly server. Data collected in the platform are processed in real time by a Kafka Streams application to provide aggregations (mean, max, etc) at various time resolutions (second, minute, hour, day, and week) and stored in MongoDB, which is served through the REST-API. The REST-API provides various API endpoints, which can be used to request the aggregated data in near-real-time and allow various combinations of queries. All the endpoints are secured; therefore, only a valid user or a client registered with the management portal with appropriate permissions/scopes can access them. All REST endpoints have been documented following OpenAPI specifications using Swagger: a powerful open source framework offering a large ecosystem of tools that help to design, build, document, and consume RESTful APIs.
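The aggregation step can be illustrated with tumbling windows aligned to the Unix epoch (Kafka Streams also aligns its time windows to the epoch). The batch sketch below groups timestamped samples per window and computes count, minimum, maximum, and mean; the statistic names and the `RESOLUTIONS` table are assumptions, and a Kafka Streams application would instead update such aggregates incrementally in a state store as each record arrives.

```python
from collections import defaultdict
from statistics import mean

# Window lengths in seconds for the resolutions listed in the text.
RESOLUTIONS = {"second": 1, "minute": 60, "hour": 3600,
               "day": 86400, "week": 604800}


def aggregate(samples, resolution):
    """Group (timestamp, value) samples into epoch-aligned tumbling windows
    and summarize each window; keys are window start times in seconds."""
    size = RESOLUTIONS[resolution]
    windows = defaultdict(list)
    for ts, value in samples:
        start = int(ts // size) * size
        windows[start].append(value)
    return {
        start: {"count": len(vals), "min": min(vals),
                "max": max(vals), "mean": mean(vals)}
        for start, vals in sorted(windows.items())
    }
```

Because the windows are aligned to the epoch, every client that asks for, say, hourly data sees the same window boundaries regardless of when its query was made.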
The REST-API also exposes some real-time information about the current status (connected or disconnected) of the sources registered for a particular subject along with when a data source was last detected sending data. All this information is used by the Dashboard to provide a real-time visualization of the current state of the studies/projects to project admins.
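One plausible way to derive the connected or disconnected status is to compare each source's last-seen time with a timeout. The threshold below (5 minutes) and the function name `source_status` are arbitrary assumptions for illustration; the text does not state how RADAR-base chooses them.

```python
import time


def source_status(last_seen, now=None, timeout_s=300):
    """Classify each source as 'connected' or 'disconnected'.

    `last_seen` maps a source ID to the POSIX time of its latest record;
    a source counts as connected if it sent data within `timeout_s`.
    """
    now = time.time() if now is None else now
    return {
        source: ("connected" if now - ts <= timeout_s else "disconnected")
        for source, ts in last_seen.items()
    }
```

A longer timeout suits sources such as questionnaires or Web API connectors that upload data in batches rather than continuously.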
Overview and visualization are provided by a clear, customizable user interface with an emphasis on exploring different aggregation and zoom levels in the data. The RADAR-base dashboards use Angular, RxJS, and D3 to construct views on data from the REST-API. These presently provide management project/study lists, compliance views, and participant-level visualization of longitudinal data.
Participant data view (battery and accelerometer streams).
The management portal Web application is the main user interface for creating and organizing RADAR projects, enrolling participants, and managing the association of participants with corresponding data sources.
RADAR-base Management Portal.
Authentication, authorization, deidentification, and encryption are compulsory because of the sensitive information collected by the platform and to manage additional unknown risks associated with IoT systems that use large numbers of network edge devices and endpoints. These are implemented for the following elements:
The management portal is used to issue a Quick Response (QR) code or token for registering a data source to a participant. This QR code can be scanned with the QR code scanner embedded in the integrated apps, or alternatively the token can be entered directly as text. The decoded QR code provides information required by clients, including sources, user ID, roles, and scopes.
To provide authorization and authentication, we utilize the OAuth2 workflow, an industry standard protocol for authorization. In the mobile apps, we use the Refresh Token grant type [
For the mobile apps, an access token from the management portal is required for authorization across the platform; without this token, data sources can neither register nor send data into the platform. The management portal provides a Refresh Token in the form of a QR code associated with a subject. Scanning this QR code gives the mobile app a URL to a JSON Web Token that embeds a Refresh Token and the authorization endpoint; the app then sends the Refresh Token in an HTTP(S) request to that endpoint to obtain a new Refresh Token and Access Token pair. The Access Token can then be used to access and post data according to the resources, roles, and scopes specified. Once the Access Token expires, the most recently obtained Refresh Token is used to obtain a new Refresh Token and Access Token pair.
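The token exchange follows the OAuth2 refresh token grant (RFC 6749, section 6). The sketch below builds the POST request with client credentials in an HTTP Basic header and tracks when the access token should be renewed. The token URL, client ID, empty client secret, and 60-second renewal margin are hypothetical, and the code builds requests without sending them.

```python
import base64
import time
import urllib.parse
import urllib.request


def build_refresh_request(token_url, client_id, client_secret, refresh_token):
    """Build (but do not send) an OAuth2 refresh_token grant request.

    Assumes the client ID and secret contain no characters that would
    need URL-encoding before being placed in the Basic header.
    """
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
    }).encode()
    return urllib.request.Request(
        token_url, data=body, method="POST",
        headers={"Authorization": f"Basic {creds}",
                 "Content-Type": "application/x-www-form-urlencoded"})


class TokenStore:
    """Keeps the latest token pair and reports when a refresh is due."""

    def __init__(self):
        self.access_token = None
        self.refresh_token = None
        self.expires_at = 0.0

    def update(self, token_response, now=None):
        """Store a token endpoint response (access_token, expires_in, ...)."""
        now = time.time() if now is None else now
        self.access_token = token_response["access_token"]
        # The server may rotate the refresh token; keep the newest one.
        self.refresh_token = token_response.get("refresh_token", self.refresh_token)
        self.expires_at = now + token_response["expires_in"]

    def needs_refresh(self, now=None, margin_s=60):
        """True if there is no access token or it expires within the margin."""
        now = time.time() if now is None else now
        return self.access_token is None or now >= self.expires_at - margin_s
```

Refreshing slightly before expiry avoids requests that fail because the token lapsed while they were in flight.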
The RADAR-base platform provides utilities for clients to easily manage the OAuth2 authorization flow [
As discussed in the eCRF Integration and REDCap Integration WebApp sections, strongly identifiable information is saved in a separate eCRF system, which is isolated from the RADAR-base platform [
An nginx web server is used as a reverse proxy for traffic into the platform and to provide Cross-Origin Resource Sharing. The reverse proxy can also be configured to help mitigate Distributed Denial-of-Service attacks or to act as an HTTP load balancer.
This component controls access to the Confluent Kafka REST Proxy for posting data from clients to Kafka through HTTP POST requests. It performs authentication, authorization, content validation, and decompression if needed. It also verifies whether the access token sent in the HTTP POST request is valid and has the privileges required to perform the POST request for the specified resource, role, and scope.
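The privilege check can be sketched as an inspection of the access token's claims. The Python below decodes the base64url payload of a JWT and checks expiry, a required scope, and the source being posted for. The claim names `scope` and `sources` and the scope string `MEASUREMENT.CREATE` are assumptions. The sketch does not verify the token signature; a real gateway would first verify it with the issuer's public key, for example through an established JWT library, and only then trust the claims.

```python
import base64
import json
import time


def _b64url_decode(segment):
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def check_claims(token, required_scope, source_id, now=None):
    """Decide from a JWT's claims whether it may post data for `source_id`.

    WARNING: the signature is NOT verified here. Claim names are assumed.
    Returns a (allowed, reason) tuple.
    """
    now = time.time() if now is None else now
    try:
        _header, payload, _signature = token.split(".")
        claims = json.loads(_b64url_decode(payload))
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return False, "malformed token"
    if claims.get("exp", 0) <= now:
        return False, "token expired"
    scopes = claims.get("scope", [])
    if isinstance(scopes, str):  # OAuth2 also allows a space-separated string
        scopes = scopes.split()
    if required_scope not in scopes:
        return False, f"missing scope {required_scope}"
    if source_id not in claims.get("sources", []):
        return False, "source not assigned to this subject"
    return True, "ok"
```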
The various components of RADAR-base keep activity logs at levels appropriate to each component. As the management portal keeps track of all study-related information, device assignments, and participant enrollment information, it keeps the most detailed audit logs. Any modification to the management portal database is stored in an audit record. Each audit record stores the user who made the modification, the time at which it was made, and the old and new state of the modified entity, allowing the complete history of all study metadata to be tracked and modifications to be rolled back when necessary. Finally, the management portal also logs when, to which application, and for which user access tokens are granted. Note that this log exists only for auditing: validation of the tokens is not handled by the management portal. Instead, clients can use the management portal’s public key to validate the digital signature embedded in the access token. This way, components in RADAR-base can scale horizontally without the management portal having to scale with them just to keep up with validation requests. See
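The audit-record pattern described above can be sketched as an append-only log in which every write stores the user, the time, and the old and new state, which is enough to reconstruct history and undo a change. The class names are hypothetical, and the management portal's own implementation (which the text does not detail) may differ; here a rollback is itself recorded as a new audited modification.

```python
import copy
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class AuditRecord:
    user: str
    timestamp: datetime
    entity_id: str
    old_state: Optional[Dict[str, Any]]  # None when the entity was created
    new_state: Dict[str, Any]


class AuditedStore:
    """Entity store in which every modification appends an audit record."""

    def __init__(self):
        self.entities: Dict[str, Dict[str, Any]] = {}
        self.log: List[AuditRecord] = []

    def save(self, user, entity_id, new_state):
        old = copy.deepcopy(self.entities.get(entity_id))
        self.entities[entity_id] = copy.deepcopy(new_state)
        self.log.append(AuditRecord(user, datetime.now(timezone.utc),
                                    entity_id, old, copy.deepcopy(new_state)))

    def history(self, entity_id):
        return [r for r in self.log if r.entity_id == entity_id]

    def roll_back(self, user, entity_id):
        """Restore the state before the latest change; the undo is audited too."""
        last = self.history(entity_id)[-1]
        if last.old_state is None:
            raise ValueError("no earlier state to restore")
        self.save(user, entity_id, last.old_state)
```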
Optional integration of one or more REDCap eCRF servers is provided with RADAR-base. REDCap is a secure 21 CFR Part 11, FISMA, and HIPAA-compliant Web application for building and managing online surveys and databases [
A brief workflow of the registration is shown in
User registration workflow.
The entire RADAR-base platform is freely available as open source software on GitHub [
Multiple instances of RADAR-base are deployed and in use for real-world studies of Epilepsy, MS, and MDD under the umbrella of RADAR-CNS [
The catalog of devices currently integrated into the pRMT app includes onboard Android smartphone sensors, Empatica E4, Pebble 2 smartwatch, Biovotion Everion, Faros 180, and Fitbit; a list is maintained here [
The current deployments of RADAR-base for the different disorders are explained below.
The RADAR-base platform has been deployed centrally to remotely collect active (questionnaires) and passively generated (Fitbit wearable and smartphone sensor) data from participants recruited at 3 sites of the MDD study: King’s College Hospital, London; Centro de Investigación Biomédica en Red, Barcelona; and VU University Medical Center, the Netherlands. The objective is to collect regular self-reported symptoms and metrics such as sleep and ambulatory behavior. High-resolution data are being collected over a period of up to 2 years for each participant. More details about the MDD studies and preliminary data analysis are provided in our study [
The RADAR-base platform has been successfully tested and deployed in the video electroencephalography monitoring units of the Clinical Neurophysiology Department, King’s College London, and the Epilepsy Center, Medical Center University of Freiburg, and it is currently in active use in London and Freiburg with enrolled participants.
More recently enrolled participants can wear 3 devices (Faros, Biovotion, and E4) concurrently. We have described the deployment of the platform for the Epilepsy studies and the initial data collected in detail in our study [
MS studies using RADAR-base are underway at different partner sites across Europe. Participant recruitment has started, and data are being streamed to the central deployment. An important focus here is to collect data from the Faros 180 devices used for several mobility and balance tests, in addition to ambulatory behavior data similar to those collected in the depression study.
These 3 studies demonstrate the versatility of the RADAR-base platform and generate data of very different complexity, volume, velocity, and duration.
Several technical challenges were addressed, including:
High throughput, volume, and velocity of the data.
Processing data in real time.
Optimizing phone resources to handle data collection and streaming (particularly high-resolution sensors).
Privacy concerns particularly around Global Positioning System data used to track location or audio exposing identifiable information. For this, the RADAR-base platform calculates and sends the relative location from a reference point. Similarly, background audio sampling is one-way-mapped to a vector representation of features using an OpenSmile plugin [
Identifiable information is kept separate from sensor data so that the sensor data are pseudonymized.
Security (authorization and authentication) is also a major concern for sensitive data collected from participants. Access to data is provided via a secure data transfer protocol.
Given the large volume of raw data the platform is built to collect, efficient compression is essential. All the data collected and stored are compressed and encrypted.
Maintaining the performance and behavior of the pRMT and aRMT data source apps in an ever-changing Android landscape of OS versions, handset vendors, and form factors.
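The relative-location idea in the list above can be sketched as follows: the device draws a private reference point and reports each GPS fix as an offset from it, so displacements between fixes survive but absolute coordinates do not. Drawing the reference uniformly at random and working in raw degrees are simplifying assumptions, and the class name `RelativeLocationEncoder` is hypothetical; this is not a description of the pRMT implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RelativeLocationEncoder:
    """Reports GPS fixes as offsets (in degrees) from a private reference.

    The reference point stays on the device and is never transmitted.
    """
    ref_lat: float = field(default_factory=lambda: random.uniform(-90.0, 90.0))
    ref_lon: float = field(default_factory=lambda: random.uniform(-180.0, 180.0))

    def encode(self, lat, lon):
        """Return the (latitude, longitude) offset of a fix from the reference."""
        return lat - self.ref_lat, lon - self.ref_lon
```

Because the reference latitude stays on the device, converting longitude offsets into meters on the server is only approximate.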
A summary of other platforms, comparing their salient features with those of the RADAR-base platform, is provided here. The recently developed mental health Nonintrusive Individual Monitoring Architecture platform is a prototype implementation used alongside an investigation of the key features required of an mHealth data collection platform; these include integrating data sources, a focus on privacy, and flexible user permissions [
ResearchStack is an SDK and UX framework for building research study apps on Android, with a similar application domain as ResearchKit [
A key differentiator in the RADAR-base platform is the use of the Confluent platform technologies [
RADAR-base aims to stimulate the field of mHealth by providing an off-the-shelf platform for general remote data collection at scale. The project has long-term goals to improve participant care with use cases including predicting and pre-empting relapses and improving outcome measures in trials through the use of remote assessment technologies in a wide variety of disorder areas. Beyond RADAR-CNS, RADAR-base is being deployed across a number of other large EU IMI2–funded programs including RADAR-Alzheimer’s Disease and is presently deployed for BigData@Heart for remote monitoring in an atrial fibrillation treatment trial (the UK National Institute for Health Research—NIHR—funded RATE-AF NCT02391337).
Better resolution figures.
Available sensing platform.
API: application programming interface
aRMT: active remote monitoring technology
CNS: Central Nervous System
HDFS: Hadoop Distributed File System
IMI: Innovative Medicines Initiative
MDD: major depressive disorder
mHealth: mobile health
MS: multiple sclerosis
NIHR: National Institute for Health Research
pRMT: passive remote monitoring technology
RADAR: Remote Assessment of Disease and Relapse
REST: Representational State Transfer
SDK: Software Development Kit
QR: quick response
This study has received support from the EU/EFPIA IMI Joint Undertaking 2 (RADAR-CNS grant No 115902) [
None declared.