Applying Artificial Intelligence in Medicine: Our Early Results

Published in Cardiogram, May 11, 2017

When did you last visit your primary care physician? During your appointment, the doctor placed a stethoscope on your chest, listening for whispers of abnormality in your heartbeat. But most heart arrhythmias occur sporadically. 1 in 4 of us will develop an abnormal heart rhythm in our lifetime, and the scary thing is, we might not know it.

Picture a world where your heart can be monitored continuously using a device you could purchase at a Best Buy or Target. Algorithms would transform the raw data coming from your watch into diagnoses, and your doctor would be notified when a problem is detected.

Today, Cardiogram is taking the first step down that path. We’ve developed an algorithm to use the Apple Watch to detect atrial fibrillation — the most common heart arrhythmia — with higher accuracy than previously validated methods. Our work is being presented at the Heart Rhythm Society, and has been picked up by TechCrunch, Buzzfeed, and CNET.

One year ago, we teamed up with UCSF Cardiology to start the mRhythm study, which 6,158 Cardiogram users enrolled in. Cardiogram trained a deep neural network on the Apple Watch’s heart rate readings and was able to obtain an AUC of 0.97, enabling us to detect atrial fibrillation with 98.04% sensitivity and 90.2% specificity.
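To make the sensitivity and specificity figures concrete, here is a minimal sketch of how both are computed from a model's scores once a decision threshold is chosen. The function name, the threshold, and the toy data are illustrative assumptions, not the study's code or results.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) at a given score threshold.

    labels: 1 = atrial fibrillation, 0 = normal rhythm.
    scores: model outputs; a score >= threshold counts as a positive call.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1  # atrial fibrillation correctly flagged
        elif label == 1:
            fn += 1  # atrial fibrillation missed
        elif predicted_positive:
            fp += 1  # normal rhythm wrongly flagged
        else:
            tn += 1  # normal rhythm correctly passed
    sensitivity = tp / (tp + fn)  # share of true cases that were caught
    specificity = tn / (tn + fp)  # share of normal cases that were cleared
    return sensitivity, specificity


# Toy data for illustration only (hypothetical, not from the study).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.05]
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold generally trades specificity for sensitivity, which is why the two numbers are reported as a pair.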

Deep Learning in Medicine

Artificial intelligence is already making strides in detecting undiagnosed disease. In December of last year, a team of researchers at Google trained a deep neural network to detect diabetic retinopathy with higher accuracy than an ophthalmologist. The following January, Stanford published a seminal paper in Nature showing that their convolutional neural network can detect skin cancer from images of skin lesions. We see our results as a third major application of deep learning to medicine.

Our model architecture. Sensor data is fed into 4 layers of residual, convolutional neurons, with max-pooling applied after each layer. The outputs are fed into 4 residual, bidirectional LSTM layers. Finally, a single convolutional layer with filter length 1 produces a prediction score for each timestep.
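Two pieces of the architecture above are simple enough to sketch in plain Python: max-pooling, which shortens the sequence, and a convolution with filter length 1, which applies the same weighted sum independently at every timestep. This is only an illustration under assumptions: the pool size, the weights, and the sigmoid on the output are hypothetical, and the residual convolutional and LSTM layers are not shown.

```python
import math


def max_pool1d(sequence, pool_size=2):
    """Keep the maximum of each non-overlapping window.

    A trailing partial window is dropped.
    """
    return [
        max(sequence[i:i + pool_size])
        for i in range(0, len(sequence) - pool_size + 1, pool_size)
    ]


def pointwise_conv_scores(features, weights, bias):
    """Filter-length-1 convolution followed by a sigmoid (assumed).

    features: one feature vector per timestep.
    Returns one score in (0, 1) per timestep.
    """
    scores = []
    for feature_vector in features:
        z = sum(w * x for w, x in zip(weights, feature_vector)) + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


# Pooling halves the number of timesteps when pool_size is 2.
pooled = max_pool1d([60, 62, 75, 71, 90, 64], pool_size=2)

# Two timesteps with two hypothetical features each.
scores = pointwise_conv_scores(
    [[0.0, 0.0], [1.0, -1.0]], weights=[2.0, 1.0], bias=0.0
)
print(pooled, scores)
```

Because the final filter spans a single timestep, the output keeps one prediction per timestep rather than one per recording.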

The most promising finding of our study is proof that consumer-grade wearables can be used to detect disease. The future is bright here, and there are a few research directions that are particularly interesting to us.

In Medicine, Labels are Precious

Applying deep learning to medicine comes with its own challenges. Top among these is the difficulty of obtaining data. Many of the tech giants train deep learning models on server logs: Google ad clicks and Facebook likes provide billions of labeled data points. But in medicine, each label is a human life at risk, and only very large studies can generate enough data to train a deep neural network.

To train their diabetic retinopathy classifier, Google hired a small army of ophthalmologists to manually classify 128,000 retinal photographs as healthy or diseased. To detect skin cancer, Stanford combined 18 existing open-access image repositories and supplemented the data with images from the Stanford Medical Center.

In our study, we sent 200 AliveCor mobile ECG devices to Cardiogram users who suffered from atrial fibrillation. These users recorded a total of 6,338 mobile ECGs, each associated with a positive or negative atrial fibrillation label generated by AliveCor.

Labeled data is limited, so we are interested in training techniques that require fewer labels. One-shot, unsupervised, and semi-supervised learning methods will enable us to detect less common diseases. We’ve experimented with auto-encoders and heuristic pretraining, and are exploring state-of-the-art methods like one-shot learning with Siamese Networks.

Deep Reinforcement Learning

We would like to explore deep reinforcement learning to deliver personalized care. Suppose you notify Cardiogram of your panic attacks. Using this data, a reinforcement learning algorithm can pick up on specific biometric triggers. Before your next panic attack, you’ll get a notification from Cardiogram: “Take three slow, deep breaths.”

Heuristic Pretraining

Cardiogram users have generated staggering amounts of unlabeled heart rate data, and we used 139 million heart rate measurements to pretrain our neural network. Prior research has produced a statistical method to detect atrial fibrillation from pulse data. Taking inspiration from this method, we pretrained our neural network to predict the average variation in heart rate readings over various time windows.
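As a rough illustration of what such pretraining targets could look like, the sketch below computes the mean absolute change between consecutive heart rate readings for several window sizes. The exact definition of "average variation", the window sizes, and the function names are assumptions made for the example; windows are measured here in number of readings rather than in seconds.

```python
def average_variation(readings):
    """Mean absolute difference between consecutive readings."""
    if len(readings) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(readings, readings[1:])]
    return sum(diffs) / len(diffs)


def pretraining_targets(readings, window_sizes):
    """Map each window size to one target per non-overlapping window.

    No human labels are needed: the targets are derived from the
    heart rate readings themselves.
    """
    targets = {}
    for size in window_sizes:
        targets[size] = [
            average_variation(readings[i:i + size])
            for i in range(0, len(readings) - size + 1, size)
        ]
    return targets


# Hypothetical heart rate readings in beats per minute.
heart_rates = [60, 62, 61, 65, 64, 70, 69, 68]
targets = pretraining_targets(heart_rates, window_sizes=[4, 8])
print(targets)
```

A network trained to predict these values from raw readings has to learn something about beat-to-beat irregularity before it ever sees an atrial fibrillation label.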

Validating our Neural Network

To validate the model, we obtained gold-standard labels of atrial fibrillation from cardioversions. In a cardioversion, a patient experiencing atrial fibrillation is converted back to normal sinus rhythm, either chemically or with a shock to the heart. 51 patients at UCSF agreed to wear an Apple Watch during their cardioversion. We obtained heart rate samples before the procedure, when the patient was in atrial fibrillation, and after, when the patient’s heart was restored to a normal rhythm. On this validation set, our model performed with an AUC of 0.97, beating existing methods.
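For readers unfamiliar with AUC, it can be read as the probability that a randomly chosen atrial fibrillation sample receives a higher score than a randomly chosen normal sample. The sketch below computes it directly from that definition by comparing every positive/negative pair. The function name and the toy data are illustrative assumptions; here label 1 stands for a pre-cardioversion sample and 0 for a post-cardioversion sample.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only (hypothetical, not from the study).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.1]
value = auc(labels, scores)
print(f"AUC = {value:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for a small validation set; a rank-based computation is the usual choice for larger ones.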

There is work to do before we start notifying our users of arrhythmias. First, we’d like to ensure that our algorithm works in a variety of conditions, whether you’re sleeping, running, or driving. Our detection algorithm has flagged some users who have opted into our study. We plan to send AliveCor devices to these users, and to a randomly selected control group. The ECG readings from these devices will allow us to measure the accuracy of our algorithm on undiagnosed, ambulatory users.

There are challenges in scaling our model evaluation to run nearly continuously on all of our users. To deploy our algorithm in the wild, we must turn our research-grade machine learning setup into a distributed model evaluation server.

What’s next?

Our work is far from complete. We do not just want to detect disease; we want to treat it. In the future, you could imagine Cardiogram sending you a notification: “We noticed an abnormality in your heartbeat. Want to chat with a cardiologist?” After connecting you with a doctor, we will monitor the effectiveness of your treatment plan. “Looks like your beta blocker medication is working, but loses effectiveness after 12 hours. Why don’t you increase your dosage?” Using wearables, we can not only detect disease early, but also guide patients down the road to recovery.

Interested in learning more? Reach out to us in a comment or by emailing hello@cardiogr.am. And if you want to help, we’re hiring!