Machine learning in healthcare is one of those topics where the hype often outpaces reality. But there are some genuinely useful applications happening right now. Let me break down what's actually working, with concrete examples and code you can learn from.
Where ML Helps With Diagnosis
Medical imaging is where ML has made the most progress. Convolutional Neural Networks (CNNs) -- specifically architectures like ResNet, DenseNet, and EfficientNet -- can scan through X-rays and MRIs faster than any radiologist. They're particularly good at spotting patterns in large datasets that humans might miss due to fatigue or volume.
The most practical use case? Flagging potential issues for human review. A trained model can look at thousands of chest X-rays and highlight the ones that need closer attention. This doesn't replace radiologists. It helps them focus on the cases that matter.
Here's a simplified example of how a chest X-ray classifier gets built with PyTorch:
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
# Use a pretrained DenseNet121 -- the same backbone
# behind CheXNet, which matches radiologist-level performance
model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
model.classifier = torch.nn.Linear(model.classifier.in_features, 14)
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])
# Multi-label classification: one X-ray can show
# multiple conditions (e.g., pneumonia + cardiomegaly)
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

DenseNet121 is the backbone behind CheXNet, one of the first models to match radiologist-level performance on detecting pneumonia from chest X-rays. The key insight: transfer learning from ImageNet works surprisingly well for medical images, even though the domains are completely different.
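At inference time, the triage step boils down to a sigmoid per label and a threshold. A minimal sketch in plain NumPy (the label subset and the threshold are illustrative, not taken from CheXNet):

```python
import numpy as np

# Illustrative subset of the 14 ChestX-ray14 labels
LABELS = ["pneumonia", "cardiomegaly", "effusion"]

def flag_for_review(logits, threshold=0.5):
    """Convert raw multi-label logits to per-condition flags.

    Each label gets an independent sigmoid probability, matching
    the BCEWithLogitsLoss training objective above.
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return {label: p for label, p in zip(LABELS, probs) if p >= threshold}

# Example logits for one study (made-up numbers)
flags = flag_for_review([2.2, -1.5, 0.1])
```

Anything that clears the threshold gets routed to a radiologist; nothing is auto-diagnosed.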
Early tumor detection is another area showing real results. Models trained on millions of scans can catch anomalies that might be missed in a busy clinic. Google's LYNA (Lymph Node Assistant) reached 99% accuracy (measured as area under the curve) in detecting metastatic breast cancer in lymph node biopsies. But here's the catch: these models still need human verification. False positives are a real problem, and in healthcare a false positive can mean unnecessary surgery.
Specific ML Models Used in Healthcare
Different clinical problems call for different model architectures. Here's what's actually deployed:
- U-Net for medical image segmentation -- originally designed for biomedical images, it excels at outlining tumor boundaries in CT scans. Its skip connections preserve spatial detail that other architectures lose.
- Random Forests and XGBoost for tabular patient data -- predicting sepsis risk, readmission likelihood, or mortality. These outperform deep learning on structured EHR data most of the time.
- Transformer models (BioBERT, ClinicalBERT) for processing clinical notes -- extracting diagnoses, medications, and procedures from unstructured text.
- Graph Neural Networks for drug interaction prediction -- modeling molecular structures as graphs to predict how compounds will behave.
The common mistake is reaching for deep learning when a gradient-boosted tree would work better. If your input is a spreadsheet of patient vitals and lab results, XGBoost will likely beat a neural network and be far easier to explain to clinicians. For a hands-on example of building a classifier from scratch, see our guide on building a Bayesian text classifier in Ruby.
Personalized Treatment Plans
This is where things get interesting. ML can process genetic data, medical history, and treatment outcomes to suggest personalized approaches.
The basic idea: instead of one-size-fits-all treatment, you use data to predict what works for specific patients. Some cancer treatments already use this approach. Genetic markers help determine which drugs are likely to be effective.
Here's a practical example -- building a treatment response predictor using patient genomic features:
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
# Features: gene expression levels + clinical metadata
# Target: whether the patient responded to treatment
X = patient_data[gene_columns + ['age', 'stage', 'prior_treatments']]
y = patient_data['responded']
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
auc_scores = []
for train_idx, val_idx in cv.split(X, y):
    model = GradientBoostingClassifier(
        n_estimators=200,
        max_depth=4,
        learning_rate=0.05,
        subsample=0.8
    )
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    preds = model.predict_proba(X.iloc[val_idx])[:, 1]
    auc_scores.append(roc_auc_score(y.iloc[val_idx], preds))
print(f"Mean AUC: {sum(auc_scores)/len(auc_scores):.3f}")

But let's be honest about the limitations. Most hospitals don't have the data infrastructure for this. You need clean, standardized data. Most healthcare systems are still working with fragmented records spread across multiple vendors that don't talk to each other.
Data Pipeline Considerations
Getting data into a usable state is 80% of the work in healthcare ML. Here are the real challenges:
HL7 FHIR and data ingestion. Healthcare data comes in formats like HL7v2, FHIR, and DICOM. If you're building a pipeline, you'll spend most of your time parsing and normalizing these. Ruby can help with the orchestration layer:
require 'net/http'
require 'json'

class FHIRClient
  def initialize(base_url, auth_token)
    @base_url = base_url
    @headers = {
      'Authorization' => "Bearer #{auth_token}",
      'Accept' => 'application/fhir+json'
    }
  end

  def fetch_patient_observations(patient_id, code:)
    uri = URI("#{@base_url}/Observation?patient=#{patient_id}&code=#{code}")
    # Passing headers to get_response requires Ruby 3.0+
    response = Net::HTTP.get_response(uri, @headers)
    bundle = JSON.parse(response.body)
    bundle['entry']&.map do |entry|
      obs = entry['resource']
      {
        date: obs['effectiveDateTime'],
        value: obs.dig('valueQuantity', 'value'),
        unit: obs.dig('valueQuantity', 'unit')
      }
    end || []
  end
end

# Pull a patient's blood glucose readings for model input
client = FHIRClient.new('https://hospital-fhir.example.com/r4', token)
readings = client.fetch_patient_observations('patient-123', code: '2339-0')

De-identification. Before any data touches a model, Protected Health Information (PHI) must be stripped. Names, dates, medical record numbers, and even rare diagnoses that could identify someone. Tools like Microsoft's Presidio or custom NER models handle this, but it requires validation -- automated de-identification isn't perfect.
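As a toy illustration of what rule-based scrubbing looks like -- a real pipeline should use validated tooling like Presidio plus human review, and these simplified patterns would miss plenty of PHI:

```python
import re

# Toy PHI patterns -- real de-identification needs validated
# tooling and manual review; this is only a sketch of the idea
PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),        # US SSN format
    (re.compile(r"\bMRN[:\s]*\d+\b", re.I), "[MRN]"),       # medical record number
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[DATE]"),   # slash dates
]

def scrub(text):
    """Replace obvious PHI patterns with placeholder tokens."""
    for pattern, token in PHI_PATTERNS:
        text = pattern.sub(token, text)
    return text

note = "Pt MRN: 48213 seen 03/14/2024, SSN 123-45-6789 on file."
clean = scrub(note)
```

Note what regexes can never catch: free-text names, indirect identifiers, rare diagnoses. That's why NER models and validation passes sit on top.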
Feature engineering from EHR data. Raw EHR data is messy. Lab results come at irregular intervals, medications change, diagnoses accumulate over time. You need temporal aggregation: rolling averages of vitals, time since last abnormal lab, medication change frequency. This is where domain expertise matters more than model choice.
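A sketch of that kind of temporal aggregation with pandas, on a made-up lab series (the values and the 160 mg/dL cutoff are illustrative):

```python
import pandas as pd

# Toy irregular lab series for one patient (values are made up)
labs = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 20:00",
        "2024-01-03 09:00", "2024-01-06 10:00",
    ]),
    "glucose": [95, 180, 140, 110],
}).set_index("time")

# A time-based rolling window handles irregular sampling directly
labs["glucose_48h_mean"] = labs["glucose"].rolling("48h").mean()

# Hours since the last abnormal reading (> 160 mg/dL here)
abnormal = labs.index[labs["glucose"] > 160]
labs["hrs_since_abnormal"] = [
    (t - abnormal[abnormal <= t].max()).total_seconds() / 3600
    if (abnormal <= t).any() else None
    for t in labs.index
]
```

Features like these -- rolling means, recency of abnormality -- are what actually feed the tabular models; the raw timestamps never do.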
Predictive Analytics
Can we predict who's likely to get sick before symptoms appear? Sometimes.
Hospital readmission prediction is one area that works reasonably well. Models can identify patients at high risk of returning to the hospital within 30 days, which helps with resource planning and follow-up care. Epic's sepsis prediction model and the LACE index are real examples deployed in hundreds of hospitals, though their actual clinical impact is debated. Similar prediction techniques apply outside medicine too, such as predicting student exam outcomes with machine learning.
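The LACE index itself is just a points-based score, which is a big part of why it's deployable. A sketch of the commonly published scoring (verify the cutoffs against your institution's version before relying on it):

```python
def lace_score(los_days, emergent, charlson, ed_visits_6mo):
    """LACE readmission risk score (van Walraven et al., 2010).

    L = length of stay, A = acuity of admission,
    C = Charlson comorbidity index, E = prior ED visits.
    """
    # L: length of stay in days
    if los_days < 1:
        l = 0
    elif los_days <= 3:
        l = int(los_days)          # 1, 2, or 3 points
    elif los_days <= 6:
        l = 4
    elif los_days <= 13:
        l = 5
    else:
        l = 7
    # A: emergent (unplanned) admission
    a = 3 if emergent else 0
    # C: Charlson comorbidity index, capped at 5 points
    c = charlson if charlson <= 3 else 5
    # E: ED visits in the prior six months, capped at 4
    e = min(ed_visits_6mo, 4)
    return l + a + c + e

# A 5-day emergent stay, Charlson 2, one prior ED visit
score = lace_score(5, True, 2, 1)   # 4 + 3 + 2 + 1
```

Scores of 10 or more are conventionally treated as high risk. A clinician can recompute this by hand, which is exactly the kind of transparency the ML alternatives struggle to match.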
Disease outbreak prediction is more experimental. It works for some conditions with clear patterns. For others, the models are no better than traditional epidemiology.
Regulatory and Ethical Concerns
This is where many ML healthcare projects die -- not from technical failure, but from regulatory reality.
FDA clearance. In the US, clinical ML models are regulated as Software as a Medical Device (SaMD). The FDA has cleared over 500 AI/ML-enabled devices as of 2024, mostly in radiology. Getting clearance requires extensive clinical validation, documented training data provenance, and a plan for monitoring model drift in production. The FDA's "predetermined change control plan" framework lets you update models post-deployment, but only within pre-approved boundaries.
Bias and fairness. If your training data comes mostly from one demographic, your model will perform poorly on others. This isn't theoretical. It's already happened with dermatology models that performed worse on darker skin tones, and with pulse oximeters that overestimate oxygen levels in Black patients. Testing for disparate performance across demographic groups isn't optional -- it's a clinical safety requirement.
Explainability. Clinicians won't trust a black box. SHAP values and attention maps help, but they're approximations. For high-stakes decisions (should this patient go to the ICU?), simpler models that clinicians can reason about often win over marginally more accurate deep learning.
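One clinician-friendly starting point is permutation importance: shuffle one feature at a time and measure how much the model degrades. A sketch on toy data (the feature roles are invented for illustration):

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 400
# Toy features: pretend columns are [lactate, age, noise]
X = rng.normal(size=(n, 3))
# Outcome driven almost entirely by the first feature
y = (X[:, 0] + 0.2 * rng.normal(size=n) > 0).astype(int)

model = LogisticRegression().fit(X, y)
# Shuffling a feature that matters tanks the score;
# shuffling a useless one changes nothing
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
```

The output is a per-feature number a clinician can sanity-check ("the model leans on lactate, not age") -- an approximation, like SHAP, but one that's easy to explain.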
Then there's the integration problem. Even if you build a great model, getting it into clinical workflows is hard. Doctors are busy. New tools need to fit into existing EHR systems, fire alerts at the right time, and not add clicks to an already overloaded workflow.
What's Coming
Drug discovery is probably the most promising frontier. ML can screen millions of potential compounds faster than traditional methods. AlphaFold's protein structure predictions have already accelerated target identification. A few drugs developed with ML assistance are in clinical trials, and Insilico Medicine's ISM001-055 (an AI-discovered drug for idiopathic pulmonary fibrosis) reached Phase II trials.
Administrative automation is less exciting but more immediately practical. Scheduling, billing, documentation -- these are areas where ML can reduce the paperwork burden on healthcare workers. Ambient clinical documentation (using LLMs to generate clinical notes from doctor-patient conversations) is one of the fastest-adopted AI tools in medicine right now.
Continuous monitoring through wearables is another growth area. Devices that track heart rate, blood oxygen, and other vitals can feed into models that detect early warning signs. Apple Watch's irregular heart rhythm notifications have already sent thousands of people to cardiologists who wouldn't have gone otherwise.
The Bottom Line
ML in healthcare is useful but overhyped. The most successful applications are narrow and specific. They augment human decision-making rather than replace it.
If you're working in this space, focus on problems with clear success metrics and good data availability. Our practical guide to integrating machine learning with Ruby shows how to get started with ML in a language many healthcare startups already use. Avoid the temptation to build something general-purpose. Start small. Prove value. Then expand.
The technology will keep improving. The harder problems are organizational: data access, integration, trust, and regulation. Those take longer to solve than the algorithms. And if you're a developer entering this space, invest time understanding HIPAA, FHIR, and clinical workflows before you write a single line of model code. The best model in the world is useless if it can't legally access data or fit into a clinician's day.