Phenotyping Preeclampsia Using Unsupervised Machine Learning: A Prospective Cohort Study

Background

Preeclampsia is a pregnancy-specific hypertensive disorder that can affect both mother and baby in serious ways. It is usually defined by new-onset high blood pressure after 20 weeks of gestation, often together with signs of organ involvement such as protein in the urine, liver or kidney dysfunction, low platelet count, or evidence of placental problems. Although preeclampsia is diagnosed using standard criteria, it is not a single uniform disease. Some women develop it earlier in pregnancy with severe placental dysfunction and fetal growth restriction, while others present later and may have different underlying risk factors such as obesity or diabetes.

Because of this clinical diversity, researchers have been interested in whether preeclampsia can be divided into distinct phenotypes, or subgroups, that better reflect biology and clinical outcomes. If such phenotypes can be identified reliably, they may help clinicians predict risk earlier, guide monitoring, and personalize management.

Objective

The aim of this prospective cohort study was to explore clinically meaningful phenotypes of preeclampsia using unsupervised machine learning. Unlike hypothesis-driven analyses, which test a pre-specified hypothesis or predict a known outcome, unsupervised learning looks for structure in the data without using outcome labels. This makes it well suited to discovering previously unrecognized clusters of patients with similar characteristics.

Study Design and Setting

This was a prospective cohort study conducted at BCNatal, a tertiary maternal-fetal medicine center in Barcelona, Spain. The investigators followed pregnant women diagnosed with preeclampsia between August 2013 and April 2024.

A total of 482 women were included. For each patient, the researchers prospectively collected maternal demographic information, clinical data, ultrasound findings, laboratory measurements, delivery details, and maternal and neonatal complications.

Methods

To build a patient representation, the study used a combination of variables that are clinically relevant in preeclampsia. These included maternal age, height, weight, body mass index, blood pressure, angiogenic factors, urinary albumin/creatinine ratio, gestational age at birth, and birthweight centile.

Angiogenic factors are substances involved in blood vessel growth and placental function. In preeclampsia, an imbalance between pro-angiogenic and anti-angiogenic signals is often seen and can reflect placental disease. The urinary albumin/creatinine ratio is a marker of kidney involvement, while birthweight centile helps identify fetal growth restriction or small size for gestational age.
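As a rough illustration of how such a patient representation can be assembled, the sketch below turns a few of the listed variables into numeric feature rows and rescales each column to zero mean and unit variance, so that variables measured in different units contribute comparably to distance-based methods. The source does not say how the study preprocessed its data, so the z-score step, the field names, and the patient values are all assumptions made for illustration only.

```python
from statistics import mean, pstdev

# Hypothetical patients: illustrative values only, NOT study data.
patients = [
    {"age": 34, "height_m": 1.62, "weight_kg": 60.0, "ga_birth_wk": 33.0, "bw_centile": 2.0},
    {"age": 29, "height_m": 1.70, "weight_kg": 72.0, "ga_birth_wk": 37.0, "bw_centile": 15.0},
    {"age": 38, "height_m": 1.58, "weight_kg": 85.0, "ga_birth_wk": 38.5, "bw_centile": 70.0},
]


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def feature_rows(records):
    """One numeric row per patient: age, BMI, gestational age at birth, birthweight centile."""
    return [
        [r["age"], bmi(r["weight_kg"], r["height_m"]), r["ga_birth_wk"], r["bw_centile"]]
        for r in records
    ]


def standardize(rows):
    """Z-score each column (assumed preprocessing, not stated in the source)."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # Fall back to 1.0 so a constant column does not cause division by zero.
    sds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


X = standardize(feature_rows(patients))
```

After this step, every column of X has mean 0 and standard deviation 1, so a variable with a wide numeric range such as birthweight centile does not dominate one with a narrow range such as gestational age.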

The researchers used Uniform Manifold Approximation and Projection (UMAP), a nonlinear dimensionality-reduction technique, to project the multivariable patient data into a lower-dimensional space in which patterns can be visualized. They then applied k-means clustering, which partitions patients into a pre-specified number of groups by assigning each patient to the nearest cluster center, to identify phenotypes.
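To make the clustering step concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm) run on a small set of two-dimensional points that stand in for a low-dimensional embedding. It is not the study's implementation: the points are invented, the UMAP step itself is omitted, and the deterministic farthest-point initialization is a choice made here for reproducibility. The source does not name the software used; in practice, packages such as umap-learn and scikit-learn provide both steps.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until labels stop changing."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical 2-D embedding with three well-separated groups (not study data).
embedding = [
    (0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
    (0.1, 5.0), (-0.2, 5.2), (0.0, 4.8),
]
labels, centroids = kmeans(embedding, k=3)
```

The number of clusters k has to be chosen in advance. The article reports three phenotypes but does not describe how that number was selected, so k=3 here simply mirrors the reported result.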

Main Outcomes

The main outcome measures were maternal and neonatal characteristics and complication rates, compared across the clusters. This comparison allowed the team to evaluate whether the data-driven groups had real clinical meaning rather than being purely statistical patterns.

Results

Three phenotypes of preeclampsia were identified.

Cluster A included 223 women, representing 46.2% of the cohort. This group showed the earliest delivery, with a mean gestational age at birth of 33.1 ± 3.3 weeks. It also had the most marked angiogenic imbalance, the most frequent fetal growth restriction (64% of cases), and the highest complication rates for both mothers and newborns. Maternal complications occurred in 23% of this cluster, while neonatal complications affected 41%.

Cluster B included 147 women, or 30.5% of the cohort. Delivery occurred later than in Cluster A, at a mean gestational age of 37.0 ± 2.1 weeks. These women showed a moderate angiogenic imbalance and intermediate birthweight centiles, with an average centile of 15.5 ± 9.7. Clinical severity in this group appeared intermediate between the other two clusters.

Cluster C included 112 women, accounting for 23.2% of the cohort. This phenotype consisted mostly of term cases, with delivery at a mean gestational age of 38.2 ± 1.5 weeks. It had the lowest angiogenic imbalance and the highest birthweight centile, averaging 67.9 ± 27.3. Obesity was common in this group, affecting 31% of patients, and diabetes was present in 15%. Despite these maternal risk factors, this cluster had the lowest complication burden, with maternal complications in 4% and neonatal complications in 9%.
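The size of the gap between the most and least affected clusters can be read off with simple arithmetic. The sketch below divides the Cluster A complication rates by the Cluster C rates reported above. These are crude, unadjusted ratios computed here for illustration; they are not figures reported by the study, and Cluster B is left out because its complication rates are not given in this summary.

```python
# Complication rates (%) as reported above for Clusters A and C.
rates = {
    "A": {"maternal": 23, "neonatal": 41},
    "C": {"maternal": 4, "neonatal": 9},
}


def crude_ratio(outcome, group, reference):
    """Unadjusted ratio of the complication rate in `group` to that in `reference`."""
    return rates[group][outcome] / rates[reference][outcome]


for outcome in ("maternal", "neonatal"):
    print(f"{outcome}: A vs C = {crude_ratio(outcome, 'A', 'C'):.1f}x")
```

On these figures, complications were roughly five to six times as frequent in Cluster A as in Cluster C for mothers, and between four and five times as frequent for newborns. The ratios carry no confidence intervals and make no adjustment for gestational age or other confounders.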

Interpretation

The three identified phenotypes suggest that preeclampsia is not one single disorder but rather a spectrum of conditions with different maternal profiles, placental biology, timing of onset, and perinatal outcomes.

Cluster A appears consistent with a more severe, early-onset placental phenotype. The combination of strong angiogenic imbalance, fetal growth restriction, and earlier delivery suggests major placental dysfunction. This group had the highest maternal and neonatal morbidity, reinforcing its high-risk nature.

Cluster B may represent an intermediate phenotype, with later delivery and moderate abnormalities. Its position between the most severe and the mildest cluster suggests it may reflect a mixed clinical pattern or a less extreme form of placental disease.

Cluster C appears to be a later-onset phenotype with comparatively preserved placental function and better fetal growth. Notably, obesity and diabetes were more common here, which suggests that maternal metabolic factors may contribute more strongly in this subgroup than severe early placental dysfunction does.

Clinical Relevance

These findings are important because clinicians often manage preeclampsia according to broad diagnostic criteria, even though patient risk varies widely. A more refined phenotyping approach could eventually help with several aspects of care:

Earlier risk stratification: Identifying women likely to follow a severe early-onset pattern may support closer surveillance and timely referral.

Personalized monitoring: Patients with different phenotypes may benefit from different follow-up schedules, laboratory testing, ultrasound surveillance, and timing of delivery.

Research into mechanism: Separating preeclampsia into biologically meaningful groups may improve future studies on placental dysfunction, cardiovascular risk, and metabolic contributions.

Treatment development: Better phenotyping may help target therapies or prevention strategies to the women most likely to benefit.

Strengths and Limitations

A major strength of this study is its prospective design, which reduces missing data and allows systematic collection of clinical variables. The use of machine learning also enabled the discovery of patterns that might not be obvious through traditional analysis.

However, there are important limitations. The study was conducted at a single tertiary center, which may limit generalizability to other populations and care settings. The clustering methods used are data-driven and depend on the chosen variables, so different datasets or different features might produce somewhat different groupings. In addition, while the identified phenotypes were clinically plausible, they require external validation in independent cohorts before they can be used in routine practice.

It is also important to note that machine learning does not replace clinical judgment. Rather, it can complement existing knowledge by revealing patterns that support more nuanced decision-making.

Conclusion

This prospective cohort study used unsupervised machine learning to identify three clinically meaningful phenotypes of preeclampsia. The clusters differed in gestational age at delivery, angiogenic imbalance, fetal growth, maternal risk factors, and complication rates. The findings support the idea that preeclampsia is a heterogeneous condition and may benefit from future risk stratification based on phenotype.

External validation in larger, diverse populations is needed before these findings can be translated into routine clinical care. Still, this study represents an important step toward more personalized management of preeclampsia.

Reference

Houri O, Youssef L, Crovetto F, Borrell M, Crimella M, Ferrante MG, Novoa RH, Casas I, Encabo N, Benitez L, Larroya M, Peguero A, Meler E, Castro-Barquero S, Bijnens B, Figueras F, Gratacos E, Bernardino G, Crispi F. Phenotyping Preeclampsia Using Unsupervised Machine Learning: A Prospective Cohort Study. BJOG: An International Journal of Obstetrics and Gynaecology. 2026-05-14. PMID: 42136148.

URL: https://pubmed.ncbi.nlm.nih.gov/42136148/
