Multi-ancestry Polygenic Risk Scores Substantially Improve Type 2 Diabetes Risk Prediction Across Diverse Populations

Article Structure

This article is organized around the clinical problem, study design, principal quantitative findings, implications for complication prediction, and the translational issues that determine whether polygenic risk scores can become useful in practice. Because the paper’s central contribution is methodological as well as clinical, the interpretation emphasizes both predictive performance and equity across ancestry groups.

Highlights

Multi-ancestry polygenic risk scores outperformed single-ancestry scores for type 2 diabetes prediction across European, African or African American, Admixed American, South Asian, and East Asian populations.

The strongest single-ancestry performance remained concentrated in European and East Asian datasets, but integrating data across ancestries improved effect sizes and precision in every group studied.

Individuals at or above the 97.5th percentile of the multi-ancestry score had roughly 3-fold to 7-fold higher odds of type 2 diabetes than those in the interquartile range, depending on ancestry.

Beyond diabetes onset, higher polygenic risk was associated with earlier disease onset and with diabetic retinopathy, nephropathy, proliferative retinopathy, and end-stage diabetic nephropathy in several ancestry groups.

Background and Clinical Context

Type 2 diabetes is one of the most important chronic diseases in global medicine. Its burden continues to rise across high-, middle-, and low-income settings, driven by population aging, obesity, sedentary behavior, dietary change, and social determinants of health. Yet conventional clinical risk assessment remains imperfect. Age, body-mass index, family history, blood pressure, lipids, fasting glucose, hemoglobin A1c, and lifestyle variables identify many people at elevated risk, but not all. Some patients present with diabetes relatively early or develop complications despite modest traditional risk profiles.

Polygenic risk scores, which aggregate the small effects of many common genetic variants across the genome, have been proposed as one way to refine risk stratification. In type 2 diabetes, prior studies have shown that polygenic scores can modestly improve prediction beyond clinical factors. The central problem, however, has been portability. Most genome-wide association studies have been heavily weighted toward European populations, so the derived scores tend to perform best in Europeans and less well in populations of African, Admixed American, South Asian, or other ancestries. This is not a trivial statistical issue; it directly affects fairness, generalizability, and clinical usefulness.

The study by Huerta-Chagoya and colleagues addresses that gap by developing and validating multi-ancestry polygenic risk scores for type 2 diabetes using data from five major global ancestry groups. The work is notable not only for its scale, but also for its attempt to move beyond disease onset and test whether genetic risk is associated with microvascular complications and selected comorbidities.

Study Design and Methods

Study Objective

The investigators aimed to construct rigorously tested, publicly available multi-ancestry polygenic risk scores for type 2 diabetes and evaluate their performance across diverse ancestry groups. They also examined whether these scores added predictive information beyond clinical risk factors and whether they were associated with diabetic complications.

Data Sources and Population

The analysis incorporated data from genome-wide association studies across five ancestry groups: European, African or African American, Admixed American, South Asian, and East Asian. In total, the study drew on 409,959 individuals with type 2 diabetes and 1,983,345 controls. Of these, 359,819 cases and 1,825,729 controls contributed to the GWAS dataset, while 10,992 cases and 31,792 controls were included in a training dataset and 39,148 cases and 125,824 controls in a validation dataset.

Validation was conducted in at least four independent cohorts per ancestry, an important design strength because it reduces the risk that performance estimates are idiosyncratic to a single cohort or genotyping platform.

PRS Construction

The authors generated single-ancestry scores using PRS-CS, a Bayesian continuous-shrinkage method that estimates SNP effect sizes from GWAS summary statistics while accounting for linkage disequilibrium. They then constructed multi-ancestry scores using PRS-CSx, which jointly models summary statistics from multiple ancestral groups and leverages shared genetic architecture while preserving ancestry-specific effects. To improve calibration, ancestry-specific linkage disequilibrium reference panels were built to model pairwise SNP correlations during score construction.
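Whatever method estimates the per-variant weights, applying a polygenic score reduces to a weighted sum of allele dosages. The sketch below is not PRS-CS or PRS-CSx; it only illustrates how already-estimated weights would be applied to genotypes, and how population-specific scores could be standardized and linearly combined. The function names, toy dosages, weights, and mixing coefficients are all hypothetical, and the linear-combination step is an assumption about how such scores are typically merged rather than a description of the authors' exact pipeline.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Return one score per individual: sum over variants of dosage * weight."""
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)

def standardize(scores):
    """Center scores to mean 0 and scale them to unit standard deviation."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

def combine_scores(scores_by_panel, mixing_weights):
    """Linear combination of standardized scores, one per discovery panel."""
    return sum(mixing_weights[name] * standardize(s)
               for name, s in scores_by_panel.items())

# Toy data: 3 individuals x 3 variants, dosages coded 0/1/2 (hypothetical).
dosages = np.array([[0, 1, 2],
                    [2, 1, 0],
                    [1, 1, 1]])
weights_eur = np.array([0.10, -0.05, 0.20])  # hypothetical per-variant weights
weights_eas = np.array([0.08, -0.02, 0.25])  # hypothetical per-variant weights

raw_eur = polygenic_score(dosages, weights_eur)
raw_eas = polygenic_score(dosages, weights_eas)

# Hypothetical mixing weights; in practice these would be tuned in a training set.
combined = combine_scores({"EUR": raw_eur, "EAS": raw_eas},
                          {"EUR": 0.6, "EAS": 0.4})
```

In a real analysis the mixing weights would be learned in a training dataset and then frozen before validation, which is the role the separate training and validation sets play in this study design.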

Endpoints

The primary endpoint was the association between the polygenic risk score and prevalent type 2 diabetes, summarized as the odds ratio per standard deviation increase in the score. The authors also assessed risk in high-score groups, defined by the 90th, 95th, and 97.5th percentile cutoffs, compared with individuals in the interquartile range.
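To make the tail-versus-interquartile comparison concrete, the following is a minimal sketch of an unadjusted version of that contrast. It assumes the high-score group is everyone at or above the chosen percentile and computes a crude odds ratio from counts; the published estimates presumably come from covariate-adjusted regression models, so this illustrates the contrast rather than reproducing the authors' analysis. The function name and the simulated cohort are hypothetical.

```python
import numpy as np

def tail_vs_iqr_odds_ratio(scores, is_case, tail_percentile=97.5):
    """Crude odds ratio: odds of disease at or above the tail cutoff
    divided by odds of disease within the interquartile range."""
    scores = np.asarray(scores, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    q25, q75, cut = np.percentile(scores, [25, 75, tail_percentile])
    tail = scores >= cut
    middle = (scores >= q25) & (scores <= q75)

    def odds(mask):
        cases = np.count_nonzero(is_case & mask)
        controls = np.count_nonzero(~is_case & mask)
        return cases / controls

    return odds(tail) / odds(middle)

# Simulated cohort (hypothetical): standardized scores and a true per-SD
# odds ratio of 2, with case status drawn from a logistic model.
rng = np.random.default_rng(0)
sim_scores = rng.standard_normal(100_000)
sim_prob = 1.0 / (1.0 + np.exp(-(-2.0 + np.log(2.0) * sim_scores)))
sim_case = rng.random(sim_scores.size) < sim_prob

sim_or = tail_vs_iqr_odds_ratio(sim_scores, sim_case)
```

Because the reference group is the middle half of the distribution rather than everyone below the cutoff, this contrast is less sensitive to what happens in the low-risk tail.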

Secondary analyses evaluated incident diabetes, incremental predictive value beyond clinical factors, age at onset, and diabetic complications. Complications included diabetic retinopathy, diabetic nephropathy, proliferative diabetic retinopathy, and end-stage diabetic nephropathy. The study also examined selected comorbidity associations, including coronary artery disease.

Key Findings

Single-ancestry Scores Performed Unevenly Across Populations

As expected, single-ancestry PRSs showed their best incremental discrimination in populations with the largest and best-powered GWAS datasets. The incremental area under the receiver operating characteristic curve (AUC) ranged from 0.07 to 0.14 in European populations and from 0.02 to 0.16 in East Asian populations. Performance was weaker in African or African American populations, with incremental AUCs of 0.02 to 0.03, and similarly modest in Admixed American and South Asian groups, both ranging from 0.02 to 0.04.
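For readers less familiar with the metric, the sketch below shows one common way to compute an incremental AUC, assuming it is defined as the AUC of a model that includes the PRS minus the AUC of the same model without it. The summary here does not spell out the authors' exact definition, and the predicted risks below are made-up numbers for six hypothetical people.

```python
import numpy as np

def auc(risk, is_case):
    """AUC as the probability that a random case has a higher predicted risk
    than a random control, counting ties as one half. Uses all case-control
    pairs, so it is only suitable for small illustrative inputs."""
    risk = np.asarray(risk, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    diff = risk[is_case][:, None] - risk[~is_case][None, :]
    wins = np.count_nonzero(diff > 0) + 0.5 * np.count_nonzero(diff == 0)
    return wins / diff.size

# Hypothetical predicted risks for three cases followed by three controls.
is_case = np.array([1, 1, 1, 0, 0, 0], dtype=bool)
risk_clinical = np.array([0.6, 0.4, 0.3, 0.5, 0.2, 0.1])  # without PRS
risk_with_prs = np.array([0.7, 0.6, 0.4, 0.5, 0.2, 0.1])  # with PRS

incremental_auc = auc(risk_with_prs, is_case) - auc(risk_clinical, is_case)
```

In this toy case the baseline model ranks 7 of 9 case-control pairs correctly and the PRS model ranks 8 of 9, so the incremental AUC is 1/9, or about 0.11.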

This pattern is clinically important because it reinforces a recurring lesson in genomic medicine: predictive performance closely tracks representation in discovery datasets. In other words, when some populations are underrepresented in GWAS, they often receive less accurate downstream tools.

Multi-ancestry PRSs Improved Prediction in Every Ancestry Group

The central result of the paper is that multi-ancestry PRSs outperformed single-ancestry scores across all populations studied. Compared with single-ancestry models, the multi-ancestry scores showed larger effect sizes and narrower confidence intervals, suggesting both stronger association and greater precision.

The odds ratio per standard deviation increase in the multi-ancestry PRS was 1.73 (95% CI 1.67-1.80) in African or African American individuals, 2.82 (2.67-2.97) in Admixed American individuals, 2.45 (2.36-2.54) in East Asian individuals, 2.36 (2.32-2.41) in European individuals, and 2.23 (2.05-2.42) in South Asian individuals.

These are substantial effect sizes for a common polygenic disease. They do not mean that genetics determines destiny, but they do indicate that inherited risk captured by common variants can stratify populations meaningfully, despite long-standing concerns about ancestry portability.
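A per-standard-deviation odds ratio can be hard to picture. Under the usual logistic-model assumption that log-odds are linear in the score, the odds ratio for a shift of k standard deviations is the per-SD odds ratio raised to the power k. The sketch below applies that rule to the point estimates reported above; the function name is hypothetical, and the linearity assumption may not hold at the extremes of the distribution.

```python
def odds_ratio_for_shift(or_per_sd, n_sd):
    """Odds ratio implied by a shift of n_sd standard deviations,
    assuming log-odds are linear in the score."""
    return or_per_sd ** n_sd

# Point estimates per standard deviation, as reported in the text above.
reported_or_per_sd = {
    "African or African American": 1.73,
    "Admixed American": 2.82,
    "East Asian": 2.45,
    "European": 2.36,
    "South Asian": 2.23,
}

# Implied odds ratio for two people whose scores differ by 2 SD.
implied_two_sd = {group: round(odds_ratio_for_shift(value, 2), 2)
                  for group, value in reported_or_per_sd.items()}
```

For the European estimate, for example, a 2 SD difference corresponds to an implied odds ratio of about 5.57 under this assumption.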

Risk at the Extremes of the Distribution Was Markedly Elevated

When the authors examined people at the high end of the score distribution, the signal became even more clinically tangible. Individuals at or above the 97.5th percentile had roughly 3-fold to 7-fold higher odds of type 2 diabetes compared with those in the interquartile range. The corresponding odds ratios were 3.43 (95% CI 2.80-4.21) for African or African American individuals, 7.47 (5.64-9.89) for Admixed American individuals, 6.62 (5.58-7.85) for East Asian individuals, 6.25 (5.72-6.82) for European individuals, and 4.50 (2.70-7.53) for South Asian individuals.

For clinicians, percentile-based framing is often easier to interpret than per-standard-deviation effects. These numbers suggest that the score may be most actionable when used to identify a subset of individuals at the extreme tail of inherited susceptibility, particularly if paired with modifiable clinical risk factors.

Prediction Extended Beyond Baseline Clinical Models

The authors report that these PRSs provided additional predictive value beyond clinical factors. The abstract does not provide a full breakdown of all integrated model metrics, but the stated conclusion is that the score improved diabetes incidence prediction whether considered alone or alongside conventional risk variables. This is the minimum requirement for translational relevance: a genomic tool should not merely correlate with disease; it should add information that existing clinical models do not already capture.

At the same time, it is worth remembering that “incremental value” may be statistically significant without always being operationally transformative. Whether a change in discrimination or reclassification is large enough to alter screening intervals, prevention thresholds, or treatment intensity will need prospective evaluation.

Association With Earlier Onset and Microvascular Disease

An especially interesting aspect of the study is the extension from disease susceptibility to disease course. Higher multi-ancestry PRS was associated with earlier onset of type 2 diabetes, consistent with the idea that genetic burden may accelerate progression to clinical disease under environmental pressure.

Among individuals who already had diabetes, the score was also associated with microvascular complications and, more selectively, with cardiovascular comorbidity. In populations of African or African American, Admixed American, and European ancestry, the PRS was significantly associated with diabetic retinopathy, with odds ratios per standard deviation ranging from 1.28 to 1.57; diabetic nephropathy, with odds ratios from 1.25 to 1.58; proliferative diabetic retinopathy, with odds ratios from 1.39 to 2.08; and end-stage diabetic nephropathy, with odds ratios from 1.44 to 1.87.

These findings matter because they suggest the score is not simply tagging glycemic liability in a narrow sense. It may capture broader inherited susceptibility related to duration of hyperglycemia, β-cell biology, insulin resistance, tissue vulnerability, or vascular injury pathways. That said, complication associations can also reflect mediation through earlier onset or longer disease duration rather than an independent genetic pathway to complications per se.

For coronary artery disease, the association was significant only in the Admixed American ancestry group, with an odds ratio of 1.16 (95% CI 1.08-1.25). This more limited finding is plausible because macrovascular disease is influenced by a wider range of shared risk factors, including lipids, blood pressure, smoking, kidney function, and inflammation. A diabetes-specific PRS would not necessarily be expected to predict coronary disease consistently across ancestries unless it overlaps strongly with those pathways.

Clinical Interpretation

Why This Study Is Important

This paper addresses one of the biggest barriers to equitable genomic medicine: the poor transferability of polygenic scores across ancestry groups. By combining data from multiple ancestral populations and using methods specifically designed to leverage cross-population information, the authors demonstrate that performance can be improved substantially without restricting utility to one dominant reference group.

For diabetes prevention, this raises several possible use cases. A validated multi-ancestry PRS could help identify younger adults at high lifetime risk before dysglycemia develops, particularly those whose body-mass index or fasting glucose may not yet trigger intensive intervention. It might also help prioritize earlier screening or more frequent follow-up in people with family history, gestational diabetes history, or borderline metabolic abnormalities. In people with established diabetes, the score may eventually contribute to complication surveillance strategies, though that application is less mature.

What the Study Does Not Yet Prove

Despite its strengths, the study does not establish clinical utility in the strict implementation sense. It shows improved risk prediction and association with complications, but it does not show that using the score changes clinician behavior, improves uptake of preventive therapy, reduces incident diabetes, or narrows disparities. Those questions require prospective clinical trials or pragmatic implementation studies.

The results also need to be interpreted alongside absolute risk. A person’s polygenic score may confer a high relative risk, but treatment decisions depend on baseline risk, competing risks, age, obesity, glucose status, socioeconomic context, and access to preventive care. In other words, a high PRS should probably be viewed as a risk enhancer rather than a stand-alone diagnostic or treatment trigger.
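The dependence on baseline risk can be shown with simple arithmetic. The sketch below converts a baseline absolute risk and an odds ratio into the implied absolute risk. The baseline risks of 5% and 20% are hypothetical, and the odds ratio of 6.25 is borrowed from the European tail-versus-interquartile estimate purely for illustration, treating the baseline as the risk in the reference group.

```python
def absolute_risk(baseline_risk, odds_ratio):
    """Absolute risk implied by applying an odds ratio to a baseline risk."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    odds = baseline_odds * odds_ratio
    return odds / (1.0 + odds)

# Same odds ratio, two hypothetical baseline risks.
risk_low_baseline = absolute_risk(0.05, 6.25)   # about 0.25
risk_high_baseline = absolute_risk(0.20, 6.25)  # about 0.61
```

The same relative effect moves a person with a 5% baseline risk to roughly 25%, but a person with a 20% baseline risk to roughly 61%, which is why a high score is better read alongside age, adiposity, and glycemic status than in isolation.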

Methodological Strengths

The study’s principal strengths include its very large sample size, inclusion of five major ancestry groups, use of independent validation cohorts, modern Bayesian PRS construction methods, ancestry-specific linkage disequilibrium modeling, and extension to clinically meaningful secondary outcomes. Public availability of the validated scores is another major advantage, since it supports reproducibility and independent benchmarking.

Limitations and Cautions

Several limitations remain. First, performance likely still reflects unequal representation in the underlying GWAS literature, even if the gap has narrowed. Second, ancestry categories are broad and socially as well as genetically heterogeneous; labels such as African or African American and Admixed American can mask substantial internal diversity. Third, the abstract does not provide full calibration results, decision-curve analyses, or detailed net reclassification measures, all of which are relevant for clinical implementation.

Fourth, complication analyses may be influenced by disease duration, treatment exposure, survival bias, and ascertainment differences across cohorts. Fifth, polygenic scores do not capture rare high-impact variants, epigenetic influences, environmental exposures, or structural inequities that shape diabetes risk. Finally, operational issues remain unresolved, including how to report scores to patients, how to combine them with clinical calculators, how to avoid misuse in insurance or employment contexts, and how to ensure equitable access to testing.

Relation to the Broader Evidence Base

The study aligns with a growing body of literature showing that polygenic scores can improve risk stratification for common cardiometabolic disease, while also highlighting that ancestry diversity in discovery datasets is essential for fair performance. Current diabetes prevention guidelines still rely mainly on phenotypic risk factors and glycemic measures rather than genomics. That is appropriate, because preventive recommendations such as lifestyle intervention, weight loss, and treatment of obesity are already indicated for many high-risk individuals without the need for genetic testing.

However, genomics may become useful in selected contexts: earlier-life risk estimation, refinement of screening frequency, enrichment of prevention trials, and identification of individuals who might benefit from more intensive counseling before metabolic deterioration becomes evident. The public availability of a better-performing multi-ancestry PRS makes those next-step studies more feasible.

Practical Takeaways for Clinicians and Health Systems

First, this study provides strong evidence that multi-ancestry PRSs are preferable to single-ancestry scores when the goal is broad clinical applicability.

Second, the score appears most compelling as a stratification tool, especially at the upper percentiles where relative risk is highest.

Third, the association with retinopathy and nephropathy suggests that inherited diabetes susceptibility may carry information about downstream disease burden, although complication-directed use is still exploratory.

Fourth, implementation should be cautious and integrated. The most sensible near-term model is likely a combined framework in which polygenic risk complements age, adiposity, family history, glucose metrics, blood pressure, and social risk factors rather than replacing them.

Conclusion

Huerta-Chagoya and colleagues present one of the clearest demonstrations to date that polygenic risk prediction for type 2 diabetes can be improved across diverse ancestries through multi-ancestry modeling. The reported effect sizes are clinically meaningful, the high-risk tails of the distribution identify individuals with markedly elevated odds of disease, and the observed links with earlier onset and microvascular complications broaden the relevance of the work.

The study does not by itself justify routine clinical deployment, but it substantially advances the field toward that goal. Its most important contribution may be conceptual as much as technical: equitable genomic medicine for common disease is achievable only when prediction tools are built from, and validated in, the diversity of the populations they are meant to serve.

Funding and Registration

Funding: The National Human Genome Research Institute of the US National Institutes of Health.

ClinicalTrials.gov: Not applicable based on the published abstract.

References

Huerta-Chagoya A, Kim J, Mandla R, Lu Y, Suzuki K, Petty LE, Ng HK, Choi J, Lee S, Rout M, Lin K, Taylor K, ENSA Genomics Consortium, Genes & Health Research Team, VA Million Veteran Program, Aguilar-Salinas CA, García-García L, González-Villalpando C, Haiman CA, Kim YJ, Kwak SH, Leong A, Loos RJF, Moreno-Estrada A, Morris AP, Orozco L, Rotter JI, Sanghera D, Tusie-Luna T, Voight BF, Vujkovic M, Walters RG, Ge T, Manning AK, Loh M, Below JE, Sim X, Mercader JM, Ng MCY, D-PRISM Consortium. Multi-ancestry polygenic risk scores for the prediction of type 2 diabetes and complications in diverse ancestries. The Lancet Diabetes & Endocrinology. 2026-04-27. PMID: 42061389.

American Diabetes Association Professional Practice Committee. Standards of Care in Diabetes—2025. Diabetes Care. 2025;48(Suppl 1).

Khera AV, Chaffin M, Aragam KG, Haas ME, Roselli C, Choi SH, Natarajan P, Lander ES, Lubitz SA, Ellinor PT, Kathiresan S. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nature Genetics. 2018;50(9):1219-1224.

Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nature Genetics. 2019;51(4):584-591.

Wand H, Lambert SA, Tamburro C, Iacocca MA, O’Sullivan JW, Sillari C, Kullo IJ, Rowley R, Dron JS, Brockman D, Wang X, Chan A, Slob EAW, Gandin I, Le Pape S, Foote S, Maturana CJ, Lewis ACF, Vassy JL, Hsu L, Torkamani A, Chatterjee N, de Denus S, McCarthy MI, Loos RJF, Inouye M, Läll K, Murray MF, Abraham G, Thanassoulis G, Hindorff LA, Ritchie MD, Chasman DI, Tada H, Martin AR, Lewis CM, Musunuru K, Polygenic Score Catalog Consortium. Improving reporting standards for polygenic scores in risk prediction studies. Nature. 2021;591(7849):211-219.