Highlights
- Most FDA-authorized AI/ML devices for Alzheimer’s disease and related dementias (ADRD) lack transparent demographic data on training and validation cohorts.
- The majority of devices focus on volumetric brain imaging; all are intended for clinician use only.
- Incomplete demographic reporting may perpetuate algorithmic bias and limit equitable application in underrepresented populations.
- Recent FDA guidance moves toward mandating demographic data disclosure or justification, aiming to reduce bias and enhance clinical utility.
Background
Alzheimer’s disease (AD) and related dementias (ADRD) represent a significant and growing public health challenge, with an estimated 6.7 million Americans living with AD in 2024 and projections of substantial increases as the population ages. Early diagnosis and disease monitoring are critical, as timely intervention can improve quality of life and optimize care planning. Artificial intelligence (AI) and machine learning (ML)–enabled medical devices have shown promise in enhancing diagnostic accuracy, automating volumetric analysis, and enabling continuous symptom monitoring for patients with ADRD.
However, the real-world utility of these devices depends on their performance across diverse patient groups. Demographic diversity in the datasets used to train and validate AI/ML algorithms is crucial to avoid algorithmic bias—systematic underperformance in underrepresented populations—which can exacerbate health disparities. Regulatory agencies, including the US Food and Drug Administration (FDA), have recognized the need for transparency in demographic data to ensure AI/ML devices deliver equitable care.
Study Overview and Methodological Design
Chen et al. (JAMA, 2025) conducted a cross-sectional review of FDA-authorized AI/ML-based devices for ADRD between January 2015 and December 2024. Devices were identified via the FDA’s AI/ML-Enabled Medical Device List, 510(k) and De Novo databases, and supplementary online searches. For devices with multiple authorizations, the first (or, in one case, the second for comparator relevance) was selected for analysis.
The investigators reviewed FDA approval summaries, peer-reviewed publications, and company websites to extract details about device training and validation datasets, focusing on study design and demographic composition: disease status, age, sex, race, and ethnicity. Descriptive statistics summarized the findings. Importantly, the study followed STROBE reporting guidelines and was exempt from IRB review, as it did not involve human subjects.
Key Findings
A total of 24 FDA-authorized AI/ML-based devices for ADRD were identified:
- 22 (91.7%) were cleared via the 510(k) pathway; 2 (8.3%) via De Novo classification.
- 21 (87.5%) were reviewed by radiology panels; 3 (12.5%) by neurology panels.
- All were intended for clinician use; none were direct-to-patient.
- Indications: 20 (83.3%) focused on volumetric quantification of brain structures (e.g., hippocampal volume), 3 (12.5%) on functional cognitive testing, and 2 (8.3%) on amyloid quantification.
Demographic transparency was notably limited:
- Disease status, age, and sex were reported for fewer than half of devices.
- Race and ethnicity data were rarely disclosed.
- This lack of transparency impedes evaluation of real-world generalizability and accuracy, particularly for minoritized or historically underrepresented patient populations.
Table 1. Characteristics of FDA-authorized AI/ML-based devices for ADRD (N = 24)

| Characteristic | Devices, No. (%) |
| --- | --- |
| Clearance year | |
| 2015-2018 | 7 (29.2) |
| 2019-2021 | 7 (29.2) |
| 2022-2024 | 10 (41.7) |
| Time to clearance or approval, median, d | 228 |
| Devices with multiple clearances or approvals | 6 (25.0) |
| Clearances or approvals for such devices, median | 3 |
| Approval pathway | |
| 510(k) | 22 (91.7) |
| De Novo | 2 (8.3) |
| Review panel | |
| Radiology | 21 (87.5) |
| Neurology | 3 (12.5) |
| Indications^a | |
| Volumetric quantification of brain structures using MRI | 20 (83.3) |
| Functional testing for memory, learning, or visuospatial awareness | 3 (12.5) |
| Quantification of amyloid deposition using PET | 2 (8.3) |
For 12 devices, FDA summaries contained no information on training or validation datasets; the same was true of peer-reviewed articles for 12 devices. Training data were reported for 10 devices (41.7%) in FDA summaries and 5 (20.8%) in peer-reviewed articles; of these, disease status was reported for 3 and 5 devices, age for 2 and 4, sex for 4 and 5, and race and ethnicity for 0 and 1, respectively (Table 2). Validation data were reported for 2 devices (8.3%) in FDA summaries and 10 (41.7%) in peer-reviewed articles; of these, disease status was reported for 1 and 9 devices, age for 0 and 8, sex for 1 and 7, and race and ethnicity for 1 and 0, respectively. For the 23 devices with incomplete reporting across these domains (disease status, age, sex, race and ethnicity), no justification was provided.
Table 2. Study design and data quality of training and validation datasets, by source^a

Study design:

| | FDA summaries: training/test^b (n = 10) | FDA summaries: external validation^c (n = 2) | Peer-reviewed articles: training/test^b (n = 7) | Peer-reviewed articles: external validation^c (n = 16) |
| --- | --- | --- | --- | --- |
| Devices represented, No. | 10 | 2 | 5 | 10 |
| Sample size, mean (range)^d | 825.5 (101.0-3729.0) | 299.5 (198.0-401.0) | 992.4 (60.0-3959.0) | 231.5 (40.0-820.0) |
| Study site(s): single center, No. (%) | 0 | 0 | 1 (14.3) | 4 (25.0) |
| Study site(s): multicenter, No. (%) | 5 (50) | 2 (100) | 6 (85.7) | 12 (75.0) |
| Study site(s): unknown, No. (%) | 5 (50) | 0 | 0 | 0 |
| Recruitment: retrospective, No. (%) | 5 (50) | 1 (50) | 7 (100) | 14 (87.5) |
| Recruitment: prospective, No. (%) | 0 | 1 (50) | 0 | 2 (12.5) |
| Recruitment: unknown, No. (%) | 5 (50) | 0 | 0 | 0 |

Data quality (T/T = training/test; EV = external validation; Dev = devices reporting, No.; Art = articles reporting, No.; Prop = aggregated proportion^e):

| Characteristic | FDA T/T: Dev | FDA T/T: Prop | FDA EV: Dev | FDA EV: Prop | Articles T/T: Dev | Articles T/T: Art | Articles T/T: Prop | Articles EV: Dev | Articles EV: Art | Articles EV: Prop |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Patient disease status | | | | | | | | | | |
| Healthy controls, mean | 3 | 29.1 | 1 | 0 | 5 | 6 | 37.4 | 9 | 15 | 31.8 |
| Individuals with ADRD, mean | 3 | 40.2 | 1 | 100 | 5 | 6 | 39.4 | 9 | 15 | 65.8 |
| Individuals with pathology not including ADRD, mean | 3 | 30.7 | 1 | 0 | 5 | 6 | 10.6 | 9 | 15 | 2.5 |
| Patient demographics | | | | | | | | | | |
| Age, mean, y | 2 | 56.2 | 0 | NA | 4 | 6 | 64.5 | 8 | 14 | 71.7 |
| Sex: female | 4 | 53.7 | 1 | 44.9 | 5 | 6 | 51.7 | 7 | 13 | 52.5 |
| Sex: male | 4 | 46.3 | 1 | 55.1 | 5 | 6 | 48.3 | 7 | 13 | 47.5 |
| Education, mean, y | 0 | NA | 0 | NA | 1 | 1 | 10.1 | 4 | 5 | 13.7 |
| Race and ethnicity^f | | | | | | | | | | |
| Asian | 0 | NA | 1 | 3.5 | 1 | 1 | 60.6 | 0 | 0 | NA |
| Black or African American | 0 | NA | 1 | 3.5 | 1 | 1 | 0 | 0 | 0 | NA |
| Hispanic or Latinx | 0 | NA | 1 | 0 | 1 | 1 | 0 | 0 | 0 | NA |
| White or Caucasian | 0 | NA | 1 | 89.9 | 1 | 1 | 39.4 | 0 | 0 | NA |
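The reporting rates cited in the findings are simple proportions of the 24 authorized devices. As a minimal illustrative sketch (the variable names and structure here are ours, not the study authors' analysis code), the headline percentages can be reproduced as:

```python
# Illustrative sketch: recomputing the reporting rates cited above from the
# device counts in the study (N = 24 authorized devices). The dictionary
# layout and names are hypothetical framing, not from the original analysis.
N_DEVICES = 24

# Devices whose training data were described, by source.
training_reported = {"FDA summaries": 10, "Peer-reviewed articles": 5}
# Devices whose external validation data were described, by source.
validation_reported = {"FDA summaries": 2, "Peer-reviewed articles": 10}

def pct(count: int, total: int = N_DEVICES) -> float:
    """Percentage of the 24 devices, rounded to one decimal place."""
    return round(100 * count / total, 1)

for label, counts in [("Training", training_reported),
                      ("Validation", validation_reported)]:
    for source, n in counts.items():
        print(f"{label} data in {source.lower()}: {n}/{N_DEVICES} ({pct(n)}%)")
```

This reproduces the figures quoted in the text, e.g. training data in FDA summaries for 10/24 devices (41.7%) and in peer-reviewed articles for 5/24 (20.8%).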
Clinical Implications
The limited demographic transparency observed raises concerns about the equitable deployment of AI/ML devices in routine care. For example, if an AI-based volumetric MRI tool is trained predominantly on data from older, White adults, its accuracy in younger, non-White, or female patients may be untested or suboptimal. This could lead to underdiagnosis or misdiagnosis, especially in populations already facing healthcare inequities.
For clinicians, this highlights the need for critical appraisal of device validation studies before integrating AI/ML tools into diagnostic workflows. Healthcare systems and payers should be cautious about widespread adoption without assurances of demographic representativeness and real-world performance.
Limitations and Controversies
The study by Chen et al. is constrained by incomplete and inconsistent data reporting within FDA device summaries and the scientific literature. Some devices may lack publicly available validation data altogether, and devices not included in the FDA’s AI/ML device list or missed in literature searches could be omitted. Furthermore, because manufacturers are not uniformly required to disclose demographic data, the true representativeness of training datasets remains largely unknown.
This opacity perpetuates controversy in the field: while AI/ML tools hold promise for improving ADRD care, their unchecked deployment could exacerbate disparities. The FDA’s evolving guidance may help, but enforcement and standardization remain works in progress.
Expert Commentary or Guideline Positioning
Leading professional societies, such as the American Academy of Neurology and the Alzheimer’s Association, increasingly call for transparency in algorithm development and validation, including public reporting of demographic data. The FDA’s 2021 guiding principles and the 2025 draft guidance represent important steps toward regulatory oversight, but until demographic representativeness is required—and validated—concerns about bias and inequity will persist.
Conclusion
AI/ML-enabled devices for Alzheimer’s disease and related dementias are proliferating, with potential to transform early diagnosis and disease management. However, the lack of demographic transparency in supporting datasets challenges their equitable and effective use. Improved regulatory requirements for demographic reporting, combined with ongoing independent validation and post-market surveillance, are needed to ensure these technologies benefit all patient populations equitably.
References
1. Chen KY, Ross JS, Cohen AB, Karlawish J, Oh ES, Gupta R. Demographic Data Supporting FDA Authorization of AI Devices for Alzheimer Disease and Related Dementias. JAMA. 2025 Jul 30. doi:10.1001/jama.2025.12779
2. U.S. Food and Drug Administration. Artificial Intelligence and Machine Learning in Software as a Medical Device. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device
3. Parikh RB, Obermeyer Z, Navathe AS. Regulation of predictive analytics in medicine. Science. 2019;363(6429):810-812. doi:10.1126/science.aaw0029
4. U.S. Food and Drug Administration. Good Machine Learning Practice for Medical Device Development: Guiding Principles. 2021.
5. U.S. Food and Drug Administration. Draft Guidance: Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence/Machine Learning-Enabled Device Software Functions. 2025.