Novel Scalar-on-matrix Regression for Unbalanced Feature Matrices.
Stat Biosci 2025 Mar 05. [Online ahead of print]

Abstract

Image features that characterize tubules from digitized kidney biopsies may offer insight into disease prognosis as novel biomarkers. For each subject, we can construct a matrix whose entries are a common set of image features (e.g., area, orientation, eccentricity) measured for each tubule from that subject's biopsy. Existing scalar-on-matrix regression approaches, which predict scalar outcomes from image feature matrices, cannot handle varying numbers of tubules across subjects. We propose the CLUstering Structured laSSO (CLUSSO), a novel scalar-on-matrix regression technique that allows for unbalanced numbers of tubules, to predict scalar outcomes from the image feature matrices. By classifying each tubule into one of two clusters, CLUSSO averages and weights tubular feature values within-subject and within-cluster to create balanced feature matrices that can then be used with structured lasso regression. We develop theoretical large-tubule-sample properties for the error bounds of the feature coefficient estimates. Simulation study results indicate that, relative to a naive method that averages feature values across all tubules, CLUSSO often achieves a lower false positive rate and a higher true positive rate for identifying the image features that truly affect outcomes. Additionally, we find that CLUSSO has lower bias and predicts outcomes with accuracy competitive with the naive approach. Finally, we applied CLUSSO to tubular image features from kidney biopsies of glomerular disease subjects from the Nephrotic Syndrome Study Network (NEPTUNE) to predict kidney function and used subjects from the Cure Glomerulonephropathy (CureGN) study as an external validation set.
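
The balancing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the clustering algorithm (plain 2-means on tubules pooled across subjects), the weight (the share of a subject's tubules falling in each cluster), the zero fill for an empty cluster, and all function and variable names are assumptions made for illustration. The structured lasso fit that would follow is omitted.

```python
from statistics import fmean


def two_means(rows, n_iter=20):
    """Plain 2-means (Lloyd's algorithm); an assumed stand-in for the clustering step.

    Deterministic start: the first row and the row farthest from it.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [list(rows[0]), list(max(rows, key=lambda r: dist2(r, rows[0])))]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assign each tubule to its nearest center.
        labels = [0 if dist2(r, centers[0]) <= dist2(r, centers[1]) else 1
                  for r in rows]
        # Recompute each center as the mean of its members.
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = [fmean(col) for col in zip(*members)]
    return labels


def balanced_matrices(subjects):
    """Map each subject's (n_tubules x p) feature list to a 2 x p matrix.

    Row k holds the subject's within-cluster feature means, scaled by the
    assumed weight n_k / n (share of that subject's tubules in cluster k).
    """
    ids = list(subjects)
    pooled = [row for s in ids for row in subjects[s]]
    labels = two_means(pooled)
    p = len(pooled[0])
    out, pos = {}, 0
    for s in ids:
        n = len(subjects[s])
        subj_labels = labels[pos:pos + n]
        pos += n
        matrix = []
        for k in (0, 1):
            members = [r for r, lab in zip(subjects[s], subj_labels) if lab == k]
            weight = len(members) / n
            # Assumption: a cluster with no tubules contributes a zero row.
            means = [fmean(col) for col in zip(*members)] if members else [0.0] * p
            matrix.append([weight * m for m in means])
        out[s] = matrix
    return out


# Hypothetical data: two subjects with 3 and 5 tubules, two features each.
subjects = {
    "A": [[1.0, 0.1], [1.2, 0.2], [5.0, 0.9]],
    "B": [[0.9, 0.15], [5.2, 0.8], [4.8, 1.0], [5.1, 0.95], [1.1, 0.1]],
}
mats = balanced_matrices(subjects)
```

Whatever the number of tubules per subject, every entry of `mats` has the same 2 x p shape, which is what allows a regression method designed for balanced matrices to be applied afterwards.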

Authors

Rubin J (ORCID 0000-0002-8288-6022). Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, 210 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104, USA.
Fan F. Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA, USA.
Barisoni L (ORCID 0000-0003-0848-9683). Division of AI and Computational Pathology, Department of Pathology, Duke University, Durham, NC, USA. Division of Nephrology, Department of Medicine, Duke University, Durham, NC, USA.
Janowczyk AR. Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA, USA. Oncology and Pathology Departments, Geneva University Hospitals, Geneva, Switzerland.
Zee J (ORCID 0000-0003-0586-1160). Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, 210 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104, USA. The Children's Hospital of Philadelphia Research Institute, Philadelphia, PA, USA.

Pub Type(s)

Journal Article

Language

eng

PubMed ID

40995419