This technology improves how multiple clustering results are combined into a single, more accurate outcome using a soft correspondence approach. It captures relationships between clusters more effectively than traditional methods, resulting in more robust, stable, and interpretable clustering. The framework enhances performance across diverse datasets, including distributed and privacy-constrained environments.
Background: Combining multiple clustering results into a single consensus output is challenging due to the lack of explicit correspondence between clusters. Traditional approaches often assume direct one-to-one mappings or rely on indirect transformations that reduce interpretability. These methods can also introduce high computational cost and limited robustness when dealing with heterogeneous or distributed datasets. As clustering is increasingly applied in large-scale and privacy-sensitive environments, there is a need for more flexible and efficient techniques that can accurately align and integrate multiple clustering outputs without restrictive assumptions.
Technology Overview: The invention introduces a soft correspondence framework that models relationships between clusters using weighted correspondence matrices rather than one-to-one mappings. It poses consensus clustering as an optimization problem: minimize the distance between each input membership matrix, transformed by its correspondence matrix, and a target (consensus) membership matrix. An iterative algorithm with multiplicative update rules jointly computes the consensus clustering and the correspondence matrices. This approach enables more accurate alignment across clusterings while accommodating differing numbers of clusters and incomplete labeling. By directly modeling inter-cluster relationships, the framework improves robustness and interpretability while maintaining computational efficiency across diverse datasets.
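To make the idea of a correspondence matrix concrete, the following is a minimal illustrative sketch in Python with NumPy, not the patented algorithm. It makes several simplifying assumptions: the target membership matrix is fixed to one of the input clusterings instead of being optimized jointly, the update is the standard multiplicative rule for nonnegative least squares, and a label of -1 is used here to mark a missing label. The function names `membership`, `soft_correspondence`, and `consensus` are hypothetical and chosen for illustration only.

```python
import numpy as np

def membership(labels, k):
    """Build an n x k one-hot membership matrix.
    Assumption: a label of -1 means 'missing' and yields an all-zero row."""
    labels = np.asarray(labels)
    A = np.zeros((labels.size, k))
    known = labels >= 0
    A[np.flatnonzero(known), labels[known]] = 1.0
    return A

def soft_correspondence(A_src, A_tgt, n_iter=50, eps=1e-12):
    """Find a nonnegative matrix S (k_src x k_tgt) that approximately
    minimizes ||A_tgt - A_src @ S||_F^2 using multiplicative updates."""
    k_src, k_tgt = A_src.shape[1], A_tgt.shape[1]
    S = np.full((k_src, k_tgt), 1.0 / k_tgt)   # strictly positive start
    num = A_src.T @ A_tgt                      # constant numerator
    gram = A_src.T @ A_src
    for _ in range(n_iter):
        S *= num / (gram @ S + eps)            # update keeps S nonnegative
    return S

def consensus(label_sets, ks, ref=0):
    """Simplified consensus: align every clustering to a fixed reference
    clustering (index `ref`), average the aligned soft memberships, and
    assign each point to its highest-weight cluster."""
    mats = [membership(l, k) for l, k in zip(label_sets, ks)]
    target = mats[ref]
    aligned = [A @ soft_correspondence(A, target) for A in mats]
    soft = np.mean(aligned, axis=0)
    return soft.argmax(axis=1), soft

# Toy input: three clusterings of six points with inconsistent label names.
a = [0, 0, 1, 1, 2, 2]
b = [1, 1, 2, 2, 0, 0]   # same partition as `a`, labels permuted
c = [0, 0, 1, 1, 2, 1]   # disagrees with `a` on the last point
labels, soft = consensus([a, b, c], ks=[3, 3, 3])
```

On this toy input the correspondence matrix between `b` and `a` is a permutation matrix, while the one between `c` and `a` has fractional weights, which is the sense in which the correspondence is "soft". The resulting consensus labels match clustering `a` (0, 0, 1, 1, 2, 2), since two of the three inputs agree on every point.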
Advantages:
• Directly addresses the correspondence problem without restrictive assumptions
• Improves clustering robustness and stability across diverse inputs
• Reduces computational complexity compared to co-association methods
• Provides interpretable correspondence matrices between clusterings
• Handles missing labels effectively without data loss
• Supports distributed and privacy-preserving data scenarios
• Scales efficiently to large datasets
Applications:
• Data mining and knowledge discovery
• Distributed and federated learning systems
• Privacy-preserving analytics
• Bioinformatics and genomic data clustering
• Document and web content clustering
• Image and pattern recognition
• Sensor data fusion
Intellectual Property Summary:
• United States 8,195,73 Issued 6/5/2012
• United States 8,499,022 Issued 7/30/2013
Stage of Development: Evaluated on three real-world benchmark datasets (IRIS, PENDIG, ISOLET) with varying sizes and feature types; compared against four clustering ensemble methods (CSPA, MCLA, QMI, MMEC).
Licensing Status: This technology is available for licensing.
Licensing Potential: Strong potential for adoption by data analytics providers, AI developers, and organizations working with distributed or privacy-sensitive datasets seeking robust and interpretable clustering ensemble solutions.
Additional Information: Benchmark evaluation results and algorithmic implementation details available upon request.
Inventors: Bo Long, Zhongfei Zhang