This technology introduces a probabilistic framework that clusters complex, interconnected data by modeling relationships across multiple object types. It enables more accurate pattern discovery by capturing both individual attributes and interactions simultaneously. The approach unifies multiple clustering methods into a single flexible system, supporting scalable analysis of relational datasets with improved accuracy and structural preservation.
Background:
Traditional clustering techniques assume independent data points represented as flat feature vectors, making them ineffective for real-world datasets that include multiple object types and interdependencies. These methods fail to preserve relational structure, cannot model influence propagation across entities, and do not capture interaction patterns between different data types. As data becomes increasingly interconnected across domains such as social networks, biological systems, and web data, there is a need for clustering approaches that can incorporate relationships and dependencies while maintaining computational efficiency and scalability.
Technology Overview:
The invention is a mixed membership relational clustering (MMRC) model that represents multi-type relational data using attribute, homogeneous, and heterogeneous relationship matrices. Each object is assigned probabilistic memberships across latent clusters, and relationships are modeled through parameterized distributions within an exponential family framework. The system uses an expectation-maximization algorithm with Gibbs sampling to iteratively estimate cluster memberships and interaction parameters. This approach enables both soft probabilistic and hard clustering across diverse relational structures while preserving interactions among entities. The unified framework supports flexible modeling of complex datasets and allows scalable clustering with computational complexity comparable to k-means.
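As a rough illustration of the hard-clustering case described above, the sketch below co-clusters the two object types of a single heterogeneous relationship matrix by alternating k-means-style updates. It is a simplified stand-in, not the patented MMRC procedure: it assumes a Gaussian (squared-error) member of the exponential family, uses hard assignments instead of mixed memberships, and omits the EM and Gibbs sampling steps as well as attribute and homogeneous matrices. The names `block_means`, `objective`, and `hard_relational_clustering`, and the parameters `k`, `l`, `n_iter`, and `seed`, are hypothetical.

```python
import random


def block_means(R, rows, cols, k, l):
    """Mean of R over each (row cluster, column cluster) block; empty blocks get 0.0."""
    total = [[0.0] * l for _ in range(k)]
    count = [[0] * l for _ in range(k)]
    for i, row in enumerate(R):
        for j, value in enumerate(row):
            total[rows[i]][cols[j]] += value
            count[rows[i]][cols[j]] += 1
    return [[total[p][q] / count[p][q] if count[p][q] else 0.0
             for q in range(l)] for p in range(k)]


def objective(R, rows, cols, B):
    """Squared error between R and its approximation by the block means B."""
    return sum((value - B[rows[i]][cols[j]]) ** 2
               for i, row in enumerate(R) for j, value in enumerate(row))


def hard_relational_clustering(R, k, l, n_iter=20, seed=0):
    """Alternate row and column reassignments (assumed Gaussian, hard assignments)."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    rows = [rng.randrange(k) for _ in range(n)]   # cluster label per row object
    cols = [rng.randrange(l) for _ in range(m)]   # cluster label per column object
    history = []
    for _ in range(n_iter):
        # Interaction parameters: one mean per pair of clusters.
        B = block_means(R, rows, cols, k, l)
        # Move each row object to the cluster whose block means fit it best.
        rows = [min(range(k),
                    key=lambda p: sum((R[i][j] - B[p][cols[j]]) ** 2 for j in range(m)))
                for i in range(n)]
        B = block_means(R, rows, cols, k, l)
        # Same update for the column objects.
        cols = [min(range(l),
                    key=lambda q: sum((R[i][j] - B[rows[i]][q]) ** 2 for i in range(n)))
                for j in range(m)]
        B = block_means(R, rows, cols, k, l)
        history.append(objective(R, rows, cols, B))
    return rows, cols, B, history


if __name__ == "__main__":
    # Toy relation between two object types (for example, documents and words).
    R = [[1.0, 1.0, 0.0, 0.0],
         [1.0, 0.9, 0.1, 0.0],
         [0.0, 0.1, 1.0, 0.9],
         [0.0, 0.0, 1.0, 1.0]]
    rows, cols, B, history = hard_relational_clustering(R, k=2, l=2)
    print(rows, cols, history[-1])
```

Each pass costs time proportional to the number of matrix entries times the number of clusters, which is the sense in which this kind of update resembles k-means. The result depends on the random initialization, so in practice several seeds would be tried and the run with the lowest final error kept.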
Advantages:
• Unifies multiple clustering paradigms, including graph clustering, co-clustering, and semi-supervised clustering
• Captures complex interdependencies across heterogeneous data types
• Supports both soft probabilistic and hard clustering approaches
• Adapts to various data distributions through exponential family modeling
• Preserves relational structure without flattening data
• Enables discovery of interaction patterns between clusters
• Scales to large multi-relational datasets with complexity comparable to k-means
Applications:
• Web and search engine data mining
• Bioinformatics and protein interaction analysis
• Social network and graph analytics
• Recommendation systems and marketing analytics
• Scientific publication and citation clustering
• Multi-modal data analysis
Intellectual Property Summary:
• United States 8,285,719 Issued 10/9/2012
• United States 8,676,805 Issued 3/18/2014
• United States 8,996,528 Issued 3/31/2015
• United States 9,372,915 Issued 6/21/2016
• United States 9,984,147 Issued 5/29/2018
Stage of Development: Evaluated on multiple real-world datasets including text corpora (20 Newsgroups, TREC, WebACE) and relational networks.
Licensing Status: This technology is available for licensing.
Licensing Potential: Strong potential for adoption by AI and data analytics companies, search engine developers, and organizations working with complex relational datasets seeking advanced clustering solutions that preserve structure and improve pattern discovery.
Additional Information: Validation across multiple benchmark datasets; additional modeling and performance details available upon request.
Inventors: Bo Long, Zhongfei Zhang