Overview
Data clustering is an invaluable process in modern information systems, enabling applications such as query expansion, search result organization, and interactive searching in exploratory settings. However, challenges with clustering efficiency and scalability have hindered broad adoption. Computational complexity is a particular obstacle when applying clustering techniques to the web because of the web’s scale. In recent years, a technique called Affinity Propagation (AP) has grown in popularity. AP improves error rates and clustering speed relative to earlier approaches such as k-centers clustering; at web scale, however, AP still suffers from efficiency problems.
To address these inefficiencies, a team of Drexel researchers led by Dr. Weimao Ke has developed a modified AP algorithm called Pruned AP. Pruned AP eliminates weak associations in the AP similarity matrix, which reduces the computational complexity of AP from O(N^3) to O(N). This significantly reduces the time spent on data processing and network communication, with little impact on clustering quality. In addition, the Drexel team has developed a distributed implementation of both AP and Pruned AP for the Hadoop MapReduce platform, resulting in even greater performance gains.
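As a rough illustration of what eliminating weak associations in a similarity matrix can look like, the sketch below keeps only the k strongest similarities for each data point. This is an assumption made for illustration: the summary does not state which pruning rule Pruned AP uses, and the names `prune_similarities`, `count_entries`, and the parameter `k` are hypothetical.

```python
import heapq


def prune_similarities(sim, k):
    """Keep only the k strongest off-diagonal similarities in each row.

    `sim` is a dense N x N list of lists. The result is a sparse
    dict-of-dicts with at most k neighbours per point, plus the
    diagonal entry (the AP "preference"), which is always kept.
    Top-k pruning is an assumed rule, not necessarily the one
    used by Pruned AP.
    """
    pruned = {}
    for i, row in enumerate(sim):
        # Pair each similarity with its column index, skipping the diagonal.
        candidates = ((s, j) for j, s in enumerate(row) if j != i)
        strongest = heapq.nlargest(k, candidates)
        pruned[i] = {j: s for s, j in strongest}
        pruned[i][i] = row[i]
    return pruned


def count_entries(pruned):
    """Number of similarity values that remain after pruning."""
    return sum(len(neighbours) for neighbours in pruned.values())


# Toy data: similarity is the negative squared distance between 1-D points.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
sim = [[-(a - b) ** 2 for b in points] for a in points]

pruned = prune_similarities(sim, k=2)
print(len(points) ** 2, "dense entries ->", count_entries(pruned), "after pruning")
```

With k held fixed, the number of stored similarities grows as N(k + 1) instead of N^2, so the messages an AP iteration has to compute and exchange scale linearly with the number of data points under this assumed rule.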
Applications
Information Retrieval
Query expansion
Search result organization
Interactive search in exploratory settings
Advantages
Faster clustering speed
Comparable clustering quality
Additional performance gains from a distributed Hadoop MapReduce implementation
References
Ke, Weimao, Xiaoli Song, Sheik Hassan, and Xuemei Gong. "Scalable Text Clustering with Partial Affinity Propagation on MapReduce." In ACM WSDM 2015 Workshop on Scalable Data Analytics: Theory and Applications (SDATA'15), 1-9. Shanghai, China, 2015.
http://lincs.ischool.drexel.edu/?q=node/87
Commercialization Opportunities
----------------------------------------------
Contact Information
Robert B. McGrath, Ph.D.
Senior Associate Vice Provost
Office of Technology Commercialization
Drexel University
3180 Chestnut Street, Ste. 104
The Left Bank
Philadelphia, PA 19104
Phone: 215-895-0303
E-mail: RBM26@drexel.edu
For Technical Information:
Weimao Ke, Ph.D.
Assistant Professor
College of Computing and Informatics
3141 Chestnut Street
Philadelphia, PA 19104
Phone: 215-895-5912
E-mail: wk@drexel.edu
Web: http://drexel.edu/cci/contact/Faculty/Ke-Weimao/