Scalable Data Clustering with Partial Affinity Propagation on MapReduce

PPAGE TITLE

Overview

 

PAGE SUMMARY

Data clustering is an invaluable process in modern information systems enabling a range of applications including query expansion, search result organization, and interactive searching in exploratory settings.  However, major challenges associated with clustering efficiency and scalability have hindered broad adoption. Computational complexity is a particular challenge when applying clustering techniques on the web because of the web’s scale.  In recent years a technique called Affinity Propagation (AP) has grown in popularity.  AP improves error rates and clustering speed relative to previous solutions such as k-centers clustering, however at web-scale AP continues to suffer efficiency problems.  To address the inefficiencies of existing clustering techniques a team of Drexel researchers, led by Dr. Weimao Ke, have developed a modified AP algorithm called Pruned AP.  Pruned AP incorporates a technique for eliminating weak associations in the AP similarity matrix that leads to a reduction in the computational complexity of AP from O(N3) to O(N).  This significantly reduces the time for data processing and network communication with little impact on clustering quality. In addition, the Drexel team has developed a distributed implementation of both AP and Pruned AP for the Hadoop MapReduce platform, resulting in even greater performance gains.

 

APPLICATIONS

TITLE: Applications

 

Information Retrieval

Query expansion

Search result organization

Interactive search in exploratory settings

 

ADVANTAGES

TITLE:Advantages

 

Faster clustering speed 

Comparable clustering quality

Leverages additional Hadooop MapReduce performance gains

 

 

FIGURES: Insert Figure Image Inside Figure Tags within Editor

Figure 1

 

 

 

 

PUBLICATIONS

References

 

Pubinfo should be the citation for your publication. Publink is the full url linking to the publication online or a pdf.

Ke, Weimao, Xiaoli Song, Sheik Hassan, and Xuemei Gong. "Scalable Text Clustering with Partial Affinity Propagation on MapReduce." In ACM WSDM 2015 Workshop on Scalable Data Analytics: Theory and Applications (SDATA'15), 1-9. Shanghai, China, 2015.

http://lincs.ischool.drexel.edu/?q=node/87

 

 

 

Commercialization Opportunities

 

----------------------------------------------

Contact Information     

 

 

Robert B. McGrath, Ph.D.

Senior Associate Vice Provost

Office of Technology Commercialization

Drexel University

3180 Chestnut Street, Ste. 104

The Left Bank

Philadelphia, PA 19104

Phone: 215-895-0303

E-mail: RBM26@drexel.edu

 

 

For Technical Information:

 

Weimao Ke, Ph.D.

Assistant Professor

College of Computing and Informatics

3141 Chestnut Street

Philadelphia, PA 19194

Phone: (215) 895 5912

Email: wk@drexel.edu

Web: http://drexel.edu/cci/contact/Faculty/Ke-Weimao/

 

 

 

 

Patent Information: