A Bernoulli Process Topic (BPT) model that enables knowledge discovery from citation networks to improve data mining tasks
Background:
Knowledge discovery from citation networks (i.e., textual data with links, such as scientific articles, legal documents, webpages, and emails) offers insight across many fields now that the internet and digital databases have made huge repositories available. Digital libraries organize large numbers of publications in a structured way so that information of interest to a user can be extracted. However, unsupervised learning from documents remains a challenge in machine learning: the goal is to model and understand the topics of documents and to provide a meaningful description of them while preserving the basic statistical information about the corpus. For example, in a corpus of scientific articles (i.e., a digital library), documents are connected by citations, and each document plays two different roles in the corpus: as a document in its own right and as a citation of other documents.
Technology Overview:
The present technology provides a Bernoulli Process Topic (BPT) model, which models the corpus at two levels: the document level and the citation level. Each document has two different representations in the latent topic space, one for each of its roles. Moreover, the multilevel hierarchical structure of the citation network is captured by a generative process involving a Bernoulli process. In comparisons against other methods, the model demonstrates promising performance.
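To make the idea of a Bernoulli-driven generative process over a citation network more concrete, the following is a minimal sketch and not the BPT model itself. It assumes one plausible reading: to generate a topic for a document, a Bernoulli trial at each step decides whether to stop at the current document or follow one of its citations, and the topic is then drawn from the citation-level topic distribution of the document where the walk stopped. The function name `sample_topic`, the parameter `stop_prob`, and the toy corpus are all hypothetical and are not taken from the technology described here.

```python
import random


def sample_topic(doc, citations, citation_topics, stop_prob, rng):
    """Sample one topic for `doc` with a Bernoulli-driven walk over citations.

    Illustrative assumption only: `citations` maps each document to the
    documents it cites (assumed acyclic), and `citation_topics` maps each
    document to its citation-level topic distribution.
    """
    current = doc
    while True:
        cited = citations.get(current, [])
        # Bernoulli trial: stop at the current document with probability
        # stop_prob. A document that cites nothing always ends the walk.
        if not cited or rng.random() < stop_prob:
            break
        # Otherwise move one level down the citation hierarchy.
        current = rng.choice(cited)
    topics, weights = zip(*citation_topics[current].items())
    topic = rng.choices(topics, weights=weights, k=1)[0]
    return topic, current


# Hypothetical three-document chain: A cites B, and B cites C.
citations = {"A": ["B"], "B": ["C"], "C": []}
citation_topics = {
    "A": {"mining": 1.0},
    "B": {"graphs": 1.0},
    "C": {"statistics": 1.0},
}

rng = random.Random(0)
always_stop = sample_topic("A", citations, citation_topics, 1.0, rng)
never_stop = sample_topic("A", citations, citation_topics, 0.0, rng)
```

With `stop_prob` set to 1.0 the walk never leaves the starting document, so its topic comes only from its own citation-level distribution. With `stop_prob` set to 0.0 the walk always descends to a document with no citations. Intermediate values mix topics from several levels of the citation hierarchy, which is the intuition behind giving a document a document-level representation distinct from its citation-level one.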
Advantages:
Intellectual Property Summary: