University of Oregon Researchers: Dejing Dou, Steven Fickas
Patent: 9,442,917 issued 9/13/2016 (UO-13-35)
Detecting Semantic Errors in Text Using Ontology-Based Extraction Rules.
Technology Background/Definition of Problem: Current natural language processing (NLP) tools can detect errors in text documents reflecting incorrect spelling, syntax, and grammar. These types of errors, however, do not relate to the underlying meaning of the subject matter of the text, i.e., they are not semantic errors. These limitations have motivated the development of more sophisticated tools for analyzing natural language documents. One important application of such tools is automatic grading systems for summaries and essays in education. Most existing automated grading systems for student summaries are based on statistical models, such as latent semantic analysis (LSA), which detects statistical word similarity between a teacher's model document and a student's submitted document. If words occur with similar frequencies in the two documents, then the documents are considered to be statistically similar, and the student submission is given a high grade by the system. More specifically, LSA treats each essay as a matrix of word frequencies and applies singular value decomposition (SVD) to the matrix to find an underlying semantic space. Each student essay is represented in that space as a set of vectors. A similarity measure is then computed based on the cosine similarity between the vectors of the student essay and the vectors of a model text document. The cosine similarity is then transformed to a grade and assigned to the student essay. Although LSA and other semantic similarity techniques have proven to be very useful, they cannot detect logical errors that reflect a student's misunderstanding of the proper relationships between the words. Consequently, a student's essay that is semantically similar to an instructor's model essay but uses the terms in a logically incorrect manner would be inappropriately accorded a high grade. In short, LSA assigns inaccurate grades to student submissions that use the correct terminology incorrectly.
In addition, because LSA is a statistical approach that treats each document as a whole, it cannot provide feedback about specific sentences in the document.
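The limitation described above can be illustrated with a small sketch. It is not taken from the patent or from any particular grading system: the three example sentences, the function names, and the choice of k = 2 are assumptions made for illustration, and each document is reduced to a single vector rather than a set of vectors. The sketch builds a word-frequency matrix, applies SVD with NumPy, and compares documents by cosine similarity.

```python
import numpy as np

def term_document_matrix(docs):
    """Rows are vocabulary words, columns are documents, entries are word counts."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1
    return A

def lsa_doc_vectors(A, k):
    """Project each document into a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T  # one row per document

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Illustrative documents (assumed, not from the source).
docs = [
    "herbivores eat plants and carnivores eat animals",  # model text
    "carnivores eat plants and herbivores eat animals",  # same words, wrong logic
    "the weather today is sunny and warm",               # unrelated text
]

vecs = lsa_doc_vectors(term_document_matrix(docs), k=2)
sim_wrong = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
print(f"model vs. logically wrong essay: {sim_wrong:.3f}")
print(f"model vs. unrelated essay:       {sim_unrelated:.3f}")
```

Because the first two documents contain exactly the same words with the same counts, their vectors coincide and the logically incorrect essay receives the highest possible similarity, while the score says nothing about which sentence is at fault.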
Our Technology Solution: University of Oregon researchers have created a system called Ontology-based Information Extraction (OBIE), which offers the same advantages as semantic networks, such as the ability to generate feedback without the need for gold-standard summaries. In addition, its use of ontologies gives it more expressive power than semantic networks. Ontologies offer consistent interpretation (semantic networks, for example, have ambiguous interpretations of ISA relationships) and can represent disjointness and negation, which semantic networks cannot. By incorporating a heuristic ontology debugging technique, the OBIE system can determine which axioms can create logical contradictions in the domain. These axioms are translated into rule-based information extractors that identify incorrect statements. By knowing which statements are inconsistent with respect to the ontology and why (through the inconsistent axioms), it is possible to produce more detailed and accurate feedback.
Thus, in addition to being able to determine whether or not certain information is present in the text or consistent with a semantic network, embodiments of the present invention are able to detect whether certain information is true (i.e., correct) or false (i.e., incorrect). This innovation appears to be the first system to identify semantic errors in natural language text based on an ontology. It is based in part on the insight that, because the statements of a summary should be entailed by the domain ontology, an incorrect statement in a summary will be inconsistent with the ontology. Understanding how ontology inconsistency is managed can therefore lead to mechanisms for identifying and extracting incorrect summary statements.
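The idea of turning axioms into error-detecting extraction rules can be sketched as follows. This is a toy illustration under stated assumptions, not the patented implementation: the two-axiom ontology, the RangeAxiom class, the regular-expression rules, and the feedback wording are all hypothetical, and a real system would use an ontology language and a much richer extraction step.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class RangeAxiom:
    """Hypothetical axiom: every `subject` may only `relation` an `allowed`."""
    subject: str
    relation: str
    allowed: str

# Toy ontology (assumed): two restriction axioms and one disjointness axiom.
AXIOMS = [
    RangeAxiom("herbivore", "eat", "plant"),
    RangeAxiom("carnivore", "eat", "animal"),
]
DISJOINT = [frozenset({"plant", "animal"})]

def build_error_rules(axioms, disjoint):
    """Derive one pattern per statement shape that would contradict the ontology."""
    rules = []
    for ax in axioms:
        for pair in disjoint:
            if ax.allowed in pair:
                (forbidden,) = pair - {ax.allowed}
                pattern = re.compile(
                    rf"\b{ax.subject}s?\s+{ax.relation}s?\s+(?:only\s+)?{forbidden}s?\b",
                    re.IGNORECASE,
                )
                rules.append((pattern, ax, forbidden))
    return rules

def find_errors(sentence, rules):
    """Return feedback for each rule the sentence matches."""
    feedback = []
    for pattern, ax, forbidden in rules:
        if pattern.search(sentence):
            feedback.append(
                f"'{sentence}' conflicts with the axiom that a {ax.subject} may only "
                f"{ax.relation} a {ax.allowed}; {ax.allowed} and {forbidden} are disjoint."
            )
    return feedback

rules = build_error_rules(AXIOMS, DISJOINT)
errors_wrong = find_errors("Herbivores eat animals.", rules)
errors_right = find_errors("Carnivores eat animals.", rules)
print(errors_wrong)
print(errors_right)
```

Each rule keeps a reference to the axiom it came from, so a flagged sentence can be reported together with the reason it is inconsistent, which is the kind of sentence-level feedback a whole-document similarity score cannot give.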
One major advantage of the new method is that it considers each sentence in the text separately and does not need to consider the document as a whole (all sentences) simultaneously, as is the case for LSA systems. The present method thus divides the text into small pieces based on sentences, facilitating the processing of even extremely large texts. The sentences may be processed sequentially, or in parallel if appropriate hardware is available.
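A minimal sketch of sentence-by-sentence processing, assuming a simple punctuation-based splitter and a placeholder checker, is shown below. The function names and the single hard-coded rule in check_sentence are hypothetical stand-ins for the extraction rules; only the standard-library thread pool is real.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def check_sentence(sentence):
    """Placeholder checker: flags one hard-coded incorrect statement pattern."""
    pattern = r"\bherbivores?\s+eats?\s+animals?\b"
    is_incorrect = re.search(pattern, sentence, re.IGNORECASE) is not None
    return sentence, is_incorrect

def check_text(text, max_workers=4):
    """Check every sentence independently; results come back in document order."""
    sentences = split_sentences(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(check_sentence, sentences))

report = check_text(
    "Herbivores eat plants. Herbivores eat animals. Carnivores eat animals."
)
for sentence, is_incorrect in report:
    print("INCORRECT" if is_incorrect else "ok", "-", sentence)
```

Because no sentence depends on another, the same function could be mapped sequentially or over a process pool or cluster instead of threads; the thread pool is used here only to keep the example short.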
Applications: One important application of this new tool is automatic grading systems for summaries and essays in education.