Summary:
UCLA researchers in the Department of Mathematics have developed a technology to improve artificial intelligence pattern recognition for image segmentation, bio-chemical reactions, and security applications.
Background:
Recently, there has been an increase in the quantity and complexity of generated data. Businesses that deal with large or complex datasets often struggle to interpret that data in meaningful ways. One approach is to use graph theory to process the data. Each object in the data is represented as a node, and nodes are connected by edges whose weights represent the strength of the relationship between the objects. Together these form a graph that can be visualized and inspected for patterns. A given pattern may occur multiple times within the data. The task of finding the occurrences of such a pattern is known as subgraph matching.
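To make the idea concrete, the following is a minimal Python sketch of subgraph matching by brute force. It is an illustration only, not the UCLA method: the function name `find_matches` and the toy graphs are assumptions, edge weights are ignored, and a match is taken to be any one-to-one assignment of pattern nodes to data nodes that preserves every pattern edge.

```python
from itertools import permutations

def find_matches(template_edges, world_edges):
    """Return every one-to-one mapping of template nodes onto world nodes
    under which each template edge lands on an existing world edge."""
    t_nodes = sorted({n for edge in template_edges for n in edge})
    w_nodes = sorted({n for edge in world_edges for n in edge})
    w_edge_set = {frozenset(edge) for edge in world_edges}  # undirected edges
    matches = []
    # Try every ordered choice of distinct world nodes (exponential cost).
    for image in permutations(w_nodes, len(t_nodes)):
        mapping = dict(zip(t_nodes, image))
        if all(frozenset((mapping[a], mapping[b])) in w_edge_set
               for a, b in template_edges):
            matches.append(mapping)
    return matches

# Hypothetical toy data: the pattern is a triangle; the data graph holds
# two triangles that share node 3.
template = [("a", "b"), ("b", "c"), ("a", "c")]
world = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
matches = find_matches(template, world)
matched_node_sets = {frozenset(m.values()) for m in matches}
```

Even in this small case the pattern appears in two places, and each occurrence can be matched in several symmetric ways. The exhaustive search grows factorially with graph size, which is why it does not scale to large datasets.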
Although the human mind is good at pattern recognition, this ability does not scale to massive graphs of complex data. Existing computational approaches for finding subgraphs are computationally expensive, can be inaccurate, and can miss patterns. Some machine learning approaches have been proposed, but they require a large, annotated dataset for training. Active learning is an approach in which a human first generates a small training dataset. The system then trains a machine learning model and attempts to process unlabeled data. The human reviews the output and corrects the errors. Together the human and the machine can generate a large amount of training data and train an accurate machine learning model in less time than it would take a human to generate all of the training data manually. These active learning strategies have yet to be incorporated into the subgraph matching problem despite the clear and pressing need.
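The active learning loop described above can be sketched generically in Python. This is a hedged illustration, not the researchers' system: the "model" is a simple one-dimensional threshold, the `oracle` function stands in for the human annotator, and the query rule (ask about the point the model is least sure of) is a common variant known as uncertainty sampling. All names and data here are hypothetical.

```python
def fit_threshold(labeled):
    """Toy model: place the decision threshold midway between the largest
    point labeled 0 and the smallest point labeled 1."""
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (max(zeros) + min(ones)) / 2

def active_learning(pool, oracle, seed, rounds):
    """Start from a small labeled seed set, then repeatedly ask the oracle
    to label the unlabeled point closest to the current threshold."""
    labeled = [(x, oracle(x)) for x in seed]
    unlabeled = [x for x in pool if x not in seed]
    for _ in range(rounds):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)
        # The point nearest the threshold is the one the model is least sure of.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
    return fit_threshold(labeled), len(labeled)

# Hypothetical setup: 100 points whose true class flips at 63.
pool = list(range(100))

def oracle(x):
    """Stand-in for the human annotator."""
    return 1 if x >= 63 else 0

threshold, labels_used = active_learning(pool, oracle, seed=[0, 99], rounds=10)
predictions = [1 if x > threshold else 0 for x in pool]
```

In this toy setup the model classifies the whole pool correctly after the oracle has labeled only 12 of the 100 points, because each query is spent where the model is most uncertain.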
Innovation:
Researchers at UCLA have developed a system that combines active learning with machine learning to efficiently solve the subgraph matching problem. The system uses machine learning to determine which nodes will most likely reduce the solution space and allows humans to supply additional information about these nodes, so that large datasets can be sorted through more efficiently. This reduces the need for the large, annotated training sets that are conventionally used in machine learning algorithms. It also allows for more accurate analysis of the vast, complex data that exists and continues to grow. The system can be applied in a variety of settings that generate large datasets, including image segmentation processes, bio-chemical reactions, and security applications.
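The idea of querying the node that most reduces the solution space can be illustrated with a short Python sketch. It is an assumption-laden stand-in rather than the published method: instead of a learned model, it uses a simple counting heuristic that scores each pattern node by the expected number of candidate matches left after a human reveals that node's true counterpart, treating every candidate as equally likely. The function names and candidate list are hypothetical.

```python
from collections import Counter

def pick_query_node(solutions):
    """Choose the template node whose answer is expected to leave the
    fewest candidate solutions (all candidates assumed equally likely)."""
    total = len(solutions)

    def expected_remaining(node):
        counts = Counter(sol[node] for sol in solutions)
        # With probability c/total the answer keeps exactly c candidates.
        return sum(c * c for c in counts.values()) / total

    return min(solutions[0].keys(), key=expected_remaining)

def apply_answer(solutions, node, world_node):
    """Keep only the candidates consistent with the human's answer."""
    return [sol for sol in solutions if sol[node] == world_node]

# Hypothetical candidate matches for template nodes a, b, c.
candidates = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 3, "c": 3},
    {"a": 1, "b": 4, "c": 5},
    {"a": 1, "b": 5, "c": 5},
]
query = pick_query_node(candidates)
remaining = apply_answer(candidates, query, 4)  # human says the node maps to 4
```

Here asking about node `a` would reveal nothing, since every candidate agrees on it, while asking about node `b` separates all four candidates, so a single human answer resolves the match.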
Publication: Yurun Ge, Dominic Yang, Andrea L. Bertozzi, Iterative active learning strategies for subgraph matching, Pattern Recognition, 2024, 110797, ISSN 0031-3203, https://doi.org/10.1016/j.patcog.2024.110797.
Potential Applications:
Advantages:
Development to Date:
A successful prototype of the system has been demonstrated.