2013-367 METHOD AND SYSTEM FOR PICK-AND-DROP SAMPLING FROM LARGE DATASET

Method and System for Pick-And-Drop Sampling from Large Dataset

SUMMARY

UCLA researchers in the Department of Computer Science have developed a new algorithm that uses pick-and-drop sampling to approximate large frequency moments of big datasets.

BACKGROUND

As data volumes grow, analyzing the data becomes increasingly challenging. In some cases, the data is generated by a single event and stored for analysis, e.g., large financial or scientific simulations. In other instances, the data is generated by many individual events occurring simultaneously, such as daily sales data from online retailers. While each day's data may be analyzed efficiently, the combined data is likely too large for practical in-depth analysis. Approximate frequency moments could be used to analyze a retailer's weekly or yearly sales figures when the data becomes too large to handle with conventional analysis.

INNOVATION

UCLA researcher Rafail Ostrovsky has developed an algorithm to estimate higher frequency moments of a given data stream. The algorithm provides useful statistics on the data set when the incoming data is too big to store or efficiently analyze.
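For readers unfamiliar with the term, the k-th frequency moment of a stream is the sum, over all distinct items, of each item's occurrence count raised to the power k. The sketch below is illustrative only and is not the pick-and-drop algorithm described in the paper: it shows an exact computation (which needs memory for every distinct item) next to the classic single-sample estimator of Alon, Matias, and Szegedy, swapped in here as a simple stand-in for a sampling-based approach. The function names and the toy stream are assumptions made for this example.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact k-th frequency moment: sum of count**k over distinct items.

    Needs one counter per distinct item, which is what becomes
    impractical on very large streams.
    """
    counts = Counter(stream)
    return sum(c ** k for c in counts.values())


def ams_estimate_at(stream, position, k):
    """Classic AMS-style estimate from a single sampled position.

    NOTE: this is a stand-in for illustration, not pick-and-drop sampling.
    r counts occurrences of the sampled item from `position` to the end;
    the estimate m * (r**k - (r-1)**k) equals the exact moment on average
    when `position` is chosen uniformly at random.
    """
    m = len(stream)
    item = stream[position]
    r = sum(1 for x in stream[position:] if x == item)
    return m * (r ** k - (r - 1) ** k)


# Hypothetical toy stream: "a" appears 3 times, "b" twice, "c" once.
stream = ["a", "b", "a", "c", "a", "b"]
exact = frequency_moment(stream, 3)

# Averaging the single-sample estimate over every possible position
# recovers the exact moment, which is why the estimator is unbiased.
mean_estimate = sum(ams_estimate_at(stream, p, 3) for p in range(len(stream))) / len(stream)
```

In practice an estimator of this kind would sample only a few positions and combine the results, trading exactness for far less memory; the averaging over all positions here is only to make the unbiasedness visible on a small input.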

ADVANTAGES

Provides analysis and robust statistics for very large, continuous data streams (e.g., online sales, commercial sales, big data science).

STATE OF DEVELOPMENT

Researchers have created and validated the algorithm.

RELATED MATERIALS

V. Braverman and R. Ostrovsky, Approximating Large Frequency Moments with Pick-and-Drop Sampling, in Approximation, Randomization, and Combinatorial Optimization, 2013.

Direct Link:

https://canberra-ip.technologypublisher.com/tech/2013-367_METHOD_AND_SYSTEM_FOR_PICK-AND-DROP_SAMPLING_FROM_LARGE_DATASET