Big Data, Data Mining, and Machine Learning Research: Proactive Drift Detection: Predicting Concept Drifts in Data Streams using Probabilistic Networks

This paper was published at the International Joint Conference on Neural Networks (IJCNN), 2016, by Kylie Chen, Yun Sing Koh, and Patricia Riddle. I presented it in Vancouver in July this year. Below are the abstract and a brief summary of the paper. Enjoy!

Abstract

The application of current drift detection methods to real data streams shows trends in the rate of change found by the detectors. We observe that these patterns of change vary across different data streams, and we use the term stream volatility pattern to describe change rates with a distinct mean and variance. First, we propose a novel drift prediction algorithm to predict the location of future drift points based on historical drift trends, which we model as transitions between stream volatility patterns. Our method uses a probabilistic network to learn drift trends and is independent of the drift detection technique. We demonstrate that our method is able to learn and predict drift trends in streams with reoccurring stream volatility patterns. This allows the anticipation of future changes, which enables users and detection methods to be more proactive. Second, we apply our drift prediction algorithm by incorporating the drift estimates into a drift detector, ProSeed, to improve its performance by decreasing the false positive rate.

Summary

The main contributions of our work are: (1) a drift prediction algorithm that can accurately learn the drift trends of a stream and (2) a drift detector that incorporates historical drift rate information and is accurate for streams with reoccurring volatility trends. We analyze our drift prediction technique by comparing it to ground truth in synthetic data streams and show that it can accurately capture trends for streams with reoccurring volatility patterns. We evaluate the performance of our drift detector by comparing it against state-of-the-art detectors on synthetic and real data streams and show that our technique is able to lower the rate of false positives for streams with these trends.

Current drift detection techniques follow a specific framework as shown in the figure below.

We modify the current framework to incorporate an additional drift point prediction mechanism (the Drift Prediction Method) that feeds into our new proactive drift detector. This mechanism uses a probabilistic network to predict drift points, and the predictions are then used to make drift detection proactive. We apply this adaptation to an existing drift detector, SEED, to obtain ProSeed.
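To make the idea of predicting drift points from transitions between volatility patterns more concrete, here is a minimal sketch in Python. It is not the algorithm from the paper: it assumes the volatility patterns have already been identified and labelled, replaces the probabilistic network with simple transition counts, and uses hypothetical names (`learn_transitions`, `predict_next_pattern`, `predict_next_drift`) and made-up pattern means purely for illustration.

```python
from collections import defaultdict


def learn_transitions(pattern_sequence):
    """Count how often each volatility pattern is followed by another.

    Assumption: pattern_sequence is a list of pattern labels in the
    order they were observed; the labelling step is not shown here.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(pattern_sequence, pattern_sequence[1:]):
        counts[current][following] += 1
    return counts


def predict_next_pattern(counts, current):
    """Return the most frequently observed successor of `current`.

    Falls back to `current` itself when no transition has been seen.
    """
    followers = counts.get(current)
    if not followers:
        return current
    return max(followers, key=followers.get)


def predict_next_drift(last_drift, counts, current, pattern_means):
    """Estimate the next drift location as the last drift point plus
    the mean drift interval of the predicted next pattern."""
    next_pattern = predict_next_pattern(counts, current)
    return last_drift + pattern_means[next_pattern]


# Hypothetical example: two patterns with illustrative mean intervals
# (in number of stream instances between drifts).
pattern_means = {"slow": 1000, "fast": 100}
history = ["slow", "fast", "slow", "fast", "slow"]

counts = learn_transitions(history)
estimate = predict_next_drift(5000, counts, "slow", pattern_means)
print(estimate)
```

A detector could use such an estimate to be stricter about signalling drift far from the predicted location, which is the general intuition behind reducing false positives; how ProSeed actually does this is described in the paper.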

Relation to other research

Our research differs from research on drift detection with reoccurring patterns: those methods aim to detect models that reoccur, whereas our method aims to learn the characteristics of drift rate trends. For example, suppose we are trying to learn the concept of seasons. Research on reoccurring patterns focuses on the order in which concepts reoccur, such as: spring, summer, autumn, winter, spring. Our research looks at the rate of concept change, that is, the time period between season changes. Unlike research in temporal forecasting for seasonal patterns, we do not assume there is any seasonal effect to the changes, and the trends do not necessarily occur periodically.
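The distinction can be illustrated with a small sketch: instead of looking at which concept follows which, we look only at the gaps between detected drift points and summarize them by mean and variance. The function names and the drift positions below are hypothetical and chosen only for illustration.

```python
from statistics import mean, pvariance


def drift_intervals(drift_points):
    """Gaps between consecutive drift points (stream positions)."""
    return [b - a for a, b in zip(drift_points, drift_points[1:])]


def summarize(intervals):
    """Mean and (population) variance of a list of drift intervals."""
    return mean(intervals), pvariance(intervals)


# Illustrative drift positions: three closely spaced drifts,
# followed by drifts that are much further apart.
drift_points = [100, 200, 300, 1300, 2300]
intervals = drift_intervals(drift_points)

fast_summary = summarize(intervals[:2])  # the closely spaced part
slow_summary = summarize(intervals[2:])  # the widely spaced part
print(intervals, fast_summary, slow_summary)
```

Two stretches of a stream with clearly different interval statistics, as in this toy example, are the kind of thing the term stream volatility pattern refers to; nothing here depends on which concept is active.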

You can read the full paper here.


Sunday, November 20, 2016
