Sunday, December 4, 2016

Concept Profiling Framework for Recurring Drifts in Data Streams

This paper was published at the Australasian Conference on Artificial Intelligence 2016 by Robert Anderson, Yun Sing Koh and Gillian Dobbie. Attached are the abstract and a brief summary of the paper.

Abstract

We propose the Concept Profiling Framework (CPF), a meta-learner that uses a concept drift detector and a collection of classification models to perform effective classification on data streams with recurrent concept drifts, through relating models by similarity of their classifying behaviour. We introduce a memory-efficient version of our framework and show that it can operate faster and with less memory than a naive implementation while achieving similar accuracy. We compare this memory-efficient version of CPF to a state-of-the-art meta-learner made to handle recurrent drift and show that we can regularly achieve improved classification accuracy along with better runtime and memory use. We provide results from testing on synthetic and real-world datasets to prove CPF's value in classifying data streams with recurrent concepts.

Summary


We present the Concept Profiling Framework (CPF), a meta-learning approach that maintains a collection of classifiers and uses a drift detector. When our drift detector indicates a drift state, i.e. that our current classifier is no longer suitable, we check our collection of classifiers for one better suited to the current stream. If one meets a set level of accuracy, we select it as the current classifier; otherwise a new classifier is produced and trained on recent data. If this new classifier behaves similarly to a classifier in our collection, we choose that existing classifier as our current model; otherwise we add the new classifier to the collection and use it as our current classifier.
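
As a rough illustration of this loop, here is a minimal skeleton of the meta-learner, assuming classifiers with a scikit-learn-style predict method and a detector that reports "stable", "warning" or "drift" states; all names here (CPFSketch, select_model and so on) are hypothetical stand-ins rather than the paper's implementation.

```python
# Illustrative skeleton only: one current classifier, a collection of stored
# models, and a drift detector driving model switches. The selection logic
# passed in as select_model is sketched further below.
class CPFSketch:
    def __init__(self, detector, make_classifier, select_model):
        self.detector = detector                # reports "stable"/"warning"/"drift"
        self.collection = [make_classifier()]   # models kept for possible reuse
        self.current = self.collection[0]       # classifier used for predictions
        self.select_model = select_model        # chooses a reused or new model
        self.buffer = []                        # instances gathered during warnings

    def process(self, x, y):
        correct = self.current.predict([x])[0] == y
        state = self.detector.update(correct)   # detector watches prediction outcomes
        if state == "warning":
            self.buffer.append((x, y))          # pause training, buffer instances
        elif state == "drift":
            self.current = self.select_model(self.buffer, self.collection)
            self.buffer.clear()
        else:
            # stable: discard the buffer; incremental training of the current
            # model would continue at this point
            self.buffer.clear()
```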

We introduce two techniques to allow efficient handling of recurrent concepts. First, we regularly compare the behaviour of our classifiers, and over time our certainty about their similarity improves; if two classifiers behave similarly, we can use the older model to represent the newer one. Second, we implement a fading mechanism to constrain the number of stored models: a points-based system that retains models that are recent or frequently used. By observing which models are reused, we can understand how patterns recur in our stream.
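
A hedged sketch of what such a points-based fading scheme could look like; the decay and reward values and the pruning rule are illustrative assumptions, not the parameters used in the paper.

```python
# Illustrative fading mechanism: every stored model loses points over time,
# the model reused at the latest drift earns points back, and the lowest-
# scoring models are pruned once the collection grows too large.
def fade(points, reused_model_id, max_models, decay=1, reward=2):
    for model_id in points:
        points[model_id] -= decay                 # all models fade as time passes
    points[reused_model_id] += decay + reward     # recent/frequent reuse is rewarded
    while len(points) > max_models:
        weakest = min(points, key=points.get)     # least useful model
        del points[weakest]                       # prune it from the collection
    return points
```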


The figure above describes the framework in further detail. We use a meta-learning framework with a collection of one or more incremental classifiers, one of which is designated as our current classifier. A drift detector signals warning and drift states. On a warning state, the meta-learner stops training the current classifier and stores instances from the data stream in a buffer. If a drift state follows, the meta-learner looks for an existing model in the collection that classifies the warning buffer accurately, to use as the current classifier. If it cannot find one, it creates a new model trained on the even-indexed buffer instances. If an existing model behaves similarly to this new model when tested on the odd-indexed buffer instances, that model is reused; otherwise the new model is also trained on the odd-indexed instances and used. Every model in the collection is tested on the buffer, and the results are compared and stored. Where models are found to classify similarly to one another, the older model represents the newer one.
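
As a complement to the skeleton sketched earlier, below is one way the selection step with the even/odd buffer split could look. The agreement measure, the 95% threshold and the use of a scikit-learn decision tree (refit on the whole buffer as a stand-in for further incremental training) are assumptions for illustration, not the authors' exact method.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Illustrative select_model: train a candidate on even-indexed buffer instances,
# compare stored models to it on odd-indexed ones, and reuse an older model if
# its predictions agree closely with the candidate's.
def select_model(buffer, collection, agreement_threshold=0.95):
    X = np.array([x for x, _ in buffer])
    y = np.array([label for _, label in buffer])
    X_even, y_even = X[0::2], y[0::2]            # training half of the buffer
    X_odd = X[1::2]                              # evaluation half of the buffer

    candidate = DecisionTreeClassifier().fit(X_even, y_even)
    candidate_preds = candidate.predict(X_odd)

    for model in collection:
        agreement = np.mean(model.predict(X_odd) == candidate_preds)
        if agreement >= agreement_threshold:
            return model                         # reuse the older, similar model

    candidate.fit(X, y)    # stand-in for also training on the odd-indexed half
    collection.append(candidate)
    return candidate
```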

Sunday, November 20, 2016

Proactive Drift Detection: Predicting Concept Drifts in Data Streams using Probabilistic Networks


This paper was published at the International Joint Conference on Neural Networks (IJCNN) 2016 by Kylie Chen, Yun Sing Koh and Patricia Riddle. I presented it in Vancouver in July this year. Attached are the abstract and a brief summary of the paper. Enjoy!

Abstract

The application of current drift detection methods to real data streams shows trends in the rate of change found by the detectors. We observe that these patterns of change vary across different data streams, and we use the term stream volatility pattern to describe change rates with a distinct mean and variance. First, we propose a novel drift prediction algorithm to predict the location of future drift points based on historical drift trends, which we model as transitions between stream volatility patterns. Our method uses a probabilistic network to learn drift trends and is independent of the drift detection technique. We demonstrate that our method is able to learn and predict drift trends in streams with reoccurring stream volatility patterns. This allows the anticipation of future changes, which enables users and detection methods to be more proactive. Second, we apply our drift prediction algorithm by incorporating the drift estimates into a drift detector, ProSeed, to improve its performance by decreasing the false positive rate.

Summary

The main contributions of our work are: (1) a drift prediction algorithm that can accurately learn drift trends of a stream, and (2) a drift detector incorporating historical drift rate information that is accurate for streams with reoccurring volatility trends. We analyze our drift prediction technique by comparing it to ground truth in synthetic data streams and show that it can accurately capture trends for streams with reoccurring volatility patterns. We evaluate the performance of our drift detector by comparing it against state-of-the-art detectors on synthetic and real data streams and show that our technique is able to lower the rate of false positives for streams with these trends.

Current drift detection techniques follow a specific framework as shown in the figure below.
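
As a rough companion to that figure, the sketch below shows the usual shape of such a framework: the detector watches the classifier's stream of prediction outcomes and signals change when the recent error rate departs from its baseline. The window size, threshold and test used here are illustrative assumptions, not a specific published detector.

```python
from collections import deque

# Illustrative error-rate-based drift detector: it consumes a stream of
# correct/incorrect flags from the classifier and flags a drift when the
# recent error rate rises well above the rate seen after the last drift.
class SimpleErrorRateDetector:
    def __init__(self, window=100, threshold=0.15):
        self.history = deque(maxlen=window)   # most recent prediction outcomes
        self.baseline = None                  # error rate recorded after last drift
        self.threshold = threshold            # allowed rise in the error rate

    def update(self, correct: bool) -> bool:
        self.history.append(0 if correct else 1)
        if len(self.history) < self.history.maxlen:
            return False                      # not enough evidence yet
        error_rate = sum(self.history) / len(self.history)
        if self.baseline is None:
            self.baseline = error_rate        # establish the post-drift baseline
            return False
        if error_rate - self.baseline > self.threshold:
            self.baseline = None              # reset after signalling a drift
            self.history.clear()
            return True
        return False
```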


We modify this framework to incorporate an additional drift point prediction mechanism (the Drift Prediction Method) that feeds into our new proactive drift detector. This prediction mechanism uses a probabilistic network, and its drift estimates are what make the detection method proactive. This adaptation is carried out on the existing drift detector SEED.
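
As a hedged sketch of the general idea of feeding a predicted drift point into a base detector, the snippet below makes the detector more conservative while a drift is not yet expected and restores normal sensitivity as the predicted point approaches, which is one way such estimates could reduce false positives. The wrapper, the 0.5 cut-off and the threshold scaling are illustrative assumptions (reusing the illustrative detector above), not the mechanism used in SEED or in the paper's proactive detector.

```python
# Illustrative proactive wrapper: predicted_gap is the estimated number of
# instances until the next drift (e.g. from a learned model of drift trends).
class ProactiveDetectorSketch:
    def __init__(self, base_detector, predicted_gap, caution_factor=2.0):
        self.base = base_detector
        self.predicted_gap = predicted_gap
        self.caution_factor = caution_factor
        self.default_threshold = base_detector.threshold
        self.since_last_drift = 0

    def update(self, correct: bool) -> bool:
        self.since_last_drift += 1
        if self.since_last_drift < 0.5 * self.predicted_gap:
            # Far from the predicted drift point: demand stronger evidence.
            self.base.threshold = self.default_threshold * self.caution_factor
        else:
            # Near or past the predicted drift point: normal sensitivity.
            self.base.threshold = self.default_threshold
        if self.base.update(correct):
            self.since_last_drift = 0
            return True
        return False
```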





Relation to other research

Our research differs from research in drift detection with reoccurring patterns: those methods aim to detect models that reoccur, whereas our method aims to learn the characteristics of drift rate trends. For example, suppose we are trying to learn the concept of seasons. Research in reoccurring patterns focuses on the order in which concepts reoccur, such as spring, summer, autumn, winter, spring. Our research instead looks at the rate of concept change, that is, the time period between season changes. Unlike research in temporal forecasting for seasonal patterns, we do not assume there is any seasonal effect to the changes, and the trends do not necessarily occur periodically.

You can read the full paper here.