Friday, June 2, 2017

Volatility-Adaptive Classifier System

This paper was presented at PAKDD 2017 in Jeju, Korea.

Abstract
A data stream's concept may evolve over time, a phenomenon known as concept drift. Concept drifts affect the prediction accuracy of the learning model and must be handled to maintain model quality. In most cases, there is a trade-off between maintaining prediction quality and learning efficiency. We present a novel framework known as the Volatility-Adaptive Classifier System (VACS) to balance this trade-off. The system contains an adaptive classifier and a non-adaptive classifier. The former can maintain a higher prediction quality but requires additional computational overhead; the latter requires less computational overhead but its prediction quality may be susceptible to concept drifts. VACS automatically applies the adaptive classifier when concept drifts are frequent, and switches to the non-adaptive classifier when drifts are infrequent. As a result, VACS can maintain a relatively low computational cost while still maintaining a sufficiently high overall prediction quality. To the best of our knowledge, this is the first data stream mining framework that applies different learners to reduce learning overheads.

Summary 

Data streams are sequences of unbounded data arriving in real time. For example, electricity usage records produced by a power station, online tweets generated in a region, and transactions recorded in a stock market can all be represented as data streams. Such real-world data are generated in order and are considered to be infinite. The task of data stream mining is to find valuable information in these unbounded streams of data. A data stream's properties raise various requirements when designing data stream algorithms. Instances in a stream can arrive very fast, allowing only limited time and memory for the algorithm to learn its underlying concepts. Moreover, a data stream may evolve over time such that the underlying concepts in the stream change. Consequently, the learning model loses prediction accuracy over time. This is known as concept drift. To maintain the quality of a learning model, stream learning algorithms are expected to detect changes and update their models to overcome these concept drifts. The frequency of concept drifts is known as stream volatility \cite{huang2014detecting}. High volatility means a high frequency of concept drifts.
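Since stream volatility is the frequency of concept drifts, a minimal way to quantify it is the mean gap between recent drift points. The sketch below is a hypothetical illustration, not the estimator from \cite{huang2014detecting}; the window size is an invented parameter.

```python
def mean_drift_interval(drift_points, window=5):
    """Estimate stream volatility as the mean gap between the most
    recent drift positions: a shorter mean gap means higher volatility."""
    recent = list(drift_points)[-window:]
    gaps = [b - a for a, b in zip(recent, recent[1:])]
    # With fewer than two drifts there is no gap to measure yet.
    return sum(gaps) / len(gaps) if gaps else float("inf")

# A drift detector fired at these (hypothetical) stream positions:
print(mean_drift_interval([100, 220, 300, 410, 500]))  # → 100.0
```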

Some data stream learners can overcome a concept drift by adjusting their models to generalise the new concept, maintaining a high prediction quality through the drift. These learning models can be classified as adaptive learners. Other data stream learners that cannot adjust their models are known as non-adaptive learners. Model adaptation comes with a large computational cost; thus, there is a trade-off between model quality and learning efficiency. It is also known that stream volatility may change over time. For example, in stock market transactions, an anomalous event can result in an increasing number of concept drifts over a short period. One way to balance the trade-off between model quality and efficiency is to apply model adaptation only when stream volatility is high, to maintain a stable prediction quality. When volatility becomes low, we disable model adaptation to save cost. We address this problem by creating a new learning framework containing both adaptive and non-adaptive learners.


We designed a framework called the Volatility-Adaptive Classifier System (VACS). VACS has a lower computational cost than a state-of-the-art adaptive learner while maintaining a similar prediction quality on a stream with volatility changes. VACS is composed of both adaptive and non-adaptive classifiers, and uses stream volatility as the criterion to switch between them. In particular, when volatility is high, VACS applies the adaptive learner to maintain a better prediction quality. When volatility is low, it is deemed unnecessary to spend large overheads handling infrequent concept drifts, so it switches to the non-adaptive classifier. As a result, VACS maintains a sufficiently high prediction accuracy with relatively low overheads.
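The switching idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window size, the threshold, and the class interface are all invented for the example, and drift positions are assumed to come from an external drift detector.

```python
class VolatilityAdaptiveClassifier:
    """Sketch of volatility-based switching between two learners:
    frequent drifts (small mean gap) -> adaptive learner."""

    def __init__(self, adaptive, non_adaptive, threshold=500):
        self.adaptive = adaptive          # costly, drift-resilient learner
        self.non_adaptive = non_adaptive  # cheap, drift-susceptible learner
        self.threshold = threshold        # illustrative volatility cut-off
        self.drift_points = []

    def record_drift(self, position):
        """Called by an external drift detector with the stream position."""
        self.drift_points.append(position)

    def current(self):
        """Pick the learner for the current volatility level."""
        recent = self.drift_points[-5:]
        gaps = [b - a for a, b in zip(recent, recent[1:])]
        if gaps and sum(gaps) / len(gaps) < self.threshold:
            return self.adaptive   # drifts are frequent: adapt
        return self.non_adaptive   # drifts are rare: save cost
```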

Our contributions are as follows: (1) We propose the Volatility-Adaptive Classifier System (VACS), which is able to choose between the adaptive classifier and the non-adaptive classifier given different levels of stream volatility. (2) We show that the accuracy of VACS is comparable to state-of-the-art techniques, while maintaining a low computational cost. To the best of our knowledge, this is the first data stream learning technique that uses stream volatility to adjust model adaptation behaviour, reducing computational cost while maintaining high model quality.




Monday, January 23, 2017

Rare Pattern Mining


Association rule mining predominantly focuses on finding frequent co-occurring associations among a collection of items. It is sometimes referred to as “Market Basket Analysis”, as one of the original motivations of association rule mining was to mine supermarket transactions. The major aim is to find associations of items that occur together more often than you would expect from random probability. A classic example of this application is the famous Beer and Diapers association. :)

Over the years, people have increasingly used association rule mining to find rare patterns. Detecting rare patterns in data is a vital task, with numerous high-impact applications in areas including medicine, finance, and security. The main difficulty in finding rare patterns or rare association rules is that the mining process can be likened to finding a needle in a haystack [1]. The difficulty of looking for a small needle is multiplied by the fact that the needle is hidden by a large amount of hay.

Rare rules pose difficulties for data mining systems for a variety of reasons. The fundamental problem is the lack of data associated with rare cases: rare rules tend to cover only a few examples or data points. The lack of data makes detecting rare cases even more difficult and, even when rare cases are detected, it is hard to meaningfully generalize the results.

Overall, based on pattern diversity, rare patterns fall into two major categories: basic and extended. The basic category covers rare itemsets and rare association rules. The extended category covers a range of other pattern types, including subgraph, probabilistic, and sequence patterns.

Some rare patterns may involve sequences and structures. For example, by studying the order in which rare events occur, we may be able to find sequential patterns, that is, rare subsequences in a sequence of ordered events. When diagnosing a patient's condition, for instance, the context is reflected in the sequence of conditions that have appeared across a time period. By mining sequential patterns of symptoms we can capture a patient's condition at a more abstract level, making it easier to track and detect changes in the patient's condition.

In some datasets we also need to consider the existential uncertainty of rare itemsets, that is, the probability that an itemset occurs in a transaction, which makes traditional techniques inapplicable. For example, a medical dataset may contain a table of patient records, each of which contains a set of symptoms and/or illnesses that a patient suffers from. Applying rare association analysis to such a dataset allows us to discover potential correlations among the symptoms and illnesses of rare diseases. In these cases, symptoms are best represented by probabilities, based on historical data statistics, that indicate their presence in the patients' records. This type of data is known as uncertain data.
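A common way to mine such uncertain transactions is to replace exact support with expected support: each transaction contributes the product of its items' existential probabilities (assuming independence within a transaction). A small sketch, with item names and probabilities invented for illustration:

```python
from math import prod

def expected_support(itemset, transactions):
    """Expected support of an itemset over uncertain transactions.
    Each transaction maps item -> existential probability; items within
    a transaction are assumed independent."""
    return sum(
        prod(t.get(item, 0.0) for item in itemset)  # 0 if item absent
        for t in transactions
    )

# Hypothetical patient records with symptom probabilities:
patients = [
    {"fever": 0.9, "rash": 0.4},
    {"fever": 0.5, "rash": 0.8},
    {"rash": 0.3},
]
# Contributions: 0.9*0.4 + 0.5*0.8 + 0 = 0.76
print(expected_support({"fever", "rash"}, patients))
```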

There are now various techniques that focus on finding these rare association rules. I cover a range of them in my survey paper.

[1] Gary M. Weiss. 2004. Mining with rarity: a unifying framework. SIGKDD Explorations Newsletter 6, 1 (2004), 7–19.

Sunday, December 4, 2016

Concept Profiling Framework for Recurring Drifts in Data Streams

This paper was published at the Australasian Conference on Artificial Intelligence 2016 by Robert Anderson, Yun Sing Koh, and Gillian Dobbie. Attached is the abstract and a brief summary of the paper.

Abstract

We propose the Concept Profiling Framework (CPF), a meta-learner that uses a concept drift detector and a collection of classification models to perform effective classification on data streams with recurrent concept drifts, by relating models through the similarity of their classifying behaviour. We introduce a memory-efficient version of our framework and show that it can operate faster and with less memory than a naive implementation while achieving similar accuracy. We compare this memory-efficient version of CPF to a state-of-the-art meta-learner designed to handle recurrent drift and show that we regularly achieve improved classification accuracy along with reduced runtime and memory use. We provide results from testing on synthetic and real-world datasets to demonstrate CPF's value in classifying data streams with recurrent concepts.

Summary


We present the Concept Profiling Framework (CPF). This is a meta-learning approach that maintains a collection of classifiers and uses a drift detector. When our drift detector signals a drift state, i.e. that our current classifier is no longer suitable, we check our collection of classifiers for one better suited to the current stream. If one meets a set level of accuracy, we select it as the current classifier; otherwise a new classifier is produced and trained on recent data. If this new classifier behaves similarly to a classifier in our collection, we choose that existing classifier as our current model; otherwise we add the new classifier to the collection and use it as our current classifier.

We introduce two techniques to allow efficient handling of recurrent concepts. First, we regularly compare behaviour of our classifiers, and over time, our certainty about their similarity will improve. If they behave similarly, we can use the older model to represent the newer one. Second, we implement a fading mechanism to constrain the number of models, a points-based system that retains models that are recent or frequently used. Through observing reuse patterns, we can understand how patterns recur in our stream.
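A minimal sketch of such a points-based fading scheme follows; the decay, bonus, and capacity values are illustrative parameters, not the ones used in CPF.

```python
def update_fade_points(points, reused_id, decay=1, bonus=3, max_models=5):
    """Points-based fading: on each drift every model loses `decay`
    points, the reused model earns `bonus`, and the lowest-scoring
    model is evicted once the collection exceeds `max_models`."""
    points = {m: p - decay for m, p in points.items()}
    if reused_id in points:
        points[reused_id] += bonus  # reward recent reuse
    while len(points) > max_models:
        victim = min(points, key=points.get)  # least recent/frequent
        del points[victim]
    return points
```

This keeps the collection bounded while favouring models that are recent or frequently reused, which is what lets the framework track recurring concepts cheaply.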


The figure above describes the framework in further detail. We use a meta-learning framework with a collection of one or more incremental classifiers, one of which is designated as our current classifier. A drift detector signals warning and drift states. On a warning state, the meta-learner stops training the current classifier and stores instances from the data stream in a buffer. If a drift state follows, the meta-learner looks for an existing model in the collection that classifies the warning buffer accurately enough to use as the current classifier. If it cannot find one, it creates a new model trained on the even-indexed buffer instances. If an existing model behaves similarly to this new model (when tested on the odd-indexed buffer instances), that model is reused; otherwise the new model is also trained on the odd-indexed buffer instances and used. Every model in the collection is tested on the buffer, and the results are compared and stored. Where models are found to classify similarly to one another, the older model will represent the newer one.
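The reuse decision at a drift signal can be sketched roughly as follows. The function and parameter names are illustrative stand-ins, not CPF's actual interface.

```python
def on_drift(collection, buffer, accuracy_floor, similar, train_new):
    """Sketch of a CPF-style reuse decision at a drift signal.
    `buffer` holds instances gathered during the warning state."""
    # 1. Prefer an existing model that classifies the buffer well enough.
    for model in collection:
        if model.accuracy_on(buffer) >= accuracy_floor:
            return model
    # 2. Otherwise train a candidate on the even-indexed instances...
    candidate = train_new(buffer[0::2])
    # ...and reuse an existing model if it behaves like the candidate
    # on the odd-indexed instances.
    for model in collection:
        if similar(model, candidate, buffer[1::2]):
            return model
    # 3. No match: finish training the candidate and add it.
    candidate.train(buffer[1::2])
    collection.append(candidate)
    return candidate
```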

Sunday, November 20, 2016

Proactive Drift Detection: Predicting Concept Drifts in Data Streams using Probabilistic Networks


This particular paper was published at the International Joint Conference on Neural Networks (IJCNN) 2016 by Kylie Chen, Yun Sing Koh, and Patricia Riddle. I presented it in Vancouver in July this year. Attached is the abstract and a brief summary of the paper. Enjoy!

Abstract

The application of current drift detection methods to real data streams shows trends in the rate of change found by the detectors. We observe that these patterns of change vary across different data streams, and we use the term stream volatility pattern to describe change rates with a distinct mean and variance. First, we propose a novel drift prediction algorithm to predict the location of future drift points based on historical drift trends, which we model as transitions between stream volatility patterns. Our method uses a probabilistic network to learn drift trends and is independent of the drift detection technique. We demonstrate that our method is able to learn and predict drift trends in streams with reoccurring stream volatility patterns. This allows the anticipation of future changes, which enables users and detection methods to be more proactive. Second, we apply our drift prediction algorithm by incorporating the drift estimates into a drift detector, ProSeed, improving its performance by decreasing the false positive rate.

Summary

The main contributions of our work are: (1) a drift prediction algorithm that can accurately learn the drift trends of a stream and (2) a drift detector incorporating historical drift rate information that is accurate for streams with reoccurring volatility trends. We analyze our drift prediction technique by comparing it to ground truth in synthetic data streams and show that it can accurately capture trends for streams with reoccurring volatility patterns. We evaluate the performance of our drift detector by comparing it against state-of-the-art detectors on synthetic and real data streams and show that our technique is able to lower the rate of false positives for streams with these trends.

Current drift detection techniques follow a specific framework as shown in the figure below.


We modify the current framework to incorporate an additional drift point prediction mechanism (the Drift Prediction Method) that feeds into our new proactive drift detector. This drift point prediction mechanism uses a probabilistic network. The prediction is then used to develop proactive drift detection methods. We carry out this adaptation on the previous drift detector, SEED.
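As a rough illustration of the idea, a first-order transition model over volatility pattern labels can stand in for the probabilistic network: learn transition probabilities between patterns, then predict the most likely next pattern. This is a deliberate simplification of the paper's method, with invented labels.

```python
from collections import defaultdict

def learn_transitions(pattern_sequence):
    """Learn first-order transition probabilities between volatility
    pattern labels observed in order."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(pattern_sequence, pattern_sequence[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def predict_next(model, current):
    """Most likely next pattern given the current one, or None."""
    nxt = model.get(current, {})
    return max(nxt, key=nxt.get) if nxt else None

# Hypothetical sequence of observed volatility patterns:
model = learn_transitions(["low", "high", "low", "high", "low"])
print(predict_next(model, "low"))  # → "high"
```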





Relation to other research

Our research differs from research in drift detection with reoccurring patterns: their methods aim to detect models that reoccur, whereas our method aims to learn the characteristics of drift rate trends. For example, suppose we are trying to learn the concept of seasons. Research in reoccurring patterns focuses on the order in which concepts reoccur, such as: spring, summer, autumn, winter, spring. Our research looks at the rate of concept change, that is, the time period between season changes. Unlike research in temporal forecasting for seasonal patterns, we do not assume there is any seasonal effect to the changes, and the trends do not necessarily occur periodically.

You can read the full paper here.