DynamicWEB: A Conceptual clustering algorithm for a changing world
Scanlan, J (2011) DynamicWEB: A Conceptual clustering algorithm for a changing world. PhD thesis, University of Tasmania.
This research was motivated by problems in network security, where an attacker
often deliberately changes their identifying information and behaviour in order to
camouflage their malicious behaviour. Addressing this problem has resulted in a new
adaptation of the unsupervised machine learning technique COBWEB.
In machine learning and data mining the aim is to extract patterns from data in order
to discover a meaning underlying the processes that are taking place. In most cases,
each object is observed once, and then the patterns that have been extracted can be
used to classify newly-observed objects. Conceptual clustering aims to do this in
such a way that the patterns that are learned are human readable. Concept drift
algorithms allow concepts to change over time, although most undertake this in a
supervised manner, which presents a challenge when looking for novel classes.
This research focuses on the classification of objects that change over time across
multiple observations. The objects may change their own characteristics (labelled
object drift in this research) or keep the same characteristics but change their
identifier. The concept that describes a group of objects may also change (known as
concept drift): the objects that define the concept change their characteristics, so
the definition of the concept itself changes over time. Beyond its possible
application in the security domain, the method was generalised and tested across
a range of machine learning and data mining domains, and was shown to be robust in
the presence of concept drift.
The ideas of concept drift and object drift are not only relevant within the computer
security field, but can be of significance in any knowledge domain. Therefore, any
method presented to address this learning problem should be generalised enough to
be applicable in many application areas.
The new method, entitled DynamicWEB, extends the existing conceptual clustering
method COBWEB to allow profiles to be added to and removed from the concept
hierarchy. An index structure was implemented using an AVL tree to facilitate fast,
scalable searching of the knowledge structure. As the target objects change over time,
the profiles of each target are updated within the structure, maintaining an
up-to-date representation of the domain. The profiles contain derived attributes, which are
formed across multiple observations of each object, with the aim of retaining
knowledge of how the object has changed over time. As well as preserving context
over time, DynamicWEB uses multiple trees and so transforms the learner into an
In addition to testing the method on the security and network-based datasets, a
number of other datasets were also examined. A new dataset (a modified version of
Quinlan’s weather dataset) is presented in order to illustrate how DynamicWEB
operates in the presence of object drift. The method was also tested on several
well-known machine learning datasets, some of which exhibit concept drift. Along with
these artificial datasets, a group of real-world datasets, including several sourced
from the Australian Bureau of Statistics, was also examined, illustrating
DynamicWEB’s ability to adapt to change.
This thesis describes the work done to enable DynamicWEB to adapt to both concept
drift and object drift, both of which are characteristic of many application domains.
DynamicWEB is also capable of profiling an object across multiple observations to
allow for accurate prediction and inter-object relationship discovery.
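The profile idea described above (derived attributes formed across multiple
observations of each object, located through an ordered index) can be illustrated
with a minimal Python sketch. This is not the thesis implementation. The names
Profile and ProfileIndex, the choice of derived attributes (running mean and
latest change), and the attribute name "packets" are all hypothetical. A sorted
list searched with bisect stands in for the AVL tree, and the COBWEB concept
hierarchy itself is not modelled. The sketch also assumes that every observation
of an object carries the same numeric attributes.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical per-object profile built up from repeated observations."""
    object_id: str
    count: int = 0
    latest: dict = field(default_factory=dict)
    mean: dict = field(default_factory=dict)    # running mean per attribute
    change: dict = field(default_factory=dict)  # latest value minus previous value

    def update(self, observation: dict) -> None:
        # Assumes each observation carries the same numeric attributes.
        self.count += 1
        for name, value in observation.items():
            previous = self.latest.get(name, value)
            old_mean = self.mean.get(name, 0.0)
            # Incremental mean over all observations seen so far.
            self.mean[name] = old_mean + (value - old_mean) / self.count
            self.change[name] = value - previous
            self.latest[name] = value


class ProfileIndex:
    """Ordered index of profiles; a sorted list stands in for the AVL tree."""

    def __init__(self) -> None:
        self._keys = []      # sorted object identifiers
        self._profiles = []  # profiles, kept parallel to _keys

    def _find(self, object_id):
        i = bisect.bisect_left(self._keys, object_id)
        found = i < len(self._keys) and self._keys[i] == object_id
        return i, found

    def get(self, object_id):
        i, found = self._find(object_id)
        return self._profiles[i] if found else None

    def observe(self, object_id, observation):
        """Create the profile on first sight, then fold in the observation."""
        i, found = self._find(object_id)
        if not found:
            self._keys.insert(i, object_id)
            self._profiles.insert(i, Profile(object_id))
        self._profiles[i].update(observation)
        return self._profiles[i]

    def remove(self, object_id) -> bool:
        """Drop a profile, e.g. when its object is no longer observed."""
        i, found = self._find(object_id)
        if found:
            del self._keys[i]
            del self._profiles[i]
        return found
```

Calling observe twice for the same identifier updates one profile rather than
creating two, so the derived attributes record how that object has changed
between observations. In a fuller system the updated profile would then be
re-placed in the concept hierarchy, a step this sketch leaves out.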
|Item Type:||Thesis (PhD)|
|Additional Information:||Copyright © the Author|
|Keywords:||Utas, thesis, machine learning, data mining, computer security|
|Deposited By:||ePrints Officer|
|Deposited On:||02 Sep 2011 14:32|
|Last Modified:||11 Dec 2012 14:27|