Example PhD

Anomaly Detection in Social Data

Supervisor: Professor N.J. Avis

Keywords: Anomaly Detection, Data Mining, Visualization, Social Network Data Analysis

This proposal aims to extend the software toolkit contained within the COSMOS framework (http://www.cardiff.ac.uk/socsi/research/researchgroups/comsc-socsi/projects.html), which has resulted from the ongoing collaboration between the Schools of Computer Science and Informatics and Social Sciences at Cardiff University, by adding the ability to detect anomalies in time sequence datasets automatically. By anomalies we mean aberrant or unusual behaviour. In the context of social media data this might relate to usage patterns, the number of messages sent, the fan-out ratio, etc., within a given time period. Similar techniques are used by credit card firms and banks to highlight unusual spending patterns by customers, which may indicate that the credit card has been stolen (or that the card holder has gone on vacation). This approach has also been used within healthcare to look for anomalies in patient data, such as electrocardiogram signals. Unlike in system monitoring applications linked to high-value assets, a relatively high false positive rate can be tolerated here; this is one of the assumptions to be verified as part of the project. The student will also make use of existing datasets that are being collected as part of the COSMOS framework to validate their findings.

Such a system would be constructed in two phases. In the first phase, representative datasets will be presented to human users in a form that lets them take in a large number of samples and apply their highly developed pattern-matching skills to determine what is “normal” and what is “exceptional or anomalous” behaviour. We will use large (human-scale) visual display systems and information visualization techniques to allow users to inspect and classify large numbers of instances of behaviour and performance data. In the second phase, this classified dataset will be used as a training set for various machine learning systems, allowing them to build an internal representation of normal and abnormal behaviour and to detect each. We will use the WEKA framework to explore different machine learning strategies and to characterise system performance in terms of true and false positives when the system is exposed to classified datasets not previously used in training.
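
As an illustration of the evaluation step described above, the following is a minimal Python sketch of how true and false positive rates might be computed from a held-out set of human-classified samples. It is independent of WEKA; the function names, the label strings "normal" and "anomalous", and the sample data are assumptions made for the example and are not part of the COSMOS toolkit.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="anomalous"):
    """Count true/false positives and negatives for one 'positive' label.

    y_true: labels assigned by human classifiers (hypothetical data).
    y_pred: labels predicted by a trained model for the same samples.
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def rates(counts):
    """Return (true positive rate, false positive rate)."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical held-out labels, not used during training.
human_labels = ["normal", "anomalous", "normal", "anomalous", "normal"]
model_labels = ["normal", "anomalous", "anomalous", "normal", "normal"]

counts = confusion_counts(human_labels, model_labels)
tpr, fpr = rates(counts)
```

Because the proposal assumes that a relatively high false positive rate is tolerable, a sketch like this would let that assumption be checked by comparing the two rates across different learning strategies.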

Of particular interest will be the investigation of Extreme Value Methods and their extension, Extreme Function Methods. Additionally, approaches such as SAX (Symbolic Aggregate approXimation) will be explored as a means of identifying exceptional behaviours.
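
To give a flavour of the SAX representation mentioned above, here is a minimal Python sketch of its usual three steps: z-normalise the series, average it over equal-length segments (piecewise aggregate approximation), and map each segment mean to a letter using breakpoints that divide the standard normal distribution into equiprobable regions. The function name `sax`, the default four-letter alphabet, and the requirement that the series length be divisible by the number of segments are simplifying assumptions for this example only.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric time series into a SAX word (illustrative sketch)."""
    if len(series) % n_segments:
        # Simplifying assumption: equal-length segments only.
        raise ValueError("series length must be divisible by n_segments")

    # Step 1: z-normalise so the series has zero mean and unit variance.
    mu, sigma = mean(series), pstdev(series)
    z = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]

    # Step 2: piecewise aggregate approximation (mean of each segment).
    seg_len = len(z) // n_segments
    paa = [mean(z[i:i + seg_len]) for i in range(0, len(z), seg_len)]

    # Step 3: breakpoints splitting N(0, 1) into equiprobable regions,
    # one region per letter of the alphabet.
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


# Hypothetical message counts per time period: low activity, then high.
print(sax([0, 0, 0, 0, 10, 10, 10, 10], 4))  # prints "aadd"
```

Once a series is reduced to a short word in this way, unusual periods can be sought by looking for rare or previously unseen words, which is one route by which a symbolic representation could support the detection of exceptional behaviour.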

Key Skills/Background: Open to Computing Graduates and Postgraduates

Contact: Professor N.J. Avis to discuss this research topic.