|Title:||Assessing Compliance of Web Pages using Machine Learning|
The project will deliver an application that crawls a given set of web domains (100+), finds pages displaying compliance-related data, and categorises them as compliant or non-compliant using a combination of machine learning and rule-based approaches. Features for classification will be extracted from the web pages using natural language processing (NLP); in particular, named entity recognition and basic information extraction are expected to be used. The main concepts in the web pages will be formally modelled in a small ontology to support the semantic elements of the NLP. The potential benefit of such a system is a dramatic reduction in the manual workload of checking that disparate organisations display data to the required standard.
|Moderator:||Helen R Phillips|
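
The classification step described in the project summary could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's design: the required items (`company_number`, `vat_number`, `registered_address`) and their patterns are hypothetical, regular expressions stand in for named entity recognition, and `model` is a placeholder for whatever trained classifier the project might use. Clear-cut pages are decided by rules, and only partial matches are passed to the model.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical compliance items: the source does not specify the real criteria.
# Regular expressions stand in here for named entity recognition.
PATTERNS = {
    "company_number": re.compile(
        r"\bcompany\s+(?:registration\s+)?(?:number|no\.?)\s*:?\s*\d{6,8}\b", re.I
    ),
    "vat_number": re.compile(
        r"\bVAT\s+(?:registration\s+)?(?:number|no\.?)\s*:?\s*(?:GB)?\s*\d{9}\b", re.I
    ),
    "registered_address": re.compile(r"\bregistered\s+(?:office|address)\b", re.I),
}


def extract_features(text: str) -> Dict[str, bool]:
    """Return one boolean feature per required item found in the page text."""
    return {name: bool(pattern.search(text)) for name, pattern in PATTERNS.items()}


def classify(
    text: str,
    model: Optional[Callable[[Dict[str, bool]], float]] = None,
    threshold: float = 0.5,
) -> str:
    """Label a page using rules first, then an optional model for unclear cases."""
    features = extract_features(text)

    # Rule-based decisions for the clear-cut cases.
    if all(features.values()):
        return "compliant"
    if not any(features.values()):
        return "non-compliant"

    # Partial matches: defer to the (assumed) trained model if one is supplied.
    # The model is expected to return a compliance score between 0 and 1.
    if model is not None:
        return "compliant" if model(features) >= threshold else "non-compliant"

    # Without a model, treat partial matches conservatively.
    return "non-compliant"
```

In a fuller system the text passed to `classify` would come from the crawler, and the feature names would map to concepts in the ontology rather than to hard-coded patterns.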