The rate at which enterprises and governments accumulate data has never been as high as it is today. In many cases, these data are collected for specific purposes, e.g. to maintain a tax record for each worker in a country. More often than not though, these data are collected for no specific purpose, or they turn out to be useful for other unintended purposes. E.g., companies constantly look for new ways to monetize their customer relationship management database; governments mine various databases to detect tax fraud; security agencies mine and cross-associate numerous heterogeneous information streams from publicly accessible (e.g. social networks) as well as classified databases to understand and detect security threats. We call such data mining tasks exploratory data mining tasks. In contrast with the use of data for specific anticipated purposes, in exploratory data mining the objective is ill-defined. In other words, it is unclear how to formalize how interesting a ‘nugget of information’ extracted from the data is (we will refer to such nuggets of information as ‘patterns’). As a result, exploratory data mining is often a slow process of trial and error by the practitioner.
This project is concerned with the development of the mathematical principles of what makes a pattern interesting in a very subjective sense. Crucial in this endeavour will be the design of automatic mechanisms to understand and duly consider the prior beliefs and goals of the practitioner for whom the pattern is intended, relieving the practitioner of the task to attempt to formalize these themselves.
This ERC project will represent a radical change in how exploratory data mining is done. Currently, researchers typically dream up an imagined purpose for the patterns, try to formalize interestingness of such patterns given that purpose, and design an algorithm to mine them. Given the variety of practitioners (each with their own prior beliefs and purposes), this strategy has led to a multitude of algorithms. As a result, practitioners need to be data mining experts to understand which algorithm applies to their situation. Instead, we will develop a theoretically solid approach to design algorithms that automatically learn to understand the user (and their prior beliefs and goals) as much as the data itself, so as to maximize the amount of useful information transmitted to the user. This will ultimately bring the power of exploratory data mining within reach of the non-expert practitioner.