Dr Thomas Roelleke

Senior Lecturer

School of Electronic Engineering and Computer Science
Queen Mary University of London

Research

information retrieval (IR) and probability theory, structured, semantic and knowledge-oriented IR, integration of data management technologies (DB+IR/In-DB IR/ML/AI), generalisations of probabilistic concepts

Interests

My research interest lies in information retrieval (IR). IR is related to data and information management, database (DB) technology, machine learning (ML) and AI. My research expertise and contributions are in the following areas:
1. probabilistic IR models and probability theory
2. structured, semantic and knowledge-oriented retrieval
3. integration of technologies (DB+IR, In-DB IR/ML)
4. modelling of uncertainty in data (probabilistic databases)
5. generalisations of ranking functions and probabilistic reasoning

IR models (ranking functions, e.g. BM25) are rooted in probability and information theory, but apply some "magic" quantifications and logarithmic expressions to achieve good retrieval quality. My research focuses on explaining these models and on bringing them up to mathematical standards. Publications include "IR Models: Foundations and Relationships" (Morgan & Claypool book, 2013), "Harmony Assumptions" (Computer Journal, 2015), "TF-IDF Uncovered" (ACM SIGIR 2008), "General Matrix Framework" (IP&M Journal) and "The Probability of Being Informative" (ACM SIGIR 2003). My long-term research aim is to find the undiscovered parts of mathematics that explain the connection between ranking functions and probability theory.

My database-oriented research addresses the integration of DB and IR (and ML and AI), which remains an ongoing research challenge. The areas and methods are closely related, yet surprisingly different and separate. My contributions include probabilistic object-relational, logic-based knowledge representations ("Retrieval of Complex Objects" and various other publications) that help solve semantic and knowledge-oriented (so-called complex) information management tasks. Under the remit of DB+IR (in recent terminology, In-DB IR/ML), this work led to a patented technology: the "Relational Bayes" (VLDB Journal 2008, extended SQL, WHERE ASSUMPTION IS MAX_INFORMATIVE).

Recent publications focus on probabilistic, information-theoretic and structured IR in the context of investigative IR (Journal of Information Systems, 2023), and the Dirichlet-multinomial modelling of recommendation and urgency (Big Data, ML and Intelligent Systems, Frontiers of AI, 2021).