Dr Thomas Roelleke

Senior Lecturer

School of Electronic Engineering and Computer Science
Queen Mary University of London

Research

Information retrieval (IR) and probability theory; structured, semantic and knowledge-oriented IR; integration of data management technologies (DB+IR, In-DB IR/ML/AI); generalisations of probabilistic concepts

Interests

My research interest lies in information retrieval (IR). IR is related to data and information management, database (DB) technology, machine learning (ML) and AI. My research expertise and contributions are in the following areas:
1. probabilistic IR models and probability theory
2. structured, semantic and knowledge-oriented retrieval
3. integration of technologies (DB+IR, In-DB IR/ML)
4. modelling of uncertainty in data (probabilistic databases)
5. generalisations of ranking functions and probabilistic reasoning

IR models (ranking functions, e.g. BM25) are rooted in probability and information theory, but apply some "magic" quantifications and logarithmic expressions to achieve good retrieval quality. My research focuses on explaining these models and bringing them up to mathematical standards. Publications include "IR Models: Foundations and Relationships" (Morgan & Claypool book, 2013), "Harmony Assumptions" (Computer Journal, 2015), "TF-IDF Uncovered" (ACM SIGIR 2008), "General Matrix Framework" (IP&M Journal, 2006), and "The Probability of Being Informative" (ACM SIGIR 2003). My long-term research aim is to find the undiscovered parts of mathematics that explain the connection between ranking functions and probability theory.

Database-oriented research includes the integration of DB and IR (and ML and AI), which remains an ongoing research challenge. The areas and methods are closely related, yet surprisingly different and separate. My contributions include probabilistic object-relational, logic-based knowledge representations (Retrieval of Complex Objects, and various publications) that help solve semantic and knowledge-oriented (so-called complex) information management tasks. Under the remit of DB+IR (in recent terminology, In-DB IR/ML), this led to a patented technology: the "Relational Bayes" (VLDB Journal 2008, extended SQL, WHERE ASSUMPTION IS MAX_INFORMATIVE).

Recent publications focus on probabilistic, information-theoretic and structured IR in the context of investigative IR (Journal of Information Systems, 2023), and the Dirichlet-multinomial modelling of recommendation and urgency (Big Data, ML and Intelligent Systems, Frontiers of AI, 2021).

Publications

Publications marked "Relevant publication" are of specific relevance to the Centre for Multimodal AI.

2024

Relevant publication: Ketola T and Roelleke T (2024). Document structure-driven investigative information retrieval. Information Systems, Elsevier vol. 121, 102315-102315.
01-03-2024

2023

Relevant publication: Ketola T and Roelleke T (2023). Automatic and Analytical Field Weighting for Structured Document Retrieval. Advances in Information Retrieval, 489-503.
01-01-2023

2022

Relevant publication: Ketola T and Roelleke T (2022). Formal Constraints for Structured Document Retrieval. Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval.
23-08-2022

2021

Bahrani M and Roelleke T (2021). ADOR: A New Medical Dataset for Sentiment-based IR. CIKM’21: Fourth Workshop on Knowledge-driven Analytics and Systems Impacting Human Quality of Life.
01-11-2021
Relevant publication: Bahrani M and Roelleke T (2021). Opinion-Aware Retrieval Models Based on Sentiment and Intensity of Lexical Features. Modern Management based on Big Data II and Machine Learning and Intelligent Systems III.
29-10-2021

2020

Bahrani M and Roelleke T (2020). FDCM. Proceedings of the 29th ACM International Conference on Information & Knowledge Management.
19-10-2020
Ketola T and Roelleke T (2020). BM25-FIC: Information content-based field weighting for BM25F.
01-01-2020

2018

Relevant publication: Lipani A, Roelleke T, Lupu M and Hanbury A (2018). A systematic approach to normalization in probabilistic models. Inf Retr Boston, Springer vol. 21 (6), 565-596.
30-06-2018
Gray PMD (2018). P/FDM. Encyclopedia of Database Systems, 2643-2644.
01-01-2018
Relevant publication: Roelleke T, Wang J and Robertson S (2018). Probabilistic Retrieval Models and Binary Independence Retrieval (BIR) Model. Encyclopedia of Database Systems, 2839-2845.
01-01-2018

2016

Frommholz I and Roelleke T (2016). Scalable DB+IR Technology: Processing Probabilistic Datalog with HySpirit. Datenbank-Spektrum, Springer Nature vol. 16 (1), 39-48.
26-01-2016
Roelleke T, Wang J and Robertson S (2016). Probabilistic Retrieval Models and Binary Independence Retrieval (BIR) Model. Encyclopedia of Database Systems, 1-7.
01-01-2016

2015

Milajevs D, Sadrzadeh M and Roelleke T (2015). IR meets NLP. Proceedings of the 2015 International Conference on The Theory of Information Retrieval.
27-09-2015
Roelleke T, Kaltenbrunner A and Baeza-Yates R (2015). Harmony Assumptions in Information Retrieval and Social Networks. The Computer Journal, Oxford University Press (OUP) vol. 58 (11), 2982-2999.
14-05-2015

2013

Martinez-Alvarez M, Bonzanini M and Roelleke T (2013). Mathematical Specification and Logic Modelling in the context of IR. Proceedings of the 2013 Conference on the Theory of Information Retrieval.
29-09-2013
Roelleke T (2013). IR Models. Proceedings of the 2013 Conference on the Theory of Information Retrieval.
29-09-2013
Roelleke T, Bonzanini M and Martinez-Alvarez M (2013). On the modelling of ranking algorithms in probabilistic datalog. Proceedings of the 7th International Workshop on Ranking in Databases.
30-08-2013
Bonzanini M, Martinez-Alvarez M and Roelleke T (2013). Extractive summarisation via sentence removal. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval.
28-07-2013
Roelleke T (2013). Information Retrieval Models.
26-07-2013
Martinez-Alvarez M, Bellogin A and Roelleke T (2013). Document Difficulty Framework for Semi-automatic Text Classification.
01-01-2013
Roelleke T, Azzam H, Bonzanini M, Martinez-Alvarez M and Lalmas M (2013). The D2Q2 framework: On the relationship and combination of language modelling and TF-IDF.
01-01-2013
Roelleke T (2013). Information Retrieval Models, Foundations and Relationships.
01-01-2013

2012

Bonzanini M, Martinez-Alvarez M and Roelleke T (2012). Investigating the use of extractive summarisation in sentiment classification.
01-12-2012
Bonzanini M, Martinez-Alvarez M and Roelleke T (2012). Opinion summarisation through sentence extraction. Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval.
12-08-2012
Roelleke T (2012). IR models. Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval.
12-08-2012
Azzam H, Yahyaei S, Roelleke T and Bonzanini M (2012). A Schema-driven Approach for Knowledge-oriented Retrieval and Query Formulation. KEYS 2012, The 3rd International Workshop on Keyword Search and Structured Data, Scottsdale, Arizona, USA, 20 May 2012.
01-01-2012
Martinez-Alvarez M, Yahyaei S and Roelleke T (2012). Semi-automatic document classification.
01-01-2012

2011

Azzam H, Roelleke T and Yahyaei S (2011). Ranking-based processing of SQL queries. Proceedings of the 20th ACM international conference on Information and knowledge management.
24-10-2011
Relevant publication: Smeraldi F, Martinez-Alvarez M, Frommholz I and Roelleke T (2011). On the probabilistic logical modelling of quantum and geometrically-inspired IR.
01-01-2011
Azzam H and Roelleke T (2011). A Generic Data Model for Schema-Driven Design in Information Retrieval Applications.
01-01-2011
Yahyaei S, Bonzanini M and Roelleke T (2011). Cross-Lingual Text Fragment Alignment Using Divergence from Randomness.
01-01-2011
Martinez-Alvarez M and Roelleke T (2011). A Descriptive Approach to Classification.
01-01-2011
Azzam H, Klampanos IA and Roelleke T (2011). Large-Scale Logical Retrieval: Technology for Semantic Modelling of Patent Search. Current Challenges in Patent Information Retrieval, 181-195.
01-01-2011
Blank D, Fuhr N, Henrich A, Mandl T, Rölleke T, Schütze H and Stein B (2011). Teaching IR: Curricular Considerations. Teaching and Learning in Information Retrieval, 31-46.
01-01-2011

2010

Azzam H and Roelleke T (2010). An attribute-based model for semantic retrieval.
01-12-2010
Azzam H and Roelleke T (2010). SQR. Proceedings of the third workshop on Exploiting semantic annotations in information retrieval.
30-10-2010
Gurrin C, He Y, Kazai G, Kruschwitz U, Little S, Roelleke T, Rüger S and Van Rijsbergen K (2010). Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface.
20-05-2010
Martinez-Alvarez M and Roelleke T (2010). Modelling Probabilistic Inference Networks and Classification in Probabilistic Datalog.
01-01-2010
Gurrin C, He Y, Kazai G, Kruschwitz U, Little S, Roelleke T, Rüger S and Van Rijsbergen K (2010). Recent developments in information retrieval.
01-01-2010
Klampanos IA, Wu HZ, Roelleke T and Azzam H (2010). Logic-Based Retrieval: Technology for Content-Oriented and Analytical Querying of Patent Data. Editors: Cunningham H, Hanbury A and Ruger S.
01-01-2010
Gurrin C, He YL, Kazai G, Kruschwitz U, Little S, Roelleke T, Ruger S and van Rijsbergen K (2010). Recent Developments in Information Retrieval. Editors: Gurrin C, He Y, Kazai G, Kruschwitz U, Little S, Roelleke T, Ruger S and van Rijsbergen K.
01-01-2010

2009

Klampanos IA, Azzam H and Roelleke T (2009). A case for probabilistic logic for scalable patent retrieval. Proceedings of the 2nd international workshop on Patent information retrieval.
06-11-2009
Forst JF, Tombros A and Roelleke T (2009). Less Is More: Maximal Marginal Relevance as a Summarisation Feature. Editors: Azzopardi L, Kazai G, Robertson S, Ruger S, Shokouhi M, Song D and Yilmaz E.
01-01-2009
Gray PMD (2009). P/FDM. Encyclopedia of Database Systems, 2011-2012.
01-01-2009
Roelleke T, Wang J and Robertson S (2009). Probabilistic Retrieval Models and Binary Independence Retrieval (BIR) Model. Encyclopedia of Database Systems, 2156-2160.
01-01-2009
Wu HZ and Roelleke T (2009). Semi-subsumed Events: A Probabilistic Semantics of the BM25 Term Frequency Quantification. Editors: Azzopardi L, Kazai G, Robertson S, Ruger S, Shokouhi M, Song D and Yilmaz E.
01-01-2009

2008

Amer-Yahia S, Hiemstra D, Roelleke T, Srivastava D and Weikum G (2008). DB&IR integration. ACM SIGIR Forum, Association for Computing Machinery (ACM) vol. 42 (2), 84-89.
30-11-2008
Amer-Yahia S, Hiemstra D, Roelleke T, Srivastava D and Weikum G (2008). DB&IR integration. ACM SIGMOD Record, Association for Computing Machinery (ACM) vol. 37 (3), 46-49.
30-09-2008
Amer-Yahia S, Hiemstra D, Roelleke T, Srivastava D and Weikum G (2008). DB&IR Integration: Report on the Dagstuhl Seminar Ranked XML Querying. SIGMOD Record vol. 37 (3), 46-49.
01-09-2008
Roelleke T and Wang J (2008). TF-IDF Uncovered: A Study of Theories and Probabilities. 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore.
01-01-2008
Roelleke T, Wu H, Wang J and Azzam H (2008). Modelling retrieval models in a probabilistic relational algebra with a new operator: the relational Bayes. VLDB J vol. 17 (1), 5-37.
01-01-2008
Amer-Yahia S, Hiemstra D, Roelleke T, Srivastava D and Weikum G (2008). DB&IR Integration: Report on the Dagstuhl Seminar Ranked XML Querying.
01-01-2008

2007

Forst JF, Roelleke T and Tombros A (2007). Modelling a summarisation logic in probabilistic datalog.
01-12-2007
(2007). TOIS reviewers January 2006 through May 2007. ACM Transactions on Information Systems, Association for Computing Machinery (ACM) vol. 25 (4), 15-es.
01-10-2007

2006

Forst JF, Tombros A and Rölleke T (2006). Solving the enterprise TREC task with probabilistic data models.
01-12-2006
Roelleke T and Wang J (2006). A Parallel Derivation of Probabilistic Retrieval Models. 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, US.
27-08-2006
Rolleke T, Tsikrika T and Kazai G (2006). A general matrix framework for modelling Information Retrieval. Information Processing & Management vol. 42 (1), 4-30.
01-01-2006
Wang J and Roelleke T (2006). Context-specific frequencies and discriminativeness for the retrieval of structured documents. Editors: Lalmas M, MacFarlane A, Ruger S, Tombros A, Tsikrika T and Yavlinsky A.
01-01-2006

2005

Amer-Yahia S, Case P, Rolleke T, Shanmugasundaram J and Weikum G (2005). Report on the DB/IR panel at SIGMOD 2005. SIGMOD Record vol. 34 (4), 71-74.
01-12-2005
Roelleke T, Ashoori E, Wu H and Cai Z (2005). The QMUL team with probabilistic SQL at enterprise track.
01-12-2005
Roelleke T and de Vries A (2005). Relevance Information: A Loss of Entropy but a Gain for IDF? 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil.
17-08-2005
Szlavik Z and Rolleke T (2005). Building and experimenting with a heterogeneous collection. Editors: Fuhr N, Lalmas M, Malik S and Szlavik Z.
01-01-2005

2004

Baeza-Yates R, Maarek YS, Roelleke T and de Vries AP (2004). Third edition of the XML and information retrieval workshop first workshop on integration of IR and DB (WIRD) jointly held at SIGIR'2004, Sheffield, UK, July 29th, 2004. ACM SIGIR Forum, Association for Computing Machinery (ACM) vol. 38 (2), 24-30.
01-12-2004
Lalmas M and Rolleke T (2004). Modelling vague content and structure querying in XML retrieval with a probabilistic object-relational framework. Editors: Christiansen H, Hacid MS, Andreasen T and Larsen HL.
01-01-2004

2003

Roelleke T (2003). A Frequency-based and a Poisson-based Definition of the Probability of Being Informative. 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, Canada.
31-07-2003
Lalmas M and Rolleke T (2003). Four-valued knowledge augmentation for structured document retrieval.
01-02-2003
Lalmas M, Roelleke T and Ruthven I (2003). Abductive retrieval for multimedia information seeking. 10th International Conference on Human - Computer Interaction, HCI International, Crete, Greece, vol. 4.
01-01-2003
Lalmas M, Rölleke T and Fuhr N (2003). Intelligent Retrieval of Hypermedia Documents. Intelligent Exploration of the Web, 324-344.
01-01-2003
Roelleke T (2003). A Frequency-based and a Poisson-based Definition of the Probability of Being Informative.
01-01-2003

2002

Pearmain A, Lalmas M, Moutogianni E, Papworth D, Healey P and Rolleke T (2002). Using MPEG-7 at the consumer terminal in broadcasting. EURASIP J APPL SIG P vol. 2002 (4), 354-361.
01-04-2002
Lalmas M and Roelleke T (2002). Four-valued knowledge augmentation for representing structured documents. Editors: Hacid MS, Ras ZW, Zighed DA and Kodratoff Y.
01-01-2002
Healey P, Lalmas M, Roelleke T, Papworth D, Moutogianni E and Pearmain A (2002). Using MPEG7 at the Consumer Terminal in Broadcasting. European Association for Signal, Speech and Image Processing Journal of Applied Signal Processing vol. Issue 4, 354-361.
01-01-2002
Roelleke T, Lalmas M, Kazai G, Ruthven I and Quicker S (2002). The accessibility dimension for structured document retrieval. Editors: Crestani F, Girolami M and van Rijsbergen CJ.
01-01-2002
Lalmas M, Roelleke T and Fuhr N (2002). Intelligent Hypermedia Retrieval. Intelligent Exploration of the Web.
01-01-2002
Kazai G, Lalmas M and Roelleke T (2002). Focussed Structured Document Retrieval. String Processing and Information Retrieval, 241-247.
01-01-2002

2001

Rölleke T, Lübeck R and Kazai G (2001). The HySpirit retrieval platform. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval.
01-09-2001
Pearmain A, Lalmas M, Moutogianni E, Papworth D, Healey P and Rolleke T (2001). Using MPEG-7 at the consumer terminal in broadcasting. Editors: Izquierdo E. WIAMIS 2001 Workshop on Image Analysis for Multimedia Services, Tampere, Finland, 16 May 2001 - 17 May 2001.
01-01-2001
Kazai G, Lalmas M and Rolleke T (2001). A model for the representation and focussed retrieval of structured documents based on fuzzy aggregation.
01-01-2001
Lalmas M, Rolleke T, Turra F and Fuhr N (2001). Concepts for a graphical user interface for hypermedia retrieval. Editors: Larsen HL, Kacprzyk J, Zadrozny S, Andreasen T and Christiansen H.
01-01-2001

1998

Fuhr N, Gövert N and Rölleke T (1998). DOLORES: a system for logic-based retrieval of multimedia objects.
01-08-1998
Fuhr N, Gövert N and Rölleke T (1998). DOLORES: A System for Logic-Based Retrieval Objects.
01-08-1998
Rölleke T and Fuhr N (1998). Querying for facts and content in hypermedia documents.
01-01-1998
Fuhr N and Rölleke T (1998). HySpirit - A probabilistic inference engine for hypermedia retrieval in large databases.
01-01-1998

1997

Fuhr N and Rölleke T (1997). A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Transactions on Information Systems, Association for Computing Machinery (ACM) vol. 15 (1), 32-66.
01-01-1997

1996

Roelleke T and Fuhr N (1996). Retrieval of complex objects using a four-valued logic.
01-12-1996