Dr Thomas Roelleke
Senior Lecturer
School of Electronic Engineering and Computer Science
Queen Mary University of London
Research
information retrieval (IR) and probability theory, structured, semantic and knowledge-oriented IR, integration of data management technologies (DB+IR, In-DB IR/ML/AI), generalisations of probabilistic concepts
Interests
My research interest lies in information retrieval (IR). IR is related to data and information management, database (DB) technology, machine learning (ML) and AI. My research expertise and contributions are in the following areas:
1. probabilistic IR models and probability theory
2. structured, semantic and knowledge-oriented retrieval
3. integration of technologies (DB+IR, In-DB IR/ML)
4. modelling of uncertainty in data (probabilistic databases)
5. generalisations of ranking functions and probabilistic reasoning
IR models (ranking functions, e.g. BM25) are rooted in probability and information theory, but apply some magic quantifications and logarithmic expressions to achieve good retrieval quality. My research focuses on explaining these models and on achieving mathematical standards. Publications include "IR Models: Foundations and Relationships" (Morgan & Claypool book, 2013), "Harmony Assumptions" (Computer Journal, 2015), "TF-IDF Uncovered" (ACM SIGIR 2008), "General Matrix Framework" (IP&M Journal), "The Probability of Being Informative" (ACM SIGIR 2003), etc. My long-term research aim is to find the undiscovered parts of mathematics that explain the connection between ranking functions and probability theory.
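To make the "quantifications and logarithmic expressions" concrete, the following is a minimal illustrative sketch of a BM25-style term weight. It is a simplified textbook variant, not the formulation from any of the publications listed here: the saturating term-frequency component is written as tf / (tf + K), the usual (k1 + 1) scaling factor is omitted, and a plain logarithmic IDF stands in for the Robertson-Sparck Jones weight. The function names and the default parameters k1 = 1.2 and b = 0.75 are assumptions chosen for illustration.

```python
import math


def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Saturating TF quantification tf / (tf + K) (simplified, assumed form).

    K grows with document length, so long documents need more
    occurrences of a term to reach the same weight.
    """
    K = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return tf / (tf + K)


def idf(df, n_docs):
    """Plain logarithmic IDF: -log of the fraction of documents containing the term."""
    return math.log(n_docs / df)


def score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs):
    """Sum of TF-quantification times IDF over query terms present in the document."""
    total = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0 or term not in df:
            continue  # terms absent from the document contribute nothing
        total += bm25_tf(tf, doc_len, avg_doc_len) * idf(df[term], n_docs)
    return total
```

The TF component stays below 1 however often a term occurs, and the IDF component is 0 for a term that occurs in every document; these are the heuristic-looking behaviours that a probabilistic explanation has to account for.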
Database-oriented research includes the integration of DB and IR (and ML, and AI), which remains an ongoing research challenge. The areas and methods are closely related, yet surprisingly different and separated. My contributions include probabilistic object-relational, logic-based knowledge representations (Retrieval of Complex Objects, and various publications) that are beneficial for solving semantic and knowledge-oriented (so-called complex) information management tasks. Under the remit of DB+IR (in recent terminology, In-DB IR/ML), this led to a patented technology: the "Relational Bayes" (VLDB Journal 2008, extended SQL, WHERE ASSUMPTION IS MAX_INFORMATIVE).
Recent publications focus on probabilistic, information-theoretic and structured IR in the context of investigative IR (Journal of Information Systems, 2023), and the Dirichlet-multinomial modelling of recommendation and urgency (Big Data, ML and Intelligent Systems, Frontiers of AI, 2021).
Publications
Publications of specific relevance to the Centre for Multimodal AI
2024
Ketola T and Roelleke T (2024). Document structure-driven investigative information retrieval. Information Systems, Elsevier vol. 121, 102315-102315.
01-03-2024
2023
Ketola T and Roelleke T (2023). Automatic and Analytical Field Weighting for Structured Document Retrieval. Advances in Information Retrieval 489-503.
01-01-2023
2022
Ketola T and Roelleke T (2022). Formal Constraints for Structured Document Retrieval. Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval.
23-08-2022
2021
Bahrani M and Roelleke T (2021). ADOR: A New Medical Dataset for Sentiment-based IR. CIKM’21: Fourth Workshop on Knowledge-driven Analytics and Systems Impacting Human Quality of Life.
01-11-2021
Bahrani M and Roelleke T (2021). Opinion-Aware Retrieval Models Based on Sentiment and Intensity of Lexical Features. Modern Management based on Big Data II and Machine Learning and Intelligent Systems III
29-10-2021
2020
Bahrani M and Roelleke T (2020). FDCM. Proceedings of the 29th ACM International Conference on Information & Knowledge Management.
19-10-2020
Ketola T and Roelleke T (2020). BM25-FIC: Information content-based field weighting for BM25F.
01-01-2020
2018
Lipani A, Roelleke T, Lupu M and Hanbury A (2018). A systematic approach to normalization in probabilistic models. Inf Retr Boston, Springer vol. 21 (6), 565-596.
30-06-2018
Gray PMD (2018). P/FDM. Encyclopedia of Database Systems 2643-2644.
01-01-2018
Roelleke T, Wang J and Robertson S (2018). Probabilistic Retrieval Models and Binary Independence Retrieval (BIR) Model. Encyclopedia of Database Systems 2839-2845.
01-01-2018
2016
Frommholz I and Roelleke T (2016). Scalable DB+IR Technology: Processing Probabilistic Datalog with HySpirit. Datenbank-Spektrum, Springer Nature vol. 16 (1), 39-48.
26-01-2016
Roelleke T, Wang J and Robertson S (2016). Probabilistic Retrieval Models and Binary Independence Retrieval (BIR) Model. Encyclopedia of Database Systems 1-7.
01-01-2016
2015
Milajevs D, Sadrzadeh M and Roelleke T (2015). IR meets NLP. Proceedings of the 2015 International Conference on The Theory of Information Retrieval.
27-09-2015
Roelleke T, Kaltenbrunner A and Baeza-Yates R (2015). Harmony Assumptions in Information Retrieval and Social Networks. The Computer Journal, Oxford University Press (OUP) vol. 58 (11), 2982-2999.
14-05-2015
2013
Martinez-Alvarez M, Bonzanini M and Roelleke T (2013). Mathematical Specification and Logic Modelling in the context of IR. Proceedings of the 2013 Conference on the Theory of Information Retrieval.
29-09-2013
Roelleke T (2013). IR Models. Proceedings of the 2013 Conference on the Theory of Information Retrieval.
29-09-2013
Roelleke T, Bonzanini M and Martinez-Alvarez M (2013). On the modelling of ranking algorithms in probabilistic datalog. Proceedings of the 7th International Workshop on Ranking in Databases.
30-08-2013
Bonzanini M, Martinez-Alvarez M and Roelleke T (2013). Extractive summarisation via sentence removal. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval.
28-07-2013
Martinez-Alvarez M, Bellogin A and Roelleke T (2013). Document Difficulty Framework for Semi-automatic Text Classification.
01-01-2013
Roelleke T, Azzam H, Bonzanini M, Martinez-Alvarez M and Lalmas M (2013). The D2Q2 framework: On the relationship and combination of language modelling and TF-IDF.
01-01-2013
2012
Bonzanini M, Martinez-Alvarez M and Roelleke T (2012). Investigating the use of extractive summarisation in sentiment classification.
01-12-2012
Bonzanini M, Martinez-Alvarez M and Roelleke T (2012). Opinion summarisation through sentence extraction. Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval.
12-08-2012
Roelleke T (2012). IR models. Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval.
12-08-2012
Azzam H, Yahyaei S, Roelleke T and Bonzanini M (2012). A Schema-driven Approach for Knowledge-oriented Retrieval and Query Formulation. KEYS 2012, The 3rd International Workshop on Keyword Search and Structured Data, Scottsdale, Arizona, USA, 20 May 2012.
01-01-2012
Martinez-Alvarez M, Yahyaei S and Roelleke T (2012). Semi-automatic document classification.
01-01-2012
2011
Azzam H, Roelleke T and Yahyaei S (2011). Ranking-based processing of SQL queries. Proceedings of the 20th ACM international conference on Information and knowledge management.
24-10-2011
Smeraldi F, Martinez-Alvarez M, Frommholz I and Roelleke T (2011). On the probabilistic logical modelling of quantum and geometrically–inspired IR.
01-01-2011
Azzam H and Roelleke T (2011). A Generic Data Model for Schema-Driven Design in Information Retrieval Applications.
01-01-2011
Yahyaei S, Bonzanini M and Roelleke T (2011). Cross-Lingual Text Fragment Alignment Using Divergence from Randomness.
01-01-2011
Azzam H, Klampanos IA and Roelleke T (2011). Large-Scale Logical Retrieval: Technology for Semantic Modelling of Patent Search. Current Challenges in Patent Information Retrieval 181-195.
01-01-2011
Blank D, Fuhr N, Henrich A, Mandl T, Rölleke T, Schütze H and Stein B (2011). Teaching IR: Curricular Considerations. Teaching and Learning in Information Retrieval 31-46.
01-01-2011
2010
Azzam H and Roelleke T (2010). SQR. Proceedings of the third workshop on Exploiting semantic annotations in information retrieval.
30-10-2010
Gurrin C, He Y, Kazai G, Kruschwitz U, Little S, Roelleke T, Rüger S and Van Rijsbergen K (2010). Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface.
20-05-2010
Martinez-Alvarez M and Roelleke T (2010). Modelling Probabilistic Inference Networks and Classification in Probabilistic Datalog.
01-01-2010
Gurrin C, He Y, Kazai G, Kruschwitz U, Little S, Roelleke T, Rüger S and Van Rijsbergen K (2010). Recent developments in information retrieval.
01-01-2010
Klampanos IA, Wu HZ, Roelleke T and Azzam H (2010). Logic-Based Retrieval: Technology for Content-Oriented and Analytical Querying of Patent Data., Editors: Cunningham H, Hanbury A and Ruger S.
01-01-2010
Gurrin C, He YL, Kazai G, Kruschwitz U, Little S, Roelleke T, Ruger S and van Rijsbergen K (2010). Recent Developments in Information Retrieval., Editors: Gurrin C, He Y, Kazai G, Kruschwitz U, Little S, Roelleke T, Ruger S and VanRijsbergen K.
01-01-2010
2009
Klampanos IA, Azzam H and Roelleke T (2009). A case for probabilistic logic for scalable patent retrieval. Proceedings of the 2nd international workshop on Patent information retrieval.
06-11-2009
Forst JF, Tombros A and Roelleke T (2009). Less Is More: Maximal Marginal Relevance as a Summarisation Feature., Editors: Azzopardi L, Kazai G, Robertson S, Ruger S, Shokouhi M, Song D and Yilmaz E.
01-01-2009
Gray PMD (2009). P/FDM. Encyclopedia of Database Systems 2011-2012.
01-01-2009
Roelleke T, Wang J and Robertson S (2009). Probabilistic Retrieval Models and Binary Independence Retrieval (BIR) Model. Encyclopedia of Database Systems 2156-2160.
01-01-2009
Wu HZ and Roelleke T (2009). Semi-subsumed Events: A Probabilistic Semantics of the BM25 Term Frequency Quantification., Editors: Azzopardi L, Kazai G, Robertson S, Ruger S, Shokouhi M, Song D and Yilmaz E.
01-01-2009
2008
Amer-Yahia S, Hiemstra D, Roelleke T, Srivastava D and Weikum G (2008). DB&IR integration. ACM SIGIR Forum, Association for Computing Machinery (ACM) vol. 42 (2), 84-89.
30-11-2008
Amer-Yahia S, Hiemstra D, Roelleke T, Srivastava D and Weikum G (2008). DB&IR integration. ACM SIGMOD Record, Association for Computing Machinery (ACM) vol. 37 (3), 46-49.
30-09-2008
Amer-Yahia S, Hiemstra D, Roelleke T, Srivastava D and Weikum G (2008). DB&IR Integration: Report on the Dagstuhl Seminar Ranked XML Querying. SIGMOD Record vol. 37 (3), 46-49.
01-09-2008
Roelleke T and Wang J (2008). TF-IDF Uncovered: A Study of Theories and Probabilities. 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore.
01-01-2008
Roelleke T, Wu H, Wang J and Azzam H (2008). Modelling retrieval models in a probabilistic relational algebra with a new operator: the relational Bayes. VLDB J vol. 17 (1), 5-37.
01-01-2008
Amer-Yahia S, Hiemstra D, Roelleke T, Srivastava D and Weikum G (2008). DB&IR Integration: Report on the Dagstuhl Seminar Ranked XML Querying.
01-01-2008
2007
Forst JF, Roelleke T and Tombros A (2007). Modelling a summarisation logic in probabilistic datalog.
01-12-2007
(2007). TOIS reviewers January 2006 through May 2007. ACM Transactions on Information Systems, Association for Computing Machinery (ACM) vol. 25 (4), 15-es.
01-10-2007
2006
Forst JF, Tombros A and Rölleke T (2006). Solving the enterprise TREC task with probabilistic data models.
01-12-2006
Roelleke T and Wang J (2006). A Parallel Derivation of Probabilistic Retrieval Models. 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, US.
27-08-2006
Rolleke T, Tsikrika T and Kazai G (2006). A general matrix framework for modelling Information Retrieval. Information Processing & Management vol. 42 (1), 4-30.
01-01-2006
Wang J and Roelleke T (2006). Context-specific frequencies and discriminativeness for the retrieval of structured documents., Editors: Lalmas M, MacFarlane A, Ruger S, Tombros A, Tsikrika T and Yavlinsky A.
01-01-2006
2005
Amer-Yahia S, Case P, Rolleke T, Shanmugasundaram J and Weikum G (2005). Report on the DB/IR panel at SIGMOD 2005. SIGMOD Record vol. 34 (4), 71-74.
01-12-2005
Roelleke T, Ashoori E, Wu H and Cai Z (2005). The QMUL team with probabilistic SQL at enterprise track.
01-12-2005
Roelleke T and de Vries A (2005). Relevance Information: A Loss of Entropy but a Gain for IDF? 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil.
17-08-2005
Szlavik Z and Rolleke T (2005). Building and experimenting with a heterogeneous collection., Editors: Fuhr N, Lalmas M, Malik S and Szlavik Z.
01-01-2005
2004
Baeza-Yates R, Maarek YS, Roelleke T and de Vries AP (2004). Third edition of the XML and information retrieval workshop first workshop on integration of IR and DB (WIRD) jointly held at SIGIR'2004, Sheffield, UK, July 29th, 2004. ACM SIGIR Forum, Association for Computing Machinery (ACM) vol. 38 (2), 24-30.
01-12-2004
Lalmas M and Rolleke T (2004). Modelling vague content and structure querying in XML retrieval with a probabilistic object-relational framework., Editors: Christiansen H, Hacid MS, Andreasen T and Larsen HL.
01-01-2004
2003
Roelleke T (2003). A Frequency-based and a Poisson-based Definition of the Probability of Being Informative. 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, Canada.
31-07-2003
Lalmas M and Rolleke T (2003). Four-valued knowledge augmentation for structured document retrieval.
01-02-2003
Lalmas M, Roelleke T and Ruthven I (2003). Abductive retrieval for multimedia information seeking. 10th International Conference on Human - Computer Interaction, HCI International, Crete, Greece, vol. 4.
01-01-2003
Lalmas M, Rölleke T and Fuhr N (2003). Intelligent Retrieval of Hypermedia Documents. Intelligent Exploration of the Web 324-344.
01-01-2003
Roelleke T (2003). A Frequency-based and a Poisson-based Definition of the Probability of Being Informative.
01-01-2003
2002
Pearmain A, Lalmas M, Moutogianni E, Papworth D, Healey P and Rolleke T (2002). Using MPEG-7 at the consumer terminal in broadcasting. EURASIP J APPL SIG P vol. 2002 (4), 354-361.
01-04-2002
Lalmas M and Roelleke T (2002). Four-valued knowledge augmentation for representing structured documents., Editors: Hacid MS, Ras ZW, Zighed DA and Kodratoff Y.
01-01-2002
Healey P, Lalmas M, Roelleke T, Papworth D, Moutogianni E and Pearmain A (2002). Using MPEG7 at the Consumer Terminal in Broadcasting. European Association for Signal, Speech and Image Processing Journal of Applied Signal Processing vol. Issue 4, 354-361.
01-01-2002
Roelleke T, Lalmas M, Kazai G, Ruthven I and Quicker S (2002). The accessibility dimension for structured document retrieval., Editors: Crestani F, Girolami M and VanRijsbergen CJ.
01-01-2002
Lalmas M, Roelleke T and Fuhr N (2002). Intelligent Hypermedia Retrieval. Intelligent Exploration of the Web
01-01-2002
Kazai G, Lalmas M and Roelleke T (2002). Focussed Structured Document Retrieval. String Processing and Information Retrieval 241-247.
01-01-2002
2001
Rölleke T, Lübeck R and Kazai G (2001). The HySpirit retrieval platform. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval.
01-09-2001
Pearmain A, Lalmas M, Moutogianni E, Papworth D, Healey P and Rolleke T (2001). Using MPEG-7 at the consumer terminal in broadcasting., Editors: Izquierdo E. WIAMIS 2001 Workshop on Image Analysis for Multimedia Services Tampere, Finland 16 May 2001 - 17 May 2001.
01-01-2001
Kazai G, Lalmas M and Rolleke T (2001). A model for the representation and focussed retrieval of structured documents based on fuzzy aggregation.
01-01-2001
Lalmas M, Rolleke T, Turra F and Fuhr N (2001). Concepts for a graphical user interface for hypermedia retrieval., Editors: Larsen HL, Kacprzyk J, Zadrozny S, Andreasen T and Christiansen H.
01-01-2001
1998
Fuhr N, Gövert N and Rölleke T (1998). DOLORES: a system for logic-based retrieval of multimedia objects.
01-08-1998
Fuhr N, Gövert N and Rölleke T (1998). DOLORES: A System for Logic-Based Retrieval Objects.
01-08-1998
Rölleke T and Fuhr N (1998). Querying for facts and content in hypermedia documents.
01-01-1998
Fuhr N and Rölleke T (1998). HySpirit — A probabilistic inference engine for hypermedia retrieval in large databases.
01-01-1998
1997
Fuhr N and Rölleke T (1997). A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Transactions on Information Systems, Association for Computing Machinery (ACM) vol. 15 (1), 32-66.
01-01-1997