20 February 2026
10:00, Master's Defense, Room 85 of the IC; hybrid at meet.google.com/yxi-mria-apc
Title
LLM-based Information Retrieval for B2B Matchmaking: Exploring Composite Representations of Multiple Unstructured Documents (MUd)
Student
Bruno Nogueira Renzo
Advisor
Marcelo da Silva Reis
Brief summary
B2B e-marketplaces are platforms where buyers and sellers meet to transact goods and services. These platforms have grown substantially in recent years, with B2B e-commerce more generally reaching a market size approximately twice that of B2C e-commerce. B2B matchmaking (i.e., the pairing of business entities) is an essential component of these platforms: robust matchmaking, for example, reduces the cost of market search and increases its efficiency. One common approach in the literature is to formulate this pairing as an information retrieval task, in which a set of relevant sellers must be returned for a given buyer. In this formulation, each business entity is often represented by a single structured or semi-structured document. That is, each buyer and seller has a representation built from a set of predefined attributes, such as 'company name', 'company size', and 'company industry'. We argue, however, that these representations restrict and oversimplify the true profiles, demands, and offers of these business entities, and that unstructured documents should be used instead. Moreover, for sellers in particular, we should use not a single but multiple unstructured documents (MUd), leveraging the large volume of data that these companies already have available on the internet (their websites, blog posts, podcasts, pitch presentations, etc.). Representations based on unstructured documents have most likely not been extensively explored in the literature because of the limitations of earlier technologies. However, we believe that, with the constant advances in LLM capabilities, these representations can now be used effectively. Therefore, in this dissertation, we propose a new LLM-based information retrieval architecture in which each buyer is represented by a single unstructured document and each seller is represented by multiple unstructured documents.
We conducted three main sets of experiments, with empirical results demonstrating the superior efficiency of our proposed architecture, its practical utility in B2B applications, and the value and properties of each of its main components.
Examination Board
Full members:
Marcelo da Silva Reis IC / UNICAMP
Hélio Pedrini IC / UNICAMP
Thiago Alexandre Salgueiro Pardo ICMC / USP
Alternates:
André Santanchè IC / UNICAMP
Ronaldo Cristiano Prati CMCC / UFABC