MASTER'S DISSERTATION DEFENSE No. 303

Student: José Antônio Pedro dos Santos

Title: "Ulysses-HIRS: A Hybrid Information Retrieval System for Legislative Documents"

Advisor: Carmelo José Albanez Bastos Filho

Co-advisor: Ellen Polliana Ramos S. Pereira - (UFRPE)

External Examiner: Adriano Lorena Inácio Oliveira - (UFPE)

Internal Examiner: Cleyton Mário de Oliveira Rodrigues

Date and time: August 29, 2024, at 14:00.
Location: Remote format.


Abstract:

         "The use of Transformers for text processing has attracted a large deal of attention in the last years. This is particularly true for sentence models, which present high capacity to comprehend and generate text contextually, improving the predictive performance in different Natural Language Processing tasks, when compared with previous approaches. Even so, there are still several chal- lenges when applied to long documents, especially for some knowledge areas with very specific characteristics, such as legislative proposals. Therefore, the Brazilian Portuguese language has complex constructions, and these features are even more relevant for legal texts. This study investigated different strategies for utilizing BERT-based models in long document retrieval written in Brazilian Portuguese. We used three corpora from the Brazilian Chamber of Deputies to build a dataset and assess the models, incorporating zero-shot and fine-tuning strategies. Five sentence models were evaluated: BERTimbau, LegalBert, LegalBert-pt, LegalBERTimbau, and LaBSE. We also assessed a summarized corpus of bills considering the input size limitation of the sentence models. Finaly, we propose developed a hybrid model, named Ulysses-HIRS, combining BM25 Large and BERTimbau with fine-tuning. According to the experimental results, the predictive performance obtained by Ulysses-HIRS was superior to the performance obtained by the other models, with a Recall of 84.78% for 20 documents."
