PhD THESIS DEFENSE No. 22

Student: Antonio Victor Alencar Lundgren

Title: "Development of a visual semantic analysis framework"

Advisor: Carmelo José Albanez Bastos Filho

Co-advisor: Byron Leite Dantas Bezerra

External Examiner: Anthony Jose da Cunha C. Lins (UNICAP)

External Examiner: Cleber Zanchetin (UFPE)

Internal Examiner: Pablo Vinicius Alves de Barros

Internal Examiner: Bruno José Torres Fernandes

Date and time: August 2, 2024, at 9 a.m.
Location: Remote format.


Abstract:

         "Assistive robotics shows the potential to improve the lives of those most in need in our society, the ill, the elderly, or the very young, by helping ease the burden of common everyday tasks, securing environments, or even stimulating socialization through simulated companionship. Applications in assistive robotics usually rely on sensors to perceive the surrounding environment and aid decision-making and task completion; visual sensors are the most common among them, with cameras, lidars, and other devices helping robots 'see' their surroundings. Recent advances in machine learning, with deep learning techniques as a highlight, have allowed these robots to accomplish specific tasks with high efficiency. However, these techniques cannot understand or adapt to shifts in context, and the current literature lacks data, models, and even methodology capable of applying recent advances in deep learning to the contextual understanding of tasks. With autonomous assistive robotics as the goal, this work addresses the gap in machine learning for semantic analysis in visual tasks by creating a dataset to serve as a baseline for visual semantic analysis approaches, the HOD dataset, along with a framework for building and testing modular visual semantic analysis models that exploit state-of-the-art models and operations through the use of semantic variables. Semantic variables are secondary branches added to a model to extract contextual information; each branch can be any model producing a specific output, even one unrelated to the task at hand. These semantic variables are frozen, and their outputs are merged with the main model's outputs in an output head, aiding the learning process with complementary information. The HOD dataset is a novel object detection and dangerousness classification dataset of indoor natural scenes, simulating the view of a NAO robot. It contains a total of 100,602 images and 435,753 annotated objects.
We compare a RetinaNet trained on the HOD dataset as a baseline against a RetinaNet modified with the framework to attach a DenseNet161 as a semantic variable, pre-trained on the Places365 dataset to classify scenes. The model created and trained using the VisualSAF framework achieves a mean average precision of 0.834 and a mean average recall of 0.862, with intersection over union thresholds ranging from 0.5 to 0.95 in steps of 0.05. These results are 0.049 and 0.010 higher than the baseline for mean average precision and mean average recall, respectively. This work also provides, to the best of our knowledge, the first definition of visual semantic analysis and a categorization of its methodologies."
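The semantic-variable mechanism described in the abstract (a frozen secondary branch whose contextual output is merged with the main model's output in a shared head) can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the VisualSAF implementation: the class name, constructor parameters, and the simple feature-concatenation head are all assumptions for illustration, standing in for e.g. a RetinaNet main branch and a Places365-pretrained DenseNet161 semantic variable.

```python
import torch
import torch.nn as nn


class SemanticVariableModel(nn.Module):
    """Hypothetical sketch: a trainable main branch fused with a frozen
    'semantic variable' branch that supplies contextual features."""

    def __init__(self, main_branch, semantic_branch, main_dim, sem_dim, out_dim):
        super().__init__()
        self.main_branch = main_branch
        self.semantic_branch = semantic_branch
        # Freeze the semantic variable: its weights are never updated,
        # it only contributes complementary contextual information.
        for p in self.semantic_branch.parameters():
            p.requires_grad = False
        # Output head that merges the two feature streams.
        self.head = nn.Linear(main_dim + sem_dim, out_dim)

    def forward(self, x):
        main_feats = self.main_branch(x)
        with torch.no_grad():  # no gradients flow through the frozen branch
            sem_feats = self.semantic_branch(x)
        fused = torch.cat([main_feats, sem_feats], dim=1)
        return self.head(fused)


# Usage with toy stand-in branches (real branches would be detection
# and scene-classification networks):
model = SemanticVariableModel(
    main_branch=nn.Sequential(nn.Flatten(), nn.Linear(12, 4)),
    semantic_branch=nn.Sequential(nn.Flatten(), nn.Linear(12, 3)),
    main_dim=4, sem_dim=3, out_dim=5,
)
out = model(torch.randn(2, 3, 2, 2))  # batch of 2 tiny "images"
```

Only the main branch and the output head receive gradient updates during training; the frozen branch's outputs act purely as auxiliary context.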
