Publication page

An approach how to automate labeling data for the training ANN models for page layout analysis

Publication type: Journal article

Material type: Text

Authors: Mikhailov A.

Journal: CEUR Workshop Proceedings: 3rd Scientific-Practical Workshop Information Technologies: Algorithms, Models, Systems (ITAMS 2020; Irkutsk, 3 September 2020)

Publication language: English

Book series: CEUR Workshop Proceedings

Volume: 2677

Publication year: 2020

Reporting year: 2020

Abstract: Object detection and recognition is an important task in many document analysis applications. It is a difficult problem because page layouts and representation formats vary widely. Recently, deep learning in computer vision has significantly boosted data-driven, image-based approaches to page layout analysis. In this paper, we consider open formats of electronic documents as a source for generating training datasets. These formats should contain markup from which information about page layout regions can be obtained. This makes it possible to generate a training dataset automatically for training ANN models for page layout analysis.
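The abstract's core idea is that region markup already present in a document can be turned directly into detection labels, with no manual annotation. The following is a minimal sketch of that step, not the author's method. It assumes a hypothetical XML page markup that has already been extracted from an open document format (the `page`/`region` element names, the `type`/`x`/`y`/`w`/`h` attributes and the four layout classes are all invented for illustration). It also assumes COCO-style JSON as the output, which is one common input format for object-detection training.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical page markup: each <region> carries a layout class and a bounding
# box in pixel coordinates. A real open format would need its own extraction
# step (e.g. rendering the page and reading element positions) to reach this form.
SAMPLE_PAGE = """
<page id="1" width="1240" height="1754">
  <region type="title" x="100" y="80" w="1040" h="120"/>
  <region type="text" x="100" y="240" w="1040" h="600"/>
  <region type="figure" x="300" y="880" w="640" h="420"/>
  <region type="table" x="100" y="1340" w="1040" h="300"/>
</page>
"""

# Hypothetical label set for the layout classes.
CATEGORIES = ["text", "title", "figure", "table"]


def page_to_coco(xml_text: str, file_name: str) -> dict:
    """Convert one page's region markup into a COCO-style annotation dict."""
    page = ET.fromstring(xml_text.strip())
    image_id = int(page.get("id"))
    image = {
        "id": image_id,
        "file_name": file_name,  # the rendered page image the boxes refer to
        "width": int(page.get("width")),
        "height": int(page.get("height")),
    }
    annotations = []
    for ann_id, region in enumerate(page.iter("region"), start=1):
        x, y, w, h = (float(region.get(k)) for k in ("x", "y", "w", "h"))
        annotations.append({
            "id": ann_id,
            "image_id": image_id,
            # COCO category ids are conventionally 1-based.
            "category_id": CATEGORIES.index(region.get("type")) + 1,
            "bbox": [x, y, w, h],  # COCO bbox layout: [x, y, width, height]
            "area": w * h,
            "iscrowd": 0,
        })
    categories = [{"id": i + 1, "name": n} for i, n in enumerate(CATEGORIES)]
    return {"images": [image], "annotations": annotations, "categories": categories}


if __name__ == "__main__":
    dataset = page_to_coco(SAMPLE_PAGE, "page_0001.png")
    print(json.dumps(dataset["annotations"][0]))
```

Running this over every page of a document collection, and merging the results, would yield a labeled dataset whose quality depends entirely on how faithfully the source markup describes the rendered layout.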

Indexed in WoS: No

Indexed in Scopus: No

Indexed in UBS (УБС): No

Indexed in RSCI (РИНЦ): Yes

Indexed in VAK (ВАК): No

Indexed in CORE: No