Publication page
Table extraction, analysis, and interpretation: The current state of the TabbyDOC project
Authors: Shigarov A., Dorodnykh N., Mikhailov A., Paramonov V., Yurin A.
Journal: CEUR Workshop Proceedings: 4th Scientific-Practical Workshop Information Technologies: Algorithms, Models, Systems (ITAMS 2021, Irkutsk, 14 September 2021)
Volume: 2984
Issue:
Year: 2021
Reporting year: 2021
Publisher:
Publisher location:
URL:
Projects:
DOI:
Abstract: The freely available tabular data represented in various digital formats, such as print-oriented documents, spreadsheets, and web pages, are a valuable source for populating knowledge graphs. However, difficulties that inevitably arise with the extraction and integration of the tabular data often hinder their intensive use in practice. The TabbyDOC project aims at elaborating a theoretical basis and developing open software for data extraction from arbitrary tables. Previously, it was devoted to the following issues: (i) table extraction from print-oriented documents, and (ii) data transformation from spreadsheet tables to relational and linked data. This paper summarizes the project’s results that are intended for the following tasks: (i) automation of fine-tuning artificial neural networks for table detection in document images, (ii) a synthesis of programs for spreadsheet data transformation driven by user-defined rules of table analysis and interpretation, and (iii) generating RDF triples from entities extracted from relational tables.
Indexed in WOS: No
Indexed in Scopus: No
Indexed in УБС: No
Indexed in РИНЦ: Yes
Indexed in ВАК: No
Indexed in CORE: No
Publication in press: 0