Publication page
Table Header Correction Algorithm Based on Heuristics for Improving Spreadsheet Data Extraction
Authors: Paramonov V., Shigarov A., Vetrova V.
Journal: Communications in Computer and Information Science: 26th Intern. Conf. on Information and Software Technologies (ICIST 2020; Kaunas, Lithuania; 15-17 October 2020)
Volume: 1283
Issue:
Year: 2020
Reporting year: 2020
Publisher:
Publisher location:
URL:
Projects:
DOI: 10.1007/978-3-030-59506-7_13
Abstract: A spreadsheet is one of the most commonly used forms of representation for datasets of similar type. Spreadsheets provide considerable flexibility for data structure organisation. As a result of this flexibility, tables with very complex data structures can be created. In turn, such complexity makes automatic table processing and data extraction a challenging task. Therefore, a table preprocessing step is often required in the data extraction pipeline. This paper proposes a heuristic algorithm for the correction of a table header in a spreadsheet. The aim of the proposed algorithm is to transform the machine-readable structure of the table header into its visual representation. The algorithm achieves this aim by iterating through the table header cells and merging some of them according to the proposed heuristics. The transformed structure, in turn, improves the quality of spreadsheet understanding and data extraction further along the pipeline. The proposed algorithm was implemented in the TabbyXL toolset.
Indexed in WoS: No
Indexed in Scopus: No
Indexed in UBS (УБС): No
Indexed in RSCI (РИНЦ): Yes
Indexed in VAK (ВАК): No
Indexed in CORE: No
Publication in press: 0