Publication page

Some approaches for improving quality of tabular data

Publication type: Journal article

Material type: Text

Authors: Paramonov V.V., Lomaeva E.A.

Journal: CEUR Workshop Proceedings: 3rd Scientific-Practical Workshop Information Technologies: Algorithms, Models, Systems (ITAMS 2020; Irkutsk, 3 September 2020)

Publication language: English

Book series: CEUR Workshop Proceedings

Volume: 2677

Publication year: 2020

Reporting year: 2020

Abstract: A spreadsheet is one of the popular forms for presenting and transferring data of the same type. Documents of this kind are used very widely. Extracting tables from spreadsheets and understanding them are significant tasks that yield useful information for further use, for example in integrating data obtained from various sources. As a rule, tables in spreadsheets are created by humans for human use. For this reason, tables may contain messy data such as misprints, calculation errors, incorrect structure, etc. This complicates automated table processing and understanding. This paper discusses some approaches to data cleansing that improve the quality of tabular data. The approaches consist of checking and correcting cell calculations and spelling errors. We use phonetic word similarity to correct spelling mistakes in words and heuristic algorithms to detect calculated values in cells.

Indexed in WOS: No

Indexed in Scopus: No

Indexed in УБС: No

Indexed in RSCI (РИНЦ): Yes

Indexed in VAK (ВАК): No

Indexed in CORE: No