Effective quality assurance for data labels through crowdsourcing and domain expert collaboration

Wei Lee, Chi Hsuan Huang, Chien Wei Chang, Ming Kuang Daniel Wu, Kun Ta Chuang, Po An Yang, Chu Cheng Hsieh

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

8 Citations (Scopus)

Abstract

In recent years, researchers and scientists have been using crowdsourcing platforms to collect labeled training data. The process is cost-effective and scalable, but research has shown that the quality of truth inference is unstable due to worker bias, work variance, and task difficulty. In this demonstration, we present a hybrid system, named IDLE (Integrated Data Labeling Engine), that brings together a well-trained troop of domain experts and the multitudes of a crowdsourcing platform to collect high-quality training data for industry-level classification engines. We show how to acquire high-quality labeled data through quality control strategies that dynamically and cost-effectively leverage the strengths of both domain experts and crowdsourcing.
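
The abstract does not describe IDLE's quality control strategies in detail, so the following is only a minimal sketch of one common hybrid pattern, not the authors' method: take a majority vote over crowd labels and escalate an item to a domain expert when crowd agreement is low. The function names, the `ask_expert` callback, and the 0.8 agreement threshold are all hypothetical and chosen for illustration.

```python
from collections import Counter


def aggregate_label(crowd_labels, agreement_threshold=0.8):
    """Majority-vote a list of crowd labels.

    Returns (label, needs_expert). needs_expert is True when there are
    no labels or when the share of workers agreeing with the majority
    label falls below agreement_threshold (a hypothetical cutoff).
    """
    if not crowd_labels:
        return None, True
    label, votes = Counter(crowd_labels).most_common(1)[0]
    agreement = votes / len(crowd_labels)
    return label, agreement < agreement_threshold


def label_items(crowd_votes, ask_expert, agreement_threshold=0.8):
    """Resolve a final label per item, escalating low-agreement items.

    crowd_votes maps item id -> list of crowd labels.
    ask_expert is a hypothetical callback: item id -> expert label.
    """
    final = {}
    for item_id, labels in crowd_votes.items():
        label, needs_expert = aggregate_label(labels, agreement_threshold)
        if needs_expert:
            # Spend the more expensive expert effort only where the crowd disagrees.
            label = ask_expert(item_id)
        final[item_id] = label
    return final
```

Under these assumptions, expert cost scales with the number of contested items rather than with the size of the whole dataset, which is one way to read "dynamically and cost-effectively" in the abstract.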

Original language: English
Title of host publication: Advances in Database Technology - EDBT 2018
Subtitle of host publication: 21st International Conference on Extending Database Technology, Proceedings
Editors: Michael Bohlen, Reinhard Pichler, Norman May, Erhard Rahm, Shan-Hung Wu, Katja Hose
Publisher: OpenProceedings.org
Pages: 646-649
Number of pages: 4
ISBN (Electronic): 9783893180783
Publication status: Published - 2018
Event: 21st International Conference on Extending Database Technology, EDBT 2018 - Vienna, Austria
Duration: 2018 Mar 26 to 2018 Mar 29

Publication series

Name: Advances in Database Technology - EDBT
Volume: 2018-March
ISSN (Electronic): 2367-2005

Conference

Conference: 21st International Conference on Extending Database Technology, EDBT 2018
Country/Territory: Austria
City: Vienna
Period: 18-03-26 to 18-03-29

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software
  • Computer Science Applications
