TY - GEN
T1 - Effective quality assurance for data labels through crowdsourcing and domain expert collaboration
AU - Lee, Wei
AU - Huang, Chi Hsuan
AU - Chang, Chien Wei
AU - Wu, Ming Kuang Daniel
AU - Chuang, Kun Ta
AU - Yang, Po An
AU - Hsieh, Chu Cheng
N1 - Publisher Copyright: © 2018 Copyright held by the owner/author(s)
PY - 2018
Y1 - 2018
N2 - Researchers and scientists have been using crowdsourcing platforms to collect labeled training data in recent years. The process is cost-effective and scalable, but research has shown that the quality of truth inference is unstable due to worker bias, work variance, and task difficulty. In this demonstration, we present a hybrid system, named IDLE (Integrated Data Labeling Engine), that brings together a well-trained troop of domain experts and the multitudes of a crowdsourcing platform to collect high-quality training data for industry-level classification engines. We show how to acquire high-quality labeled data through quality control strategies that dynamically and cost-effectively leverage the strengths of both domain experts and crowdsourcing.
UR - http://www.scopus.com/inward/record.url?scp=85072046829&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072046829&partnerID=8YFLogxK
U2 - 10.5441/002/edbt.2018.75
DO - 10.5441/002/edbt.2018.75
M3 - Conference contribution
AN - SCOPUS:85072046829
T3 - Advances in Database Technology - EDBT
SP - 646
EP - 649
BT - Advances in Database Technology - EDBT 2018
A2 - Bohlen, Michael
A2 - Pichler, Reinhard
A2 - May, Norman
A2 - Rahm, Erhard
A2 - Wu, Shan-Hung
A2 - Hose, Katja
PB - OpenProceedings.org
T2 - 21st International Conference on Extending Database Technology, EDBT 2018
Y2 - 26 March 2018 through 29 March 2018
ER -