TY - GEN
T1 - The Data Recovery Service in NoSQL
AU - Tsai, Chia Ping
AU - Hsiao, Hung Chang
AU - Lai, Yu Chen
N1 - Funding Information:
Hung-Chang Hsiao was partially supported by the Ministry of Science and Technology (MOST) in Taiwan under Grant MOST 111-2218-E-006-012, and by the Intelligent Manufacturing Research Center (iMRC) from the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Not Only SQL (NoSQL) is a critical technology that is scalable and provides flexible schemas, thereby complementing existing relational database technologies. Although NoSQL is flourishing, present solutions lack the features required by enterprises for mission-critical applications. In this paper, we explore solutions to the data recovery issue in NoSQL. Data recovery for any database table entails restoring the table to a prior state or replaying (insert/update) operations over the table for a given past time period. Recovery of NoSQL database tables enables applications such as failure recovery, analysis of historical data, debugging, and auditing. We first identify the design and implementation issues of the data recovery problem for NoSQL databases, including recovery time, fault tolerance, scalability, memory constraints, software compatibility, and quality of recovery. In particular, our study focuses on columnar NoSQL databases. We then propose and evaluate four solutions to address the data recovery problem in NoSQL; each solution has its pros and cons. We implement our solutions on Apache HBase, a popular NoSQL database in the Hadoop ecosystem that is widely adopted by industry. Our implementations are extensively benchmarked with an industrial NoSQL benchmark in real environments. Our research findings and implementations have been contributed to and integrated with Apache HBase for global distribution.
UR - http://www.scopus.com/inward/record.url?scp=85147911094&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147911094&partnerID=8YFLogxK
U2 - 10.1109/BigData55660.2022.10020447
DO - 10.1109/BigData55660.2022.10020447
M3 - Conference contribution
AN - SCOPUS:85147911094
T3 - Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
SP - 2394
EP - 2401
BT - Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
A2 - Tsumoto, Shusaku
A2 - Ohsawa, Yukio
A2 - Chen, Lei
A2 - Van den Poel, Dirk
A2 - Hu, Xiaohua
A2 - Motomura, Yoichi
A2 - Takagi, Takuya
A2 - Wu, Lingfei
A2 - Xie, Ying
A2 - Abe, Akihiro
A2 - Raghavan, Vijay
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Conference on Big Data, Big Data 2022
Y2 - 17 December 2022 through 20 December 2022
ER -