TY - JOUR
T1 - Learning Privacy-Preserving Embeddings for Image Data to Be Published
AU - Li, Chu-Chen
AU - Li, Cheng-Te
AU - Lin, Shou-De
N1 - Publisher Copyright:
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2023/11/14
Y1 - 2023/11/14
N2 - Deep learning excels at learning feature representations that offer promising performance in various application domains. Recent advances have shown that private attributes of users and patients (e.g., identity, gender, and race) can be accurately inferred from image data. To avoid the risk of privacy leakage, data owners can resort to releasing embeddings rather than the original images. In this article, we aim to learn to generate privacy-preserving embeddings from image data. The obtained embeddings are required to maintain data utility (e.g., preserving the performance of the main task, such as disease prediction) while preventing the private attributes of data instances from being accurately inferred. We also require that the embeddings be hard to use for reconstructing the original images. We propose a hybrid method based on multi-task learning to reach this goal. The key idea is twofold. One is to learn a feature encoder that benefits the main task and fools the sensitive task at the same time, via iterative training and feature disentanglement. The other is to incorporate the learning of adversarial examples to degrade the performance of sensitive attribute classification. Experiments conducted on the Multi-Attribute Facial Landmark (MAFL) and NIH Chest X-ray datasets demonstrate the effectiveness of our hybrid method. A set of advanced studies also shows the usefulness of each model component, the difficulty of data reconstruction, and the performance impact of task correlation.
AB - Deep learning excels at learning feature representations that offer promising performance in various application domains. Recent advances have shown that private attributes of users and patients (e.g., identity, gender, and race) can be accurately inferred from image data. To avoid the risk of privacy leakage, data owners can resort to releasing embeddings rather than the original images. In this article, we aim to learn to generate privacy-preserving embeddings from image data. The obtained embeddings are required to maintain data utility (e.g., preserving the performance of the main task, such as disease prediction) while preventing the private attributes of data instances from being accurately inferred. We also require that the embeddings be hard to use for reconstructing the original images. We propose a hybrid method based on multi-task learning to reach this goal. The key idea is twofold. One is to learn a feature encoder that benefits the main task and fools the sensitive task at the same time, via iterative training and feature disentanglement. The other is to incorporate the learning of adversarial examples to degrade the performance of sensitive attribute classification. Experiments conducted on the Multi-Attribute Facial Landmark (MAFL) and NIH Chest X-ray datasets demonstrate the effectiveness of our hybrid method. A set of advanced studies also shows the usefulness of each model component, the difficulty of data reconstruction, and the performance impact of task correlation.
UR - http://www.scopus.com/inward/record.url?scp=85178472457&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85178472457&partnerID=8YFLogxK
U2 - 10.1145/3623404
DO - 10.1145/3623404
M3 - Article
AN - SCOPUS:85178472457
SN - 2157-6904
VL - 14
JO - ACM Transactions on Intelligent Systems and Technology
JF - ACM Transactions on Intelligent Systems and Technology
IS - 6
M1 - 105
ER -