arXiv preprint
The recent rapid advances in machine learning technologies largely depend on
the abundance of data available today, in terms of both its quantity and
the richness of the content it contains. For example, biometric data such as images
and voices could reveal people's attributes like age, gender, sentiment, and
origin, whereas location/motion data could be used to infer people's activity
levels, transportation modes, and life habits. Along with the new services and
applications enabled by such technological advances, various governmental
policies have been put in place to regulate such data usage and protect people's
privacy and rights. As a result, data owners often opt for simple data
obfuscation (e.g., blurring people's faces in images) or for withholding data
altogether, which leads to severe data quality degradation and greatly limits
the data's potential utility.
Aiming for a sophisticated mechanism that gives data owners fine-grained
control while retaining the maximal degree of data utility, we propose
Multi-attribute Selective Suppression, or MaSS, a general framework for
performing precisely targeted data surgery to simultaneously suppress any
selected set of attributes while preserving the rest for downstream machine
learning tasks. MaSS learns a data modifier through adversarial games between
two sets of networks, where one is aimed at suppressing selected attributes,
and the other ensures the retention of the rest of the attributes via a general
contrastive loss as well as explicit classification metrics. We carried out an
extensive evaluation of our proposed method using multiple datasets from
different domains including facial images, voice audio, and video clips, and
obtained promising results regarding MaSS's generalizability and its capability to
suppress targeted attributes without negatively affecting the data's
usability in other downstream ML tasks.
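The following is a rough illustration of the adversarial setup described above. This minimal sketch shows one way a data modifier's per-sample objective could combine the three signals: suppression, explicit retention through classification, and general retention through a contrastive term. It is not the MaSS implementation. Several choices here are assumptions made for illustration and are not taken from the paper: pushing the suppression adversary's predictions toward a uniform distribution, using an InfoNCE form for the contrastive term, the weights `alpha`, `beta`, `gamma`, and all function names. The sketch also omits the adversaries' own update step, in which they are trained to classify the suppressed attributes.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax over a single vector of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    # Standard cross-entropy for one sample with an integer class label.
    return -math.log(softmax(logits)[label])


def uniform_ce(logits: Sequence[float]) -> float:
    # Assumed suppression term: cross-entropy between a uniform target and the
    # adversary's prediction. It reaches its minimum, log(num_classes), when the
    # adversary cannot distinguish the classes of a suppressed attribute.
    probs = softmax(logits)
    return -sum(math.log(p) for p in probs) / len(probs)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, 1e-12)  # guard against zero-length vectors


def info_nce(anchor, positive, negatives, temperature: float = 0.1) -> float:
    # Assumed contrastive term: the embedding of the modified sample (anchor)
    # should stay closer to the embedding of its original (positive) than to
    # the embeddings of other samples (negatives).
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    return cross_entropy([s / temperature for s in sims], 0)


def modifier_loss(suppress_logits, retain_logits, retain_labels,
                  anchor, positive, negatives,
                  alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    # Hypothetical weighted sum that the data modifier would minimize.
    l_suppress = sum(uniform_ce(l) for l in suppress_logits)
    l_retain = sum(cross_entropy(l, y) for l, y in zip(retain_logits, retain_labels))
    l_contrast = info_nce(anchor, positive, negatives)
    return alpha * l_suppress + beta * l_retain + gamma * l_contrast


if __name__ == "__main__":
    loss = modifier_loss(
        suppress_logits=[[2.0, -1.0]],     # adversary still fairly confident
        retain_logits=[[0.5, 1.5, -0.2]],  # classifier for a retained attribute
        retain_labels=[1],
        anchor=[0.9, 0.1], positive=[1.0, 0.0], negatives=[[0.0, 1.0]],
    )
    print(f"modifier loss: {loss:.4f}")
```

In a full training loop, the modifier would minimize this quantity. In alternation, the suppression networks would be trained to minimize their ordinary cross-entropy on the suppressed attributes, and this alternation is what makes the game adversarial.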
Chun-Fu Chen, Shaohan Hu, Zhonghao Shi, Prateek Gulati, Bill Moriarty, Marco Pistoia, Vincenzo Piuri, Pierangela Samarati
2022-10-18