arXiv Preprint
Named entity recognition (NER) models are widely used for identifying named
entities (e.g., individuals, locations, and other information) in text
documents. Machine learning-based NER models are increasingly being applied in
privacy-sensitive applications that need automatic and scalable identification
of sensitive information to redact text for data sharing. In this paper, we
study the setting where NER models are available as a black-box service for
identifying sensitive information in user documents and show that these models
are vulnerable to membership inference on their training datasets. With updated
pre-trained NER models from spaCy, we demonstrate two distinct membership
attacks on these models. Our first attack capitalizes on unintended
memorization in the NER model's underlying neural network, a phenomenon neural
networks are known to be susceptible to. Our second attack leverages a timing
side-channel to target NER models that maintain vocabularies constructed from
the training data. We show that words from the training dataset follow
different functional paths than previously unseen words, resulting in
measurable differences in execution time. Revealing the membership status of
training samples has clear privacy implications: in text redaction, for
example, the sensitive words or phrases to be found and removed are at risk of
being detected as members of the training dataset. Our
experimental evaluation includes the redaction of both password and health
data, highlighting both security risks and privacy/regulatory issues. These
risks are exacerbated by results showing memorization with only a single
phrase. We achieve 70% AUC in our first attack on a text redaction use case,
and overwhelming success in the timing attack with 99.23% AUC. Finally, we discuss
potential mitigation approaches to realize the safe use of NER models in light
of the privacy and security implications of membership inference attacks.
Rana Salal Ali, Benjamin Zi Hao Zhao, Hassan Jameel Asghar, Tham Nguyen, Ian David Wood, Dali Kaafar
2022-11-04
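
As a rough illustration of the timing side-channel described in the abstract, the Python sketch below (using spaCy) times how long a pipeline takes to process a candidate word the first time it is seen and compares it against words assumed to be absent from the training vocabulary. The model name, trial count, decision threshold, and the direction of the timing gap are illustrative assumptions made for this sketch, not the procedure or results reported in the paper.

    import statistics
    import time

    import spacy

    MODEL = "en_core_web_sm"  # assumed pipeline; any spaCy NER model could stand in


    def first_call_latency(word: str, trials: int = 10) -> float:
        # Median time (seconds) to process `word` for the first time.
        # The pipeline is reloaded on every trial so that earlier trials do not
        # cache the word in the vocabulary and mask the timing difference.
        samples = []
        for _ in range(trials):
            nlp = spacy.load(MODEL)
            start = time.perf_counter()
            nlp(word)
            samples.append(time.perf_counter() - start)
        return statistics.median(samples)


    def likely_in_training_vocab(candidate: str, assumed_unseen: list[str]) -> bool:
        # Compare the candidate against a baseline of words assumed to be absent
        # from the training data. The decision rule (member == noticeably faster
        # than baseline) is an illustrative assumption, not the paper's
        # calibrated attack.
        baseline = statistics.median(first_call_latency(w) for w in assumed_unseen)
        return first_call_latency(candidate) < 0.9 * baseline


    if __name__ == "__main__":
        # Hypothetical probe words, purely for illustration.
        print(likely_in_training_vocab("hunter2", ["zxqwvplume", "brontoquark"]))

Reloading the pipeline for every trial is deliberately wasteful: it prevents earlier trials from adding the probed word to the vocabulary, which would otherwise mask the timing difference being measured.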