TY - GEN
T1 - TLDR
T2 - 20th Workshop on Privacy in the Electronic Society, WPES 2021, held in conjunction with the ACM CCS 2021
AU - Alabduljabbar, Abdulrahman
AU - Abusnaina, Ahmed
AU - Meteriz-Yildiran, Ülkü
AU - Mohaisen, David
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/11/15
Y1 - 2021/11/15
AB - Privacy policies are the primary channel through which service providers inform users about their data collection and use practices. However, privacy policies are often long and lack a consistent structure. The average user struggles to understand their contents and usually skips them, regardless of their importance. Moreover, privacy policies may omit information on critical practices of the service providers, such as data collection, use, disclosure, tracking, and access. We tackle these challenges by introducing TLDR, a machine learning-based automated ensemble of privacy policy classifiers, for (i) categorizing the content into nine privacy policy categories with high performance and (ii) detecting missing information in privacy policies. To address the length of privacy policies, TLDR labels each paragraph in a policy by its content class, which enables users to focus on paragraphs of interest, such as paragraphs describing the data collection or tracking practices of the service operators. By reducing the amount of information presented to users, TLDR reduces the average reading time by 39.14% and increases users' understanding of the privacy policies by 18.84%. TLDR reduces the number of paragraphs and words the user must read, which in turn reduces the effort required to understand the service operator's practices.
KW - machine learning
KW - natural language processing
KW - privacy
KW - privacy policy
UR - http://www.scopus.com/inward/record.url?scp=85119356041&partnerID=8YFLogxK
U2 - 10.1145/3463676.3485608
DO - 10.1145/3463676.3485608
M3 - Conference contribution
AN - SCOPUS:85119356041
T3 - WPES 2021 - Proceedings of the 20th Workshop on Privacy in the Electronic Society, co-located with CCS 2021
SP - 103
EP - 118
BT - WPES 2021 - Proceedings of the 20th Workshop on Privacy in the Electronic Society, co-located with CCS 2021
PB - Association for Computing Machinery, Inc
Y2 - 15 November 2021
ER -