TY - GEN
T1 - Measuring the Privacy Dimension of Free Content Websites through Automated Privacy Policy Analysis and Annotation
AU - Alabduljabbar, Abdulrahman
AU - Mohaisen, David
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/8/16
Y1 - 2022/8/16
N2 - Websites that provide books, music, movies, and other media free of charge are a central piece of the web ecosystem, yet they remain largely unexplored, especially with respect to their security and privacy risks. In this paper, we contribute to the understanding of those websites through a comparative analysis of their privacy policies, a primary channel through which service providers inform users about their data collection and use. To better understand the data usage risks associated with such websites, we study 1,562 websites and their privacy policies in contrast to premium websites. We find that premium websites are more transparent in reporting their privacy practices, particularly in categories such as "Data Retention" and "Do Not Track", where premium websites are 85.00% and ≈ 70% more likely, respectively, to report their practices than free content websites. We also find that free content websites' privacy policies are more generic and more similar to one another than those of premium websites. Our findings raise several concerns, including that the reported privacy policies may not reflect the data collection practices actually used by service providers, and that there are pronounced biases across privacy policy categories. This calls for further investigation, through active measurements, of the risks associated with using such free content websites and services.
KW - Free Content Websites
KW - Natural Language Processing
KW - Privacy Policy
KW - Web Security
UR - http://www.scopus.com/inward/record.url?scp=85134517121&partnerID=8YFLogxK
U2 - 10.1145/3487553.3524663
DO - 10.1145/3487553.3524663
M3 - Conference contribution
AN - SCOPUS:85134517121
T3 - WWW 2022 - Companion Proceedings of the Web Conference 2022
SP - 860
EP - 867
BT - WWW 2022 - Companion Proceedings of the Web Conference 2022
PB - Association for Computing Machinery, Inc
T2 - 31st Companion of the World Wide Web Conference, WWW 2022
Y2 - 25 April 2022
ER -