As AI becomes a more significant part of the technological landscape, organizations require ever more data to train their models. Many tools, such as chatbots, are built on Large Language Models (LLMs), which require immense text corpora for training. The question then arises: where can organizations find large amounts of text to train their models? Some have looked to their own users as a source of large quantities of quality text data.
Meta Platforms, Inc. finds itself among these organizations. Meta will use users’ Instagram and Facebook posts to supply text data for training its models. Though this may seem benign now, it reflects a worrying trend of organizations using their users’ data to train AI models. The practice poses privacy risks, as trained models can inadvertently memorize and reproduce pieces of their training data, leaking it to other users.
Meta has stated that it will use posts dating as far back as 2007, along with future posts made on its platforms, to train its AI tools. By reaching back to past posts, Meta removes users’ ability to boycott the platforms to keep their data out of AI training: the posts have already been made. Raising the privacy risk further, the ability to opt out is legally guaranteed only to users living in countries or states whose privacy legislation mandates such an opt-out, such as European Union residents under the GDPR. Everyone else has no choice in the matter, and their past and future posts will be used to train AI models.
This issue poses a significant risk to users’ data privacy. It is therefore imperative that users stay up to date on the terms of service of the platforms they use, so they can confirm their data is being handled in a way they deem acceptable.