Background
The NOYB – European Center for Digital Rights recently lodged a complaint with the Austrian data protection authority regarding ChatGPT's handling of personal data accuracy. The complaint stemmed from an incident involving a public figure who sought correction of inaccurate birth date information generated by ChatGPT. This raises pertinent questions about compliance with the General Data Protection Regulation (GDPR) and the responsibility of AI systems like ChatGPT in ensuring data accuracy.
The crux of the complaint revolves around ChatGPT's inability to provide accurate birth date information upon request from the complainant. Despite access to publicly available data about the individual, ChatGPT's algorithm failed to deduce the correct birth date, generating various inaccurate outputs. When approached for rectification, OpenAI maintained that its systems lacked the capability to prevent the dissemination of inaccurate data.
The incident underscores the challenges associated with AI-driven systems like ChatGPT in handling personal data accurately. Under the GDPR, individuals have the right to have inaccurate personal data rectified (Article 16). This right is fundamental to safeguarding individuals' privacy and ensuring the accuracy of information stored and processed by entities. ChatGPT's inability to rectify inaccuracies raises concerns about its compliance with GDPR provisions.
The Issue
In the present complaint, on being requested to erase the information, OpenAI, the controller, acknowledged having filters capable of blocking the display of personal data upon request. However, it emphasised the inherent challenge in selectively blocking the data subject's date of birth without inadvertently affecting the presentation of other relevant information about them by ChatGPT. This means that ChatGPT has no means of correcting false information, but can only hide it at the final output stage. The false information would still be present in the system, and only the presentation of it to a wider audience would be hidden.
In their own words, OpenAI has explained that ChatGPT’s functionality relies on processing extensive data sets. ChatGPT predicts the most probable succeeding word in response to a user input, and then repeats the process for each subsequent word. This mechanism closely resembles the auto-complete features found in search engines, smartphones, and email applications. ChatGPT is based on the Generative Pre-trained Transformer (GPT) architecture. The transformer uses a mechanism called ‘attention’, through which the model weighs the influence of every other word in the input on each word before generating output. Because the model predicts plausible continuations rather than retrieving verified facts, it can produce confident-sounding but false statements; this has been termed the ‘hallucination’ problem of ChatGPT.
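The attention step described above can be sketched, in a highly simplified form, as scaled dot-product self-attention. This is a minimal illustration, not OpenAI's actual implementation: the toy "word" embeddings are invented, and a real transformer additionally learns separate query, key, and value projection matrices.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability, then normalise to sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Scaled dot-product self-attention over a list of token vectors.

    For each token, score every token against it (dot product, scaled by
    sqrt(dimension)), turn the scores into weights, and output the
    attention-weighted mix of all token vectors.
    """
    d = len(embeddings[0])
    out = []
    for q in embeddings:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in embeddings]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, embeddings))
                    for i in range(d)])
    return out

# Three toy two-dimensional "word" embeddings (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = self_attention(tokens)
```

Each output vector is a weighted blend of all input vectors, which is how every word's representation comes to reflect the influence of the surrounding words.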
Further explaining the mechanism, OpenAI states that there is an element of randomness in the way ChatGPT responds: in most cases, the same question will be answered in different ways. Because the model samples from a probability distribution over possible next words rather than retrieving verified facts, it can output statistically plausible but factually incorrect words. The level of accuracy can therefore never be guaranteed, which is evident from the disclaimer ChatGPT provides on its platform: “ChatGPT can make mistakes. Consider checking important information”.
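The randomness OpenAI describes can be illustrated with a toy next-word sampler. The vocabulary and scores below are hypothetical, invented purely for illustration; they are not drawn from any real model.

```python
import math
import random

def sample_next_word(logits, temperature=1.0, rng=random):
    """Sample the next word from hypothetical model scores ("logits").

    Higher temperature flattens the distribution, so less likely (possibly
    incorrect) words are chosen more often; temperature near zero approaches
    always picking the single most likely word.
    """
    words = list(logits)
    scaled = [logits[w] / temperature for w in words]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(words, weights=probs, k=1)[0]

# Hypothetical scores for the word following "Her birthday is in ..."
logits = {"March": 2.0, "July": 1.5, "1979": 0.5}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_next_word(logits, temperature=1.0, rng=rng) for _ in range(20)]
```

Repeated sampling yields different answers to the same prompt, and an incorrect word can be drawn whenever it carries non-zero probability, which is the mechanism behind OpenAI's statement that accuracy cannot be guaranteed.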
In the Help section of its website, OpenAI mentions that individuals “may” have the right to access, correct, restrict, delete, or transfer their personal information that may be included in its training data. OpenAI thus frames the right to erasure as a permissive provision, subject to discretion rather than an imperative command.
Maartje de Graaf, lawyer for NOYB, has stated that false information about individuals poses inherent risks. She added that technology which cannot generate accurate information about individuals should not be used to generate individual-specific data. One important point she made was that technology must adhere to legal standards rather than expecting regulations to accommodate its functionalities.
Personal Data and ChatGPT
The complainant has cited two breaches of the GDPR: first, Articles 12(3) and 15, owing to OpenAI's failure to provide any information in response to the complainant's access request; second, Article 5(1)(d), attributed to the controller's alleged inability to ensure accurate processing of personal data. The complainant has requested the Austrian data protection authority to investigate the data processing practices of ChatGPT, and has also requested the imposition of a fine for non-compliance. Under the GDPR, penalties for non-compliance can reach up to 4% of annual worldwide turnover.
In March 2023, the Italian data protection regulator ordered OpenAI to stop using the personal data of Italians to train its models, stating that OpenAI had no legal basis for using people's personal information in this way. In response to this restrictive order, OpenAI blocked access to the chatbot in Italy.
NOYB has explained that inaccuracy is tolerable to a certain extent in some contexts. For example, a student using ChatGPT for their homework can tolerate a degree of inaccuracy. However, as mandated under the GDPR, personal data must be accurate.
Concluding Remarks
As discussed in our previous article, every innovation has to operate within a regulatory and legal structure. While the balance between regulation and innovation is difficult to strike, every new technology is required to follow existing laws.
The present incident is a reminder that while AI technologies like ChatGPT hold immense potential for innovation and efficiency, they must operate within the bounds of legal frameworks and ethical standards. Moving forward, it is imperative for stakeholders, including AI developers, regulatory bodies, and advocacy groups, to collaborate closely to ensure that AI systems uphold privacy rights, maintain data accuracy, and comply with legal requirements, thereby fostering trust and accountability in the digital age.