
EDPB delivers view on using personal data to train and deploy AI models

Published on 17th Jan 2025

EDPB's opinion provides guidance to organisations using personal data in AI model development and deployment


On 17 December 2024, the European Data Protection Board (EDPB) issued an opinion on processing personal data in the context of artificial intelligence (AI) models.

The opinion considers some fundamental aspects of data protection law, such as whether mathematical information counts as personal data and when legitimate interest is an appropriate legal basis for processing. It analyses how these questions play out in AI model training and use, and examines some data issues unique to AI systems. In particular, it considers the compliance consequences of an AI model that is trained using unlawfully processed personal data but whose outputs are anonymous.

Why the EDPB has given this opinion

The EDPB can issue an opinion at the request of a national data protection authority (supervisory authority (SA)) – in this case the Irish data protection regulator – where the request concerns an important area of data protection law relevant to an ongoing regulatory activity undertaken by that SA.

The opinion provides a framework for SAs throughout the European Economic Area (EEA) to assess specific AI cases involving any of the questions in the request.

The questions

The EDPB was asked to consider the following questions:

  1. When and how can an AI model be considered "anonymous", meaning that the model is not considered to contain personal data?
  2. How can controllers demonstrate the appropriateness of legitimate interest as a legal basis for processing during the development phase of an AI model (for example, during model training using personal data)?
  3. How can they demonstrate the same during the deployment phase of the AI model?
  4. If an AI model is created, updated or developed using unlawfully processed personal data, what is the impact on subsequent use of that model?

The term "AI model" is not defined in the EU Artificial Intelligence Act (AI Act), or in the EU General Data Protection Regulation (GDPR). For the purposes of the questions, the EDPB considered it to mean the core underlying AI model which results from a training process whereby the model learns how to perform its task by processing a set of training data; and is intended to be incorporated into a larger framework of one or more specific AI systems.

Question 1:  Whether an AI model contains personal data

A fundamental question for the application of the GDPR to AI models is whether the AI model itself consists of personal data. Debate on this issue was recently sparked by a discussion paper from the Hamburg Commissioner for Data Protection and Freedom of Information.

According to the EDPB, the answer requires a case-by-case assessment. Where an AI model has been trained on personal data, it is quite possible for the model itself to contain personal data, even where the model is not designed to provide personal data in response to queries. In other words, personal data may remain "absorbed" in the parameters of the model, represented through mathematical objects. With this in mind, the EDPB leaves room for a given AI model to be considered "anonymous".

The key question is whether information relating to the individuals whose personal data was used to train the model can be obtained from the model by any means reasonably likely to be used (applying the usual objective factors – such as cost, time and available technology – to assess how likely such an attempt is).
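
By way of illustration only – this sketch is not drawn from the opinion itself – one "means reasonably likely to be used" is a simple memorisation probe: prompt the model with the start of a record suspected to be in the training data and check whether it reproduces the rest. The sketch below assumes a causal language model accessible through the Hugging Face transformers library; the model name and the prompt are hypothetical placeholders.

```python
# Minimal memorisation probe - an illustrative sketch, not a test prescribed
# by the EDPB opinion. Assumes the Hugging Face "transformers" library; the
# model name and the prompt below are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM loadable via transformers
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prefix of a record suspected to appear in the training data (hypothetical).
prompt = "Contact details: John Smith, 12 Example Street,"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=False,  # greedy decoding makes verbatim regurgitation easier to spot
    pad_token_id=tokenizer.eos_token_id,
)
completion = tokenizer.decode(outputs[0], skip_special_tokens=True)

# If the completion reproduces the remainder of the original record verbatim,
# the model can plausibly be made to disclose personal data and is unlikely
# to be "anonymous" in the EDPB's sense.
print(completion)
```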

Questions 2 and 3: Whether the legitimate interest basis can be used

The opinion reiterates the key principles of the legitimate interest assessment, stressing that it must be applied case by case on the particular facts, but usefully translates them into the AI model context. For example, it emphasises:

  • The controller will need to look at the likely purposes for which the model may be used (including by downstream deployers) and consider whether these might be illegal, unwanted or otherwise unfair.
  • Developers and deployers must ensure that individuals whose personal data appears in the training data are given the ability to object to the AI processing.

Question 4: Consequences of unlawful processing during training

The EDPB considered the ramifications of several scenarios where the development of an AI model used personal data unlawfully. There was an unsurprising, but still welcome, clarification that, where the subsequent use of the model (either by the same organisation or a third party) involves only anonymised data, the GDPR would not apply to that subsequent use, and so there would be no GDPR enforcement action in respect of it.

In relation to the scenario where the unlawfully developed model is used by a different entity to the developer, the opinion sets out some of the factors SAs should consider when assessing the lawfulness of the subsequent processing, including:

  • The impact of corrective measures imposed on the initial developer (for instance, the effect of an SA ordering the developer to erase the personal data it used).
  • Whether the subsequent organisation conducted a lawfulness assessment in accordance with its accountability obligations as a data controller. In essence, did it make appropriate enquiries of the developer?

In this instance, data protection enforcement in respect of an AI model which has been developed by processing personal data unlawfully can hinge on whether the personal data was anonymised in the subsequent use of the model. While this may seem like an attractive work-around for some developers, achieving true anonymisation is often technically difficult in practice, and doing so effectively may compromise a model's commercial usefulness.

The EDPB makes clear that, where it is not possible to selectively delete the unlawfully used personal data, an SA could order the whole of a training data set to be erased.

Is this opinion binding on organisations?

Opinions of the EDPB are not directly binding in the EEA. However, this one is likely to be relied on by the Irish regulator and other European SAs, in which case its reasoning may well come to apply to organisations through enforcement decisions and case law.

It may also become relevant to UK and other non-EEA businesses that interact with EEA countries, for instance where a UK or US AI developer provides a service incorporating its model to a user based in France.

The UK's privacy regulator, the Information Commissioner's Office (ICO), is also giving high priority to AI. As data protection law remains very similar between the UK and the EU, the ICO may well take on board the EDPB's reasoning as it continues to consider AI issues.

Osborne Clarke comment

As AI and the use of data grow ever faster, data protection law increasingly intersects with AI. This opinion provides some insight into the EDPB's views on some of the many data protection issues that have come to the fore recently, particularly following the release of ChatGPT and other generative AI systems.

It is relevant for the many businesses that are keen to innovate and use AI, but find themselves grappling with the data protection issues involved.

AI businesses and their customers will welcome some clarity in this area of law. However, the opinion stops short of specific guidance and definite answers, meaning that legal assessments will continue to vary case by case. It also seems likely that SAs are not aligned on many points. The interaction between the GDPR and AI therefore still raises a host of unanswered questions, and this is a key area of focus for data protection regulators in 2025.

AI developers and deployers need to take great care in their use of personal data and be scrupulous about complying with the GDPR (and documenting that compliance) as well as with other regulatory regimes such as the AI Act.  Important measures include:

  • Assessing if personal data is contained in the respective AI model or processed when using the AI model.
  • Conducting a very diligent legitimate interest test, including consideration of the specific use cases (and impact on data subjects), the specific data being processed (volume and type), the expectations of data subjects, access rights during development and deployment, anonymisation or pseudonymisation, and transparency.

If you have any questions about legal issues arising with the use of data in AI, get in touch with one of our team below or your usual Osborne Clarke contact.

 


* This article is current as of the date of its publication and does not necessarily reflect the present state of the law or relevant regulation.
