ICO updates its views on using personal data in generative AI in the UK
Published on 31st Jan 2025
Consultation response provides interim guidance to organisations using personal data in training and deploying generative AI models, raising many issues to consider
In December 2024, the Information Commissioner's Office (ICO) published its response to the results of its extensive consultation on the application of the UK General Data Protection Regulation (GDPR) to generative artificial intelligence (GenAI).
The consultation, initiated in January 2024, took the form of a five-part series covering the following areas:
- the lawful basis for web scraping to train generative AI models;
- purpose limitation in the generative AI lifecycle;
- the accuracy of training data and model outputs;
- engineering individual rights into generative AI models; and
- allocating controllership across the generative AI supply chain.
In its response, the ICO has not changed its position on purpose limitation, accuracy and controllership. On the areas of lawful basis and individuals' rights, it has largely maintained its original views, but modified them to a degree.
Legitimate interest
For web scraping, the ICO maintains its original view that legitimate interest is likely to be the only available lawful basis for collecting data to train AI models, but satisfying it may be a difficult hurdle for developers to clear. To rely on the legitimate interest basis, developers must meet the usual three tests: purpose, necessity and balancing. Key points include:
- When assessing the purpose test, clear and specific legitimate interest purposes must be identified, even though this may be difficult for models with a disparate array of downstream uses. Developers cannot assume that purposes such as "general societal interests" or "innovation" will be good enough: any benefits must be detailed, and of sufficient weight to pass the balancing test. Developers must also institute controls to ensure that these broad interests are actually achieved in practice, for example by including appropriate restrictions in their terms and conditions with external users, and enforcing them.
- In some cases, it may be possible for GenAI models to be trained on smaller, licensed data sets (rather than those obtained from wide-scale web scraping). Where this is the case, it could be difficult for developers to demonstrate that web scraping meets the necessity test.
- The balancing test can also be a challenge. The ICO warns that web scraping poses particular risks because most data subjects do not know that it takes place, making it difficult for them to exercise their rights effectively, for example by objecting to processing. The ICO is concerned that the consultation did not identify any ways to mitigate these risks. The fact that web scraping may be common practice for training AI models does not mean it is well known to individuals.
The response stresses the importance of transparency. Without the provision of information to data subjects, it says, the legitimate interest balancing test is difficult to meet, because individuals cannot know the purposes for which their data is being processed, or even that it is being processed at all, and so cannot make informed decisions about whether to exercise their rights in relation to it. Transparency failings may also put developers in breach of their GDPR Article 14 obligations.
The ICO is unhappy with current levels of transparency. It wants developers to improve significantly, ensuring that they provide explicit, clear and accessible information about web scraping and the AI model purposes it serves. If they do not, this could become an area of regulatory enforcement action.
Exercise of individual rights
Regarding the exercise of individuals' rights as data subjects, the ICO retains its original view, but has updated it, making the following points:
- It is concerned that GenAI model developers are not meeting their obligations and are often ignoring data subject requests.
- Developers must comply with the principle of embedding data protection into their systems by design and by default. In practice, however, developers too often do not build their systems in a way that allows the controller(s) to respond to data access requests. They should design systems which allow specific individuals' personal data to be identified at all stages, including within the training data set and within the AI model itself.
- Developers must not assume that they can avoid their obligations by relying on GDPR Article 11 (processing which does not require identification), that is, by taking the view that their AI models are "anonymous", unless they can demonstrate that individuals cannot be identified.
Views unchanged
The ICO has not changed its original position in the remaining areas of purpose limitation, accuracy and allocating responsibility. The response helpfully summarises its overall position, including:
- On purpose limitation, if developers re-use personal data obtained for other purposes, they must assess whether their new purpose (of training a GenAI model) is compatible with the original purpose. If it is not, they will need to identify a new purpose. The ICO also reminded developers that there will often be different purposes for different aspects of the development process, for example developing an AI model versus developing an application based on that model.
- On accuracy, the ICO points out that it is for developers to ensure that the degree of statistical accuracy of their model is proportionate to its final intended application (including by appropriately curating training data). They should understand the accuracy of their model training data and be transparent about it with downstream deployers and other users.
- On controllership, the ICO acknowledges that correctly determining who is controller, processor or joint controller at each stage of the AI development lifecycle is complex. It disagrees with some positions put forward by industry players. For instance, the ICO thinks it unlikely that developers will be processors in respect of the downstream deployment of systems based on their models, because the decisions made by the developer during model training will heavily influence how personal data is processed when the system is in use.
How does this relate to the EU position?
Businesses subject to the EU GDPR (whether because they are based in the EEA, or because they are caught by its extra-territorial reach provisions) will also need to take account of the European Data Protection Board's (EDPB's) recently issued opinion on the processing of personal data in the context of AI models. (See our Insight for more on the EDPB opinion.)
The ICO's position chimes with the EDPB opinion. For instance, both consider legitimate interest to be the main available lawful basis for GenAI processing, both recognise that relying on it poses challenges, and both accept that trained models may themselves incorporate personal data.
The ICO's consultation and response are more in-depth, covering a range of issues not addressed by the EDPB's opinion. Conversely, the EDPB's opinion considers a question the ICO does not: whether a downstream deployer of a compliant AI system can inherit liability stemming from an underlying model that was developed using personal data unlawfully.
What next?
After changes to data protection law are made by the Data (Use and Access) Bill (once finalised), the ICO intends to formally update its guidance to take account of GenAI, and to align this with its upcoming joint statement with the Competition and Markets Authority on AI foundation models.
The release of the AI Opportunities Action Plan shows that the government sees AI as a key priority area for the UK. The plan emphasises that data is a crucial part of the AI landscape and notes the importance of regulators in facilitating the growth and uptake of AI. Likewise, the ICO's recent letter to the government emphasises the importance of providing regulatory certainty for AI developers: the ICO says that it will create a single set of rules for AI developers and users, and that it would support legislation turning these rules into a statutory code of practice on AI. This impetus could help the Data (Use and Access) Bill progress relatively swiftly through Parliament, allowing the ICO to update its AI guidance promptly.
While the formal consultation has closed, the ICO has said that it intends to engage further with relevant stakeholders. This may result in further substantive adjustments to the ICO's position when it does finally issue updated guidance.
What does it mean for business?
The ICO's views are important not only for AI model developers, but also for others in the AI chain, irrespective of whether they develop, commission or incorporate GenAI models. Organisations need to consider the consultation and response alongside the existing core guidance on AI and data protection, paying particular attention to the following:
- Giving individuals as much (readily accessible) transparency information as possible.
- Providing tools to allow individuals to effectively exercise their rights as data subjects.
- Making sure that models and systems are designed with data privacy compliance built in from the start, for example by allowing personal data about particular individuals to be pinpointed and by allowing data subjects to object to processing.
- Conducting a thorough, nuanced analysis of controller/processor/joint controller relationships, documenting the basis of the position settled on, and making sure that contracts reflect this.
If you have any questions about legal issues arising with the use of data in AI, get in touch with one of our team below or your usual Osborne Clarke contact.