The Spanish Data Protection Agency (AEPD) has recently published guidance on the importance of ensuring that the data used to train an AI system is accurate.
Adequate safeguards should be implemented by design to prevent inaccuracies in input data and to mitigate the impact of any inaccuracies that do occur.
AI and the accuracy principle
As discussed in our previous post, AI systems generally rely on vast amounts of data, including personal data. This means that processing personal data in AI systems requires compliance with GDPR requirements, including the accuracy principle (Article 5(1)(d) GDPR).
According to the AEPD, inaccurate input data can compromise the output generated by the algorithm, leading to potential biases or errors. To address this, appropriate measures must be taken to ensure the accuracy of the data used in AI systems and to minimise the impact of potential inaccuracies.
Following a privacy by design approach is essential, and the effectiveness of the AI system must be reviewed, with the system updated where necessary.
Key considerations
The AEPD sets out a number of key considerations regarding the implementation of the accuracy principle in AI, including:
- Entire processing cycle – The accuracy principle should be applied throughout the entire processing cycle, including input, output, and intermediate data.
- Precise definition of input data – Each input data point must be precisely defined during the design phase. The range of permitted values, such as “yes or no” or “1 to 100”, should be adjusted to the specific context (an illustrative sketch follows this list).
- Assessment of input data – The impact of each input data point on the generated outcome should be assessed "by design" for each specific purpose.
- Input data collection – When input data is collected directly from data subjects, they must be made aware of how their responses may affect the output generated. When collecting data from other sources, it is important to consider that the data may be modified between its collection and the execution of the algorithm.
- Implementing adequate safeguards – AI systems and algorithms can make mistakes (just like humans!). Adequate measures must therefore be implemented, especially where the outcome significantly affects or has legal implications for the data subject. These measures must be reviewed and updated where necessary.
- Erasure and rectification – The AEPD reminds organisations of their obligation to erase or rectify inaccurate personal data without delay. This obligation applies to each specific purpose and throughout the entire processing cycle.
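To illustrate the "precise definition of input data" point in practice, the minimal Python sketch below shows one possible way to fix the permitted values of each input field at design time and flag out-of-range records before they reach a model. The schema, field names, and credit-scoring scenario are hypothetical examples, not part of the AEPD guidance.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical illustration only: a design-time schema that defines the
# permitted range of each input field, so inaccurate values can be
# flagged before they influence the algorithm's output.

@dataclass(frozen=True)
class FieldSpec:
    """Design-time definition of one input field."""
    name: str
    allowed: Any  # either a set of permitted values or a (min, max) range

    def validate(self, value: Any) -> bool:
        if isinstance(self.allowed, set):
            return value in self.allowed
        lo, hi = self.allowed
        return lo <= value <= hi


# Example schema for a hypothetical questionnaire-style system,
# mirroring the "yes or no" and "1 to 100" ranges mentioned above.
INPUT_SCHEMA = [
    FieldSpec("has_existing_loan", {"yes", "no"}),
    FieldSpec("self_reported_income_score", (1, 100)),
]


def check_record(record: dict) -> list[str]:
    """Return a list of accuracy issues found in one input record."""
    issues = []
    for spec in INPUT_SCHEMA:
        if spec.name not in record:
            issues.append(f"missing field: {spec.name}")
        elif not spec.validate(record[spec.name]):
            issues.append(f"out-of-range value for {spec.name}: {record[spec.name]!r}")
    return issues


if __name__ == "__main__":
    # A record with an out-of-range score is flagged before it can
    # affect the generated outcome.
    problems = check_record({"has_existing_loan": "yes",
                             "self_reported_income_score": 250})
    print(problems)  # ['out-of-range value for self_reported_income_score: 250']
```

In a design like this, flagged records could be routed back to the data subject or the upstream source for rectification, which also supports the erasure and rectification obligation noted above.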
ChatGPT and data accuracy
This is not the first time a European data protection authority has addressed the importance of data accuracy in AI systems.
The Italian Garante previously raised concerns about ChatGPT’s compliance with the accuracy principle when it temporarily banned ChatGPT in Italy at the beginning of April.
AI Act
The EU is in the process of adopting the first ever rules regulating AI. The proposed AI Act follows a risk-based approach and imposes obligations on providers and users based on the level of risk associated with the AI system.
The proposed AI Act also highlights the importance of data accuracy. For instance, high-risk AI systems should meet appropriate levels of accuracy, robustness, and cybersecurity throughout their lifecycle. Users would also need to be informed of the system's level of accuracy.
Looking ahead
Companies need to design their AI systems to ensure compliance with the accuracy principle and other GDPR requirements when processing personal data.
Data accuracy will be required not just for GDPR compliance, but also under the accuracy requirements introduced by the upcoming AI Act.
Stay tuned for any updates!