Generative AI and Personal Information: Legal Obligations and Practical Compliance

Intended readers: legal, compliance, and information security teams at companies using generative AI; AI product developers working to comply with personal data protection laws.

As generative AI tools become embedded in business workflows, organizations increasingly face the question of what happens to the personal information — customer data, employee records, contract details — that flows into these systems.

This article identifies three key scenarios where generative AI and personal data intersect, and outlines what organizations need to do under Japan’s Act on the Protection of Personal Information (APPI) and the EU’s General Data Protection Regulation (GDPR).

Three Scenarios Where Generative AI Meets Personal Data

1. Personal Data in Training Sets

When an organization trains or fine-tunes an AI model using data that includes personal information, data protection law applies. Using customer data collected for service delivery as training data for a new AI product is a change of purpose that typically requires disclosure or consent.

2. Personal Data Entered as Prompts

When employees use SaaS AI tools (ChatGPT, Claude, etc.) in their daily work, they may paste customer names, contact details, contract terms, or health information into a prompt. That input is transmitted to a third-party server, and the transfer can trigger obligations under data protection law, depending on the jurisdiction and the contract with the provider.

3. Personal Data in AI Outputs

AI models trained on data that includes personal information may reproduce that information in their outputs — a phenomenon known as memorization. A specific individual’s details could appear in a response to an unrelated query, constituting an unintended disclosure.

Japan: The Act on the Protection of Personal Information (APPI)

Defining Personal Information

Under the APPI, “personal information” means information relating to a living individual that can identify a specific person. This includes information that can be easily collated with other information to enable identification, as well as information containing an individual identification code (Article 2).[1]

In the AI context, the following types of data are likely to qualify:

Names, addresses, phone numbers, email addresses
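Facial images and voice recordings (biometric feature data derived from them, such as face-recognition templates or voiceprints, can qualify as individual identification codes)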
Purchase history, browsing history (may identify an individual when combined with other data)
Medical and health information (subject to stricter rules as “special care-required personal information”)

Prohibition on Use Beyond Specified Purpose

Personal information handling businesses must not, without the individual’s prior consent, handle personal information beyond the scope necessary to achieve the specified purpose of use (Article 18).[1]

Using personal data collected for customer service purposes to train an AI model constitutes a new purpose. The organization must either obtain consent for this new purpose or have disclosed it as an intended use at the time of collection.

Restrictions on Third-Party Disclosure

Providing personal information to a third party generally requires the individual’s consent (Article 27).[1]

Sending personal data to an external AI API may constitute third-party provision. Whether exceptions for outsourcing or joint use apply depends on the contractual structure and actual arrangements — legal review is advisable.

Stricter Rules for Special Care-Required Personal Information

Medical and health information, race, creed, and criminal history are among the categories classified as “special care-required personal information.” As a rule, a business must obtain the individual’s consent before acquiring such information (Article 20, Paragraph 2).[1]

When AI systems in healthcare, HR, or similar domains process this category of data, requirements are materially stricter than for ordinary personal information.

EU: The General Data Protection Regulation (GDPR)

The GDPR may apply to organizations outside the EU, including Japanese companies, that offer goods or services to individuals in the EU or monitor their behavior there (Article 3).[2]

Lawful Basis for Processing

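Processing personal data requires at least one lawful basis under Article 6 of the GDPR. Article 6 lists six bases; the four most relevant to commercial AI use are:

Basis	Description
Consent	The individual has given consent that is freely given, specific, informed, and unambiguous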
Performance of a contract	Processing is necessary for a contract with the individual
Legal obligation	Processing is required by law
Legitimate interests	Necessary for the legitimate interests of the controller or a third party, unless overridden by the individual’s rights

When relying on legitimate interests for AI data processing, organizations must conduct and document a balancing test weighing those interests against the rights and interests of the individuals concerned.

Restrictions on Automated Decision-Making

Article 22 of the GDPR gives individuals the right not to be subject to a decision based solely on automated processing, including profiling, that produces legal effects concerning them or similarly significantly affects them.[2]

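AI-driven hiring screenings, loan decisions, and insurance assessments can fall into this category when no human is meaningfully involved in the outcome. In these contexts, building a human review process (Human-in-the-Loop) into the decision workflow is a core governance measure, and the reviewer needs real authority to change the result.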
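
As an illustration of where such a review step might sit in a workflow, the following is a minimal sketch, not a compliance mechanism. The names `ModelDecision`, `Route`, `SIGNIFICANT_USE_CASES`, and `route_decision` are hypothetical, and which use cases count as producing legal or similarly significant effects is a legal judgment that a lookup table cannot settle.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_FINALIZE = "auto_finalize"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class ModelDecision:
    subject_id: str       # internal identifier of the affected individual
    use_case: str         # e.g. "hiring", "credit", "marketing"
    recommendation: str   # what the model suggests, e.g. "reject"


# Assumption: this list is maintained by legal/compliance, not by engineering.
SIGNIFICANT_USE_CASES = frozenset({"hiring", "credit", "insurance"})


def route_decision(decision: ModelDecision,
                   significant_use_cases: frozenset = SIGNIFICANT_USE_CASES) -> Route:
    """Send model recommendations in significant use cases to a human reviewer."""
    if decision.use_case in significant_use_cases:
        return Route.HUMAN_REVIEW
    return Route.AUTO_FINALIZE


hiring = ModelDecision(subject_id="a-1", use_case="hiring", recommendation="reject")
marketing = ModelDecision(subject_id="a-2", use_case="marketing", recommendation="send_offer")
```

In this sketch the model output is only a recommendation for the listed use cases; nothing is final until a reviewer acts on it.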

Data Protection Impact Assessment (DPIA)

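A Data Protection Impact Assessment is mandatory before processing that is likely to result in a high risk to individuals’ rights and freedoms, such as large-scale processing of special categories of data or systematic and extensive evaluation of individuals based on automated processing, including profiling (Article 35).[2]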

Organizations should assess whether a DPIA is required before deploying a new AI system.

Practical Steps for Organizations

1. Review AI Service Terms Before Use

Before using an external AI service for work that involves personal data, check:

Whether input data is used to train or improve the model
Data retention period and storage location (country or region)
Availability of a Data Processing Agreement (DPA)
Whether an enterprise plan with data processing restrictions is available

Many AI service providers offer enterprise contracts that prohibit using customer input for model training. Verifying these terms before deploying AI tools in personal data workflows is essential.[3]

2. Establish Prompt Input Guidelines

Define what employees may and may not include in prompts when using AI tools. The guidelines should cover:

Which categories of personal information are prohibited or restricted in prompts
Rules for anonymization or pseudonymization before AI input (replacing identifying details)
Specific handling rules for special care-required personal information (medical data, etc.)
Situations where customer disclosure or consent is required before using AI tools
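
The pseudonymization rule can be supported by tooling that replaces identifiers before a prompt leaves the organization. The following is a minimal sketch under stated assumptions: the function name `pseudonymize_prompt`, the placeholder format, and the two regular expressions are illustrative, the phone pattern only covers hyphenated Japanese-style numbers, and names are matched from a caller-supplied list, not detected automatically.

```python
import re
from collections import defaultdict

# Illustrative patterns only; production use needs broader, locale-aware detection.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b0\d{1,4}-\d{1,4}-\d{4}\b")


def pseudonymize_prompt(text, known_names):
    """Replace emails, phone numbers, and listed names with placeholders.

    Returns the rewritten text and a placeholder -> original mapping,
    which should stay inside the organization.
    """
    placeholder_for = {}          # original value -> placeholder
    counters = defaultdict(int)   # per-label running number

    def substitute(label, match):
        original = match.group(0)
        if original not in placeholder_for:
            counters[label] += 1
            placeholder_for[original] = f"[{label}_{counters[label]}]"
        return placeholder_for[original]

    text = EMAIL_RE.sub(lambda m: substitute("EMAIL", m), text)
    text = PHONE_RE.sub(lambda m: substitute("PHONE", m), text)
    # Longest names first so a full name is replaced before any shorter overlap.
    for name in sorted(known_names, key=len, reverse=True):
        text = re.sub(re.escape(name), lambda m: substitute("NAME", m), text)

    mapping = {placeholder: original for original, placeholder in placeholder_for.items()}
    return text, mapping


safe_text, mapping = pseudonymize_prompt(
    "Contact Taro Yamada at taro@example.com or 03-1234-5678.",
    known_names=["Taro Yamada"],
)
```

Because the mapping allows re-identification, the result is pseudonymized, not anonymized, and may still count as personal information for the organization that holds the mapping.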

3. Manage Personal Data in Training Sets

For organizations developing or fine-tuning their own models:

Confirm purpose specification and consent where required
Apply privacy-preserving techniques (differential privacy, anonymization)
Minimize data: use only what is necessary for the intended purpose
Implement access controls and audit logging for training data
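
The minimization and audit-logging items above can be combined in a single preparation step. This is a sketch under assumptions, not a reference pipeline: the record schema (`subject_id`, `name`, `ticket_text`, `product_category`), the `ALLOWED_FIELDS` set, and the function `prepare_training_records` are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical schema: only these fields are needed for the training purpose.
ALLOWED_FIELDS = frozenset({"ticket_text", "product_category"})


def prepare_training_records(records, allowed_fields, excluded_subject_ids, actor):
    """Drop excluded individuals and unneeded fields; return data plus an audit entry."""
    prepared = []
    excluded = 0
    for record in records:
        if record["subject_id"] in excluded_subject_ids:
            excluded += 1  # e.g. consent withdrawn or never given for this purpose
            continue
        prepared.append({k: v for k, v in record.items() if k in allowed_fields})

    audit_entry = {
        "actor": actor,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "records_in": len(records),
        "records_out": len(prepared),
        "records_excluded": excluded,
        "fields_kept": sorted(allowed_fields),
    }
    return prepared, audit_entry


records = [
    {"subject_id": "c-001", "name": "A. Example",
     "ticket_text": "Printer will not start", "product_category": "printer"},
    {"subject_id": "c-002", "name": "B. Example",
     "ticket_text": "Refund request", "product_category": "billing"},
]
prepared, audit_entry = prepare_training_records(
    records, ALLOWED_FIELDS, excluded_subject_ids={"c-002"}, actor="ml-pipeline"
)
```

Dropping structured fields does not clean free text: a field such as `ticket_text` can still contain names or contact details and would need separate redaction.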

4. Respond to Data Subject Rights

Both the APPI and GDPR recognize rights that individuals can exercise over their personal data:

Right	Description	AI Implementation Consideration
Disclosure	Right to know what personal data is held	Identify and provide the individual’s data used in or by AI systems
Correction/Deletion	Right to correct or delete inaccurate data	Deletion from trained models is technically difficult; preventing inclusion is the more practical control
Suspension of use	Right to stop processing for a specific purpose	Build a procedure for stopping processing when consent is withdrawn

Deletion of personal information from a trained model is technically challenging. Research into “machine unlearning” is ongoing, but preventing personal data from entering training sets in the first place remains the most practical approach.

5. Incident Response Planning

Prepare a response procedure for personal data breaches.

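Under the APPI, certain types of data breaches must be reported to the Personal Information Protection Commission and notified to affected individuals (Article 26).[1]
Under the GDPR, a personal data breach must be reported to the supervisory authority without undue delay and, where feasible, within 72 hours of the controller becoming aware of it, unless the breach is unlikely to result in a risk to individuals’ rights and freedoms (Article 33).[2]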
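
An incident runbook can make the GDPR window concrete by computing the deadline from the moment the organization became aware of the breach. The sketch below assumes that this moment is recorded as a timezone-aware timestamp; the function names are hypothetical, and the calculation does not decide whether notification is required at all.

```python
from datetime import datetime, timedelta, timezone

GDPR_NOTIFICATION_WINDOW = timedelta(hours=72)


def gdpr_notification_deadline(became_aware_at: datetime) -> datetime:
    """Latest time to notify the supervisory authority under the 72-hour window."""
    if became_aware_at.tzinfo is None:
        raise ValueError("became_aware_at must be timezone-aware")
    return became_aware_at + GDPR_NOTIFICATION_WINDOW


def hours_remaining(became_aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline; negative once it has passed."""
    remaining = gdpr_notification_deadline(became_aware_at) - now
    return remaining.total_seconds() / 3600


aware = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
deadline = gdpr_notification_deadline(aware)
left = hours_remaining(aware, aware + timedelta(hours=24))
```

Rejecting naive timestamps avoids silent errors when the incident team and the affected systems sit in different time zones.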

Summary

The intersection of generative AI and personal information creates obligations in three scenarios: training data, prompt input, and AI output. Under the APPI, the key issues are use beyond the specified purpose and third-party disclosure restrictions. Under the GDPR, the key issues are establishing a lawful basis for processing and complying with restrictions on automated decision-making.

Practical priorities include reviewing AI service terms, establishing prompt input guidelines for employees, managing personal data in training sets, and building procedures for responding to data subject rights requests. Addressing these incrementally while staying current with regulatory developments is a realistic path forward.

For specific legal judgments, consult a lawyer or privacy specialist with data protection expertise.

References

[1] Personal Information Protection Commission, Act on the Protection of Personal Information, Articles 2, 18, 20, 26, 27
[2] European Union, Regulation (EU) 2016/679 (General Data Protection Regulation), Articles 3, 6, 22, 33, 35
[3] Personal Information Protection Commission, Notice regarding generative AI services, June 2, 2023
