Copyright and Risks of AI-Generated Code: OSS Licenses, Copyleft, and Enterprise Best Practices

Intended audience: developers and engineers using generative AI for code generation in business, corporate legal and IP staff, and team leads involved in OSS usage

As AI-assisted coding (GitHub Copilot, Amazon Q Developer, and others) becomes widespread, enterprises need to understand the legal risks spanning copyright law, OSS licenses, and patent law before adopting these tools [1][2][3]. This article organizes the risks specific to AI-generated code and the measures enterprises should take.

Copyright Ownership of AI-Generated Code

Does AI-Generated Code Have Copyright?

Under Japanese copyright law, a work is defined as “a creative expression of thoughts or feelings,” and human creative authorship is a prerequisite [1]. Code autonomously generated by AI without any human creative contribution is generally considered unlikely to qualify for copyright protection.

In practice, however, it is rare for AI to generate code with complete autonomy. Humans participate in the creative process in the following ways:

- Designing and iteratively refining prompts
- Selecting the best option from multiple generated candidates
- Modifying, refactoring, and combining generated code

The prevailing interpretation is that the portions reflecting human judgment, editing, and creative selection may give rise to copyright [1].

Who “Owns” the Code?

Party	Nature of Rights
AI tool vendor	Output use conditions, indemnity, training-data use, and log handling depend on the service’s terms and administrative controls [2][3]
User (individual or enterprise)	Portions with human creative contribution may be protectable, while portions generated autonomously by AI are unlikely to be protected [1]
Copyright holders of training data	May have grounds to assert rights if generated code is derived from or substantially similar to the training data [1]

Tool-selection premise: Services such as GitHub Copilot and Amazon Q Developer differ in features, administrative controls, and treatment of output [2][3]. Rights, indemnity, code referencing or duplicate detection, and handling of input data must be checked against the official documentation and contract terms in effect at adoption time.

OSS License Issues: Open-Source Code as Training Data

The Nature of the Problem

When using generative AI coding tools that may learn from or reference public code, teams need to verify which license conditions may affect inputs, outputs, and candidate code. Public repositories include code under permissive licenses such as MIT and Apache 2.0, but also code under copyleft licenses such as GPL, LGPL, and AGPL [4].

Training Data
├── MIT-licensed code (permissive: free to reuse and use commercially)
├── Apache 2.0-licensed code (permissive: includes patent clause)
├── GPL code (copyleft: same license propagates to derivatives)
├── LGPL code (weak copyleft: conditions relaxed for dynamic linking)
└── AGPL code (strong copyleft: also applies to network-served software)
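
The license categories in the tree above can be encoded so that later checks treat them consistently. The following is a minimal sketch, not a complete license database: it assumes licenses are reported as SPDX identifiers (for example `GPL-3.0-only`), and the names `LicenseFamily` and `classify_license` are hypothetical helpers introduced here for illustration.

```python
from enum import Enum


class LicenseFamily(Enum):
    PERMISSIVE = "permissive"
    WEAK_COPYLEFT = "weak copyleft"      # LGPL
    COPYLEFT = "copyleft"                # GPL
    STRONG_COPYLEFT = "strong copyleft"  # AGPL
    UNKNOWN = "unknown"


# SPDX identifier prefixes for the five licenses named in the tree above.
# Assumption: anything not listed is reported as UNKNOWN rather than guessed.
_FAMILY_BY_PREFIX = [
    ("MIT", LicenseFamily.PERMISSIVE),
    ("Apache-2.0", LicenseFamily.PERMISSIVE),
    ("LGPL-", LicenseFamily.WEAK_COPYLEFT),
    ("AGPL-", LicenseFamily.STRONG_COPYLEFT),
    ("GPL-", LicenseFamily.COPYLEFT),
]


def classify_license(spdx_id: str) -> LicenseFamily:
    """Map an SPDX identifier to the license family used in this article."""
    for prefix, family in _FAMILY_BY_PREFIX:
        if spdx_id.startswith(prefix):
            return family
    return LicenseFamily.UNKNOWN


def needs_legal_review(spdx_id: str) -> bool:
    """Flag everything that is not clearly permissive (a conservative assumption)."""
    return classify_license(spdx_id) is not LicenseFamily.PERMISSIVE
```

Treating unknown licenses as requiring review is a design choice made for this sketch; a team may prefer a different default.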

The Copyleft “Propagation” Risk

The core condition of the GPL is that when software incorporating GPL-licensed code is distributed, the corresponding source code must be made available under the GPL's terms [4].

Risk scenario with AI-generated code:

1. AI outputs code substantially similar to GPL code
2. A developer incorporates that code into a proprietary product
3. The GPL conditions “propagate,” potentially requiring release of the entire product’s source code

This “copyleft contamination” is a risk that directly impacts enterprise intellectual property strategy [4].

Assessing Substantial Similarity

A finding of copyright infringement requires both reliance on the existing work (that is, the later work was created with access to the earlier one) and substantial similarity. For code, similarity is evaluated on two dimensions:

- Literal similarity: Code copied verbatim or near-verbatim
- Non-literal similarity: Similarity in code structure, algorithms, or program flow

Note that ideas and algorithms themselves are not protected by copyright (the idea-expression dichotomy). Code that independently implements the same algorithm with a different expression generally does not constitute copyright infringement [1].
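
Literal similarity can be screened mechanically before a human or legal review. The following is a minimal sketch of one common heuristic, the Jaccard overlap of token shingles; it is not a legal test of substantial similarity, it does not address non-literal similarity, and the function names and the shingle size of 5 are assumptions chosen for illustration.

```python
import re


def tokenize(code: str) -> list[str]:
    """Split code into identifiers, numbers, and single punctuation marks.

    Whitespace is discarded, so reformatting alone does not change the result.
    Comments are not stripped in this sketch.
    """
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)


def shingles(tokens: list[str], size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of all runs of `size` consecutive tokens."""
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def literal_similarity(candidate: str, reference: str, size: int = 5) -> float:
    """Jaccard overlap of token shingles, between 0.0 and 1.0."""
    a = shingles(tokenize(candidate), size)
    b = shingles(tokenize(reference), size)
    if not a or not b:
        return 0.0  # too short to compare at this shingle size
    return len(a & b) / len(a | b)
```

A high score only indicates that a snippet deserves closer inspection against the reference; a low score does not rule out non-literal similarity.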

Patent Rights: Implementation Risk of Algorithms

Separately from copyright, code generated by AI may infringe existing patents covering the technology or process it implements [5].

- Code that practices a patented invention can infringe that patent even if it was generated by AI [5]
- Whether the generation process was automated does not affect the patent infringement determination
- Patent investigation (Freedom to Operate analysis) is required separately from copyright checks

Patent investigation of generated code is especially important in high-patent-density fields such as healthcare, finance, telecommunications, and security.

Security Risks: Cautions Beyond Copyright

AI-generated code can also contain security issues.

Problems Reported in Academic Research

A study published in 2021 evaluated GitHub Copilot on 89 security-relevant scenarios and found that approximately 40% of the 1,689 generated programs contained vulnerabilities [6].

Risk Type	Example
Use of outdated or vulnerable APIs	Deprecated cryptographic methods, deprecated functions
SQL injection	Improper query construction
Authentication weaknesses	Missing permission checks, incomplete input validation
Sensitive information leakage	Hardcoded credentials or secrets
OS command injection	Command execution involving external input

Generative AI produces code that “looks right,” but does not guarantee security correctness [6].
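
To make the "improper query construction" row concrete, the following sketch contrasts a query built by string concatenation with a parameterized one, using Python's standard `sqlite3` module. The `users` table, the function names, and the payload are hypothetical examples, not taken from the cited study.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: external input is concatenated into the SQL text,
    # so quotes inside `name` can change the structure of the query.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: `name` is bound as a value and is never parsed as SQL.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


# In-memory database with two example rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# A classic injection payload: the trailing condition is always true.
payload = "nobody' OR '1'='1"

leaked = find_user_unsafe(conn, payload)  # returns every row in the table
safe = find_user_safe(conn, payload)      # returns no rows
```

Both functions look plausible at a glance, which is why review of generated data-access code should check how the query is built rather than whether it runs.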

Practical Enterprise Responses

Key Factors in Tool Selection

Item to Verify	Reason
Training-data and referenced-code license policy	Assess copyleft contamination risk [2][3][4]
Rights assignment and output use conditions	Confirm the scope in which the enterprise can use the code [2][3]
Availability of copyright indemnification programs	Confirm the protection scope if infringement claims arise [2][3]
Enterprise-grade options	Manage input data, logs, access control, and code-reference detection [2][3]

Designing a Code Generation Policy

Enterprise Code Generation Policy (Example)

1. Permitted Tools
   - Use only approved generative AI tools
   - Inputting code with personal accounts is prohibited (prevent leakage of sensitive code)

2. Mandatory Code Review
   - All generated code must go through human code review
   - Code critical to security (authentication, encryption, etc.) requires expert review

3. OSS Compliance
   - Scan generated code with an OSS scanning tool before committing
   - When copyleft licenses are detected, consult legal or IP departments

4. Prompt Management
   - Do not include sensitive information, customer data, or internal system details in prompts
   - Maintain a record of the correspondence between prompts used and generated code
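
The prompt management rule in item 4 can be partly supported by an automated check that runs before a prompt leaves the developer's machine. The following is a minimal sketch under stated assumptions: the pattern set is illustrative and far from exhaustive, and `SENSITIVE_PATTERNS` and `find_sensitive` are hypothetical names, not part of any AI tool's API.

```python
import re

# Illustrative patterns only; a real deployment would need a reviewed,
# organization-specific list and would still miss some cases.
SENSITIVE_PATTERNS = {
    # Access key IDs in the widely documented "AKIA" + 16 characters form.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM private key headers.
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Assignments such as `password = ...` or `api_key: ...`.
    "credential_assignment": re.compile(
        r"(?i)\b(password|passwd|secret|api_key|token)\b\s*[:=]\s*\S+"
    ),
    # Email addresses, as a rough proxy for personal or customer data.
    "email_address": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def find_sensitive(prompt: str) -> list[str]:
    """Return the names of all patterns that match somewhere in the prompt."""
    return [
        name
        for name, pattern in SENSITIVE_PATTERNS.items()
        if pattern.search(prompt)
    ]
```

A non-empty result would block or warn before submission. Pattern matching of this kind reduces accidental leaks but does not replace the policy itself.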

Implementing OSS Scanning

Tools used to detect license contamination in generated code:

- FOSSA / Black Duck / Snyk: commercial OSS compliance tools
- ScanCode / FOSSology: open-source scanning tools
- Code reference or duplicate-detection features provided by AI tool vendors, such as GitHub Copilot code referencing [2]

Record Keeping

Maintaining the following information about generated code can assist in responding when issues arise later.

Information to Record
- Name and version of the AI tool used
- Overview of the prompt used for generation
- Date and time of generation
- Code reviewer and review date
- OSS scan results and actions taken
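
The items above map naturally onto a small structured record that can be stored alongside a commit or pull request. The following is a minimal sketch: the class name `GenerationRecord`, its field names, and the example values are all hypothetical, and JSON is only one possible storage format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class GenerationRecord:
    tool_name: str        # name of the AI tool used
    tool_version: str     # version of the AI tool used
    prompt_summary: str   # overview of the prompt, not the full text
    generated_at: str     # ISO 8601 date and time of generation
    reviewer: str         # code reviewer
    review_date: str      # ISO 8601 review date
    scan_result: str      # outcome of the OSS scan
    actions_taken: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record with stable key order for diff-friendly storage."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)


# Example with placeholder values.
record = GenerationRecord(
    tool_name="example-assistant",
    tool_version="1.0.0",
    prompt_summary="Generate a CSV parser for order exports",
    generated_at=datetime.now(timezone.utc).isoformat(),
    reviewer="reviewer@example.com",
    review_date="2025-01-15",
    scan_result="no copyleft match found",
    actions_taken=["merged after review"],
)
serialized = record.to_json()
```

Storing a prompt summary rather than the full prompt keeps the record useful for later inquiries while limiting what sensitive detail ends up in the log.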

Summary

The risks of AI-generated code extend beyond copyright to OSS licenses, patents, and security.

Risk	Description	Primary Response
Copyright	Ambiguity in rights ownership; similarity derived from training data	Verify terms of service; verify indemnification programs
Copyleft contamination	Risk of GPL and similar licenses propagating to the entire enterprise product	OSS scanning; copyleft isolation policy
Patent rights	Risk that generated code infringes existing patents	FTO analysis; caution in high-patent-density areas
Security	Automatic generation of vulnerable code	Mandatory code review; security scanning

Generative AI can dramatically improve development productivity, but recognizing these risks and designing governance around them is a prerequisite for sustainable enterprise adoption.

References

[1] e-Gov Law Search, Copyright Act (Act No. 48 of 1970)
[2] GitHub Docs, GitHub Copilot features
[3] AWS, Amazon Q Developer
[4] GNU Project, GNU General Public License version 3
[5] e-Gov Law Search, Patent Act (Act No. 121 of 1959)
[6] Pearce et al., Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions, arXiv:2108.09293 (2021)
