Who owns AI-generated outputs?
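The short answer: it depends, and the law is still catching up. The US Copyright Office has taken the position that purely AI-generated works, where a human simply prompted an AI system and accepted the output without meaningful creative selection, are not copyrightable. Copyright requires human authorship. Where a human does make a sufficient creative contribution, for example by selecting, arranging or substantially modifying the output, those human-authored elements can still be protected.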
The practical implication for AI companies: the content your AI generates may not be protectable by copyright. What is protectable is the system, pipeline, code, and training methodology — through a combination of trade secrets, copyright in the software, and contractual restrictions.
For products where AI-generated content is the core deliverable (copywriting tools, image generators, code assistants), this creates a fundamental IP question: if the output isn't protectable, where is the competitive moat? Usually the answer is: in the model quality, fine-tuning, proprietary training data, and UX — none of which is the output itself.
Training data rights
This is the most active legal battleground in AI. Multiple major lawsuits, including suits against OpenAI, Stability AI and GitHub (over Copilot), are testing whether training AI models on copyrighted content constitutes infringement. The outcomes are uncertain, jurisdiction-dependent, and still evolving.
For AI startups, the practical risk management framework:
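- Know what's in your training data. Document sources, licenses, and how you acquired each dataset. Enterprise customers increasingly ask for provenance documentation, and regulators are starting to as well (the EU AI Act, for example, requires providers of general-purpose AI models to publish a summary of their training content).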
- Prefer licensed or permissively licensed data — data licensed for AI training, public domain content, or data you generated yourself is safest. Web scraping of copyrighted content carries litigation risk.
- Assess fair use carefully — in the US, transformative use arguments provide some protection, but the scope of "transformative" for AI training is unresolved. Don't assume fair use is a complete defense.
- Get contractual indemnification from data providers — if you're using licensed datasets, your vendor contract should include IP representations and indemnification.
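The provenance documentation described above can start as a simple machine-readable manifest. What follows is a minimal Python sketch, not a compliance tool. The `DatasetRecord` fields, the `REVIEWED_LICENSES` allow-list and the review rules are all illustrative assumptions, and deciding which licenses are acceptable for training is a question for counsel.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical allow-list: licenses your counsel has already reviewed for training use.
REVIEWED_LICENSES = {"CC0-1.0", "CC-BY-4.0", "public-domain", "self-generated", "commercial-license"}


@dataclass(frozen=True)
class DatasetRecord:
    name: str
    source: str        # where the data came from (URL, vendor, internal system)
    license: str       # license identifier or contract reference
    acquired_via: str  # e.g. "vendor contract", "download", "web scrape", "internal"
    acquired_on: str   # ISO date
    indemnified: bool = False  # does the vendor contract include IP indemnification?


def needs_review(record):
    """Return the reasons (possibly none) this dataset should go to legal review."""
    reasons = []
    if record.license not in REVIEWED_LICENSES:
        reasons.append(f"license '{record.license}' is not on the reviewed list")
    if record.acquired_via == "web scrape":
        reasons.append("acquired by web scraping")
    if record.acquired_via == "vendor contract" and not record.indemnified:
        reasons.append("vendor contract lacks indemnification")
    return reasons


def build_manifest(records):
    """Serialize all records, with their review flags, as a JSON document."""
    rows = [{**asdict(r), "review_reasons": needs_review(r)} for r in records]
    return json.dumps(rows, indent=2)
```

Keeping the manifest in version control next to the training code gives you a dated record of what was known about each dataset at the time it was used.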
Model weights and trade secrets
The trained model — the weights that encode what the model has learned — occupies uncertain IP territory. Copyright in model weights is legally unresolved (the US Copyright Office has not taken a clear position). Patents may protect training methodologies in some cases, but the disclosure requirement makes patents poorly suited to protecting competitive advantages in AI.
Trade secret protection is the most practical tool for model protection. Trade secret law protects information that (1) is secret (not generally known or readily ascertainable), (2) derives economic value from being secret, and (3) is subject to reasonable efforts to keep it secret.
To maintain trade secret protection for your model:
- Maintain access controls on model weights and training code
- Use NDAs with anyone who has access to the model architecture
- Rate-limit and monitor API access to make model extraction attacks (model stealing via systematic querying) harder
- Include anti-extraction provisions in your API terms of service
- Document your trade secret program — courts expect evidence of active protection
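One concrete form of API restriction is a per-key query budget. The sketch below is a minimal in-memory sliding-window limiter in Python. The `QueryBudget` class and its parameters are hypothetical names chosen for illustration, and a production service would more likely keep this state in a shared store. A cap like this does not stop extraction by itself. It raises the cost of systematic querying and produces a log of refusals that can support the documentation point above.

```python
import time
from collections import defaultdict, deque


class QueryBudget:
    """Sliding-window query cap per API key (illustrative, in-memory only)."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        # api_key -> timestamps of accepted queries, oldest first
        self._history = defaultdict(deque)

    def allow(self, api_key, now=None):
        """Return True and record the query if the key is under budget, else False."""
        now = time.monotonic() if now is None else now
        window = self._history[api_key]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_queries:
            return False  # over budget: reject, and ideally log for later review
        window.append(now)
        return True


if __name__ == "__main__":
    budget = QueryBudget(max_queries=1000, window_seconds=3600)
    print(budget.allow("customer-123"))
```

The `now` argument exists only so the behavior can be tested deterministically; callers would normally omit it.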
Code and software IP
The software that builds, trains and runs your AI system is protected by copyright from the moment it's written. No registration is required in the US for that protection to exist, though registration is generally needed before you can file an infringement suit, and timely registration makes statutory damages and attorney's fees available. The key requirement: all code must be assigned to the company.
The most common AI startup IP problem: a founding engineer who wrote core model code before the company was incorporated, or as a contractor, without a proper IP assignment agreement. This creates a gap in the chain of ownership that due diligence will find. Fix it with a retroactive IP assignment agreement — uncomfortable but standard, and much better than leaving the gap.
Open source considerations
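Many AI companies build on open source foundations: PyTorch, TensorFlow, Hugging Face models, open source data processing libraries. Most of the libraries use permissive licenses (MIT, BSD, Apache 2.0) that allow commercial use with few conditions, mainly preserving copyright and license notices. Published models are a separate case: some ship under custom licenses with use restrictions, so check each model's license individually. And some components use copyleft licenses (GPL, AGPL) that require you to release source code, including your modifications, under the same license if you distribute the software.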
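AGPL is particularly relevant for AI: its copyleft obligations reach software that users interact with over a network (for example, through your API), not just software you distribute. If your service runs modified AGPL-licensed components, or combines them with your own code into a single program, you may be required to offer the corresponding source code, which can include your model server code. Review your dependency tree with an open source audit before any major licensing discussion.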
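A first pass at a dependency tree review can be automated. The sketch below, in Python, reads the license metadata of installed packages through the standard library's `importlib.metadata` and flags anything that does not look permissive. The `classify_license` keyword rules and category names are assumptions made for illustration. Package metadata is often missing or imprecise, so treat the result as a list of things to check by hand, not as a legal conclusion.

```python
import re
from importlib.metadata import distributions

# Crude keyword match for common permissive licenses (illustrative, not exhaustive).
PERMISSIVE = re.compile(r"\b(mit|apache|bsd|isc)\b")


def classify_license(text):
    """Map free-form license text to a rough category."""
    t = text.lower()
    # Order matters: "agpl" and "lgpl" both contain "gpl".
    if "agpl" in t or "affero" in t:
        return "agpl"
    if "lgpl" in t or "lesser general public" in t:
        return "weak-copyleft"
    if "gpl" in t or "general public license" in t:
        return "copyleft"
    if PERMISSIVE.search(t):
        return "permissive"
    return "unknown"


def scan_installed():
    """Return (package, category) pairs for installed packages that need a closer look."""
    findings = []
    for dist in distributions():
        meta = dist.metadata
        # License-Expression is the newer field (PEP 639); License is the older one.
        fields = [meta.get("License-Expression") or "", meta.get("License") or ""]
        fields += [c for c in (meta.get_all("Classifier") or []) if c.startswith("License ::")]
        category = classify_license(" ".join(fields))
        if category != "permissive":
            findings.append((meta.get("Name") or "unknown", category))
    return sorted(findings)


if __name__ == "__main__":
    for name, category in scan_installed():
        print(f"{name}: {category}")
```

This only covers Python packages in the current environment. Model weights, datasets, system libraries and vendored code need separate review.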
What to do now
For AI founders, the priority IP checklist:
- Ensure everyone who contributed to code, models or data pipelines, including founders and contractors, has signed a proprietary information and inventions assignment (PIIA) agreement
- Audit training data sources and document provenance
- Implement a trade secret protection program for your model
- Review your dependency tree for open source license compliance
- Draft API terms that restrict extraction and reverse engineering
- Register key trademarks before competitors do
The IP landscape for AI is moving fast. Founders who build proper IP frameworks now are better positioned for fundraising, enterprise sales and eventual exit — where IP ownership is examined closely.