For AI firms, anything "public" is fair game
Leading AI companies have a favorite phrase when it comes to describing where they get the data to train their models: They say it's "publicly available" on the internet. "Publicly available" can sound like the company has permission to use the information—but, in many ways, it's more like the legal equivalent of "finders, keepers." "That phrase confuses people," said developer Ed Newton-Rex. "It's probably designed to confuse people." Newton-Rex spent years building AI audio systems before resigning from Stability AI, citing concerns about generative systems built with copyrighted material. The term, perhaps by design, sounds like "public domain"—which refers to information that is no longer subject to copyright protection or otherwise made freely available. As for what constitutes "publicly available" content, OpenAI says, "We only use publicly available information that is freely and openly available on the internet—for example, we do not use information that is password protected or behind paywalls."