The Financial Times and OpenAI strike content licensing deal

https://www.ft.com/content/33328743-ba3b-470f-a2e3-f41c3a366613

37 points by kmdupree on 2024-04-29 | 51 comments

Automated Summary

The Financial Times (FT) has entered into a licensing agreement with OpenAI, the artificial intelligence research organization. This collaboration enables OpenAI to utilize FT's content in the development and training of its AI models. This partnership aims to enhance the credibility and contextual understanding of AI-generated text and other media. With this agreement, FT maintains control over its content usage and protects its intellectual property rights. The alliance showcases the potential of media organizations working with AI technology companies to foster responsible AI development.

Archive links

Other submissions

Comments

artninja1988 on 2024-04-29

Grim. This is just confirmation that the future of LLMs will be a walled garden of big tech and big rights holders. There's no open ecosystem if content needs to be licensed at prices prohibitive for small players, universities, and open source. Between this and their lobbying for regulation, closed AI is obviously pulling up the ladder behind them, now that they're valued at $80 billion after reportedly being on the brink of bankruptcy just a few years ago, before ChatGPT.

mvdtnz on 2024-04-29

Sounds good to me. I don't see why anyone else should feel they have the rights to content just to train crappy chat bots. Real people with real families and real mortgages created that content.

philipwhiuk on 2024-04-29

They're only licensing what they have to under threat of lawsuit. Little people's data will still be taken without compensation.

BriggyDwiggs42 on 2024-04-29

In this case is it better that they get money and rich people consolidate more power over an emerging and likely influential technology, or is it better that they don’t and open source is able to flourish?

__loam on 2024-04-29

If you can't run your business without violating the property rights of millions of people, maybe your business shouldn't exist.

nomel on 2024-04-29

This comment is direct, but not unreasonable. To steel man it...

The success of LLMs was built on the work product of others, without compensation. Now those others are looking for their due compensation.

Freely profiting off the, quite literally, compressed output of others isn't really a sustainable business model, for either side. The only sustainable solution necessarily involves money flowing from the content users (multi-billion-dollar LLM companies) to the content producers (artists, news orgs, etc). For a logical litmus test, apply what's happening here to any other content industry.

stale2002 on 2024-04-29

Hey, that's fine if you have that opinion.

I just hope you are consistent in that you oppose the existing ubiquitous copyright violations done by the entire art industry, in the form of commercial "fan art", or similar.

A whole lot of content is built off other people's works, and much of it is not done in "fair use"; but of course, when the shoe is on the other foot, those same creators complain.

__loam on 2024-04-29

I have a lot of trouble taking this obviously bad faith argument seriously. Aside from the fact that we're comparing random fan artists with well resourced tech corporations that trained on hundreds of millions of images, most rightsholders seem to appreciate that fan art is free publicity for them. If nobody is going after fan art, then there's not really a problem. On the other hand, many people and organizations seem pretty happy to file suits against OpenAI.

__loam on 2024-04-29

I'm not sure AI businesses would be viable if they had to license all the content in addition to paying all the compute costs. But, that really isn't the rightsholders' problem.

madeofpalk on 2024-04-29

That's the clearest indication that OpenAI should be seeking permissions and paying rightsholders for training data.

If OpenAI loses value as a company if it does not have that content, that indicates the content has value, and OpenAI should pay for the materials they 'use' to create their own value.

artninja1988 on 2024-04-29

These models will exist one way or another, even under the most IP-concerned lawmakers in the deepest pockets of the RIAA and whatnot. It's just that some want monopoly rents paid to themselves every time someone uses them.

jsheard on 2024-04-29

> These models will exist one way or another.

This line of reasoning seems more or less equivalent to "movie piracy will exist one way or another, therefore I should be allowed to launch a commercial Netflix competitor which streams movies without paying the studios". Just because it's possible to appropriate content doesn't mean we should just give up and put it in the public domain, that's obviously not sustainable.

BriggyDwiggs42 on 2024-04-29

Actually that sounds great. Commercial piracy netflix competitor please and thank you.

jsheard on 2024-04-29

Sounds great until the entertainment industry collapses because the inconvenience barrier which kept piracy at reasonable levels vanishes, with every theatre, TV channel and streaming service suddenly no longer paying for anything they show. I think you know that's not going to work.

BriggyDwiggs42 on 2024-04-30

Yeah, but wouldn’t it be a heck of a lot of fun

nomel on 2024-04-29

Maybe profit caps for models that use licensed/copyright content would make sense? That would leave general use of open source models alone, while still having the potential to flow money to the right people.

SirMaster on 2024-04-29

What's the alternative?

Do you feel that entities like the Financial Times should willingly give up all their data to the public, or be forced to?

dkjaudyeqooe on 2024-04-29

That was the intent of copyright, until it was extended from 14 years to beyond the heat death of the universe.

Maybe we should go back to 14 years for use in AI models?

cess11 on 2024-04-29

Sure. Just like it always was, like when libraries buy magazines and papers and archive them, giving the public access to their contents.

sgt101 on 2024-04-29

The difference is that mega-corps are ripping off content and reselling it. I hope that the FT can sustain itself for a long time using the money from this deal.

cess11 on 2024-04-30

OK, so you think this is something new? There are no examples of mega-corporations "ripping off" and "reselling" from, say, the sixties?

0xedd on 2024-04-29

[flagged]

sgt101 on 2024-04-29

Do you work for free? If so - if you are a volunteer and do not take state aid or use funds from your inheritance - then your position is defensible.

If not, then I invite you to look at the huge attrition of journalism jobs in the UK and the USA. Local newsrooms have been destroyed by the web and national newsrooms have been gutted.

Folks that create content should be allowed to monetize content - in exactly the same way that I am allowed to monetize the work I do.

pksebben on 2024-04-29

They are currently allowed to do so, and this attrition has happened without a proliferation of LLMs.

This feels like an apples-to-oranges argument. The thing that's destroying journalism is the current framework for monetizing, and how utterly inadequate it is; that, and the lack of a robust middle class, the primary consumer of news.

It's easy to point at tech and say that it's what changed so it's the problem - but just like dirt roads were better for horses, sometimes it's the existing structures that need to change. Our ownership model is lost in the modern era and badly needs revisiting.

andy99 on 2024-04-29

OpenAI and other big VC funded LLM companies were all destined for enshittification anyway. They'll try and regulatory capture their way to monopoly and then offer a much worse product for more money. Luckily (hopefully) the cat is out of the bag and there are enough good, geographically diverse alternatives to prevent any serious regulatory capture efforts. Worst case we'll all be fleeing to the freedom of the Chinese LLMs.

sylvainkalache on 2024-04-29

The publishing industry is in big trouble; journalists are fired, and publications are closing.

Ads are not enough, and readers are fed up with ads, so they use ad blockers, cutting publications' revenues. With AI chatbots, people browse these publications even less, further reducing revenue.

Paywalling and licensing content is the next best option; if it isn't, what is?

nickthegreek on 2024-04-29

Just wanted to chime in and say that I think you are both right. I haven't thought of a way to square this circle yet. I find this issue to be the fundamental concern around generative AI. Of course the Courts could always end up saying its fair game and at that point, I could see OpenAI no longer paying. And then content producers are more likely to gate content in a myriad of ways and the cat and mouse game will continue. I could also see the US say scraping is allowed and EU saying otherwise.

__loam on 2024-04-29

It seems to me that half the closures happen after some wealthy asshole who seems to hate journalists buys a publication, then runs it into the ground.

slt2021 on 2024-04-29

The publishing industry may be in trouble, but publishing itself is not.

Content creators are going direct to consumer with their content, and there is endless opportunity for actual creators to thrive without publishing houses as middlemen.

dkjaudyeqooe on 2024-04-29

This is fine because LLMs are a dead end.

If LLMs are the core of 'AI' then I'm not interested, but I feel confident they're a sideshow on the way to stronger AI.

Let's avoid the huge mistake that was made with the internet, where the web happened to be the first broadly useful tool, exploded in popularity, and then de facto became the internet, with everything shoehorned into it. The problem is, on the scale of possible interfaces and software, the web is absolute shite, but now its gravity is too great and we can't escape it.

I've tried to use LLMs but they're just not useful. Getting answers that may be great or may be nonsense just doesn't work for me. I know there is something better and I won't be distracted by a very impressive novelty.

throwup238 on 2024-04-29

I don't think they're a dead end, they're just a very small piece of a much bigger puzzle. LLMs are to AI what word2vec was to machine learning - it's the key to encoding language in a way that algorithms can operate on but they need to be connected to a bunch of other systems to really be useful.

I think it'll be a while before we perfect memory, neuroplasticity, the physical experimentation feedback loop, etc. but LLMs at least give us a way to represent and manipulate human language.

dkjaudyeqooe on 2024-04-30

> just a very small piece of a much bigger puzzle

I agree with that, but plenty of people are talking about LLMs as the precursor to AGI, especially the OpenAI crowd. Plenty of people here and elsewhere say that LLMs are actually intelligent.

I think LLMs are an important illustration of what's possible, but it's not the one true way.

lumb63 on 2024-04-29

I can’t understand why publications would strike deals for their content to be used for training LLMs. It seems incredibly short-sighted to me. They gain a windfall in exchange for what is basically nailing their own coffin shut.

bilbo0s on 2024-04-29

I don't know man? Ad revenue is plummeting. No one else is buying their content.

What is the alternative path that is as lucrative as taking that big 4$$ check from OpenAI/Microsoft? I mean the only real alternative is that Google, Facebook, or Amazon write you a bigger check. But you're still at the same place because they'd only write that check to train their models as well.

The industry is backed into a corner. If no one else will pay, they have to take money from the only person who will.

tivert on 2024-04-29

> What is the alternative path that is as lucrative as taking that big 4$$ check from OpenAI/Microsoft?

I can imagine LLMs tipping the balance back in favor of something like traditional media. In short: people go nuts with them, and flood the open web with garbage (which may have the added benefit of degrading LLMs). That could, compared to the last 20 years, make curation and verified provenance far more important for people who value having the chance to know real things, which would drive customers back to pay-walled media organizations and publishers.

That may not happen, or may not happen on a time-frame compatible with most current media organizations. It would certainly take some time for everyday users to develop the necessary fatigue with LLMs and their output to motivate action.

I think it'd also depend on the LLM companies losing in court on copyright grounds. If the LLM companies win, I think we'll still get the crapflood, but the islands of sanity resisting the tide will be smaller or nonexistent. If LLMs put the final nail in the coffin of the media orgs, Wikipedia will die soon after (it's got separate culture problems that may do it in, but it is also totally dependent on the editorial decisions of traditional media and publishing).

adammarples on 2024-04-29

Well the FT charges £39 per month for their content, without even paying extra for premium. And they have ads on top!

stale2002 on 2024-04-29

If those people don't sell their content, other publishers will and they will lose out anyway.

Each individual publisher isn't needed. There is lots of training data in the world. So you can either get a payday or get nothing.

aelmeleegy on 2024-04-29

Is this just inevitable now?

I also can't believe that these media companies haven't learned the lesson they should have learned from what happened with Google: maybe we shouldn't give that away, at any cost. Like, what's their bargaining position after the training has been done?

bilbo0s on 2024-04-29

They did learn their lesson with google.

That's why they're taking the money that OpenAI/Microsoft is offering.

They tried to do it in court with google and never got anywhere with it. So this time around they just take the money from whoever is willing to write the biggest check.

micromacrofoot on 2024-04-29

that lesson isn't worth much to an industry that has constant solvency issues

dmurray on 2024-04-29

So does this set a precedent that OpenAI knows it should be paying the millions of rights holders who didn't grant it a license and yet saw it use their IP to generate billions of dollars of revenue?

jcranmer on 2024-04-29

One of the factors in fair use analysis is whether or not there exists a market for licensing for the use in question, and this is conventionally given the strongest weight in fair use analysis.

So yes, making agreements to license content does illustrate that there exists a market for using text in AI training, and it will do a lot of damage to arguments that it's all fair use.

madeofpalk on 2024-04-29

This is not the first publisher OpenAI has paid for rights to its content.

https://www.bloomberg.com/news/articles/2023-12-13/openai-ax...

ChrisArchitect on 2024-04-29

Related official post:

We’re bringing the Financial Times’ world-class journalism to ChatGPT

https://openai.com/blog/content-partnership-with-financial-t...

(https://news.ycombinator.com/item?id=40197303)

someotherperson on 2024-04-29

I wonder if this is some long play by OpenAI:

1. OpenAI is trying (and failing?) to add legal restrictions on LLM (and co) training to only a few major players.

2. By entering into licensing agreements, it starts to set the standard and essentially forces the above to take place as only a few major players would be financially able to afford it.

5- on 2024-04-29

yes, the concerns around limiting access to the newcomers are valid, but there's the other side to the deal:

> In addition, the FT became a customer of ChatGPT Enterprise earlier this year, purchasing access for all FT employees to ensure its teams are well-versed in the technology and can benefit from the creativity and productivity gains made possible by OpenAI’s tools.

i.e., the FT is going to be written by LLMs now.

stanleykm on 2024-04-29

Just skip the middleman and have chatgpt generate financial times articles to train on

omnicognate on 2024-04-29

FT press release (not paywalled): https://aboutus.ft.com/press_release/openai

dvfjsdhgfv on 2024-04-29

This should have happened years ago, and not just with one big publisher.

woopsn on 2024-04-29

From behind the paywall: the NYT sued them, so OpenAI will, for now, pursue licensing arrangements with at least some major media outlets.