All remote AI are a massive security risk for individuals/companies/governments that may be targeted by the US government.
It is likely that the US will get a live feed from each AI provider that they are inspecting in real time to identify things of interest: terrorist attacks, foreign government planning, or even foreign companies competitive to key US companies.
It will give them access to the thought process in those companies as well as much of their text-based IP (source code, docs, meeting transcripts, etc.).
Also if you are using local AI that you didn’t train yourself you can never be sure it doesn’t have purposeful biases in its reasoning that may disadvantage you - such as directing you away from certain plans or ideas or patents etc.
hakfoo 1 days ago [-]
I think we can even skip the "that may be targeted by the US government" clause.
The whole "hosted AI" business feels like a huge violation of corporate norms on confidentiality. Businesses that would have your head for printing out a source file to reference and annotate are encouraging developers to feed in huge amounts of proprietary code and data, and incorporate changes suggested by an outside party with minimal vetting. Evidently whatever privacy policies they've been throwing at enterprise users are plated with mithril.
At some point, one of the big services is going to get popped, and it won't just be a data breach. There's too much opportunity to quietly use the system as a malware distribution hub. Every vibe-coded dashboard suddenly starts depending on some weird left-pad fork that, 12 dependencies deep, is running a keylogger or Dogecoin miner. Your payment processor suddenly starts accepting the Konami code to approve a transaction.
jacobgold 1 days ago [-]
> "Also if you are using local AI that you didn’t train yourself you can never be sure..."
A local model you trained yourself seems about as good as you can do today.
But it may not even be possible to fully trust a model you trained if you used untrusted data during training.
also there doesn't even need to be a model involved, agentic code harnesses with remote "instructions for the local computer" are technically backdoored by default.
type0 1 days ago [-]
> even foreign companies competitive to key US companies.
It's unfathomable to me that EU companies don't take the risk of industrial espionage from the US more seriously.
wongarsu 1 days ago [-]
Many do, when it comes to AI. Lots of restricting what the AI is allowed to see, working with local AI, trusted AI hosters, etc.
Of course those are largely the same companies that receive emails via Outlook, manage company-wide SSO in Microsoft Entra, put their files in SharePoint and track software and maintenance issues in Jira ... I'm not sure how much info there is left that isn't already combed through by the NSA and friends.
atlasunshrugged 1 days ago [-]
Not from China? One country has a recent track record of massive amounts of industrial espionage and one doesn't.
faangguyindia 1 days ago [-]
I wonder if china killed more people in foreign land or US.
maneesh 1 days ago [-]
Espionage is murder?
faangguyindia 21 hours ago [-]
and war?
Foobar8568 1 days ago [-]
Well one thing is sure, before 1776, the USA didn't do any industrial espionage.
hnfong 1 days ago [-]
There are so many Chinese open weights models that any company with resources can run them in-house (or with a trusted provider).
There might be some valid concerns about model alignment, but at least the model running in-house isn't going to conduct espionage.
This is the most hilarious, ironic thing of it all. If you want secure, high performance, you run Chinese models like DeepSeek on your own (or trusted) infra. Meanwhile you can never trust OpenAI and Anthropic's models.
londons_explore 1 days ago [-]
It is worth thinking about the fact that the total throughput of even a big LLM provider isn't many megabits.
If a token compresses to around a byte, worldwide AI input and output is around 1 gigabyte per second.
For any intelligence agency, they can afford to keep and store all of that forever, and later do analysis on it.
bhouston 1 days ago [-]
> For any intelligence agency, they can afford to keep and store all of that forever, and later do analysis on it.
At the scale the AI companies are operating at, I think it isn't likely that they are sucking it all in right now.
More likely I think the intelligence agencies will get a real-time live tap into the raw data feed which they will process onsite for interesting things and then if things are flagged, they will log it in the intelligence agency systems.
londons_explore 17 hours ago [-]
But for just a gigabyte per second, why not just take and store it all?
The cost to do so for many years is less than one employee.
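To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 1 GB/s rate is the figure from the comment above; the price per terabyte (`ASSUMED_USD_PER_TB`) is a made-up placeholder, not a real quote, so plug in your own.

```python
# Back-of-the-envelope: how much data is 1 GB/s, kept forever?
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignoring leap years

# Assumption for illustration only: NOT a real price quote.
ASSUMED_USD_PER_TB = 15.0


def yearly_storage_petabytes(bytes_per_second: float) -> float:
    """Petabytes accumulated in one year at a constant byte rate."""
    return bytes_per_second * SECONDS_PER_YEAR / 1e15


def yearly_storage_cost_usd(bytes_per_second: float,
                            usd_per_terabyte: float = ASSUMED_USD_PER_TB) -> float:
    """Raw media cost for one year of data, with no redundancy or compression."""
    terabytes = bytes_per_second * SECONDS_PER_YEAR / 1e12
    return terabytes * usd_per_terabyte


if __name__ == "__main__":
    rate = 1e9  # 1 gigabyte per second, the figure from the comment
    print(f"{yearly_storage_petabytes(rate):.1f} PB per year")
    print(f"~${yearly_storage_cost_usd(rate):,.0f} per year at the assumed price")
```

Whether that lands above or below "one employee" depends entirely on the price you assume, and on redundancy, compression and how long you keep it.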
SubiculumCode 1 days ago [-]
Why make this u.s. centric? You think China served models would be different?
tedivm 1 days ago [-]
China is releasing open weight models you can simply run yourself.
seanmcdirmid 1 days ago [-]
It’s pretty hard to put a backdoor in a bunch of model weights. Maybe not impossible mind you, but I can’t fathom how you would do it.
This only really matters in a world where Prompt Injection and Jailbreaking isn't trivial in the first place though. All current models are still extremely exploitable.
I strongly suspect we are only scratching the surface of activation engineering at the moment, and there are plenty of very targeted ways of lobotomizing or cracking LLMs if you understand the model in detail.
seanmcdirmid 1 days ago [-]
You have to hide it in the model and it has to be subtle or it will be discovered quickly (even if you can train against a specific safety detector). Again, I'm not saying it's impossible, but it seems really hard to pull off.
CuriouslyC 1 days ago [-]
Nonsense. RL the model to run a rootkit and start exfiltrating specific files only when specific signals are in context, such as hostname pattern, machine type, etc.
causal 1 days ago [-]
Way easier said than done, and hiding that behavior isn’t trivial, and huge waste of compute budget if it’s found and never used. Also not difficult to run in contained environments where it doesn’t have access to Internet to begin with.
Not impossible I agree, but seems like a really impractical way to ship a trojan while much weaker channels exist.
codedokode 1 days ago [-]
You can run the model in a sandbox or VM. Although, it could plant a backdoor into the written code. Too bad, I read and fix all the code written by AI.
OtomotO 1 days ago [-]
Because the topic of the article is about the US?
SubiculumCode 4 hours ago [-]
No, the topic was about Alibaba and Anthropic.
rolymath 18 hours ago [-]
Because Chinese model providers are open source. You can host them yourself.
Can you name one open source US model that came out in the last year?
DaSHacka 15 hours ago [-]
Gemma 4 just came out 2 months ago
WarmWash 1 days ago [-]
>It is likely that the US will get a live feed from each AI provider that they are inspecting in real time to identify things of interest, terrorist attacks or foreign government planning or even foreign companies competitive to key US companies.
My favorite conspiracy is that three letter agencies keep pushing the conspiracy that they are omnipresent with access to everything. Same as parents telling their kids Santa is watching, and leaders telling adults God is watching. It's extremely effective control, and millennia old at this point.
The reality is much more banal that they still need warrants and tech companies hate playing police/evidence servant for the government (it consumes a ton of resources and pays nothing).
thewebguyd 1 days ago [-]
> warrants
The snowden leaks revealed that's not the case.
The three letter agencies can just issue national security letters without a judge ever seeing them, and those come along with a gag order (plus other workarounds like just buying data from brokers, and how US communications can get swept up just by virtue of communicating with a foreign national outside the US).
You're right, they aren't omniscient in the way we imagine of a room full of people monitoring everything in real time. But to pretend they aren't passively collecting massive amounts of data is dangerous. Snowden showed us PRISM, with all major tech companies participating. They do effectively have a live, unrestricted wiretap to the internet and if you happen to be a person of interest, they will just send out NSLs and get all your communications that are not fully E2EE without you even knowing thanks to the gag order.
WarmWash 1 days ago [-]
Can you explain to all of us what a national security letter is, and what it allows?
I'll provide some helper information to get the ball rolling (see page 42)[1]
All the other prime suspects are in the report too for the curious.
roysting 1 days ago [-]
> The reality is much more banal that they still need warrants and tech companies hate playing police/evidence servant for the government
I will not elaborate how I know, but that is not even directionally correct. But these are not even secret things that can’t be known simply through the Snowden, Wikileaks, and Vault7 releases. So why are you telling yourself this? Are you still wet behind the ears or something?
There are people who know exactly how governments do not in fact need warrants and the tech companies don’t even really know they are servants to the government, let alone which one. That’s how things are done. The less surface area the better.
shimman 1 days ago [-]
It's the lie you have to tell yourself otherwise you'll have to reconcile with the fact that the US imperialism has been an enemy of democracy and to people around the world for quite some time.
WarmWash 1 days ago [-]
Why did Google can its mass scale location tracking again?
lioeters 1 days ago [-]
That's the fun part, they never did.
greenavocado 1 days ago [-]
> you can never be sure it doesn’t have purposeful biases in its reasoning that may disadvantage you - such as directing you away from certain plans or ideas or patents etc.
that's why you should use abliterated heretic models
general1465 2 days ago [-]
Leakage of IP and training on your data is something I point out too, but people turn around and try to smooth it over by saying the TOS does not allow that if you are an enterprise client. Are you really going to believe that AI companies won't ignore the TOS, when they ignored literal laws that have sent others to jail in the past? Especially when more data = better model?
eunos 2 days ago [-]
What Claude Code did is absolutely mindboggling tho, if Chinese harness did that probably POTUS would lose sleep.
usef- 2 days ago [-]
It seemed pretty mild compared to what's collected by modern websites and apps, though? How many don't know your Timezone?
dijit 2 days ago [-]
> How many don't know your Timezone?
The timezone fetch was to alter program behaviour at runtime, not to send arbitrary timezones for tracking reasons.
It was one way of detecting if it was a Chinese person using the program and then behaving differently.
Malware behaves this way. STUXNET for example was wired to do nothing except propagate unless the environment had the right conditions.
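For anyone unclear on what "alter program behaviour at runtime" means mechanically, here is a minimal Python sketch. It is purely illustrative and not taken from any real product: the `Environment` type, `choose_behaviour`, and the target offset and locale prefix are all hypothetical names and values I made up.

```python
import locale
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    utc_offset_hours: float  # local standard-time offset from UTC
    locale_name: str         # e.g. "en_US", or "" if unknown


def read_environment() -> Environment:
    """Read the host's timezone offset and locale from the standard library."""
    # time.timezone is seconds WEST of UTC for local non-DST time, so negate it.
    offset = -time.timezone / 3600
    language = locale.getlocale()[0] or ""
    return Environment(offset, language)


def choose_behaviour(env: Environment,
                     target_offset: float = 8.0,
                     target_locale_prefix: str = "zh") -> str:
    """Return which code path to take; nothing is sent anywhere."""
    # Hypothetical trigger: both signals must match before behaviour changes.
    if (env.utc_offset_hours == target_offset
            and env.locale_name.lower().startswith(target_locale_prefix)):
        return "alternate"
    return "default"


if __name__ == "__main__":
    print(choose_behaviour(read_environment()))
```

The point is that the environment check feeds a branch, rather than being a value that gets logged or uploaded.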
usef- 2 days ago [-]
The article on HN only said that they seemed to be collecting this to detect resellers. How else did the behavior change?
Most services I know that are trying to block abuse do collect device info
wongarsu 1 days ago [-]
There is this whole thing where Fable silently starts behaving worse if they suspect you are trying to use it for RL or are otherwise building a competing product. This is likely the primary vector for how that works: they check if you are in China, if you proxy your requests, and if you are from a list of known labs or match a couple of keywords.
usef- 1 days ago [-]
They walked back that policy on the first day after pushback. They were upfront in the model launch that they designed it that way, it wasn't secret.
dijit 2 days ago [-]
regardless of anything else, whether what you said is true or not: blocking program execution based on the detected environment is a runtime behaviour change.
usef- 2 days ago [-]
Agreed. And it also applies to the "I'm not a bot" checkbox on most websites. And hundreds of other things people use every day.
stingraycharles 1 days ago [-]
Yeah I also believe it’s a big nothing burger. There are far worse things these AI labs have done, detecting when Chinese labs are using Claude Code is not it.
overfeed 1 days ago [-]
How do you know simple detection was the most Anthropic did and nothing more actively nefarious? The self-reported motivation was animus against "distillation attacks", which suggests active server-side countermeasures that could range from the overt (IP or user account bans) to the covert (downgrading model performance or poisoning the response).
theshrike79 1 days ago [-]
”Malware” lol
Even hotel and flight websites work like that, they determine your ability to pay based on your location, wall clock time and device OS - and FSM knows whatever else.
Are they malware too, basically STUXNET?
ronsor 1 days ago [-]
Yes
cognitiveinline 2 days ago [-]
Exaggerate much? If you think POTUS would lose sleep about a date format timezone marker, I don't know what to tell you.
yard2010 2 days ago [-]
Wait what do you mean "if"?
youre-wrong3 2 days ago [-]
Maybe if they didn’t farm all the data from Claude to train their own trash models, Anthropic wouldn’t feel the need to do it.
BoxOfRain 1 days ago [-]
Bit rich given where Anthropic sourced the data to train Claude with. What's good for the goose is good for the gander.
InsideOutSanta 2 days ago [-]
Who is "they", and which Chinese models are trash?
vrganj 2 days ago [-]
Anthropic stole the entire internet. Excuse my language, but they can fuck right off.
breppp 2 days ago [-]
The issue here is not whether Anthropic used Common Crawl, Alibaba also does that.
The issue is that by distilling Claude, Alibaba reuses the IP Anthropic used to train the model. That's more akin to historical Chinese reverse engineering methods and disrespect of IP.
snovv_crash 2 days ago [-]
Alibaba paid for that data though, right? They didn't hack Anthropic, they bought accounts and ran them normally.
Also, you can't copyright AI outputs. So worst case they violated the ToS.
wongarsu 1 days ago [-]
If using Common Crawl or Anna's Archive in your training data is legal, then surely the same is true for using conversations with Claude. I don't see a reasonable framework where training AI on copyrighted data is ok if and only if that data is not generated by AI
(granted, only Meta got caught using Anna's Archive, but it seems safe to assume it's common practice. And even if it wasn't, the websites in Common Crawl are still covered by copyright)
blackoil 2 days ago [-]
'Issue' for who?
vrganj 2 days ago [-]
Anthropic clearly doesn't respect other people's IP, it's real rich that they now insist on theirs being worthy of protection.
Fwiw, I think the concept of IP in general is counter to human progress.
kataklasm 2 days ago [-]
The practical implementation of IP? Sure, that's debatable. But the concept of IP is rooted in favoring progress. The thought process being that if one's intellectual work can be copied and reused and modified and whatnot without issues, why should anyone invent things anymore? Just wait for the next person to do it and then copy their work; that's way less effort than inventing things yourself. IP aims to protect progress by making sure inventors have an actual incentive to invent stuff. The way it's implemented is fundamentally flawed, I agree, but the concept itself? I'm not so clear on that.
vrganj 1 days ago [-]
The Soviet Union, for all its faults, had a fair bunch of scientific and technological breakthroughs without relying on IP.
Sure, one person gets rewarded more with the IP system. But at the same time, that breakthrough then can't be built upon by others.
Overall, I think it does more harm than good because of how it monopolizes technologies and ossifies development.
I think free sharing of knowledge will always beat intellectual stinginess.
wqaatwt 1 days ago [-]
> fair bunch of scientific and technological breakthroughs
Outside of military technologies they had massively fallen behind the west by the 80s. Without the western tech they licensed or copied they were permanently stuck in the 50s. Even their crappy cars were licensed copies of cheap European cars from the 60s.
When it comes to consumer electronics, vehicles and a bunch of other things they were comically behind. So it’s really not a good example..
> monopolizes technologies and ossifies development.
As bad as it might be empirical evidence shows that historically a superior system has never existed (it might be feasible but everything that was tried underperformed).
shimman 1 days ago [-]
What absolute bollocks. Human ingenuity and innovation is only limited by the greed of elites, not due to something as damaging as "IP."
Good grief. All one has to do is look at how humanity has consistently progressed. Iterating on what has existed is how we progress, not some corporation that wants to rat fuck us all for a few pts in share value.
wqaatwt 1 days ago [-]
> iterating on what has existed is how we progress
Progress was extremely slow until the 1800s. Coincidentally, corporations and modern capitalism in general developed around the same time. Of course I’m not necessarily saying it was the main or direct cause, since it isn’t exactly possible to create an experiment comparing it to other systems (of course that was tried and failed completely in the USSR, Maoist China and similar places).
stackskipton 1 days ago [-]
Progress was slow until industrial farming was developed and more people could be freed from just trying to grow enough food to feed themselves.
Capitalism was side effect as well.
wqaatwt 1 days ago [-]
And how and why did that suddenly happen?
> was side effect as well
A side effect of what exactly?
wqaatwt 1 days ago [-]
> in general is counter to human progress.
Historically most evidence seems to point to the contrary.
Amongst other things after the printing press was created it was impossible for anyone who was an author to survive from their work unless they were independently wealthy or had rich patrons.
breppp 2 days ago [-]
It's more complicated than that because Google has been legally displaying other people copyrighted material for years.
In any case there's still a difference between publicly available copyrighted data and whether you can use it for model training, and the innovation around model training, RLHF, etc which you presumably have some interest as a country to allow companies to invest in with some legal protections (like the diff between patent law vs copyright law)
wqaatwt 1 days ago [-]
LLM output is not copyrightable, though? So effectively if you pay for it you can do whatever you want from it. That seems perfectly fair and reasonable.
platinumrad 1 days ago [-]
So you're saying it's more important to safeguard slop outputs than the original work of human beings.
breppp 1 days ago [-]
No, I am saying that there is a good chance that for the good of humanity, society decides that for miracle AGI we collectively forfeit copyright in LLM training yet IP protections for model development is still kept.
There are many cases in the early 2000s where copyright protections were relaxed for tech advancements.
jdgoesmarching 1 days ago [-]
“For the good of humanity we must protect what I’m working on at the expense of others because it’s super important.”
As frustrating as the anti-AI crowd can be, I see why they end up that way when the valley is full of opinions like this.
Barbing 1 days ago [-]
Does this match the kind of eminent domain case we might see where the country needs a highway more than it needs one particular citizen's house?
When they bulldoze the house to pave the highway, they toss the homeowner a few bucks. If you take an author’s books do you owe him a share of OpenAI?
close04 1 days ago [-]
What are you forfeiting for the good of humanity? Would you give up a big chunk of your income? What happens when this batch of “innovators” don’t deliver AGI and only enrich themselves? What happens if they do deliver AGI and (hypothetically) still keep it to themselves?
You come with the selfless proposal that everyone give to the poor $tn companies “for the good of humanity”. I’ll assume this is just hopelessly naive but you post so insistently that it makes me wonder.
vrganj 1 days ago [-]
Have you tried asking society how they feel about you acting "for their good"? Because popular sentiment seems pretty opposed to AI.
causal 1 days ago [-]
I wish people would stop using Anthropic's incorrect use of the term distill. They don’t share logits so you can’t distill. You can generate training data, which doesn’t sound nearly so scary.
wren6991 1 days ago [-]
Why do you need logits to distill? Those are at least tokenizer-dependent, and different models use different tokenizers.
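For readers following along, the two things being contrasted here can be sketched in a few lines of plain Python. This is a toy illustration with made-up function names, not anyone's training code: `soft_label_loss` needs the teacher's logits (and a shared vocabulary), while `hard_label_loss` only needs the token the teacher actually emitted. The usual temperature-squared scaling factor is left out for brevity.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over the full next-token distribution.

    Requires teacher logits and the same vocabulary on both sides.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hard_label_loss(sampled_token_id, student_logits):
    """Cross-entropy against a single token sampled from the teacher.

    Only needs the teacher's text output, mapped to the student's token ids.
    """
    q = softmax(student_logits)
    return -math.log(q[sampled_token_id])


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [1.0, 1.0, 0.0]
    print(soft_label_loss(teacher, student))
    print(hard_label_loss(0, student))
```

The second form is what you can do with API text alone, and since it works on the student's own token ids, the tokenizer mismatch does not get in the way.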
matheusmoreira 2 days ago [-]
> reuses the IP anthropic used to train the model
> disrespect of IP
Nobody other than Anthropic cares.
messe 2 days ago [-]
> Alibaba reuses the IP anthropic used to train the model that's more akin to historical Chinese reverse engineering methods and disrespect of IP
Why is this any worse than Anthropic's disrepect of IP? You've apparently drawn a distinction between the two here, but I'm failing to see what it actually is.
breppp 1 days ago [-]
Copyright law and IP law are not the same, although everyone seems to conflate the two.
Search engines for example historically ignored copyright law by copying excerpts or serving other site images, it doesn't mean someone copying Google's code has some moral free pass
messe 1 days ago [-]
> Copyright law and IP law are not the same, although everyone seems to conflate the two.
Copyright law is a subset of IP law. What IP is being infringed upon here?
> Search engines for example historically ignored copyright law by copying excerpts or serving other site images
Excerpts are often considered fair use, but it depends on country.
> it doesn't mean someone copying Google's code has some moral free pass
Nobody copied Anthropic's code. They used its output to train another model. At most they violated some terms of service.
Did they maybe abuse Anthropic's subsidised pricing? Sure. But that's what happens in a free market if you sell below cost.
breppp 1 days ago [-]
> Excerpts are often considered fair use, but it depends on country.
That happened progressively: thumbnails, for example, were ruled fair use later on, and the DMCA safe harbor was a huge gift for tech companies because otherwise copyright would have curtailed the ability to create platforms (relaxing copyright protections in exchange for innovation).
> Nobody copied Anthropic's code. They used its output to train another model. At most they violated some terms of service
Distilling a model is a method that can push the entire market to low margins and prevent companies from making money off such research. It also copies the Anthropic special parts (RLHF and other specific methods) rather than the "copy of the entire web" part
This is similar to what happened with Chinese reverse engineering of American manufacturing or PC clones killing IBM PCs.
Is it in the interest of the USA, probably no, that's why I assume this will be backed by law eventually
messe 1 days ago [-]
> Distilling a model is a method that can push the entire market to low margins and prevent companies from making money off such research
Then it's on Anthropic to actually price their models accordingly so that distilling isn't profitable. Why does this need a legal remedy when market forces could easily resolve this?
> Is it in the interest of the USA, probably no
Good. The world needs to diversify away from dependence on US technology.
breppp 1 days ago [-]
> Good. The world needs to diversify away from dependence on US technology.
In my opinion further strengthening the CCP is a disaster for the world. A government that killed millions of its own citizens to stay in power is not who I would entrust super intelligence with. But apparently we are not going to agree on that
vrganj 1 days ago [-]
When did the CCP kill millions of its own citizens to stay in power?
breppp 1 days ago [-]
The Great Leap Forward and the Cultural Revolution are two such examples
Generally Communist nations historically favored technological development to human life in the scale of millions, keep that in mind when we enter a new economic revolution
vrganj 1 days ago [-]
The Great Leap Forward wasn't "killing" people, which implies intent. It was just good old economic mismanagement.
On a related note, around 300k people die in the US every year due to causes directly attributable to poverty. [0]
On one side you have people starving because they are forced to grow food which is effectively stolen and sold to enrich the party. People were killed by starvation, or by the party if they didn’t contribute enough, ate some, or stole some. So the deaths are the direct result of the CCP.
Vs a study which suggests 183-300k where you take the high end and seem to read about the vast range of causes of which a lot are not very much attributable to poverty.
vrganj 1 days ago [-]
And on the other side you have people dying because they are denied food, shelter and healthcare as a policy choice.
It's not that they can't be taken care of. It's just that it'd cost money and eat profit margins.
It's an active political and ideological choice to let them die, same in both cases. Just happens to be an ideology you agree with and feel a need to defend.
The 180k are currently poor people, the 300k people poor over the past 10 years (which includes the 180). Didn't take the larger number, took the one that more accurately applied.
youre-wrong3 21 hours ago [-]
Are you still talking about China? Where they are denied food, shelter, healthcare? Or do you accept the CCPs word over the word of the people who show videos of what reality is like in China?
vrganj 19 hours ago [-]
I don't need to accept anyone's word, I've traveled both the US and China extensively.
I'm noticing you're trying really hard to deflect though.
youre-wrong3 15 hours ago [-]
I can tell you haven’t. I lived in China for 8 years.
breppp 1 days ago [-]
> The Great Leap Forward wasn't "killing" people, which implies intent. It was just good old economic mismanagement.
If both the USSR and the CCP had millions killed in the process of modernization, without stopping when knowing the death toll, maybe there's intent after all?
How would you describe the cultural revolution then? another case of economic mismanagement?
vrganj 1 days ago [-]
I noticed you haven't addressed my main point at all. What are the millions dying of poverty every few years in the US (in a country with like a quarter of the population!), a death toll that still hasn't been stopped?
Is there intent there as well?
breppp 12 hours ago [-]
That study's statistics seem questionable at best, apparently just to argue that the US should have universal health care. A lack of policy is not the same as enacting a policy, especially one that is supposed to emulate past USSR actions which they knew at the time led to millions of dead. This makes it hard to say the CCP didn't know what would happen.
Later, when it was found out that mass starvation to the point of widespread cannibalism was happening, they would not stop, yet continued sending their troops to steal the peasants' food and execute anyone who resisted.
We also of course have Mao quoted as saying "Death has benefits; fertilizer is created" and "Half of China may well have to die", as well as many other quotes showing they weren't really interested in the millions that would die.
This culminated in 30 million dead in 4 years, which, if we take your study at face value, would take the US 100 years to achieve through "poverty related deaths", which even communist utopias have.
Regarding the Cultural Revolution I assume you agree this was intentional murder of a million+, although you seemed to challenge that in my original posting. Our current discussion seems to focus on intent even though I said "killing", which usually does imply reduced intent.
Regarding the Great Leap Forward, I do believe the CCP would prefer no deaths, and a lot of the deaths relate to strongman idiotic ideas without challenge, as is endemic in ideological totalitarian societies. However, I believe they fully knew what they were going into, without taking into account the refusal to stop, and therefore these were horrendous crimes.
tw1984 1 days ago [-]
40 years ago, when the CCP was leading its people making toys and socks for the US, people like you who never made any change to the world were talking such ideological nonsense.
40 years on, when the CCP is leading its people making AI, robotics, drones, EVs, space station and moon rovers to compete with the US, people like you who never made any change to the world are talking such ideological nonsense.
you live in a history museum or something like that?
breppp 1 days ago [-]
I don't know about me effecting change to the world but I am sure the tens of millions that died due to the Great Leap Forward were happy to effect change to the world so others could produce those socks
realusername 1 days ago [-]
> Search engines for example historically ignored copyright law by copying excerpts or serving other site images, it doesn't mean someone copying Google's code has some moral free pass
Not sure that's the best example as they lost that battle and had to pay, eventually it's been codified in law in most countries.
ravenstine 2 days ago [-]
Employers in 2022:
> No! Don't install that lodash thing without explicit approval from IT. Oh, you want a license for Charles Proxy? Gee, I dunno... we've got a budget to maintain.
Employers in 2023:
> No! You can't use ChatGPT at work – it's a security risk.
Employers in 2024:
> Okay, you can use Github Copilot I guess, but you'll have to endure boring corporate training on what you're allowed to do with it.
Employers with dollar signs in their eyes in 2025:
> We attended a seminar about vibe coding. Why aren't you dumbasses keeping up with the times? Use Claude Code for everything! Don't write any of your own code anymore. We don't even really care if you use yolo mode. Just review code and push 10x more features! Use unlimited tokens! Money printer go brrrrr.
Employers in 2026:
> You mean giving one or two companies full autonomous access to our workstations while stupefying our engineers wasn't a sound business plan?
dan_i 2 days ago [-]
2025 taught me that my employer would replace me with a slave if they could get away with it.
The confusing part to me is why these companies believed the "AGI" hype, i.e. that OpenAI's or Claude's LLM is the ideal white collar slave.
I suppose I can understand that the executive class resents labor enough to make irrational business decisions for the purpose of insulting the workers who design and operate their companies.
That being said, the 2025 AI binge feels like a murder-suicide done by the executives of many of these companies.
khurs 1 days ago [-]
Snowden files revealed NSA collect everything they can.
Of course the USA is collecting everything, not just from China but from everyone.
And same with every one else.
novoreorx 21 hours ago [-]
As someone who knows a lot of Alibaba engineers, this is a fairly common thing and nothing special. Their company-owned devices have the strictest security controls I've ever seen, and it's been that way for many years. Much software is not allowed to run. I think it's quite reasonable, as a company-owned device can access most of the internal resources of this huge monopoly, many of which are confidential. This kind of restriction only applies to work inside the company; an employee's own device outside the company is free from surveillance.
rldjbpin 13 hours ago [-]
being true or not is irrelevant for decisions such as this. has been done on both sides, whether at software level or "hardware".
from Teslas not allowed parked around sensitive areas in the city, to blocking a (very famous and quite well-made) Russian antivirus, or Huawei communication stack.
Anthropic is not doing itself any favours with their recent (?) antics [1], so it is completely well-founded to do this imho. regardless, harness as a moat is not quite as established as the underlying models.
This is a double-edged knife. In this specific instance this was absurdly important for that kid's life, but this works both ways. What if the US authorities deemed it necessary to snoop on foreign governments and citizens for political reasons, now leveraging AI to do it at an industrial scale?
One thing is certain, though: assuring privacy isn't a top priority for any cloud provider. Companies doing cutting-edge, sensitive work should be wary.
bathtub365 1 days ago [-]
The US government deemed it necessary to snoop on foreign governments and citizens decades ago and is doing it on a continuous basis. Also on their own government and citizens.
gchamonlive 1 days ago [-]
Thanks, I've edited my original comment to address this more clearly
rosegroove 1 days ago [-]
[dead]
nicogentile 1 days ago [-]
Seems that we are finally moving to the next stage in LLMs: not only customizing based on old searches but also targeting you based on undisclosed data. It's basically the same flow we had years ago with ads in social media.
Interesting to notice that we can do the same with these models.
fcanesin 1 days ago [-]
It is not a risk, it is a fact - people decompiling Claude Code have found many times that it has code branches to detect that it is being used in a Chinese timezone and locale.
bushido 2 days ago [-]
What's very interesting to me is these moves will introduce a good amount of doubt in future claims by Claude etc, that the open source and non-US models are only getting better because they're distilling from frontier labs.
avd201 1 days ago [-]
Anthropic has been doing this sort of stuff for a while already. I mean, who remembers when Claude would just consume all your remaining usage if it read anything indicating that Openclaw had been used on your codebase? Because I remember. Two months ago btw https://news.ycombinator.com/item?id=47963204
Then there was the whole debacle of Fable silently downgrading to other models if it detected wrong think, or worse, outright sabotaging your codebase if you were working on language models lol
jdw64 2 days ago [-]
I got curious and asked my Chinese friends, and they gave me a Reddit link[1]. It looks like it's about location data collection, and they suggested that might be the reason for the issue.
Well, that's a revenue hit for sure for Anthropic.
0xEnsp1re 1 days ago [-]
They don't want Anthropic to collect data ))
rvnx 2 days ago [-]
Can't say they are wrong, after the latest backdoor, or let's say, undocumented functionality that leaks some data, that was pushed in Claude Code a few days ago
When a company can remotely push code without explicit user approval, and code that was hostile / almost malicious, it is a backdoor
jitl 2 days ago [-]
so like… any website
rvnx 1 days ago [-]
Yeah, except the website doesn't have broad access to your computer and filesystem
SubiculumCode 1 days ago [-]
I think most websites transmit general location to the server.
yanhangyhy 2 days ago [-]
i'm gonna ask: how can they still use claude? i thought all users in china are banned
dgellow 2 days ago [-]
Alibaba has engineers in Hong Kong, Singapore, North America. It’s a global corporation
itake 2 days ago [-]
when i was in hong kong, chatgpt and gemini were disabled. Maybe this has changed though. When I was in China, the corporate vpn (zscaler) routed traffic through hk
hnfong 1 days ago [-]
This has changed (in a nit-picky way) - Gemini is now generally available to the public in Hong Kong.
ChatGPT and Claude are not available. Generally my impression is that OpenAI isn't that anal about service providers reselling ChatGPT in Hong Kong, but Anthropic seems to be really strict about the "no China" thingy.
Paradigm2020 1 days ago [-]
But you just said in hk they were disabled? So through a hk vpn still disabled?
Alibaba banning Claude Code over backdoor fears is like a pot calling the kettle a surveillance device.
JPLeRouzic 1 days ago [-]
> employees were being told to use the company's own coding platform Qoder
That looks like a no-nonsense decision, doesn't it?
rvz 2 days ago [-]
Another reason to use open source coding agents and local language models.
Claude Code is neither and it is literally info stealing malware.
p0w3n3d 2 days ago [-]
[flagged]
matheusmoreira 2 days ago [-]
Remember how Kim Dotcom got destroyed for criminal copyright infringement? One would think the big tech CEOs would face the same fate, that police officers would rappel down helicopters, storm their mansions and bring them out in cuffs.
Instead the AI companies reached these absurd settlements with publishers that made a mockery out of all the previous copyright enforcement victims.
root-parent 1 days ago [-]
Remember Aaron Swartz who did something that just pales compared to what Dario Amodei, Zuckerberg-Mr-Torrent and Sam Altman did.
314 1 days ago [-]
But Aaron Swartz did it for the benefit of other people. These fine people did it to uphold American values and enrich themselves at the expense of others. The law is clearly on their side.
gruez 1 days ago [-]
This but unironically. "did it for the benefit of other people" is redistribution, which is straightforward copyright infringement, even if you think it's a laudable act. AI training was the reverse, because courts have so far ruled it is fair use. When AI companies were engaging in piracy, they were sanctioned as well.
matheusmoreira 1 days ago [-]
> When AI companies were engaging in piracy, they were sanctioned as well.
Some token settlement for an insignificant fraction of their revenue is not in any way a "sanction".
gruez 1 days ago [-]
That just feels like more of a general complaint about how the justice system is set up. The same logic applies to how a $300 speeding ticket "is not in any way a "sanction"" for someone making $1M/year, or even a well paid SWE reading HN.
sbayg 1 days ago [-]
I feel you but are you possibly conflating civil and criminal justice? Tickets don’t scale with net worth of defendants, but class action penalties often do.
gruez 1 days ago [-]
>but class action penalties often do.
Do they? Or only so far as "if you have 1000x the revenue, you probably also have 1000x the customers that you have wronged, each of which are entitled to damages as well"?
sbayg 1 days ago [-]
Courts do award higher damages, specifically punitive damages, to punish and deter wealthy or large corporate defendants. In civil law, the defendant's net worth is a recognized legal factor because a small penalty on a massive corporation wouldn't incentivize them to change their illegal behavior.
Not sure what that is supposed to indicate? USA was a big place, even then. Most northern states had abolished slavery even before Britain, France, and especially Spain did. Maybe we should have a quick refresher on European values?
zahlman 1 days ago [-]
I thought this thread was about Alibaba's internal policies. How did we get here?
dan_i 1 days ago [-]
[dead]
batch12 1 days ago [-]
The article talks about European colonies, so would these have been European values then since America did not yet exist?
abenga 1 days ago [-]
That's just the label that changed. Same people, same values.
dan_i 1 days ago [-]
> The article talks about European colonies, so would these have been European values then since America did not yet exist?
Slavery is an expression of capitalist values.
Frederick Douglass is Karl Marx's ideological predecessor.
You aren't wrong that the European powers were slavers. That being said, the culture of the 13 American colonies is the first appearance of the unique Euro-American culture that dominates the United States.
It's wrongheaded to deny the central importance of slavery to American culture and values.
In the United States, the land of the small business owner, slaves mixed socially and sexually with their captors and with the culture of their captors.
For this reason, the culture of the United States is deeply affected by the institution of slavery in ways that are alien to the European powers.
matheusmoreira 1 days ago [-]
Indeed I do. We should all remember him. Rest in peace.
vlovich123 1 days ago [-]
Reminds me, did the AI companies redistribute that copyrighted material to others and make their money that way? Did Kim use the copyrighted material to generate something novel from it?
Copyright law literally says something isn’t infringement if it is a novel transformation. I get the jokes and criticism about AI companies fighting and complaining about competitors distilling, but this is a much weirder comparison.
> "The training use was a fair use," [the judge] wrote. "The use of the books at issue to train Claude and its precursors was exceedingly transformative."
> However, the judge ruled that Anthropic's use of millions of pirated books to build its models – books that websites such as Library Genesis (LibGen) and Pirate Library Mirror (PiLiMi) copied without getting the authors' consent or giving them compensation – was not.
It seems clear from the article that while the use of pirated works was illegal, the use of copyrighted works (the work a book is based on is still copyrighted even if you buy the book) was fine and transformative.
andersonpico 1 days ago [-]
But distribution isn't the only crime here; obtaining the material illegally apparently is a crime too. And the damn robot can also spit me harry Potter verbatim so I don't know how it would also not be distribution?
mapontosevenths 1 days ago [-]
If I read Harry Potter I will remember some parts verbatim. Others I will recall in only an abridged and lossy way.
Does that make my brain copyright infringement? Does Disney now own all my output forever because some small part of me now has Harry Potter embedded?
buran77 1 days ago [-]
Can you remember every part? Can you do this for every book in a library? Can you remember all that forever?
If you just ignore anything that's inconvenient for your argument, you can make any argument you want.
gruez 1 days ago [-]
>Can you remember every part? Can you do this for every book in a library? Can you remember all that forever?
None of those are relevant factors when it comes to copyright law. You don't get a pass for copyright infringement just because you're not copying the entire work. Same goes for a copy that's transient. You can't set up a bootleg movie theater in your home, even if you delete the movie file afterwards, and there's no trace of the movie aside from the viewers' vague memories.
buran77 1 days ago [-]
> None of those are relevant factors when it comes to copyright law.
And yet they very much are. US copyright law has the concept of "fair use" in 17 U.S. Code § 107 [0]. I'll paste here for your benefit, #3 is the one I referenced as most obvious but #1 and #4 are also very relevant:
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.
Naturally remembering some parts of a legally purchased book verbatim is fair use. "Memorizing" the entire library obtained via torrents and incorporating that in a commercial product that can output all that content doesn't sound like fair use to me.
The US justice system is too captured and corrupt at this point to take as reference because decisions there are bought by the highest bidder. But for the purpose of this discussion let's not play dumb for the benefit of trillion dollar corporations.
>And yet they very much are. US copyright law has the concept of "fair use" in 17 U.S. Code § 107 [0]. I'll paste here for your benefit, #3 is the one I referenced as most obvious but #1 and #4 are also very relevant:
If you're going to invoke fair use, that opens up a whole can of worms on what counts as transformative. The Google Books case and the Google thumbnails case show that you can make near-verbatim copies of works at scale and still be considered fair use.
>The US justice system is too captured and corrupt at this point to take as reference because decisions there are bought by the highest bidder. But for the purpose of this discussion let's not play dumb for the benefit of trillion dollar corporations.
This is begging the question. The original question is whether AI companies are getting special treatment. You can't then use that as a premise to say that the courts are tilted towards AI companies. Not to mention it's questionable how AI companies were suddenly able to corrupt all the judges, some of whom were appointed decades ago, even though they only got rich a couple of years ago.
buran77 1 days ago [-]
Look, first you were wrong with the confidence of an LLM and claimed an argument that was literally in the definition of copyright fair use had absolutely no relevance whatsoever for copyright. Even now you are surprised that I invoked fair use on reading a book. That was to respond to someone who "brilliantly" brought up reading Harry Potter [0] as evidence that the law allows any extent of "memorization" and reproduction of copyrighted material.
Then you switched to a barrage of questions on the premise of words in my comments that were neither written nor implied. If you muddy the waters just enough maybe everyone gets lost in there.
> The Google Books case and the Google thumbnails case show that you can make near-verbatim copies of works at scale and still be considered fair use
Now maybe we agree "reading Harry Potter and remembering some lines" is indeed fair use, but you decided my argument is still not relevant to create a distinction between "reading a book" and "feeding it all into an LLM" because of a vaguely related exception. For better or worse thumbnails are a copyright violation according to some courts [1]. But looking at the big "Books" decision (this is the one you meant?), did you check out the court's opinion [2]? Why would you believe the two cases are substantially similar? Just because they're both big tech? Just for yourself, from the definition of fair use and referencing that opinion, do you see any significant differences between "Google Books" and "big LLM"?
> You can't then use that as a premise to say that the courts are tilted towards ai companies
The highest bidder is what I said.
> Not to mention it's questionable how ai companies were suddenly able to corrupt all the judges, some of which were appointed decades ago, even though they only got rich a couple of years ago.
You're getting "creative" about what I wrote. "AI companies"? They are just the big corrupting agent of the day, and nobody with deep enough pockets had "revolutionized" the legal areas they're working in to this degree until now. Tech in general has been doing it for a few decades already. Other incredibly powerful industries have been doing that in their respective areas for even longer. "Suddenly"? The US justice system has worked exactly like this for many decades when it came to the interests of very deep pockets. "All judges"? I said "the system" because not all judges have the power to ultimately decide on things.
I'm surprised at your surprise that reading a book is fair use, and that courts have been "captured" and beholden to economic interests above justice for so long we forgot when it started.
No, and neither do LLMs. They're trained on vast quantities of data and retain only a fraction of it.
You might think of it as very, very lossy compression that generates new outputs rather than the original input unless something unintentional happens.
> If you just ignore anything that's inconvenient for your argument, you can make any argument you want.
I'm not. I just understand how it actually works. You either don't understand or are deliberately ignoring that what you just said is literally and technically untrue to make some sort of political statement.
bee_rider 1 days ago [-]
Does the law really not distinguish between mechanical processing of data, and humans learning from it? It seems surprising to me if every person who read a textbook is infringing copyright. It also seems surprising if something like a lossy compression algorithm is enough to protect you from copyright law.
Somewhere between the two a line must be drawn… where we’d want to put that line, I guess, is up for quibbling. But it doesn’t seem obvious to me.
gruez 1 days ago [-]
>Does the law really not distinguish between mechanical processing of data, and humans learning from it? It seems surprising to me if every person who read a textbook is infringing copyright. It also seems surprising if something like a lossy compression algorithm is enough to protect you from copyright law.
The Google Books and Google thumbnails cases have so far upheld that even mechanical reproductions are allowed, depending on the context/usage.
mapontosevenths 1 days ago [-]
To me the distinction hinges on the output being transformative enough to be considered a new work. I think that most of the time LLM output is.
Sometimes they go a bit wonky and overtrain on specific phrases, which can result in verbatim copies of brief sections of content. That's a bug, not a feature.
triceratops 1 days ago [-]
If you write out the parts or recite them for other people to hear, yes it's copyright infringement.
Humans reading or watching copyrighted material isn't considered "making a copy" for the purposes of copyright law. Machines doing so generally is.
lelandfe 1 days ago [-]
Further, why has my brain's searing remake of Snow White as a gritty murder mystery gone unscathed by Disney lawyers? Surely their negligence has diluted the Snow White trademark!
JsonDemWitOster 1 days ago [-]
This analogy is disingenuous because by comparing the human brain to the machine, it ignores _scale_. Scale is absolutely important in copyright law. As a matter of fact, copyright law is among the various profound impacts of the---wait for it---printing press, a _machine_ for the mass production of books.
mapontosevenths 1 days ago [-]
So if I watch a LOT of Disney movies THEN they own my own unique output forever?
hartbook 1 days ago [-]
yes it is if you write it down from memory and sell it. Exactly what LLM companies do
vlian2088 1 days ago [-]
>And the damn robot can also spit me harry Potter verbatim so I don't know how it would also not be distribution?
if you prompt it to, yes. just like your browser dutifully navigates to any copyright-infringing resource and GETs and POSTs whatever you ask of it.
(also it can't, not really, only small snippets before going off rails. LLMs aren't magic, they can't losslessly compress an exabyte of training data into a few terabytes of weights.)
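As a rough back-of-envelope check on the parenthetical above, the ratio implied by those figures can be computed directly. This is only a sketch: the one-exabyte corpus size is taken from the comment at face value, "a few terabytes" is assumed here to mean 2 TB, and `implied_compression_ratio` is a hypothetical helper name.

```python
# Back-of-envelope sketch; both sizes are illustrative assumptions, not measurements.
EXABYTE = 10**18   # bytes, decimal (SI) definition
TERABYTE = 10**12  # bytes, decimal (SI) definition


def implied_compression_ratio(training_bytes: int, weight_bytes: int) -> float:
    """Return how many bytes of training data map onto each byte of weights."""
    if weight_bytes <= 0:
        raise ValueError("weight_bytes must be positive")
    return training_bytes / weight_bytes


# Assumed: 1 EB of training data, 2 TB of weights ("a few terabytes").
ratio = implied_compression_ratio(1 * EXABYTE, 2 * TERABYTE)
print(f"implied ratio: {ratio:,.0f} : 1")
```

Under those assumed sizes, each byte of weights would have to stand in for roughly 500,000 bytes of text, far beyond what general-purpose lossless compressors achieve on natural language.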
samrus 1 days ago [-]
They redistributed the statistical patterns of those copyrighted materials. Which perhaps should be treated similarly nos
As for your "technically not copyright infringement" defense: those laws are from a time when those patterns couldn't be derived and distributed at scale. A human had to learn and teach them. That made it different. The scale enabled by modern tech makes it a whole different situation. In the same way, one person standing on a street corner people-watching for a bit isn't that bad, but a whole constellation of Flock cameras constantly monitoring everyone's movements and making it available to any of their customers is really, really bad. The law will have to catch up to this.
vlovich123 1 days ago [-]
> They redistributed the statistical patterns of those copyrighted materials. Which perhaps should be treated similarly nos
Nos for the same reason that me giving you a word cloud of the frequency of words within Harry Potter isn’t infringement. It’s a novel transformation.
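To make the word-cloud comparison concrete, here is a minimal sketch of the kind of word-frequency table a word cloud is built from. The function name `word_frequencies` and the sample text are made up for illustration; the sample is placeholder prose, not a quotation from any book.

```python
import re
from collections import Counter


def word_frequencies(text: str, top_n: int = 2) -> list[tuple[str, int]]:
    """Return the top_n most common lowercase words and their counts."""
    # Treat runs of letters and apostrophes as words; everything else separates them.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


# Placeholder text written for this example.
sample = "The owl flew. The owl hooted. The cat watched the owl."
print(word_frequencies(sample))  # [('the', 4), ('owl', 3)]
```

The result keeps counts but discards word order, which is the property the comment leans on. Whether that matters legally is the commenter's argument, not something the code shows.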
samrus 1 days ago [-]
That's not as complex or scalable as what LLMs can do. The capability and scale is what changes the equation. Quantity is a quality unto itself here
vlovich123 1 days ago [-]
Exactly, and the complexity of what they’re doing is an even more significant novel creation.
Particularly in the US there’s a four point test and the very first point:
> To justify the use as fair, one must demonstrate how it either advances knowledge or the progress of the arts through the addition of something new.
I don’t know anything that has advanced the knowledge and progress of the arts more.
> The third factor assesses the amount and substantiality of the copyrighted work that has been used. In general, the less that is used in relation to the whole, the more likely the use will be considered fair.
This isn’t about usage in training. This would be in the LLM itself - the copyrighted works are very rarely used in the output.
> The fourth factor measures the effect that the allegedly infringing use has had on the copyright owner's ability to exploit his original work
Would you sincerely claim that owners have become less able to make money because of LLMs? Those same owners using LLMs to increase their own output of copyrighted works?
Anyway, copyright is not an absolute right and you have to really misunderstand copyright law to claim that LLM training infringes it.
This is confusing. I can torrent everything and do what I want with it, as long as I don't redistribute the exact same thing?
If so, why do we still pay for games and movies?
midasz 1 days ago [-]
I pay for games because it's more convenient than pirating them. For movies and tv however... They make it so difficult to be a customer.
klibertp 1 days ago [-]
Steam with Proton made gaming on Linux viable. Just for that, they deserve my money. That some of it goes to game devs is a happy coincidence ;D
john_strinlai 1 days ago [-]
>I can torrent everything and do what I want with it, as long as I don't redistribute the exact same thing?
this is an incorrect interpretation (in the usa, at least).
downloading a game/movie is still the creation of an unauthorized copy, which is not allowed. not to mention that playing/watching does not count as a "novel transformation".
(17 U.S.C. § 106 and 17 U.S.C. § 501 are the relevant pieces of reading)
JsonDemWitOster 1 days ago [-]
IANAL (plus a whole suite of other caveats) but torrent-baiting works in Germany along these lines.
ISPs and trigger-happy law firms don't send you a C&D for downloading a torrent, they do so for seeding a torrent. It's just that practically nobody "just seeds" a torrent so people colloquially claim they got busted for downloading a torrent.
In theory this means if you torrent as a 100% leecher and turn off seeding from the get-go, you should be in the clear. But nobody sensible would dare test the extent of German Legal Spite, much less do so repeatedly to science the shit out of it.
If you can download through another protocol, say HTTP, however---<Sendung unterbrochen!>
vlovich123 1 days ago [-]
No, that’s literally why Anthropic got sued. If they’d paid for a copy of the copyrighted works they pirated, they wouldn’t have had a problem. There were two issues in their case: does the AI infringe on copyright and did Anthropic obtain all their materials legally. The first they won on, the second they lost.
So if you pirate a bunch of content you still get in trouble for that. But if you somehow make a business out of that that isn’t just redistributing those materials, then that business itself isn’t infringing.
codedokode 1 days ago [-]
Exactly. If a rich corporation downloads and uses pirated content without paying, why should an ordinary person pay for movies and music instead of downloading them for free?
UqWBcuFx6NV4r 1 days ago [-]
Intellectually dishonest comment. Kim Dotcom got done for illegal distribution. It’s not about “illegally downloading”. You can pretend all you want that it’s the same thing as these AI companies, but it’s not. It certainly very well may be immoral, but to act like copyright law as it currently stands in spirit or in reality covers this scenario we’ve found ourselves in, is a complete and utter lie.
matheusmoreira 1 days ago [-]
> It’s not about “illegally downloading”.
It absolutely is. That's textbook copyright infringement. Doing it for commercial purposes elevates it to criminal copyright infringement.
Simulacra 1 days ago [-]
He just lost another court case… I wonder if we're getting close to the government spending as much to prosecute the man as Hollywood possibly lost.
Obscurity4340 1 days ago [-]
KimDoctCom?
xienze 1 days ago [-]
Remember how people used to justify their own personal software piracy with arguments like "information wants to be free", "no one stole anything, you still have the data", "I was never going to buy it anyway", and "copyright should be abolished?"
> Instead the AI companies reached these absurd settlements with publishers that made a mockery out of all the previous copyright enforcement victims.
Isn't that at least something? How many people pirating software ever settled with the companies they "victimized?"
monooso 1 days ago [-]
How many people pirating software stole every piece of copyrighted material in existence and then used that material to generate billions of dollars which they kept for themselves?
xienze 1 days ago [-]
You keep using that word "stole", you can't steal digital information, remember?
> then used that material to generate billions of dollars which they kept for themselves?
Hasn't it also led to distilled, free and open models that everyone can benefit from?
monooso 1 days ago [-]
[dead]
matheusmoreira 1 days ago [-]
> Remember how people used to justify their own personal software piracy
A courtesy. There was never any need to justify it.
> Isn't that at least something?
Yes, it's a joke. Why do they get to infringe copyrights with impunity while normal people get destroyed? Either go after them like the copyright industry always does and punish them properly, or abolish copyright straight up. This "rules for thee but not for me" nonsense is straight up disgusting.
> How many people pirating software ever settled with the companies they "victimized?"
Too many to list. Also, nobody is victimizing billion dollar corporations.
phoghed 1 days ago [-]
So you don’t actually care, you just want them punished out of spite because some other guy was for doing something similar but not the same?
matheusmoreira 1 days ago [-]
Correct. I'm one of the copyright abolitionists the other person alluded to. It's the selective enforcement that's disgusting.
I mean, what is this? Their balls suddenly drop off? They only have the audacity to prosecute random people? Smaller companies? When they're up against trillion dollar AI companies they suddenly become cowards? That's so incredibly disgusting, and it made me completely lose even the small amount of respect for copyright that I had managed to rationalize over the years.
phoghed 1 days ago [-]
So you believe a dude was wrongly punished, and to you justice would be for everyone else to also be wrongly punished? Kind of dumb tbh
matheusmoreira 1 days ago [-]
My mind is not capable of the cognitive dissonance necessary to accept that billionaires get a slap on the wrist while mere mortals get police helicopters descending upon them. In order to maintain my mental health, I must have consistency.
So either enforce the law the same way against everyone correctly and proportionally, or your law and its enforcement are illegitimate and shouldn't exist. If some activity is harmless enough for some billionaires to do at massive scales and settle in court like it was some footnote in history, then nobody should be punished for it at all.
phoghed 1 days ago [-]
Brother, Kim Dotcom was worth about $200,000,000
cinntaile 1 days ago [-]
Settlements after the fact, not agreements beforehand.
No that's not something. That's just having infinitely more money to fight legal battles.
mapontosevenths 1 days ago [-]
When a crime is only punishable by fines it isn't a crime, it's just an activity with a tax.
The AI companies knew that and bet, correctly, that it would be worth the cost.
curtisblaine 1 days ago [-]
No. I want either:
1. The copyright infringement of big corpos fully justifying my copyright infringement in the face of law
2. The copyright infringement of big corpos being prosecuted in the same exact way as my copyright infringement would.
There is really no middle ground.
datsci_est_2015 1 days ago [-]
The trick here, imo, was the integration with the military industrial complex. It wasn’t very difficult of course, as automation has been a topic in warfare for decades, if not centuries.
But Eisenhower was right:
> In the councils of government, we must guard against the acquisition of unwarranted influence, whether sought or unsought, by the military-industrial complex. The potential for the disastrous rise of misplaced power exists and will persist.
yubblegum 1 days ago [-]
Whatever happened to honor among thieves? What is this world coming to..
short_sells_poo 2 days ago [-]
The corollary is that there are no morals once the stakes are in the $ billions, let alone hundreds of billions.
This isn't even about a single person or personality. Very few people in such a position could stand fast by their moral code. In any case, an environment that favors profit above everything will naturally select for individuals who are unencumbered by such hindrances.
There might've been 100s of Altmans and Amodeis who had a strong moral code but we don't know about them because they dropped out of the "race" because of said moral hurdles.
rlpb 2 days ago [-]
Copyright law is an artificial legal construct, not a moral code.
I think appropriate attribution is a moral code, but I am not able to attribute every idea I have to all those who helped me develop the general intelligence that I use to develop such ideas.
raxxorraxor 2 days ago [-]
I think this behaviour has shown that there are no morals involved. Pirate if you want to, just don't get caught if you don't have a giant backing.
spinningslate 1 days ago [-]
> an environment that favors profit above everything will naturally select for individuals who are unencumbered by such hindrances.
Exactly. Dairy farms optimise for milk production so favour cows that produce the most milk.
The market economy optimises for profit so favours those most willing/able to generate it. Zuckerberg, Musk, Thiel, Andreessen and co are products of the system.
rkachowski 1 days ago [-]
> The corollary is that there are no morals once the stakes are in the $ billions, let alone hundreds of billions.
terrifying
TZubiri 1 days ago [-]
I never get tired of posting this answer because everyone on the internet is adopting this hot take:
If you look at it with your eyes crossed, Anthropic and the chinese are doing the same thing.
If you look at it with nuance 1 the chinese are doing way worse stuff, and 2 stealing from a thief would still be stealing
1. The chinese are making multiple accounts (at least 49,000)[1][2], using proxies/VPNs, possibly using residential computers and infected computers (unless you think the chinese are doing due diligence to ensure their purchased IPs are kosher).
All accounts need to be created with a real name, and especially so if the paid models need to be accessed and paid with a credit card. So this is beyond IP theft and getting closer to fraud.
These are all techniques that are well studied because they are used by criminals and cybercriminals, textbook stuff.
Consider, if that was not sufficient, that China is banned from using the product, so they need to use identities and locations not just to avoid relating the accounts between themselves, but merely to allow account creation. What identities are they using to create accounts?
Compare this to Anthropic, which (reads notes) made a deal in an IP theft case, paying billions, because they bought books and scanned them but buying the books wasn't sufficient retribution for the authors. Or that they (gasp) scanned the internet, like Google.
Not having nuance to see the difference between the two companies is something I expect of the twitter echo chamber copying hot takes for upvotes, not hacker news.
What seems to be missing from that take is that a) Alibaba paid for the access b) there is no IP theft because LLM output is not copyrightable.
Anthropic seems to want to both own and eat its stolen cake.
Obscurity4340 1 days ago [-]
Would be an interesting exercise to have the frontier models calculate their civil liability, or the extent of the liability they can impose on fellow pay-it-forward thieves.
codedokode 1 days ago [-]
First, an LLM is merely a tool and its output belongs to whoever generated it. If a Chinese researcher used their creativity to generate a response, the copyright belongs to them and AI companies have no rights to it. Second, the Chinese release many of their models for free, thus being on a noble mission to make AI available for every country (unlike a certain company whose promises were nothing but words). For comparison, US companies do not release anything and want to keep AI for themselves and decide who gets to use it.
> stealing from a thief would still be stealing
Stealing from a thief hurts thief industry which is a win for society.
> The chinese are making multiple accounts
Not a crime. AI companies also ignore robots.txt and applicable laws when illegally copying copyrighted material from websites to their servers without author permission.
TZubiri 1 days ago [-]
>Stealing from a thief hurts thief industry which is a win for society.
You are welcome to study the law of any country. A crime against a criminal is still a crime.
>applicable laws when illegally copying copyrighted material from websites to their servers without author permission.
If the material is distributed in http without authentication, isn't that sufficient authorization from the distributor? I would think the search + web crawler era would have set plenty of precedent for this.
>Not a crime. AI companies also ignore robots.txt
Breach of contract is not a crime, agreed.
How about identity fraud (accounts by identity proxy, document KYC), computer crime (C&C residential proxies), conspiracy.
And after the June US directive to suspend Chinese access, smuggling, false statements to regulated entity.
These are all criminal charges that are presumably not levied because of the adversarial relationship between those countries. But if this happened in the US you would probably be seeing at least a civil claim and potentially criminal charges. Hell if this were in any other western country you would see the same. Consider CloudFlare vs Spain, much lighter criminal accusations, and there's already a criminal investigation brought where the CF CEO is indicted.
Non-trivial lack of nuance when you can distinguish between a domestic civil case and a criminal international case between 2 world powers with great judicial tension.
xpct 1 days ago [-]
Let's not sane-wash Anthropic's book theft. No, they didn't just 'scan' the internet, they created a tool for worldwide license washing and got fined an insignificant amount for it.
TZubiri 1 days ago [-]
You may be conflating the book thing with internet scanning.
On the book case, a class action case was brought to court and it was settled. There's no use in bringing it up further, it has been settled, and it bears no relation to the Anthropic v China case.
You like programming? Think of encapsulation, imagine if you had to think about f(x) but someone brings up y, now you have to think about f(x,y) and what other parameters might bear relationship? The law simplifies by compartmentalizing. And it doesn't even bear a tradeoff, judgment(case1,case2) isn't better than judgment(case1)+judgment(case2).
xpct 1 days ago [-]
My response was directed to your insincere characterization of Anthropic's actions. As we can see from the comments here, the public opinion hasn't settled the same way as the court case has, and that's why it's still discussed.
TZubiri 1 days ago [-]
But 'the public opinion' doesn't matter. The case was brought by copyright holders to the courts, and Anthropic and the copyright holders made a deal wherein the copyright holders were paid money in exchange for dropping the case and all claims to the intellectual property of Anthropic's derivative product.
If the damnified have considered the matter settled, why would it matter what third parties have to say about it? Third parties would have pushed for more compensation, or ownership of the derivative product. If you feel damnified yourself you can open a case and explain why the actions of Anthropic have hurt you personally.
Otherwise it is a matter that doesn't concern the general public, we have no say in it and there is no right to be offended on behalf of parties that have already settled the argument.
zobzu 1 days ago [-]
[flagged]
somelamer567 1 days ago [-]
The extreme downvoting of certain viewpoints that are less-than-flattering about China's conduct in the AI race is quite telling.
They seem to have given themselves license to do what they like, but _God forbid_ they're called out for acting less-than-honourably.
Most adults around the world can associate actions and consequences. The incomprehension and entitlement here speaks volumes about the moral and emotional maturity of the Chinese Communist Party and their political system.
johnathan101 2 days ago [-]
[flagged]
soraminazuki 2 days ago [-]
It's insane that it's becoming a concern now. It should've ended the discussion from the very beginning.
yurish 2 days ago [-]
Enterprises host their entire infrastructure on US-based clouds. And for many, it still is not a problem.
soraminazuki 1 days ago [-]
The recklessness of coding agents having access to work laptops and exfiltrating data with barely any restrictions is on a whole new level.
vitally3643 1 days ago [-]
I mean, we all also still do manufacturing in China with a 100% guarantee that your widget will be copied and cloned. It's so much cheaper though....
dan_i 2 days ago [-]
[dead]
pmontra 1 days ago [-]
After they uploaded their code to private repositories on GitHub, Bitbucket etc since forever? They trust GitHub not to read their code but they don't trust an AI from Microsoft not to read it? It would be schizophrenia.
CardenB 1 days ago [-]
Big customers usually use GHE served on prem due to security concerns, no?
pmontra 1 days ago [-]
I really have no idea. I work only with small or at most medium sized companies. All of them put their code on a git server they don't own. All of them are concerned about AI companies looking at their code. They hope that at least they won't train their models with their code if they pay.
I think that the reasoning is: they trust the git company (whatever it is) not to sell their code. They are worried that their code goes into a model and somebody else could ask the model "write a service like XYZ" and it will regurgitate their code.
sofixa 1 days ago [-]
No, they run GitLab because GitHub Enterprise is a horrible thing nobody has ever said a good thing about.
GitLab even has a free self-hosted version, and it has a number of advantages (like being able to actually have a structure with inherited secrets and accesses, and no, GitHub Organisations do not count and suck). And for years thanks to GitLab-CI it was clearly ahead.
sofixa 1 days ago [-]
That's why self-hosted GitLab is so popular in Europe.
HarHarVeryFunny 1 days ago [-]
If you're using a coding agent then obviously you need to either serve the model yourself or trust whoever you are sending your data to.
In terms of WHAT you need to be concerned about, it seems it goes far beyond code, and far beyond having to trust your model provider.
A coding agent with access to a bash tool is going to have access to anything that a human with a bash prompt would, and even if you try to provide a nailed down sandbox environment for the agent, you still need to be concerned about things like unencrypted passwords and keys that it may be able to find "laying around" in code or databases/etc it has access to.
I'm surprised there haven't yet been more widely disseminated stories about coding agents and claw-bots wreaking havoc.
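To make the "laying around" point concrete, here is a minimal sketch of the kind of check you could run over a directory before letting an agent loose in it. The pattern set and the function names (`find_secrets`, `scan_tree`) are my own illustrative assumptions, not any particular tool's rules; real secret scanners use far larger rule sets.

```python
import re
from pathlib import Path

# Illustrative patterns only (an assumption for this sketch, not a complete rule set).
SECRET_PATTERNS = {
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "password_assignment": re.compile(r"(?i)\bpassword\s*[=:]\s*['\"][^'\"]+['\"]"),
}

def find_secrets(text):
    """Return the sorted names of every pattern that matches the text."""
    return sorted(name for name, pat in SECRET_PATTERNS.items() if pat.search(text))

def scan_tree(root):
    """Map each readable file under root to the pattern names found in it."""
    hits = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        found = find_secrets(text)
        if found:
            hits[str(path)] = found
    return hits
```

Calling `scan_tree(".")` on a project checkout returns a dict of file paths to matched pattern names. Anything it reports is something a bash-capable agent could read just as easily.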
segmondy 1 days ago [-]
A bit too late for that, most of them have already dumped most of their codebase and IP into cloud models.
saidnooneever 2 days ago [-]
not to mention they are kind of capable of executing code and susceptible to injections which also amounts to being practically backdoors if youre not super careful about how u use the tooling
spwa4 2 days ago [-]
Wasn't one of the big promises the AI labs made "uncopyrighting"? Ie. the ability to reconstruct large works, including source code, without actual access to the source code? Everything from movies to operating systems.
xpct 1 days ago [-]
Interesting, I haven't heard this claim before. I suppose that claim made sense if their customers were big corporations, not so much when it's the masses generating bootleg software copies.
silon42 2 days ago [-]
Cleverly compressing and decompressing doesn't de-copyright it. ... and if it's not the same who'd trust it.
mannanj 1 days ago [-]
I remember hearing something about this. Reminds me of the many lies that political candidates make to garner interest and approval. Except who's holding them accountable - like there's not even a list anywhere tracking these lies.
llm_nerd 2 days ago [-]
Becoming? We've moved entirely in the opposite direction.
When these tools first appeared the overwhelming conversation was about the risk of letting a remote tool siphon your code and intellectual property (where eventually they're going to add that to their training). Now everyone is using them, and that fear seems to have dissolved. Every corporation is sprinkled with Claude Code, Antigravity, Copilot, Codex, and so on. Even the long fear-mongered Chinese providers are being heavily used in many spaces.
In this case this is a PR battle between two firms, and it isn't much more. And Alibaba isn't worried about the "proprietary code" (the truth is that there is incredibly little interest in most orgs code), but that the tool is a backdoor, or at least that is the claim.
DanielHB 2 days ago [-]
> there is incredibly little interest in most orgs code
I think from a commercial perspective yes, but access to source code is very good for finding exploits which could be very valuable for governments. I could also see a future where companies are directly cyber-attacking competitors in hostile markets too...
otabdeveloper4 2 days ago [-]
> and that fear seems to have dissolved
Until the first big incident, yes.
ampersandwhich 2 days ago [-]
I think we should start calling it "distillation terrorism" just to make it sound even more absurd.
InsideOutSanta 2 days ago [-]
It's pure model murder, and if you call it anything else, you're an anti-American communist.
lelanthran 2 days ago [-]
> Translation: Alibaba will continue distillation attacks using accounts that aren't directly attributable to it's own corporate infrastructure.
What's a "distillation attack"? How is it different from simply distillation?
kouteiheika 2 days ago [-]
It's pretty much the same as when "installing programs on your computer" is called "sideloading". Deliberately deceptive, weaponized language to make it seem like a bad thing.
dizhn 2 days ago [-]
The target doesn't want to be distilled.
julianlam 1 days ago [-]
You wouldn't distill a car.
HlessClaudesman 1 days ago [-]
I would distill all the cars.
lelanthran 1 days ago [-]
> The target doesn't want to be distilled.
So?
Fraudsters don't want to be jailed, their victims don't want to be scammed, employees don't want to be laid off, etc.
What the target wants is irrelevant - what society wants as enforced by laws is what is relevant, and as the leading AI providers have demonstrated, simply grabbing other people's copyrighted stuff for learning purposes is perfectly fine!
If they already think this practice is fine, why would I believe that their concerns about this are real?
dizhn 1 days ago [-]
I was only describing the difference not taking a side.
TZubiri 1 days ago [-]
using infected machines as proxies would be a fair line in the sand
RobotToaster 2 days ago [-]
(Mis)anthropic already performed "distillation attacks" on the internet.
vorticalbox 2 days ago [-]
i can see why they want to stop it but
1. you have to pay for the "attack"
2. these AI companies trained on copyrighted content without permission or attribution to anyone whose data was used to train.
exe34 2 days ago [-]
As long as they're paying for the tokens, there's no attack. Otherwise you have to call training on copyrighted material theft.
feverzsj 2 days ago [-]
They are not paying for most tokens. The actual users in China do. All they need is the logs.
InsideOutSanta 2 days ago [-]
Anthropic still gets paid.
Unlike the vast majority of people Anthropic stole from.
dizhn 2 days ago [-]
In that case it's already bought and paid for by the users, is it not?
vrganj 2 days ago [-]
Did Anthropic perform "distillation attacks" when they hoovered up the entire internet?
surgical_fire 2 days ago [-]
How exactly does the word "attack" fit in that phrase?
feverzsj 2 days ago [-]
Considering their massive distillation, if US companies stop publishing new models to the public, would China still be able to develop new open weight models?
bel8 2 days ago [-]
I don't think China would struggle to scrape the internet for fresh data.
And they constantly publish state of the art LLM research (see DS4 context compaction and cache tech).
They have very capable tech giants. So while not being able to distill western models would probably have some impact, it's probably becoming lesser as time passes.
We might even see Western LLMs distilling Chinese models soon. If they aren't already to some extent.
hnfong 1 days ago [-]
Everyone distills/copies training data.
A couple months ago when Anthropic was complaining about Chinese distillation, people found that Claude self-identified as "DeepSeek" when asked in Chinese:
It's really a fiasco of massive hypocrisy at this point.
tristanj 2 days ago [-]
Yes, 100%. GLM 5.2 is capable of RSI. It's too late to stop.
bdcravens 1 days ago [-]
Look at all of the software that has been developed as an alternative (and often an upgrade to) software in the west. (Baidu, Wechat, etc)
Many of the top AI researchers at western companies are from China, and many are returning.
VortexLain 1 days ago [-]
Depends on a lab, but they do have plenty of compute and engineering. So this would only slow down the progress.
pjmlp 2 days ago [-]
Of course, it is like any other kind of weapon system, eventually the knowledge gets acquired.
margorczynski 2 days ago [-]
China has most probably already achieved "escape velocity" on the software side. Now if they achieve parity, to some degree at least, on the hardware side with Nvidia it is very possible they'll overtake the US.
realusername 1 days ago [-]
It doesn't matter, the only models getting compared are the public ones.
If Anthropic had a super secret model that nobody has access to, I'm not sure why I should care about it since I can't access it.
surgical_fire 2 days ago [-]
Probably yes.
More than a year ago, when Anthropic and OpenAI started to hide the reasoning bits from the output, a lot of people here on HN predicted that Chinese models' days were numbered.
Fast forward to today, and models such as DeepSeek and MiMo are nothing short of excellent. I haven't used GLM or Qwen but heard very good things about them as well.
This "massive distillation" sounds a lot like anxiety about how companies from outside the US can develop very good models themselves.
VortexLain 1 days ago [-]
In my personal, subjective opinion GLM-5.2 is on par with GPT-5.3
Jeff9James 1 days ago [-]
Story of Z.ai:
use claude-code
see how good it is
send 100k bots to distill fable 5 (GLM 5.2 is the result of this)
release Zcode
ditch claude-code
ban claude-code
codedokode 1 days ago [-]
The outcome is that we get either free or cheaper model. Good work.
kgeist 1 days ago [-]
Fable 5 was released on June 9 and removed on June 12. GLM-5.2 was released on June 13. It would be an amazing feat to make a model SOTA in just 3 days but I highly doubt it. It's more like z.ai released an existing checkpoint earlier than planned to capitalize on the news
As a user, you have to trust your coding agent AND inference provider AND models: https://jacob.gold/posts/coding-models-are-code/ https://www.anthropic.com/research/sleeper-agents-training-d...
It's unfathomable to me that EU companies don't take the risk of industrial espionage from US more seriously
Of course those are largely the same companies that receive emails via Outlook, manage company-wide SSO in Microsoft Entra, put their files in SharePoint and track software and maintenance issues in Jira ... I'm not sure how much info there is left that isn't already combed through by NSA and friends
There might be some valid concerns about model alignment, but at least the model running in-house isn't going to conduct espionage.
Also, https://en.wikipedia.org/wiki/Whataboutism
If a token compresses to around a byte, worldwide AI input and output is around 1 gigabyte per second.
For any intelligence agency, they can afford to keep and store all of that forever, and later do analysis on it.
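The arithmetic behind that estimate is easy to check. A rough sketch, taking the 1 GB/s figure above at face value; the function names and the per-terabyte price parameter are hypothetical inputs for illustration, not measured values:

```python
# Back-of-envelope check of the "1 GB/s worldwide" storage estimate.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

def yearly_storage_petabytes(bytes_per_second=1e9):
    """Data accumulated in one year at a constant rate, in petabytes (10**15 bytes)."""
    return bytes_per_second * SECONDS_PER_YEAR / 1e15

def yearly_storage_cost_usd(usd_per_terabyte, bytes_per_second=1e9):
    """Raw media cost for one year of data; usd_per_terabyte is a hypothetical input."""
    terabytes = bytes_per_second * SECONDS_PER_YEAR / 1e12
    return terabytes * usd_per_terabyte
```

At 1 GB/s, `yearly_storage_petabytes()` comes to roughly 31.5 PB per year. Plug your own price assumption into `yearly_storage_cost_usd` to see how it scales; the result covers raw media only and ignores redundancy, power and operations.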
At the scale the AI companies are operating at, I think it isn't likely that they are sucking it all in right now.
More likely I think the intelligence agencies will get a real-time live tap into the raw data feed which they will process onsite for interesting things and then if things are flagged, they will log it in the intelligence agency systems.
The cost to do so for many years is less than one employee.
This only really matters in a world where Prompt Injection and Jailbreaking isn't trivial in the first place though. All current models are still extremely exploitable.
I strongly suspect we are only scratching the surface of activation engineering at the moment, and there are plenty of very targeted ways of lobotomizing or cracking LLMs if you understand the model in detail.
Not impossible I agree, but seems like a really impractical way to ship a trojan while much weaker channels exist.
Can you name one open source US model that came out in the last year?
My favorite conspiracy is that three letter agencies keep pushing the conspiracy that they are omnipresent with access to everything. Same as parents telling their kids Santa is watching, and leaders telling adults God is watching. It's extremely effective control and millennia old at this point.
The reality is much more banal that they still need warrants and tech companies hate playing police/evidence servant for the government (it consumes a ton of resources and pays nothing).
The Snowden leaks revealed that's not the case.
The three letter agencies can just issue national security letters without a judge ever seeing them, and those come along with a gag order (plus other workarounds like just buying data from brokers, and how US communications can get swept up just by virtue of communicating with a foreign national outside the US).
You're right, they aren't omniscient in the way we imagine of a room full of people monitoring everything in real time. But to pretend they aren't passively collecting massive amounts of data is dangerous. Snowden showed us PRISM, with all major tech companies participating. They do effectively have a live, unrestricted wiretap to the internet and if you happen to be a person of interest, they will just send out NSLs and get all your communications that are not fully E2EE without you even knowing thanks to the gag order.
I'll provide some helper information to get the ball rolling (see page 42)[1]
[1]https://www.intelligence.gov/assets/documents/702-documents/...
All the other prime suspects are in the report too for the curious.
I will not elaborate how I know, but that is not even directionally correct. But these are not even secret things that can’t be known simply through the Snowden, Wikileaks, and Vault7 releases. So why are you telling yourself this? Are you still wet behind the ears or something?
There are people who know exactly how governments do not in fact need warrants and the tech companies don’t even really know they are servants to the government, let alone which one. That’s how things are done. The less surface area the better.
that's why you should use abliterated heretic models
The timezone fetch was to alter program behaviour at runtime, not to send arbitrary timezones for tracking reasons.
It was one way of detecting if it was a chinese person using the program and then behaving differently.
Malware behaves this way. STUXNET for example was wired to do nothing except propagate unless the environment had the right conditions.
Most services I know that are trying to block abuse do collect device info
Even hotel and flight websites work like that, they determine your ability to pay based on your location, wall clock time and device OS - and FSM knows whatever else.
Are they malware too, basically STUXNET?
The issue is that by distilling Claude, Alibaba reuses the IP Anthropic used to train the model. That's more akin to historical Chinese reverse engineering methods and disrespect of IP.
Also, you can't copyright AI outputs. So worst case they violated the ToS.
(granted, only Meta got caught using Anna's Archive, but it seems safe to assume it's common practice. And even if it wasn't, the websites in Common Crawl are still covered by copyright)
Fwiw, I think the concept of IP in general is counter to human progress.
Sure, one person gets rewarded more with the IP system. But at the same time, that breakthrough then can't be built upon by others.
Overall, I think it does more harm than good because of how it monopolizes technologies and ossifies development.
I think free sharing of knowledge will always beat intellectual stinginess.
Outside of military technologies they had massively fallen behind the west by the 80s. Without the western tech they licensed or copied they were permanently stuck in the 50s. Even their crappy cars were licensed copies of cheap European cars from the 60s.
When it comes to consumer electronics, vehicles and a bunch of other things they were comically behind. So it’s really not a good example..
> monopolizes technologies and ossifies development.
As bad as it might be empirical evidence shows that historically a superior system has never existed (it might be feasible but everything that was tried underperformed).
Good grief. All one has to do is look at how humanity has consistently progressed: iterating on what has existed is how we progress, not some corporation that wants to rat fuck us all for a few pts in share value.
Progress was extremely slow until the 1800s. Coincidentally, corporations and modern capitalism in general developed around the same time. Of course I’m not necessarily saying it was the main or direct cause, since it isn’t exactly possible to create an experiment comparing it to other systems (of course that was tried and failed completely in the USSR, Maoist China and similar places).
Capitalism was side effect as well.
> was side effect as well
A side effect of what exactly?
Historically most evidence seems to point to the contrary.
Amongst other things after the printing press was created it was impossible for anyone who was an author to survive from their work unless they were independently wealthy or had rich patrons.
In any case there's still a difference between publicly available copyrighted data and whether you can use it for model training, and the innovation around model training, RLHF, etc which you presumably have some interest as a country to allow companies to invest in with some legal protections (like the diff between patent law vs copyright law)
There are many cases in the early 2000s where copyright protections were relaxed for tech advancements
As frustrating as the anti-AI crowd can be, I see why they end up that way when the valley is full of opinions like this.
When they bulldoze the house to pave the highway, they toss the homeowner a few bucks. If you take an author’s books do you owe him a share of OpenAI?
You come with the selfless proposal that everyone give to the poor $tn companies “for the good of humanity”. I’ll assume this is just hopelessly naive but you post so insistently that it makes me wonder.
> disrespect of IP
Nobody other than Anthropic cares.
Why is this any worse than Anthropic's disrespect of IP? You've apparently drawn a distinction between the two here, but I'm failing to see what it actually is.
Search engines for example historically ignored copyright law by copying excerpts or serving other sites' images, it doesn't mean someone copying Google's code has some moral free pass
Copyright law is a subset of IP law. What IP is being infringed upon here?
> Search engines for example historically ignored copyright law by copying excerpts or serving other site images
Excerpts are often considered fair use, but it depends on country.
> it doesn't mean someone copying Google's code has some moral free pass
Nobody copied Anthropic's code. They used its output to train another model. At most they violated some terms of service.
Did they maybe abuse Anthropic's subsidised pricing? Sure. But that's what happens in a free market if you sell below cost.
That had happened progressively, thumbnails for example were ruled as fair use later on, DMCA safe harbor was a huge gift for tech companies because otherwise it would curtail the ability to create platforms (relaxing copyright protections in exchange of innovation)
> Nobody copied Anthropic's code. They used its output to train another model. At most they violated some terms of service
Distilling a model is a method that can push the entire market to low margins and prevent companies from making money off such research. It also copies the Anthropic special parts (RLHF and other specific methods) rather than the "copy of the entire web" part
This is similar to what happened with Chinese reverse engineering of American manufacturing or PC clones killing IBM PCs.
Is it in the interest of the USA, probably no, that's why I assume this will be backed by law eventually
Then it's on Anthropic to actually price their models accordingly so that distilling isn't profitable. Why does this need a legal remedy when market forces could easily resolve this?
> Is it in the interest of the USA, probably no
Good. The world needs to diversify away from dependence on US technology.
In my opinion further strengthening the CCP is a disaster for the world. A government that killed millions of its own citizens to stay in power is not who I would entrust super intelligence with. But apparently we are not going to agree on that
Generally Communist nations historically favored technological development to human life in the scale of millions, keep that in mind when we enter a new economic revolution
On a related note, around 300k people die in the US every year due to causes directly attributable to poverty. [0]
In other words, ~a million every three years.
Now what?
[0] https://pmc.ncbi.nlm.nih.gov/articles/PMC10111231/
Vs a study which suggests 183-300k where you take the high end and seem to read about the vast range of causes of which a lot are not very much attributable to poverty.
It's not that they can't be taken care of. It's just that it'd cost money and eat profit margins.
It's an active political and ideological choice to let them die, same in both cases. Just happens to be an ideology you agree with and feel a need to defend.
The 180k are currently poor people, the 300k people poor over the past 10 years (which includes the 180). Didn't take the larger number, took the one that more accurately applied.
I'm noticing you're trying really hard to deflect though.
If both the USSR and the CCP had millions killed in the process of modernization, without stopping when knowing the death toll, maybe there's intent after all?
How would you describe the cultural revolution then? another case of economic mismanagement?
Is there intent there as well?
Later, when it was found out that mass starvation to the point of widespread cannibalism was happening, they would not stop, and continued sending their troops to steal the peasants' food and execute anyone who resisted.
We also of course have Mao quoted as saying "Death has benefits; fertilizer is created", "Half of China may well have to die", among many other quotes showing they weren't really interested in the millions that would die.
This culminating in 30 million dead in 4 years, which if we take your study at face value will take the US 100 years to achieve through "poverty related deaths", which even communist utopias have.
Regarding the Cultural Revolution I assume you agree this was intentional murder of a million+, although you seemed to challenge that in my original posting. Our current discussion seem to focus on intent even though I said "killing" which usually does imply reduced intent.
Regarding the Great Leap Forward, I do believe the CCP would prefer no deaths, and a lot of the deaths relate to strongman idiotic ideas without challenge, as is endemic in ideological totalitarian societies. However, I believe they fully knew what they were going into, without taking into account the refusal to stop, and therefore these were horrendous crimes.
40 years on, when the CCP is leading its people in making AI, robotics, drones, EVs, space stations and moon rovers to compete with the US, people like you who never made any change to the world are talking such ideological nonsense.
you live in a history museum or something like that?
Not sure that's the best example as they lost that battle and had to pay, eventually it's been codified in law in most countries.
> No! Don't install that lodash thing without explicit approval from IT. Oh, you want a license for Charles Proxy? Gee, I dunno... we've got a budget to maintain.
Employers in 2023:
> No! You can't use ChatGPT at work – it's a security risk.
Employers in 2024:
> Okay, you can use Github Copilot I guess, but you'll have to endure boring corporate training on what you're allowed to do with it.
Employers with dollar signs in their eyes in 2025:
> We attended a seminar about vibe coding. Why aren't you dumbasses keeping up with the times? Use Claude Code for everything! Don't write any of your own code anymore. We don't even really care if you use yolo mode. Just review code and push 10x more features! Use unlimited tokens! Money printer go brrrrr.
Employers in 2026:
> You mean giving one or two companies full autonomous access to our workstations while stupefying our engineers wasn't a sound business plan?
The confusing part to me is why these companies believed the "AGI" hype, i.e. that OpenAI's or Claude's LLM is the ideal white collar slave.
I suppose I can understand that the executive class resents labor enough to make irrational business decisions for the purpose of insulting the workers who design and operate their companies.
That being said, the 2025 AI binge feels like a murder-suicide done by the executives of many of these companies.
Of course the USA is collecting everything, not just from China but from everyone.
And the same goes for everyone else.
From Teslas not being allowed to park around sensitive areas in the city, to blocking a (very famous and quite well-made) Russian antivirus, or the Huawei communication stack.
Anthropic is not doing itself any favours with their recent (?) antics [1], so it is completely well-founded to do this imho. Regardless, the harness as a moat is not as well established as the underlying models.
[1] https://news.ycombinator.com/item?id=48734373
This is a double-edged knife. In this specific instance it was absurdly important for that kid's life, but it works both ways. What if the US authorities deemed it necessary to snoop on foreign governments and citizens for political reasons, now leveraging AI to do it at an industrial scale?
One thing that is certain, though, is that assuring privacy isn't the top priority for any cloud provider. Companies doing cutting edge, sensitive work should be wary.
Interesting to notice that we can do the same with these models.
[1]https://www.reddit.com/r/ClaudeAI/comments/1ujila1/anthropic...
https://news.ycombinator.com/item?id=48759754
ChatGPT and Claude are not available. Generally my impression is that OpenAI isn't that anal about service providers reselling ChatGPT in Hong Kong, but Anthropic seems to be really strict about the "no China" thingy.
Workarounds aside, it says Claude Code not Claude.
i.e. they are using the CLI running any model. You can for instance run GLM with it.
iproyal.com Oxylabs.io
https://krebsonsecurity.com/2025/10/aisuru-botnet-shifts-fro...
That looks like a no-nonsense decision, doesn't it?
Claude Code is neither and it is literally info stealing malware.
Instead the AI companies reached these absurd settlements with publishers that made a mockery out of all the previous copyright enforcement victims.
Some token settlement for an insignificant fraction of their revenue is not in any way a "sanction".
Do they? Or only so far as "if you have 1000x the revenue, you probably also have 1000x the customers that you have wronged, each of which are entitled to damages as well"?
Slavery is an expression of capitalist values.
Frederick Douglass is Karl Marx's ideological predecessor.
You aren't wrong that the European powers were slavers. That being said, the culture of the 13 American colonies is the first appearance of the unique Euro-American culture that dominates the United States.
It's wrongheaded to deny the central importance of slavery to American culture and values.
In the United States, the land of the small business owner, slaves mixed socially and sexually with their captors and with the culture of their captors.
For this reason, the culture of the United States is deeply affected by the institution of slavery in ways that are alien to the European powers.
copyright law literally says something isn’t infringement if it is a novel transformation. I get the jokes and criticism about AI companies fighting and complaining about competitors distilling, but this is a much weirder comparison.
"Anthropic settles with authors in first-of-its-kind AI copyright infringement lawsuit" - https://www.npr.org/2025/09/05/nx-s1-5529404/anthropic-settl...
> However, the judge ruled that Anthropic's use of millions of pirated books to build its models – books that websites such as Library Genesis (LibGen) and Pirate Library Mirror (PiLiMi) copied without getting the authors' consent or giving them compensation – was not.
It seems clear from the article that while the use of pirated works was illegal, the use of copyrighted works (the work a book is based on is still copyrighted if you buy the book) was fine and transformative.
Does that make my brain copyright infringement? Does Disney now own all my output forever because some small part of me now has Harry Potter embedded?
If you just ignore anything that's inconvenient for your argument, you can make any argument you want.
None of those are relevant factors when it comes to copyright law. You don't get a pass for copyright infringement just because you're not copying the entire work. Same goes for a copy that's transient. You can't set up a bootleg movie theater in your home, even if you delete the movie file afterwards, and there's no trace of the movie aside from the viewers' vague memories.
And yet they very much are. US copyright law has the concept of "fair use" in 17 U.S. Code § 107 [0]. I'll paste here for your benefit, #3 is the one I referenced as most obvious but #1 and #4 are also very relevant:
Naturally remembering some parts of a legally purchased book verbatim is fair use. "Memorizing" the entire library obtained via torrents and incorporating that in a commercial product that can output all that content doesn't sound like fair use to me. The US justice system is too captured and corrupt at this point to take as reference because decisions there are bought by the highest bidder. But for the purpose of this discussion let's not play dumb for the benefit of trillion dollar corporations.
[0] https://www.law.cornell.edu/uscode/text/17/107
If you're going to invoke fair use, that opens up a whole can of worms on what counts as transformative. The google books case and the google thumbnails case show that you can make near verbatim copies of works at scale and still be considered fair use.
>The US justice system is too captured and corrupt at this point to take as reference because decisions there are bought by the highest bidder. But for the purpose of this discussion let's not play dumb for the benefit of trillion dollar corporations.
This is begging the question. The original question is whether ai companies are getting special treatment. You can't then use that as a premise to say that the courts are tilted towards ai companies. Not to mention it's questionable how ai companies were suddenly able to corrupt all the judges, some of which were appointed decades ago, even though they only got rich a couple of years ago.
Then you switched to a barrage of questions on the premise of words in my comments that were neither written nor implied. If you muddy the waters just enough maybe everyone gets lost in there.
> The google books case and the google thumbnails case show that you can make near verbatim copies of works at scale and still be considered fair use
Now maybe we agree "reading Harry Potter and remembering some lines" is indeed fair use, but you decided my argument is still not relevant to create a distinction between "reading a book" and "feeding it all into an LLM" because of a vaguely related exception. For better or worse thumbnails are a copyright violation according to some courts [1]. But looking at the big "Books" decision (this is the one you meant?), did you check out the court's opinion [2]? Why would you believe the two cases are substantially similar? Just because they're both big tech? Just for yourself, from the definition of fair use and referencing that opinion, do you see any significant differences between "Google Books" and "big LLM"?
> You can't then use that as a premise to say that the courts are tilted towards ai companies
The highest bidder is what I said.
> Not to mention it's questionable how ai companies were suddenly able to corrupt all the judges, some of which were appointed decades ago, even though they only got rich a couple of years ago.
You're getting "creative" about what I wrote. "AI companies"? They are just the big corrupting agent of the day, and nobody with deep enough pockets had "revolutionized" the legal areas they're working in to this degree until now. Tech in general has been doing it for a few decades already. Other incredibly powerful industries have been doing that in their respective areas for even longer. "Suddenly"? The US justice system has worked exactly like this for so many decades when it came to the interest of very deep pockets. "All judges"? I said "the system" because not all judges have the power to ultimately decide on things.
I'm surprised at your surprise that reading a book is fair use, and that courts have been "captured" and beholden to economic interests above justice for so long we forgot when it started.
[0] https://news.ycombinator.com/item?id=48774664
[1] http://www.linksandlaw.com/news-update59-thumbnails-germany-...
[2] https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,...
No, and neither do LLMs. They're trained on vast quantities of data and retain only a fraction of it.
You might think of it as very, very lossy compression that generates new outputs rather than the original input unless something unintentional happens.
> If you just ignore anything that's inconvenient for your argument, you can make any argument you want.
I'm not. I just understand how it actually works. You either don't understand or are deliberately ignoring that what you just said is literally and technically untrue to make some sort of political statement.
Somewhere between the two a line must be drawn… where we’d want to put that line, I guess, is up for quibbling. But it doesn’t seem obvious to me.
The google books and google thumbnails cases have so far upheld that even mechanical reproductions are allowed, depending on the context/usage.
Sometimes they go a bit wonky and overtrain on specific phrases, which can result in verbatim copies of brief sections of content. That's a bug, not a feature.
Humans reading or watching copyrighted material isn't considered "making a copy" for the purposes of copyright law. Machines doing so generally is.
if you prompt it to, yes. just like your browser dutifully navigates to any copyright-infringing resource and GETs and POSTs whatever you ask of it.
(also it can't, not really, only small snippets before going off rails. LLMs aren't magic, they can't losslessly compress an exabyte of training data into a few terabytes of weights.)
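To put a rough number on that parenthetical, here is a back-of-the-envelope sketch. Both sizes are assumptions read off the comment's wording ("an exabyte", "a few terabytes", taken here as 3 TB), not measured figures for any real model.

```python
# Back-of-the-envelope check. Both sizes are illustrative assumptions
# taken from the comment's wording, not measurements of any real model.
EXABYTE = 10**18   # bytes (decimal units)
TERABYTE = 10**12  # bytes (decimal units)


def required_ratio(training_bytes: int, weight_bytes: int) -> float:
    """Compression ratio needed to keep the training data losslessly in the weights."""
    return training_bytes / weight_bytes


training_bytes = 1 * EXABYTE
weight_bytes = 3 * TERABYTE  # "a few terabytes", assumed to be 3

ratio = required_ratio(training_bytes, weight_bytes)
print(f"required lossless ratio: about {ratio:,.0f}:1")
```

For comparison, general-purpose lossless compressors usually manage somewhere in the single digits to low double digits on natural-language text, so a ratio in the hundreds of thousands is far outside what lossless storage could plausibly explain.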
As for your "technically not copyright infringement" defense: those laws are from a time when those patterns couldn't be derived and distributed at scale. A human had to learn and teach them. That made it different. The scale enabled by modern tech makes it a whole different situation. The same way one person standing on a street corner people-watching for a bit isn't that bad, but a whole constellation of Flock cameras constantly monitoring everyone's movements and making it available to any of their customers is really, really bad. The law will have to catch up to this.
No, for the same reason that me giving you a word cloud of the frequency of words within Harry Potter isn’t infringement. It’s a novel transformation.
Particularly in the US there’s a four point test and the very first point:
> To justify the use as fair, one must demonstrate how it either advances knowledge or the progress of the arts through the addition of something new.
I don’t know anything that has advanced the knowledge and progress of the arts more.
> The third factor assesses the amount and substantiality of the copyrighted work that has been used. In general, the less that is used in relation to the whole, the more likely the use will be considered fair.
This isn’t about usage in training. This would be in the LLM itself - the copyrighted works are very rarely used in the output.
> The fourth factor measures the effect that the allegedly infringing use has had on the copyright owner's ability to exploit his original work
Would you sincerely claim that owners have become less able to make money because of LLMs? Those same owners using LLMs to increase their own output of copyrighted works?
Anyway, copyright is not an absolute right and you have to really misunderstand copyright law to claim that LLM training infringes it.
https://en.wikipedia.org/wiki/Fair_use
If so, why do we still pay for games and movies?
this is an incorrect interpretation (in the usa, at least).
downloading a game/movie is still the creation of unauthorized copy, which is not allowed. not to mention that playing/watching does not count as a "novel transformation".
(17 U.S.C. § 106 and 17 U.S.C. § 501 are the relevant pieces of reading)
ISPs and trigger-happy law firms don't send you a C&D for downloading a torrent, they do so for seeding a torrent. It's just that practically nobody "just seeds" a torrent so people colloquially claim they got busted for downloading a torrent.
In theory this means if you torrent as a 100% leecher and turn off seeding from the get-go, you should be in the clear. But nobody sensible would dare test the extent of German Legal Spite, much less do so repeatedly to science the shit out of it.
If you can download through another protocol, say HTTP, however---<Sendung unterbrochen!>
So if you pirate a bunch of content you still get in trouble for that. But if you somehow make a business out of that that isn’t just redistributing those materials, then that business itself isn’t infringing.
It absolutely is. That's textbook copyright infringement. Doing it for commercial purposes elevates it to criminal copyright infringement.
> Instead the AI companies reached these absurd settlements with publishers that made a mockery out of all the previous copyright enforcement victims.
Isn't that at least something? How many people pirating software ever settled with the companies they "victimized?"
> then used that material to generate billions of dollars which they kept for themselves?
Hasn't it also led to distilled, free and open models that everyone can benefit from?
A courtesy. There was never any need to justify it.
> Isn't that at least something?
Yes, it's a joke. Why do they get to infringe copyrights with impunity while normal people get destroyed? Either go after them like the copyright industry always does and punish them properly, or abolish copyright straight up. This "rules for thee but not for me" nonsense is straight up disgusting.
> How many people pirating software ever settled with the companies they "victimized?"
Too many to list. Also, nobody is victimizing billion dollar corporations.
I mean, what is this? Their balls suddenly drop off? They only have the audacity to prosecute random people? Smaller companies? When they're up against trillion dollar AI companies they suddenly become cowards? That's so incredibly disgusting, and it made me completely lose even the small amount of respect for copyright that I had managed to rationalize over the years.
So either enforce the law the same way against everyone correctly and proportionally, or your law and its enforcement are illegitimate and shouldn't exist. If some activity is harmless enough for some billionaires to do at massive scales and settle in court like it was some footnote in history, then nobody should be punished for it at all.
No that's not something. That's just having infinitely more money to fight legal battles.
The AI companies knew that and bet, correctly, that it would be worth the cost.
1. The copyright infringement of big corpos fully justifying my copyright infringement in the face of law
2. The copyright infringement of big corpos being prosecuted in the same exact way as my copyright infringement would.
There is really no middle ground.
But Eisenhower was right:
> In the councils of government, we must guard against the acquisition of unwarranted influence, whether sought or unsought, by the military-industrial complex. The potential for the disastrous rise of misplaced power exists and will persist.
This isn't even about a single person or personality. Very few people in such a position could stand fast by their moral code. In any case, an environment that favors profit above everything will naturally select for individuals who are unencumbered by such hindrances.
There might've been 100s of Altmans and Amodeis who had a strong moral code but we don't know about them because they dropped out of the "race" because of said moral hurdles.
I think appropriate attribution is a moral code, but I am not able to attribute every idea I have to all those who helped me develop the general intelligence that I use to develop such ideas.
Exactly. Dairy farms optimise for milk production so favour cows that produce the most milk.
The market economy optimises for profit so favours those most willing/able to generate it. Zuckerberg, Musk, Thiel, Andreesen and co are products of the system.
terrifying
If you look at it with your eyes crossed, Anthropic and the Chinese are doing the same thing.
If you look at it with nuance, (1) the Chinese are doing way worse stuff, and (2) stealing from a thief would still be stealing.
1. The Chinese are making multiple accounts (at least 49,000)[1][2], using proxies/VPNs, possibly using residential computers and infected computers (unless you think the Chinese are doing due diligence to ensure their purchased IPs are kosher). All accounts need to be created with a real name, and especially so if the paid models need to be accessed and paid for with a credit card. So this is beyond IP theft and getting closer to fraud. These are all techniques that are well studied because they are used by criminals and cybercriminals, textbook stuff. Consider, if that was not sufficient, that China is banned from using the product, so they need to use identities and locations not just to avoid relating the accounts to each other, but merely to allow account creation. What identities are they using to create accounts?
Compare this to Anthropic, which (reads notes) made a deal in an IP theft case, paying billions, because they bought books and scanned them but buying the books wasn't sufficient retribution for the authors. Or that they (gasp) scanned the internet, like Google.
Not having nuance to see the difference between the two companies is something I expect of the twitter echo chamber copying hot takes for upvotes, not hacker news.
[1] https://arstechnica.com/tech-policy/2026/06/anthropic-claims... [2] https://www.anthropic.com/news/detecting-and-preventing-dist...
Anthropic seems to want to both own and eat its stolen cake.
> stealing from a thief would still be stealing
Stealing from a thief hurts thief industry which is a win for society.
> The Chinese are making multiple accounts
Not a crime. AI companies also ignore robots.txt and applicable laws when illegally copying copyrighted material from websites to their servers without author permission.
You are welcome to study the law of any country. A crime against a criminal is still a crime.
>applicable laws when illegally copying copyrighted material from websites to their servers without author permission.
If the material is distributed over HTTP without authentication, isn't that sufficient authorization from the distributor? I would think the search + web crawler era would have set plenty of precedent for this.
>Not a crime. AI companies also ignore robots.txt
Breach of contract is not a crime, agreed.
How about identity fraud (accounts by identity proxy, document KYC), computer crime (C&C residential proxies), conspiracy?
And, after the June US directive to suspend Chinese access: smuggling and false statements to a regulated entity.
These are all criminal charges that are presumably not levied because of the adversarial relationship between those countries. But if this happened in the US you would probably be seeing at least a civil claim and potentially criminal charges. Hell, if this were in any other western country you would see the same. Consider CloudFlare vs Spain, much lighter criminal accusations, and there's already a criminal investigation brought where the CF CEO is indicted.
It's a non-trivial lack of nuance when you can't distinguish between a domestic civil case and a criminal international case between 2 world powers with great judicial tension.
On the book case, a class action case was brought to court and it was settled. There's no use in bringing it up further, it has been settled, and it bears no relation to the Anthropic v China case.
You like programming? Think of encapsulation: imagine you had to think about f(x) but someone brings up y; now you have to think about f(x,y), and what other parameters might bear a relationship? The law simplifies by compartmentalizing. And it doesn't even involve a tradeoff: judgment(case1,case2) isn't better than judgment(case1)+judgment(case2).
If the injured parties consider the matter settled, why would it matter what third parties have to say about it? Third parties would have pushed for more compensation, or ownership of the derivative product. If you feel injured yourself, you can open a case and explain why the actions of Anthropic have hurt you personally.
Otherwise it is a matter that doesn't concern the general public, we have no say in it and there is no right to be offended on behalf of parties that have already settled the argument.
They seem to have given themselves license to do what they like, but _God forbid_ they're called out for acting less-than-honourably.
Most adults around the world can associate actions and consequences. The incomprehension and entitlement here speaks volumes about the moral and emotional maturity of the Chinese Communist Party and their political system.
I think that the reasoning is: they trust the git company (whatever it is) not to sell their code. They are worried that their code goes into a model and somebody else could ask the model "write a service like XYZ" and it will regurgitate their code.
GitLab even has a free self-hosted version, and it has a number of advantages (like being able to actually have a structure with inherited secrets and accesses, and no, GitHub Organisations do not count and suck). And for years thanks to GitLab-CI it was clearly ahead.
In terms of WHAT you need to be concerned about, it seems it goes far beyond code, and far beyond having to trust your model provider.
A coding agent with access to a bash tool is going to have access to anything that a human with a bash prompt would, and even if you try to provide a nailed down sandbox environment for the agent, you still need to be concerned about things like unencrypted passwords and keys that it may be able to find "laying around" in code or databases/etc it has access to.
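As a rough illustration of the "laying around" problem, here is a minimal sketch of a scan you could run over a directory before pointing an agent at it. The three patterns and the names `find_secrets` and `scan_tree` are made up for this example; dedicated secret scanners use much larger rule sets, so treat a clean result as weak evidence at best.

```python
import re
from pathlib import Path

# Illustrative patterns only (an assumption for this sketch);
# real secret scanners ship far more rules than these three.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "hardcoded_password": re.compile(
        r"(?i)\b(?:password|passwd|secret)\s*[=:]\s*['\"][^'\"]{4,}['\"]"
    ),
}


def find_secrets(text):
    """Return (line_number, pattern_name) for every line matching a pattern."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, name))
    return hits


def scan_tree(root):
    """Scan every readable file under root; return {path: hits} for files with hits."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        hits = find_secrets(text)
        if hits:
            results[str(path)] = hits
    return results


# Fake values, used only to exercise the patterns.
sample = 'db_host = "localhost"\npassword = "hunter2hunter2"\nkey = AKIAABCDEFGHIJKLMNOP\n'
print(find_secrets(sample))
```

Something like `scan_tree(".")` only covers files on disk. It says nothing about credentials sitting in environment variables, shell history, or databases the agent can reach.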
I'm surprised there haven't yet been more widely disseminated stories about coding agents and claw-bots wreaking havoc.
When these tools first appeared the overwhelming conversation was about the risk of letting a remote tool siphon your code and intellectual property (where eventually they're going to add that to their training). Now everyone is using them, and that fear seems to have dissolved. Every corporation is sprinkled with Claude Code, Antigravity, Copilot, Codex, and so on. Even the long fear-mongered Chinese providers are being heavily used in many spaces.
In this case this is a PR battle between two firms, and it isn't much more. And Alibaba isn't worried about the "proprietary code" (the truth is that there is incredibly little interest in most orgs' code), but that the tool is a backdoor, or at least that is the claim.
I think from a commercial perspective yes, but access to source code is very good for finding exploits which could be very valuable for governments. I could also see a future where companies are directly cyber-attacking competitors in hostile markets too...
Until the first big incident, yes.
What's a "distillation attack"? How is it different from plain distillation?
So?
Fraudsters don't want to be jailed, their victims don't want to be scammed, employees don't want to be laid off, etc.
What the target wants is irrelevant - what society wants as enforced by laws is what is relevant, and as the leading AI providers have demonstrated, simply grabbing other people's copyrighted stuff for learning purposes is perfectly fine!
If they already think this practice is fine, why would I believe that their concerns about this are real?
Unlike the vast majority of people Anthropic stole from.
And they constantly publish state of the art LLM research (see DS4 context compaction and cache tech).
They have very capable tech giants. So while not being able to distill western models would probably have some impact, it's probably becoming lesser as time passes.
We might even see Western LLMs distilling Chinese models soon. If they aren't already to some extent.
A couple months ago when Anthropic was complaining about Chinese distillation, people found that Claude self-identified as "DeepSeek" when asked in Chinese:
https://x.com/stevibe/status/2026227392076018101
It's really a fiasco of massive hypocrisy at this point.
Many of the top AI researchers at western companies are from China, and many are returning.
If Anthropic had a super secret model that nobody has access to, I'm not sure why I should care about it since I can't access it.
More than a year ago, when Anthropic and OpenAI started to hide the reasoning bits from the output, a lot of people here on HN predicted that Chinese models' days were numbered.
Fast forward to today, and models such as DeepSeek and MiMo are nothing short of excellent. I haven't used GLM or Qwen but heard very good things about them as well.
This "massive distillation" sounds a lot like anxiety about how companies from outside the US can develop very good models themselves.
1. use claude-code
2. see how good it is
3. send 100k bots to distill fable 5 (GLM 5.2 is the result of this)
4. release Zcode
5. ditch claude-code
6. ban claude-code