I'm reminded of a time that an intern took down us-east1 on AWS, by modifying a configuration file they shouldn't have had access to. Amazon (somehow) did the correct thing and didn't fire them -- instead, they used the experience to fix the security hole. It was a file they shouldn't have had access to in the first place.
If the intern "had no experience with the AI lab", is it the right thing to do to fire them, instead of admitting that there is a security/access fault internally? Can other employees (intentionally, or unintentionally) cause that same amount of "damage"?
There is a huge difference between someone making a mistake and someone intentionally sabotaging.
You're not firing the person because they broke stuff, you are firing them because they tried to break stuff. If the attempt was a failure and caused no harm, you would still fire them. Its not about the damage they caused its that they wanted to cause damage.
But for damaging company assets on purpose firing is only first step.
I do not see any mention of other legal action and article is shallow.
It might’ve been that someone in command chain called it “malicious” to cover up his own mistakes. I think that is parent poster point while writing out Amazon story.
Maybe, but without any other info, i kind of have to take the info provided at face value. Like obviously if the article is inaccurate the whole situation should be viewed differently.
From what I've seen in Amazon it's pretty consistent that they do not blame the messenger which is what they consider the person who messed up. Usually that person is the last in a long series of decisions that could have prevented the issue, and thus why blame them. That is unless the person is a) acting with malice, b) is repeatedly shown a pattern of willful ignorance. IIRC, when one person took down S3 with a manual command overriding the safeguards the action was not to fire them but to figure out why it was still a manual process without sign off. Say what you will about Amazon culture, the ability to make mistakes or call them out is pretty consistently protected.
> From what I've seen in Amazon it's pretty consistent that they do not blame the messenger which is what they consider the person who messed up
Interesting that my experience has been the exact opposite.
Whenever I’ve participated in COE discussions (incident analysis), questions have been focused on highlighting who made the mistake or who didn’t take the right precautions.
I've bar raised a ton of them. You do end up figuring out what actions by what operator caused what issues or didn't work well, but that's to diagnose what controls/processes/tools/metrics were missing. I always removed the actual people's name as part of the bar raising, well before publishing, usually before any manager sees it. Instead used Oncall 1, or Oncall for X team, Manager for X team. And that's mainly for the timeline.
As a sibling said you were likely in a bad or or one that was using COEs punatively.
It was one of the STEP interns that took down Google prod by modifying some config file by putting something erroneous into an automated tool. Everyone at the company was locked out, and someone had to physically access some machines in a datacenter to recover.
Malicious intent to be precise. Well-intentioned attempts to demonstrate issues for the purposes of helping to fix should generally not be punished, unless there is a wider fallout than expected and that can be attributed to negligence.
afaik this was intentional in that they stopped training runs and changing parameters for other employee training runs, and even joined in on the debugging group trying to solve the "issues".
One thing I suspect investors in e.g. OpenAI are failing to price in is the political and regulatory headwinds OpenAI will face if their fantastical revenue projections actually materialize. A world where OpenAI is making $100B in annual revenue will likely be a world where technological unemployment looms quite clearly. Polls already show strong support for regulating AI.
I'm trying to think of whether it'd be worth starting some kind of semi-Luddite community where we can use digital technology, photos, radios, spreadsheets and all, but the line is around 2014, when computers still did the same thing every time. That's my biggest gripe with AI, the nondeterminism, the non-repeatability making it all undebuggable, impossible to interrogate and reason about. A computer in 2014 is complex but not incomprehensible. The mass matrix multiplication of 2024 computation is totally opaque and frankly I think there's room for a society without such black box oracles.
Rumor says an intern at ByteDance was jailed for sabotaging their GPU cluster. Over 8000 H100 GPUs ran corrupted code for a month , all because he was frustrated with resources being diverted from his research to a GenAI project.
was told the intern used a bug in hugginface's load ckpt function to inject bad code. The code randomly change other tasks' parameter and get them sleep, only targeting training tasks using more than 256 cards
You could track down the direct Chinese rumor, but you'd have to leave the cyber basement. Big nono for HN, it can't even eat Americanized Chinese digital food like TikTok ( Chinese version - https://portal.sina.com.hk/others/sina/2024/10/20/1013680/%E... )
I'm reminded of a time that an intern took down us-east1 on AWS, by modifying a configuration file they shouldn't have had access to. Amazon (somehow) did the correct thing and didn't fire them -- instead, they used the experience to fix the security hole. It was a file they shouldn't have had access to in the first place.
If the intern "had no experience with the AI lab", is it the right thing to do to fire them, instead of admitting that there is a security/access fault internally? Can other employees (intentionally, or unintentionally) cause that same amount of "damage"?
There is a huge difference between someone making a mistake and someone intentionally sabotaging.
You're not firing the person because they broke stuff, you are firing them because they tried to break stuff. If the attempt was a failure and caused no harm, you would still fire them. Its not about the damage they caused its that they wanted to cause damage.
But for damaging company assets on purpose firing is only first step.
I do not see any mention of other legal action and article is shallow.
It might’ve been that someone in command chain called it “malicious” to cover up his own mistakes. I think that is parent poster point while writing out Amazon story.
Maybe, but without any other info, i kind of have to take the info provided at face value. Like obviously if the article is inaccurate the whole situation should be viewed differently.
From what I've seen in Amazon it's pretty consistent that they do not blame the messenger which is what they consider the person who messed up. Usually that person is the last in a long series of decisions that could have prevented the issue, and thus why blame them. That is unless the person is a) acting with malice, b) is repeatedly shown a pattern of willful ignorance. IIRC, when one person took down S3 with a manual command overriding the safeguards the action was not to fire them but to figure out why it was still a manual process without sign off. Say what you will about Amazon culture, the ability to make mistakes or call them out is pretty consistently protected.
> From what I've seen in Amazon it's pretty consistent that they do not blame the messenger which is what they consider the person who messed up
Interesting that my experience has been the exact opposite.
Whenever I’ve participated in COE discussions (incident analysis), questions have been focused on highlighting who made the mistake or who didn’t take the right precautions.
I've bar raised a ton of them. You do end up figuring out what actions by what operator caused what issues or didn't work well, but that's to diagnose what controls/processes/tools/metrics were missing. I always removed the actual people's name as part of the bar raising, well before publishing, usually before any manager sees it. Instead used Oncall 1, or Oncall for X team, Manager for X team. And that's mainly for the timeline.
As a sibling said you were likely in a bad or or one that was using COEs punatively.
In the article's case, there's evidence of actual malice, though-- sabotaging only large jobs, over a month's time.
That was not the idea of COE ever. Probably you were in bad org/team.
It was one of the STEP interns that took down Google prod by modifying some config file by putting something erroneous into an automated tool. Everyone at the company was locked out, and someone had to physically access some machines in a datacenter to recover.
The difference in this case is intent.
Did the employee have the intent to cause damage? If so just fire him/her.
Malicious intent to be precise. Well-intentioned attempts to demonstrate issues for the purposes of helping to fix should generally not be punished, unless there is a wider fallout than expected and that can be attributed to negligence.
I'd like to learn more about the AWS incident, but when I google "us-east1 intern" I get this comment. Do you have a link?
afaik this was intentional in that they stopped training runs and changing parameters for other employee training runs, and even joined in on the debugging group trying to solve the "issues".
It was a phd student that was mad about compensation or something purposely injecting malicious code.
I hope said intern finds a new job working for anti-AI causes.
People who sabotage things tend to do it against all sides (you can always find an excuse to sabotage if you try hard enough).
Are there are a lot of anti-AI organizations at this point? PauseAI is the main one I'm familiar with:
https://pauseai.info/
One thing I suspect investors in e.g. OpenAI are failing to price in is the political and regulatory headwinds OpenAI will face if their fantastical revenue projections actually materialize. A world where OpenAI is making $100B in annual revenue will likely be a world where technological unemployment looms quite clearly. Polls already show strong support for regulating AI.
The Amish?
I'm trying to think of whether it'd be worth starting some kind of semi-Luddite community where we can use digital technology, photos, radios, spreadsheets and all, but the line is around 2014, when computers still did the same thing every time. That's my biggest gripe with AI, the nondeterminism, the non-repeatability making it all undebuggable, impossible to interrogate and reason about. A computer in 2014 is complex but not incomprehensible. The mass matrix multiplication of 2024 computation is totally opaque and frankly I think there's room for a society without such black box oracles.
Regulation is not neccesarily bad for the market leader.
What a non-story.
But AI!
Wow BBC is garbage.
https://x.com/le1du/status/1847144170705785239
You could track down the direct Chinese rumor, but you'd have to leave the cyber basement. Big nono for HN, it can't even eat Americanized Chinese digital food like TikTok ( Chinese version - https://portal.sina.com.hk/others/sina/2024/10/20/1013680/%E... )The article quoting specific responses is garbage, here's a tweet explicitly stating it includes a rumour? What are you trying to say here?
He’s basically highlighting why the media is dead. Gullible folks would rather read salacious rumours than actual news.