I moved in with my SO last weekend after being together for multiple years, so reading the blog post about his wife dying in the span of 9 weeks really sent chills down my spine :(
He's a total legend, yet apparently he's never met Bill Gates in person, from what he said in an interview on the Dave's Garage YouTube channel a few years ago. You'd think someone who's been that prominent in the company for so long would have been invited to a company dinner where Gates was present or something.
He has stories on his blog about Windows 2, IIRC, so his time there overlaps with a period when the company was still relatively small. So I think it's a bit odd they never talked or met.
I wonder how many times a Deloitte, PwC, KPMG, Bain, EY, McKinsey, or BCG consultant naively tried putting him on a shortlist for being “impacted” over the years because he was in the Top X of a spreadsheet sorted on Y.
"Look this guy's job seems to be mainly writing blog posts. We could replace that with AI and get it to regularly pitch the new Visual Enshitify 2.0 product launch as a bonus. Win win win!"
IMHO, if something isn’t part of the contract, it should be randomized. Eg if iteration order of maps isn’t guaranteed in your language, then your language should go out of its way to randomize it. Otherwise, you end up with brittle code: code that works fine until it doesn’t.
There are various compiler options like -ftrivial-auto-var-init to initialize uninitialized variables to specific (or random) values in some situations, but overall, randomizing (or zeroing) the full content of the stack in each function call would be a horrendous performance regression and isn't done for this reason.
There are fast instructions (e.g., REP STOSx, AVX zero stores, DC ZVA) and tricks (MTE, zero pages), but no magic CPU instruction exists that transparently and efficiently randomizes or zeros the stack on function calls. You'd think there would be one, and I bet there are some on specialized high-security systems, but I'm not even sure where you would find such a product. Telecom certainly isn't it.
There are proposed CPU architectures that work that way, like the Mill <https://millcomputing.com/>. Whereas most CPUs support multiple calling conventions, the Mill enforces a single calling convention in hardware. There is a hardware `call` instruction that does all the work directly, along with a corresponding `ret` instruction for returning from a function call. It also uses its equivalent of the TLB to ensure that each function is only granted permission to read from the portion of the stack that contains its arguments; any attempt to read outside that region results in a permission error that causes the read to return a NaR (Not a Result, akin to a floating-point NaN).
As an additional protection, new stack frames are implicitly zeroed as they are created. I assume this is done by filling the CPU cache with zeros for those addresses before continuing to execute the called function. No need to wait for actual zeros to be written to main memory.
You couldn't do random, but at a predictable cost in memory, cache, and write-line usage, stack addresses COULD be isolated per program, per library, etc.
It'd be expensive, though; every context switch would require its own stack and pushing/restoring one more register. There's a GOOD reason programs don't work that way and are supposed not to rely on values outside of properly initialized (and not later clobbered) memory.
It should be efficient, though; that's the point. Specialized hardware or instructions should be able to zero the stack in a single cycle; instead, it's much more expensive. Of course, the problem is that this could just as easily be used to hide things, making it impossible to reverse engineer an unknown exploit.
Why would a specialized instruction be necessary? 'the stack' is stored in memory just like everything else.
What's expensive is the (very slow, for modern CPUs) operation of _writing_ that change in value out to memory, which is distant and slow compared to the speed the CPU operates at, plus the overhead of synchronizing that write with any other caches of those memory locations.
Maybe you're thinking of the trick where a brand-new page of memory-mapped memory is 'zeroed' but is in reality just a special 'all zeros' page in the virtual-to-physical memory lookup table? Those still need to be zeroed by real writes at some point, if they're ever used.
CPUs already special-case xor reg,reg as zeroing out the register, breaking any data dependency on it. If zeroing bits of the stack were common enough, I'd believe CPUs could be built to handle it efficiently (they already special-case the stack with push/pop).
I'm a bit distant from this stuff, but it looks like C++26 will have something like -ftrivial-auto-var-init enabled by default. See the "safe by default" section of [1].
For reference, the actual proposal that was accepted into C++26 is [2]. It discusses performance only in general, and it refers to an earlier analysis [3] for more details. This last reference describes regressions of around 0.5% in time and in code size. Earlier prototypes suggested larger regressions (perhaps even "horrendous") but more emphasis on compiler optimizations has brought the regression down considerably.
Of course one's mileage may vary, and one might also consider a 0.5% regression unacceptable. However, the C++ committee seems to have considered this to be an acceptable tradeoff to remove a frequent cause of undefined behavior from C++.
This compiler option causes the compiler to emit a call to a stack probe function to ensure that a sufficient amount of stack space is available.
Rather than just probing once for each stack page used, you can substitute a function that *FILLS* the stack frame with a particular value, something like 0xBAADF00D; you could even set the value to anything you wanted at runtime.
This would get you behaviour similar to GCC/Clang's -ftrivial-auto-var-init.
Windows has started to auto-initialize most stack variables in the Windows kernel and several other areas.
The following types are automatically initialized:
Scalars (arrays, pointers, floats)
Arrays of pointers
Structures (plain-old-data structures)
The following are not automatically initialized:
Volatile variables
Arrays of anything other than pointers (i.e. array of int, array of structures, etc.)
Classes that are not plain-old-data
During initial testing where we forcibly initialized all types of data on the stack we saw performance regressions of over 10% in several key scenarios.
With POD structures only, performance was more reasonable. Compiler optimizations to eliminate redundant stores (both inside basic blocks and between basic blocks) were able to further drop the regression caused by POD structures from observable to noise-level for most tests.
We plan on revisiting zero initializing all types (especially now that our optimizer has more powerful optimizations), we just haven’t gotten to it yet.
It probably shouldn’t be a “release” thing. Actually, it certainly shouldn’t. I do wonder how many bugs would never have seen the light of day if a debug build had raised an assert whenever someone’s “set” actually turned out to be a sequence (i.e. allowed duplicate values).
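A minimal sketch of such a debug-only check, assuming the "set" is stored in a std::vector (the helper name `all_unique` is hypothetical). Because the check is meant to live inside assert(), release builds with NDEBUG defined pay nothing for it.

```cpp
#include <algorithm>
#include <vector>

// Debug-only invariant check for a container meant to behave like a set:
// returns false if any value appears more than once. Takes its argument by
// value so sorting doesn't reorder the caller's data.
template <typename T>
bool all_unique(std::vector<T> v) {
    std::sort(v.begin(), v.end());
    // After sorting, duplicates (if any) sit next to each other.
    return std::adjacent_find(v.begin(), v.end()) == v.end();
}

// Usage: assert(all_unique(ids));  // compiled out when NDEBUG is defined
```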
Debug builds are worthless for catching issues. How many people actually run them? Perhaps developers run debug builds of individual binaries they're working on when they're trying to repro a bug, but my experience at every company of every size and position in the stack (including the Windows team) is that no one does their general purpose use on a debug build.
Regarding contracts, there's an additional lesson here, quoting from the source:
> This is an interesting lesson in compatibility: even changes to the stack layout of the internal implementations can have compatibility implications if an application is bugged and unintentionally relies on a specific behavior.
I suppose this is why Linux kernel maintainers insist on never breaking user space.
With a sufficient number of users of an API,
it does not matter what you promise in the contract:
all observable behaviors of your system
will be depended on by somebody.
If you promise randomization, then somebody will depend on that :)
Semi-related: this type of thing is actually covered in Google's Site Reliability Engineering book. They highlighted a case of a system that outperformed its SLO, so people came to depend on it having 100% uptime. They "fixed" this by injecting failures to bring availability closer to the SLO, forcing downstream engineers to deal with the fact that the services they depended on would sometimes fail for no reason.
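A per-call variant of the same idea might look like the sketch below. Note that it swaps in random per-request failures rather than the planned outages the book describes; the wrapper name `call_with_injected_failures` and the failure rate are hypothetical.

```cpp
#include <random>
#include <stdexcept>

// Calls f(), but first fails on purpose with probability failure_rate, so
// callers are forced to handle errors from a dependency that rarely fails.
template <typename F>
auto call_with_injected_failures(F f, double failure_rate, std::mt19937& rng) {
    std::bernoulli_distribution fail(failure_rate);
    if (fail(rng)) {
        throw std::runtime_error("injected failure");
    }
    return f();
}
```

In practice the rate would come from configuration and be kept small, so the dependency still meets its objective while never looking perfect.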
I know it's easier said than done everywhere, just found it to be an interesting parallel.
One might argue that one of the advantages of languages like C is that you only pay for the features you choose to use: no unnecessary overhead like initializing unused variables.
You can pay for those features in debug mode or in chaos monkey mode. It's okay to continue to not pay for them in release mode. Heck, Rust has this approach when it comes to handling integer overflow - fully checked in debug mode, silent wraparound in release mode.
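A rough C++ sketch of that Rust-style split, assuming GCC or Clang (`__builtin_add_overflow` is a compiler builtin, not standard C++) and using `NDEBUG` as the debug/release switch; the `add_i32` helper is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rust-like integer addition: overflow is a hard error in debug builds and
// silently wraps in release builds (when NDEBUG is defined).
std::int32_t add_i32(std::int32_t a, std::int32_t b) {
#ifndef NDEBUG
    std::int32_t result;
    // GCC/Clang builtin: stores the wrapped result, returns true on overflow.
    const bool overflowed = __builtin_add_overflow(a, b, &result);
    assert(!overflowed && "integer overflow in add_i32");
    return result;
#else
    // Unsigned arithmetic wraps by definition, avoiding signed-overflow UB.
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) +
                                     static_cast<std::uint32_t>(b));
#endif
}
```

Converting an out-of-range uint32_t back to int32_t is only guaranteed to wrap from C++20 on; earlier standards leave it implementation-defined, although mainstream compilers wrap.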
In Ada you can pay for integer overflow checks (runtime) if you want to. With Ada SPARK you can prove that your code does not contain integer overflows so that you don't need runtime checks.
However, the compiler does not tell you this. We're back to the problem that it's possible to have a "working" C program that relies on UB and will therefore break at some point, but the tools will not yell at you for doing this. Whereas in Java or C# you get warnings or errors for using maybe-uninitialized variables.
Also, scanf should be deprecated. Terrible API. Never use scanf or sscanf etc. We managed to get "gets()" deprecated, time to spread that to other parts of the API.
atoi() or atof() etc. work OK, but really you need a parser.
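As one example of what "a parser" can mean at the smallest scale, here is a sketch built on `strtol`, which, unlike `atoi`, reports where parsing stopped and sets `errno` on overflow. The `parse_int` wrapper is hypothetical and uses C++17's std::optional.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <optional>

// Parses the whole string as a base-10 int. Unlike atoi(), it rejects empty
// input, trailing garbage, and out-of-range values instead of silently
// returning 0 (or, for out-of-range input, having undefined behaviour).
std::optional<int> parse_int(const char* s) {
    errno = 0;
    char* end = nullptr;
    const long v = std::strtol(s, &end, 10);
    if (end == s) return std::nullopt;      // no digits were consumed
    if (*end != '\0') return std::nullopt;  // trailing characters
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) return std::nullopt;
    return static_cast<int>(v);
}
```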
I agree, this can also detect brittle tests (e.g., test methods/classes that only pass if executed in a particular order). But applying it to all data could be computationally expensive.
Not really the ethos of C(++), though of course this particular bug would be easily caught by running a debug build (even 20 years ago). However, this being a game "true" debug builds were probably too slow to be usable. That was at least my experience doing gamedev in that timeframe. Then again code holding up for 20 years in that line of biz is more than sufficient anyway :)
When I was doing gamedev about 5 years ago, we were still debugging with optimisation on. Running at lower frame rates alone gives you a class of bugs that don't happen in release.
I once updated a little shy of 1mloc of Perl 5.8 code to run on Perl 5.32 (ish). There were, overall, remarkably few issues that cropped up. One of these issues (that showed itself a few times) was more or less exactly this: the iteration order through a hash is not defined. It has never been defined, but in Perl 5.8 it was consistent: for the same insertion order of the same set of keys, a hash would always iterate in the same way. In a later Perl it was deliberately randomised, not just once, but on every iteration through the hash.
It turned out there were a few places that had assumed a predictable (not just stable, but deterministic) hash key iteration order. Mostly this showed up as tests that failed 50% of the time, which suggested to me that how annoying an error is to track down is roughly inversely correlated with how often it appears in tests.
(Other issues were mostly due to the fact that Perl 5 is all but abandoned by its former community: a few CPAN modules are just gone, and some are so far out of date that they can't be coerced into working with other modules that have been updated over time.)
Not necessarily; you can do a thing where it's randomized during development, testing and fuzzing but not in production builds or benchmarks so that the obvious "I rely on internal map order" bugs are spotted right away.
You can get it pretty much for free by using a random salt with your hash function. This is also useful for avoiding DoS attacks that use deliberate hash collisions to trigger quadratic behavior in your hash tables.
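A toy illustration of salting, assuming FNV-1a with a per-process random seed mixed into the offset basis (`salted_hash` and `g_hash_seed` are hypothetical names). Production hashDoS defenses use a keyed hash designed for the job, such as SipHash; seeding a simple hash like FNV changes bucket placement, and therefore iteration order, from run to run, but it is not a real security guarantee.

```cpp
#include <cstdint>
#include <random>
#include <string_view>

// Chosen once per process, so bucket placement (and therefore iteration
// order) differs between runs.
static const std::uint64_t g_hash_seed = std::random_device{}();

// FNV-1a over the key's bytes, with the seed mixed into the offset basis.
// Illustrative only: not a substitute for a keyed hash such as SipHash.
std::uint64_t salted_hash(std::string_view key, std::uint64_t seed = g_hash_seed) {
    std::uint64_t h = 14695981039346656037ull ^ seed;  // FNV-1a 64-bit offset basis
    for (unsigned char c : key) {
        h ^= c;
        h *= 1099511628211ull;  // FNV-1a 64-bit prime
    }
    return h;
}
```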
Any sane language would design a list iterator to follow the order of the list. No, the difference is when you're iterating over orderless hash-based sets or maps/dictionaries. Many languages choose to leave the iteration order undefined. I think Python did that up to a point, but afterward they defined dictionaries (but not sets) to be iterated over in the order that keys were added. Also, some languages intentionally randomize the order per program run, to avoid things like users intentionally stuffing hash tables with colliding keys.
> Also, some languages intentionally randomize the order per program run, to avoid things like users intentionally stuffing hash tables with colliding keys.
Most modern languages do that as part of hashDoS mitigation. Python did that until it switched to a naturally ordered hashmap, then made insertion order part of the spec. Importantly, iteration order remains consistent within a process (possibly on a per-hashmap basis).
Notably, Go will randomise the starting point of hashmap iteration on each iteration.
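A simplified sketch of that idea over a flat array of slots, assuming an open-addressed table stored in a std::vector. The function name is hypothetical, and Go's actual runtime implementation is more involved than this.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Visits every slot exactly once, but starts at a random position and wraps
// around, so code that assumes a fixed iteration order fails early and often.
template <typename T, typename Visit>
void for_each_random_start(const std::vector<T>& slots, std::mt19937& rng, Visit visit) {
    if (slots.empty()) return;
    std::uniform_int_distribution<std::size_t> pick(0, slots.size() - 1);
    const std::size_t start = pick(rng);
    for (std::size_t i = 0; i < slots.size(); ++i) {
        visit(slots[(start + i) % slots.size()]);
    }
}
```

Every element is still visited exactly once; only the order changes from call to call.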
I am pretty sure there are trivial impls of sets with guaranteed iteration order in Python that use an underlying ordered map and a dummy value in each entry.
> Not ignore the compilation warnings – this code most likely threw a warning in the original code that was either ignored or disabled!
What compiler error would you expect here? Maybe not checking the return value from scanf to make sure it matches the number of parameters? Otherwise this seems like a data file error that the compiler would have no clue about.
Trying g++ version 11.4, there's no warning by default if you don't check the return value of sscanf. Even `g++ -Wall -Wextra -Wunused-result` produces no warnings for a small example.
The compiler has no way of knowing that the memory would be undefined, not unless it somehow can verify the data file. The most I think it can do is flag the program for not checking the return value of scanf, but even that is unlikely to be true since the program probably was checking for end of file which is also in the return value. It was failing to check the number of matched parameters. This is the kind of error that is easy to miss given the semantics of scanf.
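To make the distinction concrete: sscanf returns EOF only when input ends before the first conversion, and otherwise returns the count of fields it actually assigned. A sketch that separates the three cases (the `read_triple` helper and `ParseResult` enum are hypothetical):

```cpp
#include <cstdio>

enum class ParseResult { Ok, EndOfInput, Malformed };

// Separates the outcomes that a bare "did we hit EOF?" check lumps together.
ParseResult read_triple(const char* line, int& x, int& y, int& z) {
    const int n = std::sscanf(line, "%d %d %d", &x, &y, &z);
    if (n == EOF) return ParseResult::EndOfInput;  // input ran out before any conversion
    if (n != 3) return ParseResult::Malformed;     // only n of x, y, z were written
    return ParseResult::Ok;
}
```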
> The compiler has no way of knowing that the memory would be undefined
Yes it would. -fsanitize=address does a bunch of instrumentation - it allocates shadow memory to keep track of what main memory is defined, and it checks every read and write address against the shadow memory. It is a combination of compile-time instrumentation and run-time checking. And yes, it is expensive, so it should be used for debugging and not the final release.
I tried this with clang ASAN. Nothing happens. It won't catch this bug. ASAN detects the presence of incorrect behavior, not the absence of correct behavior.
There's no use-after-free, use-after-return, use-after-scope, or OOB access here. It's a case of "an allocated stack variable is dynamically read without being initialized only in a runtime case," which afaik no standard analyzer will catch.
The best way to identify this would be to require all locals to be initialized as a matter of policy (very unlikely to fly in a games studio, especially back then, due to the perceived performance overhead) or to debug with a form of stack initialization enabled, like "-ftrivial-auto-var-init=pattern" which while it doesn't catch the issue statically, does make it appear pretty quickly in QA (I tested).
I only use UBSan and ASan on my own programs because I tend not to make mistakes about initialization. So my knowledge is incomplete with respect to auditing other people's code, which can have different classes of errors than mine.
Thank goodness that every language that is newer than C and C++ doesn't repeat these design mistakes, and doesn't require these awkward sanitizer tools that are introduced decades after the fact.
You both may be right. It could be that ASAN is not instrumenting scanf (or some other random standard lib function). Though since 2015, it certainly has been. https://github.com/google/sanitizers/issues/108
The simpler policy of "don't allow uninitialized locals at declaration" would also have caught it with the tools available when the game was made (though it's a bit ham-fisted).
The problem is that after calling scanf(), how many of the variables are defined is only known at run time. For example:
int x, y, z;
int n = scanf("%d %d %d", &x, &y, &z);
At compile time, you can make no inferences about which of x, y, and z are defined, because that depends on the returned value n. There are many ways to branch out from this.
One is to insist on definite assignment - so if we cannot prove all of them are always assigned, then we can treat them as "possibly undefined" and err out.
Another way is to avoid passing references and instead allow multiple returns, like Python (this is pseudocode):
x, y, z = scanf("%d %d %d")
In that case, if the hypothetical `scanf()` returns a tuple with fewer or more than 3 elements, the unpacking will fail at run time and crash exactly at that line.
Another way is like Java, which insists that the return value is a scalar, so it can't do what C and Python can do. This can be painful on the programmer, of course.
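The "multiple returns" option can be approximated in C++ by returning all of the values or none of them, so there is never a partially initialized set of variables to misread. A sketch using std::optional (C++17; `scan_three_ints` is a hypothetical name):

```cpp
#include <array>
#include <cstdio>
#include <optional>

// Returns all three values or none of them, so a caller can never observe a
// half-filled set of variables.
std::optional<std::array<int, 3>> scan_three_ints(const char* line) {
    std::array<int, 3> v{};
    if (std::sscanf(line, "%d %d %d", &v[0], &v[1], &v[2]) != 3) {
        return std::nullopt;
    }
    return v;
}
```

A caller would write `if (auto r = scan_three_ints(line)) { auto [x, y, z] = *r; }`, and x, y and z simply don't exist outside that branch.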
I think it would make sense to have a keyword that permits unsafe instantiation specifically for the edge cases where initialization is too expensive. But I think it makes sense for the lazy case to be a little bit safer.
The idea is that ASAN would replace scanf with a function that does additional bookkeeping when writing to whatever arbitrary memory locations the inputs dictate at runtime.
It's probably what the PR resolving the issue I linked to does, though I didn't check.
The pointer to the uninitialized variable is passed to scanf, which writes a value there unless it encounters an error. The compiler cannot understand this contract from the scanf declaration alone.
Good point. When reading, I kind of just assumed the "use of uninitialised memory" warning would pick this up.
But because the whole line is parsed in a single sscanf call, the compiler's static analysis is forced to assume the variables have now been initialised. There doesn't seem to be any generic static analysis approach that can catch this bug.
Though... you could make a specialised warning just for scanf that forced you to either pass in pre-initialised values or check the return value.
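Something close to that can already be expressed at the library level. A sketch of a hypothetical wrapper that does both: it gives the outputs defined values before parsing, and C++17's [[nodiscard]] asks the compiler to warn when a caller ignores the result.

```cpp
#include <cstdio>

// Outputs are given defined values before parsing, and [[nodiscard]] asks
// the compiler to warn if a caller drops the success flag on the floor.
[[nodiscard]] bool parse_xyz(const char* line, int& x, int& y, int& z) {
    x = y = z = 0;
    return std::sscanf(line, "%d %d %d", &x, &y, &z) == 3;
}
```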
I don't think they will get more rare; there will always be a top % of engineers that do deep dives. I hope anyway.
But AI won't replace them, nor did the past 50+ years of software development innovation. There are millions (tens of millions?) of developers working in higher-level programming languages who don't know the difference between the stack and the heap beyond maybe some theory they half remember from school, and they don't care because they don't have to think about it in their day job.
If a programmer's whole career will be spent in higher-level languages with very little data stored on the stack (vs. the heap), why should they care? It seems like the normal progression toward more abstraction in the tools we use. Similarly, I have programmed a lot of C and C++ in my career and I never once needed assembly language. (I am expecting someone to pop into the convo here and tell me how I am a terrible C/C++ programmer because I don't know any assembly.)
I think the shift for general software engineers will be from craftsmen to tradesmen, but write-ups like this one stem from an artisan style all their own.
We have been seeing this shift for a while, where "software engineers" graduate from 3 month bootcamps. Except now most likely they will not be earning 500k making crud apps.
What about the incredible front end Devs that only know JS/CSS/HTML? They can still be true craftspeople in their art, be it cross-browser/platform issues or performance tweaking.
When I worked at Microsoft and I had downtime I would sometimes read the code for app compatibility shims out of pure curiosity.
Win9x video games that made bad assumptions about the stack were a theme I saw. One of the differences between Win9x and NT-based Windows is that kernel32 (later kernelbase) is now a user-mode wrapper atop ntdll, whereas in the olden days kernel32 would trap directly into the kernel. This means that kernel32 uses more user-mode stack space on NT. A badly behaved app that stored data to the left of the stack pointer and called into kernel32 might see its data structures clobbered on NT but not on 9x. So there were compatibility hacks that temporarily moved the stack pointer for certain apps.
I wonder how many people think of the call stack as running left to right, most recent return first, rather than top to bottom, likewise? If you stare at enough hex dumps, it makes perfect sense.
What was the testing like for such bugs? Is it somehow automated, or is there a lengthy doc describing the manual testing steps, or are there no tests at all?
I interned with the AppCompat team shortly before the release of Windows XP, which was huge for them as it was the first Windows for consumers on the NT kernel.
IIRC, they had a significant lab and tons of infrastructure for exercising and identifying compatibility issues in thousands of popular and less popular software packages. It all got distilled into a huge database of app fingerprints and corresponding compatibility shims to be applied at runtime.
IIRC the whole parsing performance issue was because the original code was written for the SP campaign of GTA5, which only had a handful of objects to parse data for. That was barely a blip in terms of performance impact and, AFAIK, was written years before GTA Online was made (where it became an issue, and even then only long after GTA Online first launched).
Writing some simple code that works with the data you expect to have, without bothering with optimizations, is fine; if anything, optimizing it would be one of the actual cases of "premature optimization": even with profiling, no real time is spent on that code because your data won't make it spend any, and you should avoid wild guesses since chances are you'll be wrong (even if in this case it would have been a correct guess, it'd be like a broken clock that always reads 13:37).
The actual issue with that code was that, after they reused it for GTA Online and it started becoming a performance issue over time as they added more objects, nobody thought to try and see what was wrong.
Are you actually arguing that using a JSON parser for JSON-formatted data is a premature optimization? The solution here was to use a different format, not a somewhat-JSON-compatible hacked together parser.
They were not the only ones to make that mistake; e.g. rapidjson had to fix the same error. Few people expect sscanf to strlen the entire input just to parse one token (not only that, but there are C++ APIs that call sscanf under the hood).
The second error, deduplicating values by linearly scanning an array, was way more egregious.
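For contrast, a sketch of tokenizing a large buffer in a single pass with `strtol`, which hands back the position where it stopped so the next call can resume there. This assumes the C library's strtol only scans as far as the number it converts (true of common implementations, though not something the standard promises). `parse_longs` is a hypothetical helper; overflowing values are clamped by strtol and not reported here.

```cpp
#include <cstdlib>
#include <vector>

// Parses whitespace-separated integers in one pass. Each strtol call starts
// where the previous one stopped, so the input is walked only once.
std::vector<long> parse_longs(const char* p) {
    std::vector<long> out;
    char* end = nullptr;
    for (;;) {
        const long v = std::strtol(p, &end, 10);
        if (end == p) break;  // end of input or a non-numeric token
        out.push_back(v);
        p = end;
    }
    return out;
}
```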
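A sketch of the usual fix, assuming string keys: track what has been seen in a hash set, so each membership test is an expected O(1) lookup instead of a scan over everything kept so far. The `dedup` helper is hypothetical and keeps the first occurrence of each value.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Removes duplicates while keeping the first occurrence of each value.
// Each membership test is an expected O(1) hash lookup, so the whole pass is
// expected O(n) rather than O(n^2).
std::vector<std::string> dedup(const std::vector<std::string>& items) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> out;
    for (const auto& s : items) {
        if (seen.insert(s).second) {  // .second is true only for new values
            out.push_back(s);
        }
    }
    return out;
}
```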
The real, systemic error is that dozens(?) of engineers worked on that product, supposedly often testing the online component and experiencing that wait time first hand; and none thought "wait, parsing JSON doesn't take that long, computers are fast! what's going on?"
I think someone estimated that error cost them millions in revenue? I'm pretty sure a fraction of that could afford an engineer who knows how fast computers ought to be.
GTA was never my wheelhouse, but from what I gathered GTA Online didn't have that much support; and since it was only the initial loading time, which would have grown gradually as the shop content increased, and a very fast machine (e.g. a dev machine) would have had less of an issue, the engineers working on it were probably not that incentivised to dig into it.
Even though it's pretty critical to the initial user experience, initial loading time is generally what gets disregarded the most.
> I'm pretty sure a fraction of that could afford an engineer who knows how fast computers ought to be.
It can, if someone cares enough or realises it's an issue, and then someone is motivated enough to dig into it, or has the time to.
I'm willing to bet it was done for performance reasons; subtraction is cheaper than floating-point division. The compiler probably also has some tricks to optimize this further.
There is absolutely no way this could turn into an infinite loop. It could underflow, but for that to happen, angle would have to be less than 2*pi, which would exit the loop.
The article discusses how that turns into an infinite loop and causes a hang.
When you subtract a small float from a very large float, the value doesn't change. This is because the "steps" between float values increase with the size of the value (i.e. floats have coarser resolution for larger magnitudes)
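A small C++ check of that absorption effect, assuming 32-bit floats as a game would typically use for angles (the helper name is hypothetical). Near 1e9 the spacing between adjacent floats is 64, so subtracting roughly 6.28 rounds straight back to the original value, and a loop such as `while (angle > 2*pi) angle -= 2*pi;` never makes progress.

```cpp
// Returns true when subtracting 2*pi leaves `angle` unchanged because the
// gap between neighbouring floats at that magnitude is larger than 2*pi.
bool subtraction_is_noop(float angle) {
    const float two_pi = 6.2831853f;
    const float after = angle - two_pi;  // rounded back to float precision
    return after == angle;
}
```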
To see this in action, try running the following in a JavaScript interpreter:
Sure the lower bound is nicer here. But when the tradeoff includes an unlimited upper bound it's not a very attractive option.
I guess the most robust code, handling both performance and unexpected input, would do one iteration of this (leveraging the assumption that angles are either always within bounds or went out of bounds by a small amount for one frame), followed by an fmod if that assumption turns out to be totally off.
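A sketch of that hybrid, assuming angles are kept in [0, 2*pi) and using double for simplicity (`wrap_angle` is a hypothetical name). One subtraction handles the common small overshoot, and std::fmod takes over when the input is far out of range, so even huge values finish in bounded time.

```cpp
#include <cmath>

// Wraps an angle into [0, 2*pi). The common case (a small overshoot) costs a
// single subtraction; anything further out falls back to std::fmod, which
// finishes in bounded time for any finite input.
double wrap_angle(double a) {
    const double two_pi = 6.283185307179586;
    if (a >= two_pi) a -= two_pi;  // fast path: assume we overshot by < 2*pi
    if (a >= two_pi || a < 0.0) {  // assumption violated
        a = std::fmod(a, two_pi);  // result keeps the sign of a
        if (a < 0.0) a += two_pi;
        if (a >= two_pi) a = 0.0;  // guard against rounding up to 2*pi
    }
    return a;
}
```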
> all these findings prove that the bug is NOT an issue with Windows 11 24H2, as things like the way the stack is used by internal WinAPI functions are not contractual and they may change at any time, with no prior notice.
This reminds me of an excellent article I read a while back, the gist of it was that, given sufficient success, there's no such thing as a private API.
My takeaway, speaking as someone who leans towards functional programming and immutability, is "this is yet another example of a mutability problem that could never happen in a functional context"
(so, for example, this bug would have never been created by Rust unless it was deeply misused)
This is more of a problem of the C/C++ standard that it allows uninitialized variables but doesn't give them defined values, considering it "undefined behavior" to read from an uninitialized variable. Java, for example, doesn't have this particular problem because it does specify default values for variables.
But it's this and many other features of C/C++ that make it faster than Java. C/C++ developers really don't want to "pay" for something for safety.
Though, I really like the _mm_undefined_ps() intrinsics for SSE that make it clear that you're purposefully not initialising a variable. Something like that for ints and floats would be pretty sweet.
It is definitely not the case that magically safer is slower. IMO too often the attitude from WG21 (the c++ language committee) has been "Some fast things are unsafe, therefore if we make our language more unsafe it will go faster" which... that's not how implication works.
As a very high level example, take sorting. Rust's standard library provides you both a stable and unstable sort, as does your C++ standard library.
The C++ standard promises these sorts have O(n log n) performance, it's unclear in modern C++ if having a nonsensical ordering† is Undefined Behaviour (as it was in older versions) or outright IFNDR (much worse than UB) but the real world effect will be similar anyway
Rust promises that these sorts work as expected, if you provide nonsensical ordering, obviously it can't very well "sort" things the way you asked, but we don't need to kill your neighbour's cats and wipe the hard disk either, so, it will either give you back the same things in... some order or it will report the fatal error in your software.
The Rust option here is clearly much safer right? So, how much performance is this costing? Actually, it's faster. So C++ is choosing slower and worse. What's the upside?
† For example what about if I insist that Red < Green, but also Green < Red, and furthermore Red == Green is true, but so is Red != Green, however neither Green == Red nor Green != Red are true!
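A crude debug-time check for the properties such an ordering violates, assuming a small sample of values (it is cubic in the sample size). `looks_like_strict_weak_order` is a hypothetical helper, and it does not check transitivity of equivalence, which a strict weak ordering also requires.

```cpp
#include <functional>
#include <vector>

// Brute-force sanity check of a comparator on a small sample. Returns false
// if `less` violates irreflexivity, asymmetry, or transitivity on these
// values. It does not check transitivity of equivalence.
template <typename T, typename Less>
bool looks_like_strict_weak_order(const std::vector<T>& xs, Less less) {
    for (const T& a : xs) {
        if (less(a, a)) return false;                    // irreflexivity
        for (const T& b : xs) {
            if (less(a, b) && less(b, a)) return false;  // asymmetry
            for (const T& c : xs) {
                if (less(a, b) && less(b, c) && !less(a, c)) return false;  // transitivity
            }
        }
    }
    return true;
}
```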
Statically proving the variables get initialized wouldn't change the performance except by making sure you check the return value of sscanf, or turning refusal to check into a couple register wipes. Either way, that's a negligible increase to a hefty function call. It wouldn't require default initializing variables in all circumstances.
When I think of the "no runtime cost" mentality of C/C++ I don't think that normally extends to ignoring errors in I/O functions.
And yet, there is a good chance that C++ will start doing exactly this [1]. Because [2]:
> The performance impact is negligible (less that 0.5% regression) to slightly positive (that is, some code gets faster by up to 1%). The code size impact is negligible (smaller than 0.5%). Compile-time regressions are negligible. Were overheads to matter for particular coding patterns, compilers would be able to obviate most of them.
> The only significant performance/code regressions are when code has very large automatic storage duration objects. We provide an attribute to opt-out of zero-initialization of objects of automatic storage duration. We then expect that programmer can audit their code for this attribute, and ensure that the unsafe subset of C++ is used in a safe manner.
> This change was not possible 30 years ago because optimizations simply were not as good as they are today, and the costs were too high. The costs are now negligible.
Thanks for the references - that was interesting reading, particularly that initialisation can be good for instruction pipelining.
A trick we were using with SSE was something like
__m128 zero = _mm_undefined_ps();
zero = _mm_xor_ps(zero, zero);
Now we were really careful with viewing our ops as data dependencies to reason about pipelining efficiency. But our profiling tools were not measuring this.
We did avoid _mm_set1_ps(0.0f), which was actually showing up as cache misses.
I wonder if we were actually slower because cache misses are something we can measure?!
I think the response to that would be: yes but the game would simply not have been made if it wasn't written in C++. That's not to say you couldn't or that you can't make something like GTA:SA in Rust in 2025 or in a safer different language in the early 2000s. It just would take a great deal more time and expense as you'd have needed to construct a lot of tooling and do a lot of training to ensure all of the employees were up to speed before getting started. C++ was, and I think to some extent still is, the lingua franca of the gaming industry - there are some fun exceptions (Naughty Dog implementing much of Crash Bandicoot in a home-grown LISP, and presumably dozens or hundreds of DSLs and other little bespoke scripting languages in use at other studios).
And that's not to mention the uncomfortable truth that while doing this correctly in something like Rust may very well take less effort overall than in C++, that is not the bar we are aiming to clear. They wanted to implement something that was correct-enough, and given that this bug wasn't hit for 20+ years and that the game was a roaring success on all the major platforms - I think that was the right decision.
Well what happened was that despite being based on an aging Renderware engine and programmed using a language with many potential footguns, the game was stable enough across multiple platforms, architectures and OSes that it was both a critical and commercial success.
I know what you’re saying - you can’t really know what might have been in an alternate reality. But in that alternate reality they’d have had to come up with something truly monumental to outdo themselves here.
I think you’re just being a wee bit picky about me using the words “the right decision”. If we’re honest with ourselves there probably wasn’t a Rust-like language in the conversation when they set out to build GTA3, Vice City or San Andreas so this is all kind of moot unless we're suggesting that Rockstar should have started out by building that language...
I'd actually say that Rust is a third option between "everything is immutable" and "mutable soup": Rust is more "one mutator at a time". Rust really embraces being able to mutate stuff (so it's not functional in that sense); it just makes sure that mutation happens in a controlled way.
If your attitude is just "I'm not going to use abc because too many people say it's good", without even trying it out first-hand to verify those claims, I don't think you can go very far in your technical skills.
The best engineers I know are open to everything and played with almost every tool/language/whatever to form (sorry) informed opinions about them. They often know what they are talking about, and they choose the best tool for the job.
I think I can articulate what the comment means in a way that may make you rethink what you've said a little bit. I'm not trying to make you think Rust is bad (I personally think it is good); I'm just trying to show you why this person may not be as backwards as you think they are.
So the person in question is irritated at an interesting blog post about a 20+ year old game being used as yet another opportunity to push Rust. For starters, Rust obviously wasn't around when the game was developed, so it's not like Rockstar made the wrong call in implementing this in C++. But more importantly, I don't think Rust is currently in a state where studios can justify using it to develop AAA games. They'd need big teams of developers with Rust experience who are well-versed in the sort of problems encountered during game development. You'd need battle-tested build/deployment processes that allow you to produce the binaries for PlayStation/Xbox (not too dissimilar CPU/GPU wise, but each with their own platform-level quirks no doubt) and Switch hardware - potentially across multiple generations. You'd need various platforms' OS hooks and network-service APIs available. Additionally, you'd need to convince the guys with the money that instead of spending $projected on a game, you'd need to spend $projected+$mystery_number when they take the plunge and write their first game in Rust with new tools etc. rather than C++ and everything they currently use. The gaming industry is nothing if not ruthless at making money: if it made financial sense they'd be moving to Rust already, and if it will make sense in the future, they'll be planning to do it.
You've been charitable in your read of the original comment, taking it as "this family of problems does not exist in Rust" - and for what it's worth I agree and really value this. However, this other commenter has presumably seen it as a bit more naive and missing the bigger picture, and in combination with other similar experiences is questioning the value of these glowing testimonies.
In addition, a lot of people saying "this is great, this is the future!" doesn't automatically make something good. For about 5+ years here on HN we had legions of people responding "blockchains will fix this" to almost every problem and very confidently declaring the rest of us luddites for not getting it. I'm obviously not saying Rust is the same; I'm just trying to show that not following the crowd doesn't automatically mean you're the kind who will always fall behind.
As for how to avoid this? I dunno if you can undo the zillions of RIIR comments that have been floating around since Rust appeared on the scene, but if I was evangelising or even just strongly recommending it I'd just keep in mind that my target audience is maybe sick of seeing the same kinds of comments and would be a bit more creative and/or sensitive in approaching the topic.
While they did mention Rust, the actual suggestion was "functional programming and immutability", which to me suggests several other languages first and makes it not really Rust evangelism.
FWIW I think a linter or other similar code quality checker would have caught this as well. From a practical perspective (e.g., how do you prevent this from happening again in your game studio's multi-million line code base) that would have been the right thing to do here.
The code would have failed because you can't use an uninitialized variable, so you would have had to set it to a default. You don't just get random garbage from the stack.
You can write a genuine uninitialized local variable in Rust; it's just that you wouldn't do it out of laziness. While in C that's the default, in Rust it's a lot of extra work to say "No, I really don't want to initialize this variable", and Rust is like "I mean, if you insist, all I can do is warn you that's a terrible idea".
int k; // C makes an uninitialized variable named k - probably bad idea
let k: i32 = unsafe { std::mem::MaybeUninit::uninit().assume_init() }; // Rust, same bad idea
If we say "I will initialize it - later", that's fine in Rust: you just write the name (and where appropriate the type) of the variable and go about your day. The compiler will reject your program if it can't see that you actually fulfil that promise, and sometimes that might be because the compiler is dumb (but often it's because you are), but there's no technical problem with this, and if the compiler agrees that we do, in fact, initialize it later, then it compiles and works and everybody is happy.
But to actually make a variable and not initialize it, as we saw above, is a lot of extra work in Rust because like... that's a bad idea, why would you be setting out to do that?
This is such a bad idea that Rust's unsafe std::mem::uninitialized, which is how they did this before MaybeUninit existed, was de-fanged (giving it poor performance by actually writing a pattern to RAM every time) and deprecated, so you get a warning if you try to use it even though it was already marked unsafe. See, people (and I'm sure many C programmers are like this) tend to imagine it's OK for, say, an integer to be uninitialized because surely any possible value is OK, right? Nope. The compiler (and in some cases the operating system) knows that data was never written, so it feels entitled to fuck you about if you expect it to stay unchanged, because it never promised that would work. As a result, rarely but sometimes, you get kicked in the head and end up with a seemingly impossible bug.
It would have forced you to either specify a default or fail pretty loudly as soon as you launched the game, both much better than leaving a bug there just for it to resurface 20 years later.
Most popular languages would prevent this. In this case it’s as simple as having a more sensible reader API than sscanf in the standard library and forcing variables to be initialized.
Could you elaborate? I cannot see how a functional programming language would have protected you from reading a non-existent value while not providing a default.
It's more that functional languages just happen to be stricter in various ways that would've mitigated against this. You could quite happily design a functional language that has an unsafe equivalent to sscanf in its stdlib, or has big parts of the spec which are "undefined behaviour" that may differ depending on the underlying OS/compiler/runtime/stdlib in use. But the more popular functional languages have gained traction in part because they tend to have an "if you model the types correctly, the program basically works" philosophy around them. I don't think things like Haskell, OCaml or F# would allow this if you wrote idiomatic code; you'd probably need to do something a little hacky or sketchy.
It simply would not have allowed you to write code which did that. And you wouldn't have a function like sscanf() either. You'd probably end up with a much more normal looking parser function that returned a value-or-error type.
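A minimal C sketch of what that "value-or-error" style can look like even without a functional language: the parser reports a missing field instead of handing back whatever happened to be in memory. `float_result` and `parse_float` are made-up names for illustration; the only library call is the standard `strtof`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Value-or-error result: `value` is meaningful only when `ok` is true. */
typedef struct {
    bool ok;
    float value;
    const char *error;           /* static message, NULL on success */
} float_result;

/* Parses one float at *cursor (strtof skips leading whitespace) and
   advances *cursor past it. It never invents a value it did not read. */
static float_result parse_float(const char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    float_result r = { .ok = false, .value = 0.0f, .error = "missing value" };
    char *end = NULL;

    errno = 0;
    float v = strtof(*cursor, &end);
    if (end == *cursor) return r;                 /* nothing numeric here */
    if (errno == ERANGE) {
        r.error = "value out of range";
        return r;
    }
    *cursor = end;
    r = (float_result){ .ok = true, .value = v, .error = NULL };
    return r;
}
```

A loader would call `parse_float` once per expected field and bail out on the first `!ok`, so a line that is short a value gets reported rather than silently filled in.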
I've never heard of a functional language that would allow you to initialize a value to whatever value the system memory already had in that memory location. In languages that allow nil, it would at least be nil; in languages that don't, you would have gotten an error about an uninitialized and undefaulted value. In any typed language, you would have also gotten an error.
It's true that C may be unique-ish in this regard, though: this bug also couldn't happen in Ruby, which is not a functional language, but Ruby certainly still makes undefined behaviors much more possible than in other languages like Elixir.
Knowing C/C++, I more or less guessed what's happening (uninitialized variable) early in the blog post.
It blows my mind that these languages allow you to leave variables uninitialized, which has caused countless bugs (including production bugs that I have seen first-hand), and you often need to rely on additional compiler flags or static analysis tools/valgrind etc. to catch them. Even though newer languages often use a different solution (a default zero value, or requiring a variable to be initialized before use), people still go back to C/C++ all the time.
> all these findings prove that the bug is NOT an issue with Windows 11 24H2, as things like the way the stack is used by internal WinAPI functions are not contractual and they may change at any time, with no prior notice. The real issue here is the game relying on undefined behavior (uninitialized local variables), and to be honest, I’m shocked that the game didn’t hit this bug on so many OS versions, although as I pointed out earlier, it was extremely close
This sentence is the real takeaway point of the article. Undefined behavior is extremely insidious and can lull you into the belief that you were right, when you already made a mistake 1000 steps ago but it only got triggered now.
I emphasized this point in my article from years ago (but after the game was released):
> When a C or C++ program triggers undefined behavior, anything is allowed to happen in the program execution. And by anything, I really mean anything: The program can crash with an error message, it can silently corrupt data, it can morph into a colorful video game, or it can even give the right result.
> If you’re lucky, the program triggering UB will show an appropriate error message and/or crash, making you immediately aware that something went wrong. If you’re unlucky, the program will quietly mangle data, and by the time you notice the problem (via effects such as crashes or incorrect output) the root cause has been buried in the past execution history. And if you’re very unlucky, the program will do exactly what you hoped it should do, until you change some unrelated code / compiler versions / compiler vendors / operating systems / hardware platforms – and then a new bug becomes visible, and you have no clue why seemingly correct code now fails to work properly.
As I wrote in my article, this point really got hammered into me when a coworker showed me a patch that he made - which added a couple of innocuous, totally correct print statements to an existing C++ program - and that triggered a crash. But without his print statements, there was no crash. It turned out that there was a preexisting out-of-bounds array write, and the layout of the stack/heap somehow masked that problem before, and his unlucky prints unmasked the problem.
Okay so then, how can we do better as developers today?
0) Read, understand, and memorize what actions in C or C++ are undefined behavior. Avoid them in your code at all costs. Also obey the preconditions of any API you use, whether in the standard library, operating system, etc.
1) Compile your application in Debug mode and compare its behavior to Release mode. If they differ by anything other than speed, then you have a serious problem on your hands.
2) Compile and run with sanitizers like -fsanitize=undefined,address to catch undefined behavior at runtime.
3) Use managed languages like Java, C#, Python, etc., where you basically don't have to worry about UB in normal day-to-day code. Or use very well-designed low-level languages like Rust that are safe by default and minimize your exposure to UB when you really need to do advanced things. C and C++, by contrast, have been a bonanza of UB like we have never seen before in any other language.
Other than C#, there is no reason to use those other languages for game dev, unless the game is fairly simple, or you want to risk a fairly long project by employing a language that hasn't been proven in the space yet (Rust). No shade at any of those languages, I don't even like C#, just being pragmatic.
I would add: code defensively. Initialize your variables (either to a sensible value, or an outrageously wrong value) before passing pointers to them, even when you "know" that the value will be overwritten. Check for errors. Always consider what happens when things go wrong, not just when things go right. Any time you find yourself thinking, "condition X is guaranteed to hold, so I don't need to check for it" consider checking it anyway just in case you're wrong about that, or it changes later.
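One way to apply the sentinel advice in C, assuming a lookup helper that only writes its output when the key exists (`field`, `find_field` and `load_field` are hypothetical names for illustration): start the variable at NaN, which no real value should ever be, and check for it after the call.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value record, standing in for one parsed data line. */
typedef struct {
    const char *key;
    float value;
} field;

/* Hypothetical lookup: writes *out only when `key` is present and otherwise
   leaves it untouched, which is exactly what bites callers who skip init. */
static void find_field(const field *fields, size_t n, const char *key, float *out) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(fields[i].key, key) == 0) {
            *out = fields[i].value;
            return;
        }
    }
}

/* Defensive caller: start from an "outrageously wrong" value so that a
   missing write is detected instead of inheriting whatever was there. */
static bool load_field(const field *fields, size_t n, const char *key, float *result) {
    assert(result != NULL);
    float v = NAN;                     /* no legitimate field value is NaN */
    find_field(fields, n, key, &v);
    if (isnan(v)) {
        return false;                  /* never written: report it, keep *result */
    }
    *result = v;
    return true;
}
```

The caller's own variable is only overwritten on success, so a failed lookup can't leak a half-initialized value further down the line either.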
My only issue with defensive coding is that it often doesn't play nice with code coverage requirements. I've been in situations where I would like to add defensive coding just in case, but then the PR doesn't pass the coverage checks. The best is when you can ensure via the compiler (e.g. via the type system) that a case is impossible, but C++ (in my case) isn't perfect for this.
I learned this lesson many moons ago, on a Fortran code I wrote for a university assignment. It was a basic genetic algorithm, and for some reason it was converging much more slowly than expected. So I was sprinkling some WRITEs to debug, and suddenly the code converged a hundred times faster.
All this is true. Note also that the C++ folks are putting a serious effort into reducing UB. See the "safe by default" section of this writeup [1]. See also my other comment [2] regarding the performance impact of this sort of change. Short answer: with sufficient optimization, smaller than one might think.
Once this category of error is raised to your attention, you start to notice it more and more.
A little piece of technology made sense in the original context, but then it got moved to a different context without realizing that move broke the contract. Specifically in this case a flying boat became an airplane.
---
I recently worked a bug that feels very similar:
A Linux cups printer would not print to the selected tray, instead it always requested manual feed.
Ok. Try a bunch of command line options, same issue.
Ok. Make the selection directly in the PPD (PostScript Printer Description) file. Same issue.
Ok! Decompile the PXL file. Wrong tray is set in pxl file... why?
Check Debug2 log level for cups - Wrong MediaPosition is being sent to ghostscript (which compiles the printer options into the print job) by a cups filter... why?
Cups filter is translating the MediaPosition from the PPD file... because the philosophy of cups is to do what the user intended. The intention inferred from MediaPosition in the PPD file is that it corresponds to the PWG (Printer Working Group) MediaPosition, NOT the vendor MediaPosition (or local equivalent - in this case MediaSource).
AHA!! My PPD file had been copied from a previous generation of server, from a time when that cups filter did NOT translate the MediaPosition, so the VENDOR MediaSource numbers were used. Historically, this makes sense. The vendor tray number is set in the vendor ppd file because cups didn't know how to translate that.
Fast forward to a new execution context, and cups filters have gotten better at translating user intention, now it's translating a number that doesn't need to be translated, and silently selecting the wrong tray.
TLDR; There is no such thing as a printer command, only printer suggestions.
I always wonder, why not write these games on top of a virtual machine, like Carmack did with QuakeC in Quake, an approach he came back to with the QVM in Quake 3 after Quake 2 moved its game code to native DLLs [1].
I'm ignorant about game development, virtual machines and system programming but from the little I understand it seems a sensible choice to make.
While there is an initial price to pay, implementing 99% of the game on top of a user-implemented stack seems a sensible approach to me.
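To make the "user-implemented stack" idea concrete, here is a toy C sketch of a bytecode interpreter whose stack is a plain array with an explicit depth. It is not how QuakeC or any shipped VM works; the opcodes and names are invented. The point is that every pop is checked against the current depth, so a missing value traps instead of returning whatever an earlier computation left behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction set, for illustration only. */
typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT } opcode;

typedef struct {
    opcode op;
    int operand;                 /* only meaningful for OP_PUSH */
} instr;

enum { STACK_MAX = 64 };

typedef struct {
    int slots[STACK_MAX];
    size_t depth;                /* slots at index >= depth are never readable */
} vm_stack;

static bool vm_push(vm_stack *s, int v) {
    if (s->depth == STACK_MAX) return false;      /* overflow: trap */
    s->slots[s->depth++] = v;
    return true;
}

static bool vm_pop(vm_stack *s, int *out) {
    if (s->depth == 0) return false;              /* underflow: trap, no garbage */
    *out = s->slots[--s->depth];
    return true;
}

/* Returns false on any trap; otherwise stores the value on top at OP_HALT. */
static bool vm_run(const instr *code, size_t len, int *result) {
    assert(code != NULL && result != NULL);
    vm_stack s = { .depth = 0 };                  /* every slot starts zeroed */
    for (size_t pc = 0; pc < len; pc++) {
        int a, b;
        switch (code[pc].op) {
        case OP_PUSH:
            if (!vm_push(&s, code[pc].operand)) return false;
            break;
        case OP_ADD:
        case OP_MUL:
            if (!vm_pop(&s, &b) || !vm_pop(&s, &a)) return false;
            if (!vm_push(&s, code[pc].op == OP_ADD ? a + b : a * b)) return false;
            break;
        case OP_HALT:
            return vm_pop(&s, result);
        }
    }
    return false;                                 /* fell off the end without OP_HALT */
}
```

The price is a bounds check on every push and pop, which is exactly the "initial price to pay" trade-off: slower than native code, but a missing operand is an error rather than a stale value.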
This is a game; I don't think a debug configuration (with checks for things like this enabled) would run fast enough to be playable on contemporary hardware.
Generally, game console "debug" configurations aren't "true" debug like most people think of -- optimizations are still globally enabled, but the build generally has a number of debug systems enabled that naturally require the use of a devkit. Devkits, especially back then, generally had 2-3x as much memory as retail systems -- so you'd happily sacrifice framerate during feature development to have those systems enabled.
Debugging was (and still is) generally done on optimized builds and, once you know the general area of the problem, you simply disable optimizations for that file or subsystem if you can't pinpoint the issue in an optimized build.
The biggest performance hit, in general, comes from disabling optimizations in the compiler. I say "in general" because there are systems that might be used to find this kind of thing that DO make a game wholly unplayable, such as a stomp allocator. Of course, you wouldn't generally enable a stomp allocator across all your allocations unless you're desperate, so you could still have that enabled to find this kind of bug and end up with a playable game.
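For readers who haven't met one, a stomp allocator can be sketched in a few lines of POSIX C (assuming `mmap`/`mprotect` are available, which was not true on the consoles of that era): each allocation gets its own pages and ends flush against an inaccessible guard page, so an overrun faults on the spot. `stomp_alloc`/`stomp_free` are hypothetical names, and a real one would store the size in a header instead of asking the caller for it.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Alignment promised to callers; overruns smaller than this go unnoticed. */
enum { STOMP_ALIGN = 16 };

static size_t round_up(size_t n, size_t to) { return (n + to - 1) / to * to; }

/* Places the block so it ends exactly where a PROT_NONE guard page begins. */
static void *stomp_alloc(size_t size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(page % STOMP_ALIGN == 0);
    size_t body = round_up(size == 0 ? 1 : size, STOMP_ALIGN);
    size_t data_bytes = round_up(body, page);
    unsigned char *base = mmap(NULL, data_bytes + page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;
    if (mprotect(base + data_bytes, page, PROT_NONE) != 0) {  /* guard page */
        munmap(base, data_bytes + page);
        return NULL;
    }
    return base + data_bytes - body;
}

/* Recomputes the mapping from the pointer and the original size. */
static void stomp_free(void *p, size_t size) {
    if (p == NULL) return;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t body = round_up(size == 0 ? 1 : size, STOMP_ALIGN);
    size_t data_bytes = round_up(body, page);
    unsigned char *base = (unsigned char *)p + body - data_bytes;
    munmap(base, data_bytes + page);
}
```

The cost is at least two pages per allocation, which is why you'd only point it at a suspect subsystem. It also only catches out-of-bounds heap accesses, not reads of stale in-bounds data.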
The more likely reason here is that no one noticed or cared. GTA:SA is 21 years old and this bug doesn't affect the Xbox or other versions.
You can (and could) easily compile an optimized build with debug symbols to track down sources of issues, but catching a bug like this would likely take a dynamic checker like Valgrind or MSan, which do not allow for any optimizations if you want to avoid false negatives, and add even more overhead on top of that. (Valgrind with its full processor-level virtualization, and MSan with its shadow state on every access. But MSan didn't exist at the time, and Valgrind barely existed.)
At minimum, fine-grained stack randomization might have exposed the issue, but only if it happened to be spotted in playtests on the debug build.
How could a stomp allocator have possibly found this bug? The offending values are stored on the stack, in-bounds when written to, and again in-bounds when read from.
At no point is there an OOB access, just a failure to initialize stack variables. And to catch that, you'd need either MSan-style shadow state that didn't exist, thorough playtesting with fine-grained stack randomization, or some sort of poisoning that I don't think existed.
The problem with Valgrind/ASan/MSan is that you have to start using these tools early in the development process. It can't be a "checklist" item before launch, or you'll have an insurmountable number of bugs, often baked in so deeply that fixing one causes additional changes that introduce unrelated bugs.
Valgrind was released in 2002 to immediate celebration. It was available and surely known to the team. All they needed to do was write a unit test that loaded and instantiated those vehicle files and run it with "valgrind" in front of the command line.
I tried to use Valgrind to catch pretty much this exact bug 20 years ago, and it was nigh impossible. If you call any 3rd party code it'll flag tens of thousands of false positives that you have to sift through. And that was on a small game engine; I can't imagine running it on millions of lines of code.
Again, you don't valgrind a whole game. You valgrind your unit tests. Even in 2004 when this game was released, and even in the game industry, unit testing was a routine thing. And this particular bug was in code very amenable to lightweight unit testing.
That would be assuming they knew there was a bug in that particular part of the code, which they probably didn't, until Windows 11 24H2. And unfortunately Valgrind doesn't work on Windows.
It was available and widely celebrated, but there was no guarantee that these Windows and console developers had heard of it yet or that they could have used it if they had.
On Windows 11 24H2, more stack space was modified by a new implementation of Critical Sections.
IMHO this shows the downfall of Microsoft. Why did they do that? Critical sections have been there for many decades and should be basically bug-free by now. My best guess is someone thought they'd "improve" things and rewrote it, then made some microbenchmark that maybe showed the dubious improvement.
The other comment here mentions Raymond Chen, who wrote this article about why backwards-compatibility is very important (and arguably what got Microsoft into the position it's in today):
The software was fundamentally broken before the OS update. It was working by pure random chance with undefined behaviour. It’s a C++ issue, not an OS issue. The same code compiled for another OS would have different random results.
The core problem is some compilers initialising memory to zero in Debug mode, masking the behaviour of uninitialised data, since in most cases zero is a legit value. In Release mode, this zeroing doesn’t happen.
Devs need to be aware that the following C++ initialiser exists which zeros data structures for you:
Surprised to see the return value of sscanf being ignored, that seems like a pretty rookie mistake, and this bug would never have made it out of the original programmer's system if they had bothered to check it.
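What checking that return value looks like, as a minimal sketch: sscanf returns the number of conversions it actually performed, so a line that is short a field is caught immediately. The record layout and `parse_vehicle` are hypothetical and far simpler than the real vehicles.ide format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical record; the real data file has many more fields. */
typedef struct {
    char name[32];
    float mass;
    float wheel_scale_front;
    float wheel_scale_rear;
} vehicle_def;

/* sscanf returns how many conversions succeeded; anything short of the
   number requested means some fields were never written. */
static bool parse_vehicle(const char *line, vehicle_def *out) {
    assert(line != NULL && out != NULL);
    vehicle_def v = {0};               /* no stale values even on failure */
    int got = sscanf(line, "%31s %f %f %f", v.name, &v.mass,
                     &v.wheel_scale_front, &v.wheel_scale_rear);
    if (got != 4) {
        fprintf(stderr, "bad vehicle line (%d of 4 fields): %s\n", got, line);
        return false;
    }
    *out = v;
    return true;
}
```

The `%31s` width keeps the name inside its 32-byte buffer, and `= {0}` zero-initializes every member so even the failure path never exposes leftover stack contents.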
Yes, it would have made it out of the original programmer's system for that initial commit.
FTA:
I have a likely explanation for why Rockstar made this specific mistake in the data to begin with – in Vice City, Skimmer was defined as a boat, and therefore did not have those values defined by design! When in San Andreas they changed Skimmer’s vehicle type to a plane, someone forgot to add those now-required extra parameters. Since this game seldom verifies the completeness of its data, this mistake simply slipped under the radar.
So the original code (or at least a working code + data version) in GTA Vice City had no visible problems, at least with the Skimmer object, since the vehicles.ide file had the correct number of values for the Skimmer boat object.
Someone changed the Skimmer object from a boat to a plane for GTA San Andreas, BUT they DID NOT update the object to have the REQUIRED wheel values for a plane object.
Now the GTA code is expecting more values than it gets.
The vehicles.ide wasn't validated for correctness after the Skimmer object change to plane. Maybe there are more gotchas in that file...
At least users can fix the problem with a text editor instead of waiting and hoping that Rockstar would fix the problem and release an update.
It has always been too easy to read & write beyond the stack. This should fail, plain and simple.
Mitigations exist - ASLR, NX pages, stack-smashing protection etc. but nothing comprehensively stops reads of stale data beyond the stack.
Thought experiment for a moment. What if the hardware ensures the unused part of a stack region cannot be read or written.
There are many ways to skin this cat, here’s one based around tracking each stack’s start address A, size S, and current depth D
1. Add an instruction to inform the CPU there is a stack at address A of size S. Its depth D is initially 0.
2. Add a jump instruction which reserves N bytes on the stack at address A, growing depth D to (D+N). Maybe this can be its own “reserve” instruction so as not to need a new jump instruction.
3. Give existing return instructions stack awareness. If returning to an address inside a stack, un-reserve the bytes reserved by the most recent jump, making the new depth (D-N).
4. Fail reads or writes to the stack region beyond its current depth. In other words, fail all reads and writes between A+D and A+S.
5. The arithmetic is reversed on architectures whose stacks grow downwards.
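The bookkeeping in steps 1 to 4 can be modelled in software to check the arithmetic. This is a hypothetical C sketch for an upward-growing stack, not a description of any real CPU; `MAX_FRAMES` stands in for the hardware's finite capacity, and the access check assumes the address already falls inside [A, A+S).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of the proposed per-stack state (upward-growing stack). */
enum { MAX_FRAMES = 8 };

typedef struct {
    uintptr_t a;                   /* start address A */
    size_t s;                      /* size S */
    size_t d;                      /* current depth D */
    size_t frames[MAX_FRAMES];     /* N reserved by each outstanding call */
    size_t nframes;
} stack_state;

/* Step 1: declare a stack at A of size S; depth starts at 0. */
static void stack_declare(stack_state *st, uintptr_t a, size_t s) {
    assert(st != NULL);
    st->a = a; st->s = s; st->d = 0; st->nframes = 0;
}

/* Step 2: a call reserves N bytes, D -> D + N. Fails when the stack or the
   bookkeeping is exhausted (where hardware would trap or fall back). */
static bool stack_reserve(stack_state *st, size_t n) {
    if (st->nframes == MAX_FRAMES || n > st->s - st->d) return false;
    st->frames[st->nframes++] = n;
    st->d += n;
    return true;
}

/* Step 3: a return releases the most recent reservation, D -> D - N. */
static void stack_release(stack_state *st) {
    assert(st->nframes > 0);
    st->d -= st->frames[--st->nframes];
}

/* Step 4: within [A, A+S), only [A, A+D) is accessible. */
static bool stack_access_ok(const stack_state *st, uintptr_t addr, size_t len) {
    if (addr < st->a) return false;
    size_t off = (size_t)(addr - st->a);
    return off <= st->d && len <= st->d - off;
}
```

Note that after a release and a fresh reservation of the same size, the old slot is accessible again, so a stale value left by a previous call at the same depth still gets through this check.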
Downsides I can see:
It cements one calling convention. The CPU memory manager will need a lot of state per stack, of which there are many per process: address A, size S, current depth D, plus a reservation stack (i.e. the size of each frame’s stack memory). That’s a lot of bookkeeping! It’s far from zero cost. The limits of how much bookkeeping the CPU can do impose limits on how deep a stack can go and how many stacks are supported - so when there are too many stacks or one goes too deep, either the CPU needs to signal failure or engage a fallback mode and revert to behaving as CPUs do today. And of course fallback puts us back where we started. It’d therefore only mitigate situations in which an attacker cannot control the depth of the stack / a bug always happens inside the max depth the CPU can bookkeep for.
That said, stacks are ubiquitous! Hardware stack awareness opens up all kinds of new mitigations.
This bug wasn't caused by a read beyond the current bounds of the stack, but a stale value from a prior call to the same function at the exact same location on the stack. Buffer-overflow protections like you describe wouldn't help here.
Any solution I can think of uses a lot of resources. Those sort of methods are useful in some contexts, such as highly secure operations, but seem very excessive for the sort of abuse and leak encountered in this example.
tl;dr of the explanation: the Skimmer vehicle is missing a wheel scale definition, so its wheel scale gets read from uninitialized memory. On previous versions of Windows, this happened to be the wheel scale of the previously-loaded vehicle, so things happened to work fine. Starting on Windows 11 24H2, LeaveCriticalSection (which gets called between loading vehicles) uses more stack space than before, so it now overwrites that memory with a gigantic value, resulting in the Skimmer spawning so high up that it may as well not exist at all.
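The mechanism can be simulated deterministically in C. This is a sketch under loud assumptions: a static variable plays the role of the stack slot the uninitialized local kept landing in, `other_call_clobbers_slot` is a hypothetical stand-in for the larger LeaveCriticalSection frame, and the line format and values are invented. A real uninitialized read is undefined behavior and can't be reproduced this reliably.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Simulation only: stands in for the stack slot the uninitialized local
   happened to occupy on every call to the loader. */
static float stack_slot;

/* Stands in for a callee that now uses more stack and leaves its own
   data in that slot. */
static void other_call_clobbers_slot(float junk) { stack_slot = junk; }

/* Mimics the loader: writes the wheel scale only when the line has one,
   then reads the slot regardless. With `poison`, the slot is filled with
   NaN first, as a stack-poisoning build would, so a missing write shows. */
static float load_wheel_scale(const char *line, bool poison) {
    assert(line != NULL);
    if (poison) stack_slot = NAN;
    float parsed;
    if (sscanf(line, "%*s %f", &parsed) == 1) stack_slot = parsed;
    return stack_slot;   /* the bug: used whether or not it was written */
}
```

Without poisoning, the Skimmer-like line quietly inherits the previous vehicle's scale until something else writes a huge value into the slot; with poisoning on, the missing value shows up as NaN on the very first load.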
SilentPatch (for GTAs, at least) specifically is a code-only mod, such that the single .asi file can be removed to uninstall it & all its changes.
A real update should fix both (note: I don't believe the later releases did; they also just added defaults to the parser), but for SilentPatch, a mod is not a real update, and being as simple as possible to remove & reducing conflicts with other mods is more important here than a fix that digs as deep as possible.
I hope someone can figure out the Red Dead Redemption 2 bug where random animals and characters disappear silently if you have too many texture mods installed.
They wrote a JSON “parser” using sscanf. sscanf is not bulletproof! Just use an open source library instead of writing something yourself. You will still be a real programmer, but you will finish your game sooner and you won't have embarrassing stories written about you.
Yes, but now it's in the realm of ~3 minutes, and not ~8 minutes even on a top-spec PC, right? I really liked the game, but waiting 8 minutes to load just to get griefed by hackers within seconds of walking outside... I don't understand how that game makes any money.
IMO it varies widely. This past weekend it was taking me multiple attempts to get logged in to a public lobby— after waiting ~5-10 minutes!
Nothing has changed appreciably. If they would let you login to a private invite-only lobby that would likely speed things up greatly— but it’ll never happen.
I don’t think it’s ever been an option to log in directly to an invite-only lobby. But then I have taken multiple multi-year breaks! I was pleasantly surprised you can actually play most of the game in a private lobby now… that is a huge change and I am not at all certain when it occurred.
Nitpicking: What you're describing is called a "variable ratio reinforcement schedule", and is considered to be the most effective form of operant conditioning.
However, it's not even remotely "like crack". Crack is really really really really fun, period, no "just enough of the time" about it. The reason people get hooked on crack is because it's guaranteed to be fun.
If I had to choose a substance that most closely mirrored variable ratio reinforcement conditioning, it'd probably be ketamine.
Putting the (very valid) reasons for not having human-readable game saves aside, are you sure it's worse than using a 3rd party library that's built to accept semi-valid input values, possibly evaluates user input in some way and has difficult-to-debug bugs that occur only under certain inputs? I agree that writing a stable and safe parser for a binary data file isn't easy, but there are fewer things that can go wrong when you can hardcode it to reject any remotely suspicious input. Third party XML/JSON libraries OTOH try to interpret as much as possible, even when the values are bogus. Also no need to deal with different text encoding bugs, line endings...
You misunderstood. Game developers should use a _good_ third-party library, not a _bad_ one. At a minimum they should be able to read the source code so that they know it is good. Thus open source libraries should be at the top of the list.
If you don't know what “good” looks like, take a look at [Serde](https://serde.rs/). It’s for Rust, but its features and overall design are something you should attempt to approach no matter what language you’re writing in.
I disagree. Serde is not merely good, it is excellent.
The only C code that I have recently interacted with uses a home-grown JSON “library” that is actually pretty good. In particular it produces good error messages. If it were extracted out into its own project then I would be able to recommend it as a library.
But how is that C project using a custom made JSON library doing better than Rockstar games doing the same? Because that library has good error messages?
Apart from that, many of us thought that Java serialization was good if just used correctly, that IE's XML parsing capabilities were good if just used correctly, and so on. We were all very wrong. And a 3rd party library would be just some code taken from the web, or some proprietary solution where you'd once again have to trust the vendor.
It’s good because they have spent hundreds or thousands of hours polishing and improving it. It’s paid off too, because stable releases never have broken data files any more. Any mistakes that do get made are usually found and fixed before the code is ever committed. Even the experimental branch rarely sees broken data files. It’s more likely to see these error messages when loading save files, because the code to read old save files and convert them for newer versions of the game is the hardest to write and test.
> And a 3rd party library would be just some code taken from the web, or some proprietary solution where you'd once again have to trust the vendor.
Open source exists for a reason, and had already existed for ~15 years by the time this game was begun. 20 years later there are even fewer excuses to be stuck using some crappy code that you bought from a vendor and cannot fix.
But also keep in mind that in 2004 the legality of many open source projects had not really been tested in court. Pretty sure that was right around the time one of the bigger Linux distros was throwing its weight around and suing people. So you want to ship on PS2 and Xbox and PC and GameCube. Can you use that lib from inside Windows? Not really. Build vs. buy? Buy means you need the code and will probably have to port it to PS2/GameCube yourself. Can you use that open source lib? Probably, but legal is still dragging its feet, and you get to port it to PS2. Meanwhile your devs needed a library 3 weeks ago and have hacked something together from an older codebase that your company owns, and it works and means you can hit your gold master date.
Would you do that now? No. You would grab one of the multitude of decent libs out there and make sure you are following the terms correctly. Back then? Yeah, I can totally see it happening. Open source was legally grey, or at least opaque, to many corporations. They were scared to death of losing control of their secret-sauce code and getting sued.
Before game companies earned all their profit through selling cosmetics and premium currency nobody cared if you cheated at your single player game and nobody SHOULD care if you want to give yourself extra money.
It's only now that single player progress is profitable to sell that video games have made save game encryption the default.
The trouble is that if some weirdness happens because of the edit, you've got to handle it even if you say it would be reasonable to assume that it's outside of being supported. Maybe you spend a bit more time defensive coding around what inputs it reads from the file, maybe a certain proportion of users doing the save edit see bugs in an apparently unrelated part of the game and seek support (and their bug report might not be complete with all the details), developers spend time to chase down what went wrong, maybe they bad-mouth it on forums which affects sales - there's going to be some cost to handling all of that.
One of the anecdotes from Titan Quest, developed by Iron Lore, is that their copy protection had multiple checks: crackers removed the early checks to get the game running, but later 'tripwires' as you progress through the game remained, and the game appeared to crash. So the game earned a reputation for being buggy because of something no normal user running the game as intended would ever hit.
>The trouble is that if some weirdness happens because of the edit, you've got to handle it even if you say it would be reasonable to assume that it's outside of being supported.
What? No. What even are you suggesting? Hell, games with OFFICIAL MODDING SUPPORT still require you submit bug reports with no mods running.
Editing game files has always been "you are on your own"; even editing standard Unreal config files is something you won't get support for, and they are trivial, human-readable files with well-known standards.
>One of the anecdotes from Titan Quest
Any actual support for this anecdote? Lots of games have anti-piracy features that sneakily cause problems, and even could fire accidentally. None of those games get a reputation for being buggy. Games like Earthbound would make the game super hard and even delete your save game at the very end. Batman games would nerf your gliding ability. Game Dev Tycoon would kill your business due to piracy.
None of these affected the broad reputation of the game. Most of them are pretty good marketing in fact.
Game Dev Tycoon even later added Pirate Mode to the game, for people who wanted to experience super-hard-mode. Complete with random mail they got telling them why people pirated their game, framed as why people were pirating the game you just made.
After finishing the article I immediately did ctrl+f "rust" and was disappointed to not see any of the results I wanted, but actually this comment is more hilarious than anyone saying "why didnt rockstar use rust in 2004!!!1111!!???" it's a bit more of a sophisticated joke since there's an IYKYK factor but it is no less hilarious. Bravo sir, bravo.
To u/db48x whose post got flagged and doesn't reappear despite me vouching for it as I think they have a point (at least for modern games): GTA San Andreas was released in 2004. Back then, YAML was in its infancy (2001) and JSON was only standardized informally in 2006, and XML wasn't something widely used outside of the Java world.
On top of that, the hardware requirements (256MB of system RAM, and the PlayStation 2 only had 32MB) made it enough of a challenge to get the game running at all. Throwing in a heavyweight parsing library for any of these three formats was out of the question.
The comment reappeared, and while you're right about using proper libraries to handle data, it doesn't excuse the "undefined behavior (uninitialized local variables)" that I still see all the time despite all the warning and error flags that can be fed to the compiler.
Most of the time, the programmers who do this don't follow the simple rule Stroustrup gives, which is to initialize a variable where you declare it (i.e. don't declare it until you have a value for it), and which would prevent a lot of bugs in C++.
While it doesn't excuse the bad habits, we do have to keep in mind that C++98 (or whatever older dialect was used back then) didn't have the simple initializers we now take for granted. You couldn't just do 'Type myStruct = {};' to null-initialize it; you had to manually NULL all nested fields. God forbid you change the order of the variables in the struct if you're nesting them and forget to update it everywhere. It was just considerably more practical to do 'Type myStruct;' and then set the fields when needed.
I haven't been using C++ for a number of years, but I think you could set the default values of fields even back then. Something like:
```cpp
struct test {
    int my_int = 0;
    int* my_ptr = nullptr;
};
```
Or is this something more recent?
You cannot initialize them with a different value unless you also write a constructor, but that's not the issue here (since you are supposed to read them from the file system).
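Default member initializers like those in `struct test` were added in C++11, so a C++98-era codebase could not use them. As a minimal sketch of how they combine with reading values from a file, assuming a hypothetical `Vec3` record stored as three whitespace-separated floats: any field `sscanf` fails to fill keeps its default instead of stack garbage, and the return value says how many fields were actually read.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical record type; the default member initializers (C++11)
// give every field a defined value before parsing starts.
struct Vec3 {
    float x = 0.0f;
    float y = 0.0f;
    float z = 0.0f;
};

// Parses "x y z" from a line. Returns true only if all three fields were read;
// fields that were not matched keep their defaults either way.
bool parse_vec3(const char* line, Vec3& out) {
    Vec3 v;  // starts as {0, 0, 0}
    int matched = std::sscanf(line, "%f %f %f", &v.x, &v.y, &v.z);
    out = v;
    return matched == 3;  // also rejects EOF (-1) on empty input
}
```

A short line such as `"1 2"` then yields `{1, 2, 0}` and a `false` return, rather than an uninitialized `z`.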
This is the thing that drives artists and craftsmen to despair and drink: That a flawed, buggy, poor quality work can be "successful" while something beautiful and technically perfect can fail.
San Andreas might be rough under the hood, but on the surface it was nothing short of a masterpiece of game design. The engine was so complex and the cities felt alive, and the game could handle a lot of general nonsense. Still one of my favorite go-to games.
The job of the artist is to take the years of expertise and distill it down into something "enjoyable." The hardest mental hurdle to get over is that people just don't care about the technicals being perfect. Hell, the final product doesn't even need to be beautiful; it just needs to be arresting.
One artist can take months painting a picture of a landscape where everything is perfect. And the next artist can throw 4 colors of paint at a wall. The fact that lots of people enjoy the work of the second artist doesn't invalidate the work of the first. The two artists are focusing on different things; and it's possible for both of them to be successful at reaching their goals.
Let's be clear that it was a success very much in spite of UB, not because of it. And there was still a cost: likely at least hundreds of person-hours (if not more) spent fixing other similar bugs caused by UB.
I worked in gamedev around the time this game was made and this would have been very much an ordinary, everyday kind of bug. The only really exceptional thing about it is that it was discovered after such a long time.
> it doesn't excuse the "undefined behavior (uninitialized local variables)" that I still see all the time despite all the warning and error flags that you can feed to the compiler.
Yeah, but we're talking about a 2004 game that was pretty rushed after 2002's Vice City (and I wouldn't be surprised if the bug in the ingestion code existed there as well and just wasn't triggered, since there were no planes apart from that darn RC chopper and the RC plane from that bombing-run mission). Back then, tooling to spot UB and code smells either didn't exist or was very rudimentary, or the warnings that did come up were just ignored because everything seemed to work.
JSON didn't save them: This is the same studio that handrolled a JSON parser with accidentally quadratic time complexity, making most players wait 3 to 10 minutes to load GTA Online, for 7 years, until a player got tired and found the root cause.
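In the widely shared write-up of that GTA Online investigation, one culprit was that the C runtime's `sscanf` called `strlen` on the entire remaining buffer each time it was invoked, so tokenizing a multi-megabyte JSON string one `sscanf` call at a time became quadratic. A minimal sketch of a single-pass alternative, assuming a hypothetical comma-separated list of integers: `strtol` reports where it stopped through its end pointer, so each call only touches the characters it consumes.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <vector>

// Parses a comma-separated list of integers such as "1,2,3" in a single pass.
// strtol advances `end` past the digits it consumed, so no call rescans
// the rest of the buffer. Returns false on malformed or out-of-range input.
bool parse_int_list(const char* s, std::vector<long>& out) {
    out.clear();
    const char* p = s;
    while (*p != '\0') {
        char* end = nullptr;
        errno = 0;
        long value = std::strtol(p, &end, 10);
        if (end == p || errno == ERANGE) {
            return false;  // no digits at this position, or value out of range
        }
        out.push_back(value);
        p = end;
        if (*p == ',') {
            ++p;                           // skip the separator
            if (*p == '\0') return false;  // trailing comma
        } else if (*p != '\0') {
            return false;  // unexpected character after a number
        }
    }
    return true;
}
```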
You’re not entirely wrong, but a library doesn’t have to be “heavyweight” in order to be bulletproof. And you can load the library during startup and then unload it afterward; it doesn’t have to stick around for the whole run time of the game. Modern OSes will reclaim the pages after you stop using them if there is memory pressure. Of course, I’m sure the PS2 didn’t do that.
Meanwhile, in a certain modern OS, unloading a library is so broken that people are discouraged from doing it... Try to unload GLib [0] from your process :p
Unloading C libraries is fundamentally fraught with peril. It's incredibly difficult to ensure that no dangling pointers to the library remain when it's unloaded. It's really fun to debug, too. The code responsible for the crash literally is not present in the process at the time of the crash!
Definitely, and architectures back then were far less standardized. The Xbox 360 used a big-endian PowerPC CPU, and the PS2 had a custom RISC-based CPU. On the desktop, this was still the era of PowerPC-based Macs. It was far easier (and I would argue safer) to use a standard, portable sscanf-like function with some ASCII text than to figure out how to bake your binaries into every memory and CPU layout combination you might care about.
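The layout problem is concrete: the same four bytes decode to different integers on little-endian x86 and big-endian PowerPC if you `memcpy` them straight into an `int`. A minimal sketch of the usual portable fix, assuming the file format defines its byte order: assemble the value from individual bytes with shifts, which gives the same result on any host.

```cpp
#include <cassert>
#include <cstdint>

// Reads a 32-bit unsigned integer stored in little-endian byte order.
// Building the value from individual bytes makes the result independent
// of the host CPU's endianness.
std::uint32_t read_u32_le(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Same idea for data stored in big-endian byte order.
std::uint32_t read_u32_be(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         | static_cast<std::uint32_t>(p[3]);
}
```

Floats, struct padding, and pointer sizes need similar care, which is part of why parsing plain text was the path of least resistance.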
Easier for internal development. Non- or less technical team members can tweak values without having to rebuild these binary files. Possibly also easier for lightweight modding externally as well.
This isn't that uncommon - look at something like Diablo 2 which has a huge amount of game data defined from text files (I think these are encoded to binary when shipped but it was clearly useful to give the game a mode where it'd load them all from text on startup).
Video games are made by a lot of non-programmers who will be much more comfortable adjusting values in a text file than they are hex editing something.
Besides, the complaint about not having a heavyweight parser here is weird. This is supposed to be "trusted data"; you shouldn't have to treat the file as a threat, so a single-line sscanf that's just dumping parsed CSV attributes into memory is pretty great IMO.
Definitely initialize variables when it comes to C though.
Wow I had no idea YAML was that old. I always thought it was created some time around when CI/CD became popular. Now I'm really curious how it ended up as a superset of JSON.
The flaw isn't the language. The issue is a 0.5x programmer not knowing to avoid sscanf() and failing to default and validate the results. This could be handled competently with strtok() parsing the lines without needing a more complicated file format.
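A minimal sketch of the `strtok()` approach, assuming a hypothetical line of whitespace-separated floats. Each token is converted with `strtof`, whose end pointer shows whether the whole token was numeric; missing or malformed fields keep caller-supplied defaults, and the return value reports how many were valid. Note that `strtok` writes into the buffer it scans and keeps hidden static state, so this works on a copy and is not thread-safe.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Parses up to `count` floats from `line` into `values`, which the caller has
// pre-filled with defaults. Returns how many fields were present and valid;
// anything missing or malformed leaves the default in place.
int parse_floats(const char* line, float* values, int count) {
    std::string copy(line);  // strtok writes '\0' into the buffer it scans
    int valid = 0;
    char* token = std::strtok(&copy[0], " \t\r\n");
    for (int i = 0; i < count && token != nullptr; ++i) {
        char* end = nullptr;
        float v = std::strtof(token, &end);
        if (end != token && *end == '\0') {
            values[i] = v;  // the whole token parsed as a number
            ++valid;
        }
        token = std::strtok(nullptr, " \t\r\n");
    }
    return valid;
}
```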
Worked fine on the target machines, and the "0.5x programmer" got to see their family over the winter holidays. Or are you saying they should have programmed defensively around a bug that would manifest 21 years later and skipped seeing their family during crunch time?
To be honest, I just don't like how you disparaged the programmer out-of-context. Talk is cheap.
Well-written 3rd party serialization libraries weren't exactly easy to come by 20 years ago, at least from what I can recall. Your best bet was using an XML library, but XML parsing was quite resource heavy. Many that seemed well designed turned out to be a security nightmare (Java serialization).
I disagree. JSON is 25 years old, and SAX parsers are 22. A SAX parser is the opposite of “resource heavy”, since it is event driven. The parser does not maintain complex state, although you might have to manage some state in order to correctly extract your objects from the XML. Granted, it wouldn’t have integrated nicely with C to generate the parser code from your struct definition back then, but the basics were there.
But it is even more important for today’s game studios to see and understand the mistakes that yesterday’s studios made. That’s the only way to avoid making them all over again.
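To illustrate the event-driven point without pulling in XML, here is a minimal sketch of SAX-style parsing for a hypothetical `key=value` line format (this is not any real SAX API): the parser keeps no document tree and simply calls a handler for each pair it sees, leaving it to the handler to decide what state to keep.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Handler invoked once per "key=value" line; the parser itself builds no tree.
using PairHandler = std::function<void(const std::string& key, const std::string& value)>;

// Scans the input line by line and emits an event for each well-formed pair.
// Returns the number of non-empty lines skipped because they had no '='.
int parse_pairs(const std::string& text, const PairHandler& on_pair) {
    int skipped = 0;
    std::size_t start = 0;
    while (start < text.size()) {
        std::size_t end = text.find('\n', start);
        if (end == std::string::npos) end = text.size();
        std::string line = text.substr(start, end - start);
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) {
            if (!line.empty()) ++skipped;  // blank lines are ignored silently
        } else {
            on_pair(line.substr(0, eq), line.substr(eq + 1));
        }
        start = end + 1;
    }
    return skipped;
}
```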
And in 2004, didn't have a published specification, or much use outside of webdev (which hadn't eaten the world yet).
> and SAX parsers are 22
And, especially at the time, pretty much exclusive to Java, right?
Put another way, which are the high-quality open-source implementations of those formats that the developers should've considered while working on SA in 2003 and 2004? Or for that matter, in the 2001-2002 timeframe, when the parsing code was probably actually written for use in VC?
I’ll be the first to defend the greybeards I’ve befriended and learned from in AAA, but having seen codebases of that age and earlier, the “meta” around game development was different back then. I think the internet really changed things for the better.
Your average hire for the time might have been self-taught with the occasional C89 tutorial book and two years of Digipen. Today’s graduates going into games have fallen asleep to YouTube lectures of Scott Meyers and memorized all the literature on a fixed timestep.
This is the kind of thing I'd expect from Raymond Chen - which is extremely high praise!
I'm glad they tracked it down even further to figure out exactly why.
Or randomascii. A freaking legend (although he had a heartbreaking streak of bad events... I wish him the best)
What happened to him?
https://randomascii.wordpress.com/2024/10/01/life-death-and-...
https://randomascii.wordpress.com/2016/10/17/vestibular-dysf...
I remember reading this and having a mini-midlife-crisis after every read
I documented it this time :sigh: https://github.com/MatthewJohn/terrareg/commit/2231ba733a7f5...
I moved in with my SO last weekend after being together for multiple years; reading the blog post about his wife dying in the span of 9 weeks really sent chills down my spine :(
I need a hug.
Damn. That was so painful to read. He's such a talented and respectable guy. Wish him all the best moving forward.
So sad :(
Raymond is a wizard. Read his blogs for many years and love his style and knowledge.
He's a total legend, yet apparently he's never met Bill Gates in person, from what he said in an interview on the Dave's Garage YouTube channel a few years ago. You'd think that someone who's been that prominent in the company for so long would have been invited to a company dinner where Gates was present or something.
Microsoft's a big company, and billg "stepped down" in 2000. Raymond is still working, so they overlap less than may appear.
He has stories on his blog about Windows 2 IIRC, so there was an overlap from a time when they were still relatively small. So I think it's a bit odd they never talked or met.
Small thing but I love the effort he puts into actually coding up his examples instead of screenshots. For example: https://devblogs.microsoft.com/oldnewthing/20250414-00/?p=11...
He has many better ones but that's the latest one I've seen
Raymond knows everything. From microcode bugs on Alpha AXP to template meta programming to UI.
I wonder how many times a Deloitte, PwC, KPMG, Bain, EY, McKinsey, or BCG consultant naively tried putting him on a shortlist for being “impacted” over the years because he was in the Top X of a spreadsheet sorted on Y.
"Look this guy's job seems to be mainly writing blog posts. We could replace that with AI and get it to regularly pitch the new Visual Enshitify 2.0 product launch as a bonus. Win win win!"
IMHO, if something isn’t part of the contract, it should be randomized. Eg if iteration order of maps isn’t guaranteed in your language, then your language should go out of its way to randomize it. Otherwise, you end up with brittle code: code that works fine until it doesn’t.
There are various compiler options, like -ftrivial-auto-var-init, that initialize otherwise-uninitialized local variables to a fixed pattern or to zero, but overall, randomizing (or zeroing) the full contents of the stack on every function call would be a horrendous performance regression and isn't done for this reason.
There are fast instructions (e.g., REP STOSx, AVX zero stores, dc zva) and tricks (MTE, zero pages), but no magic CPU instruction exists that transparently and efficiently randomizes or zeros the stack on function calls. You'd think there would be one, and I bet there is on some specialized high-security systems, but I'm not even sure where you would find such a product. Telecom certainly isn't it.
There are proposed CPU architectures that work that way, like the Mill <https://millcomputing.com/>. Whereas most CPUs support multiple calling conventions, the Mill enforces a single calling convention in hardware. There is a hardware `call` instruction that does all the work directly, along with a corresponding `ret` instruction for returning from a function call. It also uses its equivalent of the TLB to ensure that each function is only granted permission to read from the portion of the stack that contains its arguments; any attempt to read outside that region results in a permission error that causes the read to return a NaR (Not a Result, akin to a floating point NaN).
As an additional protection, new stack frames are implicitly zeroed as they are created. I assume this is done by filling the CPU cache with zeros for those addresses before continuing to execute the called function. No need to wait for actual zeros to be written to main memory.
https://millcomputing.com/wiki/Protection#Protecting_Stacks
This is really interesting—how do stack references work in this design?
You couldn't do random, but, with a predictable performance hit to memory, cache, and write-line use, stack addresses COULD be isolated per program, per library, etc.
It'd be expensive though; every context switch would require its own stack and pushing/restoring one more register. There's GOOD reason programs don't work that way and are supposed to not rely on values outside of properly initialized (and not later clobbered) memory.
It should be efficient though, that's the point. Specialized hardware or instructions should be able to zero the stack in a single cycle, instead it's much more expensive. Of course the problem with this is it could be used to hide things just as easily, making it impossible to reverse engineer an unknown exploit.
Why would a specialized instruction be necessary? 'the stack' is stored in memory just like everything else.
What's expensive is the (very slow, for modern CPUs) operation of _writing_ that change out to memory, which is distant and slow compared to the speed the CPU operates at, as well as the overhead of synchronizing that write with any other caches of those memory locations.
Maybe you're thinking of the trick where a brand-new page of mapped memory is 'zeroed' but is in reality just a special all-zeros page in the virtual-to-physical memory lookup table? Those still need to be zeroed by real writes at some point, if they're ever used.
CPUs already special-case xor reg,reg as zeroing out the register, breaking any data dependency on it. If zeroing bits of the stack were common enough, I'd believe CPUs could be made to handle it efficiently (they already special-case the stack with push/pop).
I'm a bit distant from this stuff, but it looks like C++26 will have something like -ftrivial-auto-var-init enabled by default. See the "safe by default" section of [1].
For reference, the actual proposal that was accepted into C++26 is [2]. It discusses performance only in general, and it refers to an earlier analysis [3] for more details. This last reference describes regressions of around 0.5% in time and in code size. Earlier prototypes suggested larger regressions (perhaps even "horrendous") but more emphasis on compiler optimizations has brought the regression down considerably.
Of course one's mileage may vary, and one might also consider a 0.5% regression unacceptable. However, the C++ committee seems to have considered this to be an acceptable tradeoff to remove a frequent cause of undefined behavior from C++.
[1]: https://herbsutter.com/2024/08/07/reader-qa-what-does-it-mea...
[2]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p27...
[3]: https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2723r1...
Microsoft's Visual C++ compiler has the /Ge compiler option (see https://learn.microsoft.com/en-us/cpp/build/reference/ge-ena...), which has been deprecated since Visual C++ 2005.
This compiler option causes the compiler to emit a call to a stack probe function to ensure that a sufficient amount of stack space is available.
Rather than just probing once for each stack page used, you can substitute a function that *FILLS* the stack frame with a particular value, something like 0xBAADF00D, and you could even set that value to anything you wanted at runtime.
This would get you similar behaviour to gcc/clang's -ftrivial-auto-var-init
Windows has started to auto-initialize most stack variables in the Windows kernel and several other areas.
See https://web.archive.org/web/20200518153645/https://msrc-blog...
Randomization at this level would be too expensive. There are tools that do this for debug purposes, and your stuff runs a lot slower in that mode.
I had to Google to find the tidbit that I read about Perl years ago. I think this will affect iteration order of dicts.
It probably shouldn’t be a “release” thing. Actually, certainly not. I do wonder how many bugs would never have seen the light of day if someone’s “set” that actually turned out to be a sequence (i.e. allowed duplicate values) had made a debug build raise an assert.
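A minimal sketch of that idea, assuming a hypothetical `make_set` helper that wraps a collection the caller promises is duplicate-free: the check lives inside `assert`, so debug builds catch a "set" that is really a sequence, while release builds (with `NDEBUG` defined) compile the check away entirely.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Returns true if `values` contains any repeated element.
template <typename T>
bool has_duplicates(const std::vector<T>& values) {
    std::set<T> seen;
    for (const T& v : values) {
        if (!seen.insert(v).second) return true;  // insert fails on a repeat
    }
    return false;
}

// Hypothetical helper for data that is supposed to be a set.
// assert() and its argument are compiled out when NDEBUG is defined.
template <typename T>
std::vector<T> make_set(std::vector<T> values) {
    assert(!has_duplicates(values) && "expected unique values");
    return values;
}
```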
Debug builds are worthless for catching issues. How many people actually run them? Perhaps developers run debug builds of individual binaries they're working on when they're trying to repro a bug, but my experience at every company of every size and position in the stack (including the Windows team) is that no one does their general purpose use on a debug build.
Especially in games, it’s common for only the highly optimised release builds to have playable performance.
Yeah, even their integration tests will probably run in opt mode.
Regarding contracts, there's an additional lesson here, quoting from the source:
> This is an interesting lesson in compatibility: even changes to the stack layout of the internal implementations can have compatibility implications if an application is bugged and unintentionally relies on a specific behavior.
I suppose this is why Linux kernel maintainers insist on never breaking user space.
But the Linux equivalent here would be glibc, not the kernel.
Nope. You have to remember https://www.hyrumslaw.com/
If you promise randomization, then somebody will depend on that :)
And then you can never remove it!
Semi-related: this type of thing is actually covered in Google's Site Reliability Engineering book. They highlighted a case of a system that outperformed its SLO, so people came to depend on it having 100% uptime. They "fixed" this by injecting failures to bring it closer to its SLO, forcing downstream engineers to deal with the fact that the services they depended on would sometimes fail for no reason.
I know it's easier said than done everywhere, just found it to be an interesting parallel.
> If you promise randomization
You don't. You say the order is undefined.
That isn't the point. In practice, if you provide randomness, it will be depended upon.
Why is that? Is that just bad coding habits?
All of this is bad coding habits. That's why we're here.
You can randomly not randomise it :)
One might argue that one of the advantages of languages like C is that you only pay for the features you choose to use, with no unnecessary overhead like initializing unused variables.
You can pay for those features in debug mode or in chaos monkey mode. It's okay to continue to not pay for them in release mode. Heck, Rust has this approach when it comes to handling integer overflow - fully checked in debug mode, silent wraparound in release mode.
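A rough C++ counterpart of that debug/release split, assuming GCC or Clang (`__builtin_add_overflow` is a compiler builtin, not standard C++): the builtin always stores the two's-complement wrapped result and reports whether overflow occurred, and the `assert` that turns overflow into a hard failure disappears when `NDEBUG` is defined. A plain signed `a + b` would be undefined behavior on overflow, which is why the builtin is used instead.

```cpp
#include <cassert>
#include <cstdint>

// Adds two 32-bit ints: checked in debug builds, wrapping in release builds.
// Requires GCC or Clang for __builtin_add_overflow.
std::int32_t add_i32(std::int32_t a, std::int32_t b) {
    std::int32_t result;
    bool overflowed = __builtin_add_overflow(a, b, &result);
    assert(!overflowed && "integer overflow");  // compiled out under NDEBUG
    (void)overflowed;                           // avoid an unused warning in release
    return result;                              // wrapped value if it overflowed
}
```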
In Ada you can pay for integer overflow checks (runtime) if you want to. With Ada SPARK you can prove that your code does not contain integer overflows so that you don't need runtime checks.
And you can disable these checks with a flag when it comes to Ada, and yeah, with SPARK, none of it happens at runtime.
Check the table at https://docs.adacore.com/spark2014-docs/html/ug/en/usage_sce..., look for "SPARK builds on the strengths of Ada to provide even more guarantees statically rather than dynamically.".
More reading:
https://docs.adacore.com/spark2014-docs/html/ug/en/tutorial....
https://learn.adacore.com (many books for learning Ada and SPARK) available in PDF, EPUB, and HTML format.
However, the compiler does not tell you this. We're back to the problem that it's possible to have a "working" C program that relies on UB and will therefore break at some point, but the tools will not yell at you for doing this. Whereas in Java or C# you get warnings or errors for using maybe-uninitialized variables.
Also, scanf should be deprecated. Terrible API. Never use scanf or sscanf etc. We managed to get "gets()" deprecated, time to spread that to other parts of the API.
atoi() or atof() etc. work OK, but really you need a parser.
I agree, this can also detect brittle tests (e.g., test methods/classes that only pass if executed in a particular order). But applying it to all data could be computationally expensive.
Not really the ethos of C(++), though of course this particular bug would be easily caught by running a debug build (even 20 years ago). However, this being a game, "true" debug builds were probably too slow to be usable. That was at least my experience doing gamedev in that timeframe. Then again, code holding up for 20 years in that line of biz is more than sufficient anyway :)
When I was doing gamedev about 5 years ago, we were still debugging with optimisation on. You get a class of bugs just from running in lower frame rates that don't happen in release.
I once updated a little shy of 1mloc of Perl 5.8 code to run on Perl 5.32 (ish). There were, overall, remarkably few issues that cropped up. One of these issues (that showed itself a few times) was more or less exactly this: the iteration order through a hash is not defined. It has never been defined, but in Perl 5.8 it was consistent: for the same insertion order of the same set of keys, a hash would always iterate in the same way. In a later Perl it was deliberately randomised, not just once, but on every iteration through the hash.
It turned out there were a few places that had assumed a predictable (not just stable, but deterministic) hash key iteration order. Mostly this showed up as tests that failed 50% of the time, which suggested to me that how annoying an error is to track down is roughly inversely correlated with how often it appears in tests.
(Other issues were mostly due to the fact that Perl 5 is all but abandoned by its former community: a few CPAN modules are just gone, and some are so far out of date that they can't be coerced into working with other modules that have been updated over time.)
At booking.com? :)
Aren't you just creating another contract? Users might write code that depends on it being random.
Maybe it would be good to change all non promised things between releases. So that such unwritten rules never become something users rely upon.
For those users, do this instead: https://xkcd.com/221/
Then you are wasting runtime clock cycles randomizing lists.
Not necessarily; you can do a thing where it's randomized during development, testing and fuzzing but not in production builds or benchmarks so that the obvious "I rely on internal map order" bugs are spotted right away.
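A minimal sketch of that setup in C++, assuming a hypothetical `for_each_entry` helper used wherever code walks an `std::unordered_map`: builds without `NDEBUG` visit the entries in a freshly shuffled order on every call, so code that quietly relies on iteration order tends to fail in testing, while release builds iterate directly with no extra cost.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <unordered_map>
#include <vector>

// Hypothetical helper: visits every entry of `map`. Builds without NDEBUG
// shuffle the visiting order on every call so order-dependent code breaks
// early; release builds walk the container directly.
template <typename K, typename V, typename Fn>
void for_each_entry(const std::unordered_map<K, V>& map, Fn fn) {
#ifndef NDEBUG
    using Entry = typename std::unordered_map<K, V>::value_type;
    std::vector<const Entry*> entries;
    entries.reserve(map.size());
    for (const Entry& entry : map) entries.push_back(&entry);
    static std::mt19937 rng(std::random_device{}());
    std::shuffle(entries.begin(), entries.end(), rng);
    for (const Entry* entry : entries) fn(entry->first, entry->second);
#else
    for (const auto& entry : map) fn(entry.first, entry.second);
#endif
}
```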
You can get it pretty much for free by using a random salt with your hash function. This is also useful for avoiding DOS attacks using deliberate hash collisions to trigger quadratic behavior in your hash tables.
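A minimal sketch of a salted hasher for `std::unordered_map`, assuming one random seed per process: mixing the salt in changes bucket placement, and therefore iteration order, from run to run. The mixing step (XOR with `std::hash`, then the SplitMix64 finalizer) is only illustrative; keys whose `std::hash` values already collide still collide here, which is why real hashdos mitigations use a keyed hash such as SipHash over the whole key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <string>
#include <unordered_map>

// One random 64-bit salt per process, chosen on first use.
inline std::uint64_t process_salt() {
    static const std::uint64_t salt =
        (static_cast<std::uint64_t>(std::random_device{}()) << 32) ^
        static_cast<std::uint64_t>(std::random_device{}());
    return salt;
}

// Illustrative salted hasher: XORs the salt into std::hash's output, then
// scrambles the bits with the SplitMix64 finalizer. Not a keyed cryptographic hash.
template <typename T>
struct SaltedHash {
    std::size_t operator()(const T& value) const {
        std::uint64_t z = static_cast<std::uint64_t>(std::hash<T>{}(value)) ^ process_salt();
        z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
        z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
        return static_cast<std::size_t>(z ^ (z >> 31));
    }
};

// Map type whose bucket layout (and so iteration order) can differ per run.
template <typename K, typename V>
using SaltedMap = std::unordered_map<K, V, SaltedHash<K>>;
```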
Any sane language would design a list iterator to follow the order of the list. No, the difference is when you're iterating over orderless hash-based sets or maps/dictionaries. Many languages choose to leave the iteration order undefined. I think Python did that up to a point, but afterward they defined dictionaries (but not sets) to be iterated over in the order that keys were added. Also, some languages intentionally randomize the order per program run, to avoid things like users intentionally stuffing hash tables with colliding keys.
> Also, some languages intentionally randomize the order per program run, to avoid things like users intentionally stuffing hash tables with colliding keys.
Most modern languages do that as part of hashdos mitigation. Python did that until it switched to a naturally ordered hashmap, then made insertion order part of the spec. Importantly, iteration order remains consistent within a process (possibly on a per-hashmap basis).
Notably, Go will randomise the starting point of hashmap iteration on each iteration.
Best change ever, that. Now it would also be nice if sets were ordered too.
I am pretty sure there are trivial impls of sets with guaranteed iteration order in Python that use an underlying ordered map and a dummy value in each entry.
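The same idea carries over to other languages. A minimal C++ sketch of an insertion-ordered set (a hypothetical `OrderedSet`, not a standard type, and structured differently from the Python dict trick): a hash set answers membership queries and a vector remembers insertion order, so iteration order is defined rather than incidental. Removal is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Set whose iteration order is the order in which elements were first inserted.
template <typename T>
class OrderedSet {
public:
    // Returns true if the value was newly inserted.
    bool insert(const T& value) {
        if (!seen_.insert(value).second) return false;  // already present
        order_.push_back(value);
        return true;
    }
    bool contains(const T& value) const { return seen_.count(value) != 0; }
    std::size_t size() const { return order_.size(); }
    typename std::vector<T>::const_iterator begin() const { return order_.begin(); }
    typename std::vector<T>::const_iterator end() const { return order_.end(); }

private:
    std::unordered_set<T> seen_;  // membership checks
    std::vector<T> order_;        // insertion order for iteration
};
```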
iirc, Go intentionally randomizes map ordering for just this reason.
Yep, and then you get crash reports you can’t reproduce.
Same can be said about pointer addresses (random for each run). But ASLR exists for a specific reason.
> Not ignore the compilation warnings – this code most likely threw a warning in the original code that was either ignored or disabled!
What compiler error would you expect here? Maybe not checking the return value from scanf to make sure it matches the number of parameters? Otherwise this seems like a data file error that the compiler would have no clue about.
With g++ 11.4, there's no warning by default if you don't check the return value of sscanf. Even `g++ -Wall -Wextra -Wunused-result` produces no warnings for a small example.
Undefined behavior to access the uninitialized memory. A sanitizer would have flagged that.
The compiler has no way of knowing that the memory would be undefined, not unless it somehow can verify the data file. The most I think it can do is flag the program for not checking the return value of scanf, but even that is unlikely to be true since the program probably was checking for end of file which is also in the return value. It was failing to check the number of matched parameters. This is the kind of error that is easy to miss given the semantics of scanf.
> The compiler has no way of knowing that the memory would be undefined
Yes it would. -fsanitize=address does a bunch of instrumentation - it allocates shadow memory to keep track of what main memory is defined, and it checks every read and write address against the shadow memory. It is a combination of compile-time instrumentation and run-time checking. And yes, it is expensive, so it should be used for debugging and not the final release.
https://clang.llvm.org/docs/AddressSanitizer.html , https://learn.microsoft.com/en-us/cpp/sanitizers/asan?view=m...
I tried this with clang ASAN. Nothing happens. It won't catch this bug. ASAN detects the presence of incorrect behavior, not the absence of correct behavior.
There's no use-after-free, use-after-return, use-after-scope, or OOB access here. It's a case of "an allocated stack variable is dynamically read without being initialized only in a runtime case," which afaik no standard analyzer will catch.
The best way to identify this would be to require all locals to be initialized as a matter of policy (very unlikely to fly in a games studio, especially back then, due to the perceived performance overhead) or to debug with a form of stack initialization enabled, like "-ftrivial-auto-var-init=pattern" which while it doesn't catch the issue statically, does make it appear pretty quickly in QA (I tested).
Thanks for the investigation. Oops, it seems like MSan (memory sanitizer) is the appropriate tool that detects uninitialized reads? https://stackoverflow.com/questions/68576464/clang-sanitizer...
I only use UBSan and ASan on my own programs because I tend not to make mistakes about initialization. So my knowledge is incomplete with respect to auditing other people's code, which can have different classes of errors than mine.
Thank goodness that every language that is newer than C and C++ doesn't repeat these design mistakes, and doesn't require these awkward sanitizer tools that are introduced decades after the fact.
This codebase predates ASAN by the best part of a decade.
You both may be right. It could be that ASAN is not instrumenting scanf (or some other random standard lib function). Though since 2015, it certainly has been. https://github.com/google/sanitizers/issues/108
The simpler policy of "don't allow uninitialized locals when declared" would also have caught it with the tools available when the game was made (though a bit ham-fisted).
The problem is that after calling scanf(), the number of variables that are defined is itself variable. For example, with a call like `n = scanf(..., &x, &y, &z)`, at compile time you can make no inferences about which of x, y, and z are defined, because that depends on the returned value n. There are many ways to branch out from this.

One is to insist on definite assignment: if we cannot prove all of them are always assigned, then we treat them as "possibly undefined" and err out.
Another way is to avoid passing references and instead allow multiple returns, like Python (this is pseudocode): `x, y, z = scanf(...)`.
In that case, if the hypothetical `scanf()` returns a tuple with fewer or more than 3 elements, the unpacking will fail at run time and crash exactly at that line.

Another way is like Java, which insists that the return value is a scalar, so it can't do what C and Python can do. This can be painful for the programmer, of course.
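A rough C analogue of the multiple-return approach is returning a small struct by value. In this sketch the `Xyz` type and `parse_xyz` are hypothetical names, and the result is all-or-nothing: every field is zeroed unless exactly three integers were read, so a caller never sees leftover stack contents.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical all-or-nothing result: either all three values, or ok == false. */
typedef struct {
    bool ok;
    int x, y, z;
} Xyz;

Xyz parse_xyz(const char *line)
{
    Xyz r = {0};            /* every field defined before anything can fail */
    int x, y, z;            /* only read below if sscanf wrote all three */
    if (sscanf(line, "%d %d %d", &x, &y, &z) == 3) {
        r.ok = true;
        r.x = x;
        r.y = y;
        r.z = z;
    }
    return r;               /* on failure: ok == false, x == y == z == 0 */
}
```

Unlike the Python version, a short read doesn't crash here; the caller gets an explicit `ok` flag it has to look at before using the values.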
I interpret "don't allow uninitialized locals when declared" as meaning that a call like `scanf(..., &x, &y, &z)` on freshly declared x, y, and z would be caught, because it takes references to uninitialized variables. To be allowed, the programmer would have to initialize the variables beforehand.

Then people would complain about the wasteful initialisation of out-params. Foolishly, perhaps.
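A minimal sketch of that policy, using a deliberately outrageous sentinel (the `NOT_READ` name and its value are assumptions for illustration). If `sscanf` stops early, the untouched out-params keep a value that is easy to spot in a debugger or a QA log, instead of whatever the stack happened to hold.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical sentinel: a value no valid data file should ever contain. */
#define NOT_READ INT_MIN

/* Parses up to three ints; any field sscanf does not reach stays NOT_READ.
 * Returns the number of fields converted (or EOF on empty input). */
int read_three(const char *line, int out[3])
{
    out[0] = out[1] = out[2] = NOT_READ;   /* initialise before passing pointers */
    return sscanf(line, "%d %d %d", &out[0], &out[1], &out[2]);
}
```

The cost is three stores per call, which is the "wasteful initialisation" being debated; whether that matters depends on how hot the call site is.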
I think it would make sense to have a keyword that permits unsafe instantiation specifically for the edge cases where initialization is too expensive. But I think it makes sense for the lazy case to be a little bit safer.
The idea is that ASAN would replace scanf with a function that does additional bookkeeping when writing to whatever arbitrary memory location the inputs dictate at runtime.
It's probably what the PR resolving the issue I linked to does, though I didn't check.
Uninitialized variables are a really common case.
The pointer to the uninitialized variable is passed to scanf, which writes a value there unless it encounters an error. The compiler cannot understand this contract from the scanf declaration alone.
Yeah, the debugging here is great, but the actual cause is super mild.
Good point. When reading, I kind of just assumed the "use of uninitialised memory" warning would pick this up.
But because the whole line is parsed in a single sscanf call, the compiler's static analysis is forced to assume they have now been initialised. There doesn't seem to be any generic static analysis approach that can catch this bug.
Though... you could make a specialised warning just for scanf that forced you to either pass in pre-initialised values or check the return result.
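One way to approximate that with existing tooling, assuming GCC or Clang: wrap the parse in a function marked `__attribute__((warn_unused_result))`, so any call site that ignores the match count gets a compiler warning. The wrapper name is hypothetical, and the attribute is a GNU extension rather than standard C (C23 adds `[[nodiscard]]` for the same purpose).

```c
#include <stdio.h>

/* GNU extension (also honoured by Clang); elsewhere this expands to nothing. */
#if defined(__GNUC__)
#define MUST_CHECK __attribute__((warn_unused_result))
#else
#define MUST_CHECK
#endif

/* Hypothetical wrapper: returns 1 only when both fields matched.
 * A bare `parse_pair(s, &a, &b);` that discards the result
 * triggers -Wunused-result under GCC/Clang. */
MUST_CHECK int parse_pair(const char *s, float *a, float *b)
{
    return sscanf(s, "%f %f", a, b) == 2;
}
```

This doesn't force pre-initialisation, but it does make "forgot to check how many fields matched" a diagnostic rather than a silent bug.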
I always enjoy reading deeply technical writeups like these. I only wonder how much more rare they may or may not get in the AI era.
I don't think they will get more rare; there will always be a top % of engineers that do deep dives. I hope anyway.
But AI won't replace them, nor did the past 50+ years of software development innovation. There are millions (tens of millions?) of developers using higher-level languages who don't know the difference between the stack and the heap beyond maybe some theory they half remember from school, but they don't care because they don't have to think about it in their day job.
If your whole career will be using higher-level languages with very little data stored on the stack (vs the heap), why should those programmers care? It seems like a normal progression toward more abstraction in the tools that we use. Similarly, I have programmed a lot of C and C++ in my career and I have never once needed assembly language. (I am expecting someone to pop in the convo here and tell me about how I am a terrible C/C++ programmer because I don't know any assembly.)
"Why should I care" is an awful catchphrase.
I think the shift will be from craftsmen to tradesmen for general software engineers, but write-ups like these stem from an artisan style all its own.
We have been seeing this shift for a while, where "software engineers" graduate from 3 month bootcamps. Except now most likely they will not be earning 500k making crud apps.
And that's a good thing.
I call bullshit. What 3mo bootcamp grads were earning 500k writing CRUD apps? Zero.
What about the incredible front end Devs that only know JS/CSS/HTML? They can still be true craftspeople in their art, be it cross-browser/platform issues or performance tweaking.
Compare Python devs of today to Fortran devs of the 60s. Something like that distance. Maybe more. But the trend isn't new.
I'm more curious in what changed with the critical section locking/unlocking implementation in this version of Windows!
It looks like the utilized stack, or a stack protection area, increased.
When I worked at Microsoft and I had downtime I would sometimes read the code for app compatibility shims out of pure curiosity.
Win9x video games that made bad assumptions about the stack were a theme I saw. One of the differences between win9x and NT-based Windows is that kernel32 (later kernelbase) is now a user-mode wrapper atop ntdll, whereas in the olden days kernel32 would trap directly into the kernel. This means that kernel32 uses more user-mode stack space in NT. A badly behaving app that stored data to the left of the stack pointer and called into kernel32 might see its data structures clobbered in NT and not in 9x. So there were compatibility hacks that temporarily moved the stack pointer for certain apps.
I wonder how many people think of the call stack as running left to right, most recent return first, rather than top to bottom, likewise? If you stare at enough hex dumps, it makes perfect sense.
What was the testing like for such bugs? Is it somehow automated, or is there a lengthy doc describing the manual testing steps, or are there no tests at all?
I interned with the AppCompat team shortly before the release of Windows XP, which was huge for them as it was the first Windows for consumers on the NT kernel.
IIRC, they had a significant lab and tons of infrastructure for exercising and identifying compatibility issues in thousands of popular and less popular software packages. It all got distilled into a huge database of app fingerprints and corresponding compatibility shims to be applied at runtime.
I don't know. I wasn't on the team doing this. I was just looking at the source tree.
For anyone with access issues
https://web.archive.org/web/20250423144746/https://cookieplm...
Okay, but why did `LeaveCriticalSection` change? Compiler changes, new features, refactoring, etc? That’s the most interesting part (and absent)!
Am I the only one to be annoyed by this...?
while (this->m_fBladeAngle > 6.2831855) { this->m_fBladeAngle = this->m_fBladeAngle - 6.2831855; }
Like, "let's just write a while loop that could turn into an infinite loop coz I'm too lazy to do a division"
I want to assume that the GTA developers did this hack because it was faster than floating-point division on the PlayStation 2 or something.
But knowing they managed to add 5 minutes to GTA5's loading time just by parsing JSON with sscanf, I don't have much hope.
IIRC the whole parsing performance issue was because the original code was written for the SP campaign of GTA5 that only had a handful of objects to parse data for. That was barely a blip in terms of performance impact and AFAIK was written years before GTAOnline was made (where it became an issue - and even then only became an issue much after GTAOnline was first made).
Writing simple code that works with the data you expect to have, without bothering with optimizations, is fine. If anything, optimizing it would be one of the genuine cases of "premature optimization": even with profiling, no real time is spent in that code, your data won't make it spend any, and you should avoid wild guesses since chances are you'll be wrong (even if in this case the guess would have been correct, it'd be like a broken clock that always says 13:37).
The actual issue with that code was that, after they reused it for GTAOnline and it started becoming a performance issue over time as they added more objects, nobody thought to try and see what was wrong.
Are you actually arguing that using a JSON parser for JSON-formatted data is a premature optimization? The solution here was to use a different format, not a somewhat-JSON-compatible hacked together parser.
They were not the only ones to make that mistake; e.g. rapidjson had to fix the same error. Few people expect parsing one token out of sscanf to strlen the entire input (and on top of that, there are C++ APIs which call sscanf under the hood).
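A sketch of a linear alternative, assuming a hypothetical comma-separated list of integers. Calling `sscanf` once per token can, on some C library implementations, measure the whole remaining string on every call and so turn the loop quadratic. `strtol`, by contrast, reports where it stopped through its end pointer, so the buffer is walked exactly once.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical format: comma-separated integers, e.g. "1,2,3".
 * strtol reports where it stopped via `end`, so each character is
 * visited once and the whole parse stays O(n). Stops after `max`
 * values. Returns the number of values stored, or -1 on malformed
 * input or overflow. */
int parse_csv_ints(const char *s, long *out, int max)
{
    int count = 0;
    while (*s != '\0' && count < max) {
        char *end;
        errno = 0;
        long v = strtol(s, &end, 10);
        if (end == s || errno == ERANGE) {
            return -1;                 /* no digits here, or out of range */
        }
        out[count++] = v;
        if (*end == ',') {
            end++;                     /* step over the separator */
        } else if (*end != '\0') {
            return -1;                 /* unexpected character */
        }
        s = end;
    }
    return count;
}
```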
The second error of deduplicating values by linearly scanning an array was way more egregious.
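Deduplicating by linearly scanning an array means each new item is compared against every stored one, which is O(n²) overall. One standard fix is to sort once and then drop adjacent duplicates, for O(n log n); a hash set is the other usual choice. The sketch below assumes, as a simplification, that items reduce to integer keys, and the function names are hypothetical.

```c
#include <stdlib.h>

/* Three-way comparison that cannot overflow (unlike `a - b`). */
static int cmp_u64(const void *pa, const void *pb)
{
    unsigned long long a = *(const unsigned long long *)pa;
    unsigned long long b = *(const unsigned long long *)pb;
    return (a > b) - (a < b);
}

/* Sorts `keys` in place and compacts unique values to the front.
 * Returns the number of unique keys. O(n log n) instead of O(n^2). */
size_t dedupe_keys(unsigned long long *keys, size_t n)
{
    if (n == 0) {
        return 0;
    }
    qsort(keys, n, sizeof keys[0], cmp_u64);
    size_t unique = 1;
    for (size_t i = 1; i < n; i++) {
        if (keys[i] != keys[unique - 1]) {
            keys[unique++] = keys[i];
        }
    }
    return unique;
}
```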
The real, systemic error is that dozens(?) of engineers worked on that product, supposedly often testing the online component and experiencing that wait time first hand; and none thought "wait, parsing JSON doesn't take that long, computers are fast! what's going on?"
I think someone estimated that error cost them millions in revenue? I'm pretty sure a fraction of that could afford an engineer who knows how fast computers ought to be.
GTA was never my wheelhouse, but from what I gathered GTA Online didn't have that much support, and since it was only the initial loading time, and it would have increased over time as the shop content increased, and a very fast machine (e.g. a dev machine) would have had less of an issue, the engineers working on it were probably not that incentivised to dig into it.
Like, even though it's pretty critical to the initial user experience, initial loading time is generally what gets disregarded the most.
> I'm pretty sure a fraction of that could afford an engineer who knows how fast computers ought to be.
It can, if someone cares enough or realises it's an issue, and then someone is motivated enough to dig into it, or has the time to.
I'm willing to bet it was done for performance reasons: subtraction is cheaper than floating-point division. Probably the compiler also has some tricks to optimize this further.
There is absolutely no way this could turn into an infinite loop. It could underflow, but for that to happen the angle would have to be less than 2*pi, therefore exiting the loop.
The article discusses how that turns into an infinite loop and causes a hang.
When you subtract a small float from a very large float, the value doesn't change. This is because the "steps" between float values increase with the size of the value (i.e. floats have coarser resolution for larger magnitudes)
To see this in action, try running the following in a JavaScript interpreter:
console.log(1_000_000_000_000_000_000 - 1);
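The same effect in C with 32-bit `float`, which is presumably what `m_fBladeAngle` is given its `f` prefix. This sketch checks whether one iteration of the wrap loop can still make progress. It assumes IEEE 754 single precision with round-to-nearest and no excess precision (as on x86-64 with SSE). Under those assumptions, adjacent floats above 2^27 are 16 apart, so subtracting roughly 6.28 rounds straight back to the original value.

```c
#include <stdbool.h>

#define TWO_PI_F 6.2831855f   /* the constant from the loop above */

/* Returns true if `angle - 2*pi` rounds back to `angle`, i.e. the
 * while loop would spin forever on this value. */
bool wrap_step_is_stuck(float angle)
{
    float next = angle - TWO_PI_F;
    return next == angle;
}
```

At exactly 2^27 the loop still moves (the floats just below it are only 8 apart), but any float above 2^27, about 1.34e8, is stuck.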
But that’s “impossible”. It’s an angle between 0 and 2pi. When transformed it might go over a bit so they added the check.
It will “never” become big.
So why check? It’s unnecessary.
Thus the bug.
If m_fBladeAngle is really large (above 2^27 ≈ 1.34e8, where adjacent floats are 16 apart), the subtraction will have no effect, and that would be an infinite loop.
Long shot, but maybe if the value is small, then this loop could be faster than division.
If the code runs every frame, it's probably always small and does just one iteration once in a while when it wraps over the value.
for real. The author clearly never heard of fmod
fmod takes on the order of 30+ cycles, probably more on 2003-era CPUs, vs 1 each for cmp, sub, and jmp.
Sure the lower bound is nicer here. But when the tradeoff includes an unlimited upper bound it's not a very attractive option.
I guess the most robust code handling both performance and unexpected input would be one iteration of this (leveraging the assumption that angles are either always within the bounds, or had one frame of going out of bounds by a small amount); followed by a fmod if that assumption is just totally off.
> all these findings prove that the bug is NOT an issue with Windows 11 24H2, as things like the way the stack is used by internal WinAPI functions are not contractual and they may change at any time, with no prior notice.
This reminds me of an excellent article I read a while back, the gist of it was that, given sufficient success, there's no such thing as a private API.
Could you please find this article and link it here. I'm curious about the arguments.
I know there’s an XKCD comic about this
Just bring back spacebar heating
https://xkcd.com/1172/
Much love to Silent, who’s been improving my favorite game for over... a decade now?
My takeaway, speaking as someone who leans towards functional programming and immutability, is "this is yet another example of a mutability problem that could never happen in a functional context"
(so, for example, this bug could never have happened in Rust unless it was deeply misused)
This is more of a problem of the C/C++ standard that it allows uninitialized variables but doesn't give them defined values, considering it "undefined behavior" to read from an uninitialized variable. Java, for example, doesn't have this particular problem because it does specify default values for variables.
But it's this and many other features of C/C++ that make it faster than Java. C/C++ developers really don't want to "pay" for safety.
Though, I really like the _mm_undefined_ps() intrinsic for SSE, which makes it clear that you're purposefully not initialising a variable. Something like that for ints and floats would be pretty sweet.
It is definitely not the case that magically safer is slower. IMO too often the attitude from WG21 (the c++ language committee) has been "Some fast things are unsafe, therefore if we make our language more unsafe it will go faster" which... that's not how implication works.
As a very high level example, take sorting. Rust's standard library provides you both a stable and unstable sort, as does your C++ standard library.
The C++ standard promises these sorts have O(n log n) performance, it's unclear in modern C++ if having a nonsensical ordering† is Undefined Behaviour (as it was in older versions) or outright IFNDR (much worse than UB) but the real world effect will be similar anyway
Rust promises that these sorts work as expected, if you provide nonsensical ordering, obviously it can't very well "sort" things the way you asked, but we don't need to kill your neighbour's cats and wipe the hard disk either, so, it will either give you back the same things in... some order or it will report the fatal error in your software.
The Rust option here is clearly much safer right? So, how much performance is this costing? Actually, it's faster. So C++ is choosing slower and worse. What's the upside?
† For example what about if I insist that Red < Green, but also Green < Red, and furthermore Red == Green is true, but so is Red != Green, however neither Green == Red nor Green != Red are true!
Statically proving the variables get initialized wouldn't change the performance except by making sure you check the return value of sscanf, or turning refusal to check into a couple register wipes. Either way, that's a negligible increase to a hefty function call. It wouldn't require default initializing variables in all circumstances.
When I think of the "no runtime cost" mentality of C/C++ I don't think that normally extends to ignoring errors in I/O functions.
And yet, there is a good chance that C++ will start doing exactly this [1]. Because [2]:
> The performance impact is negligible (less than 0.5% regression) to slightly positive (that is, some code gets faster by up to 1%). The code size impact is negligible (smaller than 0.5%). Compile-time regressions are negligible. Were overheads to matter for particular coding patterns, compilers would be able to obviate most of them.
> The only significant performance/code regressions are when code has very large automatic storage duration objects. We provide an attribute to opt-out of zero-initialization of objects of automatic storage duration. We then expect that programmer can audit their code for this attribute, and ensure that the unsafe subset of C++ is used in a safe manner.
> This change was not possible 30 years ago because optimizations simply were not as good as they are today, and the costs were too high. The costs are now negligible.
[1] https://github.com/cplusplus/papers/issues/1401
[2] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p27...
Thanks for the references - that was interesting reading, particularly that initialisation can be good for instruction pipelining.
A trick we were using with SSE was something like:
__m128 zero = _mm_undefined_ps(); zero = _mm_xor_ps(zero, zero);
Now we were really careful with viewing our ops as data dependencies to reason about pipelining efficiency. But our profiling tools were not measuring this.
We did avoid _mm_set_ps(0.0f) which was actually showing up as cache misses.
I wonder if we were actually slower because cache misses are something we can measure?!
I think the response to that would be: yes but the game would simply not have been made if it wasn't written in C++. That's not to say you couldn't or that you can't make something like GTA:SA in Rust in 2025 or in a safer different language in the early 2000s. It just would take a great deal more time and expense as you'd have needed to construct a lot of tooling and do a lot of training to ensure all of the employees were up to speed before getting started. C++ was, and I think to some extent still is, the lingua franca of the gaming industry - there are some fun exceptions (Naughty Dog implementing much of Crash Bandicoot in a home-grown LISP, and presumably dozens or hundreds of DSLs and other little bespoke scripting languages in use at other studios).
And that's not to mention the uncomfortable truth that while doing this correctly in something like Rust may very well take less effort overall than in C++, that is not the bar we are aiming to clear. They wanted to implement something that was correct-enough, and given that this bug wasn't hit for 20+ years and that the game was a roaring success on all the major platforms - I think that was the right decision.
We don't have enough information to claim it's the "right decision" only that this choice did work, not that other choices couldn't have been better.
In video games you can go back and try another option but life isn't like that and so we can only suppose what might have happened.
Well what happened was that despite being based on an aging Renderware engine and programmed using a language with many potential footguns, the game was stable enough across multiple platforms, architectures and OSes that it was both a critical and commercial success.
I know what you’re saying - you can’t really know what might have been in an alternate reality. But in that alternate reality they’d have had to come up with something truly monumental to outdo themselves here.
I think you’re just being a wee bit picky about me using the words “the right decision”. If we’re honest with ourselves there probably wasn’t a Rust-like language in the conversation when they set out to build GTA3, Vice City or San Andreas so this is all kind of moot unless we're suggesting that Rockstar should have started out by building that language...
I'd actually say that Rust is a third option between "everything is immutable" and "mutable soup". Rust is more of "one mutator at a time". Because, Rust really embraces being able to mutate stuff (so not functional in that sense), it just makes sure that it's in a controlled way.
The constant Rust evangelism on this site is such a turn-off from actually wanting to use the language.
There'd be a lot less Rust evangelism on this site if there were fewer UB bugs on this site.
If your attitude is just "I'm not going to use abc because too many people say it's good", without even just trying that out first hand to verify those claims, I don't think you can go very far in your technical skills.
The best engineers I know are open to everything and played with almost every tool/language/whatever to form (sorry) informed opinions about them. They often know what they are talking about, and they choose the best tool for the job.
I think I can articulate what the comment means in a way that may make you rethink what you've said a little bit. I'm not wanting to make you think Rust is bad (I personally think it is good) I'm just trying to show you why this person may not be as backwards as you think they are.
So the person in question is irritated at an interesting blog post about a 20+ year old game being used as another opportunity to push Rust. So for starters Rust obviously wasn't around at the time the game was developed so it's not like Rockstar made the wrong call in implementing this using C++. But more importantly I don't think Rust is currently in a state where studios can justify using it to develop AAA games. They'd need big teams of developers with Rust experience who are well-versed in the sort of problems encountered during game development. You'd need battle-tested build/deployment processes that allow you to produce the binaries for Playstation/Xbox (not too dissimilar CPU/GPU wise, but each with their own platform-level quirks no doubt) and Switch hardware - potentially across multiple generations. You'd need various platforms' OS hooks and network-service APIs available. Additionally you'd need to convince the guys with the money that instead of spending $projected on a game, you'd need to spend $projected+$mystery_number when they take the plunge and write their first game in Rust with new tools etc rather than C++ and everything they currently use. The gaming industry is nothing if not ruthless at making money, if it made financial sense they'd be moving to Rust already - if it will make sense in the future, they'll be planning to do it.
You've been charitable in your read of the original comment, taking it as "this family of problems does not exist in Rust" - and for what it's worth I agree and really value this. However, this other commenter has presumably seen it as a bit more naive and missing the bigger picture, and in combination with other similar experiences is questioning the value of these glowing testimonies.
In addition, a lot of people saying "this is great, this is the future!" doesn't necessarily make something good automatically. For about 5+ years here on HN we had legions of people responding "blockchains will fix this" to almost every problem and very confidently declaring the rest of us are luddites for not getting it. I'm obviously not saying Rust is the same, I'm just trying to show that not following the crowd doesn't automatically mean you're the kind who will always fall behind.
As for how to avoid this? I dunno if you can undo the zillions of RIIR comments that have been floating around since Rust appeared on the scene, but if I was evangelising or even just strongly recommending it I'd just keep in mind that my target audience is maybe sick of seeing the same kinds of comments and would be a bit more creative and/or sensitive in approaching the topic.
I don't think mentioning Rust on an article specifically talking about a memory safety bug counts as "constant". This is Rust's core strength.
While they did mention rust, the actual suggestion was "functional programming and immutability", which to me suggests several other languages first and makes it not really rust evangelism.
FWIW I think a linter or other similar code quality checker would have caught this as well. From a practical perspective (e.g., how do you prevent this from happening again in your game studio's multi-million line code base) that would have been the right thing to do here.
Rust protects you from external file data you read being incorrect?
That's one hell of a language!
The code would have failed because you can't use an uninitialized variable, so you would have had to set it to a default. You don't just get random garbage from the stack.
You can write a genuine uninitialized local variable in Rust, it's just that you wouldn't do it out of laziness because while in C that's the default in Rust it's a lot of extra work to say "No, I really don't want to initialize this variable" and Rust is like "I mean, if you insist, all I can do is warn you that's a terrible idea".
If we say "I will initialize it - later", that's fine in Rust: you just write the name (and where appropriate the type) of the variable and go about your day. The compiler will reject your program if, in fact, it can't see how you're fulfilling that promise, and sometimes that might be because the compiler is dumb (but often it's because you are). There's no technical problem with this, and if the compiler agrees that we do, in fact, initialize it later, then it compiles and works and everybody is happy.

But to actually make a variable and not initialize it, as we saw above, is a lot of extra work in Rust because, like... that's a bad idea, why would you be setting out to do that?
This is such a bad idea that Rust's unsafe std::mem::uninitialized, which is how they did this before MaybeUninit existed, was de-fanged (giving it poor performance by actually writing a pattern to RAM every time) and deprecated, so you get a warning if you try to use it even though it was already marked unsafe. See, people (and I'm sure many C programmers are like this) tend to imagine it's OK for, say, an integer to be uninitialized because surely any possible value is OK, right? Nope. Your operating system knows that data was never written, and so it feels entitled to fuck you about if you expect it to stay unchanged, because it never promised that will work - as a result, rarely but sometimes, you get kicked in the head by the OS and you get a seemingly impossible bug.
It would have forced you to either specify a default or fail pretty loudly as soon as you launched the game, both much better than leaving a bug there just for it to resurface 20 years later.
Most popular languages would prevent this. In this case it’s as simple as having a more sensible reader API than sscanf in the standard library and forcing variables to be initialized.
Of course not, but this here was a memory access error and rust would have prevented this.
You didn't actually understand what the post is about. Maybe read it again.
Could you elaborate? I cannot see how a functional programming language would have protected you from reading a nonexistent value without providing a default.
It's more that functional languages just happen to be stricter in various ways that would've mitigated against this. You could quite happily design a functional language that has an unsafe equivalent to sscanf in its stdlib, or has big parts of the spec which are "undefined behaviour" that may differ depending on the underlying OS/compiler/runtime/stdlib in use. But the more popular functional languages have gained traction in part because they tend to have a "if you model the types correctly, the program basically works" philosophy around them. I don't think things like Haskell, Ocaml or F# would allow this if you wrote idiomatic code, you'd probably need to do something a little hacky or sketchy.
It simply would not have allowed you to write code which did that. And you wouldn't have a function like sscanf() either. You'd probably end up with a much more normal looking parser function that returned a value-or-error type.
I've never heard of a functional language that would allow you to initialize a value to whatever value the system memory already had in that memory location. In languages that allow nil, it would at least be nil; in languages that don't, you would have gotten an error about an uninitialized and undefaulted value. In any typed language, you would have also gotten an error.
It's true that C may be unique-ish in this regard though- this bug also couldn't happen in Ruby, which is not a functional language, but Ruby certainly still makes undefined behaviors much more possible than in other languages like Elixir.
Knowing C/C++, I more or less guessed what's happening (uninitialized variable) early in the blog post.
It blows my mind that the languages allow you to leave variables uninitialized which has caused countless bugs (including production bugs that I have seen first hand), and you often need to rely on additional compiler flags or static analysis tools/valgrind etc to catch them. Even though newer languages often use a different solution (default zero value or must initialize a variable before use), people still go back to C/C++ all the time.
> all these findings prove that the bug is NOT an issue with Windows 11 24H2, as things like the way the stack is used by internal WinAPI functions are not contractual and they may change at any time, with no prior notice. The real issue here is the game relying on undefined behavior (uninitialized local variables), and to be honest, I’m shocked that the game didn’t hit this bug on so many OS versions, although as I pointed out earlier, it was extremely close
This sentence is the real takeaway point of the article. Undefined behavior is extremely insidious and can lull you into the belief that you were right, when you already made a mistake 1000 steps ago but it only got triggered now.
I emphasized this point in my article from years ago (but after the game was released):
> When a C or C++ program triggers undefined behavior, anything is allowed to happen in the program execution. And by anything, I really mean anything: The program can crash with an error message, it can silently corrupt data, it can morph into a colorful video game, or it can even give the right result.
> If you’re lucky, the program triggering UB will show an appropriate error message and/or crash, making you immediately aware that something went wrong. If you’re unlucky, the program will quietly mangle data, and by the time you notice the problem (via effects such as crashes or incorrect output) the root cause has been buried in the past execution history. And if you’re very unlucky, the program will do exactly what you hoped it should do, until you change some unrelated code / compiler versions / compiler vendors / operating systems / hardware platforms – and then a new bug becomes visible, and you have no clue why seemingly correct code now fails to work properly.
-- https://www.nayuki.io/page/undefined-behavior-in-c-and-cplus...
As I wrote in my article, this point really got hammered into me when a coworker showed me a patch that he made - which added a couple of innocuous, totally correct print statements to an existing C++ program - and that triggered a crash. But without his print statements, there was no crash. It turned out that there was a preexisting out-of-bounds array write, and the layout of the stack/heap somehow masked that problem before, and his unlucky prints unmasked the problem.
Okay so then, how can we do better as developers today?
0) Read, understand, and memorize what actions in C or C++ are undefined behavior. Avoid them in your code at all costs. Also obey the preconditions of any API you use, whether in the standard library, operating system, etc.
1) Compile your application in Debug mode and compare its behavior to Release mode. If they differ by anything other than speed, then you have a serious problem on your hands.
2) Compile and run with sanitizers like -fsanitize=undefined,address to catch undefined behavior at runtime.
3) Use managed languages like Java, C#, Python, etc. where you basically don't have to worry about UB in normal day-to-day code. Or use very well-designed low-level languages like Rust that are safe by default and minimize your exposure to UB when you really need to do advanced things. Whereas C and C++ have been a bonanza of UB like we have never seen before in any other language.
Other than C#, there is no reason to use those other languages for game dev, unless the game is fairly simple or you want to risk a fairly long project on a language that hasn't been proven in the space yet (Rust). No shade at any of those languages, I don't even like C#; just being pragmatic.
The most successful video game of all time was written in Java, as an applet running in the browser.
Muggsy Bogues had a very successful NBA career at just 5 foot 3.
Unity+C# is now a pretty common combo.
I would add: code defensively. Initialize your variables (either to a sensible value, or an outrageously wrong value) before passing pointers to them, even when you "know" that the value will be overwritten. Check for errors. Always consider what happens when things go wrong, not just when things go right. Any time you find yourself thinking, "condition X is guaranteed to hold, so I don't need to check for it" consider checking it anyway just in case you're wrong about that, or it changes later.
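A minimal sketch of that advice in C++, assuming a hypothetical `parse_scale` helper that reads a float into an out-parameter. The output is set to a deliberately absurd sentinel before anything else happens, and every failure path returns `false` instead of leaving the caller with whatever happened to be in memory. The sentinel value is an arbitrary choice for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <cstdlib>

// Hypothetical sentinel: a value no valid scale could take, so if it ever
// leaks through it stands out in a debugger or a log.
const float kUnsetScale = -9999.0f;

// Parses a float from `text` into *out. Returns true only if the whole input
// was a finite number. *out is written first, so a caller that forgets to
// check the return value still sees an "outrageously wrong" value.
bool parse_scale(const char* text, float* out) {
    *out = kUnsetScale;                 // initialize even though we "know" it will be overwritten
    if (text == nullptr) return false;  // consider the failure path, not just the happy path
    char* end = nullptr;
    errno = 0;
    float value = std::strtof(text, &end);
    if (end == text || *end != '\0' || errno == ERANGE || !std::isfinite(value)) {
        return false;                   // nothing parsed, trailing junk, or out of range
    }
    *out = value;
    return true;
}
```

The caller still has to look at the return value; the sentinel only makes the bug loud when it doesn't.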
My only issue with defensive coding is that it often doesn't play nicely with code coverage requirements. I've been in situations where I would like to add defensive code just in case, but then the PR doesn't pass the coverage checks. The best is when you can ensure via the compiler (e.g. via the type system) that a case is impossible, but C++ (in my case) isn't perfect for this.
Don't code coverage tools let you exclude the defensive code with a pragma? That would appear reasonable to most reviewers.
I learned this lesson many moons ago, on a Fortran code I wrote for a university assignment. It was a basic genetic algorithm, and for some reason it was converging much more slowly than expected. So I was sprinkling some WRITEs to debug, and suddenly the code converged a hundred times faster.
All this is true. Note also that the C++ folks are putting a serious effort into reducing UB. See the "safe by default" section of this writeup [1]. See also my other comment [2] regarding the performance impact of this sort of change. Short answer: with sufficient optimization, smaller than one might think.
[1]: https://herbsutter.com/2024/08/07/reader-qa-what-does-it-mea...
[2]: https://news.ycombinator.com/item?id=43779449
Maybe it’s the real reason why CJ couldn’t follow the dang train.
Once this category of error is raised to your attention, you start to notice it more and more.
A little piece of technology made sense in the original context, but then it got moved to a different context without realizing that move broke the contract. Specifically in this case a flying boat became an airplane.
---
I recently worked a bug that feels very similar:
A linux cups printer would not print to the selected tray, instead it always requested manual feed.
Ok. Try a bunch of command line options, same issue.
Ok. Make the selection directly in the PPD (postscript printer definition) file. Same issue.
Ok! Decompile the PXL file. Wrong tray is set in pxl file... why?
Check Debug2 log level for cups - Wrong MediaPosition is being sent to ghostscript (which compiles the printer options into the print job) by a cups filter... why?
Cups filter is translating the MediaPosition from the PPD file... because the philosophy of cups is to do what the user intended. The intention inferred from MediaPosition in the PPD file (postscript printer definition) is that the MediaPosition corresponds to the PWG (Printer Working Group) MediaPosition, NOT the vendor MediaPosition (or local equivalent - in this case MediaSource).
AHA!! My PPD file had been copied from a previous generation of server, from a time when that cups filter did NOT translate the MediaPosition, so the VENDOR MediaSource numbers were used. Historically, this makes sense. The vendor tray number is set in the vendor ppd file because cups didn't know how to translate that.
Fast forward to a new execution context: cups filters have gotten better at translating user intention, so now they translate a number that doesn't need to be translated and silently select the wrong tray.
TLDR; There is no such thing as a printer command, only printer suggestions.
Infamously, this is also why Ariane 501 blew up.
(a component being reused in a new context where a contract is broken, not bad CUPS drivers)
Use a debugger folks. A 10x dev cited this story to me about the ills of not using one.
I always wonder: why not write these games on top of a virtual machine, like Carmack started doing in Quake, a usage he later extended to Quake 2 and 3 [1]?
I'm ignorant about game development, virtual machines and system programming but from the little I understand it seems a sensible choice to make.
While there is an initial price to pay, modeling 99% of the game to run on a user-implemented stack seems a sensible approach to me.
[1] https://fabiensanglard.net/quake3/qvm.php
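For readers unfamiliar with the idea, here is a minimal sketch of a bytecode interpreter that owns its own stack, so every push, pop and slot read can be bounds-checked. The opcodes and the `ToyVM` class are invented for illustration and are not the Quake 3 QVM instruction set; the trade-off is that each guest operation costs several host instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical toy bytecode, NOT the Quake 3 QVM instruction set.
enum class Op : std::uint8_t { Push, Add, Mul, Load, Halt };

struct Instr {
    Op op;
    std::int32_t arg;  // immediate for Push, stack slot for Load, unused otherwise
};

// Wrapping arithmetic through unsigned math, so a guest program cannot
// trigger signed-overflow UB in the host (the final cast is well-defined
// since C++20 and two's complement on mainstream compilers before that).
inline std::int32_t wrap_add(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b));
}
inline std::int32_t wrap_mul(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b));
}

class ToyVM {
public:
    explicit ToyVM(std::size_t capacity) : capacity_(capacity) {}

    std::int32_t run(const std::vector<Instr>& program) {
        stack_.clear();
        for (const Instr& in : program) {
            switch (in.op) {
            case Op::Push: push(in.arg); break;
            case Op::Add: { std::int32_t b = pop(); std::int32_t a = pop(); push(wrap_add(a, b)); break; }
            case Op::Mul: { std::int32_t b = pop(); std::int32_t a = pop(); push(wrap_mul(a, b)); break; }
            case Op::Load: push(at(in.arg)); break;
            case Op::Halt: return pop();
            }
        }
        throw std::runtime_error("program ended without Halt");
    }

private:
    void push(std::int32_t v) {
        if (stack_.size() >= capacity_) throw std::runtime_error("stack overflow");
        stack_.push_back(v);
    }
    std::int32_t pop() {
        if (stack_.empty()) throw std::runtime_error("stack underflow");
        std::int32_t v = stack_.back();
        stack_.pop_back();
        return v;
    }
    // Reads a live slot only; anything at or beyond the current depth is
    // rejected instead of returning leftovers from an earlier computation.
    std::int32_t at(std::int32_t slot) const {
        if (slot < 0 || static_cast<std::size_t>(slot) >= stack_.size())
            throw std::out_of_range("read beyond current stack depth");
        return stack_[static_cast<std::size_t>(slot)];
    }

    std::size_t capacity_;
    std::vector<std::int32_t> stack_;
};
```

Running `{Push 2, Push 3, Add, Push 4, Mul, Halt}` yields 20, while a `Load` of a slot that was never pushed throws rather than reading stale data.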
The article mentions using breakpoints, so they did use a debugger.
This is a game; I don't think a debug configuration (with checks for things like this enabled) would run fast enough to be playable on contemporary hardware.
That's not accurate.
Generally, game console "debug" configurations aren't "true" debug like most people think of -- optimizations are still globally enabled, but the build generally has a number of debug systems enabled that naturally require the use of a devkit. Devkits, especially back then, generally had 2-3x as much memory as retail systems -- so you'd happily sacrifice framerate during feature development to have those systems enabled.
Debugging was (and still is) generally done on optimized builds and, once you know the general area of the problem, you simply disable optimizations for that file or subsystem if you can't pinpoint the issue in an optimized build.
The biggest performance hit, in general, comes from disabling optimizations in the compiler. I say "in general" because there are systems that might be used to find this kind of thing that DO make a game wholly unplayable, such as a stomp allocator. Of course, you wouldn't generally enable a stomp allocator across all your allocations unless you're desperate, so you could still have that enabled to find this kind of bug and end up with a playable game.
The more likely reason here is that no one noticed or cared. GTA:SA is 21 years old and this bug doesn't affect the Xbox or other versions.
From GP:
> (with checks for things like this enabled)
You can (and could) easily compile an optimized build with debug symbols to track down sources of issues, but catching a bug like this would likely take a dynamic checker like Valgrind or MSan, which do not allow for any optimizations if you want to avoid false negatives, and add even more overhead on top of that. (Valgrind with its full processor-level virtualization, and MSan with its shadow state on every access. But MSan didn't exist at the time, and Valgrind barely existed.)
At minimum, fine-grained stack randomization might have exposed the issue, but only if it happened to be spotted in playtests on the debug build.
This was a PS2 game and codebase.
MSan didn’t exist at the time and valgrind doesn’t work on a ps2.
Neither of those are necessary to find this bug as it could be found using a stomp allocator if you’re a developer on the project at the time.
How could a stomp allocator have possibly found this bug? The offending values are stored on the stack, in-bounds when written to, and again in-bounds when read from.
At no point is there an OOB access, just a failure to initialize stack variables. And to catch that, you'd need either MSan-style shadow state that didn't exist, thorough playtesting with fine-grained stack randomization, or some sort of poisoning that I don't think existed.
Tools like valgrind/asan/msan would have flagged this instantly too. Just a unit test of that vehicle loader would have seen it.
Really this is more a story about poor development practice than it is an interesting bug.
Problem with valgrind/asan/msan is that you have to start using these tools early in the development process. It can't be a "checklist" item before launch, or you'll have an insurmountable number of bugs, often with them baked in such that fixing the bug causes additional changes that introduce unrelated bugs.
As if tools in early 2000's were any good...
Valgrind was released in 2002 to immediate celebration. It was available and surely known to the team. All they needed to do was write a unit test that loaded and instantiated those vehicle files and run it with "valgrind" in front of the command line.
I don't know whether Valgrind ever gained support for any of the GTA target platforms. It wasn't available on Windows for a very, very long time.
Still isn't available for Windows from what I gather.
I tried to use Valgrind to catch pretty much this exact bug 20 years ago, and it was nigh impossible. If you call any 3rd party code, it'll flag tens of thousands of false positives that you have to sift through. And that was on a small game engine; I can't imagine running it on millions of lines of code.
Again, you don't valgrind a whole game. You valgrind your unit tests. Even in 2004 when this game was released, and even in the game industry, unit testing was a routine thing. And this particular bug was in code very amenable to lightweight unit testing.
That would be assuming they knew there was a bug in that particular part of the code, which they probably didn't, until Windows 11 24H2. And unfortunately Valgrind doesn't work on windows.
It was available and widely celebrated, but there was no guarantee that these Windows and console developers had heard of it yet or that they could have used it if they had.
I like the one where you shoot at the moon and it gets closer
That isn't a bug. https://x.com/ObbeVermeij/status/1757572432863384046
But it is a very likable story :)
20 years and still nobody has realized what the back of this sign is about. IYKYK.
https://static.wikia.nocookie.net/gta-myths/images/f/f1/Egg_...
fixed link (the above did not work for me) https://static.wikia.nocookie.net/gta-myths/images/f/f1/Egg_...
And if I don't know?
On Windows 11 24H2, more stack space was modified by a new implementation of Critical Sections.
IMHO this shows the downfall of Microsoft. Why did they do that? Critical sections have been there for many decades and should be basically bug-free by now. My best guess is someone thought they'd "improve" things and rewrote it, then made some microbenchmark that maybe showed the dubious improvement.
The other comment here mentions Raymond Chen, who wrote this article about why backwards-compatibility is very important (and arguably what got Microsoft into the position it's in today):
https://devblogs.microsoft.com/oldnewthing/20031224-00/?p=41...
and also this memorable case: https://news.ycombinator.com/item?id=2281932
This is an existing bug in GTA, not Windows 11.
Really? Someone depending on UB in their software represents the downfall of Microsoft?! What a hot take...
User has working software. User updates operating system. User has broken software.
That's a problem for the party trying to sell operating system updates.
The software was fundamentally broken before the OS update. It was working by pure random chance with undefined behaviour. It’s a C++ issue, not an OS issue. The same code compiled for another OS would have different random results.
While this is technically correct, it doesn't get the customer to put the blame where it belongs.
Not a C++ issue, but a sloppy developer issue.
The core problem is some compilers initialising memory to zero in Debug mode, masking the behaviour of uninitialised data, since in most cases zero is a legit value. In Release mode, this zeroing doesn't happen.
Devs need to be aware that the following C++ initialiser exists, which zeros data structures for you:
MyStruct s = { };
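A small sketch of what that initialiser does, using a hypothetical `WheelParams` struct: empty braces value-initialise an aggregate, so every member is zeroed, and a partially filled brace list zeroes the members it doesn't mention. This makes the value deterministic, not necessarily correct; as noted above, zero can still be a plausible-looking wrong value.

```cpp
#include <cassert>

// Hypothetical stand-in for a record read from a data file.
struct WheelParams {
    float frontScale;
    float rearScale;
    int   flags;
};

WheelParams make_defaulted() {
    // Empty braces: every member becomes zero (0.0f for floats, 0 for ints)
    // instead of holding whatever bytes were already on the stack.
    WheelParams p = {};
    return p;
}

WheelParams make_partial() {
    // Members not listed in the braces are still zero-initialised.
    WheelParams p = {0.7f};
    return p;
}
```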
Apparently the planes are buggy in all sorts of ways, although I don't remember them being this bad... https://youtu.be/hrJ0eVY5ACw
Surprised to see the return value of sscanf being ignored, that seems like a pretty rookie mistake, and this bug would never have made it out of the original programmer's system if they had bothered to check it.
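A minimal sketch of checking the count `sscanf` returns, assuming a hypothetical three-field line format (the real vehicles.ide columns are not reproduced here). `sscanf` returns the number of successful conversions, or `EOF` if input ends before the first one, so a short line is detectable and can be rejected instead of leaving fields unset.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical record layout, for illustration only.
struct PlaneLine {
    int   id;
    float frontWheelScale;
    float rearWheelScale;
};

// Returns true only if all three fields were matched; *out is untouched otherwise.
bool parse_plane_line(const char* line, PlaneLine* out) {
    PlaneLine tmp = {};  // never hand uninitialized fields to anyone
    int matched = std::sscanf(line, "%d %f %f", &tmp.id, &tmp.frontWheelScale, &tmp.rearWheelScale);
    if (matched != 3) {
        return false;    // short or malformed line: report it instead of guessing
    }
    *out = tmp;
    return true;
}
```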
Yes, it would have made it out of the original programmer's system for that initial commit.
FTA:
So the original code (or at least a working code + data version) in GTA Vice City had no visible problems, at least with the Skimmer object, since the vehicles.ide file had the correct number of values for the Skimmer boat object. Someone changed the Skimmer object from a boat to a plane for GTA San Andreas, BUT they DID NOT update the object to have the REQUIRED wheel values for a plane object.
Now the GTA code is expecting more values than it gets.
The vehicles.ide wasn't validated for correctness after the Skimmer object change to plane. Maybe there are more gotchas in that file...
At least users can fix the problem with a text editor instead of waiting and hoping that Rockstar would fix the problem and release an update.
It has always been too easy to read & write beyond the stack. This should fail, plain and simple.
Mitigations exist - ASLR, NX pages, stack-smashing protection etc. but nothing comprehensively stops reads of stale data beyond the stack.
Thought experiment for a moment. What if the hardware ensures the unused part of a stack region cannot be read or written.
There are many ways to skin this cat; here’s one based around tracking each stack’s start address A, size S, and current depth D:
1. Add an instruction to inform the CPU there is a stack at address A of size S. Its depth D is initially 0.
2. Add a jump instruction which reserves N bytes on the stack at address A, growing depth D to (D+N). Maybe this can be its own “reserve” instruction so as not to need a new jump instruction.
3. Give existing return instructions stack awareness. If returning to an address inside a stack, un-reserve the bytes reserved by the most recent jump, making the new depth (D-N).
4. Fail reads or writes to the stack region beyond its current depth. In other words, fail all reads and writes between A+D and A+S.
5. The arithmetic is reversed on architectures whose stacks grow downwards.
Downsides I can see:
It cements one calling convention. The CPU memory manager will need a lot of state per stack, of which there are many per process: address A, size S, current depth D, plus a reservation stack - ie. sizes of each frame’s stack memory. That’s a lot of bookkeeping! It’s far from zero cost. The limits of how much bookkeeping the CPU can do impose limits on how deep a stack can go and how many stacks are supported - so when there are too many stacks or one goes too deep, either the CPU needs to signal failure or engage a fallback mode and revert to behaving as CPUs do today. And of course fallback puts things back to the start. It’d therefore only mitigate situations in which an attacker cannot control the depth of the stack / a bug always happens inside the max depth the CPU can bookkeep for.
That said, stacks are ubiquitous! Hardware stack awareness opens up all kinds of new mitigations.
Why isn’t this a common idea? Has it been tried?
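To make the proposal concrete, here is a software model of steps 1 to 4 for a single upward-growing stack, where accesses in [A, A+D) are allowed and anything in [A+D, A+S) faults. `TrackedStack` and its methods are hypothetical; no real CPU exposes these operations, and the `frames_` vector stands in for the per-stack bookkeeping the downsides paragraph warns about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Simulation only: models the proposed checks, not any existing hardware.
class TrackedStack {
public:
    // Step 1: register a stack at address A with size S; depth D starts at 0.
    TrackedStack(std::uintptr_t start, std::size_t size) : a_(start), s_(size) {}

    // Step 2: reserve N bytes for a new frame, growing D to D + N.
    void reserve(std::size_t n) {
        if (n > s_ - d_) throw std::overflow_error("stack exhausted");
        frames_.push_back(n);
        d_ += n;
    }

    // Step 3: a return releases the most recent reservation, so D becomes D - N.
    void ret() {
        if (frames_.empty()) throw std::underflow_error("return with no frame");
        d_ -= frames_.back();
        frames_.pop_back();
    }

    // Step 4: only [A, A + D) is accessible; [A + D, A + S) faults.
    bool access_ok(std::uintptr_t addr) const {
        return addr >= a_ && addr < a_ + d_;
    }

    std::size_t depth() const { return d_; }

private:
    std::uintptr_t a_;
    std::size_t s_;
    std::size_t d_ = 0;
    std::vector<std::size_t> frames_;  // one entry per live frame
};
```

Note that a fresh `reserve()` over the same addresses makes them accessible again, with whatever bytes the previous frame left there, so a model like this would not flag a stale value left by an earlier call at the same stack location.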
This bug wasn't caused by a read beyond the current bounds of the stack, but a stale value from a prior call to the same function at the exact same location on the stack. Buffer-overflow protections like you describe wouldn't help here.
Any solution I can think of uses a lot of resources. Those sort of methods are useful in some contexts, such as highly secure operations, but seem very excessive for the sort of abuse and leak encountered in this example.
tl;dr of the explanation: the Skimmer vehicle is missing a wheel scale definition, so its wheel scale gets read from uninitialized memory. On previous versions of Windows, this happened to be the wheel scale of the previously-loaded vehicle, so things happened to work fine. Starting on Windows 11 24H2, LeaveCriticalSection (which gets called between loading vehicles) uses more stack space than before, so it now overwrites that memory with a gigantic value, resulting in the Skimmer spawning so high up that it may as well not exist at all.
I wonder if they fixed the vehicle definition file as well, or just the parser. The latter would be an incomplete fix.
SilentPatch (for GTAs, at least) specifically is a code-only mod, such that the single .asi file can be removed to uninstall it & all its changes.
A real update should fix both (note: I don't believe the later releases did, they also just added defaults to the parser) but for SilentPatch: a mod is not a real update, and being as simple as possible to remove & reducing conflicts with other mods is more important here than a fix that digs as deep as possible.
Given that those parameters are for wheels on a plane that doesn’t have wheels, I would say fixing the parser is the better fix
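A minimal sketch of the "defaults in the parser" approach, assuming a hypothetical line format where the id is mandatory and the two wheel scales are optional trailing fields. This is not SilentPatch's or Rockstar's actual code, and the default of 1.0 is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical plane record; the real vehicles.ide columns differ.
struct PlaneDef {
    int   id;
    float frontWheelScale;
    float rearWheelScale;
};

// Hypothetical default used when a line omits the wheel columns.
const float kDefaultWheelScale = 1.0f;

// Defaults are written before sscanf runs, so any field it does not match
// keeps a known value instead of stale stack contents.
bool parse_plane_def(const char* line, PlaneDef* out) {
    PlaneDef def = {0, kDefaultWheelScale, kDefaultWheelScale};
    int matched = std::sscanf(line, "%d %f %f", &def.id, &def.frontWheelScale, &def.rearWheelScale);
    if (matched < 1) return false;  // not even an id (or EOF): reject the line
    *out = def;
    return true;
}
```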
It is, but who enjoys fixing parser issues...
For some reason this domain is blocked by work dns filtering?
Let your IT department know that their denylist is broken.
Yes, there doesn't seem to be anything even remotely suspicious here.
Pretty wild how bugs can stick around that long - I'd never think something from 20 years ago would pop up just because Windows changed.
>Scientists claim to have discovered a ‘new color’ no one has seen before.
LOL!
I hope someone can figure out the Red Dead Redemption 2 bug where random animals and characters disappear silently if you have too many texture mods installed.
I spent hours looking for a badger.
Just like $dayjob.
I love it how many bugs go from "why doesn't this work?!" to "how on Earth did this work previously?!"
How does a bug from 20 years ago even still work today?
funnily enough, there was an infamous bug in GTA5 for a long time that was related to using JSON : https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...
I remember that one too :)
They wrote a JSON “parser” using sscanf. sscanf is not bulletproof! Just use an open source library instead of writing something yourself. You will still be a real programmer, but you will finish your game sooner and you won't have embarrassing stories written about you.
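As I understand the linked write-up, much of the delay came from a parse loop in which each `sscanf` call ended up walking the remaining multi-megabyte string again, making the loop quadratic. A minimal sketch of a linear alternative for a much simpler case (whitespace-separated integers, not JSON, and no substitute for a real JSON library), using C++17 `std::from_chars`, which parses from a bounded range and reports where it stopped:

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>
#include <utility>
#include <vector>

// One left-to-right pass: each byte is examined a constant number of times,
// because from_chars hands back the position where it stopped.
bool parse_ints(std::string_view text, std::vector<long>* out) {
    const char* p = text.data();
    const char* const end = text.data() + text.size();
    std::vector<long> values;
    while (p != end) {
        if (*p == ' ' || *p == '\n' || *p == '\t' || *p == '\r') { ++p; continue; }
        long v = 0;
        auto [next, ec] = std::from_chars(p, end, v);
        if (ec != std::errc()) return false;  // not a number, or out of range
        values.push_back(v);
        p = next;
    }
    *out = std::move(values);  // only publish results on success
    return true;
}
```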
Loading times are still absurd, fwiw.
Yes, but now it's in the realm of ~3 minutes, and not ~8 minutes even on a top-spec PC, right? I really liked the game, but waiting 8 minutes to load just to get griefed by hackers within seconds of walking outside... I don't understand how that game makes any money.
IMO it varies widely. This past weekend it was taking me multiple attempts to get logged in to a public lobby— after waiting ~5-10 minutes!
Nothing has changed appreciably. If they would let you login to a private invite-only lobby that would likely speed things up greatly— but it’ll never happen.
That's probably just the nature of P2P networking code.
> If they would let you login to a private invite-only lobby that would likely speed things up greatly— but it’ll never happen.
Did they remove this option in the last couple years?
I don’t think it's ever been an option to login directly to an invite-only lobby. But then I have taken multiple multi-year breaks! I was pleasantly surprised you can actually play most of the game in a private lobby now… that is a huge change and I am not at all certain when it occurred.
It was like crack. People put up with a lot of problems and bugs just because it was really fun just enough of the time to get them hooked.
Nitpicking: What you're describing is called a "variable ratio reinforcement schedule", and is considered to be the most effective form of operant conditioning.
However, it's not even remotely "like crack". Crack is really really really really fun, period, no "just enough of the time" about it. The reason people get hooked on crack is because it's guaranteed to be fun.
If I had to choose a substance that most closely mirrored variable ratio reinforcement conditioning, it'd probably be ketamine.
Definitely similar to a drug addiction, speaking from firsthand experience with both. GTA has been harder to give up than cocaine was.
Putting the (very valid) reasons for not having human-readable game saves aside, are you sure it's worse than using a 3rd party library that's built to accept semi-valid input values, possibly evaluates user input in some way and has difficult-to-debug bugs that occur only under certain inputs? I agree that writing a stable and safe parser for a binary data file isn't easy, but there are fewer things that can go wrong when you can hardcode it to reject any remotely suspicious input. Third party XML/JSON libraries OTOH try to interpret as much as possible, even when the values are bogus. Also no need to deal with different text encoding bugs, line endings...
You misunderstood. Game developers should use a _good_ third–party library, not a _bad_ one. At a minimum they should be able to read the source code so that they know it is good. Thus open source libraries should be at the top of the list.
If you don't know what “good” looks like, take a look at [Serde](https://serde.rs/). It’s for Rust, but its features and overall design are something you should attempt to approach no matter what language you’re writing in.
There are no good third party libraries
I disagree. Serde is not merely good, it is excellent.
The only C code that I have recently interacted with uses a home–grown JSON “library” that is actually pretty good. In particular it produces good error messages. If it were extracted out into its own project then I would be able to recommend it as a library.
But how is that C project using a custom made JSON library doing better than Rockstar games doing the same? Because that library has good error messages?
Apart from that, many of us thought that Java serialization was good if just used correctly, that IE's XML parsing capabilities were good if just used correctly, and so on. We were all very wrong. And a 3rd party library would be just some code taken from the web, or some proprietary solution where you'd once again have to trust the vendor.
It’s good because they have spent hundreds or thousands of hours polishing and improving it. It’s paid off too, because stable releases never have broken data files any more. Any mistakes that do get made are usually found and fixed before the code is ever committed. Even the experimental branch rarely sees broken data files. It’s more likely to see these error messages when loading save files, because the code to read old save files and convert them for newer versions of the game is the hardest to write and test.
> And a 3rd party library would be just some code taken from the web, or some proprietary solution where you'd once again have to trust the vendor.
Open source exists for a reason, and had already existed for ~15 years by the time this game was begun. 20 years later there are even fewer excuses to be stuck using some crappy code that you bought from a vendor and cannot fix.
You both are correct.
But also keep in mind that in 2004 the legality of many open source projects had not been tested very well in court. Pretty sure that was right around the time one of the bigger Linux distros was throwing its weight around and suing people. So you want to ship on PS2 and Xbox and PC and GameCube. Can you use that lib from inside Windows? Not really. Build vs. buy? Buying means you need the code and will probably have to port it to PS2/GameCube yourself. Can you use that open source lib? Probably, but legal is still dragging its feet, and you get to port it to PS2. Meanwhile your devs needed a library 3 weeks ago and have hacked something together from an older codebase that your company owns, and it works and means you can hit your gold master date.
Would you do that now? No. You would grab one of the multitudes of decent libs out there and make sure you are following the terms correctly. Back then? Yeah I can totally see it happening. Open source was semi legally grey/opaque to many corporations. They were scared to death of losing control of their secret sauce code and getting sued.
> Putting the (very valid) reasons for not having human-readable game saves aside,
I don't follow. What would the reasons be?
A human-readable game save file is presumably human-editable.
Most binary save game files are human editable, too; unless they go through a separate encoding stage.
Editing SimCity saves was my introduction to hex editing...
For me, iirc, it was Bard's Tale
Require a hash in the file to match the rest of the file if you want to avoid effortless changes to the file.
(There is no way to prevent changes by a knowledgeable person with time or tools, so that's not a goal)
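A minimal sketch of that idea, assuming a hypothetical save layout of payload, newline, then a 16-digit hex hash. It uses 64-bit FNV-1a, which is fast and non-cryptographic: it stops casual hand edits, but as the parenthetical says, anyone who knows the scheme can simply recompute it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// FNV-1a, 64-bit. Not a security measure; it only detects edits.
std::uint64_t fnv1a64(const std::string& data) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Hypothetical layout: payload + '\n' + 16 lowercase hex digits.
std::string seal_save(const std::string& payload) {
    static const char digits[] = "0123456789abcdef";
    std::uint64_t h = fnv1a64(payload);
    std::string hex(16, '0');
    for (int i = 15; i >= 0; --i) {
        hex[static_cast<std::size_t>(i)] = digits[h & 0xF];
        h >>= 4;
    }
    return payload + "\n" + hex;
}

// Returns true (and the payload) only if the trailing hash matches.
bool verify_save(const std::string& file, std::string* payload) {
    std::size_t nl = file.rfind('\n');
    if (nl == std::string::npos || file.size() - nl - 1 != 16) return false;
    std::string body = file.substr(0, nl);
    if (seal_save(body) != file) return false;
    *payload = body;
    return true;
}
```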
Before game companies earned all their profit through selling cosmetics and premium currency nobody cared if you cheated at your single player game and nobody SHOULD care if you want to give yourself extra money.
It's only now that single player progress is profitable to sell that video games have made save game encryption the default.
It's so stupid.
The trouble is that if some weirdness happens because of the edit, you've got to handle it even if you say it would be reasonable to assume that it's outside of being supported. Maybe you spend a bit more time defensive coding around what inputs it reads from the file, maybe a certain proportion of users doing the save edit see bugs in an apparently unrelated part of the game and seek support (and their bug report might not be complete with all the details), developers spend time to chase down what went wrong, maybe they bad-mouth it on forums which affects sales - there's going to be some cost to handling all of that.
One of the anecdotes from Titan Quest developed by Iron Lore is that their copy protection had multiple checks, crackers removed the early checks to get the game running but later 'tripwires' as you progress through the game remained and the game appeared to crash. So the game earned a reputation for being buggy for something no normal user would hit running the game as intended.
>The trouble is that if some weirdness happens because of the edit, you've got to handle it even if you say it would be reasonable to assume that it's outside of being supported.
What? No. What even are you suggesting? Hell, games with OFFICIAL MODDING SUPPORT still require you submit bug reports with no mods running.
Editing game files has always been "you are on your own"; even editing standard Unreal config files is something you won't get support for, and they are trivial human-readable files with well-known standards.
>One of the anecdotes from Titan Quest
Any actual support for this anecdote? Lots of games have anti-piracy features that sneakily cause problems, and even could fire accidentally. None of those games get a reputation for being buggy. Games like Earthbound would make the game super hard and even delete your save game at the very end. Batman games would nerf your gliding ability. Game Dev Tycoon would kill your business due to piracy.
None of these affected the broad reputation of the game. Most of them are pretty good marketing in fact.
Game Dev Tycoon even later added Pirate Mode to the game, for people who wanted to experience super-hard-mode. Complete with random mail they got telling them why people pirated their game, framed as why people were pirating the game you just made.
Mostly to prevent people and programs from editing them, obfuscating implementation details, reducing file sizes (say had they used XML vs. binary)...
Higher barrier to cheating.
It's a single player game. Cheat codes are built into it by design.
HESOYAM
Telling a 3d engine programmer not to have opinions on data formats? Good luck with that.
After finishing the article I immediately did ctrl+f "rust" and was disappointed to not see any of the results I wanted, but actually this comment is more hilarious than anyone saying "why didnt rockstar use rust in 2004!!!1111!!???" it's a bit more of a sophisticated joke since there's an IYKYK factor but it is no less hilarious. Bravo sir, bravo.
To u/db48x whose post got flagged and doesn't reappear despite me vouching for it as I think they have a point (at least for modern games): GTA San Andreas was released in 2004. Back then, YAML was in its infancy (2001) and JSON was only standardized informally in 2006, and XML wasn't something widely used outside of the Java world.
On top of that, the hardware requirements (256MB of system RAM, and the PlayStation 2 only had 32MB) made it enough of a challenge to get the game running at all. Throwing in a heavyweight parsing library for any of these three formats was out of the question.
The comment reappeared, and while you're right about using proper libraries to handle data, it doesn't excuse the "undefined behavior (uninitialized local variables)" that I still see all the time despite all the warning and error flags that can be fed to the compiler.
Most of the time, the programmers who do this don't follow the simple rule Stroustrup gives: initialize a variable where you declare it (i.e. don't declare it until you have a value for it), which would solve a lot of bugs in C++.
While it doesn't excuse the bad habits, we do have to keep in mind C++98 (or whatever more ancient was used back then) didn't have the simple initializers we now take for granted. You couldn't just do 'Type myStruct = {};' to null-initialize it, you had to manually NULL all nested fields. God forbid you change the order of the variables in the struct if you're nesting them and forget to update it everywhere. It was just considerably more practical to do 'Type myStruct;' then set the fields when needed.
You could always `bzero` or `memset` the entire struct to 0.
But only if it contains strictly POD members, otherwise it's UB.
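A sketch of making that rule enforceable at compile time, using the C++11 `std::is_trivially_copyable` trait (not available to a 2004 codebase), so that `memset` on a type holding, say, a `std::string` fails to build. `zero_fill`, `PodVehicle` and `NamedVehicle` are hypothetical names. All-bits-zero gives 0 for integers and 0.0 for IEEE 754 floats, which covers mainstream platforms.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// memset-based zeroing that refuses to compile for types where it would be UB.
template <typename T>
void zero_fill(T* obj) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memset is only safe on trivially copyable types");
    std::memset(obj, 0, sizeof(T));
}

struct PodVehicle {    // hypothetical: plain data only
    int   id;
    float wheelScale;
};

struct NamedVehicle {  // hypothetical: owns a std::string
    std::string name;
    float wheelScale;
};
// zero_fill(&someNamedVehicle) would be rejected by the static_assert above.
```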
I haven't been using C++ for a number of years but I think you could set the default values of fields even back then. Something like
Or is this something more recent? You cannot initialize them with a different value unless you also write a constructor, but that's not the issue here (since you are supposed to read them from the file system).
That's C++11 syntax. Before then you'd have to manually initialize them in every constructor, with a hand-written default constructor as a minimum:
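A minimal side-by-side sketch with hypothetical struct names: the C++98 form puts defaults in a hand-written constructor's member-initializer list, while C++11 default member initializers attach them to the member declarations. Both default their wheel scales to 1.0, an arbitrary value for illustration.

```cpp
#include <cassert>

// C++98 style: defaults live in a hand-written constructor and must be
// repeated in every constructor the struct grows.
struct WheelsCxx98 {
    float frontScale;
    float rearScale;
    WheelsCxx98() : frontScale(1.0f), rearScale(1.0f) {}
};

// C++11 style: default member initializers. Any constructor that does not
// mention a member picks up these values automatically.
struct WheelsCxx11 {
    float frontScale = 1.0f;
    float rearScale = 1.0f;
};
```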
And it did not matter at all. The game shipped and was a success.
This is the thing that drives artists and craftsmen to despair and drink: That a flawed, buggy, poor quality work can be "successful" while something beautiful and technically perfect can fail.
San Andreas might be rough under the hood, but on the surface it was nothing short of a masterpiece of game design. The engine was so complex and the cities felt alive, and the game could handle a lot of general nonsense. Still one of my favorite go-to games.
The job of the artist is to take the years of expertise and distill it down into something "enjoyable." The hardest mental hurdle to get over is that people just don't care about the technicals being perfect. Hell, the final product doesn't even need to be beautiful; it just needs to be arresting.
Heck, sometimes the thing that's most interesting about a work is people arguing over whether or not it's art.
One artist can take months painting a picture of a landscape where everything is perfect. And the next artist can throw 4 colors of paint at a wall. The fact that lots of people enjoy the work of the second artist doesn't invalidate the work of the first. The two artists are focusing on different things; and it's possible for both of them to be successful at reaching their goals.
Some artists also think "it's not music because it has no guitars".
The standards "artists" have are artificial and snobby.
How can Deadmau5 or any other EDM artist sell so much, then?
Let's be clear that it was a success very much in spite of UB, not because of it. And there was still a cost--likely at least hundreds of person-hours spent fixing other similar bugs due to UB (if not more).
I worked in gamedev around the time this game was made and this would have been very much an ordinary, everyday kind of bug. The only really exceptional thing about it is that it was discovered after such a long time.
> it doesn't excuse the "undefined behavior (uninitialized local variables)" that I still see all the time despite all the warning and error flags that you can feed to the compiler.
Yeah, but we're talking about a 2004 game that was pretty rushed after 2002's Vice City (and I wouldn't be surprised if the bug in the ingestion code existed there as well and just wasn't triggered, given the lack of planes apart from that darn RC chopper and RC plane from that bombing-run mission). Back then, tooling to spot UB and code smells either didn't exist or was very rudimentary, and whatever warnings did come up were just ignored because everything seemed to work.
JSON didn't save them: this is the same studio that hand-rolled a JSON parser with accidentally quadratic time complexity, making most players wait 3 to 10 minutes to load GTA Online for 7 years, until a player got tired of it and found the root cause.
https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...
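The linked write-up traces much of that delay to a loop that calls `sscanf` on the remaining JSON buffer (roughly 10 MB). That C runtime's `sscanf` runs `strlen` over the whole remaining buffer on every call, so each token costs time proportional to the entire input. Below is a hedged sketch of the usual linear alternative, where `std::strtol` reports where it stopped and the parser simply advances a pointer. `parseInts` is a hypothetical helper. It is neither Rockstar's code nor the patch from the write-up.

```cpp
#include <cstdlib>
#include <vector>

// Parses integers separated by commas and/or whitespace from a
// NUL-terminated buffer. Each std::strtol call only touches the characters
// it consumes (plus leading whitespace) and returns the stop position in
// `end`, so total work is linear in the buffer size. Calling sscanf on the
// rest of the buffer instead can go quadratic if the C library's sscanf
// measures the whole string first.
std::vector<long> parseInts(const char* text) {
    std::vector<long> out;
    const char* p = text;
    while (*p != '\0') {
        char* end = nullptr;
        long value = std::strtol(p, &end, 10);
        if (end == p) {
            // No number starts here (e.g. a comma): skip one character.
            ++p;
            continue;
        }
        out.push_back(value);
        p = end;
    }
    return out;
}
```

On an input like `"1, 22,-3 4"` it returns `{1, 22, -3, 4}`.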
You’re not entirely wrong, but a library doesn’t have to be “heavyweight” in order to be bulletproof. And you can load the library during startup and unload it afterward; it doesn’t have to stick around for the whole run time of the game. Modern OSes will reclaim the pages after you stop using them if there is memory pressure. Of course, I’m sure the PS2 didn’t do that.
Meanwhile, in a certain modern OS, unloading a library is so broken that people are discouraged from doing it... Try to unload GLib [0] from your process :p
[0] https://docs.gtk.org/glib/
Unloading C libraries is fundamentally fraught with peril. It's incredibly difficult to ensure that no dangling pointers to the library remain when it's unloaded. It's really fun to debug, too. The code responsible for the crash literally is not present in the process at the time of the crash!
Why weren't binary files used, like I would expect in a 1990s DOS game? fread into a struct and all that.
By the 2000s, portability was a concern for most titles. Certainly anything targeted at a rapidly changing console market back then.
Definitely, and architectures back then were far less standardized. The Xbox 360 had a big-endian PowerPC CPU, and the PS2 had a custom MIPS-based (RISC) CPU. On the desktop, this was still the era of PowerPC-based Macs. It was far easier (and, I would argue, safer) to use a standard, portable sscanf-like function with some ASCII text than to figure out how to bake your binaries for every memory and CPU layout combination you might care about.
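To make the endianness point concrete: `fread`-ing straight into a struct reproduces whatever byte order, padding and type sizes the writing machine had. The sketch below shows the portable alternative. It assumes a hypothetical file format that stores each field as a little-endian `uint32`, and the `HandlingRecord` layout is invented for the example. Decoding the fields byte by byte gives the same result on a big-endian PowerPC and a little-endian x86.

```cpp
#include <cstdint>

// Decodes a 32-bit unsigned integer stored least-significant byte first.
// Shifting and OR-ing individual bytes is independent of the host CPU's
// byte order, unlike copying raw bytes into a uint32_t.
std::uint32_t readU32LE(const unsigned char* bytes) {
    return static_cast<std::uint32_t>(bytes[0])
         | static_cast<std::uint32_t>(bytes[1]) << 8
         | static_cast<std::uint32_t>(bytes[2]) << 16
         | static_cast<std::uint32_t>(bytes[3]) << 24;
}

// Hypothetical on-disk record: two little-endian uint32 fields, 8 bytes,
// no padding, whatever the in-memory struct layout happens to be.
struct HandlingRecord {
    std::uint32_t id;
    std::uint32_t massGrams;
};

HandlingRecord decodeRecord(const unsigned char* bytes) {
    HandlingRecord r;
    r.id = readU32LE(bytes);
    r.massGrams = readU32LE(bytes + 4);
    return r;
}
```

The cost is one small decode function per field type, which is the kind of per-format work a text format with `sscanf` lets you skip.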
Easier for internal development. Non- or less technical team members can tweak values without having to rebuild these binary files. Possibly also easier for lightweight modding externally as well.
This isn't that uncommon: look at something like Diablo 2, which has a huge amount of game data defined in text files (I think these are encoded to binary when shipped, but it was clearly useful to give the game a mode where it'd load them all from text on startup).
Video games are made by a lot of non-programmers who will be much more comfortable adjusting values in a text file than they are hex editing something.
Besides, the complaint about not having a heavyweight parser here is weird. This is supposed to be "trusted data"; you shouldn't have to treat the file as a threat, so a single-line sscanf that just dumps parsed CSV attributes into memory is pretty great IMO.
Definitely initialize variables when it comes to C though.
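Here is a sketch of that advice applied to a one-line `sscanf` read. It assumes a hypothetical comma-separated `name, mass, seats` line format, which is not the game's actual file layout. Every field starts from a default. The code also checks the return value of `std::sscanf`, which is the number of fields it actually assigned, so a short line can't leave uninitialized values behind.

```cpp
#include <cstdio>

// Hypothetical record for one line of a trusted data file, e.g.
// "skimmer, 1400.0, 2". Default member initializers mean no field ever
// starts out as leftover stack contents.
struct LineData {
    char name[32] = "";
    float mass = 0.0f;
    int seats = 0;
};

// Returns true only if all three fields were parsed. On failure, `out` is
// left untouched, so the caller keeps its defaults or previous values.
bool parseLine(const char* line, LineData& out) {
    LineData tmp;
    // %31[^,] reads at most 31 non-comma characters, leaving room for '\0'.
    int n = std::sscanf(line, " %31[^,], %f, %d", tmp.name, &tmp.mass, &tmp.seats);
    if (n != 3) {
        // Fewer than 3 assignments (or EOF): the line is incomplete.
        return false;
    }
    out = tmp;
    return true;
}
```

For example, `parseLine("skimmer, 1400.0", d)` returns `false` instead of handing back a `seats` value that was never written.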
Wow, I had no idea YAML was that old. I always thought it was created sometime around when CI/CD became popular. Now I'm really curious how it ended up as a superset of JSON.
Vouching seems to be time-lagged and to require more than one vouch.
XML was everywhere.
The flaw isn't the language. The issue is a 0.5x programmer not knowing to avoid sscanf() and failing to default and validate the results. This could be handled competently with strtok() parsing the lines without needing a more complicated file format.
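Here is one way the `strtok()` approach could look, as a hedged sketch. The four-float `Entry` record and its delimiter set are assumptions for illustration. Each token is validated with `std::strtof`, and missing fields keep their defaults. `count` records how many fields were really present, so the caller can reject an incomplete line instead of running with garbage.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical fixed-size record; every field starts from a known default.
struct Entry {
    float values[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    int count = 0;  // number of fields that were present and valid
};

// Splits a writable line on commas and whitespace, then validates each
// token. A token that is not entirely a number (e.g. "abc" or "1.5x")
// stops parsing rather than being silently accepted.
Entry parseEntry(char* line) {
    Entry e;
    const char* delims = ", \t\r\n";
    for (char* tok = std::strtok(line, delims);
         tok != nullptr && e.count < 4;
         tok = std::strtok(nullptr, delims)) {
        char* end = nullptr;
        float v = std::strtof(tok, &end);
        if (end == tok || *end != '\0') {
            break;
        }
        e.values[e.count++] = v;
    }
    return e;
}
```

`strtok` is used here because the comment names it. It writes NULs into its input and keeps hidden static state, so it is not thread-safe; on POSIX systems `strtok_r` is the reentrant alternative.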
Worked fine on the target machines, and the "0.5x programmer" got to see their family over the winter holidays. Or are you saying they should have defensively programmed around a bug that would manifest 21 years later and skipped seeing their family during crunch time?
To be honest, I just don't like how you disparaged the programmer out of context. Talk is cheap.
Using a well-written third-party library would not increase the development time; it would in fact reduce it. No risk of missing Christmas there.
Well-written third-party serialization libraries weren't exactly easy to come by 20 years ago, at least from what I can recall. Your best bet was an XML library, but XML parsing was quite resource-heavy. Many that seemed well designed turned out to be security nightmares (Java serialization).
I disagree. JSON is 25 years old, and SAX parsers are 22. A SAX parser is the opposite of “resource heavy”, since it is event driven. The parser does not maintain complex state, although you might have to manage some state in order to correctly extract your objects from the XML. Granted, it wouldn’t have integrated nicely with C to generate the parser code from your struct definition back then, but the basics were there.
But it is even more important for today’s game studios to see and understand the mistakes that yesterday’s studios made. That’s the only way to avoid making them all over again.
> JSON is 25 years old
And in 2004, didn't have a published specification, or much use outside of webdev (which hadn't eaten the world yet).
> and SAX parsers are 22
And, especially at the time, pretty much exclusive to Java, right?
Put another way, which are the high-quality open-source implementations of those formats that the developers should've considered while working on SA in 2003 and 2004? Or for that matter, in the 2001-2002 timeframe, when the parsing code was probably actually written for use in VC?
I’ll be the first to defend the greybeards I’ve befriended and learned from in AAA, but having seen codebases of that age and earlier, I can say the “meta” around game development was different back then. I think the internet really changed things for the better.
Your average hire for the time might have been self-taught with the occasional C89 tutorial book and two years of DigiPen. Today’s graduates going into games have fallen asleep to YouTube lectures of Scott Meyers and memorized all the literature on a fixed timestep.
OTOH, the Internet has meant that nothing is ever finished; there's always an update to download.