Arch-TK 19 hours ago

That weird feeling when you realise that the people you hang out with form such a weird niche that something considered common knowledge among you is being described as "buried deep within the C standard".

What's noteworthy is that the compiler isn't required to generate a warning if the array is too small. That's just GCC being generous with its help. The official stance is that it's simply undefined behaviour to pass a pointer to an object that is too small (yes, merely to pass it, even if you never access it).

  • loeg 12 hours ago

    The other fun wart with `static` is C++ doesn't support it. So it has to be macro'd out in headers shared with C++.

    https://godbolt.org/z/z9EEcrYT6

    • pjmlp 7 hours ago

      And probably never will, because C++ compatibility with C, beyond what was adopted initially, aims to stay as close as possible but not at the expense of better alternatives the language already offers.

      Thus std::array, std::span, std::string, std::string_view, std::vector, with hardened options turned on.

      For the static thing, the right way in C++ is to use a template parameter,

          template<typename T, int size>
          int foo(T (&ary)[size]) {
             return size;
          }
      
      -- https://godbolt.org/z/MhccKWocE

      If you want to get fancy, you might use concepts or constexpr to validate the size at compile time.

    • flohofwoe 5 hours ago

      Not surprising, and not a "wart". C and C++ have diverged since the mid-90s and are two very different languages now. E.g. trying to build C code with a C++ compiler really hasn't made much sense for about 20 years.

  • flohofwoe 5 hours ago

    A lot of "modern" C features (e.g. those added after ca. 1995) are unknown to C++ devs. I would have expected at least the Linux kernel devs to know their language, though ;)

Veserv 19 hours ago

Pointer to array is not only type-safe, it is also objectively correct and should always have been the syntax used when passing the address of a known, fixed-size array. This is all an artifact of C automatically decaying arrays to pointers in argument lists, when an array argument should always have meant passing an array by value; then this syntax would have been the only way to pass in the address of an array, and we would not have these warts. Automatic decaying is truly one of the worst actual design mistakes of the language (i.e. an error even when it was designed, not a failure to adopt later innovations).

  • jacquesm 19 hours ago

    Fully agreed, and something that is hard to fix. This guy is trying really hard and with some success:

    https://news.ycombinator.com/item?id=45735877

    • wild_pointer 19 hours ago

      This guy is doing something else completely. In his words:

      > In my testing, it's between 1.2x and 4x slower than Yolo-C. It uses between 2x and 3x more memory. Others have observed higher overheads in certain tests (I've heard of some things being 8x slower). How much this matters depends on your perspective. Imagine running your desktop environment on a 4x slower computer with 3x less memory. You've probably done exactly this and you probably survived the experience. So the catch is: Fil-C is for folks who want the security benefits badly enough.

      (from https://news.ycombinator.com/item?id=46090332)

      We're talking about a lack of fat pointers here, and switching to GC and having a 4x slower computer experience is not required for that.

      • Veserv 18 hours ago

        I am actually not talking about the lack of fat pointers. That is almost entirely orthogonal to my point. I am talking about the fact that what would have been the syntax for passing an array by value was repurposed to automatically decay into a pointer. This results in a massive and unnecessary syntactic wart.

        The fact that the correct type signature, a pointer to fixed-size array, exists, and that you can create a struct containing a fixed-size array member and pass it by value, completely invalidates any possible argument for giving fixed-size array parameters special semantics. Automatic decay should have died when it became possible to pass structs by value. Its continued existence keeps leading people to write objectively inferior function signatures (though part of this is the absurdity of C type declarations making the objectively correct type a pain to write or use, another of the worst actual design mistakes).

        Fat pointers or argument-aware non-fixed size array parameters are a separate valuable feature, but it is at least understandable for them to not have been included at the time.

        • moefh 18 hours ago

          > The fact that the correct type signature, a pointer to fixed-size array, exists and that you can create a struct containing a fixed-size array member and pass that in by value completely invalidates any possible argument for having special semantics for fixed-size array parameters.

          That's not entirely accurate: "fixed-size" array parameters (unlike pointers to arrays or arrays in structs) actually say that the array must be at least that size, not exactly that size, which makes them way more flexible (e.g. you don't need a buffer of an exact size, it can be larger). The examples from the article are neat but fairly specific because cryptographic functions always work with pre-defined array sizes, unlike most algorithms.

          Incidentally, that was one of the main complaints about Pascal back in the day (see section 2.1 of [1]): it originally had only fixed-size arrays and strings, with no way for a function to accept a "generic array" or a "generic string" with size unknown at compile time.

          [1] https://www.cs.virginia.edu/~evans/cs655/readings/bwk-on-pas...

      • jacquesm 18 hours ago

        This is not about performance.

  • jdougan 10 hours ago

    One of the nicer features of D is that arrays are value types with no decay to pointer.

o11c 19 hours ago

Better option: just wrap it in a unique struct.

There are perhaps only 3 numbers: 0, 1, and lots. A fair argument might be made that 2 also exists, but for anything higher, you need to think about your abstraction.

  • pixl97 18 hours ago
    • kalterdev 15 hours ago

      Nice article, never seen that.

      I’ve always thought it’s good practice for a system to declare its limits upfront. That feels more honest than promising “infinity” but then failing to scale in practice. Prematurely designing for infinity can also cause over-engineering—like using quicksort on an array of four elements.

      Scale isn’t a binary choice between “off” and “infinity.” It’s a continuum we navigate with small, deliberate, and often painful steps—not a single, massive, upfront investment.

      That said, I agree the ZOI is a valuable guideline for abstraction, though less so for implementation.

      • o11c 12 hours ago

        There's a reason I prefer "lots" over "infinity".

        For your "quicksort of 4 elements" example, I would note that the algorithm doesn't care - it still works - and the choice of when to switch to insertion sort is a mere matter of tuning thresholds.

    • kragen 11 hours ago

      The zero-one-infinity rule is not applicable to the number of bytes in Poly1305 nonces and ChaCha20 keys. They are exceptions.

nikeee 19 hours ago

GCC also supports using another parameter of the function as the bound (this is standard C99 VLA-parameter syntax, not a GCC extension):

    #include <stddef.h>
    void foo(size_t n, int b[static n]);
https://godbolt.org/z/c4o7hGaG1

It is not limited to compile-time constants. Clang accepts the syntax but doesn't produce the too-small warning, sadly.

  • eqvinox 12 hours ago

    It also only works with that order, not if the size is after the array :(

    • kragen 11 hours ago

      No, you can predeclare the size; this compiles with no warnings:

          #include <string.h>
          #include <unistd.h>
      
          void foo(size_t n; const char s[static n], size_t n)
          {
            write(1, s, n);
          }
      
          int main(int argc, char **argv)
          {
            foo("hello, ", 7);
            if (argc > 1) foo(argv[1], strlen(argv[1]));
            foo("\n", 1);
            return 0;
          }
      
      However, it still compiles with no warnings if you change 7 to 10!

      Clang does not support this syntax.

EPWN3D 3 hours ago

Unfortunately you cannot use static in array typedefs, which really blows. So you have to keep an extra constant around just to track the array size. If it worked on typedefs, you could make the array parameter the appropriate type and derive its count with sizeof(array_type_t).

aaaashley 20 hours ago

Funny thing about that n[static M] array checking syntax: it was considered bad even in 1999, when it was included:

"There was a unanimous vote that the feature is ugly, and a good consensus that its incorporation into the standard at the 11th hour was an unfortunate decision." - Raymond Mak (Canada C Working Group), https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_205.htm

  • jacquesm 19 hours ago

    It wasn't considered bad, it was considered ugly, and in the context given that is a major difference. The alternative proposed in that post is, to me, even uglier, so I would have agreed with the option that received the most support: leave it as it was.

    • moefh 19 hours ago

      It was always considered bad not (just) because it's ugly, but because it hides potential problems and adds no safety at all: a `[static N]` parameter tells the compiler that the parameter will never be NULL, but the function can still be called with a NULL pointer anyway.

      That is the current state of both gcc and clang: they will both happily, without warnings, pass a NULL pointer to a function with a `[static N]` parameter, and then REMOVE ANY NULL CHECK from the function, because the argument can't possibly be NULL according to the function signature, so the check is obviously redundant.

      See the example in [1]: note that in the assembly of `f1` the NULL check is removed, while it's present in the "unsafe" `f2`, making the latter actually safer.

      Also note that gcc will at least tell you that the check in `f1()` is "useless" (yet no warning about `g()` calling it with a pointer that could be NULL), while clang sees nothing wrong at all.

      [1] https://godbolt.org/z/ba6rxc8W5

      • jacquesm 18 hours ago

        Interesting, I wasn't aware of that and thought the compiler would at least throw up a warning if it had seen that function prototype.

        • moefh 18 hours ago

          It's not intuitive, although it arguably conforms to the general C philosophy of not getting in the way unless the code has no chance of being right.

          For example, both compilers do complain if you try to pass a literal NULL to `f1` (because that can't possibly be right), the same way they warn about division by a literal zero but give no warnings about dividing by a number that is not known to be nonzero.

          • jacquesm 18 hours ago

            Right, so if the value is known at compile time it will flag the error but if it only appears at runtime it will happily consume the null and wreak whatever havoc that will lead to further down the line. Ok, thank you for pointing this out, I must have held that misconception for a really long time.

      • OneDeuxTriSeiGo 18 hours ago

        Note that the point of [static N] and [N] is to enforce type safety for "internal code". Any external ABI facing code should not use it and arguably there should be a lint/warning for its usage across an untrusted interface.

        Inside of a project that's all compiled together however it tends to work as expected. It's just that you must make sure your nullable pointers are being checked (which of course one can enforce with annotations in C).

        TLDR: Explicit non-null pointers work just fine but you shouldn't be using them on external interfaces and if you are using them in general you should be annotating and/or explicitly checking your nullable pointers as soon as they cross your external interfaces.

      • MobiusHorizons 13 hours ago

        Wow, that’s crazy. Does anyone have any context on why they didn’t fix this by either disallowing NULL, or not treating the pointer as non-nullable? I’m assuming there is code that was expecting this not to error, but the combination really seems like a bug not just a sharp edge.

        • jacquesm 7 hours ago

          Indeed, at a minimum you should be able to enforce that check using a compiler flag.

kazinator 17 hours ago

The pointer-to-array solution is okay, with the caveat that pointer-to-array typedefs should be avoided.

The problem is that they are attractive for reducing repeated declarations:

  typedef unsigned char thing_t[THING_SIZE];

  struct red_box_with_a_hook {
     thing_t thing1, thing2;
  };

  void shake_hands_with(thing_t *thing);
That is all well. But thing_t is an array type which still decays to pointer.

It looks as if thing_t can be passed by value, but since it is an array, it sneakily isn't passed by value:

  void catch_with_net(thing_t thing);  // thing's type is actually "unsigned char *"

  // ...
    unsigned char x[42];
    catch_with_net(x);        // pointer to first element passed; type checks

1over137 15 hours ago

Anyone know if there's a flag to tell clang to treat `void fn(int array[N])` as if it were `void fn(int array[static N])`?