> So in this sense, the quality of the C you write is really a reflection of you as a C programmer, not the shortcomings of the language.
Can't you substitute "C" with just about anything in this sentence?
It's all well and good to talk about how "beautiful" a language is, but when people are literally endangered because of totally preventable security vulnerabilities that don't happen in programs written in other languages, it's hard to sway me as to how important this so-called "beauty" is.
(Note: I'm playing devil's advocate here to some extent. My view is that safety is important, but lack of provable safety is not some terrible Demogorgon that we should hide in fear from. I think a lot of the concern over safety is valid, but in some contexts it's just overhyped.)
My view is that lack of provable safety should be resolved by defensive code (runtime checks). Then you are safe (if safety is important in your code, which it probably should be by default in a professional setting).
I agree, it is solvable by defensive code. The vast majority of the time that code is perfectly sufficient. Every day, embedded C programs do hundreds of thousands of things without crashing or blowing apart because of memory safety bugs, and nobody dies; that alone demonstrates this. I don't think people understand just how much of our world is run, quite literally, by "not provably safe" code. It's not just C and C++, either.
Which is one reason why I don't buy the "memory safety" argument as a very strong one for adopting Rust. There are other much better reasons to do so for a certain class of programming, in my opinion.
Vulnerabilities like buffer overflow do not happen in languages with a string type. Humans are responsible if something bad happens, but without a safety net, the outcome is worse.
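To make the "string type" point concrete in C itself, here is a minimal sketch of a string that carries its own capacity and length, so that appending can refuse to overflow instead of writing past the buffer. The names `bstr`, `bstr_init`, and `bstr_append` are hypothetical, made up for this illustration; nothing like this is provided by standard C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded string: buffer, capacity and current length travel
   together, so every operation can check bounds before writing. */
typedef struct {
    char  *buf;   /* storage, always kept NUL-terminated */
    size_t cap;   /* total bytes available in buf, including the NUL */
    size_t len;   /* characters currently stored, excluding the NUL */
} bstr;

/* Wrap caller-provided storage; fails if there is no room even for the NUL. */
static bool bstr_init(bstr *s, char *storage, size_t cap)
{
    if (cap == 0)
        return false;
    s->buf = storage;
    s->cap = cap;
    s->len = 0;
    s->buf[0] = '\0';
    return true;
}

/* Append src only if it fits; otherwise leave s untouched and report
   failure instead of overflowing the buffer. */
static bool bstr_append(bstr *s, const char *src)
{
    size_t n = strlen(src);
    if (n >= s->cap - s->len)              /* need n bytes plus one for the NUL */
        return false;
    memcpy(s->buf + s->len, src, n + 1);   /* also copies the terminating NUL */
    s->len += n;
    return true;
}
```

The check is a runtime one, i.e. the "safety net"; note that it still trusts `src` itself to be NUL-terminated.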
C has a char* type, which we call a string, but it is also the type of a pointer to a single char, which is not a string at all, and also something perfectly usable. "Ends with nul" is barely a part of C, it's more like a programmer's agreement. The language doesn't enforce it, require it, or check it. All it does is insert nul characters in literals, which is hardly enough to make a string type.
Thus if you have a to_upper(char*) function, you don't know what it takes or does without looking it up. Does it uppercase a single character or a whole string? How do you even tell what you were passed without potentially reading past the end of a buffer?
If I happen to have a pointer-to-char and pass it to a to_upper function that operates on strings, it will just write on invalid memory, because C can't distinguish between the two.
From the signature, I would say it expects a NUL-terminated sequence of characters (a C-string) and it would modify it in-place to upper case each character. C already has a standard function:
extern int toupper(int);
(via #include <ctype.h>) that will upper case a single character. If, on the other hand, I saw:
extern char *to_upper(const char *);
I would expect that to_upper() returns a new string (freeable via a call to free()) that is the upper case version of the given string.
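A minimal sketch of the `to_upper()` described here, assuming the caller releases the result with `free()`: it allocates a copy and upper-cases it one character at a time with the standard `toupper`. The `unsigned char` cast matters because passing a negative `char` value (other than `EOF`) to `toupper` is undefined behavior. This is one plausible implementation of a hypothetical function, not a standard one.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical to_upper(): returns a newly allocated upper-case copy of s,
   or NULL if allocation fails. The caller owns the result and must free() it. */
char *to_upper(const char *s)
{
    size_t n = strlen(s);
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = (char)toupper((unsigned char)s[i]);  /* cast avoids UB for negative chars */
    out[n] = '\0';
    return out;
}
```

Usage: `char *u = to_upper("some text");` yields `"SOME TEXT"`, which the caller later passes to `free(u)`.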
> If I happen to have a pointer-to-char and pass it to a to_upper function that operates on strings, it will just write on invalid memory, because C can't distinguish between the two.
Um ... how do you "happen" to have a pointer-to-char? And unknowingly call to_upper()? I'm lost as to how this can happen ...
The signature doesn't tell you that. If my API said
int frobnicate(char*)
and you make that kind of assumption, then your code may or may not work, depending on what the function does internally. You simply do not know whether I am operating on null-terminated char sequences or a single char.
>Um ... how do you "happen" to have a pointer-to-char?
char* text = "some text";
char* c = text[2]
There you go.
>And unknowingly call to_upper()?
Who said anything about unknowingly calling a function? It's "toupper", not "string_to_upper" or "char_to_upper". The function signature simply doesn't tell you what the function requires of its input.
Your response to me shows you don't program in C all that much. I ran your code example through a C compiler and got:
a.c:2: warning: initialization makes pointer from integer without a cast
a.c:2: error: initializer element is not constant
What you really want is:
char * text = "some text";
char * c = &text[2];
which still doesn't prove your point because c is still pointing to a NUL-terminated string.
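Both readings of the corrected snippet can be checked directly: `c` points at the single character `'m'`, and, because the literal is NUL-terminated, it is also the start of the string `"me text"`. The helper `suffix_from` is a hypothetical name used only for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the i-th character of text (caller ensures
   i <= strlen(text)). Whether the result is "one char" or "a string"
   is decided only by how the caller uses it. */
static const char *suffix_from(const char *text, size_t i)
{
    return &text[i];   /* same as text + i */
}
```

With `text = "some text"`, `*suffix_from(text, 2)` is `'m'`, while `strlen(suffix_from(text, 2))` is 7.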
If frobnicate() really takes a single character, I might ask why it requires a pointer to char instead of:
int frobnicate(char);
but if you are going to really argue that point, so be it. Discard the fact that in idiomatic C, a char * is generally considered a NUL-terminated string (and here I'm talking ANSI C and not pre-ANSI C where char * was used where void * is used today).
You are also shifting the argument, because in your original comment I replied to, the function you gave was to_upper(). toupper() is an existing C function.
P.S. char * is a pointer-to-character, not a "pointer-to-byte", pedantically speaking. All the C standard says is that a 'char' is, at minimum, 8 bits in size. It can be larger. Yes, there are current systems where this is true.
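The size guarantees mentioned here can be written down as compile-time checks. As a sketch: `sizeof(char)` is 1 by definition, and `CHAR_BIT` from `<limits.h>` (the number of bits in a char) is at least 8 but may be larger. `bits_per_char` is a hypothetical helper.

```c
#include <limits.h>
#include <stddef.h>

/* Guaranteed on every conforming C11 implementation. */
_Static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
_Static_assert(CHAR_BIT >= 8, "a char has at least 8 bits");

/* Report this platform's char width: 8 on typical desktop systems,
   larger on some DSPs. */
static size_t bits_per_char(void)
{
    return (size_t)CHAR_BIT;
}
```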
A single typo doesn't tell you anything about my programming habits.
>which still doesn't prove your point because c is still pointing to a NUL-terminated string.
No, it's pointing at a char that happens to be part of a nul-terminated string. The semantic intent of that distinction is entirely lost because C fails to make a distinction. I could easily overwrite that nul, and it would no longer be the case. Then it's suddenly an array of chars, and everything pointing at it is now a new type of thing.
char* s = (char*) rand();
This also will point at a 'nul terminated string' with very high probability. Doesn't mean it is safe to call string functions on it...
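Once the terminating nul is overwritten, nothing in the type records that the buffer is no longer a string. One defensive pattern, sketched here under the assumption that the caller knows the buffer's capacity, is to look for the terminator only within that bound using the standard `memchr`. The name `is_terminated_within` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if buf contains a NUL within its first cap bytes, i.e. it can be
   treated as a C string without reading past the end of the buffer. */
static bool is_terminated_within(const char *buf, size_t cap)
{
    return memchr(buf, '\0', cap) != NULL;
}
```

For `char buf[4] = "abc";` this returns true; after `buf[3] = 'd';` the type is still `char[4]`, but the check now returns false, so string functions must not be called on it.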
>I might ask why the function requires a pointer to char for a single character instead of int frobnicate(char)
You could say the same about any pointer argument. Obviously pointers are useful for a reason. If frobnicate returned a char, I would just end up dereferencing a pointer to stick it back in the string it came from. Whether that is frobnicate's job or its caller's job is a matter of API design, and should not be determined by C, especially when it makes no preference for any other kind of pointer.
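The two API shapes being contrasted might look like the sketch below. Both function names are hypothetical, and the actual "frobnication" is stood in for by incrementing the character, purely for illustration.

```c
/* Value-based design: transform one char; the caller decides where the
   result goes (e.g. back into the string it came from). */
static char frobnicate_value(char c)
{
    return (char)(c + 1);   /* placeholder transformation */
}

/* Pointer-based design: update the char in place, so it can be aimed at
   any element of a larger buffer. */
static void frobnicate_at(char *p)
{
    *p = frobnicate_value(*p);
}
```

With `char s[] = "012";`, the caller-does-the-store version is `s[1] = frobnicate_value(s[1]);` and the in-place version is `frobnicate_at(&s[0]);`. The result is the same; only the division of responsibility differs.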
>You are also shifting the argument, because in your original comment I replied to, the function you gave was to_upper
My arbitrary example function name doesn't matter one iota. Get over it, and stop being needlessly dense.
Don't worry about me, I never make any mistakes. I'm a true C programmer: I believe that "implement a good string type" is an unsolved problem and that the last 50 years never happened.
You are right, "do not happen" sounds too much like "will never happen". See also Wikipedia's entry about that example[0]. My point is that if the programmer can't prove accesses are always within appropriate bounds, there should be a runtime check. That is simple. This is not "slow" (and even when you need it to be fast and are OK with random crashes, skipping the checks should be explicit). And some languages do it by default and make it really hard to mess with memory.
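In C, the policy described here (checked by default, unchecked only on explicit request) has to be built by convention. A minimal sketch, assuming a hypothetical `int_slice` type: `slice_get` checks the index and aborts on a bad one, while `slice_get_unchecked` keeps the raw `a[i]` behavior for hot paths and says so in its name.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fat pointer: data plus the number of valid elements. */
typedef struct {
    int   *data;
    size_t len;
} int_slice;

/* Default accessor: verifies the index at runtime and stops the program
   deterministically instead of reading out of bounds. */
static int slice_get(int_slice s, size_t i)
{
    if (i >= s.len) {
        fprintf(stderr, "index %zu out of bounds (len %zu)\n", i, s.len);
        abort();
    }
    return s.data[i];
}

/* Explicit opt-out: the caller promises i < s.len; no check is made. */
static int slice_get_unchecked(int_slice s, size_t i)
{
    return s.data[i];
}
```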
Well, yes, I agree in general bounds should be checked at runtime when it isn't possible to statically verify access at compile time.
I'm not sure how default access in C or C++ isn't explicitly avoiding checks. By definition "a[b]" is an unchecked dereference. It doesn't get more explicit than "by definition." Of course if by "explicit" you mean "syntax exists that demarcates unchecked access" then C and C++ will never satisfy. I'd argue that's a contrived and artificially narrow use of "explicit" meant, er, explicitly to exclude C and C++ from being acceptable by definition and therefore not terribly fair.
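The "by definition" point can be seen directly: the C standard defines `E1[E2]` as `(*((E1)+(E2)))`, which is why the operands can even be swapped. A small sketch; `same_element` is a hypothetical name.

```c
#include <stddef.h>

/* a[i], *(a + i) and i[a] are the same unchecked access by definition;
   none of them knows how long the array behind a actually is. */
static int same_element(const int *a, size_t i)
{
    int x = a[i];
    int y = *(a + i);
    int z = i[a];   /* legal: i[a] means *(i + a) */
    return x == y && y == z;
}
```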
Yes (Rust's "unsafe" blocks serve the same purpose), and my point is you're narrowing the definition of "explicit" to exclude C or C++ by definition. And that isn't exactly fair, in my view.
There is no doubt that C, by definition, opts out of bounds checking. But if bounds were always checked by default (implicitly), then you would have to opt out explicitly, which is a safer approach, because all else being equal, in case of a programming mistake, the code ends up not being vulnerable to that specific kind of attack.