Glitzersachen
On glitter things found at the road side
Painting Lipstick on a Pig
Actually this is retro computing cos-play
I have recently come back to an idea that first occurred to me 27 years ago (yes, that long). It's not something you can make money with or disrupt an industry with. The time of this idea is long gone. And as it is, this late in history, it doesn't enable me or anybody else to write better programs or to write programs more quickly.
It is, in a sense, part of a future once envisioned that then never came to pass, just as so much of the present contains broken futures encysted at its core. Which would be a topic for another time.
For now, see it as an exercise in retro computing: Interesting things that could have been done with the means and methods of the past, but were somehow missed.
At the end of this post I'll explain the application of the lipstick on a pig metaphor a bit more.
The problem with p(l)ain C
It all started with the gradual realization that, when programming plain C, people will implement the same data structures again and again. For example safely dynamically growing and shrinking strings, which every modern programming language has, which indeed even UCSD Pascal already had. Instead, even today, in C, people are copying around sequences of characters in raw, unprotected memory.
This seems to be the case generally for every interesting data structure which would make high level programming possible in the first place: Trees, lists, buffers, vectors, graphs. They are not in the standard library, so everything has to be either implemented from scratch or integrated from third-party libraries, which often have their own, often conflicting, ideas about foundational concepts like memory management or input and output.
But what happens when, in the heat of implementing something else entirely, a developer finds themselves in sudden need of a string data type? Will they say, oh wait, put this on hold, I'll first extend my repertoire and find myself a proper string library? And then it's not a given they will actually find one, with the offerings in the plain C sector being of such mixed quality…
Hardly.
They'll instead say: Look, I only need to paste together a
couple of strings into another one, I'll just do this with strcpy and
friends and explicitly managed memory and be done, looking for
another library just doesn't pay here.
And things will grow from there and not in a good, but instead in a buggy way. It's like a cancer.
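To make the "strcpy and friends" shortcut concrete, here is a minimal sketch of what it tends to look like. The function name join_greeting is made up for illustration. Note how the buffer size, the terminator and the later free() all have to be gotten right by hand, at every place where strings are pasted together this way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds "<greeting>, <name>!" in freshly
   malloc'ed memory. The caller has to free() the result. */
char *join_greeting(const char *greeting, const char *name)
{
    assert(greeting && name);

    /* Sizes counted by hand: 2 for ", ", 1 for "!", 1 for the '\0'.
       Forgetting any of them is a buffer overflow. */
    size_t len = strlen(greeting) + 2 + strlen(name) + 1 + 1;
    char *out = malloc(len);
    if (!out)
        return NULL;

    strcpy(out, greeting);
    strcat(out, ", ");
    strcat(out, name);
    strcat(out, "!");
    return out;
}
```

Nothing here is difficult, but every additional fragment that gets pasted in means revisiting the length computation, and nothing but discipline keeps the two in sync.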
Or the problem with memory ownership, which is so often unclear, even in interface descriptions. It's also too often not managed by libraries, which don't want to make (and perhaps shouldn't make) assumptions about memory ownership.
For example:
result_t foobar(){
    blah* p = malloc(froom_count * sizeof(froom));
    register_froom_buffer(p);
    /* ??? */
}
The question here: Do we, at the place marked with ???, need to
de-allocate the memory p points to — or is this even
contra-indicated here?
The answer really depends on what register_froom_buffer does: Does it
take ownership of the memory, e.g. storing p in some module internal
data structure and will free all memory registered in this structure
later when shutting down the froom_buffer subsystem, or does it just
read the data in the memory without de-allocating it (neither when
calling register_froom_buffer nor later)?
Whether we can really decide this question depends also a lot on
the quality of documentation available for register_froom_buffer.
Another question: Are we allowed to pass any pointer into
register_froom_buffer, or does the memory have to be allocated with
malloc? It might make a difference, if the froom_buffer subsystem
wants to de-allocate the memory at some time.
What all this makes clear: Merely allocating memory in the client procedure and passing it into some library call already opens up a host of questions, each of which can be answered wrongly and so become a potential source of bugs.
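One common convention for making these questions answerable at the call site is to state the ownership rule in the function name and in a comment. The following is a sketch under assumptions only: froom_registry, registry_adopt, registry_copy_in and registry_shutdown are invented names, and froom is reduced to a dummy struct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int id; } froom;     /* hypothetical element type */

typedef struct {
    froom *data;                      /* owned by the registry */
    size_t count;
} froom_registry;

/* Takes ownership: p must come from malloc(), and the caller
   must NOT free it afterwards. */
void registry_adopt(froom_registry *r, froom *p, size_t count)
{
    free(r->data);                    /* drop what was registered before */
    r->data = p;
    r->count = count;
}

/* Borrows: the data is copied, the caller keeps ownership of p,
   which may point anywhere (stack, static, heap). Returns 0 on success. */
int registry_copy_in(froom_registry *r, const froom *p, size_t count)
{
    assert(count > 0);
    froom *copy = malloc(count * sizeof *copy);
    if (!copy)
        return -1;
    memcpy(copy, p, count * sizeof *copy);
    free(r->data);
    r->data = copy;
    r->count = count;
    return 0;
}

/* Frees everything the registry owns. */
void registry_shutdown(froom_registry *r)
{
    free(r->data);
    r->data = NULL;
    r->count = 0;
}
```

This does not make mistakes impossible (the compiler checks none of it), but it at least moves the answer from the documentation into the code the reviewer is looking at.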
So we have, overall, two categories of pain points:
- No nice data structures.
- Managing memory manually in clients.
C++ is not the answer, BTW
And no, C++ is not the answer. Or rather, it might be the answer in one sense, but then you are substituting small ugly problems by big elegant ones, which have hidden foot-guns attached so powerful they don't take off only your hand, but both your feet and blind you at the same time.
(I think the message in this simile is, that using C++ might not be a survival trait)
Which is to say, C++ is not a beginners' language either, but this would exactly be what the industry as much as open source development needs: A language that minimizes risk from inexperienced and hobby developers as much as from experienced developers with a hangover or a bout of inattentiveness due to the third child teething (if you know what I mean).
Painting lipstick on the plain C pig won't fix these problems either (or not to any meaningful extent), but I thought I should get the C++ elephant out of the room at this point in the exposition already.
Back to pig lip-sticking, then.
The str API
So let's think a bit about the API we want to have for a string data
type str that does away with the problems we have so fuzzily
identified so far.
#include "str.h"
#include <stdio.h>

str s;
str_init(&s, "Hello");                   /* 1 */
str_append_from_cstr(&s, ", Dave!");     /* 2 */
printf("%s\n", str_internal_cstr(&s));
str_deinit(&s);                          /* 3 */
The interesting features here are:
- No explicit memory management: The memory is internally allocated at (1) and de-allocated at (3). If necessary, the internally allocated buffer will be grown in (2).
- Correct memory management can be (in this example) reviewed in local
scope and memory is actually "bound to the scope". Every
initializing function of str, here str_init, must be followed by an
invocation of a de-initializing function (str_deinit).
Improvements on these ideas are possible, but this exposition should be enough for now.
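To show that nothing magical is needed behind this API, here is a minimal sketch of how the four functions used above could be implemented. The struct layout, the growth policy and the helper str_reserve are assumptions made for this illustration; the actual library may look quite different.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout, for illustration only. */
typedef struct str {
    char  *data;       /* always '\0'-terminated once initialized */
    size_t length;     /* number of characters, without the terminator */
    size_t capacity;   /* allocated bytes, including the terminator */
} str;

/* Hypothetical helper: grow the buffer so that `needed` bytes fit. */
static void str_reserve(str *s, size_t needed)
{
    if (needed <= s->capacity)
        return;
    size_t cap = s->capacity ? s->capacity : 16;
    while (cap < needed)
        cap *= 2;
    char *p = realloc(s->data, cap);   /* realloc(NULL, n) acts like malloc(n) */
    assert(p);
    s->data = p;
    s->capacity = cap;
}

void str_append_from_cstr(str *s, const char *cstr)
{
    size_t n = strlen(cstr);
    str_reserve(s, s->length + n + 1);
    memcpy(s->data + s->length, cstr, n + 1);   /* copies the '\0' as well */
    s->length += n;
}

void str_init(str *s, const char *initial)
{
    s->data = NULL;
    s->length = 0;
    s->capacity = 0;
    str_append_from_cstr(s, initial);
}

const char *str_internal_cstr(const str *s)
{
    return s->data;
}

void str_deinit(str *s)
{
    free(s->data);
    s->data = NULL;
    s->length = 0;
    s->capacity = 0;
}
```

The client never sees malloc, realloc or free; the only rule left to review is the pairing of str_init and str_deinit.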
I might already have published an early version of this library, possibly 15 – 20 years ago, but I am not sure.
Templates for C
One thing that makes building general data structures for plain C so difficult is that C doesn't have parametric polymorphism. Simply speaking: One cannot typedef a vector of ANY type that could then be instantiated or specialized into a vector of char or a vector of PIDs.
There are various workarounds, the most popular being that the
implementation of vector of ANY works on untyped data (e.g. void*) and
it's up to the client to implement a type-safe wrapper in front of
that to get a vector of a specific element type.
All of these workarounds are prone to mistakes and are a breeding ground for bugs.
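The void* workaround can be sketched as follows. All names (vec_any, vec_int and their functions) are invented for this illustration. The untyped core only knows element sizes, and the thin wrapper at the end has to be written by hand once for every element type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Untyped core: knows the element size, not the element type. */
typedef struct {
    void  *data;
    size_t elem_size;
    size_t count;
    size_t capacity;
} vec_any;

void vec_any_init(vec_any *v, size_t elem_size)
{
    v->data = NULL;
    v->elem_size = elem_size;
    v->count = 0;
    v->capacity = 0;
}

void vec_any_push(vec_any *v, const void *elem)
{
    if (v->count == v->capacity) {
        size_t cap = v->capacity ? v->capacity * 2 : 4;
        void *p = realloc(v->data, cap * v->elem_size);
        assert(p);
        v->data = p;
        v->capacity = cap;
    }
    memcpy((char *)v->data + v->count * v->elem_size, elem, v->elem_size);
    v->count++;
}

void *vec_any_at(vec_any *v, size_t i)
{
    assert(i < v->count);
    return (char *)v->data + i * v->elem_size;
}

void vec_any_deinit(vec_any *v)
{
    free(v->data);
    v->data = NULL;
    v->count = 0;
    v->capacity = 0;
}

/* Hand-written type-safe wrapper for int. */
typedef struct { vec_any raw; } vec_int;

void vec_int_init(vec_int *v)         { vec_any_init(&v->raw, sizeof(int)); }
void vec_int_push(vec_int *v, int x)  { vec_any_push(&v->raw, &x); }
int  vec_int_at(vec_int *v, size_t i) { return *(int *)vec_any_at(&v->raw, i); }
void vec_int_deinit(vec_int *v)       { vec_any_deinit(&v->raw); }
```

A single wrong sizeof or cast in the wrapper compiles without complaint, which is exactly the kind of mistake meant above.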
So this is the third pain point with C: No parametric polymorphism.
Actually no systematic polymorphism of any kind, only ad-hoc point solutions whose proper implementation Joe Average C Programmer is only able to do after a substantial number of years of experience with the beast.
One (at least partial) solution for this particular weak spot of plain C would be templates into which one could, for example, just plug in the required element type of a container. Like in C++.
Let's for a moment ignore the pesky details of how to make such a template mechanism reasonably taste like C instead of an alien wart on the language.
Instead let's focus for a minute on the question: What does it have to do with the string data type presented above?
Pretty simple (and you might already have guessed it): A string is mostly (apart from a partly different terminology of operations) a dynamic buffer of characters (we're not talking about UTF-8 here).
But dynamic buffer of <type> is a generic datatype where the element type <type> is the type parameter we want to plug in at instantiation time.
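For readers who have never seen a type parameter being "plugged in" in plain C, the crudest possible way is a preprocessor macro that stamps out the whole data type. This is explicitly not how c4-generics works; it is the classic text-substitution trick, shown only as a baseline, and DEFINE_DYNBUF, charbuf and longbuf are names invented for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* DEFINE_DYNBUF(name, type) expands to a dynamic buffer of `type`
   together with its init, push and deinit functions. */
#define DEFINE_DYNBUF(name, type)                                      \
    typedef struct { type *elements; size_t count, capacity; } name;   \
    void name##_init(name *b)                                          \
    {                                                                  \
        b->elements = NULL;                                            \
        b->count = 0;                                                  \
        b->capacity = 0;                                               \
    }                                                                  \
    void name##_push(name *b, type e)                                  \
    {                                                                  \
        if (b->count == b->capacity) {                                 \
            size_t cap = b->capacity ? b->capacity * 2 : 8;            \
            type *p = realloc(b->elements, cap * sizeof *p);           \
            assert(p);                                                 \
            b->elements = p;                                           \
            b->capacity = cap;                                         \
        }                                                              \
        b->elements[b->count++] = e;                                   \
    }                                                                  \
    void name##_deinit(name *b)                                        \
    {                                                                  \
        free(b->elements);                                             \
        b->elements = NULL;                                            \
        b->count = 0;                                                  \
        b->capacity = 0;                                               \
    }

DEFINE_DYNBUF(charbuf, char)   /* dynamic buffer of char */
DEFINE_DYNBUF(longbuf, long)   /* same "template", other element type */
```

It works, but the generic code lives inside a macro body with backslashes at every line end, and compiler errors point at the expansion site rather than at the offending line.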
I already have an implementation of templates for C, called c4-generics. A generic stack implementation (only a demo) is in the examples. And an implementation of a generic dynamic buffer and a string type based on this, is almost finished, but not yet published.
Let me give you a taste of the buffer template and of the string instantiation on top of that.
First the implementation (here: working on untyped data):
typedef struct abuf_ {
    char* elements;
    size_t count;
    size_t capacity;
} abuf_;

void abuf_init_(abuf_descriptor_t* d, abuf_* b){
    b->elements = malloc(d->element_size * d->initial_capacity);
    assert(b->elements);
    b->count = 0;
    b->capacity = d->initial_capacity;
}

void abuf_deinit_(abuf_descriptor_t* d, abuf_* b){
    free(b->elements);
    b->elements = 0;
    b->count = 0;
    b->capacity = 0;
}

void abuf_append_from_c_array_(abuf_descriptor_t* d, abuf_* b,
                               void* elements, size_t count){
    assert(count>=0);
    if (count) {
        ensure_free_capacity(d, b, count);
        memcpy(element_address(d, b, b->count),
               elements, d->element_size * count);
        b->count += count;
    }
}
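The fragment above calls two helpers, ensure_free_capacity and element_address, that are not shown. What follows is a guess at what they might do, not the actual code: the descriptor layout is inferred only from the two fields used above (element_size and initial_capacity), and the doubling growth policy is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Inferred from the fields used in the fragment; the real type
   may carry more information. */
typedef struct abuf_descriptor_t {
    size_t element_size;
    size_t initial_capacity;
} abuf_descriptor_t;

typedef struct abuf_ {
    char  *elements;
    size_t count;
    size_t capacity;
} abuf_;

/* Address of the slot with index i, computed from the element size. */
void *element_address(abuf_descriptor_t *d, abuf_ *b, size_t i)
{
    return b->elements + i * d->element_size;
}

/* Make sure `extra` more elements fit, doubling the capacity as needed. */
void ensure_free_capacity(abuf_descriptor_t *d, abuf_ *b, size_t extra)
{
    size_t needed = b->count + extra;
    if (needed <= b->capacity)
        return;
    size_t cap = b->capacity ? b->capacity : 1;
    while (cap < needed)
        cap *= 2;
    char *p = realloc(b->elements, cap * d->element_size);
    assert(p);
    b->elements = p;
    b->capacity = cap;
}
```

The point of the descriptor is that this code is compiled exactly once; the element type only enters as a number of bytes.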
The following is the template code that gets "expanded" when instantiating the template:
generic_type_interface(abuf);
require_type(eT);

exports(init);
exports(deinit);
exports(push);

typedef struct abuf {
    abuf_ internal;
} abuf;

inline static void abuf_init(abuf* b);
inline static void abuf_deinit(abuf* b);
inline static void abuf_append_from_c_array(abuf* b, abuf_eT* e, size_t count);

inline static void abuf_init(abuf* b){
    abuf_init_(&abuf_descriptor, &(b->internal));
}

inline static void abuf_deinit(abuf* b){
    abuf_deinit_(&abuf_descriptor, &(b->internal));
}

inline static void abuf_append_from_c_array(abuf* b, abuf_eT* e, size_t count){
    abuf_append_from_c_array_(&abuf_descriptor, &(b->internal), e, count);
}
And the following is the instance str of abuf, which implements the string type str:
instance(str, abuf);
type(eT, unsigned char);
const(initial_capacity, 10);

inline static void str_append_from_cstr(str* s, str_eT* cstr){
    str_append_from_c_array(s, cstr, strlen(cstr));
}
Of course I am only presenting fragments here, but you will have to admit that it's overall pretty impressive how much this all smells like C (instead of text substitution).
The typical Makefile is also nice and clean as far as make-files go.
Why it's all so misguided
So where does this leave us? (And this is where we get back to the question of why it's a pig we have been lip-sticking all along.)
We have, and this needs to be granted, started with a better way to implement re-usable libraries for C (and in C), which in turn should give us better productivity and notably safer and more easily reviewable code in C (like this scope-based resource handling by paired init/deinit function calls).
Fine. Though, there will always be problems left which the compiler (and simple preprocessing based implementations) cannot catch. Also (compile or runtime) errors in generated code are really difficult to debug, so any wins with this method will likely be limited.
And in the end I wouldn't want to go the way of C++ (initially C with classes): Building a complete, supposedly upward compatible language "in front" of C. This experiment has been conducted already (at least twice) and brought only more pain into the world at large.
On the other hand, I recently noticed how many new libraries and tools are released by the Rust open source community, and this despite the fact that there are likely many more C or C++ developers out there than Rust developers.
My conclusion: Rust enables much higher productivity than C, where practically every project gets hit right at the beginning by the problem of the missing safe string type.
OK, that's hyperbole, but you get the idea: The barrier of entry is much higher in C or C++:
1. You need to be a seasoned developer to get something working.
2. Reusability (writing reusable components as well as using them) is much more difficult.
(1) accounts (at least partly) for more available developer time. (2) turbo-boosts development in Rust through a shorter time from first code to "compiles and does not crash", as well as through the easy availability of libraries ready-made by others.
The package/build manager situation likely also has some influence on (2): C will die from the lack of something like cargo allowing people to build a local environment easily (and with low demand on their skills).
And this is why investing time in improving the situation of C is wasted: Rust already has all the answers, and is well integrated, with a friendly compiler, a sufficiently powerful type system, a build system tailored to the language and an ecosystem with a large number of useful libraries.
Trying to improve C by external tools is possible (as argued above), but it's like painting lipstick on a pig: The pig is still ugly and still smells, despite looking marginally better.
If I want to develop in a C style language with explicit memory management and the option of low level access, these days the best angle of approach is to learn Rust.
Mind you, I don't regret these experiments at all. I might even add one thing or another to them in my private pocket universe in which these efforts live (the same one in which I am booting Unix v4 from virtual tapes on virtual PDP-11s).
But: The future is decidedly Rust. As I said: Have a look at the productivity of Rust projects. It's striking.
Article version history
- — Light copy-editing.
- — First published version.