Reflections on porting C code to Rust
#19072
In recent years there's been an effort to escape from C and C++, generally motivated by memory-safety concerns and the lack of good tooling. Having programmed in these languages for years (both for fun and to actually make a living), I understand the push for something new. There have been many times where I'd be pulling my hair out thanks to someone before me (or let's be real, myself too) neglecting to use consistent types in their code, forgetting to free memory, attempting to be "clever" by abusing undefined behavior, abusing macros (and/or templates in C++), you name it. I've also dealt with the many perils of C and C++'s lack of a standard build system and means of distributing libraries, to the point where if I write any C or C++ code, I aim to have as few dependencies as possible. That's not for any pretentious philosophical reason or urge to build everything myself (although it's fun to do that), but because dependencies simply make it a pain in the ass to compile a large project across different platforms.

My company uses C (and more recently C++, but in its reduced "C with classes" dialect) for a bunch of mission-critical things related to weather data: stuff that can seriously impact people's lives if things go haywire. So the idea of Rust and everything it boasts (memory safety, "fearless concurrency", etc.) really stuck out to me. I had been meaning to try it for a while, so as part of a recent effort to modernize our almost 20-year-old codebase, I decided to start porting stuff to Rust. I usually stick to the motto of "if it ain't broke, don't fix it", but a lot of the routines we use needed to be rewritten long ago, be it to accommodate future changes to the incoming data or to add new features that wouldn't be possible with the existing code without a massive amount of refactoring anyway. To be clear: I think rewrites are usually a bad idea. I could write a whole post on why, but that's for another day...

Anyway, the first routine I decided to port is a WSR-88D Level-II radar data decoder and image generator. It was originally written in C, then later massively reworked in C++ to support new additions to the Archive-II data format, e.g. messages indicating the radar site has entered SAILS mode (where it'll interrupt a volume scan to rescan the base elevation), as well as new data types like the dual-polarization products used for determining the shape of weather phenomena, as opposed to just their intensity or motion (like reflectivity or velocity, the former being what you usually see on TV). The version I'm writing in Rust, as of now, is closer to the original C version of the routine in that it only processes reflectivity data from the base elevation scan and nothing else. As I get better at the language I hope to expand it and make a full replacement for the C++ version currently in production. I'll add that the current C++ version we're using is actually riddled with bugs, which is yet another reason I wanted to give a Rust rewrite a go lol

My program
The source code of what I ended up making is available on patchii.
Everything in the source code is based on what's written in the NOAA ROC's ICD for RDA/RPG, Build 23.0.
(also if you're a seasoned Rust developer don't expect anything amazing...)
After taking in raw Level-II data as input and parsing it based on the format described in the PDF above, it produces an image like this:
//i.fii.moe/1QXtilhXEbN7E8.png

So here are my thoughts on Rust so far, having gotten a working port of the original C code...

The good:
• The borrow checker is brilliant. I know Rust developers often complain about having to "fight" with the borrow checker, but I found that every time that it yelled at me, it was actually outlining a pretty glaring design flaw I had overlooked.
• Cargo is amazing. It's actually such a breath of fresh air to have something that feels like, say, npm but for an actual systems language. It quite literally "just worked" and I didn't have any headaches installing dependencies or configuring the project. I like that when you set up a new project with it it even goes as far as setting up a little git repo with an appropriate .gitignore, little things like that are just nice.
• While not exactly related to the language itself, I was very pleased with the official Emacs rust-mode package. Having keybindings set up for things like C-c C-c C-r => cargo run is sweet.
• I think immutability by default is great. When I write C and C++ code, I'm a const nazi and have to go out of my way to ensure that everything that can be const is const. But the thing is, and the general theme I've noticed when writing Rust, is that when you're writing C and C++ code, it's easy to get lazy. "Weird coupling between my structs is preventing me from making this const, even though the function I'm passing a pointer to this struct into doesn't actually need to read into the struct itself? Whatever, fine, ig we're passing mutable raw pointers now" is a way less likely scenario with Rust, since everything being immutable by default serves as kind of a psychological trick to leverage your laziness for the better lol
• The pattern matching syntax is great
• All numeric types being explicitly labeled as i32, u8, f32, etc. is awesome. In C and C++ you almost always have to include <stdint.h>, so it's nice to not have any fluff like that here.
• In general, as a result of coming from a "clean slate", any and all code ends up looking cleaner than its C and C++ equivalent. I believe this is just because of entropy buildup over time and the C and C++ ecosystem being so fried... if you were to take the C and C++ languages, gut their standard libraries and replace them with something more sensible, introduce something like Cargo, and adopt a strict set of conventions like Rust developers (seemingly) follow, I probably wouldn't even be commenting on this right now. Every Rust binding I've used thus far is nice, and there's just something so satisfying about importing a third-party library and being able to agree with it on how basic things like errors work.
• Lastly, the compiler output is super helpful and easy to read. No template error spam like you get with C++ hahaha
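To make a few of these points concrete (immutable bindings, explicit integer widths, pattern matching), here's a minimal sketch. The 3-byte record layout and the `RecordKind` names are made up for illustration; they are not taken from the ICD or from the actual decoder.

```rust
/// Hypothetical record kinds, invented for this example only.
#[derive(Debug, PartialEq)]
enum RecordKind {
    Status,
    Data { gate_count: u16 },
    Unknown(u8),
}

/// Parses a made-up header: one type byte, and for data records a
/// big-endian u16 after it. Returns None for empty input or a
/// truncated data record.
fn parse_header(bytes: &[u8]) -> Option<RecordKind> {
    // Slice patterns destructure the input without manual indexing,
    // and the compiler checks that every case is covered.
    match bytes {
        [1, ..] => Some(RecordKind::Status),
        [2, hi, lo, ..] => Some(RecordKind::Data {
            gate_count: u16::from_be_bytes([*hi, *lo]),
        }),
        [2, ..] => None, // data record, but too short
        [other, ..] => Some(RecordKind::Unknown(*other)),
        [] => None,
    }
}
```

Nothing here is declared `mut`, and the widths (`u8`, `u16`) are spelled out in the types, so there's no guessing about what an `int` or `short` happens to be on a given platform.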

The bad:
• The for loop syntax is good for most cases, but there were some instances where I found myself having to use a while loop instead because, say, I'm not dealing with a linear range. This could just be me not knowing the language well enough to use a better alternative, or some bias towards C-style for loops on my part, but nevertheless it irked me.
• ++ and -- are not valid increment/decrement operators. WHY?!?!?!
• You can't alias one enum variant to another while declaring the enum itself. This was a bit annoying for cases where I'd want to provide two alternate names for the same underlying value. For example, this kind of C++ code has no direct equivalent in a Rust enum, since two variants can't share the same discriminant value even if you spell each value out explicitly:

enum WaveformType
{
    WF_CONTIGUOUS_SURVEILLANCE_WITH_SZ2_PHASE_CODING,
    WF_CONTIGUOUS_DOPPLER_WITH_SZ2_PHASE_CODING,
    WF_CONTIGUOUS_SURVEILLANCE_WITHOUT_RANGE_AMBIGUITY,
    WF_BATCH,
    WF_CONTIGUOUS_SURVEILLANCE,
    WF_CONTIGUOUS_SURVEILLANCE_WITH_RANGE_AMBIGUITY,

    // Alternatively, as it's shorter and better matches the ICD:
    CS   = WF_CONTIGUOUS_SURVEILLANCE,
    CDw  = WF_CONTIGUOUS_SURVEILLANCE_WITH_RANGE_AMBIGUITY,
    B    = WF_BATCH,
    CDwO = WF_CONTIGUOUS_SURVEILLANCE_WITHOUT_RANGE_AMBIGUITY,
    SZCS = WF_CONTIGUOUS_SURVEILLANCE_WITH_SZ2_PHASE_CODING,
    SZCD = WF_CONTIGUOUS_DOPPLER_WITH_SZ2_PHASE_CODING
};


The ICD document I linked uses the names Zdr, φ, and ρ to express the differential reflectivity, differential phase, and correlation coefficient products respectively. While I'd prefer the pattern of providing both the technical name and a more human-readable one in my enum, I just opted to leave extra comments in my Rust code instead. Whatever, I can understand why they did this to an extent, since enums in Rust work drastically differently from how they do in C and C++.
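For what it's worth, one workaround that gets reasonably close to the C++ pattern is declaring the short names as associated constants on the enum. This is only a sketch: the variant list is trimmed down and the Rust-style names are illustrative, not what's in the actual source.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WaveformType {
    ContiguousSurveillance,
    Batch,
}

impl WaveformType {
    // Short aliases; each one is just another name for an existing variant.
    pub const CS: WaveformType = WaveformType::ContiguousSurveillance;
    pub const B: WaveformType = WaveformType::Batch;
}

/// Human-readable label for a waveform type.
fn describe(wf: WaveformType) -> &'static str {
    match wf {
        WaveformType::ContiguousSurveillance => "contiguous surveillance",
        WaveformType::Batch => "batch",
    }
}
```

With this, `WaveformType::CS` and `WaveformType::ContiguousSurveillance` compare equal and can be passed interchangeably, e.g. `describe(WaveformType::B)`. The aliases are constants rather than real variants, so they don't show up in `Debug` output.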

Closing thoughts
Rust is cool and I regret writing it off for so long. I figured I'd make a new thread for this to cultivate some discussion on these newer languages, cuz they don't seem to be going away any time soon and I actually think it's worth the initial time investment to give them a shot for the sake of becoming a better programmer if nothing else. I found myself realizing just how bad some of my original C++ code was over the course of writing this relatively simple routine, so I can't imagine what it'll be like once I start porting more complicated stuff over. Other languages on my radar (no pun intended) are Zig and Odin, which I may rewrite this same routine in someday to get a feel for as well.
I once shot a man in Reno, just to watch him die.
#19224

This reply is mostly about some aspects of C++ that I dislike. I think I've been in denial about the impact of C++'s inability to evolve: although I've always wanted to try out Rust, Zig, etc., I felt insecure about whether I would just be chasing trends. But my opinion on the matter is becoming increasingly clear: its model for evolution is dysfunctional.

Error handling

I don't like exceptions, and RAII is uncomfortable for me for the same reason: to raise an error in a C++ constructor you are forced to use exceptions. Error codes in C are painful in their own way; the language could be helping us so much more without being over-abstracted. Error handling is a good example, and here is a story:

In Go, if a function can fail, the client code always looks like this:

f, err := os.Open("filename.ext")

You are not forced to check err, but you are always forced to acknowledge its existence. Rust merely uses the type system to make the check mandatory. I can understand how essentially forcing a nullptr check into the type system would turn people off, because sometimes it is indeed unnecessary, but I really appreciate it. On the side, it's also nice that it's a monad.

With pattern matching being as first-class as it is in Rust, it's really not as much of a hurdle... Additionally, I think the while let Some(...) or while let Ok(...) construct specifically somehow feels like it's providing much stronger guarantees statically than while ([some ptr]), even though it's the exact same after compilation, probably.
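To make the comparison concrete, here's a rough sketch of the same idea in Rust, plus the while let construct. The function names (`parse_port`, `port_plus_one`, `drain`) are invented for the example.

```rust
use std::num::ParseIntError;

/// Parsing can fail, and the signature says so: the caller gets a
/// Result and cannot reach the u16 without dealing with the Err case.
fn parse_port(text: &str) -> Result<u16, ParseIntError> {
    text.trim().parse::<u16>()
}

/// `?` forwards the error to the caller instead of unwrapping it.
fn port_plus_one(text: &str) -> Result<u16, ParseIntError> {
    let port = parse_port(text)?;
    Ok(port.saturating_add(1))
}

/// `while let` keeps looping only while the pattern matches, so the
/// loop body never sees a missing value.
fn drain(mut stack: Vec<u32>) -> u32 {
    let mut sum = 0;
    while let Some(top) = stack.pop() {
        sum += top;
    }
    sum
}
```

Compared to the Go line above, there's no separate err variable to forget about: the value and the error live in the same Result, and you have to take it apart (with match, `?`, or similar) to get at either.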

There were some discussions about the perforated standard of C/C++ and the weak standard library of C++. This is a good demonstration: optionals and result types did get standardized, but they had to be watered down and tied to exceptions, and most of the code that you call as a client won't be using them anyway. It's a mixed bag.

I can make a similar case about safe unions. std::variant is okay now, but it took a little too long to get here (compilers were Not Good at optimizing variant access for A Long Time). Rust enums, which are straight sum types from ML-family languages like Haskell, are tagged automatically and integrate with the language through pattern matching. Zig's tagged unions are also better than std::variant.

Why is this allowed?

int x = x + 1;

Indeed, this is always a bug, so why is it merely UB instead of an error? Why wasn't it outlawed in C99, say? Again, the language standard knows nothing about memory models, so it's not like they needed to preserve some implementation's behavior. This is just one example out of potentially dozens, meant to illustrate the point about UB and, more generally, the age of the language in the current year.
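For comparison, a small sketch of how Rust treats the same line. With no earlier x in scope, `let x = x + 1;` is rejected at compile time, because the name on the right-hand side refers to a previous binding rather than the one being declared. With an earlier binding it is ordinary shadowing. The function name is made up for the example.

```rust
/// Returns the input plus one, using shadowing.
fn bump(x: i32) -> i32 {
    // The right-hand `x` is the parameter. The new `x` only comes into
    // scope after the initializer has been evaluated, so there is no way
    // to read the variable that is being declared.
    let x = x + 1;
    x
}
```

Reading a variable that was declared but never assigned (`let y: i32;` followed by a use of `y`) is likewise a compile error rather than something left to the optimizer.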

Binary inclusion

Something that I got unreasonably excited for is #embed in C23. Perhaps people are happy piping stuff around with xxd or ld, and have been doing it for decades, but it's just not for me. Why does this have to be so hard in C++? The author of std::embed (the same author who pushed for #embed in C23) has had a terrible experience pushing for it in WG21, even though it's strictly superior to #embed for C++. (Although it seems #embed made it to C++26.)

This is really not so hard, other languages have it, and do so because there is no ridiculous baggage, and are genuinely trying to solve problems.

Regex

There is a regex library in the C++ standard library, it's std::regex. However, take it from someone involved with the committee: "it is currently faster to launch PHP to execute a regex than it is to use std::regex".

In short, the standard specified constraints that make it impossible to optimize the implementation, and because the big three decided to use inline templates, it is impossible to update the standard without forcing everyone to recompile from the ground up. Sounds like a crazy reason to prevent a language from evolving? Indeed it does. But fresh recompiles are also painful, and committee members have skin in the game! It is the design of the language, in combination with the fragmentation of toolchains, that makes it so easy for projects to end up in this position. Such a change is said to incur an ABI break.

By the way, std::regex is also kaputt with Unicode input, because you were expected to use std::wregex (feel free to chuckle). Having seen the trouble it causes, UTF-8 by default in Rust just feels like it's worth it. A proposal meant to extend std::regex with UTF-8 capabilities was shot down because it also requires an ABI break, despite the fact that the standard itself says nothing about the implementations, let alone the ABI, and despite the fact that the proposal was submitted with an implementation and usage experience.

In the meantime, std::regex remains undeprecated, despite boost::regex, the library that it's based on, outperforming it by an order of magnitude.

I took this from this benchmark. cppstd is std::regex. As a side note, the go-to default in Rust, the regex crate, is so fast, on par with hyperscan, in part because it implements the same SIMD algorithm "Teddy" from hyperscan.

"Couldn't you just use a regex library that isn't silly?", yea, it's what I'd do in practice. PCRE2, re2, hyperscan, etc are all valid options. But you have to put in the effort to reach for it, when others are having such great results with minimal effort (I don't even want to start a section about build systems, compared to cargo add regex). To claim C++ is a systems language and provide this sort of built-in facility is simply embarrassing.

I sometimes get an itch that regex compilation (from the regex string into the appropriate state machine, or whichever representation the regex engine uses) happens at runtime, when I know the regex string itself at compile-time. Indeed, CTRE can do it in C++ (and there are other preprocessor solutions like re2c and ragel). It's included in the benchmark above. But why did it take until 2017 for this to be a thing? Other languages that have stronger compile-time features have had it forever! The answer is that it required extensions to template metaprogramming in C++17, which leads to my next point of frustration.

Metaprogramming

In C, all you had were macros. I can't write complicated macros myself, and it's always a pain to try to understand what the ones that I import are really doing. In C++, most metaprogramming is done with templates instead. For writing generics where you want to stick different types into containers, it's okay. For more complex scenarios, can we really pretend SFINAE is a good technique? Hacking with templates eventually evolved into an embedded untyped Lisp inside the type system. Compile times became extremely infuriating. You didn't have an alternative for some of the template magic until very recently, with things like if constexpr in C++17 and concepts in C++20, the latter of which were proposed as early as 2006. And it will be another decade (and that is being optimistic) until the average C++ programmer knows what concepts even are. The syntax and keyword decisions are extremely beginner-unfriendly, it all reads like nonsense, and once again, all because of the baggage.

Speaking of, have you used serde? Rust macros are closer to Lisp macros in that you get a stream of tokens, and certainly nothing like C macros.
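serde's derives are procedural macros, which have to live in their own crate, so they don't fit in a snippet. As a smaller illustration of macros that work on parsed tokens rather than raw text substitution, here's a declarative `macro_rules!` sketch. The `max_of!` macro is made up for this example.

```rust
/// Expands to nested `std::cmp::max` calls over one or more arguments.
macro_rules! max_of {
    // Base case: a single expression is its own maximum.
    ($x:expr) => {
        $x
    };
    // Recursive case: compare the head against the max of the tail.
    ($x:expr, $($rest:expr),+) => {
        std::cmp::max($x, max_of!($($rest),+))
    };
}
```

So `max_of!(1, 7, 4)` expands to `std::cmp::max(1, std::cmp::max(7, 4))`. Each argument is matched as a whole expression, so passing something like `a + b` can't get torn apart by operator precedence the way it can with an unparenthesized C macro.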

See you soon!
#19229

"it is currently faster to launch PHP to execute a regex than it is to use std::regex"

LOL

time is not a profiler but I found it funny nonetheless

#19231
PHP wins again