Hardware Hacking

Decoding Rust strings

David Lodge

23 Jun 2026 7 Min Read


TL;DR

Some compilers, including Rust’s, change how data such as strings is stored in a binary
Tools to recover these strings may not work for every variant
Spending a bit of time understanding how the compiler works will help you
Even a janky script can get you back on track

Introduction

Recently, during a test, I had the joy of reverse engineering a custom binary that acts as the web server and system controller for a client’s ARM-based device. Normally this isn’t a problem: extract the binary, throw it into Ghidra, do stuff.

The problem here was that the binary was written in Rust, which does a lot of things differently from other languages, including how it handles strings.

Strings are vital when reverse engineering complex binaries: they give the reverse engineer hints about where the logic is and allow smooth navigation through the binary. Ghidra assumes that strings are simple C-style strings and needs a bit of a hand with anything different. Some of that help exists in Ghidra, but it isn’t that effective.

What’s the problem?

Let’s step back a bit and look at how data is stored. Strings are really the simplest complex type in most programming languages: a string is (usually, though not necessarily) a sequence of readable characters that can be treated as a single entity. Below are a couple of strings:

“Hello World”

“Goodbye”

Strings are everywhere in most binaries, from log entries, to filenames, to error messages, to assert statements.

The most common style of string is the C string. This uses a terminator to mark the end of the string, which makes processing it easy: you read bytes until you reach the terminator. C uses NUL termination (i.e. 0x00). On word-aligned CPUs such as ARM, strings are often word aligned too, with extra NUL bytes padding each string out to the word boundary.
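To make the C-string model concrete, here is a minimal Python sketch of walking a table of NUL-terminated strings. It assumes 4-byte words and that every string starts on a word boundary; the function names are illustrative and not taken from any tool.

```python
from typing import List


def read_c_string(blob: bytes, offset: int) -> bytes:
    """Return the bytes from offset up to (but not including) the next NUL."""
    end = blob.index(b"\x00", offset)  # raises ValueError if there is no terminator
    return blob[offset:end]


def align_up(offset: int, align: int = 4) -> int:
    """Round offset up to the next multiple of align (the word boundary)."""
    return (offset + align - 1) // align * align


def walk_c_strings(blob: bytes, align: int = 4) -> List[bytes]:
    """Read consecutive NUL-terminated strings, each padded to a word boundary."""
    strings = []
    offset = 0
    while offset < len(blob):
        s = read_c_string(blob, offset)
        if s:
            strings.append(s)
        # Skip the characters, the terminator, then any NUL padding.
        offset = align_up(offset + len(s) + 1, align)
    return strings


# "Hi" plus its terminator needs one extra NUL of padding to reach the 4-byte boundary.
print(walk_c_strings(b"Hi\x00\x00Goodbye\x00"))  # [b'Hi', b'Goodbye']
```

The terminator is the only thing marking where a string ends, which is exactly why overwriting it is so dangerous.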

The strings above can be seen in the output from Ghidra, nicely referenced:

There are problems with this technique. It works well for static strings, but with dynamic strings care needs to be taken to ensure that the terminator isn’t overwritten. If it is, strings can run into each other, potentially causing buffer overflows.

The other major technique is used by more modern languages: the string is stored as a tuple (an ordered, immutable collection of objects) containing a pointer to the string’s characters and the length of the string. Instead of searching for a terminator, functions that use the string are bounded by the stored length.

This is safer for dynamic strings, but it makes the library more complex, as it needs to manage both the memory and the string length. This is why, in many languages, strings are immutable, that is, they cannot change; if a string is changed, a new copy holding the bigger string is made elsewhere on the heap. This can quickly make simple string operations inefficient!
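A minimal Python sketch of the pointer-and-length model follows. For simplicity it assumes the "pointer" is an offset into a single byte blob rather than a virtual address; StrRef and resolve are hypothetical names, and the blob is made-up data mirroring the two example strings.

```python
from typing import NamedTuple


class StrRef(NamedTuple):
    """A (pointer, length) pair: the string is blob[ptr:ptr + length]."""
    ptr: int
    length: int


def resolve(blob: bytes, ref: StrRef) -> bytes:
    # No terminator is needed: the stored length alone bounds the read.
    if ref.ptr < 0 or ref.ptr + ref.length > len(blob):
        raise ValueError("reference falls outside the blob")
    return blob[ref.ptr:ref.ptr + ref.length]


# Two strings packed back to back with no separator between them.
blob = b"Hello World\nGoodbye"
refs = [StrRef(0, 12), StrRef(12, 7)]
print([resolve(blob, r) for r in refs])  # [b'Hello World\n', b'Goodbye']
```

Because every read is bounded by the length, the strings can share one blob with nothing between them, which is what makes them awkward for tools that expect terminators.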

Modern languages such as Go and Rust often use the tuple approach. Sometimes reverse engineering tools can handle this, sometimes they can’t. Here’s the above string from a Rust program compiled using rustc:

We can see the tuple there: an 8-octet pointer to the string data followed by an 8-octet length (both little endian). In this case Ghidra gets it right and we can reverse engineer properly. It doesn’t always.
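The 16-byte layout can be decoded with Python’s struct module. This is a sketch: the example bytes are a hypothetical pointer and length pair, not data from the binary. The word_size parameter is an assumption that 32-bit targets store the same pointer-then-length order with 4-byte words.

```python
import struct
from typing import Tuple


def unpack_str_ref(raw: bytes, offset: int = 0, word_size: int = 8,
                   byteorder: str = "<") -> Tuple[int, int]:
    """Decode a (pointer, length) pair stored as two consecutive words.

    word_size is 8 for 64-bit targets and 4 for 32-bit ones; "<" means little endian.
    """
    word = "Q" if word_size == 8 else "I"
    ptr, length = struct.unpack_from(byteorder + word * 2, raw, offset)
    return ptr, length


# Hypothetical 64-bit little-endian entry: pointer 0x4a010, length 12.
raw = bytes.fromhex("10a0040000000000" "0c00000000000000")
ptr, length = unpack_str_ref(raw)
print(hex(ptr), length)  # 0x4a010 12
```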

We can see the raw strings in the binary:

Note there’s no separator. The \n is an LF (0x0A) which is part of the string.

Because the strings are stored back to back with no separator, a command like strings produces a bit of a mess:
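To see why this happens, here is a rough Python imitation of strings: it reports runs of four or more printable ASCII characters (plus tab). It’s a sketch rather than the real utility, and the blob is hypothetical data in which three Rust strings, "Hello World\n", "Goodbye" and "Invalid port", sit back to back.

```python
import re
from typing import List


def naive_strings(blob: bytes, min_len: int = 4) -> List[bytes]:
    """Return runs of at least min_len printable ASCII bytes (space to '~', or tab)."""
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group() for m in pattern.finditer(blob)]


# Hypothetical .rodata fragment: three strings with no terminators between them.
blob = b"Hello World\nGoodbyeInvalid port\x00\x01"
print(naive_strings(blob))  # [b'Hello World', b'GoodbyeInvalid port']
```

With no terminator to stop on, "Goodbye" and "Invalid port" come out as one merged string; only the newline happens to split the first one off.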

Previous work

I’m not the only person who’s been in this position, and there are some Rust string extractors out there. IDA Pro will handle them internally, and the plugin it uses has been ported to Binary Ninja by @[email protected].

There are a number of blog posts trying to explain the problem, but no concrete code that I could use. Mandiant’s FLOSS project can extract Rust strings, but only from PE (Windows) binaries built for x86 or x64.

So it looks like I’m on my own.

Executable construction

All modern operating systems, whether Linux, macOS or Windows, use a container format for executables. These vary by platform but share common features: they’re normally subdivided into sections which contain code, data or static data. This allows the linker to be intelligent and can provide some protection against modification of immutable data whilst the executable is running.

Linux binaries use a common format called ELF (Executable and Linkable Format), whose lineage goes all the way back to SVR4 Unix. ELF is just a container format, so it doesn’t matter whether the code is built for x64 or ARM32.

ELF splits the data into a number of sections, which can be listed using the objdump (objdump -h) or readelf (readelf -S) commands:

The above screenshot shows the section name, address, size and a number of other things. We’re interested in two main sections:

.rodata – this contains immutable data for the program.

.data.rel.ro – this contains immutable data for the program that needs relocating when the program is loaded.

Rustc puts the list of pointers into .data.rel.ro and the string blob (binary large object) into .rodata. We can effectively ignore the other sections.
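To locate these two sections without shelling out to readelf, here’s a minimal sketch that parses the ELF section header table using only Python’s struct module. It reads the standard ELF32/ELF64 header fields and handles both byte orders, but ignores edge cases such as extended section numbering; read_sections and section_data are illustrative names, not part of any library.

```python
import struct
from typing import Dict, NamedTuple


class Section(NamedTuple):
    name: str
    addr: int    # sh_addr: virtual address once loaded
    offset: int  # sh_offset: where the bytes live in the file
    size: int    # sh_size: length in bytes


def read_sections(elf: bytes) -> Dict[str, Section]:
    """Parse the section header table of an ELF32 or ELF64 file (either byte order)."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = elf[4] == 2                  # EI_CLASS: 1 = ELF32, 2 = ELF64
    end = "<" if elf[5] == 1 else ">"   # EI_DATA: 1 = little endian, 2 = big endian

    if is64:
        (shoff,) = struct.unpack_from(end + "Q", elf, 0x28)
        shentsize, shnum, shstrndx = struct.unpack_from(end + "HHH", elf, 0x3A)
        shdr = end + "IIQQQQ"  # sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size
    else:
        (shoff,) = struct.unpack_from(end + "I", elf, 0x20)
        shentsize, shnum, shstrndx = struct.unpack_from(end + "HHH", elf, 0x2E)
        shdr = end + "IIIIII"

    headers = [struct.unpack_from(shdr, elf, shoff + i * shentsize) for i in range(shnum)]
    # Section names are offsets into the string table section at index e_shstrndx.
    names_off = headers[shstrndx][4]

    sections = {}
    for sh_name, _type, _flags, addr, offset, size in headers:
        start = names_off + sh_name
        name = elf[start:elf.index(b"\x00", start)].decode("ascii", "replace")
        sections[name] = Section(name, addr, offset, size)
    return sections


def section_data(elf: bytes, sec: Section) -> bytes:
    """Raw file bytes of a section (meaningless for SHT_NOBITS sections such as .bss)."""
    return elf[sec.offset:sec.offset + sec.size]
```

For example, read_sections(open(path, "rb").read()) returns a dictionary in which the ".rodata" and ".data.rel.ro" entries hold the address, file offset and size of each section, where path is whichever binary is being analysed.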

So, to extract the strings, we go through .data.rel.ro one word at a time. If a word’s value lies between the start address of .rodata (0x4a000) and its end address (0x4a000 + 0x60b0), we can assume that it’s a pointer into the string blob and that the next word is the length of the string.

We do some filtering for sanity, discarding strings shorter than 4 characters or longer than 100 characters, and voilà, we have implemented strings for a Rust binary:
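The scan and filter described here can be sketched in a few lines of Python. This is not the published script: it assumes the caller has already extracted the raw bytes of .data.rel.ro and .rodata plus the start address of .rodata, and it makes the word size and byte order parameters so the same loop covers 64-bit and 32-bit (e.g. ARM32) targets. The data in the example is hypothetical.

```python
import struct
from typing import List, Tuple


def extract_rust_strings(relro: bytes, rodata: bytes, rodata_addr: int,
                         word_size: int = 8, byteorder: str = "<",
                         min_len: int = 4, max_len: int = 100) -> List[Tuple[int, str]]:
    """Find (pointer, length) pairs in .data.rel.ro that point into .rodata."""
    fmt = byteorder + ("Q" if word_size == 8 else "I")
    rodata_end = rodata_addr + len(rodata)
    found = []
    # Step one word at a time, stopping while a full (pointer, length) pair still fits.
    for off in range(0, len(relro) - 2 * word_size + 1, word_size):
        (ptr,) = struct.unpack_from(fmt, relro, off)
        if not rodata_addr <= ptr < rodata_end:
            continue  # not a pointer into the string blob
        (length,) = struct.unpack_from(fmt, relro, off + word_size)
        if not min_len <= length <= max_len or ptr + length > rodata_end:
            continue  # fails the sanity filter
        start = ptr - rodata_addr
        text = rodata[start:start + length].decode("utf-8", errors="replace")
        found.append((ptr, text))
    return found


# Hypothetical 64-bit little-endian data, with .rodata loaded at 0x4a000.
rodata = b"Hello World\nGoodbyeabc"
relro = struct.pack("<QQQQQQ", 0x4a000, 12, 0x4a00c, 7, 0x4a013, 3)
# Prints the first two strings; the 3-byte entry is dropped by the length filter.
for ptr, text in extract_rust_strings(relro, rodata, 0x4a000):
    print(hex(ptr), repr(text))
```

Every word is tested as a potential pointer, including the length words; since lengths are typically far smaller than the .rodata addresses, they fall out at the range check.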

All that’s needed then is to make it a tad more flexible, post it to our GitHub repo, and you’re done.

Ghidra Plugin

I was bored and messing around with vibe coding, so I thought, why not ask an LLM to help me with the hard parts and convert my script to a Ghidra plugin.

So, I did, and it did. A quick touch up, a bit of polish, and we now have a Ghidra plugin in our GitHub repo.

Conclusion

Things being a “bit” different in executables doesn’t need to be a roadblock. Being a bit aware of other formats combined with a bit of reverse engineering and a hacked together bit of Python can turn something that would make life difficult into something that isn’t a problem.

The script and Ghidra plugin can be found on our GitHub repository.