Shellcode as 'XML'

The thing that hath been, it is that which shall be; and that which is done is that which shall be done: and there is no new thing under the sun.
I keep seeing creative ways threat actors hide shellcode - encoding it as IPv4 addresses, GUIDs, or even Shakespeare.
I mean yeah, I guess you could call that creative… but data is just data. None of these methods offers anything the others don’t. The only difference between the ones above and all the other possible ways is that someone bothered to document them.
So that is what I’m gonna do now. Just document another way to encode and decode data.
Windows ❤️ XML:
Windows loves XML. Event logs are XML, app manifests are XML, task scheduler entries are XML, etc. This makes XML a good candidate because it allows us to blend in.
Hell, the HTML that is used to render this website can be considered some dialect of XML too… nvm, I asked the all-knowing AI and it told me that “No, HTML is not XML. While both are markup languages derived from a common ancestor (SGML), they were designed for different purposes.” Ok. Whatever.
When I imagine XML in my mind’s eye, I think of tags that open <tag> and close </tag>, newlines, and that weird signature thing at the top (the XML declaration): <?xml version="1.0" encoding="UTF-8"?>
Shellcode
To store shellcode in XML one could go really deep. Like, use variable names, their values, maybe even use indentation and the tree structure to store data. But I’m not gonna do that.
If we just spew out tags, somehow shove the shellcode bytes in between, and put that signature like the cherry on top, I think we could call it XML! As XML is text and shellcode may contain non-printable characters, to convert bytes to text we’re gonna use… *drumrollll* BASE64!
This method also slightly decreases the shellcode entropy because there are repeating brackets and newlines. Less entropy is always good. That is what one should strive to do in life: decrease entropy in their surroundings.
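The entropy claim is easy to check. Below is a minimal sketch in Python that measures Shannon entropy in bits per byte for random bytes, for their base64 form, and for the base64 text wrapped in brackets and newlines. The wrap_in_tags helper and its 6-character chunk size are made up for illustration; they are not the encoder from this post.

```python
import base64
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def wrap_in_tags(text: bytes, chunk: int = 6) -> bytes:
    """Hypothetical wrapper: put each chunk of text on its own line as <chunk>."""
    parts = [b"<" + text[i:i + chunk] + b">" for i in range(0, len(text), chunk)]
    return b"\n".join(parts)


raw = os.urandom(4096)        # stand-in for high-entropy bytes
b64 = base64.b64encode(raw)   # 64 symbols plus padding, so roughly 6 bits/byte at most
tagged = wrap_in_tags(b64)    # the brackets and newlines repeat a lot

for name, blob in (("raw", raw), ("base64", b64), ("tagged", tagged)):
    print(f"{name:7s} {shannon_entropy(blob):.3f} bits/byte")
```

Most of the drop comes from base64 itself, since it only uses about 64 distinct symbols. The repeating brackets and newlines shave off a bit more.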
Demo
This fast, responsive, pretty, fully clientside demo is brought to you by PUT EVERYTHING IN A SINGLE HTML FILE gang and the WASM mafia.
What? The tags don’t close? And the indentation is all messed up?
Who cares?
Let me verify this payload using my trusty file utility:
» file poem.txt
poem.txt: XML 1.0 document, ASCII text
LGTM! ship it!
Ok, Detect It Easy also thinks it’s XML:
» diec poem.txt
Binary
Format: plain text[LF]
Source: XML(1.0)
Even the brand spanking new AI/ML-assisted file type detector from Google thinks it’s XML:
» magika poem.txt
poem.txt: XML document (code)
I mean yeah, obviously no automatic tool is going to detect this (YET!). But if any SOC analyst so much as runs strings on your payload, you’re TOAST. But that is not our target audience. We are trying to bypass the clankers.
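Identifying a file type is not the same as parsing the file. A strict XML parser wants matching tags and a single root element, so it rejects this kind of output right away. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the "fake" sample reuses a few tags from the payload dump in this post, and the "real" sample is made up for comparison:

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(data: bytes) -> bool:
    """Return True only if a real XML parser accepts data."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


decl = b'<?xml version="1.0" encoding="UTF-8"?>\n'
real = decl + b"<root><child>hi</child></root>"   # made-up well-formed document
fake = decl + b"<SIPk8E>\n</iD7CDo>\n<DxCA>\n"     # tags that never match up

print(is_well_formed_xml(real))   # True
print(is_well_formed_xml(fake))   # False
```

So anything that actually tries to load the "XML" will choke on it, even if the type detectors are happy.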
POC||GTFO
Ok, for this POC we’re gonna have two shellcodes: stage 1, which finds and decodes the XML, and stage 2, which is the XML-encoded payload itself.

Our payload is going to look like this:
--------------
|            |
|  stage 1   |
|            |
==============
|  128 bit   |
|    egg     |
==============
|   uint64   |
| size of xml|
==============
|XMLXMLXMLXML|
|XMLXMLXMLXML|
|XMLXMLXMLXML|
|XMLXMLXMLXML|
--------------
Stage 1 is going to start searching from its RIP downwards (towards higher addresses, same direction as the diagram) until it finds the egg. Why is the egg 128 bits?
Because this is TMPEST127’s blog! Ok. Jk. It’s 128 because if it were 64 bits or smaller, the whole egg would fit in the immediate of the mov instruction that loads it into a register, and the search logic could false-positive on its own code.
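The false positive is easy to reproduce on plain bytes. A minimal sketch in Python, assuming the 128-bit egg is one 64-bit value stored twice in a row (the payload dump in this post does show the same 8 bytes twice right before the size field). The egg value and the toy "stage 1" bytes are made up; 48 B8 followed by 8 bytes is the x64 encoding of mov rax, imm64.

```python
EGG64 = bytes.fromhex("1122334455667788")   # hypothetical 64-bit egg value
EGG128 = EGG64 * 2                           # the same value twice = 128-bit egg

# Toy "stage 1": the mov carries the egg value inside the code itself.
# The 0x90 bytes are just filler around it.
stage1 = b"\x90" * 8 + b"\x48\xb8" + EGG64 + b"\x90" * 8
blob = stage1 + EGG128 + b"rest of payload"

hit64 = blob.find(EGG64)     # lands inside the mov instruction
hit128 = blob.find(EGG128)   # lands on the real marker after stage 1

print(hit64, hit128, len(stage1))   # 10 26 26
```

The 8-byte search stops at offset 10, in the middle of the code. The 16-byte search only matches at offset 26, which is where the real egg starts.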
After finding the egg it’s going to read the next 64 bits to get how large the XML is. Then it’s going to read that many bytes to grab the XML. After that we just decode the XML (which is our stage 2) using the library described in this writeup, allocate RWX memory, jump to it, and Voilà.
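The egg / size / XML layout can be walked in a few lines. This is only a sketch in Python for pulling the XML blob out of a combined payload offline, not the stage 1 code itself. The egg value is hypothetical, the carve_xml name is made up, and the size field is assumed to be little-endian (the native byte order on x64).

```python
import struct

EGG = bytes.fromhex("1122334455667788") * 2   # hypothetical 128-bit egg


def carve_xml(payload: bytes, egg: bytes = EGG) -> bytes:
    """Find the egg, read the uint64 size after it, return that many bytes."""
    pos = payload.find(egg)
    if pos < 0:
        raise ValueError("egg not found")
    size_off = pos + len(egg)
    if size_off + 8 > len(payload):
        raise ValueError("truncated size field")
    (size,) = struct.unpack_from("<Q", payload, size_off)  # little-endian assumed
    start = size_off + 8
    if start + size > len(payload):
        raise ValueError("size field points past the end of the payload")
    return payload[start:start + size]


xml = b'<?xml version="1.0" encoding="UTF-8"?>\n<SIPk8E>\n'
sample = b"\x90" * 16 + EGG + struct.pack("<Q", len(xml)) + xml + b"trailing"
print(carve_xml(sample) == xml)   # True
```

The bounds checks matter more than they look: a bogus size field is the easiest way for this kind of length-prefixed parsing to read past the end of the buffer.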
For embedding strings in shellcode I’m going to be using my enc_pic_str utility, which I wrote about in a previous blogpost.
For the sake of demonstration, stage 1 will pop one messagebox and stage 2 will show another one.
As we view the final payload in less it is pretty clear where stage 1 ends and the XML of stage 2 begins:
<FA>A<B8>^@0^@^@A<B9>@^@^@^@<FF><D6>H<85><C0>tfI<89><C6>1<C9>H<89><FA>A<B8>^@0^@^@A<B9>^D^@^@^@<FF><D6>H<85><C0>tX1 <C9>I<8D>^T^LA<8A>T^W^X<88>^T^HH<FF><C1>I9<CD>u<EC>B<C6>^D(^@H<89>|$ H<89><C1>H<89><DA>I<89><F8>M<89><F1><E8>^H<F5> <F5><FF><FF>H<85><C0>t/A<FF><D6><E9>><FF><FF><FF>H<C7><C0>^H^@^@^@<CC><E9>1<FF><FF><FF>H<C7><C0> ^@^@^@<CC> <CC><E9>$<FF><FF><FF>H<C7><C0> ^@^@^@<CC><E9>^W<FF><FF><FF>H<C7><C0>^K^@^@^@<CC><E9> <FF><FF><FF>H<89><C8>M<85><C0>t^R1<C9>D<8A>^L D<88>^L^HH<FF><C1>I9<C8>u<F0><C3>p<EF><D2>^N18<A0>(p<EF><D2>^N18<A0>(^O^T^@^@^@^@^@^@<?xml version="1.0" encoding="UTF-8"?> <SIPk8E> </iD7CDo> <7gEAAEi> <DxCA> <xwM> </NWV1VTS> </IHs> </CAEAALh> <gAAAAZ>
POC

Don’t let the size of the final_payload.exe fool you. The payload itself is just 8 KB (size of combined.bin). The rest of those bytes are just dummy bytes from our minimal.s stub exe into which we inject combined.bin to simulate shellcode execution in memory.
The header-only library itself is PIC_COMPATIBLE: no strings, no stdlib dependency, and it should get inlined down into a single decode function and a single encode function.
There are some details about how the encoding isn’t really fully base64, because standard base64 also uses the / symbol, which would clash with the / in closing tags and mess up the decoding, and so on. I will not go into details here. You can just read the source code yourself. Or ask Claude.
I will not bore you with how many detections this payload has on VirusTotal or AnyRun. You can do that yourself by building the project from this repo.