Closing the year before heading to the 39th Chaos Communication Congress, we decided to tag a release of the main project for those too conservative to run from the main branch.
For those on site who want to have a chat, our DECT extension is 6681. Incidentally the same number as the control port for when Arcan was used to provide overlay, texting, memes, scheduling etc. for a pirate university TV station back in the early 2000s.
Before going into the project updates, there is tragic news to relay that put a massive damper on spirits and overall productivity for the last few months: Our most beloved of project members, Elijah “moon-child” Stone died on the 9th of September at the young age of 22. Elijah had been with the project since his early teens and was consistently kind, caring, courteous and clever well beyond his years. The 0.8 topic branch on one of his favourite subjects, performance engineering, will be dedicated to his memory. Our thoughts are with his family and partner.
Community Updates
It has been quite a while since we switched from GitHub to Fossil for development. Since we don’t expect people to retool around an uncommon tool, we also mirror to git, hosted on Codeberg.
A final friendly warning to packagers: use those repositories. The ones on GitHub will no longer receive mirror updates, and any changes over there are likely to be of a more incendiary nature.
As covered in more detail (part 1, part 2), we had a longer in-person hackathon. The main output from it is that Alexander has a port of Gamescope, saving us from keeping up with all the special quirks that running Steam over Xwayland takes. The clip below shows it running Baldur’s Gate 3.
Magnus keeps chipping away at his platform plugin for Qt5/Qt6. It largely works as intended for the likes of qBittorrent and Binary Ninja, but still struggles with complex hybrid 2D/3D window-managed applications like FreeCAD.
He also created patches for KeePassXC and a script (added upstream) for Durden to integrate them. Both are recommended reading.
Valts is still busy with his portable viewer for the A12 protocol that should soon be usable with some of the neat bits we cover further below.
In the bin of experimental applications, we have Atro with “Lasso”, a hybrid ‘interactive canvas’ form of window manager.
On top of providing a number of bug fixes across the board, Bohdan created Xkbd2Lua for statically translating X Keyboard Layouts to our own format, removing the need to lug libxkbcommon around, for those who want even fewer traces of X11 in their lives.
Ariel has been toiling away at one approach to a bootable, complete Arcan+Durden+Cat9 setup from as static a build as possible, matching the ‘Arcan as OS’ design (there will be more). As part of that there is a nix one-liner that should hopefully work on a few setups:
nix run --impure 'git+https://codeberg.org/ingenieroariel/arcan?ref=nix-flake-build&dir=nix'
Status Update: Arcan
As usual, check the changelog for the fine-grained changes.
A lot has happened on the network side, thanks in large part to the continued support from NLnet; a longer write-up on that is coming in a little while.
To start with, there is now support for ML-KEM as Post-Quantum cryptography to protect against ‘collect now, decrypt later’ (should the fantasy computers ever materialise). This is implemented as part of the forward-secrecy ratcheting rekeying process. The same process has been extended to set a signature verification key for file transfers, along with a proof-of-work scheme for load balancing search requests.
Connection resumption and Casting
Clients that act as sources now have connection resumption support. This means that if the network connection is lost, the source application is kept alive and re-paired when you connect back.
In the following clip I first host arcan running pipeworld on my networked machine and connect. The window pops up and I create a few cells to show that there is data/state remotely. I close the window and reconnect. The window reappears just as I left it.
Showing connection resumption on an arcan-net hosted client
There is also a new --cast option. This lets the first user that connects become the “driver” of the hosted application. Any subsequent connections get a read-only copy of the stream.
Unified and Referential Links
The other major network changes have to do with the directory server part. To recap, any endpoint can have one out of three roles: Source, Sink or Directory. Normally a source hosts some kind of application and provides, either inbound or outbound, access to a Sink. The Directory works as a self-hosted rendezvous for discovery, but also state, file-store and coordination between multiple clients using the same appl.
For instance, if I have a directory server hosting the Durden desktop appl and multiple clients download and run it from the directory, they can use the messaging domain that the directory provides to sync clipboard state or share input devices, like one would have done with Synergy/Barrier in the past.
The admin API for configuring the directory server has received two new functions, reference_directory for referential links and link_directory for unified links.
A referential link lets users with access to directory server A access a referenced directory B, forming larger networks. Using the command-line tool, the following example:
arcan-net --path myfriend myserver@ myappl
Would have the client connect to myserver, then download and run myappl from the referenced directory myfriend. The path can/be/arbitrarily/long. Connection primitives are negotiated at each step in the chain, which takes us into the very interesting space of transitive trust-discovery models.
In the example above, myserver gets a DIROPEN request for myfriend from the client. It sees that this is a referential link, and asks myfriend for connection primitives on behalf of the client – forwarding the public key the client used to authenticate to myserver. Myfriend returns either direct connection information (ip, port etc.) or a request that myserver tunnel the traffic if it is not directly reachable.
The unified link is invisible to the user and is a more privileged connection. It lets multiple directories form a shared namespace such that they can access/host/mirror the same resources as if they were one logical server.
Say that you have a server in your home and another hosted on a VPS somewhere. With a unified link between the two, your devices can access the one when you are at home, and hand over to the other when you go outside.
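As a sketch only — the two function names come from the admin API above, but the argument lists here are assumptions, not documented signatures — the server’s config.lua could wire up both kinds of links along these lines:

function ready()
-- referential: expose directory B under the local path name "myfriend"
	reference_directory("myfriend", "b.example.com", 6680)
-- unified: merge namespaces with the directory hosted on the VPS
	link_directory("vps", "vps.example.com", 6680)
end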
Dynamically Hosted Directed Sources
For every directory hosted appl it is possible to slot in a controller. A controller is a set of server side scripts that regulate messaging and resource access for more advanced networked applications. The scripting API has received some new functions that are worth looking into.
The first is launch_target. This is best explained through an example. Say my directory server configuration database has a launch target defined, along these lines (a sketch: the exact arcan_db arguments and arcan-net flags below are assumptions):
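arcan_db add_target chromium BIN /usr/bin/chromium

The durden controller can then spawn it for a connecting client with a call along the lines of launch_target("chromium") (the surrounding handler is omitted here), and the updated controller is pushed with something like:

arcan-net --push-appl durden --sign mytag myserver@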
This would package the controller for durden, sign it with whatever is assigned to mytag in the local keystore, and upload it to the directory server marked as myserver in the local keystore.
The server verifies permissions and that the signing key matches previous signatures, and assigns a runner VM. The next time any client runs the durden appl:
# arcan-net myserver@ durden
It would spawn an instance of chromium that connects as a source visible only to that specific client, with a polite suggestion to source it immediately. The client does so automatically and the window pops up.
There are more controls to add here for state management, sandboxing details, letting the controller script inject events and so on. The point of the feature is that the beefier server now has a mechanism for fine grained application hosting.
Speaking of which, if the client end doesn’t have the full Arcan stack but something simpler like the Smash viewer, it is now possible (opt-in) to let the directory server host the arcan side of the equation, turning the client side into a very thin one:
# arcan-net --host-appl myserver@ durden
External Resource Resolver
Another change on the controller development side is that the event handlers used to list/download/upload files can now be hooked up to an external resolver.
The way it normally works is that you add an event handler, like this:
function durden_load(cl, name) return name end
This does nothing fancy; it just forwards whatever the client requested to the server-side file-store. In this form the function only exists to add any client-specific blocking or name translation. Another option would be to return the result of open_nonblock to dynamically generate the data to be transferred.
If I modify the server’s config.lua to this:
function ready() launch_resolver("durden", "/usr/bin/myresolver") end
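The resolver itself is an external program. As a sketch only — the line-oriented request format here is an assumption, not the documented resolver protocol — /usr/bin/myresolver could look like:

#!/usr/bin/env lua
-- hypothetical resolver sketch: assume one "<kind> <name>" request per
-- line on stdin, with a single-line reply on stdout. The real format
-- is defined by the directory server and will likely differ.
for line in io.lines() do
	local kind = line:match("^(%a+)")
	if kind == "store" then
		io.write("fail\n")           -- reject all uploads outright
	else
		io.write("/tmp/test.mp4\n")  -- serve the same file for any load
	end
	io.flush()
end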
Had this been the assigned resolver, storage requests would be rejected outright and load requests would get the ‘test.mp4’ file contents regardless of what was asked for. The point of this feature is to provide caching and translation to other file providers. Ones currently being looked into cover regular URLs, Magnet-to-torrent and IPFS.
Custom Debugger Integration
The remaining changes are mostly on the developer side of things. Recall that we once wrote a debugger frontend for Cat9. It implemented the Debug Adapter Protocol, but was designed so that we could support other protocols. DAP is too heavy for our own needs, so a lighter one was thrown together to match how we use the Lua VM in both the engine and the directory server. This lets us debug locally, as well as debug a corresponding directory controller in lockstep, mapped as remote threads.
The following clip shows the current state of that in Cat9:
Using cat9 debug frontend to attach, break and step arcan running pipeworld
It can also be attached on demand on a scripting error. There is some heavy lifting left on the frontend side before this is completely seamless, but in the not-too-distant future we will be at a point where you can pause-edit-update-continue a fleet of devices running a single Arcan application at once.
With that we have all the building blocks in place for more interesting networked Arcan applications. The first target for that will be a community chat application to move us away from Discord.
The second half of the week-long session started with us waking up with a hole in our collective hearts: Alex had absconded in order to participate in a meeting way up north. He bravely overslept (brave brave Sir Alex) and missed that, regretting most decisions along the way. Such is life.
48 of the bottles of Mate did arrive, so to fill the hole we speed-ran through them, and none of us will ever have another bottle again, or until the next time — whichever comes first.
just one more bottle ..
For show and tell it was finally Björn’s time to shine like a very distant and faint star. First squishing a number of long-standing bugs, such as the primary screen sometimes not getting the right identity/serial/make/model at startup, breaking monitor settings persistence; unreliable transitions between 16-bit, 24-bit, 30-bit and fp16 composition modes; cat9 subwindows failing to materialise when hosted over a network; and directory server hosted sinks randomly shutting down (spoiler: always check your signal masks when execve()ing).
Then, apparently, he had snuck a little edit mode into cat9, promptly overdoing it by adding vim-style input modes, folds and language support. The following clip captures the current state.
cat9 built-in job output editor
The tangent was justified by a long rant about feature archetypes and a pilgrimage in search of the answer to the existential question `if a text editor is predestined to eventually grow a shell, and a shell is predestined to become a text editor, will they end up in the same place?`
With the aforementioned fixes we saw first signs of arcan-net -l 6680 -- /usr/bin/gamescope supertuxkart running, with a little caveat left to fix:
The details are fairly interesting. When trying to pass accelerated buffers from a shmif source, it is always possible for the other end to reject acceleration. The buffer passing routine then tries to gracefully convert the buffer into pixels and take a slow path. This does not currently work when the shmif context structure is not actually managing the GPU, as the screenshot illustrates. The network tool (arcan-net) currently always rejects accelerated buffers, and tuxkart-on-psychedelics it is.
To celebrate its 30-year anniversary, we watched the aptly named hacking movie ‘Hackers’. Magnus confessed to not having seen it even once. Oh to be young again, and also a robot.
<björn> the costumes are the best part
Speaking of that, the QPA 3D support commits materialised and were proclaimed to be working flawlessly, only to promptly break as soon as others tried them out. Somewhat disheartened by this, and after repeating “I fixed the race condition now but sometimes it goes black for some reason”, he asked for help and help was given.
The following clip is an artist’s rendition of Björn providing sage advice about swap-chains, fencing and buffer lifecycles during resize.
Valts, meanwhile, started his days with the work from the previous night unavailable to him. Swallowing a large dose of what the kids would call copium, “It is not Nix. There is no way it is Nix. It was Nix.” could be heard.
This was promptly followed by adding a UI for configuring smash that we will opt to not show here to spare any would-be-UX designers from feeling overwhelmed.
All in all the participants confessed to having mild to moderate fun, with faint hopes of meeting up again at the next Chaos Communication Congress – should we manage to score some tickets.
We are a few days into the 3rd Arcan IRL hackathon.
The theme this time around is “Project Wayhem” with hem meaning home in Swedish, and it taking place in my renovation project/eternal money sink fort out in the middle of nowhere. Three brave travellers fought the perils and outrageous ticket prices of public transport in Sweden and came out victorious.
After an unhealthy amount of greets, rants and assorted beverages, we eventually managed to get to work.
First up in the show and tell was Alexander with his SHMIF backend for Gamescope that got massaged into working shape. The repository can be found here: https://codeberg.org/sashabjorkman/GameScope
The following clip shows it in action, playing Baldur’s Gate 3.
Baldur’s Gate 3 via steam in gamescope –backend shmif
He also commissioned an artist for an event-appropriate mascot:
Scandinavian hedgehog being the only non-rubbish wild animal currently living in my backyard.
While we have paths for running x11 and wayland clients already, such as Xarcan and our native wayland support, nobody wants to work on the wayland implementation and the Xorg DDX works well enough for clients with reasonable use of desktop resources. Steam is not such a client and definitely not reasonable.
Since the goal-posts for anything “”Wayland”” these days are mounted on roombas powered by jet engines, the best effort / reward solution is Gamescope: write a backend for that and let them exhaust their resources chasing down the complementary IPC system of the day or the next ‘protocol’ extension to join the animal farm.
We celebrated this victory with a few runs of the true eternal game of the year (tuxkart) on the only machine out of 10 or so with functioning bluetooth support for the controllers. Instead of the 2.4 meter projection screen in a calibrated viewing environment, we thus huddled around the tiniest laptop screen around.
Next up was Magnus with his Qt QPA (found here: https://codeberg.org/vimpostor/qtarcan). He fought bravely, and eventually won, getting 3D accelerated windows working. This is one of the few remaining items before the fabled feature completion.
After a number of hours staring into the void and the void staring back, “fucking hell I can finally see my application, this is huge” could be heard from several kilometres away.
While everyone is waiting for the commits to materialise, the following screenshot is proof that he is not a pathological liar:
Valts (thanks NLnet!), being true to his nature, was much less dramatic and kept toiling away at the alternate, portable and lightweight implementation of our network protocol, which can be found here: https://smash.tase.lv — inching closer to supporting multiple windows. He was also trying his best to say good things about Nix, and that was promptly ignored.
Björn (thanks NLnet!), while not busy making good use of his many academic degrees by cooking; writing this blog post; crafting cocktails and preparing snacks for each movie night, spent his time trying to debug why every n’th modeset between SDR and HDR caused the amdgpu kernel module to crash. He got absolutely nowhere and remains a disappointment to most everyone.
The purpose of that is threefold: One is to pretend to actually do something useful. The second is to make sure there is a reference to compare against when eventually repeating the process for SDL3. The third is simply that we need more varied clients to test network transparency and local/remote migration with.
With several days to go, there might be more things to report on in the near future. With any luck, the 60 bottles of Club Mate ordered weeks ago might actually arrive before everyone heads back.
This is the final part concluding the long journey of migrating away from terminal emulation as the main building block for command-lines, text-dominant shells and user interfaces in general.
On top of those, and the many related demos, we have specialised tangents on accessibility and networking. The only remaining thing we have yet to cover is the programmer’s interface to all of this.
Before slipping into that dark night, I would like to summarise the mental model I use for ‘terminals’:
Terminal emulators emulate a machine that has never existed, a fantasy computer. They are constructed just like other emulators for the weakest of devices; a big old state table and switch (*pc++){case OP_1: implement OP_1(); case OP_2 ...} with a quirk being that the real horrors are hidden elsewhere. Many indulge in expanding this fantasy computer, adding all kinds of crazy expansion boards for faster video modes and co-processors. This is nothing new in itself — we have done that to practically every computer ever.
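In sketch form, with hypothetical opcodes and helper names standing in for the real thing (this is the shape of the construction, not any actual emulator’s table):

#include <stdint.h>
#include <stddef.h>

/* hypothetical helpers standing in for the horrors hidden elsewhere */
void ring_bell(void);
void draw_glyph(uint8_t ch);
size_t parse_escape(const uint8_t* buf, size_t left);

void emulate(const uint8_t* pc, size_t left)
{
	while (left--){
		uint8_t op = *pc++;
		switch (op){
		case 0x07: ring_bell(); break; /* OP_BELL */
		case 0x1b: { /* escape: a whole state table of its own */
			size_t used = parse_escape(pc, left);
			pc += used; left -= used;
		} break;
		default: draw_glyph(op); /* 'just text' */
		}
	}
}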
The tragedy is that it is taken seriously as a default.
Another day at the office is just running a terminal emulator (xterm) inside a terminal emulator (console) so you can use a terminal emulator-translator (ssh) to emulate more terminals (tmux). It is emulation all the way down instead of as a last resort for compatibility.
The other quirk is the flora of instruction sets to emulate and the subset selection heuristics. These are downplayed as ‘just text’, but everything that touches American Standard Code for Information Interchange is raw instructions for this machine. Programs for it are everywhere, and most of them transform or expand the code without being treated with the scrutiny one would apply to other code generators, such as compilers.
A related take worth highlighting is this quote:
It feels a little unsettling that what we now use, effectively, as a data encoding format (ASCII) continues to hold – and it’s up to us to handle when writing software, too – things that correspond not to textual data, but TTY key press and serial link events, and that they follow you everywhere, even in places where there are no teletypes, no serial links, and no keys. Like most people who poked at their innards I used to think that terminal emulators are nasty, but for the longest time I thought that’s just because we took the “emulator” part a bit too seriously and didn’t bother to question it because we’re intellectually lazy. Now I realize it’s legitimately hard to question it, because a lot of assumptions are actually baked into the damn data model – terminal emulators aren’t leaky abstractions, they’re quite literally the abstraction being leaked :-D.
x64k@lobste.rs
Computer people have tried to tame this beast before, using abstraction libraries such as TurboVision and Curses (hence the title of this post). One might be tricked into thinking that keeping these interfaces and substituting the implementation with something befitting the real computing environment would be enough. That is where the ‘leaky abstraction’ part of ‘just text’ pops up like a little demon and plants itself firmly on your shoulder. Since it is so easy to sneak arbitrary instructions in-band, people have done a lot of that, all over the place — going so far as to publish cutesy little articles encouraging the malpractice.
Which building blocks do we have to plug the leaks with?
The Arcan project is suspiciously large in scope. Much of it is not needed for this particular problem. There are a few parts to single out, and a simplified build-mode (-DBUILD_PRESET=client) to assist with that. This will produce:
libarcan-tui
libarcan-shmif
libarcan-shmif-server
libarcan-a12
afsrv_terminal
arcan-net
[libarcan-tui] is the meat in this dish. It fits the ‘ncurses’ bin (.. and ‘libreadline’ and other widgets). There are also Lua bindings with some extra conveniences to smooth over other related POSIX details to help escape the clutches of /dev/tty.
Its reference implementation uses [libarcan-shmif] to talk to the desktop, using some variant of [libarcan-shmif-server] for the other half of that equation. One could substitute in [libx11/win32/…] for those, but some features make that quite difficult. For a technical walkthrough of how those work, see the recent write-up A deeper dive into the SHMIF IPC system.
To fill the portability gap across mobile and desktop, we have a (young) separate project, smash, managed by Valts Liepiņš. For the Zig capable, this is a good place to support.
[libarcan-a12] provides the reference implementation for [A12], the wire transport for this across devices, with [arcan-net] providing a standalone binary wrapper for common use cases.
[afsrv_terminal] is the last piece in the equation. It is a standalone binary which, at first glance, is a terminal emulator(!) that outputs into [libarcan-tui]. That is needed for compatibility and proof of feature parity. When passed an environment (ARCAN_ARG=cli=lua) it ignores the terminal emulation part; enables the Lua bindings with some added functions useful specifically for shell work; and runs a bootstrap script that loads the real shell. An example of such is the Lash#Cat9 link from before.
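For instance, launching straight into the Lua shell bootstrap looks like:

ARCAN_ARG=cli=lua afsrv_terminal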
Enough babbling. Time for some code.
We start with a low level C ‘hello world’, then repeat it using the Lua bindings:
#include <arcan_tui.h>
#include <stdlib.h>
#include <unistd.h>

static void redraw(struct tui_context *C)
{
	arcan_tui_erase_screen(C, false);
	struct tui_screen_attr text =
		arcan_tui_defcattr(C, TUI_COL_ALERT);
	arcan_tui_writestr(C, "hello world", &text);
}

static void resized(struct tui_context *C,
	size_t px_w, size_t px_h,
	size_t cols, size_t rows, void *T)
{
	redraw(C);
}

int main(int argc, char* argv[])
{
	struct tui_cbcfg cbcfg = {
		.resized = resized
	};

	arcan_tui_conn *con =
		arcan_tui_open_display("myname", "hello");

	struct tui_context *tui =
		arcan_tui_setup(con, NULL, &cbcfg, sizeof cbcfg);

	if (!tui)
		return EXIT_FAILURE;

/* draw something immediately */
	redraw(tui);
	arcan_tui_refresh(tui);

/* main processing loop */
	int inf[] = {STDIN_FILENO};

	while(1){
/* block (-1) until there are events happening on a set of contexts
 * or inbound data on a set of descriptors (inf) */
		struct tui_process_res result =
			arcan_tui_process(&tui, 1, inf, 1, -1);

/* synch any changes imposed by the event handlers */
		arcan_tui_refresh(tui);
	}

/* shutdown, with no error message */
	arcan_tui_destroy(tui, NULL);
	return EXIT_SUCCESS;
}
There are quite a few nuances to cover here already.
For starters, opening a display matches the pattern used in EGL and elsewhere in using an opaque pointer to allow the implementation to support different outer display systems. It takes an immutable ‘name’ of the application and a mutable ‘identity’ (e.g. the current open document, path or other resource).
With the display, we can acquire a context (window) and set a table of event handlers. The table also lets the implementation know which features you support/need and which ones you do not.
Some of the possible handlers include:
Input: Text codepoint, Abstract announced label, Native keys with modifiers
Input: Mouse buttons and motion
Window Management State: Visibility, size and colourscheme changes
Note: Passing the size of the vtable doubles as an additive version identifier. This makes it possible for features to be appended to the table without breaking backwards compatibility.
A tui context represents a single window. You can request additional ones from a set of types, and have ones pushed to you when a user requests alternate representations (mainly accessibility and debugging).
Calls to arcan_tui_refresh will atomically forward changes to the contents of the window. If nothing has changed it will return immediately.
When drawing we have a lot of shaping attributes to apply, from the regular bold/italic/underline/strikethrough to shaping hints, double-width characters, ligature substitutions, border drawing and foreground/background colours.
Note: Colours are a special case. The terminal tradition is ugly - there used to be a small set of reference colours, the emulator remapping them as desired (RED is now YELLOW) and several different extensions to specify explicit red/green/blue values or a wider palette. Here there is a semantic palette (ALERT, LABEL, ...) as well as the legacy one. The values they map to are tracked per window and can be dynamically updated by the outer display system (this triggers a 'recolor' event). They are resolved to red/green/blue values when the window is packed on refresh.
The process function takes a set of tui contexts, a set of file descriptors, their sizes and a timeout. The set of contexts is for handling multiple windows and the set of file descriptors and timeout for the common pattern of being input triggered.
Note: Since the UI processing is all out of band, STDIN and STDOUT are left intact and you can mask all signal handling. This removes the need for isatty() and you get both pipeline processing through an intact STDIN/STDOUT and an interactive UI. The choice of colouring or not colouring output is up to the final presentation.
There are a lot of bells and whistles in order to cover everything people have ever done with terminals, far too many to go through here. You have helpers for asynchronous transfers between file descriptors with progress updates; arbitrarily many cursors; verifying if a certain unicode codepoint has a representation in the current set of fonts; spawning new processes with embed-able windows; window manager integration controls; per row attributes; notifications and much more. Then there are widgets for showing buffers, navigating lists of items and so on.
More Windows
Since we are not restricted to a single screen, we can be more adamant about avoiding premature composition – the desktop and its window manager should naturally be responsible for such decisions.
To request a new window, you do this:
static bool on_window(struct tui_context *T,
	arcan_tui_conn *C,
	uint32_t id,
	uint8_t type,
	void *tag)
{
/* we use a custom id to distinguish between multiple
 * requests and windows pushed to us */
	if (id != 0xcafe)
		return false;

	struct tui_cbcfg cbcfg = {
/* fill in as needed */
	};

	struct tui_context* new =
		arcan_tui_setup(C, T, &cbcfg, sizeof(cbcfg));

/* window allocation can always fail */
	if (!new)
		return false;

/* do something with new */
	return true;
}

/* set in the initial handler table before context creation */
cbcfg.subwindow = on_window;
...
struct tui_subwnd_req req = {
	.hint = TUIWND_SPLIT_LEFT,
	.rows = 25,
	.cols = 80
};
arcan_tui_request_subwnd_ext(tui, TUI_WND_TUI,
	0xcafe, req, sizeof(req));
As you can see it is very similar to how the original context was created. [new] should be added to the processing loop, or spawned into a thread with its own separate one.
Note: The context setup now uses a parent as a reference. This inherits all dynamic properties regarding locale, colours and text so there is a starting point for the implementation to work with.
Among the nuances is the HINT. This is a way to communicate to the outer display system how the window is intended to be used. Possible values include setting it to a discrete tab, but also embedding it into other windows, or having it ‘swallowed’ (taking the parent’s place until closed).
In this clip from Cat9 you can see that in two forms:
In the first form we take the contents of a previous job, request a new window and move the job output into it.
The second one is something more advanced. The requested type is actually HANDOVER, and we use a helper along these lines (the exact parameter list here is approximated from arcan_tui.h, so treat it as illustrative):
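/* [T] is the context from the HANDOVER subwindow request,
 * [C] the connection passed to the subwindow callback */
char* argv[] = {"/usr/bin/afsrv_terminal", NULL};
pid_t pid = arcan_tui_handover(T, C,
	"/usr/bin/afsrv_terminal", argv, NULL /* env */, 0 /* flags */);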
This helper creates a new process that inherits window ownership. The context can still be referenced as a proxy and used with the arcan_tui_wndhint function to reposition or reanchor.
This is useful for embedding as done in the clip, where we also get feedback on scaling and dimensions. This is where some of [shmif] bleeds through; if the new process is graphical and uses that library, such as the video player and PDF viewer in the clip, we can embed and even interact with graphical applications even though the TUI process itself has no concept of bitmap graphics.
In Lua, it would look like this:
wnd:new_window("handover",
function(wnd, new)
ifnot new thenreturnendlocal in, out, err, pid =
wnd:phandover("/usr/bin/afsrv_terminal", "")
end
)
Reading a line
Let’s take a look at the common case: a command-line, i.e. our ‘libreadline’ replacement. This is a rather complicated affair, since it involves text editing; suggestions; prompts; auto-completion; help information; keybinding controls; masking characters for password entry; searchable command history and so on.
Some of that you can see in this clip, also from Cat9.
The prompt is live-updated (shown by the ticking clock), feedback on syntax/validation errors is highlighted, extended help on what a completion does can be toggled, and error messages explain the validation failure.
#include <arcan_tui_readline.h>
...
struct tui_context *T;
/* context has been set up like before */
...
struct tui_readline_opts options =
{
	.anchor_row = 1,
	.allow_exit = true
};
arcan_tui_readline_setup(T, &options, sizeof options);

int status;
char* out;

while (1){
	struct tui_process_res result =
		arcan_tui_process(&T, 1, NULL, 0, -1);

/* flush any updates imposed by the widget */
	arcan_tui_refresh(T);

	status = arcan_tui_readline_finished(T, &out);
	if (status)
		break;
}
if (status == READLINE_STATUS_DONE){
/* do something with *out */
free(out);
}
This swaps out relevant parts of whatever previous handler table you have for ones provided by the readline widget, and reverts it once finished. There are a lot of options to pick from when setting up the widget, including callback handlers for requesting history, validating current input, password masking, filtering undesired characters and so on.
For Lua, we can go with a more involved example:
local readline =
wnd:readline(
function(_, message)
print("you entered", message)
end
)
readline:set_prompt("type something: ")
readline:set_history({"things", "from", "the", "past"})
readline:suggest({"some", "thing", "to", "add"})
while (wnd:process()) do
wnd:refresh()
end
Networking
All this would be only marginally useful if it did not also work across the network. Remote administration and similar tasks are, of course, passport-holding citizens in this world, and even more work has gone into the networking protocol itself.
Say that you have written a new application using libarcan-tui and you want to share it with others or access it from another computer. From the developer angle, there is nothing to do, the existing tooling provides everything.
We will omit authentication key management here, since that is a big topic in its own right. That’s done with the ‘soft-auth’ argument for the pull approach, and a12:// rather than a12s:// for the push one.
Speaking of push and pull, running something along these lines (flags pieced together from the gamescope listen example earlier and the soft-auth note above):
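arcan-net --soft-auth -l 6680 -- /path/to/my_thing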
Would set the device up to serve an instance of my_thing to every inbound connection. This is what we mean by ‘pull’: the display end connects, gets served an application and pulls the output to its display. This is what you would be most familiar with through ssh.
Push flips the direction around and is what you would expect from X11 style remoting with something like DISPLAY=some.host:port xterm.
Here ‘my.other.host’ would be set up with arcan-net --soft-auth -l 6680. To push my_thing: ARCAN_CONNPATH="a12://my.other.host" /path/to/my_thing.
Both have their uses. Normally I have a number of devices fuzzing, doing traffic monitoring, or monkey testing with a core-dump crash handler installed. All of these attach their corresponding TUI interaction tool and on demand push to whichever device I am currently working on, or to a fallback multiplexing server that keeps them around until I check in.
What is interesting in the network case is how it differs from terminal-era remoting over the wire. This is nuanced, with many factors in play, as congestion control isn’t exactly XON/XOFF or the RS-232 RTS/CTS of yore. Is the priority interactive latency? Memory consumption locally? Memory consumption remotely? Low network bandwidth?
To our disadvantage there is potentially more data per packed cell, but ZSTD compresses that very well. At the same time, only the visible changes are transferred, not some theatre piece with an ASCII-character LOGO turtle on acid playing the lead. Any back-pressure propagates from the network tool to the source, so pending updates merge there.
In a simple line-mode screen, ssh + your shell doing a ‘find /’ would start sending a colossal amount of data on the wire. Here the strategy is partly up to the shell.
In the following clip from Cat9, you can see that it detects a data heavy job and switches to a slow updating statistics view until it terminates or I instruct it to show what is going on. Importantly I can keep on working and I am not shoving 127MiB of data over the wire.
Alt-screen tools, like vim and tmux, instead get to pay the price for cursor movements that draw at non-contiguous locations; style changes; border drawing characters and so on. Then suffer all the crazy things ncurses does (like padding or delaying writes to align with the baudrate of the connection). Then suffer the crazy things emulators have to do (no incoming data for the last n milliseconds, then it’s probably safe to draw). Then suffer the crazy things they themselves have to do to understand if a key is a key or a control character.
Just as we concluded our first NLnet grant, it is also time to say goodbye to the second phase of the project, ‘anarchy on the desktop’, and enter the third and final one.
As per the old roadmap, 0.7 is the last window of opportunity for any trailing features. For 0.8 we finally get to pull out the performance tuning tricks, 0.9 for hardening the attack surfaces, to be followed by me disappearing in a puff of smoke.
I am also happy to say that we have received two new NLnet grants, one administered by me, the other by Cipharius. Both build on the networking layer. For my part, we’ll extend the directory server end to have a more flexible programmable remote side, support linking multiple directories together, and make the entire stack ‘turnkey’ easy to deploy and use.
At the other end there will be a simpler access layer for letting a device join your ‘one desktop, many devices’ network and extend it with its storage and sensors (cameras, displays, IMUs, and so on), or act as a portable viewer, Smash, for accessing it from within the confines of the corporate walled gardens.
I have often, and rightly so, been accused of not being a very public person. As an exception to that rule I did partake in an interview by the venerable David Chisnall, which can be read over at lobste.rs here: https://lobste.rs/s/w3zkxx/lobsters_interview_with_bjorn_stahl — if you are at all curious about my background and motivation.
Let’s briefly look at the building blocks in place:
Legacy Strategy – SHMIF and A12 were both verified to ensure that anything that could be done (arcan vs Xorg part 1, part 2) through previous popular tools and protocols can still be done and persist onwards (missing – bringing X along).
Now we can braid these strands together into one rope, whip it into shape and see what should dangle from it.
This release doesn’t have many demonstrable items for the core engine itself, mostly fixes all across the board. Instead we will recap some changes and ongoing work for Lash#Cat9 and Xarcan. Our reference desktop environment, Durden, will get a separate release post when some of the tools have received more polish.
As a teaser though, the following clip shows how the network tool can map in the key-local private file store and route it to our decode plumber:
Directory server triggered file streaming
In a similar vein, the following clip shows the same approach to picking a file from the directory store, setting it as the drag source and then dropping it into Cat9 and letting it expand it into a job view.
This acts as the development baseline for more things, particularly letting controller appls running server-side define other namespaces; scan, index and annotate; distribute across a network of directories; and search.
Other highlights for that release will be a new streaming/sharing composition tool; trainable mouse gestures; new initial setup configuration helper; desktop icons; on-screen keyboard; input typer helper with dictionary/IME completion; stereoscopic support for Xreal Air style glasses and more bespoke displays such as Looking Glass (lenticular) and Tilt5 (projected, retroreflective AR).
Lash#Cat9
Technically, Lash is a Lua scripting environment that is a wrapper around our curses replacement, libarcan-tui, adding some convenience features to make it easier to write a command-line shell. Cat9 is the current reference shell for this.
The major features added to Cat9 over the ~2 years since its inception have all received individual write-ups:
In A spreadsheet and a debugger walk into a shell we cover a Debug Adapter Protocol Implementation as well as an interactive spreadsheet that can be populated by regular shell commands. The following two clips come from that article:
Sampling debugger watchset into a spreadsheet
Creating a spreadsheet, running cells that map to shell commands, mixing with builtin functions
In Cat9 Microdosing: stash and list we cover an ‘ls’ replacement that effectively turns it into a file manager, plus a scratchpad for accumulating files into a workset to monitor for changes, or to forward to compression/transfer tools. The following two clips come from that article:
Using list to navigate with first mouse, then keyboard, opening a media file with open-swallow
Using list to create a stash of files, and removing them in one fell swoop.
In Cat9 Microdosing: each and contain we add the option to absorb the output of previous or current jobs into a collection that can then be referenced by other commands as a single unit, as well as defining asynchronous processing options over a range of data from other jobs (best used with the stash mentioned above). The following two clips come from that article:
Using contain to swallow ongoing jobs, merging them into an overview
Using list to build a stash, then each to run a command on each item, merging into a container job
The next planned write-up of that kind covers a social timeline for things like Mastodon and email by frontending ‘madonctl’ and ‘himalaya’, as well as more developer tools like SCM-based integration. The following clip is a sneak peek from that:
SCM monitor detecting a fossil repository, exposing its tickets, controls for staging commits, viewing diffs and navigating timeline.
Xarcan
In the last release post we hinted at how Xarcan can be used to keep your own window manager, letting Arcan act as a display driver — as well as a security, control and configuration plane that selectively merges in arcan native clients, with options for how, or if, X clients get to see inputs and clipboard action.
The following clip was used for that:
Window Maker managing windows and decorations, Arcan handling display control and mixing in arcan-shmif clients.
One thing I’ve never really been thrilled with about the Xwayland design is that the wayland compositor wants also to be the X window manager, and that all the related state about window position and whatnot is just internal to the compositor. xwin and xquartz don’t have this problem, you can run twm as your window manager and still move windows, but in xwayland that doesn’t work because wayland refuses to allow you to position your own window for (what I consider) essentially religious reasons [5]. And as a result the compositor gets a lot more complicated and you _still_ need to change the X server to get things to work. What I’d have preferred is a wl_x11_interop protocol that Xwayland could use to send this kind of advisory state, which mutter could optionally only expose to Xwayland if it really wanted to be less functional than X.
‘Essentially religious’ reasons indeed. What really happens is that Xarcan synchronises X11 ConfigureWindow and Stacking information with the corresponding VIEWPORT events.
This lets the Arcan side of the WM equation take the output video and decompose it into degenerate regions of linked coordinate spaces. It then uses these regions to determine if it should forward input or not.
For Arcan clients, it creates empty placeholder windows in X11 space, and swaps out their contents for the real ones during scanout. This lets the X11 side know there is an object there, but prevents it from having access to its contents – unless desired.
That last part can be solved with controlled sharing. In the following clip you can see how I drag the contents of an Arcan window into it, letting legacy ‘screen sharing’ or ‘screenshotting’ (as flawed as those concepts are, but that is a long rant) applications see and capture it:
Sharing an arcan application window into an Xserver letting GIMP screenshot it
There is quite a lot more going on and planned:
An interesting coming change to arcan-shmif will have it re-use the ‘platform’ layer in Arcan that is responsible for managing input devices and controlling displays when there is no other arcan-shmif server to connect to.
This will effectively turn the Xarcan DDX into a standalone Xserver that works as expected, but at the same time keep up to date with features and changes to the KMS/GBM part of the graphics stack.
This lets us drop a whole lot of cruft from Xorg: XNest/Xephyr is no longer needed, neither is XAce or XFree86. We can eliminate quite a few slowdowns and pitfalls in startup.
Since it’s trivial for us to compartment and apply policy based on client origin, it will also be possible to partially share in the other direction, such as letting Xarcan act exclusively as an input-driver in order to bring xf86-input-wacom and XIM along.
Before then we have a few minor immediate cleanups left, mainly fixing regressions in GLAMOR caused by changes to DRI3 and PRESENT and some work-arounds for XDND to be able to work between multiple, networked, X servers. It’s not hard, just tedious.
Now it’s time to head off to 38c3. See you next year for the final reveal.
This is a technical description of the IPC system used throughout the Arcan project, from both a designer and developer perspective, with annotations on legacy and considerations along the way. It’s one of a few gems inside of the Arcan ecosystem, and thousands of hours have gone into it alone.
The write-up assumes a basic computer science background, while the sections prefixed with ‘comment’ are more advanced.
History
SHMIF, or “SHared Memory InterFace” has a long history, dating back to around 2007. It was first added to cover the need to least-privilege separate all parsing of untrusted data in the main engine, simply because the ffmpeg libraries couldn’t stop corrupting memory – generally a bad thing and we had more than enough of that from GPU drivers doing their part.
With parsers sandboxed, it evolved to also work as a linker-interposed or injected-shellcode way of manipulating 3rd party audio/video processing and event loops without getting caught doing so. Rumour has it that it was once used to automate a lot of the tedium in games such as World of Warcraft, and was not caught doing so.
It was written to be portable across many operating systems. The initial version ran on Windows, OSX, BSDs and Linux. There were also non-public versions that ran on Android and iOS. These days the focus remains on BSDs and Linux, with the networking version of the same model, “A12”, intended to retain compatibility with the others.
Its design is based on lessons learned from emulating arcade games of yore, as they represent the most varied and complex display systems to date. The data model evolved from increasingly complex experiments, up to- and beyond- the point of painstakingly going through every single dispatch function in X11 to guarantee that we did not miss anything. The safety and recovery aspects come from many lessons learned breaking and fixing control systems for power grids. The debugging and performance choices came from working on a last-resort debugging tiger team on (mainly) Android.
Layout
There is a shared memory region, and a set of OS specific primitives to account for inadequacies in how various kernels expose controls over memory allocation and use. Combined we refer to these as a segment. The first one established is referred to as the ‘primary’ and it is the only one that is guaranteed on a successful connection. Additional ones are negotiable, and the default is to reject any new allocation. This decision is ultimately left to the window management policy.
Comment: In this form, there is no mechanism for a client to influence allocations in the server end. Further compromises are possible (as we come to later) in order to gain more features, but for a hardened setup, this is one out of several ways we reduce the options for exploiting any vulnerabilities or staging a denial of service attack.
The shared memory is split into a fixed static region and a dynamic one.
The following figure shows the rough contents of these regions:
Fields (not to scale) and layout of the two regions of the shared memory region of a segment
The order of the fields in the static region is organic; it has been extended over time. To avoid breaking compatibility, changes have been appended as more metadata was needed. The region marked ‘aux’ is 0-sized by default; it is only used for negotiating advanced features, e.g. HDR metadata and VR device support.
Some of the more relevant and non-obvious members of the static regions are:
DMS – Dead Man’s Switch. If it is ever modified, the segment is considered dead. After this point no modifications to the page will be processed by the other side. (See the ‘Safety Measures’ section.)
Verification Cookie – This is a checksum calculated from the offsets and values of other members in the region. Both sides periodically recalculate and compare this value to detect version mismatches or corruption.
Inbound/Outbound event buffers – These are fixed-slot ring buffers of 128b events. They can be thought of as asynchronous ‘system’ calls (see the ‘Event Processing’ section).
Segment Token. A unique identifier for this specific segment. This can be used by the client end to reference other events if the identifier has been shared by some other mechanism. The ‘VIEWPORT’ event, for instance, instructs window management for repositioning or embedding segments owned by other clients or threads.
The entire memory region is treated as an unsafe contested area; one side populates it with the changes it wants to see done, and through some synchronisation trigger the other side verifies and applies or rejects them.
Comment: For debugging and inspection, this means a single snapshot of the mapped memory range is sufficient to inspect the state of the connection and trivial to write analysis, fuzzing and reporting tools for.
The raw layout is not necessarily exposed to the consumer of the corresponding library. Instead a context structure (struct arcan_shmif_cont) contains the developer-relevant pointers to the corresponding subregions.
Comment: While the implementations for this interface live in userspace, the design intent was to be able to have the server end live completely in a kernel, and have this act as the sole system call interface.
Each segment has a type that is transferred once from the client to the server during the REGISTER event (or when requesting a new one through a SEGREQ event). This is mainly a window management and server hint to control event response, but also determines if video and audio buffers are for (default) client to server or (screen recording and similar features) server to client.
First Connection (Client)
The client end comes as a library, libarcan-shmif. The rough skeleton that we will unpack here looks like this.
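A condensed sketch, assuming the arcan_shmif_open entry point (the commented sections are unpacked below):

#include <arcan_shmif.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
	struct arg_arr* args;
	struct arcan_shmif_cont C = arcan_shmif_open(
		SEGID_APPLICATION, SHMIF_ACQUIRE_FATALFAIL, &args);

/* ... send audio/video ... */

/* ... event processing ... */

	return EXIT_SUCCESS;
}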
The SEGID_ part is a hint to the server as to the intended use of this connection and how it could manage its resource allocation and scheduling. There is a handful of types available, but APPLICATION is a safe generic one. A video player would be wise to use MEDIA (extremely sensitive to synchronisation but not input), while a game would use, well, GAME (high resource utilisation, input latency and most-recent presentation more important than “perfect” frames).
The FATALFAIL part simply marks that there is no point to continue if a connection can’t be negotiated. It saves some error checking and unifies 'fprintf(stderr, "Couldn't connect")' like shenanigans.
The arg_arr ‘args’ is a form of passing command line arguments to the client without breaking traditional getopt/argv. It can be used to check for key=value pairs through something like ‘if (arg_lookup(&args, "myopt", 0, &val)){ ... }‘ .
A good question here would be: how does the client know where to find the server? The actual mechanism is OS dependent, but for the POSIX case there are two main options that the library looks for: the ARCAN_CONNPATH and ARCAN_SOCKIN_FD environment variables. The value for CONNPATH is the name of a connection point and is defined by the server side.
Comment: The connection point name is semantic. This stands in contrast to how Xorg does it with DISPLAY=:[number], where number normally came from the virtual terminal the user was starting Xorg from. The server end can spawn multiple connection points with different names and apply different policies based on the name.
ARCAN_SOCKIN_FD is used to reference a file descriptor inherited into the process running arcan_shmif_open. This is used when the server itself spawns the new process. It is also used in the special case of ARCAN_CONNPATH being set to “a12://” or “a12s://”. This form actually starts arcan-net to initialise a network connection to a remote host, which creates a single-use shmif server for the client to bind to. This is one of the ways we translate from local IPC to network protocol.
The 'arcan_shmif_initial' part gives the information needed to create correct content for the first frame. This includes user preferred text format (size, hinting), output density (DPI aware drawing over ‘scaling’ idiocy), colour scheme (contrast for accessibility, colour blindness or light/dark) and even locale (to migrate away from LC_… /getlocale) and location (latitude/longitude/elevation).
Comment: For accelerated graphics it also contains a reference to the GPU device to use for rendering, this lets the server compartment or load-balance between multiple accelerators.
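As a hedged sketch of retrieving that structure (the resolve-style call and the member names here follow my reading of the headers and may differ):

struct arcan_shmif_initial* init;
arcan_shmif_initial(&C, &init);
/* e.g. init->density for DPI-aware drawing,
 *      init->fonts for the preferred text format */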
Now for the ‘send audio/video’ part.
shmif_pixel px = SHMIF_RGBA(0x00, 0xff, 0x00, 0xff);
for (size_t y = 0; y < C.h; y++)
for (size_t x = 0; x < C.w; x++)
C.vidp[y * C.pitch + x] = px;
arcan_shmif_signal(&C, SHMIF_SIGVID);
This fills the dynamic video buffer part of the segment with fully opaque green pixels in linear RGB, in whatever packing format the system considers native (embedded in the SHMIF_RGBA macro).
Comment: While on most systems (these days) that would be 32-bit RGBA, it is treated as compile time native as CPU endianness would be. Low-end embedded might want RGB565, special devices like eInk might want RGB800 and so on.
There are a lot of options available here, but most have to deal with special synchronisation or buffering needs. These are covered in the ‘Special Case’ sections on Synchronisation and on Accelerated Graphics.
For this example we omitted the aural representation, but if you have a synthesizer core or even a tone generator the same pattern applies; switch vidp for audp and SHMIF_SIGVID for SHMIF_SIGAUD (it is a bitmask, use both if you have both).
Comment: The common distinction between audio and video is something we strongly oppose. It causes needless complexity and suffering trying to have one IPC system for audio, then another for video and then trying to repair and synchronise the two after the fact. It is one of those historical mistakes that should have ended yesterday, but the state of audio on most systems is almost as bad as video.
At this stage we are already done (13 lines of code, zero need for error handling) but for something more polite, we will flesh out the ‘event processing’ part.
struct arcan_event ev;
while (arcan_shmif_wait(&C, &ev)){
if (ev.category == EVENT_IO){
/* mouse, keyboard, eyetracker, ... handling goes here */
	continue;
}

if (ev.category != EVENT_TARGET)
	continue;

switch(ev.tgt.kind){
case TARGET_COMMAND_EXIT:
/* any custom cleanup goes here */
break;
case TARGET_COMMAND_RESET:
/* assume that we are back where we started */
default:
break;
}
}
This will block until an event is received, though more options are covered in the section on ‘Synchronisation’. No action is ever expected of us, we just get polite suggestions ‘it would be nice if you do something about this’. The category part will only be EVENT_IO or EVENT_TARGET and the next section will dip into why.
Comment: The _RESET event in particular is interesting and will be covered in the 'Recovery' Special Case. It can be initiated by the outer desktop for whatever reason, and just suggests 'go back to whatever your starting state was, I have forgotten everything', but is also used if the server has crashed and the implementation recovered from it, or is shutting down and has already handed responsibilities over to another.
The event data model covers 32 different server-to-client possibilities, and 22 client-to-server. Together they cover everything needed for a full desktop and more, but it is descriptive, not normative. React to the ones relevant to you, ignore the others.
First Connection (Server)
There are two implementations for the server end; one inside the arcan codebase tailored to work better with its more advanced resource management, crash resiliency and scripting runtime. The other comes as a library, libarcan-shmif-server, and is mainly used by the arcan-net networking tool which translates this into the A12 network protocol.
Let’s walk through a short example which accepts a single client connection, and in the next section do the same thing for the client application end. Normal C foreplay is omitted for brevity.
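A sketch, assuming the shmifsrv_allocate_connpoint entry point (the argument list is approximated; check arcan_shmif_server.h for the authoritative form):

#include <arcan_shmif_server.h>
#include <sys/stat.h>

struct shmifsrv_client* cl =
	shmifsrv_allocate_connpoint("example", NULL, S_IRWXU, -1);
if (!cl)
	return EXIT_FAILURE;

shmifsrv_monotonic_rebase();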
This creates a connection point for a client to bind to. There are two alternatives, shmifsrv_spawn_client and shmifsrv_inherit_connection. Spawn takes care of creating a process with the primitives inside. Inherit takes some preexisting primitive and builds from there. Both return the same shmifsrv_client structure.
Comment: For a controlled embedded or custom OS setting, the spawn client approach is the safest bet. The inherit connection approach is for when there is a delegate responsible for spawning processes and reduces the number of system calls needed to a bare minimum.
The shmifsrv_monotonic_rebase() call sets up the internal timekeeping necessary to provide a coarse (25Hz) grained CLK (clock) signal.
Now we need some processing that interleaves with rendering/input processing loops, which is a larger topic and out of scope here.
int status;
while ((status = shmifsrv_poll(cl)) <= CLIENT_NOT_READY)
{
/* event and buffer processing goes here */
}
if (status == CLIENT_DEAD)
{
shmifsrv_free(cl, SHMIFSRV_FREE_FULL);
exit(EXIT_SUCCESS);
}
It is possible to extract an OS specific identifier for I/O multiplexing via shmifsrv_client_handle(), so that _poll is only invoked when there is signalled inbound data.
Comment: The flag passed to _free determines the client side visibility. It is possible to just free the server side resources and not signal the dead man's switch. This can be used to transparently pass the client to another shmif server process or instance.
Before we get to the event and buffer processing part, there is also some timekeeping that should be managed outside of a higher frequency render loop.
int left;
int ticks = shmifsrv_monotonic_tick(&left);
while (ticks--){
shmifsrv_tick(cl);
}
This provides both integrity and liveness checks and manages client requested timers. The (left) out-parameter returns the number of milliseconds until the next tick, which is useful feedback for a more advanced scheduler, if you have one (and you should).
Now to the event processing:
struct arcan_event buf[64];
size_t n_events, index = 0;
if ((n_events = shmifsrv_dequeue_events(cl, buf, 64))){
while (index != n_events){
struct arcan_event* ev = &buf[index++];
if (shmifsrv_process_event(cl, ev))
continue;
/* event handlers go here */
}
}
This will dequeue, at most, 64 events into the buffer. Each event is forwarded back into the library so that it can act on the subset it manages internally; they are routed through the developer only to allow complete visibility. You can use arcan_shmif_eventstr() to get a human readable representation of an event's contents.
Comment: The reason for having a limit is that a clever and malicious client could set things up in a way that would race to stall the server or exhaust its file descriptor space as part of a denial of service, either to affect the user directly or as part of trying to make an exploitation chain more robust.
Now for the actual interpretation of what was dequeued:
if (ev->category != EVENT_EXTERNAL){
fprintf(stderr, "unexpected event category\n");
continue;
}
switch (ev->ext.kind){
case EVENT_EXTERNAL_REGISTER:
/* only allow this once on the client */
arcan_shmifsrv_enqueue(cl, &(struct arcan_event){
.category = EVENT_TARGET,
.tgt = {
.kind = TARGET_COMMAND_ACTIVATE
}
});
break;
default:
break;
}
The event data model has a lot of display server specific nuances to it, none of which are necessary except for the one above. This unlocks the client from the ‘preroll’ state where it accumulates information received into the “arcan_shmif_initial” structure as covered in the client section. Any information necessary for a client to produce a correct first frame goes before the ‘ACTIVATE’ one. The most likely ones you want are DISPLAYHINT, OUTPUTHINT and FONTHINT, to instruct the client about the size it will be scaled to, the density, colourspace and subchannel layout it will be presented through, as well as the preferred size of the most important of primitives, text.
Comment: There are a number of event categories, but only one reserved for clients (EVENT_EXTERNAL). The other categories are for display server internals. The reason they are exposed over SHMIF is for the 'server' end to be split across many processes and still interleave with event processing in the server. This allows us to have external sensors, input drivers etc. all as discrete threads or processes without changing anything else in the architecture. It also allows a transformation from using it as a kernel-userspace boundary to a microkernel form.
The last part is to deal with the ‘buffer processing’ part of the previous code.
/* separate function */
bool audio_cb(shmif_asample *buf,
size_t n_samples,
unsigned channels, unsigned rate,
void *tag)
{
/* forward buf[n_samples] to audio device or mixer
* configured to handle [channels] at [rate] */
return true;
}
if (status & CLIENT_VBUFFER_READY){
struct shmifsrv_vbuffer vbuf = shmifsrv_video(cl);
/* forward vbuf content to GPU */
shmifsrv_video_step(cl);
}
if (status & CLIENT_ABUFFER_READY){
shmifsrv_audio(cl, audio_cb, NULL);
}
The contents of vbuf are nuanced. There is a raw buffer or an opaque GPU system handle + metadata (timing, dirty regions, …), or TPACK (see the section on ‘Text Only Windows’), along with a set of flags corresponding to the ‘presentation-hints’ on how buffer contents should be interpreted concerning coordinate system, alpha blending and so on.
Comment: Most of the graphics processing properties are things any competent scanout engine has hardware acceleration for, excluding TPACK (and even ancient graphics adaptors used to have those as 'text mode' display resolution). There are support functions to unpack this into a compact list of text lines and their colouring and formatting in "arcan_tui.h" as arcan_tui_tunpack().
Comment: For accelerated GPU handles it is possible to refuse it by sending a BUFFERFAIL event. This will force the client implementation to convert accelerated GPU-local content into the shared pixel format on their end. This is further covered in 'Special Case: Accelerated Graphics'. It doubles as a security measure, preventing the client from submitting command buffers to the GPU that will never finish and livelock composition that way (or leverage any of the many vulnerabilities GPU side). On a hardened system this would be used in tandem with IO-MMU isolation.
In total this lands us at less than 100 lines of code with very granular controls, a fraction of what other systems need just to boilerplate graphics alone. If you only wanted a 101-level take on how SHMIF works, we are done; there is a lot more to it if the topic fascinates you, but it gets more difficult from here.
Synchronisation
While one might be tempted to think that ‘display servers’ are about, well, providing access to the display, the real job is desktop IPC with soft realtime constraints. The bread and butter of such systems is synchronisation. If you fail to realise this you are in for a world of hurt, and dealing with it after the fact will brew a storm of complexity.
Comment: It is also the hardest problem in the space - figuring out who among many stakeholders knows what; when do they know it; when is something said relevant or dated. All of those are difficult as is, but it gets much worse when you also need to factor in resonance effects, malicious influence and that some stakeholders reason asynchronously about some things, and synchronously about others. As icing on an already fattening cake you need to weigh in the domain specific (audio / video) nuances. Troubleshooting boils down to profiling, and problems manifest as 'jitter' and 'judder' and how those fit with human cognition. Virtual Reality is a good testing ground here even if you are otherwise uninterested in the space.
Comment: Beginner mistakes here are fairly easy to spot; if someone responds to synchronisation problems by increasing buffer sizes (landing in a version of network engineering's famous 'buffer bloat' problem) or arbitrary sleep calls (even though some might be necessary without adequate kernel level primitives), they are only shifting the problem around.
Recommended study here is ‘queuing theory’ and ‘signal processing’ for a deeper understanding.
To revisit the previous code examples on the client end:
arcan_shmif_signal(&C, SHMIF_SIGVID);
This is synchronous and blocking. The thread will not continue execution until the server end has said it is ok (the shmifsrv_video_step code). The server can defer this in order to prioritise other clients or to stagger releases to mitigate the ‘thundering herd’ problem.
For normal applications, this is often sufficient and comparable to ‘VSYNC’. When you have tighter latency requirements and/or it is costly to produce a frame, you need something more. The historically ‘easier’ solution has been to just add another buffer.
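A hedged sketch of such a request, assuming vbuf_cnt as the field name in the _ext resize form covered next:
/* ask for two video buffers at the current dimensions */
arcan_shmif_resize_ext(&C, C.w, C.h,
(struct shmif_resize_ext){
.vbuf_cnt = 2
});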
_resize and _resize_ext calls are both also synchronous and blocking. This is because the server end needs the controls to guarantee that enough memory is available and permitted. It will recalculate all the buffer offsets (vidp, audp, …) and verification cookie in the context and possibly move the base address around to satisfy virtual or physical memory constraints.
Comment: Some accelerated display scanout controllers have hard requirements on physically contiguous memory at fixed linear addresses. Those are a limited and scarce resource, which is why such a resize request might fail, especially in tight embedded development settings. The same applies when dealing with virtual GPUs in virtual machines and so on. The other option to still satisfy a request is to buffer in the server end, causing an extra copy with increased power consumption and less memory bandwidth available for other uses.
The request above would make the first arcan_shmif_signal call return immediately, and only block if another signal call happens before the server has consumed the buffer from the first. Otherwise the context pointer (C.vidp) will be changed to point to the new free buffer slot. This comes at the cost of another display refresh period of latency.
Comment: It is possible to increase the buffer count even further, but this changes the semantics to indicate that only the most recently submitted buffer should be considered and others can be discarded. This counters the latency problem of the double buffering problem at the expense of memory consumption. This has historically been called 'triple buffering' but, to much confusion, has also been used for the 'double buffering' behaviour with just deeper buffer queues and is thus meaningless.
Not every part of a buffer might have actually changed. A common optimisation is to annotate which regions should be considered, and for regular UI applications (blinking cursor, single widget updates, …) this substantially cuts down on memory transfers. To cover this you can mark such regions with calls to arcan_shmif_dirty() prior to signalling.
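A hedged example, assuming the region is expressed as extents; the coordinates are illustrative:
/* only the cursor cell changed this frame */
arcan_shmif_dirty(&C, cur_x, cur_y, cur_x + cell_w, cur_y + cell_h);
arcan_shmif_signal(&C, SHMIF_SIGVID);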
Comment: While some might be tempted to annotate every pixel, there are strong diminishing returns as soon as you go above just one region due to constraints on memory transfers. Internally, the shmif client library implementation will just merge multiple calls to _dirty into the extents of all changes. For the triple buffering behaviour mentioned in the previous comment, dirty regions won't have any effect at all, as changes only present in the one buffer would not be guaranteed to transfer over to the next, and the cost of trying to merge them on the composition end would cancel out the savings in the first place.
There are more synchronisation nuances to cover, but to avoid making this section even more exhausting, we will stick to the two most relevant. The first is to make the signal call non-blocking.
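A sketch of that call, using the SIGBLK flag discussed below:
arcan_shmif_signal(&C, SHMIF_SIGVID | SHMIF_SIGBLK_NONE);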
This returns immediately and you can choose to check whether it is safe to draw into the video buffer yourself (through arcan_shmif_signalstatus), or to simply continue writing into the buffer and risk ‘tearing’ in favour of lower latency. This amounts to what is commonly called ‘Disable VSYNC’ in games.
Comment: For those that have written games in days of yore, you might also recall 'chasing the beam', unless dementia has taken over by now. Since game rendering can have predictable write patterns and display scanout can have predictable read patterns it is possible to align your writes such that you raster lines up to just before the one the display is currently reading. This is neither true for modern rendering nor is it true for modern displays, 'the framebuffer' is a lie. Still, for emulation of old systems, it is possible, but impractical, to repeatedly access the 'vpts' field of the static region to figure out how many milliseconds are left until the next VBLANK and stage your rendering accordingly.
The last option is to keep the SHMIF_SIGBLK_NONE behaviour, but add the flag SHMIF_RHINT_VSIGNAL_EV to C.hints prior to a _resize call. This will provide you with a TARGET_COMMAND_STEPFRAME event; you can latch your rendering to that one alone and let your event loop block entirely.
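In rough, hedged strokes (update_and_signal is a placeholder for your own render path):
C.hints |= SHMIF_RHINT_VSIGNAL_EV;
arcan_shmif_resize(&C, C.w, C.h);

/* then, in the event loop: */
case TARGET_COMMAND_STEPFRAME:
update_and_signal();
break;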
Comment: Enabling STEPFRAME event delivery by sending a CLOCKREQ request provides a secondary path for advanced latency management as it enables feedback in presentation timing. Sampling the 'vpts' field of the static region would provide information about upcoming deadline and STEPFRAME contains metadata about presentation timing as to when the contents was actually presented to scale up/down effects and quality. Current game development is full of these kinds of tricks.
Event Processing
Event loop construction is an interesting and nuanced topic. We start by returning to the naive one introduced in the client section:
struct arcan_event ev;
while (arcan_shmif_wait(&C, &ev)){
/* interpretation goes here */
}
Responding to each event that arrives here should be as fast as possible. This is easy most of the time. Whenever a response includes rendering however, response times can vary by a lot. Some events are more prone to this, with mouse motion and resize requests being common offenders.
What happens then is that the number of events in the incoming queue starts to grow. If the rate of dispatched events is lower than that of the incoming one, we get buffer back-pressure.
This applies to both the server and the client side. Each side has different constraints and calls for different mitigations. The server end is more vulnerable here, as it has multiple clients to process and higher costs for processing events, since most prompt some form of managerial decision.
Comment: Events from client A can directly or indirectly cause a larger number of responses from clients B and C (amplification), which in turn can cascade into further responses from A. This can snowball fast into 'event storms' and back-pressure building up in others as 'resonance effects'.
One small change to the loop improves on the client end of the equation:
bool process_event(struct arcan_event *ev)
{
/* interpretation now goes here, returning whether
 * a new render pass is needed */
return true;
}
struct arcan_event ev;
while (arcan_shmif_wait(&C, &ev)){
bool dirty = process_event(&ev);
size_t cap = PP_QUEUE_SIZE;
while (arcan_shmif_poll(&C, &ev) > 0 && cap--){
dirty |= process_event(&ev);
}
if (dirty){
render();
}
}
This will flush out as much of the inbound queue as possible (or up to a cap corresponding to the size of the ring buffer), and only render after all events have been applied. This prevents responses to earlier events in the queue from being overdrawn by responses to later ones.
Comment: Since the event queue is a ring-buffer visible to both sides, it is possible for either party to atomically inspect the head and tail values to determine the current state of the other end, as well as incoming workload. This is a powerful advantage over other possible carriers, e.g. sockets.
The full data model would take another lengthy post to flesh out, so we will only look at one event which highlights library internals. That event is ‘TARGET_COMMAND_DISPLAYHINT’. This event is used to indicate the size that the server end would prefer the window to have. The client is free to respond to this by resizing to the corresponding dimensions. If it doesn’t, the server can still scale and post process – it has the final say on the matter.
As mentioned earlier, resize is synchronous and blocking due to its systemic cost, so it makes sense to keep resizes to a minimum. Some of that responsibility falls on the window manager, to ensure that a drag resize using a 2kHz mouse doesn’t also result in 2000 DISPLAYHINTs. Even if that were to happen, the implementation has another trick up its sleeve.
There is a small number of events which are considered costly and can be coalesced. When _wait or _poll encounters such an event, it sweeps the entire pending queue looking for similar ones, merging them together into the one eventually returned, providing only the most recent state.
Comment: There is a tendency for IPC systems to be designed as generally as possible, even if their actual narrow use is known. This defers the decision to commit to any one domain specific data model, making optimisations such as this one impossible -- you can't coalesce or discard what you don't know or understand, at least not responsibly with predictable outcome. This does not make complexity go away, to the contrary, now every consumer has increased responsibility to manage queuing elsewhere. The problem doesn't magically disappear just because you have an XML based code generator or some-such nonsense.
Most of this article has been about a single segment, though in ‘rich’ applications you would have more: child windows, popups, custom mouse cursors and so on. We already mentioned that it is possible to request more, even though only one is ever guaranteed. This is not more difficult than setting up the primary one. You submit a SEGREQ event with a custom identifier and type. Eventually you either get a NEWSEG event or a REQFAIL event back with the same identifier. For the NEWSEG you forward the event structure to arcan_shmif_acquire and get a new arcan_shmif_cont structure back.
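A hedged sketch of that round trip; the popup type and identifier are illustrative, and the field names are from the public event model as we recall it:
/* ask for a popup subsegment, tagged with an id of our choosing */
arcan_shmif_enqueue(&C, &(struct arcan_event){
.ext.kind = EVENT_EXTERNAL_SEGREQ,
.ext.segreq = {
.kind = SEGID_POPUP,
.id = 0xcafe
}
});

/* in the event loop, match our id and map the new segment */
case TARGET_COMMAND_NEWSEGMENT:{
struct arcan_shmif_cont popup =
	arcan_shmif_acquire(&C, NULL, SEGID_POPUP, 0);
}
break;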
What does this have to do with queue management? Well, each new segment has their own separate queues and each segment can be processed and rendered on separate threads independent of each other. There is a monotonic per-client global counter and timestamp as part of each event to account for ordering requirements between ‘windows’, but in practice those are exceedingly rare.
A final part about the events themselves. SHMIF is an IPC system, it is not a protocol, it doesn’t cross device boundaries. We have a separate and decoupled networking protocol specifically for that. As an IPC system we can and should take advantage of device and operating system specific nuances.
Two such details: each event has a fixed size of 128 bytes (64 did not cover all cases), which amounts to 2 cache lines on the vast majority of architectures out there, and the events sit in linear contiguous buffers at a natively aligned base with access patterns that prefetch very well. The packing of the different fields is tied to the system ABI, which is designed to be optimal for whatever you are running on.
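Since the size is part of the contract, a consumer can sanity check its build against it (a hedged C11 form):
_Static_assert(sizeof(struct arcan_event) == 128,
"unexpected event size/packing for this ABI");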
Safety Measures
We are almost done with the overall walkthrough, then we can finish off with some special cases and exotic features. Before then there is one caveat to cover.
In previous sections we have brushed upon a few tactics that protect against misuse: the validation cookie to detect corruption, version and code generation mismatches, as well as the dead man’s switch. There is still one glaring issue with the shared memory event management and audio/video signalling solution: what happens if the other end livelocks or crashes while we are locked waiting for a response?
In a socket based setup, a crash on the other end would cause it to detach, which you can detect. For a livelock, the common ‘solution’ is to resort to a kind of ping-pong protocol and somehow disambiguate between that and a natural stall in some other part of the system (very often, the GPU driver).
By default (there is a flag to disable this) each segment gets a guard thread. This guard thread periodically (default: every second) checks the aliveness of a monitoring process identifier that the server filled out, as well as whether the dead man’s switch has been released. If either happens, it immediately unlocks all internal semaphores, causing any locked call into the shmif library to release and any further calls to error out so the natural event loop takes hold. This setup is also used to not only detect, but recover from, crashes (see ‘Special Case: Recovery and Migration’).
This might not be enough for troubleshooting or even communicating to a user that something is wrong. For this we have the ‘last_words’ part of the memory region. This is an area the client can fill out with a human presentable error message that the server end can forward to relevant stakeholders.
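A hedged one-liner from the client end, assuming the last_words helper; the message itself is illustrative:
/* leave a human readable epitaph, then drop the connection */
arcan_shmif_last_words(&C, "config: broken theme file");
arcan_shmif_drop(&C);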
The Arcan engine itself splits out into two parts. One potentially privileged parent supervision process that is used to negotiate device access, and the main engine. This supervision process also acts as a watchdog. Every time the engine enters and exits a dangerous area, e.g. the graphics platform or the scripting VM, it registers a timestamp with the parent. If this exceeds some threshold, the parent first signals the engine to try and gracefully recover (the scripting VM is able to do that) and if the problem persists, shuts down the engine. This will trigger the guard threads inside clients and they, in turn, enter a reconnect, migrate or shutdown state.
Special Case: Handover Allocation
As we covered previously, requesting a new segment to use as a window takes a type that indicates its role and purpose. One such type that sticks out is ‘SEGID_HANDOVER’. This means that the real type will be provided later and that the segment will be handed over to a new client.
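In rough, hedged strokes (assuming the handover_exec helper; the path and arguments are illustrative), the requesting client reacts to the returned segment like this:
/* in the event loop, after requesting a SEGID_HANDOVER segment */
case TARGET_COMMAND_NEWSEGMENT:{
pid_t pid = arcan_shmif_handover_exec(&C, ev,
"/path/to/something",
(char* const[]){"something", NULL}, NULL, 0);
}
break;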
This would launch “/path/to/something” so that when it calls arcan_shmif_open it will actually use the segment we received in order to connect. We can then use new_token in other events to manage some of it, e.g. reposition its windows, inject input and more. All of this retains the chain of trust: the server end knows who invited the new client in and can treat it accordingly.
This can be used to embed other clients into your own window, to build an external window manager and so on. In our ‘command lines without terminal emulation’ shell, Lash#Cat9, we use that to manage graphical clients while still being ‘text only’ ourselves. For other examples, see the article on Another Low Level Arcan Client: A Trayicon Handler.
Special Case: Recovery and Migration
A key feature of SHMIF is that it can redirect and reconnect clients manually. Through this we can even transition back and forth between local and networked operations. The section on ‘Safety Measures’ covered how it works in SHMIF internals. There is also an article on ‘Crash Resilient Wayland Compositing’ (2017) that demonstrates this.
When a client connects, the library enqueues a ‘REGISTER’ event that contains a generated UUID. This can be leveraged by the window manager to persist location on the desktop and so on. At any stage it can also send a ‘DEVICEHINT’ event back.
This event is used to provide an opaque handle for GPU access in operating systems which requires that (which can further be used to load balance between multiple GPUs), but it can also mention a ‘fallback connection point’. Should the server end die (or pretend that it has died), the library will try to connect to that connection point instead.
If it is successful, it will inject the ‘TARGET_COMMAND_RESET’ event that we covered earlier. We will use the following clip from “A12: Visions of the Fully Networked Desktop” as a starting point.
Migration of a client between devices through simulated recovery
In it, you see the Lash#Cat9 CLI shell inside the ‘Durden’ window manager with a video clip as an embedded handover allocation. It has previously used the local discovery feature of the network protocol to detect that the tablet in front of the screen (a Surface Go) is available as a sink, and has added it as an icon in the statusbar — unfortunately occluded by the monitor bezel in the clip.
When I drag the window and drop it on that icon, Arcan sends a DEVICEHINT with the connection primitive needed for the network embedded into the event. It then pulls the dead man’s switch, forcing the shmif library to go into recovery. Since the library remembers the connection from the DEVICEHINT, it reconnects and rebuilds itself there.
This feature is not only leveraged for network migration as shown, but also for compartmentalisation between multiple instances; for crash recovery; for driver upgrades and for upgrading the display server itself. All using the same code paths.
Special Case: Accelerated Graphics
Many ‘modern’ clients unfortunately have a hard dependency on a GPU, and the mechanisms for binding accelerated graphics between display server and client are anything but portable.
Comment: Khronos (of OpenGL and Vulkan fame) tried to define a solution of their own (OpenWF) that failed miserably. What happened instead is even worse; the compromise once made for embedded systems, 'EGL', got monkey patched with a few extensions that practically undo nearly all of its original design and purpose, and that is suddenly what most are stuck with.
There is a lot of bad blood and vitriol on the subject that we will omit here and just focus on the SHMIF provided interface. Recall the normal way of starting a SHMIF client:
#include <arcan_shmif.h>
int main(int argc, char **argv)
{
struct arg_arr *args;
struct arcan_shmif_cont C =
arcan_shmif_open(SEGID_APPLICATION,
SHMIF_ACQUIRE_FATALFAIL,
&args);
}
This does still apply. A client is always expected to start a normal connection first, and then try to bootstrap that into an accelerated one, which can fail. The reasoning is that if permissions or GPU driver problems stop us from providing an accelerated connection, the regular one can still be used to communicate that to the user, rather than have them dig through trace outputs for the answer.
To extend a context to being accelerated you can do something like this:
struct arcan_shmifext_setup cfg = {
.api = API_OPENGL,
.major = 4,
.minor = 2
/* other options go here */
};
int status = arcan_shmifext_setup(&C, &cfg);
if (status != SHMIFEXT_OK){
/* configuration couldn't be filled */
}
There are a number of options to provide in the config that require some background with OpenGL etc. to make any sense, so we skip those. If you know, you know, and if you don’t, enjoy the bliss. If the setup is OK, the passed ‘cfg’ is modified to return the negotiated values, which might be slightly different from what you requested.
Afterwards, you can then continue with arcan_shmifext_lookup() to extract the function pointers to the parts of the requested API that you need to use, bound to the driver end of the created context.
When writing platform backends for existing applications, they often provide their own way of doing all this, and we do need a way to work with that. If there already is a context living in your process and you want to manually export and forward a resource, that is also possible.
There are several more support functions with similar patterns for context management, importing already exported images and so on, but the ones shown above should be enough to get an idea of the process.
Special Case: Debugging and Accessibility
We have already shown how the client end can request new segments through SEGREQ events and how those are provided through NEWSEGMENT events coming back. Another nuance to this is that the server end can push a NEWSEGMENT without having to wait for a SEGREQ in advance.
This can be used to probe for support for things, such as custom client mouse cursors, or to signal a paste or drag and drop action (clipboard is just yet another segment being pushed), as the server end will know if the client mapped the segment or not.
There is nothing stopping us from actually mapping and populating the segment from within libarcan-shmif, and there are two cases where that is actually done. One is for SEGID_DEBUG and another for SEGID_ACCESSIBILITY.
If one of these is received, libarcan-shmif will (unless the feature is compiled out) internally spawn a new thread. In the debugging case it provides a text interface for attaching a debugger, exploring open files, and inspecting environment and memory from within the process itself. In the accessibility case it latches on to frame delivery (SHMIF_SIGVID) in order to overlay text descriptions that get clocked to the video frames being delivered and the dirty regions being updated.
Special Case: Text-only Windows
There are more reasons why a set of fonts and a desired font size are provided during the preroll state, and why there is a ‘text rows and columns’ field in the static region.
Among the hints that can be set for the video region, there is SHMIF_RHINT_TPACK. This changes the interpretation of the contents of the video buffer region to a packing format (TPACK), which is basically a few bytes of header followed by a number of rows, where each row has a header covering how it should be processed (shaping, right-to-left) along with a number of cells carrying formatting, colour and possibly font-local glyph indices (for ligature substitutions).
The format is complete enough to draw anything a terminal emulator would be able to throw at it, but also do things the terminal emulator can’t, such as annotation layers or borders that fall outside of the ‘grid’, saving precious space.
This approach lets the server end handle the complex task of rendering text. It also means that the costly glyph caches, GPU related acceleration primitives like distance fields and so on can all be shared between windows. It means that the server can apply the heuristic for ‘chasing the beam’ style minimal latency or tailor updates to the idiosyncrasy of eInk displays when appropriate, and that the actual colours used will fit with the overall visual theme of the desktop while letting the client focus on providing ‘just text’.
While it is possible to build these yourself, there is a higher level abstraction support library, ‘libarcan-tui’, for that purpose. The details of that, however, are a story for another time.
Our reference desktop environment, Durden, rarely gets covered here these days. This is mostly because the major features have long been in place, and that part of the project is biding its time with smaller fixes while waiting for improvements in the rest of the stack.
Recently the stars aligned and I had some time left over to work on the accessibility story, particularly the ‘no vision’ parts. Other forms (e.g. limited mobility, low vision) will be covered in future articles, but most things are already in place, including eye tracking and bespoke devices like stream decks.
Here is a short recording of the first run of a clean installation, setting things up.
This is enabled by default during first setup. The first question presented will be whether to disable it, but there is no hidden trapdoor combination or setting needed to enable it for the first time.
The following recording shows using the builtin menu system to start a terminal emulator and open a PDF and the mouse cursor to navigate, with OCR results as I go. These will be elaborated on further in this article.
There is a previous higher-level article which covered a rougher outline of what is intended, but this one is more specific about work that can be used today — albeit with some rough corners still.
One detail to take away from that article is how the architecture splits user data processing into specific one-purpose replaceable programs. These can be isolated and restricted to a much stronger degree than a more generic application.
These are referred to as frameservers. They have a role (archetype), and the ones of interest here are decode and encode. Decode translates from a computer native representation to a human presentable one, like image loading into a pixel soup or annotated text into sound via synthesised speech. Encode goes in the other, potentially lossy, direction, from human presentable back to computer native, such as pixel soup to text via OCR, image description or transcribing audio.
Another detail is that the “screen reader” here is not the traditionally detached component that tries to stitch narration together through multiple sidebands. Instead, it is an “always present first class mechanism” that the window manager should leverage. There is no disconnect between how we provide visual information from aural and they naturally blend with the extensive network transparency as part of our ‘many devices, one desktop’ design target.
Key Features
Here is a short list of the things currently in place:
Multiple simultaneous positional voice profiles for different types of information
On-Demand Client requested Accessibility windows
Force-injected client accessibility fallback with default OCR latched to frame updates
Buffered Text Input with lookup oracle
All desktop interaction controls as a file-system
Command-line shell aware
Keyboard controlled OCR
Content-aware mouse audio feedback
Special controls for text-only windows tracking changes
Indexer for extracting or generating alt-text descriptions
Screen Reading Basics
Let’s unpack some of what happens during setup.
The TTS tool in Durden uses one or many speech profiles that can be found in the durden/devmaps/tts folder. Each profile describes one voice, how its speech synthesis should work, which kinds of information it should convey and any input controls.
They allow for multiple voices carrying different information at different positions to form a soundscape, so you can use ‘clean’ voices for important notifications and ‘fast robotic’ for dense information, where actions like flushing the pending speech buffer don’t accidentally cancel out something important.
A compact form starts something like this:
model = "English (Great Britain)", gain = 1.0, gap = 10, pitch = 60, rate = 180, range = 60, channel = "l", name = "basic_eng", punct = 1, cappitch = 5000
These are just the basic voice synthesis parameters one would expect. Then it gets better.
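The exact keys for the next part are elided here; as a purely hypothetical illustration of the shape, in the same compact form, standing in for the subsystems described next:
-- hypothetical key names; the values are spoken prefix announcements
menu = "menu ", notify = "alert ", clipboard = "clip ", title = ""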
These tell what this voice gets to do. The action keys mark which subsystems it should jack into, with a custom prefix announcement as value. The profile above would present all menu navigation, system notifications, clipboard access and window title on selection.
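The bindings overlay could look something like this (again hypothetical key combinations and path names, in the style of the Durden menu paths used throughout this article):
bindings = {
["m1_F9"] = "/global/tools/tts/voices/basic_eng/flush",
["m1_F10"] = "/global/tools/tts/voices/basic_eng/slow_replay"
}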
The bindings are keyboard binding overlays that take priority when the voice is active. Just like any other binding, they map to paths in the virtual filesystem that Durden is structured around. The two examples shown either cancel all pending speech for the specific voice, or turn down the speech rate, repeat the last message and then return the rate to the voice default. There are, of course, many others to choose from, including things like key echo and so on.
Holding down any of the meta or accessibility bound buttons for a few seconds without activating a specific keybinding will play the current ones back for you to ease learning or refresh your memory.
By adding a position = {-10, 0, -10} attribute to the profile, the system switches to 3D positional audio and, in this example, positions the voice to your back left. This feature also introduced /target/audio/position=x,y,z which lets you move any audio from the selected window to a specific position around you, along with /target/audio/move=x,y,z,dt which slides the voice around over time.
With an event trigger, e.g. /target/triggers/select/add=/target/audio/position=0,0,0 and /target/triggers/deselect/add=/target/audio/position=10,0,-10, the soundscape around you also matches window management state itself.
The alt_text option reads any tagged UI elements with their text description. The xy beeps option specifies the frequency range that mouse cursor coordinates map to, so that pitch and gain change as you slide the cursor across the screen.
The waveform used to generate the tone will also change with the type of content the cursor is over, so that text windows such as terminal emulators get a distinct sound that distinguishes between empty and populated cells.
The following clip shows navigating over UI elements, a browser window and a terminal window. You can also hear how ‘select text to copy to clipboard’ doubles as a more reliable means of hearing text contents in uncooperative windows.
There are also more experimental parts to the cursor, such as using a GPU preprocessing stage to attenuate edge features and then convert a custom region beneath the cursor into sounds. While it takes some training to decipher, this is another form of seeing with sound and applies to any graphical content, including webcam feeds. After some hours I (barely) managed to play some graphical adventure games with it.
Kicking it up a notch
Time to bring out the spice weasel and get back to the OCR part I briefly mentioned earlier.
The astute reader of this blog will recall the post on Leveraging the “Display Server” to Improve Debugging. A main point of that is that the IPC system is designed such that the window manager can push a typed data container (window) and the client can map and populate it. If a client doesn’t, there is an implementation that comes along with the IPC library. That is not only true for the debug type but for the accessibility one as well.
This means that with the push of a button we can probe the accessibility support for a window, and if none exists, substitute our own. This is partly intended to provide support for AccessKit which will complete the solution with cooperative navigation of the data model of an application.
The current fallback spawns an encode session which latches frame delivery to the OCR engine. The client doesn’t get to continue rendering until the OCR pass has completed, so the contents are forced to stay in sync.
This is also where our terminal replacement comes in, particularly the TUI library used to provide something infinitely better than ncurses. The current implementation turns the accessibility segment into a window attachment with a larger font (for the ‘low vision’ case), the shell populates it with the most important content and the text-to-speech tool kicks in.
Such text-only windows also get added controls, /global/tools/tts/voices/basic_eng/text_window/(speak_new, at_cursor, cursor_row, synch_cursor, step_row=n) for one dedicated reading cursor, that you move around separately.
Our custom CLI shell, Cat9, probes accessibility at startup. If found, it will adjust its presentation, layout and input control to match so that there is instant feedback. There is still the normal view to explore with keyboard controls, but added aural cues. The following clip demonstrates this both visually and aurally:
clip of using basic cat9 shell controls, with readline, completion and content feedback through the accessibility window
All this is, of course, network transparent.
A final related nugget covers both accessibility and security. The path /target/input/text lets you prepare a text locally that will be sent as simulated discrete characters. These are spaced based on a typing model, meaning that for the many poor clients that stream over a network, someone with signal processing 101 doing side channel analysis for reconstructing plaintext from encrypted channel metadata will be none the wiser.
This is useful for other things as well. For an input prompt one can set an oracle which provides suggestion completions. This is provided by the decode frameserver through hunspell, though other ones are just around the corner for filling the role of Input Method Engines, Password Manager integration, more complex grammar suggestions and offline LLM.
The following clip shows how I first type something into this tool, then read it back from the content of the window itself.
The current caveat is that it still does not work with X11 and Wayland clients due to their embarrassingly poor input models. Some workarounds are on their way, but there are a lot of problems to work around, especially for non-latin languages.
Here we continue the series of posts on the development of a command-line shell which defies terminal emulation by using the display server API locally and a purpose built network protocol remotely.
For this round we have added an interactive spreadsheet representation and a Debug Adapter Protocol implementation. These come as two discrete sets of builtins (groups of commands), ‘dev’ and ‘spreadsheet’ with a surprise interaction or two.
The intent for the dev builtin is to eventually collect more and more useful tools for managing all aspects of software development, from source control management to building, testing and fuzzing.
Starting with the spreadsheet. By typing:
builtin spreadsheet new
You would get something like the following screenshot:
The following clip shows how I spawn new spreadsheet, populated by a CSV source and using the mouse cursor to interact with the layout.
This complements the form intended for media processing that was presented a few years ago as “Pipeworld” and re-uses the same language and parsing logic.
Cells can be populated with static contents, expressions like =max(A1:A4) or even shell commands through the ! prefix, e.g. !date +%Y:%M:%S. The following clip shows some basic expression use, as well as forced reprocessing of expressions and shell commands.
Combining shell commands with expressions that are re-executed on request
More useful is to populate the sheet with the output of some command, processed by Lua patterns. The following screenshot shows the result of running insert #0 4 separate "%s+:" !cat /proc/cpuinfo
Populating a spreadsheet at some insertion point using a shell command split and separated with Lua patterns
Exporting can be done using the existing copy builtin with control over output format, subrange and so on: copy #4(csv, compact, a1:b5) to export the resolved values of a1,b1,a2,b2 … as CSV.
There is still more to be done for this to be a complete replacement for the likes of sc-im, and to add things like plotting via gnuplot (if I can ever get their plot language to behave) and graphviz, as well as more experimental things like importing Makefiles – but at least for my, admittedly humble, spreadsheet uses it is good enough for daily driving.
Onwards to the Debugger
To start things you would do something like the following:
builtin dev
debug launch ./test
The ‘dev’ builtin set is used as it will eventually accumulate all developer task related commands like building, deployment, source control management and so on.
The following clip shows the default behaviour for that in action. In it you can see how multiple discrete jobs are created for managing threads, breakpoints and so on, detachable and with mouse cursor handling in place.
Basic debugger use and navigation
In the clip I also showed the process of stepping through threads, spawning source view, setting and toggling breakpoints, inspecting registers and modifying target variables.
I have spent thousands of hours staring at the GDB CLI prompt, and hated nearly every second of it. Not because of the relaxing task of debugging code or exploring software state itself, but for the absolutely atrocious terminal thrashing interface even with mitigation layers such as pwndbg. In fairness, a debugger TUI is in the deepest end of the complexity pool to get going.
There is a lot that goes into handling the protocol, and quite a few telltale signs of its designers, so we have just passed the point of basic bring-up. Importantly it is all composable with the data manipulation, filtering and transfer tools we already have elsewhere.
As an example of that, ‘contain’ from the previous article can bunch all the subwindows together into one contained job, useful when running multiple debug sessions side by side to step through multi-process issues.
We do have some other conveniences in place. Stepping controls are defined by the granularity of the job it represents, so stepping in the disassembly view would step instructions, while stepping in the source view would go by line or statement and so on.
Now to close the loop and mix in the spreadsheet part. In the following clip you see me picking a few registers and thread source location that gets added to a watch set. Whenever thread execution is stopped, these will be resampled and updated.
I then click the ‘spreadsheet’ option which will create a spreadsheet and populate it with the contents of the watchset as I go.
Live mapping watched dataset to spreadsheet
With all this in place we can almost start stitching the many other related projects together, from the data visualization from Senseye (closing in on 10 years…) with the window management from Pipeworld to the harnessing and setup from Leveraging the Display Server to Improve Debugging and build the panopticon of debugging from the plan presented in “Retooling and Securing Systemic Debugging” (2012, doi:10.1007/978-3-642-34210-3_10). But that is for another time.
Time to continue the journey towards better CLI shells without the constraints of terminal emulation. In the previous post we looked into the new commands list and stash. List was for providing an interactive and live updated take on ‘ls’, and stash for batching file operations.
This time we get contain for merging jobs together into datasets and each for batched command execution.
First a quick look at the latest UI convenience: view #job detach. This simply allows us to take any job and pop it out into its own window. The following clip shows how it works with ‘list’, initiated with a mouse drag on the job bar. List was chosen as it uses both keyboard and mouse navigation, as well as spawns new windows of its own.
Showing detaching a list job, navigating and watching a media resource and re-attaching it again.
Onwards to contain. In the following clip I create a new job container by typing contain new. I then spawn a few noisy jobs and tell contain to adopt them through contain add.
Creating a container, manually adding two jobs to it and then toggling between viewing their contents and an overview of their status.
By default contain will show an overview, coloured by their current run status. I can step through the job outputs either by clicking on their respective index in the job bar or type in contain show 1 (or any other valid index).
The container can also be set to automatically capture new jobs. In the following clip I spawn such a container and then run some commands. Those get added into the container automatically.
Using contain capture to spawn a new container that absorbs new jobs until cancelled.
Contain meshes with commands like repeat, applying the action for all contained jobs at once. It gets spicier when I choose to merge the output of multiple contained jobs, either by right-clicking their entry in the job bar, or manually by running contain #0 show 1 2. These are then treated as a single dataset for any other commands, e.g. copy #0(1-100), that operate on the data of a job.
Contain even applies to interactive jobs. In the following clip I contain a ‘list’ in a detached window and show that mouse navigation is still working.
contain catch on an interactive builtin (list) detached into a separate window working as expected
Moving on to each. Each is related to ‘for’ in Bash and similar shells, locally known as the syntax that I never recall when I need it and that rarely does precisely what I want. Since we accumulate previous command outputs in discrete and typed contexts, we can avoid the ‘for I in file1 file2 file3; do xyz $I; done’ form and reference the data to operate on through our job and slicing syntax.
Starting simple, running this:
each #0(1,3,5-7) !! cat $arg
Anything before !! will be treated as part of the each command, and anything after will be reprocessed and parsed with $arg substituted for the sliced data, with some special sauce such as $dir, which will check if it is referring to a file and substitute its path, or use the path of the referenced job.
While it might at the quickest of glances look similar to the ‘for’ setup, the actual processing is anything but. Recall that everything we do here is asynchronous. If I swapped out ‘cat $arg‘ for ‘v! cat $arg‘, each invocation would spawn a new vertically split window, attach a legacy terminal emulator to it, and run the cat command.
Each also supports processing arguments:
each (sequential) #0(1,3,5-7) !!open $arg
Would functionally make it into a playlist. In this clip you can see how the media in the stash opens, and each time I close the window it launches the next in line.
Using each on a stash of files to build a sequential playlist, running the next command when the previous finishes
Since we are not fighting for a single stdin/stdout pipeline, we have more interesting options:
each (merge) #0 !!cat $arg
This joins forces with the contain command by spawning a new container and attaching the new jobs automatically.
The contained set of jobs also interact well with other commands, like trigger or repeat. In the following clip I repeat the previous form of running each on a stash of files. I then run the merge / cat command listed above and you can see how the commands keep progressing in parallel. Running repeat on the container would repeat the commands that had finished executing, merging output with the previous run, while letting the ongoing ones continue until completion.
Each with (merge) option on a stash of files, using sh ‘cat’ to read the contents of the files, repeating the completed subset.
The container here would also respect commands like trigger. Having a stash of makefiles, running them through a contained each like this:
each (merge) #stash !! make -f $arg -C $dir
trigger #0 ok alert "completed successfully"
Would treat each file in the stash as a makefile, dispatch make, merge the jobs into a container, associate a trigger with all merged jobs completing successfully and fire a desktop notification.
That is enough for this time, next time around we will (likely) see what we can do to assist developer tooling such as the venerable gdb.
There have been numerous quality-of-life improvements added to our terminal emulation liberated command-line environment, Lash, and to the curses-like “libarcan-tui” it relies on, but particularly to its reference shell implementation, Cat9. These have mainly been subtle enough that there is little point in making the effort of longer dedicated write-ups on their various implications and overall splendour.
In general my criteria for any non-release write-up on this site is that the contents should, at the very least, highlight and counter some of the many deficiencies in the popular solutions, as well as fit into a grander story.
For the chapter on text-dominant shells there are quite a few candidates in the pipeline. Rather than building them up for an information overload like I did with the original release post, I decided to space them out with a focus on one or two of interest.
For this round I have selected ‘list‘ and ‘stash‘, as they interact and complement each other in a nice way, as well as showcase better integration with the outer desktop.
Before that, let’s highlight a few conveniences first. In the following screenshot you see how the suggestion popup gained some width after I pressed F1, and how the borders of the popup itself defy the grid, as it is not using line drawing characters. Many commands now support descriptions of what they or any subcommands do.
screenshot of command completion with extended hints
They also give stronger feedback about validation errors and the affected offset.
screenshot of stash map command failing validation, region for failure and reason
As well as hint about what the current argument is expecting:
completion showing that “stash map” is expecting a “source item” argument.
List supplants the well-known ‘ls’ (dating back to the very first AT&T UNIX). We can still fork and run ‘ls’ just like any old shell, with the generic improvements that we can reference its contents and context long after it is gone, repeat its execution later in-place and detect changes, slice out selected results and so on. But just as it is bland and boring, it is also problematic.
The file in the following screenshot does not exist, or at least isn’t called that, and you can’t refer to it using the output of ls alone. If isatty() applies to stdout it behaves one way, and you might get escaping that may or may not work for referencing it interactively depending on your libreadline implementation. It behaves another way when in a pipeline. All parsing of the output is ambiguous; we have known that for ages, ‘don’t parse ls’ and all that. The data might be outdated the second it is produced and you wouldn’t know. Tab completion output presents it in yet another way.
screenshot of ls and bash struggling with a file, since everything apparently is a file, all is struggle.
Enter list. As shown in the following screenshot, as soon as I run the command some subtle things happen. One is that the number of ‘active’ jobs hinted at in the prompt does not go back to zero.
screenshot of Cat9 having run ‘ls’ versus our own ‘list’
That is because list is still very much alive. If I create a new file in the folder, the list automatically updates and marks that an item is new. If I click on folders it navigates to them, and if I click on regular files some default open action will be triggered. I can change sorting order without re-execution. For the setup I have here, the default open action tells the outer window manager to swallow up until terminated, and hand over to a dedicated player or viewer and this composes over the network if run that way. The same naturally works if I hand input focus to the list job and navigate using the keyboard. The following clip demonstrates all that:
video clip demonstration interacting with the output of list using both keyboard and mouse
There are quite a few sneaky details to this. If I would, for instance, change my working directory and try to reference contents in the list, what would happen? In the following screenshot you can see the result of copy #0(2,4,5) and how the list resolves to its full path and the presentation metadata is stripped away. The crazy file is now gone after composing with /bin/rm by running rm #0(8) — no fragile escaping necessary.
screenshot of running copy #0(2,4,5) with list correctly separating content from presentation
This brings us to stash. Internally, the mouse-2 binding for list was set to stash add #csel($crow), which means: add to the stash the currently selected row in the currently selected job. Stash is a singleton active job (only one running). It acts as a dynamic list of sorts, but there is, of course, more to it. Stash accumulates files on demand from other sources, such as list, drag and drop actions and so on. The following clip shows that in action.
right clicking items in list outputs automatically adds to stash, then using the stash to remove picked files
It also acts as a generic staging area. A simple danger is that you “ls | grep pattern” and, based on that information, run a destructive command like “rm another_pattern”. The caveat is that infinitely many things might have happened between your ls and rm, and innocent files are lost in the process. With stash I explicitly build my change set, then stash unlink yes to commit to applying the operation to only this set.
As with list and others, we have deeper mouse and keyboard interaction, as well as visual layout separated from the actual content, because we are not a dot matrix printer. Resizing will prioritise content based on the space allotted. This screenshot shows a compact window having run ‘cat #stash’ to show composition with traditional cli tools.
screenshot show composition of stash contents with /bin/cat via ‘cat #stash’ appearing as job #2 at the top
Perhaps you can mimic some of this (not really, I am just being characteristically smug) via some of the recent trendy perversions of putting more logic into a terminal emulator clinging on to that riced PDP-11 experience as the default.
Another stash action is verify. This queues a checksum calculation for every file referenced in the stash. When that command is repeated, any entry where the checksum has changed gets highlighted. The following screenshot shows a stash of files where one was still being modified when the stash was re-verified.
This is useful to ensure that you are not operating on a stash that is currently being modified or, as a preset tripwire protocol, to mark the stash as part of the serialisable cat9 state set and move between machines to check that they match. It also goes well with the next item.
You should have noticed the arrows and that the name appears twice. This is for the last feature to demonstrate. If I run stash map <src> <new-dst> the arrow changes. This lets me give a name to the nameless, e.g. file descriptors being passed to the window. If I combine that with stash archive, an archive (arc, tar, …) will be built using the mapping presented. The following screenshot shows such an archive built and extracted.
the result of building a stash, remapping (#1) building that as an archive (#2) and extracting it (#3)
In the following clip I use the file-picker in my outer desktop (durden) to drag and drop two files into the shell window. As you can see these are automatically added to the stash. Note that they are not shown by their path, because I am actually running this remotely over our network protocol.
drag and drop into remote shell, resolving into local files and archiving
I then use resolve to ensure that the items (marked !) are resolved to locally readable files, as tar is not capable of accepting and working from sets of passed file descriptors (so it eventually needs replacing with something better). Lastly I archive them together.