Containers: A quick tour

I recently moved to elementary OS Juno; besides the usual bugs of an unstable release, I was expecting to be able to do most of what I was doing before. I recently published my first application in elementary’s AppCenter, so the first thing I wanted to do in my clean system was to install it and check that everything was working fine. But to my surprise, there were no elementary apps in AppCenter! What I hadn’t realized was that Houston (AppCenter’s backend) also needs some preparation for Juno, which means rebuilding every application for the new environment based on Ubuntu Bionic. This new repository is not yet ready, as is to be expected in an unstable release. While I was thinking about how to test my application, I saw Spotify running inside a Docker container. This got me thinking I should learn about containers, and this seemed like a good opportunity. I wanted to see if I could use them as a test environment for older Linux distributions, especially elementary OS Loki.

This blog post was originally going to be a GitHub repository with the scripts necessary to create a system image and install it. Sadly, the more I learned about containers, the more I realized it’s not really about the commands you use, but about the capabilities of each type of container and the relationships between the guest system, the host system and the application we are trying to run. In the end, I decided the actual scripts were not that interesting. Instead, an overview of the kinds of containers I tried seemed more useful, both for my future self and maybe for other people trying to understand how containers work.

This means I won’t focus on a short description of the commands required to have a container up and running (there are already several short tutorials for this); instead I will try to explain my journey through different kinds of containers and how they relate to each other. That being said, if you want to try this out, you need the image of a system. This means having a folder with all the files from a typical installation except special ones (more on these later). Before starting, I should warn you that tinkering with these things may confuse your operating system in unexpected ways, so I don’t recommend running this on your main Linux installation. I tested with two installations, one of elementary OS Loki and another one of elementary OS Juno. I created the images by making a copy of the host, running the following in a clean Linux installation:

$ sudo rsync -avP --exclude=/proc/* \
 --exclude=/sys/* \
 --exclude=/dev/* \
 --exclude=/run/* \
 --exclude=/tmp/* \
 --exclude=/media/* \
 --exclude=/home/*/* \
 / ~/my_container

$ cp -r /etc/skel/. ~/my_container/home/$(whoami)/

This command copies almost everything from the host, including network configuration, users, groups, passwords and installed packages, and then creates a default home directory. Another way of getting such an image is using debootstrap, but then you get a minimal system that still has to be configured. The main topic here is containers, so I instead chose to create an image that should cause the least amount of problems. I know it sounds pointless to use the host system as the guest system, but believe me, we will have enough fun as it is.
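
If you do want to go the debootstrap route instead, a minimal sketch would look something like this (xenial being the Ubuntu release Loki is based on; adjust the release and mirror to taste):

$ sudo apt-get install debootstrap
# Bootstrap a minimal Ubuntu base system into the same directory
$ sudo debootstrap xenial ~/my_container http://archive.ubuntu.com/ubuntu/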

My objective was to create a container from which I could run a graphical application with internet connectivity and sound: firefox. This would show that, inside a container, I can use most of what a desktop application needs from its operating system.

Now, let’s talk about containers. My initial understanding of containers was “something like a virtual machine but faster, smaller and easier to create”. From my experience with virtual machines, communication with hardware from the guest system has always been a pain to set up. But desktop applications often try to communicate with the hardware, and my first impression of systems like Docker was that they try to favor security by creating overly isolated containers. When reading about it, chroot seemed to be a simpler alternative, closer to what I was looking for.

Chroot

While looking for simpler, less isolated containers, I remembered the installation procedure for Arch Linux. After booting a minimal system from a USB drive, one had to create another minimal root filesystem where the final installation would live, then use chroot to change the current root to this one we just created. Next, the rest of the packages of the desired system were installed using pacman, Arch’s package manager. Because we had changed the root, these packages would get installed in our minimal system. Finally, the installation configured the bootloader to point to this new root, and after a reboot this new system would be used. The cool thing to note about chroot is the simplicity of what it does: change the root directory, nothing more.

The environment we change into is called a chroot jail (ever heard of jailbreaking? Guess what kind of jail you are breaking out of). Technically, chroot is a system call and has been part of Unix kernels pretty much since the beginning. Because in Unix everything is a file, we can get an idea of how the container will behave. Everything being a file means that the communication between the kernel and user space happens through the file system, using special files that represent hardware devices or kernel interfaces. By making these files available inside the new root, applications will be able to communicate with the kernel even after changing the root directory.

On a simple system we want to keep /proc, /sys and /dev. In broad terms these directories contain, respectively: the running processes, interfaces into the kernel, and files representing hardware. The content of these directories is created by the kernel and can’t be copied like a normal file; instead, what we have to do is bind mount them inside our new root filesystem. Doing this makes directories in our new root point to the original files in the real root.

$ sudo mount --bind /proc ~/my_container/proc
$ sudo mount --bind /sys ~/my_container/sys
$ sudo mount --rbind /dev ~/my_container/dev

You can check everything went well by running mount | grep my_container. You will notice we actually mounted more than 3 filesystems. This happens because the option --rbind creates a recursive bind, binding not only /dev but also any other mount that was inside /dev. To unmount them after we are done, the simplest thing is to reboot. You can try using sudo umount with the -R option for recursive unmounts, but this usually fails for me. After these filesystems are mounted, we can change into our new root by using:

$ sudo chroot ~/my_container

Inside the chroot jail we can now run commands to see how the environment was set up. For example, calling id shows we are logged in as root, and checking env shows we have a different environment from the one we had before. These are symptoms of the first drawback of using chroot: it keeps the user and environment that were active when it was called. That sounds like what we want, until we realize that our user is not the one calling chroot: we are using sudo to call it as root, and sudo creates a special safe environment. To log in as another user we can use chroot’s --userspec option, and to keep our environment we can use sudo’s -E, but we have to explicitly keep the $PATH variable from being replaced. In the end we have to get out of the session using exit and then get back in with:

$ sudo -E "PATH=$PATH" chroot --userspec=$(whoami) ~/my_container
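
Once inside, a quick sanity check (this is just what I do, nothing mandatory) is to confirm the session now looks like our own:

# These should now report your own user and the $PATH you had on the host
$ id
$ echo $PATH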

We can run things inside our container, for example we can install packages as we normally do in a terminal.

$ sudo apt-get install cowsay

Why do we care so much about having the same environment as the one in the host system? Isn’t the point of creating a test environment for desktop applications to be able to have a different system? The problem is that the host operating system is not just a set of files, but also the state of each process that runs in the background as a daemon, like the init system, the dbus server, the X11 server, the display manager and so on. Right now we are carrying over all these processes from the host system into the guest system by binding /proc. The problem with these daemons is that there can only be one instance of each on a computer: they can have multiple clients, but we can’t have multiple servers.

On top of that, daemons also talk to the kernel and to other processes via sockets or some other IPC mechanism. This means the three directories we mounted are not enough to provide all the services a process inside the chroot may need; for instance, a more complex graphical application like AppCenter (which by the way has its own daemon) will most likely fail to run.

I will go into more detail about specific services that are relevant for desktop applications and how to make them available inside the container. But before that, let’s introduce schroot to ease the process of setting up the container.

Schroot

Schroot is a command that lets us modify the container quickly, so we can experiment with different setups and not have to mount and unmount everything every time.

First install schroot by using sudo apt-get install schroot. Next we need to tell schroot which filesystems to mount, for this it uses the same syntax as /etc/fstab. To set up the same container as before, create /etc/schroot/my_container/fstab as root, with the following content:

# file system   mount point    type     options        dump    pass
/proc           /proc          none     rw,bind        0       0
/sys            /sys           none     rw,bind        0       0
/dev            /dev           none     rw,rbind       0       0

For the rest of the container’s configuration, schroot will read the contents of /etc/schroot/chroot.d/my_container.conf. Its content is shown below, but you need to replace every instance of {username} with your username on the host machine.

[my_container]
description=simple container
type=directory
directory=/home/{username}/my_container
profile=desktop
groups={username}
root-users={username}
setup.fstab=my_container/fstab
preserve-environment=true

Even though we will be able to iterate more quickly over our container’s configuration, we have to be aware that we are trading off some of the simplicity of chroot. Schroot will do more stuff, and we need to be aware of what that is. For instance, we no longer need to be root to enter the container: because my_container.conf is edited as root, listing a user in the root-users property grants that user permission to launch the container. Other things schroot does that we have to take into account are:

  • Read the file copyfiles and copy each file listed there from the host into the guest system. By default it copies /etc/resolv.conf, that is, the DNS servers your host found when connecting to your network.
  • Read the file nssdatabases and copy each file from your host’s /etc directory into the container’s. By default it will copy usernames, groups, passwords and network configuration.

The default copyfiles and nssdatabases files are located in /etc/schroot/default. In there you can also find a default fstab that will not be used, because we configured schroot to use our version instead. This was done using the property setup.fstab in the configuration file. If you want to change the behavior of the other files you can use setup.nssdatabases and setup.copyfiles, as sketched below.
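
For example, to point schroot at your own (initially empty) copyfiles, something like this should work (the path mirrors the fstab setup above):

# Create an empty copyfiles so nothing gets copied into the guest
$ sudo touch /etc/schroot/my_container/copyfiles

# Then add this line to /etc/schroot/chroot.d/my_container.conf
setup.copyfiles=my_container/copyfiles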

Actually, on systems using systemd, this my_container.conf may fail because the file copied by the default copyfiles (/etc/resolv.conf) is a symlink, so go ahead and create a new empty copyfiles inside /etc/schroot/my_container and modify my_container.conf so it does not copy anything, as sketched above. All that being said, we are now ready to launch our container with:

$ schroot -c my_container

Back to our discussion about specific services provided by the host. We can now discuss what else is needed to get a graphical application running inside our container. Because I’m using elementary OS I will test first with a simple GUI application, the calculator. When I try to run io.elementary.calculator in the container, even though the application runs fine, I get a bunch of errors:

[dconf] unable to create directory '/run/user/1000/dconf': Permission denied. dconf will not work properly.

This error is related to dconf, the runtime configuration system used by GNOME. It stores settings like themes, fonts or the state of your app when it was closed, and tells interested apps when something changed. It seems the application is trying to communicate through the /run filesystem, and we didn’t mount that. To fix this let’s add the following line to our fstab file:

...
/run            /run            none    rw,rbind         0       0

Now calling io.elementary.calculator works without throwing weird errors! The problem with doing this is that a lot of daemons also communicate through this filesystem, especially the directory /run/user/1000, which is owned by your user and can be written to by any application. Mounting /run helps couple some daemons we want, like dconf, but also couples some we don’t want. To try this out, run poweroff inside the container and watch how your host machine gets shut down (you don’t even need to be root for this!). After you reboot, if you call mount on the host you will see that schroot kept everything mounted. It’s better to end that session that we never closed: use --list to find the name of the session, and then close it.

$ schroot --list
  > chroot:my_container     session:{session_name}
$ schroot -c session:{session_name} --end-session

The troubling part is that mounting /run also couples other daemons that use dbus, which include notifications, Gala and several others. It’s now easy to see that testing applications gets problematic, because we will be mixing daemons from the host system with clients from the guest system. But having the correct version of daemons and clients is not enough; things also depend on what we did earlier to get a session very similar to the one on the host, by copying usernames, network configuration and environment variables. But hey! firefox runs inside this container. Still, something like AppCenter may cause problems and reset Wingpanel on your host machine.

This shows that this lack of isolation is not that good for testing desktop applications. What we would really like is to launch as much as we can inside the container and make only some daemons from the host (also called services) available inside it. The init system (systemd in the case of elementary OS and almost every modern Linux distribution) is the one in charge of running all these services. After looking for ways to ask systemd to create a similar session inside our container I found systemd-nspawn. This container system has been specially developed for creating test environments, being closer to chroot than other container solutions, with fewer configuration options but still powerful enough to really isolate the guest system safely.

Nspawn

Nspawn is really simple to use, but it keeps increasing the complexity of what it does in comparison to chroot or schroot. It will create a clean chroot environment and will call an init system with a real PID 1, using Linux namespaces. This process in turn will try to run all the services configured using systemd in the guest system.

The main difference is that we will try not to explicitly bind any filesystems from the host into the guest and instead let nspawn do its magic: we will launch our container and cross our fingers hoping that nspawn mounts what we need, and then let the init scripts from the guest system do what we want and avoid doing what we don’t want. If you have ever tried to debug the startup procedure of your machine you will know that the number of things that can fail is rather depressing. But let’s try it out; we run nspawn with:

$ sudo systemd-nspawn -bD ~/my_container

As you can see, this command has to be run as root, just like chroot. The -b option tells it to launch the init system instead of just calling a shell, and the -D option tells it where our image is. To get out of the container, use poweroff; this time it will not shut down your host but instead execute the shutdown sequence of the processes started by systemd. The option --bind can be used to manually tell nspawn to bind something inside the container.
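
For example, a rough sketch of manually binding a host directory into the container (read-only in this case, and Downloads is just an arbitrary example) would be:

$ sudo systemd-nspawn -bD ~/my_container --bind-ro=/home/$(whoami)/Downloads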

As I said in the beginning, one of the key objectives was to correctly configure the network inside the container, which means having internet access and DNS resolution. These can be checked using ping 8.8.8.8 and nslookup www.google.com. When testing these, I had different problems depending on the system I was trying things on. I will explain what I did, but you may get different results if you are running different systems. The main thing you should know is that in Linux the file /etc/resolv.conf is supposed to contain a list of DNS servers; no servers means no DNS resolution.

When I tried a Loki system inside a Loki system DNS resolution was not working. For this, following Arch’s guide on setting up systemd-resolved inside the container was enough:

$ sudo systemctl enable --now systemd-networkd systemd-resolved
$ sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf

This works because Loki uses resolvconf to generate the file /etc/resolv.conf, and when it’s inside the container it does not find any server and generates an empty file. Instead, systemd-resolved generates a fallback file that includes Google’s DNS servers 8.8.8.8 and 8.8.4.4.
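
Either way, a quick check of what the guest actually ended up with can be done from inside the container:

# The servers listed here are the ones the guest will use
$ cat /etc/resolv.conf
# And this shows whether systemd-resolved is the one managing them
$ systemctl status systemd-resolved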

When I tried a Juno system inside Juno bad things happened: running the container made DNS resolution not work in the host system, let alone in the guest system. After hours messing with this, I found that the correct solution was to (ironically) disable systemd-resolved inside the container.

$ sudo systemctl disable systemd-resolved

This works because Juno uses systemd-resolved for DNS resolution, which provides a stub DNS server on 127.0.0.53 that gets set as the only DNS server in /etc/resolv.conf. The same server gets set in the guest system because nspawn mounts /etc/resolv.conf whenever it detects systemd-resolved running on the host. What I think happens then is that two servers are launched to serve the stub on 127.0.0.53, one in the host and another one in the guest, which ends up breaking DNS resolution everywhere. Disabling systemd-resolved inside the container leaves only one server, and the mounted /etc/resolv.conf with 127.0.0.53 as the only DNS server works correctly. So, are we done now? Let’s try running firefox as before:

$ firefox
 > Error: no DISPLAY environment variable specified

Well, things are never as easy as one expects. This happens because the environment inside the container is the result of all the scripts that were run by systemd, and it so happens that the very important DISPLAY variable, used by all graphical applications that use X11, was not set. In order to establish a connection with X11 we must pass the display number as follows:

$ DISPLAY=:0 firefox

Success! firefox now runs inside the container (at least in mine). If this does not work for you there may be several reasons. Some systems don’t use :0 as their display; in that case you should check on the host system what the output of echo $DISPLAY is. If that still does not work, I’ve seen people suggest binding the file ~/.Xauthority, as not doing this caused a No protocol specified error. Other people recommend using --bind-ro=/tmp/.X11-unix so that nspawn mounts X11’s socket, which should be read only, otherwise the systemd in the guest will delete it in the host system. I didn’t need any of these, but I’ve read it may depend on which display manager is used by the host. Maybe some of this stuff got moved into dbus? I don’t know.
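
Put together, the variant people suggest would look roughly like this (I didn’t need it myself, so treat it as an untested sketch):

$ sudo systemd-nspawn -bD ~/my_container \
       --bind-ro=/home/$(whoami)/.Xauthority \
       --bind-ro=/tmp/.X11-unix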

Sadly the success will not last too long, as you may notice that firefox inside the container does not have sound. This happens because there is no audio server (PulseAudio in elementary OS) inside the container, which is systemd working for us so we don’t get multiple servers and cause chaos like with the multiple DNS servers from before. So, how do we communicate with this server from inside the container? The socket used to talk to it is located in /run/user/1000/pulse: yes, that dreadful filesystem we didn’t want to mount, and the whole reason we started using nspawn. If you run mount inside the container you will notice there is already a /run/user/1000 filesystem being created, but it does not contain the pulse directory. Because we don’t want to mess with nspawn’s mount, we’d better mount the pulse directory somewhere else and then tell firefox where to find it.

$ sudo systemd-nspawn -bD ~/my_container \
       --bind=/run/user/1000/pulse:/run/user/pulse
... [boot sequence] ...

(my_container)$ DISPLAY=:0 PULSE_SERVER=/run/user/pulse/native firefox

Final remarks

At last we got firefox running inside the container! (at least I did, did you?). Maybe all the problems I had can give you some insight into the complexity behind containers. The main problem is the coupling between the host and the guest systems. In some cases we want them to interact, as with networking, audio and video, but in other cases we don’t, as is the case with daemons like the display manager, the init system, the DNS stub server or a server storing global configurations. Because modern operating systems are composed of so many different parts, and services can interact in many different ways, running something inside a container will always depend on the application we are trying to run, the guest system and the host system. Regarding the application, we have to know which services it needs and which services it can modify. For the host and the guest systems we should know which services they provide and how they interact with each other. This becomes very difficult for full systems that run hundreds of services.

There is really no silver bullet that will allow any application to run flawlessly in any guest system inside any host; that yields an unreasonable number of variables that must be taken into account. From what I can see, popular containers try to limit these variables to some extent. Some focus on a certain kind of application, like pbuilder (build systems) or Docker (web services), and some focus on a specific kind of guest system, like nspawn, which is specific to Linux. In the end, we will never get rid of having a mix of daemons from the host system with other daemons or clients in the (probably different) guest system. Any ABI breakage happening in any of these daemons will cause problems when running things inside the container.

There are still a lot of container types I haven’t talked about. Even though I have tried Docker, I think it will be very similar to nspawn with respect to my use case. Where Docker shines is in deploying and releasing services that run on servers, which is something I don’t need in my case. Nevertheless, I will try to get an image of Loki running inside Juno using Docker and nspawn; results will come later in another blog post. Aside from these there are still other options like Snaps or LXC, which I have yet to try out.

For the time being I will keep testing on my spare computer. I haven’t found a reliable container solution that will guarantee I won’t spend a lot of time debugging the container itself. Using a spare computer gives me confidence that all daemons and clients being tested are the correct version, that there is nothing weird happening to any service during the startup procedure, and that I can even debug the daemons themselves (which has been necessary several times). Containers are a very good option as long as you require few services from the host; so far I have yet to find something that works as well as I want for testing desktop applications.


Update #3: Loki is coming

It’s been quite some time since I wrote here, but we haven’t stopped working at all; in fact, I now have a lot of new ideas on how to improve input methods on elementary OS. The good news is Loki will be here soon and we have some cool new features. The probably bad, but really not so bad, news is that since the Loki beta is already out, Loki stable will be released any time now, so adding big new features is not really an option. This will not let us add a lot of the most interesting ideas; instead we should start planning to get them into Loki+1.

After trying to make Gala load .xkb files and discovering that was an unsuccessful approach (I still want to do this, so if anyone knows how please contact me), I decided to get as much working on Loki as I could. The first thing was the Compose key: it had been broken for a while on Loki because this functionality was removed from gnome-settings-daemon and moved to GNOME Shell, so we needed to move it to elementary’s window manager, Gala.

After fixing this, I wanted to get modifier-only shortcuts to switch keyboard layouts working again (the problem is described in bug #1357895). The issue is that the normal API to handle keybindings is not really designed to handle shortcuts of this type, even though they are the most widely used in almost all operating systems. In the bug report Maxim Taranov suggested a workaround in comment #52, so I started to make an update to the keyboard settings panel that used it. While I was doing so, I was surprised to see a merge proposal by Kirill Antonik for Gala that would allow us to get this working in a less hacky way, and supporting many more keybindings, so I immediately updated my work to use this instead. By this time I thought we just needed to wait for this branch to get merged into Gala, so we could update the keyboard plug with mine. Gala’s branch got merged, but we decided the keyboard plug still needed some work, so that landed a bit late and some people trying the beta experienced a minor breakage. Luckily this is all fine now, as the keyboard plug has been merged too.

So, what’s new in the keyboard plug? Basically, we finally got rid of the Options tab and replaced it with options that appear and disappear only for the layouts that actually need them (I bet most people won’t even know which these are without reading the code, but that’s exactly the point), while still leaving what we think are the most useful options available to users. Maxim and I spent a lot of time discussing which options to provide, and I think we managed to get a very nice compromise between user friendliness and flexibility. Some of this required careful parsing of the xkb_config database, but I really like the end result.

So that’s all for Loki, but I am much more excited by what will come next. It seems this series of blog posts got some attention, and since then I’ve been able to learn about other languages. For instance, I’ve spent some time talking to Jung-Kyu about Korean support, and that’s what I want to work on next; I have some new ideas about adding Korean-specific options to the keyboard plug. We have also contacted a lot of translators to learn what their experience has been typing in their language on elementary, and we’ve received feedback about CJKV and several other languages. I would really like to speak directly to people with problematic layouts so I can better understand the problem and figure out how to solve it (this is mainly the reason why Korean will be the first; Jung-Kyu is providing a lot of feedback as we go along), so if you have some ideas please don’t hesitate to contact me.

Update #2: The need for Wayland

We left off previously with me about to write code to load xkbcommon keymaps into the X server. That soon proved to be not a very good idea, because the keymap type in xkbcommon is an opaque structure; this means the authors want to be able to change its organization at will without breaking someone else’s code, so the only way to manipulate it is through their API. We could technically bypass this, but I think the added maintainer burden would be too much, and it would make it difficult to update to newer versions of the library. Another way to accomplish this is as a patch to the libxkbcommon tree, which sounds fine, but as I said before I don’t want to bother other people because this is sort of an experiment; it would also add code there that will be useless once we move to Wayland, and would make people less willing to move away from X.

After putting this aside, it occurred to me that I could use the .vapi file I already had for libxkbcommon to manually translate keypresses into their Unicode characters and then send them to the applications; I just needed to know where to “tap” the event stream going from a keypress up to when it is delivered to an application. This seemed like a reasonable idea because I would have to do it anyway to implement ibus (or at least it seemed like the obvious way to do so). It turns out this stream can’t be tapped because it doesn’t even go through Gala, and that is also not how ibus is implemented (I guess precisely due to the impossibility of intercepting keypress events). So how does this actually work? I’ll try to explain briefly next.

The first warning sign that told me this wasn’t how things worked was how ibus is implemented in GNOME Shell. There is a comment in their code that explains this: events reach the X server, then Mutter intercepts all of them and lets Clutter handle them through the clutter_x11_handle_event() function, which will send them to the corresponding application (Clutter Actor). The application then receives an event that hasn’t been translated yet, notices an input method is enabled and sends it to the D-Bus daemon, which will push the resulting translation into a GDK event stream that the shell is listening to (through Mutter). Mutter then assumes every event arriving there comes from ibus, so it sends it back to Clutter so it reaches the corresponding Clutter Actor again.

At this point I decided to verify all this by myself, because if we are filtering all X events through Mutter and sending them to the corresponding Clutter Actor, why can’t we translate them in Mutter and then send them just once, without this strange round tripping to the application? I found 3 points at which I could intercept this event stream to see what was happening.

I knew there was a way of intercepting Clutter events, but nothing guaranteed that the function clutter_x11_handle_event() handled X events by translating them into Clutter events. After some reading of Clutter’s code I verified this was actually true: X events are translated and pushed into Clutter’s event queue. So I thought adding a Clutter event filter would show all keypresses before they were delivered to the application; if this were true I would just need to translate the keycode as I wanted and let the event reach its normal destination. After doing this I found out no CLUTTER_KEYPRESS events were going through (except for the tab and alt keys while alt-tabbing). What was going on here? At some point we were losing events, but they were reaching the application because everything worked normally, so was there another event queue somewhere handling this?

Then I found another filter on ClutterX11; this one received an X event and a Clutter event. My guess was that this function was called when translating an X event into a Clutter event, so if my function filled the Clutter event appropriately and returned TRANSLATED, I would be doing the translation and the Clutter event would continue on instead of the X event. So I tried listening for X keypress events. Sadly enough, there were still no keypress events to be seen; I could only see some events of type GenericEvent, which seemed like another dead end. After some research I found out that listening for X key events was very naive: nowadays, to support input devices like Wacom tablets, XInput2 is used, which uses GenericEvents to send bigger events than what X supports. So I needed to decode these GenericEvents (there seemed to be a lot of them, just what I would expect if every keypress and keyrelease was being sent) to know if there were XI_KeyPress events going through there. Doing this wasn’t a trivial task, because XInput2 does not have much documentation and there are no Vala bindings for it. After creating a .vapi file for XI2.h I still had some trouble because I needed to cast a struct pointer to another struct pointer, which is not documented in the manual vapi file tutorial (it turns out to be a simple pointer cast in Vala). After all this I could finally check if my keypresses were going through here, and guess what… they weren’t; only the tab and alt keys I had seen before were going through.

The third point at which I could intercept events was overriding the method Meta.Plugin.xevent_filter(). I could see exactly where it was being called from Mutter, and I am pretty confident all X events that reach Mutter go through it before anything else happens; in fact, handling the event here would make it never go through Clutter at all. Doing things here had a problem though: I would have to find the correct Clutter Actor that should get the event by myself. Nevertheless, if keypresses were going through here it would be something I could work with. So I copied my code from the ClutterX11 filter here, and if things weren’t disappointing enough already… well, no interesting keypress events were to be seen here either. So again, what’s going on here?

My conclusion from all this is that key press events never reach Mutter, and consequently never reach Gala either. This means the comment in GNOME Shell’s code is quite misleading, or maybe I misunderstood it from the beginning. Events reach the application (X client) directly; then, if an input method is enabled, they are sent to the window manager to be displayed in the input method window (the bubble). Keycode translation does not happen in the window manager but in the application itself, mostly hidden inside Gtk so that application developers don’t have to do this explicitly.

It seems to me that i18n is a hard problem that no one can claim to have solved completely without knowing every language a Unicode string may represent. This makes people unwilling to commit to an API that will be frozen forever. Instead, what I feel is that projects just pass the ball around to other projects, with no one trying to actually solve it directly. Wayland right now only has basic keycode translation with libxkbcommon, just like X did with xkb, and leaves everything else to the applications; but then Gtk tries to make it easy to create internationalized applications by hiding all this behind some API. Let’s hope wayland-im will improve things.

Where does this leave us regarding Gala? Well, I’ve come to the conclusion that what I wanted to do (load arbitrary xkb files) can’t be done in a non-hacky way, because currently, as an X window manager, we don’t have full control of the events going through. This will only be true once we move to Wayland (and even then we may have to deal with the remaining abstractions, Mutter and Clutter). Still, I think users can’t wait for Wayland right now, so I’ve decided to move away from my original idea and fix this temporarily for the next elementary release just the way GNOME does it. I will then start looking into moving Gala to Wayland, because I think this is what we really need to take the X server out of the middle and let us handle events by ourselves. I also think we need a better infrastructure that allows developers to test things quickly before actually committing and releasing to everyone else, without having to deal with code that lives in distant abstract dependencies, so we’ll see how that turns out later.

 

Update #1: Too many layers

I’ve been digging, trying to implement the loading of xkb files in Gala. It turns out that stuff has been abstracted several layers deep, which makes it difficult to experiment without bothering people upstream and having to make changes and coordinate with several different projects (and teams). Because what I’ve been trying to do is rather experimental, I don’t want to just create patches for a lot of projects and argue they should merge my code because I think it will be good (I don’t even know that myself); instead, what I’ve been trying to do is implement things and see how well they work out and whether I really think they’re useful for others. I’m also looking to add the least amount of code to Gala, so rewriting the whole keyboard handler seems like the most extreme solution, and I am trying not to come to that.

I will try to briefly explain all the layers involved in keyboard handling for Gala. Before, in place of this, I had about 3 paragraphs trying to do so, but they were mostly just ranting about the excessive abstractedness of the whole thing; in the end I don’t think that’s relevant for the discussion, and I don’t have a global enough understanding of all the projects to question decisions made by others, so I will just point out the important details I’ve seen I need to get things done. For this I will just show a diagram of how the systems interact (it’s important to note that these relationships are only related to keyboard handling; I have no idea how these systems interact for, say, graphics).

[Figure: dependencies of keyboard handling]

There are 5 projects here; arrows represent where one project calls a library from another. Gala is written in Vala, which means calling C code from there is not a trivial task. Mutter and Clutter are written in C using GObject and GLib, which eases calling them from Vala code, and because they are in the end just C code they can easily call C libraries like Xlib or libxkbcommon. This makes them our most accessible path to lower level interfaces from Gala. A side note about Mutter is that it aims to be compatible with both Wayland and X11, so it has two backends to support this, but it also means we can’t expect Wayland-specific functionality to be provided by it.

I think Gala uses Mutter’s X11 backend, but I don’t know how I can test this to be sure. The problem with this is that the API provided by Mutter to set the keyboard layout only uses the RMLVO description, which seems to be a legacy interface that comes from the fact that this is what setxkbmap does and was the easiest to copy, as opposed to what xkbcomp does, which implied understanding the xkb description specification and how to upload it to the X server (both approaches in the end invoke the xkb compiler every time a layout switch happens, which is one of the issues I’m trying to solve). Because Mutter cares about providing only functionality available on both X11 and Wayland, it’s unlikely that an API that uses libxkbcommon to load xkb files will be provided.

Although Mutter with its native backend does use libxkbcommon to change the keyboard layouts, when trying to call the function that does this from Gala I stumbled upon several issues. The most important one seems to be that X11 grabs input devices and does not let Clutter listen to events; an assertion in the library fails saying “Clutter is not the device manager”. Also, this interface is not part of the API stability guarantees, which on one hand could bring some problems in the future, but even more frustratingly makes it completely unusable from Vala code, because as it turns out outputting #define symbols through Vala is impossible (we just need to add the line #define CLUTTER_ENABLE_COMPOSITOR_API), and even worse, do it before the #include directive for the library. So at this point using Clutter’s keyboard handling on the native backend becomes unfeasible.

So, what do we do now? Well, I’ve come to the conclusion that right now the simplest approach is to implement the functionality to load libxkbcommon’s keymap format into X11 directly in Gala. For testing what I did before, I created a .vapi file by hand for libxkbcommon so I can now use it from Gala; I haven’t seen bindings for XKBlib, so I may have to do that next. We’ll see how it all goes. If I’m successful with this then maybe it would help Mutter to include an API for this, so I could upstream some of it if it would actually work for someone else.

As a minor side rant, I have to say that I wish Gala had a more monolithic design and would not impose OOP through Vala the way it does. I mean, the most complex, biggest and by far most successful free software project is the Linux kernel, and it’s also a huge monolithic piece of code; I think segregating functionality across several projects adds a lot of overhead for people trying to help. It’s true that the codebase would grow substantially, but I think having code that actually does something, as opposed to glue code, would ease fixing problems and trying out new things.

Keyboard input methods and i18n on elementary OS

This post contains some ideas I’ve had lately on how to improve the internationalization of input methods on elementary OS (mainly keyboard input). Originally, my idea was to write a blueprint but I think there’s still a lot to be discussed before proposing what to actually do. I will write down my ideas here and hopefully with some feedback we will manage to get a blueprint we can work on.

This is a very long text but it is what I currently need to clearly explain my reasoning. The idea is to summarize everything later, but you can always skip to the end where the most important information is summarized in bullet points.

The problem

There are a lot of different types of keyboard devices, and it’s very hard to guess which one the user has. However, this layer has been simplified greatly by the fact that almost all modern keyboards use USB, and that the kernel handles the translation of key presses to keycodes and provides them to us via evdev. Still, even if we can decode keyboards relatively easily, there is the problem of which language the user wants to type in on their PC. This we cannot guess from basically anywhere but maybe the user’s locale configuration, and even if we tried, people who type in a language other than English will most likely want to type in several other languages with very specific needs (applications they use have shortcuts better suited for US layouts, their language can be typed in several ways, or they are fluent in more than 2 languages). Currently this has been solved by allowing users to switch their keyboard layouts using a sequence of keys, but this has been broken constantly, because there has never been a definitive solution to the problem that encompasses enough languages to suit a very large audience; so languages are patched on top, and then any change breaks something.

In the times when X was developed, keyboard input was very basic and it didn’t offer support for almost any features other than the ones needed for a US layout. This is why the X keyboard extension (xkb) was devised; it was a way to allow switching the symbols assigned to every key on the fly, it added more modifier keys, and it added support for Unicode (I think this wasn’t there before, but I’m not sure). This was a good thing, but it still left out several languages that are quite complex, such as Japanese or Chinese. Later on, ibus and several other input methods were created to support these, but in order to make layout switching work correctly, distributions had to disable several xkb options, because xkb modified the keyboard layout at a lower layer than the input methods and would then produce some weird behaviors.

A lot of the design of xkb was influenced by the limitations imposed by the X protocol and by how fast (or slow) computers were back then. But now people are trying to leave X behind, so we can try to get rid of most of this cruft and design a more robust solution that won’t break as often. We want something that supports typing in as many languages as we can, allows users to switch seamlessly between them, and makes it as easy as possible for them.

Current state of things

Some time ago most distributions exposed all xkb options through a very ugly interface, just like elementary still does. Ubuntu and GNOME did the same, but they decided to remove those options in favor of better support for complex languages through ibus, compromising a lot of these xkb options (which can sometimes even conflict with each other). This angered users greatly, because they couldn’t easily change their layouts as they did before. This was eventually solved, but a lot of the flexibility that the xkb options allowed was lost and feelings were hurt. I think there is no need for such a compromise.

Currently the workflow for anyone trying to type in another language is:

  1. Try to set up the language from the operating system’s settings panel, if they find it there then they are fine and happy.
  2. If this does not work, google “how to type in <language name> on Linux/elementaryOS/Ubuntu” and either get to some tutorial about configuring ibus or spend hours deciding which of all the available options they should try for their language.
  3. Install ibus (or another input method), and in the case of ibus also install the actual engine for the language they want to type in.
  4. Use the interface provided by the input method (which will always try to override what the operating system does, because it knows better), then just hope the operating system can handle this and wait for it to magically work, which sometimes doesn’t happen.

After doing this, even if they succeed at step 1, there are some caveats; for example, a lot of people got used to changing layouts by pressing both Shift keys. They will be disappointed to see this does not work anymore, but the only reason it worked before was that X was the sole manager of the keyboard, and this is not the case anymore.

There is also the issue that the panels provided by these input methods look out of place in elementary, which is not nice aesthetically.

The solution

To me, input methods can be classified into 2 types; let’s call them basic and advanced. Basic input methods map 1 input “thing” to exactly 1 keysym, where an input “thing” is either a single key press, several modifier key presses plus another non-modifier key, or a dead key followed by several other keys. The point here is that the computer can know by itself when to translate the input into the needed keysym, either because the keymap file tells it or because a dead key sequence matched. These can be easily specified and configured with xkb and its keymap file format for describing layouts.

These files have often been kept hidden from users, and xkb options were provided as an “easier” way of editing them to the user’s needs. I think these options have evolved to fit a lot of particular requirements that not a lot of people actually need, like “Left Alt as Ctrl, Left Ctrl as Win, Left Win as Alt”; nevertheless, distributions often just spit all of them at the user in some GUI. This has grown to a point where I think it may be even simpler to explain the format of a keymap file than to describe what these options do, don’t do, or how they interact when conflicting ones are enabled, like “Swap Ctrl and Caps Lock” and “Swap ESC and Caps Lock”. I think the definition of a keymap file is not difficult to understand once you remove a lot of the complexity added back then that we don’t need anymore, like groups, geometry and rules. Quite the opposite: the flexibility gained by learning this can’t be matched by any GUI or set of options provided by someone else. We should just expose this to power users and stop trying to digest it for them.
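
To give an idea of what I mean, this is roughly what a custom symbols section that swaps Escape and Caps Lock could look like (the <CAPS> and <ESC> names come from the standard keycode definitions shipped with xkb; take it as a sketch, not a polished layout):

// Minimal custom symbols section: Caps Lock produces Escape and vice versa
partial modifier_keys
xkb_symbols "swap_esc_caps" {
    key <CAPS> { [ Escape ] };
    key <ESC>  { [ Caps_Lock ] };
};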

Contrary to basic input methods, advanced input methods are required when the input character sequence yields multiple possible keysyms. This happens in languages where you type the sound of a sentence but it can be written in several ways, so the user must choose which one they want. In some cases the program even has to guess how to separate characters into words (note that I don’t know any of these languages, so this is what I have concluded from reading about them). This would also be the case if we wanted to provide some kind of predictive text input method like the one on phones nowadays. The difference here is the input method can’t know what the user wants to translate their key presses into, or when the translation should happen, so it needs to provide an interface to them (usually a popup with options) and then wait for them to choose the correct one.

However, these two do not need separate implementations; actually, advanced input methods need a basic input method to get the characters they will translate into keysyms, so they really sit on top of basic ones and should work with them instead of trying to override everything they do.

On top of all this we should provide nice graphical interfaces so users can configure input methods to their liking, which basically means providing ways of changing the basic input method and the options specific to the advanced method they are using (if they are using one). Advanced methods should be “bundled” with a basic one that the user will be able to change, if for example they want to use Pinyin on an AZERTY keyboard instead of QWERTY. The keyboard layout configuration would then consist of a set of basic methods and advanced methods, each of the latter with its own basic method bundled to it.

Implementation

Probably the most important factor here is that X is being replaced by Wayland. This will render all X specific stuff useless but will also give us the opportunity to choose where to go next. Either way we must be aware that some stuff will be useless in some time (code wise) unless we start to move to the actual libraries that will be used on Wayland.

To provide the kind of integration a user would expect from elementary, I believe we will need to add some new functionality to Gala. For the basic level of input methods libxkbcommon should be more than enough; this library supports all the layouts currently used on Linux, but in an X-free environment, and is the proposed solution for when we move to Wayland. It loads a keymap from several sources, such as a description like the one used before by setxkbmap with the RMLVO syntax (which is the only one currently used by Mutter), or a full .xkb file containing everything we need for a layout. We should provide a gsettings interface that allows choosing a layout using either of these options.

We also need to finally get rid of the options tab in the keyboard plug on Switchboard because, as stated before, it is hard to understand, provides conflicting options, breaks often because X is not the only one controlling the keyboard anymore, and is not flexible enough for the user’s specific needs. Instead, I have thought about a solution that takes into account the most common scenarios I’ve found people complain about when losing this tab: “I can’t swap X and Y keys anymore”, “I can’t enable the compose key where I want it” and “I can’t change layouts anymore”. The last one will hopefully be handled by Gala and the fact that layouts will mostly be in one place. So my idea is the following:

Add a small interface that asks the user to press a key and provides a menu of common actions to bind it to. These actions could be “Control, Caps Lock, Shift, Alt, AltGr, Compose Key, Menu”, and maybe some Japanese-specific keys like the kana key, but I don’t know enough about this to have a concrete idea. The implementation of this is surprisingly easy if we use libxkbcommon to load a layout file where we create an alias for that key, or change its keycode.

[Figure: an attempt at a mockup of the widget I’m talking about]

For any other use case not handled by this, what I suggest we do is allow an arbitrary xkb file to be loaded from the Switchboard plug. I really didn’t know how powerful this was, but I think most of the complaints people have can be addressed if they know how to write their own custom keyboard layout file, and sometimes this can even be better than switching layouts. Even if people are really not willing to learn the file format, instead of trying to agree on a set of configuration options it would be easier to design a graphical application that generates layout files in an intuitive way, without most of the limitations that the original X input method imposed on the xkb file format, in a similar way to what Ukelele does on OS X.
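
As a rough illustration of how much of this is already possible on X today, xkbcomp can dump the keymap the server is currently using into a file you can edit and load back (assuming your display is :0):

# Dump the currently loaded keymap into an editable .xkb file
$ xkbcomp :0 my_layout.xkb
# ... edit my_layout.xkb to taste ...
# Compile it and upload it back to the X server
$ xkbcomp my_layout.xkb :0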

We could also fix some bugs as we move along. Currently, every time a layout switch happens, libxkbcommon is called by Mutter to compile the next layout and load it into Clutter. We could be more clever about this and load all layouts into memory, then just switch the reference to the current one. This would fix an annoying bug where you switch layouts and the first key you type after that doesn’t register, because the new keymap file hasn’t been compiled yet.

For the advanced input methods, people have always supported ibus, but I have found out this is not unanimous and some people prefer others like Fcitx. What ibus did was provide a framework that desktop environments used to display the bubble with options, interpret the user’s feedback and send it to the application; it did not provide any language-specific capabilities. Instead, other people used this framework to create engines for specific languages, which are daemons that communicate with the shell through D-Bus. On Wayland, a merge was accepted into Weston that extends the Wayland protocol to allow a preedit section and feedback from the user; this is still a Weston-only thing, and I haven’t come across information about it being merged into the core protocol. But this new framework will eventually replace what ibus did with a much more standardized version of it. So, on the X side of things we are pretty much left with using ibus, but it will be useless once we move to Wayland if im-wayland gets into the core protocol, so we may just support ibus as a temporary solution.

In any case, what I would like to do here (and I haven’t thought about this at length implementation-wise, mostly because I don’t know enough about these input methods to have an idea of the requirements) is to provide a set of advanced input methods out of the box which can be selected from the keyboard plug without installing anything, each providing layout-specific options on the right panel. To do this we would need to narrow down the problem to a subset of languages, and then choose one of the input methods available for each of them. Then we would need to be able to configure them from our keyboard plug (I think ibus engines provide a D-Bus interface we could use for this). This would imply looking at each language and deciding which options are the most useful; that is a very language-specific task and would require someone who is fluent in the language to provide us with feedback.

Undoubtedly, for advanced input methods to happen nicely we need a lot of feedback from people who actually need them.

What needs to be done

If all that was too long to read for you, here is a summary of the key points I think need to be done:

Gala

  • Use libxkbcommon to add a way of loading arbitrary xkb files (this change should actually happen on Mutter).
  • Try to load all layouts into memory and just switch the reference to the current one.
  • Provide a gsettings interface that allows specifying the list of layouts, coming either from a file or from an RMLVO description, and also leaves room to choose advanced input methods.

Switchboard keyboard plug

  • Kill the options tab.
  • Add a way of loading arbitrary xkb files into Gala.
  • Add an interface that asks for a keypress and shows a menu with options for which action should be bound to it, in a similar way to how custom shortcuts are added.
  • See if ibus packages are installed and list them in the keyboard plug. Even better, a set of pre-installed ibus engines could be provided, so that people choose a language and it just works.
  • Provide a subset of the options given on the specific engine’s configuration panel directly on the keyboard plug (this should be doable through d-bus).

Team work

  • Agree on a set of officially supported languages to narrow down the problem.
  • Get at least one person who uses each of these languages on a daily basis and is willing to spend time giving feedback to developers.
  • Decide on an input method per language from all the ones available.
  • Work with each one of the language advisors to agree on a subset of configuration options that will be provided directly from the switchboard plug.

Final Notes

All of this has come to my mind after a lot of time spent reading, mostly by myself, about this problem. I’m not entirely sure about everything here, but I would really like people to do some mockups for the keyboard plug; I have some on paper that I could try to draw in Inkscape, but my skills aren’t great.

Also my native language is Spanish so my assumptions regarding advanced input methods on other languages may be wrong.

Finally, I would really like to get general feedback about this, if you have any comments please feel free to send them to me, my mail is: santileortiz@gmail.com