Make Linux (Xorg / Wayland) Great Again on Touchscreen Devices

7 Apr, 2019

As you may or may not know, I've got a brand-new Surface Pro 6 in place of my GPD Pocket, partially due to the unbearable slowness of its Atom processor. It is easy to see that I'm not a Microsoft guy, and especially not a Windows guy, so I almost immediately wiped the OEM Windows and installed ArchLinux on my Surface. Why did I bother buying a Microsoft device in the first place, you ask? Well, I simply wanted a 2-in-1 device, and this seemed like the only usable choice.

I haven't written a blog post on how to configure the whole thing for Linux yet, and I might do that later. For now, I would like to focus on one aspect of the configuration: the touch screen. Everybody knows that not many desktop OSes work well on a touch screen -- possibly only Windows works fine. macOS lacks touch screen support as a whole, probably because of Apple's strategy of avoiding competition with its own products. As for Linux, there is basic touch screen support in almost every desktop environment -- touches at least work as pointer events. However, you are out of luck if you would like to use your device purely through the touch screen, without a mouse or trackpad -- you can't even get right-click menus to appear or type words into text boxes (the Qt on-screen input method overrides the IME, making it not very useful for CJK speakers like me).

For on-screen keyboards that are compatible with other IMEs, there are a ton of solutions, and I'm not going to reinvent the wheel. But for the right-click menus, I haven't seen much, except that some applications try to make it work by treating long-press events as right clicks. It seemed to me that the only way forward was to implement something on my own -- something that works across desktop environments, to ensure that I won't have to write everything again in case I switch desktop environments or migrate to Wayland in the future.

Evdev

Since I want to make something that does not depend on a specific display system or desktop environment, LD_PRELOAD hooking magic is out of the question. It is entirely possible to hook into the input processing logic of libinput, which most desktops use nowadays, and I have done it before for my GPD Pocket; but for something even slightly complicated like this, the hook would depend on too much of libinput's internal logic, possibly even on version-specific details. If I chose to do that, I might as well fork libinput and maintain my own version, which could even be easier.

The only solution here seems to be going one or two levels down the Linux input stack. An image on the Wikipedia page for Wayland shows the current Linux input driver stack clearly:

input stack

The input driver stack of the current Xorg display server is similar, due to the adoption of libinput as the default input abstraction layer. Directly below the libinput library lies the evdev driver of the Linux kernel, which exposes a device node of the form /dev/input/eventX for each enumerated input device.

These are character devices, readable by any program with sufficient privileges, and the access is not exclusive, which means that we can detect input events without hacking into any library at all. All that is needed is a privileged daemon that reads from the evdev devices just like libinput does, and detecting long-press events is pretty trivial to implement by hand-rolling -- something I believe most programmers have done at some point. Besides, a user-space library, libevdev, can be used to parse the events from those character devices, further reducing the work we need to do.
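
To get a feel for the interface, here is a minimal sketch in C of reading events from one such node through libevdev -- the device path is just an example, and error handling is mostly omitted:

#include <fcntl.h>
#include <stdio.h>
#include <libevdev/libevdev.h>

int main(void) {
  /* Example path; substitute the node of your touchscreen */
  int fd = open("/dev/input/event5", O_RDONLY);
  struct libevdev *dev = NULL;
  if (fd < 0 || libevdev_new_from_fd(fd, &dev) < 0)
    return 1;
  struct input_event ev;
  /* With a blocking fd, libevdev_next_event() waits for the next event */
  while (libevdev_next_event(dev, LIBEVDEV_READ_FLAG_NORMAL, &ev) >= 0) {
    printf("%s %s %d\n",
           libevdev_event_type_get_name(ev.type),
           libevdev_event_code_get_name(ev.type, ev.code),
           ev.value);
  }
  return 0;
}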

The next problem is how to simulate a right click after a long press has been registered. As it turns out, those evdev devices are not only readable but also writable, and whatever is written gets injected into the normal event stream of the corresponding input device. This property is very useful to us, and the Linux input subsystem also comes with a /dev/uinput device node that allows setting up new virtual evdev devices directly from userspace. Either way, simulating input is as simple as writing events into the corresponding device nodes, which is also well supported by libevdev for convenience.
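
As an illustration of the uinput route, here is a rough sketch using libevdev's uinput helpers -- the device name is arbitrary, and error checking is mostly omitted:

#include <libevdev/libevdev.h>
#include <libevdev/libevdev-uinput.h>

int main(void) {
  /* Describe a minimal virtual device that can only emit right clicks */
  struct libevdev *dev = libevdev_new();
  libevdev_set_name(dev, "fake-right-click");
  libevdev_enable_event_type(dev, EV_KEY);
  libevdev_enable_event_code(dev, EV_KEY, BTN_RIGHT, NULL);
  /* Instantiate it through /dev/uinput */
  struct libevdev_uinput *ui = NULL;
  if (libevdev_uinput_create_from_device(dev, LIBEVDEV_UINPUT_OPEN_MANAGED, &ui) < 0)
    return 1;
  /* Press and release BTN_RIGHT, with a SYN_REPORT after each step */
  libevdev_uinput_write_event(ui, EV_KEY, BTN_RIGHT, 1);
  libevdev_uinput_write_event(ui, EV_SYN, SYN_REPORT, 0);
  libevdev_uinput_write_event(ui, EV_KEY, BTN_RIGHT, 0);
  libevdev_uinput_write_event(ui, EV_SYN, SYN_REPORT, 0);
  libevdev_uinput_destroy(ui);
  return 0;
}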

A Python Toy

There is a simple Python binding for the kernel's evdev interface, aptly named python-evdev. With it, you basically create an InputDevice instance pointing at the evdev device node, then write an asyncio coroutine that calls InputDevice.async_read_loop() to iterate over all the events emitted by the device.

For faking the input device, python-evdev also provides a UInput interface -- it can even copy the capabilities of any given InputDevice to create an almost identical fake input device, which is extremely convenient for our purpose.

For some reason, the pen and the touchscreen on my Surface Pro 6 show up as different evdev device nodes. Fortunately, everything in the python-evdev binding is compatible with Python's asyncio interface, so we can simply run multiple async loops, one for each device that needs right-click emulation.

With all of this in mind, I quickly threw something together. I've put the code on Gist, but basically it's just the following snippet:

import asyncio
from evdev import InputDevice, KeyEvent, UInput, ecodes

dev = InputDevice('/dev/input/event5')  # the touchscreen node; path varies
ui = UInput()  # fake device; it can inject any KEY/BTN event by default
pos_x = pos_y = pos_last_x = pos_last_y = 0
trigger_task = None

async def trigger_right_click():
  # Not cancelled before the delay expired: this is a long press
  await asyncio.sleep(1.0)  # example delay
  ui.write(ecodes.EV_KEY, ecodes.BTN_RIGHT, 1)
  ui.syn()
  ui.write(ecodes.EV_KEY, ecodes.BTN_RIGHT, 0)
  ui.syn()

async def handle(dev):
  global pos_x, pos_y, pos_last_x, pos_last_y, trigger_task
  async for ev in dev.async_read_loop():
    if ev.type == ecodes.EV_ABS:
      abs_type = ecodes.ABS[ev.code]
      # Track the position of touch
      # Note that this position is not 1:1 to the screen resolution
      if abs_type == "ABS_X" or abs_type == "ABS_MT_POSITION_X":
        pos_x = ev.value
      elif abs_type == "ABS_Y" or abs_type == "ABS_MT_POSITION_Y":
        pos_y = ev.value
    elif ev.type == ecodes.EV_KEY:
      tev = KeyEvent(ev)
      if tev.keycode == 'BTN_TOUCH':
        if tev.keystate == KeyEvent.key_down:
          if trigger_task is not None:
            trigger_task.cancel()
          trigger_task = asyncio.get_event_loop().create_task(trigger_right_click())
          # Remember where the touch began (used by the fuzz check, elided here)
          pos_last_x = pos_x
          pos_last_y = pos_y
        elif tev.keystate == KeyEvent.key_up:
          if trigger_task is not None:
            trigger_task.cancel()

asyncio.get_event_loop().run_until_complete(handle(dev))

where trigger_task is simply an asynchronous task that triggers a right click after a certain delay. This ensures that the right click only happens if the touch is not released within a certain interval -- that is, a real long press.

While debugging this code, I came across a problem: the resolution of the touch device is not 1:1 with the screen resolution -- it is usually much higher than what the screen can offer. Since it is impossible to keep a finger perfectly still during a long press, movement within a certain range (a fuzz) must be allowed, but the right range is wildly different on every device because of the varying resolution ratio between the touch device and the screen, and also because of possible HiDPI screens. To calculate the ratio and account for the DPI of the screen itself, some interface would have to be used to query the actual display server in use, which conflicts with my goal of staying display-server-independent.

What I ended up doing is simply reading the maximum allowed fuzz from an environment variable, so every user can adjust one variable if they find the sensitivity of the long-press detection uncomfortable.

Another problem is that this program might conflict with applications that already implement right-click emulation themselves, e.g. Telegram and Firefox. These applications are the ones obeying the rules, and breaking them is not something a compatibility script should do, at least not explicitly. The reason for the conflict is simple: the script emulates another right click when the right-click action has already been triggered. The extra click, though still a right click, may cancel the previously triggered action, for example by dismissing a context menu if the click lands outside of it. I could not find an obvious solution to this, so I simply made the delay of my script shorter (or considerably longer, just not nearly the same) than the delays used by most of these applications. This way, the right click our script triggers will not happen too close to the application's own emulated right click, giving us some room to avoid the conflict manually (e.g. by lifting the finger sooner).

Native Implementation

Python is a great language for quick prototyping. However, after playing with the prototype script, I noticed some strange behavior, probably due to bugs or limitations of the python-evdev binding. The most annoying one is that the UInput interface seems to stop working after running for a while, without any obvious error. The program doesn't crash -- it just stops sending emulated events without any visible clue. Restarting it fixes the problem immediately, and I tried reducing the program to its simplest form, only to find that this still happens randomly.

Exhausted by the debugging process, I wrote a simple C program to see if the same thing happens when calling the native libevdev APIs directly. It doesn't -- the test program worked perfectly after an entire day of use. At this point, I had already lost interest in figuring out what the hell is wrong with that Python binding -- it seemed better to just rewrite the whole thing in C. After all, running a Python interpreter all the time just for this simple feature didn't seem like a good idea to me, and something in native code would be much more elegant as far as I'm concerned. Of course, a kernel module might be even better, but that seemed like overkill.

To implement the same functionality in C, we have to use something like select or epoll to poll the fds opened from the /dev/input/eventX devices. When an fd becomes readable, an event can be pulled through the libevdev interface, just as in the Python binding. For the delayed task, I used a timerfd, provided by recent versions of Linux, which is simply an fd that becomes readable after a set interval -- it fits perfectly into a select or epoll loop.
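
The skeleton of such a loop looks roughly like the following sketch -- the 600 ms delay is an arbitrary example, and error handling is omitted:

#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>

int main(void) {
  int epfd = epoll_create1(0);
  int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
  struct epoll_event ev = { .events = EPOLLIN, .data.fd = tfd };
  epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev);
  /* ...also EPOLL_CTL_ADD one fd per /dev/input/eventX device here... */

  /* In the real program this happens when a touch begins:
     arm a one-shot timer (600 ms as an example delay) */
  struct itimerspec its = { .it_value = { .tv_nsec = 600 * 1000000L } };
  timerfd_settime(tfd, 0, &its, NULL);

  for (;;) {
    struct epoll_event events[16];
    int n = epoll_wait(epfd, events, 16, -1);
    for (int i = 0; i < n; i++) {
      if (events[i].data.fd == tfd) {
        uint64_t expirations;
        read(tfd, &expirations, sizeof(expirations));  /* clear readability */
        /* Timer fired: the touch is still held, inject the right click */
      } else {
        /* An input fd is readable: drain it with libevdev_next_event() */
      }
    }
  }
}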

With all of this in mind, implementing the C version went smoothly. This time, I exposed more configurable options via environment variables, e.g. the delay of long-press detection, the fuzz, and a blacklist and a whitelist controlling which devices the program listens on for long-press events. I've also implemented touch device detection based on device capabilities, so you can expect it to work out of the box without messing with your non-touch mouse input. Unfortunately, this doesn't work well when you have a touchpad, because a touchpad looks exactly like a touchscreen in terms of capabilities. It also doesn't take dynamic device changes (hot-plugging) into consideration.
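
The capability heuristic can be as simple as the following sketch (a simplified illustration of the idea, not the program's exact check):

#include <stdbool.h>
#include <libevdev/libevdev.h>

/* A device that reports absolute positions plus a BTN_TOUCH key looks
   like a touch device -- note that a touchpad passes this test too */
static bool looks_like_touch(const struct libevdev *dev) {
  return libevdev_has_event_code(dev, EV_KEY, BTN_TOUCH)
      && (libevdev_has_event_code(dev, EV_ABS, ABS_MT_POSITION_X)
          || libevdev_has_event_code(dev, EV_ABS, ABS_X));
}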

Anyway, it was a fun programming exercise, considering that I had never written a proper C program before, let alone used things like select and timerfd. The code of this program is on my Gitea instance, and an AUR package is also available (with a compiled version in the ArchLinuxCN repository).

Final Thoughts

Linux desktops are kind of a love-hate thing for me. They are flexible and configurable, but sometimes they just miss that one critical feature -- one I would not die without, but whose absence keeps annoying me. Things get even worse when there are multiple competing solutions to one problem, none of them compatible with each other and none of them fully working.

I'm not trying to blame anyone for this. After all, the Linux desktop community is still pretty much a hobbyist community -- we are always seen as 'nerds'. I just want it to become better, and possibly to remove some of the obstacles that annoyed me, so that nobody else will be annoyed by them. Plus, it was really fun practice to actually implement something in C, a language that I had not dared to touch until this day.