Not really an English version of TypeBlog.Net. Mostly random thoughts and short comments, seldom long and complete articles.
https://en.typeblog.net

The "Magic" of Custom ROMs: Prelude

I am going to start a blog series on porting custom ROMs to Android devices. The idea comes from recent chats with friends on the topic of choosing new phones and custom ROMs -- the whole notion of being able to compile an entire operating system from scratch and make it run on a given device seems magic-like to many. Pretty much all of these friends are fairly tech-savvy, and many of them are developers themselves and use Linux as their daily driver OS on their PCs. But when it comes to phones, somehow, running a customized OS begins to sound like magic to them, and most of them just sit around and wait for someone to port a custom ROM to the phone they are using, or alternatively, check whether somebody is already working on a port before deciding to switch to a new phone.

To be clear, I am not expecting everybody to learn to make everything on their own. Frankly, working on something like a custom ROM has never been particularly enjoyable, except for people who like to stare at a scrolling terminal window for hours and cheer when the phone so much as lights up the screen with a boot animation. However, it seems to me that people do not refrain from trying just because they dislike the process. Instead, most of them stop long before they even get a glance at the actual process itself. They stop because there is not enough guidance on where to begin and how to take their first step into the rabbit hole of custom ROM development.

Now, there does exist some documentation and quite a few guides on porting custom ROMs, on forums like xda-developers or on the wiki pages of custom ROMs like LineageOS. But they either describe a process that worked for one particular case, with no actual explanation of or insight into the rationale behind each step, or are simply too vague to guide anybody in the first place. Here is a quote from a LineageOS wiki article:

Ok, so if nobody is making headway on your device, where do you go from here? Consider that the majority of the device maintainers for LineageOS have significantly different day jobs than Android device maintenance (or even programming, for that matter). If you are passionate enough about getting LineageOS up and running on your device, you can make it happen. Start easy; buy an old, but well supported device and try compiling your first ROM. Once you’re running software you compiled yourself, start investigating the device configuration files. Tweak, and then tweak some more. Eventually, see what you’re able to accomplish on your device of interest!

...yeah, it is not great.

I am not going to quote any guides from XDA here. Although they are more detailed than this one from LineageOS, they seldom work for anything except the one or two devices they were written for and, as previously mentioned, do not actually tell you the rationale behind the steps. If you do not have the exact same environment and device as those guides assume, it is almost certain that you will run into some issue not covered by them at all. I have seen people follow those guides and get turned away by an early error that could have been fixed easily, had the rationale actually been explained instead of simply listing commands for people to copy and run.

The custom ROM community is, to say the least, more obscure than many of the other FOSS communities I've interacted with -- not in the sense that the source code is not visible (otherwise it could not be called FOSS), but, as is hopefully evident from what I just described, in the sense that there is such a lack of guidance and documentation for people getting started. The Android Open Source Project (AOSP) does have an official documentation site, but it is aimed more at OEMs than at FOSS developers, and even that is probably incomplete if you compare it with what OEMs actually get from Google. This hurts the community not only by leaving it with fewer developers, but also by enabling so-called buildbots and snake-oil projects to survive -- since there is no easy way for an educated user to quickly verify their claims. These questionable developers are typically people who managed to get a build working (probably by following one of those step-by-step guides without explanations) but failed to find the information needed to start actual development. Still desiring the "status" of a reputable developer, they resort to making bold, questionable claims and pretending to know what they are doing. From the users' side, if the only source of information is what developers claim, then listening to a "reputable" developer does not feel that different from listening to a random developer who happens to make bold claims. I have been involved in one such drama, and while I totally regret taking part in it and mocking these uninformed developers, it did make me realize how many more people could have become actual Android ROM developers if more information were available in an easy-to-acquire manner.

In addition, custom ROMs are not ported to new devices based on the number of requests (as the LineageOS wiki states); instead, only devices with developers capable of porting custom ROMs get supported. More developers therefore means potentially more supported devices, bringing in more users, which in turn increases the publicity of custom ROM projects. The more publicity there is, the harder it becomes for Google and phone vendors to screw us over -- which they have been doing for several years now with the introduction of SafetyNet and the proprietarization of many formerly open-source Android components, for example the Dialer and SMS apps. To me, it seems particularly important that the FOSS Android ROM community gain more developers to combat this ongoing proprietarization of the once open-source Android ecosystem, yet it is not capable of doing so because of all the aforementioned obscurity.

To make it absolutely clear, this obscurity is most certainly not intentional, either on Google's part or on the community's. The whole process of porting an open-source Android ROM to a device does involve many, many difficulties and differs significantly depending on what sort of device you are working on. The Android Open Source Project documentation is aimed at OEMs because that is what it was designed for from the very beginning: people or organizations that have at least some control over the hardware, for example the partition layout, or the trusted execution environment (TEE). We, as average consumers, obviously do not. That the Android custom ROM community thrives on almost black-box hardware whose implementation details nobody in the community knows is already incredible, especially before Project Treble came along and significantly reduced the workload of such development.

I know I will not be able to change the situation very much. After all, I am just yet another Android ROM developer who shares the same problems as everybody else. Like everybody else, I do not have time to answer every question about ROM development. Nor am I a particularly good writer of easily comprehensible articles. Heck, I am not even a native English speaker. But I want to at least make an attempt to do something different. I want to share part of my experience from all these years as an Android custom ROM developer -- not just a series of steps that even a sophisticated script could execute. I would like to paint a general picture of what "porting" is, what we actually do when we say "porting", and, most importantly, how to figure out solutions to problems on your own along the way -- even StackOverflow is seldom useful when attempting to port an Android ROM.

So, here I am starting a new blog series. I am not sure yet what it will turn into, but let us just try and see.

Standard Notes Sync Protocol, and SFRS, a Rust implementation

As you may know, I have been a user of Standard Notes for a long time. Since I am a self-hosting nerd, during this whole time I have been running a self-hosted Standard Notes server, using the Go implementation of the Standard File protocol (the former name of the Standard Notes sync protocol). There is just one slight problem: this implementation seems to have been abandoned since mid-2019.

Of course, Standard File (and thus the Standard Notes Sync Protocol) is not a particularly complicated protocol that needs frequent maintenance, but things have happened since the Go implementation was last updated. For example, the Standard File name was discarded, and the protocol officially became the synchronization protocol of Standard Notes. The protocol itself was later updated too, at the very least introducing a different conflict resolution algorithm, which the Go implementation does not support, relying entirely on the client side's backwards compatibility.

As a programmer, my first instinct after encountering such a situation was to rewrite the thing and maintain it myself. And so I did. To be completely fair, I could have just switched to the official Ruby implementation and called it a day, but I am not a big fan of the official server-side code. I think their client-side software, including the UI and the code itself, is fantastic, and I may not be able to produce something at the same level of competence, but the Ruby on Rails backend is really not something I like that much. Similar to the Go server, it depends on timestamps from the system clock (which may not be strictly monotonic) for synchronization tokens. This should be totally fine 99.9% of the time, but it could cause unexpected behavior in multi-client synchronization scenarios, as one or two bug reports may indicate (this is purely speculation), especially since there is no lock in place to limit how many parallel synchronization requests can happen at once for each user. In addition, their conflict-detection logic ignores conflicts whose timestamps are within some arbitrary interval of each other, for a reason I have not yet been able to understand. Again, this is far from a disaster and should be fine for most use-cases, and I am probably just overthinking (I cannot even give a specific reason why those designs are not the best ideas), but still, it is not something I love. So I set out to rewrite a synchronization server in the language I like, Rust, and named it SFRS.

During implementation, I noticed that none of the aforementioned updates is obvious from what we can see publicly. The new Standard Notes website includes some documentation, but its description of the synchronization protocol is still identical to the old one that the Go implementation follows. The explanation on the official website is also a bit too vague and general, which left me confused while trying to implement it myself. I have reported these discrepancies to Standard Notes, and it seems that they are in the middle of a big client-side refactoring (probably to get rid of AngularJS in favor of React), and the documentation will be updated after they finish. For anyone who, like me, tries to re-invent the wheel, I have decided to list all of these discrepancies in this blog article before the official documentation is updated.

Please keep in mind that I am not the designer of the protocol, and all of this is based on the official documentation plus what I could extract from the source code of the official implementation. I cannot guarantee that everything here is correct, but it at least worked for me, and I will try my best to explain what I discovered during my trial-and-error process of getting things to work.

Requests and Responses

This is not a discrepancy, but something not stated clearly in the documentation. At least for the official client, the client side always sends request bodies as application/json and expects responses as application/json as well. The only exception is GET requests: since they have no body, the parameters are passed in the query string instead.

I am not sure if other client-side software may use other formats for requests, like application/x-www-form-urlencoded, but at least using JSON everywhere makes the official clients work perfectly.

Authorization Endpoints

The most unexpected discrepancy comes from the authorization endpoints, i.e. /auth*. The official documentation says that the response of endpoints that return a token (the registration endpoint /auth and the sign-in endpoint /auth/sign_in) should be

{"token" : "..."}

...which is incompatible with their client-side implementation. The actual Standard Notes client expects an extra field, user, with email and uuid as its attributes. The full response should look like

{
  "token": "...",
  "user": {
    "email": "...",
    "uuid": "..."
  }
}

I am not sure why the client expects such an object, or why the user must have a UUID even though it seems to be used nowhere in the client; my initial implementation did not even need UUIDs to identify users. Whatever the case, simply adding these fields made the client happy and stopped it from crashing, which is a good sign.

Synchronization Tokens

The description of the /items/sync endpoint is, to me, the most confusing piece of information in their documentation. The confusion starts with the very basis of synchronization -- sync_token and cursor_token -- which they describe as

sync_token: the sync token returned from the previous sync call. Leave empty if first sync.

limit: (optional) the number of results to return. cursor_token is returned if more results are available.

I don't know about others, but to me it is not at all clear from this what they are supposed to be. I know that to synchronize, you need something to record where the client was last time, so that the server can send whatever the client does not yet have on the next request. But here we have two different, similarly-named entities with seemingly similar functionality -- both apparently recording where the client was last time -- yet they are totally different.

At first, I assumed that only one of cursor_token and sync_token is needed depending on the circumstance, i.e. that perhaps when limit is set, sync_token is no longer needed. This was not the case, and it caused the client to misbehave in my testing. I then tried several different approaches, through a process too messy to describe in an organized manner.

Finally, after several failures and some digging into the source code of the official implementation and the Go implementation, I ended up with something that works. In this working configuration, sync_token and cursor_token are defined as follows, respectively:

  • sync_token: always refers to the latest known state of the current user as of the last successful synchronization, regardless of whether the latest item has actually been sent to the client, and regardless of whether the client set a limit. In the official and Go implementations, this is the timestamp of the last synchronization; in my Rust implementation, it is the maximum ID of known items for the user. (This ID is incremented atomically each time an item is created or updated, so it acts like a clock that "ticks" on every insertion / update event.)
  • cursor_token: refers to the latest state that has actually been sent to the client. If present, it should always point to a state earlier than the one sync_token points to. It should only be returned when a limit parameter is in place and the server knows there is more to send to the client. Its presence instructs the client to perform another synchronization as soon as possible to receive the rest of the content. However, even when cursor_token is present, sync_token should still be updated to the latest state, not to the state that cursor_token points to.
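
To make the two tokens concrete, here is a minimal sketch of the semantics described above, using a monotonically increasing item ID as the synchronization "clock" (the approach my Rust implementation takes). All names and types here are illustrative, not the actual SFRS API; `from` stands for whichever token the client sent (cursor_token if the previous response contained one, sync_token otherwise).

```rust
// Illustrative sketch only -- not the real SFRS code.
#[derive(Debug, PartialEq)]
struct SyncResult {
    items: Vec<u64>,           // IDs of the items sent to the client
    sync_token: u64,           // latest known state, regardless of limit
    cursor_token: Option<u64>, // present only when more items remain
}

fn sync(all_items: &[u64], from: u64, limit: Option<usize>) -> SyncResult {
    // Everything the client has not seen yet (IDs assumed ascending)
    let pending: Vec<u64> = all_items.iter().copied().filter(|&id| id > from).collect();
    // sync_token always advances to the latest known state
    let latest = all_items.iter().copied().max().unwrap_or(from);

    match limit {
        Some(n) if n > 0 && pending.len() > n => SyncResult {
            // Only the first `n` pending items are sent...
            items: pending[..n].to_vec(),
            sync_token: latest,
            // ...and cursor_token marks how far we actually got,
            // telling the client to sync again immediately
            cursor_token: Some(pending[n - 1]),
        },
        _ => SyncResult {
            items: pending,
            sync_token: latest,
            cursor_token: None,
        },
    }
}
```

The key point the sketch encodes is that sync_token advances to the latest state even when a limit truncates the response, while cursor_token records the truncation point for the follow-up request.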

Conflict Types and Detection

The official documentation describes an unsaved field in the response of /items/sync containing items that conflicted during synchronization. This is already obsolete as of the latest client-side implementation. Instead, conflicted items should now be returned in a field called conflicts, with the following structure:

{
  "type": "sync_conflict|uuid_conflict",
  "unsaved_item": { ... },
  "server_item": { ... }
}

where | means OR. If type is set to sync_conflict, then unsaved_item should be set to null and server_item should be set to the conflicting item that exists on the server. If type is set to uuid_conflict, then server_item should be set to null and unsaved_item should be set to the conflicting item sent by the client.

The distinction between sync_conflict and uuid_conflict is not clear from the official client-side source code. It only says a uuid_conflict could happen when a user imports old backups, which is better than nothing but still confusing. It turns out that this distinction is specific to the official server implementation:

  • sync_conflict: a conflict that occurs when an item is updated while the same item (with the same UUID) has already been updated by another client since the last synchronization of the current client.
  • uuid_conflict: occurs when two users try to upload items with the same UUID.

The whole reason uuid_conflict exists is a design choice of the official server: it uses the client-provided uuid field as a primary key in its database. This is fine, because uuid_conflict lets us deal with the consequences, but the problem can be avoided entirely by not using that field as the primary key. This is exactly what I did in my Rust implementation, though I was not aware of the issue beforehand -- it was only after spending a whole night trying to understand why uuid_conflict is a thing that I realized its purpose.
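
As a hypothetical sketch (not the official server's actual code), the classification of an uploaded item could look like the following, where a uuid_conflict can only arise because the UUID doubles as a global primary key shared across all users:

```rust
// Illustrative sketch of the two conflict kinds; names are made up.
#[derive(Debug, PartialEq)]
enum Conflict {
    Sync, // the item changed on the server since this client's last sync
    Uuid, // the UUID is already taken by another user's item
}

struct ServerItem {
    owner: u32,      // user who owns the stored item
    updated_at: u64, // server-side "clock" value of its last update
}

// `existing` is whatever item the server already stores under the
// uploaded UUID; `last_sync` is the state the uploading client last saw.
fn classify(uploader: u32, last_sync: u64, existing: Option<&ServerItem>) -> Option<Conflict> {
    match existing {
        // Another user already claimed this UUID (only possible when the
        // UUID is used as a global primary key, as in the official server)
        Some(item) if item.owner != uploader => Some(Conflict::Uuid),
        // Same user, but the item was updated since this client last synced
        Some(item) if item.updated_at > last_sync => Some(Conflict::Sync),
        _ => None,
    }
}
```

Under a schema with a server-generated primary key, the first arm simply never fires, which is why my implementation could get away without uuid_conflict handling at first.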

SFRS, Rust for Standard Notes

The above covers what I can recall from dealing with the protocol. Although I faced a few challenges, I am happy to say that I ended up with something that works, and the protocol itself is relatively simple and concise overall. The source code of my implementation is on GitHub, and I am already dogfooding it to see whether there is anything else I missed. That said, I have to warn you that it is still at a very early stage; documentation is still missing (though I think my comments on the synchronization part of the code are better than the official ones), and I might decide to make breaking changes should anything critical come up, though I am pretty confident the likelihood of that is low.

There are just a few things left that I would like to mention. The first is my choice to use a per-user mutex to limit concurrent calls to /items/sync to exactly one per user. Though I have simulated and examined some possible scenarios in my head, I was afraid that, due to the non-atomicity of the synchronization operation, unexpected things could happen if two parallel synchronizations were processed in just the wrong order. This should be fine for most users, as I do not believe anybody will be synchronizing from a bazillion devices at the same time, which is what it would take to notice any performance hit caused by the mutex.
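
The idea can be sketched as a lazily-populated lock table; this is an illustration of the approach, not the actual SFRS code:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Illustrative per-user lock table limiting /items/sync to one
// concurrent call per user.
struct UserLocks {
    locks: Mutex<HashMap<u32, Arc<Mutex<()>>>>,
}

impl UserLocks {
    fn new() -> Self {
        UserLocks {
            locks: Mutex::new(HashMap::new()),
        }
    }

    // Fetch (or lazily create) the mutex guarding a given user's syncs.
    // The global map lock is held only briefly; the returned Arc is then
    // locked by the caller for the duration of the synchronization.
    fn for_user(&self, uid: u32) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry(uid).or_insert_with(|| Arc::new(Mutex::new(()))).clone()
    }
}
```

A handler would then do something like `let lock = locks.for_user(uid); let _guard = lock.lock().unwrap();` before touching the user's items, serializing parallel /items/sync calls for that user while leaving other users unaffected.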

Also, I opted to use the Rocket web framework for Rust for this implementation. This is far from the optimal choice, since Rocket does not support async/await yet (though I can see great progress in that direction, and it looks like support is coming very soon) and uses a threadpool to handle connections. However, I am really attracted to Rocket's API design: it is very elegant and abstracts away a lot of the verbosity of Rust in web development. Considering that I will need a threadpool (or a queue on another thread) anyway to handle SQLite transactions, the performance hit should not be too severe, and I never intended SFRS to serve a million users simultaneously anyway.

Encryption is not Security

We are not in a time short of products that claim to "encrypt" your personal data to "military standards", thus keeping it safe from leaks or deliberate attacks. Such claims really can convince a lot of non-tech-savvy people, and even some with rudimentary computer knowledge, of how secure those products are -- until some leak happens out of nowhere and everybody gets screwed.

The problem is that using encryption does not necessarily imply security for the particular data you are concerned about. Encryption is a broad term that can be applied to anything involving an algorithm that prevents part of the population from accessing some data. Everything from a simple dictionary-based cipher to modern cryptography falls into this category, but I am not even talking about the vulnerabilities of particular ciphers here. What I am talking about is the question of exactly which part of the population you want to keep out -- in other words, the threat model.

The word "secure" itself is vague unless the context specifies a well-defined threat model. What are you afraid of? Who should be able to see your data, and who should not? How do we ensure that you are you, and not someone else faking your identity? Of course, encryption is a powerful tool for achieving almost any sort of security, but no implementation can be said to be secure under all threat models. You are using military-grade encryption right now to read this post, because my blog uses HTTPS, which encrypts all plain-text traffic; but to my server, and to me, the content still has to be decrypted, and nothing prevents me from publishing all of your IP addresses in some log format. Whether or not you consider IP addresses privacy-sensitive, it should be pretty obvious that if I claimed my website were secure against such a leak, the claim would be bogus. HTTPS defends against people spying on your Internet connection, but it does absolutely nothing about either end of the communication. It is secure under the threat model in which nobody in the communication channel except the two endpoints can be trusted, and nothing else. One cannot conclude that such use of encryption is secure under any other circumstances.

It is how encryption is used in a product that matters, not whether it is used. My absurd example above is laughable, but when such claims come from more complex or even "commercial" software products, somehow many of us forget what being secure actually means. I know it is a stretch to assume everybody can learn these basics, but frankly, in the age of the Internet, one must have this knowledge to "survive" -- that is, to keep private data safe. There are a lot of resources out there on introductory cryptography, and I am 100% sure every single one of them discusses the definition of "secure" and threat models at the very beginning. To be fair, many of them are not intended for people without a technical background, so we definitely need more such resources in simpler language. But what we also need is people who actually try to learn how to ensure their own security.

Unsurprisingly, it also falls upon developers and, in the case of commercial software, companies, to stop throwing around buzzwords that are not even well-defined in the first place (trust me, even some open-source projects do this). Do not claim your product is secure just because you, somehow, used some encryption, somewhere, without mentioning what you are defending against and how all the buzzwords contribute to that. Do not ever imply your product is more secure merely because of encryption -- explain what the encryption is for, and what adversary it could thwart. And, of course, you should always know, for yourself, what you are defending against, because some developers really do not. It is not actual security that matters here -- after all, the word is not well-defined by itself -- but the false sense of security you might instill in your users. The feeling of "I'm secure", without knowing what the hell that even means, is much, much more dangerous than any security vulnerability.

(This article was partly motivated by some Magisk module here in China that saves your payment password and auto-fills it into some payment apps upon fingerprint authentication. It claimed to be somehow more secure due to its use of encryption, but it actually just encrypts the keys with ANDROID_ID, which, though no longer the same across all applications, is still not intended for security purposes and can be predicted, given that the adversary can already read the data files. It defends against no extra adversaries compared to not encrypting in the first place, yet somehow people believe its claims, and maybe the developer really believes them too.)

The Zygon War Speech from The Doctor

(From "The Zygon Inversion", in the 9th series of Doctor Who)

...

"It's not fair."

"Oh it's not fair. Oh I didn't realize that -- it's not fair. You know what, my TARDIS doesn't work properly, and I don't have my personal tailor."

"These things don't equate."

"These things have happened. They are facts. You, just want cruelty to beget cruelty. You are not superior to people who are cruel to you. You are just a whole bunch of new cruel people. A whole bunch of ... you cruel people, being cruel to some other people, who end up being cruel to you. The only way that anyone can live in peace, is if they are prepared to forgive. Why don't you break the cycle?"

"Why should we?"

"... What is it that you actually want?"

"War."

"Ah, right. And when this war is over, when you have a homeland, free from humans, what do you think it's gonna be like? Do you know? Have you thought about it? Have you given it any consideration? Because you are very close to getting what you want. What's it gonna be like? Paint me a picture. Are you going to live in houses? Do all people go to work? Will there be holidays? Oh, will there be music? Do you think people will be allowed to play violins? Who's gonna make the violins? Well, ..., oh, you don't actually know, do you? Because, like every other tantruming child in history, Bonnie, you don't actually know what you want. So, let me ask you a question about this brave new world of yours. When you've killed all the bad guys, and when it's all perfect, and just, and fair, when you have finally got it, exactly the way you wanted, what are you going to do with people like you? The troublemakers. How are you going to protect your glorious revolution, from the next one?"

"We'll win."

"Oh, will you? Well, maybe. Maybe you will win. But nobody wins for long. The wheel just keeps turning. So come on, break the cycle."

"Why are you still talking?"

"Because I want to get you to see... and I'm almost there."

"Do you know what I see, Doctor? A box, a box with everything I need. A fifty percent chance -- for us two."

"Everyone fingers on buzzers! Are you feeling lucky? Are you ready to play the game? Who's gonna be quick and who's gonna be luckiest?"

"This is not a game."

"No, it's not a game sweetheart and I mean that most sincerely."

"And why are you doing this? ... You set this (the truth and consequences buttons) up, why?"

"Because it's not a game. This is a scaled model of war. Every war ever fought, right there in front of you. Because it's always the same. When you fire that first shot, no matter how right you feel, you have no idea who's going to die. You don't know whose children are going to scream and burn. How many hearts will be broken. How many lives shattered. How much blood will spill until everybody does what they were always going to have to do from the very beginning. SIT DOWN AND TALK. (Sighs) Listen to me, listen -- but I just, I just want you to think."

"I will not change my mind."

"Then you would die stupid. Alternatively, you could step away from that box. You could walk right out that door, and you can stand your revolution down."

"I'm not stopping this, Doctor. I started it, I will not stop it. You think they will let me go after what I have done?"

"You're all the same you screaming kids, you know that? Look at me, I'm unforgivable. Well here's the unforeseeable, I forgive you. After all you've done. I forgive you."

"You don't understand. You will never understand."

"I don't understand? Are you kidding me? Of course I understand. And you're calling this a war, this funny little thing? This is not a war. I fought in a bigger war than you will ever know, and it's the worst thing you could ever imagine. And when I close my eyes... I hear more screams than anyone could ever be able to count. And you know what you do with all that pain? Shall I tell you where you put it? You hold it tight, that burns your hand. And you say this: no one else will ever have to live like this. No one else will have to feel this pain. Not on my watch."

...

"It's empty, isn't it? Both boxes, there's nothing in them. Just buttons."

"Of course. You know how you know that? Because you've started to think. Like me. ... No one should have to think like that. And no one will. Not on my watch. ...Gotcha."

How I Unlocked Xiaomi Qin 2 Pro and Installed Phh GSI

For a guide instead of a diary, please click here

Note: This article describes the process by which I found a way to flash custom ROMs onto my Qin 2 Pro, and it is not meant to be a comprehensive guide that anyone can simply follow. If you decide to follow along, please make sure you read the entire article and have sufficient technical knowledge to do so. The unlock and flashing process is more complicated than on any Android device I have ever used. I take absolutely no responsibility for bricked or broken phones.

All steps in this article were carried out on firmware version 1.1.0 (China) on the Qin 2 Pro. Nothing else has been tested.

Edit: Please note that none of the methods below were discovered by me from scratch in any sense; they are pretty much all based on great work done by people at 4PDA. I apologize if this was not clear previously, and I would like to express my great thanks to those people, especially whoever retrieved all the necessary files and keys and made the first flashable package for this device. Without their work, I would not have ordered the phone in the first place, let alone tried to run any sort of custom ROM on it.

Background

Recently, I got interested in a phone from a little-known offshoot of Xiaomi, called the Qin 2 Pro. This is a phone that stands against the recent trend in smartphones: a small screen, no fancy in-display fingerprint scanning, not even a front-facing camera, but still preloaded with Android 9 and sporting a gorgeous thin bezel with nothing blocking the beauty. You may know that I had a k-touch i9, which is even smaller, but that phone was not really usable as an actual "phone" -- the screen is just too small to operate on. I am really into small phones, but I still want one large enough to, for example, type on. The Qin 2 Pro, to me, was very interesting because it is as if somebody brought a good old phone into the future and updated it with a modern screen, battery and operating system.

It looks particularly like the Xiaomi 2, especially the white variant, which was the device on which I began doing Android development. But even the nostalgia could not overcome the big, big bummer that held me back from placing an order: the bootloader is locked, and there is no official way to unlock it at all. This all changed when I saw a post on a Chinese forum stating that the OP had already unlocked the bootloader successfully. Almost immediately upon seeing that, I placed an order for the phone, hoping that I could finally run my favorite ROMs on this device.

Unfortunately, stupid me forgot to check the actual unlocking method, and there was none. The guy on the Chinese forum did not reveal any details about how the unlock works, let alone a step-by-step guide. At that point I already had the phone in hand, so I decided to search around in case other places held clues on how to unlock the device and install modified firmware.

The Not-So-Techy Unlock Process, and Learning Some Russian

The phone actually comes with an OEM unlock switch in developer options. However, if you try to run fastboot flashing unlock, fastboot oem unlock or any other unlock command against the phone after turning this on, it will just report that the functionality is not implemented, with an embarrassing Chinglish spelling error. Normally I would give up at this point, but since the phone comes from a questionable manufacturer (Qin is not part of Xiaomi, but more like an ODM), my gut feeling told me that there must be some vulnerability, either in the kernel or in the firmware, that we could leverage for at least an apparent unlock.

The first clue came from a Russian forum, 4PDA. As you may know, I am from China, which is not the same as Russia, which means I could not understand a single word of Russian. But after some exhaustive google-fu, I just could not find any other pages about the phone, except all those articles that talk about nothing but the specifications. Seriously, what's the point of writing a "review" that is just the spec sheet?

That aside, it seemed that I had to somehow extract information from that forum. There is a post on 4PDA that serves as the central topic for the Qin 2 (Pro). Using Google Translate, I quickly noticed a guy claiming that he could unlock the device, but again, without detailed information. Later in that thread, the same guy posted a rar archive named Android_device_unlock.rar, which sounded very appealing. Sadly, when I clicked to download the file, I just ended up with a 404 error.

You would not believe how much time I wasted by assuming the link was broken, but waste it I did. I looked all over that post trying to find further information, but people there seemed to suddenly start talking about making "pac" ROMs and removing the preinstalled Chinese apps, never revisiting the unlocking topic again.

At some point it occurred to me that maybe the link was not broken after all. Maybe it was showing a 404 only because I was visiting the forum as a guest, not with a registered account. With the help of Google Translate, I stumbled through the registration process, only to be blocked at the end by the CAPTCHA, which required knowledge of Russian. I do have some Russian friends, but I did not feel like bothering them for stupid reasons, so I tried to learn to read the CAPTCHA myself. Fortunately, I found some pretty comprehensive guides about the CAPTCHA on 4PDA itself. It turned out to consist of Russian numerals, and one guide listed all the numerals that might appear. I just had to use my pattern-matching "brain power" to read the thousands, the hundreds, the tens and finally the ones, rewrite them as ordinary digits, and voilà, the system accepted my registration.

After passing the CAPTCHA and logging in, the file was finally available for download. The archive contained a fastboot Linux ELF executable, an rsa4096_vbmeta.pem which looked suspiciously like a private key, a signidentifier_unlockbootloader.sh, and a start.sh, plus some documentation in Chinese that we do not really need. Presumably start.sh was supposed to automate the unlock process, but somehow it would not work on my machine. Fortunately, I was able to figure out the correct way to unlock using information from that script.

It turned out that the phone was indeed unlockable, but the unique token of the device must be signed with a designated private key to complete the unlock process. The token can be fetched by ./fastboot oem get_identifier_token with the device in fastboot mode (adb reboot bootloader). Note that you must use the executable from the archive; my regular fastboot binary did not work for this command, so I assume there are some vendor-specific modifications to the fastboot protocol. After the command is executed, the output will look like the following

...
Identifier token:
XXXXXXXXXXXXXXXXXXXXXXXXXXXX
OKAY [  0.017s]
finished. total time: 0.017s

The XXXXXXXXX part is the token we need. Note that on my device the token was broken into two separate lines, and it has to be concatenated into a single line without any line break or space to form the complete token (this was also learned through trial and error). Now you can execute

./signidentifier_unlockbootloader.sh ${TOKEN} rsa4096_vbmeta.pem signature.bin

with ${TOKEN} replaced with your actual token retrieved from the last step, which generates a signature.bin that the bootloader understands. Then,

./fastboot flashing unlock_bootloader signature.bin

will send the signature to the device and unlock the bootloader. The bootloader will prompt for confirmation on the device, and you have to press the volume down button to unlock.
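The three steps above can be wrapped up in a small script. The following is a hedged sketch of my own (not something from the archive), assuming the files from Android_device_unlock.rar sit in the working directory; the parse_token helper is a hypothetical name, and its only real job is joining the token lines that the device splits in two, as described above.

```python
import subprocess

def parse_token(fastboot_output: str) -> str:
    """Extract the identifier token, concatenating split lines."""
    lines = fastboot_output.splitlines()
    # Everything between "Identifier token:" and the "OKAY" status line
    # is part of the token, possibly broken across multiple lines.
    start = next(i for i, l in enumerate(lines) if "Identifier token:" in l)
    token_lines = []
    for line in lines[start + 1:]:
        if line.startswith(("OKAY", "finished")):
            break
        token_lines.append(line.strip())
    return "".join(token_lines)

def unlock():
    # The vendor fastboot prints status output on stderr
    out = subprocess.run(["./fastboot", "oem", "get_identifier_token"],
                         capture_output=True, text=True, check=True)
    token = parse_token(out.stderr + out.stdout)
    subprocess.run(["./signidentifier_unlockbootloader.sh", token,
                    "rsa4096_vbmeta.pem", "signature.bin"], check=True)
    subprocess.run(["./fastboot", "flashing", "unlock_bootloader",
                    "signature.bin"], check=True)
```

Whether the token appears on stdout or stderr may vary between fastboot builds, hence the concatenation of both before parsing.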

After some waiting, you will be greeted with an expensive paperweight. This was exactly what the guy who published the unlock files warned about, though the machine translation did not make sense to me at first. I finally understood it with the help of my Russian friends: you have to use the ResearchDownload tool to re-install the firmware and bring the phone back to life after unlocking. I assume the bootloader was never really designed for unlockability, so maybe it wiped something it should not have wiped while erasing userdata.

There are a lot of guides on how to use ResearchDownload, the tool for flashing Spreadtrum devices, so I am not going to repeat every possible detail here. Basically, after installing the Spreadtrum drivers, you fire up the tool, press the first button, choose the "pac" ROM file and wait for it to load. (Note: you might want to disable EraseUBOOT and EraseUBOOTLOG here, via the settings icon, in case anything breaks during the re-flashing of UBOOT and bricks the phone; this is optional, and if nothing fails you are always fine.) Then press the "double arrow" button, and restart your phone while holding the volume up button (on the "normal" non-Pro Qin 2 it is presumably the AI button, the one beside the red power button). The flashing process will now begin and you can release the button.

Our nice Russian friends on 4PDA provided a link to the pac version of firmware 1.1.0 for the Qin 2 Pro. Earlier in the thread, the same guy who published the unlock files also published firmware links for the normal Qin 2, along with a link to the ResearchDownload tool and drivers in the same post.

Once the firmware was re-flashed, everything started working just fine. The system boots, and developer options show that the phone is now unlocked. But I now started to understand why people on 4PDA just do not talk about this "unlock" -- the bootloader, in its unlocked state, still only allows flashing signed images. Anything not matching the original signature fails to load, leaving the flash command stuck waiting forever. At least I could now actually use the fastboot flash command instead of the ResearchDownload software, which is notoriously inconvenient and runs only on Windows, though I still could not flash anything lacking an official signature.

Flashing

Now it was time to figure out how to actually flash anything to the Qin 2 Pro. The first thing I tried, obviously, was using the ResearchDownload software to install unsigned images (replacing the ones that come in the "pac"). And obviously, this did not work. It failed with the same symptom as flashing over fastboot -- just stuck waiting forever.

The guy on 4PDA actually had some working unofficial ROMs for the Qin 2 non-Pro, but none was available for the Qin 2 Pro, and the available ones were in pac format. Wondering how he managed to make them, I downloaded his packages along with the "original" ones and tried to compare them. Again, obviously, I could not find anything useful. What made me think I could manually compare all those binary files?

Skimming through the thread again and again, I finally noticed the same guy saying something about avbtool, and that it might somehow enable flashing even without unlocking. I came to realize that this might indeed be about AVB, or Android Verified Boot, the secure boot scheme used by Android. Since the bootloader seems to verify the flashed images against "something", it must be pulling the whitelisted public keys from somewhere. They are either hardcoded, or loaded from somewhere that can be overwritten. If they were hardcoded, it would be game over; but since someone was able to make unofficial ROMs for the Qin 2, they should be overwritable. If I were to implement such a bootloader, where would I store the public keys? Well, when AVB already implements such a thing, called vbmeta, why would I reinvent the wheel? If the bootloader pulls whitelisted public keys from vbmeta, then surely they can be overwritten.

But how would one overwrite vbmeta? Android is not stupid enough to leave a hole in the system where vbmeta itself goes unsigned. As per the documentation, vbmeta is, obviously, protected by another key, which is in turn hardcoded into the bootloader itself. It's like configuring Secure Boot on a normal PC, where the UEFI firmware verifies the bootloader against a limited set of keys, and the bootloader in turn verifies the OSes with a possibly larger set of keys. Maybe the bootloader on our device does not verify vbmeta at all? I tried modifying some content of the original vbmeta image and ran fastboot flash vbmeta vbmeta.img, but it still ended up in the same infinite wait.

At this point, I was pretty much out of hope. But something quickly struck me while I was idly wandering around on YouTube -- I remembered that I had seen something with vbmeta in its name. It was the private key file from Android_device_unlock.rar used earlier to unlock the device. The name rsa4096_vbmeta.pem now looked very suspicious -- the bootloader accepted an unlock token signed by this key, and the key has vbmeta in its name. Maybe this was exactly what I was looking for: a key that can sign a modified vbmeta image.

I set out to use the avbtool command that comes with AOSP to generate an empty vbmeta image with the disabled flag, specifying rsa4096_vbmeta.pem as the key.

avbtool make_vbmeta_image --key Android_device_unlock/rsa4096_vbmeta.pem --algorithm SHA256_RSA4096 --flags 2 --output vbmeta_my.img

and tried to flash this vbmeta_my.img. And.... it did not work.

But I so firmly believed the rsa4096_vbmeta.pem was the right key that I ran avbtool verify_image against the official vbmeta.img (you can find one by unpacking the "pac" firmware from 4PDA; details on how to unpack can be Googled) with this key specified, and they did match. This told me that there must be something else I was missing that made the bootloader reject my image.

Calming myself down, I first inspected the original vbmeta image with avbtool info_image. This gave me a bunch of output

Minimum libavb version:   1.0
Header Block:             256 bytes
Authentication Block:     576 bytes
Auxiliary Block:          13504 bytes
Algorithm:                SHA256_RSA4096
Rollback Index:           0
Flags:                    0
Release String:           'avbtool 1.1.0'
Descriptors:
    Chain Partition descriptor:
      Partition Name:          boot
      Rollback Index Location: 1
      Public key (sha1):       ea410c1b46cdb2e40e526880ff383f083bd615d5
    Chain Partition descriptor:
      Partition Name:          system
      Rollback Index Location: 3
      Public key (sha1):       e2c66ff8a1d787d7bf898711187bff150f691d27
    Chain Partition descriptor:
      Partition Name:          vendor
      Rollback Index Location: 4
      Public key (sha1):       9885bf5bf909e5208dfd42abaf51ad9b104ee117
    Chain Partition descriptor:
      Partition Name:          product
      Rollback Index Location: 10
      Public key (sha1):       766a95798206f6e980e42414e3cb658617c27daf
    Chain Partition descriptor:
      Partition Name:          dtbo
      Rollback Index Location: 9
      Public key (sha1):       ea410c1b46cdb2e40e526880ff383f083bd615d5
    Chain Partition descriptor:
      Partition Name:          recovery
      Rollback Index Location: 2
      Public key (sha1):       d9093b9a181bdb5731b44d60a9f850dc724e2874
    Chain Partition descriptor:
      Partition Name:          l_modem
      Rollback Index Location: 5
      Public key (sha1):       e93e7d91ba1a46b81a5f15129b4dc5769bf41f26
    Chain Partition descriptor:
      Partition Name:          l_ldsp
      Rollback Index Location: 6
      Public key (sha1):       e93e7d91ba1a46b81a5f15129b4dc5769bf41f26
    Chain Partition descriptor:
      Partition Name:          l_gdsp
      Rollback Index Location: 7
      Public key (sha1):       e93e7d91ba1a46b81a5f15129b4dc5769bf41f26
    Chain Partition descriptor:
      Partition Name:          pm_sys
      Rollback Index Location: 8
      Public key (sha1):       e93e7d91ba1a46b81a5f15129b4dc5769bf41f26
    Chain Partition descriptor:
      Partition Name:          dtb
      Rollback Index Location: 11
      Public key (sha1):       ea410c1b46cdb2e40e526880ff383f083bd615d5

which is certainly not empty. What if the bootloader was checking that all the partitions are present? (Spoiler: this is not the reason.) I decided to first try to recreate the original vbmeta.img.

To recreate it, I needed to specify the public keys themselves, not just their sha1 values. I guessed that the public keys must be embedded in the vbmeta file, so I opened it in a hex editor. Quickly, I found something interesting

The boot part is obviously the name of the partition, but what follows it? vbmeta has no purpose other than specifying the keys (or hashes) for the partitions, so I just assumed this was the key and copied everything from the header 00 00 10 00 (which looks like some length, or the e value used in RSA cryptography) up to the next run of 00 bytes. I put the bytes in a file called keys/key_boot.bin and re-ran the avbtool command with the boot partition chained to its own key

avbtool make_vbmeta_image --key Android_device_unlock/rsa4096_vbmeta.pem --algorithm SHA256_RSA4096 --flags 2 --chain_partition boot:1:keys/key_boot.bin --output vbmeta_my.img

(note that the 1 in boot:1:blahblah comes from the Rollback Index Location)
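Copying key bytes out of a hex editor by hand gets tedious across eleven partitions. As a sketch of my own (an assumption, not what I actually ran): if the embedded blobs follow the standard AVB public-key layout -- a big-endian bit-length word (here 0x00001000 = 4096), a second header word, then the modulus and a precomputed helper value, 8 + 512 + 512 = 1032 bytes total for RSA-4096 -- the extraction can be automated by scanning for that header:

```python
import struct

KEY_BITS = 4096
HEADER = struct.pack(">I", KEY_BITS)   # b'\x00\x00\x10\x00', as seen in the hex dump
BLOB_LEN = 8 + 2 * (KEY_BITS // 8)     # 1032 bytes per embedded public key

def extract_keys(vbmeta: bytes) -> list[bytes]:
    """Return every candidate AVB public-key blob found in a vbmeta image."""
    blobs = []
    pos = vbmeta.find(HEADER)
    while pos != -1:
        blobs.append(vbmeta[pos:pos + BLOB_LEN])
        # continue searching after the blob we just took
        pos = vbmeta.find(HEADER, pos + BLOB_LEN)
    return blobs
```

Note that a bare byte-pattern scan can produce false positives in arbitrary data, so each extracted blob should still be checked against the sha1 values printed by avbtool info_image.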

I then inspected the generated vbmeta again and, to my surprise, it actually showed the same key hash for boot as the original. I repeated the process for all the partitions, after which I ended up with a command like

avbtool make_vbmeta_image --key Android_device_unlock/rsa4096_vbmeta.pem --algorithm SHA256_RSA4096 --flags 2 --chain_partition boot:1:keys/key_boot.bin --chain_partition system:3:keys/key_system.bin --chain_partition vendor:4:keys/key_vendor.bin --chain_partition product:10:keys/key_product.bin --chain_partition dtbo:9:keys/key_dtbo.bin --chain_partition recovery:2:keys/key_recovery.bin --chain_partition l_modem:5:keys/key_l_modem.bin --chain_partition l_ldsp:6:keys/key_l_ldsp.bin --chain_partition l_gdsp:7:keys/key_l_gdsp.bin --chain_partition pm_sys:8:keys/key_pm_sys.bin --chain_partition dtb:11:keys/key_dtb.bin --output vbmeta_my.img

Flashing this vbmeta image worked. I then generated my own RSA private key with openssl, ran avbtool extract_public_key against it to get a pub.bin, and replaced some of the original keys with my pub.bin. At that point the image stopped being accepted by the bootloader.

I started to question my assumptions, but did not completely give up. See, my own vbmeta.img was only a few KB in size, while the official one is 1 MB; there had to be some discrepancy. I then compared my image and the official one byte by byte, and quickly discovered that the official one has a bunch of padding after what seems to be the end of a block of data, before which the two were completely identical. At the end of the zero padding (and also at the end of the entire file), there was a seemingly random string, but its first four bytes were readable ASCII characters: DHTB. I searched for this "magic" value and indeed found something -- it is a checksum placed at the end of images, used by Samsung and Spreadtrum. There is even a tool called dhtbsign which signs boot images, but it did not work with my vbmeta images. I decided to YOLO it and ran sha256sum over my own generated vbmeta.img; the hash differed from the one after DHTB. But what if some padding was involved? The dhtbsign repository indicated the presence of padding for boot images, so why couldn't there be padding for vbmeta too?

There is a string 00 40 00 00 right after the DHTB magic and what looked like a SHA-256 checksum. Interpreted as a little-endian integer, it is 16384, which I guessed was the length of the file including the padding. I added --padding_size 16384 to the avbtool command that generates the vbmeta image and ended up with an image of exactly 16384 bytes. Running sha256sum again, this time the checksum matched the one after DHTB exactly.
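This interpretation of the footer can also be checked programmatically against the official image. A minimal sketch, assuming exactly the layout observed in the hex dump (magic, a version word, the digest at offset 8, and the payload length as a little-endian uint32 at offset 48 of the footer):

```python
import hashlib
import struct

# The footer sits in the last 512 bytes of the 1 MB vbmeta image
FOOTER_OFFSET = 1048576 - 512

def check_dhtb(image: bytes) -> bool:
    """Verify that the DHTB footer's digest matches the covered payload."""
    footer = image[FOOTER_OFFSET:]
    if footer[0:4] != b"DHTB":
        return False
    stored_digest = footer[8:40]                         # SHA-256, 32 bytes
    payload_len = struct.unpack("<I", footer[48:52])[0]  # e.g. 0x4000 = 16384
    return hashlib.sha256(image[:payload_len]).digest() == stored_digest
```

Running this over the official vbmeta.img should confirm whether the checksum really covers exactly the first 16384 bytes.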

Now it all made sense. If the bootloader requires this checksum, then surely a modified image without it would not pass. I quickly came up with a Python script to convert a padded vbmeta.img into one carrying the DHTB footer

import hashlib
import sys

# Read the padded vbmeta image (16384 bytes, produced with --padding_size 16384)
f = open(sys.argv[1], "rb")
b = f.read()
f.close()

# The DHTB checksum covers the padded image itself
sha = hashlib.sha256(b).digest()

f = open("vbmeta_signed.img", "wb")
f.write(b)

# The footer lives in the last 512 bytes of the 1 MB image
f.seek(1048576 - 512)

f.write(b'\x44\x48\x54\x42\x01\x00\x00\x00')  # "DHTB" magic plus a version word
f.write(sha)                                  # SHA-256 of the payload
# 8 zero bytes, then the payload length (16384) as a little-endian integer
f.write(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x40\x00\x00')
f.seek(1048576 - 1)  # extend the file to exactly 1 MB
f.write(b'\x00')
f.close()

This computes the SHA-256 checksum and writes it at the position where I found DHTB in the original image. I ran my own image through this script and flashed it to the device.... This time it finally worked.

I then checked whether I could still flash the boot image signed with the original keys, and it failed as expected. I then re-signed the boot image with my own private key

avbtool add_hash_footer --image boot.img --partition_name boot --partition_size 36700160 --key my_private_key.pem --algorithm SHA256_RSA4096

(36700160 is the size of the original boot.img, while SHA256_RSA4096 is what avbtool info_image shows)

Now this re-signed image is flashable through fastboot. I had, finally, successfully flashed something onto this device.

I am not posting my modified vbmeta.img because it only works with my own private key. Since this post was never meant to be a guide that even a newbie can follow, please generate your own vbmeta against your own private keys. If I ever post my own recoveries or ROMs, I will upload my version so that people can just flash them without re-signing.

Installing GSI (Generic System Image)

Now here comes the part where I doubt whether my efforts were even meaningful, because I then flashed a GSI successfully without ANY sort of signature. I just downloaded phh's GSI and ran

fastboot -S 100M flash system system.img
fastboot erase userdata

(-S 100M is needed to avoid a "data too large" error; erase is used because format does not work).

It succeeded without any error. Maybe I should have tried this right after unlocking the bootloader -- maybe the bootloader just does not check the signature of system.img whatsoever? Maybe my entire effort of hacking vbmeta.img was in vain. Or maybe this is exactly caused by my vbmeta.img, because I added the disabled flag -- but then it DID check the signature of boot.img. For now, this does not make any sense to me, but I do not care much anyway: at least I can use fastboot flash to install my custom boot / recovery images.

I rebooted the phone, and the Android boot animation showed up, but it got stuck because fastboot erase userdata leaves the data partition in an unformatted state. Since there is no usable recovery, I had to use adb shell (phh's GSI allows this by default; very useful on a device like this) to run mkfs.ext4 on the userdata partition. (This is NOT meant to be a comprehensive guide, so please find out how to format userdata yourself.) I force-rebooted the phone again, and it then booted into phh's GSI successfully, with most features working.

For now I have only tested phh's GSI 9.0. I plan to start porting recoveries and (possibly) device-specific ROMs for fun after my final exams. However, I am not entirely sure, because debugging anything on this phone is VERY tedious. The boot-to-recovery hotkey does not work unless the phone was completely off before booting, but when the system is messed up, you cannot actually turn it off -- only reboot. The remaining options are to crack the phone open and disconnect the battery (not hard, just fingernails, but tedious) or to flash the official ROM via ResearchDownload, reboot via adb, and start all over again.

Update 2020-01-01: I was able to get phh's GSI 10.0 running, though a few patches might be needed. I will send a PR to phh after I finish my exams. I will also see if I can simplify the unlock process so that more people are able to do it.

Update 2020-01-03: Unfortunately, I was not able to get TWRP running on this device. Somehow the whole thing crashes soon after the first splash screen, and without a working ramoops (the one included on this device is pretty much non-functional), I have little clue how to even debug it. adb would not start, and it seems that even the init process crashed at some point.

Update 2020-01-09: For the Qin 2 Pro, you will need the arm64-ab (treble_arm64_XXX) variant of phh's GSI; for the "normal" Qin 2, you will need arm32_binder64-ab (treble_a64_XXX).

Recent Changes of this Blog: Reverse-proxying Listed

As you may have noticed, this is not exactly the original en.typeblog.net blog. The original one has been deleted, and then recreated with most of the original posts and 301 redirections from the old URLs to the new ones.

I'm doing this partly to unify this blog with my main blog (zh_CN), typeblog.net, as part of a recent restructuring of my hosting infrastructure for personal services. Previously, the two blogs were on completely different platforms: the main one was hosted on some spaghetti code I put together before I even entered university, while this English one (or rather, random one) was simply a page on Listed, a blogging platform provided by my favourite note-taking application, Standard Notes.

The custom blogging software I made became less and less maintainable as the years went by. Most of its dependencies were outdated, it was untested on newer versions of Node.js (yes, it was based on Node), and CoffeeScript, the language I wrote it in, nearly went dead during those years before its recent revival. This is normally fine if I just leave it as-is, in the spirit of

If it ain't broke, don't fix it

But if I ever decide to move to a new hosting provider, everything starts to break down. During the last migration of my main blog, I spent nearly a day just fixing compilation alone, not to mention all the issues that surfaced after it could even run on its own. That program also depended heavily on GitHub: I had to push to a GitHub repository to add new posts to the blog, which was not exactly convenient for writing. I could not just start writing anywhere, because there was little to no synchronization capability, except pushing unfinished posts to GitHub, in public, before they even went live.

In the past month I initiated another "mass migration" of my personal services from the current provider to a new one, for various reasons. At this point, the software the main blog ran on was mostly a pile of junk that I could not even be bothered to fix. This is when I decided that I should host both of my blogs on Listed, the platform that connects directly to my notes, so that I can blog wherever I can access my notes.

Problems quickly emerged following that decision. First of all, it is not even possible to connect multiple Listed blogs to a single Standard Notes account. Every SN plugin has to occupy a unique identifier, but all Listed blogs share the same identifier when added to SN, making it impossible for the SN client to distinguish between any two blogs. Secondly, Listed lacks some of the features I would really like to retain for my main blog. I didn't care that much for this random blog, but, as some examples, I'd really like to still have a naïve comment system, and I'd also like my inserted pictures scaled properly, with click-to-enlarge, instead of taking up a whole page just because they are high resolution.

Listed does allow custom CSS, but as far as I can tell that is not enough to implement a comment system or click-to-enlarge pictures. On the other hand, I was not quite in the mood to reinvent the wheel again just yet, due to the trauma left by the last attempt. Listed, as a blogging service, does its job pretty well, and there seemed to be little reason for me to rewrite it; however, I didn't want to bother the author with a bunch of feature requests that probably only I would like, either. These "features" would only make the software unnecessarily complicated, which is a big "no" to the people at Standard Notes.

But I still want something as easy to maintain as possible for blogging, so that I can concentrate on writing instead of configuring everything and making sure nothing breaks whenever I want to write. Trust me, after doing all of that, the "writing mood" is long gone into the void. I am a self-hosting guy, but not for my blog. I want my blog to be available even during maintenance of my main server. I want to be able to write on it even while I am working to fix the server, or during boring server upgrade sessions. I even keep some quick notes to myself that I would like to access when the server is broken. The Standard Notes server software is a good example of something easy to maintain for self-hosting, and the writing side of the client does not really need access to the server all the time (only for synchronization), making it "always available" -- that is why I am thinking of writing everything with it. Forking Listed and running my own copy? That is some extra maintenance I would rather not get into right now; basically, it is not much different from maintaining the custom blogging system.

It then occurred to me that maybe I could just reverse-proxy my blog on Listed and insert the extra elements I need into the page, instead of modifying Listed itself. This way, both Listed and my customizations can be kept very simple: neither reinvents the wheel already built by the other. Listed would not have to care about my strange needs, and my customizations could be just several lines of Nginx configuration. At least, that was what I thought.

So I spent some time trying to use Nginx's sub_filter to replace the plugin identifier in the JSON response and also add styles and scripts to the page. Then I somehow stumbled upon HTMLRewriter from Cloudflare Workers, a "serverless computing" provider. I'm not into any of those buzzwords, but this API was really interesting to me because it makes rewriting HTML documents quite easy. No more messing around with regular expressions (which cannot really parse HTML), and I would not have to configure and maintain my own server for my blog; plus, it is JavaScript, so JSON parsing and preprocessing is a breeze. All I needed was to put together several lines of script and push them to their servers, and I was good to go.

At this point, I could migrate my main blog to Listed while retaining all the features I wanted, with a simple reverse proxy in front running on Cloudflare. I have also put together a simple comment system based on Cloudflare Workers, inserted into the page via HTMLRewriter as well. No part of the blog itself needs my maintenance, and I can still stay comfortably within the ecosystem of Standard Notes. Of course, running everything on Cloudflare is generally a bad idea, but my blog is just for distributing public posts, and even the user comments are nothing security-critical (the comment system does not even need a password). I'm totally fine with it, especially if it enables me to concentrate on writing, not blog software maintenance.

That was my main blog; after the successful migration, I thought I should apply the same treatment to this English version too. After all, the picture problem I mentioned earlier was noticed entirely on this blog: I had an article with a picture from Wikipedia, and that single frickin' picture took up the entire screen. Unfortunately, the original English blog (the predecessor to this one) was bound to the en.typeblog.net domain via the official Listed server, and I could not remove that binding from the Listed control panel myself. Instead of sending an email and waiting for staff to remove the custom domain, I decided to just delete the original blog entirely and create a new one from scratch, with all the articles preserved and 301 redirections set up for their respective URLs. This proved totally unnecessary: after the deletion, the Listed server still accepts requests for en.typeblog.net, and will still try to request an HTTPS certificate for my domain when it needs to, only to find that the domain is no longer bound to it and end up in errors. I still had to email them to notify them about the removal. But hey, that is what I did, and it is not reversible. And that is where we are now.

I'm not open-sourcing my reverse-proxy worker because, hey, it is nothing valuable. It is so simple that anyone could probably put it together in around half an hour. I do not want people to start using it to work around the Extended limitation on custom domains on Listed, either -- though that is literally what I am doing right now, except that I am actually an Extended subscriber, which kind of justifies it, at least for me. Since I write in multiple languages, it really does not make sense that I cannot add multiple Listed blogs to one SN account...

Reflecting on the decision, it seems that I was just offloading maintenance to others, namely Cloudflare and Listed. It is not like anything is truly "serverless"; it is just that someone else will fix things for me if they ever go wrong. My scripts are so simple and maintenance-free only because I pay someone (yes, I am on the CF Workers paid plan, and a Standard Notes Extended subscriber) to maintain a stable API and environment. I will most definitely still host security-critical systems myself, like the Standard Notes server itself and Mastodon. My blog is the one non-security-critical exception that I would rather have someone else host for me, after so many years of frustration. I hope I can really start to write something, more frequently, from now on.

Make Linux (Xorg / Wayland) Great Again on Touchscreen Devices

As you may or may not know, I have got a brand-new Surface Pro 6 in place of my GPD Pocket, partially due to the unbearable slowness of Atom processors. It is easy to see that I am not a Microsoft guy, especially not a Windows guy, so I almost immediately wiped the OEM Windows and installed ArchLinux on my Surface. Why did I bother buying a Microsoft device in the first place, you ask? Well, somehow I just wanted a 2-in-1 device, and this seemed the only usable choice.

I have not written a blog post on how to configure the whole thing for Linux yet, and I might do it later. For now, I would like to focus on one aspect of the configuration: the touch screen. Everybody knows that not many desktop OSes work well on a touch screen -- possibly only Windows works fine. macOS lacks support for touch screens as a whole, probably because of Apple's strategy of avoiding competition with themselves. On Linux, there is basic touch screen support in almost every desktop environment -- touch at least works for pointer events. However, you are out of luck if you want to use your device purely through the touch screen, without a mouse or trackpad: you cannot even bring up right-click menus or type words into text boxes (the Qt on-screen input method overrides the IME and is thus not very useful for CJK speakers like me).

For on-screen keyboards that are compatible with other IMEs, there are a ton of solutions, and I'm not going to reinvent the wheel. But for right-click menus, I haven't seen much, except that some applications try to make it work by treating long-press events as right-clicks. It seemed to me that the only way forward was to implement something on my own -- something that works across desktop environments, so that I won't have to write everything again if I switch desktop environments or migrate to Wayland in the future.

Evdev

Since I want to make something that does not depend on a specific display system or desktop environment, LD_PRELOAD hooking magic is out of the question for this purpose. It is entirely possible to hook into the input processing logic of libinput, which most desktops use nowadays and which I have done before for my GPD Pocket, but for something even slightly complicated like this one, the hook would depend on too much of the internal logic of libinput, and it might even be version-specific. If I chose to do this, I might as well fork my own version of libinput and maintain it myself, which could even be easier.

The only solution here seems to be going down one or two levels inside the Linux input stack. An image on the Wikipedia Wayland page shows the current Linux input driver stack clearly:

The input driver stack of the current Xorg display server is also similar, due to the adoption of libinput as the default input abstraction layer. What lies directly below the libinput library is the evdev driver of the Linux kernel, which exposes device nodes in the form of /dev/input/eventX for each enumerated input device.

These device nodes are character devices, readable by any program with sufficient privileges, and access is not exclusive, which means we can detect input events without hacking into any library at all. All that is needed is a privileged daemon that reads from the evdev devices just like libinput does, and the detection of long-press events is pretty trivial to hand-roll -- something I believe most programmers have done at some point. Besides, a user-space library, libevdev, can be used to parse the events from those character devices, further reducing the work we need to do.
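To make this concrete, here is what those character devices actually emit: a stream of fixed-size struct input_event records, which libevdev normally parses for you. A minimal sketch of decoding one by hand with nothing but the Python standard library (the "llHHi" layout assumes a 64-bit Linux system; the constants come from linux/input-event-codes.h):

```python
import struct

# struct input_event: struct timeval (sec, usec) followed by type, code, value.
# "llHHi" matches the 64-bit Linux layout; libevdev hides this detail for you.
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

EV_KEY = 0x01        # key / button event
BTN_TOUCH = 0x14a    # "finger is down" button reported by touchscreens

def parse_event(buf):
    """Decode one raw record as read from /dev/input/eventX."""
    sec, usec, etype, code, value = struct.unpack(EVENT_FORMAT, buf)
    return {"time": sec + usec / 1e6, "type": etype, "code": code, "value": value}

# A daemon would loop on os.read(fd, EVENT_SIZE); here we decode a packed sample.
raw = struct.pack(EVENT_FORMAT, 1700000000, 500000, EV_KEY, BTN_TOUCH, 1)
ev = parse_event(raw)
```

A BTN_TOUCH event with value 1 marks a finger going down, and value 0 marks the release -- exactly the two moments a long-press detector cares about.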

The next problem is how to simulate a right-click event after a long-press has been registered. As it happens, those evdev devices are not only readable but also writable, and whatever is written gets injected into the normal event stream of the corresponding input device. This property is very useful to us, and the Linux input subsystem also comes with a /dev/uinput device node that allows setting up new virtual evdev devices directly from userspace. Either way, simulating input is as simple as writing events into the corresponding device nodes, which is also well-supported by libevdev for convenience.

A Python Toy

There is a simple binding of libevdev for Python, aptly named python-evdev. With it, you basically create an InputDevice instance pointing to the evdev device node, and then write an asyncio loop calling InputDevice.async_read_loop() to iterate over all the events emitted by the device.

For faking the input device, python-evdev also provides an interface for the UInput part of libevdev -- it can even copy the capabilities of any given InputDevice to create an almost identical fake input device, which is extremely convenient for our purpose.

For some reason, the pen and the touchscreen on my Surface Pro 6 show up as different evdev device nodes. Fortunately, everything in the python-evdev bindings is compatible with Python's asyncio interface, so we can simply run multiple async loops, one for each device that needs right-click emulation.

With these in mind, I quickly threw something together. I've put the code on Gist, but basically it's just the following snippet:

# dev is an evdev.InputDevice; trigger_task, pos_x and pos_y
# are initialized earlier in the full script.
async for ev in dev.async_read_loop():
  if ev.type == ecodes.EV_ABS:
    abs_type = ecodes.ABS[ev.code]
    # Track the position of the touch
    # Note that this position is not 1:1 with the screen resolution
    if abs_type == "ABS_X" or abs_type == "ABS_MT_POSITION_X":
      pos_x = ev.value
    elif abs_type == "ABS_Y" or abs_type == "ABS_MT_POSITION_Y":
      pos_y = ev.value
  elif ev.type == ecodes.EV_KEY:
    tev = KeyEvent(ev)
    if tev.keycode == 'BTN_TOUCH':
      if tev.keystate == KeyEvent.key_down:
        # Touch down: (re)start the delayed right-click task
        if trigger_task is not None:
          trigger_task.cancel()
        trigger_task = asyncio.get_event_loop().create_task(trigger_right_click())
        pos_last_x = pos_x
        pos_last_y = pos_y
      elif tev.keystate == KeyEvent.key_up:
        # Touch released: cancel the pending right click
        if trigger_task is not None:
          trigger_task.cancel()
where trigger_task is simply an asynchronous task that triggers a right click after a certain delay. This ensures that the right click only happens if the touch isn't released within a certain interval -- that is, a real LONG press.
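The snippet above omits trigger_right_click itself; the gist of it is a cancellable asyncio timer. A self-contained sketch of that pattern (the names and the delay value here are my own, not from the actual script):

```python
import asyncio

LONG_PRESS_DELAY = 0.05  # seconds; the real script makes this configurable

async def delayed_right_click(fire):
    # Sleep for the long-press delay; if the touch is released first,
    # this task gets cancelled and fire() never runs.
    await asyncio.sleep(LONG_PRESS_DELAY)
    fire()

async def simulate_touch(hold_time):
    """Simulate a finger held down for hold_time seconds; return whether
    a right click would have been emitted."""
    fired = []
    task = asyncio.ensure_future(delayed_right_click(lambda: fired.append(True)))
    await asyncio.sleep(hold_time)  # BTN_TOUCH is down for this long
    task.cancel()                   # BTN_TOUCH released
    await asyncio.sleep(0)          # let the cancellation settle
    return bool(fired)

tap = asyncio.run(simulate_touch(0.01))        # short tap: no right click
long_press = asyncio.run(simulate_touch(0.2))  # real long press: right click
```

Cancelling an already-completed task is a no-op, which is why the release handler can cancel unconditionally.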

While debugging this code, I ran into the problem that the resolution of the touch device is not 1:1 with the screen resolution -- it is usually much higher than what the screen can offer. Since it is impossible to keep a finger perfectly still while long-pressing, a fuzz within a certain range must be allowed, but the right value differs wildly between devices because of the different resolution ratios between the touch device and the screen, and also because of possibly HiDPI screens. To calculate the ratio and determine the DPI of the screen itself, some interface would have to query the actual display server in use, which conflicts with my goal of staying display-server-independent.

What I ended up doing was simply reading the maximum allowed fuzz from an environment variable, so every user can adjust one variable if they find the sensitivity of the long-press detection uncomfortable.
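In other words, the movement check reduces to a threshold comparison in raw touch-device units. Something like this (the env var name and helper are illustrative, not from the actual script):

```python
import os

# Maximum finger drift (in raw touch-device units, NOT screen pixels) still
# counted as "holding still". The environment variable name here is made up.
FUZZ = int(os.environ.get("RIGHTCLICK_FUZZ", "150"))

def still_within_fuzz(start, current, fuzz=FUZZ):
    """True if the touch point has not drifted more than `fuzz` units
    on either axis since the long-press timer was started."""
    (x0, y0), (x1, y1) = start, current
    return abs(x1 - x0) <= fuzz and abs(y1 - y0) <= fuzz

# The daemon cancels the pending right-click task once this returns False.
```

This is why pos_last_x / pos_last_y are saved on touch-down in the snippet above: they are the reference point for the drift check.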

Another problem is that this program might conflict with applications that already implement right-click emulation, e.g. Telegram and Firefox. These applications are the ones obeying the rules, and breaking them is not something a compatibility script should do, at least not explicitly. The reason for the conflict is simple: the script emulates another right-click when the right-click action has already been triggered. The extra click, though still a right-click, may cancel the previously-triggered action, such as a context menu, if the click lands outside of it. I could not find an obvious solution to this problem, so I simply made the delay of my script shorter than the delays used by most of these applications (it could also be made considerably longer -- just not nearly the same). This way, the right-click our script triggers will not happen too close to the right-click emulation implemented by the applications, giving us some room to avoid the conflict manually (e.g. by lifting the finger sooner).

Native Implementation

Python is a great language for quick prototyping. However, after playing with the prototype script, I found some strange things happening, probably due to bugs or limitations in the python-evdev binding. The most annoying one is that the UInput interface seems to stop working after running for a while, without any obvious error. The program doesn't crash -- it just stops sending emulated events without any visible clue. Restarting it fixes the problem immediately, and I tried reducing the program to its simplest form, only to find that this still happens randomly.

Exhausted by the debugging process, I wrote a simple C program to see if the same thing happens when calling directly into the native libevdev APIs. It doesn't -- the C test program worked perfectly after an entire day of use. At this point, I had already lost interest in figuring out what on earth is wrong with that Python binding -- it seemed better to just rewrite the whole thing in C. After all, running a Python interpreter all the time just for this simple feature never seemed like a good idea to me; something in native code would be much more elegant as far as I am concerned. Of course, a kernel module would be even better, but that seemed like overkill.

To implement the same functionality in C, we have to use something like select or epoll to poll on the fds opened from the /dev/input/eventX devices. When an fd becomes readable, an event can be pulled through the libevdev interface, just as in the Python bindings. For the delayed task, I used a timerfd, provided by recent versions of Linux, which is simply an fd that becomes readable after a set interval -- working perfectly in a select or epoll context.
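The timer logic can be modeled in a few lines even outside C: the pattern is simply "wait for the next input event, but never longer than the time left until the long-press deadline". A rough sketch of that select loop (in Python for brevity, with a pipe standing in for an evdev fd; in the C version a timerfd replaces the recomputed timeout and sits directly in the polled set):

```python
import os
import select
import time

def long_press_fires(fd, delay):
    """Wait on fd (a stand-in for an evdev fd) until the long-press deadline.
    Returns True if the deadline passed first (emit a right click),
    False if the fd became readable first (a release event arrived)."""
    deadline = time.monotonic() + delay
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return True
        readable, _, _ = select.select([fd], [], [], remaining)
        if readable:
            return False

r, w = os.pipe()
fired = long_press_fires(r, 0.02)       # nothing arrives: the timer wins
os.write(w, b"\x00")                    # a "release" event shows up
released_first = long_press_fires(r, 5.0)
```

The deadline is recomputed on every iteration so that spurious wakeups cannot stretch the long-press interval.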

With all of this in mind, everything went smoothly when implementing the C version. This time, I exposed more configurable options via environment variables, e.g. the delay of long-press events, the fuzz, and a blacklist and whitelist of devices the program will listen on for long-press events. I've also implemented a touch device detector based on device capabilities, so you can expect it to work out of the box without messing with your non-touch mouse inputs. Unfortunately, this doesn't work when you have a touchpad, because a touchpad looks exactly like a touchscreen at the evdev level. It also doesn't take dynamic device changes into consideration.

Anyway, it was a fun programming exercise, considering that I had never written a proper C program before, let alone used things like select and timerfd. The code is on my Gitea instance, and an AUR package is also available (a compiled version is in the ArchLinuxCN repository).

Final Thoughts

Linux desktops are kind of a love-hate relationship for me. They are flexible and configurable, but sometimes they just miss that one critical feature that I would not die without but would be annoyed by. Things get worse when there are multiple competing solutions to one problem, none of them compatible with each other, and none of them a fully working solution.

I'm not blaming anyone for this. After all, the Linux desktop community is still a hobbyist community -- we are always seen as 'nerds'. I just want it to become better and, if possible, to remove some of the obstacles that annoyed me, so that they won't annoy anybody else. Plus, it was really fun to actually implement something in C, a language I dared not touch until this day.

Hello World Again

This should be the first new article on this blog since the new domain https://en.typeblog.net went online.

Long story short, I've not been writing or doing anything productive for a while due to some "emotional" problems. I'll probably write something about that in the future, but for now I might be feeling better and might be able to restart my blog, starting with this post.

This is not my main blog; rather, it's just an alias domain for the Listed service of StandardNotes, which is far more convenient to maintain since it connects directly to my StandardNotes notebook. The comfortable writing experience is part of my motivation to restart blogging and try to become normal again. I should post something about the FOSS and self-hostable note-taking tool StandardNotes soon -- it has become much better since I last used it, before my "emotional" problems arose.

Updates here might be more frequent, because I'll post whatever I am thinking about directly from my personal notes (of course, only the stuff I think is appropriate to publish). Some of them might not even be complete articles, but rather drafts or ideas for future articles that I think I should share beforehand. In addition, since my main blog, https://typeblog.net, is Chinese-only, this will also be the place where I post English-only content.

You can subscribe to this English version of my blog via e-mail from the home page. The only drawback is that I can't integrate it with my ISSO comment system to receive feedback and discussions. Some may prefer it this way, but personally I'd say the fun of blogging partly comes from those discussions. I'll probably investigate the possibilities here soon.

So, yeah. Hello, world, again.

Troubleshooting a mysterious Mastodon bug: the Accept-Encoding header and federation

The story

As you may all know, I am the administrator of a Mastodon instance, https://sn.angry.im. One thing that is really fun about this job (and every SysAdmin job) is that you run into different problems from time to time, sometimes without doing anything and sometimes after an upgrade.

Last week, Mastodon v2.4.0 came out, and I, along with my friend, the admin of https://cap.moe, decided to upgrade to the new release as quickly as possible. Since there was nothing breaking in the new version, it didn't take long before we both finished executing a few Docker commands and restarted into the new version. As usual, we each tried to post something to ensure that everything still worked after the upgrade, and this is where things started to break.

We first noticed that I could not see anyone from cap.moe on my home timeline, while he could see everyone from my instance on his. We thought this was a subscription problem, so we both ran a resubscription task in the administration panel of our Mastodon instances. That did not fix anything. We then tried mentioning each other in a toot to find out whether it was a timeline logic error, but it was not. Still, he could see me, but I couldn't see anyone on his instance.

One interesting thing is that, since some other instances, for example pawoo.net, could see both of our instances' posts, I could simply retoot one of his toots on pawoo and the toot would reach my instance within several seconds. I didn't know what this meant, but it was really something interesting.

Since other mysterious bugs had happened before and just magically fixed themselves after a while, I decided it was a good idea to leave it alone and see if things went back to normal. Now it is a week after the initial upgrade, nothing has changed throughout the entire week, and I can't bear a Mastodon timeline without the jokes from the fakeDonaldTrump account on cap.moe to fill my spare time anymore. I finally decided to troubleshoot this "bug".

Attempts

My first idea was that it could be caused by some errors in the task queue or something in the database, both of which could be reset by applying an instance block and removing it after everything related is cleared from my instance -- at least that was what I believed. This, obviously, was not the case. After removing the instance block, everything was still as it was before. Mastodon provides no support for really removing users anyway, at least not in the database. As the admin of cap.moe put it:

This is completely suicide attack.

If you are an administrator, never attempt anything that works like a suicide attack, because it solves nothing and only adds complexity.

The only option left was to dump all the traffic and see what was wrong with the requests. As I already knew, the ActivityPub protocol, which Mastodon relies on, uses active pushes rather than passive pulls to distribute messages. Thus, it could be something on my side preventing the push from succeeding. I decided to capture all the traffic with tcpdump and inspect it using Wireshark.

Since all the traffic of my Mastodon instance is HTTPS-encrypted behind a reverse proxy, I could only dump the traffic between Nginx and the upstream, then feed all of it into Wireshark to filter by HTTP headers. This was a pain, but I eventually did it and learned something from the traffic: my instance was replying with 401 Unauthorized to the pushes from cap.moe.

A little inspection into the source code indicated that this error is linked to signature verification. Each ActivityPub message needs to be signed by an actor's private key, and the signature can be verified using the corresponding public key. I assumed that this could only be caused by database errors -- my database must have stored a different public key from the original one, either through an error during the database upgrade or some random cosmic radiation. I checked the public key by running

account = Account.find(id_on_cap_moe)
account.public_key

in the Ruby console of Mastodon. I also asked the admin of cap.moe to run the same command with the corresponding id on his own instance, and then we compared the output. Unfortunately, the keys were exactly the same -- this couldn't be the problem either.

The solution

With all the attempts above having failed, I decided to compare the request of a successful delivery with a failed one. I tooted something on pawoo and then on cap.moe, while keeping tcpdump running. After that, I fed the dumps to Wireshark as usual and followed the individual HTTP streams. The Signature header drew my attention.

This is the header in the failed request:

Signature: keyId="https://cap.moe/users/PeterCxy#main-key",algorithm="rsa-sha256",headers="(request-target) user-agent host date accept-encoding digest content-type",signature="ZC4c0wxPRn+RVYTeAaPjEgA3PDW/jHQ3CdUSn3u+mH2HUxsiQV3TV0dObzC4Z9VGOmY0ZE0cbQ9KiketDxPAq99InDnDjJ49aUT6/L0gSXJQlpM4SGGT8VyipkFm/dzoxbJ8jiT9WjcrXwD1/sJV4IvuA0LJs96mRkuexykguSu2PefvS7PTw5ufAxGTWn3YmtvkMeYLBi5V7LUz3xcONe2iqcSO6hKZ77puTvvWJZgfeNxMyoRXyrcrKUSUZhgfR8z7rwPgxvcoigfiL/SH0xrKyBIdO6HjjjuMsTOSa4xRsrGgopowpAx19ya83YiTRdvkO720u3Dy3ZsWifoRCw=="

And this is the one in the successful request from pawoo:

Signature: keyId="https://pawoo.net/users/PeterCxy#main-key",algorithm="rsa-sha256",headers="(request-target) user-agent host date digest content-type",signature="Esf8TAlrYId7XhP7AKlRdGTz+tWXT+/ehYCrCLKCgx3UWPxnzNBssawr7oG5xPuB1QU/TLw6M09Rp9pd+0+F20GaEVUE2UTLNwKDizDbEj2XmK7RjEE4ys3Md1b8E+d4YbTVnUWqi0WnufUNTrjLCdyPCPHn3fqJ5Bv9/W4aUDF+nFbJAZr2n1cmu6Nb28nhS1PQAz7AzzsZy/Du+R6S3x91OjRMIa7Xt1EgLWH6/TEchUsxiP78QKZIbzIlEca+BhWCQiQ2qjO+VtwNDDypqh9HheNn23iuy4xm6hKwjHiVVkfekbEK47fNRXH5fakhmHmN7Zl813lrotkIGbDrdA=="

Notice that the headers field in the failed signature indicates that the accept-encoding header is also signed, while it is absent from the successful request.
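For context, in the HTTP Signatures scheme Mastodon uses, the headers field lists exactly which headers are concatenated into the string that gets signed. A simplified sketch of rebuilding that signing string shows why altering any signed header breaks verification (the real receiver then checks an RSA-SHA256 signature over this string; the sample values below are made up):

```python
def signing_string(headers_field, method, path, request_headers):
    """Rebuild the string covered by the signature: one 'name: value' line
    per entry in the Signature header's `headers` list, in order."""
    lines = []
    for name in headers_field.split():
        if name == "(request-target)":
            lines.append(f"(request-target): {method} {path}")
        else:
            lines.append(f"{name}: {request_headers[name]}")
    return "\n".join(lines)

headers = {"user-agent": "http.rb/3.2.0", "host": "sn.angry.im",
           "date": "Sun, 20 May 2018 00:00:00 GMT",
           "accept-encoding": "gzip",
           "digest": "SHA-256=dummy", "content-type": "application/activity+json"}
field = "(request-target) user-agent host date accept-encoding digest content-type"

original = signing_string(field, "post", "/inbox", headers)

# If a reverse proxy rewrites Accept-Encoding before Mastodon sees it,
# the receiver reconstructs a different string and verification fails.
proxied = dict(headers, **{"accept-encoding": ""})
tampered = signing_string(field, "post", "/inbox", proxied)
```

The pawoo request simply never listed accept-encoding in its headers field, so my proxy's rewriting did not affect it.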

Now I knew what was wrong: I was erasing the Accept-Encoding header in my Nginx reverse proxy configuration! This was due to my use of sub_filter, since I needed to insert something into Mastodon's HTML but was too lazy to modify the source code and rebuild the Docker image myself.

The solution seems easy now. Originally, my Nginx configuration included

proxy_set_header Accept-Encoding "";

Since I do still want to use sub_filter for HTML pages, I changed it to

set $my_encoding $http_accept_encoding;
if ($http_content_type != "application/activity+json") {
  set $my_encoding "";
}
proxy_set_header Accept-Encoding $my_encoding;

This erases the Accept-Encoding header except when the content type is application/activity+json, the type used for communication between Mastodon nodes.

After saving and reloading the Nginx configuration, everything worked fine again.

The cause and more questions

After asking the maintainer of Mastodon, @[email protected], I figured out where this problem was introduced:

https://github.com/tootsuite/mastodon/pull/7425/commits/4de98db0312de2a45d8f08d6f6611ebc64eed8b1

This pull request added direct support for gzip compression in Mastodon, thereby bringing the Accept-Encoding header into the signature. My erasure of this header, obviously, broke the signature check and caused all of this.

However, these questions are still not answered after all of these:

  1. Why am I only losing federation with some v2.4.0 instances but not all of them? The feature seemed to be enabled by default, with no way to disable it.
  2. What's the point of including this header in the signature?

I couldn't find the answers on my own, and I decided not to keep digging, because nothing is wrong now.

And that's it, the process of troubleshooting a mysterious bug.

"Blocklists"

There just really can't be any idea worse than blocklists.

As a Mastodon instance administrator, I've watched the growth and popularization of Mastodon as a decentralized social medium, especially after the recent Facebook data leakage case. This couldn't be a better phenomenon for us, since we have always hoped that people would one day wake up from the dream that large entities, such as governments and companies, would ever protect their freedom and/or privacy. However, as the number of users and administrators of Mastodon increases, unexpected things also happen, because some users simply followed others onto Mastodon without knowing what they are actually doing. One of these things is the emergence of Mastodon blocklists.

I saw such a blocklist for the first time in a Mastodon post, published as an article on Telegraph [1]. To be honest, it was really disturbing at first sight, because I was not expecting this to happen so soon on Mastodon -- I had just been talking about the possibility of such things with my friend that very morning. Not surprisingly, this blocklist is, just like every other blocklist I've seen, full of personal prejudice and unjustified or unclear criteria. What's more disturbing is that people are actually requesting that Mastodon introduce auto-subscription to these blocklists [2], with unmanned scripts downloading and applying every line of a blocklist published by some unknown and possibly prejudiced guy.

To make it clear, I am personally totally fine with the idea of domain blocks / account blocks, which has been present in Mastodon for a long time. These are essential tools for some Mastodon instances to stay legal, because instances have different values and different applicable laws. To maintain federation, these differences must be respected. What I am entirely against is brainlessly taking some random guy's blocklist and applying it blindly to your own instance, believing that the list completely corresponds to your own values, and thinking that you have thereby avoided a lot of extra work blocking SPAM / child porn / ... instances and accounts.

Once people got the power of "control", they're making there own place where they escape from before, there is nothing new under the sun.

This was the response from my friend @AstroProfundis on this issue.

Truly, there is nothing new under the sun. It was not long ago that an activist on Twitter was blocked by a popular blocklist that everyone just blindly follows [3]; people are fleeing from Twitter and Facebook because of their overwhelmingly centralized power, and now people are again building their own centralized kingdoms using blocklists, pretending that every instance is still independent even when they all use the same list of blocked users and domains. Well, unless you call them federal laws.

What are we hoping for from a federated social medium in the first place? Think about it. To me, it's the ability to scatter users across different instances with diverse values and views of the world. It's the possibility that, if several instances are compromised or act against what users want, users can simply switch to others and still get the same happy life as before. It's also the opportunity for every minority group to have their voice conveyed throughout the entire Fediverse. Sure, instances can each have their own blocking rules, but those will never affect the Fediverse as a whole, and, as I personally believe, there will never be a consensus so wide that most instances block a particular group of people. And our lovely, well-crafted blocklists will completely ruin all of this.

I've run my own e-mail server before -- e-mail being a federated protocol with an idea similar to Mastodon's -- and what I discovered is that, with the blocklists, one is essentially prevented from doing so if he / she wants the e-mails to be delivered properly to most e-mail hosts. These lists, by trusting popular IPs and distrusting unpopular ones, essentially favor gigantic hosts that own the resources to run complex machine-learning-based filtering algorithms on their outgoing e-mails. (Or even filter the outgoing e-mails by hand? Huh.) Moreover, once blocked, the process of disputing and getting unblocked is overwhelmingly hard and complex for any individual e-mail host to get through. Yes, there are multiple lists following seemingly different standards. Yes, there are ways to get yourself unblocked provided that proper justification is given. Do these make any difference? No. Even North Korea says that its people can dispute its jurisdictional decisions -- despite the fact that this would never work.

I really hope there will be some study on how well these blocklists reflect their criteria as written on paper, without much prejudice. Since there has been none, I can only conclude from personal experience that such blocklists tend to become prejudiced as they grow. This includes a blockbot that appeared recently in the Chinese community of Telegram users, which blocked a bunch of innocent people just because their ideas conflicted with the maintainer's. Our lovely followers of this bot, without knowing anything, blocked these people from every group they control.

Blocking is a destructive operation. It should be the last resort after communication has failed, rather than something to be automated and blindly followed. If the maintainers of blocklists called them Hatelists, I would be completely fine with them, since by doing so they would be actively informing people that the lists include personal opinions and are not something to subscribe to without further thought. As long as they are still called Blocklists, I will say a big, big "NO" to them.

Dear Mastodon administrators, please always remember: unless you share the same values with the maintainers of a blocklist now, forever, and for all the foreseeable future, do think twice before you follow someone in blocking a domain or a user. Do not ruin the Fediverse with your own hands.

Because I really don't know what will be the next Mastodon Fediverse to go to.

References

  1. Blockchain Blocklist Advisory
  2. PR #7059: Domain blocking as rake task
  3. When do Twitter block lists start infringing on free speech?