Reversing the fanfiction.net Android App

FanFiction.net is maybe the largest repository of fanfiction there is online. They are very protective of their content, practically banning scrapers and third-party apps for their site through their TOS (which is understandable for an ad-based website with that many users). Because of this, there have been few alternatives for reading fanfiction.net content on mobile beyond the mobile website. That is, until fictionpress released the official app.

Since the old website is mostly HTML-based, I was interested in how the official app communicated with the servers. There is no official API to fanfiction.net, but there had to be some kind of internal service for the app. I did most of the necessary reversing work in March 2017 and completed it in January 2018.

The purpose of this article is to give a general outline of the reversing process, for reference when future app updates have to be reversed again. A basic description of the (quite interesting) communication protocol employed by the app is also given, to make sniffing the protocol possible without putting too much effort into static analysis of the Android app. Because of time constraints and the limited usefulness I have not spent more time on this - this text may help in picking up where I left off.

First Steps

The first step in any protocol reversing task is to capture the traffic with Wireshark. While the traffic is likely to be encrypted in some way, it helps to know which servers and API hosts the app communicates with. When interacting with the app, there is one stream that is particularly interesting:

443 is always a nice port to see, as it points to some kind of HTTPS API that is usually fairly simple to use. However, there is something odd about this packet dump: in normal SSL you'd expect to see some kind of "handshake" before the actual stream starts. Instead, the app appears to immediately start communicating using something that Wireshark labels as "Continuation Data" (which it isn't).
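To make the "missing handshake" observation concrete, here is a minimal sketch of the check one can do by hand on the first bytes a client sends. A TLS connection opens with a record of type 22 (handshake), a version field whose major byte is 3, and a handshake message of type 1 (ClientHello). The function name and the sample bytes are made up for illustration and are not taken from the capture.

```python
def looks_like_tls_client_hello(first_bytes: bytes) -> bool:
    """Return True if the bytes start like a TLS record carrying a ClientHello."""
    if len(first_bytes) < 6:
        return False
    content_type = first_bytes[0]    # 22 (0x16) means "handshake" record
    version_major = first_bytes[1]   # 3 for SSL 3.0 and every TLS version
    handshake_type = first_bytes[5]  # 1 means ClientHello
    return (content_type == 0x16
            and version_major == 0x03
            and handshake_type == 0x01)


# Header of a typical TLS ClientHello record (illustrative bytes only).
tls_start = bytes([0x16, 0x03, 0x01, 0x02, 0x00, 0x01])
# Made-up bytes standing in for the start of the app's opaque stream.
opaque_start = bytes.fromhex("9f3c5ab27710")

print(looks_like_tls_client_hello(tls_start))
print(looks_like_tls_client_hello(opaque_start))
```

Random-looking data will occasionally pass this check by chance, so it is only a quick first filter, not proof either way.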

Attempting to connect to this server using curl also does not work:

* Rebuilt URL to: https://173.205.184.56/
*   Trying 173.205.184.56...
* TCP_NODELAY set
* Connected to 173.205.184.56 (173.205.184.56) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 173.205.184.56:443
* stopped the pause stream!
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 173.205.184.56:443

This isn't an HTTPS server at all. The actual data stream is also not very telling: it looks (mostly) like random data, so it is probably encrypted in some way.
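"Looks like random data" can be backed up with a rough number. The sketch below computes the Shannon entropy of a byte string: values close to 8 bits per byte suggest encrypted (or compressed) content, while plain text protocols score much lower. The sample inputs are invented for illustration; in practice you would feed in the payload bytes exported from the capture.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Invented samples: a repetitive plain-text request and random bytes.
plain_text = b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n" * 50
random_like = os.urandom(4096)

print(f"plain text:  {shannon_entropy(plain_text):.2f} bits/byte")
print(f"random-like: {shannon_entropy(random_like):.2f} bits/byte")
```

High entropy alone cannot distinguish encryption from compression, so this only supports the guess; it does not confirm it.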

Analyzing the app

Now that traffic analysis has led to a dead end, it's time to actually start with the reversing. For static and dynamic analysis of Android apps, the wonderful apktool is very useful. It can extract an APK archive, disassemble the app's dex using the baksmali disassembler, and reassemble an APK after the code has been modified. Smali also comes with an IDEA plugin (smalidea) that offers syntax highlighting and even basic refactoring of smali assembler.

As with many Android apps, the fanfiction.net app is obfuscated. This can help reduce app size but unfortunately also makes our job a little more difficult. In this particular instance the obfuscation also uses characters that are legal in Java bytecode identifiers but illegal in actual Java code, which makes recompiling decompiled code harder. As long as you stick with the assembler, though, smali can handle these characters just fine (when the characters become problematic, it often helps to simply run the code through an obfuscator again, replacing the special characters with "normal" alphanumeric names, but this was not necessary here).

Browsing through the classes a little, there are two suspicious kinds of code. There are some unusual constant arrays in the class Qq:

Some knowledge of cryptography or a quick Google search for this data shows that this is actually a Rijndael S-box - one of the core components of the AES block cipher. Other arrays in this class contain the inverse S-box, the Rijndael key schedule and some further AES lookup tables. There are also some revealing constant strings such as "Key length not 128/192/256 bits." and "AES engine not initialised". These lead us right to the open-source code this class actually comes from: Bouncy Castle's AESFastEngine. Following the same trail, another class in the APK becomes apparent: Qp is actually the GCMBlockCipher from the same project. Interestingly, neither of these classes is actually used in the source that is visible right now, which is unusual because obfuscators typically remove unused code from the binary.
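Instead of recognizing the S-box by eye, the table can be regenerated from its definition and searched for in a binary. The sketch below builds the Rijndael S-box (multiplicative inverse in GF(2^8) followed by the affine transform) and looks for it in a byte blob. Whether the table appears as 256 contiguous bytes in a given dex file depends on how the array is stored there, so treat `find_sbox` as an assumption-laden helper, not a guaranteed detector.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_aes_sbox() -> bytes:
    """Generate the 256-entry Rijndael S-box from its mathematical definition."""
    table = []
    for x in range(256):
        # 0 has no inverse and maps to 0; otherwise brute-force the inverse.
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        affine = inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4)
        table.append(affine ^ 0x63)
    return bytes(table)


AES_SBOX = build_aes_sbox()


def find_sbox(blob: bytes) -> int:
    """Return the offset of a contiguous S-box in blob, or -1 if absent."""
    return blob.find(AES_SBOX)


# The first row matches the well-known start of the table: 63 7c 77 7b ...
print(AES_SBOX[:8].hex())
```

A hit strongly suggests that an AES implementation is embedded in the file; a miss proves little, since the table may be stored in another layout or generated at runtime.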

The other suspicious code in the binary is well-documented code. While we don't have access to comments, meaningful exception messages are much more common in library code than in application code. Quickly it becomes apparent that okhttp is also present - for example, the class Pn contains the error "Cannot retry streamed HTTP body" which comes from RetryAndFollowUpInterceptor. Pe, known as RealConnection in the okhttp project, also contains the URLs https://api-slow.geolb.net and https://api-fast.geolb.net which sound very interesting. api-slow also happens to resolve to 173.205.184.56, the address the stream of unknown format went to!
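The "follow the error messages" trick can be sketched as a small lookup: collect the string constants of each obfuscated class and match them against messages known to come from a particular library. The signature table below contains only the strings mentioned above. The input format (a dict from class name to its string constants) and the class `Xy` are hypothetical, chosen for illustration; extracting the strings from the smali files is left out.

```python
# Message -> library class it is known to come from (taken from the text above).
KNOWN_STRINGS = {
    "Key length not 128/192/256 bits.": "Bouncy Castle AESFastEngine",
    "AES engine not initialised": "Bouncy Castle AESFastEngine",
    "Cannot retry streamed HTTP body": "okhttp RetryAndFollowUpInterceptor",
}


def identify_classes(strings_by_class: dict) -> dict:
    """Map each obfuscated class name to the library classes its strings hint at."""
    guesses = {}
    for class_name, strings in strings_by_class.items():
        hits = {KNOWN_STRINGS[s] for s in strings if s in KNOWN_STRINGS}
        if hits:
            guesses[class_name] = sorted(hits)
    return guesses


# Hypothetical input; "Xy" is an invented class with no recognizable strings.
sample = {
    "Qq": ["AES engine not initialised", "Key length not 128/192/256 bits."],
    "Pn": ["Cannot retry streamed HTTP body"],
    "Xy": ["hello"],
}

for name, libs in identify_classes(sample).items():
    print(name, "->", ", ".join(libs))
```

A single matching string is only a hint; the guess is worth confirming by comparing the method structure against the library source.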

There is something off about our RealConnection, though. Instead of all manner of connection setup code that normally fills this class, there is some odd reflection going on:

Clearly this code invokes some constructor of some class and then casts the resulting object to java.net.Socket. This would be suspicious even if we weren't looking for code that specifically encrypts a TCP connection.

Now we can finally make use of all the features smalidea gives us. A breakpoint just before the check-cast helps us see what this socket actually is.

Weird... The socket is of type o.at, but there is no such class in the project at all! That is certainly worth investigating. The java.lang.Class instance comes from another class au. This class contains an absolutely giant byte array with hard-to-decipher data, and some code to accompany (and presumably decode) it.

With static analysis this would take ages to understand. Luckily, baksmali allows us to just patch the class and log whatever we want. One function in particular, $$d(SBS)Ljava/lang/String;, seems worth looking at: Its return value is used as a parameter to lots of reflection APIs used in the class. Logging its return value will surely lead somewhere.

There we go!

Quite a lot is going on here, and it's not worth it trying to understand all of it. The DexFile and ClassLoader references do show there is some dynamic loading going on though.

A "nice" thing about Android class loading is that it's actually not easy to load a class from memory. That is what the delete strings are for: the app is calling java.io.File#delete to delete the dex it has extracted and loaded. What if the app could be stopped from deleting these dex files? Well, that can be done: set a breakpoint on the string decoding routine that triggers when the decoded string equals delete.

While the debugger is pausing the app, let's take a look in the app data directory - /data/data/com.fictionpress.fanfiction/ - using adb.

generic_x86_64:/data/data/com.fictionpress.fanfiction # ls
app_textures app_webview cache code_cache databases files no_backup shared_prefs

Nothing interesting there? Wait!

generic_x86_64:/data/data/com.fictionpress.fanfiction # ls -lah
total 364K
drwx------ 10 u0_a73 u0_a73 4.0K 2018-01-25 16:21 .
drwxrwx--x 94 system system 4.0K 2018-01-24 14:42 ..
-rw-r--r--  1 u0_a73 u0_a73  69K 2018-01-25 16:21 .  ​
-rw-------  1 u0_a73 u0_a73 8.5K 2018-01-25 16:21 .    
drwxrwx--x  2 u0_a73 u0_a73 4.0K 2018-01-24 11:33 app_textures
drwxrwx--x  2 u0_a73 u0_a73 4.0K 2018-01-24 11:33 app_webview
drwxrwx--x  3 u0_a73 u0_a73 4.0K 2018-01-24 11:33 cache
drwxrwx--x  2 u0_a73 u0_a73 4.0K 2018-01-24 11:33 code_cache
drwx------  2 u0_a73 u0_a73 4.0K 2018-01-25 16:21 databases
drwxrwx--x  6 u0_a73 u0_a73 4.0K 2018-01-25 16:21 files
drwxrwx--x  2 u0_a73 u0_a73 4.0K 2018-01-24 11:33 no_backup
drwxrwx--x  2 u0_a73 u0_a73 4.0K 2018-01-25 16:21 shared_prefs

Well well well... a directory is only supposed to have one . entry, and it should certainly be a directory (d flag at the start). A closer look shows that there are additional whitespace characters after the dot. Now adb pull can be used to transfer these files off the device for a closer look. They turn out to be two dex files, which can again be disassembled with apktool.
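Both observations, the disguised file names and the dex content, are easy to check programmatically once the files are off the device. The sketch below flags names that consist of a dot followed only by invisible characters and tests whether a file header carries the dex magic (`dex\n`, three version digits, a NUL byte). Which invisible characters the app actually uses is an assumption on my part; the set below is only a plausible guess.

```python
# Characters assumed to be used for padding; the real set may differ.
INVISIBLE = " \t\u00a0\u200b"


def is_disguised_dot(name: str) -> bool:
    """True for names like '. ' that render like '.' but are distinct entries."""
    return (len(name) > 1
            and name.startswith(".")
            and all(ch in INVISIBLE for ch in name[1:]))


def looks_like_dex(header: bytes) -> bool:
    """True if the bytes start with the dex magic, e.g. b'dex\\n035\\x00'."""
    return (len(header) >= 8
            and header[:4] == b"dex\n"
            and header[4:7].isdigit()
            and header[7] == 0)


for name in [".", "..", ".  \u200b", ".    ", "files"]:
    # ascii() makes the otherwise invisible characters visible.
    print(ascii(name), is_disguised_dot(name))
```

On the pulled files, `looks_like_dex(open(path, "rb").read(8))` would be the corresponding check, with `path` pointing at wherever adb pull stored them.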

There's the o.at that was missing earlier! It extends java.net.Socket as expected, and even makes use of the Bouncy Castle cipher classes found earlier. And finally, there's a byte array - 16 bytes, which is the size of an AES-128 key. This class also contains the actual protocol used for communication with the backend.

The Protocol

The protocol is actually quite simple. It starts off with a 16-byte IV used to initialize the AES/GCM cipher stream. After that follows a sequence of "datagrams". Each is prefixed with a two-byte length header. The body of each datagram is some AES/GCM data, including a 16-byte auth tag.
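The framing described above can be sketched as a small parser that splits a captured stream into the IV and the individual datagrams. It deliberately does no decryption. Several details are assumptions not stated in this article: that the length header is big-endian, that it counts the ciphertext plus the tag, and that the tag sits at the end of each body (which is where Bouncy Castle's GCM implementation puts it).

```python
import struct

IV_LEN = 16   # stated above: the stream starts with a 16-byte IV
TAG_LEN = 16  # stated above: each datagram carries a 16-byte auth tag


def split_stream(stream: bytes):
    """Split a raw stream into (iv, [(ciphertext, tag), ...]).

    Assumes big-endian lengths that cover ciphertext + trailing tag.
    An incomplete datagram at the end of the capture is ignored.
    """
    if len(stream) < IV_LEN:
        raise ValueError("stream is shorter than the IV")
    iv = stream[:IV_LEN]
    pos = IV_LEN
    datagrams = []
    while pos + 2 <= len(stream):
        (length,) = struct.unpack_from(">H", stream, pos)
        if length < TAG_LEN:
            raise ValueError(f"datagram at offset {pos} is too short for a tag")
        end = pos + 2 + length
        if end > len(stream):
            break  # capture ended in the middle of a datagram
        body = stream[pos + 2:end]
        datagrams.append((body[:-TAG_LEN], body[-TAG_LEN:]))
        pos = end
    return iv, datagrams


# Synthetic stream with two datagrams (not real traffic).
demo = (bytes(range(16))
        + struct.pack(">H", 21) + b"A" * 5 + b"T" * 16
        + struct.pack(">H", 16) + b"U" * 16)
iv, frames = split_stream(demo)
print(iv.hex(), [(len(c), t.hex()) for c, t in frames])
```

If the byte order or the tag position turns out to be different, the lengths will stop lining up after the first datagram, which makes a wrong assumption fairly easy to spot.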

Now that the key is available it is easy to decode the streams found in the first packet capture. The content is actually a (not further encrypted) HTTP/2 stream.
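Recognizing the decrypted bytes as HTTP/2 mostly comes down to the fixed 9-byte frame header: a 24-bit payload length, an 8-bit type, 8 bits of flags, and a 31-bit stream identifier. The sketch below walks over a plaintext buffer and lists its frames. I have not verified whether the app sends the standard client connection preface, so the code skips it only if it is present; the sample bytes are synthetic.

```python
H2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

FRAME_TYPES = {
    0: "DATA", 1: "HEADERS", 2: "PRIORITY", 3: "RST_STREAM", 4: "SETTINGS",
    5: "PUSH_PROMISE", 6: "PING", 7: "GOAWAY", 8: "WINDOW_UPDATE",
    9: "CONTINUATION",
}


def parse_frame_header(data: bytes, offset: int = 0):
    """Decode one 9-byte HTTP/2 frame header at offset."""
    if len(data) - offset < 9:
        raise ValueError("need 9 bytes for a frame header")
    length = int.from_bytes(data[offset:offset + 3], "big")
    frame_type = data[offset + 3]
    flags = data[offset + 4]
    # The top bit of the stream identifier is reserved and must be masked off.
    stream_id = int.from_bytes(data[offset + 5:offset + 9], "big") & 0x7FFFFFFF
    name = FRAME_TYPES.get(frame_type, f"UNKNOWN({frame_type})")
    return length, name, flags, stream_id


def list_frames(plaintext: bytes):
    """Return (type, payload length, stream id) for each complete frame header."""
    pos = len(H2_PREFACE) if plaintext.startswith(H2_PREFACE) else 0
    frames = []
    while pos + 9 <= len(plaintext):
        length, name, _flags, stream_id = parse_frame_header(plaintext, pos)
        frames.append((name, length, stream_id))
        pos += 9 + length
    return frames


# Synthetic example: an empty SETTINGS frame, then a 3-byte HEADERS frame.
settings = bytes([0, 0, 0, 4, 0, 0, 0, 0, 0])
headers = bytes([0, 0, 3, 1, 4, 0, 0, 0, 1]) + b"\x82\x86\x84"
print(list_frames(H2_PREFACE + settings + headers))
```

The header fields themselves are HPACK-compressed inside HEADERS frames, so producing a readable dump like the one below takes an HTTP/2 library or Wireshark's dissector on the decrypted data.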

Conversation 3    ----------------->>>
:path: /api/search/story/facet/category/v1
:method: POST
:scheme: https
:authority: api-slow.geolb.com
x-token: 'removed'
content-type: multipart/form-data; boundary=75794034-0ffe-41b9-90cf-8ffeafe102ee
content-length: 645
accept-encoding: gzip

--75794034-0ffe-41b9-90cf-8ffeafe102ee
Content-Disposition: form-data; name="page"
Content-Length: 1

1
--75794034-0ffe-41b9-90cf-8ffeafe102ee
Content-Disposition: form-data; name="x_json"
Content-Length: 382

{"C2Id":0,"CategoryId":removed,"CategoryId2":0,"CharacterId1":0,"CharacterId2":0,"CharacterId3":0,"CharacterId4":0,"Crossover":0,"GenreId":0,"GenreId2":0,"LanguageId":0,"NotCharacterId1":0,"NotCharacterId2":0,"NotGenreId":0,"NotPairing":0,"NotVerseId":0,"Pairing":0,"Prefix":0,"Query":"","QueryField":0,"Rating":103,"RequesterId":0,"SortId":0,"Status":0,"Time":0,"VerseId":0,"Words":0}
--75794034-0ffe-41b9-90cf-8ffeafe102ee--

If you have ever looked at the source of the fanfiction.net search page, this JSON will look familiar.

The most interesting part here is that once the key is known, a MitM attacker can decrypt and potentially modify the stream of any app user. This includes a session token which could potentially be used to impersonate the user, though I have not done much further research into what endpoints are actually available with this API.

Response

I contacted fanfiction.net (specifically dev@fictionpress) about this research on 2018-01-24, and got a response within hours (kudos to them!).

Thank you for the detailed report.

1) Our protocol is NOT designed to be secure against MIM but main goal is improved latency vs TLS and normal TLS sockets in both memory usage, cpu, throughout.

2) Android fragmentation introduced lots of different hardware TLS versions which is incompatible with proper HTTP2 and thus our custom and heavily modified and optimized okhttp stack.

Our protocol can be broken if MIM is employed in the handshake stage but not after.  But again, this is not our primary goal. Our primary goal is latency, memory usage, portability, and cpu usage in that order.

At their request, I am not publishing the actual encryption key.

This has been an interesting app to reverse-engineer. Since I am busy with university, this is where the research ends for now. While one could sniff some more app traffic to enumerate the API endpoints, there are few uses for this API that do not violate the ToS anyway.