Mostly finished README and extension now may work with Shaka

2025-02-25 00:15:23 +01:00 · 2021-08-01 22:16:11 +09:00 · 2021-08-01 22:16:11 +09:00 · e9e22bdb2d
commit e9e22bdb2d
parent faf90be4aa
5 changed files with 27295 additions and 20 deletions
--- a/README.md
+++ b/README.md
@@ -5,9 +5,11 @@
 - This work is based (obviously) on the [widevine-l3-decryptor extension](https://github.com/cryptonek/widevine-l3-decryptor). Many parts are the same, parts of the README are a verbatim copy, etc.
 - I have no working knowledge of Chrome extension structure.
 - Some parts of the code are copied from RFC documents, Wikipedia, etc. *shrug*
- Tldr: The result seems to work, but relies on code lifting into wasm module and lots of brute-forcing, resulting in about 15-minute wait for a single RSA decryption.
+- TL;DR: The result seems to work, but relies on code lifting into a wasm module and lots of brute-forcing, resulting in about a 15-minute wait for a single RSA decryption. **UPDATE:** While writing the README, I found the encoding tables, so the brute-forcing is no longer needed.
 - I am too lazy to improve on this.
 - Bignum arithmetic was taken from the [CryptoPP](https://github.com/weidai11/cryptopp) library. I found it the easiest library to work with, and the easiest to compile into wasm as well.
+- Should work with Widevine CDM version 4.10.2209 for 64-bit Windows. Unlikely to work with any other version.
+

 ## Introduction

@@ -18,8 +20,8 @@ But Widevine's least secure security level, L3, as used in most browsers and PCs
 This Chrome extension demonstrates how it's possible to bypass Widevine DRM by hijacking calls to the browser's [Encrypted Media Extensions (EME)](https://www.html5rocks.com/en/tutorials/eme/basics) and (very slowly) decrypting all Widevine content keys transferred - effectively turning it into a clearkey DRM.

 ## Usage
-To see this concept in action, just load the extension in Developer Mode and browse to any website that plays Widevine-protected content, such as https://bitmovin.com/demos/drm _[Update: link got broken?]_.
-First, extension will try to brute-force input encoding for the code-lifted part, dumping progess to console. Then, assuming it succeeds, keys will be logged in plaintext to the javascript console.
+To see this concept in action, just load the extension in Developer Mode and browse to any website that plays Widevine-protected content, such as https://bitmovin.com/demos/drm.
+First, the extension will try to brute-force the input encoding for the code-lifted part. Then, assuming it succeeds, keys will be logged in plaintext to the JavaScript console. (__Update__: the extension no longer needs to brute-force.)

 e.g:

@@ -42,7 +44,7 @@ ffmpeg -decryption_key 100b6c20940f779a4589152b57d2dacb -i encrypted_media.mp4 -

 It is my honest opinion that DRM is a malignant tumor growing upon various forms of media, and that people who either implement it or enforce its implementation are morally repugnant and do no good to society. With that in mind, I was sad to learn in May 2021 that the original extension would soon be rendered obsolete. I found myself with some free time on my hands, so I decided to try to replicate the original key extraction. Unfortunately, there was not much information on the process the original's authors used, and even some confusion as to [who actually performed the extraction](https://github.com/tomer8007/widevine-l3-decryptor/issues/14). Nevertheless, I decided to give it a go, and hopefully boost my flagging self-confidence a little. I did not succeed in either of those tasks, but I managed to write a barely-functioning decryptor, and decided to document the steps I followed, in case they are of use to somebody else.

-### Reverse enginering and emulating
+### Reverse engineering and emulating

 To deal with the executable, I decided to use [Ghidra](https://github.com/NationalSecurityAgency/ghidra), despite its association with the NSA, mostly because it is free and has most of the features I wanted. I also wrote a simple snippet to be able to debug the DLL.

@@ -132,7 +134,7 @@ while(true)

 After several days of investigation, it became obvious that this is a form of code obfuscation: the code flow is broken down into small segments, which are arranged in a switch statement and executed in an order defined by a primitive PRNG. The PRNG can be steered to implement if/else statements and loops. The *halt_baddata* portion causes an access violation crash when reached. Any jump table index outside the bounds leads to *while(true)* executing indefinitely. Since the switch is driven by the PRNG, the decompiler cannot find the limits of the jump tables, resulting in invalid switch statements or mangled decompilation. I tried to ameliorate that by [fixing jump tables](ghidra_scripts/FixJumptable.py), but the results were not encouraging. I then tried to [follow the instruction flow](ghidra_scripts/Deobfuscatorr.py) using the Ghidra Emulator API. After a lot of experimentation, I drew the following conclusions:

- Many of the switch cases are almost-duplicates, and some are either never reached or only reached in case of failed check, craching program or sending it into infinite loop. 
+- Many of the switch cases are near-duplicates, and some are either never reached or only reached after a failed check, crashing the program or sending it into an infinite loop.
 - The anti-debugging code is hidden within the switch statements.
 - Most of the anti-debugging code seems to be similar to what is described [here](https://anti-debug.checkpoint.com/techniques/misc.html). The list of debugger window names is exactly the same, which is amusing (and outdated).
 - Some functions actually use memory checksums **as PRNG seeds**, which makes it impossible to predict where execution goes next without knowing the checksum. And how many iterations it took to calculate it. And the results of various checks in the middle. Etc...
@@ -188,11 +190,11 @@ After staring at the wall of poorly decompiled code for a while, I realized that
 }
 ```

-Same function was used most of the time, but with different offsets and initial arrays, resulting in a variety of permutations. Regardless, I was able to roughly identify montgomery multiplications, subtractions and additions performed on 256-byte arrays (implying the use of 2048 bit keys). One of the most important factors was the use of "ADC" assemler command, mostly restricted to two areas of the code, which I tentatively identified as "signature generation" and "session key decryption". I concentrated on the former, since I could access and verify the output. Which did however raise the question about what kind of input the function took. More about that later. 
+The same function was used most of the time, but with different offsets and initial arrays, resulting in a variety of permutations. Regardless, I was able to roughly identify Montgomery multiplications, subtractions and additions performed on 256-byte arrays (implying the use of 2048-bit keys). One of the most important clues was the use of the "ADC" assembler instruction, mostly restricted to two areas of the code, which I tentatively identified as "signature generation" and "session key decryption". I concentrated on the former, since I could access and verify its output. That did, however, raise the question of what kind of input the function took. More about that later.
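Since these routines were identified as Montgomery arithmetic, here is a minimal textbook sketch of Montgomery multiplication (REDC) and a square-and-multiply exponentiation built on it. It works on plain Python integers with R = 2^2048; the 256-byte array layout, permutations and extra masking found in the binary are not modelled, and all function names are illustrative.

```python
def montgomery_setup(n: int, bits: int = 2048):
    """Precompute R = 2**bits and n' with n * n' == -1 (mod R). n must be odd and < R."""
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def redc(t: int, n: int, r_bits: int, n_prime: int) -> int:
    """Montgomery reduction: return t * R^-1 mod n, valid for 0 <= t < n * R."""
    r_mask = (1 << r_bits) - 1
    m = ((t & r_mask) * n_prime) & r_mask   # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> r_bits               # exact division by R, result < 2n
    return u - n if u >= n else u


def mont_mul(a_m: int, b_m: int, n: int, r_bits: int, n_prime: int) -> int:
    """Multiply two values already in Montgomery form (x * R mod n)."""
    return redc(a_m * b_m, n, r_bits, n_prime)


def mont_pow(x: int, e: int, n: int, bits: int = 2048) -> int:
    """Compute x**e mod n using only Montgomery multiplications."""
    r, n_prime = montgomery_setup(n, bits)
    x_m = (x * r) % n          # convert x to Montgomery form
    acc = r % n                # 1 in Montgomery form
    for bit in bin(e)[2:]:     # left-to-right square-and-multiply
        acc = mont_mul(acc, acc, n, bits, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x_m, n, bits, n_prime)
    return redc(acc, n, bits, n_prime)   # convert back to normal form
```

Montgomery form replaces the reduction modulo n with shifts and masks; in native multi-precision code, the additions inside REDC are where carry instructions such as `ADC` would show up.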

-Of course, sick, sadisitic minds behind the obfuscation did not use a straightforward exponentiation algorithms. As described in Google patent [US20160328543A1](https://patentimages.storage.googleapis.com/0c/f3/c0/08ce4394385810/US20160328543A1.pdf), they multiply input by constant and output by reversing constant, use permutation function to confuse memory layouts, and seem to use "split variables" at times, though not often in this case. In any case, resulting exponentiation function also has some additions which cancel each other in the end. 
+Of course, the sick, sadistic minds behind the obfuscation did not use a straightforward exponentiation algorithm. As described in Google patent [US20160328543A1](https://patentimages.storage.googleapis.com/0c/f3/c0/08ce4394385810/US20160328543A1.pdf), they multiply the input by a constant and the output by a compensating inverse constant, use a permutation function to confuse memory layouts, and seem to use "split variables" at times, though not often in this case. In any case, the resulting exponentiation function also contains some additions which cancel each other out in the end.
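A toy sketch of the input/output masking idea described above, assuming the simplest possible variant (a single multiplicative constant `c`) and a small textbook RSA key made of two Mersenne primes. This is neither the patent's full scheme nor the Widevine key: the exponentiation only ever sees `x * c`, and multiplying by the inverse of `c^D` afterwards restores `x^D`.

```python
import math

# Toy RSA key from two Mersenne primes (NOT the Widevine key).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # Carmichael lambda(N)
D = pow(E, -1, LAM)                                  # private exponent


def masked_private_op(x: int, c: int) -> int:
    """Compute x^D mod N while the exponentiation itself only sees x*c.

    c must be coprime to N (any 0 < c < P works for this toy key).
    """
    masked_in = (x * c) % N                 # mask the input
    masked_out = pow(masked_in, D, N)       # = x^D * c^D mod N
    unmask = pow(pow(c, D, N), -1, N)       # inverse of c^D mod N
    return (masked_out * unmask) % N        # remove the mask from the output
```

In a whitebox, `c` and the unmasking constant would be baked into the tables rather than computed at runtime, which is why neither the plain input nor the plain output needs to appear in memory.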

-In order to extract the exponent from the code, I first logged most of the inputs and outputs of the functions that seemed to operate on bignum, unscrambling the permutation using the already generated tables in memory. Then, I used [python script](log_parsing/prfnd.py) to guess the operations performed on the numbers, and a [separate script](log_parsing/prfold.py) to map those operations into a tree. The second script went through several iterations as I tried various things, including adding [dual number](https://en.wikipedia.org/wiki/Dual_number) support in order to extract exponent from the result's derivative. Ultimately, I settled on simple single-variable tracing. Finding a route that did not lead to exponential explosion in number of polynomial powers was somewhat of a challenge, but eventually (after,once again, a week or two of work :| ) I succeded in extracting an exponent and multiplicative constant:
+In order to extract the exponent from the code, I first logged most of the inputs and outputs of the functions that seemed to operate on bignums, unscrambling the permutation using the tables already generated in memory. Then, I used a [python script](log_parsing/prfnd.py) to guess the operations performed on the numbers, and a [separate script](log_parsing/prfold.py) to map those operations into a tree. The second script went through several iterations as I tried various things, including adding [dual number](https://en.wikipedia.org/wiki/Dual_number) support in order to extract the exponent from the result's derivative. Ultimately, I settled on simple single-variable tracing. Finding a route that did not lead to an exponential explosion in the number of polynomial powers was somewhat of a challenge, but eventually (after, once again, a week or two of work :| ) I succeeded in extracting an exponent and a multiplicative constant:

 ```
 Integer sec_pwr("3551441281151793803468150009579234152846302559876786023474384116741665435201433229827460838178073195265052758445179713180253281489421418956836612831335004147646176505141530943528324883137600012405638129713792381026483382453797745572848181778529230302846010343984669300346417859153737685586930651901846172995428426406351792520589303063891856952065344275226656441810617840991639119780740882869045853963976689135919705280633103928107644634342854948241774287922955722249610590503749361988998785615792967601265193388327434138142757694869631012949844904296215183578755943228774880949418421910124192610317509870035781434478472005580772585827964297458804686746351314049144529869398920254976283223212237757896308731212074522690246629595868795188862406555084509923061745551806883194011498591777868205642389994190989922575357099560320535514451309411366278194983648543644619059240366558012360910031565467287852389667920753931835645260421");
@@ -202,7 +204,7 @@ Integer sec_mul("0x15ba06219067ebfbe9ed0b5f446f1dca81c3276915b6cd27621bfefe5cf28

 An example of the log and script output can be found in the [log_parsing](/log_parsing) folder.

-One can easily see that the exponent is 3072 bits in length, which is a lot longer than expected (2048). Obvisouly, since exponent is periodic, it can be extended to any length. It can be also confirmed that this is not a complete exponent, since the first bignum-like structure in the function does not match the encryption input. (Decryption of the RSA is easily done using public exponent, 65537). There is also no linear. or quadratic, or... (I checked polynomials to about 128th power) dependency. Which leads me to the following stage. 
+One can easily see that the exponent is 3072 bits long, which is a lot longer than expected (2048). Obviously, since the exponent is periodic (any multiple of λ(n) can be added to it without changing the result), it can be extended to any length. It can also be confirmed that this is not the complete exponent, since the first bignum-like structure in the function does not match the actual input, which is easily recovered from the signature using the public exponent, 65537. There is also no linear, quadratic, or higher-order polynomial dependency (I checked polynomials up to about the 128th power). Which leads me to the next stage.
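The two facts used above can be checked on a toy key. A sketch, assuming a small textbook RSA key built from two Mersenne primes (not the Widevine key): adding any multiple of λ(n) to the private exponent gives the same result, which is why a longer-than-expected exponent is not suspicious by itself, and raising the result to the public exponent 65537 recovers the value that was exponentiated.

```python
import math

# Toy RSA key from two Mersenne primes (NOT the Widevine key).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # Carmichael lambda(N)
D = pow(E, -1, LAM)                                  # private exponent


def private_op(m: int, d: int = D) -> int:
    """Raw RSA private-key operation (signing without padding)."""
    return pow(m, d, N)


def stretched_exponent(k: int) -> int:
    """An equivalent private exponent, k * LAM larger than D."""
    return D + k * LAM


m = 0x1234567890ABCDEF                  # any m < N
s = private_op(m)
long_d = stretched_exponent(2**40)
print(long_d.bit_length() > D.bit_length())   # True: the exponent got longer...
print(private_op(m, long_d) == s)             # True: ...but the result is unchanged
print(pow(s, E, N) == m)                      # True: the public exponent recovers m
```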

 ### Descending into despair 

@@ -219,7 +221,7 @@ In there, Param_1 seems to be constant, or at least input-independent. It is sti
 0105060306000407010306010603030600060100000301000003010300040501060100060101060103000306000301000601010001030006060303030100010606000303010603030603000306010001000000060101030103000101010101010601000303000303030100060000030003010601000601060000000006000000030304020500030100010100030006000601060106010101000306010103030001030104050100030003030006010405060405060006060606000306000001030303000106010300060300000300010006000006010106030000010100010100000100010000000306030300000600060303030000010304050603060604010600000103010306010006000100010301030601010106060000060000040500010600060303040506000303060100010006060606000000030600060003010606000001000600030300010606000103060300030103000103030000010106060103000101010100060300000101030003010101010600010104050402070202050100000306030006060001000000060306000301040500010603000006000300000303030600030606000306060600000000010606060100030301010101040702050405060106000600040106030603000301010103030303000103010000000100010103000006000301010006010303000405010106000100010000000600000000000603010603010601000101010300000604050405010101060600060100010301030101010101060306000601040501030000010606040500060101000603060103000300000306000101010306060603010603010001000000030301000101030000030304050606010300000601060106000000000600000306000006010000010100060300040106010000060100000100010603010001030601060300030103060106010301010601060407020500000103000303000106010601030000040501030604050004010000000405010003030600060001060603060306060601030000030603010100000006060003000300030001040501030603030106060006010001000603010006010301030606000600040100000003040103010300000100030006030001000600060100000103030306000301040500060006010600060106000405030000010603060103000600000100030106060104010301030603010600060000000103030000030000060103040705010104050301000003000000000401010600000106000001060303060101030600000100010606010600000003030003000103030301010606010300010103000003010301060106000006010000060300000101000006040500010306030100060101060006010106010303010
00001010601030000000001000603040106010306060101010000
 ```

-Lookup tables in *ConstUser_18016b077* essentally map 11 bit number(2x3 bit+5bit "carry") to 8-bit number(3-bit output plus carry). There are also other tables in the code that work on larger number of bits. But, since input and outputs are permuted in random order (and possibly have a carry bit), I could not for the life of me figure out what each of the (several thousands of) tables actually *did*. Each operation seemed to invoke a new table, or at least, a new sequence offset. 
+Lookup tables in *ConstUser_18016b077* essentially map an 11-bit number (2x3 bits + a 5-bit "carry") to an 8-bit number (3-bit output plus carry). There are also other tables in the code that work on a larger number of bits. But, since inputs and outputs are permuted in random order (and possibly have a carry bit), I could not for the life of me figure out what each of the (several thousand) tables actually *did*. Each operation seemed to invoke a new table, or at least a new sequence offset.
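The actual function of these tables was never worked out. Purely as a hypothetical illustration of the table *shape* described above (two 3-bit digits plus a 5-bit carry in, a 3-bit digit plus a 5-bit carry out), here is a lookup table that performs base-8 addition, chained over several digits. The bit layout of the index and of the entries is an assumption, and the random permutation of inputs and outputs seen in the binary is not modelled.

```python
def build_table() -> bytes:
    """Hypothetical 11-bit -> 8-bit table.

    Index: a (3 bits) << 8 | b (3 bits) << 5 | carry_in (5 bits)
    Entry: digit (low 3 bits) | carry_out (high 5 bits)
    """
    table = bytearray(1 << 11)
    for a in range(8):
        for b in range(8):
            for c in range(32):
                total = a + b + c                      # at most 7 + 7 + 31 = 45
                table[(a << 8) | (b << 5) | c] = (total & 7) | ((total >> 3) << 3)
    return bytes(table)


TABLE = build_table()


def table_add(x: int, y: int, digits: int):
    """Add two non-negative integers digit by digit (base 8) using only lookups.

    Returns (sum of the low `digits` digits, final carry).
    """
    carry, result = 0, 0
    for i in range(digits):
        a = (x >> (3 * i)) & 7
        b = (y >> (3 * i)) & 7
        entry = TABLE[(a << 8) | (b << 5) | carry]
        result |= (entry & 7) << (3 * i)
        carry = entry >> 3
    return result, carry
```

When chaining plain additions the carry never exceeds 1; the wider carry field only matters if a table also folds in other terms, which is one possible reason for a 5-bit carry in the real tables.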

 In any event, we have 4 of those numbers somehow generated from the input and presented to the exponentiation function, where they are split into overlapping 18-byte increments, processed in a loop, compressed back to 4-byte integers and passed on into *yet another* function:

@@ -228,7 +230,7 @@ void ManyMutiplies_1801720e0
 (byte* param_1, byte* param_2, byte* param_3, byte* param_4, byte* param_5, byte* out)
 ```

-Where... I have no idea :( I've spent a lot of time looking at the code, but to this day I have no idea what *exactly* it does to 4 input buffers. Those buffers do not seem to be representations of 256-byte bignums ( buffer length vary, but are mostly multiples of 90). A lot ofoperations involve preparations like
+Where... I have no idea :( I've spent a lot of time looking at the code, but to this day I have no idea what *exactly* it does to the 4 input buffers. Those buffers do not seem to be representations of 256-byte bignums (buffer lengths vary, but are mostly multiples of 90). A lot of operations involve preparations like

 ```
    do {
@@ -258,7 +260,7 @@ Which seem to use lookup tables (DAT_18091af30) to look up 8-byte carries? Yeah,

 ### Code lifting

-After spending far too much time staring dumbly on decompiler and trying to run code modifications in Ghidra emulator, I decided to try dumping decompiled code into c++ file and making it compile again, with the "bright" idea of "maybe manipulatinfg inputs will give me some insight". I believe that is what is called "code lifting"? That came with its own set of challenges. The major one was the fact that decompiler was confused by overlapping buffer accesses, and could not separate local variables properly. Other was that somebody in Ghidra decompiler team thought that accessing, say, last two bytes in uint64 should be represented as *variable._6_2* instead of, say *\((short\*)&variable)\[3\]*. One of those is not proper C... So I had to go through code and replace that. As well as guess at stack variable overlaps and split those, which took weeks of painstaking register comparison. 
+After spending far too much time staring dumbly at the decompiler and trying to run code modifications in the Ghidra emulator, I decided to try dumping the decompiled code into a C++ file and making it compile again, with the "bright" idea that "maybe manipulating inputs will give me some insight". I believe that is what is called "code lifting"? That came with its own set of challenges. The major one was that the decompiler was confused by overlapping buffer accesses, and could not separate local variables properly. Another was that somebody on the Ghidra decompiler team thought that accessing, say, the last two bytes of a uint64 should be represented as *variable._6_2* instead of, say, *\((short\*)&variable)\[3\]*. One of those is not proper C... So I had to go through the code and replace that. As well as guess at stack variable overlaps and split those, which took weeks of painstaking register comparison.
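To make the notation concrete: in Ghidra's `variable._6_2`, the first number is the byte offset into the variable and the second is the size in bytes. On little-endian x86-64 that is the same 16-bit value as `((short*)&variable)[3]`. A small Python check of the equivalence (the helper name is made up for illustration):

```python
import struct


def ghidra_subpiece(value: int, offset: int, size: int, total_size: int = 8) -> int:
    """Return bytes [offset, offset + size) of a little-endian integer,
    i.e. what Ghidra prints as value._<offset>_<size>."""
    raw = value.to_bytes(total_size, "little")
    return int.from_bytes(raw[offset:offset + size], "little")


v = 0x1122334455667788
# Ghidra: v._6_2        C: ((unsigned short*)&v)[3]
shorts = struct.unpack("<4H", struct.pack("<Q", v))
print(hex(ghidra_subpiece(v, 6, 2)), hex(shorts[3]))   # 0x1122 0x1122
```

When lifting, each such access can be rewritten mechanically as a shift and mask, e.g. `(uint16_t)(v >> 48)` for `v._6_2`.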

 The next hurdle was a function that took two buffers already encoded into long form and spat out the long form of an almost-output. That one first ran table generation (unpacking?) and then jumped to a runtime-generated point. Then it used a long array of addresses and values to jump between 6(?) possible code points and execute a variety of operations on data. The structure in the array looked somewhat like:

@@ -275,20 +277,108 @@ Next hurdle was a function that took two buffers already encoded into long form

 And the array was long... 5153 operations long. If my guess about a Fourier transform is correct, that would probably be the function that performs the inverse transform, but once again, no idea ;(

-The final hurdle of the code-lifting, and the one that contributed the most to the wasm size, was constant extraction. Some constants were available from the beginning, while others, such as lookup tables, were generated at various points at runtime. There were over 600 constants used, so in the end I just automatically grabbed them from [memory dumps](/memory_dumps) with a python script without checking the appropriate legth, which resulted in a lot of overlap (it is better to have a too-long constant than access violation of undefined behavior). It is probably possible to cut the wasm size by at least half by carefully removing overlaps (and checking afterwards, since some seem necessary). 
+The final hurdle of the code lifting, and the one that contributed the most to the wasm size, was constant extraction. Some constants were available from the beginning, while others, such as lookup tables, were generated at various points at runtime. There were over 600 constants used, so in the end I just grabbed them automatically from [memory dumps](/memory_dumps) with a python script without checking the appropriate length, which resulted in a lot of overlap (it is better to have a too-long constant than an access violation or undefined behavior). It is probably possible to cut the wasm size by at least half by carefully removing overlaps (and checking afterwards, since some seem necessary).

-After performing all that, I managed to recreate *HasMulAdc_18016d24d* in c++ code. Unfortunately, I did not gain any insight. The dependencies of actual input number on input buffers seemed highy non-linear as well. After a lot of trial and error(s), I was left with no recourse but to recreate input function for signature, which, luckily, was not obfuscated by switch statement. Unlike previous version, hovewer, actual RSA message to be exponentiated was never in memory during runtime, so I had to trace its creation from protobuf message. 
+After performing all that, I managed to recreate *HasMulAdc_18016d24d* in C++ code. Unfortunately, I did not gain any insight. The dependencies of the actual input number on the input buffers seemed highly non-linear as well. After a lot of trial and error(s), I was left with no recourse but to recreate the input function for the signature, which, luckily, was not obfuscated by a switch statement. Unlike the previous version, however, the actual RSA message to be exponentiated was never in memory during runtime, so I had to trace its creation from the protobuf message.

-(to be continued)
+One of the first ideas I came up with, which eventually proved to be the most fruitful, was tracking SHA1 invocations. All SHA1 invocations should use the same starting values, as per [wiki](https://en.wikipedia.org/wiki/SHA-1):
+```
+h0 = 0x67452301
+h1 = 0xEFCDAB89
+h2 = 0x98BADCFE
+h3 = 0x10325476
+h4 = 0xC3D2E1F0
+```
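A rough sketch of such a search over a raw memory dump, assuming the IV words are stored back to back as little-endian 32-bit integers (in optimized code they may instead appear as instruction immediates, which is why searching for single constants is also useful). The repo's own `misc/Memsearcher.py` is what was actually used; the names below are illustrative.

```python
import struct

# SHA-1 initial hash values h0..h4, as listed above.
SHA1_IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def find_all(dump: bytes, pattern: bytes) -> list:
    """Return every offset at which `pattern` occurs in `dump` (overlaps allowed)."""
    hits, pos = [], dump.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = dump.find(pattern, pos + 1)
    return hits


def find_sha1_iv(dump: bytes) -> list:
    """Offsets where h0..h4 are stored consecutively as little-endian uint32."""
    return find_all(dump, struct.pack("<5I", *SHA1_IV))


def find_constant(dump: bytes, value: int, size: int = 4) -> list:
    """Offsets of one little-endian constant, e.g. a single IV word
    or a SHA-1 round constant such as 0x5A827999."""
    return find_all(dump, value.to_bytes(size, "little"))


# Usage (hypothetical file name):
# with open("dump.bin", "rb") as f:
#     print(find_sha1_iv(f.read()))
```

Each hit is only a starting point: the interesting part is following the code that references that address.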
+
+By searching for those values or the round constants in memory and tracking references to them, I managed to find a few areas that appeared to calculate SHA1, one of them quite near the exponentiation code (abridged):
+
+```
+void Longstringproc_18017e3b0(byte **param_1,stdstring *data,uint len,stdstring *param_4)
+
+{
+  byte *charbuffer;
+  longlong lVar1;
+  undefined8 local_24b8;
+  byte output_24b0 [512];
+  byte local_22b0 [2056];
+  byte local_1aa8 [2056];
+  byte local_12a0 [1040];
+  byte local_e90 [1040];
+  byte local_a80 [1032];
+  byte local_678 [1032];
+  byte local_270 [82];
+  SHA1_buf buffer;
+  ulonglong local_50;
+  undefined8 uStack72;
+  lVar1 = 0x100;
+  STLStringResizeUninitialize(param_4,0x100,0);
+  charbuffer = (byte *)GetStrOffset_1801d456e(param_4,0);
+  local_24b8 = *param_1;
+  FUN_18011394e();
+  Fill_SHA_buffer_18016ae81(&buffer);
+  LooksLikeSha1_1801695b1((byte *)data,len,&buffer);
+  Shabuf_Transform_18016b9ac((uint *)local_270,&buffer);
+  Crazery_18016c0bb((char *)local_270,local_678,local_a80,local_e90,local_12a0);
+  OtherConstUser_180169484(0x10004020000345e1,local_678,local_678,local_1aa8);
+  OtherConstUser_180169484(0x1000402000007410,local_a80,local_a80,local_22b0);
+  Maybe_MEMSET_180512a50((char *)output_24b0,0xaa,0x200);
+  HasMulAdc_18016d24d(local_24b8,local_1aa8,local_22b0,local_e90,local_12a0,output_24b0);
+  do {
+    *charbuffer = output_24b0[lVar1 + -1];
+    charbuffer = charbuffer + 1;
+    lVar1 = lVar1 + -1;
+  } while (lVar1 != 0);
+ }
+
+```
+
+Indeed, that proved to be a signing function, with *data* being the message to be signed. *LooksLikeSha1_1801695b1* calculates the message hash, while the other functions encode and decode the normal hash to and from long form. As I mentioned before, at no point does the exponentiated value itself (that is, the message hash padded as per the [RSA-PSS guidelines](https://datatracker.ietf.org/doc/html/rfc3447#page-36) with "0xbc" appended) appear in memory in "normal" form, even permuted. Neither is [MGF1](https://en.wikipedia.org/wiki/Mask_generation_function#MGF1) calculated "in the clear". So where *is* it calculated? Why, in the function using runtime-generated jump tables, of course! That is, *Crazery_18016c0bb*... That function also uses the same machinery as *ConstUser_18016b077*, but with a twist: it uses modular arithmetic to permute the byte order in memory. Otherwise, the procedure is the same.
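For reference, this is what that padding looks like when computed in the clear. A minimal sketch of MGF1 and EMSA-PSS encoding following RFC 8017, assuming SHA-1 for both the hash and MGF1 (matching the SHA1 calls found above) and a 20-byte salt (an assumption). For a 2048-bit key, the 256-byte encoded message consists of a 235-byte masked DB, the 20-byte hash H, and the trailing 0xbc byte.

```python
import hashlib
import os

H_LEN = 20  # SHA-1 digest size in bytes


def mgf1(seed: bytes, length: int) -> bytes:
    """MGF1 with SHA-1 (RFC 8017, appendix B.2.1)."""
    out = b""
    for counter in range(-(-length // H_LEN)):          # ceil(length / H_LEN) blocks
        out += hashlib.sha1(seed + counter.to_bytes(4, "big")).digest()
    return out[:length]


def emsa_pss_encode(message: bytes, em_bits: int = 2047, salt: bytes = None) -> bytes:
    """EMSA-PSS-ENCODE (RFC 8017, 9.1.1) with SHA-1 / MGF1-SHA-1.

    em_bits is the modulus size minus one, i.e. 2047 for a 2048-bit key.
    """
    em_len = (em_bits + 7) // 8
    if salt is None:
        salt = os.urandom(H_LEN)                         # assumed salt length: 20 bytes
    m_hash = hashlib.sha1(message).digest()
    if em_len < H_LEN + len(salt) + 2:
        raise ValueError("encoding error")
    h = hashlib.sha1(b"\x00" * 8 + m_hash + salt).digest()
    db = b"\x00" * (em_len - len(salt) - H_LEN - 2) + b"\x01" + salt
    masked_db = bytearray(x ^ y for x, y in zip(db, mgf1(h, len(db))))
    masked_db[0] &= 0xFF >> (8 * em_len - em_bits)       # clear the unused top bit(s)
    return bytes(masked_db) + h + b"\xbc"
```

The integer that gets exponentiated is this byte string read as a big-endian number, which is exactly the value that never appears in memory in the binary.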
+
+Unfortunately (for me), Ghidra was confused by the missing jump table and produced garbage in the decompiler, so I had to decompile the function mostly by hand. Fortunately, it only had ~6 jump entries, which were not very long. After that, I ran the function while logging [data inputs and outputs](misc/rolls1.txt). There, the first part is mostly the MGF1 function, and of particular interest are these entries, since the data manipulated is 256 bytes, the size of the RSA input:
+
+```
+0x3f9 Total length: 1026 Zeros1: 84 Chunk1: 942 Input: 330 Output: 11218 Cnt: 26762
+0x812 First len: 86 Second len: 940 Source 1: 1402 Source 2: 11218 Destination: 19942 Cnt: 26763
+0x60f Initial skip: 3 Processed len: 1023 Second len: 3 Input: 19942 Output: 11218 Cnt: 26764
+0x812 First len: 1026 Second len: 0 Source 1: 7530 Source 2: 11218 Destination: 7530 Cnt: 26765
+0x812 First len: 1026 Second len: 0 Source 1: 19942 Source 2: 7530 Destination: 11218 Cnt: 26766
+0x812 First len: 1026 Second len: 0 Source 1: 3582 Source 2: 11218 Destination: 3582 Cnt: 26767
+0x812 First len: 1026 Second len: 0 Source 1: 15058 Source 2: 7530 Destination: 7530 Cnt: 26768
+0x812 First len: 1026 Second len: 0 Source 1: 7530 Source 2: 3582 Destination: 15058 Cnt: 26769
+0x812 First len: 1026 Second len: 0 Source 1: 1490 Source 2: 7530 Destination: 1490 Cnt: 26770
+0x812 First len: 1026 Second len: 0 Source 1: 11218 Source 2: 1490 Destination: 7530 Cnt: 26771
+0x812 First len: 1026 Second len: 0 Source 1: 15058 Source 2: 15058 Destination: 3582 Cnt: 26772
+0x812 First len: 1026 Second len: 0 Source 1: 7530 Source 2: 7530 Destination: 1490 Cnt: 26773
+0x2b4 Length skip:  0 Length proc:  1026 Len 00:  12 Input1 7530 Input 2 7530 Output 23150 Cnt: 26774
+```
+
+After that, the input is somehow split into 4 parts of 259 bytes each. Part of the split is simply representing the original input as a sum of two numbers. The exact nature of the further manipulation remains a mystery to me.
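Splitting a value into two shares is a common whitebox trick. A generic sketch, assuming arithmetic modulo 2^2048 for the additive split (the real modulus and representation are unknown); the XOR variant corresponds to the buffer XOR mentioned in the update further down. Neither share reveals the value on its own, but combining them restores it.

```python
import secrets

BITS = 2048
MOD = 1 << BITS   # assumed modulus for the additive shares


def split_additive(x: int):
    """Represent x as two shares (r, x - r) modulo 2**BITS."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD


def join_additive(a: int, b: int) -> int:
    return (a + b) % MOD


def split_xor(data: bytes):
    """Represent data as two equally long buffers whose XOR is the original."""
    mask = secrets.token_bytes(len(data))
    return mask, bytes(a ^ b for a, b in zip(data, mask))


def join_xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))
```

This also explains why modifying one share alone produces seemingly unrelated changes to the reconstructed input.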
+
+With this, the whole signing process is in C++ code!... so I can sign license requests, but cannot create custom inputs for decryption... yet.

 ### FaIlUrEs uPoN fAiLuReS

+By this point, i HaVe fAiLeD AlReaDy iN My gOaL oF eXtRaCtInG RsA key <_< IaM pRoBaBLy mISsinG soMeThINg trIViAl. *AGAIN*
+
+All I had left was to find a way to modify the input so that it would approach the encoded value, thereby decrypting the session key ciphertext. To do that, I tried modifying values at various steps above, then running the whole encryption/decryption cycle to see what the input looked like. Some modifications did not produce any input differences at all, hinting at redundancies/variable splitting (meaning I was modifying something that was used as obfuscation and then cancelled out). Eventually, I found a few values (steps/memory offsets) that produced "linear" modifications to the input, linear in this case meaning that a modification to a single byte resulted in localized modifications to the "input", not affecting previous input bytes unless wrapping was involved. Unfortunately, try as I might, I could not figure out the actual encoding used... Also, there were several locations affecting the input (the last 21 bytes of seed+padding and the first 235 bytes were split into different variables). Eventually, I gave up and decided to brute-force the input in 2-bit increments. Since one "decryption" operation took about 1 second, that took... quite a while.
+
+Failure, failure, FaIlURe <_< But better than nothing?
+
+**Update:** While writing this README, I kept fiddling with the input encoding, and found out that just one operation later the output was XORed with another buffer to form the original input. I also realized that the whitebox engineers (who I wish would do something more moral, like human experimentation) were lazy, and used the same buffer-to-long-form encoding tables in all locations instead of chaining them. So, after brute-forcing the table order, I managed to work out an encoding procedure that seems to work for most inputs (maybe all, but I cannot prove that), so only a single decryption operation is needed instead of ~4000 on average.
+
+Now I had a chunk of C++ code doing "decryption" and "encryption" given a long-form input guess. I needed a way to connect it to the Chrome browser. The obvious way, of course, is to use [Emscripten](https://emscripten.org/docs/getting_started/downloads.html) to compile the C++ code into WebAssembly, support for which was added to Chrome... recently enough. Emscripten also provides an initial JS wrapper for the exported functions.
+
+Luckily, a [single command](/build_wasm.bat) managed to compile the C++ file with some CryptoPP support after a few minor modifications. Originally, I moved the brute-forcing code out into JavaScript so it could be more easily interrupted and monitored. Unfortunately, while the program worked, it was unbearably slow and tended to freeze video playback. AnoTHer faIlUre <_< At least that one was later resolved.
+
+The last thing was to remove the [OAEP padding](https://datatracker.ietf.org/doc/html/rfc8017#page-19). Why is it so hard to find proper information on these formats outside of the RFCs? Unfortunately, the original repository used a library that combined decryption with padding removal, so I decided to simply put a rough implementation of the RFC into the C++ code, since it was already plenty bloated. That seemed to work well enough for the purpose.
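A minimal sketch of the padding removal in Python, following EME-OAEP decoding from RFC 8017 and assuming SHA-1 for both the hash and MGF1 with an empty label (common defaults, but assumptions here; the repo's C++ implementation is the one actually used). A matching encoder is included only so the decoder can be exercised.

```python
import hashlib
import hmac
import os

H_LEN = 20  # SHA-1 digest size in bytes


def mgf1(seed: bytes, length: int) -> bytes:
    """MGF1 with SHA-1 (RFC 8017, appendix B.2.1)."""
    out = b""
    for counter in range(-(-length // H_LEN)):
        out += hashlib.sha1(seed + counter.to_bytes(4, "big")).digest()
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def oaep_decode(em: bytes, label: bytes = b"") -> bytes:
    """EME-OAEP decoding (RFC 8017, 7.1.2 step 3) with SHA-1 / MGF1-SHA-1.

    `em` is the k-byte big-endian output of the raw RSA decryption.
    """
    k = len(em)
    if k < 2 * H_LEN + 2 or em[0] != 0:
        raise ValueError("decryption error")
    masked_seed, masked_db = em[1:1 + H_LEN], em[1 + H_LEN:]
    seed = xor(masked_seed, mgf1(masked_db, H_LEN))
    db = xor(masked_db, mgf1(seed, k - H_LEN - 1))
    if not hmac.compare_digest(db[:H_LEN], hashlib.sha1(label).digest()):
        raise ValueError("decryption error")
    rest = db[H_LEN:].lstrip(b"\x00")                    # drop the zero padding PS
    if not rest or rest[0] != 0x01:
        raise ValueError("decryption error")
    return rest[1:]                                      # the message, e.g. a session key


def oaep_encode(message: bytes, k: int = 256, seed: bytes = None, label: bytes = b"") -> bytes:
    """Matching encoder (RFC 8017, 7.1.1 step 2), used here only for round-trip tests."""
    if len(message) > k - 2 * H_LEN - 2:
        raise ValueError("message too long")
    seed = os.urandom(H_LEN) if seed is None else seed
    ps = b"\x00" * (k - len(message) - 2 * H_LEN - 2)
    db = hashlib.sha1(label).digest() + ps + b"\x01" + message
    masked_db = xor(db, mgf1(seed, k - H_LEN - 1))
    masked_seed = xor(seed, mgf1(masked_db, H_LEN))
    return b"\x00" + masked_seed + masked_db
```

The error handling is deliberately simple; a server-side implementation must not reveal which check failed, but that does not matter for a local decryptor.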
+
 ### Conclusion

-### Some references
+In the end, I only extracted about half of the RSA key. I am not sure how long the part of the key that remains in the whitebox is, though I have checked values up to about 64000 (power value, not bits). Nor am I sure why or how the input was split into 4 buffers. I am leaving this README and the scripts here in the hope that they may help when Google inevitably changes the key again. As an additional reference, the author of the original repo, Tomer8007, uploaded a writeup on the [original extraction method](https://github.com/tomer8007/widevine-l3-decryptor/wiki/Reversing-the-old-Widevine-Content-Decryption-Module), seemingly around the time I was uploading my repo. It is a lot better than mine here, so give it a read as well.
+
+All in all, it was a decent, albeit somewhat depressing, exercise that I have little desire to ever repeat. I will probably stop updating the repo soon after the README is finished, so for people who want it modified: please fork or copy it and modify it as you see fit. Attribution is appreciated, though ;)
+
+*The end.*

-https://datatracker.ietf.org/doc/html/rfc3447#page-36

-https://patentimages.storage.googleapis.com/0c/f3/c0/08ce4394385810/US20160328543A1.pdf

-https://github.com/tomer8007/widevine-l3-decryptor/wiki/Reversing-the-old-Widevine-Content-Decryption-Module
--- a/eme_interception.js
+++ b/eme_interception.js
@@ -90,7 +90,7 @@ EmeInterception.prototype.addListenersToNavigator_ = function()
 {
  if (navigator.listenersAdded_) 
    return;
-
+ console.log('Adding listeners to navigator');
  var originalRequestMediaKeySystemAccessFn = EmeInterception.extendEmeMethod(
      navigator,
      navigator.requestMediaKeySystemAccess,
@@ -114,7 +114,35 @@ EmeInterception.prototype.addListenersToNavigator_ = function()
    }.bind(this));

  }.bind(this);
+  if (navigator.mediaCapabilities)
+  {
+    if (navigator.mediaCapabilities.decodingInfo)
+    {
+      // Some players obtain their MediaKeySystemAccess through
+      // MediaCapabilities.decodingInfo() rather than requestMediaKeySystemAccess().
+      var originalDecodingInfoFn = EmeInterception.extendEmeMethod(
+          navigator.mediaCapabilities,
+          navigator.mediaCapabilities.decodingInfo,
+          "DecodingInfoCall");
+      navigator.mediaCapabilities.decodingInfo = function()
+      {
+        var modifiedArguments = arguments;

+    var result = originalDecodingInfoFn.apply(null, modifiedArguments);
+    // Attach listeners to returned MediaKeySystemAccess object
+    return result.then(function(res) 
+    {
+        //console.log(res);
+        if(res.keySystemAccess)
+      this.addListenersToMediaKeySystemAccess_(res.keySystemAccess);
+      return Promise.resolve(res);
+    }.bind(this));
+
+  }.bind(this);
+  
+      }
+  }
  navigator.listenersAdded_ = true;
 };

--- a/misc/Memsearcher.py
+++ b/misc/Memsearcher.py
@ -0,0 +1,322 @@
+# Script I used to look up values and dump constants from memory snapshots produced by the Ghidra emulator. Very rough.
+import struct
+import gzip
+import sys
+import os
+fl1=sys.argv[1]  # path to a gzip-compressed emulator snapshot (joined onto basedir)
+#fl2=sys.argv[2]
+srch=None
+if len(sys.argv)>2:
+  srch=sys.argv[2]
+def readRegister(dct,fl):
+  dt=fl.read(2)
+  if len(dt)<2: return False
+  nmlen=struct.unpack("<H",dt)[0]
+  nml=fl.read(nmlen)
+  if len(nml)<nmlen: return False
+  nm=nml.decode("ascii")
+  dt=fl.read(2)
+  if len(dt)<2: return False
+  ln=struct.unpack("<H",dt)[0]
+  if ln==16:
+    dt=fl.read(16)
+    if len(dt)<16: return False
+    vals=struct.unpack("<QQ",dt)
+    val=(vals[0]<<64)+vals[1]
+  elif ln==8:
+    dt=fl.read(8)
+    if len(dt)<8: return False
+    val=struct.unpack("<Q",dt)[0]
+  elif ln==4:
+    dt=fl.read(4)
+    if len(dt)<4: return False
+    val=struct.unpack("<I",dt)[0]
+  else:
+    dt=fl.read(2)
+    if len(dt)<2: return False
+    val=struct.unpack("<H",dt)[0]
+  if not "registers" in dct:
+    dct["registers"]={}
+  dct["registers"][nm]=val
+  return True
+def readMemoryChunk(dct,fl):
+  if not "mem" in dct:
+    dct["mem"]={}
+  dt=fl.read(8)
+  if len(dt)<8: return False
+  start=struct.unpack("<Q",dt)[0]
+  dt=fl.read(8)
+  if len(dt)<8: return False
+  ln=struct.unpack("<Q",dt)[0]
+  if ln>0:
+    dat=fl.read(ln)
+    if len(dat)<ln: return False
+    dct["mem"][start]=dat
+  return True
+def readSnapshot(dct,fl):
+  numreg=struct.unpack("<I",fl.read(4))[0]
+  for i in range(numreg):
+    if not readRegister(dct,fl):
+      print("Corrupt snapshot: not enough registers")
+      return
+  while readMemoryChunk(dct,fl):
+    pass
+
+basedir="./"
+def loadSnapshot(dct,name):
+  global basedir
+  fname=os.path.join(basedir,name)
+  with gzip.open(fname, 'rb') as f:
+    readSnapshot(dct,f)
+kt={}
+loadSnapshot(kt,fl1)
+print(kt["mem"].keys())
+import codecs
+st="22e54cd8"#"22E54CD8A10671840752EF46"
+if srch is not None:
+  st=srch
+bts=codecs.decode(st,"hex")
+def find_all(a_str, sub):
+    start = 0
+    while True:
+        start = a_str.find(sub, start)
+        if start == -1: return
+        yield start
+        start += len(sub)
+
+for offs in kt["mem"]:
+  i=list(find_all(kt["mem"][offs],bts))
+  for ko in i:
+    print("{:x}".format(offs+ko))
+def readAddr(dct,addr,nm):
+  # Return nm bytes at address addr from the first memory chunk that contains the whole range
+  for offs,dt in dct["mem"].items():
+    if offs<=addr and addr+nm<=offs+len(dt):
+      return dt[addr-offs:addr-offs+nm]
+def readULL(dct,addr):
+  dt=readAddr(dct,addr,8)
+  return struct.unpack("<Q",dt)[0]
+def readUI(dct,addr):
+  dt=readAddr(dct,addr,4)
+  return struct.unpack("<I",dt)[0]
+def readUS(dct,addr):
+  dt=readAddr(dct,addr,2)
+  return struct.unpack("<H",dt)[0]
+def readByte(dct,addr):
+  dt=readAddr(dct,addr,1)
+  return dt[0]
+
+def lesserConstShuffle(const,p1,p2):
+  global kt
+  ret=[]
+  length=(const  >> 0x24) & 0x3fff
+  offset=(const  & 0x3fffff)
+  if length>0:
+   eax=0
+   ret=[0]*length
+   for k in range(length):
+     eax=eax&0xf8
+     #print("{:b}".format(eax))
+     fl=p1[k]
+     eax=eax^fl
+     f2=(p2[k]<<8)
+     fl=f2+eax
+     f3=readByte(kt,offset+k+0x180a85ad0)<<11
+     #print(readByte(kt,offset+k+0x180a25040))
+     fl=fl+f3
+     eax=readByte(kt,0x1809cde30+fl)
+     ret[k]=eax&7
+  return ret
+def cnstShuffle(const,p1,p2):
+  global kt
+  ret=[]
+  length=(const  >> 0x24) & 0x3fff
+  offset=(const  & 0x3fffff)
+  slen=(const >>0x32)
+  if length>0:
+   eax=0
+   ret=[0]*length
+   for k in range(length):
+     eax=eax&0xf8
+     #print("{:b}".format(eax))
+     fl=p1[k]
+     eax=eax^fl
+     f2=(p2[k]<<8)
+     fl=f2+eax
+     f3=readByte(kt,offset+k+0x180a25040)<<11
+     #print(readByte(kt,offset+k+0x180a25040))
+     fl=fl+f3
+     eax=readByte(kt,0x1809cde30+fl)
+     ret[k]=eax&7
+  if slen>0:
+    while len(ret)<slen+length:
+      ret.append(0)
+    for l in range(slen):
+      k=l+length
+      eax=eax&0xf8
+      esi=(p2[k]<<8)
+      esi=esi|eax
+      eax=(readByte(kt,offset+k+0x180a25040)<<11)
+      eax=(eax^esi)
+      eax=readByte(kt,0x1809cde30+eax)
+      ret[k]=eax&7
+  return ret
+def otherShuffle(const,p1,p2):
+  global kt
+  ret=[]
+  length=(const  >> 0x24) & 0x3fff
+  offset=(const  & 0x3fffff)
+  sublen=(const >> 0x16) & 0x3fff
+  if sublen ==0:
+    eax=0
+  else:
+    eax=0
+    rtval=0
+    for a in range(sublen):
+      eax=(readByte(kt,offset+a+0x180a25040)<<11)+(p2[a]<<8)+(eax&0xf8)+(p1[a]&0x7)
+      eax=readByte(kt,0x1809cde30+eax)
+  slen=(const >>0x32)
+  if length>0:
+   eax=0
+   ret=[0]*length
+   for k in range(length):
+     eax=eax&0xf8
+     fl=p1[k+sublen]
+     eax=eax^fl
+     f2=(p2[k+sublen]<<8)
+     fl=f2+eax
+     f3=readByte(kt,offset+k+sublen+0x180a25040)<<11
+     #print(readByte(kt,offset+k+0x180a25040))
+     fl=fl+f3
+     eax=readByte(kt,0x1809cde30+fl)
+     ret[k]=eax&7
+  if slen>0:
+    while len(ret)<slen+length+sublen:
+      ret.append(0)
+    for l in range(slen):
+      k=l+length+sublen
+      eax=eax&0xf8
+      esi=(p2[k]<<8)
+      esi=esi|eax
+      eax=(readByte(kt,offset+k+0x180a25040)<<11)
+      eax=(eax^esi)
+      eax=readByte(kt,0x1809cde30+eax)
+      ret[k]=eax&7
+  return ret
+
+def cnPack(lst,stp=0):
+  eax=0
+  ret=[]
+  for k in range(stp,len(lst),1):
+    ebx=lst[k]&3
+    ebx=(ebx<<(eax&6))# 110
+    ecx=((k-stp)>>2)
+    while len(ret)<ecx+1: ret.append(0)
+    ret[ecx]|=ebx
+    eax+=2
+  return ret
+def cnUnpack(lst,ln=None):
+  ret=[]
+  for k in range(len(lst)):
+    l=lst[k]
+    ret.append(l&0x3)
+    ret.append((l>>2)&0x3)
+    ret.append((l>>4)&0x3)
+    ret.append((l>>6)&0x3)
+  if ln is not None:
+   while(len(ret)<ln): ret=[0,*ret]
+  return ret
+
+"""
+Tables:
+Each side has 3 bits (permuted), i.e. 8 values, 4 of which have carry.
+Many tables do who-knows-what XD
+Some are "sum" tables: they can be recognized by the fact that each row contains all values when varied (and can be permuted into a symmetrical form).
+With the additional "carry", sum tables can affect up to 3 cells forward by variation (more if other cells are full).
+Some are normalizers: they work on doubled arguments (a=b) and allow the data to be packed afterwards...
+a+carry(b) table? (carry goes to the next "digit" - interleaved??)
+Each table has separate encodings for a, b, c (8x8 table).
+etc ...
+"""
+
+#0x12000027448  -sum table, it seems
+#0x1200002b000  - another sum table?
+#0x12000026a1b - normalizer table...
+#0x25000037501 - carry flipper?
+#0x1000002cd3a  - carry ... no-flipper? not sure
+for q in range(8):
+ ls=[]
+ for w in range(8):
+  a2=[1]*0x40e
+  a1=[1]*0x40e
+  a1[0]=0
+  a2[0]=0
+  a1[1]=q
+  a2[1]=1
+  a1[2]=w
+  a2[2]=w
+  #a3=cnstShuffle(0x12000033f37,a1,a2) #0x120000054d6
+  a3=cnstShuffle(0x12000033f37,a1,a1) #0x120000054d6
+  #print(a3)
+  ls.append(a3[4])
+ print("{}:  {}".format(q,ls))
+carry=0
+
+def printTC(num):
+ offs=num<<11
+ ecr=set()
+ print("TC {}".format(num))
+ for carry in range(32):
+  st=[]
+  for q in range(8):
+   dm=offs+(q<<8)+(carry<<3)+q
+   dt=readByte(kt,0x1809cde30+dm)
+   ec=dt>>3
+   vl=dt&7
+   st.append(vl)
+   ecr.add(ec)
+  print("{}: {}".format(carry,st))
+ print(ecr)
+printTC(22)
+printTC(47)
+sys.exit()  # stop here; the table and constant dumps below are currently disabled
+for ss in range(256):
+    offs=ss<<11
+    crr=set()
+    print("Table # {}".format(ss))
+    for q in range(8):
+     ls=[]
+     for w in range(8):
+       dm=offs+(q<<8)+(carry<<3)+w
+       dt=readByte(kt,0x1809cde30+dm)
+       ls.append([dt>>3,dt&0x7])
+       crr.add(dt>>3)
+     print("{}:  {}".format(q,ls))  
+    print("Carries: {} {}".format(len(crr),crr))
+print("{:x}".format(readULL(kt,0x181253ac8)))
+llen=38848
+
+with open("dats.lg","r") as fl:
+  for ln in fl:
+    ls=ln.strip()
+    if "DAT" in ls:
+      offs=ls.split("_")[1]
+      l=[]
+      for i in range(llen):
+       l.append(readByte(kt,int("0x"+offs,16)+i))
+      ssl='{'+', '.join(["{}".format(k) for k in l])+'};'
+      print("unsigned char DAT_{} [{}]={}".format(offs,len(l),ssl))
+    elif "INT" in ls:
+      offs=ls.split("_")[1]
+      l=[]
+      for i in range(llen):
+       l.append(readUI(kt,int("0x"+offs,16)+i*4))
+      ssl='{'+', '.join(["{}".format(k) for k in l])+'};'
+      print("unsigned int INT_{} [{}]={}".format(offs,len(l),ssl))
+    elif "QWORD" in ls:
+      offs=ls.split("_")[1]
+      l=[]
+      for i in range(llen):
+       l.append(readULL(kt,int("0x"+offs,16)+i*8))
+      ssl='{'+', '.join(["{}".format(k) for k in l])+'};'
+      print("unsigned long long QWORD_{} [{}]={}".format(offs,len(l),ssl))
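The lookup loops in `cnstShuffle` and `printTC` index one flat byte array (at `0x1809cde30`) as `(table << 11) + (b << 8) + (prev & 0xf8) + a`, where `prev` is the previous output byte. The top five bits of each output byte are fed back in as the state for the next digit and the low three bits are kept as the output digit. The sketch below models that chaining with a synthetic table, assuming an unpermuted base-8 addition table with the carry in state bit 0; the real tables use permuted encodings and do other things as well. `build_add_table`, `chain` and the digit helpers are hypothetical names, not part of the script.

```python
# Hypothetical model of the chained table lookups in cnstShuffle/printTC.
# Assumed layout (read off the script): index = (table << 11) + (b << 8) + (prev & 0xf8) + a,
# output byte = (next_state << 3) | digit. Digits here are plain base-8 values;
# the real whitebox tables use permuted encodings.

TABLE_SIZE = 1 << 11  # 8 values of b * 32 states * 8 values of a


def build_add_table() -> bytes:
    """Synthetic table: digit = (a + b + carry) mod 8, carry kept in state bit 0."""
    table = bytearray(TABLE_SIZE)
    for b in range(8):
        for state in range(32):
            carry = state & 1
            for a in range(8):
                total = a + b + carry
                next_carry, digit = total >> 3, total & 7
                table[(b << 8) | (state << 3) | a] = (next_carry << 3) | digit
    return bytes(table)


def chain(tables: bytes, table_ids, a_digits, b_digits):
    """Feed digits through the tables least significant first, as cnstShuffle does."""
    out = []
    prev = 0  # previous output byte; its top 5 bits are the state
    for table_id, a, b in zip(table_ids, a_digits, b_digits):
        prev = tables[(table_id << 11) + (b << 8) + ((prev & 0xf8) ^ a)]
        out.append(prev & 7)
    return out


def to_octal_digits(n: int, width: int):
    """Split n into `width` base-8 digits, least significant first."""
    return [(n >> (3 * i)) & 7 for i in range(width)]


def from_octal_digits(digits):
    """Inverse of to_octal_digits."""
    return sum(d << (3 * i) for i, d in enumerate(digits))


tables = build_add_table()  # a single table, so every digit uses table id 0
width = 6
digits = chain(tables, [0] * width, to_octal_digits(0o1234, width), to_octal_digits(0o7654, width))
print(oct(from_octal_digits(digits)))  # 0o11110
```

Because the carry only ever lives in the state bits of the previous output byte, a single lookup per digit is enough to propagate it, which is consistent with the "carry" behaviour noted in the script's comments.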
--- a/misc/oaep.py
+++ b/misc/oaep.py
@ -0,0 +1,31 @@
+# Simple script that shows how to remove the OAEP masking from an RSA-decrypted block given as a big integer
+import hashlib
+
+def i2osp(integer, size):
+    return bytes([((integer >> (8 * i)) & 0xFF) for i in reversed(range(size))])
+
+def mgf1(input_str, length, hash=hashlib.sha1):
+    """Mask generation function."""
+    counter = 0
+    output = b""
+    while len(output) < length:
+        C = i2osp(counter, 4)
+        output += hash(input_str + C).digest()
+        counter += 1
+    return output[:length]
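+def decode(masked_db,masked_seed):
+  seed_mask=mgf1(masked_db,20)  # seedMask = MGF1(maskedDB, hLen), hLen=20 for SHA-1
+  sd=bytearray()
+  for (a,b) in zip(masked_seed,seed_mask):
+   sd.append(a^b)
+  seed=bytes(sd)  # seed = maskedSeed XOR seedMask
+  db_mask=mgf1(seed,len(masked_db))  # dbMask = MGF1(seed, len(maskedDB))
+  ret=bytearray()
+  for (a,b) in zip(masked_db,db_mask):
+   ret.append(a^b)
+  return bytes(ret)  # DB = lHash || PS || 0x01 || M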
+ii=int("e4ae6c475d00d73552eae63d3456cd59f17e0f4bbad2a587d34c774658b9b5ce7857491e6e06fbc79cc8f688ad20e9c2f6d65419b3ec86657c1b87a80cd4a5c012a1d7571b842ff7c0f56c1d83ae003b73e73633f65f4c3644f0570c57dffa72f7e00788365a0726511b05bb3d440777770742cc776f3266456755b803b3743a0cd1b139d2a8522b1f6e4970afd74096a9e11abbdbfdb06b10a529877840e825d42b117c285bb064fc4778dd4242cb2e9df49e63c3ab60dc54a0f2d45126683bb71602bf5963468e56e8e84bc6c58c3c68f4670b080937db93aa22d90f35d8e8767654965f40b2fde20a84d2d57e9e12ecf9dddf02c3943cb0d2f513d0c965",16).to_bytes(256,'big')
+print("isValidPrefix={}".format(ii[0]==0))
+print("Masked data: {}".format(ii[21:].hex()))
+print("Seed Mask: {}".format(mgf1(ii[21:],20).hex()))
+print("Unmasked data: {}".format(decode(ii[21:],ii[1:21]).hex()))
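`oaep.py` stops after removing the two MGF1 masks, which leaves `DB = lHash || PS || 0x01 || M`. A sketch of the remaining EME-OAEP decoding checks, following RFC 8017 section 7.1.2 (RFC 3447 describes the same steps), assuming SHA-1 with 20-byte hashes as `oaep.py` does and an empty label; the empty label is an assumption, not something the script checks. `oaep_encode` only exists to produce test input, and both function names are hypothetical.

```python
import hashlib
import os

H_LEN = hashlib.sha1().digest_size  # 20 bytes, as in oaep.py


def mgf1(seed: bytes, length: int) -> bytes:
    """MGF1 with SHA-1 (RFC 8017, appendix B.2.1)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha1(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def oaep_encode(message: bytes, k: int, seed: bytes, label: bytes = b"") -> bytes:
    """EME-OAEP encoding: EM = 0x00 || maskedSeed || maskedDB (k = modulus size in bytes)."""
    if len(message) > k - 2 * H_LEN - 2:
        raise ValueError("message too long")
    l_hash = hashlib.sha1(label).digest()
    ps = b"\x00" * (k - len(message) - 2 * H_LEN - 2)
    db = l_hash + ps + b"\x01" + message
    masked_db = xor(db, mgf1(seed, k - H_LEN - 1))
    masked_seed = xor(seed, mgf1(masked_db, H_LEN))
    return b"\x00" + masked_seed + masked_db


def oaep_decode(em: bytes, label: bytes = b"") -> bytes:
    """EME-OAEP decoding of an already RSA-decrypted block; returns the message M."""
    if len(em) < 2 * H_LEN + 2 or em[0] != 0:
        raise ValueError("decryption error")
    masked_seed, masked_db = em[1:1 + H_LEN], em[1 + H_LEN:]
    seed = xor(masked_seed, mgf1(masked_db, H_LEN))
    db = xor(masked_db, mgf1(seed, len(masked_db)))
    if db[:H_LEN] != hashlib.sha1(label).digest():
        raise ValueError("decryption error")
    rest = db[H_LEN:].lstrip(b"\x00")  # skip the zero padding PS
    if not rest or rest[0] != 1:
        raise ValueError("decryption error")
    return rest[1:]  # everything after the 0x01 separator is M


em = oaep_encode(b"content key", 256, os.urandom(H_LEN))
print(oaep_decode(em))  # b'content key'
```

Every failed check raises the same error, since RFC 8017 advises against revealing which check failed. Calling `oaep_decode` on the 256-byte `ii` from `oaep.py` should return M directly, provided the label really is empty.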
--- a/misc/rolls1.txt
+++ b/misc/rolls1.txt