DEV Community

Gealber Morales

Posted on • Originally published at gealber.com

Challenge RE #7

Introduction

As you can tell from my previous posts, I've been poisoning myself with small doses of assembly language and C. The best combination for a fast effect 😁; soon I will be immune to it. To be honest, I've been enjoying these challenges because all of them are accessible, in the sense David Hilbert described:

A mathematical problem should be difficult in order to entice us, yet not completely inaccessible, lest it mock at our efforts. It should be to us a guide post on the mazy paths to hidden truths, and ultimately a reminder of our pleasure in the successful solution.

A challenge should be enjoyable without being frustrating. I wanted to point that out because the author of these challenges, Dennis Yurichev, did a great job with them.

Enough talk. Let's look at the 7th challenge. The assembly code to understand is the following:

<f>:
   0: movzx  edx,BYTE PTR [rdi]
   3: mov    rax,rdi
   6: mov    rcx,rdi
   9: test   dl,dl
   b: je     29
   d: nop    DWORD PTR [rax]
  10: lea    esi,[rdx-0x41]
  13: cmp    sil,0x19
  17: ja     1e
  19: add    edx,0x20
  1c: mov    BYTE PTR [rcx],dl
  1e: add    rcx,0x1
  22: movzx  edx,BYTE PTR [rcx]
  25: test   dl,dl
  27: jne    10
  29: repz ret

Analysis

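The first four instructions give the impression that rdi holds a pointer to a string, especially because of the byte load from [rdi] and the jump straight to the end of the function when that byte is zero, i.e. the '\0' terminator. The first character is zero-extended into the edx register, while the pointer is copied into both rcx, which is used to walk the string, and rax, which holds the return value in the System V AMD64 calling convention. That suggests f returns the original pointer. Let's keep describing the program before settling on a complete signature of f.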

Next, we find the following instructions:

lea    esi,[rdx-0x41]
cmp    sil,0x19
ja     1e

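The interesting part here is lea esi,[rdx-0x41]. Despite its name, lea is used here only for arithmetic: it computes edx - 0x41 into esi without touching the flags. Why 0x41? Because 0x41 (65) is the ASCII code for the character 'A'. Then cmp sil,0x19 compares the low byte of that result against 0x19 (25, the distance from 'A' to 'Z'), and ja (jump if above) treats the comparison as unsigned. Any character below 'A' wraps around to a large unsigned value, so this single comparison checks both bounds at once: the jump is taken whenever the character is NOT between 'A' and 'Z'. That covers lowercase letters, but also digits, punctuation and every other byte that is not an uppercase English letter. In all those cases we jump to address 1e.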

At address 1e we find the following instructions:

add    rcx,0x1
movzx  edx,BYTE PTR [rcx]
test   dl,dl
jne    10

These instructions advance rcx to the next character, load it into edx, and jump back to the top of the loop at address 10 unless that character is the '\0' terminator.

The last instructions to analyze are these:

add    edx,0x20
mov    BYTE PTR [rcx],dl

Remember that execution only reaches this point when ja did not jump, so the character must be an uppercase letter between 'A' and 'Z'. Adding 0x20 to an uppercase letter gives its corresponding lowercase letter, and mov BYTE PTR [rcx],dl writes the result back into the string in place.

For example:

The ASCII code of 'A' is 65; adding 0x20 (32) gives 97, which is indeed the ASCII code of 'a'.

With this we know what the code does: it lowercases the given string in place. In C it would look like this:

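char *f(char *str)
{
    char *p = str; /* rcx walks the string, rax keeps str */

    if (*p == '\0')
        return str;

    while (*p != '\0') {
        /* unsigned compare, like cmp sil,0x19 / ja 1e */
        if ((unsigned char)(*p - 0x41) > 0x19) {
            p++;
            continue;
        }
        /* lowercase an uppercase Latin letter */
        *p += 0x20;
        p++;
    }
    return str;
}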

The loop can be expressed more concisely as:

char *f(char *str)
{
    for (char *p = str; *p != '\0'; p++) {
        if ((unsigned char)(*p - 0x41) <= 0x19)
            *p += 0x20;
    }
    return str;
}

Conclusion

That's it! The code performs a basic in-place lowercasing of an ASCII string: only 'A' to 'Z' are changed, and every other byte is left untouched.
