How to Read a Character in Assembly Language
Arrays, Address Arithmetic, and Strings
CS 301: Assembly Language Programming Lecture, Dr. Lawlor
In both C and assembly, you can allocate and access memory in several different sizes:
| C/C++ datatype | Bits | Bytes | Register | Access memory | Allocate memory |
| char | 8 | 1 | al | BYTE [ptr] | db |
| short | 16 | 2 | ax | WORD [ptr] | dw |
| int | 32 | 4 | eax | DWORD [ptr] | dd |
| long | 64 | 8 | rax | QWORD [ptr] | dq |
For example, we can put full 64-bit numbers into memory using "dq" (Data Quad-word), and then read them back out with QWORD[yourLabel].
We can put individual bytes into memory using "db" (Data Byte), and then read them back with BYTE[yourLabel].
C Strings in Assembly
In plain C, you can put a string on the screen with the standard C library "puts" function:
puts("Yo!");
(Try this in NetRun now!)
You can expand this out a bit, by declaring a string variable. In C, strings are stored as (constant) character pointers, or "const char *":
const char *theString="Yo!";
puts(theString);
(Try this in NetRun now!)
Internally, the compiler does two things:
- Allocates memory for the string, and initializes the memory to 'Y', 'o', '!', and a special zero byte called a nul terminator that marks the end of the string.
- Points theString to this allocated memory.
In assembly, these are separate steps:
- Allocate memory with the db (Data Byte) pseudo instruction, and store characters there, like db `Yo!`,0
  - Unlike C++, you can declare a string using any of the three quotes: "doublequotes", 'singlequotes', or `backticks` (backtick is on your keyboard beneath tilde ~)
  - However, newlines like \n only work inside backticks, an odd peculiarity of the assembler we use (nasm).
  - Note we manually added ,0 after the string to insert a zero byte to terminate the string.
  - If you forget to terminate the string, puts can print garbage after the string until it hits a 0.
- Point at this memory using a jump label, just like we were going to jmp to the string.
Here's an example:
mov rdi, theString ; rdi points to our string
extern puts ; declare the function
call puts ; call it
ret
theString: ; label, just like for jumping
db `Yo!`,0 ; data bytes for string (don't forget nul!)
(Try this in NetRun now!)
In assembly, there's no syntax difference between:
- a label designed for a jump instruction (a block of code)
- a label designed for a call instruction (a function ending in ret)
- a label designed as a string pointer (a nul-terminated string)
- a label designed as a data pointer (allocated with dq)
- or many other uses--it's simply a pointer!
We can also change the pointer, to move down the string. Since each char is one byte, moving by 4 bytes moves by 4 chars here, printing "o assembly":
mov rdi, theString ; rdi points to our string
add rdi,4 ; move down the string by 4 chars
extern puts ; declare the function
call puts ; call it
ret
theString: ; label, just like for jumping
db `Hello assembly`,0 ; data bytes for string
(Try this in NetRun now!)
Address Arithmetic
If you allocate more than one constant with dq, they appear at larger addresses. (Recall that this is backwards from the stack, which pushes each additional item at an ever-smaller address.) So this reads the 5, like you'd expect:
dos_equis:
dq 5 ; writes this constant into a "Data Qword" (8 byte block)
dq 13 ; writes another constant, at [dos_equis+8] (bytes)
foo:
mov rax, [dos_equis] ; read memory at this label
ret
(Try this in NetRun now!)
Adding 8 bytes (the size of a dq, an 8-byte / 64-bit QWORD) to the address of the first constant puts us directly on top of the second constant, 13:
dos_equis:
dq 5 ; writes this constant into a "Data Qword" (8 byte block)
dq 13 ; writes another constant, at [dos_equis+8] (bytes)
foo:
mov rax, [dos_equis+8] ; read memory at this label, plus 8 bytes
ret
(Try this in NetRun now!)
Accessing an Array
An "array" is just a sequence of values stored in ascending order in memory. If we listed our data with "dq", they show up in memory in that order, so we can do pointer arithmetic to pick out the value we want. This returns 7:
mov rcx,my_arr ; rcx == address of the array
mov rax,QWORD [rcx+1*8] ; load element 1 of array
ret
my_arr:
dq 4 ; array element 0, stored at [my_arr]
dq 7 ; array element 1, stored at [my_arr+8]
dq 9 ; array element 2, stored at [my_arr+16]
(Try this in NetRun now!)
Did you ever wonder why the first array element is [0]? It's because it's zero bytes from the start of the pointer!
Keep in mind that each array element above is a "dq" or an 8-byte long, so I move down by 8 bytes during indexing, and I load into the 64-bit "rax".
If the array is of 4-byte integers, we'd declare them with "dd" (data DWORD), move down by 4 bytes per int array element, and store the answer in a 32-bit register like "eax". But the pointer register is always 64 bits!
mov rcx,my_arr ; rcx == address of the array
mov eax,DWORD [rcx+1*4] ; load element 1 of array
ret
my_arr:
dd 0xaaabbbcc ; array element 0, stored at [my_arr]
dd 0xc001007 ; array element 1, stored at [my_arr+4]
(Try this in NetRun now!)
It's extremely easy to have a mismatch between one or the other of these sizes. For example, if I declare values with dw (2 byte shorts), but load them into eax (4 bytes), I'll have loaded two values into one register. So this code returns 0xbeefaabb, which is two 16-bit values combined into one 32-bit register:
mov rcx,my_arr ; rcx == address of the array
mov eax,[rcx] ; load element 0 of array (OOPS! 32-bit load!)
ret
my_arr:
dw 0xaabb ; array element 0, stored at [my_arr]
dw 0xbeef ; array element 1, stored at [my_arr+2]
(Try this in NetRun now!)
You can reduce the likelihood of this type of error by adding an explicit memory size specifier, like "WORD" below. That makes this a compile error ("error: mismatch in operand sizes") instead of returning the wrong value at runtime.
mov rcx,my_arr ; rcx == address of the array
mov eax, WORD [rcx] ; load element 0 of array (OOPS! 32-bit load!)
ret
my_arr:
dw 0xaabb ; array element 0, stored at [my_arr]
dw 0xbeef ; array element 1, stored at [my_arr+2]
(Try this in NetRun now!)
(If we really wanted to load a 16-bit value into a 32-bit register, we could use "movzx" (unsigned) or "movsx" (signed) instead of a plain "mov".)
| C++ | Bits | Bytes | Assembly Create | Assembly Read | Example |
| char | 8 | 1 | db (data byte) | mov al, BYTE [rcx+i*1] | (Try this in NetRun now!) |
| short | 16 | 2 | dw (data WORD) | mov ax, WORD [rcx+i*2] | (Try this in NetRun now!) |
| int | 32 | 4 | dd (data DWORD) | mov eax, DWORD [rcx+i*4] | (Try this in NetRun now!) |
| long | 64 | 8 | dq (data QWORD) | mov rax, QWORD [rcx+i*8] | (Try this in NetRun now!) |
| Human | C++ | Assembly |
| Declare a long integer. | long y; | rdx (nothing to declare, just use a register) |
| Copy one long integer to another. | y=x; | mov rdx,rax |
| Declare a pointer to a long. | long *p; | rax (nothing to declare, use any 64-bit register) |
| Dereference (look up) the long. | y=*p; | mov rdx,QWORD [rax] |
| Find the address of a long. | p=&y; | mov rax,place_you_stored_Y |
| Access an array (easy way) | y=p[2]; | (sorry, no easy way exists!) |
| Access an array (hard way) | p=p+2; y=*p; | add rax,2*8; (move forward by two 8 byte longs) mov rdx, QWORD [rax] ; (grab that long) |
| Access an array (too clever) | y=*(p+2) | mov rdx, QWORD [rax+2*8]; (yes, that really works!) |
Loading from the wrong place, or loading the wrong amount of data, is an INCREDIBLY COMMON problem when using pointers, in any language. You WILL make this mistake at some point over the course of the semester, and this results in a crash (rare) or the wrong data (most often some strange shifted & spliced integer), so be careful!
Walking Pointers Down Arrays
There's a classic terse C idiom for iterating through a string, by incrementing a char * to walk down through the bytes until you hit the zero byte at the end:
while (*p++!=0) { /* do something to *p */ }
If you unpack this a bit, you find:
- p points to the first char in the string.
- *p is the first char in the string.
- p++ adds 1 to the pointer, moving to the next char in the string.
- *p++ extracts the first char, and moves the pointer down.
- *p++!=0 checks if the first char is zero (the end of the string), and moves the pointer down
Here's a typical example, in C:
char s[]="string"; // declare a string
char *p=s; // point to the start
while (*p++!=0) if (*p=='i') *p='a'; // replace i with a
puts(s);
(Try this in NetRun now!)
Here's a similar pointer-walking trick, in assembly:
mov rdi,stringStart
again:
add rdi,1 ; move pointer down the string
cmp BYTE[rdi],'a' ; did we hit the letter 'a'?
jne again ; if not, keep looking
extern puts
call puts
ret
stringStart:
db 'this is a great string',0
(Try this in NetRun now!)
(We'll see how to declare modifiable strings later.)
Source: https://www.cs.uaf.edu/2017/fall/cs301/lecture/09_15_strings_arrays.html