How to Read a Character in Assembly Language

Arrays, Address Arithmetic, and Strings

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

In both C and assembly, you can allocate and access memory in several different sizes:

C/C++ datatype   Bits   Bytes   Register   Access memory   Allocate memory
char             8      1       al         BYTE [ptr]      db
short            16     2       ax         WORD [ptr]      dw
int              32     4       eax        DWORD [ptr]     dd
long             64     8       rax        QWORD [ptr]     dq

For example, we can put full 64-bit numbers into memory using "dq" (Data Quad-word), and then read them back out with QWORD[yourLabel].

We can put individual bytes into memory using "db" (Data Byte), and then read them back with BYTE[yourLabel].
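
For comparison, here is a rough C-side sketch of the same sizes. It uses the fixed-width types from <stdint.h> as stand-ins for BYTE, WORD, DWORD, and QWORD. The typedef names and the read_qword helper are made up for illustration, and treating "long" as 64 bits assumes a 64-bit Linux-style machine.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names: fixed-width stand-ins for the assembler's sizes.
   (Plain "long" is 64 bits on 64-bit Linux, but not on every platform.) */
typedef uint8_t  asm_byte;   /* BYTE,  allocated with db */
typedef uint16_t asm_word;   /* WORD,  allocated with dw */
typedef uint32_t asm_dword;  /* DWORD, allocated with dd */
typedef uint64_t asm_qword;  /* QWORD, allocated with dq */

/* Roughly "mov rax, QWORD [ptr]": copy 8 bytes out of memory into a value. */
asm_qword read_qword(const void *ptr) {
    asm_qword value;
    memcpy(&value, ptr, sizeof value);
    return value;
}
```

Reading back a value you stored yourself, as with QWORD[yourLabel], returns the same number you put in.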

C Strings in Assembly

In plain C, you can put a string on the screen with the standard C library "puts" function:

puts("Yo!");      

(Try this in NetRun now!)

You can expand this out a bit, by declaring a string variable.  In C, strings are stored as (constant) character pointers, or "const char *":

const char *theString="Yo!";
puts(theString);

(Try this in NetRun now!)

Internally, the compiler does two things:

  • Allocates memory for the string, and initializes the memory to 'Y', 'o', '!', and a special zero byte called a nul terminator that marks the end of the string.
  • Points theString to this allocated memory.
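
To make those two steps visible from the C side, here is a small sketch that spells out the four bytes by hand and compares them with the usual string literal. The names the_bytes, the_string, and same_bytes are hypothetical, chosen just for this illustration.

```c
#include <string.h>

/* Spelled out byte by byte: three characters plus the nul terminator. */
static const char the_bytes[] = { 'Y', 'o', '!', 0 };

/* The usual shorthand; the compiler appends the zero byte itself. */
static const char *the_string = "Yo!";

/* Returns 1 if both spellings hold the same four bytes. */
int same_bytes(void) {
    return memcmp(the_bytes, the_string, sizeof the_bytes) == 0;
}
```

Note that strlen reports 3 for this string, but the storage is 4 bytes, because the terminator takes up space too.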

In assembly, these are separate steps:

  • Allocate memory with the db (Data Byte) pseudo instruction, and store characters there, like    db `Yo!`,0
    • Unlike C++, you can declare a string using any of the three quotes: "doublequotes", 'singlequotes', or `backticks` (backtick is on your keyboard beneath tilde ~)
    • However, newlines like \n ONLY work inside backticks, an odd peculiarity of the assembler we use (nasm).
  • Note we manually added ,0 after the string to insert a zero byte to terminate the string.
    • If you forget to terminate the string, puts can print garbage after the string until it hits a 0.
  • Point at this memory using a jump label, just like we were going to jmp to the string.

Here's an example:

mov rdi, theString ; rdi points to our string
extern puts  ; declare the function
call puts    ; call it
ret

theString:    ; label, just like for jumping
	db `Yo!`,0  ; data bytes for string (don't forget nul!)

(Try this in NetRun now!)

In assembly, there's no syntax difference between:
  • a label designed for a jump instruction (a block of code)
  • a label designed for a call instruction (a function ending in ret)
  • a label designed as a string pointer (a nul-terminated string)
  • a label designed as a data pointer (allocated with dq)
  • or many other uses--it's just a pointer!

We can also change the pointer, to move down the string.  Since each char is one byte, moving by 4 bytes moves by 4 chars here, printing "o assembly":

mov rdi, theString ; rdi points to our string
add rdi,4 ; move down the string by 4 chars
extern puts ; declare the function
call puts ; call it
ret

theString: ; label, just like for jumping
db `Hello assembly`,0 ; data bytes for string

(Try this in NetRun now!)
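
The same move can be sketched in C, where adding to a char pointer also counts in bytes. The helper name skip_chars is hypothetical, and like the assembly it does no bounds checking at all.

```c
#include <stddef.h>

/* Like "add rdi, n": adding n to a char pointer moves n bytes, so n chars.
   The caller must keep n within the string; nothing here checks. */
const char *skip_chars(const char *s, size_t n) {
    return s + n;
}
```

Calling puts(skip_chars("Hello assembly", 4)) prints "o assembly", matching the assembly version above.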

Address Arithmetic

If you allocate more than one constant with dq, they appear at larger addresses.  (Recall that this is backwards from the stack, which pushes each additional item at an ever-smaller address.)  So this reads the 5, like you'd expect:

dos_equis:
	dq 5   ; writes this constant into a "Data Qword" (8 byte block)
	dq 13  ; writes another constant, at [dos_equis+8] (bytes)

foo:
	mov rax, [dos_equis] ; read memory at this label
	ret

(Try this in NetRun now!)

Adding 8 bytes (the size of a dq, 8-byte / 64-bit QWORD) from the first constant puts us directly on top of the second constant, 13:

dos_equis:
	dq 5   ; writes this constant into a "Data Qword" (8 byte block)
	dq 13  ; writes another constant, at [dos_equis+8] (bytes)

foo:
	mov rax, [dos_equis+8] ; read memory at this label, plus 8 bytes
	ret

(Try this in NetRun now!)

If you add anything between 0 and 8, like adding 1 byte, you will load part of the 5 and part of the 13, resulting in a weirdly split and shifted result.
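
To see what that split result looks like, here is a C sketch that models the 16 bytes of the two constants and loads 8 bytes from any offset. It builds the little-endian byte layout by hand (the order x86 uses), so it does not depend on the machine running it. All three function names are hypothetical.

```c
#include <stdint.h>

/* Store v as 8 little-endian bytes, the way x86 lays out a "dq". */
void store_qword_le(uint8_t *dst, uint64_t v) {
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(v >> (8 * i));
}

/* Load 8 little-endian bytes starting at src, like "mov rax, QWORD [src]". */
uint64_t load_qword_le(const uint8_t *src) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)src[i] << (8 * i);
    return v;
}

/* Lay out "dq 5" then "dq 13", then load 8 bytes starting at byte_offset.
   byte_offset must be between 0 and 8, or the load leaves the 16 bytes. */
uint64_t load_at_offset(int byte_offset) {
    uint8_t mem[16];
    store_qword_le(mem, 5);
    store_qword_le(mem + 8, 13);
    return load_qword_le(mem + byte_offset);
}
```

Offsets 0 and 8 give back 5 and 13. Offset 1 drops the low byte that held the 5 and picks up the low byte of the 13 as its top byte, giving 0x0D00000000000000.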

Accessing an Array

An "array" is just a sequence of values stored in ascending order in memory.  If we listed our data with "dq", they show up in memory in that order, so we can do pointer arithmetic to pick out the value we want.  This returns 7:

mov rcx,my_arr ; rcx == address of the array
mov rax,QWORD [rcx+1*8] ; load element 1 of array
ret

my_arr:
dq 4 ; array element 0, stored at [my_arr]
dq 7 ; array element 1, stored at [my_arr+8]
dq 9 ; array element 2, stored at [my_arr+16]

(Try this in NetRun now!)

Did you ever wonder why the first array element is [0]?  It's because it's zero bytes from the start of the pointer!

Keep in mind that each array element above is a "dq" or an 8-byte long, so I move down by 8 bytes during indexing, and I load into the 64-bit "rax".
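
C normally hides the multiply-by-element-size step, but we can write it out by hand. The sketch below counts in bytes the way the assembly does. The load_element name is hypothetical, and the "i*8" in the comment assumes an 8-byte long, as on 64-bit Linux.

```c
#include <stddef.h>

static const long my_arr[] = { 4, 7, 9 };  /* same values as the dq list above */

/* Spell out what the compiler does for base[i]: start at the base address,
   move down i * sizeof(long) bytes, and load a long from there. */
long load_element(const long *base, size_t i) {
    const char *byte_ptr = (const char *)base;  /* count in bytes, like rcx */
    byte_ptr += i * sizeof(long);               /* [rcx + i*8] when long is 8 bytes */
    return *(const long *)byte_ptr;
}
```

So load_element(my_arr, 1) returns 7, and element 0 really is zero bytes from the start.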

If the array is of 4-byte integers, we'd declare them with "dd" (data DWORD), move down by 4 bytes per int array element, and store the answer in a 32-bit register like "eax".  But the pointer register is always 64 bits!

mov rcx,my_arr ; rcx == address of the array
mov eax,DWORD [rcx+1*4] ; load element 1 of array
ret

my_arr:
dd 0xaaabbbcc ; array element 0, stored at [my_arr]
dd 0xc001007 ; array element 1, stored at [my_arr+4]

(Try this in NetRun now!)

It's extremely easy to have a mismatch between one or the other of these values.  For example, if I declare values with dw (2 byte shorts), but load them into eax (4 bytes), I'll have loaded two values into one register.  So this code returns 0xbeefaabb, which is two 16-bit values combined into one 32-bit register:

mov rcx,my_arr ; rcx == address of the array
mov eax,[rcx] ; load element 0 of array (OOPS! 32-bit load!)
ret

my_arr:
dw 0xaabb ; array element 0, stored at [my_arr]
dw 0xbeef ; array element 1, stored at [my_arr+2]

(Try this in NetRun now!)
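
Here is a C sketch of why the combined value comes out as 0xbeefaabb. It writes out the four bytes the two dw lines produce on a little-endian machine like x86, then does a 16-bit and a 32-bit load from them. The array and function names are hypothetical.

```c
#include <stdint.h>

/* The four bytes that "dw 0xaabb" then "dw 0xbeef" place in memory on x86
   (little-endian: the low byte of each value comes first). */
static const uint8_t my_arr_bytes[4] = { 0xbb, 0xaa, 0xef, 0xbe };

/* A 16-bit load, like "mov ax, WORD [rcx]". */
uint16_t load_word_le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* A 32-bit load, like "mov eax, DWORD [rcx]": it swallows both shorts. */
uint32_t load_dword_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

The 16-bit load gets 0xaabb as intended. The 32-bit load gets 0xbeefaabb, with element 1 sitting in the upper half of the register.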

You can reduce the likelihood of this type of error by adding an explicit memory size specifier, like "WORD" below.  That makes this a compile error ("error: mismatch in operand sizes") instead of returning the wrong value at runtime.

mov rcx,my_arr ; rcx == address of the array
mov eax, WORD [rcx] ; load element 0 of array (assembler error: WORD is 16 bits, eax is 32)
ret

my_arr:
dw 0xaabb ; array element 0, stored at [my_arr]
dw 0xbeef ; array element 1, stored at [my_arr+2]

(Try this in NetRun now!)

(If we really wanted to load a 16-bit value into a 32-bit register, we could use "movzx" (unsigned) or "movsx" (signed) instead of a plain "mov".)
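
The difference between the two widening loads can be sketched in C. These two helpers are hypothetical names, and they only model what happens to the upper 16 bits; they are not how a compiler is required to implement the conversion.

```c
#include <stdint.h>

/* Like movzx: the upper 16 bits are filled with zeros. */
uint32_t zero_extend16(uint16_t v) {
    return (uint32_t)v;
}

/* Like movsx: the upper 16 bits are copies of bit 15, the sign bit.
   Flipping bit 15 and then subtracting 0x8000 does this portably. */
int32_t sign_extend16(uint16_t v) {
    return (int32_t)(v ^ 0x8000) - 0x8000;
}
```

For 0xbeef, zero extension gives 0x0000beef, while sign extension gives 0xffffbeef (that is, -16657), because bit 15 of 0xbeef is set.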

C++     Bits   Bytes   Assembly Create    Assembly Read               Example
char    8      1       db (data byte)     mov al, BYTE [rcx+i*1]      (Try this in NetRun now!)
short   16     2       dw (data WORD)     mov ax, WORD [rcx+i*2]      (Try this in NetRun now!)
int     32     4       dd (data DWORD)    mov eax, DWORD [rcx+i*4]    (Try this in NetRun now!)
long    64     8       dq (data QWORD)    mov rax, QWORD [rcx+i*8]    (Try this in NetRun now!)

Human                                C++         Assembly
Declare a long integer.              long y;     rdx (nothing to declare, just use a register)
Copy one long integer to another.    y=x;        mov rdx,rax
Declare a pointer to a long.         long *p;    rax (nothing to declare, use any 64-bit register)
Dereference (look up) the long.      y=*p;       mov rdx,QWORD [rax]
Find the address of a long.          p=&y;       mov rax,place_you_stored_Y
Access an array (easy way)           y=p[2];     (sorry, no easy way exists!)
Access an array (hard way)           p=p+2;      add rax,2*8 ; (move forward by two 8 byte longs)
                                     y=*p;       mov rdx, QWORD [rax] ; (grab that long)
Access an array (too clever)         y=*(p+2)    mov rdx, QWORD [rax+2*8] ; (yes, that really works!)

Loading from the wrong place, or loading the wrong amount of data, is an INCREDIBLY COMMON problem when using pointers, in any language.  You WILL make this mistake at some point over the course of the semester, and this results in a crash (rare) or the wrong data (most often some strange shifted & spliced integer), so be careful!

Walking Pointers Down Arrays

There's a classic terse C idiom for iterating through a string, by incrementing a char * to walk down through the bytes until you hit the zero byte at the end:
        while (*p++!=0) { /* do something to *p   */ }

If you unpack this a bit, you find:

  • p points to the first char in the string.
  • *p is the first char in the string.
  • p++ adds 1 to the pointer, moving to the next char in the string.
  • *p++ extracts the first char, and moves the pointer down.
  • *p++!=0  checks if the first char is zero (the end of the string), and moves the pointer down

Here's a typical example, in C:

char s[]="string";   // declare a string
char *p=s;           // point to the start
// note: p++ has already run by the time the body executes,
// so *p in the body is the char after the one just tested
while (*p++!=0) if (*p=='i') *p='a';  // replace i with a
puts(s);

(Try this in NetRun now!)
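
One detail of the idiom is worth spelling out: by the time the loop body runs, the p++ in the condition has already happened, so *p inside the body is the char after the one that was just tested. That means the very first char is never examined by the body. The sketch below is a variant, not the lecture's code, that tests and modifies the same char before advancing; replace_char is a hypothetical helper name.

```c
/* Replace every occurrence of `from` with `to` in a modifiable string.
   The test and the write both look at *p, and p advances only afterwards. */
void replace_char(char *p, char from, char to) {
    for (; *p != 0; p++) {
        if (*p == from)
            *p = to;
    }
}
```

On "string" this produces "strang", and it also handles a string that starts with the letter being replaced.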

Here's a similar pointer-walking trick, in assembly:

mov rdi,stringStart
again:
	add rdi,1 ; move pointer down the string
	cmp BYTE[rdi],'a' ; did we hit the letter 'a'?
	jne again  ; if not, keep looking

extern puts
call puts
ret

stringStart:
	db 'this is a great string',0

(Try this in NetRun now!)
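
For comparison, here is roughly the same search sketched in C. Like the assembly, it starts looking one char past the start. Unlike the assembly, it also stops at the nul terminator, which is an added safety check and not something the loop above does. The name find_after_start is hypothetical.

```c
#include <stddef.h>

/* Walk forward from the char after s until we find `target`.
   Unlike the assembly loop, this also stops at the nul terminator
   and returns NULL, so a missing letter cannot run off the end. */
const char *find_after_start(const char *s, char target) {
    if (*s == 0)
        return NULL;               /* empty string: nothing after the start */
    for (const char *p = s + 1; *p != 0; p++) {
        if (*p == target)
            return p;
    }
    return NULL;
}
```

Passing the result for "this is a great string" and 'a' to puts prints "a great string".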

(We'll see how to declare modifiable strings later.)

Source: https://www.cs.uaf.edu/2017/fall/cs301/lecture/09_15_strings_arrays.html
