Assembler - Telecomix Crypto Munitions Bureau

Assembler

From Telecomix Crypto Munitions Bureau

Assembler is not one language but a large family of languages for programming microprocessors. There is often one dialect for each processor release. Assembler for the 80386 processor is not the same as for the 80486: the 486 language has more words (instructions such as BSWAP, CMPXCHG and XADD were added).

Warning: This article will take forever before it gets anywhere near completion. The topic requires HUGE amounts of text to be written.

x86

x86 is used by PCs and by Intel-based Mac computers.

The x86 language, with its hundreds of dialects, is probably the most used machine language. There are at least five different assemblers, each with its own syntax for describing the instructions: gas, yasm, masm, fasm and the now very old and seldom used tasm.

  • gas is the GNU assembler. Its default (AT&T) syntax puts the source operand before the destination, the reverse of the other assemblers, and feels like it is built for compilers, not humans.
  • yasm is a very simple assembler. Its syntax is built for humans. It is a rewrite of the nasm assembler and accepts nasm syntax.
  • masm is the Microsoft assembler. Its syntax is a bit confused, and it is mostly used for writing Windows EXE files.
  • fasm is the flat assembler. It is the only assembler that I know of that is written in its own language and can assemble itself.
  • tasm is Borland's Turbo Assembler, an oldskool assembler that was used for demos and viruses.

Yasm is the assembler of choice for this tutorial.

Introduction

Understanding by example. This is yasm syntax. (Go check the yasm web pages for more documentation.)

This is 16-bit real mode code. It only works on 386 and newer processors (since it uses 32-bit registers). It might not work if there is an operating system running. It sets the cursor position on the text-mode terminal, using the old Cathode Ray Tube Controller (CRTC) chip present on all PC computers (if you run Linux, pressing CTRL+ALT+F1 gets you into this terminal mode in most default installations). This code works even today, but the CRTC chip has probably often been replaced by BIOS SMM emulation or similar.

set_cursor_pos:
	; al = x, ah = y

	push eax
	push ebx
	push ecx
	push edx

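	xor ebx, ebx

	mov bl, al			; ebx = x
	mov ecx, ebx			; ecx = x
	mov bl, ah			; ebx = y
	mov eax, 80			; 80 columns per text row
	mul bx				; dx:ax = 80 * y (dx is clobbered)
	add eax, ecx			; calculate offset for the crtc

	; eax = ax = y * 80 + x

	mov edx, 0x3D4			; crtc index port
	mov ecx, eax			; keep the offset in ecx
	mov al, 0x0F			; select index 0F (low cursor offset)
	out dx, al
	mov eax, ecx
	inc edx				; data port 3D5
	out dx, al			; write the low byte
	mov al, 0x0E			; select index 0E (high cursor offset)
	dec edx				; back to the index port
	out dx, al
	mov eax, ecx
	mov al, ah			; al = high byte of the offset
	inc edx				; data port 3D5
	out dx, al			; send the offset to crtc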

	pop edx
	pop ecx
	pop ebx
	pop eax
	ret

; quick crtc tutorial:
; index port = 3D4
; data port = 3D5
; index 0E = high cursor offset
; index 0F = low cursor offset
; 
; the cursor is set to the offset held in these two crtc-registers (the hi and
; lo bytes together hold y*columns+x, which points to the cursor.. kinda)
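
The offset arithmetic in the routine can be checked without booting into real mode. Here is a minimal sketch in Python (not assembler, so that it runs anywhere), assuming an 80-column text mode as in the `mov eax, 80` line. The names `cursor_offset` and `crtc_writes` are made up for this illustration. It only computes the values; it does not touch any hardware.

```python
COLUMNS = 80  # assumption: 80-column text mode, as in the routine


def cursor_offset(x, y, columns=COLUMNS):
    """Linear cursor offset: y * columns + x."""
    return y * columns + x


def crtc_writes(x, y):
    """(port, value) pairs in the order the routine performs its OUTs."""
    offset = cursor_offset(x, y)
    low = offset & 0xFF
    high = (offset >> 8) & 0xFF
    return [
        (0x3D4, 0x0F),  # index port: select low cursor offset
        (0x3D5, low),   # data port: low byte
        (0x3D4, 0x0E),  # index port: select high cursor offset
        (0x3D5, high),  # data port: high byte
    ]


for port, value in crtc_writes(10, 12):
    print(f"out 0x{port:03X}, 0x{value:02X}")
```

For x = 10 and y = 12 the offset is 12 * 80 + 10 = 970 = 0x3CA, so the low byte is 0xCA and the high byte is 0x03.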

x86 is CISC

  • Means, among other things, that instructions have variable length (from 1 to 15 bytes).
  • Means it is a bit stupid, but everyone uses it.

Real mode (16-bit)

  • This is the state the computer enters after a boot, before the operating system has begun loading.
  • Addressing: 20-bit physical addresses, formed from a 16-bit segment and a 16-bit offset (segment * 16 + offset).
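
The segment and offset combination is easiest to see with numbers. Here is a small Python sketch of the translation. The function name `physical_address` is hypothetical. The 20-bit wrap-around models an 8086 or a later PC with the A20 line disabled, which is an assumption about the machine.

```python
def physical_address(segment, offset):
    """Real-mode translation: segment * 16 + offset, truncated to 20 bits.

    The truncation assumes an 8086, or a later CPU with the A20 line
    disabled; with A20 enabled the sum can reach just above 1 MB.
    """
    return ((segment << 4) + offset) & 0xFFFFF


# Text-mode video memory lives at segment B800.
print(hex(physical_address(0xB800, 0x0000)))

# Different segment:offset pairs can name the same byte.
print(hex(physical_address(0x07C0, 0x0000)), hex(physical_address(0x0000, 0x7C00)))
```

Both pairs in the last line refer to physical address 0x7C00, which is where the BIOS loads a boot sector.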

Protected mode (32-bit)

  • A bunch of data structures (descriptor tables such as the GDT, LDT and IDT) that are read by the CPU
  • Offset within a segment may be anything from 0 to 2^32-1 ("4GB segments")
  • Paging
    • W^X
  • Task state segments (and their selectors) for machine-aided context switches (was cool when we had 166 MHz computers)
  • Interrupt descriptor table, an extended version of the real-mode interrupt vector table
  • Rings (ring 0 is operating system mode with full privilege, ring 3 is application program mode. Rings 1 and 2 are almost never used, but have some more privileges than ring 3.)
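
In protected mode a segment register no longer holds an address. It holds a selector that names an entry in a descriptor table and carries a ring number. A minimal Python sketch of how the 16 bits are split follows. The function name `decode_selector` and the dictionary keys are made up for this illustration; the example values 0x08 and 0x1B are just common-looking selectors, not taken from any particular operating system.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into its three fields."""
    return {
        "index": (selector & 0xFFFF) >> 3,            # entry number in the table
        "table": "LDT" if selector & 0x4 else "GDT",  # table indicator bit
        "rpl": selector & 0x3,                        # requested privilege level (ring)
    }


print(decode_selector(0x08))  # entry 1 of the GDT, ring 0
print(decode_selector(0x1B))  # entry 3 of the GDT, ring 3
```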

Long mode (64-bit)

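  • Almost no segmentation, only offsets (think: a single flat segment of 2^64 bytes, about 16 billion GB)
  • Lots of relative branching (to save space, I guess; otherwise each branch would need an 8-byte absolute address and be at least 9 bytes long)
  • Fast system-call entry with syscall/sysenter (the entry point is set up in model-specific registers rather than in a table)
  • Paging (mandatory in long mode)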
  • Gather data here, eventually
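
To illustrate the point about relative branching: a near `jmp` with a 32-bit displacement takes 5 bytes, no matter where in the 64-bit address space it sits. Here is a rough Python sketch that builds the encoding by hand. The function name `encode_jmp_rel32` and the addresses are invented for the example.

```python
import struct


def encode_jmp_rel32(instruction_address, target_address):
    """Encode `jmp rel32`: opcode E9 plus a signed 32-bit displacement.

    The displacement is relative to the address of the next instruction,
    that is, the address of the jump itself plus its 5-byte length.
    struct.pack raises struct.error if the target is out of rel32 range.
    """
    displacement = target_address - (instruction_address + 5)
    return b"\xE9" + struct.pack("<i", displacement)


code = encode_jmp_rel32(0x7FFF00001000, 0x7FFF00001100)
print(code.hex(), len(code))
```

The jump is 0x100 bytes forward, so the stored displacement is 0x100 - 5 = 0xFB and the whole instruction is `e9 fb 00 00 00`.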

Other more esoteric modes

  • System Management Mode (BIOS uses this to fool the operating system by emulating devices that do not exist)
  • Unreal mode (used by DOS 4GB extenders)

How to switch between modes

put code here.

Virtualization

  • Gather data here, eventually

Trusted platform module

  • Gather data here, eventually

Example of a simple operating system

  • modify the SNAFU code

Other processor families

  • ARM
    • Routers, switches, iPods. Stuff that needs efficient small processors.
    • Really cute assembler language
  • SPARC and UltraSPARC
    • Corporate and scientific machinery. Birthplace of the Solaris operating system.
  • MIPS
    • Small and energy-efficient architecture, like ARM.
  • Alpha (now dead)
    • Shady corporate machines only
  • Itanium
    • Executes bundles of three instructions each. Each 128-bit bundle consists of a 5-bit header (the template) specifying the type of instruction coding, followed by three 41-bit long opcodes.
    • The CPU does not automatically try to optimize the execution of code, as other architectures do. Instead, it relies heavily on aid from the compiler (or the assembler programmer) to describe how data will be used in the future, in order to avoid accessing data before it has been assigned values.
    • In other ways it's a bit like x86, if one ignores that the code looks really weird
    • Pretty dead?
  • HC11
    • Small industrial microcontroller, 8-bit (with 16-bit addresses and a few 16-bit instructions)
  • Z80
    • Zilog, the company behind the Z80, was formed as a splinter group from Intel. The Z80 had the chance to become the x86 in the early 80s, but it failed for some reason. The Z80 still exists and is one of the best-selling processors in history.
    • Very much like x86 in its design
  • 6502
    • Nintendo: an 8-bit 6502 variant in the NES, and a derivative with an added 16-bit mode (the 65816) in the SNES
    • C64
    • Apple I and Apple II
  • IBM mainframes have their own assembler language.
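
Going back to the Itanium bundle format in the list: 5 + 3 * 41 = 128 bits, so a bundle is exactly 16 bytes. The following Python sketch pulls a bundle apart. It assumes the 5-bit header sits in the lowest bits and the three opcodes follow in order; the function name `split_bundle` and the test value are made up for the illustration.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41


def split_bundle(bundle):
    """Split a 128-bit bundle into (template, [slot0, slot1, slot2]).

    Assumed layout: template in bits 0-4, then three 41-bit slots.
    """
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
        for i in range(3)
    ]
    return template, slots


# Build a fake bundle with recognisable values and take it apart again.
fake = 0x10 | (1 << 5) | (2 << 46) | (3 << 87)
print(split_bundle(fake))
```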