Machine language is the lowest possible level of programming. It's called machine language because it involves working directly with the computer hardware. Different computers use different hardware and so those differences are reflected in the machine language of each computer.
Typically when people say "Machine Language" they are actually referring to Assembly Language. Machine language is just the electrical signals bouncing around on the circuit board of your computer. Because these electrical signals have two possible states, On or Off, we use binary numbers as an abstract representation of these electrical signals. Assembly language is a second level of abstraction from the same electrical signals.
A computer is made up of a number of components which work together to make a functional machine.
A microprocessor is the brain of any computer. It is the component responsible for executing programs and generally directing all functions of the computer. In a personal computer the microprocessor is accompanied by other components which perform specialized functions like video, sound and general Input/Output. Taken as a whole these components make up the personality of a personal computer. It is the microprocessor which (loosely) defines aspects such as memory limits, speed and program structure. Understanding the microprocessor is fundamental to understanding computer programming.
Microprocessors don't understand English, nor do they really understand any programming language. You probably have heard that computers only understand numbers but that's not exactly true either. As with any digital circuit, the microprocessor is simply a device which accepts electrical signals and provides a (usually) predictable output in the form of other electrical signals. These signals sometimes take the form of 5 volts or 0 volts. We use binary numbers as an abstract representation of these voltages, thus we communicate with the computer using the abstract language of binary arithmetic. The resulting "language" is represented by long strings of ones and zeros. Long strings of ones and zeros are difficult to read so most programs are composed in languages using English-like words.
As defined above, memory is where programs and data are stored. When we say "memory" we are usually referring to electronic memory. Electronic memory is packaged in an integrated circuit built from thousands of tiny transistors. A transistor is an electronic semiconductor component which can be made to act as a switch. Such a switch may be turned "On" or "Off". It is this property of transistors which allows us to represent the state of the computer hardware using binary numbers.
Binary numbers are well suited for this purpose because in binary there are only two digits, "1" and "0". Humans usually express values in decimal, which represents quantities in powers of 10. Computers use binary, which represents quantities in powers of 2.
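The powers-of-10 versus powers-of-2 idea can be checked with a few lines of Python. Python is used here only for illustration, and the helper name `place_values` is made up for this sketch. It breaks a value into its digits and the weight each digit carries in a given base.

```python
def place_values(value, base):
    """Return (digit, weight) pairs for value in the given base, most significant first."""
    if value == 0:
        return [(0, 1)]
    pairs = []
    weight = 1
    while value > 0:
        value, digit = divmod(value, base)
        pairs.append((digit, weight))
        weight *= base
    return list(reversed(pairs))


# In decimal, 240 is 2*100 + 4*10 + 0*1
print(place_values(240, 10))
# In binary, 240 is 1*128 + 1*64 + 1*32 + 1*16 + 0*8 + 0*4 + 0*2 + 0*1
print(place_values(240, 2))
# Python's built-in formatting gives the same eight binary digits
print(format(240, "08b"))
```

Multiplying each digit by its weight and adding the results gives back the original value in either base.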
As stated above, most people don't like working directly with electrical voltage levels or long strings of ones and zeroes. Long ago computer scientists and engineers came up with three solutions. Essentially all three boil down to the same concept: computers should make work easier, and that means computers should make programming easier. We want to program in a language we understand (i.e. English) and we want a computer program to make the necessary conversion to binary numbers the computer can work with. The programs which convert English-like words to binary numbers are Assemblers, Compilers and Interpreters.
The 6502 is one of the most popular microprocessors used in personal computers of the 70's and 80's. It has appeared in computers manufactured by Apple, Atari, Commodore and others. Over the years many variants of the 6502 have appeared with various improvements and additions. The 6510 used in the C-64 adds an 8-bit bi-directional I/O port. The 65c02 used in the Apple IIc uses CMOS construction to run cooler (sometimes faster) and adds some useful instructions. Other incarnations added timers and internal memory to the 6502.
All variations of the 6502 share the same base instruction set and that is what makes them a "family". This base instruction set is structured in such a way that the code identifying each instruction is one byte. That byte may be immediately followed by one or two operand bytes (or no operand at all.) The number of operand bytes is dictated by the instruction itself.
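Because the instruction byte dictates how many operand bytes follow, a stream of program bytes can be split into instructions by reading one byte and looking up its operand count. The sketch below does that for a handful of real 6502 instruction codes; the table is deliberately incomplete and the function name `split_instructions` is invented for this example.

```python
# Operand byte counts for a few 6502 instruction bytes (not the full set).
OPERAND_BYTES = {
    0xA9: 1,  # LDA #value   (one operand byte)
    0xA5: 1,  # LDA $77      (one operand byte)
    0xAD: 2,  # LDA $7777    (two operand bytes)
    0x8D: 2,  # STA $7777    (two operand bytes)
    0xE8: 0,  # INX          (no operand)
    0x60: 0,  # RTS          (no operand)
}


def split_instructions(code):
    """Split a byte sequence into (instruction_byte, operand_bytes) tuples."""
    instructions = []
    i = 0
    while i < len(code):
        instruction = code[i]
        count = OPERAND_BYTES[instruction]  # KeyError if the byte is not in the table
        operand = bytes(code[i + 1 : i + 1 + count])
        instructions.append((instruction, operand))
        i += 1 + count                      # skip the instruction and its operand
    return instructions


# LDA #$07 / STA $C0F0 / INX / RTS
program = bytes([0xA9, 0x07, 0x8D, 0xF0, 0xC0, 0xE8, 0x60])
for instruction, operand in split_instructions(program):
    print(f"${instruction:02X}", operand.hex().upper())
```

The seven bytes of `program` come apart as four instructions of 2, 3, 1 and 1 bytes.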
The 6502 is an 8-bit microprocessor, so named because the data bus and internal registers are 8 bits wide (exception: the Program Counter is 16 bits.) When data larger than 8 bits is to be processed the 6502 must perform multiple fetch operations and will do so automatically for instructions with 16-bit operands. There are three primary busses used to interface the 6502 to the outside world: the Data Bus, the Address Bus and the Control Bus.
Most times programmers only worry about the information on the Data and Address Busses, although the Control Bus really does affect programs and how (or whether) they function. The Control signals of the 6502 don't truly represent a "bus" (and that term is an abstraction anyway.) Some of the control signals can be queried through the Processor Status Register (P).
The 6502 sees memory as a long series of sequential addresses. Each address represents a "box" where data is stored. Each "box" contains 8 bits which represent a value from 0 to 255. When the 6502 wants to store or retrieve a value from a particular memory location, it places the address of that memory location on the address bus. The control bus signals whether the access is to be a read (from memory into the 6502) or a write (from the 6502 into memory).
The Address Bus of the 6502 is 16 bits wide. That means the 6502 can access only 65,536 unique memory addresses (2^16 bytes). All of the computer's RAM, ROM and memory-mapped I/O devices must fit within the 65,536 possible addresses. Most 6502-based computer systems manage to do this by Bank Switching. Bank Switching allows more than one device to occupy the same space in the memory map by using a Soft-Switch to control which device is currently present at that address. A Soft-Switch is a kind of memory-mapped I/O which can act as a toggle switch between two possible hardware states.
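Bank Switching is easier to picture with a toy model. The class below is only a sketch under stated assumptions: the soft-switch address `$C000`, the banked range `$E000-$FFFF` and the meaning of bit 0 are all invented for this example, and every real machine defines its own switches and behaviour.

```python
class BankedMemory:
    """Toy 64K memory map in which ROM and RAM share the range $E000-$FFFF.

    SOFT_SWITCH and BANK_START are hypothetical values chosen for this sketch.
    """

    SOFT_SWITCH = 0xC000
    BANK_START = 0xE000

    def __init__(self, rom_image):
        self.ram = bytearray(0x10000)  # 64K of RAM, part of it hidden under the ROM
        self.rom = bytes(rom_image)    # 8K image that appears at $E000-$FFFF
        self.rom_visible = True        # assumed power-on state: ROM banked in

    def read(self, address):
        address &= 0xFFFF              # the address bus is only 16 bits wide
        if address >= self.BANK_START and self.rom_visible:
            return self.rom[address - self.BANK_START]
        return self.ram[address]

    def write(self, address, value):
        address &= 0xFFFF
        if address == self.SOFT_SWITCH:
            # Writing to the soft-switch selects the device: bit 0 = ROM on/off
            self.rom_visible = bool(value & 1)
            return
        self.ram[address] = value & 0xFF  # in this model, writes always land in RAM


memory = BankedMemory(rom_image=bytes([0xEA]) * 0x2000)
memory.write(0xE000, 0x42)       # stored in the RAM underneath the ROM
print(hex(memory.read(0xE000)))  # ROM is banked in, so the ROM byte is read
memory.write(0xC000, 0)          # flip the soft-switch: bank the ROM out
print(hex(memory.read(0xE000)))  # the same address now shows the RAM byte
```

The address `$E000` never changes; only the device that answers at that address does.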
$0000 | $C000/D000 | $FFFF |
---|---|---|
RAM | I/O | ROM |
The 6502 contains a number of internal registers which serve as temporary data repositories or which modify/reflect the processor operating mode. Most registers can be read/written by instructions but some are read-only and cannot be directly changed.
| 15......8 | 7......0 | Description |
|---|---|---|
| | A | Accumulator A |
| | Y | Index Register Y |
| | X | Index Register X |
| PCH | PCL | Program Counter PC |
| -------1 | S | Stack Pointer S |
| | NV-BDIZC | Processor Status Flags P |
The Processor Status Flags allow your program to control/monitor the various modes of processor operation. Here's a detailed description of their functions.
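As a quick reference, the letters in NV-BDIZC map onto the bits of P from bit 7 down to bit 0, with bit 5 unused. The sketch below lists the standard 6502 bit masks and shows how a load instruction such as LDA updates the N and Z flags. The helper names `describe_status` and `flags_after_load` are made up for this illustration.

```python
# Bit masks for the Processor Status register P, laid out as NV-BDIZC
# (bit 7 on the left, bit 0 on the right). Bit 5 is unused.
FLAGS = {
    "N": 0x80,  # Negative
    "V": 0x40,  # Overflow
    "B": 0x10,  # Break
    "D": 0x08,  # Decimal mode
    "I": 0x04,  # Interrupt disable
    "Z": 0x02,  # Zero
    "C": 0x01,  # Carry
}


def describe_status(p):
    """Return the letters of the flags that are set in status byte p."""
    return "".join(name for name, mask in FLAGS.items() if p & mask)


def flags_after_load(p, value):
    """Update N and Z the way a load such as LDA does; other flags are kept."""
    p &= ~(FLAGS["N"] | FLAGS["Z"]) & 0xFF  # clear N and Z first
    if value & 0x80:
        p |= FLAGS["N"]                     # bit 7 of the loaded value is set
    if value == 0:
        p |= FLAGS["Z"]                     # the loaded value is zero
    return p


print(describe_status(flags_after_load(0x00, 0x00)))  # a zero value was loaded
print(describe_status(flags_after_load(0x00, 0xF0)))  # bit 7 of the value is set
```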
As stated above, assembly language is an abstract representation two levels above the electrical signals in your computer. No human could make heads or tails of the millions of signals jumping across the various circuits at the speed of light every machine cycle. You'd have to be a super-genius to write a program of any size in pure binary numbers. (Although some gifted folks can do this, I don't recommend trying it or you risk your own sanity ;)
Assembly language programs are made up of mnemonics. The function of most mnemonics can be modified by an Addressing Mode. An addressing mode tells the computer where to find the target of any particular mnemonic. In 6502 assembly language, we indicate the addressing mode of a mnemonic in the operand. Considering all addressing modes of all mnemonics, each possible combination of mnemonic/address mode corresponds directly to a unique opcode.
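To see the mnemonic/addressing mode pairing in action, here is a tiny lookup table holding the real opcodes for LDA in each mode it supports, plus two implied instructions. The `assemble` function and the mode names used as dictionary keys are invented for this sketch; a real assembler works out the mode from the operand syntax instead of being told.

```python
# (mnemonic, addressing mode) -> opcode. Each pair has its own unique opcode.
OPCODES = {
    ("LDA", "immediate"): 0xA9,
    ("LDA", "zero page"): 0xA5,
    ("LDA", "zero page,X"): 0xB5,
    ("LDA", "absolute"): 0xAD,
    ("LDA", "absolute,X"): 0xBD,
    ("LDA", "absolute,Y"): 0xB9,
    ("LDA", "(indirect,X)"): 0xA1,
    ("LDA", "(indirect),Y"): 0xB1,
    ("INX", "implied"): 0xE8,
    ("RTS", "implied"): 0x60,
}


def assemble(mnemonic, mode, operand=None):
    """Return the machine-code bytes for one instruction."""
    encoded = [OPCODES[(mnemonic, mode)]]
    if operand is not None:
        if mode.startswith("absolute"):
            # two operand bytes, low byte first
            encoded += [operand & 0xFF, (operand >> 8) & 0xFF]
        else:
            # one operand byte
            encoded.append(operand & 0xFF)
    return bytes(encoded)


print(assemble("LDA", "immediate", 7).hex())      # LDA #7
print(assemble("LDA", "absolute", 0x7777).hex())  # LDA $7777
print(assemble("INX", "implied").hex())           # INX
```

The same mnemonic, LDA, produces a different opcode for every addressing mode.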
Addressing Mode | Symbol | Operand Size | Description |
---|---|---|---|
Immediate | #value eg LDA #7 | 1 byte | The target of the mnemonic is a single byte immediately following the instruction |
Implied | eg INX | 0 bytes | The target of the mnemonic is defined implicitly by the instruction itself |
Accumulator | eg ASL | 0 bytes | The instruction operates explicitly on the Accumulator; really a form of Implied Addressing |
Absolute | address eg LDA $7777 | 2 bytes | The target is the memory location whose address is the two bytes immediately following the opcode |
Zero Page | address eg LDA $77 | 1 byte | The target is the Zero Page memory location whose address is the byte immediately following the opcode |
Relative | address eg BEQ $7777 | 1 byte | The target is the memory location whose address is calculated by summing the Program Counter with the signed integer which immediately follows the opcode |
Absolute Indexed by X | address,X eg LDA $7777,X | 2 bytes | The target is the memory location whose address is calculated by summing the 16-bit integer immediately following the opcode with the contents of the X register |
Zero Page Indexed by X | address,X eg LDA $77,X | 1 byte | The target is the memory location whose address is calculated by summing the 8-bit integer immediately following the opcode with the contents of the X register. The sum is restricted to 8 bits, thus the target is always in Zero Page |
Absolute Indexed by Y | address,Y eg LDA $7777,Y | 2 bytes | The target is the memory location whose address is calculated by summing the 16-bit integer immediately following the opcode with the contents of the Y register |
Zero Page Indexed by Y | address,Y eg LDX $77,Y | 1 byte | The target is the memory location whose address is calculated by summing the 8-bit integer immediately following the opcode with the contents of the Y register. The sum is restricted to 8 bits, thus the target is always in Zero Page. Only LDX and STX offer this mode; LDA does not |
Indirect | (address) eg JMP ($7777) | 2 bytes | The target is the memory location whose address is contained in the two memory locations pointed to by the two bytes immediately following the opcode. This addressing mode has a known bug which causes the target to be miscalculated when those two memory locations cross a page boundary; the bug was corrected as of the WDC 65c02 |
Indirect Indexed (AKA "Indirect by Y") | (address),Y eg LDA ($77),Y | 1 byte | The target is the memory location whose address is calculated by summing the 16-bit integer contained in the two Zero Page memory locations pointed to by the byte immediately following the opcode with the contents of the Y register |
Indexed Indirect (AKA "Indirect by X") | (address,X) eg LDA ($77,X) | 1 byte | The target is the memory location whose address is contained in the two memory locations whose Zero Page address is calculated by summing the byte immediately following the opcode with the contents of the X register |
As you can see, there is an addressing mode for every occasion. The 6502 programmer should take great care in selecting the addressing mode which best suits his needs. Most times addressing modes are selected based on what is easiest to code and that is usually fine. Sometimes when speed and memory constraints are critical considerations it pays to spend some time considering what mode to use.
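The address calculations described in the table can be sketched in Python. This is only an illustration of the arithmetic, not an emulator. The function names are made up, `memory` stands for a 64K array of bytes, and for Relative mode the Program Counter is taken to be the address of the instruction following the branch.

```python
def absolute_indexed(base, index):
    """Absolute,X or Absolute,Y: 16-bit address plus the index register."""
    return (base + index) & 0xFFFF


def zero_page_indexed(base, index):
    """Zero Page,X or Zero Page,Y: the sum is cut to 8 bits, so it stays in Zero Page."""
    return (base + index) & 0xFF


def relative_target(next_pc, offset_byte):
    """Relative: add the signed 8-bit offset to the Program Counter."""
    signed = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (next_pc + signed) & 0xFFFF


def indirect_indexed(memory, zp, y):
    """(address),Y: read a 16-bit pointer from Zero Page, then add Y."""
    pointer = memory[zp] | (memory[(zp + 1) & 0xFF] << 8)
    return (pointer + y) & 0xFFFF


def indexed_indirect(memory, zp, x):
    """(address,X): add X to the Zero Page address, then read the pointer there."""
    location = (zp + x) & 0xFF
    return memory[location] | (memory[(location + 1) & 0xFF] << 8)


def jmp_indirect(memory, pointer, page_bug=True):
    """JMP (address): with the original bug the high byte is fetched from the
    start of the same page when the pointer sits at the end of a page."""
    if page_bug:
        high_location = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    else:
        high_location = (pointer + 1) & 0xFFFF
    return memory[pointer] | (memory[high_location] << 8)


memory = bytearray(0x10000)
memory[0x77], memory[0x78] = 0x00, 0x20      # Zero Page pointer to $2000
memory[0x30FF], memory[0x3100] = 0x34, 0x12  # intended JMP pointer to $1234
memory[0x3000] = 0x56                        # the byte the bug fetches instead

print(hex(absolute_indexed(0x7777, 0x10)))
print(hex(zero_page_indexed(0xF0, 0x20)))
print(hex(relative_target(0x1002, 0xFE)))
print(hex(indirect_indexed(memory, 0x77, 0x10)))
print(hex(indexed_indirect(memory, 0x70, 0x07)))
print(hex(jmp_indirect(memory, 0x30FF)), hex(jmp_indirect(memory, 0x30FF, page_bug=False)))
```

Note how `zero_page_indexed` wraps around inside Zero Page while `absolute_indexed` carries into the next page.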
As you now know, a 6502 address is 16 bits wide. That means two 8-bit bytes are used to represent an address. The byte holding the most significant 8 bits is called the high-byte; the byte holding the least significant 8 bits is called the low-byte. You also should know that addresses in ML programs are stored low-byte first. In the address $C0F0 the high-byte is $C0, the low-byte is $F0 and the address is stored in memory as "... $F0,$C0 ..."
Binary | 1100000011110000 | becomes | 11110000,11000000 |
---|---|---|---|
Hexadecimal | $C0F0 | becomes | $F0,$C0 |
Decimal | 49392 | becomes | 240,192 |
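The same split can be reproduced in Python to double-check the table. The helper names `split_address` and `join_address` are made up for this sketch; `int.to_bytes` and `int.from_bytes` are Python built-ins, and "little" is Python's name for low-byte-first order.

```python
def split_address(address):
    """Return (low_byte, high_byte), the order in which the 6502 stores an address."""
    return address & 0xFF, (address >> 8) & 0xFF


def join_address(low, high):
    """Rebuild the 16-bit address from its two stored bytes."""
    return (high << 8) | low


low, high = split_address(0xC0F0)
print(f"${low:02X},${high:02X}")             # hexadecimal, as stored in memory
print(f"{low},{high}")                       # the same two bytes in decimal
print(f"{low:08b},{high:08b}")               # and in binary
print((0xC0F0).to_bytes(2, "little").hex())  # built-in low-byte-first conversion
```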
There is a more advantageous way to view memory which is closer to the way the 6502 actually works. Imagine memory as a book with 256 pages and each page containing 256 words. Now the high-byte becomes the Page Address and the low-byte becomes the Byte Offset. This analogy works and the terminology makes more sense than to view memory as one big line.
$00 | $01 | $02 | $03 | $04 | $05 | $06 | $07 | $08 | $09 | $0A | $0B | $0C | $0D | $0E | $0F |
$10 | $11 | $12 | $13 | $14 | $15 | $16 | $17 | $18 | $19 | $1A | $1B | $1C | $1D | $1E | $1F |
$20 | $21 | $22 | $23 | $24 | $25 | $26 | $27 | $28 | $29 | $2A | $2B | $2C | $2D | $2E | $2F |
$30 | $31 | $32 | $33 | $34 | $35 | $36 | $37 | $38 | $39 | $3A | $3B | $3C | $3D | $3E | $3F |
$40 | $41 | $42 | $43 | $44 | $45 | $46 | $47 | $48 | $49 | $4A | $4B | $4C | $4D | $4E | $4F |
$50 | $51 | $52 | $53 | $54 | $55 | $56 | $57 | $58 | $59 | $5A | $5B | $5C | $5D | $5E | $5F |
$60 | $61 | $62 | $63 | $64 | $65 | $66 | $67 | $68 | $69 | $6A | $6B | $6C | $6D | $6E | $6F |
$70 | $71 | $72 | $73 | $74 | $75 | $76 | $77 | $78 | $79 | $7A | $7B | $7C | $7D | $7E | $7F |
$80 | $81 | $82 | $83 | $84 | $85 | $86 | $87 | $88 | $89 | $8A | $8B | $8C | $8D | $8E | $8F |
$90 | $91 | $92 | $93 | $94 | $95 | $96 | $97 | $98 | $99 | $9A | $9B | $9C | $9D | $9E | $9F |
$A0 | $A1 | $A2 | $A3 | $A4 | $A5 | $A6 | $A7 | $A8 | $A9 | $AA | $AB | $AC | $AD | $AE | $AF |
$B0 | $B1 | $B2 | $B3 | $B4 | $B5 | $B6 | $B7 | $B8 | $B9 | $BA | $BB | $BC | $BD | $BE | $BF |
$C0 | $C1 | $C2 | $C3 | $C4 | $C5 | $C6 | $C7 | $C8 | $C9 | $CA | $CB | $CC | $CD | $CE | $CF |
$D0 | $D1 | $D2 | $D3 | $D4 | $D5 | $D6 | $D7 | $D8 | $D9 | $DA | $DB | $DC | $DD | $DE | $DF |
$E0 | $E1 | $E2 | $E3 | $E4 | $E5 | $E6 | $E7 | $E8 | $E9 | $EA | $EB | $EC | $ED | $EE | $EF |
$F0 | $F1 | $F2 | $F3 | $F4 | $F5 | $F6 | $F7 | $F8 | $F9 | $FA | $FB | $FC | $FD | $FE | $FF |
Page $00 to Page $CF | Page $D0 to Page $DF | Page $E0 to Page $FF |
---|---|---|
RAM | I/O | ROM |
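In code, the book analogy amounts to splitting an address into its high-byte (the page) and its low-byte (the offset within the page). The sketch below also classifies an address using the example page map above; that map is only an illustration, real machines differ, and the function names are made up.

```python
def page_and_offset(address):
    """Return (page, offset): the high-byte and the low-byte of the address."""
    return (address >> 8) & 0xFF, address & 0xFF


def region(address):
    """Classify an address using the example map: RAM, then I/O, then ROM."""
    page, _ = page_and_offset(address)
    if page <= 0xCF:
        return "RAM"
    if page <= 0xDF:
        return "I/O"
    return "ROM"


for address in (0x0077, 0xC0F0, 0xD020, 0xFFFC):
    page, offset = page_and_offset(address)
    print(f"${address:04X} is page ${page:02X}, offset ${offset:02X}, in {region(address)}")
```

Zero Page is simply page $00 of the book, which is why a Zero Page address needs only one byte.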