Robert M
Posted November 18, 2004

Assembly Language Programming - Lesson 9 - The Memory Model.
-------------------------------------------------------

In lesson 8, I introduced the parts of the 650x microprocessor that are important to you, the budding assembly language programmer. Another name for a microprocessor is Central Processing Unit, which can be abbreviated as CPU. I am tired of typing "processor", and from now on I will tend to say CPU instead. In this lesson, we are going to look at the rest of the computer system as seen by the programmer. I will start with a general model and then I will present the specifics of the Atari VCS system.

This lesson is long, and it may be difficult to assimilate all the information here. I recommend reading it through at least once. As we progress to later lessons you will see the information here coming into play as we start to write code. You can then refer back to this lesson to help you understand what is going on inside the VCS when your program is running.

The Generic System:
===================

      +------+  (Unidirectional 16-bit address bus)            +---------+
      |      |======+===============+================+========>|  (ADL)  |
      | 650x |(R/W) |               |                |         | Address |
      | CPU  |------|-----------+---|----------+-----|-------->| Decode  |
      |      |      |           |   |          |     |         | Logic   |
      |      |      V           V   V          V     V         |         |
      |      |   +--------+   +--------+    +-------------+    +---------+
   +->|      |   |  ROM   |   |  RAM   |    | Peripherals |       |  |  |
   |  +------+   | Memory |   | Memory |    |             |       |  |  |
   |       ^     +--------+   +--------+    +-------------+     (enables)
+------+   |       ^  ^ ^        ^  ^ ^       ^ ^       ^         |  |  |
|System|   |       |  | |        |  | |       | |       |         |  |  |
|Clock |---|-------|--+-|--------|--+-|-------+ |       +---------+  |  |
+------+   |       |    |        |    +---------|--------------------+  |
           |       |    +--------|--------------|-----------------------+
           |       |             |              |
           +=======+=============+==============+ (Bi-directional 8-bit data bus)

The diagram above shows the simplest model of a computer system I could conceive.
It shows six basic components connected together by wires that carry information, in the form of bits, as digital signals between them:

System Clock:
*************

The system clock is the heartbeat of any digital computer. Each tick of the clock means that a piece of work has been completed by the system. On each tick, the system transitions from one state to another within the set of all possible states for the system, as constrained by the program running on the machine. Between ticks, the circuits of the system settle into their new final state and wait for the next tick to set them in motion again. Every computer system has a maximum system clock speed it can handle. The limit is determined by how long it takes the circuits to settle to a final state after each tick.

Our generic system shows a single clock, but many systems actually have multiple clocks whose speeds are multiples of each other and which may or may not be in sync. Even the humble Atari VCS has multiple clocks. The video color clock that the TIA chip in the VCS uses to create the TV signal runs at 3 times the speed of the CPU clock.

CPU:
****

The internals of the CPU were the focus of the previous lesson. What's new here is that the CPU is connected to other external devices.

The system clock is connected to the CPU. Every time the system clock ticks, the CPU completes a logical operation. In the 650x family of processors, the CPU takes 2 to 7 ticks (cycles) of the system clock to complete execution of a single assembly language instruction. The simplest instructions take 2 clock cycles, and the most complex instructions take 7 clock cycles. When you write a game for the Atari VCS you will need to keep track of how many clock cycles your code takes to execute. You will have only 76 CPU clock cycles available on each horizontal scanline of the television image in which to execute the instructions that draw your game.
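
To make that budget concrete, here is a minimal Python sketch (not VCS code, just arithmetic) that works out how many instructions fit in one scanline. It uses only the figures quoted in this lesson: 76 cycles per scanline and 2 to 7 cycles per instruction. The constant and function names are made up for illustration.

```python
# Cycle budget for one scanline, using the figures quoted in this lesson.
CYCLES_PER_SCANLINE = 76   # CPU cycles available per TV scanline
FASTEST_INSTRUCTION = 2    # cycles taken by the simplest 650x instructions
SLOWEST_INSTRUCTION = 7    # cycles taken by the most complex 650x instructions


def instructions_per_scanline(cycles_per_instruction):
    """Return how many whole instructions of the given cost fit in one scanline."""
    return CYCLES_PER_SCANLINE // cycles_per_instruction


most = instructions_per_scanline(FASTEST_INSTRUCTION)    # all 2-cycle instructions
fewest = instructions_per_scanline(SLOWEST_INSTRUCTION)  # all 7-cycle instructions
leftover = CYCLES_PER_SCANLINE - fewest * SLOWEST_INSTRUCTION  # cycles left unused

print(most, fewest, leftover)  # prints: 38 10 6
```

Note the integer division: ten 7-cycle instructions use 70 cycles, and the 6 cycles left over are not enough for an eleventh.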
A quick bit of math shows that 76 cycles is enough for at minimum 10 complex instructions (7 cycles each) and at most 38 simple instructions (2 cycles each) to draw a single line of your game's picture. In reality your code will be a mix of simple and complex instructions. Fun stuff, which will all be covered in detail in later lessons.

The CPU can't do much on its own. It needs the clock to tick, and it needs 2 kinds of memory: RAM and ROM. If you want a computer system to be able to receive input (e.g. joysticks) and produce output (e.g. TV picture), then it needs peripherals to provide those services.

The CPU communicates with memory and peripherals through two buses of wires. A bus is a set of wires used to carry multiple bits of information all at the same time. Each wire in a bus carries a single bit at a time. In the system above there are 2 buses: the address bus and the data bus.

The address bus is a unidirectional bus in our sample system. That means that only the CPU is allowed to put 16-bit binary numbers onto the address bus. All other components connected to the address bus are limited to just listening. It doesn't have to be that way; the address bus can be shared by the CPU and peripherals, but that is a complication we don't want to go into here. With the address bus being 16 bits wide, our system can have 2^16 = 65536 different addresses. Those 65536 addresses form the address space of the system. Each memory or peripheral component in the system is assigned a set of addresses in the address space, much like each house on a street has its own address number. When the CPU wants to communicate with memory or a peripheral device, it puts the address of that device onto the address bus. I will discuss this process more later.

The data bus in our system is 8 bits wide. The data bus is a bi-directional bus.
That means that the CPU can put a byte (8 bits) of data onto the bus to be read by memory or a peripheral device, or an external device can put a byte of data on the data bus, as requested by the CPU via the address bus, and the CPU will read the data from the data bus into one of its 6 registers (see lesson 8).

ROM Memory:
***********

ROM stands for Read-Only Memory. Each byte of data stored in a ROM has its own unique address. Read-Only means that the CPU can read bytes of information stored in the ROM into its registers, but the CPU cannot write any bytes into the ROM and store them there. If a CPU attempts to write a byte to a ROM, then the ROM ignores the CPU. In a cartridge based system like the VCS, all program code is stored in a ROM. In a computer like the C64, there are programs stored in ROM, like the Basic interpreter. There is also lots of RAM into which programs are loaded from a disk or other peripheral.

RAM Memory:
***********

RAM stands for Random Access Memory. Personally I hate the name RAM as it doesn't tell you squat. A better name would be WAM for Writes Allowed Memory. RAM is a collection of byte-sized storage areas, each given its own address. The CPU can read from a byte of RAM or write to one as well. When a CPU writes to a RAM memory location, the new byte of information completely replaces whatever value was previously stored in that RAM location, and the old value is lost forever. The contents of the CPU stack are stored in RAM. The variables of your program are stored in RAM. Variables are any values that need to change over time, for example the position of a player on the screen.

Peripherals:
************

Without peripherals the CPU could not communicate with the outside world. Peripherals receive electrical signals from outside the system and convert them into bits of data in their internal registers which the CPU can then read.
Some peripherals do the opposite: they take bytes written to their registers by the CPU and convert them into an electrical signal transmitted out of the system.

The joysticks and paddles are an example of inputs to the VCS system that are translated by a peripheral device into bits the CPU can read. The switches on the console face are inputs as well. The video out signal of a VCS is an example of an output signal from a peripheral in the VCS.

Address Decode Logic:
*********************

I briefly mentioned earlier that each byte of RAM and ROM, and each register of each peripheral device, is assigned a unique address within the computer system. Most devices are made so that they only look at a few address lines on the address bus and not all 16. For example, the VCS has 128 bytes of RAM. It takes 7 bits to enumerate 128 (2^7) objects (bytes of RAM). So the RAM device in the VCS only monitors the lowest 7 bits of the address bus to know which RAM byte the CPU wants to access.

The CPU can address 2^16 = 65536 addresses, and the RAM is only using 128 of those addresses, but if the RAM is not looking at all 16 address lines it won't see the complete address. How does the RAM know that the CPU wants it to respond and not some other device? If more than one external device responds to a query by the CPU, their answers on the data bus will overlap and be mixed together. This is a bad condition, usually called bus contention, which I will loosely refer to as "cross-talk". So how can a computer system avoid cross-talk?

The solution is the Address Decoder Logic in the system, which I am going to start abbreviating as ADL. The ADL is connected to the address lines that the other devices are not connected to. It monitors the signals from the CPU on those lines and in response it creates an "enable" signal on one of the wires running from the ADL to each device in the system (take a moment to find the wires in the diagram that carry the enable signal to each device).
Each device will ignore all addresses placed on the address bus by the CPU unless the ADL has also sent an enable signal on the device's enable line. Then and only then will the device listen to the processor and respond as asked.

********************************************************************************
NOTE: There is an additional signal from the CPU called the Read/Write signal or R/W. It is a single bit to indicate to the target device whether the CPU wants to Read a byte of data from the device, or Write a byte of data to the device. The R/W signal is on the diagram just below the address bus. It is connected to all devices capable of Reads and Writes. For a ROM, a read operation can be assumed since it is a Read-Only device.
********************************************************************************

The main function of the ADL is to take the possible 64K addresses from the CPU and divide them among the various memory and peripheral devices in the system. The 64K addresses of the CPU are called the address space, which is divided by the ADL into chunks of the sizes needed by each device. It is completely possible to write a list of all 64K addresses and list which device they are assigned to. Such a list is called a Memory Map since it maps each address to a device. As an assembly language programmer you must learn how to read a memory map so you can use those addresses in your code to access the various devices in the system.

Let's draw a memory map for our imaginary computer system. Our imaginary system has 16K bytes of RAM, 32K bytes of ROM, and the peripheral devices (which we will lump together in one mass) take up 16K bytes, most of which is a dedicated video RAM for a hi-res display. Woot! Hey, it's imaginary so I can do whatever I want. By design the sum of all the memory and devices is 64K addresses, exactly the number the CPU in our system can produce. So there are just enough addresses, not too many and not too few.
It is possible to have more devices than addresses, or fewer devices than addresses available. We will examine those cases later.

Since this is a 650x processor it behooves us, for reasons too complex to go into here but which may become evident as the lessons go on, to place the RAM in the lowest addresses, followed by the peripherals and then the ROM. If you familiarize yourself with the default memory map of the Commodore 64 (see the C64 Programmer's Reference Guide) you will see the same basic pattern to the map. So here is our memory map:

Addresses:
----------
    0 to 16383 = 16K of RAM memory
16384 to 32767 = 16K of peripheral devices
32768 to 65535 = 32K of ROM memory

If the CPU wants to access RAM it will assert an address in the range from 0 to 16383. If the CPU needs the next program instruction from the ROM it will place the 16-bit value of the PC register onto the address bus, which will be in the range from 32768 to 65535, and so on.

Now, what happens if there aren't enough devices to fill the complete address space? For example, what if the RAM is only 2K bytes, the peripherals take up 1K bytes, and the ROM is 4K bytes? Depending on how the ADL is implemented it will be handled 1 of 2 ways. The ADL might be designed to ignore unneeded addresses, in which case if the CPU puts an unused address on the bus the ADL will not enable any of the devices. The other option for the ADL design is to repeat the same devices over and over again in the memory map until all of the addresses are assigned. This second method means that individual byte locations in a device may have multiple addresses assigned to them by the ADL and not just one. Accessing any of the addresses assigned to a byte will access the exact same single byte location in a device. Think of it like having your mail forwarded from a previous address to your new permanent address. As you will see later, this second method is what the designers of the VCS chose for their ADL implementation.
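
The two ADL strategies can be sketched in a few lines of Python. This is only a model under stated assumptions, not how any real decoder is built: the region boundaries reuse the 16K/16K/32K split from the map above, the device sizes are the 2K/1K/4K example, and the function names are hypothetical.

```python
# Hypothetical layout: each device owns one region of the 64K address space,
# but the device itself is smaller than its region.
# (name, first address of region, last address of region, device size in bytes)
REGIONS = [
    ("RAM",         0x0000, 0x3FFF, 2 * 1024),
    ("Peripherals", 0x4000, 0x7FFF, 1 * 1024),
    ("ROM",         0x8000, 0xFFFF, 4 * 1024),
]


def decode_ignoring(address):
    """Method 1: only the primary image is enabled; unused addresses select nothing."""
    for name, first, _last, size in REGIONS:
        if first <= address < first + size:
            return name, address - first
    return None


def decode_mirrored(address):
    """Method 2: the device repeats throughout its region (shadow images)."""
    for name, first, last, size in REGIONS:
        if first <= address <= last:
            # The device only sees its low address lines, so the offset wraps.
            return name, (address - first) % size
    return None


print(decode_ignoring(5), decode_ignoring(2053))
print(decode_mirrored(5), decode_mirrored(2053))
```

With method 2, addresses 5 and 2053 (2048 + 5) both land on byte 5 of the RAM, which is the mail-forwarding behaviour described above. With method 1, address 2053 selects nothing.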
********************************************************************************
SIDE-NOTE: (Feel free to skip this paragraph) The first method of ADL design is more costly up front because it takes more circuitry to implement, but it produces a system that is easier to expand in the future to contain more devices. The second method is cheaper up front, but makes later expansion of the system more difficult and more expensive. Atari engineers have stated that they saw the VCS as a very short lived system for which future expansion was highly unlikely. It is no surprise then that they chose the second method for their ADL design. That choice, however, has made expanding the hardware capabilities of the VCS more difficult. Oh well, what is done is done.
********************************************************************************

When a device is repeated in the memory map, we refer to the first appearance of the device as the primary image of the device. Later repetitions of the same device are called mirror images or shadow images. I prefer the name shadow since that name is more mysterious and romantic sounding.

Here is an example memory map with shadows. The RAM is 4K, the peripherals are 8K and the ROM is 16K:

Addresses:
**********
    0 to  4095 = 4K RAM - primary image
 4096 to  8191 = 4K RAM - 1st shadow image
 8192 to 12287 = 4K RAM - 2nd shadow image
12288 to 16383 = 4K RAM - 3rd shadow image
16384 to 24575 = 8K Peripherals - primary image
24576 to 32767 = 8K Peripherals - 1st shadow image
32768 to 49151 = 16K ROM - primary image
49152 to 65535 = 16K ROM - 1st shadow image

It is worth noting (since this is the case in the VCS) that the shadows don't have to repeat one after another right after the primary image of each device.
They might be arranged like this for example:

Addresses:
**********
    0 to  4095 = 4K RAM - primary image
 4096 to  8191 = 4K RAM - 1st shadow image
 8192 to 16383 = 8K Peripherals - primary image
16384 to 32767 = 16K ROM - primary image
32768 to 36863 = 4K RAM - 2nd shadow image
36864 to 40959 = 4K RAM - 3rd shadow image
40960 to 49151 = 8K Peripherals - 1st shadow image
49152 to 65535 = 16K ROM - 1st shadow image

In short, the devices can be arranged and repeated in memory in every way conceivable. The final arrangement chosen is driven by the needs of the CPU and the minimization of cost. The most important thing to remember as a programmer is that from the point of view of your program executing in the CPU, the primary image and all shadow images are equivalent, so your code can use whichever image you want.

---------------------------------------------------------------------------------

The Atari VCS System model and Memory Model:
============================================

In this second half of the lesson I will describe the Atari VCS system, and document its memory map.

      +------+  (Unidirectional 13-bit address bus)              +---------+
      |      |========+================+===========+========+===>|         |
      | 6507 |(R/W)   |                |           |        |    | Address |
      | CPU  |--------|------------+---|-------+---|----+------->| Decode  |
      |      |(RDY)   |            |   |       |   |    |   |    | Logic   |
      |      |<-------|------------|---|-------|---|----|-+ |    |  (ADL)  |
      |      |        V            V   V       V   V    V | V    |         |
      |      | +-----------+  +-----------+   +-----+  +-----+   +---------+
   +->|      | | 2/4K ROM  |  | 128 bytes |   | PIA |  | TIA |      |  |  |
   |  +------+ | Cartridge |  |    RAM    |   |     |  |     |      |  |  |
   |       ^   +-----------+  +-----------+   +-----+  +-----+    (enables)
+------+   |       ^  ^ ^          ^  ^ ^       ^ ^ ^    ^ ^ ^      |  |  |
|System|   |       |  | |          |  | |       | | |    | | |      |  |  |
|Clock |---|-------|--+-|----------|--+-|-------+-|-|----+ | +------+  |  |
+------+   |       |    |          |    +---------+-|------|-----------+  |
           |       |    +----------|----------------|------|--------------+
           |       |               |                |      |
           +=======+===============+================+======+ (Bi-directional 8-bit data bus)

6507 CPU:
*********

The microprocessor in the Atari VCS is the 6507 variant in the 650x family of processors. What makes this processor different from the other members of its family? First, internally it has a 16-bit address bus like all 650x processors, but externally it only has 13 address bus lines. That means it has a total address space of 2^13 = 8192 bytes! That is one eighth of the potential address space. Another difference of note is the lack of support for interrupts of any kind, maskable or non-maskable. In retrospect I wish they had attached the driving paddles to the maskable interrupt line with a means to select any of the 4 paddles as the input to the interrupt pin. It could have made reading the paddles a breeze. Oh well, it is how it is.

The last feature to mention is the RDY input line, which the 6507 keeps despite its reduced pin count. The RDY signal is one of the most important features of the 6507 processor and it is of critical necessity to the VCS programmer. The RDY line works like this.
In your program you write a byte of any value to a memory location named WSYNC (the actual memory address of WSYNC is 2). Immediately after completing this instruction, the 6507 CPU falls asleep. It stays asleep until the TIA video chip sends a signal to the CPU on the RDY line saying it is starting to draw the next line of the TV display. As a VCS programmer you will use the RDY feature to synchronize the CPU's execution of your program code with the TIA's rendering of the TV image. Keeping the CPU in sync with the TIA is a critical component of all VCS games. If the program code gets out of sync with the TIA, graphics may become scrambled, and the TV image may roll or jump.

System Clock:
*************

The primary system clock of the VCS ticks at 3.58 MHz, that's over 3 and a half million ticks per second! The primary clock is named the color clock because on every tick of the color clock the TIA draws one pixel on the screen, and must determine what color the pixel should be. The color clock goes only to the TIA video peripheral device. The color clock is divided by 3 to make the system clock used by the CPU and all other devices in the system. So the 6507 CPU is running in the VCS at about 1.193 MHz.

The only thing that really matters to you as a VCS programmer is that the color clock is 3x the CPU clock, which means the TIA draws 3 pixels on the screen for every tick of the CPU clock, and you will have 76 CPU clock cycles for each television scanline. I stated earlier that 650x assembly language instructions take a minimum of 2 CPU clock ticks and a maximum of 7 ticks. That means the TIA will draw at least 2x3=6 pixels and possibly as many as 7x3=21 pixels in the time it takes for the CPU to finish one instruction of your program. As you can see, time is at a premium in VCS software. It's one of the things that makes it a fun challenge.

2/4K Cartridge ROM:
*******************

The VCS was designed to accept cartridge ROMs up to 4K bytes in size.
ROMs are usually sold in numbers of bytes that are powers of 2, so 1K, 2K, and 4K byte ROMs are typical sizes for Atari games because those are the chip sizes you can buy. Obviously, history has shown that VCS games can be made bigger than 4K via bankswitching. Bankswitching carts work by including their own ADL in the cart, which takes the enable signal from the ADL of the VCS and the value on the address bus, combined with the bits in an internal register of some sort, to access a larger ROM one 4K chunk at a time. That's all I want to say about bankswitching because it is an advanced topic not suited for a newbie course.

128 bytes of RAM:
*****************

The Atari VCS comes standard with a whopping 128 bytes of RAM. 128 bytes is not much RAM. A VCS program must store all of its variables and the CPU stack (if used) in those 128 bytes of RAM. In the diagram I show the RAM as a separate device from the PIA. In fact, the RAM is a feature of the PIA, but it is functionally separate enough from the rest of the PIA that I drew them apart from each other.

PIA - Peripheral Interface Adapter:
***********************************

The PIA is a standard peripheral chip for the 650x family of processors. Its chip number is 6532, and you will also see it called the RIOT (RAM, I/O, Timer). The PIA provides the following services in addition to the 128 bytes of RAM:

-> A timer which can be set by your program to keep track of periods of time. The timer counts time in ticks of the system clock. To translate that to real-world time, your program would need to use the timer to count about 1,190,000 ticks of the system clock to equal 1 real-world second. In most systems the timer can generate an interrupt when the timer expires, but that is not the case in the VCS due to the limitations of the 6507.

-> Two 8-bit parallel input and output ports named PORT A and PORT B. In theory, each pin of each port can be programmed to act as an input or an output. In reality they are used in the VCS almost exclusively as inputs.
PORT B is only good for input as it is connected to the switches on the console (difficulty, B/W, select, reset) and there are no lines exposed to the outside world for customized connection. PORT A is connected to the joystick ports. The 8 lines of PORT A are divided into 2 groups of 4 lines, one group for each joystick port. The 4 lines on each port are used to read the Up, Down, Left, and Right control signals from the user. In case you are wondering, the Fire button signals and the paddle signals are captured by the TIA.

TIA - Television Interface Adapter:
***********************************

If you are a VCS fan, then you have probably already heard something about the TIA peripheral chip. The primary job of the TIA is to draw the television image. The TIA is not a very powerful video chip by modern standards, but it gets the job done.

The TIA is a dumb device when it comes to drawing the television screen. The TIA can draw a single line of pixels for a TV image. It does that over and over again until it is told to shut up. A television doesn't want to see an endless stream of picture lines. It wants a specific number of lines followed by a rest period for the raster beam to return to the top of the screen to start the next frame. The program running on the CPU must keep track of the time that has passed (how many lines have been drawn) and tell the TIA when to stop drawing lines so the TV can return to the top of the screen and begin the next frame. The program must then track how much time has passed and tell the TIA to begin drawing lines again for the next frame. If the program does a bad job of counting TV lines drawn, or of counting the time so that each frame is exactly the same length as all others, then the TV image will roll or jump. Wheee!!

Besides drawing lines for a TV image, the TIA helps the CPU by providing these additional services:

-> Syncing the CPU clock with the TIA color clock via the RDY line signal.

-> Six input ports.
Four of the six ports are used to measure the position of the paddles. The remaining 2 are used to read the state of the left and right fire buttons.

--------------------------------------------------------------------------------

Let's examine the memory map of the VCS. The 6507 has a 13-bit address bus, so the memory map is a mere 8K bytes in size. I have drawn a VCS memory map below. As you can see there are a large number of shadow images for the peripherals and RAM within the 8K address space. In addition, since the 6507 still uses 16-bit addresses internally, the entire VCS memory map is repeated 8 times to fill the 64K internal address space of the CPU.

Complete VCS Memory map:
************************
Addresses:
    0 to    47 = TIA Control Registers Primary Image
   48 to    95 = [shadow] TIA Control Registers
   96 to   127 = [shadow-partial] TIA Control Registers
  128 to   255 = 128 bytes of RAM Primary Image (zero page image allows fast access)
  256 to   303 = [shadow] TIA Control Registers
  304 to   351 = [shadow] TIA Control Registers
  352 to   383 = [shadow-partial] TIA Control Registers
  384 to   511 = [shadow] 128 bytes of RAM (The CPU stack uses this image)
  512 to   559 = [shadow] TIA Control Registers
  560 to   607 = [shadow] TIA Control Registers
  608 to   639 = [shadow-partial] TIA Control Registers
  640 to   671 = 6532-PIA I/O ports and timer Control Registers Primary image
  672 to   703 = [shadow] 6532-PIA Control Registers
  704 to   735 = [shadow] 6532-PIA Control Registers
  736 to   767 = [shadow] 6532-PIA Control Registers
  768 to   815 = [shadow] TIA Control Registers
  816 to   863 = [shadow] TIA Control Registers
  864 to   895 = [shadow-partial] TIA Control Registers
  896 to   927 = [shadow] 6532-PIA Control Registers
  928 to   959 = [shadow] 6532-PIA Control Registers
  960 to   991 = [shadow] 6532-PIA Control Registers
  992 to  1023 = [shadow] 6532-PIA Control Registers
 1024 to  2047 = [shadows] Repeat the entire pattern from $0000-03FF
 2048 to  3071 = [shadows] Repeat the entire pattern from $0000-03FF
 3072 to  4095 = [shadows] Repeat the entire pattern from $0000-03FF
 4096 to  6143 = Lower 2K Cartridge ROM (4K carts start here)
 6144 to  8191 = Upper 2K Cartridge ROM (2K carts go here)
 8192 to 65535 = [shadows] Repeat the entire pattern from $0000-1FFF, seven times.

You are probably looking at this map and thinking, "Eeep! What have I gotten myself into?". Don't worry, it looks worse than it is. In fact you can ignore the vast majority of shadows in the memory map of the VCS. Here is an abbreviated VCS memory map that lists only the items from the above map that you will need to do 99.9999% of all VCS programs.

Simplified VCS Memory Map: (What a programmer needs)
**************************
Addresses:
    0 to    47 = TIA Control Registers - Primary Image
  ...
  128 to   255 = 128 bytes of RAM - Primary Image (zero page image for fast access)
  ...
  384 to   511 = [shadow] 128 bytes of RAM (The CPU stack uses this shadow image)
  ...
  640 to   671 = 6532-PIA I/O ports and timer Control Registers Primary image
  ...
 4096 to  6143 = Lower 2K Cartridge ROM (4K carts start here)
 6144 to  8191 = Upper 2K Cartridge ROM (2K carts go here)
  ...

There, that is not nearly as bad.

I think we have covered plenty enough for this lesson to get you a basic understanding of computer system architecture and particularly the architecture of the Atari VCS. We could expand the VCS memory map to show the addresses of the individual control registers inside the PIA and TIA peripherals, but you don't need that level of detail yet as a programmer.

---------------------------------------------------------------------------------

Exercises:

- Ask at least one question about this lesson, in the lesson thread. If you made it through the whole thing you must have at least 1 question. Don't worry if it all seems too complex at this point.
Reading this lesson will put the knowledge into your mind. In later lessons these (probably) boring earlier lessons will start to click together into something coherent and more meaningful, and your journey to become a VCS programmer will be much easier and more enjoyable, IMO.

---------------------------------------------------------------------------------

A look ahead to future lessons I have planned:

Lesson 10 - Memory pages and Hexadecimal Notation for numbers.
Lesson 11 - Setting up DASM and compiling a sample program.
Lesson 12 - How to format an assembly language program properly.
Lesson 13 - The LDA, LDX, LDY, STA, STX, STY, and RTS instructions