kenjennings' Blog - Part 9 of 11 -- Simple Assembly for Atari BASIC


Memory Copy

A number of features on the Atari benefit from fast memory copies. High speed data copying provides convenience to the user by reducing wait times for actions that would use slow, time-consuming BASIC loops. Copying a character set (1024 bytes) is a good example. High speed memory moves also enable use of Atari features that are not otherwise possible due to BASIC's speed. For example, vertical movement and animation of Player/Missile graphics in BASIC is not realistic, but a memory move function makes P/M graphics animation practical in an Atari BASIC program.

Though Atari BASIC lacks a general purpose memory move, there is a hack around this problem that exploits the way Atari BASIC manages strings. As simply as possible: a BASIC program may dig through the variable control structures and change the starting address of a string variable, effectively overlaying the string on any memory it likes: screen memory, player/missile memory, color registers, character sets, hardware addresses, etc. String assignments then cause memory to be assigned or copied at near machine language speeds. The copy is done in ascending order and makes no allowance for source and destination memory (strings) that overlap. However, this string abuse method is a subject for an entirely different discussion and is not what will be done here.

The subject of memory moves on the 6502 includes many considerations and is the inspiration for numerous discussions, arguments, and holy wars between programmers around the subject of speed, efficiency, optimization, and the odors emanating from the grandmothers of those who disagree with one's obvious and indisputable common sense. In short, although copying from A to B seems like a simple subject, there are many choices and different methods to arrange a 6502 algorithm to copy memory. Topics that make a difference in code:

  • How many bytes are being copied? More or less than 128 bytes? More or less than 256 bytes? More or less than 32K bytes?
  • Does the code need to consider whether or not the source and destination memory overlap?
  • Should the memory be copied in ascending or descending order?
  • Is speed important? Should loops be unrolled into explicit code?
  • Is the size of the code more important than speed?

All of the above can start a group of programmers on a never-ending flame war. For the sake of defusing argument I will admit that the result created here will not be everyone's idea of best, fastest and most efficient. We will follow the rule, “moderation in all things.” Well, more of a guideline than a rule.
Let's say that we are copying four bytes from a source to a destination. The fastest way to copy is to do it directly and explicitly:
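
A minimal sketch of the explicit approach, assuming hypothetical SOURCE and DEST addresses (the values are placeholders for illustration, not taken from the original listing). Each LDA/STA pair with absolute addressing occupies 6 bytes and takes 8 cycles, so four bytes cost 24 bytes of code:

```asm
SOURCE = $4000      ; hypothetical source address
DEST   = $5000      ; hypothetical destination address
;
    LDA SOURCE      ; 3 bytes, 4 cycles
    STA DEST        ; 3 bytes, 4 cycles
    LDA SOURCE+1
    STA DEST+1
    LDA SOURCE+2
    STA DEST+2
    LDA SOURCE+3
    STA DEST+3
```

Nothing is spent on counting or branching, which is why this is the speed baseline, but every additional byte copied adds another 6 bytes of code.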

The next level of optimization is a loop to move a range of bytes:
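
A sketch of such a loop, assuming the same hypothetical SOURCE and DEST placeholders. The ascending INX/CPX arrangement is an assumption, chosen because its timing agrees with the cycle counts quoted in the discussion:

```asm
SOURCE = $4000      ; hypothetical source address
DEST   = $5000      ; hypothetical destination address
;
    LDX #$00        ; start at the first byte (runs once)
LOOP
    LDA SOURCE,X    ; 4 cycles, 5 if indexing crosses a page
    STA DEST,X      ; 5 cycles
    INX             ; 2 cycles
    CPX #$04        ; 2 cycles, number of bytes to copy
    BNE LOOP        ; 3 cycles when taken, 2 on the final pass
```
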
Including the LDX initialization, this loop is roughly half the size of the explicit method. The size efficiency grows as more bytes are copied: this code can copy 4 bytes, 10 bytes, or 100 bytes and remains the same size, while the earlier explicit method expands with every byte. However, this code uses 15 to 17 CPU cycles for every byte it copies (not counting the LDX initialization, which occurs only once), which is more than twice as long as the explicit copy. Additionally, it still uses absolute addresses and is not a general purpose, reusable routine.

Here is a little optimization on this method:
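
A sketch of one such optimization, again assuming the hypothetical SOURCE and DEST placeholders. Counting the index down lets the decrement itself set the flags, so the explicit compare is no longer needed. Note that this assumed version copies in descending order:

```asm
SOURCE = $4000      ; hypothetical source address
DEST   = $5000      ; hypothetical destination address
;
    LDX #$03        ; index of the last byte (runs once)
LOOP
    LDA SOURCE,X    ; 4 cycles, 5 if indexing crosses a page
    STA DEST,X      ; 5 cycles
    DEX             ; 2 cycles, also sets the N flag
    BPL LOOP        ; loop until X wraps from $00 to $FF
```
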
Not counting the LDX initialization, execution time is reduced to 13 to 15 cycles per byte copied, just about twice as long as the explicit method. Again, like all the other examples, this uses absolute addresses and so is not a general purpose, reusable routine.

A general purpose routine must use indirection instructions, allowing the source and destination addresses to be different each time the program uses the routine. This means Page Zero locations must be initialized for source and destination:
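
A sketch of the indirect version, under a few assumptions: SOURCE and DEST are hypothetical placeholder addresses, and the Page Zero pointers ZSOURCE and ZDEST borrow the $DA and $D8 locations used by the full routine later in this article. The `<` and `>` operators select the low and high bytes of an address:

```asm
SOURCE  = $4000     ; hypothetical source address
DEST    = $5000     ; hypothetical destination address
ZDEST   = $D8       ; Page Zero pointer $D8/$D9
ZSOURCE = $DA       ; Page Zero pointer $DA/$DB
;
    LDA #<SOURCE    ; low byte of source address
    STA ZSOURCE
    LDA #>SOURCE    ; high byte of source address
    STA ZSOURCE+1
    LDA #<DEST      ; low byte of destination address
    STA ZDEST
    LDA #>DEST      ; high byte of destination address
    STA ZDEST+1
;
    LDY #$03        ; index of the last byte
LOOP
    LDA (ZSOURCE),Y ; fetch through the source pointer
    STA (ZDEST),Y   ; store through the destination pointer
    DEY
    BPL LOOP        ; loop until Y wraps from $00 to $FF
```

Only the pointer setup knows the actual addresses. The loop itself can be reused for any source and destination.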
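The loop uses BPL to detect the index wrapping around from $00, a positive value, to $FF, a negative value. This means the branching evaluation cannot encounter any other negative value, which limits this code to 129 bytes. 129? Not 128? Why? Assuming the loop copies a byte first and only then decrements and tests the index, the index may start at $80: the first pass decrements it to $7F before BPL ever sees it, so indexes $80 down to $00 are copied, 129 bytes in all.

                 ; $00-$04=$FC
        LDX #$FC
    LOOP
        LDA SOURCE-$FC,X
        STA DEST-$FC,X
        INX      ; Continues for $FD, $FE, $FF
        BNE LOOP ; and quits at index $00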

So what do we do when there are more than 256 bytes to copy? In the simple loop examples, just add another loop that copies more bytes with different source and destination addresses. But when there is a lot of memory to copy, this quickly becomes redundant.

The advantage of using the indirection method is that the source and destination addresses in Page Zero can be easily modified and used to repeat the loop again. The example below copies 1024 bytes:
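
A sketch of the page-copy loop, modeled on the MOVEPAGE section of the full routine later in this article. It assumes ZSOURCE and ZDEST have already been loaded with the source and destination addresses:

```asm
ZDEST   = $D8       ; Page Zero pointer $D8/$D9, assumed already set
ZSOURCE = $DA       ; Page Zero pointer $DA/$DB, assumed already set
;
    LDY #$00        ; start of page
    LDX #$04        ; 4 pages of 256 bytes = 1024 bytes
LOOP
    LDA (ZSOURCE),Y
    STA (ZDEST),Y
    INY
    BNE LOOP        ; until Y rolls from $FF to $00
    INC ZSOURCE+1   ; next source page
    INC ZDEST+1     ; next destination page
    DEX             ; one page done
    BNE LOOP        ; non-zero means more pages
```
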
The looping part is 14 bytes long. It can copy any number of 256 byte pages just by changing the X register. This is a practical routine for the Atari which has character sets 512 or 1024 bytes long. Player/missile graphics may use memory maps of 256 bytes. More generally, aligning any data to fit within the range of an index register allows reasonably compact code like this.

The next problem is how to copy any number of bytes. Simply change perspective and think of the problem in two parts. First, if the number of bytes is 256 or greater then the previous page copy shown above can take care of all the whole, 256-byte pages. The high byte of the size provides the number of whole 256 byte pages. That leaves the size's low byte which specifies the remaining zero to 255 bytes. This second part just needs a slightly different byte copying loop that will stop early at a specific count. Assuming the source, destination, and size are all set into designated page 0 locations, then the relocatable, re-usable code could look something like this:

; MEMMOVE.M65
;
; GENERAL MEMORY MOVE
;
; GENERIC MEMORY MOVE FROM
; SOURCE TO DESTINATION
; ASCENDING.
;
; USR 3 ARGUMENTS:
; SOURCE == FROM ADDRESS
; DEST == TO ADDRESS
; SIZE == NUMBER OF BYTES
;
; RETURN VALUE IS BYTES COPIED.
;
ZRET = $D4 ; FR0 $D4/$D5 Return Value
ZARGS = $D5 ; $D6-1 for arg Pulldown loop
ZSIZE = $D6 ; FR0 $D6/$D7 Size
ZDEST = $D8 ; FR1 $D8/$D9 Destination
ZSOURCE = $DA ; FR1 $DA/$DB Source
;
    .OPT OBJ
;
    *= $9200 ; Arbitrary. this is relocatable
;
INIT
    LDY #$00 ; Make the return
    STY ZRET ; value clear to 0
    STY ZRET+1 ; by default.
    PLA ; Get argument count
    BEQ BYE ; Shortcut for no args
    ASL A ; Now number of bytes
    TAY
    CMP #$06 ; Source, Dest, Size
    BEQ PULLDOWN
;
; Bad arg count. Clean up for exit.
;
DISPOSE ; Any number of arguments
    PLA
    DEY
    BNE DISPOSE
    RTS ; Abandon ship.
;
; Pull args into Page Zero.
; This code works the same
; for 1, 4, 8... arguments.
;
PULLDOWN ; arguments in Y
    PLA
    STA ZARGS,Y
    DEY
    BNE PULLDOWN
;
; Since we're good to start the
; copy, then set the return
; value to the size.
;
    LDA ZSIZE+1
    STA ZRET+1
    LDA ZSIZE
    STA ZRET
;
; Moving full 256 byte pages or not?
;
    LDY #0
    LDX ZSIZE+1 ; Number of pages is
    BEQ MOVEPARTIAL ; Zero so, try partial.
;
; Copy full index range of 256 bytes.
;
MOVEPAGE
    LDA (ZSOURCE),Y
    STA (ZDEST),Y
    INY
    BNE MOVEPAGE ; Y rolled $FF to $00
    INC ZSOURCE+1 ; Next page src
    INC ZDEST+1 ; next page dst
    DEX ; this page done
    BNE MOVEPAGE ; Non-zero means more
;
; A partial page remains?
;
MOVEPARTIAL
    LDX ZSIZE ; Low byte remainder.
    BEQ BYE ; Zero, exit
;
MOVEMEM
    LDA (ZSOURCE),Y
    STA (ZDEST),Y
    INY ; Copy ascending.
    DEX ; and subtract count.
    BNE MOVEMEM
BYE
    RTS
;
    .END

This copies any number of bytes in ascending order from the source to destination. Earlier the article mentioned an Atari BASIC string hack that assigns strings to specific memory addresses and uses the string assignment action to copy from a source address to a target address. String assignments work in ascending order. Therefore, this routine achieves the same result as the string method (ascending copy) with a bit less hacking of the BASIC variable table. Copying in reverse order or automatically detecting source and target overlap would add more code for options that are not as frequently needed.

This is certainly not the highest performing option possible. But it is fairly compact and uses Page Zero, which keeps the routine general-purpose and reusable, and that is a reasonable goal for a routine to support Atari BASIC. If this discussion concerned writing an entire video game in assembly then the code would focus on execution time and use unrolled loops or other bells and whistles for copying faster at the expense of code size.

100 REM TEST MEMORY MOVE OPERATIONS
105 REM
110 GRAPHICS 0:POKE 710,0:POKE 82,0
115 DIM S$(258),D$(259)
120 GOSUB 503:REM RESET SRC AND DST
125 GOSUB 10000:REM LOAD MEMMOVE
130 REM RUN TESTS FOR 8,255,256,257
135 RESTORE 230
140 READ MSIZE
145 IF MSIZE<1 THEN ? "Done":END
150 ? "Testing move size ";MSIZE;" . . . ";
155 X=USR(MMOV,ADR(S$),ADR(D$)+1,MSIZE)
160 DCOUNT=0
165 FOR I=1 TO 259
170 IF D$(I,I)="*" THEN DCOUNT=DCOUNT+1
175 NEXT I
180 IF DCOUNT=MSIZE THEN ? "OK":GOTO 190
185 ? "FAILED! ";DCOUNT;" <> ";MSIZE
190 GOSUB 504:REM RESET DST
195 GOTO 140
200 REM
205 REM NUMBER OF BYTES TO MOVE FOR
210 REM EACH TEST. TESTS LESS THAN
215 REM ONE PAGE AND THE BORDER
220 REM CONDITIONS AROUND ONE PAGE.
225 REM
230 DATA 8,255,256,257,-1
235 END
500 REM
501 REM RESET MEMORY
502 REM
503 S$="*":S$(258)="*":S$(2)=S$
504 D$=".":D$(259)=".":D$(2)=D$
505 RETURN
9997 REM
9998 REM SETUP ML MEMMOVE UTILITY
9999 REM
10000 DIM MM$(68)
10001 MMOV=ADR(MM$)
10002 RESTORE 24000:? "Loading MMOV..."
10003 FOR I=0 TO 67
10004 READ D:POKE MMOV+I,D
10005 NEXT I:?
10006 RETURN
23996 REM H1:MEMMOVE.OBJ
23997 REM SIZE = 68
23998 REM START = 37376
23999 REM END = 37443
24000 DATA 160,0,132,212,132,213,104,240
24001 DATA 58,10,168,201,6,240,5,104
24002 DATA 136,208,252,96,104,153,213,0
24003 DATA 136,208,249,165,215,133,213,165
24004 DATA 214,133,212,160,0,166,215,240
24005 DATA 14,177,218,145,216,200,208,249
24006 DATA 230,219,230,217,202,208,242,166
24007 DATA 214,240,8,177,218,145,216,200
24008 DATA 202,208,248,96

The program tests different sizes to verify the partial and full page copy loops, and also tests the border conditions around a full page copy. After each copy it tests the contents of the destination memory, counting all the locations that were changed by the memory move. Successful output will look like this:

Below are the source files and examples of how to load the machine language routine into BASIC included in the disk image and archive:

MEMMOVE File List:

MEMMOVE.L65 Mac/65 source listing
MEMMOVE.T65 Mac/65 source listed to H6: (linux)
MEMMOVE.ASM Mac/65 assembly listing
MEMMOVE.TSM Mac/65 assembly listing to H6: (linux)
MEMMOVE.OBJ Mac/65 assembled machine language program (with load segments)
MEMMOVE.BIN Assembled machine language program without load segments
MEMMOVE.LIS LISTed DATA statements for MEMMOVE.BIN routine.
MEMMOVE.TLS LISTed DATA statements for MEMMOVE.BIN routine to H6: (linux)

MAKEMMOV.BAS BASIC program to create the MEMMOVE.BIN file. This also contains the MEMMOVE routine in DATA statements.
MAKEMMOV.TLS LISTed version of MAKEMMOV.BAS to H6: (linux)

TESTMOVE.BAS BASIC program that tests the MEMMOVE USR() routine.
TESTMOVE.TLS LISTed version of TESTMOVE.BAS to H6: (linux)

ZIP archive of files:

Tar archive of files (remove the .zip after download)

Great peace have those who love your law; nothing can make them stumble.
Psalm 119:165
