Vorticon Posted March 25

Writing to the SAMS seems slower than reading from it. Is that a thing, or something on my end?
apersson850 Posted March 25

I see. Yes, my own memory expansion, which can show up anywhere in the address space, is indeed installed via modifications inside the console. The p-system almost always uses the DSR space itself. My feeling is that the p-system is easiest to expand with a RAMdisk.
TheBF Posted March 25

2 hours ago, Vorticon said:
Writing to the SAMS seems slower than reading from it. Is that a thing, or something on my end?

Once a page is mapped into a RAM window it is the same as the RAM it replaced; it is literally on the same card. I have never noticed a difference. (I will double-check now.)

Something you might try is keeping a variable holding the SAMS page that is currently mapped in, and only calling the mapping function when it needs to change. Sometimes that speeds things up, but sometimes the logic takes longer than just doing the mapping.
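A minimal sketch of that page-caching idea in Forth (not from the thread; CUR-PAGE and SAMS-MAP are made-up names, where SAMS-MAP ( page -- ) would do the actual CRU mapping for the window):

   VARIABLE CUR-PAGE                 \ SAMS page currently mapped into the window
   : ?MAP  ( page -- )               \ remap only if the requested page changed
      DUP CUR-PAGE @ = IF  DROP EXIT  THEN
      DUP CUR-PAGE !  SAMS-MAP ;     \ SAMS-MAP is the hypothetical real mapper

Whether this wins depends on the access pattern: if successive accesses usually land in the page that is already mapped, the compare is cheaper than the CRU traffic; if they hop pages every time, the extra test and fetch just add overhead, which is the caveat above.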
TheBF Posted March 25

So my quick and dirty result is the opposite. (Of course they did.) I think it's because when I read SAMS I have to throw away the result with DROP, so that's extra time spent in the loop. The test code reads a 64K chunk of SAMS memory as 32K memory words. The SAMS code is assembler and the tests are Forth. There is computation to convert a virtual address (0..>FFFF) to the real address in RAM, so that takes a bit of time. The screenshot shows the results.
Vorticon Posted March 25

I'll have to come up with a timing test to accurately characterize what is going on here. Maybe it's my imagination 😁
Tursi Posted March 26

In theory, writing to any RAM is slower than reading from it, because most operations do a read-before-write, meaning that every write is actually two memory operations. It's not possible to slow down a memory access without wait-state hardware on the device, which I'm pretty sure the AMS doesn't have. If the memory isn't ready, the CPU will just take whatever's on the bus. (At the lowest level, that is.)
Asmusr Posted March 26

16 hours ago, hhos said:
Two of these should give some idea of what is needed to add the function you want.

I'm not trained to read schematics, so perhaps you could explain in simple terms what changes it would take to an existing card to support paging in the >6000->7FFF region? I'm just curious why they chose not to do this in the first place, since it would be a very useful addition.
TheBF Posted March 26

9 hours ago, Tursi said:
In theory, writing to any RAM is slower than reading from it, because most operations do a read-before-write, meaning that every write is actually two memory operations. It's not possible to slow down a memory access without wait-state hardware on the device, which I'm pretty sure the AMS doesn't have.

That Tursi is not just a pretty face.

I wrote up a better test (I think) that uses MOVE, which is an assembly language word that moves bytes. On real hardware, it takes 4K of data from ROM at >0000 and writes it to 64K of SAMS in 4K chunks (16 pages). I then did the reverse and wrote the SAMS data back to the ROM addresses. Reading is much faster.

Edit: Changed WRITEPAGES to do exactly the same thing as READPAGES.

NEEDS DUMP      FROM DSK1.TOOLS
NEEDS FORGET    FROM DSK1.FORGET
NEEDS ELAPSE    FROM DSK1.ELAPSE
NEEDS VIRT>REAL FROM DSK1.SAMS

HEX
1000 CONSTANT 4K
2000 CONSTANT $2000
1 SEGMENT                  \ uses SAMS pages 16..32 (64K)

DECIMAL
\ copy 4K in ROM from >0000 to SAMS
: WRITEPAGES   65535 0 DO  0000 I VIRT>REAL 4K MOVE  4K +LOOP ;

\ copies back to ROM address
: READPAGES    65535 0 DO  I VIRT>REAL 0000 4K MOVE  4K +LOOP ;
matthew180 Posted March 26

6 hours ago, Asmusr said:
I'm not trained to read schematics, so perhaps you could explain in simple terms what changes it would take to an existing card to support paging in the >6000->7FFF region? I'm just curious why they chose not to do this in the first place, since it would be a very useful addition.

I have not looked at the SAMS schematics, but I suspect one reason for not paging the >6000 area is to keep a stable, unpaged area for the code that controls the paging, separate from the area that does get paged. But this is really speculation unless someone who was there knows why those design decisions were made. As for the technical part: probably more address decoding and new registers to control the new paged area. Again, speculation, since I have not looked at the SAMS implementation, but all basic external memory pagers work much the same way.
matthew180 Posted March 26

12 hours ago, TheBF said:
I wrote up a better test (I think) that uses MOVE, which is an assembly language word that moves bytes.

Your test looks like Forth, not assembly; there is no "MOVE" instruction in assembly. `MOVB` is the assembly language instruction to move bytes; `MOV` moves a word (16 bits). `MOV` does not incur the read-before-write penalty, so that would be a better test for comparing read vs. write speed.

Edit: heh, I forgot, the 9900 still does the read-before-write even for 16-bit operations.
TheBF Posted March 26

11 minutes ago, matthew180 said:
Your test looks like Forth, not assembly; there is no "MOVE" instruction in assembly. `MOVB` is the assembly language instruction to move bytes; `MOV` moves a word (16 bits). `MOV` does not incur the read-before-write penalty, so that would be a better test for comparing read vs. write speed.

Yes, it's Forth, but MOVE is written in Forth assembler. Forth is just the console used to call the program and put it in a loop.

CODE MOVE ( src dst n -- )          \ forward character move
    *SP+ R0 MOV,                    \ pop DEST into R0
    *SP+ R1 MOV,                    \ pop source into R1
    TOS TOS MOV,
    NE IF,                          \ if n=0 we are done
\ need some copies
       R0 R2 MOV,                   \ dup dest
       R0 R3 MOV,                   \ dup dest
       TOS R3 ADD,                  \ R3=dest+n
\ test window: src dst dst+n WITHIN
       R0 R3 SUB,
       R1 R2 SUB,
       R3 R2 CMP,
       HI IF,                       \ do cmove> ............
          TOS W MOV,                \ dup n
          W DEC,                    \ compute n-1
          W R1 ADD,                 \ point to end of source
          W R0 ADD,                 \ point to end of destination
          BEGIN,
             *R1 *R0 MOVB,
             R1 DEC,                \ dec source
             R0 DEC,                \ dec dest
             TOS DEC,               \ dec the counter in TOS (R4)
          EQ UNTIL,
       ELSE,                        \ do cmove .............
          BEGIN,
             *R1+ *R0+ MOVB,        \ byte move, with auto increment by 1.
             TOS DEC,               \ we can test it before the loop starts
          EQ UNTIL,
       ENDIF,
    ENDIF,
    TOS POP,
    NEXT,
ENDCODE
TheBF Posted March 26

16 minutes ago, matthew180 said:
`MOV` does not incur the read-before-write penalty, so that would be a better test for comparing read vs. write speed.

I take your point that MOV would be a better test than the MOVB I am using in my program. I can check that too.
TheBF Posted March 26

For full disclosure, the VIRT>REAL "program" is written in Forth assembler also. (It uses machine code in the system to avoid loading the assembler.)

HEX
CODE VIRT>REAL ( virtaddr -- real_address )
\ Lee Stewart's code to replace >1000 UM/MOD. 2.3% faster
   C020 , SEG ,      \ SEG @@ R0 MOV,     \ segment# to R0
   0A40 ,            \ R0 4 SLA,          \ page# segment starts
   C144 ,            \ R4 R5 MOV,         \ address to R5
   0245 , 0FFF ,     \ R5 0FFF ANDI,      \ page offset
   09C4 ,            \ R4 0C SRL,         \ page of current segment
   A100 ,            \ R0 R4 ADD,         \ bank#
   8804 , BANK# ,    \ R4 BANK# @@ CMP,   \ switch page ?
   1309 ,            \ NE IF,
   C804 , BANK# ,    \ R4 BANK# @@ MOV,   \ YES, update BANK#
   06C4 ,            \ R4 SWPB,
   020C , 1E00 ,     \ R12 1E00 LI,       \ select SAMS
   1D00 ,            \ 0 SBO,             \ card on
   C804 , SREG ,     \ R4 SREG @@ MOV,    \ map the page
   1E00 ,            \ 0 SBZ,             \ card off
                     \ ENDIF,
   0204 , PMEM ,     \ R4 PMEM LI,        \ page_mem->tos
   A105 ,            \ R5 R4 ADD,         \ add computed offset to page
   NEXT,             \ 46 bytes
ENDCODE
TheBF Posted March 26

40 minutes ago, matthew180 said:
`MOV` does not incur the read-before-write penalty, so that would be a better test for comparing read vs. write speed.

Wow! You are so right, @matthew180. I had no idea. I replaced the MOVE program with MOVEW, which uses MOV in a loop to move memory words. There is almost no difference in speed now. For reference, here is MOVEW in Forth assembler.

\ MOVEW replaces MOVE16.  Jul 2022 Brian Fox
CODE MOVEW ( src dst n -- )    \ n = no. of bytes to move
    *SP+ R0 MOV,
    *SP+ R1 MOV,
    BEGIN,
       *R1+ *R0+ MOV,
       TOS DECT,
    NC UNTIL,
    TOS POP,
    NEXT,
ENDCODE
hhos Posted March 26

On 3/26/2024 at 2:30 AM, Asmusr said:
I'm not trained to read schematics, so perhaps you could explain in simple terms what changes it would take to an existing card to support paging in the >6000->7FFF region? I'm just curious why they chose not to do this in the first place, since it would be a very useful addition.

First, I'll assume "they" refers to the Super AMS design team. 🙂 They were probably just wanting to use the RAM spaces that had already been allocated, and for good reason, I might add. Whenever you depart from the default memory map you risk having two memories dueling for control of the same data bus. That is always a bad idea, obviously. I would bet on all the 0s winning in any data-bit conflicts myself. 😀

In the case of the cartridge-port memory that's probably not a big deal. Your software can easily test for the presence of a cartridge before turning on the switch that makes the RAM mapped to >6000->7FFF available. Someone plugging in a cartridge while the RAM is available will just cause a reset of the system, so very little harm done?

This is not so with the >0000, >4000, and >8000 banks. The >0000->1FFF space has the system ROM there until it is disabled (which requires a console modification). Once the system ROM is locked out, the majority of conflicts would probably be related to the GPL interpreter being unavailable? It's something to plan for if you're going to do it. I believe there is someone who has his console modified so he can have 64K of RAM mapped in, all inside the console. Apersson850 is the one who comes to mind, but I'm not absolutely certain of that. Anyway, there are far better sources than myself on what the loss of the system ROM would mean.

The >4000->5FFF space is a bit more complicated. If you have it activated and then find you need to access the floppy drive or the RS232 port, you are back to the dueling-data scenario. The bit that activates the >4000->5FFF region on my design is bit 5, with the CRU base address at >5FC0. To shut it down:

   LI  12,>1E00
   SBZ 5

(the SAMS doesn't have a plan for this, AFAIK). Then you can do your call to the DSR of your choice.

In short, to answer your question: it's a pain to use these areas for extra RAM, but it might be worth it. They decided it wasn't worth the trouble. You decide if it's worth it for you. 😊

HH
apersson850 Posted March 26

6 hours ago, matthew180 said:
`MOV` does not incur the read-before-write penalty, so that would be a better test for comparing read vs. write speed.

Yes, it does. The instructions that have byte variants all read before write, simply because that was easiest when the logic in the CPU was designed. But a Load Immediate doesn't read the register before it stores a value there, as there is no Load Immediate Byte version. If you check the number of memory accesses made by MOV and MOVB you'll see that they are the same: four. Read instruction, read destination, read source, write destination.
apersson850 Posted March 26

12 minutes ago, hhos said:
I believe there is someone who has his console modified so he can have 64K of RAM mapped in, all inside the console. Apersson850 is the one who comes to mind, but I'm not absolutely certain of that. Anyway, there are far better sources than myself on what the loss of the system ROM would mean. The >4000->5FFF space is a bit more complicated. If you have it activated and then find you need to access the floppy drive or the RS232 port, you are back to the dueling-data scenario.

Yes, I have such a design in my console. The 64 K of RAM can be mapped over any of the 8 K segments in the console, one at a time. If you map out the console ROM you have to map it back in before returning from your assembly program, or you die... But you can copy all of the ROM to RAM and then, for example, modify the interrupt vectors if you want to do something with the TMS 9901. It works fine to use the RAM over the console ROM for some buffer storage: enable, store, disable, or enable, read, disable.

The DSR space complicates things further if you, like @Vorticon, want to use SAMS together with Pascal, as the p-code card occupies this space too.
matthew180 Posted March 27

5 hours ago, apersson850 said:
Yes, it does. The instructions that have byte variants all read before write, simply because that was easiest when the logic in the CPU was designed.

Sure enough, I forgot about that. The read-before-write behavior is not in the datasheet, though (that I could find), and it is very easy to forget about. I only ever saw it mentioned as a note in the System Design book TI published.

11 hours ago, TheBF said:
Wow! You are so right, @matthew180. I had no idea.

So, I was mistaken. There must be something else going on with your tests.
apersson850 Posted March 27

7 hours ago, matthew180 said:
The read-before-write behavior is not in the datasheet, though (that I could find), and it is very easy to forget about. I only ever saw it mentioned as a note in the System Design book TI published.

Oddly enough, it's explained in detail in the TI-99/4A Console and Peripheral Expansion System Technical Data, 1049717-1. Here's a copy of section B.1.2.1:

READ before WRITE considerations

There are different READ and WRITE addresses for most of the memory-mapped devices (MMDs). This is because the TMS 9900 does a READ operation at the destination address prior to writing to it. Many of the MMDs have internal address registers that autoincrement after either a READ or a WRITE operation. This autoincrement characteristic of the MMD may not produce the desired results if it is not taken into consideration when designs or modifications are made.

The READ before WRITE exists because the TMS 9900 is a word-oriented machine from the memory access standpoint. The several byte-oriented instructions are carried out by the machine in a word execution format, and the other byte in the word must not be altered. The machine itself must save the unaltered byte, concatenate the new byte to it, and return the word to memory. The internal logic of the TMS 9900 is designed this way because it was to the designers' distinct advantage to do this same READ before WRITE on both byte and word moves.

Another thing I noticed in that document may shed some light on the discussion we've had about the usefulness of LIMI 2 when LIMI 1 does the same job. This is about how to design a card to be used in the expansion box:

There are two levels on which to interrupt, but the TI-99/4A supports only one (INTA*). THIS IS THE ONE YOU MUST USE. Interrupt level status bits are defined by the Personal Computer PCC at Texas Instruments, and for the moment are not sensed by the TI-99/4A. If they were to be sensed, the TI-99/4A would cause a line to go low (SENILA*), which tells the PCB logic to gate its status bit to the system data bus.

I don't know, but there could be a reason for LIMI 2 being used buried here, since they obviously at some time had two interrupt levels under consideration. Reading this, it sounds very much as if somebody planned for more computers than just the TI-99/4A to be able to use the expansion box.
TheBF Posted March 27

10 hours ago, matthew180 said:
So, I was mistaken. There must be something else going on with your tests.

Indeed. My MOVE program tests for overlapping memory and chooses one of two different loops. I might have a problem with that overlap detector.
FarmerPotato Posted March 27

14 hours ago, apersson850 said:
The internal logic of the TMS 9900 is designed this way because it was to the designers' distinct advantage to do this same READ before WRITE on both byte and word moves.

"To the designers' distinct advantage" made sense to me; it was a trade-off. But what did it really mean?

According to a Texas Instruments patent disclosure, the 9900 instruction set is decoded by the number of leading zeroes. This number selects the "entry point" into the CPU's control ROM (microcode). For each entry point, instructions share a common operand decoding:

None or 1 zero is Format I
2 zeroes is Format III, IV, XOP
3 zeroes is Format II
etc.

The "byte" bit isn't up front in the instruction, so whether it's 0 or 1, it can't affect the entry point. MOV has the same number of leading zeroes as A (add), and ADD definitely needs the destination to be read. As a result, all Format I instructions fetch the destination operand, even MOV, which doesn't need it.

9995: This CPU doesn't need to fetch or store a whole word if the instruction is a byte type. Special case: does MOVB avoid read-before-write? (I think it does.)

99000: The 99000 CPUs eliminated read-before-write as a special case for MOV. Register-to-register MOV takes just 3 cycles, where MOVB still takes 4. (MOV: WR read, AUMS, WR write.)

TL;DR: much more than just read-before-write. We're going to decode the 9900 instructions at the bit level. You'll see the leading-zeroes concept throughout.

Format I instructions, A or MOV for example, begin with one or no zero. Format I has a general destination and a general source operand. Two leading zeroes means Format III: one destination register, one general source operand.

Format I and III examples:

MOV              1100 Td dddd Ts ssss
MPY (Format III) 0011 10 dddd Ts ssss

where Ts, Td are:

00 register, Rx
01 register indirect, *Rx
10 symbolic @Y(R0) or indexed @Y(Rx)
11 register indirect auto-increment, *Rx+

dddd and ssss are register numbers.

Some longer tables, sorted by leftmost bits, counting down.

Format I. One or no leading zero. Two general operands. The opcode is a 3-bit ALU operation code plus a one-bit "byte" flag, then 6 bits DST, 6 bits SRC (16 bits total). Opcodes:

1111 SOCB
1110 SOC (set ones)
1101 MOVB
1100 MOV (move)
1011 AB (add byte)
1010 A (add)
1001 CB
1000 C (compare)
0111 SB
0110 S (subtract)
0101 SZCB
0100 SZC (set zeroes)

Formats III, IV, and XOP have 2 leading zeroes:

0011 xx  MPY, DIV, LDCR, STCR
0010 xx  XOP, XOR, CZC, COC

These are 8 similar opcodes made from 6 bits: 2 zeroes, a 1, then 3 more bits (2^3 = 8), leaving 10 bits for operands. The opcodes have started to nibble at the operand fields, eating Td first. Decoding:

001x xxdd ddTs ssss

The dddd is a 4-bit number, interpreted as a register, a shift count, or an XOP number. Finally, 6 bits for one general source operand. (Note that the source operand always occupies the rightmost bits!) (2 + 1 + 3 + 4 + 6 bits = 16.)

Format II has 3 leading zeroes. 16 opcodes identified by the first 8 bits; the rest is an 8-bit offset:

0001 xxxx  jumps and CRU single bit
0001 1111  TB
0001 1110  SBO
0001 1101  SBZ
0001 1100  JOP
0001 1011  JH
...
0001 0000  JMP

(Notice that from xxxx, the CRU bit instructions are easy to separate. Jumps that test status bits have some 1s; plain JMP has zeroes. For the jumps, you could easily write out sum-of-products equations of the xxxx bits and the first 5 register bits.)

The leading-zeroes concept continues through the end of the instruction set:

4 zeroes is Format V: shifts: SRA ...
5 zeroes is Format VI: CLR, BL, ...
6 zeroes is Format VII: LWPI, RTWP, and Format VIII: LI, CI, ...

The 9995 and others also have:

7 zeroes is MPYS, DIVS
8 zeroes is LWP, LST

Observation: the E/A Roman numeral is approximately the bit position of the first 1.

Swap Bus

The byte variant of Format I causes the "swap bus" to meddle with the byte order of operand values. On their way between memory and the ALU, then back to memory, they either sail through or get swapped as needed. It's like Scylla and Charybdis.

The "swap bus" is just more silicon control lines to multiplex the bytes into the left byte: 16 two-input multiplexers between the ALU and the memory data register (abbreviated MD or MDR). The byte swap isn't an extra clock cycle; it's just gates switched on or off as the value moves across the swap bus. On the other hand, the microcode for SWPB needs to tell it to swap on the front half of the cycle, not the back half. (Otherwise SWPB would be like MOV!)

Memory -> MDR -> Swap -> ALU B -> (no Swap) -> MDR -> Memory

TL;DR again. Even more concerning microcode:

As said above, the number of leading zeroes selects the "entry point" into the CPU's control ROM (microcode) for an instruction group having the same operand-decoding steps. Again, the "byte" bit isn't up front in the instruction, so it can't affect the entry point.

The control ROM is an array of silicon lines which pass across all the other functional blocks of the CPU. A group of lines is like a punched card that turns the functional units of the CPU on or off. Some lines are for the first 1/4 or 1/2 of the cycle, some for later. (One clock cycle is really 4 internal clocks, and 4 steps is enough to: 1. compute an address by adding two things like WP + Rx, 2. set the memory bus address, 3. strobe the Read or Write signal, 4. capture the read word internally.)

Format I has activated one chunk of microcode, which does the work for two general operands. Microcode for a one-operand format, like Format III, would decode only the source-field general operand. Observe why the general dest field comes before the source field: instructions with more bits in the opcode eat up Td first, then dddd, but the general source can stay in a common field. Decoding the general source could be a "subroutine", saving a lot of silicon space! (There might be other tricks.) The location of the source bits remains the same down to the instructions that leave just 4 bits for an operand! Of course, in single-general (CLR) or immediate-register (AI) formats, the "source" is also a destination. LI seems to be a special case!

ALU

Microcode ultimately sets up the ALU with input operands A and B, and an operation code:

SOC  111   B = B OR A
MOV  110   B = B
A    101   B = B + A
C    100   B = -B + A (set status)
S    011   B = -B + A (huh?)
SZC  010   B = B AND NOT A
NEG  ????  B = -B
INV  ????  B = NOT B
ABS  ????  this one has microcode
SETO ????  B = FFFF
SLA  ????  while (SC--), multiplex B bits from left or right neighbor bit
etc.

A full set of 9-bit operation codes is known for the 990/12 and the ALU chip 74S181. See the Bipolar Microcomputer Components Data Book. A set of 16 operation codes suffices for the 74LS181 ALU, a really common chip. See any TTL data book.

Historical note: extra adders

In drafts of the schematic for the 99/5, a memory mapper used a set of '181s to add a logical address and a base address; the '181s were used just for the add operation! A 16-bit adder allowed any 32-byte boundary. (Seen in the first draft of the 99/5 from Ron Wilcox to Don Bynum.)

In the 990 memory mapper, a 16-bit base address was also shifted left 5 bits and then added to the logical address. This selected any 32-byte boundary, for 2 megabytes max. (The 990 supported 3 "segments", where one of 3 base-address registers was added to the logical addresses. So a program segment could be mapped from any RAM address to any logical start address. You might have two shared code segments and a data segment.)

The 32-byte unit in a proposed home computer would match the 990 minicomputer exactly, except that in the home computer the 64K would be divided into 32 little 2K windows instead of beginning at 2K page boundaries. Four extra '181s would be pricey for a home computer. The '181s were eliminated after the first draft for the 99/5, to be absorbed into the 99/8 mapper chip.
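A tiny illustration of the leading-zeroes idea above, as a high-level Forth word (not from the thread; the name LZ is made up): count the leading zero bits of a 16-bit opcode and you have, roughly, its format group.

   HEX
   : LZ  ( opcode -- n )            \ count leading zero bits of a 16-bit opcode
      0 SWAP                        \ n opcode
      10 0 DO                       \ at most 16 bits (10 hex)
         DUP 8000 AND IF LEAVE THEN
         2*  SWAP 1+ SWAP           \ shift opcode left, bump the count
      LOOP  DROP ;

   C000 LZ .    \ MOV  -> 0  (Format I)
   3800 LZ .    \ MPY  -> 2  (Format III)
   1000 LZ .    \ JMP  -> 3  (Format II)
   04C0 LZ .    \ CLR  -> 5  (Format VI)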
apersson850 Posted March 27

1 hour ago, FarmerPotato said:
"To the designers' distinct advantage" made sense to me; it was a trade-off. But what did it really mean?

Most probably that they could reuse quite a lot of the instruction handling if byte and word instructions behaved in the same way. Which is what you go into in your post, just in more detail.
dhe Posted April 7

Finally had some time to go back and play with a small bit of assembly. I've been proofreading an English translation of the German assembly books (Kurs I). One of the comments the author made in an example was that we needed to get the MSB into the LSB position for the next instruction. He mentioned that instead of using SWPB we could use SRL; while that does move the MSB to the LSB, you lose the value of the LSB (unless it's already zero! 😃).

Using Steve's cheat sheet I was able to quickly grab the instruction formats, and using xdt99's listing I quickly got the number of clock cycles. Using a shift instruction costs double the clock cycles of SWPB, tested by single-stepping in Classic99.

That got me thinking about another example I've seen, which is setting a register to zero by using XOR. That costs four additional clock cycles over LI, or two additional clock cycles compared to CLR, with CLR being the easy winner for expressing intent.

* test - does swpb and srl work the same, which is best?
0011 0026 0208  16      li   r8,>FF01      fmt 8
     0028 FF01
0012 002A 06C8  18      swpb r8            fmt 6
0013 002C 0208  16      li   r8,>FF01
     002E FF01
0014 0030 0988  36      srl  r8,8          fmt 5
0015 * not same, lsb becomes 0
0016 0032 0208  16      li   r8,>FF01
     0034 FF01
0017 0036 0B88  36      src  r8,8          fmt 5
0018 * does the same, but costs an extra 18 clock cycles

0021 * test Set Reg to Zero - most efficient.
0022 003A 0207  16      li   r7,>0007      fmt 8
     003C 0007
0023 003E 04C7  18      clr  r7            fmt 6
0024 0040 0207  16      li   r7,>0007
     0042 0007
0025 0044 29C7  20      xor  r7,r7         fmt 3
Lee Stewart Posted April 7

1 hour ago, dhe said:
He mentioned that instead of using SWPB we could use SRL; while that does move the MSB to the LSB, you lose the value of the LSB (unless it's already zero! 😃). [...] That got me thinking about another example I've seen, which is setting a register to zero by using XOR. That costs four additional clock cycles over LI, or two additional clock cycles compared to CLR, with CLR being the easy winner for expressing intent.

Of course, there can be many reasons for choosing one operation over another.

Regarding getting the MSB to the LSB: the point of using "SRL Rn,8" over "SWPB Rn" is usually to zero the most significant byte in the same instruction; saving instructions can outweigh speed. If you want to duplicate SWPB with a shift, "SRC Rn,8" will do nicely. Furthermore, if you want to test the result, the shift is the only way, because SWPB does not affect the status register.

Regarding zeroing a register: CLR is, indeed, the clear (groan) winner, unless for some reason you need a comparison. Both XOR and LI affect the status register, whereas CLR does not. LI also takes an additional 2 bytes over the other two when dealing only with registers.

...lee
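For illustration (not from Lee's post), a minimal sketch in the thread's Forth-assembler style: a word that replaces the value on top of the stack with its upper byte, using SRL so the upper byte is cleared and the status register reflects the result, which SWPB would not do. The word name is made up, and TOS is the cached top-of-stack register as in the earlier listings.

   CODE HIGH-BYTE  ( u -- byte )    \ sketch only: upper byte of u, zero-extended
      TOS 8 SRL,                    \ MSB to LSB position, upper byte cleared
      NEXT,                         \ status register now reflects the result
   ENDCODE

Doing the same with SWPB instead would leave the old low byte in the upper half and would not touch the status register, so testing the byte would take an extra ANDI or MOV.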
apersson850 Posted April 7

There's not much to learn by testing status bits after doing a CLR, regardless of whether it is possible or not. You already know that the result is pretty close to zero.

A method you haven't mentioned yet is taking advantage of the peculiar design of the TMS 9900. If you want to access R7 byte by byte, you can simply do

   MOVB R7,@somewhere
   MOVB @MYREGS+15,@somewhere+1

In case you don't know which registers you are running with, you can do this:

   MOVB R7,@somewhere
   STWP R4
   MOVB @15(R4),@somewhere+1

The result at somewhere is the same as after

   MOVB R7,@somewhere
   SWPB R7
   MOVB R7,@somewhere+1

The first two have the possible advantage that the content of R7 is not disturbed.