Jump to content
IGNORED

Code Optimization


Recommended Posts

I finally got around to implementing a table-driven CRC-16 routine. Table-driven is usually touted

as faster, so I wanted to test that claim. With it I have been able to drop overall CRC validation

of the Geneve operating system from 8.5s to just under 3.0s.

 

The following code takes advantage of the 9995 internal RAM. The 128K data file is loaded from

disk and stored in RAM (not internal 9995 ram!) for the CRC loop.

 

I'm looking for any ideas that might further optimize the loop at label NEXTBYTE. I suspect it's

about as tight as possible but would appreciate an outside opinion or two :)

 

********************************************************************************
* CRC16s 2013 March 31
*
*	 Approximate execution speed under emulation, 128K file:
*	 1. 8.5s CRC calculated 'old faithful' way one byte at a time
*	 2. 4.5s CRC calculated with table; primary loop in local SRAM page
*	 3. 3.0s CRC calculated with table; primary loop in 9995 scratchpad
*
WS	 EQU >F000	 9995 workspace
*
MAIN3 LWPI WS		 LET'S LOAD OUR WORKSPACE
 LIMI 0		 turn off those interrupts!
*
* Copy CRC calculation code to u9995 for fastest execution
* Called @>F020 after file is loaded into RAM
*
 LI R1,STARTCHIPCODE
 LI R2,>F020
 LI R7,SAVE9995
 CLR R8
CHIPLP MOV *R2,*R7+		 ultra-conservative - save scratchpad
 MOV *R1+,*R2+		 then move code
 INC R8			 # of words are we overwriting
 CI R1,ENDCHIPCODE end of code?
 JNE CHIPLP		 not yet
 MOV R8,@CT9995 yes. save the word count for exit[/font]
*----------------------------=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
* Create 16-bit CRC table in memory
*
GENERATECRC16
 LI R7,CRCTABLE512 point to start of table
 LI R8,0			 0 to 255
LOOPI
 MOV R8,R10		 get next CRC value into RESULT (R10)
 SLA R10,8			 RESult shifted LEFT 8 bits
 CLR R1			 for table, always start as if cleared
 XOR R1,R10		 R1^R10->r10 bug, earlier reversed operands ):-
*inner loop starts:
 LI R9,8			 now for our inner loop
LOOPJ MOV R10,R2		 R2=temp
 ANDI R2,>8000
 SLA R10,1
 CI R2,>8000		 temp have msbit set?
 JNE SKIP			 no
 LI R3,>1021
 XOR R3,R10		 yes, XOR it
SKIP DEC R9			 any left?
 JNE LOOPJ			 yes
*end inner loop
 MOV R10,*R7+	 move crc value into table
 INC R8		 +1
 CI R7,CRCTABLE512+512 last entry?
 JNE LOOPI

*
* Set up registers to loop through the 128K MDOS file
*
GOODFILE2
 LI R3,16	 number of 8k pages
 LI R4,PGETBL+8 Using memory starting at >10000 (first page above 64K)
 LI R5,CRCLIST location to store CRC bytes
 LI R9,MDOSCRC location to put the MDOS embedded CRC (from memory)
*
 BL @>F020	 ** call into on-board RAM - calculate CRC from 128K byte file
*
EXIT LI R1,SAVE9995 restore 9995 scratchpad saved earlier
 LI R2,>F020
REST1 MOV *R1+,*R2+
 DEC @CT9995
 JNE REST1
 BLWP @0			 Thank MDOS for working so well, then exit
*-------------------------------------------------------------------------------
* CRC Calculation - Table driven
* approx 66 bytes (0xF020-F062) - careful of internal WS's at 0xF080+
* Destroys: R0,R1,R3,R4,R7,R8
*
*
STARTCHIPCODE
 MOVB *R4+,@>F111 get a page of data
 LI R0,>2000
 CLR R8		 clear the CRC
*
* COMPUTE THE CRC using our CRC table (generated earlier)
*
NEXTBYTE
NOSKIP MOVB *R0+,R1	 GET character from buffer
 XOR R8,R1	 XOR the MSByte!
 SRL R1,7		 we want a word index. [Can we ignore the Lsbit, one less bit shift?]
*	 ANDI R1,>FFFE	 for now lets mask it [Yes! uP doesnt care]
 MOV @CRCTABLE512(R1),R1 get the table value (16-bit)
 SLA R8,8		 Shift the old result 8 bits
 XOR R1,R8	 and xor with table entry. R8 is the new value
SKIPIT CI R0,>4000		 end of this page?
 JL NEXTBYTE		 no, get another[/font]
 MOV R8,*R5+		 yes,add CRC to the list
 DEC R3			 any pages left?
 JNE STARTCHIPCODE
 RT *** remove if we don't move to on-chip!
ENDCHIPCODE

******************************************************
* end of code, data follows
*
H1021 DATA >1021	 crc polynomial (16-bit)
CRCTABLE512
 BSS 520 CRC table buffer. Pad a few bytes for stupid programmer errors
SAVE9995
 BSS 256 scratchpad storage for u9995
CT9995 DATA 0	 number of words used
*---------------------------------------------------------
 END

Edited by InsaneMultitasker
Link to comment
Share on other sites

I never looked into the details for CRC, but it looks like you are processing 1-byte at a time. Is is possible to read a 16-bit value from buffer and operate on that instead? Also, the CI R0,>4000 would be a candidate for replacement, if only to using a register counter that is setup outside of the loop, then replace the CI with DEC Rx and JNE.

 

Also, if you can't do a 16-bit value, then you could unroll the loop twice, i.e. do MOVB *R0+,R1 twice inside the loop and decrement the loop counter twice.

  • Like 1
Link to comment
Share on other sites

I never looked into the details for CRC, but it looks like you are processing 1-byte at a time. Is is possible to read a 16-bit value from buffer and operate on that instead? Also, the CI R0,>4000 would be a candidate for replacement, if only to using a register counter that is setup outside of the loop, then replace the CI with DEC Rx and JNE.

 

Also, if you can't do a 16-bit value, then you could unroll the loop twice, i.e. do MOVB *R0+,R1 twice inside the loop and decrement the loop counter twice.

Thank you. Good call on the DEC / JNE. That will be easy to implement. I can also easily unroll the loop once, maybe twice in the processor space, to reduce the loop count.

 

The CRC routine that seems "common" in TI terminal emulators operates on a byte of data, with the 8 bits calculated in real-time. The table pre-calculation at the start of my program loosely corresponds to the bit-wise code. The look-up table is also byte-oriented but it utilizes properties of mod 2 and polynomial division, thereby eliminating a lot of the bit-shifting in the common calculation. ( I don't have the common code handy to post for comparison.)

 

Alas, I don't see a way to operate on a 16-bit value given the CRC's mathematical properties. However, it occurs to me that the memory at 0x0000-0x1fff is RAM on this machine, so I might eek out some speed if I create the CRC table at offset 0x0000, and eliminate the base address.

 

[color="#000000"]MOV [/color][color="#006666"]@CRCTABLE512[/color][color="#666600"]([/color][color="#000000"]R1[/color][color="#666600"]),[/color][color="#000000"]R1 [/color][color="#000088"]get[/color][color="#000000"] the table value [/color][color="#666600"]([/color][color="#006666"]16[/color][color="#666600"]-[/color][color="#000000"]bit[/color][color="#666600"])[/color] 
would become
MOV *R1,R1

I don't remember if XOR allows indirect register access. If so, I should be able to change
[b]from 3 instructions:[/b]
[color="#000000"]		 MOV *R1[/color][color="#666600"],[/color][color="#000000"]R1		   [/color][color="#000088"]get[/color][color="#000000"] the table value [/color][color="#666600"]([/color][color="#006666"]16[/color][color="#666600"]-[/color][color="#000000"]bit[/color][color="#666600"])[/color][color="#000000"] 
        SLA R8[/color][color="#666600"],[/color][color="#006666"]8[/color] [color="#660066"]Shift[/color][color="#000000"] the old result [/color][color="#006666"]8[/color][color="#000000"] bits 
        XOR R1[/color][color="#666600"],[/color][color="#000000"]R8     	    [/color][color="#000088"]and[/color][color="#000000"] xor [/color][color="#000088"]with[/color][color="#000000"] table entry[/color][color="#666600"].[/color][color="#000000"] R8 [/color][color="#000088"]is[/color][color="#000000"] the [/color][color="#000088"]new[/color][color="#000000"] value [/color]
[b]to 2 instructions:[/b]
[color="#000000"]		 SLA R8[/color][color="#666600"],[/color][color="#006666"]8[/color] [color="#660066"]Shift[/color][color="#000000"] the old result [/color][color="#006666"]8[/color][color="#000000"] bits 
        XOR *R1[/color][color="#666600"],[/color][color="#000000"]R8          [/color][color="#000088"]and[/color][color="#000000"] xor [/color][color="#000088"]with[/color][color="#000000"] table entry[/color][color="#666600"].[/color][color="#000000"] R8 [/color][color="#000088"]is[/color][color="#000000"] the [/color][color="#000088"]new[/color][color="#000000"] value [/color]

 

Guess I have a few things to try this weekend :)

Link to comment
Share on other sites

I don't know why the AA editor keeps translating color and fonts in the 'code' section. Here is the snippet.

MOV @CRCTABLE512(R1),R1 get the table value (16-bit)
would become
MOV *R1,R1

I don't remember if XOR allows indirect register access. If so, I should be able to change
from 3 instructions:
	 MOV *R1,R1		 get the table value (16-bit)
	 SLA R8,8			 Shift the old result 8 bits
	 XOR R1,R8			 and xor with table entry. R8 is the new value
to 2 instructions:
	 SLA R8,8			 Shift the old result 8 bits
	 XOR *R1,R8		 and xor with table entry. R8 is the new value

Link to comment
Share on other sites

Unrolling the computation helped as did combining the table lookup with the XOR. When I originally wrote the code, I had reversed the source and destination. XOR did not like a symbolic address in the destination; however, using one in the source was acceptable. I had replaced one of the shift instructions with SWPB and ANDI until I read the 9995 manual more closely ;) I'd say this one is conquered for now.

 

thank you

  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...