+TheBF Posted September 22, 2020 Share Posted September 22, 2020 6 hours ago, InsaneMultitasker said: I believe the DSRLNK in question starts the scan at 0x1200 [R12 is first set to 0x1100 and 0x0100 is added to R12 before the test starts] then loops back around to 0x1000 [R12 set to 0x0f00 then 0x0100 is added to R12] , thus skipping the floppy controller in favor of higher CRU bases. This was also done to hit the HFDC and other devices at 0x1000 before the floppy controller. There are programs where changing the first scanned address was beneficial long ago. If this is indeed the DSRLNK version I shared long ago, I probably had forgotten about this modification. Good eagle eye. Thanks. Nice to know some history. I have another question. I see things being saved but then not restored or used from the saved locations. Any ideas on the thinking behind that? Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted September 23, 2020 Author Share Posted September 23, 2020 10 hours ago, Lee Stewart said: Spoiler HEX \ 0 <= ud <= 0x1FFFF (13171) \ d = sqrt(d^2) = sqrt(4*d'^2) = 2*d' : SQRTUD ( ud -- n ) DUP >R \ MSW to return stack IF 2 SRL \ /4 if MSW > 0 4000 + \ correct for missing MSW THEN >R \ loop limit to return stack -1 -1 \ index and root starts BEGIN 2+ DUP \ add 2 to root and dup ROT + DUP \ root+index and dup R U< \ index < limit? WHILE SWAP \ yes..reorder index and root REPEAT \ next round R> DROP DROP \ drop limit and index 1 SRL \ correct root (small now..no need for 32-bit arithmetic) R> IF 1 SLA \ *2 to correct root for initial /4 THEN ; DECIMAL OK—here is the Forth Assembler version of the above unsigned integer square root word, SQRTUD : HEX \ 0 <= ud <= 0x1FFFF (13171) \ d = sqrt(d^2) = sqrt(4*d'^2) = 2*d' ASM: SQRTUD ( ud -- n ) 2 @(SP) R1 MOV, \ LSW to R1 as loop limit *SP+ R0 MOV, \ pop MSW to R0 NE IF, \ MSW not 0? R1 2 SRL, \ loop limit /4 R1 4000 AI, \ add shifted bit from MSW THEN, R2 SETO, \ set running root to -1 R3 SETO, \ set loop index to -1 BEGIN, \ root-building loop R2 INCT, \ running root+2 R2 R3 A, \ add running root to index R3 R1 C, \ compare index to limit HE UNTIL, \ leave loop when logical higher or equal R2 1 SRL, \ /2 to correct root R0 0 CI, \ MSW 0? NE IF, \ no R2 1 SLA, \ *2 to correct root for /4 THEN, R2 *SP MOV, \ root to stack ;ASM DECIMAL I will post the machine code version later. ...lee Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted September 23, 2020 Author Share Posted September 23, 2020 And—the machine code version of SQRTUD : HEX \ 0 <= ud <= 0x1FFFF (13171) \ d = sqrt(d^2) = sqrt(4*d'^2) = 2*d' CODE: SQRTUD ( ud -- n ) C069 0002 C039 1303 0921 0221 4000 0702 0703 05C2 A0C2 8043 1AFC 0912 0280 0000 1301 0A12 C642 ;CODE DECIMAL I probably should call this word by a different name because of its specialized nature. It will only work with 1 in the MSW of the double number input. ...lee Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted September 23, 2020 Author Share Posted September 23, 2020 For those wanting to test the beta version of build 13, here is fbForth 2.0:A13— fbForth200_A13_20200922.zip Included are inverted (ends in “9.bin”), non-inverted (ends in “8.bin”) and individual bank (end in “b0.bin” – “b3.bin”) binaries, as well as the current FBLOCKS file. The individual binaries are bank-switched in inverted order, so, if you need to assemble your own composite binary, the programmed (inverted) order is b0, b1, b2, b3. Reversing this order to b3, b2, b1, b0, makes the composite binary, effectively, non-inverted—clear as mud, right? ...lee Quote Link to comment Share on other sites More sharing options...
+TheBF Posted September 23, 2020 Share Posted September 23, 2020 7 hours ago, Lee Stewart said: For those wanting to test the beta version of build 13, here is fbForth 2.0:A13— fbForth200_A13_20200922.zip 85.74 kB · 0 downloads Included are inverted (ends in “9.bin”), non-inverted (ends in “8.bin”) and individual (end in “b0.bin” – “b3.bin”) binaries, as well as the current FBLOCKS file. The individual binaries are bank-switched in inverted order, so, if you need to assemble your own composite binary, the programmed (inverted) order is b0, b1, b2, b3. Reversing this order to b3, b2, b1, b0, makes the composite binary, effectively, non-inverted—clear as mud, right? ...lee Not a clue. Quote Link to comment Share on other sites More sharing options...
GDMike Posted September 23, 2020 Share Posted September 23, 2020 Oooo. K. Haha. Great work Quote Link to comment Share on other sites More sharing options...
atrax27407 Posted September 23, 2020 Share Posted September 23, 2020 I've been waiting for "that shoe to drop". Great stuff! 1 1 Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted September 24, 2020 Author Share Posted September 24, 2020 On 9/22/2020 at 10:01 PM, Lee Stewart said: OK—here is the Forth Assembler version of the above unsigned integer square root word, SQRTUD : Spoiler HEX \ 0 <= ud <= 0x1FFFF (13171) \ d = sqrt(d^2) = sqrt(4*d'^2) = 2*d' ASM: SQRTUD ( ud -- n ) 2 @(SP) R1 MOV, \ LSW to R1 as loop limit *SP+ R0 MOV, \ pop MSW to R0 NE IF, \ MSW not 0? R1 2 SRL, \ loop limit /4 R1 4000 AI, \ add shifted bit from MSW THEN, R2 SETO, \ set running root to -1 R3 SETO, \ set loop index to -1 BEGIN, \ root-building loop R2 INCT, \ running root+2 R2 R3 A, \ add running root to index R3 R1 C, \ compare index to limit HE UNTIL, \ leave loop when logical higher or equal R2 1 SRL, \ /2 to correct root R0 0 CI, \ MSW 0? NE IF, \ no R2 1 SLA, \ *2 to correct root for /4 THEN, R2 *SP MOV, \ root to stack ;ASM DECIMAL I just could not let it go, so here is a full (well, almost full) 32-bit version of SQRTUD in Forth Assembler: Spoiler HEX \ 0 <= ud <= 0xFFFE0001 (4294836225 = 65535^2) \ Registers: R0,R1 = udh,udl (loop limit) \ R2,R3 = idxh,idxl \ R4,R5 = rooth,rootl (& opening 0 test) \ R6 = loop flag ASM: SQRTUD ( ud -- n ) *SP+ R0 MOV, \ pop udh to R0 *SP R1 MOV, \ udl to R1 EQ IF, \ udl = 0? R0 R5 MOV, \ udh = 0, also? THEN, \ this will work if at least one of R0, R1 <> 0 NE IF, R0 FFFE CI, \ check for max HE IF, R0 FFFE LI, \ too high or at max.. R1 0001 LI, \ ..force max THEN, R2 CLR, \ set loop.. R3 CLR, \ ..index idx to 0 R4 CLR, \ set running.. R5 0001 LI, \ ..root to 1 BEGIN, \ root-building loop R5 INCT, \ root+2 OC IF, R4 INC, \ correct rooth THEN, R5 R3 A, \ add rootl to idxl OC IF, R2 INC, \ correct idxh THEN, R4 R2 A, \ add rooth to idxh \ compare index to limit R6 CLR, \ assume false R2 R0 C, \ compare idxh & udh H IF, R6 SETO, \ idxh > udh. set true ELSE, EQ IF, \ equal - check low words R3 R1 C, \ compare idxl & udl HE IF, R6 SETO, \ idxl >= udl. set true THEN, THEN, THEN, R6 INC, \ if 0, was true, else, false EQ UNTIL, \ leave loop when logical higher or equal R5 1 SRL, \ /2 to correct rootl R4 1 SRL, \ /2 to correct rooth OC IF, R5 8000 AI, \ add LSb shifted out of rooth to MSb of rootl THEN, THEN, R5 *SP MOV, \ root to stack ;ASM DECIMAL I said, “almost full 32-bit,” because to get back a 16-bit answer, the maximum square is limited to 4,294,836,225 (0xFFFE0001). It is very fast except when the MSW is very large. The maximum square takes 4 seconds! If anyone cares, I will post a machine code version tomorrow—might, anyway. ...lee 2 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted September 24, 2020 Share Posted September 24, 2020 I was going to modify the one I had to use 32 bit operators but knew it was a rabbit hole. Nice work by you. 1 1 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted September 24, 2020 Share Posted September 24, 2020 Lee, Here is something you could try to compare your new and old versions. Spoiler \ Sieve of Erathosenes for FbForth 1 CONSTANT 1 2 CONSTANT 2 HEX : FREEMEM ( -- n) FF00 HERE - ; : ?MEM ( n -- ) FREEMEM OVER < IF ." Out of memory" ABORT THEN ; : SEEPRIMES ( -- ) CR ." Primes: " 2 DO I HERE + C@ 0= IF I . THEN ?TERMINAL IF ." Primes halted" ABORT THEN LOOP ; \ byte array uses unallocated memory at HERE DECIMAL : PRIMES ( n -- ) ?MEM CR ." Running..." HERE OVER ERASE 1 0 HERE + C! \ mark as prime like 'C' version 1 1 HERE + C! 2 \ start at 2 BEGIN OVER OVER DUP * > WHILE DUP HERE + C@ 0= IF OVER OVER DUP * DO 1 I HERE + C! DUP +LOOP THEN 1+ REPEAT CR ." Complete." CR DROP CR ." Press ENTER to see primes:" KEY 13 = IF SEEPRIMES THEN ; Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted September 24, 2020 Author Share Posted September 24, 2020 2 hours ago, TheBF said: I was going to modify the one I had to use 32 bit operators but knew it was a rabbit hole. Nice work by you. ’Twas a bit of a rabbit hole, indeed. The final penny to drop was needing to accommodate the “on carry” test for the first pass through the loop. I could not see anything wrong with my code, but it was going off into the weeds. It finally dawned on me that, in the very first pass for both the running root and the loop index, the carry bit was getting set (-1 to +1 and -1 to 0, respectively). ...lee Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted September 24, 2020 Author Share Posted September 24, 2020 1 hour ago, TheBF said: Lee, Here is something you could try to compare your new and old versions. Reveal hidden contents \ Sieve of Erathosenes for FbForth 1 CONSTANT 1 2 CONSTANT 2 HEX : FREEMEM ( -- n) FF00 HERE - ; : ?MEM ( n -- ) FREEMEM OVER < IF ." Out of memory" ABORT THEN ; : SEEPRIMES ( -- ) CR ." Primes: " 2 DO I HERE + C@ 0= IF I . THEN ?TERMINAL IF ." Primes halted" ABORT THEN LOOP ; \ byte array uses unallocated memory at HERE DECIMAL : PRIMES ( n -- ) ?MEM CR ." Running..." HERE OVER ERASE 1 0 HERE + C! \ mark as prime like 'C' version 1 1 HERE + C! 2 \ start at 2 BEGIN OVER OVER DUP * > WHILE DUP HERE + C@ 0= IF OVER OVER DUP * DO 1 I HERE + C! DUP +LOOP THEN 1+ REPEAT CR ." Complete." CR DROP CR ." Press ENTER to see primes:" KEY 13 = IF SEEPRIMES THEN ; Not sure what you are suggesting I test here. A couple of notes, though: (1) fbForth and TI Forth both have 0, 1, 2, 3 defined as constants in the kernel. (2) Minor nit— SEEPRIMES is missing an input ‘n’ in the stack effects. ...lee 1 Quote Link to comment Share on other sites More sharing options...
Tursi Posted September 24, 2020 Share Posted September 24, 2020 13 hours ago, Lee Stewart said: The maximum square takes 4 seconds! This kind of got me wondering... if doing it mathematically takes 4 seconds, would a dumb binary search be faster? In theory it should be able to get down to the answer within 33 iterations. It'd be straight-forward... square the current number, and depending on whether the result is bigger or smaller than the target, step the appropriate direction. I might be missing something - code I get, math theory not so much. 1 Quote Link to comment Share on other sites More sharing options...
Asmusr Posted September 24, 2020 Share Posted September 24, 2020 21 minutes ago, Tursi said: This kind of got me wondering... if doing it mathematically takes 4 seconds, would a dumb binary search be faster? In theory it should be able to get down to the answer within 33 iterations. It'd be straight-forward... square the current number, and depending on whether the result is bigger or smaller than the target, step the appropriate direction. I might be missing something - code I get, math theory not so much. Like this: https://codebase64.org/doku.php?id=base:fast_sqrt 2 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted September 24, 2020 Share Posted September 24, 2020 1 hour ago, Lee Stewart said: Not sure what you are suggesting I test here. A couple of notes, though: (1) fbForth and TI Forth both have 0, 1, 2, 3 defined as constants in the kernel. (2) Minor nit— SEEPRIMES is missing an input ‘n’ in the stack effects. ...lee I wondered if there is any performance difference between the old version and the new version. Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted September 24, 2020 Author Share Posted September 24, 2020 1 minute ago, TheBF said: I wondered if there is any performance difference between the old version and the new version. There surely is. The two-word manipulations alone would insure that. But, you are right, I should test it. ...lee Quote Link to comment Share on other sites More sharing options...
+TheBF Posted September 24, 2020 Share Posted September 24, 2020 1 minute ago, Lee Stewart said: There surely is. The two-word manipulations alone would insure that. But, you are right, I should test it. ...lee If you mean OVER OVER, maybe you have 2DUP. ? Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted September 24, 2020 Author Share Posted September 24, 2020 1 minute ago, TheBF said: If you mean OVER OVER, maybe you have 2DUP. ? No, I mean double-word math, testing for carry, etc. ...lee Quote Link to comment Share on other sites More sharing options...
+TheBF Posted September 24, 2020 Share Posted September 24, 2020 Just now, Lee Stewart said: No, I mean double-word math, testing for carry, etc. ...lee Sorry we are cross paths. I was referring to the sieve benchmark. Didn't make it clear. Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted September 24, 2020 Author Share Posted September 24, 2020 4 minutes ago, TheBF said: Sorry we are cross paths. I was referring to the sieve benchmark. Didn't make it clear. Are you referring to testing fbForth 2.0:A13 against 2.0:12? Otherwise, I really am lost. ...lee Quote Link to comment Share on other sites More sharing options...
+TheBF Posted September 24, 2020 Share Posted September 24, 2020 1 minute ago, Lee Stewart said: Are you referring to testing fbForth 2.0:A13 against 2.0:12? Otherwise, I really am lost. ...lee Yes. Running the sieve benchmark on both versions of FbForth. Perhaps nothing in that area has changed, but I recall 2.0:12 being significantly slower than my system and Turbo Forth. Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted September 24, 2020 Author Share Posted September 24, 2020 1 minute ago, TheBF said: Yes. Running the sieve benchmark on both versions of FbForth. Perhaps nothing in that area has changed, but I recall 2.0:12 being significantly slower than my system and Turbo Forth. Yeah, I will try it with each version, with and without operation of the Forth ISR. ...lee 1 Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted September 26, 2020 Author Share Posted September 26, 2020 Per @atrax27407’s request, here is the RPK file for the beta A13 build: fbForth200.rpk ...lee 1 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted September 30, 2020 Share Posted September 30, 2020 I have been AWOL for a while working on my version of DSR Link code. In the process I found two small things that can be "improved" in insanemultitasker version that was my starting place. 1. Removing a MOV and a SRL by using the empty R6 register to read and copy into the NAMBUF Oops. This was an mistake. 2. Moving the init R2 to >4000 outside of the scanning loop. It is invariant in the loop so no need to set it every time. This is still valid I put the changes into the TI ASM code along with comments of how I understood this thing. This code is not tested and as noted I don't use the conventional Assembler much anymore so treat this as faulty syntax but the changes have been shown to work in my Forth Assembler version. This stuff is probably common knowledge for many but perhaps the comments will be helpful for another newbie somewhere down the road. ? 1 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted October 6, 2020 Share Posted October 6, 2020 Lee we had discussed creating faster dictionary searches using a hashing method like PolyForth. I found this book by Dr. Ting called inside Forth83, which is a fantastic little Forth for DOS by Laxen & Perry. It used a 4 way hash on the dictionary which is described herein. Here is the link to the book: http://www.forth.org/OffeteStore/1003_InsideF83.pdf Page 73 shows real code on how they Laxen and Perry did it. It is quite amazing what happens when this is applied. HsForth for DOS which I use to for my cross-compiler had an add-on that hashed the dictionary and compile times were almost 10X faster. I don't use it these days because it takes extra dictionary memory but it was great for 4.7MHz PCs. In this case we could expect something on the order or 3..4 times I suspect. I am working on how I could make this work with Camel Forth. Not sure I can fit it all in an 8K kernel, but I am playing with concepts in situ. 1 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.