Jump to content
IGNORED

Differential calculations


Asmusr

Recommended Posts

Differential calculations are an efficient way to reduce the time it takes to do complex calculations on old computers. On modern computers this doesn't matter much, and perhaps not in BASIC either where all the calculations are floating point, but in an assembly programs this can make a lot of difference.


For the first example, let's say we need to calculate x squared from 1 to 10. An obvious way to do this is:




10 for x=1 to 10
20 print x*x
30 next x



But we don't like the multiplication since this takes much more time than addition or subtraction. Let's investigate how the value of x*x changes when x changes from x to x+1:




dx = (x+1)*(x+1) - x*x = x*x + x + x + 1 - x*x = 2*x + 1



That's not bad since 2*x can be done as a binary shift in assembly, and +1 is an INC instruction, which together are faster than a multiplication.


In BASIC it looks like this:




10 x=1
20 for n=1 to 10
30 print x
40 x=x+2*n+1
50 next n



But we can go further to reduce the complexity of the calculations. Let's investigate what happens to dx=2*x+1 as x changes from x to x+1:



ddx = 2*(x+1)+1 - (2*x+1) = 2*x + 2 + 1 - 2*x - 1 = 2


We like that :) : no more multiplications. For each loop we just need to add 2 to dx, and dx to x:




10 x=1
20 dx=1
30 for n=1 to 10
40 print x
50 dx=dx+2
60 x=x+dx
70 next n



So we have ended up calculating a series of squares using additions only. In assembly dx=dx+2 can simply be written as INCT @DX which is even faster than a general addition.


This was a very simple example. Let's take another where differential calculations really matter. We want to rotate all the pixels on the screen by a certain angle (a). The formulas for rotating a single point (x, y) are:



x1 = x * cos(a) - y * sin(a)
y1 = y * cos(a) + x * sin(a)


That's pretty complex, but as we move through a scanline x always changes from x to x+1. Let's investigate what happens to x1 and y1 in this case:


dx1 = (x+1) * cos(a) - y * sin(a) - (x * cos(a) - y * sin(a)) = (x+1) * cos(a) - y * sin(a) - x * cos(a) + y * sin(a) = (x+1) * cos(a) - x * cos(a) = cos(a)

dy1 = y * cos(a) + (x+1) * sin(a) - (y * cos(a) + x * sin(a)) = y * cos(a) + (x+1) * sin(a) - y * cos(a) - x * sin(a) = (x+1) * sin(a) - x * sin(a) = sin(a)


So as we move a pixel to the right, dx1=dx1+cos(a) and dy1=dy1+sin(a). Since we want to rotate the entire image by the same angle a, cos(a) and sin(a) are constants, so again we have reduced the complex calculations to simple additions.


If we do the same math when we move one scanline down (y -> y + 1) we get dx1=dx1-sin(a) and dy1=dy1+cos(a), so we can rotate an entire image using just additions and subtractions.


I hope this shows how differential calculations can be useful. If you have other examples, please post them here.

  • Like 13
Link to comment
Share on other sites

The calculation of a jump that was discussed the other day is also a differential example. In general, for moving something around in 2D with acceleration ddx, ddy and velocity dx, dy we have:

 

dx = dx + ddx

x = x + dx

 

dy = dy + ddy

y = y + dy

 

Even with constant ddx and ddy these simple formulas can produce many interesting trajectories, but if we want to simulate something like space gravity it gets a bit more complex.

  • Like 1
Link to comment
Share on other sites

I always imagined the INCT instruction would show itself useful when playing with addressing.

 

To iterate across a chunk of ram performing 16 bit data operations, INCT makes sense.

 

We just don't get to do a lot of 16 bit work given that the peripherals vdp/sound all want 8 bit data.

 

---

 

For the rotation example, I am assuming the reference point is (0,0) ?

 

-M@

Link to comment
Share on other sites

I always imagined the INCT instruction would show itself useful when playing with addressing.

 

To iterate across a chunk of ram performing 16 bit data operations, INCT makes sense.

 

We just don't get to do a lot of 16 bit work given that the peripherals vdp/sound all want 8 bit data.

 

---

 

For the rotation example, I am assuming the reference point is (0,0) ?

 

-M@

 

I use INCT to iterate through lists of pointers when it's not possible to use post-increment. I use DECT to decrement the counter in CPU RAM to CPU RAM copy loops, although I could often use DEC with half the initial value.

 

Yes the reference point is (0,0) for the rotation. So if you rotate around another point (x0,y0) you need to subtract that from (x,y) before calculating the initial values of (x1,y1) for the top left corner of the image to rotate. However, once that's done the incremental calculations are the same.

  • Like 2
Link to comment
Share on other sites

In the fbForth 2.0 source code, I have used INCT 36 times and DECT 525 times. Most of the DECT uses are for reserving 16-bit stack space (parameter and return stacks grow downward). There are probably an equal number of indirect 16-bit increments via *SP+ or *R+ for popping the stacks.

 

...lee

Link to comment
Share on other sites

In the fbForth 2.0 source code, I have used INCT 36 times and DECT 525 times. Most of the DECT uses are for reserving 16-bit stack space (parameter and return stacks grow downward). There are probably an equal number of indirect 16-bit increments via *SP+ or *R+ for popping the stacks.

 

Interesting. I just checked my QR Code generator source, and it looks like I used INCT just 2 times (and 1 of those wasn't code I wrote, but a copy of GPLLNK routine), and used DECT 9 times.

 

That reminds me that I probably should put up the assembly language code for my QR Code generator up on GitHub soon before I have a disk crash or something. I've been meaning to do it, but was a bit embarrassed by how spaghetti-like it is. Though I just just go ahead and put it up as-is this weekend.

Link to comment
Share on other sites

I have deducted by observing and experimenting (using high-level programming languages on modern PCs), rather than going through the math. Which I apparently should. ;)

I think one can quite easily calculate any precision sine wave table (in Assembly), but then the output rather asks for ROM than RAM. If your time progression is linear then one can avoid sine, but for jumping in time, a table is more or less essential.

Interesting. I just checked my QR Code generator source, and it looks like I used INCT just 2 times (and 1 of those wasn't code I wrote, but a copy of GPLLNK routine), and used DECT 9 times.

 

Yes, with auto-increment, one would perhaps expect to see more DECT than INCT.

I ran through my 100+ 8K Assembly ditties.

I use one DECT in the general "return using stack" segment. Otherwise sometimes one more DECT.

The use of INCT ranges from 0 to 8. Operating on stacks, lists and structures, and probably more or less coincidentally when on word boundaries.

That reminds me that I probably should put up the assembly language code for my QR Code generator up on GitHub soon before I have a disk crash or something. I've been meaning to do it, but was a bit embarrassed by how spaghetti-like it is. Though I just just go ahead and put it up as-is this weekend.

 

Since accidentally hitting FCTN QUIT in 1982, I think I've done a good deal of backups. ;)

Edited by sometimes99er
Link to comment
Share on other sites

XB ROMs use:

INCT 29 times

DECT19 times

 

Example of ROM Page 2:

* Move 26 Characters from TOPSTK to VDP ROLLOUT                
LN7A90 LI   R1,TOPSTK      *#            
LN7A94 LI   R3,VDPROA      * VDP ROLL OUT AREA >O3CO            
LN7A98 MOVB @LR3,*R15      * VDP Address             
LN7A9C ORI  R3,>4000                   
LN7AA0 MOVB R3,*R15        * VDP Address             
LN7AA2 LI   R0,>001A       * Put 26 in BYTE COUNTER              
LN7AA6 MOVB *R1+,@VDPWD    *# VDP WRITE DATA             
LN7AAA DEC  R0             * BYTE COUNTER-1             
LN7AAC JNE  LN7AA6         * 0? NO LOOP              
LN7AAE CLR  @FAC8                     
LN7AB2 INCT @SUBSTK        *# SUBSTACK+2              
LN7AB6 MOVB @SUBSTK,R9                  
LN7ABA SRL  R9,8           * Strip High byte from word            
LN7ABC AI   R9,VAR0                    
LN7AC0 MOV  R10,*R9                    
LN7AC2 B    *R11           * RETURN
*
* Move 26 Characters from VDP ROLLOUT to TOPSTK               
LN7AC4 LI   R1,TOPSTK      *#            
LN7AC8 MOVB @>7A97,*R15    * >C0  to VDP Address            
LN7ACC MOVB @>7A96,*R15    * >03  to VDP Address            
LN7AD0 LI   R0,>001A       * BYTE COUNTER (26=>001A)            
LN7AD4 MOVB @VDPRD,*R1+    *# VDP READ DATA             
LN7AD8 DEC  R0             * BYTE COUNTER-1             
LN7ADA JNE  LN7AD4         * 0? NO LOOP               
LN7ADC CLR  @FAC8                     
LN7AE0 MOVB @SUBSTK,R9     *#             
LN7AE4 SRL  R9,8           * Strip High byte from word            
LN7AE6 AI   R9,VAR0                    
LN7AEA MOV  *R9,R11                    
LN7AEC DECT @SUBSTK        * SUBSTACK-2              
LN7AF0 B    *R11           * RETURN
Link to comment
Share on other sites

Using this little script here is what I count in console ROMs

( find INCT & DECT )
VARIABLE ICNT        ( counter variables)
VARIABLE DCNT
: CELLS  2* ;
: INCT?   ( n -- )   0FF0 AND   05C0 =  IF   1 ICNT +! THEN ;
: DECT?   ( n -- )   0FF0 AND   0640 =  IF   1 DCNT +! THEN ;
: COUNTEM ( addr cnt -- )
          OVER + SWAP        ( convert to address range)
          DO
              I @  INCT?     ( read each address)
              I @  DECT?
          2 +LOOP            ( advance by 2 bytes)
          DECIMAL
          CR ." INCT : "  ICNT ?
          CR ." DECT : "  DCNT ?
          CR ." done"
;
( test data )
CREATE T  05C0 , 05C0 , 05C0 , 05C0 , 05C0 , 05C0 ,
          0640 , 0640 , 0640 , 0640 , 0640 , 0640 ,
DECIMAL
T  12 CELLS COUNTEM  ( returns 6 & 6 )


HEX 0 2000 CELLS COUNTEM

INCT : 34

DECT : 27

done ok

 

 

Edited by TheBF
  • Like 2
Link to comment
Share on other sites

Using this little script here is what I count in console ROMs

( find INCT & DECT )
VARIABLE ICNT        ( counter variables)
VARIABLE DCNT
: CELLS  2* ;
: INCT?   ( n -- )   0FF0 AND   05C0 =  IF   1 ICNT +! THEN ;
: DECT?   ( n -- )   0FF0 AND   0640 =  IF   1 DCNT +! THEN ;
: COUNTEM ( addr cnt -- )
          OVER + SWAP        ( convert to address range)
          DO
              I @  INCT?     ( read each address)
              I @  DECT?
          2 +LOOP            ( advance by 2 bytes)
          DECIMAL
          CR ." INCT : "  ICNT ?
          CR ." DECT : "  DCNT ?
          CR ." done"
;
( test data )
CREATE T  05C0 , 05C0 , 05C0 , 05C0 , 05C0 , 05C0 ,
          0640 , 0640 , 0640 , 0640 , 0640 , 0640 ,
DECIMAL
T  12 CELLS COUNTEM  ( returns 6 & 6 )


HEX 0 2000 CELLS COUNTEM

INCT : 34

DECT : 27

done ok

 

 

This has to be for GROM and VDP access as that is exactly why you see it in the XB ROMs.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...