IGNORED

# Computing Square Roots

## Recommended Posts

I thought I'd share an article describing the square root algorithm that comes in SDK-1600. It's not new code -- I wrote this quite some time back -- but I'm fairly proud of it. It's also incredibly non-obvious, so I thought I'd share some of the tricks and techniques that went into making it.

It runs very quickly compared to the EXEC square root routine (about .8ms worst case for an integer sqrt, and 1ms for sqrt(8Q8) => 8Q8), correctly supports the full range of unsigned integers, and can compute fractional bits of the square root. (eg. fixed-point arithmetic) The article below is adapted and extended from the documentation I included in SDK-1600.

My code is based loosely on the following C code algorithm below by Paul Hseih and Mark Borgerding. It was originally published at http://www.azillionmonkeys.com/qed/sqroot.html , however it no longer appears to be there. Here is the original code I worked from:

```   unsigned int mborg_isqrt2(unsigned int val)
{
unsigned int g, g2, b, b2, gxb;

g   = 0;                // guess
g2  = 0;                // guess^2
b   = 1 << 15;          // bit
b2  = 1 << 31;          // 2*bit^2
gxb = 1 << 30;          // bit*(2*guess+bit)

do
{
if (g2 + gxb <= val)    // (guess+bit)^2 <= val?
{
g   ^= b;          // guess += bit
g2  += gxb;        // (g + b)^2 = g^2+gxb
gxb += b2;         // b(2(g+b)+b) = b(2g+b)+b^2
}

b   >>= 1;              // bit >>= 1
b2  >>= 2;              // 2(b/2)^2 = 2b^2/4
gxb = (gxb - b2) >> 1;  // b(2g+b/2)/2 = (b(2g+b)-2b^2/4)/2
} while (b != 0);

return g;
}
```

The comments are very terse, so you may wonder: How does it work?

The algorithm successively approximates the square root by attempting to set bits in the "guess", starting at the top and working its way down. It will set a bit in the "guess" if doing so will keep "guess*guess" smaller than the target value.

Before I get into the formality, let's start with a simple example to set the stage. Suppose I wanted to take the square root of 26. I start with my "guess" set to 0, and start trying to set bits at the top, working down. Let's say I start with the '8' bit. 8 squared is 64 -- too large. So I don't set that bit, and my guess stays at 0. Now I can try setting the 4 bit. 4 squared is 16, so I can set the 4 bit in the guess. Next, I try the 2 bit. (4+2) squared is 36 -- too large. So, I don't set the 2 bit. Finally, I try the 1 bit. (4+1) squared is 25, so I can set that bit in the guess, giving '5' as the result. 5 is the largest integer whose square is equal to or below 26. Tada!

The quick example above was hopefully easy to follow, but if you implement it naïvely, it will have a ton of multiplies -- one for each guess. Multiplies are not exactly cheap on the Intellivision. How does the code above actually do this, then, without doing a ton of multiplies?

The trick is to rely on the following relationship you may remember from algebra class:

(x + y)
2
= x
2
+ 2xy + y
2

In the context of this algorithm, 'x' is the guess, and 'y' is the bit it is considering to add to the guess. Let's rename those variables to make this easier to follow, using the names the C code above used. I'll use 'g' for the guess and 'b' for the bit being considered. That gives us:

(g + b)
2
= g
2
+ 2gb + b
2

At a high level, here's what the code does:

• Is (g + b)2 bigger than val?
• Yes: Update 'g' by adding 'b'
• No: Do not update 'g'

[*]Right shift 'b' by one to test the next bit position.

Rather than find (g + b)2 directly, it instead keeps a variable, g2, that holds the square of the current guess. The variable 'gxb' represents "2gb + b2". Thus, it's simple to find (g + b)2 by simply adding g2 and gxb.

The code that computes gxb is perhaps a little subtle. It has two pieces: The conditional piece that happens when you decide to set a bit in the guess, and the unconditional piece that happens each iteration when you shift your guess bit right by 1. Let's tackle these both separately, starting with the unconditional step.

The variable gxb contains (2gb + b2). The variable 'b' contains the bit we just guessed on. And, the variable b2 contains (2b2). (Its purpose will hopefully become clear in a moment.) When we move from one iteration to the next, the value of b changes, so the required value for gxb also changes. To make the update step clearer, let's call the values for the next iteration gxb' and b'. We then have the relations:

• b' = b / 2
• b2' = b2 / 4 -- If b drops by half, then its square drops by a factor of 4.
• gxb' = 2gb' + b'2

That last relation is kinda interesting. Let's do a little more algebra on it:

• gxb' = 2gb' + b'2
• gxb' = 2g(b/2) + (b/2)2 -- substitution
• gxb' = 2g(b/2) + b2/4 -- break the fraction out of the square
• gxb' = (2gb + b2/2) / 2 -- factor a division by 2 out of everything
• gxb' = (2gb + b2/2 + b2/2 - b2/2) / 2 -- Let's try to turn b2/2 into b2
• gxb' = (2gb + b2 - b2/2)/2 -- Tada... Part of the result should look familiar...
• gxb' = (gxb - b2/2) / 2 -- Aha! Expressed most of gxb' in terms of gxb. Now what about the last part?

The remaining step is to figure out b2/2. Well, this happens to be the value that's in b2'. Tada! That gives us the expression we see in the C code above, gxb = (gxb - b2) >> 1.

What about the conditional update, when we decide to add a bit to the guess? The value of g changes, which also changes the value of gxb for the next iteration. This part is perhaps easier to reason about, though. Let's walk through the math again to show how the update works. Here, g' and gxb' represent the new values of g and gxb after adding b to the guess:

• g' = g + b
• gxb' = 2g'b + b2

Again, applying a little algebra...

• gxb' = 2g'b + b2
• gxb' = 2(g + b)b + b2 -- substitution
• gxb' = 2gb + 2b2 + b2 -- distribute multiplication over addition
• gxb' = 2gb + b2 + 2b2 -- reorganize terms. underlined portion should look familiar
• gxb' = gxb + 2b2 -- Aha! Now what's that last part?

And again, we're able to write the update as a simple addition. The term we're adding is 2b2, which just happens to be the variable b2. Nice how that worked out, eh?

That pretty much captures how the mborg_isqrt2() function works. That's the routine by Mark Borgerding and Paul Hseih.

My code takes this a step further. The original code was content to take integer square roots with integer results. I wanted something that could work in a fixed-point environment. For example, if you take the square root of an 8Q8 number (that is, a 16-bit number with 8 integer bits and 8 fractional bits), you might want the result to also be 8Q8. If you used a plain integer square root function for this, though, you'd end up with a 4Q4 result.

Aside: It may not be immediately obvious why the integer square root of an 8Q8 number is 4Q4. If you instead think of an 8Q8 number as a quotient, though, it's a little easier to see. If your original number is "x / 256" (recall, 256 = 2
8
), the square root of that is sqrt(x / 256) == sqrt(x) / sqrt(256) == sqrt(x) / 16. This is equivalent to having 4 fractional bits, as 16 = 2
4
.

To accommodate this, I opt for a more "destructive" approach, and more creative shifting. What do I mean by destructive?

The original code does not modify the value whose square root we're taking. Rather, as it builds its guess up, it also builds up guess2, comparing it against the original value. This is fine for a pure integer square root, but it can never give you fractional bits without going to a more extended accumulator. That's not terribly desirable.

So, instead, as I commit bits to the guess, I also subtract their contribution to the square from the target value. Thus, the value we compare against each iteration gets smaller as we add 1 bits to the guess. This erodes away the original value as we establish what bits to set in the result. (If you've ever done long-division, this should feel a little familiar.) Every time we set a bit in the guess, we eliminate up to 2 bits of the original value. This suggests another tweak.

Instead of shifting b and gxb right by 1 every iteration (and b2 right by 2), let's instead fix their position and shift the (now eroding) target value left every iteration. The value b2 then shifts right by 1. b and gxb don't shift at all. What does this buy us?

Well, it eliminates at least one shift and reduces another from a 2-bit shift down to a 1-bit shift. Even more importantly, though, it allows us to keep iterating the algorithm beyond a simple integer square root. Once we start shifting the target value left, that begins introducing fractional bits to the right of the original input. Our "destructive" updates actually update these fractional bits. As a result, we can iterate this square root algorithm for another 7 or so iterations beyond what the original integer algorithm would have let us. This enables the algorithm to compute an 8Q8 square root from a 16 bit pure integer.

I've included below both my assembly code (also available in SDK-1600), as well as a C model to help you understand how it works. This routine is, I estimate, between 5x and 25x as fast as the EXEC's square root code, and can handle numbers all the way up to 65535 (unlike the EXEC, which apparently tops out at 32767, based on the documented bug in B17 Bomber).

An aside on performance: When searching around the net for other references on mborg_isqrt2(), I found a thesis paper from 2003 that referred to it. The author of the thesis implemented the mborg_isqrt2() routine on an 8-bit PIC16F76 MCU. Their implementation took around 5.5ms. The PIC16F76 has a 5MHz instruction rate. I'm rather proud that my 16-bit implementation is about 6x as fast on a machine whose instruction rate is around 50x slower (about 100k instructions/second vs. around 5M instructions/second.) Sure, theirs is probably a full 32-bit sqrt(), but still... I don't know why theirs should be so slow.

The EXEC square root routine is also a dog, with a highly variable execution time. For fun, I measured a couple square-root calls that occur during B-17 Bomber. The fastest calls were a couple thousand cycles, and the slowest were around 15,000 cycles.

------------------------------------------------------------------------------

```/* Note: The code below assumes 16-bit ints.
It's easily generalized to 32 bits. */

unsigned int joe_isqrt(unsigned int val, unsigned int qpt)
{
unsigned int rslt = 0;      /* result word                              */
unsigned int sqtb = 0x4000; /* 1/2 of square of test bit                */
unsigned int updt = sqtb;   /* bit * (2*guess + bit)                    */
unsigned int iters;
unsigned int remd = val;    /* What remains of the value we're rooting  */

/* If Q-point is odd, force it to even.  This loses 1 bit of precision. */
if ((qpt & 1) == 1)
val >>= 1;

qpt = (qpt + 1) >> 1;

/* Iterate 8 times for integer part, plus qpt/2 times for fraction bits */
iters = qpt + 8;

while (iters--)
{
/* ---------------------------------------------------------------- */
/*  The 'guess update' relies on the following relationship, where  */
/*  'g' is our current guess, and 'b' is the bit we wish to add.    */
/*                                                                  */
/*      (g + b)^2 = g^2 + 2*g*b + b^2                               */
/*                                                                  */
/*  The 'guess update' corresponds to "2*g*b + b^2" portion.  If    */
/*  (g + b)^2 remains smaller than the original value, we can add   */
/*  the guess bit to the root.                                      */
/*                                                                  */
/*  One transformation this algorithm makes is to subtract the      */
/*  value of the guessed bit from the original value, rather than   */
/*  add it to our successive guesses.  Thus, we compare the guess   */
/*  update to the "remaining value" as opposed to our cumulative    */
/*  squared guess.                                                  */
/*                                                                  */
/*  The "obvious" implementation shifts the guess bit right every   */
/*  iteration, with the squared guess bit shifting right by 2.      */
/*  To keep everything fitting in 16 bits, and to support fixed     */
/*  point square roots, we instead shift the remainder left 1 bit   */
/*  every iteration, and shift our squared guess bit right one bit. */
/*  This also means we never need to refer to the guess directly,   */
/*  only the squared guess bit.                                     */
/*                                                                  */
/*  The transformation is legal because we're guaranteed to         */
/*  eliminate at least one bit of the remainder every iteration.    */
/*                                                                  */
/*  The 'guess update' computation requires some explanation.  If   */
/*  we guess '1', then we only need to add "2*g*(b/2) + (b/2)^2"    */
/*  to the guess for the next guess.  If we guess '0', then we      */
/*  first need to subtract "2*g*b + b^2" from the guess, and        */
/*  then add "2*g*(b/2) + (b/2)^2" to the guess.                    */
/* ---------------------------------------------------------------- */

/* ---------------------------------------------------------------- */
/*  The main test:                                                  */
/*                                                                  */
/*  If the 'guess update' is not larger than the remainder, or if   */
/*  the remainder was bigger than 0x7FFF, then we guess '1' into    */
/*  the square root.  Otherwise, we guess '0'.                      */
/* ---------------------------------------------------------------- */
int big = remd >= 0x8000;
remd  <<= 1;

if (big || remd >= updt)
{
rslt = (rslt << 1) | 1;     /* Set bit in result                */
remd -= updt;               /* Subtract update from remainder   */

updt += sqtb + (sqtb >> 1);
sqtb >>= 1;                 /* Update squared guess bit         */
} else
{
rslt = (rslt << 1);         /* Clear bit in result              */
sqtb >>= 1;                 /* Update squared guess bit         */
updt -= sqtb;
}
}

return rslt;
}
```

------------------------------------------------------------------------------

```;* ======================================================================== *;
;*  These routines are placed into the public domain by their author.  All  *;
;*  copyright rights are hereby relinquished on the routines and data in    *;
;*  this file.  -- Joseph Zbiciak, 2008                                     *;
;* ======================================================================== *;

;; ======================================================================== ;;
;;  NAME                                                                    ;;
;;      SQRT        Calculate the square root of a fixed-point number       ;;
;;      SQRT.1      Calculate the square root of an integer                 ;;
;;      SQRT.2      Calculate the square root of a fixed-point number       ;;
;;                                                                          ;;
;;  AUTHOR                                                                  ;;
;;      Joseph Zbiciak <intvnut AT gmail.com>                               ;;
;;                                                                          ;;
;;  REVISION HISTORY                                                        ;;
;;      12-Sep-2001 Initial revision . . . . . . . . . . .  J. Zbiciak      ;;
;;      24-Nov-2003 Minor tweaks for speed . . . . . . . .  J. Zbiciak      ;;
;;                                                                          ;;
;;  INPUTS for SQRT                                                         ;;
;;      R1      Unsigned 16-bit argument to SQRT()                          ;;
;;      R5      Pointer to DECLE containing Qpt, followed by return addr.   ;;
;;                                                                          ;;
;;  INPUTS for SQRT.1                                                       ;;
;;      R1      Unsigned 16-bit argument to SQRT()                          ;;
;;                                                                          ;;
;;  INPUTS for SQRT.2                                                       ;;
;;      R0      Qpt for fixed-point value                                   ;;
;;      R1      Unsigned 16-bit argument to SQRT()                          ;;
;;                                                                          ;;
;;  OUTPUTS                                                                 ;;
;;      R0      Zeroed                                                      ;;
;;      R1      Unmodified                                                  ;;
;;      R2      SQRT(R1)                                                    ;;
;;      R3, R4  Unmodified                                                  ;;
;;      R5      Trashed                                                     ;;
;;                                                                          ;;
;;  NOTES                                                                   ;;
;;      The way this code handles odd Q-points on fixed-point numbers is    ;;
;;      by right-shifting the incoming value 1 bit, thus making the         ;;
;;      Q-point even.  This has the negative effect of losing precision     ;;
;;      on odd Q-point numbers.  Rectifying this without losing any         ;;
;;      performance would require significantly larger codesize.            ;;
;;                                                                          ;;
;;  CODESIZE                                                                ;;
;;      44 words                                                            ;;
;;                                                                          ;;
;;  CYCLES                                                                  ;;
;;      cycles = 139 + 71*(8 + Qpt/2) worst case for SQRT                   ;;
;;      cycles = 121 + 61*(8 + Qpt/2) best case for SQRT                    ;;
;;                                                                          ;;
;;      Subtract 4 cycles if Qpt is even.                                   ;;
;;      Subtract 8 cycles if calling SQRT.1.                                ;;
;;      Subtract 14 cycles if calling SQRT.2.                               ;;
;;                                                                          ;;
;;  SOURCE                                                                  ;;
;;      Loosely based on a C code example (mborg_isqrt2) by Paul Hseih and  ;;
;;      Mark Borgerding, found on the web here:                             ;;
;;                                                                          ;;
;;          http://www.azillionmonkeys.com/qed/sqroot.html                  ;;
;;                                                                          ;;
;;      Includes additional optimizations that eliminate some of the math.  ;;
;; ======================================================================== ;;

SQRT        PROC
MVI@    R5,     R0      ;   8 Get Qpt from after CALL
INCR    PC              ;   6 (skip CLRR R0)
@@1:        CLRR    R0              ;   6 Set Qpt == 0
@@2:        PSHR    R5              ;   9 Alt entry point w/ all args in regs.
PSHR    R1              ;   9 save R1
PSHR    R3              ;   9 save R3
;----
;  41 (worst case: SQRT)
;  33 (if SQRT.1)
;  27 (if SQRT.2)

CLRR    R2              ;   6 R2 == Result word
MVII    #\$4000, R3      ;   8 R3 == 1/2 of square of test bit
MOVR    R3,     R5      ;   6 R5 == bit * (2*guess + bit)

INCR    R0              ;   6
SARC    R0,     1       ;   6 Check to see if Qpt is odd
;           BC      @@even_q        ; 7/9
SLR     R1,     1       ;   6 Note: We lose LSB if odd Q
@@even_q:

B       @@first         ;   9
;----
;  60 (worst case, q is ODD)
;  54 (q is EVEN)
;====
;  83 (worst case: SQRT, q is ODD)

@@loop:     SLLC    R1,     1       ;   6 Shift the value left by 1
BC      @@b1            ; 7/9 MSB was 1, force guessed bit to 1.

@@first:    CMPR    R5,     R1      ;   6 Is (guess+bit)**2 <= val?
BNC     @@b0            ; 7/9 C==0 means the bit should be 0

@@b1:       RLC     R2,     1       ;   6 Yes:  Set bit in result and
SUBR    R5,     R1      ;   6       subtract guess from value
ADDR    R3,     R5      ;   6 \
SLR     R3,     1       ;   6  |-- Calculate next guess value
ADDR    R3,     R5      ;   6 /
DECR    R0              ;   6
BNEQ    @@loop          ; 9/7 Guess next bit

PULR    R3              ;  11 Restore R3
PULR    R1              ;  11 Restore R1
PULR    PC              ;  11 Return
;----
;  71*k + 31 worst case

@@b0:       SLL     R2,     1       ;   6 No:  Clear bit in guessed result
SLR     R3,     1       ;   6 \__ Calculate next guess
SUBR    R3,     R5      ;   6 /
DECR    R0              ;   6
BNEQ    @@loop          ; 9/7 Guess next bit
;----
;  61*k - 2 best case

@@done:     PULR    R3              ;  11 Restore R3
PULR    R1              ;  11 Restore R1
PULR    PC              ;  11 Return
ENDP

;; ======================================================================== ;;
;;  End of file:  sqrt.asm                                                  ;;
;; ======================================================================== ;;
```

Edited by intvnut
##### Share on other sites

BTW, for whatever reason, the AA forum didn't like my last two superscript tags. I just left them as-is after fighting with them a bit. I entered everything properly in the raw BBcode.

I did have to use the raw editor, because the rich text one always seems to scramble my code posts.

Edited by intvnut
##### Share on other sites

Urgl... I meant to post this pic in the article for fun. Ah well. :-)

##### Share on other sites

It´s from the "Donald in mathmagic land" cartoon, isn´t it? :-)

##### Share on other sites

It´s from the "Donald in mathmagic land" cartoon, isn´t it? :-)

Yep!

##### Share on other sites

Very good article, Joe. Even I was able to follow most of it. Most.

Thanks!

-dZ.

## Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

×   Pasted as rich text.   Paste as plain text instead

Only 75 emoji are allowed.