Jump to content

Recommended Posts

I was looking at integer square roots, and found (on wikipedia no less) this algorithm. It appears to run in constant time and only uses fast operations (shifts, adds and subtracts). This also only uses 16-bit intermediate values, so no libgcc required.

 

I haven't analyzed this for performance or correctness, but it looks interesting.

 

Link:

http://en.wikipedia....em_.28base_2.29

 

Yeah, I tested a bunch of algorithms and I ended up using the one you mentioned above (although I did not find it on wikipedia). I found a site comparing 10 or so implementations in C, tested the ones that seemed to fit my needs and this one is pretty accurate in the range I need it to be (for square roots with results between 2 and roughly 40) and works predictably fast. I also stepped away from the fixed point implementation in favor of adapting the algorithm to work better with integers only, that turned out to be much faster as most of the calculations can be done with upscaled ints and I only need to downscale ever so often.

  • 1 month later...
  • 4 weeks later...

Found a new one! :)

 

I don't know if I've mentioned, but I am frequently impressed by the assembly code this compiler puts out. :) It's very pleasant to work through.

 

Anyway, my function looks like this. byte2hex is an unsigned int array with 256 entries - each entry is the two ASCII characters needed to represent that hex character. (so for instance entry 15 contains 0x3046, which as two ASCII characters is 0 and F.)

 

void fast_hex_out(int x) {
   unsigned char z;

   z=(x>>;
   unsigned int dat = byte2hex[z];
   VDPWD = dat>>8;
   VDPWD = dat&0xff;

   z=(x&0xff);
   dat = byte2hex[z];
   VDPWD = dat>>8;
   VDPWD = dat&0xff;
}

 

The assembly (which was inlined) looks like this (C and comments inserted by me):

 

z=(x>>;

  A22E  C142  mov  R2,R5                 * R2 contains 'x', copy to R5 ('z')
  A230  0985  srl  R5,8                  * shift z to LSB

unsigned int dat = byte2hex[z];   

  A232  C0C5  mov  R5,R3                 * copy z to work R3
  A234  A0C5  a    R5,R3                 * multiply by 2 to get int array index
  A236  C0E3  mov  @>a2c8(R3),R3         * get value from array byte2hex into R3 ('dat')
        A2C8

VDPWD = dat>>8;         

  A23A  C103  mov  R3,R4                 * make a copy of 'dat' (probably unnecessary)
  A23C  D804  movb R4,@>8c00             * write high byte to VDP (nicely skipping the shift!)
        8C00

VDPWD = dat&0xff;         

  A240  06C3  swpb R3                    * get low byte of 'dat'
  A242  D803  movb R3,@>8c00             * write it to VDP (also nicely done!)
        8C00

* the above was all very good! Small problem is below. The code makes a new copy of 'x'
* but never masks off the lower bits. The end result is that the entire 16-bit word is doubled
* and used as an index instead of just (x&0xff), and we get the wrong offset once the
* word is greater than 0x00FF.

z=(x&0xff);

   * never happens?

dat = byte2hex[z];

  A246  C0C2  mov  R2,R3                   * get a copy of 'x' (which still has all 16 bits) into work R3
  A248  A0C2  a    R2,R3                 * multiply by 2 to get int array index
  A24A  C0E3  mov  @>a2c8(R3),R3          * get value from array byte2hex into R3 ('dat)     
        A2C8

VDPWD = dat>>8;         
  A24E  C103  mov  R3,R4                 * the rest is the same as the block above
  A250  D804  movb R4,@>8c00             
        8C00

VDPWD = dat&0xff;
  A254  06C3  swpb R3                    
  A256  D803  movb R3,@>8c00    

 

Addendum:

 

If z is declared as unsigned int instead of unsigned char, then it works fine. The code is very similar, but an ANDI is added to handle the masking.

 

 

Edited by Tursi

GCC actually checks the size of a masked value, and that's why you don't get a warning for assigning a properly masked int to an unsigned char, but you would for assigning the full int.

 

At any rate, adding an explicit cast makes no difference. Note that the similar assignment just a few lines up (z=(x>>8)) is compiled properly. I'm not looking to find a workaround, I'm just issuing a bug report.

 

 

 

  • 1 month later...

Okay, this one is not my fault, but it is another case of mashing types back and forth. (I am really starting to see why Little Endian was considered easier on so many machines...)

 

unsigned char Round;

(... later ...)
VDPWD = '0' + Round % 10;
This generates this incorrect code (happens the same way for divide as well):

 

    movb @Round, r1    * get the variable into R1 MSB
    srl  r1, 8         * move to LSB
    mov  r1, r2        * copy to R2
    clr  r1            * zero R1 (32-bit value is now >000000xx)
    li   r3, >A        * load R3 with 10 (divisor)
    div  r3, r1        * do the division (R1 has dividend, R2 has remainder)
    mov  r2, r1        * copy R2 to R1 - however, remainder was 16 bit! We are about to treat it like 8-bit.
    ai   r1, >3000     * add '0' as an 8-bit byte value - this is wrong <---
    movb r1, @>8C00    * move the result to the video chip
The divide did the same thing. Worked around by making 'Round' 16-bit, that did the right thing. :)

On request, I made a video that goes through all the steps of setting this up from scratch (and includes my lib). It's not perfect, I make and correct a few mistakes, and work out a few things on camera. It's an hour long and clearly that's far too long for entertainment purposes, not to mention how tired and cranky /I/ get (as it took me over 3 hours to do). But on the chance that it's helpful to anyone, I posted it here:

 

  • Like 7

On request, I made a video that goes through all the steps of setting this up from scratch (and includes my lib). It's not perfect, I make and correct a few mistakes, and work out a few things on camera. It's an hour long and clearly that's far too long for entertainment purposes, not to mention how tired and cranky /I/ get (as it took me over 3 hours to do). But on the chance that it's helpful to anyone, I posted it here:

 

 

*looking at the youtube thumbnail...* Look mom! My post is in a Youtube video! ;)

 

Great job Tursi, anything we can do to get more people using and exercising the compiler is welcomed.

Edited by TheMole

No docs exist for the lib. I'll probably doxygenify it at some point, which is better than nothing only in that instead of reading the comments in the files, you can read all the comments in one file, but it's better than nothing. ;) At any rate, whatever I do will end up on Github too.

Does 16 division work reliably with the latest release (meaning dividing a 16-bit signed integer by a 16-bit signed integer)? I'm a patch or two behind, and I get several errors when compiling and I am suspecting errors (miscalculations) at runtime as well... Trying to figure out if it's worth upgrading.

Tursi: your "as" problem stems from the fact that you use two different prefixes in your configure steps for gcc and binutils. They really are meant to be installed using the same prefix, which kinda makes sense if you think about a unix file system approach where all libraries are in /lib, all binaries in /bin, etc... So I just used --prefix="/Users/danny/tms9900", and then tms9900-as is recognized just fine, no need to copy shit around to get up and running.

 

I noticed this while compiling gcc for my mac, using your instructions as a guideline. For those that are interested, I can provide the toolchain compiled for OSX (64 bit intel only) in a tarball, but it's a tad too big to attach to this post. Just drop me a pm if you're interested, it also includes mac binaries for elf2cart, elf2ea5 and Lucien's ea5split.

Edited by TheMole

Ah.. so far as the 'as' thing goes, I was just following the instructions earlier in this thread without spending any time investigating why that was. That's really good to know, and I will fix my own toolchain that was. :) I'll put a comment on the video description as well when I get to it.

  • 2 weeks later...

Not a big, but while looking through the compiled code I noticed a pattern that may be a nice easy optimization. In certain cases of less than you get two comparisons:

 

62D0 1303 jeq >62d8

62D2 1102 jlt >62d8

 

It stands to reason in those cases that the less than case is more likely than the equals case - might it make sense to test that one first?

Not a big, but while looking through the compiled code I noticed a pattern that may be a nice easy optimization. In certain cases of less than you get two comparisons:

 

62D0 1303 jeq >62d8

62D2 1102 jlt >62d8

 

It stands to reason in those cases that the less than case is more likely than the equals case - might it make sense to test that one first?

Wouldn't this just be replaced with:

jle >62d8

 

jle is listed as an instruction in my book. A simple peephole optimization should take care of that.

Don't jump instructions all just test the status bits and it's up to the previous instructions to deal with signed/unsigned values?

At least that's what this book shows.

*edit*

I'm trying to figure out this table.

 

jle tests L> and jlt tests the A> bit

 

I can't find where L> and A> are explained in this book but L> must be Logical greater than and A> must be Arithmetic greater than?

Edited by JamesD
  • 1 month later...

Hi Everyone,

 

Following the Tursi video and the instructions on harmlesslion.com I was able to get a XP SP3 VM built with GCC and the latest patches.

 

Great work! Many thanks!!!

 

Successfully compiled the Hello World program - ran it directly off the PC using HDX and it worked! (Yay!) (EA option 5, then HDX1.EA.HELLO)

 

However - I am not a C programmer... I have tinkered with the Hello World and now have # marks flying around on the screen and such... however, I'd like to port joe (Wordstar mode) to the TI... Looking at that source code made me run back to Hello World. Nano - which I thought would be simple was also super scary.

 

Can anyone recommend a decent C tutorial?

 

is stdio something that works on the TI - or am I relegated to writing to VDP memory?

 

Now that I have it - and it works - what would be the next steps? (besides hire a programmer)

 

Cheers, Arthur...

I would imagine an internet search should turn up some tutorials on C (I've found one for C++). It seems like a good C tutorial should be pretty easy to find - as far as my somewhat limited experience I can tell, the worst language for trying to learn on your own is javascript.

Hi Everyone,

 

Following the Tursi video and the instructions on harmlesslion.com I was able to get a XP SP3 VM built with GCC and the latest patches.

 

Great work! Many thanks!!!

 

Successfully compiled the Hello World program - ran it directly off the PC using HDX and it worked! (Yay!) (EA option 5, then HDX1.EA.HELLO)

 

However - I am not a C programmer... I have tinkered with the Hello World and now have # marks flying around on the screen and such... however, I'd like to port joe (Wordstar mode) to the TI... Looking at that source code made me run back to Hello World. Nano - which I thought would be simple was also super scary.

 

Can anyone recommend a decent C tutorial?

 

is stdio something that works on the TI - or am I relegated to writing to VDP memory?

 

Now that I have it - and it works - what would be the next steps? (besides hire a programmer)

 

Cheers, Arthur...

 

Ah, text editors are not easy targets to start with, especially not relatively full-featured ones like joe or nano. Just looking at the hardware specs of the TI, the way these modern programs approach their in-memory buffers will be challenging to port to something like the TI. stdio and cohorts are not available on the TI, so you would have to write those functions yourself first, which is not necessarily a simple task as C's string handling is not straightforward for non-C enthusiasts :).

 

I think a more fruitful approach might be to just start from scratch. I find that the best way to learn a new programming language is to start out with a small but 'nifty' little project and learn as you go. Otherwise it is difficult to keep interest if you're only following tutorials that keep to the dry subject matter.

  • Like 1

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...