Sorting: looking for a better algorithm

+Vorticon · April 16, 2023

Hi. I'm tinkering with a project that I'm not sure I will fully pursue yet, but nonetheless it requires sorting an integer array in ascending order (it can sort a 2 dimensional array for either column 1 or 2). I implemented a sort function as shown below, but performance for even 30 elements is dismal. Any suggestions for something better?

5000 REM ARRAY SORTING ROUTINE
5020 EFLAG=0
5030 FOR I=1 TO R
5035 IF I=R THEN 7160
5040 IF COL=2 THEN 7110
5060 V1=DLIST(I,1)::V2=DLIST(I+1,1)
5070 IF V2<V1 THEN DLIST(I,1)=V2::DLIST(I+1,1)=V1::EFLAG=1
7080 GOTO 7160
7110 V1=DLIST(I,2)::V2=DLIST(I+1,2)
7120 IF V2<V1 THEN DLIST(I,2)=V2::DLIST(I+1,2)=V1::EFLAG=1
7160 NEXT I
7170 IF EFLAG=1 THEN 5020
7180 RETURN

apersson850 · April 16, 2023

Some versions to pick from.

Sorting algorithms

We did discuss sorting in the p-system context a while ago.

Edited April 16, 2023 by apersson850

+Vorticon · April 16, 2023

I was not happy that I was going through the entire array each time there was an element position swap, so I figured out a better way which goes through it only once, backtracking just enough to make sure the immediate previous element(s) after a swap was(were) still smaller. Much much faster now. Is this still considered a bubble sort? It does not seem to quite match the bubble sort in the article mentioned in the previous message by @apersson850

5000 REM ARRAY SORTING ROUTINE
5010 FOR I=1 TO R
5020 IF I=R THEN RETURN
5030 POINT=I
5035 IF COL=2 THEN 5110
5080 V1=DLIST(POINT,1)::V2=DLIST(POINT+1,1)
5090 IF V1>V2 THEN DLIST(POINT,1)=V2::DLIST(POINT+1,1)=V1 ELSE 5140
5100 POINT=POINT-1::IF POINT>=1 THEN 5080 ELSE 5140
5110 V1=DLIST(POINT,2)::V2=DLIST(POINT+1,2)
5120 IF V1>V2 THEN DLIST(POINT,2)=V2::DLIST(POINT+1,2)=V1 ELSE 5140
5130 POINT=POINT-1::IF POINT>=1 THEN 5110
5140 NEXT I

SteveB · April 16, 2023

I used Bubble-Sort in the past ... quite good for smaller arrays. I recently read about Gnome Sort but haven't tried it yet.

I don't really get what you are doing with DLIST(x,1) and DLIST(x,2) ... you swap them independently?

You are talking about an integer array ... in which range are the integers to be sorted? 0 to 255 or much bigger?

+Vorticon · April 16, 2023

26 minutes ago, SteveB said:

I used Bubble-Sort in the past ... quite good for smaller arrays. I recently read about Gnome Sort but haven't tried it yet.

I don't really get what you are doing with DLIST(x,1) and DLIST(x,2) ... you swap them independently?

You are talking about an integer array ... in which range are the integers to be sorted? 0 to 255 or much bigger?

Yes the array dimensions can be sorted independently depending on which one is needed for the calculations based on the value of the COL variable (1 or 2).

And the integers are not limited in range (obviously within the limits of computer representation).

Background: it's a statistical program that I might grow into a full suite. Nobody has any need for such on the TI, so it'll be more of an educational exercise if I decide to go through with it :lol:

SteveB · April 16, 2023

Ok, I was thinking of some kind of bucket-sort for a limited range of integers from 0 to m. Just make a second array size [0..m] or use CALL LOAD with LowMen for a fake Byte-Array and just count the occurences of all n values, then go over this list from 0 to m, you have a linear O(n+m) complexity ... unbeatable, but not applicable to your task.

+Vorticon · April 16, 2023

2 minutes ago, SteveB said:

Ok, I was thinking of some kind of bucket-sort for a limited range of integers from 0 to m. Just make a second array size [0..m] or use CALL LOAD with LowMen for a fake Byte-Array and just count the occurences of all n values, then go over this list from 0 to m, you have a linear O(n+m) complexity ... unbeatable, but not applicable to your task.

Not too clear on the concept. Can you elaborate?

SteveB · April 16, 2023

26 minutes ago, Vorticon said:

Not too clear on the concept. Can you elaborate?

given the array 3,6,4,2,4 to be sorted and you know, there are just values from 0 to 10 ... you make an initial list

(0,0,0,0,0,0,0,0,0,0)

and then count the occurences .. one time the 3, two times the 4, ... using the number as the index to your sort-array

(0,0,1,1,2,0,1,0,0,0)

now you unroll again ... there is a no 0 and no 1, but one 2 and one 3 and two 4 and one 6 ...

2,3,4,4,6

Sorted!

Edited April 16, 2023 by SteveB

+Vorticon · April 16, 2023

Brilliant! Obviously a special case sort routine but super useful when applicable.

+Vorticon · April 16, 2023

1 hour ago, SteveB said:

I used Bubble-Sort in the past ... quite good for smaller arrays. I recently read about Gnome Sort but haven't tried it yet.

Turns out my second routine is exactly a Gnome Sort! Apparently also referred to as "stupid sort". Figures... :lol:

+TheBF · April 16, 2023

I post this in 2017 in a different thread. This example is 5.3 times faster than bubble sort.

EDIT: I reported the time of Bubble sort wrong 6 years ago. It is 155 seconds. COMBSORT is 29 seconds.

A long time ago I came across an article in BYTE magazine (April 1991) about some medical doctors (really) who figured out a great improvement to bubble sort by trial and error.

The COMB sort is similar to the Shell Metzner in that is compares items in the array that are separated by a 'GAP'.

The difference is in how you calculate the GAP.

Comb sort uses a magic value of 1.3. Each time through the array, the GAP is divided by 1.3.

This gives an amazing improvement over BUBBLE sort with only a couple of extra lines of code

And the bigger the array the better the improvement, but even on 32 chars it is ~~10x~~ 5.3X faster. 🙂

I have provided two simple DEMO programs that sort characters on the screen for anyone who might need a sort routine.

Bubble sort demo

100 REM Bubble Sort DEMO
120 REM a simple sort
130 CALL CLEAR
135 CALL SCREEN(12)
140 PRINT "*BUBBLE SORT DEMO*"
150 LET SIZE=31

210 REM write chars to screen
220 C=64+SIZE
230 FOR I=1 TO SIZE
240   CALL HCHAR(10,I,C,1)
250   C=C-1
260 NEXT I

400 REM sort chars on screen
420 LET PASSES=0

430 REM start of loop
440 SORTED=1

450 FOR I=1 TO SIZE-1
452   CALL GCHAR(10,I,X)
454   CALL GCHAR(10,I+1,Y)
460   IF X<Y THEN 530
480   CALL HCHAR(10,I,Y)
490   CALL HCHAR(10,I+1,X)
520   SORTED=0
530 NEXT I
540 PASSES=PASSES+1
550 REM test if sort is finished
560 IF SORTED=1 THEN 600
570 GOTO 440

600 PRINT "Sort completed in";PASSES;"times through the array"
640 END
650 REM 155 sec.

100 REM Comb Sort DEMO
120 REM a faster simple sort
130 CALL CLEAR
135 CALL SCREEN(12)
140 PRINT "*COMB SORT DEMO*"
150 LET SIZE=31

210 REM write chars to screen
220 C=64+SIZE
230 FOR I=1 TO SIZE
240   CALL HCHAR(10,I,C,1)
250   C=C-1
260 NEXT I

400 REM sort chars on screen
420 PASSES=0
425 GAP=SIZE
430 REM start of loop
440 SORTED=1
442    GAP=INT(GAP/1.3)
450    FOR I=1 TO SIZE-GAP
452      CALL GCHAR(10,I,X)
454      CALL GCHAR(10,I+GAP,Y)
460      IF X<Y THEN 530
480      CALL HCHAR(10,I,Y)
490      CALL HCHAR(10,I+GAP,X)
520      SORTED=0
530    NEXT I
540 PASSES=PASSES+1
550 REM test if sort is finished
560 IF (GAP=1) THEN 600
570 GOTO 440

600 PRINT "Sort completed in";PASSES;"times through the array"
640 END
650 REM 29 sec.

+TheBF · April 16, 2023

This video shows different sorts graphically on TI-99. (in Forth but the relative times between methods is still valid)

+Vorticon · April 17, 2023

Nice demo!

Looks like the insertion sort is essentially the gnome sort, isn't it?

+TheBF · April 17, 2023

7 minutes ago, Vorticon said:

Nice demo!

Looks like the insertion sort is essentially the gnome sort, isn't it?

Thanks.

They are similar but Gnome sort scans back and Forth. Insertion sort just does one direction from what I can see.

Looks like performance is equivalent.

Gnome sort - Wikipedia

Insertion sort - Wikipedia

I think Combsort is a bit faster than both on average at least that's what my demo shows.

Combort is also simple. Bubble sort + a bit more.

+TheBF · April 17, 2023

Here is the comb sort on an array. It sorts 32 floats in 5 seconds.

A bit faster if you remove the PASSES counter.

Spoiler

120 REM Combsort
130 CALL CLEAR
135 CALL SCREEN(12)
140 PRINT "*BUBBLE SORT DEMO*"
160 LET SIZE=32
170 DIM Q(32)


210 REM FILL array
220 PRINT "FILL ARRAY"
230 FOR I=1 TO SIZE
240   Q(I)= SIZE-I
260 NEXT I

400 PRINT "Sorting..."
420 LET PASSES=0
424 GAP=SIZE
430 REM start of loop
440 SORTED=1
442 GAP=INT(GAP/1.3)
450 FOR I=1 TO SIZE-GAP
460   IF Q(I)<Q(I+GAP) THEN 530
470 rem swap the array items
480   TEMP=Q(I)
490   Q(I)=Q(I+GAP)
500   Q(I+GAP)=TEMP
520   SORTED=0
530 NEXT I
540 PASSES=PASSES+1
550 REM test if sort is finished
570 IF GAP=1 THEN 600
580 GOTO 440

600 PRINT "Sort completed in";PASSES;"times through the array"
650 FOR I=1 TO SIZE
660 PRINT Q(I);
670 NEXT I
1000 END

apersson850 · April 17, 2023

Quicksort is a recursive algorithm, where the list is split around an arbitrary value in the list, where all values lower goes into one list and all higher goes into another. Then these new lists are split again around an arbitrary value in the new lists and you continue until the lists are down to one character. Then the whole array is sorted.

For efficiency, the sorting method normally reverts to insertion sort when the lenght of the lists get small, since that's more efficient.

Selecting the pivoting value is of course somewhat important, since the ideal situation is to get the two lists exactly half of the list you started with. But to make sure you get that you need to sort the list first...

A compromize is to take the middle value of three samples from the list, first, middle and last. Lists containing a lot of small values and a few large will usually reduce efficiency, just like a list which is already sorted.

SteveB · April 17, 2023

2 hours ago, apersson850 said:

A compromize is to take the middle value of three samples from the list, first, middle and last.

Middle-of-Three is usually a good compromise ... still some Quicksort implementation totally freak out when handed an already sorted list ... The language for the implementation should support recursions, I wouldn't do it in TI BASIC. When implementing it in assembly you need to implement the recursion manually. A whole new science is sorting datasets larger than your RAM, which has been a big issue in the early mainframe world... tape-to-tape sorting.

+mizapf · April 17, 2023

To describe Quicksort in a recursive fashion, one could say: "To quicksort a list, pick a value from it, put all list elements lower or equal to that value into one list, and all elements higher into another list, and quicksort those lists. Join both sorted lists with the picked element between them."

+Vorticon · April 17, 2023

What does O(n) mean? I'm seeing it all over the place as a measure of efficiency of sort algorithms.

+TheBF · April 17, 2023

8 minutes ago, Vorticon said:

What does O(n) mean? I'm seeing it all over the place as a measure of efficiency of sort algorithms.

I believe O is "order" as in the the time to complete the function is on the order of 'n'.

So a sort that completes in O(n) is a linear time, ie: gets longer with n.

O(n^2) like Bubble Sort gets longer at n x n time.

+TheBF · April 17, 2023

The stuff you can find on the internet.

This look like a nice explanation.

https://adrianmejia.com/most-popular-algorithms-time-complexity-every-programmer-should-know-free-online-tutorial-course/

+mizapf · April 17, 2023

f=O(g) means that the function f does not grow "essentially" faster than g. That is, g gives you an upper bound for the behavior of f for rising arguments. "Essentially" means it suffices to find some value C so that f/g ≤ C for x→ ∞.

Example: f(x)=2x²+x

f=O(x²) because (2x²+x) / x² = 2 + 1/x, which is certainly < 3 for x>1. So you can say that f grows at most 3 times as fast as x², i.e. f does not grow essentially faster than x².

The good thing about that is that this upper bound g can be expressed in much simpler terms. If you want to say something about the growth of a function, you can leave all those nasty terms away.

There is a whole zoo of similar symbols like "O()", introduced by Paul Bachmann and Edmund Landau, hence the name "Landau symbols" or "Bachmann-Landau notation".

+Vorticon · April 17, 2023

So O(n) is better from a performance standpoint than O(n²) because the first is linear and the second quadratic in growth?

SteveB · April 17, 2023

20 minutes ago, Vorticon said:

So O(n) is better from a performance standpoint than O(n²) because the first is linear and the second quadratic in growth?

Exactly ... and Quicksort is O(n*log(n)) and therefore better than Bubblesort with O(n^2) .. my above Bucketsort variant has only O(n+m) ... linear, but with a very limited use-case.

+FarmerPotato · April 17, 2023

Two books that I learned from are Algorithms, by Sedgwick, and Numerical Recipes in C.

Sedgwick comes in Pascal, C and other flavors.

Numerical Recipes covers just enough sorting, and is a reference on many other mathematical computing topics.

Sorting: looking for a better algorithm

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members