flashjazzcat Posted July 18, 2014 (edited)

I currently find myself writing an awful lot of linked list code, and one application is a message queue. The queue can hold up to sixty-four messages, and is based on a number of arrays (LSB/MSB pointer to buffer for each message, etc.). A property of the queue is that although it is ostensibly FIFO, it is doubly linked and messages can be removed from anywhere in the list (in chronological order), since all processes use the same queue for messaging.

Anyway: one thing I've been trying to optimise is the search for unused array elements. Originally, each message slot had a corresponding used/free flag. When a new message is added to the queue (always at the end), the array of flags first had to be scanned until a free element was found:

	ldx #MaxQueueEntries
@	dex				; x already contains max queue entries
	lda QueueNodeAlloc,x		; quick scan for free entry
	bne @-
	inc QueueNodeAlloc,x		; say entry is used

Note we already know at least one element is free before entering the loop. This approach is fine when there are few messages in the queue, or when unused/released elements are always near the top of the array. But when the queue becomes full, this linear search is just terrible: up to c. 600 cycles just to find a free slot. So I considered using bitmap allocation, and threw a 256-element look-up table at the problem:

	ldx #8				; find unallocated queue node (from 64 entries)
@	dex				; at least one bit in one of the 8 bytes is guaranteed to be set
	ldy QueueNodeAllocTable,x	; get first byte in bitfield with bits set (free)
	beq @-
	lda BitLUT,y			; look up lowest set bit in byte
	tay				; save bit number (0-7)
	clc
	adc XLUT,x			; add to byte * 8
	sta KernelPtr2			; save node number
	lda QueueNodeAllocTable,x	; now mark in use
	and QueueNodeLUT,y		; by clearing bit in allocation table
	sta QueueNodeAllocTable,x
	...
QueueNodeLUT
	.byte 255-1,255-2,255-4,255-8,255-16,255-32,255-64,255-128
QueueNodeAllocTable			; 1 bit per node: 0=used, 1=free
	.byte $ff,$ff,$ff,$ff,$ff,$ff,$ff,$ff
XLUT
	.byte 0,8,16,24,32,40,48,56
BitLUT
	.byte 0,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 0-15
	.byte 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 16-31
	.byte 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 32-47
	.byte 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 48-63
	.byte 6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 64-79
	.byte 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 80-95
	.byte 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 96-111
	.byte 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 112-127
	.byte 7,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 128-143
	.byte 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 144-159
	.byte 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 160-175
	.byte 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 176-191
	.byte 6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 192-207
	.byte 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 208-223
	.byte 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 224-239
	.byte 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 ; 240-255

Bits set in the bitmap represent free slots. "BitLUT" converts every possible byte value (where at least one bit is set) into the bit number of the lowest set bit in that byte. So, if %01101110 (110) is extracted from the bitmap, this translates to bit position 1 via the LUT. Added to the byte position * 8 (also from a LUT), this gives the actual index of the first free array element. The bit is immediately cleared in the bitmap to indicate that the slot is allocated.

My question is basically twofold: which is best, and is there anything faster? A linear search (at only around ten cycles per iteration) is unbeatable when the list is empty, but sub-optimal when the list is full. The bitmap method performs worse until there are around eight messages at the bottom of the queue: at that point, it leaves the linear search in the dust. I'm inclined to think the linear search is the best compromise of brevity and performance in all but exceptional cases (i.e.
when the message queue fills up for some reason and messages aren't being pulled out in a timely manner). The look-up table for bitmap allocation might be ideally suited to situations where the size of the array is much greater, and unallocated elements are not regularly found near the beginning of the search (such as a memory allocator or disk VTOC).

Edited July 18, 2014 by flashjazzcat
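For readers without a 6502 to hand, the bitmap scheme above can be modelled in Python; the class and method names are my own illustration, not from the assembly source. The 256-entry table maps any non-zero byte to its lowest set bit, and allocation scans at most eight bitmap bytes:

```python
# Model of the bitmap allocator: 64 slots, 1 bit per slot, set bit = free.
# BIT_LUT[b] = position of the lowest set bit in byte b (unused for 0).
BIT_LUT = [0] * 256
for b in range(1, 256):
    BIT_LUT[b] = (b & -b).bit_length() - 1  # isolate lowest set bit

class BitmapAlloc:
    def __init__(self):
        self.bitmap = [0xFF] * 8  # all 64 slots free

    def alloc(self):
        # Scan the 8 bitmap bytes (the assembly scans downwards from 7).
        for i in range(7, -1, -1):
            if self.bitmap[i]:
                bit = BIT_LUT[self.bitmap[i]]
                self.bitmap[i] &= ~(1 << bit)  # clear bit: slot now used
                return i * 8 + bit             # node number = byte * 8 + bit
        return None  # queue full

    def free(self, node):
        self.bitmap[node // 8] |= 1 << (node % 8)
```

As in the post, %01101110 maps to bit position 1 via the table; the O(1) table lookup replaces an inner eight-iteration bit-shifting loop.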
Xuel Posted July 18, 2014 (edited)

How about keeping the free and allocated slots in two separate doubly-linked lists? Then alloc and free are O(1) operations.

; Returns newly allocated message slot number in X register
allocMsg
	; Take the head of the free list and make it the new head of the allocated list
	ldx free
	mvy forward,x free
	mva #$FF backward,y
	ldy allocated
	stx allocated
	tya
	sta forward,x
	txa
	sta backward,y
	rts

; Expects X register to contain message slot number to free
freeMsg
	; Unlink node X and make it the new head of the free list
	; ...left as an exercise for the reader
	rts

; Space for 65 doubly linked list nodes
; Keep at least one node allocated at all times to eliminate need
; for termination check in allocMsg. Termination is indicated with $FF.
free		dta 0
allocated	dta 64
forward		:63 dta #+1
		:2 dta $FF
backward	dta $FF
		:63 dta #
		dta $FF

EDIT: Fixed second to last line to read ":63 dta #" instead of ":63 dta #-1".

Edited July 18, 2014 by Xuel
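The two-list scheme can be sketched in Python with array-based links, mirroring the data layout above (forward/backward arrays indexed by node number, $FF as terminator); the function bodies are my own reading of the technique, including the freeMsg left as an exercise:

```python
# Two doubly-linked lists over a shared node pool: "free" and "allocated".
# NIL plays the role of the $FF terminator; node 64 stays allocated as a
# sentinel, as in the original data tables.
NIL = 0xFF

forward  = [i + 1 for i in range(63)] + [NIL, NIL]   # 0->1->...->63->NIL
backward = [NIL] + list(range(63)) + [NIL]
free_head = 0
alloc_head = 64

def alloc_msg():
    """Move the head of the free list to the head of the allocated list. O(1)."""
    global free_head, alloc_head
    x = free_head
    free_head = forward[x]
    if free_head != NIL:
        backward[free_head] = NIL
    forward[x] = alloc_head          # link in front of old allocated head
    if alloc_head != NIL:
        backward[alloc_head] = x
    backward[x] = NIL
    alloc_head = x
    return x

def free_msg(x):
    """Unlink node x from the allocated list, push it on the free list. O(1)."""
    global free_head, alloc_head
    if backward[x] != NIL:
        forward[backward[x]] = forward[x]
    else:
        alloc_head = forward[x]
    if forward[x] != NIL:
        backward[forward[x]] = backward[x]
    forward[x] = free_head
    backward[x] = NIL
    free_head = x
```

Both operations touch a constant number of links regardless of how full the queue is, which is exactly what makes this beat both the linear search and the bitmap scan.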
Island2Live Posted July 18, 2014 (edited)

Just to sum up what you've told us:

1. You have a FIFO.
2. Any element in the FIFO - meaning from the first to the last - can be removed.
3. Removing an element from the FIFO frees the corresponding slot.

I would go with a linked list: each element in the FIFO has two additional pointers, one to the next and one to the previous element. You also need two separate pointers: one pointing at the first occupied (PTR_OCC) (active, soon to be processed) element in the FIFO, and one pointing at the first free element (PTR_FREE).

Finding a free element is a snap this way, because PTR_FREE always points to a free element. Removing/deleting an element is just a matter of letting the pointers of the next and previous elements of the to-be-deleted element point to each other, and letting the current free element point to the now-freed element. Now let PTR_FREE point to this new free element and there you go. Inserting an element into the FIFO is straightforward.

There are of course side effects for the wraparound of your 64 entries, and you need to decide whether your FIFO is already full. But in my opinion, and given the info in your post, this should be the fastest way to go.

[update] Ah, Xuel was faster!

Kind regards, Henrik (Island2Live)

Edited July 18, 2014 by Island2Live
Xuel Posted July 18, 2014

> I would go with a linked list: each element in the FIFO has two additional pointers; one to the next and one to the previous element.

I agree these can be part of the whole message structure, but if you can, make them static and arrange them in an interleaved fashion so that you can use ABS,x and ABS,y addressing with offsets instead of using actual pointers.

> Ah, Xuel was faster!

I left work early so that I could try to be first!
flashjazzcat Posted July 19, 2014 (Author)

Good ideas - thanks both! I'll give a free linked list a try tomorrow. The message queue itself already acts as the "allocated" list, so keeping track of free elements in another list should be a snap. No problems of wrap-around in the message queue, BTW: it doesn't use a tail pointer chasing the head, since messages can be pulled out from anywhere in the queue (the only proviso being that they'll appear to the receiver in chronological order).

That brings me to another optimisation: since the queue is system-wide, the GetMessage function has to walk the linked list until it finds a message whose intended recipient is either the calling process or "any process". Each process keeps a running count of the number of pending messages intended for it, so at least we don't have to walk the message queue to find out there are NO messages for a given process. Of course, having a separate linked list tracing the messages for each process ID would improve matters, but I think we might end up with linked-list overload for what is ostensibly a fairly simple structure.

Anyway: it's great getting some fresh eyes on this stuff, and I'll give a more informed response tomorrow.
Xuel Posted July 19, 2014

Why are you combining all messages into a single collection? Is it to save memory? Is it motivated by the "any process" messages?

You could eliminate the need to search through the messages while satisfying the chronological-ordering requirement by giving each process its own message queue, i.e. a simple FIFO - no need for a linked list. Maybe there could be a separate queue for the "any process" messages, or you could duplicate them across all process queues.

If the message structure is large and you want to conserve memory, then you could use a hybrid approach where 1) each process has a simple message queue which just contains offsets into the shared message slot space, and 2) the shared space is managed with linked lists as before.
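The hybrid idea can be sketched in Python; all names and the use of deque are illustrative stand-ins for the small per-process FIFOs and shared slot pool being described, not anything from the thread's code:

```python
from collections import deque

MAX_SLOTS = 64

free_slots = deque(range(MAX_SLOTS))   # shared message slot space
process_queues = {}                    # pid -> FIFO of slot numbers
messages = [None] * MAX_SLOTS          # shared message bodies, one per slot

def post_message(pid, body):
    """Store the body in a shared slot; queue only the slot number per process."""
    slot = free_slots.popleft()        # O(1) allocation from shared space
    messages[slot] = body
    process_queues.setdefault(pid, deque()).append(slot)

def get_message(pid):
    """Pop the oldest message for this process - no search through other
    processes' messages, yet per-process chronological order is preserved."""
    q = process_queues.get(pid)
    if not q:
        return None
    slot = q.popleft()
    body, messages[slot] = messages[slot], None
    free_slots.append(slot)            # release the shared slot
    return body
```

Each per-process queue holds only one byte (a slot number) per pending message, so the memory cost over the single shared queue is small.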
flashjazzcat Posted July 19, 2014 (Author) (edited)

> Why are you combining all messages into a single collection? Is it to save memory? Is it motivated by the "any process" messages?

I decided on the single message queue after discussing the matter with Prodatron (the author of SymbOS, after which my API is modelled): he said he was using a single message queue for the sake of simplicity (and reduced storage requirements), and it's clearly working well for him. I didn't yet ask him exactly what kind of data structures are used: I just went it alone and designed something which made sense to me.

> You could eliminate the need to search through the messages while satisfying the chronological ordering requirement by giving each process its own message queue, i.e. a simple FIFO -- no need for a linked list. Maybe there could be a separate queue for the "any process" messages or you could duplicate them across all process queues.

This is a really excellent idea, and similar to something I was kicking around before I fell asleep last night. Any time a message is added which pertains to a particular process, the absolute message slot number is added to a little FIFO list attached to that process. It's more storage, of course: you'd have to put some kind of reasonable hard limit on how many messages a process could queue up.

If a message is intended for all processes, just add the message slot number to every process's FIFO list, and attach a counter to the single copy of the message in the main queue, which ticks down every time a process reads the message. When the counter is zero, the message is removed from the main queue. A little bit risky if a given process dies before reading the message, but other than that it seems perfectly workable.
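The broadcast-counter idea boils down to reference counting a single shared copy; a minimal Python sketch (class and method names are mine, purely for illustration):

```python
# One shared copy of an "all processes" message, with a countdown of
# readers; the slot can be freed once the last recipient has read it.
class BroadcastMessage:
    def __init__(self, body, num_processes):
        self.body = body
        self.pending = num_processes  # readers still to consume it

    def read(self):
        """Return the body, plus True when the slot may now be freed."""
        self.pending -= 1
        return self.body, self.pending == 0
```

The risk noted above shows up here too: if a process dies without calling read(), pending never reaches zero and the slot leaks unless process teardown also decrements the counter.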
> If the message structure is large and you want to conserve memory then you could use a hybrid approach where 1) each process has a simple message queue which just contains offsets into the shared message slot space and 2) the shared space is managed with linked lists as before.

That's another good approach, but then we get arrays of arrays to handle linked lists for each process descriptor. But it might be the optimal solution.

It occurred to me that the free list need only be forward-linked, since it's sufficient to always pull the first free node we find:

; ----------------------------------------------------------------------------
; Initialize message queue's free node list
; ----------------------------------------------------------------------------
.local InitQueueFreeList
	mva #1 QueueFreeNodeHead
	ldx #1
Loop1
	inx
	cpx #65
	bcs Done
	txa
	sta QueueFreeNodeNext-2,x
	bne Loop1
Done
	mva #0 QueueFreeNodeNext-2,x
	rts
.endl

; ----------------------------------------------------------------------------
; Get a free message queue node
; and remove it from the head of the free list
; Returns node in Y
; ----------------------------------------------------------------------------
.local GetFreeMsgNode
	ldy QueueFreeNodeHead
	lda QueueFreeNodeNext-1,y
	sta QueueFreeNodeHead
	rts
.endl

; ----------------------------------------------------------------------------
; Add a message queue node to the head of the free list
; Pass node in KernelPtr2
; ----------------------------------------------------------------------------
.local AddToQueueFree
	lda QueueFreeNodeHead
	ldx KernelPtr2
	sta QueueFreeNodeNext-1,x
	stx QueueFreeNodeHead
	rts
.endl

So when a node is freed up, it just gets pushed back onto the head of the free queue. This looks wonderfully simple and efficient. Just need to sort out the addressee issue now. Note that indices are numbered +1 of their actual offset; this is just to allow use of 0 as a terminal pointer.
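The same forward-linked free list can be modelled in Python, keeping the post's convention that node numbers are 1-based so that 0 can serve as the terminator (function names are mine):

```python
# Singly-linked free list over 64 nodes; node numbers run 1..64 and
# 0 is the terminal pointer, as in the assembly above.
MAX_QUEUE_ENTRIES = 64

free_head = 1
# next_node[n - 1] is the successor of node n: 1->2->...->64->0
next_node = [i + 2 for i in range(MAX_QUEUE_ENTRIES - 1)] + [0]

def get_free_node():
    """Pop the head of the free list; returns 0 if the list is empty."""
    global free_head
    node = free_head
    if node:
        free_head = next_node[node - 1]
    return node

def release_node(node):
    """Push a node back onto the head of the free list."""
    global free_head
    next_node[node - 1] = free_head
    free_head = node
```

Both operations are O(1): allocation never walks the list because the head is always a free node.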
Edited July 19, 2014 by flashjazzcat
flashjazzcat Posted July 19, 2014 (Author)

On the bus, I realized we could also just do this:

; ----------------------------------------------------------------------------
; Get a free message queue node
; and remove it from the free list
; Returns node in A
; ----------------------------------------------------------------------------
.local GetFreeMsgNode
	dec FreeNodePtr
	ldy FreeNodePtr
	lda FreeNodes,y
	rts
.endl

; ----------------------------------------------------------------------------
; Add a message queue node to the free list
; Pass node in A
; ----------------------------------------------------------------------------
.local AddToQueueFree
	ldy FreeNodePtr
	sta FreeNodes,y
	inc FreeNodePtr
	rts
.endl

FreeNodePtr	.byte 64
FreeNodes
?Node	equ 1
	.rept MaxQueueEntries
	.byte ?Node
?Node	equ ?Node+1
	.endr

This is effectively a "bucket" of free nodes, which initially contains all 64 nodes. When we want a free one, we just pop the top one off the stack. Whichever node we're finished with goes back onto the top of the stack. Bitmap allocation and the look-up table can be saved for the memory allocator.
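The "bucket" is simply a stack of node numbers, which a few lines of Python make plain (names mine; the comments map each line back to the assembly):

```python
# Stack of free node numbers, initially holding all 64 nodes.
# len(free_nodes) plays the role of FreeNodePtr.
MAX_QUEUE_ENTRIES = 64

free_nodes = list(range(1, MAX_QUEUE_ENTRIES + 1))  # nodes 1..64

def get_free_node():
    return free_nodes.pop()    # dec FreeNodePtr; lda FreeNodes,y

def release_node(node):
    free_nodes.append(node)    # sta FreeNodes,y; inc FreeNodePtr
```

Unlike the linked free list, no next-pointer array is needed at all: the stack's ordering is irrelevant because any free node is as good as any other.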
tschak909 Posted July 22, 2014

So basically a max of 256 message end points? (sorry for stupid question)

-Thom
dmsc Posted July 23, 2014

Hi,

> On the bus, I realized we could also just do this: [...] This is effectively a "bucket" of free nodes, which initially contains all 64 nodes.

Good idea, it's simple and efficient. Also, the messages could be stored "interleaved" to use direct indexed addressing to access the contents:

.local GetFreeMsgNode
	dec FreeNodePtr
	ldy FreeNodePtr
	ldx FreeNodes,y
	rts
.endl

.local StoreNewMessage
	jsr GetFreeMsgNode
	lda #msgByte1
	sta MessageBuffer1,x
	lda #msgByte2
	sta MessageBuffer2,x
	lda #msgByte3
	sta MessageBuffer3,x
	; ...
	txa
	jsr PushMessageToProcess
.endl

- dmsc.
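The interleaved layout is a struct-of-arrays arrangement: one array per message byte, all indexed by the node number, so each field is a single abs,X access on the 6502. A tiny Python sketch (the field names are illustrative, not from the thread):

```python
# Struct-of-arrays message storage: parallel arrays indexed by node number.
MAX_QUEUE_ENTRIES = 64

msg_type   = [0] * MAX_QUEUE_ENTRIES  # stands in for MessageBuffer1,x
msg_sender = [0] * MAX_QUEUE_ENTRIES  # stands in for MessageBuffer2,x
msg_data   = [0] * MAX_QUEUE_ENTRIES  # stands in for MessageBuffer3,x

def store_message(node, mtype, sender, data):
    # One indexed store per field - no per-message pointer arithmetic.
    msg_type[node] = mtype
    msg_sender[node] = sender
    msg_data[node] = data
```

The trade-off versus a packed per-message record is that fields sit MAX_QUEUE_ENTRIES bytes apart, which is exactly what makes the indexed addressing work.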
Heaven/TQA Posted July 23, 2014

man.... high end 6502 coding....
flashjazzcat Posted July 23, 2014 (Author) (edited)

> Also, the messages could be stored "interleaved" to use direct indexed addressing to access the contents: [...]

Yes - this is a good idea too. Always much faster, and certainly ideal for shorter buffers (if we're keeping an eye on code size too). The rectangle list uses interleave like this, since there are only six bytes in each rect (Left [word], Right [word], Top [byte], and Bottom [byte]). Wrap these up in a .LOCAL range and things become very readable too:

.local RectList
LeftLo		.ds MaxWindowRects
LeftHi		.ds MaxWindowRects
RightLo		.ds MaxWindowRects
RightHi		.ds MaxWindowRects
Top		.ds MaxWindowRects
Bottom		.ds MaxWindowRects
Next		.ds MaxWindowRects
FreeHead	.ds 1
FreeNext	.ds MaxWindowRects
.endl

...

	mva Left RectList.LeftLo-1,x
	mva Left+1 RectList.LeftHi-1,x
	mva Right RectList.RightLo-1,x
	mva Right+1 RectList.RightHi-1,x
	mva Top RectList.Top-1,x
	mva Bottom RectList.Bottom-1,x

Each window in the window list has a node pointer to the head of its rectangle list in "RectList". Using the free list or free stack (hard to choose between them) has really speeded up node allocation there too. Good stuff.

> man.... high end 6502 programming coding....

Well, dunno how high-end it is, but it's fun to brainstorm these problems with nifty coders.

Edited July 23, 2014 by flashjazzcat