
Cache table search



I have code which 'randomly' touches bytes in a large array that won't fit in memory, so it's stored on disk.
I have a function read_byte, which splits the pointer into a sector number and an offset within the sector.
Then I have a function for reading the cached sector. There are two arrays of sector numbers (low and high bytes); together they give the sector number of each cached sector, and the index into these arrays is the memory page where that sector is cached. A third array stores sector usage counters for dropping the least-used sectors.

The maximum number of sectors is 0x800, and I can keep about 0x40 sectors cached in memory. Although these numbers are currently determined at runtime, I think I can make them 'set in stone' if needed.

Why I'm saying all of this: the code spends most of its time in the sec_find_loop loop, and on almost every run it finds a match (i.e. jumps to sec_find_match), and I can't think of anything at the instruction level to make it faster. So a structural change is needed, but I can't think of anything small and fast. BTW, I have verified this in Altirra's profiler; without it I wouldn't have had the idea...

The pseudo-code looks like this (and most of the CPU time goes into the many calls of the first line).

cached_sect_index = CACHE[ pointer >> 8 ]                                 // sector number -> cache slot
if( ! cached_sect_index )                                                 // miss: sector not cached yet
{
    cached_sect_index = find_free_cache_slot( CACHE )
    read_sector_to_cache( cached_sectors, cached_sect_index, pointer >> 8 )
    CACHE[ pointer >> 8 ] = cached_sect_index
}

return cached_sectors[ CACHE[ pointer >> 8 ] ][ pointer & 0xFF ]          // byte within the cached sector



This is the current implementation in ASM (of the first line of the pseudocode):
 

get_sector_Y_A_page_in_memory:
		STY	sect_hi			; sector number high byte (Y)
		STA	sect_lo			; sector number low byte (A)

		LDX	#0			; X = cache slot being tested
sec_find_loop:
		CMP	sectnum_lo,X		; compare low byte with slot X
		BNE	sec_find_next
		TYA
		CMP	sectnum_hi,X		; low byte matched, check high byte
		BNE	sfln
		STX	cached_sect_index	; full match, remember the slot
		BEQ	sec_find_match		; Z still set from the CMP: branch always
sfln:
		LDA	sect_lo			; restore low byte for the next compare

sec_find_next:
		INX
		CPX	cache_size
		BCC	sec_find_loop

 


This depends a lot on your access pattern.

 

For mostly sequential access patterns, the simplest thing to do is a direct-mapped cache: mask the low byte of the sector number to get the cache index, and use only that cache entry for the sector. This gives fast access/update, at the cost of only being able to cache one sequence overlapping a given series of cache entries, which can give you a poor hit rate due to aliasing. Going to 2-way or 4-way set associativity, by combining 2 or 4 cache entries at each index, can improve this at the cost of more time and space to manage a replacement policy for which way of the set to replace.
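As a rough sketch of the direct-mapped idea in C-like pseudocode (the names CACHE_SLOTS, cache_tag, cache_page and load_sector_into_slot are invented here, not taken from the original code):

#define CACHE_SLOTS 0x40                    /* 64 slots, as in the original post */

unsigned short cache_tag[CACHE_SLOTS];      /* which sector each slot currently holds */
unsigned char  cache_page[CACHE_SLOTS];     /* memory page backing each slot */

extern void load_sector_into_slot(unsigned char slot, unsigned short sector); /* hypothetical loader */

/* Direct-mapped lookup: the slot is computed from the sector number,
   so there is no search loop at all. */
unsigned char lookup_direct(unsigned short sector)
{
    unsigned char slot = (unsigned char)(sector & (CACHE_SLOTS - 1));

    if (cache_tag[slot] != sector) {        /* miss: evict whatever aliases into this slot */
        load_sector_into_slot(slot, sector);
        cache_tag[slot] = sector;
    }
    return cache_page[slot];
}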

 

Another way you can attack this is with a hash table. This allows a fully associative cache, with better hit rates and less aliasing than an N-way set-associative cache, but it requires more code for insert/remove and extra memory for the hash head/next entries.
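A minimal sketch of that scheme, with invented names (hash_head, hash_next, slot_sector are not from the original code): a small head table indexed by a hash of the sector number, plus a per-slot next chain, so a lookup only walks the few slots that hash to the same bucket.

#define CACHE_SLOTS  0x40
#define HASH_BUCKETS 0x40
#define NO_SLOT      0xFF

unsigned short slot_sector[CACHE_SLOTS];    /* sector number held by each cache slot */
unsigned char  hash_head[HASH_BUCKETS];     /* first slot of each bucket, NO_SLOT if empty */
unsigned char  hash_next[CACHE_SLOTS];      /* next slot in the same bucket, NO_SLOT at end */

static unsigned char hash_sector(unsigned short sector)
{
    return (unsigned char)((sector ^ (sector >> 6)) & (HASH_BUCKETS - 1));
}

/* Returns the cache slot holding 'sector', or NO_SLOT on a miss.
   Insert/remove would have to maintain hash_head/hash_next as well. */
unsigned char lookup_hashed(unsigned short sector)
{
    unsigned char slot = hash_head[hash_sector(sector)];

    while (slot != NO_SLOT && slot_sector[slot] != sector)
        slot = hash_next[slot];
    return slot;
}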

 

Finally, if your lookups are often redundant, a single last-found cache checked up front can be very effective.
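That check is just a guard in front of whichever lookup is used; a hedged sketch, where full_lookup stands in for the real search and is not a name from the original code:

static unsigned short last_sector = 0xFFFF;     /* impossible sector number = nothing remembered */
static unsigned char  last_slot;

extern unsigned char full_lookup(unsigned short sector);  /* the real search, assumed to load on a miss */

unsigned char lookup(unsigned short sector)
{
    if (sector == last_sector)
        return last_slot;                       /* hit: skip the full search entirely */

    last_slot   = full_lookup(sector);
    last_sector = sector;
    return last_slot;
}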

 


Regarding the last note: there is currently something like a 'the last used sector is still valid' check. Invalidation is done when a read goes outside the sector, or when there is a seek operation. Because the invalidation is not done 'centrally', it would be quite an effort to fix it (if needed).

I tried it just now, on a manually selected part of the profiling data:
read_byte is called 941x
get_sector is called 102x, so the 'last-found sector' check works, at least to some extent, i.e. 8 of 9 reads are fast and 1 is dead slow.
within get_sector there are 0 (!) actual disk reads, just look-ups of the sectors; the 102 calls take 108136 clocks / 28365 instructions, i.e. ~270 instructions per sector-to-page associative array lookup.

When I check it, the main loop consumes about 17% of instructions and another 12% is taken by the read operation, but the get_sector look-ups take 33% of all clocks measured.
Checking this on different subsets of the profile data, the ratios are mostly the same (5277 byte reads taking 15%, 524 get_sector calls taking 33% with 3 actual disk reads, main loop 16%).

Thinking 'aloud' like this is a very useful tool (as most programmers know). I also just found ;##TRACE, and I'm going to dive into the invalidation business, which is definitely problematic, as can be seen from this short excerpt.
J.
 

Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0158
  - sector found: 0158 at 24
Sector to find: 0157
  - sector found: 0157 at 22
Sector to find: 0157
  - sector found: 0157 at 22
Sector to find: 02CE
  - sector found: 02CE at 3A
Sector to find: 02CE
  - sector found: 02CE at 3A
Sector to find: 02CE
  - sector found: 02CE at 3A
Sector to find: 02CE
  - sector found: 02CE at 3A
Sector to find: 02CE
  - sector found: 02CE at 3A
Sector to find: 02CE
  - sector found: 02CE at 3A
Sector to find: 0157
  - sector found: 0157 at 22
Sector to find: 0157
  - sector found: 0157 at 22
Sector to find: 0157
  - sector found: 0157 at 22
Sector to find: 0158
  - sector found: 0158 at 24
Sector to find: 0158
  - sector found: 0158 at 24
Sector to find: 0158
  - sector found: 0158 at 24
Sector to find: 0158
  - sector found: 0158 at 24
Sector to find: 0158
  - sector found: 0158 at 24
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0153
  - sector found: 0153 at 16
Sector to find: 0154
  - sector not found: 0154
Sector to find: 0154
  - sector found: 0154 at 15
Sector to find: 0123
  - sector found: 0123 at 37

 

 


It sounds like a read-many-times-per-write situation, where you want the index lookups to be as fast as possible and can tolerate slower list updates. I would maintain a sorted list of the active sectors, so you could do a binary search instead of a linear one. This isn't too difficult to implement on the 6502, since the most complex arithmetic it needs is a single shift right. With up to 0x40 (64) entries, each search only needs at most 6 iterations instead of 64. https://en.wikipedia.org/wiki/Binary_search_algorithm
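A hedged sketch of that search in C-like pseudocode (sorted_sector, sorted_slot and cache_used are invented names; on the 6502 the sector numbers would live in split lo/hi byte arrays):

#define CACHE_SLOTS 0x40
#define NOT_FOUND   0xFF

unsigned short sorted_sector[CACHE_SLOTS];  /* cached sector numbers, kept in ascending order */
unsigned char  sorted_slot[CACHE_SLOTS];    /* cache page for each sorted entry */
unsigned char  cache_used;                  /* number of valid entries */

/* Binary search: at most 6 iterations for 64 entries. */
unsigned char find_slot(unsigned short sector)
{
    unsigned char lo = 0, hi = cache_used;  /* search range is [lo, hi) */

    while (lo < hi) {
        unsigned char mid = (lo + hi) >> 1; /* the single shift right mentioned above */

        if (sorted_sector[mid] == sector)
            return sorted_slot[mid];
        if (sorted_sector[mid] < sector)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NOT_FOUND;                       /* miss; 'lo' is where the sector would be inserted */
}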

 

The downside is that inserts/deletes in the list are slower, as up to the whole list may need to be moved. You would have to delete the drop-off target if need be (in my asset manager, I have a "Least Recently Used" array as well to handle that), then find where to put the new entry (which can be taken from the search if no result was found), moving the array as necessary to compensate.
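Continuing the sketch above, and assuming the least-recently-used entry has already been removed if the table was full, the insert could look like this; on the 6502 the memmove would just be a short copy loop over the split arrays:

#include <string.h>

/* Insert 'sector' (cached in page 'slot') at sorted position 'pos',
   shifting the tails of both arrays up by one entry. */
void insert_entry(unsigned char pos, unsigned short sector, unsigned char slot)
{
    memmove(&sorted_sector[pos + 1], &sorted_sector[pos],
            (size_t)(cache_used - pos) * sizeof(sorted_sector[0]));
    memmove(&sorted_slot[pos + 1], &sorted_slot[pos],
            (size_t)(cache_used - pos) * sizeof(sorted_slot[0]));

    sorted_sector[pos] = sector;
    sorted_slot[pos]   = slot;
    cache_used++;
}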

