LZSA format 1 depacker

42bs · October 5, 2022

Added depacker for LZSA format 1:

https://github.com/42Bastian/new_bjl/tree/main/exp/depacker

Slightly better compression than LZ4, but decompression is 6% slower.

42bs · October 9, 2022

Update:

- Syncing the GPU with the 68k was somewhat wrong. Reading the "sync" flag from GPU RAM does not seem to be a good idea. It now uses the GPU->CPU interrupt.

- untp crashed sporadically. Added a counter to check whether depacking works.

- Added a raw byte copy for speed comparison.

Edited October 9, 2022 by 42bs

42bs · October 10, 2022

Added speed-optimized versions of LZSA1, LZ4 and TP.

Cyprian · October 11, 2022

On 10/9/2022 at 2:45 PM, 42bs said:

- Syncing the GPU with the 68k was somewhat wrong. Reading the "sync" flag from GPU RAM does not seem to be a good idea.

What was wrong with that sync flag?

42bs · October 11, 2022

I saw strange behavior: sometimes the 68k continued even though the GPU had not yet finished.

Maybe it was because I used only one flag/semaphore. I need to try whether it works better if one flag is used to kick the GPU and _another_ is used to notify the 68k when the GPU is done.

But actually, I think the GPU->CPU interrupt is the better method, as the 68k can be stopped.
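
The two-flag idea mentioned above can be modelled roughly as follows. This is only a sketch in Python, not the code from the repository: the names `Mailbox`, `gpu_side`, `m68k_side` and `run_job` are made up for illustration, and the two flags are modelled with `threading.Event` objects instead of words in GPU RAM.

```python
import threading


class Mailbox:
    """Two separate flags (hypothetical names), one per direction."""

    def __init__(self):
        self.start = threading.Event()  # set by the 68k side, consumed by the GPU side
        self.done = threading.Event()   # set by the GPU side, consumed by the 68k side
        self.result = None


def gpu_side(box, work):
    box.start.wait()     # the GPU idles until it is kicked
    box.start.clear()    # acknowledge the kick
    box.result = work()  # e.g. run the depacker
    box.done.set()       # only now report completion


def m68k_side(box):
    box.done.clear()     # make sure no stale "done" is pending
    box.start.set()      # kick the GPU
    box.done.wait()      # block until the GPU reports completion
    return box.result


def run_job(work):
    box = Mailbox()
    gpu = threading.Thread(target=gpu_side, args=(box, work))
    gpu.start()
    result = m68k_side(box)
    gpu.join()
    return result
```

Because each flag has exactly one writer that sets it and one reader that clears it, a "kick" can never be mistaken for a "done". With a single shared flag, that confusion is one possible (unconfirmed) explanation for the 68k continuing too early.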

Edit:

I tried to go back to the old version, but could not reproduce the problem. Nevertheless, using the interrupt speeds things up, as the 68k does not hog the bus.
TP-Fast w/ interrupt and `stop #$2000`: 122ms
TP-Fast w/o interrupt: 154ms

Edited October 11, 2022 by 42bs

42bs · October 11, 2022

I cannot emphasize this enough: DO NOT USE CLR.L on GPU RAM!!!

Edited October 11, 2022 by 42bs

ggn · October 11, 2022

Oh, that's what the .noclear directive does in rmac! (It actually does nothing; there is no implementation behind it, just a message.)

42bs · October 11, 2022

6 minutes ago, ggn said:

Oh, that's what the .noclear directive does in rmac! (It actually does nothing; there is no implementation behind it, just a message.)

"Warning: CLR.L opcode ignored..." 🙂

That would be tough if rmac just removed every "clr.l" from the code 🙂

ggn · October 11, 2022

5 hours ago, 42bs said:

"Warning: CLR.L opcode ignored..." 🙂

That would be tough if rmac just removed every "clr.l" from the code 🙂

Well, I guess we could range-check the clr.l instructions when outputting Jaguar code (whenever possible) and issue a warning. Or we could simply tell everyone off for using clr.l on memory, as it's a load-modify-store instruction and wastes cycles.
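
A range check of this kind could look roughly like the sketch below. It is written in Python for illustration and is not rmac's implementation. The function name is hypothetical, and the GPU local RAM range ($F03000 to $F03FFF) is stated here as an assumption.

```python
# Assumed address range of the Jaguar GPU's local RAM (4 KiB at $F03000).
GPU_RAM_START = 0xF03000
GPU_RAM_END = 0xF04000  # exclusive


def clr_hits_gpu_ram(mnemonic, address, size_bytes=4):
    """Return True if a CLR with an absolute destination touches GPU RAM.

    `mnemonic` is e.g. "clr.l"; `address` is the absolute destination
    address, or None when it is not known at assembly time.
    """
    if address is None:
        return False  # register-indirect etc.: cannot be checked statically
    if not mnemonic.lower().startswith("clr"):
        return False
    # The access covers [address, address + size_bytes).
    return address < GPU_RAM_END and address + size_bytes > GPU_RAM_START
```

As the post says, this only works "whenever possible": a `clr.l (a0)` whose target is computed at run time cannot be caught by an assembler.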

42bs · October 12, 2022

clr.l is load-modify-store? Why do you think so?

Cyprian · October 12, 2022

OK. The clr instruction is buggy on the 68k. I wonder what would happen if you used e.g. "move.l" instead.

"CLR instruction always reads from an operand before clearing it"

http://www.easy68k.com/paulrsm/doc/trick68k.htm

22 hours ago, 42bs said:

using the interrupt speeds things up, as the 68k does not hog the bus.
TP-Fast w/ interrupt and `stop #$2000`: 122ms
TP-Fast w/o interrupt: 154ms

Nice hint, thanks.

42bs · October 12, 2022

1 hour ago, Cyprian said:

OK. The clr instruction is buggy on the 68k. I wonder what would happen if you used e.g. "move.l" instead.

"CLR instruction always reads from an operand before clearing it"

http://www.easy68k.com/paulrsm/doc/trick68k.htm

Ouch! Thanks! This explains why "clr.l d0" takes 6 cycles.

Edited October 12, 2022 by 42bs

Chilly Willy · October 16, 2022

Yeah, lots of 68K based computers/consoles (like the Amiga and Genesis) warned against using clr on hardware registers. It's one of those things you learned while programming in assembly on those systems. Jaguar is just another with limitations on instructions like clr.

42bs · October 17, 2022

8 hours ago, Chilly Willy said:

Yeah, lots of 68K based computers/consoles (like the Amiga and Genesis) warned against using clr on hardware registers. It's one of those things you learned while programming in assembly on those systems. Jaguar is just another with limitations on instructions like clr.

Yep, HW registers are often a source of trouble. But I did not know that "clr.l" reads its destination (on the 68k). You never stop learning new things.

DEATH · November 19, 2022

"CLR" is not buggy on the 68000. It's an instruction intended for multiprocessor architectures or for subroutine/conditional speed tests.
This is the "reverse" instruction of "TAS".

TAS: test and set; CLR: test and clear.

This is the reason why it cannot be used on address registers: it's reserved for testing data only (again, on a multiprocessor, multithread/subroutine or whatever architecture).

CLR (and TAS) should therefore always be followed by a conditional jump.

DEATH · November 19, 2022

This should be tested, because it was a long time ago...

SCPCD · November 20, 2022

I don't think that the CLR instruction is the "reverse" of TAS, as CLR sets the flags according to the destination value (which is always 0).

TAS sets the flags depending on what was there before it sets the bit.

I would imagine that, to reduce transistor count, the CLR instruction probably shares all or part of its state machine with another instruction such as NEG, for example (as they both have the same size field, effective address modes and timing).
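
The flag difference between the two instructions can be illustrated with a small model. This is a Python sketch of the documented 68000 condition codes for byte-sized CLR and TAS, not a cycle-accurate emulation. The function names are made up, and the X flag, which neither instruction changes, is left out.

```python
def clr_b(old_value):
    """Model of CLR.B: the result is always 0, so the flags are constant."""
    flags = {"N": 0, "Z": 1, "V": 0, "C": 0}
    return 0, flags


def tas(old_value):
    """Model of TAS: flags describe the byte as it was, then bit 7 is set."""
    old_value &= 0xFF
    flags = {"N": old_value >> 7, "Z": int(old_value == 0), "V": 0, "C": 0}
    return old_value | 0x80, flags
```

After `clr_b` the flags say nothing about the previous contents, so a conditional branch following CLR cannot test the old value. After `tas`, the Z and N flags do reflect it.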

Edited November 20, 2022 by SCPCD

42bs · November 20, 2022

Also TAS is only defined for byte access.

The m680xx manual lists only TAS (for m68000) and CAS/CAS2 (x40) as multi-cpu instructions.

Another hint that @SCPCD is right is the fact that the m68000 and m68008 read the destination.

DEATH · November 20, 2022

Like I said, it was a long time ago and I don't remember very well...
As I recall, the CLR instruction was already causing problems as soon as the 68000 came out. The recommendation, the rule, was to NEVER use the CLR instruction (or to be very careful with it) on a multi-processor/multi-bus-master system, or even on a single-CPU/single-bus-master system.
For a single-CPU/bus-master system, the reason was that the instruction takes an extra read cycle (whatever the reason, even though it only erases an operand...). For a multi-processor/bus-master system it was (from memory) because the read-modify-write cycle of this instruction could be "broken" in the middle, and the end result could then be unexpected. So you had to be careful where (and/or when) to use it.
Officially, only the TAS instruction uses the 68000's RMW cycle, which cannot be split. Logically, any other instruction that does a sort of RMW does so with a simple read cycle, an ALU cycle and then a write cycle, and the bus can be taken over by another bus master between the read and the write cycle. There may be a misunderstanding somewhere, a bug in the documentation or something else...
I think it may be best never to use an "RMW"-type instruction on the Jaguar on data shared between multiple processors (edit: except for the TAS instruction).
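
What a "broken" read-modify-write looks like can be shown with a toy model. This is a Python sketch under simplifying assumptions: memory is a dict, the second bus master is a callback, and all function names are hypothetical.

```python
def other_master_writes(addr, value):
    """Build a callback standing in for a second bus master's write."""
    def intrude(memory):
        memory[addr] = value
    return intrude


def split_rmw(memory, addr, modify, intruder):
    """Separate read and write cycles; the bus can change hands in between."""
    old = memory[addr]           # read cycle
    intruder(memory)             # another bus master gets the bus here
    memory[addr] = modify(old)   # write cycle, based on the stale value


def indivisible_rmw(memory, addr, modify, intruder):
    """TAS-style cycle: read and write happen with no gap between them."""
    memory[addr] = modify(memory[addr])
    intruder(memory)             # the other master only gets the bus afterwards
```

With a counter at address `0x10` holding 5, an increment as `modify` and another master writing 100, `split_rmw` leaves 6 in memory, so the other master's write is silently lost. `indivisible_rmw` leaves 100.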

Edited November 20, 2022 by DEATH

DEATH · November 20, 2022

Oh, also: for some hardware registers this may have unexpected consequences, because some registers can be modified (or cause an event) just by being read.
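
The effect of the hidden read can be sketched with a toy register model. This is Python for illustration only: `ClearOnReadRegister` is a hypothetical register whose pending status is acknowledged by any read, and it does not describe a specific Jaguar register.

```python
class ClearOnReadRegister:
    """Hypothetical status register: reading it acknowledges pending events."""

    def __init__(self, pending):
        self.pending = pending
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        value, self.pending = self.pending, 0  # the read itself clears the status
        return value

    def write(self, value):
        self.writes += 1  # writes do not touch the pending status in this model


def clr_style(reg):
    reg.read()    # the 68000's CLR reads the destination first ...
    reg.write(0)  # ... and then writes zero


def move_style(reg):
    reg.write(0)  # a plain store of zero: one write, no read
```

In this model, `clr_style` loses the pending events even though the programmer only intended a write, while `move_style` leaves them intact.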
