
Cinepak, what's missing?


Saturn


@Chilly Willy, what resolution and frame rate could a ROQ file play at once there is code for a GPU player?

ROQ format is pretty simple, and could easily be accelerated with the blitter. It's mostly copying small blocks... the screen is split into 16x16 pixel macroblocks. Each macroblock is always split into four 8x8 blocks. At that point, one of three things occur: the 8x8 block could be copied from the previous screen +/- several pixels motion; the 8x8 block might be drawn using a scaled up 4x4 block from the 4x4 codebook; or the 8x8 block may be split into four 4x4 blocks. Each 4x4 block may have one of three things occur: it may be copied from the previous screen +/- several pixels motion; it may be drawn from a 4x4 block in the 4x4 codebook; or it will be drawn using four 2x2 blocks from the 2x2 codebook.
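To make that layout concrete, here is a small sketch of how a decoder finds the top-left pixel of each 8x8 block and 4x4 subblock in a frame buffer. It uses the same offset arithmetic as the C player posted later in this thread. `stride` (pixels per row) and the helper names are illustrative assumptions, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pixel offset of 8x8 block `block` (0..3 in raster
   order: top-left, top-right, bottom-left, bottom-right) inside the 16x16
   macroblock at (mb_x, mb_y). */
static size_t block8_offset(size_t mb_x, size_t mb_y, unsigned block, size_t stride)
{
    assert(block < 4);
    size_t mb_offset = mb_y * 16 * stride + mb_x * 16;
    return mb_offset + (block / 2) * 8 * stride + (block % 2) * 8;
}

/* Hypothetical helper: pixel offset of 4x4 subblock `sub` (0..3, same
   raster order) inside the 8x8 block that starts at `block8_off`. */
static size_t block4_offset(size_t block8_off, unsigned sub, size_t stride)
{
    assert(sub < 4);
    return block8_off + (sub / 2) * 4 * stride + (sub % 2) * 4;
}
```

A 2x2 cell inside a 4x4 subblock follows the same pattern one level further down, with steps of 2 instead of 4.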

 

The N64 is a 94MHz MIPS, and can handle all that in plain C code in real time for low res video (320x240-ish) up to 30Hz. Decent GPU assembly + BLITTER acceleration can probably handle the same thing. This is one of the things I'll be playing with this winter on the skunkboard. Get it working on small clips in rom to start, then work on a streaming version.



What would be the benefit of this over Cinepak or whatever is already available?

 

Or is this just for fun? Or both?


I think it would be good, because this video format is widely used and somewhat more modern than Cinepak.

Also, the source would be available, unlike the codec.o of the Cinepak decoder, whose source was never released and which is copyrighted.

 

With today's knowledge of these RISC chipsets, I guess a better-optimized decoder could be written. I can't wait to see what Willy will show us... it's almost winter :D


Most folks probably remember ROQ from Quake3, although it was also in a couple of other games. Quake3 videos are 512x256@30Hz. ROQ was never patented, and there's been an open source encoder for many years, which eventually made it into ffmpeg and libav some years ago. It's easy enough to transcode video into ROQ... here are some lines I used for transcoding Cowboy Bebop:

 

288x208@24Hz - Q=14

ffmpeg -i Cowboy_Bebop_01.mkv -c:v roqvideo -r 24 -q:v 14 -vf scale=288:208 -map 0:0 -c:a adpcm_ima_wav -ar 22050 -ac 2 -map 0:2 CB01_r24_q14.avi

288x208@12Hz - Q=1

ffmpeg -i Cowboy_Bebop_01.mkv -c:v roqvideo -r 12 -q:v 1 -vf scale=288:208 -map 0:0 -c:a adpcm_ima_wav -ar 22050 -ac 2 -map 0:2 CB01_r12_q1.avi

You can do Cinepak by simply changing the video codec from "-c:v roqvideo" to "-c:v cinepak". The quality constant doesn't mean the same thing for both codecs, other than 1 = best and 31 = worst. 12/15 Hz videos can use 1 to 5 and still fit in a 1X CD rate, while 24/30 Hz videos will need 10 to 15. Note that you need a version of ffmpeg that is newer than 14.0 to have the Cinepak encoder, which was added in Jan 2014.
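A rough way to see why the lower frame rates can afford better quality settings: split the 1X CD budget across the frames after the audio track has been paid for. The 1X figure (75 sectors/s of 2048 bytes) and the ADPCM size (4 bits per sample, block headers ignored) are my assumptions rather than numbers from the post; the audio settings match the ffmpeg lines above.

```c
#include <assert.h>

/* Assumption: 1X CD = 75 sectors/s x 2048 data bytes per sector. */
#define CD_1X_BYTES_PER_SEC (75L * 2048L)   /* 153600 bytes/s */

/* Assumption: IMA ADPCM at 4 bits per sample, block headers ignored. */
static long adpcm_bytes_per_sec(long sample_rate, int channels)
{
    return sample_rate * channels * 4 / 8;
}

/* Average bytes left for each video frame once the audio is streamed. */
static long video_bytes_per_frame(long fps, long sample_rate, int channels)
{
    assert(fps > 0);
    long audio = adpcm_bytes_per_sec(sample_rate, channels);
    assert(audio < CD_1X_BYTES_PER_SEC);
    return (CD_1X_BYTES_PER_SEC - audio) / fps;
}
```

With 22050 Hz stereo ADPCM that leaves about 5,481 bytes per frame at 24 Hz versus about 10,962 bytes at 12 Hz, which is consistent with the 12 Hz encodes tolerating a much better q value.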

 

I'm trying to keep my frame sizes under 128KB for 16-bit RGB, since that's what the 32X has. For better consoles like the Jaguar, I can go slightly higher, doing 320x224 for 4:3 videos (NTSC). I use 320x176 for 16:9 on everything, since that fits in 128KB just fine.
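The 128KB budget is easy to check: at 16 bits per pixel a frame costs width x height x 2 bytes. A minimal C helper (the names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* 128KB frame budget, the 32X figure from the post. */
#define FRAME_LIMIT_BYTES (128L * 1024L)

static long frame_bytes(long width, long height, int bytes_per_pixel)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

static bool fits_frame_limit(long width, long height, int bytes_per_pixel)
{
    return frame_bytes(width, height, bytes_per_pixel) <= FRAME_LIMIT_BYTES;
}
```

320x176 comes to 112,640 bytes and 288x208 to 119,808 bytes, both under 131,072, while 320x224 needs 143,360 bytes, which is why that size is reserved for the Jaguar.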

 

For ROQ, you need to keep videos to a multiple of 16 in both dimensions, since ROQ uses 16x16 pixel macroblocks. Cinepak uses strips of 4x4 or smaller blocks, so Cinepak needs to be kept to multiples of 4 in both dimensions. Cinepak and ROQ both give similar results at about the same bitrate, with some videos maybe favoring one format over the other slightly. Cowboy Bebop looks a little less blocky on ROQ. Perhaps the biggest difference between the two is the player code license: all the Cinepak player code I have seen is LGPL, while the ROQ players are all variations of BSD. So if you are working on closed source, ROQ is your choice. If it doesn't matter because you are making LGPL-compatible open source, you could use either one.
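A tiny check along those lines can be handy before picking a `-vf scale=` size. The helper names and the round-down policy are my own illustration, not part of ffmpeg:

```c
#include <assert.h>
#include <stdbool.h>

/* ROQ codes 16x16 macroblocks; Cinepak codes 4x4 (or smaller) blocks. */
enum codec { CODEC_ROQ, CODEC_CINEPAK };

static int block_align(enum codec c)
{
    return c == CODEC_ROQ ? 16 : 4;
}

/* True if both dimensions are whole multiples of the codec's block size. */
static bool dims_ok(enum codec c, int width, int height)
{
    int a = block_align(c);
    assert(width > 0 && height > 0);
    return width % a == 0 && height % a == 0;
}

/* Round a dimension down to the nearest size the codec accepts. */
static int round_down_dim(enum codec c, int dim)
{
    int a = block_align(c);
    assert(dim >= a);
    return dim - dim % a;
}
```

For example, a 320x180 target is fine for Cinepak but would have to drop to 320x176 for ROQ.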

Edited by Chilly Willy

Nice explanation, can't wait to see your first implementation running :-)

A pure optimized RISC decoder would rule. Can ROQ also do 24-bit color?

ROQ is YUV, so it can easily do 24-bit color. My N64 player outputs 32-bit color. The codebook conversion looks like this:

 

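/* Convert one 2x2 codebook entry (Y0 Y1 Y2 Y3 U V) into four RGBA8888
   pixels and return the bytestream pointer just past the entry.
   (Function name and the 0xRRGGBBAA packing are illustrative.) */
static const unsigned char *unpack_cb2x2(const unsigned char *buf, unsigned int out[4])
{
    int y[4], u, v, u1, u2, v1, v2, r, g, b, j;

    /* unpack the YUV components from the bytestream */
    for (j = 0; j < 4; j++)
        y[j] = *buf++;
    u  = *buf++;
    v  = *buf++;
    u -= 128;
    v -= 128;
    /* CCIR 601 conversion factors in 8.8 fixed point */
    u1 = (88 * u) >> 8;     /* ~0.344 * U, used for G */
    u2 = (453 * u) >> 8;    /* ~1.77 * U, used for B */
    v1 = (359 * v) >> 8;    /* ~1.402 * V, used for R */
    v2 = (183 * v) >> 8;    /* ~0.714 * V, used for G */
    /* convert to RGBA8888 */
    for (j = 0; j < 4; j++)
    {
        r = y[j] + v1;
        g = y[j] - v2 - u1;
        b = y[j] + u2;
        /* clamp each component to 0..255 */
        if (r < 0) r = 0;
        else if (r > 255) r = 255;
        if (g < 0) g = 0;
        else if (g > 255) g = 255;
        if (b < 0) b = 0;
        else if (b > 255) b = 255;
        /* pack as 0xRRGGBBAA with opaque alpha */
        out[j] = ((unsigned int)r << 24) | ((unsigned int)g << 16) |
                 ((unsigned int)b << 8) | 0xFF;
    }
    return buf;
}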

At that point, you can make the codebook entry 24-bit RGB, 16-bit RGB, 15-bit RGB, 16-bit CRY... RGB to CRY conversion is pretty fast:

 

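#include <stdint.h>

/* cry[] is the 32K-entry RGB-to-CRY color lookup table (not shown),
   assumed here to hold one byte per 15-bit RGB index. */
extern const uint8_t cry[32768];

uint16_t rgb2cry(int32_t r, int32_t g, int32_t b)
{
	uint16_t	intensity;
	uint16_t	color_index;

	intensity = r;						/* start with red */
	if(g > intensity)
		intensity = g;
	if(b > intensity)
		intensity = b;					/* get highest RGB value */
	if(intensity != 0)
	{
		r = (uint32_t)r * 255 / intensity;
		g = (uint32_t)g * 255 / intensity;
		b = (uint32_t)b * 255 / intensity;
	}
	else
		r = g = b = 0;					/* R, G, B, were all 0 (black) */

	/* build a 15-bit RGB index: 5 bits each of R, G, B */
	color_index = (r & 0xF8) << 7;
	color_index += (g & 0xF8) << 2;
	color_index += (b & 0xF8) >> 3;

	/* CRY word: color byte in the high half, intensity in the low half */
	return (uint16_t)(((uint16_t)cry[color_index] << 8) | (uint8_t)intensity);
}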

We've already got R, G, and B as 8-bit ints, so then converting to CRY is simple. I'll probably replace that 255 / intensity calculation with a table lookup... use a small fixed point number so that you get a multiply and a shift instead of the divide.
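One way to do that is sketched below: precompute 255/intensity as a 16.16 fixed-point reciprocal for every intensity, rounded up, so each component costs a multiply and a shift. The table name and the 16-bit fraction are my assumptions. For 8-bit inputs, rounding the reciprocal up makes the result match the integer divide exactly (the error term stays below 255/65536, smaller than the 1/255 minimum gap to the next integer), and the test checks every case.

```c
#include <assert.h>
#include <stdint.h>

/* scale_tab[i] = ceil(255 * 65536 / i) for i = 1..255 (16.16 fixed point). */
static uint32_t scale_tab[256];

static void init_scale_tab(void)
{
    scale_tab[0] = 0;   /* intensity 0 is black and never gets scaled */
    for (uint32_t i = 1; i < 256; i++)
        scale_tab[i] = (255u * 65536u + i - 1) / i;
}

/* Same result as c * 255 / intensity for 0 <= c <= 255, 1 <= intensity <= 255.
   The largest product, 255 * scale_tab[1], still fits in 32 bits. */
static uint32_t scale_component(uint32_t c, uint32_t intensity)
{
    assert(c <= 255 && intensity >= 1 && intensity <= 255);
    return (c * scale_tab[intensity]) >> 16;
}
```

On the Jaguar the table would cost 1KB as 32-bit entries, so whether it belongs in GPU local RAM or main memory is a trade-off to measure.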


Don't forget you'll have to run this for every output pixel, and the GPU has only 4 KB of local memory where you have to fit code, data, and LUTs. And you can only read/write it with 32-bit accesses; if you need smaller data, you either pad it and waste memory, or waste cycles on shifting & masking.

 

I don't want to rain on your parade, but I wouldn't be too optimistic about this until you've actually written and benchmarked the code.
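To make that trade-off concrete: with only 32-bit accesses, a byte table either spends a whole long per entry, or packs four entries per long and pays a shift and a mask on every read. Below is a C model of the packed variant, assuming big-endian byte order as on the Jaguar; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Read byte `idx` from a table packed four bytes per 32-bit word, using
   only whole-word loads plus shift and mask (big-endian byte order). */
static uint8_t packed_byte(const uint32_t *words, uint32_t idx)
{
    uint32_t word = words[idx >> 2];        /* one 32-bit load */
    uint32_t shift = 24 - 8 * (idx & 3);    /* byte 0 is the top byte */
    return (uint8_t)((word >> shift) & 0xFF);
}

/* Local-RAM cost in bytes of an n-entry byte table in each layout. */
static uint32_t padded_table_bytes(uint32_t n) { return n * 4; }
static uint32_t packed_table_bytes(uint32_t n) { return (n + 3) / 4 * 4; }
```

A 256-entry byte table costs 1KB of the 4KB when padded but only 256 bytes when packed; the packed layout buys that space with extra shift and mask work on every read.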


Actually, the code I posted is for each entry of the 2x2 codebook, making it a max of 1024 times per frame. Actually unpacking the frame is like this:
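For scale, here is the worst case for that codebook conversion next to converting every pixel of a 320x176 frame (the 16:9 size mentioned earlier in the thread):

$$
\underbrace{256}_{\text{2x2 entries}} \times \underbrace{4}_{\text{pixels each}} = 1024
\qquad\text{vs.}\qquad
320 \times 176 = 56\,320,
\qquad
\frac{56\,320}{1024} = 55
$$

The 4x4 codebook adds no conversions, since each 4x4 entry is built from four indices into the 2x2 codebook.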

 

static int roq_unpack_vq(unsigned char *buf, int size, unsigned int arg, quit_callback quit_cb)
{
    int status = ROQ_SUCCESS;
    int mb_x, mb_y;
    int block;     /* 8x8 blocks */
    int subblock;  /* 4x4 blocks */
    int i;

    /* frame and pixel management */
    unsigned int *this_frame;
    unsigned int *last_frame;

    int line_offset;
    int mb_offset;
    int block_offset;
    int subblock_offset;

    unsigned int *this_ptr;
    unsigned int *last_ptr;
    unsigned int *vector;

    /* bytestream management */
    int index = 0;
    int mode_set = 0;
    int mode, mode_lo, mode_hi;
    int mode_count = 0;

    /* vectors */
    int mx, my;
    int motion_x, motion_y;
    unsigned char data_byte;

	if (dc)
	{
		sync_audio();
		display_show(dc);
	}
    while (!(dc = display_lock()));
    current_frame = (int)dc - 1;
    if (current_frame == 0)
    {
        this_frame = frame[0];
        last_frame = frame[1];
    }
    else
    {
        this_frame = frame[1];
        last_frame = frame[0];
    }

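    /* mean motion vector: two signed bytes packed in the chunk argument */
    mx = (signed char)((arg >> 8) & 0xFF);
    my = (signed char)(arg & 0xFF);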

    for (mb_y = 0; mb_y < mb_height && status == ROQ_SUCCESS; mb_y++)
    {
        line_offset = mb_y * 16 * stride;
        for (mb_x = 0; mb_x < mb_width && status == ROQ_SUCCESS; mb_x++)
        {
            mb_offset = line_offset + mb_x * 16;
			/* macro blocks are 16x16 and are subdivided into four 8x8 blocks */
            for (block = 0; block < 4 && status == ROQ_SUCCESS; block++)
            {
                block_offset = mb_offset + (block / 2 * 8 * stride) + (block % 2 * 8);
                /* each 8x8 block gets a mode */
                GET_MODE();
                switch (mode)
                {
                case 0:  /* MOT: skip */
                    break;

                case 1:  /* FCC: motion compensation */
                    GET_BYTE(data_byte);
                    motion_x = 8 - (data_byte >>  4) - mx;
                    motion_y = 8 - (data_byte & 0xF) - my;
                    last_ptr = last_frame + block_offset +
                        (motion_y * stride) + motion_x;
                    this_ptr = this_frame + block_offset;
                    for (i = 0; i < 8; i++)
                    {
                        *this_ptr++ = *last_ptr++;
                        *this_ptr++ = *last_ptr++;
                        *this_ptr++ = *last_ptr++;
                        *this_ptr++ = *last_ptr++;
                        *this_ptr++ = *last_ptr++;
                        *this_ptr++ = *last_ptr++;
                        *this_ptr++ = *last_ptr++;
                        *this_ptr++ = *last_ptr++;

                        last_ptr += stride - 8;
                        this_ptr += stride - 8;
                    }
                    break;

                case 2:  /* SLD: upsample 4x4 vector */
                    GET_BYTE(data_byte);
                    vector = cb4x4[data_byte];
                    for (i = 0; i < 4*4; i++)
                    {
                        this_ptr = this_frame + block_offset +
                            (i / 4 * 2 * stride) + (i % 4 * 2);
                        this_ptr[0] = *vector;
                        this_ptr[1] = *vector;
                        this_ptr[stride+0] = *vector;
                        this_ptr[stride+1] = *vector;

                        vector++;
                    }
                    break;

                case 3:  /* CCC: subdivide into four 4x4 subblocks */
                    for (subblock = 0; subblock < 4; subblock++)
                    {
                        subblock_offset = block_offset + (subblock / 2 * 4 * stride) + (subblock % 2 * 4);

                        GET_MODE();
                        switch (mode)
                        {
                        case 0:  /* MOT: skip */
                             break;

                        case 1:  /* FCC: motion compensation */
                            GET_BYTE(data_byte);
                            motion_x = 8 - (data_byte >>  4) - mx;
                            motion_y = 8 - (data_byte & 0xF) - my;
                            last_ptr = last_frame + subblock_offset +
                                (motion_y * stride) + motion_x;
                            this_ptr = this_frame + subblock_offset;
                            for (i = 0; i < 4; i++)
                            {
                                *this_ptr++ = *last_ptr++;
                                *this_ptr++ = *last_ptr++;
                                *this_ptr++ = *last_ptr++;
                                *this_ptr++ = *last_ptr++;

                                last_ptr += stride - 4;
                                this_ptr += stride - 4;
                            }
                            break;

                        case 2:  /* SLD: use 4x4 vector from codebook */
                            GET_BYTE(data_byte);
                            vector = cb4x4[data_byte];
                            this_ptr = this_frame + subblock_offset;
                            for (i = 0; i < 4; i++)
                            {
                                *this_ptr++ = *vector++;
                                *this_ptr++ = *vector++;
                                *this_ptr++ = *vector++;
                                *this_ptr++ = *vector++;

                                this_ptr += stride - 4;
                            }
                            break;

                        case 3:  /* CCC: subdivide into four 2x2 subblocks */
                            GET_BYTE(data_byte);
                            vector = cb2x2[data_byte];
                            this_ptr = this_frame + subblock_offset;
                            this_ptr[0] = vector[0];
                            this_ptr[1] = vector[1];
                            this_ptr[stride+0] = vector[2];
                            this_ptr[stride+1] = vector[3];
                            GET_BYTE(data_byte);
                            vector = cb2x2[data_byte];
                            this_ptr[2] = vector[0];
                            this_ptr[3] = vector[1];
                            this_ptr[stride+2] = vector[2];
                            this_ptr[stride+3] = vector[3];
                            this_ptr += stride * 2;

                            GET_BYTE(data_byte);
                            vector = cb2x2[data_byte];
                            this_ptr[0] = vector[0];
                            this_ptr[1] = vector[1];
                            this_ptr[stride+0] = vector[2];
                            this_ptr[stride+1] = vector[3];
                            GET_BYTE(data_byte);
                            vector = cb2x2[data_byte];
                            this_ptr[2] = vector[0];
                            this_ptr[3] = vector[1];
                            this_ptr[stride+2] = vector[2];
                            this_ptr[stride+3] = vector[3];
                            break;
                        }
                    }
                    break;
                }
            }
        }
    }

    /* if client program defined a quit callback, check if it's time to quit */
    if (quit_cb && quit_cb())
		return ROQ_USER_INTERRUPT;

    /* sanity check to see if the stream was fully consumed */
    if (status == ROQ_SUCCESS && index < size-2)
    {
        status = ROQ_BAD_VQ_STREAM;
    }

    return status;
}
Notice that it's merely a bunch of moves. Should be pretty easy to convert to assembly and maybe use the blitter... if it's needed. I'll try it without the blitter to start and see what I can get away with.
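The player above relies on GET_MODE() and GET_BYTE() macros that weren't posted. As a hedged stand-in, here is one way to read ROQ's interleaved VQ stream, following the layout used by libavcodec's ROQ decoder as I understand it: a little-endian 16-bit flag word is fetched whenever the previous one runs out and yields eight 2-bit modes, most significant pair first, while argument bytes are read from the same stream in between. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader for the VQ chunk bytestream. */
typedef struct {
    const uint8_t *buf;
    size_t size;
    size_t pos;        /* next byte to read */
    uint16_t flags;    /* current 16-bit mode word */
    int pairs_left;    /* 2-bit modes not yet taken from `flags` */
} roq_reader;

static void roq_reader_init(roq_reader *r, const uint8_t *buf, size_t size)
{
    assert(buf != NULL || size == 0);
    r->buf = buf;
    r->size = size;
    r->pos = 0;
    r->flags = 0;
    r->pairs_left = 0;
}

/* Returns the next argument byte, or -1 if the stream is exhausted. */
static int roq_get_byte(roq_reader *r)
{
    return r->pos < r->size ? r->buf[r->pos++] : -1;
}

/* Returns the next mode (0 MOT, 1 FCC, 2 SLD, 3 CCC), or -1 on underrun. */
static int roq_get_mode(roq_reader *r)
{
    if (r->pairs_left == 0) {
        if (r->size - r->pos < 2)
            return -1;
        r->flags = (uint16_t)(r->buf[r->pos] | (r->buf[r->pos + 1] << 8));
        r->pos += 2;
        r->pairs_left = 8;
    }
    r->pairs_left--;
    return (r->flags >> (r->pairs_left * 2)) & 3;
}
```

In the player, GET_MODE() would map to roq_get_mode() and GET_BYTE() to roq_get_byte(), with a negative return setting status to ROQ_BAD_VQ_STREAM.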

Oh, I thought Linko told us the DSP could run 2 separate programs.

 

That might have been Adisak Pochanayon you were thinking of?

 

 

 

fwiw the DSP code I wrote for NBA Jam was amazing. Ran audio in one context and supported full async code in the other. You could literally run two local programs (audio mixing + user code) locally on DSP.

Remember that decoding the frame IS just moving data. The GPU has a much better bus than the DSP. You could feed the compressed audio to the DSP for decompressing to take a little pressure off the GPU, but you really want the GPU (or GPU+BLITTER) handling the video. The code I posted above for decoding the frame moves 32-bit RGBA entries from the codebooks. You'll halve the bandwidth by using CRY mode instead. That was why I posted about converting RGB to CRY when decoding the codebooks. If that takes too long, maybe going with 16-bit RGB would be better. My point was you only need to convert the codebook entries to CRY, not every single pixel.
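A sketch of that idea: convert each 2x2 codebook entry once when the codebook chunk arrives, so the frame loop only copies 16-bit words. Assumptions: entries are held as 0xRRGGBBAA words like the RGBA8888 output above, and a plain RGB565 packer stands in for the converter; an rgb2cry()-based function with the same signature would give CRY entries instead. The names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CB2X2_ENTRIES 256   /* a ROQ 2x2 codebook holds up to 256 entries */

typedef uint16_t (*pixel_conv_fn)(uint32_t rgba);

/* Stand-in converter: 0xRRGGBBAA -> RGB565 (alpha dropped). */
static uint16_t rgba_to_rgb565(uint32_t rgba)
{
    unsigned r = (rgba >> 24) & 0xFF;
    unsigned g = (rgba >> 16) & 0xFF;
    unsigned b = (rgba >> 8) & 0xFF;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert the used part of the 2x2 codebook once per codebook chunk. */
static void convert_cb2x2(const uint32_t src[][4], uint16_t dst[][4],
                          size_t count, pixel_conv_fn conv)
{
    assert(count <= CB2X2_ENTRIES);
    for (size_t i = 0; i < count; i++)
        for (int j = 0; j < 4; j++)
            dst[i][j] = conv(src[i][j]);
}
```

Each frame then moves 2 bytes per pixel instead of 4, e.g. 112,640 bytes instead of 225,280 for a fully redrawn 320x176 frame.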


If CRY would halve the bandwidth use, wouldn't it be better to have a ROQ encoder that encodes CRY?

 

I read something similar in the dev manual about Cinepak; there is a line that claims 10% more speed when using CRY.

Apple's Cinepak encoder for the Jaguar does support CRY mode, but I've never tested it...


Did anyone use my Cinepak "library" to produce a game?

So, don't bother spending weeks on a ROQ decoder, nobody will use it :)

I don't care if no one ever uses it; I just want something other than the binary-blob Cinepak driver for folks. It's also something "fun" to work on.

 

Also, the hoops you have to jump through to get a stream you can use with the binary-blob driver are probably WHY no one uses your Cinepak tools. It's a pain. I want a player that takes a plain AVI with regular Cinepak/ROQ. No conversions needed. Which answers TXG above: I COULD make a tool that goes through the ROQ stream converting YUV to CRY beforehand, and it would make playback faster; however, it's one more thing that discourages people from trying it at all. We want support for PLAIN streams with no jumping through hoops.

Orion, perhaps even you would find this more convenient to use?


It's like always in the Jaguar community: too much talking about what we could do, and so few people actually DOING and releasing something. (Like, a Cinepak player ready to use for coders, maybe? Did that ever happen before my package release?)

So let's see: coding a realtime ROQ decoder in GPU assembly, not to mention the Jag CD data streaming (which is not that obvious, trust me).

I just wonder how much time this will take to code for someone who, I guess, has almost never coded for the Jaguar/JagCD.

But you can surprise me :)

 

JagChris > I use the cinepak player in my last 3 Jaguar CD Games, so it's already convenient to use for me :) (seems like I'm the only user anyway)


I'm not COMPLETELY reinventing the wheel; I'm just trying to make it a little nicer, which HOPEFULLY entices some folks into using it. We have a very recent thread in this forum from someone having real problems getting the current Cinepak player to work. Part of that was problems in even making a compliant video file.

 

I already showed working C code for ROQ, and it's damn simple. Making a GPU assembly version, which might not even be needed, is straightforward for anyone who has done any assembly. As for the CD streaming code, yes, that is a difficult part of the task, but maybe that part I can reuse directly (provided the different code libraries have non-conflicting licenses). You did an excellent job of altering the existing player code to allow ADPCM while still keeping the original Cinepak decoder. I basically just want to make the video part a bit better, just as you wanted to make the audio part a bit better. If we keep making little parts a little better, the whole becomes better and easier for folks.

 

And yes, I've not written anything for the Jaguar before. I've only written stuff for the SNES, SMS, Genesis, Sega CD, 32X, Saturn, Dreamcast, N64, GameCube, PSX, PS2, PS3, PSP, Amiga, Mac (68K and PPC), PC (from 286 to modern), Atari 8-bit... yeah, I'll NEVER be able to make something for the Jaguar. ;) :D

 

My main limitation is available time for all the tasks I set myself. If anything, I try to do too much. Some things wind up on the back burner while I work on something else that has caught my fancy. Right now, that something else is ROQ video.


Are you porting a video decoder just for fun? Good, that's the spirit!

 

Are you porting a video decoder and hoping someone other than you will actually use it? Don't get your hopes up too much. The whole "tools are too complicated to use" excuse is a red herring. Easy-to-use tools are a great thing, but even cranky tools won't stop anyone who actually wants to create something. If that is too much of an obstacle for you*, the Jaguar isn't the console you should be coding for in the first place.

 

(* generic "you", not Chilly Willy)


I've ported quite a few things that nobody ever wound up using... and some things a lot of people use. You never really know ahead of time, so yes, it's better if you port things for yourself and/or for fun, and then if someone else uses it, all the better. The main purpose behind porting ROQ is actually to make a "universal" format for old consoles: 32X, Saturn, Dreamcast, Jaguar, PS1, PS2, PSP, N64, NGC... I encode whatever into a ROQ AVI and can then play it on any of my old consoles. I don't know if I'll use it for anything more than a video player, but you never know. I've got a bunch of videos like Hellsing Ultimate Abridged that I've been watching on my N64. It's pretty cool to see this stuff on these old consoles. I've encoded some music videos that I can leave running while I do other things... my player handles video files just like audio files: it plays the whole directory (in order or shuffled). I start it playing episode 1 of a series and it just keeps on playing until I stop it or it runs out of files.
