
Another "Bad Apple" demo



I've implemented a new version of the "Bad Apple" demo. Features:

  • Animation: 256x192 pixels at 25 fps or 30 fps in bitmap mode.
  • Sound: chiptune at 50 fps.
  • Vocals: linear predictive coding for the speech synthesizer at 40 fps.

You can find the source code, technical details, and the cartridge file on GitHub. The cartridge ROM is 4.5 MB. The demo runs properly at 50 Hz on European consoles. It runs too fast on US consoles, although the filled speech buffer keeps it at the intended speed during the vocal sections.

 

This has been a fun challenge with plenty of interesting rabbit holes that kept me busy for weeks:

  • Lossless video compression by writing minimal changes to the screen image table and the pattern table (and the color table, for colored animations).
  • Essentially streaming bytes from the cartridge to the video processor, the sound processor, and the speech synthesizer.
  • Titles and credits with ImageMagick.
  • Linear predictive coding with Praat instead of the more common BlueWizard and python_wizard. I'll go into it in another thread later.
  • The vocals were hard to analyze due to echo/reverb/overdub. I then created a tool to tune the vocals based on the music.
  • The project grew to the point that I've created a separate open source project with video tools for the TI-99/4A. I'll go into it in another thread later.
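
The "minimal changes" idea from the first bullet can be sketched roughly as follows. This is only an illustration in Python, not the encoder from the project: the function names `delta_runs` and `apply_runs`, the `max_gap` parameter, and the (address, bytes) output format are all assumptions made for this sketch.

```python
def delta_runs(previous, current, max_gap=2):
    """Return a list of (address, data) runs that turn `previous` into `current`.

    Hypothetical sketch: differing bytes that are at most `max_gap` apart are
    merged into one run, on the assumption that starting a new run (setting a
    new write address) costs a few extra bytes.
    """
    if len(previous) != len(current):
        raise ValueError("tables must have the same size")
    runs = []
    start = None      # start address of the run being built
    last_diff = None  # address of the most recent differing byte
    for address, (old, new) in enumerate(zip(previous, current)):
        if old == new:
            continue
        if start is None:
            start = address
        elif address - last_diff > max_gap:
            # Gap too large: close the current run and start a new one.
            runs.append((start, bytes(current[start:last_diff + 1])))
            start = address
        last_diff = address
    if start is not None:
        runs.append((start, bytes(current[start:last_diff + 1])))
    return runs


def apply_runs(table, runs):
    """Replay the runs on a copy of `table`, as a player would on video RAM."""
    result = bytearray(table)
    for address, data in runs:
        result[address:address + len(data)] = data
    return bytes(result)
```

Replaying the runs on the previous table reproduces the current table exactly, which is what makes the scheme lossless; an unchanged table produces no runs at all.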

 

Of course, @Tursi already created a version with full PCM sound a long time ago, and @Asmusr very recently created a version with vector quantization, but this was too exciting a challenge to pass up.

 

Version 1.0 had room for improvement and testing:

  • The speech synthesis is still horrendous! Could we start from a cleaner source? Text-to-speech, as @wierd_w suggested?
  • I've only tested in Mame on Linux. The ROM is too large for current hardware, but some gurus are working on it in this thread.

 

Version 2.0 already has better vocals:

Version 2.1 offers a PAL version as well as an NTSC version.

Edited by Eric Lafortune
Updated for version 2.1

Looks and sounds great!

 

Classic99 doesn't seem to manage 50 Hz correctly on my machine, so I can't actually run it properly, but it was really nice to see it go.

 

I was thinking, why not handle the 50/60 Hz difference yourself? It sounds hacky, but if you just count interrupts and skip every sixth one, you'll get 50 interrupts per second on a 60 Hz machine, and you probably won't be able to tell. (I say probably... the animation is really smooth, but I've done it in my PlayStation work and didn't see anything.)
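
A minimal sketch of that counting trick, in Python for readability rather than TMS9900 assembly. The function name `should_advance` and the choice to skip the last vsync of each group of six are assumptions for illustration only.

```python
def should_advance(vsync_count, skip_every=6):
    """Return True if the player should process a frame on this vsync.

    `vsync_count` counts interrupts from 0. Every `skip_every`-th interrupt
    is ignored, so 60 interrupts per second yield 50 processed frames.
    """
    return vsync_count % skip_every != skip_every - 1


# Count how many of one second's worth of 60 Hz interrupts are processed.
processed = sum(should_advance(n) for n in range(60))
```

The processed frames are not evenly spaced in time (five at 1/60 s, then a gap of 2/60 s), which is the slight unevenness the post suggests is hard to notice.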

I love the pixel fades and the way that the final credits come in. It's very slick.

 


Great work! The frame rate is amazing, never thought I'd see video this smooth on the TI. The vocals sound surprisingly 'recognizable' to my non-Japanese speaking ears, but they also sound horribly out of tune (at least on js99er) :). Having said that, I really like the idea of using PSG music with the speech synth to create the soundtrack.

 

Any idea what kind of bitrate this needs? How many bytes do you transfer to the VDP per frame?

 

Also, kudos on the excellent github pages.

Edited by TheMole

Great stuff! I will try to run this on the real hardware, and will save my excitement until that point - I will attempt not to test with an emulator. My StrangeCart has a 16MB serial flash chip, so it can store the whole cartridge. It will not be able to switch pages in an instant, like in a normal paged ROM. But I have space at least for a 128K cartridge in RAM, so I will try to implement a caching scheme, where I will just load the first blob of something like 64K into RAM for immediate paging, and then use the coprocessor to load upcoming 8K pages based on some heuristics, so that when the TMS9900 asks for a new page it would already be in RAM. The StrangeCart should be able to load data from the serial flash to RAM at a rate of 6MB per second (serial clock is 48MHz), so it is able to load new pages much faster than the TMS9900 can read them. It just needs to stay ahead :)


Thanks everyone!

 

@Tursi US version: Skipping every sixth vsync would be a very pragmatic solution for 60 Hz systems indeed; I can imagine it working fine. Streaming the same number of bytes in a shorter frame time and then doing nothing every sixth frame would be naggingly suboptimal though. The main reason was the VGM file available from the BBC version: it's sampled at 50 Hz. I should probably resample it and create a proper 60 Hz video.

 

@TheMole Vocals: The vocals sound out of tune in Mame too, even to my non-musical ears. Based on my understanding of the speech chirp table and experiments, I used the expression

synthesizer_pitch = 8000.0 / frequency_in_Hz - 1.0

The synthesizer pitch is then encoded as one of only 64 possible values, based on the synthesizer's hard-coded non-linear pitch table. Any error on my side and the quantization itself could explain the problem. @pixelpedant has derived a different table for his singing programs, from physical hardware and a tuner, but also noted the problem outside the midrange frequencies. Any thoughts, anyone? Would a vibrato mask the problem?
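
To make the quantization question concrete, here is a small Python sketch of the expression above followed by a nearest-entry lookup. The table `PLACEHOLDER_TABLE` is NOT the synthesizer's real pitch table; it is a made-up 64-entry stand-in so the code runs, and the helper names are hypothetical. The error in cents assumes the inverse of the same expression, i.e. frequency = 8000 / (pitch + 1).

```python
import math

# Made-up stand-in with 64 entries. Replace with the real chip table to get
# meaningful numbers; these values are for illustration only.
PLACEHOLDER_TABLE = [15 + 2 * i for i in range(64)]


def synthesizer_pitch(frequency_hz):
    """Ideal (unquantized) pitch value from the expression quoted above."""
    return 8000.0 / frequency_hz - 1.0


def quantize_pitch(frequency_hz, pitch_table):
    """Return (index, error_in_cents) for the table entry closest to the ideal pitch."""
    ideal = synthesizer_pitch(frequency_hz)
    index = min(range(len(pitch_table)), key=lambda i: abs(pitch_table[i] - ideal))
    # Frequency actually produced by the chosen entry, using the inverse expression.
    actual_hz = 8000.0 / (pitch_table[index] + 1.0)
    cents = 1200.0 * math.log2(actual_hz / frequency_hz)
    return index, cents


index, cents = quantize_pitch(440.0, PLACEHOLDER_TABLE)
```

Running this over every note of the melody with the real table would show which notes land far from a table entry, which might help separate quantization error from an error in the expression itself.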

 

Bandwidth: Averaged over the entire video of 12008 vsyncs:

  • VDP: 336 bytes/vsync
  • Sound: 3.3 bytes/vsync
  • Speech: 3.5 bytes/vsync

Averaged over the periods with actual updates in the respective streams:

  • VDP: 339 bytes/vsync
  • Sound: 3.7 bytes/vsync
  • Speech: 6.2 bytes/vsync

If I remember correctly, about 100-200 frames take too long due to outliers in the image complexity. Maxima:

  • VDP: 1852 bytes/vsync
  • Sound: 11 bytes/vsync
  • Speech: 8 bytes/vsync
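
For the bitrate question, the per-vsync averages above convert to per-second figures with simple arithmetic. The sketch below assumes the 50 Hz PAL vsync rate mentioned at the top of the thread; the names are made up for illustration.

```python
VSYNCS = 12008          # length of the video, from the post above
VSYNC_RATE_HZ = 50      # assumption: PAL timing
AVERAGE_BYTES_PER_VSYNC = {"VDP": 336, "Sound": 3.3, "Speech": 3.5}


def stream_summary(bytes_per_vsync, vsyncs=VSYNCS, rate_hz=VSYNC_RATE_HZ):
    """Convert a bytes/vsync average into per-second and whole-video figures."""
    bytes_per_second = bytes_per_vsync * rate_hz
    return {
        "bytes_per_second": bytes_per_second,
        "kilobits_per_second": bytes_per_second * 8 / 1000,
        "total_bytes": bytes_per_vsync * vsyncs,
    }


duration_seconds = VSYNCS / VSYNC_RATE_HZ
vdp = stream_summary(AVERAGE_BYTES_PER_VSYNC["VDP"])
all_streams = stream_summary(sum(AVERAGE_BYTES_PER_VSYNC.values()))
```

Under that assumption the VDP stream averages 16,800 bytes per second (about 134 kbit/s) over roughly 240 seconds, and the three streams together add up to about 4.1 million bytes.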

@speccery StrangeCart: Cool! Extra challenge: the video player code currently runs entirely from scratchpad RAM.

 

@TheBF Assembly macro definitions: I'm always hoping they could be useful to someone.


22 minutes ago, Eric Lafortune said:

Vocals: The vocals sound out of tune in Mame too, even to my non-musical ears. Based on my understanding of the speech chirp table and experiments, I used the expression...

Oh, I did not think they were too far out of tune, but barely recognizable as a human voice. On the other hand, I don't understand Japanese, let alone Japanese lyrics.


2 hours ago, mizapf said:

Oh, I did not think they were too far out of tune, but barely recognizable as a human voice. On the other hand, I don't understand Japanese, let alone Japanese lyrics.

All Japanese are robots, perfected in the obscure city of Prometheum which can only be reached by train.  Skip asking @Tursi about it, as they replaced his body with a mechanical body, too.


9 hours ago, Eric Lafortune said:

@Tursi US version: Skipping every sixth vsync would be a very pragmatic solution for 60 Hz systems indeed; I can imagine it working fine. Streaming the same number of bytes in a shorter frame time and then doing nothing every sixth frame would be naggingly suboptimal though. The main reason was the VGM file available from the BBC version: it's sampled at 50 Hz. I should probably resample it and create a proper 60 Hz video.

Suboptimal, yes, but only you will likely know. ;) Though it depends on whether there's enough time in a 60 Hz frame to push the video data.

 

That said, my VGMComp toolchain can resample the VGM for you. Not 100% sure how good a job it'll do, I've only tested going from 60 to 50, but once it's converted you can convert it back into an (unoptimized) VGM file for whatever toolchain you were using.

 


6 hours ago, OLD CS1 said:

All Japanese are robots, perfected in the obscure city of Prometheum which can only be reached by train.  Skip asking @Tursi about it, as they replaced his body with a mechanical body, too.

I'm getting to the age where I wish they would ;)

 


15 hours ago, mizapf said:

Oh, I did not think they were too far out of tune, but barely recognizable as a human voice. On the other hand, I don't understand Japanese, let alone Japanese lyrics.

Just to be clear, this is of course not against the Japanese language. It's a usual problem when languages have different phonemes, and your ears (better, the auditory cortex in the brain) are not trained for them. When I started to learn some Arabic, the flow of weird sounds slowly became a flow of words. The same is true for German, English, French etc.

 

13 hours ago, OLD CS1 said:

Skip asking @Tursi about it, as they replaced his body with a mechanical body, too.

As long as we don't end up as brains in glass jars ... ;-)

 


For speech, you might be better served just getting "middle C" versions of the base phonemes, all by themselves, out of vocaloid. Those should be amenable to being pitch modulated (either before being turned into LPC samples, or inside the speech synth, depending on octave reachability) after the fact.

 

eg,

 

"wa.wav"
"shi.wav"

 

etc.


  • 2 weeks later...

@Eric Lafortune I'm happy to report that I just got the Bad Apple demo running on the real iron! Very cool! As I am writing this, I have run the demo exactly once, so my code is just fresh out of the oven. I am using the StrangeCart to support this cartridge. The Bad Apple demo is unmodified, I am using your binary distribution with the 4.5 megabyte image.

 

For the technically minded who wonder how this is done: I added to the StrangeCart what I call the "streaming mode". My firmware only supports cartridge images up to 128K of ROM (plus some GROM on top of that). The reason for this is that cartridges are served from the internal RAM of the microcontroller on the StrangeCart. It has 192K RAM, which actually is pretty big, but of course falls short of the 4.5 megs required for the demo. So now, when a cartridge is loaded and the cartridge image is larger than 128K, the firmware loads the first 128K of the cartridge to RAM. When the cartridge is started (from the TI's normal boot menu), it starts to execute from the first 128K loaded into RAM. Of course with these paged cartridges only one 8K page is visible to the TMS9900 at a time.

 

It's worth noting that there is no READY line on the TI-99/4A cartridge bus. Thus a cartridge cannot introduce hardware wait states to the TMS9900. When it wants to fetch something from ROM, you better have that data available right then, as there is only about half a microsecond to return the first byte to the TMS9900. In other words, the data has to be already in RAM at that time.

 

The StrangeCart's CPU has two cores. One of the cores (M0+) spends 100% of its time serving the bus of the TMS9900. The new feature I added is that whenever the TMS9900 changes the 8K page of the large cartridge image, the M0+ writes the number of the newly loaded page to the interprocessor mailbox in the MCU. This in turn now causes an interrupt for the more powerful M4 CPU core. The interrupt routine updates some variables and returns.

For the M4 core I support what I call "cartridge service routines". When a cartridge is loaded (i.e. a cartridge for the TI-99/4A, containing TMS9900 code), for some types of cartridges the M4 core starts to execute a corresponding service routine. Now for these big 128K+ ROM images, I created a new service routine which monitors the variables set by the interrupt routine. When it sees a new page number written, it checks if the following page is already in RAM. If not, it is loaded from the serial flash chip with the assumption that it is soon needed. Since there is only 128K of RAM, loading a new 8K page means that it has to be put somewhere, replacing an existing 8K page. I went with this logic: the first 64K of a cartridge ROM is always kept in RAM (I am assuming that the usage pattern is such that the code for a huge cartridge is kept here). For the remaining 64K of on-chip RAM, I search for the first 8K page frame which is not currently being used, and replace its contents with the new 8K page. In practice this means that page frames 8 (at 64K) and 9 (at 72K) are loaded in an alternating pattern with new data. When the Bad Apple demo code is working on the page frame at 64K, I load the next one into 72K. When the demo code switches to the next page, it will use data at 72K and the next page is loaded into 64K. This seems to work perfectly, and should work for any cartridge doing video decode type activity with a sequential access pattern.
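
The replacement logic described above can be modelled in a few lines. This is a Python simulation written for illustration, not the StrangeCart firmware: the class `StreamingCache`, the `load_page` callback standing in for the serial flash read, and the `num_pages` parameter are all assumptions of the sketch.

```python
RESIDENT_FRAMES = 8   # first 64K (8 x 8K) is never replaced
TOTAL_FRAMES = 16     # 128K of RAM, split into 8K page frames


class StreamingCache:
    """Toy model of the page-frame replacement scheme described in the post."""

    def __init__(self, load_page, num_pages):
        self.load_page = load_page    # hypothetical stand-in for the flash read
        self.num_pages = num_pages    # pages in the cartridge image
        # The first 128K of the image is loaded up front: frame n holds page n.
        self.frames = {frame: frame for frame in range(TOTAL_FRAMES)}
        self.active_frame = 0
        self.loads = []               # log of (page, frame) replacements

    def frame_of(self, page):
        """Return the frame holding `page`, or None if it is not in RAM."""
        for frame, held in self.frames.items():
            if held == page:
                return frame
        return None

    def prefetch(self, page):
        """Load `page` into the first replaceable frame that is not in use."""
        if page >= self.num_pages or self.frame_of(page) is not None:
            return
        for frame in range(RESIDENT_FRAMES, TOTAL_FRAMES):
            if frame != self.active_frame:
                self.load_page(page)
                self.frames[frame] = page
                self.loads.append((page, frame))
                return

    def on_page_switch(self, page):
        """Called when the TMS9900 selects a new 8K page."""
        frame = self.frame_of(page)
        if frame is None:
            # The real bus cannot wait, so a miss here would be fatal.
            raise RuntimeError(f"page {page} was not preloaded")
        self.active_frame = frame
        self.prefetch(page + 1)
```

Feeding this model a purely sequential page sequence reproduces the alternating pattern from the post: once the initial 128K is used up, new pages land in frames 8 and 9 in turn. A non-sequential access pattern would raise the miss error, which matches the caveat that the scheme relies on sequential access.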

 

The TMS9900 is so slow (and the compression so good) that it's only consuming a couple of 8K pages per second. Like I think I wrote in the past, the TMS9900 in the TI-99/4A can only read data from ROM at a maximum rate of about 1 megabyte per second. While serving the TMS9900 bus, the StrangeCart can concurrently load data from the flash ROM chip at 6 megabytes per second. Thus this happens much faster than the TI-99/4A can consume it, leaving the only problem that when a page switch occurs the next page has to be preloaded ahead of time.
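
As a rough sanity check on those rates, the time to move one 8K page can be worked out directly. The sketch assumes decimal megabytes (1,000,000 bytes) and treats both figures as sustained rates, which is a simplification; the names are made up for illustration.

```python
PAGE_BYTES = 8 * 1024
FLASH_BYTES_PER_SECOND = 6_000_000   # "6 megabytes per second", assumed decimal
CPU_BYTES_PER_SECOND = 1_000_000     # "about 1 megabyte per second", assumed decimal


def transfer_ms(num_bytes, bytes_per_second):
    """Time in milliseconds to move `num_bytes` at the given sustained rate."""
    return 1000.0 * num_bytes / bytes_per_second


flash_ms = transfer_ms(PAGE_BYTES, FLASH_BYTES_PER_SECOND)  # time to preload a page
cpu_ms = transfer_ms(PAGE_BYTES, CPU_BYTES_PER_SECOND)      # fastest possible read-out
```

With these assumptions a page is preloaded in under 1.4 ms, while the TMS9900 needs at least about 8.2 ms just to read it, so a prefetch started at a page switch finishes long before the next switch can happen.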

Edited by speccery

The Bad Apple demo is absolutely mind-blowing! As I had never heard anything about it before, I checked YouTube and found the original video (black/white like the demo) and the live performance of Nomico.

 

 

To be honest, this is definitely not my kind of music. Were it not for comparing with the demo, I would not have made it to the end.

 

Yes, I'm stuck with 80s and early 90s, and that's good for me. 🙂


23 hours ago, mizapf said:

To be honest, this is definitely not my kind of music.

It's peculiar and fascinating... Thanks for the pointer to the video -- I was starting to think she was a vocaloid, a virtual character with a virtual voice. She's lip syncing in that video, but it turns out there's another one where she's singing live:

In any case, still work to do on improving my speech synthesis.

Edited by Eric Lafortune

  • 1 year later...

I've now released version 2.0 of my Bad Apple demo. This version has better vocals from the speech synthesizer, thanks to my new audio-to-speech tool ConvertWavToLpc. The vocals are more in tune and have fewer pops and other artifacts.

 

I'll add a YouTube video in the first post of this thread. You can still find the source code and binaries on GitHub.
 


Absolutely awesome.  Thank you @Eric Lafortune!

 

I have it working on my NTSC console. As you say, it plays a little fast in NTSC and seems to slow a bit when the speech kicks in - that could be my Pi Pico device (with its 8MB PSRAM chip) starting to halt the CPU when emulating the synthesizer. Being emulated, the speech, while certainly audible, isn't quite as loud as the YouTube version. I tried MAME from GameBase, but it said the image was greater than 2MB and didn't load. Then I tried Classic99 and it seems to have the same speed as my console, but the speech volume is really very low and I can hardly hear it... I would dearly like to get the resistors I'm using in my Pico PWM emulation of speech right, so I'm wondering if the YouTube video is from a real machine and do I need to increase the speech volume a tad?


Thanks for the experiments! Nice setup.

 

I still have the original hardware, but no fancy RAM cartridges, so I'm running all my code on Mame. The cartridge size requires Mame version 0.243 or higher.

 

The balance between the sound volume and the speech volume is hardwired in the actual hardware and hardcoded in Mame, as far as I know. I'm only assuming they are doing things right... Looking at the speech data, the energy could be slightly higher, but I needed to be careful not to clip the output and the result sounded okay. You could try a more conventional cartridge like Parsec to calibrate between different setups (and perhaps Youtube videos).

 

The speech processor indeed halts the CPU when the speech buffer starts to overflow. A side effect is that the demo runs at the intended 50 Hz PAL speed on 60 Hz NTSC systems during the vocal sections, because the sound/animation data streams are slowed down accordingly. I've tried to create an NTSC version (undocumented: build.sh ntsc), but the streaming code can't always keep up with the 60 Hz vsyncs for complex frames, resulting in the speech buffer eventually underflowing and the speech breaking up. The PAL version is the best approximation at this time.

