large audio clips


ZackAttack

While testing a small routine designed to free up the overscan and vblank time for the Harmony processor, I threw together a quick demo that plays back about 30 seconds of sampled audio. The results are surprisingly good. Since the purpose of this was to test a routine that runs in zeropage memory during overscan and vblank, the audio samples are intentionally stored in a very inefficient manner. If someone only cared about playing back audio, it would be possible to fit more than a minute into a single ROM.

rbairos did the conversion to 4bit audio.

 

btf_audio_part1.bin

btf_audio_part2.bin

 


Interesting. This sounds surprisingly good. If anyone is interested in trying it with our improved audio code (before we finish the back port to Stella): it runs in 6502.ts/Stellerator if you set the cartridge type to "bank switched 3E (Tigervision + RAM)" manually --- it won't autodetect.

 

It would be interesting to see how the same sample sounds with 5 bit (combining both channels) and compensation for the nonlinearities in mixing according to the formula in this thread http://atariage.com/forums/topic/271920-tia-sound-abnormalities/

Edited by DirtyHairy

Very cool.
How does combining sample sounds work on the 2600? You can only update one of the two voices at a time right?
The original source file was in stereo, but I simply averaged them into one channel for this test.
I could provide two streams, one sampled *slightly* after the other to compensate for the delay in writing them. Still not considered 5 bit though?


It would be interesting to see how the same sample sounds with 5 bit (combining both channels) and compensation for the nonlinearities in mixing according to the formula in this thread http://atariage.com/forums/topic/271920-tia-sound-abnormalities/

It sounds really good, like in a movie where someone is watching TV.

 

I am most interested in someone testing nonlinearity compensation, though music is probably better suited for such a test.

 

BTW: How did you do the conversion to 4 bit? Did you do any error correcting dithering?

Edited by Thomas Jentzsch

According to Crispy's audio code (which supposedly models the audio generator circuit correctly and is the base of the new audio implementation in Stella and 6502.ts), changes to AUDV are latched at color clocks 37 and 149. If this is correct, then you can update both channels between those two clocks and combine them to get 5 bit audio. I am not 100% sure about this, though; the only source of truth for this is the TIA schematics :)


For conversion, I simply took the audio channels, averaged them, resampled to 60*262 Hz, scaled slightly, then remapped to 0..15.
Didn't do any dither error correcting (didn't occur to me you could apply that to audio, nice!)
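A minimal sketch of the conversion steps described above, assuming the stereo source is already available as two lists of floats in the range -1.0 to 1.0. The function name, the linear-interpolation resampler and the `gain` parameter are illustrative assumptions, not the actual tool used for the demo.

```python
def to_4bit(left, right, src_rate, dst_rate=60 * 262, gain=1.0):
    """Average stereo to mono, resample, scale, and remap to 0..15."""
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    n_out = int(len(mono) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        # Position of output sample i in the source; interpolate linearly.
        pos = i * src_rate / dst_rate
        j = int(pos)
        frac = pos - j
        nxt = mono[min(j + 1, len(mono) - 1)]
        s = mono[j] * (1.0 - frac) + nxt * frac
        # Apply the gain and clip to the valid range.
        s = max(-1.0, min(1.0, s * gain))
        # Map -1.0..1.0 onto the 16 AUDV levels 0..15.
        out.append(int(round((s + 1.0) / 2.0 * 15)))
    return out
```

A proper resampler would low-pass filter before decimating; linear interpolation is used here only to keep the sketch short.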

"AUDV are latched at color clocks 37 and 149"
So both latches are latched both times?
If so, that's really useful for getting double the audio resolution, though as ZackAttack points out, it may have limited use.
It's not as though both channels are brought out, unfortunately.

Some testing may be in order!


Wouldn't 5bit be too expensive though? That's an extra Tia write each scanline. Also, would doubling the sample rate be a better use of the extra audio bandwidth?

 

Dunno what's better for audio fidelity --- that definitely exceeds my knowledge about signal processing :) 5 bit consumes less space than doubling the sample rate, though, provided that you can unpack the samples in time.


 

Dunno what's better for audio fidelity --- that definitely exceeds my knowledge about signal processing :) 5 bit consumes less space than doubling the sample rate, though, provided that you can unpack the samples in time.

The problem is that, due to the non-linearity of the TIA, the 5th bit doesn't double the resolution, it only increases by 50%.


What I can do when I get a chance is attach wav files at four and five bit resolution using default linear scaling. Generally I've found things still sound pretty good for embedded projects at lower sampling frequencies. We can then move on to artifacts introduced by the TIA.


The problem is that, due to the non-linearity of the TIA, the 5th bit doesn't double the resolution, it only increases by 50%.

I don't think it's a big problem. The non-linearity still gives 31 unique fairly-evenly distributed values, which is very nearly 5-bit, and certainly much better than 4-bit.

 

Here's a graph plot of the unique values, per DirtyHairy's formula...

 

post-23476-0-13400600-1512871706_thumb.png

 

The non-linear curve favors louder sounds with better resolution. You'd probably want to run the source sample through an audio-compressor filter, to take advantage of that.

 

As others have said, the resampling "grid" should be matched to the non-linear values, to reduce noise, in addition to adding some dithering to the process.
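One way to match the quantization grid to the non-linear levels, as a rough sketch. It assumes the output follows the s / (s + 30) divider curve discussed in this thread, where s = AUDV0 + AUDV1, and that input samples are normalised to 0.0..1.0. The names `LEVELS`, `nearest_sum` and `split` are made up for illustration.

```python
import bisect

# Relative output for each register sum s = AUDV0 + AUDV1, normalised to 0.0..1.0.
# Assumes the s / (s + 30) divider curve discussed in the thread.
LEVELS = [(s / (s + 30.0)) / 0.5 for s in range(31)]

def nearest_sum(x):
    """Return the register sum 0..30 whose output level is closest to x."""
    i = bisect.bisect_left(LEVELS, x)
    if i == 0:
        return 0
    if i == len(LEVELS):
        return len(LEVELS) - 1
    # x lies between LEVELS[i - 1] and LEVELS[i]; pick the closer one.
    return i if LEVELS[i] - x < x - LEVELS[i - 1] else i - 1

def split(total):
    """One possible way to spread a sum over the two 4-bit registers."""
    audv0 = min(total, 15)
    return audv0, total - audv0
```

Under these assumptions a half-scale sample maps to a sum of 10 rather than the 15 a linear grid would pick, which is the kind of error the non-linear grid is meant to remove.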

 

Even though the curve is non-linear, I think I'd just implement dithering with a linear value of 1/2 of the smallest step in the non-linear sequence. This wouldn't help quiet values as much, but wouldn't undo the higher resolution with louder sounds. The source sample should also be low-pass filtered to drop any frequencies over the Nyquist Frequency.
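A sketch of the dithering idea above, reading "1/2 of the smallest step" as uniform noise of at most half the smallest level step in either direction. That reading, the s / (s + 30) level curve, and all names here are assumptions for illustration only.

```python
import random

# Assumed divider curve for s = AUDV0 + AUDV1, normalised to 0.0..1.0.
LEVELS = [(s / (s + 30.0)) / 0.5 for s in range(31)]
# The steps shrink as s grows, so the smallest one is between sums 29 and 30.
SMALLEST_STEP = min(b - a for a, b in zip(LEVELS, LEVELS[1:]))

def dither_and_quantize(samples, rng=None):
    """Add uniform dither of +/- half the smallest step, then snap to the nearest level."""
    rng = rng or random.Random(0)
    out = []
    for x in samples:
        noisy = x + rng.uniform(-0.5, 0.5) * SMALLEST_STEP
        out.append(min(range(31), key=lambda s: abs(LEVELS[s] - noisy)))
    return out
```

Because the noise amplitude is fixed by the finest step, it only toggles the output where neighbouring levels are close together, so quiet passages are left essentially undithered, as the post anticipates.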

 

On the question of 5-bit sound vs higher sample rate... it sorta depends on what you're chasing. The higher sample rate will solely give you an increase in treble quality (presently cut off at 7860 Hz). The 5-bit sound will give an overall improvement in noise/fuzz. But in my mind, 5-bit is pointless without the non-linear resampling.


I don't think it's a big problem. The non-linearity still gives 31 unique fairly-evenly distributed values, which is very nearly 5-bit, and certainly much better than 4-bit.

 

Here's a graph plot of the unique values, per DirtyHairy's formula...

Wouldn't the non-linearity of each individual channel also affect the results? Has anyone documented the graph if only AUDV0 is used?

 

The routine I made only takes up 49 bytes for the code and 32 bytes for the sample buffer. (64 4bit samples packed 2 per byte). I modified it to handle both AUDV0 and AUDV1 and still had 10 bytes of zeropage memory left over. The only problem is that it requires 64 bytes to be copied to ZP each frame instead of 32. So either it's going to require bus stuffing or you'll lose one or two more lines of vblank time. We'll probably need bus stuffing anyway to have enough time to update both audio registers in the middle of a graphics kernel.
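For reference, packing 64 four-bit samples into a 32-byte buffer can be sketched like this. The nibble order (first sample in the high nibble) and the function names are assumptions; the actual routine may order them differently.

```python
def pack_nibbles(samples):
    """Pack an even number of 4-bit samples two per byte, first sample in the high nibble."""
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    pairs = zip(samples[0::2], samples[1::2])
    return bytes((hi & 0x0F) << 4 | (lo & 0x0F) for hi, lo in pairs)

def unpack_nibbles(data):
    """Inverse of pack_nibbles: return the list of 4-bit samples."""
    out = []
    for b in data:
        out.append(b >> 4)
        out.append(b & 0x0F)
    return out
```

Handling both AUDV0 and AUDV1 doubles the data per sample, which is where the 64 bytes per frame mentioned above come from.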

 

I'm also wondering why combining two 4bit registers only produces 5 bits of resolution. Doesn't that mean that on average for each output value there are 8 combinations of AUDV0/1 values that produce that output?


Wouldn't the non-linearity of each individual channel also affect the results? Has anyone documented the graph if only AUDV0 is used?

Looking at DirtyHairy's equation, 1+1=2+0. There isn't non-linearity on the source AUDV0 and AUDV1 AFAIK, but on the combined Vout.

 

The first 16 values in the graph can actually be created by 0+0, 1+0, 2+0, ..., 15+0, so the first 16 values of my previous graph would be a graph of only AUDV0 being used.

 

I'm also wondering why combining two 4bit registers only produces 5 bits of resolution. Doesn't that mean that on average for each output value there are 8 combinations of AUDV0/1 values that produce that output?

All kinds of combinations create the same output level...

0+0

1+0=0+1

2+0=0+2=1+1

3+0=0+3=1+2=2+1

4+0=0+4=3+1=1+3=2+2

...
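The pattern above can be counted directly. A small sketch, assuming (as stated in this thread) that only the sum AUDV0 + AUDV1 determines the output level; the function name is made up.

```python
from collections import Counter

def combination_counts():
    """Count how many (AUDV0, AUDV1) pairs produce each sum 0..30."""
    return Counter(a + b for a in range(16) for b in range(16))

counts = combination_counts()
for total in range(5):
    print(total, counts[total])
```

The 256 register pairs collapse onto 31 distinct sums, so the average is a little over 8 pairs per level, with the middle sum (15) reachable in 16 ways and the extremes (0 and 30) in only one.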


I don't think it's a big problem. The non-linearity still gives 31 unique fairly-evenly distributed values, which is very nearly 5-bit, and certainly much better than 4-bit.

Maybe the volume compression is even something positive here, since louder volumes (which the human ear notices more) have a better resolution.

 

But in my mind, 5-bit is pointless without the non-linear resampling.

Yup, that's definitely necessary to gain advantage here.

Wouldn't the non-linearity of each individual channel also affect the results? Has anyone documented the graph if only AUDV0 is used?

 

Looking at DirtyHairy's equation, 1+1=2+0. There isn't non-linearity on the source AUDV0 and AUDV1 AFAIK, but on the combined Vout.

 

 

As RevEng says :) The two channels combine additively before entering the nonlinear part.

 

Each of the two tone generators outputs a binary square wave, which is fed into a network of four resistors connected in parallel. Each resistor is toggled by one bit of AUDV0/1, and the ratios of the resistances are powers of two: bit 0 toggles 30kOhms, bit 1 toggles 15kOhms, bit 2 toggles 7.5kOhms, and bit 3 toggles 3.75kOhms. If you do the math, you find that the total resistance of the network is 30kOhm / AUDV.

 

Both networks are connected in parallel (this gives 30kOhm / (AUDV0 + AUDV1) total) and form one half of a voltage divider (the other half being a single 1kOhm resistor). The parallel network and the ratios of the resistors are the reason for the additivity, the voltage divider is the source of the nonlinearity.
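The arithmetic in the two paragraphs above can be checked with a short sketch. It assumes the output is taken across the single 1 kOhm resistor, which gives the curve shape described earlier in the thread (finer steps at louder levels); which side of the divider is actually measured is an assumption here, as are the names.

```python
RESISTORS_KOHM = [30.0, 15.0, 7.5, 3.75]  # toggled by AUDV bits 0..3
R_LOAD_KOHM = 1.0                          # other half of the voltage divider

def conductance(audv):
    """Conductance (1/kOhm) of one channel's network; works out to audv / 30."""
    return sum(1.0 / r for bit, r in enumerate(RESISTORS_KOHM) if (audv >> bit) & 1)

def output_level(audv0, audv1):
    """Relative divider output for the two 4-bit volume registers."""
    g = conductance(audv0) + conductance(audv1)  # parallel networks add conductances
    if g == 0:
        return 0.0  # both networks open: no signal
    network = 1.0 / g  # equals 30 / (audv0 + audv1) kOhm
    return R_LOAD_KOHM / (R_LOAD_KOHM + network)

levels = sorted({round(output_level(a, b), 9) for a in range(16) for b in range(16)})
print(len(levels), levels[15], levels[30])
```

With these assumptions the maximum for one channel alone is 1/3 and the maximum for both is 1/2, so the second channel extends the range by 50%, not 100%, in line with the earlier remark about the fifth bit.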

 

This is what is described in Crispy's PDF that is linked in the Stella GitHub issue and, from looking at the schematics, I am very confident that this is correct. What I am unsure about is my previous statement about latching AUDV at two discrete color clocks: I have found nothing in the schematics that indicates that this is the case, so it might just be a trade-off between accuracy and performance in the code he contributed. However, I know even less about digital electronics than I know about analog stuff, and there is a lot in the schematics that I don't understand, so I might well be overlooking something.

 

At any rate, I wouldn't expect any audible effect of the delay between writing both registers as long as it is small compared to the sample length (and the way the contributions are distributed between AUDV0 and AUDV1 could be tailored to minimize it). If you write once per scanline (for a sample rate of ~15kHz), that's a ratio of 3 clocks / 76 clocks ~4% --- I think this is just a negligible smoothing of the signal edge, which will be distorted by the analog parts of the circuit anyway.
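A quick check of the numbers above, assuming "3 clocks" means 3 CPU cycles (for example one zero-page store between the two register writes) on an NTSC console, where the CPU runs at the colour clock divided by 3. The variable names are illustrative.

```python
CPU_HZ = 3_579_545 / 3      # NTSC CPU clock, about 1.19 MHz
CYCLES_PER_LINE = 76        # CPU cycles per scanline

line_rate_hz = CPU_HZ / CYCLES_PER_LINE  # one sample per scanline
gap_cycles = 3                           # assumed gap between the AUDV0 and AUDV1 writes
gap_us = gap_cycles / CPU_HZ * 1e6       # gap in microseconds
ratio = gap_cycles / CYCLES_PER_LINE     # fraction of one sample period

print(f"{line_rate_hz:.0f} Hz, {gap_us:.2f} us, {ratio:.1%}")
```

This gives a line rate of roughly 15.7 kHz, a gap of about 2.5 microseconds, and a ratio of about 4%.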

 

Dithering the signal sounds like a good idea to me, as does cutting off the spectrum above the Nyquist threshold.

Edited by DirtyHairy

At any rate, I wouldn't expect any audible effect of the delay between writing both registers as long as it is small compared to the sample length (and the way the contributions are distributed between AUDV0 and AUDV1 could be tailored to minimize it). If you write once per scanline (for a sample rate of ~15kHz), that's a ratio of 3 clocks / 76 clocks ~4% --- I think this is just a negligible smoothing of the signal edge, which will be distorted by the analog parts of the circuit anyway.

My own experience is different here. The human ear is pretty sensitive to small timing differences. That's why it is so important that digitized sound replay has perfect timing so that the writes to AUDVx always happen at the exact same interval.

 

I am not sure when exactly timing problems (jitter?) become noticeable though, so that's something to test too.


Looking forward to your experiments. Not too surprisingly, TIA has very similar non-linear behavior to POKEY, by the looks of it. Without non-linear resampling, combining channels still gives a noticeable improvement there. If things can be further improved, that's great news for TIA and POKEY coders.

 

Haven't found dithering to be that helpful at these low bit depths. Although subjective, the dithering is too audible - it noticeably adds more noise such as hiss at the sampling rates typically used here (7.8kHz, 15.6kHz). Quieter parts suffer most. That said, it may be great for some samples.

Edited by Sheddy

My own experience is different here. The human ear is pretty sensitive to small timing differences. That's why it is so important that digitized sound replay has perfect timing so that the writes to AUDVx always happen at the exact same interval.

 

I am not sure when exactly timing problems (jitter?) become noticeable though, so that's something to test too.

 

 

I think the key is regularity: the writes have to happen immediately one after the other and at the same clock each line. In this case, I think the effect will be contributions at the high end of the spectrum and virtually unnoticeable (the delay itself is a few microseconds and much too short to be resolved). But I agree: testing is the only way to find out.


I don't think this requires filtering out at twice the Nyquist frequency, as 60*262 = 15,720 Hz, which is the upper limit of most basic audio playback devices anyway, so most sound recordings (especially from the 1980s) probably had that in mind.

As far as delay between samples goes, this is usually very noticeable in stereo setups, as that's how positioning is implemented. Badly designed wireless speakers will suffer from even small delays, especially earphones. The sound will appear at different offsets from the center of your head. Not sure how this relates to the TIA, though, where it's combined onto a single speaker.

As others have pointed out, by far the main thing for clarity is regularity, anything straying will be interpreted as continuous buzz or random noise.

Lemme prepare some wavs at different bit-depths. Predicting they'll be negligible in difference, even without dithering.

Edited by rbairos

Okay here's the sound sample from 8 bit depth to 1 bit depth. (all resampled at 262*60)
I take it back, I think dithering might be useful for the lower bit depths. (not implemented)
I recall being super impressed at 1 bit sound samples on my old PC speaker years ago, and never being able to reproduce it.

Gonna write up a dithering algorithm sometime today to compare
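For comparison, here is a rough sketch of bit-depth reduction with optional first-order error feedback, one simple form of error-diffusion dithering. It is not the algorithm used for the attached files; the function name and the unsigned 8-bit input format are assumptions.

```python
def reduce_bits(samples, bits, diffuse=False):
    """Requantize unsigned 8-bit samples (0..255) to `bits` bits, rescaled back to 0..255."""
    levels = (1 << bits) - 1
    out, err = [], 0.0
    for s in samples:
        # With diffusion on, add the error left over from the previous sample.
        target = s + err if diffuse else s
        q = max(0, min(levels, round(target / 255 * levels)))
        value = q * 255 / levels
        if diffuse:
            err = target - value  # carry the quantisation error forward
        out.append(int(round(value)))
    return out

print(reduce_bits([128] * 4, 1))                # plain 1-bit: every sample snaps to 255
print(reduce_bits([128] * 4, 1, diffuse=True))  # error feedback alternates 255 and 0
```

With error feedback the average of the 1-bit output stays close to the input level, at the cost of adding high-frequency noise, which matches the mixed opinions about dithering earlier in the thread.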

marty_1bit.wav

marty_2bit.wav

marty_3bit.wav

marty_4bit.wav

marty_5bit.wav

marty_6bit.wav

marty_7bit.wav

marty_8bit.wav

Edited by rbairos

For a basic experiment, I ripped the Boulder Dash music code and created a simple ROM from it. Attached you find the source code and two variations. One updates AUDV0 and AUDV1 at 100% equal intervals, while the other one (7) randomly delays the updates between 0 and 7 cycles.

 

At least for this example, I cannot hear any difference. So probably I was wrong, and the update timing is not that critical.

BD_Music.asm

BD_Music.bin

BD_Music (7).bin

