Creating a Basic Synth in XNA 4.0 – Part II

Hello folks, and welcome to part II of my Creating a Basic Synth in XNA 4.0 series!

This time I'll be covering the DynamicSoundEffectInstance class and how to put some of the concepts we discussed earlier into practice. Therefore (if you haven't done so already) I recommend that you read part I before proceeding. I'd also like to warn you that this article will be quite long. The good news is that this is the hardest part for a beginner to overcome, and future articles in this series will probably be shorter and generally easier to grasp.

By the way, there’s a video sample at the end of the article in case you’d like to see it in action before you start reading!

Summary

I’ll start with a summary of everything that has been discussed already in part I:

  • Sound is the way our brain interprets the pressure oscillations that reach our eardrums.
  • These pressure oscillations can be mathematically represented as waves.
  • The wave’s basic relation to sound is that amplitude equals volume, frequency equals pitch and shape equals tone.
  • A wave can be implemented (in code) as any regular function that returns an amplitude given a specific time input (e.g. double wave(double time)).
  • Even though a wave is a continuous function you can approximate it by sampling only a “few” points along the way.

Introduction

Using this knowledge to generate our own sounds in XNA 4.0 is possible because of a new class called DynamicSoundEffectInstance which exposes all of the necessary low-level access. In theory, the whole process is pretty straightforward: you create some function that represents the sound wave you'd like to play back, read (or sample) enough values from that function to get a decent representation, and feed all of them to XNA, which routes them through the low-level audio engine that takes care of the rest. In practice however, it's not that easy because of a few important details that you will have to know in order to use the class properly:

  1. (Configuration) The DynamicSoundEffectInstance class is flexible enough to support different sample rates and multiple channel configurations. You’ll need to understand these concepts and set the parameters accordingly.
  2. (Format) The DynamicSoundEffectInstance class expects the data you submit to be presented in a very specific way. As we’ll see, this format is not exactly the easiest to work with when doing our wave calculations.
  3. (Chunks) Even if you have a large amount of sound data, you (most likely) won’t be submitting all of it at once, since that would be slow. Instead, in order to enable real-time behavior you’ll be dividing your data in small chunks and submitting them one at a time.

And I'll admit that some of these details can be pretty confusing if you've never seen anything like this before. It did take me some time before finally getting that "a-ha" moment. Therefore, let's take it one step at a time. I'll describe each step you need to implement in order to use the DynamicSoundEffectInstance class, and present my solutions to the problems encountered along the way. Let's start!

The Steps

In order to create and use the DynamicSoundEffectInstance class, you’ll need to follow these four steps:

  1. Create and configure a DynamicSoundEffectInstance object.
  2. Create a buffer to hold your audio samples.
  3. Fill the buffer with meaningful values.
  4. Submit the buffer to the DynamicSoundEffectInstance object when needed.

Step 1: Create DynamicSoundEffectInstance

If you've used XNA's SoundEffect API before, then you might already be familiar with the SoundEffect and SoundEffectInstance classes. When using the content pipeline to load sound effects for your game, you receive a SoundEffect object that can be used to play back that sound. You can also use the SoundEffect.CreateInstance() method to create single instances of that SoundEffect for individual control. Either way, when loading a sound resource through the content pipeline, all of its data is automatically copied into the object's audio buffer, and the sound engine only needs to start playing it when requested.

On the other hand, the DynamicSoundEffectInstance class (which inherits from SoundEffectInstance) is created with an empty audio buffer, and instead allows you to fill it with data. But since you're the one providing it with all of the data, there are some extra decisions you need to make about how the data should be interpreted. Some of these parameters are already fixed for you by the framework while others are selected when creating the DynamicSoundEffectInstance object. Either way it's important to know exactly what they mean, so I'll describe them one by one.

Let’s start by taking a look at DynamicSoundEffectInstance‘s constructor:
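Paraphrasing the XNA 4.0 documentation, its signature boils down to this:

    public DynamicSoundEffectInstance(int sampleRate, AudioChannels channels)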

As you can see, there are two parameters that you need to pass to the constructor: the sample rate and the audio channels count.

Sample Rate

As explained in part I of this article, the sample rate describes how many audio samples (remember, a sample is exactly one value taken from our wave function) will be processed every second (for each audio channel). A larger sample rate is desirable because it means that you can better represent fast variations in the sound and higher frequencies. The XNA Framework lets you select a sample rate ranging from 8,000 Hz to 48,000 Hz. Audio CDs use a sample rate of 44,100 Hz, so if you're creating a music application you'll probably want to use at least that for best results.

Audio Channels Count

You’ve probably heard the words Stereo and Mono before. The main difference between them is that with Mono (1 audio channel), even if you have a pair of speakers, they will both emit the same sound, whereas with Stereo (2 audio channels) you have the freedom to specify different sounds for each speaker (left and right). It’s natural then that with Stereo you’ll need to hold twice the amount of information. The XNA Framework currently supports two possible configurations: AudioChannels.Mono (1 audio channel) and AudioChannels.Stereo (2 audio channels).

Besides these two parameters, there are also a couple things that you do not have control over (because XNA enforces them for you), but should still understand:

Bit Depth

The bit depth describes how many bits of precision will be used to encode each sample. A larger bit depth implies that you'll have more precision when dealing with small variations of amplitude in your wave, and get a digital representation that is closer to the original analog wave. When working with waves you'll normally be generating floating-point samples (usually between -1.0 and 1.0). The problem is that XNA's audio engine does not work with floating-point samples directly. Instead, it works with a fixed bit depth of 16 bits/sample (signed). So basically, you'll need to convert each of your floating-point samples (in the -1.0 to 1.0 range) to a short (signed 16-bit integer) sample (in the short.MinValue to short.MaxValue range) before feeding them to XNA.
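The core of that conversion is just a scale and a cast. A simplified sketch (the full, clamped version appears in the ConvertBuffer method later in this article):

    float floatSample = 0.75f;  // assume a sample already in the [-1.0f, 1.0f] range
    short shortSample = (short)(floatSample * short.MaxValue);  // simplified; no clamping here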

Audio Format

We've just seen that XNA forces a signed 16-bit integer bit depth. That's only one part of the audio format, namely the one which specifies how each audio sample is represented. But we've also seen that by choosing a Stereo configuration you can have two audio channels playing simultaneously. The samples for both audio channels will have to be packed into the same buffer, so how are they organized in relation to each other? Well, when you have only one audio channel, each sample is simply stored in the buffer in the same order they appear. But if you have two audio channels, then it's a different story. Instead, you'll have to interleave one audio sample from the left channel with one audio sample from the right channel, and repeat until there are no more samples left (i.e. LRLRLRLRLR etc.). Naturally, you'll also need a buffer twice the size in order to fit all the samples. I'll touch on this subject again when describing how to create the audio buffers.

Okay, now that you understand most of the necessary concepts, let's start putting them into practice and create that DynamicSoundEffectInstance object. I'll be working with a sample rate of 44,100 Hz and Stereo audio for the rest of this article, since it covers most of the edge cases. Here's the code for this step:
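A minimal sketch (the names SampleRate and _instance are placeholders of my own):

    // using Microsoft.Xna.Framework.Audio;

    public const int SampleRate = 44100;

    private DynamicSoundEffectInstance _instance;

    // e.g. in your Game's LoadContent method:
    _instance = new DynamicSoundEffectInstance(SampleRate, AudioChannels.Stereo);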

Step 2: Create Audio Buffers

This is possibly the hardest step to understand in the whole process, because it depends on most of the concepts we've just seen. Let's start from the beginning. A buffer is simply an array of values, and you'll need to create one big enough to hold the audio data that will be passed to the audio engine. But how big should it be, and in what format should it be created?

Size

The first step is to decide how many audio samples (per channel) you want to submit to XNA at a time (i.e. on each submit operation). To avoid confusion with the physical size of the buffers we'll be creating, I'll call this amount "SamplesPerBuffer" from now on. This is an arbitrary value, so you can choose whatever you like, but your choice has a few implications: it affects how stable the audio engine will be, and it determines the latency of your signal, which in practical terms is how fast your application will be able to react to user input. The rules of thumb are: if the buffers are too small, the audio engine may run out of data before you submit the next one and you'll hear drops and glitches in the audio signal; if they are too big, there will be a noticeable delay between user input and the corresponding change in the sound.

Ideally you'll want to choose a value that is small enough for latency to be near real-time, but large enough not to hear any sound glitches. I usually start around 3000 and then tweak it to my needs (at 44,100 Hz, a 3000-sample buffer corresponds to roughly 68 ms of audio). On my Windows machine I can get it down to about 1000 or less and it still works perfectly.

Format

So, we've decided that each buffer should hold, for instance, 3000 audio samples (as I said, experiment with this value). Now what format should our buffer have? That's where it gets tricky. The thing is, the format that is most convenient for doing wave calculations is not the same one that XNA expects to receive.

In order to ease this step, I like to create two different buffers: one to do all of our wave math calculations (which I'll call the "Working Buffer"), and a different one to actually communicate with XNA (which I'll call the "XNA Buffer"). I'll provide a method that converts samples from one format to the other so you won't have to worry about XNA's internal format at all. Now on to our buffer formats:

Working Buffer Format

  1. Our wave function outputs floating-point samples.
  2. Each audio channel will be generated and stored separately.

So we'll create a two-dimensional floating-point array depending on the number of channels, like so:
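For instance (ChannelsCount and SamplesPerBuffer are hypothetical constants, defined in the full listing a bit further below):

    _workingBuffer = new float[ChannelsCount, SamplesPerBuffer];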

XNA Buffer Format

  1. XNA expects each sample to be a signed 16-bit integer (or short).
  2. XNA expects both audio channels to be passed together, in an interleaved fashion (LRLRLRLRLR).
  3. Important: Even though samples are expected to be 16-bit (short), for compatibility reasons XNA accepts its buffer as an array of bytes (8-bit)! This means that you’ll have to encode each of your 16-bit (short) samples as two consecutive 8-bit (byte) samples instead.

Point 3 is very important because it means that you'll have to multiply the whole size of your buffer by 2 in order to hold all the necessary bytes (2 bytes for each sample). Since everything will be stored sequentially, we use a one-dimensional array, like so:
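For instance (again with hypothetical constants; BytesPerSample would be 2):

    _xnaBuffer = new byte[ChannelsCount * SamplesPerBuffer * BytesPerSample];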

I created this image to help you visualize how the two buffers are related. The buffer in the middle is never really created, but serves to illustrate how we need to convert our samples from floats/doubles to shorts before splitting them up into bytes.

That’s all there is to it. As long as you create these arrays with the right size and format, you should be okay. Here’s the code:
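A sketch of how these declarations might look, matching the 44,100 Hz stereo setup from step 1 (all the names are my own choices):

    public const int ChannelsCount = 2;
    public const int SamplesPerBuffer = 3000;
    public const int BytesPerSample = 2;

    private float[,] _workingBuffer;
    private byte[] _xnaBuffer;

    // e.g. right after creating the DynamicSoundEffectInstance:
    _workingBuffer = new float[ChannelsCount, SamplesPerBuffer];
    _xnaBuffer = new byte[ChannelsCount * SamplesPerBuffer * BytesPerSample];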

And the method I mentioned before that converts the contents of the working buffer into the XNA buffer is below (just add it to your class; it's a utility method):
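A sketch of how such a method might look; the exact details are up to you, but it applies the rules from the previous section (clamp, convert to 16-bit, interleave, and split into little-endian byte pairs):

    // Converts the floating-point working buffer into the interleaved,
    // 16-bit little-endian byte format that XNA expects.
    // Requires: using Microsoft.Xna.Framework; (for MathHelper)
    private static void ConvertBuffer(float[,] from, byte[] to)
    {
        const int bytesPerSample = 2;
        int channels = from.GetLength(0);
        int samplesPerBuffer = from.GetLength(1);

        for (int i = 0; i < samplesPerBuffer; i++)
        {
            for (int c = 0; c < channels; c++)
            {
                // Interleave the channels (LRLRLR...), two bytes per sample
                int index = i * channels * bytesPerSample + c * bytesPerSample;

                // Clamp the sample to the [-1.0, 1.0] range to prevent overflow
                float floatSample = MathHelper.Clamp(from[c, i], -1.0f, 1.0f);

                // Convert it to the signed 16-bit integer range
                short shortSample = (short)(floatSample >= 0.0f
                    ? floatSample * short.MaxValue
                    : floatSample * -short.MinValue);

                // Store the 16-bit sample as two consecutive bytes (little-endian)
                to[index] = (byte)(shortSample & 0xff);
                to[index + 1] = (byte)((shortSample >> 8) & 0xff);
            }
        }
    }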

The method is heavily commented in order to explain what's going on, but if you have any questions about it, feel free to leave them in the comment section below.

Step 3: Fill Buffers

Now that you've gotten here, the rest is actually quite easy! Remember that any calculations you do will be stored in the Working Buffer. Then at the end, you simply use the ConvertBuffer method above to copy it to the XNA buffer in the right format. For the sake of simplicity, we'll be using a simple sine wave function, described like this:
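A sketch of such a function (with the amplitude fixed at 1.0):

    // A pure sine wave at the given frequency, sampled at the given time
    private static float SineWave(double time, double frequency)
    {
        return (float)Math.Sin(time * 2 * Math.PI * frequency);
    }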

Then we'll fill our working buffer such that the left and right audio channels play sine waves of different frequencies (in order to clearly demonstrate the effects of stereo audio). Generating waves of different frequencies is pretty straightforward, since you just need to pass a different "frequency" parameter to the function. But what value do we pass to the "time" parameter? We need a way to know, for each individual sample we're generating, what point in time that sample corresponds to. For that we need to create some sort of timer.

First, create a variable to count how much time has passed so far:
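For instance (the field name is my own):

    private double _time = 0.0;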

But instead of incrementing it in our Update method as you’re probably used to, we should increment it whenever we advance to the next sample during generation. Calculating how much we should advance our time variable is also easy. We know, for instance, that if we have a SampleRate of 44100 Hz, this means that there will be 44100 samples in a second. Therefore, we can also deduce that each sample has a duration of 1 / 44100 seconds! So basically all we have to do is advance our time variable in (1 / SampleRate) increments, and our variable will always hold a very accurate representation of time (just check the example code below).

So let’s put this all together and create a FillWorkingBuffer() method that fills our buffer with values from the SineWave function above, while automatically counting the time for us. Note that this function is generating a 440 Hz sine wave for the left channel, and a 220 Hz sine wave for the right channel:
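A sketch of how that method might look, using the hypothetical names from the earlier snippets:

    private void FillWorkingBuffer()
    {
        for (int i = 0; i < SamplesPerBuffer; i++)
        {
            // Here is where you sample your wave function
            _workingBuffer[0, i] = SineWave(_time, 440.0); // Left channel
            _workingBuffer[1, i] = SineWave(_time, 220.0); // Right channel

            // Advance time by the duration of exactly one sample
            _time += 1.0 / SampleRate;
        }
    }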

Also don’t forget to call ConvertBuffer() at the end in order to convert your working buffer into XNA’s format!

Step 4: Submit Buffer

All that's left is to call Play() on our DynamicSoundEffectInstance object and start submitting buffers to it. Submitting the buffer itself is pretty trivial: all you need to do is call:
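(A sketch, using the hypothetical field names from the earlier snippets:)

    _instance.SubmitBuffer(_xnaBuffer);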

Since filling, converting and submitting a buffer is usually done in sequence, let's group all three operations in a function for ease of use:
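For instance, a sketch using the earlier names:

    private void SubmitBuffer()
    {
        FillWorkingBuffer();
        ConvertBuffer(_workingBuffer, _xnaBuffer);
        _instance.SubmitBuffer(_xnaBuffer);
    }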

We’re almost there. There’s only one question remaining! How do I know when it’s time to submit a new buffer? There are two different ways to accomplish this:

  1. Hook the DynamicSoundEffectInstance.BufferNeeded event and submit it there.
  2. Poll DynamicSoundEffectInstance.PendingBufferCount in your update loop and decide yourself when to submit.

I prefer way number 2 because it allows me to keep a consistent amount of buffers in reserve, which prevents audio glitches and dropouts. When I tried using the BufferNeeded event (method 1) and monitored my PendingBufferCount value, it would keep fluctuating every second or so, whereas with method 2 it stays stable. Furthermore, it's really easy! All you need to do is add this to your Update method:
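A sketch; the threshold of three pending buffers is an arbitrary choice, so experiment with it:

    // Keep a few buffers queued at all times to avoid dropouts
    while (_instance.PendingBufferCount < 3)
    {
        SubmitBuffer();
    }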

At last, simply call Play() on your DynamicSoundEffectInstance object after creating the buffers and preparing everything, in order to start playback of your sound!
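In code, that's simply (again using the hypothetical _instance field):

    _instance.Play();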

Sample

Here's a sample video and source code showing everything described in this article put into practice. You'll probably notice that there are many different types of waves being generated (sine, square, pulse, triangle, sawtooth, noise). Since this article is long enough as it is, I'll be covering those in a separate article dedicated to describing the Oscillator class I'm using in the sample. An oscillator is basically a device that creates wave signals for you. Don't worry, it's not complicated at all compared to what we already did.

Source Code

Conclusions

That's it! I know it was long, but by now you hopefully have an annoying buzzing sound playing on your speakers. There are still many things missing, though, most notably:

  • You can’t play more than one sound at once (like a piano, i.e. we lack polyphony).
  • You can't control the notes' durations (i.e. when they start and stop, etc.).

These are the two questions that I’ll address in part III of the series, which will show you how to create a little playable “piano” with dynamically generated audio. It will also describe how to start shaping all of this code into a framework so that we can have more flexibility to add new sounds, effects and other goodies in the future. Meanwhile, be sure to also check my article about oscillators in sound synthesis. Until then!

Continue To Part III
23 Comments.
  1. AdoTheLimey says:

    Believe it or not, this is just what I was looking for. Looks like I need to get XNA4 installed 🙂

    Thank you – and in case you are curious, I found this page via your YouTube vid.

    • I’m really glad you found it helpful 🙂 If you have any questions just leave a post!

      And I might as well add that, while I'm writing with XNA4 in mind, everything I described can easily be implemented on other platforms, as long as they give you access to the audio buffers.

      For instance, with Flash10+ you can submit audio buffers through the SampleDataEvent.SAMPLE_DATA event of the Sound class, and you can even pass it floating-point samples directly!

      You can also do it with the FMOD library using a simple read callback when creating an audio stream. I never used OpenAL but I’m pretty sure you can do it there too.

    • AdoTheLimey says:

      Thanks for the tips – I'm actually using XNA 3.1, just haven't made the upgrade yet. I was getting frustrated with the lack of control I had with the SoundEffect and SoundEffectInstance classes.

  2. Rick says:

    Nice write up. I'm looking forward to your next article. I need an example of playing two different sounds simultaneously (think a beep that represents a beat/tempo and a click that represents a subdivision of the beat).

    I’ve figured out how to insert the white space (time w/o any sound), but I’m not sure how to blend the two sounds when they both should play on beat one for example.

    Any ideas?

    • Thank you for your interest. Playing multiple sounds together (i.e. polyphony) will be the theme of my next article, where I’ll cover that in more detail. I’ll try to have it finished by today, so look again later!

  3. ToguAudioLine says:

    Great tutorial on how to make your own raw sounds with C#. Helped me a lot!

  4. lenny says:

    Great, your tutorial is awesome, but I am missing only one thing: I don't find anything to stop the right or left headset. I can see it in the code but it's not working. What am I missing?

    • Hello! If you’re talking about the sample, you can control the volume of the left and right channels by pressing the Up or Down arrows while holding down Ctrl (for the left channel) and/or Alt (for right channel).

      I.e. to mute the left channel completely, hold Ctrl + Down for a second or two. Did that work?

  5. lenny says:

    Yes, I am talking about it. But when I am talking about a channel, I am thinking headphones (right/left). Am I wrong?
    Any other ideas?

    • Yes, if you’re using headphones then each audio channel will correspond to a different ear. But did you use the key combination I said above?

      Because I just tried the sample and it's working perfectly here. If I had to guess, I think maybe it's a keyboard problem. Head over to the HandleInput() method and at the top of it change Keys.LeftAlt and Keys.LeftControl to some other keys. Maybe your keyboard only has RightAlt and RightControl and is not detecting the key presses?

  6. lenny says:

    Ok, I tried on another computer and the result is different, IT'S WORKING perfectly. Now I know that my laptop (HP) is not made to develop music. Thank you very much for your work!

    PS: My keys are working I already used them on other projects 🙂

    • Oh, glad it works on another computer, but now I’m intrigued. I wonder why it would fail on your laptop when it works on other computers.

      I can’t imagine a soundcard nowadays only supporting mono sound. And if that were the case I think the program would probably throw an exception when trying to initialize the audio engine in stereo mode.

      If someone sees this and knows the answer, drop me a note please. 🙂

    • lenny says:

      Yes for me too please. (And my laptop is brand new) with i7 core (so new isn’t it :))

  7. Lochana says:

    Hi there,

    Thanks for this great tutorial. I got it working and I also implemented a graphical slider which allows me to change the frequency as it is playing. My next problem is the change in pitch does not sound smooth. The changes sound glitchy and piece-wise. I’m trying to achieve a “Theremin” like effect if you know what I mean. Do you have any tips to solving this problem?

    Lochana

    • Hello! Unfortunately I’ve tried that same thing before (in order to create a pitch bend wheel) and had exactly the same problem. I just can’t get the sound to change pitch without sounding glitchy. If you do figure this out, please let me know too! I’ve made a “bridge” between this synth and any midi controller connected to the computer, and I’d like to support my pitch bend events too.

    • Joe McConnell says:

      This is likely due to steep changes in the frequency in the waveform. As you change one of the parameters, the waveform shifts and you end up with a very steep change over a short period of time.

      Your best bet for getting round this is interpolating between the changes. When you change a parameter, smooth gradually between the values and that should help overcome the clicking/popping. I'd suggest storing the previous setting value and then referencing this when the setting changes in order to determine the range of the change for smoothing purposes.

    • Lochana: I forgot to update, but I already have the solution to this problem too! Check this link: http://stackoverflow.com/questions/8566938/how-to-properly-bend-a-note-in-an-audio-synthesis-application

      Joe McConnell: That was my first impulse too! I spent a good deal of time trying to interpolate the frequency values smoothly but the results were never correct. Then I forgot about the issue for a while, until I decided to ask on stackoverflow. Bending works correctly by doing the change I describe on that link, although I'm still not completely sure why it does.

    • Joe McConnell says:

      Thanks David, that looks very useful. Have you any plans to cover filters (high-pass, low-pass) and LFOs in coming articles? Those really seem to take the sounds to a new dimension.

    • Yes, I do have plans to eventually learn how to implement filters, proper envelopes, LFOs and a few simple effects such as reverb, delay and chorus. But not at the moment, all of my development time must be focused on my current project.

      By the way, I’ve also managed to add MIDI controller support to this synthesizer in XNA and it worked well. But the problem is that it used WDM drivers and I couldn’t figure out any way to make it use ASIO drivers yet, so there was a bit of latency.

      If only I could figure out how to do MIDI interfacing with ASIO drivers on C# I would expand this sample to recognize the most common midi messages (note on, note off, pitch bend and sustain pedal), and have some fun with it!

    • Joe McConnell says:

      Hi David, I changed my code to reflect the advice and it works great. Many thanks for sharing the info.

      Glad you’re planning to look into proper envelopes and filters in future. I’ve implemented ADSR envelopes which work pretty well. I found a couple of useful articles on CodeProject, which may help you at some point:

      http://www.codeproject.com/Articles/19618/C-Synth-Toolkit-Part-I
      http://www.codeproject.com/Articles/19621/C-Synth-Toolkit-Part-II

      You might even find some stuff by the same guy (Leslie Sanford) to help you with your MIDI issues. It certainly seems relevant.

      I plan to blog my experiences once my synth is working properly. Good luck with your audio adventures!

    • Thanks! I also found this link just today which you might enjoy:

      http://csharpsynthproject.codeplex.com/

      He seems to be making steady progress on his project. 🙂

  8. tony says:

    excellent article. thank you! wow!

  9. Henry says:

    Thank you! Like the poster below, this is also exactly what I was looking for. The internet is a wonderful place!

