2. SynthBilder Application

Figure 4. SynthBilder screenshot

SynthBilder is a Matlab program that transforms an audio signal into a sonogram image and vice versa. It lets one examine the conversion of digital audio signals from the time domain into the frequency domain and back again, and it demonstrates the use of windowing in that transformation. Most importantly, SynthBilder allows one to compose with sound in a visual manner and provides a fine-grained mechanism for sound synthesis.

The first part of the Matlab application takes a sound as input and produces a spectrogram image as output. The magnitudes and phases of each windowed segment are scaled to 8-bit unsigned integers (0-255), with magnitude mapped to the red channel and phase to the green channel. The blue channel is currently unused and is reserved for describing the stereo pan of a signal. Magnitude values from an FFT range from 0 to 1, so converting them to integers only requires multiplying by 255. Since phase ranges from -pi to pi, it must first be shifted and scaled to the range 0 to 1 and then multiplied by 255, as follows: (phase + pi) / (2*pi) * 255
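The channel mapping described above can be sketched in a few lines. This is a minimal Python illustration, not SynthBilder's Matlab code: the function names encode_pixel and decode_pixel are hypothetical, and the clamping and rounding choices are assumptions that the text does not specify.

```python
import math

def encode_pixel(magnitude, phase):
    """Map a magnitude in [0, 1] and a phase in [-pi, pi] to an (R, G, B) pixel.

    Hypothetical helper; clamping and rounding are assumptions.
    """
    # Clamp defensively; the text takes magnitudes to lie in [0, 1] already.
    magnitude = min(max(magnitude, 0.0), 1.0)
    red = round(magnitude * 255)
    # Shift phase from [-pi, pi] to [0, 1], then scale to 0-255.
    green = round((phase + math.pi) / (2 * math.pi) * 255)
    blue = 0  # unused for now; reserved for stereo pan
    return red, green, blue

def decode_pixel(red, green):
    """Invert encode_pixel, up to the 8-bit quantization error."""
    magnitude = red / 255
    phase = green / 255 * 2 * math.pi - math.pi
    return magnitude, phase
```

Because each channel holds only 256 levels, a round trip through encode_pixel and decode_pixel recovers the phase only to within about pi/255 radians and the magnitude to within about 1/510.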

Figure 5. Sound to image conversion

The second part of the application takes a spectrogram as input and produces a sound. The spectrogram can be painted in any image editor (such as GIMP) and loaded from a file on disk. This process is almost the exact inverse of the sound-to-image conversion, except that instead of performing an inverse FFT, SynthBilder renders the sound by means of additive synthesis. The advantage of additive synthesis over the IFFT is that it gives more control over time and frequency. The following figure demonstrates how this is done.

Figure 6. Image to sound conversion: phasor addition & windowing

The red and green pixel values in each column of the image are converted back to magnitude and phase for specified frequencies (spaced evenly from 0 to Fs/2 in the case of SynthBilder); the resulting sinusoids are then summed and windowed. Each column is rendered individually as a segment, and the segments are then concatenated with a specified overlap size to form the entire stretch of audio. If the image has only a single channel, its values are used as magnitudes, and the phase can either be set to zero throughout or be randomly generated.
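The column rendering described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not SynthBilder's Matlab code: the function names are hypothetical, a Hann window is assumed because the text does not name the window, and "concatenated with a specified overlap size" is interpreted here as overlap-add with a fixed hop between segments.

```python
import math

def hann(n):
    """Hann window of length n (assumed; the text does not name the window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def render_column(magnitudes, phases, fs, seg_len):
    """Additive synthesis of one image column into a windowed audio segment."""
    n_bins = len(magnitudes)
    # Frequencies spaced evenly from 0 to fs/2, one per pixel row.
    freqs = [k * (fs / 2) / max(n_bins - 1, 1) for k in range(n_bins)]
    window = hann(seg_len)
    segment = []
    for n in range(seg_len):
        t = n / fs
        # Sum one sinusoid per frequency bin (the phasor addition).
        s = sum(m * math.cos(2 * math.pi * f * t + p)
                for m, f, p in zip(magnitudes, freqs, phases))
        segment.append(s * window[n])
    return segment

def overlap_add(segments, hop):
    """Join equal-length segments, starting each one hop samples after the last."""
    seg_len = len(segments[0])
    out = [0.0] * (hop * (len(segments) - 1) + seg_len)
    for i, seg in enumerate(segments):
        start = i * hop
        for n, v in enumerate(seg):
            out[start + n] += v  # overlapping regions are summed
    return out
```

For a single-channel image, the same functions apply with phases set to a list of zeros or to values drawn at random from -pi to pi. Rendering each sinusoid explicitly is slower than an inverse FFT, but the frequencies and segment length can be chosen freely, which is the kind of control over time and frequency that additive synthesis offers.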