When converting from image to sound, there is no way to know in advance whether phase has been included in the image (e.g. the image may be black and white only). Because SynthBilder should also be able to read images that have no proper phase saved on the green channel, or that are only black and white, some special considerations are necessary.
The main problem in "creating" phase for a signal is knowing the shape and direction of the tail of a window, so that when the following window segment is overlapped and added to it, there is a smooth transition from one segment to the next. When the window is a rectangle, there is a good chance that the end of one segment is completely out of line with the beginning of the next, which is heard as an audible 'click'. When a Hann (Hanning) or Tukey window is used, there can still be various kinds of constructive or destructive interference where the windows overlap.
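The 'click' of the rectangular case can be reproduced with a minimal sketch. It assumes a sample rate of 8000 Hz and a segment length of 256 samples (both chosen only for illustration, not taken from SynthBilder), and the helper name `butt_join` is hypothetical. Every segment restarts at phase zero, so a tone that does not complete a whole number of cycles per segment jumps at the border.

```python
import numpy as np

def butt_join(freq_hz, seg_len=256, fs=8000, n_segments=2):
    """Join identical rectangular-windowed segments end to end (no overlap).

    Assumption for illustration: each segment is a sine that restarts at phase 0.
    """
    t = np.arange(seg_len) / fs
    segment = np.sin(2 * np.pi * freq_hz * t)
    return np.tile(segment, n_segments)

# 210 Hz does not fit a whole number of cycles into 256 samples at 8000 Hz,
# so the end of one segment is out of line with the start of the next.
misaligned = butt_join(210.0)
steps_misaligned = np.abs(np.diff(misaligned))
border_jump = steps_misaligned[255]            # step across the segment border
largest_other_step = np.max(np.delete(steps_misaligned, 255))

# 250 Hz fits exactly 8 cycles into 256 samples, so the border is seamless.
aligned = butt_join(250.0)
aligned_border_jump = np.abs(np.diff(aligned))[255]

print(border_jump, largest_other_step, aligned_border_jump)
```

The step across the border in the misaligned case is several times larger than any step inside a segment, which is what is heard as a click.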
The graphic above shows the transition from one signal at a fixed frequency overlapped with another signal at the same frequency. One can see how the window can smooth over the border between the two. However, a smoothing window doesn't guarantee that cancellations won't take place. When multiple signals with varying frequencies are added, the problem is compounded.
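To make the cancellation concrete, here is a small sketch of a single overlap region. It assumes raised-cosine (Hann-shaped) fades that sum to one, a 200 Hz tone, and an 8000 Hz sample rate. These values and the function name `crossfade_overlap` are illustrative assumptions only.

```python
import numpy as np

def crossfade_overlap(freq_hz, phase_offset, fs=8000, n_overlap=400):
    """Sum the fading tail of one segment and the fading head of the next.

    Both segments carry the same frequency; the second is shifted by
    phase_offset radians. The two fades always add up to 1.
    """
    t = np.arange(n_overlap) / fs
    ramp = np.linspace(0.0, 1.0, n_overlap)
    fade_in = np.sin(0.5 * np.pi * ramp) ** 2
    fade_out = 1.0 - fade_in
    tail = fade_out * np.sin(2 * np.pi * freq_hz * t)
    head = fade_in * np.sin(2 * np.pi * freq_hz * t + phase_offset)
    return tail + head

in_phase = crossfade_overlap(200.0, 0.0)        # segments line up
out_of_phase = crossfade_overlap(200.0, np.pi)  # segments oppose each other

middle = slice(190, 210)                        # centre of the overlap
print(np.max(np.abs(in_phase[middle])), np.max(np.abs(out_of_phase[middle])))
```

With matching phase the overlap is indistinguishable from an uninterrupted sine. With a phase offset of pi the same window lets the two segments cancel almost completely in the middle of the overlap.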
In terms of image to sound conversion, this problem can be defined as how to gracefully traverse the border from one column to the next. The problem is seen most clearly when dealing with pure tones. To demonstrate this, I provide the following two graphics.
Both signals above were rendered with a Tukey window at alpha 0.2. The black and white image on the left represents the magnitude components of a spectrum. The two signals on the right show a close-up of the rendered audio at a place where windows have been overlapped. One can see that, with the same window, a change in phase can make a big difference in the signal.
Here, as the sound is rendered from the image column by column (audio segment by audio segment), one should hear a single tone that rises in pitch from low to high (up to Fs/2). On the right are two plots of the rendered audio: the top one with phase set to zero and the bottom one with random phase. In both cases, the overlap from one windowed segment to the next leaves some residue of the border; in most cases, however, the random-phase version sounds the more pleasing of the two.
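The comparison can be approximated with the following sketch, which is not SynthBilder's actual code. It assumes each image column holds the magnitudes of a one-sided spectrum (row 0 is the lowest frequency), that each column is turned into an audio segment by an inverse real FFT, and that segments are overlap-added with a hop of about 90% of the segment length so that the Tukey tapers overlap. The names `tukey`, `render_columns`, the hop choice, and the test image are all assumptions for illustration.

```python
import numpy as np

def tukey(n, alpha=0.2):
    """Tukey (tapered cosine) window of length n."""
    w = np.ones(n)
    if alpha <= 0:
        return w                                  # rectangular window
    edge = int(np.floor(alpha * (n - 1) / 2.0))
    k = np.arange(edge + 1)
    taper = 0.5 * (1.0 - np.cos(2.0 * np.pi * k / (alpha * (n - 1))))
    w[: edge + 1] = taper
    w[n - edge - 1:] = taper[::-1]
    return w

def render_columns(magnitudes, phase_mode="zero", alpha=0.2, seed=0):
    """Render a magnitude image to audio, one column per windowed segment."""
    n_bins, n_cols = magnitudes.shape
    n_fft = 2 * (n_bins - 1)
    hop = int(round(n_fft * (1 - alpha / 2)))     # assumed hop: tapers overlap
    window = tukey(n_fft, alpha)
    rng = np.random.default_rng(seed)
    out = np.zeros(hop * (n_cols - 1) + n_fft)
    for col in range(n_cols):
        if phase_mode == "random":
            phase = rng.uniform(-np.pi, np.pi, n_bins)
        else:
            phase = np.zeros(n_bins)
        spectrum = magnitudes[:, col] * np.exp(1j * phase)
        segment = np.fft.irfft(spectrum, n=n_fft)  # back to the time domain
        start = col * hop
        out[start:start + n_fft] += window * segment
    return out

# Test image: a single tone that steps upward from column to column.
image = np.zeros((129, 8))
for col in range(8):
    image[10 + 5 * col, col] = 1.0

zero_phase = render_columns(image, "zero")
random_phase = render_columns(image, "random")
```

Plotting `zero_phase` and `random_phase` around a multiple of the hop size shows how differently the two borders behave even though the magnitudes and the window are identical.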
An idea for the future that might address this problem would be to develop a graphical synthesis technique similar to SynthBilder that uses vector graphics as input instead of raster images. With a vector representation of sound (as an image), there would be no column 'borders' to traverse. Each sound transition "object" (a vector-based line or shape) could then be rendered with scalable phase based on its position in time. Additionally, frequencies could be ramped smoothly, as if in an analogue transition from one frequency to the next.
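One possible reading of this idea, offered only as a sketch, is to treat a vector line as a frequency ramp and obtain its phase by accumulating the instantaneous frequency over time, so that phase never restarts at a border. The function `render_line`, the sample rate, and the example frequencies are hypothetical and not part of SynthBilder.

```python
import numpy as np

def render_line(f_start, f_end, duration, fs=8000, phase0=0.0):
    """Render a straight line in the time-frequency plane as a phase-continuous tone.

    Returns the samples and the phase at which a following line should start.
    """
    n = int(round(duration * fs))
    freq = np.linspace(f_start, f_end, n)          # instantaneous frequency in Hz
    increments = 2 * np.pi * freq / fs             # phase advance per sample
    # Phase of sample i is phase0 plus the advances of all earlier samples.
    phase = phase0 + np.concatenate(([0.0], np.cumsum(increments[:-1])))
    end_phase = phase0 + np.sum(increments)
    return np.sin(phase), end_phase

# Two connected lines: a rise from 200 Hz to 800 Hz, then a fall to 300 Hz.
rise, phase_after_rise = render_line(200.0, 800.0, 0.5)
fall, _ = render_line(800.0, 300.0, 0.5, phase0=phase_after_rise)
joined = np.concatenate((rise, fall))

largest_step = np.max(np.abs(np.diff(joined)))
print(largest_step)
```

Because the second line starts at the phase where the first one ended, no sample-to-sample step, including the one at the join, exceeds the step of an 800 Hz sine at this sample rate.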