Tickle.py Dev Diary #2
I finally managed to write a pre-cleanse audio processing script in a Jupyter notebook. It took longer than I initially expected, but it works and the results are nice to see. The chunking itself ran quicker than expected. The resulting chunks are 8 seconds of 16-bit, 16 kHz, zero-padded goodness, and I thought they would be good enough for the implementation of SampleRNN that I was running locally.
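For anyone curious what "8 seconds, zero padded" means in practice, here is a minimal sketch of the idea. It is not my actual notebook code: the function name `chunk_with_padding` is made up for illustration, and it assumes the audio has already been decoded into a plain list of 16-bit integer samples at 16 kHz.

```python
SAMPLE_RATE = 16_000   # 16 kHz, as described above
CHUNK_SECONDS = 8      # each chunk is 8 seconds long


def chunk_with_padding(samples, chunk_len=SAMPLE_RATE * CHUNK_SECONDS):
    """Split samples into equal-length chunks, zero-padding the last one.

    `samples` is assumed to be a sequence of 16-bit integer sample values.
    """
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        # Pad the final chunk with zeros (silence) so every chunk
        # has exactly chunk_len samples.
        chunk.extend([0] * (chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks
```

With the defaults, every chunk comes out at 128,000 samples (8 x 16,000), and only the final chunk of a file should ever contain padding.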
Sadly, when I went to train the model on the chunks my first script had generated, the zero padding turned out to be less accurate than I had anticipated, and the model I was using was not equipped to handle different batch sizes. So I had to find a different chunking script, and in doing so I found an implementation of SampleRNN with better documentation and a Python chunking script: Prism-SampleRNN (if it's good enough for memo, it's good enough for me).
The Prism chunking script has overlap between audio chunks (a feature I did not implement in my first attempt) and a much more accurate zero padding method. One thing it did not have was the ability to iterate through directories (it could only handle a single file at a time), so I had to add that myself.
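The two ideas in that paragraph, overlapping chunks and walking a directory, can be sketched roughly as follows. This is my own illustration and not the Prism code: `chunk_with_overlap` and `find_wav_files` are hypothetical names, and I am assuming the audio is already a list of integer samples and that the input files end in `.wav`.

```python
from pathlib import Path


def chunk_with_overlap(samples, chunk_len, overlap):
    """Split samples into chunks of chunk_len that share `overlap` samples.

    The last chunk is zero-padded so all chunks have the same length.
    """
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must be >= 0 and smaller than chunk_len")
    if not samples:
        return []
    hop = chunk_len - overlap  # how far the window moves each step
    chunks = []
    start = 0
    while True:
        chunk = list(samples[start:start + chunk_len])
        chunk.extend([0] * (chunk_len - len(chunk)))  # pad with silence
        chunks.append(chunk)
        if start + chunk_len >= len(samples):
            break  # this window already reached the end of the audio
        start += hop
    return chunks


def find_wav_files(root):
    """Return every .wav file under root, including subdirectories."""
    return sorted(Path(root).rglob("*.wav"))
```

The directory part then boils down to looping over `find_wav_files(...)` and running the single-file chunker on each path.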
This new script works! However, it is significantly slower than my first attempt at a chunking script. Hopefully the time I save by having training examples of the same batch size is enough to offset this loss. That said, taking apart the Prism chunking command line tool, looking at how they used the parser, and then adjusting that to create the directory chunking tool has helped me understand a bit more about writing command line scripts in Python, so not all is lost!
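As a note to self, here is roughly what a parser for a directory chunking tool can look like. This is a sketch using the standard library's `argparse`, which I am assuming here for illustration. The flag names (`--input_dir`, `--output_dir`, `--chunk_length`, `--overlap`) and the defaults are ones I made up for this example, not the actual Prism options.

```python
import argparse


def build_parser():
    """Build a command line parser for a hypothetical directory chunker."""
    parser = argparse.ArgumentParser(
        description="Chunk every audio file in a directory."
    )
    parser.add_argument("--input_dir", required=True,
                        help="directory containing the source audio files")
    parser.add_argument("--output_dir", required=True,
                        help="directory to write the chunks to")
    parser.add_argument("--chunk_length", type=int, default=8000,
                        help="chunk length in milliseconds (illustrative default)")
    parser.add_argument("--overlap", type=int, default=0,
                        help="overlap between chunks in milliseconds")
    return parser


# Passing a list to parse_args lets you try the parser without a real shell.
args = build_parser().parse_args(
    ["--input_dir", "raw_audio", "--output_dir", "chunks", "--overlap", "1000"]
)
print(args.input_dir, args.output_dir, args.chunk_length, args.overlap)
```

In a real script you would call `parse_args()` with no arguments so it reads from the command line instead.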