Tickle.py Dev Diary #1

‘future kitsch’, by LOBO for the Spring 2003 Jum Nakao collection (2002) - sourced via y2kinstitute (not really relevant but it’s a nice picture)


Hiya, welcome to the development blog for my ASMR-generating Python script Tickle.py!

Today in development I scoured the internet (mostly Bandcamp) for around 2 hours’ worth of copyright-free ASMR content that I can use to train the model. I have also installed Jupyter and TensorFlow on the computers I will be using to test and train the model. I ran into some issues getting the terminal paths to update on my MacBook, but after a few hours of Googling I managed to resolve that.

My next step after obtaining the audio data is to pre-cleanse it for training (via backpropagation). As I aim to follow and recreate the conditions described in the SampleRNN paper, I need to convert the audio to a 16 kHz sample rate and 16-bit depth, and chop the longer audio clips into 8-second sections.

The low sample rate and bit depth are there to save on processing power, though I do wonder whether this will hurt the model’s ability to replicate ASMR triggers: some of the training examples have quite a bit of high-end content, which will be lost during downsampling. As for cutting the clips down to 8-second sections, very long training examples could hurt the usability of the model in terms of training time (one second of audio at 16 kHz is 16,000 discrete data points to process, and some of these examples are 10+ minutes long!). Eight seconds is more than enough time to capture an ASMR trigger in a training example.
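To put some rough numbers on the arithmetic above (the constants here are just the values from the plan, not anything from the script yet):

```python
import math

SAMPLE_RATE = 16_000   # Hz, per the SampleRNN paper's setup
CHUNK_SECS = 8         # planned training-example length

# Data points the model must process per training example
samples_per_chunk = SAMPLE_RATE * CHUNK_SECS          # 128,000 samples

# A 10-minute source recording, in raw sample count
ten_minute_clip = 10 * 60 * SAMPLE_RATE               # 9,600,000 samples

# How many 8-second chunks that one recording yields (last one padded)
chunks_from_clip = math.ceil(10 * 60 / CHUNK_SECS)    # 75 chunks

print(samples_per_chunk, ten_minute_clip, chunks_from_clip)
```

So one 10-minute file alone turns into 75 training examples of 128,000 samples each, which is why the chunking matters.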

To pre-cleanse the audio files I envision a Python script that goes through each file, cuts it into 16 kHz, 16-bit, 8-second chunks, and names them accordingly. For files that don’t divide evenly into 8-second chunks, the remainder will be zero-padded (filled with silence, i.e. samples set to 0).

Still need to find Python libraries that can handle the downsampling and bit-depth reduction, as I feel that implementing dithering in a Python script on my own would be too much hassle. Libraries that stand out at the moment include SoX and PySound, and I have installed both into my computer’s Anaconda instance, where I am currently running a Python environment.

In my next blog I aim to update you on my audio-splicing script and how successful it is at creating training material for my SampleRNN network! See you next time!
