Sound and Visual Representation Learning
with Multiple Pretraining Tasks