Deep Learning for MIR
Schedule
Mon Jul 22 2024 at 10:00 am to Fri Aug 02 2024 at 05:00 pm
UTC-07:00
Location
The Knoll | Stanford, CA
About this Event
Deep Learning for Music Information Retrieval, Week 1
July 22-26
This workshop is an introduction to audio and music processing with an emphasis on signal processing and machine learning. Participants will learn to build tools to analyze and manipulate digital audio signals with PyTorch, an efficient machine learning framework used in both academia and industry. Both the theory and practice of digital audio processing will be discussed, with hands-on exercises on algorithm implementation. These concepts will be applied to various topics in music information retrieval, an interdisciplinary research field for processing music-related data. There are no formal prerequisites, but some knowledge of Python is assumed.
In-person (CCRMA, Stanford) and online enrollment options available. Students will receive the same teaching materials and have access to the same tutorials in either format. However, students will gain access to more in-depth, hands-on 1:1 instructor discussion and feedback when taking the course in-person.
Schedule
Day 1: Introduction to audio signal processing
Morning: the Discrete Fourier Transform
Afternoon: spectral feature extraction
Lab: supervising additive and subtractive audio synthesis with PyTorch
Day 2: Audio effects and filter design
Morning: digital filter theory
Afternoon: filter implementation and analysis
Lab: parameter learning for IIR and FIR filter design with PyTorch
Day 3: Beat, rhythm, and tempo
Morning: beat tracking and rhythm analysis
Afternoon: non-linear resonance and gradient frequency neural networks (GrFNN)
Lab: beat finding with a GrFNN in PyTorch
Day 4: Pitch and chroma analysis
Morning: pitch representations and detection
Afternoon: music transcription and source separation
Lab: key estimation and chord recognition with Hidden Markov Models (HMM)
Day 5: Music information retrieval and machine learning
Morning: regression, clustering and classification
Afternoon: dataset/model preparation
Lab: music genre classification using deep neural representations in PyTorch
About the instructors
Iran R. Roman holds a PhD from CCRMA. He is currently a theoretical neuroscientist and machine listening scientist at New York University’s Music and Audio Research Laboratory. Iran is a passionate instructor, with extensive experience teaching artificial intelligence and deep learning. His industry experience includes deep learning engineering internships at Plantronics in 2017, Apple in 2018 and 2019, Oscilloscape in 2020, and Tesla in 2021. Iran’s research has focused on using deep learning for auditory scene analysis and human action understanding. iranroman.github.io
Chuyang Chen is a student and research assistant at New York University’s Music and Audio Research Laboratory. With a background in music technology, computer science, and electrical engineering, Chuyang is passionate about building machine listening systems using artificial intelligence, signal processing, and mathematical modeling techniques. His past research topics include beat tracking, music similarity, urban acoustics, and audio-visual analysis.
Deep Learning for Music Information Retrieval, Week 2
July 29-Aug. 2
This workshop will cover industry-standard methods for developing deep neural network architectures for digital audio using PyTorch. Over five immersive days of study, we will cover the theoretical and practical principles that deep learning researchers use every day in the real world. Our schedule will be:
Day 1: Cross entropy and feedforward neural networks
Math - Linear algebra and differential calculus review. The mathematics of feedforward neural networks. Activation functions. Batch Norm.
Theory - How synaptic neuroplasticity inspired the backpropagation algorithm.
Practice - Automating differentiation in a neural network with PyTorch.
Day 2: Dimension reduction techniques for audio
Theory - Dimensionality reduction. Principal Component Analysis. Autoencoders.
Practice a) - Finding interpretable features in the Tinysol and EGFxSet datasets with PCA.
Practice b) - Writing an autoencoder to denoise audio in PyTorch.
Day 3: Convolutional neural networks
Theory - Convolution, optimizers and momentum, loss functions.
Practice - Writing a CNN for music genre classification.
Day 4: Temporal encoding with RNN, GRU, and WaveNet
Theory - Architecture and data flow in a Gated Recurrent Unit (GRU).
Practice a) - Writing an RNN and a GRU in PyTorch and using them for sound event classification.
Practice b) - Reading the seminal WaveNet paper.
Day 5: Generative Models
Theory - Probability review. Kullback-Leibler divergence. Variational autoencoders. Self-attention.
Practice - Writing a VAE and using its latent space to generate parameters for an audio synthesizer.
Enrollment Options:
In-person (CCRMA, Stanford) and online enrollment options are available during registration. Students will receive the same teaching materials and have access to the same tutorials in either format. In-person students will also gain access to more in-depth, hands-on 1:1 instructor discussion and feedback.
About the instructors:
Iran R. Roman is a theoretical neuroscientist and machine listening scientist at New York University’s Music and Audio Research Laboratory. Iran is a passionate instructor, with extensive experience teaching artificial intelligence and deep learning. His industry experience includes deep learning engineering internships at Plantronics in 2017, Apple in 2018 and 2019, Oscilloscape in 2020, and Tesla in 2021. Iran’s research has focused on using deep learning for speech recognition and auditory scene analysis. iranroman.github.io
Where is it happening?
The Knoll, 660 Lomita Court, Stanford, United States
USD 250.00 to USD 925.00