Topic: How neural networks learn representation: a random matrix theory perspective
Time: 3:00 pm - 4:00 pm
Venue: Lady Shaw Building C3
Speaker: Mr. Denny Wu
Random matrix theory (RMT) provides powerful tools for characterizing the performance of random neural networks (at i.i.d. initialization) in high dimensions. However, it is not clear whether such tools apply to trained neural networks, whose parameters are no longer i.i.d. after gradient-based learning. In this work we use RMT to precisely quantify the benefit of feature (representation) learning in the “early phase” of gradient descent training. We consider a two-layer neural network in the proportional asymptotic limit and compute the asymptotic prediction risk of kernel ridge regression on the learned neural network representation. Our results demonstrate that feature learning can yield a considerable advantage over the initial random features model (and possibly over a wide range of fixed kernels), and they highlight the role of learning rate scaling in the initial phase of training.
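The setting described above can be illustrated with a minimal sketch (not the speaker's code; all dimensions, the ReLU activation, and the learning-rate choice are illustrative assumptions): a two-layer network at i.i.d. initialization, one gradient step on the first layer with a large learning rate, then ridge regression on the resulting features versus the initial random features.

```python
# Hypothetical sketch of the setup in the abstract (assumed details:
# ReLU activation, squared loss, one gradient step with eta ~ sqrt(N)).
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 200, 50, 100                      # samples, input dim, hidden width
X = rng.standard_normal((n, d)) / np.sqrt(d)
beta = rng.standard_normal(d)
y = X @ beta + 0.1 * rng.standard_normal(n)

W = rng.standard_normal((N, d)) / np.sqrt(d)   # i.i.d. first-layer init
a = rng.standard_normal(N) / np.sqrt(N)        # second layer, held fixed

def feats(W):
    """ReLU features of the two-layer network, shape (n, N)."""
    return np.maximum(X @ W.T, 0.0)

# One gradient step on W under squared loss; a large learning rate
# (here eta = sqrt(N), an illustrative scaling) drives feature learning
# in the "early phase" of training.
F = feats(W)
resid = F @ a - y
grad = ((np.outer(resid, a) * (F > 0)).T @ X) / n
W1 = W - np.sqrt(N) * grad

def ridge_risk(W, lam=1e-3):
    """Training MSE of ridge regression on the given representation."""
    F = feats(W)
    w_hat = np.linalg.solve(F.T @ F + lam * np.eye(N), F.T @ y)
    return np.mean((F @ w_hat - y) ** 2)

risk_init, risk_trained = ridge_risk(W), ridge_risk(W1)
print(risk_init, risk_trained)
```

The paper studies the proportional asymptotics of exactly this kind of comparison (with kernel ridge regression and precise risk formulas); the snippet only mimics the experimental setup at finite size.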
Joint work with Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Greg Yang.