Simplifying Transformer Model Training

Speaker: Liyuan Liu (刘力源) | Ph.D. student

  • Start time

    2021.11.19 11:00

  • Duration

    76 minutes

  • Learners

    7,170


Transformers have led to a series of breakthroughs in various deep learning tasks. However, with great power come great challenges: training Transformers requires non-trivial effort in choosing the model configuration, the optimizer configuration, and other hyper-parameters. Given the inherent resource limitations of real-world tasks, these additional efforts have hindered a wide range of research. In this talk, I will present our study towards effort-light Transformer training. First, I will introduce our analyses of the underlying mechanism of learning rate warmup. I will then discuss our analyses of what complicates Transformer training and introduce our study on how to make Transformers easy to train. I will also demonstrate that our research opens new possibilities to move beyond current neural architecture design and advance the state of the art. Finally, I will give a brief outlook on future directions.
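
As background for the first part of the talk: learning rate warmup gradually increases the learning rate at the start of training before letting it decay, and is widely used to stabilize Transformer optimization. The sketch below shows one common linear-warmup / inverse-square-root schedule in PyTorch. It is a minimal illustration only; the warmup length and peak learning rate are assumed values, and this is not the specific schedule or method analyzed in the speaker's work.

```python
import torch

# Minimal sketch of linear learning-rate warmup followed by inverse-sqrt
# decay, a schedule commonly paired with Transformer training.
# All constants below are illustrative assumptions.
warmup_steps = 4000   # assumed warmup length
peak_lr = 1e-3        # assumed peak learning rate

def lr_lambda(step):
    """Scale factor relative to peak_lr: linear ramp up, then 1/sqrt decay."""
    step = max(step, 1)
    if step < warmup_steps:
        return step / warmup_steps          # linear warmup phase
    return (warmup_steps / step) ** 0.5     # inverse-sqrt decay phase

model = torch.nn.Linear(512, 512)           # stand-in for a Transformer
optimizer = torch.optim.Adam(model.parameters(), lr=peak_lr)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(10):                      # training-loop skeleton
    optimizer.zero_grad()
    loss = model(torch.randn(8, 512)).pow(2).mean()  # dummy loss
    loss.backward()
    optimizer.step()
    scheduler.step()                        # advance the warmup schedule
```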