Hello, ML enthusiasts! 🚀🤖 We analyzed rotational equilibria in our latest work, ROTATIONAL EQUILIBRIUM: HOW WEIGHT DECAY BALANCES LEARNING ACROSS NEURAL NETWORKS
💡 Our Findings: Balanced average rotational updates (effective learning rate) across all network components may play a key role in the effectiveness of AdamW.
🔗 ROTATIONAL EQUILIBRIUM: HOW WEIGHT DECAY BALANCES LEARNING ACROSS NEURAL NETWORKS
Looking forward to hearing your thoughts! Let’s discuss this fascinating topic together!
The human brain isn’t a blank slate when it comes into existence. It already contains structures designed to do certain things. These structures come “pre-trained,” and much of the learning humans do is more akin to the fine-tuning we do for foundation models.