BERT sparked a revolution in NLP, and BERT-based models have become some of the most important tools in the field. With BERT, you can achieve high accuracy with little design effort on a wide variety of NLP tasks.
The Inner Workings of BERT provides a detailed but approachable explanation of BERT’s entire architecture. You’ll learn how it fits within the broader field of NLP, and why it performs so well.
Ready to become a BERT expert?
BERT’s strength has everything to do with a technique called Transfer Learning, so we start with a tutorial on this powerful approach to machine learning problems.
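To preview the core idea of transfer learning, here is a minimal NumPy sketch: a frozen, stand-in "pretrained" encoder supplies features, and only a small task-specific head is trained on the downstream labels. All names, dimensions, and data here are illustrative assumptions, not BERT's actual weights or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained encoder: a frozen projection whose weights
# are never updated during "fine-tuning" (illustrative, not BERT).
W_pretrained = rng.normal(size=(10, 4))

def encode(x):
    # Features from the frozen "pretrained" layers.
    return np.tanh(x @ W_pretrained)

# Toy labeled data for a downstream binary classification task.
X = rng.normal(size=(64, 10))
y = (X[:, 0] > 0).astype(float)

def log_loss(p):
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

# Only this small classification head is trained.
w, b = np.zeros(4), 0.0
feats = encode(X)
initial_loss = log_loss(1.0 / (1.0 + np.exp(-(feats @ w + b))))
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))   # sigmoid probabilities
    grad = p - y                                  # dLoss/dlogit for log loss
    w -= 0.1 * feats.T @ grad / len(X)
    b -= 0.1 * grad.mean()
final_loss = log_loss(p)
```

The point of the sketch is the division of labor: the encoder's weights stay fixed while gradient descent fits only the lightweight head, which is why transfer learning needs so little task-specific effort.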
Before diving into the internals of BERT’s architecture, I’ve found it helpful to first take a “black box” view of the model and understand it from the outside.
One of BERT’s greatest strengths is its wide applicability to many common NLP tasks. It can’t do everything, though, so we’ll look at which types of applications it supports, and talk about BERT’s general strengths and weaknesses.
The bulk of this eBook is devoted to explaining the internals of BERT’s architecture, and the key concept for understanding this is a mechanism called Self-Attention. I’ll provide an intuitive explanation, as well as walk you through the actual matrix operations.
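The matrix operations behind self-attention can be sketched in a few lines of NumPy. This is a generic scaled dot-product attention toy, with small illustrative dimensions, not BERT's actual implementation: each position's query is compared against every position's key, the scores are softmax-normalized, and the result is a weighted mix of the value vectors.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of vectors X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v      # project inputs to queries/keys/values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # pairwise query-key similarity
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights              # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4              # illustrative sizes only
X = rng.normal(size=(seq_len, d_model))
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))
out, attn = self_attention(X, W_q, W_k, W_v)
```

Note that `attn` is a square matrix of attention weights, one row per position, which is exactly the object the intuitive "which words attend to which" explanations describe.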
Building on this understanding, we’ll look at Multi-headed Attention.
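As a rough preview of multi-headed attention, the sketch below runs several independent attention heads in smaller subspaces and concatenates their outputs. The head count, dimensions, and output projection here are illustrative assumptions, not BERT's configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, n_heads, rng):
    """Run n_heads independent attention heads and concatenate their outputs."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads              # each head works in a smaller subspace
    head_outputs = []
    for _ in range(n_heads):
        W_q = rng.normal(size=(d_model, d_head))
        W_k = rng.normal(size=(d_model, d_head))
        W_v = rng.normal(size=(d_model, d_head))
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        weights = softmax(Q @ K.T / np.sqrt(d_head), axis=-1)
        head_outputs.append(weights @ V)     # this head's attention output
    # Concatenate the heads, then mix them with an output projection.
    concat = np.concatenate(head_outputs, axis=-1)   # (seq_len, d_model)
    W_o = rng.normal(size=(d_model, d_model))
    return concat @ W_o

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                  # illustrative sequence
out = multi_head_attention(X, n_heads=2, rng=rng)
```

The design intuition is that each head can learn a different attention pattern, and the concatenation lets the model use all of them at once.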