Get Started with BERT

BERT sparked a revolution in NLP that has made BERT-based models some of the most important tools in the field. With BERT, you can achieve high accuracy with relatively little design effort on a wide variety of NLP tasks.

The Inner Workings of BERT will provide you with a detailed but approachable explanation of BERT’s entire architecture. You’ll learn how it fits within the broader field of NLP, and why it performs so well.

Ready to become a BERT expert?


What you'll learn

Intro to Transfer Learning

BERT’s strength has everything to do with a technique called Transfer Learning, so we start with a tutorial on this powerful approach to machine learning problems.

Inputs & Outputs

Before diving into the internals of BERT’s architecture, I’ve found it helpful to take a “black box” view of the model and start by understanding:

  • What BERT does to prepare your text for processing
  • How it handles unknown words
  • What it produces as its output
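To give a flavor of the second point: BERT handles unknown words by breaking them into known subword pieces (“WordPiece” tokenization). Here’s a simplified sketch of the greedy longest-match idea behind it; the tiny vocabulary is purely illustrative (BERT’s real vocabulary has roughly 30,000 entries):

```python
# Simplified sketch of WordPiece-style subword tokenization.
# The toy `vocab` below is an illustration, not BERT's actual vocabulary.
def wordpiece_tokenize(word, vocab):
    """Greedily split a word into the longest subword pieces found in vocab."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # pieces after the first get a '##' prefix
            if piece in vocab:
                match = piece
                break
            end -= 1  # no match yet; try a shorter piece
        if match is None:
            return ["[UNK]"]  # the word can't be decomposed at all
        tokens.append(match)
        start = end
    return tokens

vocab = {"em", "##bed", "##ding", "##s", "play", "##ing"}
print(wordpiece_tokenize("embeddings", vocab))  # ['em', '##bed', '##ding', '##s']
print(wordpiece_tokenize("playing", vocab))     # ['play', '##ing']
```

Because rare words decompose into common subwords like this, BERT almost never has to fall back to a true “unknown” token.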


One of BERT’s greatest strengths is its wide applicability to many common NLP tasks. It can’t do everything, though, so we’ll look at which types of applications it supports, and talk about BERT’s general strengths and weaknesses.


The bulk of this eBook is devoted to explaining the internals of BERT’s architecture, and the key concept for understanding this is a mechanism called Self-Attention. I’ll provide an intuitive explanation, as well as walk you through the actual matrix operations. 
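As a small preview of those matrix operations, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. The random weight matrices stand in for the parameters a trained BERT model would have learned:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q = X @ Wq  # queries
    K = X @ Wk  # keys
    V = X @ Wv  # values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every token pair
    # softmax over each row, so each token's attention weights sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted mix of all value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))  # 5 tokens, each an 8-dimensional embedding
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # one updated 8-dim vector per token: (5, 8)
```

The key idea: every token’s output vector is a weighted blend of all the other tokens’ values, with the weights computed from query–key similarity.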

Building on this understanding, we’ll look at Multi-headed Attention.

Ready to learn a whole new skill?

Hi, I'm Chris McCormick

I help researchers, students, and developers like you to master the most difficult concepts in AI...

with legible code, simple illustrations, and video walkthroughs.

Sneak Peek

