Longer Text with BigBird & Longformer

What do you do when your input text is longer than BERT's maximum of 512 tokens? Longformer & BigBird are two very similar models which employ a technique called Sparse Attention to address this. 

In my video lecture (divided into 9 bite-size pieces), I provide the context for Sparse Attention and explain how it works.

I've also created an eBook covering the same material if you prefer that medium! 

To put things into practice, there is also a Colab Notebook applying BigBird to a dataset with longer text sequences. 


Multi-Part Tutorial

Video Tutorial   +   eBook   +   Example Code

1. Practical Observations

  • Where can you expect BigBird to help the most? 
  • What are the caveats?

2. Long Sequence Problem

  • Why does BERT have a limitation on sequence length to begin with?

3. Self-Attention Review

  • BigBird & Longformer are built on "Sparse Attention", a modification to Self-Attention. 
  • First we'll review how the original Self-Attention works in BERT.

4. Sparse Attention

  • What it is,
  • Why it works,
  • How it's implemented!
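
To give a feel for the idea before you dive in, here is a minimal sketch in plain Python. It is a simplified illustration, not the actual implementation in either model: it assumes a sliding-window pattern plus a few "global" tokens, and leaves out the random attention that BigBird adds. The helper names `sparse_attention_mask` and `count_pairs`, and the `window` and `num_global` parameters, are made up for this example.

```python
def sparse_attention_mask(seq_len, window=3, num_global=1):
    """Build a seq_len x seq_len mask; True means token i may attend to token j.

    Assumed pattern (illustrative only):
      - local:  each token sees neighbors within window // 2 positions
      - global: the first num_global tokens see, and are seen by, every token
    """
    half = window // 2
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            local = abs(i - j) <= half
            is_global = i < num_global or j < num_global
            mask[i][j] = local or is_global
    return mask


def count_pairs(mask):
    """Count how many (i, j) token pairs the mask allows."""
    return sum(sum(row) for row in mask)


for seq_len in (512, 1024, 2048):
    full_pairs = seq_len * seq_len  # full self-attention compares every pair
    sparse_pairs = count_pairs(sparse_attention_mask(seq_len))
    print(seq_len, full_pairs, sparse_pairs)
```

With full attention the number of pairs quadruples every time the sequence length doubles, while with this sparse pattern it only roughly doubles. That difference is what makes longer inputs affordable.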


Ready to learn a whole new skill?

NLP Base Camp Members have complete access to this tutorial
and all of my NLP content!

Sneak Peek

Here's what you'll see in your library!

