Multilingual BERT

How do you apply BERT's magic to languages beyond just English? Why isn't it always as simple as re-training BERT on text from your language?

In four Colab Notebooks with a video walkthrough, this tutorial explains, implements, and compares several approaches:

4-Part Tutorial

4x Colab Notebooks + Video Walkthrough
PyTorch + huggingface/transformers

A model trained on 100 different languages must have a pretty strange vocabulary. Let's see what's in there!
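To get a feel for what a multilingual vocabulary looks like, here's a minimal sketch (not the Notebook's code) that buckets tokens by Unicode script. It assumes the `xlm-roberta-base` checkpoint on the Hugging Face Hub and uses the tokenizer's `get_vocab()` method. The helper names `token_script`, `script_histogram`, and `xlmr_script_histogram` are hypothetical, and the "script" label is just the first word of the Unicode character name, which is a rough heuristic.

```python
import unicodedata
from collections import Counter


def token_script(token: str) -> str:
    """Return a coarse script label (e.g. 'LATIN', 'ARABIC', 'CJK') for a token,
    based on the Unicode name of its first alphabetic character."""
    # SentencePiece (used by XLM-R) marks word-initial pieces with U+2581.
    for ch in token.replace("\u2581", ""):
        if ch.isalpha():
            # The default guards against characters that have no Unicode name.
            return unicodedata.name(ch, "UNKNOWN").split()[0]
    return "OTHER"  # digits, punctuation, or a bare word-start marker


def script_histogram(vocab) -> Counter:
    """Count tokens per script for any iterable of token strings."""
    return Counter(token_script(tok) for tok in vocab)


def xlmr_script_histogram(model_name: str = "xlm-roberta-base") -> Counter:
    # Requires `transformers` (and `sentencepiece`) plus network access
    # to download the tokenizer the first time.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return script_histogram(tokenizer.get_vocab().keys())
```

Calling `xlmr_script_histogram().most_common(10)` gives a quick ranking of which scripts take up the most vocabulary slots.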

Code tutorial applying XLM-R to Arabic.
Leverages cross-lingual transfer: we'll fine-tune on English data, then test on Arabic data!
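Here's a bare-bones sketch of the cross-lingual transfer idea. It assumes the `xnli` dataset id on the Hugging Face Hub (with `premise`, `hypothesis`, and `label` columns) and the `xlm-roberta-base` checkpoint; the helper names and hyperparameters are illustrative, not the ones used in the Notebook. The only thing that makes it "cross-lingual" is that the training split comes from the English config and the test split from the Arabic one.

```python
TRAIN_LANG, TEST_LANG = "en", "ar"  # fine-tune on English, evaluate on Arabic


def accuracy(preds, labels):
    """Fraction of predictions that match the gold labels."""
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def run_cross_lingual(model_name="xlm-roberta-base", epochs=1, batch_size=16,
                      max_len=128, lr=2e-5, train_limit=None):
    # Heavy dependencies are imported here so `accuracy` can be used without them.
    # Requires torch, transformers, sentencepiece, and datasets.
    import torch
    from datasets import load_dataset
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # XNLI has three labels: entailment, neutral, contradiction.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=3).to(device)

    train = load_dataset("xnli", TRAIN_LANG, split="train")  # Hub id assumed
    test = load_dataset("xnli", TEST_LANG, split="test")
    if train_limit:  # optionally subsample the training set for a quick run
        train = train.select(range(train_limit))

    def batches(ds):
        for i in range(0, len(ds), batch_size):
            rows = ds[i:i + batch_size]  # slicing a Dataset gives a dict of column lists
            enc = tokenizer(rows["premise"], rows["hypothesis"], truncation=True,
                            max_length=max_len, padding=True, return_tensors="pt")
            yield enc.to(device), torch.tensor(rows["label"], device=device)

    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for enc, labels in batches(train.shuffle(seed=42)):
            loss = model(**enc, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

    # Zero-shot evaluation: fine-tuning used only English examples.
    model.eval()
    preds, gold = [], []
    with torch.no_grad():
        for enc, labels in batches(test):
            preds += model(**enc).logits.argmax(dim=-1).tolist()
            gold += labels.tolist()
    return accuracy(preds, gold)
```

Something like `run_cross_lingual(train_limit=20000)` is a cheap smoke test before committing GPU time to the full training set.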

Code tutorial with a community-created Arabic BERT model.
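One reason a monolingual model can help is vocabulary coverage: a multilingual vocabulary shares its slots across ~100 languages, so words in any one language can end up split into more pieces. A quick way to check this is tokenizer "fertility" (average subword tokens per whitespace-separated word). This is a minimal sketch with hypothetical helper names; `asafaya/bert-base-arabic` is assumed here only as an example of a community Arabic BERT on the Hugging Face Hub and may not be the model the Notebook uses.

```python
def fertility(tokenize, sentences):
    """Average number of subword tokens per whitespace-separated word.

    `tokenize` is any callable mapping a string to a list of tokens.
    """
    words = sum(len(s.split()) for s in sentences)
    pieces = sum(len(tokenize(s)) for s in sentences)
    return pieces / words


def compare_tokenizers(sentences,
                       model_names=("asafaya/bert-base-arabic", "xlm-roberta-base")):
    # Model ids are assumptions; requires `transformers` and network access.
    from transformers import AutoTokenizer

    return {name: fertility(AutoTokenizer.from_pretrained(name).tokenize, sentences)
            for name in model_names}
```

Lower fertility on your own sample of Arabic text suggests the tokenizer's vocabulary covers the language more directly.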
Train with machine-translated text.
(Note: This Notebook uses the existing translated text from the XNLI dataset; it does not include code for translating new text.)
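XNLI ships machine-translated training data for each of its non-English languages, so moving from cross-lingual transfer to training on translated text (often called "translate-train") is mostly a matter of which config the training split comes from. This sketch makes that explicit; the strategy names and helpers are hypothetical, and the `xnli` Hub id is an assumption.

```python
def xnli_languages_for(strategy: str, target_lang: str):
    """Return the (train_lang, test_lang) XNLI configs for a given strategy."""
    if strategy == "zero_shot":        # fine-tune on English, test on the target language
        return "en", target_lang
    if strategy == "translate_train":  # fine-tune on XNLI's machine-translated train split
        return target_lang, target_lang
    raise ValueError(f"unknown strategy: {strategy!r}")


def load_xnli_pair(strategy: str, target_lang: str = "ar"):
    # Requires the `datasets` library and network access.
    from datasets import load_dataset

    train_lang, test_lang = xnli_languages_for(strategy, target_lang)
    return (load_dataset("xnli", train_lang, split="train"),
            load_dataset("xnli", test_lang, split="test"))
```

Either way the test split is the same human-translated target-language data, which is what makes the two approaches directly comparable.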

NLP Base Camp members have complete access to this tutorial and all of my NLP content!

These Notebooks can be easily adapted to run on any of the 15 languages included in the XNLI benchmark!

1. Arabic

2. Bulgarian

3. German

4. Greek

5. English

6. Spanish

7. French

8. Hindi

9. Russian

10. Swahili

11. Thai

12. Turkish

13. Urdu

14. Vietnamese

15. Chinese

(Note: The monolingual Notebook requires finding a BERT model pre-trained on your language.)
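To point the Notebooks at a different language, the main thing to change is the XNLI language code. Here's a small hypothetical helper that maps the language names listed above to the two-letter codes used as config names by the Hugging Face `xnli` dataset (assuming that's where you load XNLI from).

```python
# The 15 XNLI languages, keyed by the two-letter code used as the dataset config name.
XNLI_LANGUAGES = {
    "ar": "Arabic", "bg": "Bulgarian", "de": "German", "el": "Greek",
    "en": "English", "es": "Spanish", "fr": "French", "hi": "Hindi",
    "ru": "Russian", "sw": "Swahili", "th": "Thai", "tr": "Turkish",
    "ur": "Urdu", "vi": "Vietnamese", "zh": "Chinese",
}


def xnli_config(language: str) -> str:
    """Map a language name or code (case-insensitive) to its XNLI config code."""
    key = language.strip().lower()
    if key in XNLI_LANGUAGES:
        return key
    by_name = {name.lower(): code for code, name in XNLI_LANGUAGES.items()}
    if key in by_name:
        return by_name[key]
    raise ValueError(f"{language!r} is not one of the 15 XNLI languages")
```

For example, `xnli_config("Swahili")` returns `"sw"`, which you would pass wherever the Notebook currently uses `"ar"`.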