Hey, I’m Chris McCormick. I help researchers, students, and developers like you master the most challenging subjects in NLP and jump-start your projects with easy-to-read eBooks, video tutorials, and example code.
A "base camp" provides equipment and supplies for people before setting off on their adventures...
In the same way, NLP Base Camp will help you understand the fundamental concepts you need and supply easy-to-follow example code that together can provide a smooth start for your next NLP project.
A monthly base camp membership grants you access to ALL of my existing content, plus new tutorials I continue to publish. This includes:
Video Lectures & Code Walkthroughs with colorful illustrations to help guide you through new concepts.
eBooks offer deep dives into the architectures of word2vec, BERT, and multiple BERT variants.
Tutorials and example code for a wide variety of common BERT use cases will help jump-start your own project.
The Inner Workings of BERT
This course will introduce you to BERT and teach you all about the internals of its architecture.
Basics of Fine-Tuning BERT
Begin applying BERT to real applications with this video course and Colab Notebook. We'll use Python and the "transformers" library from huggingface.
The Library includes over 15 examples--all are written in Python, built on PyTorch and the huggingface/transformers library, and run on a free GPU in Google Colab!
Learn the basics of classifying longer pieces of text with BERT.
Text classification, but now on a dataset where document length is more crucial, and where GPU memory becomes a limiting factor.
Learn how to customize BERT's classification layer to different tasks--in this case, classifying text where each sample can have multiple labels.
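To give a flavor of what "multiple labels per sample" changes, here is a minimal sketch in plain Python. It is not taken from the Notebook itself. It assumes the classification layer outputs one raw score (logit) per label, and that each label is decided independently with a sigmoid and a threshold, rather than picking a single winner with softmax. The function name `predict_labels`, the label names, and the 0.5 threshold are all made up for illustration.

```python
import math


def sigmoid(x):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, label_names, threshold=0.5):
    """Return every label whose independent probability clears the threshold.

    Assumption: `logits` holds one raw score per label, as a multi-label
    classification head would produce. Zero, one, or many labels can be
    returned for the same sample.
    """
    probs = [sigmoid(z) for z in logits]
    return [name for name, p in zip(label_names, probs) if p >= threshold]


# Hypothetical scores for one piece of text.
labels = ["toxic", "spam", "off_topic"]
print(predict_labels([2.0, -1.0, 0.3], labels))
```

With softmax, the three scores would compete for a single slot. Scoring each label on its own is what lets a sample come back with two labels, or none.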
Learn how to find and apply publicly-available variants of BERT tailored to specific domains such as medical text.
Add terms to BERT's vocabulary, and improve BERT's accuracy by continuing to Pre-Train BERT on unlabeled text from your domain.
Question Answering Basics
Learn the details of how BERT is applied to search reference text for the answer to a given question. Try with your own examples!
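As a rough illustration of the idea (a sketch, not the Notebook's code): for question answering, BERT scores every token twice, once as a possible start of the answer and once as a possible end. One common way to pick the answer is to choose the start/end pair with the highest combined score, requiring that the end not come before the start. The function name `best_answer_span`, the `max_answer_len` cap, and the scores below are invented for the example.

```python
def best_answer_span(start_scores, end_scores, max_answer_len=30):
    """Pick the (start, end) token indices with the highest combined score.

    Assumptions: one start score and one end score per token, and the
    answer is a contiguous span with start <= end that covers at most
    `max_answer_len` tokens.
    """
    best_span = None
    best_score = float("-inf")
    for start, start_score in enumerate(start_scores):
        last_end = min(start + max_answer_len, len(end_scores))
        for end in range(start, last_end):
            score = start_score + end_scores[end]
            if score > best_score:
                best_score = score
                best_span = (start, end)
    return best_span, best_score


# Question + reference text, already split into tokens (made-up example).
tokens = ["[CLS]", "who", "wrote", "it", "[SEP]",
          "it", "was", "written", "by", "jane", "doe", "[SEP]"]
# Hypothetical scores; a real model would produce these.
start_scores = [0.1, 0.0, 0.0, 0.0, 0.0, 0.2, 0.1, 0.3, 0.5, 4.0, 1.0, 0.0]
end_scores = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.1, 1.5, 3.5, 0.0]

(start, end), score = best_answer_span(start_scores, end_scores)
print(" ".join(tokens[start:end + 1]))  # prints "jane doe"
```

A real pipeline would also restrict the search to the reference-text tokens, so that the "answer" can never land inside the question. This sketch skips that step to stay short.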
Fine-Tuning on SQuAD
Training BERT on the SQuAD question answering dataset is tricky, but this Notebook will walk you through it!
Custom Reference Text
"Retrieval Augmented Generation" is a QA model that writes out answers to your questions using whatever text dataset you supply.
Learn the basics of BERT's input formatting, and how to extract "contextualized" word and sentence embeddings from text.
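Here is a small sketch of the input formatting, in plain Python so it runs anywhere. BERT expects a `[CLS]` token at the start, a `[SEP]` token after each segment, `[PAD]` tokens out to a fixed length, an attention mask that marks real tokens versus padding, and segment IDs that tell the first segment from the second. The function name `format_bert_input` is made up, and the tokens are assumed to be pre-split; in practice the tokenizer from the transformers library handles all of this, including WordPiece splitting and converting tokens to IDs.

```python
def format_bert_input(tokens_a, tokens_b=None, max_len=16):
    """Wrap pre-split tokens in BERT's special tokens and pad to max_len.

    Returns (tokens, segment_ids, attention_mask), all of length max_len.
    Assumption: tokens are already split; no vocabulary lookup is done.
    """
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)

    if tokens_b is not None:
        # The second segment (and its closing [SEP]) gets segment ID 1.
        tokens += list(tokens_b) + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)

    if len(tokens) > max_len:
        raise ValueError("input is longer than max_len; truncate it first")

    # 1 = real token, 0 = padding the model should ignore.
    attention_mask = [1] * len(tokens)

    pad_count = max_len - len(tokens)
    tokens += ["[PAD]"] * pad_count
    segment_ids += [0] * pad_count
    attention_mask += [0] * pad_count
    return tokens, segment_ids, attention_mask


tokens, segment_ids, attention_mask = format_bert_input(
    ["who", "wrote", "it"], ["jane", "did"], max_len=10
)
print(tokens)
print(segment_ids)
print(attention_mask)
```

The contextualized embeddings themselves come from running these inputs through the model, which this sketch does not attempt.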
Fine-Tuning Basics Video Course
Learn the basics of fine-tuning BERT with PyTorch and the huggingface/transformers library.
See how to adapt any of our examples to train on a multi-GPU system.
Mixed Feature Types
Many applications include additional feature types besides just text, such as numbers and categories. Learn how to integrate these additional features with BERT.
Named Entity Recognition
Fine-tune BERT to recognize custom entity classes in a restaurant dataset.
Facebook retrained the original BERT with a larger dataset and some training tweaks, and the result is a drop-in replacement with improved accuracy.
Though BERT excels at interpreting text and making predictions, it cannot generate new text. GPT-2 is a different Transformer-based model which includes this ability.
BART is a more recent text generation model which outperforms GPT-2!
Easily find help for your topic. The index organizes all of my content, and recommends resources for additional topics.
Nick and I host a research group 2-3 times a month to share our progress on our current research topic. Join us on Thursdays! Base Camp members also have access to the session recordings!
I'm an author of eBooks, tutorial videos, and example code on a variety of Machine Learning topics--particularly on challenging subjects in NLP.
I'm best known for my word2vec blog posts (recommended reading for Stanford's NLP class), my BERT architecture YouTube series, and my example code for a variety of BERT applications.
I earned my B.S. from Stanford in 2006 as a software engineer, and have been working in the areas of computer vision, machine learning, and NLP since 2012.
My writing and speaking styles are characterized by levity and positioning myself as a fellow learner rather than an authority.
I love to create the tutorials that I wish I could have read--with an emphasis on thoroughness, while still being easy-to-follow.
I'm no graphic artist, but I do find that simple and colorful illustrations can aid greatly in an explanation, and keep you engaged in the learning process.
You'll often find my illustrations reused around the web.
I enjoy creating them and always end up wishing I could add more!
It's usually not hard to find example code on the web, but it's rare that you'll find code that's well organized and commented, and that makes it clear what it does and doesn't do.
Colab Notebooks are a wonderful teaching tool because they allow me to:
Beyond that, I like my code to be abundantly easy to read, and so it's rare to find a line of my code without a verbal explanation of what it does. (In other words, my code is thoroughly commented!)
The majority of my content is focused on BERT and related models.
You'll find thorough tutorials on the architecture of BERT and how to apply it.
You'll also find tutorials on how to apply BERT to a variety of applications:
After BERT, the next topic with the most coverage is word2vec, with a video course, eBook, and example code.
Code is written in Python and shared in the form of Google Colaboratory Notebooks.
BERT examples are built on PyTorch and the 'transformers' library from huggingface.
With a few exceptions, my tutorials are not available to purchase individually or as a bundle.
However, you are always welcome to become a member for as many months as you need to go through the content that you are interested in. You are free to cancel your membership at any time.