Awesome Transformer Tutorials and Blogs




Relationship of dot product to matrix multiplication.
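One way to see the relationship: entry (i, j) of a matrix product is the dot product of row i of the left matrix with column j of the right matrix. In attention, this means Q times K transposed computes every query-key dot product in one step. The sketch below uses plain Python lists; the names `dot`, `matmul`, `Q`, and `K` and the toy numbers are illustrative assumptions, not taken from any particular library.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matmul(A, B):
    """Matrix product: entry (i, j) is dot(row i of A, column j of B)."""
    cols_B = list(zip(*B))  # columns of B as tuples
    return [[dot(row, col) for col in cols_B] for row in A]


# Hypothetical toy data: 2 queries and 3 keys, each of dimension 2.
Q = [[1, 0], [1, 1]]
K = [[1, 2], [3, 4], [5, 6]]

# Transposing K turns each key (a row of K) into a column,
# so scores[i][j] equals dot(Q[i], K[j]).
K_T = [list(col) for col in zip(*K)]
scores = matmul(Q, K_T)
```

Here `scores` has one row per query and one column per key, which is why a single matrix multiplication can replace a double loop over query-key pairs.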

Why scaled dot product?
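The usual argument, as given in the original Transformer paper, is that if the components of a query q and a key k are independent with mean 0 and variance 1, their dot product has variance d_k. Large scores push the softmax into a saturated region with very small gradients, so the scores are divided by the square root of d_k. The sketch below checks this numerically under that assumption; the function names, sample count, seed, and d_k = 64 are illustrative choices.

```python
import math
import random


def dot_variance(d_k, scaled, n_samples=2000, seed=0):
    """Empirical variance of q.k for random q, k with unit-variance entries."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        q = [rng.gauss(0, 1) for _ in range(d_k)]
        k = [rng.gauss(0, 1) for _ in range(d_k)]
        s = sum(a * b for a, b in zip(q, k))
        values.append(s / math.sqrt(d_k) if scaled else s)
    mean = sum(values) / n_samples
    return sum((v - mean) ** 2 for v in values) / n_samples


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


raw_var = dot_variance(64, scaled=False)    # grows with d_k (close to 64)
scaled_var = dot_variance(64, scaled=True)  # stays close to 1

# Hypothetical scores before and after dividing by sqrt(64) = 8.
sharp = softmax([8.0, 0.0, -8.0])  # nearly one-hot: saturated
soft = softmax([1.0, 0.0, -1.0])   # weight spread over all keys
```

Without scaling, almost all attention weight lands on a single key and the other positions receive almost no gradient. With scaling, the weights stay spread out regardless of d_k.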

The number of parameters for each layer?
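As a rough sketch, the parameter count of one layer can be worked out by hand from the model width and the feed-forward width. The code below assumes the base configuration from the original Transformer paper (d_model = 512, d_ff = 2048), bias terms on every linear projection, and a scale and shift vector per LayerNorm. These are common implementation choices rather than something this post specifies, and the function names are hypothetical.

```python
def attention_params(d_model):
    """Q, K, V and output projections, each d_model x d_model plus a bias."""
    return 4 * (d_model * d_model + d_model)


def feed_forward_params(d_model, d_ff):
    """Two linear layers: d_model -> d_ff -> d_model, both with biases."""
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def layer_norm_params(d_model):
    """One scale vector and one shift vector."""
    return 2 * d_model


def encoder_layer_params(d_model=512, d_ff=2048):
    """Self-attention + feed-forward + two LayerNorms."""
    return (attention_params(d_model)
            + feed_forward_params(d_model, d_ff)
            + 2 * layer_norm_params(d_model))


def decoder_layer_params(d_model=512, d_ff=2048):
    """Self-attention + cross-attention + feed-forward + three LayerNorms."""
    return (2 * attention_params(d_model)
            + feed_forward_params(d_model, d_ff)
            + 3 * layer_norm_params(d_model))


enc = encoder_layer_params()
dec = decoder_layer_params()
```

Note that the number of heads does not appear: splitting d_model across heads changes how the projections are reshaped, not how many weights they hold. Under these assumptions the feed-forward block accounts for roughly two thirds of an encoder layer.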




Data Scientist/MLE/SWE @takemobi

Jimmy Shen
