Estimating the number of parameters in a Transformer

Jimmy (xiaoke) Shen
3 min read · Nov 8, 2022

From the original paper, we know that the base model has about 65M parameters, while the big model has about 213M parameters.

From [1]

The question is: how do we compute these parameter counts?

How to compute the number of parameters?
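
As a rough sanity check, the counts can be estimated with a few lines of Python. This is only a sketch under several assumptions that are not stated above: 6 encoder and 6 decoder layers, d_model = 512 and d_ff = 2048 for base (1024 and 4096 for big), a shared vocabulary of about 37,000 tokens, a single embedding matrix shared by the encoder input, decoder input, and output projection, and bias terms and layer norms included. The function names are made up for illustration.

```python
def attention_params(d_model):
    # Q, K, V and output projections: each a d_model x d_model matrix plus a bias.
    # The number of heads does not change the count, since the heads split d_model.
    return 4 * (d_model * d_model + d_model)


def ffn_params(d_model, d_ff):
    # Two linear layers: d_model -> d_ff and d_ff -> d_model, each with a bias.
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def layer_norm_params(d_model):
    # One gain vector and one bias vector.
    return 2 * d_model


def transformer_params(d_model, d_ff, n_layers=6, vocab_size=37000):
    # Encoder layer: self-attention + feed-forward + 2 layer norms.
    enc_layer = (attention_params(d_model)
                 + ffn_params(d_model, d_ff)
                 + 2 * layer_norm_params(d_model))
    # Decoder layer: self-attention + cross-attention + feed-forward + 3 layer norms.
    dec_layer = (2 * attention_params(d_model)
                 + ffn_params(d_model, d_ff)
                 + 3 * layer_norm_params(d_model))
    # Assumption: one embedding matrix shared across inputs and output projection.
    embedding = vocab_size * d_model
    return n_layers * (enc_layer + dec_layer) + embedding


base = transformer_params(d_model=512, d_ff=2048)
big = transformer_params(d_model=1024, d_ff=4096)
print(f"base: {base:,}")  # base: 63,082,496
print(f"big:  {big:,}")   # big:  214,245,376
```

Under these assumptions the estimate is about 63M for base and 214M for big, which is in the same range as the 65M and 213M reported in the paper. The gap most likely comes from the exact vocabulary size and from how embeddings and biases are counted.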
