One hot encoding

Jimmy (xiaoke) Shen
1 min read · Aug 12, 2020

One hot encoding on the fly.

I build the one-hot encoding codebook on the fly, because different datasets may have different features available for one-hot encoding. Each combination of datasets therefore gets its own codebook.

For example,

           feature1  feature2  feature3  feature4
Dataset A: yes       no        yes       no
Dataset B: yes       no        yes       yes
Dataset C: yes       yes       yes       no

In the example above, if we use datasets A and B, neither of them has feature2, so feature2 can be left out of the codebook. If we use datasets A and C, however, C has feature2, so the codebook has to include it.
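
The idea can be sketched in a few lines of Python. This is only a minimal illustration, not the project code: the `AVAILABLE` table is copied from the example above, and the names `build_codebook` and `one_hot` are hypothetical. It also assumes that a feature belongs in the codebook when at least one of the selected datasets has it.

```python
# Availability table copied from the example above (illustrative only).
AVAILABLE = {
    "A": {"feature1": True, "feature2": False, "feature3": True, "feature4": False},
    "B": {"feature1": True, "feature2": False, "feature3": True, "feature4": True},
    "C": {"feature1": True, "feature2": True, "feature3": True, "feature4": False},
}


def build_codebook(dataset_names):
    """Map every feature found in at least one selected dataset to an index."""
    present = set()
    for name in dataset_names:
        present.update(f for f, has_it in AVAILABLE[name].items() if has_it)
    # Sort so the index assignment is deterministic across runs.
    return {feature: index for index, feature in enumerate(sorted(present))}


def one_hot(feature, codebook):
    """Return a one-hot vector whose length equals the codebook size."""
    vector = [0] * len(codebook)
    vector[codebook[feature]] = 1
    return vector


codebook_ab = build_codebook(["A", "B"])  # feature2 is absent from both
codebook_ac = build_codebook(["A", "C"])  # feature2 is present in C

print(codebook_ab)
print(codebook_ac)
print(one_hot("feature3", codebook_ab))
```

Because the codebook is rebuilt for each combination, the same feature can land at a different index (and the vectors can have a different length) depending on which datasets are used together.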

One hot encoding method in our project

For the JAK dataset, we get the feature data from the TDU format.

For the ZINC dataset, we have both the TDU and PG formats.

For the DeepChem dataset, we only have the PG format.

  • For one-hot encoding of JAK + ZINC, we use JAK TDU + ZINC TDU to get the files and do the one-hot encoding.
  • For one-hot encoding of DeepChem + ZINC, we use DeepChem PG + ZINC TDU to get the files and do the one-hot encoding.
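
One way to keep these pairings in one place is a small lookup table. The sketch below is hypothetical: `SOURCE_FORMATS` and `formats_for` are illustrative names, and only the two combinations listed above are filled in.

```python
# Which source format each dataset contributes, per dataset combination.
# Only the two combinations described above are listed.
SOURCE_FORMATS = {
    frozenset({"JAK", "ZINC"}): {"JAK": "TDU", "ZINC": "TDU"},
    frozenset({"DeepChem", "ZINC"}): {"DeepChem": "PG", "ZINC": "TDU"},
}


def formats_for(*datasets):
    """Look up the source formats for a combination, ignoring argument order."""
    key = frozenset(datasets)
    if key not in SOURCE_FORMATS:
        raise KeyError(f"No format pairing defined for {sorted(key)}")
    return SOURCE_FORMATS[key]


print(formats_for("ZINC", "DeepChem"))
```

Using a `frozenset` as the key means `formats_for("JAK", "ZINC")` and `formats_for("ZINC", "JAK")` give the same result.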
