One hot encoding
Aug 12, 2020
One hot encoding on the fly.
I generate the one-hot encoding codebook on the fly rather than fixing it in advance. Different datasets may have different features available for one-hot encoding, so each combination of datasets gets its own codebook.
For example,
Features: feature1, feature2, feature3, feature4
Dataset A: yes, no, yes, no
Dataset B: yes, no, yes, yes
Dataset C: yes, yes, yes, no
In the example above, if we use Datasets A and B, neither of them has feature2, so we can leave feature2 out of the codebook. If we use Datasets A and C, however, Dataset C has feature2, so the codebook must include it.
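A minimal sketch of this idea in Python, assuming the codebook is just a mapping from feature name to column index. The names `AVAILABLE`, `build_codebook` and `one_hot` are hypothetical and chosen only for illustration; they are not the project's actual code.

```python
# Which features each dataset has, copied from the example table above.
AVAILABLE = {
    "A": {"feature1": True, "feature2": False, "feature3": True, "feature4": False},
    "B": {"feature1": True, "feature2": False, "feature3": True, "feature4": True},
    "C": {"feature1": True, "feature2": True, "feature3": True, "feature4": False},
}


def build_codebook(dataset_names):
    """Map every feature present in at least one selected dataset to a column index."""
    present = set()
    for name in dataset_names:
        present.update(f for f, has_it in AVAILABLE[name].items() if has_it)
    # Sort so the column order is reproducible from run to run.
    return {feature: index for index, feature in enumerate(sorted(present))}


def one_hot(feature, codebook):
    """Return a vector of zeros with a single 1 at the feature's column."""
    vector = [0] * len(codebook)
    vector[codebook[feature]] = 1
    return vector


codebook_ab = build_codebook(["A", "B"])  # feature2 is left out
codebook_ac = build_codebook(["A", "C"])  # feature2 is included

print(codebook_ab)                       # {'feature1': 0, 'feature3': 1, 'feature4': 2}
print(codebook_ac)                       # {'feature1': 0, 'feature2': 1, 'feature3': 2}
print(one_hot("feature3", codebook_ab))  # [0, 1, 0]
print(one_hot("feature3", codebook_ac))  # [0, 0, 1]
```

Note that the same feature can land in a different column depending on which datasets are combined, so a codebook should be stored together with the data it was used to encode.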
One hot encoding method in our project
For the JAK dataset, we get the feature data from the TDU format.
For the ZINC dataset, we have both the TDU and PG formats.
For the DeepChem dataset, we only have the PG format.
- For one-hot encoding of JAK + ZINC, we use JAK TDU + ZINC TDU to get the files and do the one-hot encoding.
- For one-hot encoding of DeepChem + ZINC, we use DeepChem PG + ZINC TDU to get the files and do the one-hot encoding.
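The pairings above could be recorded in a small lookup table, so that the encoding step knows which format to read for each dataset. This is only a sketch: the names `AVAILABLE_FORMATS`, `COMBINATIONS` and `sources_for` are hypothetical, and reading the TDU and PG files themselves is left out.

```python
# Formats each dataset is available in, as listed above.
AVAILABLE_FORMATS = {
    "JAK": {"TDU"},
    "ZINC": {"TDU", "PG"},
    "DeepChem": {"PG"},
}

# Format chosen for each dataset in each combination, as listed above.
COMBINATIONS = {
    ("JAK", "ZINC"): {"JAK": "TDU", "ZINC": "TDU"},
    ("DeepChem", "ZINC"): {"DeepChem": "PG", "ZINC": "TDU"},
}


def sources_for(combination):
    """Return sorted (dataset, format) pairs, checking that each format exists."""
    chosen = COMBINATIONS[combination]
    for dataset, fmt in chosen.items():
        if fmt not in AVAILABLE_FORMATS[dataset]:
            raise ValueError(f"{dataset} is not available in {fmt} format")
    return sorted(chosen.items())


print(sources_for(("JAK", "ZINC")))       # [('JAK', 'TDU'), ('ZINC', 'TDU')]
print(sources_for(("DeepChem", "ZINC")))  # [('DeepChem', 'PG'), ('ZINC', 'TDU')]
```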