Today, while working on the police-shooting data, I learnt about data encoding.
- Data encoding is the process of converting data from one form to another. We usually encode data for the purpose of transmission, storage, or analysis.
- Through encoding, we can:
  - Prepare data for analysis by transforming it into a suitable format that can be processed by models and/or algorithms.
  - Create features by extracting relevant information from data and creating new variables to improve the accuracy of analysis.
  - Compress data by reducing its size or complexity without reducing its quality.
  - Encrypt the data so that we can prevent unauthorized access.
- There are many types of encoding techniques used in data analysis. The ones I learnt are:
  - One-hot encoding
  - Label encoding
  - Binary encoding
  - Hash encoding
  - Feature scaling
- One-hot encoding is a technique to convert categorical variables to numerical ones. We create one new variable per category; it takes the value 1 when a row belongs to that category and 0 otherwise.
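- Label encoding is also a method to convert categorical variables to numerical type. The difference is that we assign each categorical value a single integer, often in alphabetical order of the category names. The integers imply an ordering, so this suits ordinal categories better than unordered ones.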
- Binary encoding is a technique for encoding categorical variables with a large number of categories, which can pose a challenge for one-hot encoding or label encoding. Each category is first assigned an integer, and that integer is then written as a binary code of 0s and 1s, with one column per bit. The length of the code is the number of bits needed to represent the number of categories, so n categories need roughly log2(n) columns instead of the n columns used by one-hot encoding.
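- Hash encoding is a technique for encoding categorical variables with a very high number of categories, which can pose a challenge for binary encoding or other encoding techniques. A hash function maps each category to one of a fixed number of columns, so the output width stays the same no matter how many categories appear. The trade-off is that different categories can collide in the same column.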
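- Feature scaling is a technique for numerical variables, which are variables that have continuous or discrete numerical values. For example, age, height, weight, or income are numerical variables. Scaling brings such variables onto a comparable range, for example by min-max scaling to [0, 1] or by standardizing to zero mean and unit variance, so that variables with large values do not dominate the analysis.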
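
To make the techniques above concrete, here is a minimal sketch of each one in plain Python. It is only meant to show the mechanics, under a few assumptions: the sample values (`colors`, `ages`) are made up and are not taken from the police-shooting data, the function names are my own, and `hash_encode` is a simplified one-bucket-per-category version of hash encoding. In practice a library such as pandas or scikit-learn would normally do this work.

```python
import hashlib
import math

# Made-up sample columns, for illustration only.
colors = ["red", "green", "blue", "green"]
ages = [20, 30, 40]


def label_encode(values):
    """Assign each category an integer, in alphabetical order of the categories."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Create one 0/1 column per category; each row has exactly one 1."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


def binary_encode(values):
    """Write each category's integer label as a fixed-width list of bits."""
    labels, mapping = label_encode(values)
    # Bits needed to represent the largest label (at least one column).
    width = max(1, (len(mapping) - 1).bit_length())
    return [[int(bit) for bit in format(label, f"0{width}b")] for label in labels]


def hash_encode(values, n_buckets=4):
    """Hash each category into one of n_buckets columns (collisions are possible)."""
    rows = []
    for v in values:
        # md5 is used because Python's built-in hash() of a str changes between runs.
        digest = hashlib.md5(v.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % n_buckets
        rows.append([1 if i == bucket else 0 for i in range(n_buckets)])
    return rows


def min_max_scale(values):
    """Rescale numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and rescale numbers to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


print(label_encode(colors)[0])    # [2, 1, 0, 1]
print(one_hot_encode(colors)[0])  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
print(binary_encode(colors))      # [[1, 0], [0, 1], [0, 0], [0, 1]]
print(hash_encode(colors))
print(min_max_scale(ages))        # [0.0, 0.5, 1.0]
print(standardize(ages))
```

With three colors, one-hot encoding needs three columns while binary encoding needs only two, and hash encoding always produces `n_buckets` columns regardless of how many distinct categories show up.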