[Project 2] Day 8: Understanding Data Encoding

Today, while working on the police-shooting data, I learnt about data encoding.

  • Data encoding is the process of converting data from one form to another. We usually encode data for transmission, storage, or analysis.
  • By encoding data, we can:
    • Prepare data for analysis by transforming it into a format that models and/or algorithms can process.
    • Create features by extracting relevant information from the data and building new variables that improve the accuracy of the analysis.
    • Compress data by reducing its size or complexity, ideally without losing important information.
    • Encrypt data to prevent unauthorized access.
  • There are many encoding techniques used in data analysis; the ones I learnt about are:
    • One-hot encoding
    • Label encoding
    • Binary encoding
    • Hash encoding
    • Feature scaling
  • One-hot encoding is a technique for converting categorical variables into numerical ones. It creates one new variable per category, which takes the value 1 in rows that have that category and 0 otherwise.
  • Label encoding is another method for converting categorical variables into numerical ones. The difference is that, instead of creating new variables, it assigns each category a single integer, typically based on alphabetical order.
  • Binary encoding is a technique for categorical variables with a large number of categories, which can be a challenge for one-hot encoding (too many new variables) or label encoding. Each category is first assigned an integer, and that integer is written as a binary code of 0s and 1s, one bit per new variable; the length of the code is the number of bits needed to represent the number of categories.
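  • Hash encoding is a technique for categorical variables with a very large number of categories, which can be a challenge even for binary encoding. A hash function maps each category to one of a fixed number of columns, so the number of columns stays the same no matter how many categories appear; the trade-off is that different categories can occasionally collide in the same column.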
  • Feature scaling is a technique for numerical variables, i.e. variables with continuous or discrete numerical values such as age, height, weight, or income. It rescales them to a comparable range, for example to [0, 1] (min-max scaling) or to mean 0 and standard deviation 1 (standardization).
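
One-hot encoding, as described above, can be sketched in plain Python so the mechanics are visible. The function name `one_hot_encode` and the colour values are hypothetical illustrations, not taken from the police-shooting data; on a real DataFrame, pandas' `get_dummies` or scikit-learn's `OneHotEncoder` do this job.

```python
def one_hot_encode(values):
    """One-hot encode a list of categories.

    Returns the sorted category names (one per new column) and, for each
    input value, a list of 0/1 flags with a single 1 in that value's column.
    """
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical toy column (not from the police-shooting data).
colors = ["red", "green", "blue", "green"]
columns, encoded = one_hot_encode(colors)
print(columns)  # ['blue', 'green', 'red']
for color, row in zip(colors, encoded):
    print(color, row)
```

Three categories become three 0/1 columns, and every row has exactly one 1.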
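
Label encoding, with integers assigned in alphabetical order, can be sketched as follows. The function name `label_encode` and the toy values are hypothetical; scikit-learn's `LabelEncoder` also assigns codes in sorted order of the distinct values.

```python
def label_encode(values):
    """Map each category to an integer, assigned in alphabetical order."""
    mapping = {category: code
               for code, category in enumerate(sorted(set(values)))}
    return mapping, [mapping[value] for value in values]


# Hypothetical toy column (not from the police-shooting data).
colors = ["red", "green", "blue", "green"]
mapping, codes = label_encode(colors)
print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}
print(codes)    # [2, 1, 0, 1]
```

One caveat: the integers suggest an order (blue < green < red) that the categories do not really have, and some models may treat that order as meaningful.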
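
Binary encoding (label-encode first, then write each integer in binary, one bit per column) can be sketched like this. The function name `binary_encode` and the region values are hypothetical, and this sketch numbers categories from 0; library implementations may differ in details such as which integer the first category receives.

```python
def binary_encode(values):
    """Binary-encode a list of categories.

    Each category is label-encoded as an integer 0..n-1 (alphabetical
    order), and that integer is written out in binary, one bit per column.
    """
    categories = sorted(set(values))
    codes = {category: code for code, category in enumerate(categories)}
    # Bits needed for the largest code, n - 1 (at least one column).
    width = max(1, (len(categories) - 1).bit_length())
    rows = [[int(bit) for bit in format(codes[value], f"0{width}b")]
            for value in values]
    return width, rows


# Hypothetical toy column with five categories.
regions = ["north", "south", "east", "west", "central"]
width, encoded = binary_encode(regions)
print(width)  # 3
for region, row in zip(regions, encoded):
    print(region, row)
```

Here five categories fit into 3 columns, where one-hot encoding would need 5; the saving grows quickly as the number of categories increases.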
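
Hash encoding can be sketched with a stable hash function that maps each category into a fixed number of columns. The function name `hash_encode`, the choice of MD5, the default of 8 columns, and the city values are all assumptions made for illustration; scikit-learn's `FeatureHasher`, for example, uses a different hash function and output format.

```python
import hashlib


def hash_encode(value, n_columns=8):
    """Hash a category into one of n_columns 0/1 columns.

    The number of columns is fixed in advance, no matter how many
    distinct categories there are, so two categories may share a column.
    """
    # md5 gives the same result on every run; Python's built-in hash()
    # of a str is randomized per interpreter process.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    column = int.from_bytes(digest[:4], "big") % n_columns
    row = [0] * n_columns
    row[column] = 1
    return row


# Hypothetical category values; no lookup table of known categories is needed.
for city in ["Springfield", "Riverside", "Franklin"]:
    print(city, hash_encode(city))
```

Because no mapping table is stored, categories that never appeared in the training data can still be encoded; the cost is that unrelated categories may land in the same column.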
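
Two common forms of feature scaling, min-max scaling and standardization (z-scores), can be sketched with the standard library alone. The function names and the ages are hypothetical; scikit-learn's `MinMaxScaler` and `StandardScaler` implement the same ideas for whole datasets.

```python
import statistics


def min_max_scale(values):
    """Rescale numbers linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def standardize(values):
    """Rescale numbers to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


ages = [20, 30, 40, 50]  # hypothetical ages
print(min_max_scale(ages))  # [0.0, 0.3333333333333333, 0.6666666666666666, 1.0]
print([round(z, 2) for z in standardize(ages)])  # [-1.34, -0.45, 0.45, 1.34]
```

Min-max scaling keeps every value inside [0, 1], while standardization expresses each value as a number of standard deviations from the mean.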
