Today I attempted to build a random forest model to predict mental illness based on the fatal police shootings data.
- Random Forest is a powerful ensemble machine learning technique used widely for both classification and regression. It belongs to the family of decision-tree-based algorithms, which are known for their robustness and adaptability. What sets Random Forest apart is its ability to reduce overfitting and high variance, two major weaknesses of individual decision trees.
- Technically, Random Forest is built on a collection of decision trees, hence the name “forest.” The algorithm works in the following steps (a from-scratch sketch of all four appears after this list):
1. Bootstrapping: As I learned previously, bootstrapping is a technique in which we create several subsets, called bootstrap samples, by randomly sampling the dataset with replacement. Random Forest trains each decision tree on one of these samples, which adds diversity.
2. Feature Randomization: To increase diversity further, Random Forest considers only a random subset of the features at each split. This ensures that no single feature dominates the decision-making process and lowers the correlation between the trees.
3. Decision Tree Construction: Each tree in the forest is grown with the standard decision tree procedure, splitting nodes on the randomly selected features so as to maximize information gain or minimize impurity.
4. Voting: For classification problems, Random Forest aggregates the predictions of the individual trees by majority vote; for regression tasks, it averages the tree predictions.
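
To make those four steps concrete, here is a minimal from-scratch sketch in Python using NumPy and scikit-learn. It runs on synthetic stand-in data (not the actual shootings dataset) and is illustrative only; in practice you would reach for scikit-learn's built-in `RandomForestClassifier`, shown further below.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Synthetic stand-in data; the real project would use the shootings dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

n_trees = 25
trees = []
for _ in range(n_trees):
    # Step 1, Bootstrapping: sample row indices with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # Steps 2 and 3, Feature Randomization + Tree Construction:
    # max_features="sqrt" makes each split consider a random subset of features.
    tree = DecisionTreeClassifier(
        max_features="sqrt", random_state=int(rng.integers(1_000_000))
    )
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Step 4, Voting: majority vote across the trees' predictions.
all_preds = np.stack([t.predict(X) for t in trees])   # shape (n_trees, n_samples)
majority = (all_preds.mean(axis=0) >= 0.5).astype(int)  # works for 0/1 labels
print("Training accuracy of the hand-rolled forest:", (majority == y).mean())
```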
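
And here is how the same idea looks with scikit-learn's `RandomForestClassifier`, which is the practical way to build today's model. The file name and column names below (`age`, `gender`, `race`, `armed`, `flee`, `signs_of_mental_illness`) are assumptions for illustration and may not match the actual cleaned dataset.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical file name and column names; adjust to the actual dataset.
df = pd.read_csv("fatal_police_shootings.csv")

features = ["age", "gender", "race", "armed", "flee"]  # assumed columns
X = pd.get_dummies(df[features], dummy_na=True)        # one-hot encode categoricals
X["age"] = X["age"].fillna(X["age"].median())          # simple imputation for age
y = df["signs_of_mental_illness"].astype(int)          # assumed boolean target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```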