Today, while checking my analysis and what I have done so far, I realized I had not properly noted down the descriptive statistics. So, I decided to note them down in today’s blog.
- On checking the information for the dataframe, it can be seen that the largest non-null count for any feature is 8002, which essentially indicates that there are 8002 records.
- It can also be seen that not all features have the same number of non-null values. This indicates missing values. So, next I checked the total number of missing values and got the following result:
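A minimal sketch of this kind of check, assuming the analysis is done in pandas. The small dataframe below is a made-up stand-in, not the real dataset, and only the column names ‘race’, ‘flee’ and ‘age’ come from this post:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; every value here is invented for illustration.
df = pd.DataFrame({
    "age": [25, 40, np.nan, 33],
    "race": ["W", None, None, "B"],
    "flee": ["Not fleeing", "Car", None, "Foot"],
})

# Prints the non-null count of every column; the largest count is the number of records
# when at least one column is fully populated.
df.info()

# Number of missing values per column, largest first.
missing = df.isnull().sum().sort_values(ascending=False)
print(missing)
```

Sorting the counts makes it easy to see at a glance which features are most affected.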
From this, I observed that the ‘race’ feature had the maximum number of missing values. This is followed by ‘flee’.
- Next, I used the ‘describe()’ function. It displayed the following:
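For reference, a small sketch of what such a call looks like, again assuming pandas and using invented numbers rather than the real data. By default `describe()` summarises only the numeric columns, so identifier-like columns are included as well:

```python
import pandas as pd

# Toy data; the values are invented and only illustrate the shape of the output.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "age": [20, 30, 40, 50],
})

# count, mean, std, min, quartiles and max for each numeric column.
summary = df.describe()
print(summary)

# Individual statistics can be picked out by row label and column name.
print(summary.loc[["mean", "std"], "age"])
```

The `std` row is the sample standard deviation (dividing by n - 1), which is worth keeping in mind when comparing it with values computed elsewhere.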
The descriptions of the ‘id’, ‘longitude’ and ‘latitude’ features do not help much. The description for age shows a mean of 37.209 and a standard deviation of 12.979.
- To visualize the skewness of the age distribution, I created a plot.
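Alongside a plot, the direction of skew can also be checked numerically. The sketch below assumes pandas and uses a made-up age sample, not the real data. A positive value from `skew()` points to a longer right tail and a negative value to a longer left tail:

```python
import pandas as pd

# Invented ages with one large value, purely for illustration.
ages = pd.Series([20, 22, 25, 27, 30, 60], name="age")

skewness = ages.skew()      # sample skewness; missing values are skipped by default
mean_age = ages.mean()
median_age = ages.median()

print(f"skewness={skewness:.3f}, mean={mean_age:.2f}, median={median_age:.2f}")
```

In this toy sample the mean sits above the median, which is the usual pattern when the tail extends to the right.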
The data appears to be left skewed, with its mean being 37.2. Most ages lie between 27 years and 45 years.
- On creating a bar plot to view the distribution of ‘manner of death’, it can be seen that most deaths involved shooting alone, with barely 4.2% of the victims being tasered and then shot.
- The bar plot of gender distribution shows that the majority of the victims are male, with fewer than 1000 female victims.
- The boxplot of race distribution shows that the largest share of the victims are White (41%), followed by 22% Black victims and 15% Hispanic victims. We have to remember that we have over 1000 missing values for race, so the apparent strong bias towards White victims should be read with that in mind.
- On checking the statistics for ‘weapon type’, that is, the weapon the victim/fugitive possessed, it shows that over 4000 of them had a gun, while other types of weapons in possession account for 1200.
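The category percentages quoted above depend on whether missing values are counted in the denominator. Here is a minimal sketch of both options, assuming pandas and a toy ‘race’ column with invented values:

```python
import pandas as pd

# Invented values; None marks a missing entry.
race = pd.Series(["W", "W", "W", "B", None], name="race")

# Raw counts per category (missing values are excluded by default).
counts = race.value_counts()

# Percentages among the known values only.
shares = race.value_counts(normalize=True) * 100

# Percentages over all records, with missing values as their own category.
shares_with_missing = race.value_counts(normalize=True, dropna=False) * 100

print(counts)
print(shares.round(1))
print(shares_with_missing.round(1))
```

With many missing values, the two sets of percentages can differ noticeably, so it helps to state which one is being reported.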