[Project 2] Day 9: Descriptive Statistics of Data

Today, while checking my analysis and what I have done so far, I realized I had not properly noted down the descriptive statistics. So, I decided to note them down in today’s blog.

  • On checking the information for the dataframe, it can be seen that the highest non-null count for any feature is 8002, which essentially indicates that there are 8002 records.

  • It can also be seen that not all features have the same number of non-null values. This indicates missing values. So, next I checked the total number of missing values and got the following result:
    From this, I observed that the ‘race’ feature had the maximum number of missing values. This is followed by ‘flee’.
  • Next, I used the ‘describe()’ function. It displayed the following:

    The descriptions of the ‘id’, ‘longitude’ and ‘latitude’ features do not help much. The description for age shows a mean of 37.209 and a standard deviation of 12.979.
  • To visualize the skewness of the age distribution I created a plot.

    The data appears to be left skewed, with the mean being 37.2. Most of the ages lie between 27 years and 45 years.
  • On creating a bar plot to view the distribution of ‘manner of death’, it can be seen that most deaths involve shooting only, with barely 4.2% of the victims being tasered and then shot.

  • The barplot of gender distribution shows that the majority of the victims are male, with fewer than 1000 female victims.
  • The boxplot of race distribution shows that the largest share of the victims are White (41%), followed by Black (22%) and Hispanic (15%) victims. This would indicate a strong bias towards White victims, but we have to remember that we have over 1000 missing values for race.

  • On checking the statistics for ‘weapon type’, which records what the victim/fugitive possessed, it shows that over 4000 of them had a gun, while other types of weapons in possession account for 1200.
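
For reference, the checks described above can be sketched in pandas roughly as follows. This is only a minimal sketch, not the exact notebook code: the small dataframe is made up purely for illustration, and the column names (‘age’, ‘gender’, ‘race’, ‘flee’, ‘manner_of_death’) are assumptions about how the features are named. A positive value from skew() points to a longer right tail and a negative value to a longer left tail, so it is a quick numeric cross-check on what the age plot suggests.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real dataset; values and column names are
# illustrative assumptions, not the actual records.
df = pd.DataFrame({
    "age": [23, 31, 35, 40, 52, np.nan, 67, 29],
    "gender": ["M", "M", "F", "M", "M", "M", "F", "M"],
    "race": ["W", "B", None, "H", "W", None, "W", "B"],
    "flee": ["Not fleeing", "Car", None, "Foot",
             "Not fleeing", "Not fleeing", "Car", "Foot"],
    "manner_of_death": ["shot", "shot", "shot and Tasered", "shot",
                        "shot", "shot", "shot", "shot"],
})

# Non-null counts per column; the largest count equals the number of records
# only if at least one column has no missing values.
df.info()

# Total missing values per feature, largest first.
missing = df.isnull().sum().sort_values(ascending=False)
print(missing)

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
print(df.describe())

# Numeric check of skewness for age: > 0 is right skewed, < 0 is left skewed.
age_skew = df["age"].skew()
print("age skewness:", age_skew)

# Percentage share of each category (missing values excluded by default).
manner_share = df["manner_of_death"].value_counts(normalize=True) * 100
gender_counts = df["gender"].value_counts()

# Race shares computed two ways, to see how much missing values matter.
race_share_known = df["race"].value_counts(normalize=True) * 100
race_share_all = df["race"].value_counts(normalize=True, dropna=False) * 100
print(race_share_known)
print(race_share_all)
```

Comparing race_share_known with race_share_all shows how the percentages shift depending on whether the missing ‘race’ values are left out or counted as their own group, which is worth keeping in mind when reading the race percentages.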
