[Project 2] Day 2: Initiation of data exploration

Today I started my exploration of the ‘fatal police shootings’ data.

  • The first thing I did was load the two CSV files, ‘fatal-police-shootings-data’ and ‘fatal-police-shootings-agencies’, into a Jupyter notebook.
  • The ‘fatal-police-shootings-data’ dataframe has 8770 instances and 19 features, while the ‘fatal-police-shootings-agencies’ dataframe has 3322 instances and 5 features.
  • On reading the column descriptions given on GitHub, I realized that the ‘ids’ column in the ‘fatal-police-shootings-agencies’ dataframe holds the same identifiers as ‘agency_ids’ in the ‘fatal-police-shootings-data’ dataframe.
  • Hence, I renamed the column from ‘ids’ to ‘agency_ids’ in the ‘fatal-police-shootings-agencies’ dataframe.
  • Next, I tried to merge the two dataframes on the ‘agency_ids’ column. However, I got an error stating that I could not merge on a column with two different data types.
  • On checking the data types of the columns with the ‘.info()’ method, I learnt that the column was of type object in one dataframe and of type int64 in the other.
  • To rectify this, I used the ‘pd.to_numeric()’ function to try to make both columns of type ‘int64’.
  • I then tried the merge again, but I am currently getting an error because the ‘agency_ids’ column in the ‘fatal-police-shootings-data’ dataframe can hold multiple IDs in a single instance (or cell).
  • I am currently trying to split these cells into multiple rows.
  • Once I split the cells, I will go further into the data exploration and start the data preprocessing.
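
The steps above (rename, split multi-ID cells into rows, align the data types, merge) could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the two small dataframes are made-up stand-ins for the real CSVs, the helper name `split_ids` is hypothetical, and the IDs inside a cell are assumed to be separated by a semicolon, which should be checked against the actual file.

```python
import pandas as pd

# Toy stand-ins for the two CSVs (made-up values, not real records).
shootings = pd.DataFrame({
    "id": [1, 2, 3],
    "agency_ids": ["10", "20;30", None],  # one cell holds two IDs, one is missing
})
agencies = pd.DataFrame({
    "ids": [10, 20, 30],
    "name": ["Agency A", "Agency B", "Agency C"],
})

# Give the key column the same name in both dataframes.
agencies = agencies.rename(columns={"ids": "agency_ids"})


def split_ids(df, column="agency_ids", sep=";"):
    """Hypothetical helper: one row per ID, with the ID stored as a nullable integer.

    Assumes multiple IDs in a cell are separated by `sep`.
    """
    out = df.copy()
    out[column] = out[column].str.split(sep)  # "20;30" -> ["20", "30"]
    out = out.explode(column)                 # one list element per row
    # Convert the ID strings to numbers; anything unparseable becomes missing.
    numeric = pd.to_numeric(out[column].str.strip(), errors="coerce")
    out[column] = numeric.astype("Int64")     # nullable integer keeps missing IDs
    return out.reset_index(drop=True)


shootings_long = split_ids(shootings)

# Use the same nullable integer type on the agencies side before merging.
agencies["agency_ids"] = agencies["agency_ids"].astype("Int64")

# A left merge keeps shootings whose agency ID is missing or unmatched.
merged = shootings_long.merge(agencies, on="agency_ids", how="left")
print(merged)
```

Splitting before converting matters here: a cell with several IDs cannot be turned into a single integer, so the conversion is applied only after each ID sits in its own row. A shooting that involves two agencies then appears twice in the merged result, which needs to be kept in mind when counting incidents later.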
