[Project 1] Day 7: Intro to Resampling Methods

Today I started learning about resampling techniques, in particular cross-validation.

  • What is Resampling? From what I understood, since sampling is the process of data collection, resampling means running repeated tests on the same sample, or creating new samples on the basis of the first observed sample.
  • Why do we use Resampling? When we create prediction models on some data, it is always good to test them on new data. But since we may not always have new data, we can use resampling methods to reuse the data we already have, holding out or redrawing parts of it to stand in for new data.
  • The main use of cross-validation is to estimate our prediction model's test error, which shows whether the model is overfitting.
    * Test Error is the mean (avg.) error that comes from predicting on new data, while Training Error is the error computed when predicting on the same data that was used to fit the model.
    * What is Overfitting? When we conduct our regression analysis, if we ‘fit’ the line extremely close to certain data points, the model is said to be overfit. The result is a model that fits only this initial data and does not give good predictions for other data.
  • In cross-validation we divide the data into 2 parts: the training data and the validation data. In simple words, the training data is used to ‘train’ or fit the model, and this fitted model is then used to try to predict the outcomes in the validation data.
  • With regard to the current project on the CDC data, I'm still considering whether I want to use this approach rather than the bootstrap.
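
To make the training/validation split described above concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the data is made up (it is not the CDC data), the model is a simple straight-line fit, and the names `fit_line`, `mean_squared_error` and `holdout_validation` are hypothetical helpers chosen for this example. It shows the single-split (hold-out) form of the idea; k-fold cross-validation repeats the same steps over several different splits.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mean_squared_error(a, b, xs, ys):
    """Average squared difference between observed y and predicted a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def holdout_validation(xs, ys, validation_fraction=0.3, seed=0):
    """Split the data once, fit on the training part, score on both parts."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rng.shuffle(indices)  # shuffle so the split does not depend on row order

    n_val = round(len(xs) * validation_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    train_x = [xs[i] for i in train_idx]
    train_y = [ys[i] for i in train_idx]
    val_x = [xs[i] for i in val_idx]
    val_y = [ys[i] for i in val_idx]

    # Fit using the training data only.
    a, b = fit_line(train_x, train_y)

    return {
        "n_train": len(train_x),
        "n_validation": len(val_x),
        # Error on the data the model was fitted to.
        "train_error": mean_squared_error(a, b, train_x, train_y),
        # Error on held-out data, used as an estimate of the test error.
        "validation_error": mean_squared_error(a, b, val_x, val_y),
    }


# Made-up data (an assumption for illustration): a noisy straight line.
data_rng = random.Random(42)
xs = [float(i) for i in range(40)]
ys = [2.0 + 0.5 * x + data_rng.gauss(0, 1.0) for x in xs]

result = holdout_validation(xs, ys)
print(result)
```

If the validation error comes out much larger than the training error, that gap is the usual sign of overfitting. With a model as simple as a straight line the two numbers should normally be of similar size.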
