Today I went a little more in-depth into how to do Cross-Validation and into understanding the Bootstrap.
- To start cross-validation, the data first has to be divided into 2 parts, i.e. the training data and the testing data.
- To do this we first need to decide how many subsets or folds (k) we will split the total data into; the training data would then be k-1 folds and the testing data would be the remaining 1 fold.
- Next we need to select a performance metric. Performance metrics (e.g. accuracy for classification, mean squared error for regression) quantify how well the model's predictions match the held-out fold, which is how we evaluate our model.
- We would then repeat this process k times, holding out a different fold each time, and take the average of the performance metric across the folds; that average is the estimate of the model's overall performance (see the cross-validation sketch after this list).
- By conducting Cross-Validation we obtain an estimate of the test error.
- From my understanding, Cross-Validation samples the data without replacement, whereas the Bootstrap samples with replacement.
- In cross-validation we train on a number of folds but test on only 1 fold at a time, which works well when we have a large amount of data. The Bootstrap is better when we have less data: by repeatedly resampling with replacement from the observed (limited) data, we can approximate the sampling distribution of a statistic (see the bootstrap sketch after this list).
- Coming back to the project, now that I have a better understanding of the concepts, I believe Bootstrapping would better serve our model built on the CDC data, as it has only 300-odd data points.
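
Here is a minimal sketch of the k-fold procedure described above, assuming scikit-learn is available; the linear model, the mean-squared-error metric, and the randomly generated data are stand-ins I made up for illustration, not our actual pipeline.

```python
# A minimal k-fold cross-validation sketch (toy data, hypothetical model/metric).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))                                    # stand-in features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=300)

k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=0)
fold_scores = []
for train_idx, test_idx in kf.split(X):
    # Train on k-1 folds, test on the remaining held-out fold.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    fold_scores.append(mean_squared_error(y[test_idx], preds))

# The average over all k folds is the cross-validation estimate of test error.
cv_estimate = np.mean(fold_scores)
print(f"{k}-fold CV estimate of test MSE: {cv_estimate:.4f}")
```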
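
And a minimal bootstrap sketch, assuming only NumPy; the generated sample of 300 points stands in for the real CDC data, and the statistic (the mean) is just an example of what we might want an interval for.

```python
# A minimal percentile-bootstrap sketch (toy sample, hypothetical statistic).
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=300)   # stand-in for the ~300 CDC points

n_boot = 10_000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    # Resample WITH replacement, same size as the original sample.
    resample = rng.choice(data, size=data.size, replace=True)
    boot_means[b] = resample.mean()

# The spread of the resampled statistics approximates the sampling
# distribution of the mean, giving a 95% percentile confidence interval.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"Sample mean: {data.mean():.3f}")
print(f"95% bootstrap CI for the mean: ({lo:.3f}, {hi:.3f})")
```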