Data Science Lesson 2: Pre-processing and cleaning data

In this lesson you will work with a Kaggle notebook to explore the Edexcel large data set. You will meet some examples of how data needs to be pre-processed or cleaned before you can analyse it. You will also see how effective code can be at automating many of these processes.

In this activity you will use the Edexcel large data set to explore the main differences in weather between 1987 and 2015 at different locations. You will use a Kaggle notebook and be introduced to some of the ideas behind pre-processing and cleaning data so that it can be analysed.
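To give a feel for what "cleaning" means here, the sketch below turns raw text cells into numbers a computer can average. It assumes, for illustration, that trace rainfall is recorded as "tr" and missing readings as "n/a". Treating "tr" as 0.0 is a choice you could argue about. The column names and the rows of data are hypothetical and made up; the real notebook will probably use pandas, but this version uses only plain Python so it runs anywhere.

```python
import csv
import io

# A few made-up rows in the style of the large data set.
# Column names and values are hypothetical, for illustration only.
RAW = """Location,Date,DailyTotalRainfall_mm,DailyMeanTemp_C
Heathrow,01/05/1987,tr,11.2
Heathrow,02/05/1987,n/a,12.0
Heathrow,03/05/1987,4.6,n/a
Camborne,01/05/2015,0.8,10.4
"""

def clean_value(text):
    """Convert one raw cell to a float, or None if the reading is missing.

    Assumption: 'tr' (a trace of rain) is treated as 0.0 and
    'n/a' or an empty cell marks a missing reading.
    """
    text = text.strip().lower()
    if text == "tr":
        return 0.0
    if text in ("n/a", ""):
        return None
    return float(text)

def clean_rows(raw_csv):
    """Read CSV text and return a list of cleaned dictionaries."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        cleaned.append({
            "Location": row["Location"].strip(),
            # Keep only the year from a dd/mm/yyyy date.
            "Year": int(row["Date"].split("/")[-1]),
            "Rainfall_mm": clean_value(row["DailyTotalRainfall_mm"]),
            "MeanTemp_C": clean_value(row["DailyMeanTemp_C"]),
        })
    return cleaned

rows = clean_rows(RAW)
for r in rows:
    print(r)
```

Keeping missing readings as None, rather than replacing them with 0, means they can be skipped later instead of dragging an average down.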

This example demonstrates how effective code is for processing data sets. Run the example in the notebook and then read through the code: you don't need to understand every line, but you should be able to get a sense of what it is doing.
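As a rough idea of the kind of automation involved, the sketch below averages already-cleaned readings for every location and year in one pass, then reports the change from 1987 to 2015. The readings are made-up values, not real results from the data set, and the function name is hypothetical. The same few lines would work unchanged whether there are seven readings or several thousand, which is where code saves time over working by hand.

```python
from collections import defaultdict

# Hypothetical, already-cleaned readings: (location, year, daily mean temp in °C).
# None marks a missing reading. All values are made up for illustration.
readings = [
    ("Heathrow", 1987, 11.2),
    ("Heathrow", 1987, 12.0),
    ("Heathrow", 1987, None),
    ("Heathrow", 2015, 13.1),
    ("Heathrow", 2015, 12.5),
    ("Camborne", 1987, 10.4),
    ("Camborne", 2015, 11.0),
]

def mean_by_location_and_year(data):
    """Average the non-missing values for every (location, year) pair."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for location, year, value in data:
        if value is None:  # skip missing readings rather than counting them as 0
            continue
        totals[(location, year)] += value
        counts[(location, year)] += 1
    return {key: totals[key] / counts[key] for key in totals}

means = mean_by_location_and_year(readings)
for (location, year), value in sorted(means.items()):
    print(f"{location} {year}: {value:.2f} °C")

# Compare the two years at each location that has data for both.
for location in sorted({loc for loc, _ in means}):
    if (location, 1987) in means and (location, 2015) in means:
        change = means[(location, 2015)] - means[(location, 1987)]
        print(f"{location}: change {change:+.2f} °C")
```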

