1: Load, Extract, and Clean the Data

Objective

In this lab, we will:

  • Download and extract the Bike Sharing Dataset from the UCI Machine Learning Repository.
  • Clean and prepare the data for analysis.
  • Understand the data through visualization and summary statistics.
  • Store the cleaned data for future use.

Guide

Step 1 - Find and Open the Jupyter Notebook

In the directory "workshop_materials/bike_demand_forecasting", find the notebook "01_data_exploration.ipynb" and open it.

Step 2 - Download the dataset into the environment

The bike sharing data is available at the following link:

https://archive.ics.uci.edu/static/public/275/bike+sharing+dataset.zip

Assign this URL to the variable "DATASET_URL" at the beginning of the notebook (copy and paste the link).
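
For orientation, the assignment and a possible download step might look like the sketch below. Only "DATASET_URL" and the link come from this guide; the helper names "extract_zip_bytes" and "download_and_extract" and the "data/raw" target folder are hypothetical, and the notebook's own cells may do this differently.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

# The URL from this guide, assigned to the variable the notebook expects.
DATASET_URL = "https://archive.ics.uci.edu/static/public/275/bike+sharing+dataset.zip"


def extract_zip_bytes(data: bytes, target_dir: str) -> list[str]:
    """Extract an in-memory ZIP archive into target_dir and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


def download_and_extract(url: str, target_dir: str) -> list[str]:
    """Download the archive at url and extract it into target_dir."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return extract_zip_bytes(response.read(), target_dir)


# Example call (needs network access); "data/raw" is a hypothetical location:
# files = download_and_extract(DATASET_URL, "data/raw")
```

Splitting the extraction from the download keeps the network call in one place, so the extraction logic can be tried on any local ZIP file first.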

Please follow the instructions inside the notebook and execute each code cell to explore, clean, and preprocess the dataset. The final cleaned dataset will be saved in the data/processed directory.
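
The notebook's cells define the actual cleaning steps. As a rough stand-alone illustration of what "clean and save to data/processed" can mean, the sketch below drops incomplete and duplicate rows from a CSV file using only the Python standard library. The function names and the file names "hour.csv" and "hour_clean.csv" are assumptions for illustration, not taken from the notebook.

```python
import csv
from pathlib import Path


def clean_rows(rows):
    """Drop rows with missing or empty fields and exact duplicates (first occurrence kept)."""
    seen = set()
    cleaned = []
    for row in rows:
        # Non-string values appear when a row has too few or too many fields.
        if any(not isinstance(v, str) or v.strip() == "" for v in row.values()):
            continue
        key = tuple(row.items())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


def clean_csv(source_path, output_path):
    """Read source_path, clean its rows, write them to output_path, return the row count."""
    with open(source_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = clean_rows(reader)
    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)  # creates data/processed if missing
    with open(out, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Example call with hypothetical paths:
# kept = clean_csv("data/raw/hour.csv", "data/processed/hour_clean.csv")
```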

When you are finished with the notebook, continue to the next exercise, "Prepare Data for Training".