Tutorial 5: Linear Classifiers
0. Background Reading: The Iris Dataset
Before starting this assignment, please read about the Iris dataset to understand its origin and significance in machine learning. For more information, visit:
1. Load the Iris Dataset
Use the following code snippet to load the Iris dataset. We will filter the dataset to keep only two classes, class 0 (setosa) and class 1 (versicolor), for binary classification.
Note: You will need to install the scikit-learn library to run this code. You can install it using the following command:
```shell
pip install scikit-learn
```

```python
from sklearn.datasets import load_iris

# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Filter to use only two classes (class 0 and class 1)
mask = y < 2
X = X[mask]  # shape (100, 4) after filtering
y = y[mask]  # only labels 0 and 1 remain
```
## ADD CODE
2. Generate Training and Test Datasets
Split the dataset into training and test sets, using 80% of the examples (together with their labels) for training and the remaining 20% for testing. Do not use scikit-learn’s train_test_split; instead, perform the split with NumPy.
Hint: You can use np.random.permutation to shuffle indices and then split them.
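One way to follow the hint is sketched below. It is a minimal sketch, assuming a hypothetical helper name `split_train_test` and a `seed` parameter added for reproducibility; it uses NumPy's `Generator.permutation`, which plays the same role as `np.random.permutation`. The key point is that `X` and `y` are indexed with the same shuffled indices, so every example keeps its own label.

```python
import numpy as np


def split_train_test(X, y, train_frac=0.8, seed=0):
    """Shuffle the example indices and split them into train/test parts.

    Hypothetical helper for illustration; the name and the `seed`
    parameter are assumptions, not part of the assignment.
    """
    rng = np.random.default_rng(seed)          # seeded generator for reproducibility
    n = X.shape[0]
    idx = rng.permutation(n)                   # random ordering of 0..n-1
    n_train = int(round(train_frac * n))       # number of training examples
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    # Index X and y with the same indices so each example keeps its label.
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Small usage example on toy data (10 examples, 2 features).
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)
X_train, X_test, y_train, y_test = split_train_test(X_toy, y_toy)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

With the 100 filtered Iris examples, `int(round(0.8 * 100))` gives 80 training and 20 test examples.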
## ADD CODE
3. Implement the Perceptron Algorithm
Copy the train_perceptron function from Linear Classification Algorithms.
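For reference, the classic perceptron update rule can be sketched as follows. This is a hypothetical stand-in named `perceptron_sketch`, not the tutorial's `train_perceptron`: it assumes labels in {-1, +1}, a separate bias `b`, and a `max_epochs` cap, and it returns `(w, b)`; the original function may differ in any of these. Since the Iris labels are 0 and 1, under this assumption you would map them first with `2 * y - 1`.

```python
import numpy as np


def perceptron_sketch(X, y, max_epochs=100):
    """Classic perceptron with a bias term.

    Hypothetical stand-in for train_perceptron: assumes labels in {-1, +1}
    and returns (w, b). The real function may use a different signature.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            # Misclassified (or exactly on the boundary) if y_i * (w.x_i + b) <= 0.
            if y_i * (np.dot(w, x_i) + b) <= 0:
                w += y_i * x_i     # move the boundary toward the correct side
                b += y_i
                mistakes += 1
        if mistakes == 0:          # a full pass without mistakes: converged
            break
    return w, b


# Toy linearly separable data with labels in {-1, +1}.
X = np.array([[2, 2], [3, 1], [1, 3], [-2, -2], [-1, -3], [-3, -1]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])
w, b = perceptron_sketch(X, y)
print(w, b)  # [2. 2.] 1.0
```

On these six points, the single update triggered by the first point already separates the data, so the second pass makes no mistakes and the loop stops.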
## ADD CODE
4. Train and Evaluate the Classifier
Apply the train_perceptron function to the training data and labels. Then output the accuracy of your perceptron-based classifier on the test data.
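Accuracy is the fraction of test examples whose predicted label equals the true label. The sketch below assumes the perceptron produced a weight vector `w` and a bias `b` and that labels are in {-1, +1}; `predict_linear` and `accuracy` are hypothetical helper names, and assigning +1 to points lying exactly on the boundary is an arbitrary tie-breaking choice.

```python
import numpy as np


def predict_linear(X, w, b):
    """Predict labels in {-1, +1}; points with score exactly 0 get +1 (tie-breaking assumption)."""
    scores = X @ w + b
    return np.where(scores >= 0, 1, -1)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return float(np.mean(y_true == y_pred))


# Toy weights and test data (stand-ins for the trained perceptron and the real test set).
w = np.array([1.0, 1.0])
b = 0.0
X_test = np.array([[1.0, 2.0], [-1.0, -1.0], [2.0, -3.0], [0.5, 0.5]])
y_test = np.array([1, -1, 1, 1])
y_pred = predict_linear(X_test, w, b)
print(accuracy(y_test, y_pred))  # 0.75
```

Here the third point has score -1 but true label +1, so three of the four predictions are correct. If your `train_perceptron` instead returns a single weight vector with the bias folded in, append a column of ones to `X_test` and drop `b`.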
## ADD CODE
5. Implement and Evaluate a Random Classifier
Implement a random classifier that ignores the features and predicts a random class label for each data point, then output its accuracy on the test data. Note that this is not the train_random_search function from Linear Classification Algorithms: no weights are learned, and each prediction is simply drawn at random.
Hint: You can use np.random.choice to assign random labels.
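A minimal sketch of the random baseline follows, assuming a hypothetical `random_classifier` helper that draws each label uniformly from the classes seen in training, using NumPy's `Generator.choice` (the counterpart of `np.random.choice`). Because each guess is uniform over k classes, the expected accuracy is 1/k, so about 0.5 for a binary problem; the value you actually observe varies with the seed and the test-set size.

```python
import numpy as np


def random_classifier(X, labels, rng):
    """Ignore the features in X and draw one label per row uniformly from `labels`."""
    return rng.choice(labels, size=X.shape[0])


rng = np.random.default_rng(0)          # seeded so the run is reproducible
y_train = np.array([0, 1, 1, 0, 1])     # toy training labels
X_test = np.zeros((20, 4))              # features are never looked at
y_test = np.array([0, 1] * 10)          # toy test labels
labels = np.unique(y_train)             # the classes seen in training: [0 1]
y_rand = random_classifier(X_test, labels, rng)
random_acc = float(np.mean(y_rand == y_test))
print(f"random-classifier accuracy: {random_acc:.2f}")
```

Taking the label set from `y_train` rather than `y_test` keeps the baseline from peeking at the test labels.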
## ADD CODE
6. Calculate Margins
Using the calculate_margin function from Linear Classification Algorithms, compute the margin of each test example with respect to the trained perceptron classifier, then output the minimum margin over the test set.
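The sketch below assumes that `calculate_margin` computes the geometric margin y_i (w · x_i + b) / ||w||, i.e. the signed distance of each point to the decision boundary; check this against the definition in Linear Classification Algorithms, which may, for example, omit the division by ||w||. `geometric_margins` is a hypothetical name. A negative minimum margin means at least one test point lies on the wrong side of the boundary.

```python
import numpy as np


def geometric_margins(X, y, w, b):
    """Signed distance of each point to the hyperplane w.x + b = 0.

    Assumes labels in {-1, +1}; a negative value means the point is misclassified.
    """
    return y * (X @ w + b) / np.linalg.norm(w)


w = np.array([3.0, 4.0])   # ||w|| = 5
b = 0.0
X_test = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
y_test = np.array([1, -1, 1])
margins = geometric_margins(X_test, y_test, w, b)
print(float(margins.min()))  # -0.8
```

Here the margins are 3/5, -4/5 and 14/5; the second point is misclassified, which is why the minimum is negative.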
## ADD CODE
7. Another Dataset: The Breast Cancer Wisconsin (Diagnostic) Dataset
Before beginning, read about the dataset to understand its context and significance. Here are some resources:
Repeat each of the above exercises on this dataset. You should re-use the functions defined above. Note that this dataset is already binary, so you do not need to filter the classes.
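The breast cancer data ships with scikit-learn: `X, y = load_breast_cancer(return_X_y=True)` (from `sklearn.datasets`) returns 569 examples with 30 features, labelled 0 for malignant and 1 for benign. The sketch below bundles steps 2 to 6 into one hypothetical `run_experiment` function so the same pipeline can be applied to either dataset. The code inside stands in for your own `train_perceptron` and `calculate_margin`, under the same assumptions as before (labels mapped to {-1, +1}, geometric margin). The `max_epochs` cap matters here: the perceptron only stops on its own when the training data is linearly separable, and this dataset may not be.

```python
import numpy as np


def run_experiment(X, y, train_frac=0.8, max_epochs=100, seed=0):
    """Steps 2-6 on one binary dataset whose labels are 0 and 1.

    Hypothetical helper: the perceptron and margin code inside are
    stand-ins for the tutorial's train_perceptron and calculate_margin.
    """
    rng = np.random.default_rng(seed)

    # Step 2: shuffle the indices and split 80/20.
    idx = rng.permutation(X.shape[0])
    n_train = int(round(train_frac * X.shape[0]))
    tr, te = idx[:n_train], idx[n_train:]
    y_pm = 2 * y - 1                        # assumption: map {0, 1} -> {-1, +1}
    X_tr, X_te, y_tr, y_te = X[tr], X[te], y_pm[tr], y_pm[te]

    # Step 3: perceptron; stop after a mistake-free pass or after max_epochs,
    # because it never converges on non-separable data.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X_tr, y_tr):
            if y_i * (x_i @ w + b) <= 0:
                w += y_i * x_i
                b += y_i
                mistakes += 1
        if mistakes == 0:
            break

    def predict(A):
        # Points exactly on the boundary are assigned +1.
        return np.where(A @ w + b >= 0, 1, -1)

    # Step 4: accuracy of the perceptron.
    train_acc = float(np.mean(predict(X_tr) == y_tr))
    test_acc = float(np.mean(predict(X_te) == y_te))

    # Step 5: random baseline that ignores the features.
    y_rand = rng.choice(np.array([-1, 1]), size=len(y_te))
    random_acc = float(np.mean(y_rand == y_te))

    # Step 6: geometric margins of the test points (assumed definition).
    margins = y_te * (X_te @ w + b) / np.linalg.norm(w)
    return {"train_acc": train_acc, "test_acc": test_acc,
            "random_acc": random_acc, "min_margin": float(margins.min())}


# Toy separable data: 9 points with x1 + x2 >= 4 and their mirror images.
pos = np.array([[i, j] for i in (2, 3, 4) for j in (2, 3, 4)], dtype=float)
X_toy = np.vstack([pos, -pos])
y_toy = np.array([1] * 9 + [0] * 9)
print(run_experiment(X_toy, y_toy))
```

The toy data is linearly separable, so the training accuracy reaches 1.0. On the real dataset, call `run_experiment(X, y)` after loading it; its features lie on very different scales, so the results may be sensitive to `max_epochs`.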
## ADD CODE