Tutorial 5: Linear Classifiers#

0. Background Reading: The Iris Dataset#

Before starting this assignment, please read about the Iris dataset to understand its origin and significance in machine learning. For more information, visit:

1. Load the Iris Dataset#

Use the following code snippet to load the Iris dataset. We will filter the dataset to use only two classes (class 0 and class 1) for binary classification.

Note: You will need to install the scikit-learn library to run this code. You can install it using the following command:

pip install scikit-learn
from sklearn.datasets import load_iris

# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Filter to use only two classes (class 0 and class 1)
mask = y < 2
X = X[mask]
y = y[mask]

## ADD CODE

2. Generate Training and Test Datasets#

Split the dataset into training and test sets, where 80% of the data/labels are used for training and 20% for testing. Do not use scikit-learn’s train_test_split; instead, perform the split using NumPy.

Hint: You can use np.random.permutation to shuffle indices and then split them.

## ADD CODE

3. Implement the Perceptron Algorithm#

Copy the train_perceptron function from Linear Classification Algorithms.

## ADD CODE

4. Train and Evaluate the Classifier#

Apply the train_perceptron function to the train data and labels. Then output the accuracy of your perceptron-based classifier on the test data.

## ADD CODE

5. Implement and Evaluate a Random Classifier#

Implement a random classifier that randomly predicts the class label. Evaluate the accuracy of the random classifier on the test data. Note that this is not the train_random_search function from Linear Classification Algorithms. Instead, the random classifier should randomly predict a label for each data point. You should implement the random classifier and output the accuracy of the random classifier on the test data.

Hint: You can use np.random.choice to assign random labels.

## ADD CODE

6. Calculate Margins#

Calculate the margin for each test data point (use the calculate_margin function from Linear Classification Algorithms). Compute the margin for each test example for the perceptron classifier and determine and output the minimum margin over the test set.

## ADD CODE

7. Another Dataset: The Breast Cancer Wisconsin (Diagnostic) Dataset#

Before beginning, read about the dataset to understand its context and significance. Here are some resources:

Repeat each of the above exercises on this dataset. You should re-use the functions defined above. Note that this dataset is already binary, so you do not need to filter the classes.

## ADD CODE