A series of Jupyter notebooks containing my code and solutions to the exercises in the O'Reilly book *Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow* (3rd edition):
House prices prediction using California housing dataset.
Implementation:
- matplotlib for data inspection and visualisation
- Stratified sampling based on income category
- Combined attributes to make more useful features
- Filled in missing values using SimpleImputer
- Encoded categorical variables using OneHotEncoder
- Scaled the attributes using StandardScaler
- Built a custom transformer to cluster houses by location using K-Means
- Created a pipeline to chain preprocessing and model training
- Performed cross-validation to evaluate model accuracy
- Tuned hyperparameters with GridSearchCV
- Saved and loaded the model using joblib
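The preprocessing steps above can be sketched as a single Scikit-Learn pipeline. This is a minimal illustration, not the notebook's exact code: the `ClusterSimilarity` transformer name and the toy DataFrame (standing in for the California housing data) are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

class ClusterSimilarity(BaseEstimator, TransformerMixin):
    """Custom transformer: RBF similarity of each house to K-Means cluster centres."""
    def __init__(self, n_clusters=2, gamma=0.1, random_state=42):
        self.n_clusters = n_clusters
        self.gamma = gamma
        self.random_state = random_state

    def fit(self, X, y=None):
        self.kmeans_ = KMeans(n_clusters=self.n_clusters, n_init=10,
                              random_state=self.random_state).fit(X)
        return self

    def transform(self, X):
        return rbf_kernel(X, self.kmeans_.cluster_centers_, gamma=self.gamma)

# Toy frame standing in for the California housing dataset (hypothetical values)
df = pd.DataFrame({
    "longitude": [-122.2, -122.3, -118.4, -118.5],
    "latitude": [37.8, 37.9, 34.0, 34.1],
    "median_income": [8.3, np.nan, 5.6, 3.8],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "INLAND"],
})

# Numeric branch: impute missing values, then standardise
num_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

preprocessing = ColumnTransformer([
    ("geo", ClusterSimilarity(n_clusters=2), ["longitude", "latitude"]),
    ("num", num_pipeline, ["median_income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["ocean_proximity"]),
], sparse_threshold=0)  # force a dense output array

X = preprocessing.fit_transform(df)
# 2 cluster similarities + 1 scaled numeric column + 2 one-hot columns = 5 features
```

A model (e.g. `RandomForestRegressor`) can then be appended as a final pipeline step, so cross-validation and `GridSearchCV` tune preprocessing and model together.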
Multiclass classification of digits 0-9 using the MNIST dataset, which contains 70,000 images. Implemented a K-Nearest Neighbors classifier with data augmentation, obtaining 97.1% accuracy on the Kaggle knowledge contest.
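The augmentation idea can be sketched as follows: append four copies of every training image, each shifted by one pixel in a different direction, before fitting KNN. To stay self-contained this sketch uses Scikit-Learn's small 8x8 `load_digits` set as a stand-in for the 28x28 MNIST images; the `shift_image` helper and the hyperparameters are assumptions, not the notebook's exact values.

```python
import numpy as np
from scipy.ndimage import shift
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def shift_image(image, dx, dy, size=8):
    # Shift a flattened image by (dy, dx) pixels, filling exposed pixels with 0
    img = image.reshape(size, size)
    return shift(img, [dy, dx], cval=0).reshape(-1)

digits = load_digits()  # 8x8 digits standing in for 28x28 MNIST
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=42)

# Augment: add four one-pixel-shifted copies of every training image
X_aug, y_aug = [X_train], [y_train]
for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
    X_aug.append(np.array([shift_image(img, dx, dy) for img in X_train]))
    y_aug.append(y_train)
X_aug = np.concatenate(X_aug)  # 5x the original training set
y_aug = np.concatenate(y_aug)

knn = KNeighborsClassifier(n_neighbors=4, weights="distance")
knn.fit(X_aug, y_aug)
acc = knn.score(X_test, y_test)
```

The shifted copies teach the classifier a small amount of translation invariance, which is what lifts KNN's accuracy on handwritten digits.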
Topics:
- The Normal Equation used to find the optimal parameter vector that minimizes the MSE cost function
- Batch, Stochastic and Mini-batch Gradient Descent
- $l_1$ and $l_2$ regularization: Ridge, Lasso and ElasticNet regression
- End-of-chapter exercise: a softmax regression model with batch gradient descent, early stopping and $l_2$ regularization, written from scratch with NumPy only (no Scikit-Learn), applied to multiclass classification on the iris dataset
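A NumPy-only softmax regression of this kind can be sketched as below. This is an illustrative reimplementation under stated assumptions, not the notebook's code: the function names, learning rate, penalty strength, and patience values are all hypothetical.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating
    exps = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exps / exps.sum(axis=1, keepdims=True)

def train_softmax(X, y, n_classes, lr=0.5, alpha=0.01,
                  n_epochs=5001, patience=200, val_frac=0.2, seed=42):
    """Batch gradient descent with l2 regularization and early stopping."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_frac)
    Xb = np.c_[np.ones(len(X)), X]               # prepend a bias column
    X_val, y_val = Xb[idx[:n_val]], y[idx[:n_val]]
    X_tr, y_tr = Xb[idx[n_val:]], y[idx[n_val:]]
    Y_tr = np.eye(n_classes)[y_tr]               # one-hot targets
    Theta = rng.normal(size=(Xb.shape[1], n_classes)) * 0.01
    best_loss, best_Theta, since_best = np.inf, Theta.copy(), 0
    m = len(X_tr)
    for epoch in range(n_epochs):
        P = softmax(X_tr @ Theta)
        # Gradient of cross-entropy plus l2 penalty (bias row not regularised)
        grad = X_tr.T @ (P - Y_tr) / m
        grad[1:] += alpha * Theta[1:] / m
        Theta -= lr * grad
        # Early stopping: track the best validation loss seen so far
        P_val = softmax(X_val @ Theta)
        val_loss = -np.mean(np.log(P_val[np.arange(n_val), y_val] + 1e-12))
        if val_loss < best_loss:
            best_loss, best_Theta, since_best = val_loss, Theta.copy(), 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_Theta

def predict(Theta, X):
    Xb = np.c_[np.ones(len(X)), X]
    return softmax(Xb @ Theta).argmax(axis=1)
```

On the iris dataset the same loop applies unchanged: pass the four feature columns as `X`, the class labels 0-2 as `y`, and `n_classes=3`.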