NLU classification and auto ML

Inspired by an excellent intro from Charlie Flanagan and his Machine Learning for Business class, here is some experimentation with my own models, Google AutoML, and AWS SageMaker.

Problem

The dataset contains product reviews from a women’s clothing ecommerce store. Each review is labeled with whether the reviewer would recommend the product, and comes with some additional features. The goal is to build a model that predicts the likelihood of a recommendation from the review text and those additional features.

My Attempt

I used Colab and followed Charlie’s example.

Example raw data
data['Title_Review'] = data.Title.astype(str).str.cat(data['Review_Text'].astype(str), sep=' ')
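
One thing to watch with the line above: in many pandas versions astype(str) renders a missing title as literal text such as "nan", which then ends up inside the combined review. A minimal sketch of a variant that fills missing values first, using two made-up rows in place of the real dataset:

```python
import pandas as pd

# Toy rows standing in for the real dataset (hypothetical values);
# the column names follow the line above.
data = pd.DataFrame({
    "Title": ["Love it", None],
    "Review_Text": ["Fits perfectly and looks great.", "Too small, returned it."],
})

# Replace missing values with empty strings before joining, then trim the
# stray space left behind when one side is empty.
data["Title_Review"] = (
    data["Title"].fillna("")
    .str.cat(data["Review_Text"].fillna(""), sep=" ")
    .str.strip()
)

print(data["Title_Review"].tolist())
```

Whether this matters depends on how many reviews lack a title in the actual data.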
Logistic Regression classification
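
The notebook code behind this result is not reproduced here, so the following is only a rough sketch of what a text classifier of this kind can look like: TF-IDF features fed into scikit-learn's LogisticRegression. The reviews, labels, and the name lr_model are made up for illustration, and the settings are assumptions rather than the ones used in Charlie's example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical reviews; 1 = would recommend, 0 = would not.
texts = [
    "Love this dress, fits perfectly",
    "Great quality and very comfortable",
    "Beautiful color, I get many compliments",
    "Perfect fit and soft fabric",
    "Too small and the fabric feels cheap",
    "Returned it, poor quality",
    "Runs large and looks nothing like the photo",
    "Itchy material, very disappointed",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorize the text and fit the classifier in one pipeline.
lr_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
lr_model.fit(texts, labels)

# Column 1 of predict_proba is the estimated probability of class 1.
proba = lr_model.predict_proba(["Great fit, love it"])[:, 1]
print(proba)
```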
Naive Bayes classification
XGBoost classification
Test set comparison of AUC
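
A comparison like this can be sketched by fitting each model on the same training split and scoring the held-out split with roc_auc_score. The data below is made up, and scikit-learn's GradientBoostingClassifier stands in for XGBoost so the example needs only one library; it is not the same implementation, so treat it as a placeholder for the real thing.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical reviews: ten positive, ten negative.
positive = [
    "love this dress", "perfect fit", "great quality fabric", "so comfortable",
    "beautiful color", "fits perfectly", "love the soft fabric",
    "great dress for summer", "very flattering fit", "comfortable and pretty",
]
negative = [
    "too small", "poor quality", "returned it", "fabric feels cheap",
    "runs large", "very disappointed", "itchy cheap fabric",
    "poor fit returned", "too large and boxy", "disappointed with quality",
]
texts = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

# Stratify so both classes appear in the test split (AUC needs both).
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, stratify=labels, random_state=0
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
    # Stand-in for XGBoost (assumption, see the note above).
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

auc_by_model = {}
for name, clf in candidates.items():
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # probability of class 1
    auc_by_model[name] = roc_auc_score(y_test, scores)

for name, auc in auc_by_model.items():
    print(f"{name}: AUC = {auc:.3f}")
```

With a toy dataset this small the numbers mean nothing; the point is only that every model is scored on the same test split.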

AWS SageMaker

Sign in to your AWS console and find the SageMaker service. The first step is to launch SageMaker Studio, which is basically Jupyter. I created a test user with the suggested role. Setup takes a few minutes the first time.

SageMaker Studio setup
Create an autopilot experiment
Auto created notebooks in SageMaker
Hyperparameter tuning list and F1 score
Feature importance in terms of SHAP values
When you are done, clean up to avoid charges:
  • Any models you deploy show up under deployments; make sure you delete them
  • Check the SageMaker dashboard to confirm nothing is still running
  • I’m not sure whether leaving SageMaker Studio in place is OK, so delete that too if you don’t plan to use it again
  • Clean up your S3 bucket (SageMaker adds a lot of files there)
  • SageMaker also uses up your free-tier KMS and S3 request quotas, so expect some charges there as well
Recent SageMaker activity
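
The endpoint part of the cleanup can also be scripted. Below is a minimal sketch assuming a boto3 SageMaker client; the helper name delete_all_endpoints is my own, and it deletes every endpoint the client can see in the account and region, so only use something like it in a throwaway test account.

```python
def delete_all_endpoints(sm_client):
    """Delete every SageMaker endpoint visible to sm_client.

    sm_client is expected to behave like boto3.client("sagemaker").
    Returns the names of the endpoints that were deleted.
    """
    # Collect the names first so nothing is deleted while still paginating.
    names = []
    paginator = sm_client.get_paginator("list_endpoints")
    for page in paginator.paginate():
        for endpoint in page["Endpoints"]:
            names.append(endpoint["EndpointName"])

    for name in names:
        sm_client.delete_endpoint(EndpointName=name)
    return names


# Usage (needs the boto3 package and AWS credentials, so left as a comment):
#   import boto3
#   print(delete_all_endpoints(boto3.client("sagemaker")))
```

This only covers endpoints. The Studio app, the S3 files, and anything else listed above still need to be removed separately.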

Google AutoML

Sign in to GCP and create a new project (this is a paid-only service, so you need a billing account linked to your project). In the hamburger menu, under the Artificial Intelligence group, pick “Natural Language.” The first time, it asks you to “enable API”; after that you go straight to the dashboard.

GCP Natural Language dashboard
Create Dataset
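
The dataset is imported from a CSV file. As far as I understand the format, each row holds the review text followed by its label, optionally preceded by a TRAIN/VALIDATION/TEST column, but treat that layout as an assumption and check it against the current GCP docs. A small sketch that writes such rows with Python's csv module; the function name, label names, and sample reviews are made up:

```python
import csv
import io


def write_automl_csv(rows, handle):
    """Write (text, recommended) pairs as 'text,label' CSV rows.

    The two-column layout and the label names are assumptions for illustration.
    """
    writer = csv.writer(handle)
    for text, recommended in rows:
        label = "recommended" if recommended else "not_recommended"
        writer.writerow([text, label])


# Hypothetical reviews; the second contains a comma, which csv quotes for us.
sample = [
    ("Love it Fits perfectly.", 1),
    ("Too small, returned it.", 0),
]

buffer = io.StringIO()
write_automl_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Using the csv module rather than joining strings by hand matters here because reviews routinely contain commas and quotes.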
Start training
Training results