# Statistical Machine Learning Jupyter Notebook Problem

### Description

For this HW, we will pretend that we only have 1000 images of the MNIST dataset.

We will begin with what we had in our videos:

```python
# commonly used imports
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

# get data from sklearn
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, as_frame=False)

# get the features and labels
X, y = mnist["data"], mnist["target"]

# convert labels from strings to numbers
y = y.astype(np.uint8)

# split into training and test sets
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
```

Suppose we decide on a Decision Tree Classifier (covered later in Chapter 6). There are two hyperparameters we will fine-tune for now: `max_leaf_nodes` and `min_samples_split`. Let's fine-tune these using the following code.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# set up the decision tree
dt_clf = DecisionTreeClassifier(random_state=30)

# the hyperparameters to search through
params = {'max_leaf_nodes': list(range(2, 100)), 'min_samples_split': [3, 4, 5]}

# initialize the GridSearch with 3-fold cross validation
grid_search_cv = GridSearchCV(dt_clf, params, verbose=1, cv=3)

# do the search (but only on the first 1000 images)
grid_search_cv.fit(X_train[:1000], y_train[:1000])
```

What are the best hyperparameters?
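One way to answer this is to read the winning settings off the fitted `GridSearchCV` through its `best_params_` and `best_score_` attributes. The sketch below is self-contained: it substitutes sklearn's small built-in digits dataset for MNIST and trims the grid so it runs in seconds, but the attribute access is the same.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# small stand-in for MNIST so this example runs quickly
X_demo, y_demo = load_digits(return_X_y=True)

params = {'max_leaf_nodes': list(range(2, 20)), 'min_samples_split': [3, 4, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=30), params, cv=3)
search.fit(X_demo[:500], y_demo[:500])

# the hyperparameter values that won the search, and their mean CV accuracy
print(search.best_params_)
print(search.best_score_)
```

On the real MNIST search above, you would read `grid_search_cv.best_params_` the same way.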

Now, let's fit the training data (again pretending to only have the first 1000 images).

```python
dt_clf = grid_search_cv.best_estimator_
dt_clf.fit(X_train[:1000], y_train[:1000])
```

Let's now get cross-validation accuracy scores with this fine-tuned model:

```python
from sklearn.model_selection import cross_val_score
cross_val_score(dt_clf, X_train[:1000], y_train[:1000], cv=3, scoring="accuracy")
```

How accurate are your three folds?
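`cross_val_score` returns one accuracy per fold, so a common way to answer is to report the individual scores plus their mean and spread. A minimal sketch, again using the small digits dataset as a stand-in (the classifier settings here are illustrative, not the tuned ones):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = load_digits(return_X_y=True)
clf = DecisionTreeClassifier(max_leaf_nodes=50, random_state=30)

# one accuracy score per fold
scores = cross_val_score(clf, X_demo, y_demo, cv=3, scoring="accuracy")
print(scores)
print(scores.mean(), scores.std())
```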

If we only had 1000 images and we wanted more to train on, we could do what is known as data augmentation. The augmentation we will do here is to shift the images we do have. For each of the 1000 images, we shift it one pixel up, left, right, and down. So this will give us four more images to add to our training set. Data augmentation such as this is a very useful technique when there are not enough training instances. If you run the following code, you will see an example of the first image shifted.

```python
from scipy.ndimage import shift

def shift_image(image, dx, dy):
    image = image.reshape((28, 28))
    shifted_image = shift(image, [dy, dx], cval=0, mode="constant")
    return shifted_image.reshape([-1])

image = X_train[0]
shifted_image_down = shift_image(image, 0, 5)
shifted_image_left = shift_image(image, -5, 0)

plt.figure(figsize=(12, 3))
plt.subplot(131)
plt.title("Original", fontsize=14)
plt.imshow(image.reshape(28, 28), interpolation="nearest", cmap="Greys")
plt.subplot(132)
plt.title("Shifted down", fontsize=14)
plt.imshow(shifted_image_down.reshape(28, 28), interpolation="nearest", cmap="Greys")
plt.subplot(133)
plt.title("Shifted left", fontsize=14)
plt.imshow(shifted_image_left.reshape(28, 28), interpolation="nearest", cmap="Greys")
plt.show()
```

To shift all of our images and add them to an augmented training data set, we can use the following code:

```python
X_train_augmented = [image for image in X_train[:1000]]
y_train_augmented = [label for label in y_train[:1000]]

for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
    for image, label in zip(X_train[:1000], y_train[:1000]):
        X_train_augmented.append(shift_image(image, dx, dy))
        y_train_augmented.append(label)

X_train_augmented = np.array(X_train_augmented)
y_train_augmented = np.array(y_train_augmented)
```

You now have 5000 images in the `X_train_augmented` and `y_train_augmented` sets (the 1000 original images, plus each image shifted up, down, left, and right).
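As a sanity check on that 5x multiplier, here is a toy version of the same loop on three random "images" (random arrays, not real MNIST data): after four shifted copies of each, the augmented arrays should be five times the original size.

```python
import numpy as np
from scipy.ndimage import shift

def shift_image(image, dx, dy):
    # mirrors the helper defined above
    image = image.reshape((28, 28))
    shifted_image = shift(image, [dy, dx], cval=0, mode="constant")
    return shifted_image.reshape([-1])

rng = np.random.default_rng(0)
X_small = rng.random((3, 784))      # three fake 28x28 images, flattened
y_small = np.array([0, 1, 2])

X_aug = [image for image in X_small]
y_aug = [label for label in y_small]
for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
    for image, label in zip(X_small, y_small):
        X_aug.append(shift_image(image, dx, dy))
        y_aug.append(label)

X_aug, y_aug = np.array(X_aug), np.array(y_aug)
print(X_aug.shape)  # 3 originals + 4 shifted copies each = 15 rows
```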

Let's now fine-tune and fit a decision tree classifier using this augmented training data.

```python
dt_clf = DecisionTreeClassifier(random_state=30)
params = {'max_leaf_nodes': list(range(2, 100)), 'min_samples_split': [3, 4, 5]}
grid_search_cv = GridSearchCV(dt_clf, params, verbose=1, cv=3)

# This will take about 10 minutes to run
grid_search_cv.fit(X_train_augmented, y_train_augmented)
grid_search_cv.best_estimator_
```

Now, what are the best hyperparameters?

Finally, get cross-validation accuracy scores with this model trained on the augmented data.
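This cell is for you to write; one way to do it is to mirror the earlier `cross_val_score` call, just with the augmented arrays. The sketch below substitutes small random stand-ins for `X_train_augmented` and `y_train_augmented` so it runs on its own (the scores it prints are therefore meaningless; only the call pattern matters).

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# stand-ins for the augmented arrays built above (random data, illustration only)
rng = np.random.default_rng(0)
X_train_augmented = rng.random((500, 784))
y_train_augmented = rng.integers(0, 10, size=500)

dt_clf = DecisionTreeClassifier(max_leaf_nodes=50, random_state=30)
scores = cross_val_score(dt_clf, X_train_augmented, y_train_augmented,
                         cv=3, scoring="accuracy")
print(scores)
```

In your notebook, pass the real `grid_search_cv.best_estimator_` and the real augmented arrays instead.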

What are the accuracy scores now?

Did the augmented data help?

Think of one downside to using augmented data and explain it.

- Submit your code in a Jupyter notebook. Include all code: the code examples above and what you write.
- Put your answers to the questions above into markdown cells.
- Use a single hashmark title to label the problem.