First Machine Learning Project: Part Two!

Hello everyone! This post continues the last one: I will now walk through the code that I wrote while following the YouTube video (which is linked below again). If you would like to see the entire code without it being broken up into segments, it is at the end of the post.

If you haven’t watched the video, a quick summary is that we are trying to predict the genre of music a person listens to, given their age and gender.

Analyzing the Code

import pandas as pd

from sklearn.tree import DecisionTreeClassifier

These two lines are our imports. The first imports the Pandas library and names it “pd” so that we can easily refer to it later. Pandas is a Python library for data manipulation and analysis; by importing it, we can use its functions throughout our code. The second line comes from the scikit-learn library, which is used for building and training machine learning models. Decision trees can be used for both classification and regression, and a decision tree classifier can distinguish between more than two classes. Here, we import the DecisionTreeClassifier class from scikit-learn’s tree module so that we can use it later.

music_data = pd.read_csv('music.csv')

X = music_data.drop(columns=['genre'])

y = music_data['genre']

Before explaining these lines of code, it’s important to know that I had already downloaded a data file (music.csv) into the same folder as my Jupyter notebook, which is what I was coding in. The first line uses the Pandas library to read that file and load it into a DataFrame, in which each row represents a data point and each column represents one attribute of it (here age, gender, and genre).

The next line creates another DataFrame named X by dropping (or removing) the genre column from the DataFrame ‘music_data’ (which we just created in the previous line). Dropping returns a new DataFrame and leaves ‘music_data’ itself unchanged. We drop the genre column so that only the age and gender columns remain, because X will be our inputs.

The next line selects only the genre column and stores it in y. Selecting a single column gives a Pandas Series rather than a DataFrame, and y represents the outputs (the genre we want to predict).
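To see the input/output split without needing the real file, here is a minimal sketch that uses a small made-up DataFrame in place of music.csv. The column names match the post, but the rows are assumptions for illustration only and are not the actual data from the video.

```python
import pandas as pd

# Hypothetical stand-in for music.csv: same column names as in the post,
# but these six rows are invented for illustration.
music_data = pd.DataFrame({
    "age": [20, 23, 25, 26, 29, 30],
    "gender": [1, 1, 1, 0, 0, 0],
    "genre": ["HipHop", "HipHop", "Jazz", "Dance", "Acoustic", "Acoustic"],
})

X = music_data.drop(columns=["genre"])  # inputs: DataFrame with age and gender
y = music_data["genre"]                 # outputs: Series of genre labels

print(list(X.columns))                     # ['age', 'gender']
print(type(X).__name__, type(y).__name__)  # DataFrame Series
print("genre" in music_data.columns)       # True: drop() did not change music_data
```

With the real file, only the first statement changes back to pd.read_csv('music.csv').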

model = DecisionTreeClassifier()

Here, we create a new decision tree classifier and assign it to the variable ‘model’, so that we can refer to it throughout the code. Because we create it with no arguments, it uses the default parameter values, which can be changed later as needed.
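If you are curious what those defaults are, the sketch below prints two of them with get_params(). The second model, with max_depth=3 and random_state=0, is only a hypothetical variation to show how arguments are passed; it is not something the video uses.

```python
from sklearn.tree import DecisionTreeClassifier

# Same call as in the post: no arguments, so every parameter keeps its default.
model = DecisionTreeClassifier()

params = model.get_params()
print(params["criterion"])  # gini
print(params["max_depth"])  # None (the tree may grow as deep as it needs)

# Hypothetical variation: limit the tree depth and fix the random seed.
shallow_model = DecisionTreeClassifier(max_depth=3, random_state=0)
print(shallow_model.get_params()["max_depth"])  # 3
```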

model.fit(X.values, y)

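Here, we train the model. ‘Fit’ takes the inputs and outputs and builds the tree: it searches the data for patterns and learns a set of decision rules (splits on age and gender) that best separate the genres. We pass ‘X.values’ (the inputs as a plain NumPy array, without column names) rather than X itself, so that the plain list we hand to ‘.predict’ later has the same form as the training data.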
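As a rough illustration of what training leaves behind, this sketch fits the model on a small made-up dataset (an assumption standing in for music.csv) and then inspects it. The random_state=0 argument is my addition so that the run is repeatable.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for music.csv (invented rows, real column names).
music_data = pd.DataFrame({
    "age": [20, 23, 25, 26, 29, 30],
    "gender": [1, 1, 1, 0, 0, 0],
    "genre": ["HipHop", "HipHop", "Jazz", "Dance", "Acoustic", "Acoustic"],
})
X = music_data.drop(columns=["genre"])
y = music_data["genre"]

model = DecisionTreeClassifier(random_state=0)
model.fit(X.values, y)  # X.values is a NumPy array of shape (6, 2)

print(type(X.values).__name__)    # ndarray
print(list(model.classes_))       # the genres the model learned, in sorted order
print(model.score(X.values, y))   # 1.0: accuracy on the rows it was trained on
```

A perfect score here only means the tree memorized these six rows; it says nothing about how well it would do on people it has not seen.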

predictions = model.predict([[21, 1]])

Here, we finally make predictions. Since the goal was to predict genres from ages and genders, we pass ‘.predict’ a list containing one row, [21, 1], to ask what genre a 21-year-old male (represented by 1) would listen to. The result is stored in ‘predictions’, with one predicted genre for each row we pass in.
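The outer list passed to ‘.predict’ can hold more than one row. Here is a small sketch of that, again trained on made-up rows rather than the real music.csv, so the predicted genres below reflect my invented data and not the results from the video.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for music.csv (invented rows, real column names).
music_data = pd.DataFrame({
    "age": [20, 23, 25, 26, 29, 30],
    "gender": [1, 1, 1, 0, 0, 0],
    "genre": ["HipHop", "HipHop", "Jazz", "Dance", "Acoustic", "Acoustic"],
})
X = music_data.drop(columns=["genre"])
y = music_data["genre"]

model = DecisionTreeClassifier(random_state=0)
model.fit(X.values, y)

# Two rows in, two predictions out: a 21-year-old male and a 30-year-old female.
predictions = model.predict([[21, 1], [30, 0]])

print(len(predictions))  # 2
print(predictions[1])    # Acoustic (this exact row is in the made-up training data)
print(predictions[0])    # one of the genres in model.classes_
```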

Conclusion

I hope that this cleared up any confusion you had from the last post! I personally believe that by analyzing the code line by line, I was better able to understand the roles of models and libraries. If you followed along with me and wrote the code, then congrats: you just wrote your first project! 

Resources

import pandas as pd

from sklearn.tree import DecisionTreeClassifier

#import data

music_data = pd.read_csv('music.csv')

X = music_data.drop(columns=['genre'])

y = music_data['genre']

#create a model

model = DecisionTreeClassifier()

#train model

model.fit(X.values, y)

#making predictions

predictions = model.predict([[21, 1]])

predictions