Machine Learning From Scratch: Part 1

This is part one of Machine Learning from Scratch.

In this lesson, you'll learn how to:

  • Import a module from a bigger library

  • Start working with Matplotlib and Pyplot

  • Declare lists of data

  • Generate a line chart (X and Y axis) from the lists

  • Generate a bar chart

Discover the power of data by implementing machine learning algorithms in Python. I'll show you the logic behind each technique, so that you can apply machine learning in different situations.

No more talking, let's get straight to it.

Assuming that you have Anaconda and Jupyter Notebooks installed, create a new notebook.

Let's import the pyplot module from the library matplotlib. Pyplot is useful for generating simple charts from data. It's not recommended for heavy-duty data visualizations - you wouldn't use it live in a web dashboard.

#For making simple plots

from matplotlib import pyplot as plt

Now, let's declare two lists, each one containing 7 elements. Their elements correspond by position: years[0] goes with gdp[0], years[1] goes with gdp[1], and so on for every element.

years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]

gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]
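To see that pairing explicitly, here is a minimal sketch using Python's built-in zip. The name pairs is only illustrative and is not used in the rest of the tutorial.

```python
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

# zip walks both lists in step, pairing each year with the
# GDP value stored at the same index
pairs = list(zip(years, gdp))

for year, value in pairs:
    print(year, value)
```

This only works as intended because both lists have the same length; zip stops at the shorter list, so a missing value would silently drop a year.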

Now, using pyplot, let's plot a line chart.

X-axis: years

Y-axis: gdp

Take a close look at the plt.plot syntax. The data for the X-axis goes first, the data for the Y-axis goes second. Then, you pass the styling options you want as keyword arguments:

  • color

  • marker ('o' means a circle as indicator in the chart)

  • linestyle

#create a line chart. Years on x-axis, gdp on y-axis

plt.plot(years, gdp, color = 'green', marker = 'o', linestyle = 'solid')

#add a title

plt.title("Nominal GDP")

Now, let's label the y-axis and display the chart right in the Jupyter notebook:

#add a label to the y-axis

plt.ylabel("Billions of $")

#display the chart

plt.show()
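Putting the line chart steps together in one cell might look like the sketch below. The names lines and line are illustrative, and the x-axis label is an addition that the original chart does not have.

```python
from matplotlib import pyplot as plt

years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

# x values first, y values second, then the styling keyword arguments.
# plt.plot returns a list of line objects; we drew one series, so take the first.
lines = plt.plot(years, gdp, color="green", marker="o", linestyle="solid")
line = lines[0]

plt.title("Nominal GDP")
plt.ylabel("Billions of $")
plt.xlabel("Year")  # assumption: not in the original tutorial
```

In a Jupyter notebook the chart usually renders at the end of the cell; when running this as a plain script, finish with plt.show().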

Pyplot is a simple and fast solution to generate visualizations from data.

In business, you need to be agile. Pyplot charts may not be that good looking or interactive, but they will certainly do their job.

You don't need to memorize each parameter of a function. For example, put your cursor inside the parentheses of plt.plot() and press Shift + Tab. The docstring of the function will pop up on your screen.
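If the keyboard shortcut is not available (for example, outside Jupyter), the same docstring can be read from Python itself. This is a small sketch; first_line is an illustrative name.

```python
from matplotlib import pyplot as plt

# Every documented function carries its docstring in __doc__.
# help(plt.plot) would print all of it; here we keep only the first line.
doc = plt.plot.__doc__
first_line = doc.strip().splitlines()[0]
print(first_line)
```

Calling help(plt.plot) in a cell prints the full text, including every parameter.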

Now, let's learn how to plot a bar chart.

Bar charts are useful when you want to show how some quantity varies among a discrete set of items.

Discrete items are separate categories, such as names, rather than points along a continuous progression of numbers.

We want to visualize the names and heights in meters of the tallest buildings in the world. After a quick Google search, you will come up with two lists of corresponding items: building_names and heights.

building_names = ["Burj Khalifa", "Shanghai Tower", "Makkah Tower", "Ping An Financial Center"]

heights = [828, 632, 601, 555]

As you've already imported Pyplot, it's available in your Jupyter Notebook, so there's no need to import it again. If you've closed this notebook, you will have to execute the import statement again.

If you type in plt.bar() and press Shift + Tab, the docstring of the function will pop up on your screen.

Again, you don't need to memorize the parameters each function receives.

To draw the bar chart, we need one x position per building and one height per bar. We have as many bars as building names, so for the positions we can simply call range on the length of building_names. The bars' heights come from the heights list:

plt.bar(range(len(building_names)), heights)
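To see what range produces here, a quick check in plain Python (no plotting needed) helps. The name x_positions is illustrative.

```python
building_names = ["Burj Khalifa", "Shanghai Tower", "Makkah Tower", "Ping An Financial Center"]

# One integer position per building: 0 for the first name, 1 for the second, ...
x_positions = list(range(len(building_names)))
print(x_positions)  # [0, 1, 2, 3]
```

These integers are where the bars sit on the x-axis; the building names are attached to them afterwards as tick labels.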

Let's add a title to our bar chart and a label to the y-axis:

plt.title("Tallest buildings in the world") #add a title

plt.ylabel("Height in meters") #label the y-axis

To add labels to our X-axis, we'll call xticks:

plt.xticks(range(len(building_names)), building_names)

Finally, plt.show() will display our bar chart:

plt.show()
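For reference, the bar chart steps can be collected into a single cell, roughly as sketched below. The names positions and bars are illustrative, and keeping the return value of plt.bar is optional.

```python
from matplotlib import pyplot as plt

building_names = ["Burj Khalifa", "Shanghai Tower", "Makkah Tower", "Ping An Financial Center"]
heights = [828, 632, 601, 555]

# One x position per building: 0, 1, 2, 3
positions = range(len(building_names))

# Draw one bar per position; plt.bar returns a container holding the bars
bars = plt.bar(positions, heights)

plt.title("Tallest buildings in the world")
plt.ylabel("Height in meters")

# Replace the numeric tick marks 0..3 with the building names
plt.xticks(positions, building_names)
```

In a Jupyter notebook the chart usually renders at the end of the cell; when running this as a plain script, finish with plt.show().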

That's good for now. I believe that short tutorials are more productive than larger ones.

In the next tutorial of Machine Learning from Scratch, we'll keep playing around with Pyplot, collections, histograms and line charts.