# A machine learning odyssey

As the opening chapter of the book, this chapter introduces some basic elements of machine learning and TensorFlow. As I always do, I will just extract some interesting parts and add my own opinions. Now, let’s begin!

## Distance metrics

Suppose we have two feature vectors, $x = (x_1, x_2, x_3, \cdots, x_n)$ and $y = (y_1, y_2, y_3, \cdots, y_n)$. The Euclidean distance $||x - y||$ is calculated by:

$$\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2}$$

Scholars call this the L2 norm. But that’s just one possible distance function. The L0, L1, and L-infinity norms also exist. All these norms are valid ways to measure distance. Here they are in detail:

1. The L0 norm counts the total number of nonzero elements of a vector. For example, the distance between the origin (0, 0) and the vector (0, 5) is 1, because there is only one nonzero element. The L0 distance between (1, 1) and (2, 2) is 2, because neither dimension matches up. Imagine that the first and second dimensions represent username and password, respectively. If the L0 distance between a login attempt and the true credentials is 0, the login is successful. If the distance is 1, then either the username or the password is incorrect, but not both. Lastly, if the distance is 2, neither the username nor the password is found in the database.
2. The L1 norm, shown in figure 1.7, is defined as $\sum_i{|x_i|}$. The distance between two vectors under the L1 norm is also referred to as the Manhattan distance. Imagine living in a downtown area like Manhattan, New York, where the streets form a grid. The shortest distance from one intersection to another is along the blocks. Similarly, the L1 distance between two vectors is measured along the orthogonal directions. The distance between (0, 1) and (1, 0) under the L1 norm is 2. The L1 distance between two vectors is the sum of the absolute differences in each dimension, which is a useful measure of similarity.
3. The L2 norm, shown in figure 1.8, is the Euclidean length of a vector, $(\sum_i{|x_i|^2})^{1/2}$.
4. The LN norm generalizes this pattern, resulting in $(\sum_i{|x_i|^N})^{1/N}$.
5. The L-infinity norm is the limit of the LN norm as $N$ grows, $\lim_{N \to \infty}(\sum_i{|x_i|^N})^{1/N}$. More naturally, it’s the largest magnitude among the elements. If the vector is (-1, -2, -3), the L-infinity norm is 3. If a feature vector represents the costs of various items, minimizing the L-infinity norm of the vector is an attempt to reduce the cost of the most expensive item.
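
To make the list above concrete, here is a minimal sketch in plain Python that reproduces the examples given for each norm. The function names (`l0_norm`, `lp_norm`, `linf_norm`, `distance`) are my own and do not come from the book or from TensorFlow; the (3, 4) vector in the L2 example is also my own choice.

```python
def l0_norm(v):
    """L0 norm: the number of nonzero elements."""
    return sum(1 for a in v if a != 0)


def lp_norm(v, p):
    """LN norm for N = p >= 1: (sum of |v_i|^p) raised to 1/p."""
    return sum(abs(a) ** p for a in v) ** (1.0 / p)


def linf_norm(v):
    """L-infinity norm: the largest magnitude among the elements."""
    return max(abs(a) for a in v)


def distance(x, y, norm):
    """Distance between x and y: the chosen norm of their difference."""
    diff = [a - b for a, b in zip(x, y)]
    return norm(diff)


print(distance((0, 0), (0, 5), l0_norm))                  # 1
print(distance((1, 1), (2, 2), l0_norm))                  # 2
print(distance((0, 1), (1, 0), lambda v: lp_norm(v, 1)))  # 2.0
print(distance((0, 0), (3, 4), lambda v: lp_norm(v, 2)))  # 5.0
print(linf_norm((-1, -2, -3)))                            # 3
```

Calling `lp_norm((-1, -2, -3), 50)` gives a value very close to 3, which illustrates why the L-infinity norm is the largest magnitude: as the exponent grows, the largest element dominates the sum.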

## Types of learning

### Supervised learning

By definition, a supervisor is someone higher up in the chain of command. When we’re in doubt, our supervisor dictates what to do. Likewise, supervised learning is all about learning from examples laid out by a supervisor (such as a teacher).

### Unsupervised learning

Unsupervised learning is about modeling data that comes without corresponding labels or responses. The fact that we can draw any conclusions at all from raw data feels like magic. With enough data, it may be possible to find patterns and structure. Two of the most powerful tools that machine-learning practitioners use to learn from data alone are clustering and dimensionality reduction.

Clustering is the process of splitting the data into individual buckets of similar items. One of the most popular clustering algorithms is k-means, which is a specific instance of a more powerful technique called the E-M (expectation-maximization) algorithm.
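
As a rough illustration of clustering, here is a minimal k-means sketch in plain Python. It is my own simplification, not code from the book: the sample points, the hand-picked initial centroids, and all function names are assumptions made for the example. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its bucket.

```python
def squared_distance(p, q):
    """Squared Euclidean (L2) distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(points, centroids):
    """Put every point into the bucket of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # An empty bucket keeps its old centroid (a simplifying assumption).
        updated = [mean(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical data: two visually obvious groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, [(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(clusters)
```

Real implementations usually pick the initial centroids at random and run several restarts; fixing them by hand here just keeps the example deterministic.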

Dimensionality reduction is about manipulating the data to view it from a much simpler perspective. It’s the ML equivalent of the phrase, “Keep it simple, stupid.” One of the earliest algorithms is principal component analysis (PCA), and a newer one is the autoencoder.

### Reinforcement learning

Unlike supervised learning, where training data is conveniently labeled by a “teacher”, reinforcement learning trains on information gathered by observing how the environment reacts to actions. Reinforcement learning is a type of machine learning that interacts with the environment to learn which combination of actions yields the most favorable results. Because we’re already anthropomorphizing algorithms by using the words environment and action, scholars typically refer to the system as an autonomous agent. Therefore, this type of ML naturally manifests itself in the domain of robotics.

To reason about agents in the environment, we introduce two new concepts: states and actions. The status of the world frozen at a particular time is called a state. An agent may perform one of many actions to change the current state. To drive an agent to perform actions, each state yields a corresponding reward. An agent eventually discovers the expected total reward of each state, called the value of a state.

The only information an agent knows for certain is the cost of a series of actions that it has already taken, which is incomplete. The agent’s goal is to find a sequence of actions that maximizes rewards.
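
To see how states, actions, rewards, and values fit together, here is a small sketch of my own (not from the book). Everything in it is hypothetical: a world of four states in a row, a made-up reward for entering each state, and a fixed number of steps. Note that it cheats compared with real reinforcement learning: it assumes the agent already knows how the world reacts, and it simply searches every action sequence instead of learning from experience. Because the world is deterministic, the "expected" total reward is just the total reward.

```python
from functools import lru_cache

# Hypothetical world: states 0..3 in a row; entering a state yields its reward.
REWARDS = {0: 0.0, 1: -1.0, 2: 0.0, 3: 5.0}
ACTIONS = {"left": -1, "right": +1}


def step(state, action):
    """Deterministic transition: move one cell, staying inside the row."""
    return min(max(state + ACTIONS[action], 0), len(REWARDS) - 1)


@lru_cache(maxsize=None)
def best_plan(state, horizon):
    """Return (total reward, actions) of the best plan with `horizon` steps."""
    if horizon == 0:
        return 0.0, ()
    candidates = []
    for action in ACTIONS:
        next_state = step(state, action)
        future_reward, future_actions = best_plan(next_state, horizon - 1)
        candidates.append((REWARDS[next_state] + future_reward,
                           (action,) + future_actions))
    # Pick the candidate with the highest total reward.
    return max(candidates, key=lambda c: c[0])


value, plan = best_plan(0, 3)
print(value)  # 4.0
print(plan)   # ('right', 'right', 'right')
```

Here the value of state 0 over three steps is 4.0: the agent accepts the -1 penalty of passing through state 1 because the reward of 5 at state 3 more than makes up for it.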

## TensorFlow

The structure of this book: