A simple linear classifier has the following equation:

\begin{align} f(x_{random}, W, b) &= W x_{random} + b \\ W &\in \mathbb{R}^{K \times D} \\ x_{random} &\in \mathbb{R}^D \\ b &\in \mathbb{R}^K \end{align}

In image classification, a single image is stored in computer memory as $$x_{random}$$, an array of $$D$$ numbers, each of which is a pixel value. You can think of $$W$$ as $$K$$ classifiers, one per row.
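To make the shapes concrete, here is a minimal sketch of $$f(x_{random}, W, b) = W x_{random} + b$$ in plain Python, using lists instead of a linear-algebra library. The sizes ($$D = 4$$ pixels, $$K = 3$$ classes), the numbers, and the function name `linear_classifier` are all made up for illustration; a real image has far more pixels.

```python
# Toy linear classifier f(x, W, b) = W x + b using plain Python lists.
# Assumption: made-up sizes (D = 4 pixels, K = 3 classes) and made-up values.


def linear_classifier(x, W, b):
    """Return K scores: each row (classifier) of W dotted with x, plus its bias."""
    K, D = len(W), len(x)
    assert all(len(row) == D for row in W), "each row of W must have D entries"
    assert len(b) == K, "b must have one bias per classifier"
    return [sum(w_d * x_d for w_d, x_d in zip(row, x)) + b_k
            for row, b_k in zip(W, b)]


x_random = [0.2, 0.9, 0.4, 0.1]   # D = 4 hypothetical pixel values
W = [
    [0.5, 1.0, 0.0, -0.5],        # hypothetical dog template
    [-1.0, 0.2, 0.3, 0.8],        # hypothetical cat template
    [0.1, -0.4, 1.2, 0.0],        # hypothetical turtle template
]                                 # K = 3 rows, each of length D
b = [0.1, 0.0, -0.2]              # one bias per class

scores = linear_classifier(x_random, W, b)
print(scores)  # K = 3 scores: dog, cat, turtle
```

Each returned entry is one row of `W` dotted with `x_random`, plus that row's bias, so the output always has exactly $$K$$ entries regardless of $$D$$.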

$$\overbrace{x_{random}}^{\text{an image = an array of pixel values}}$$

\begin{align} W = \left. \begin{bmatrix} \begin{array}{ccccc} - & - & W_{dog} & - & -\\ - & - & W_{cat} & - & - \\ - & - & W_{turtle} & - & - \\ & & \vdots & & \\ - & - & W_{tiger} & - & - \end{array} \end{bmatrix} \quad \right \} K \text{ classifiers or "templates"} \end{align}

Notice that each of the $$K$$ classifiers is also an array of $$D$$ numbers. You can think of each classifier as the “ideal” or “template” image for the class it represents. For example, one of the classifiers might represent a dog. If you use it to score a random image, $$x_{random} \in \mathbb{R}^D$$, it produces a “dog score”. On its own, this score is an unbounded real number rather than a probability, but the scores can be converted into probabilities, such as the probability that $$x_{random}$$ is an image of a dog. Or you can keep the raw number and simply compare it against the scores you get when multiplying $$x_{random}$$ by the other templates.

\begin{align} W_{dog} \cdot x_{random} = \text{score of how similar the random image is to a dog's image} \end{align}
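Written out pixel by pixel, the dog score is a weighted sum of the image's pixel values. Here $$W_{dog,d}$$ and $$x_{random,d}$$ are notation introduced for the $$d$$-th entries of $$W_{dog}$$ and $$x_{random}$$:

\begin{align} W_{dog} \cdot x_{random} = \sum_{d=1}^{D} W_{dog,d} \, x_{random,d} = W_{dog,1} \, x_{random,1} + W_{dog,2} \, x_{random,2} + \cdots + W_{dog,D} \, x_{random,D} \end{align}

Pixels where the dog template has large positive weights raise the score when they are bright in $$x_{random}$$, and negative weights lower it. As in the equation above, the bias is left out here.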

However you interpret the score, multiplying $$W$$ by $$x_{random}$$ produces $$K$$ scores, one for each classifier in $$W$$: scores for dog, cat, turtle, tiger, and whatever other classes are in $$W$$. At a finer-grained level, you're taking the dot product between each classifier (row) of $$W$$ and $$x_{random}$$. Recall from linear algebra that the dot product of two vectors $$v_1$$ and $$v_2$$ equals $$\lVert v_1 \rVert \lVert v_2 \rVert \cos\theta$$, where $$\theta$$ is the angle between them; equivalently, it is the length of the projection of $$v_1$$ onto $$v_2$$, scaled by $$\lVert v_2 \rVert$$ (or vice versa). For vectors of fixed length, the more closely they point in the same direction, the larger the dot product, so you can treat it as a measure of similarity between the two vectors. In other words, given a random image $$x_{random}$$ and a template image for a dog, $$W_{dog}$$, how similar is $$x_{random}$$ to $$W_{dog}$$?
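The projection view of the dot product can be checked numerically. This is a small sketch in plain Python; the two-pixel “template” and the two test vectors are made-up values chosen so the arithmetic is easy to follow, and the helper names (`dot`, `norm`, `scalar_projection`, `cosine_similarity`) are hypothetical.

```python
import math


def dot(u, v):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    """Euclidean length of v."""
    return math.sqrt(dot(v, v))


def scalar_projection(v1, v2):
    """Signed length of the projection of v1 onto the direction of v2."""
    return dot(v1, v2) / norm(v2)


def cosine_similarity(v1, v2):
    """cos(theta): the dot product with both vector lengths divided out."""
    return dot(v1, v2) / (norm(v1) * norm(v2))


template = [3.0, 4.0]     # hypothetical 2-pixel template, length 5
similar = [6.0, 8.0]      # points in the same direction as the template
different = [4.0, -3.0]   # perpendicular to the template

print(dot(template, similar))                # 50.0
print(scalar_projection(similar, template))  # 10.0
print(cosine_similarity(similar, template))  # 1.0
print(dot(template, different))              # 0.0
```

The dot product with `similar` (50) equals the template's length (5) times the projection length (10), while the perpendicular vector scores 0. Note that the raw dot product also grows with the vectors' lengths, so a brighter image can score higher without pointing more closely in the template's direction; cosine similarity divides those lengths out.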

\begin{align} W x_{random} + b &= \left. \begin{bmatrix} \begin{array}{c} \text{dog score} \\ \text{cat score} \\ \text{turtle score} \\ \vdots \\ \text{tiger score} \\ \end{array} \end{bmatrix} \quad \right \} K \text{ scores} \end{align}
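Row by row, each entry of the score vector is one template's dot product with $$x_{random}$$ plus the matching entry of $$b$$. Writing $$b_{dog}, b_{cat}, \ldots, b_{tiger}$$ for the entries of $$b$$ (labels introduced here for readability):

\begin{align} W x_{random} + b = \begin{bmatrix} W_{dog} \cdot x_{random} + b_{dog} \\ W_{cat} \cdot x_{random} + b_{cat} \\ W_{turtle} \cdot x_{random} + b_{turtle} \\ \vdots \\ W_{tiger} \cdot x_{random} + b_{tiger} \end{bmatrix} \end{align}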

Compute the similarity score between $$x_{random}$$ and every other template in $$W$$ in the same way. Once you have all $$K$$ scores, use them to decide which template $$x_{random}$$ most resembles; how you do that depends on what your score represents. For dot-product scores, a higher score means a closer match, so you pick the largest one.

For example, if $$\max(\text{dog score}, \text{cat score}, \text{turtle score}, \ldots, \text{tiger score}) = \text{dog score}$$, then the random image, $$x_{random}$$, is most likely an image of a dog.
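Picking the winning class is an argmax over the $$K$$ scores. A minimal sketch, assuming higher scores mean a closer match (as with dot-product scores) and that the class names are listed in the same order as the rows of $$W$$; the `predict` helper, the labels, and the score values are hypothetical:

```python
def predict(scores, class_names):
    """Return the class with the largest score (ties go to the first one)."""
    # Assumes higher score = better match and class_names[k] labels row k of W.
    best_k = max(range(len(scores)), key=lambda k: scores[k])
    return class_names[best_k]


class_names = ["dog", "cat", "turtle", "tiger"]  # hypothetical label order
scores = [2.3, 0.4, -1.1, 1.7]                   # made-up values of W x_random + b

print(predict(scores, class_names))  # dog
```

If your score measured a distance rather than a similarity, you would pick the smallest score instead of the largest.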