What Is a Deep Neural Network?

This article uses an example to explain what a deep neural network is and how it works. Artificial Intelligence (AI) is a broad set of computer science techniques that allow a computer to imitate human intelligence.

An artificial neural network is a machine learning technique that allows a computer, through training, to do tasks that would be very difficult to achieve using conventional programming techniques. This is best understood by looking at an example.

Imagine you had hundreds of thousands of images, some of which had dogs in them, and you decided you wanted to write a computer program to recognize dogs in pictures.

You have two choices. You can write a program to explicitly identify dogs. Or you can write a program that “learns” how to identify dogs.

You unwisely decide to try to do the former.

You create a software program using “if” and “then” statements, where the likelihood that you are looking at a dog increases every time you identify a doglike attribute such as fur, floppy ears, or a tail. The problem is that this is hard on many levels.

For example: if a clump of pixels resembles a tail, increase the likelihood that you are looking at a dog.

Your code needs to identify groups of pixels that correspond to the doglike attributes. Even if you manage to do that, many photographed objects share some of the doglike attributes, especially photographs of similar animals. Sometimes the attributes are there but obscured. Sometimes attributes only matter when other attributes are also present.
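To make the brittleness concrete, here is a minimal sketch of what such a rule-based program might look like. The attribute detectors are crude, hypothetical stand-ins (real pixel-level detectors would each be a hard project in themselves), and the weights are invented for illustration:

```python
# Hypothetical hand-written rules for dog detection. The "detectors"
# below just read flags from a dict; real ones would need to analyze pixels.

def looks_like_fur(image):
    return image.get("fur", False)

def looks_like_floppy_ears(image):
    return image.get("floppy_ears", False)

def looks_like_tail(image):
    return image.get("tail", False)

def rule_based_is_dog(image):
    score = 0.0
    if looks_like_fur(image):          # but cats and bears have fur too
        score += 0.4
    if looks_like_floppy_ears(image):  # often obscured in real photos
        score += 0.3
    if looks_like_tail(image):         # horses and cats also have tails
        score += 0.3
    # ...hundreds more rules would be needed to cover the special cases
    return score > 0.5

print(rule_based_is_dog({"fur": True, "tail": True}))  # True -- but a cat photo matches too
```

The false positive in the last line is exactly the problem: every rule you add to catch one case mislabels another.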

Your classification fails. You realize you cannot manually identify the complete set of attributes, let alone devise all the rules needed to handle all these special cases.

You wisely give up and decide to try the latter approach: to use a neural network.

The neural network is so named because there is a similarity between this programming approach and the way the brain works.

Just like the brain, the neural net algorithms use a network of neurons or nodes. And like the brain, these neurons are discrete functions (or little machines if you like) that take in inputs and generate outputs. These nodes are arranged in layers whereby the outputs of neurons in one layer become the inputs to neurons in the next layer until the neurons on the outer layer of the network generate the final result.

There are therefore layers of neurons, with each individual neuron receiving very limited inputs and generating very limited outputs, just like in the brain. The first layer (or input layer) of neurons takes in the inputs, and the last layer (or output layer) outputs the result.
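The neuron-and-layer structure described above can be sketched in a few lines of code. This is a minimal illustration, assuming a sigmoid squashing function (a common choice, not something specified here); the input values and weights are made up:

```python
import math

def neuron(inputs, weights):
    """One node: weight each input, sum the results, squash into the range 0..1."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 / (1 + math.exp(-total))  # sigmoid activation

def layer(inputs, weight_rows):
    """A layer: every neuron reads the same inputs, each with its own weights."""
    return [neuron(inputs, ws) for ws in weight_rows]

# Two layers: the outputs of the first become the inputs of the second.
hidden = layer([0.2, 0.9, 0.4], [[0.5, -0.3, 0.8], [0.1, 0.4, -0.2]])
output = layer(hidden, [[0.7, -0.6]])[0]
print(output)  # a single value between 0 and 1
```

Each neuron really is a "little machine": a weighted sum followed by a squashing step, nothing more.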

The human brain is far more complex and powerful than a neural network, of course. Naming the algorithm a “neural network” was a branding coup, but it may create unrealistic expectations about what is achievable with these techniques. That said, there are people trying to reverse-engineer the brain using a very complex neural network, in the hope that by doing so they will be able to replicate general, human-like intelligence.

So how do the neural net and machine learning techniques help us with our dog recognition problem?

Well, instead of requiring manually defined doglike attributes, the algorithm can identify the important attributes and deal with all the special cases without explicit programming.

It does this as follows:

Each neuron on the input layer receives a bit of information from the image as an input and then randomly weights it (between zero and one) according to whether that information suggests a dog or not. A low weight (less than 0.5) means the information is less likely to be associated with a dog, and a high weight means it is more likely. This multilayer neural network approach is called deep learning, and it is one of the most powerful techniques available for this kind of recognition task.

So to continue: the outputs of these neurons are then fed as inputs into the next layer of neurons, which also apply randomly assigned weights and pass their outputs on to yet more neurons in the network. This continues until the output layer of neurons gives a binary verdict: if the average of the values passed to it is greater than 0.5, it’s a dog; otherwise it’s not. These connections between, and activations of, neurons across multiple layers are what give neural networks their power.
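Taken literally, the forward pass just described can be sketched as follows. This is a toy reading of the text above (weights drawn randomly between zero and one, each neuron averaging its weighted inputs, a 0.5 threshold at the end); the input values are invented:

```python
import random

random.seed(42)

def random_weights(n_inputs, n_neurons):
    # Each neuron starts with random weights between zero and one.
    return [[random.random() for _ in range(n_inputs)] for _ in range(n_neurons)]

def forward(values, network):
    for weight_rows in network:
        # Each neuron outputs the average of its weighted inputs,
        # and those outputs become the next layer's inputs.
        values = [sum(x * w for x, w in zip(values, ws)) / len(ws)
                  for ws in weight_rows]
    return values

network = [random_weights(4, 3), random_weights(3, 2)]
outputs = forward([0.6, 0.1, 0.9, 0.3], network)
verdict = sum(outputs) / len(outputs) > 0.5  # the binary "dog or not" call
print(verdict)
```

With the weights still random, the verdict is essentially a coin flip; training is what turns it into a useful answer.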

Relevant questions at this point are: has it guessed correctly or not? What happens if it has or hasn’t? And how does it know whether its guess was right?

One way it would know is if you undertook the extremely time-consuming classification task of labelling all the photographs “dog” or “not dog” depending on whether there is a dog in the photo. The neural net can then simply look at the label to see if it correctly identified the dog or not.

And of course we are not interested in whether it got the “dog or not” question right on just one photo. We are interested in whether it got the question right for every photo, or at least in what percentage of the photos it correctly determined whether a dog was present.

For a given set of weights across all neurons in the network, the neural net will make guesses for all photos and then determine how accurate it was. What percentage of the time did it get the right result (say the dog was in the photo when it was), and how often did it get the wrong result (say the dog was in the photo when it wasn’t, or say it wasn’t when it was)? This measure of accuracy is essential feedback for the neural network model.
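Scoring a run over the whole photo set is just a comparison of guesses against labels. A minimal sketch, with invented guesses and labels:

```python
def accuracy(predictions, labels):
    """Fraction of photos where the network's guess matches the human label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

guesses = [True, False, True, True, False]   # the network's "dog or not" calls
labels  = [True, False, False, True, True]   # the hand-applied labels
print(accuracy(guesses, labels))  # 0.6 -- right on 3 of the 5 photos
```

This single number is the feedback the training process tries to push upward.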

Once it has run through all the photos once, it can randomly (or otherwise) adjust some of the weights and then repeat the whole exercise of guessing what is in each photo. If the result from the second run is better, it will keep the new weights instead of reverting to the previous set. If the result from the second run is worse, it may revert to the previous set of weights and then try different modifications to those weights.

This process carries on in this way until (hopefully) the neural net becomes good at identifying dogs in photos.
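The adjust-and-keep-what-works loop above can be sketched as a toy hill climb. Everything here is made up for illustration (a two-number "photo", a single layer of weights, random nudging); real networks are trained with gradient-based methods such as backpropagation rather than blind random search:

```python
import random

random.seed(0)

# Toy data: each "photo" is a pair of feature values, labelled dog (1) or not (0).
photos = [([0.9, 0.8], 1), ([0.2, 0.1], 0), ([0.8, 0.7], 1), ([0.1, 0.3], 0)]

def predict(weights, features):
    return sum(w * x for w, x in zip(weights, features)) > 0.5

def score(weights):
    """Accuracy of the current weights across all photos."""
    return sum(predict(weights, f) == label for f, label in photos) / len(photos)

weights = [random.random(), random.random()]
best = score(weights)
for _ in range(200):
    # Randomly nudge the weights; keep the change only if accuracy improves.
    trial = [w + random.uniform(-0.2, 0.2) for w in weights]
    if score(trial) >= best:
        weights, best = trial, score(trial)

print(best)  # final accuracy on the toy photos
```

The loop only ever keeps changes that do not hurt the score, which is why accuracy ratchets upward run after run.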

When the algorithm can accurately identify dogs it is said to have converged. It has been successfully “trained” to identify dogs.

One way to imagine what the algorithm is doing is to think of each neuron as a kind of certainty test. Instead of coding all those if-then statements to identify dogs, each neuron is calibrated to add to or take away from the final judgement that the object in the photo is a dog. It’s as though one judgement (dog or not) is split into a large number of connected judgements that contribute in aggregate to the final answer.

Of course the primary objective is to achieve convergence, if that is possible. It is also important to do this in a reasonable amount of time, preferably a short one. What is interesting is that the logic that allows the neural network to identify the dogs in the pictures is not human-understandable. It is hidden logic, essentially a black box. That said, there have been some attempts to visually represent the logic behind neural networks for image recognition tasks. In other cases it is not possible to see what the algorithm is doing behind the scenes.

Neural networks and machine learning are popular now, but many of these algorithms were known around 50 years ago.

One of the primary reasons neural nets are much more popular now than when they were first invented is that processing power is faster and cheaper than it was. Computing power has made all the difference in being able to achieve fast convergence. The other reason is that data is now ubiquitous, which increases the value of algorithms that can make use of it.

Deep learning neural networks are data- and processor-hungry techniques that can achieve results that would be impossible for programmers using conventional techniques. They are ideally suited to problems where abundant data is available and it is easy to categorize or rank preferable outcomes.

Without hundreds of thousands, or preferably millions, of photos of dogs it would be impossible to train the algorithm. These techniques only work when a lot of data is available. This is fairly obvious: all the special cases are unlikely to be represented in a set of just 1,000 photos.

One problem in the above example is that a lot of manual work is involved in labelling all the photos. It is easier for the algorithms to learn from data that is labelled in this way. Machine learning that uses labelled data is called supervised learning.

The question is: is it possible to avoid all of that tagging work? That would be good, because not only could you avoid a lot of manual effort, but most of the data available on the internet is unlabelled, i.e. not carefully tagged or structured.

Artificial neural networks and machine learning that work with unlabelled data are called unsupervised learning. This is the holy grail of machine learning, and it is more analogous to the way humans learn. However, even unsupervised learning by machines requires far more data to “learn” than humans do, and machines cannot easily extrapolate to examples outside what they have been trained on.

Some people believe that these algorithms can be developed, perhaps by reverse-engineering the brain, to the point where they start to approach a human level of “understanding”. They believe it will be possible to use sophisticated brain-scanning technology to gain insights into how the neural networks of the brain actually work, and that by copying these designs and patterns we will be able to replicate human-level intelligence.

While the techniques are no doubt ingenious and very useful, especially where a large data set is available, it is hard to imagine that such simple algorithms could be the basis of a highly creative human-like intelligence.