Why programmers should learn deep learning

Fei Lianghong: Why should programmers learn deep learning?

Deep learning is itself a vast body of knowledge. This article approaches it from a programmer's perspective: what does deep learning mean to programmers, and how can we use this rapidly developing discipline to improve our software development capabilities?

This article is compiled based on Fei Lianghong’s speech at the 2016 QCon Global Software Development Conference (Shanghai).

Foreword

In 1973, the science fiction film "Westworld" was released in the United States, and three years later came a sequel, "Futureworld". The sequel was introduced to China in the early 1980s as "Future World", and it was simply shocking to me. The film features many robots with integrated circuit boards beneath their expressive faces, which made the future world feel distant and mysterious to me at the time.

Now it is 2016, and many of my friends may be following "Westworld", the HBO series on the same theme produced at great expense. If the first two films were still limited to topics such as robots and artificial intelligence, the 2016 series makes great breakthroughs in plot and in its thinking about artificial intelligence. It is no longer about whether robots will threaten humans, but about more philosophical questions such as "Dreams are mainly memories".

The topic of "How does memory affect intelligence" is very worthy of our thinking, and it also gives us a good revelation - what kind of development and progress has been made in the field of artificial intelligence today.

The topic we discuss today is not just simple artificial intelligence. If you are interested in deep learning, I believe you will have searched for related keywords on search engines. I used "deep learning" as a keyword on Google and got 26.3 million search results, more than 3 million higher than a week earlier. This number is enough to show how fast deep learning content is growing and how much attention people are paying to it.

From another perspective, I want everyone to see how popular deep learning is in the market. From 2011 to now, more than 140 startups focusing on artificial intelligence and deep learning have been acquired. In 2016 alone, more than 40 such mergers and acquisitions occurred.

The most aggressive among them is Google, which has acquired 11 artificial intelligence startups, the most famous being DeepMind, whose system defeated the 9-dan Go player Lee Sedol. Next in the rankings are Apple, Intel, and Twitter. Taking Intel as an example, it has acquired three startups this year alone: Itseez, Nervana, and Movidius. This series of large-scale acquisitions is all aimed at staking out positions in artificial intelligence and deep learning.

When we search for deep learning topics, we often see obscure terms like these: gradient descent, backpropagation, convolutional neural networks, restricted Boltzmann machines, and so on.

If you open any technical article, you will see mathematical formulas throughout. The picture on the left below is not actually from a high-level academic paper, but just the introduction to Boltzmann machines from Wikipedia. Wikipedia is popular-science-level content, yet its complexity already exceeds the mathematical background of most readers.

In this context, my topic today can be summarized in three points: first, why should we learn deep learning; second, the core concept of deep learning is the neural network, so what exactly is a neural network; and third, as programmers, when we want to become deep learning developers, what toolbox do we need and where do we start?

Why should we learn deep learning

First, let's talk about why we should learn deep learning. In this market, the one thing never in short supply is conceptual buzzwords and fashionable new technologies. What makes deep learning different? I really like a metaphor Andrew Ng once used.

He compared deep learning to a rocket, whose most important part is its engine. In this field today, the core of that engine is the neural network. As we all know, a rocket needs fuel in addition to an engine, and big data constitutes the other important component of the rocket: the fuel. In the past, when we talked about big data, we emphasized the ability to store and manage data, but those methods and tools mostly provide statistics and summaries of past, historical data.

For the unknowns of the future, these traditional methods cannot help us draw predictive conclusions from big data. Only when we combine neural networks with big data can we see the true value and significance of big data clearly. Andrew Ng once said, "We believe that deep learning, represented by neural networks, is the shortcut that brings us closest to artificial intelligence." This is one of the most important reasons to learn deep learning.

Secondly, as our data processing and computing capabilities continue to improve, the artificial intelligence technology represented by deep learning has made rapid progress in performance compared with artificial intelligence technology in the traditional sense. This is mainly due to the continuous development of computer and related industries in the past few decades. In the field of artificial intelligence, performance is another important reason why we choose deep learning.

This is a video Nvidia released this year about applying deep learning to autonomous driving. We can see what level autonomous driving can reach after only 3,000 miles of training. In experiments at the beginning of this year, the system did not yet have truly intelligent capabilities; it often got into frightening situations and in some cases even required manual intervention.

But after 3,000 miles of training, we saw autonomous driving perform amazingly well on complex road conditions such as mountain roads, highways, and mud. Note that this deep learning model had been trained for only a few months and 3,000 miles.

If we keep improving such a model, how powerful will this capability become? The key technology in this scenario is undoubtedly deep learning. We can draw a conclusion: deep learning can give us powerful capabilities, and a programmer who masters this technology becomes that much more capable.

Quick Start with Neural Networks

If we have no doubts about learning deep learning, the next concern is what knowledge we need to master to enter this field. The most important key technology here is the "neural network". Speaking of "neural networks", it is easy to confuse two completely different concepts.

One is the biological neural network; the other is the artificial neural network we will talk about today. Perhaps some of you have friends working in artificial intelligence; if you ask them about neural networks, they will throw out many unfamiliar concepts and terms that leave you baffled, and you can only keep your distance.

Most programmers feel a great distance from the concept of artificial neural networks, because hardly anyone takes the time to explain what the essence of a neural network is, and the theories and concepts you read in books rarely lead you to a clear, simple conclusion.

Today, let's look at what a neural network is from a programmer's perspective. I first encountered the concept of neural networks through a movie, "Terminator 2", released in 1991, in which Schwarzenegger's character has a line:

"MyCPUisaneural-netprocessor;alearningcomputer." (My processor is a neural processing unit, it is a computer that can learn). From a historical perspective, human beings' exploration of their own intelligence far preceded the research on neural networks.

In 1852, through an accidental mistake, an Italian scholar dropped a human head into a nitrate solution, and thus gained the first opportunity to observe a neural network with the naked eye. This accident accelerated the exploration of the mysteries of human intelligence and opened the way for the development of concepts such as artificial intelligence and the neuron.

Does the development of the biological neural network have anything to do with the neural networks we are talking about today? Apart from borrowing some terminology from biology, today's neural networks have nothing to do with biological ones. They are entirely a concept from mathematics and computer science, and a mark of the maturing of artificial intelligence. Everyone should keep this distinction clear and not confuse biological neural networks with the artificial intelligence we discuss today.

In the mid-1990s, Vapnik and others proposed the Support Vector Machine (SVM) algorithm. This algorithm soon showed great advantages over neural networks in many respects, such as requiring no parameter tuning, high efficiency, and a globally optimal solution. For these reasons, SVM quickly defeated the neural network algorithm and became the mainstream of that period, and neural network research once again fell into an ice age.

During the decade in which neural networks were abandoned, several scholars persisted in their research. One of the most important is Professor Geoffrey Hinton of the University of Toronto in Canada. In 2006, he published a paper in the famous journal "Science" that proposed the concept of the "deep belief network" for the first time.

Unlike traditional training methods, a "deep belief network" has a "pre-training" step that makes it easy for the weights in the neural network to find values close to the optimal solution; "fine-tuning" is then used to optimize training of the whole network. These two techniques greatly reduce the time needed to train multi-layer neural networks. In his paper, Hinton gave learning methods based on multi-layer neural networks a new name: "deep learning".

Soon, deep learning made its mark in speech recognition. Then, in 2012, deep learning technology made great strides in image recognition: in the ImageNet competition, Hinton and his students successfully trained a multi-layer convolutional neural network on one million images spanning one thousand categories, achieving a classification error rate of about 15%, nearly 11 percentage points lower than the second-place entry.

This result fully demonstrated the superiority of multi-layer neural networks at recognition tasks. Since then, deep learning has entered a new golden period; the booming development of deep learning and neural networks we see today began at that moment.

Suppose we use a neural network to build a classifier: what does the structure of this neural network look like?

In fact, the structure is very simple. This picture is a schematic of a simple neural network. A neural network is essentially a "directed graph". Each node on the graph carries a term borrowed from biology, "neuron", and the directed arcs connecting the neurons are regarded as "nerves". The neurons are not the most important part of this picture; the nerves connecting them are. Each arc has a direction, and each neuron points to nodes of the next layer.

Nodes are organized in layers, and each node points to nodes of the next layer. Nodes within the same layer are not connected to each other, and connections cannot skip layers. Each arc carries a value, usually called a "weight". Using the weights, a formula computes the value of the node an arc points to. Where do the weight values come from? From training. They usually start out as random numbers; training then adjusts them until the results come closest to the true values, and that result is kept as a reusable model. This result is what we call a trained classifier.

Nodes are divided into input nodes and output nodes, and the layers in between are called hidden layers. Simply put, we have data input items, and the layers of the neural network in the middle are what we call hidden layers, so named because these layers are invisible to us. There are finitely many output nodes and finitely many input nodes, while the hidden layers are the part of the model we get to design. This is the simplest concept of a neural network.

If I may draw a simple analogy, let me explain with a four-layer neural network. On the left are the input nodes; the several input items might represent the RGB values, flavor, or other data items describing different apples. The hidden layers in the middle are the network we designed; it has several layers, and the weights between the layers are the result of continual training.

The final results are stored in the output nodes. Computation flows in one direction: each nerve has a direction, and different layers perform different calculations. In the hidden layers, each node's computed result becomes the input to the next layer, and the final result is saved on the output nodes. The output value closest to one of our categories determines the classification: whichever value comes out on top assigns the input to the corresponding category. That is a brief overview of how a neural network is used.
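To make this flow concrete, here is a minimal Python sketch of the forward pass just described. The layer sizes, the sigmoid activation, and the apple-style inputs are all invented for illustration; a real network would train its weights rather than stop at random values.

    import numpy as np

    def sigmoid(x):
        # A common activation function that squashes values into (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    # Input node values, e.g. normalized R, G, B readings for one apple.
    x = np.array([0.8, 0.1, 0.1])

    # Randomly initialized weights for a 3 -> 4 -> 4 -> 2 network,
    # mirroring how training usually starts from random numbers.
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((3, 4))
    w2 = rng.standard_normal((4, 4))
    w3 = rng.standard_normal((4, 2))

    # Each layer's output becomes the next layer's input.
    h1 = sigmoid(x @ w1)   # first hidden layer
    h2 = sigmoid(h1 @ w2)  # second hidden layer
    y = sigmoid(h2 @ w3)   # output nodes, one value per category

    # The largest output value decides the classification.
    print("outputs:", y, "predicted class:", int(np.argmax(y)))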

Besides the left-to-right structural diagram, another common form represents a neural network from bottom to top, with the input layer at the bottom of the figure and the output layer at the top. The left-to-right form appears mostly in the literature of Andrew Ng and LeCun, while the Caffe framework uses the bottom-up form.

Simply put, neural networks are not mysterious. They are directed graphs, and they use the processing power of graphs to help us extract and learn features. In Hinton's famous 2006 paper, deep learning is summarized in three key elements: computation, data, and model. With these three, a deep learning system can be built.

The toolbox that programmers need

For programmers, mastering theoretical knowledge is for better programming practice. So let's take a look at what kind of tools programmers need to prepare to start the practice of deep learning.

Hardware

In terms of hardware, the first thing that comes to mind for computing power is the CPU. Beyond the usual CPU architectures, there are also CPUs with added multipliers to increase computing power. There are also DSPs, dedicated signal processors for application areas such as handwriting recognition and speech recognition. Another category is the GPU, currently the popular platform for deep learning. The last category is the FPGA (Field-Programmable Gate Array).

Each of these four approaches has its own advantages and disadvantages, and the products differ greatly. By comparison, the CPU has weaker computing power but is good at management and scheduling, such as reading data, managing files, and human-computer interaction, and it has rich tooling. The DSP has weaker management capabilities but strengthens specific computations.

Both of these rely on high clock speeds to solve computational problems, and they suit algorithms with many recursive operations that are inconvenient to split up. The GPU's management capabilities are weaker still, but its computing power is stronger; because of its large number of computing units, it is better suited to stream processing over whole datasets.

The FPGA is strong at both management and computation, but its development cycle is long and complex algorithms are difficult to develop for it; in terms of real-time performance, the FPGA is the best. Judging from current development alone, the computing resources ordinary programmers actually use are still CPUs and GPUs, with the GPU the most popular.

This is a p2 instance on AWS that I prepared for this talk the day before yesterday. Updating the instance, installing drivers, and setting up the environment took just a few commands; creating and configuring the resources took about 10 minutes in total. Previously, installing and debugging the computer mentioned earlier took me two days.

In addition, we can compare costs. A p2.8xlarge instance costs $7.20 per hour, while my own computer cost a total of ¥16,904. That money would cover more than 350 hours of p2.8xlarge use, so roughly a year of using an AWS deep learning workstation would offset my entire investment. And as technology keeps advancing, I can keep upgrading my instances, obtaining ever larger processing resources at a limited cost. This is the real value of cloud computing.
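The arithmetic behind that figure, assuming an exchange rate of roughly 6.7 yuan per dollar at the time: ¥16,904 is about $2,520, and $2,520 divided by $7.20 per hour gives roughly 350 hours of p2.8xlarge time.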

What is the relationship between cloud computing and deep learning? On August 8 this year, an article was published on the IDG website talking about this topic.

The article made such a prediction: If the parallel capabilities of deep learning continue to improve, and the processing power provided by cloud computing also continues to develop, the combination of the two may produce a new generation of deep learning, which will bring greater influence and impact. This is a direction that everyone needs to consider and pay attention to!

Software

Beyond the basic hardware environment, programmers are more concerned with software resources for development. Here I list some software frameworks and tools I have used.

Scikit-learn is the most popular Python machine learning library. It has the following attractive features: it is simple, efficient, and implements an extremely rich set of data mining and data analysis algorithms; built on NumPy, SciPy, and matplotlib, it covers the entire process from exploratory data analysis and data visualization to algorithm implementation; and it is open source, with very rich documentation.
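To give a feel for how little code it takes, here is a minimal, hypothetical scikit-learn example: train a classifier on the library's built-in iris dataset and report its accuracy. The choice of model is arbitrary.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Load a small built-in dataset and hold out a test split.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Fit a simple nearest-neighbors classifier and evaluate it.
    clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))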

Caffe focuses on convolutional neural networks and image processing. However, Caffe has not been updated for a long time, and Jia Yangqing, one of the framework's main developers, moved to Google this year. Perhaps its once-dominant position will give way to others.

Theano is a very flexible Python machine learning library. It is very popular among researchers, convenient to use, and makes it easy to define complex models. TensorFlow's API is very similar to Theano's. I shared a topic about Theano at this year's QCon conference in Beijing.

Jupyter Notebook is a very powerful Python code editor based on IPython. It runs in a web page, is very convenient for interactive processing, and is well suited to algorithm research and data processing.

Torch is an excellent machine learning library implemented in the relatively niche Lua language, but thanks to LuaJIT its programs run with excellent efficiency. Facebook has bet on Torch in the field of artificial intelligence and has even launched its own enhanced framework, Torchnet.

With so many deep learning frameworks, doesn't it feel a bit overwhelming? What I will focus on introducing today is TensorFlow, an open-source machine learning framework released by Google in 2015 and Google's second-generation deep learning framework. Many companies have used TensorFlow to build interesting applications with very good results.

What can you do with TensorFlow? The answer: regression models, neural networks, and deep learning. For deep learning, it integrates distributed representations, convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory networks (LSTM).

The first concept to understand in TensorFlow is the Tensor. The dictionary defines a tensor as a multilinear function that can represent linear relationships among vectors, scalars, and other tensors. That definition is hard to grasp; in my own words, a Tensor is simply an "N-dimensional array".
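A small sketch makes the "N-dimensional array" reading concrete. NumPy arrays are used here for brevity; TensorFlow's tensors carry the same rank-and-shape idea.

    import numpy as np

    scalar = np.array(5)                 # rank 0: a single number
    vector = np.array([1, 2, 3])         # rank 1: a 1-D array
    matrix = np.array([[1, 2], [3, 4]])  # rank 2: a 2-D array
    cube = np.ones((2, 3, 4))            # rank 3: a 3-D array

    for t in (scalar, vector, matrix, cube):
        print("rank:", t.ndim, "shape:", t.shape)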

To use TensorFlow, a programmer must understand several basic concepts: it uses a graph to represent the computing task; it executes the graph in a context called a session; it uses Tensors to represent data; it maintains state through Variables; and it uses feed and fetch to assign values to, or retrieve data from, arbitrary operations.

In one sentence, TensorFlow is a dataflow-graph computing environment with state. Each node performs a data operation, and the edges provide dependencies and direction, forming a complete data flow.
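Here is a minimal sketch of those five concepts, written against the TensorFlow 1.x-style API this talk is based on (newer TensorFlow versions execute eagerly by default, so the session step looks different there):

    import tensorflow as tf

    # The graph: operations are declared first, not executed immediately.
    a = tf.placeholder(tf.float32)  # a value to be fed at run time
    w = tf.Variable(2.0)            # state maintained across runs
    y = a * w                       # a node in the dataflow graph

    # The session: the context in which the graph actually runs.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # feed supplies the input tensor; fetch retrieves the output tensor.
        print(sess.run(y, feed_dict={a: 3.0}))  # prints 6.0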

Installing TensorFlow is very simple, but the installation package downloadable from the official website supports only CUDA 7.5.

Given the exciting new features of CUDA 8 and the fact that it will be officially released soon, you may want to try CUDA 8 right away; that is only possible by compiling TensorFlow from source. TensorFlow currently supports Python 2.7 and 3.3+.

In addition, programmers using Python need to install some required libraries, such as numpy and protobuf. For convolution processing, cuDNN is recognized as the best-performing development library, so be sure to install it. A regular TensorFlow installation is very simple; one command is enough:

$ pip3 install --upgrade tensorflow

One of the most interesting things you can then try is neural style transfer, for example with the open-source project /anishathalye/neural-style. The Belarusian modern impressionist artist Leonid Afremov is good at using thick ink and heavy color to express urban and landscape themes, especially in his rain-scene series. He habitually uses large blocks of color to create light-and-shadow effects, and his grasp of reflective objects and ambient color is very precise.

So I found a photograph of the Oriental Pearl TV Tower in Shanghai, hoping to use TensorFlow to learn Leonid Afremov's painting style and render this photo in his richly colored light-and-shadow style. Using TensorFlow and the code of the project mentioned above, I ran a thousand iterations on an AWS p2 instance and obtained the result shown below.
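For reference, such a run is typically launched from the command line roughly as follows. The file names are placeholders of my own, and the exact flags may differ between versions of the neural-style project:

    $ python neural_style.py --content pearl_tower.jpg \
        --styles afremov_rain.jpg \
        --output pearl_tower_styled.jpg \
        --iterations 1000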

The code for this processing is only about 350 lines, and the model is VGG, a star that rose to fame in the 2014 ImageNet competition. It is a very good model, and its hallmark is "going deeper".

TensorFlow can create such works not just for our amusement; the same capability can do more interesting things. Extending this processing power to video, you can see the effect below: the style of Van Gogh's famous "The Starry Night" is used to create a video in a whole new style.

Can you imagine the magical results if this kind of processing power were applied to more fields? The prospects are bright and leave us unlimited room to imagine. In fact, application development in many of the fields we currently work in can be transformed with neural networks and deep learning. Mastering deep learning is not difficult: every programmer can pick up this technology, and with the resources available, we can quickly become deep learning developers.

Conclusion

We cannot predict what the future will look like. The writer Ray Kurzweil wrote the book "The Singularity Is Near" in 2005, in which he clearly tells us that that era is coming soon. As the people standing before the dawn of that era, do we have the ability to accelerate the process and use our ability to learn to realize that dream?

The development of artificial intelligence in China

The era of artificial intelligence has undoubtedly arrived. What this era needs, of course, is engineers who have mastered artificial intelligence and can solve specific problems. Frankly speaking, such engineers are still very scarce in the market, and their salaries show how sought-after they are. As a discipline, artificial intelligence has developed to the point where, academically speaking, it is ready for large-scale industrialization.

So the top priority for engineers is to master applied artificial intelligence technology as soon as possible. There is already a wealth of learning material about artificial intelligence on the Internet, and engineers who can learn quickly will certainly stand out in this wave.

China already has the environment needed to develop an artificial intelligence industry. In terms of the entrepreneurial environment, the quality of talent, and market opportunities, it has all the conditions for industrial transformation. Compared with the United States, the performance of Chinese teams in many areas of artificial intelligence can fairly be called commendable. As far as the technical level of artificial intelligence is concerned, Chinese engineers are on the same starting line as the best technical teams in the world.

Time waits for no one, and Chinese engineers will have the opportunity to show their talents in this field.

However, two things are worth guarding against. The first is aiming too high and blindly comparing ourselves with other countries; after all, accumulation takes its own time and skills have their own specializations, so we should build on existing accumulation and seek gradual breakthroughs. The second is rushing in and blindly chasing market trends; the engineering of artificial intelligence requires substantial groundwork, and success cannot be achieved overnight by simple imitation.

The achievements of China's researchers and engineers in artificial intelligence are plain to see. In one article, Wang Yonggang counted the "deep learning" papers indexed by SCI from 2013 to 2015: in 2014 and 2015, China surpassed the United States to become the leader.

Another thing that surprised me: in 2016, Jeff Dean of Google published a paper titled "TensorFlow: A system for large-scale machine learning". Among the paper's 22 authors, those with obviously Chinese names account for one fifth. If you tried to list the Chinese and Chinese-heritage big names in artificial intelligence (Andrew Ng, Sun Jian, Yang Qiang, Huang Guangbin, Ma Yi, Zhang Dapeng...), you could easily produce a long list.

For China, the current top priority is the industrialization of artificial intelligence technology; only in this way can we turn advantages in research and talent into overall, comprehensive advantages. On this point, China, as the world's largest consumer market and a manufacturing power, has every opportunity to leverage its market advantage and become a leader in this field.

Innovative Enterprises in Silicon Valley

Although I have been to Silicon Valley many times, I have never been able to work there for a long time. In the market in the field of artificial intelligence, what we hear more about is the actions of some large technology companies such as Google, Apple, Intel, and Amazon. However, there are still a large number of small start-up companies in the US market that have performed amazingly in the field of artificial intelligence. Just take companies in the Silicon Valley area as an example:

Captricity, which provides information extraction from handwritten data;

VIVLab, which develops virtual assistant services for speech recognition;

TERADEEP, which uses FPGAs to provide efficient convolutional neural network solutions;

and NetraDyne, which provides autonomous driving solutions.

The list could go on and on; many teams are using artificial intelligence technology to try to make history and build their dreams. These teams and the areas they focus on are worth learning about and experiencing.