For example, editorial work requires gathering a large amount of source material, yet the process is often very inefficient. One of the biggest reasons is the time spent collecting information. If you keep browsing manually as before, you either work late into the night or ask others for help, and neither is convenient. This is where a web crawler becomes very useful.
With the advent of the big data era, web crawlers will play an increasingly important role on the Internet. The amount of data on the Internet is enormous. How to automatically and efficiently obtain the information we are interested in and put it to use is an important problem, and crawler technology was created to solve it.
The information we are interested in falls into different types. If we are building a search engine, the information we want is as many high-quality pages as possible from across the Internet. If we want data from a vertical field, or have clear retrieval requirements, the information we want is whatever matches those requirements, and irrelevant information has to be filtered out. A crawler of the first kind is called a universal web crawler, and a crawler of the second kind is called a focused web crawler.
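The difference between the two kinds of crawler can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `TOY_WEB` is a hypothetical in-memory stand-in for the Internet (a real crawler would fetch pages over HTTP), `crawl` is a made-up helper, and the keyword test is only one very simple way to decide relevance.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would download and parse pages instead.
TOY_WEB = {
    "https://example.com/a": ("python crawler tutorial",
                              ["https://example.com/b", "https://example.com/c"]),
    "https://example.com/b": ("cooking recipes", ["https://example.com/d"]),
    "https://example.com/c": ("crawler politeness rules", ["https://example.com/d"]),
    "https://example.com/d": ("sports news", []),
}


def crawl(seed, web, is_relevant=lambda text: True):
    """Breadth-first crawl from `seed`; return URLs whose text passes `is_relevant`."""
    seen = {seed}          # URLs already queued, so each page is visited once
    queue = deque([seed])
    kept = []
    while queue:
        url = queue.popleft()
        if url not in web:  # unknown page: nothing to read or follow
            continue
        text, links = web[url]
        if is_relevant(text):
            kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return kept


# Universal crawler: keep every page that can be reached.
universal = crawl("https://example.com/a", TOY_WEB)

# Focused crawler: keep only pages matching the retrieval requirement.
focused = crawl("https://example.com/a", TOY_WEB, lambda text: "crawler" in text)
```

In this sketch the focused crawl still follows links found on irrelevant pages and only filters what it keeps. A real focused crawler might also skip such links to save bandwidth, which is a design choice not covered here.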