The Development History of Search Engines

In the early days of the Internet, website directories, typified by Yahoo, were very popular. These directories were compiled and maintained by hand. Editors selected good websites, wrote a brief description of each, and filed them under categories. Users found the site they wanted by clicking down through the categories level by level. Some people call such directory-based services search engines, but strictly speaking they are not.

In 1990, Archie was developed by faculty and students at the School of Computer Science of McGill University in Canada. At that time, before the World Wide Web existed, people shared resources over FTP. Archie regularly collected and analyzed the file names on FTP servers and let users search the files held on each FTP host. The user had to enter the exact file name, and Archie would report which FTP server the file could be downloaded from. The resources Archie collected were not web pages (HTML files), but it worked on the same principle as a search engine: automatically collecting information resources, building an index, and providing retrieval services. For this reason, Archie is recognized as the forerunner of modern search engines.

The Origin of Search Engines

The ancestor of all search engines was Archie, created in 1990 at McGill University in Montreal by Alan Emtage, Peter Deutsch and Bill Heelan. Emtage and his colleagues set out to build a system that could find files by file name, and Archie was the result. Archie was the first program to automatically index the files on anonymous FTP sites across the Internet, but it was not a true search engine; it was a searchable list of FTP file names. The user had to enter the exact file name, and Archie would then report which FTP address the file could be downloaded from. Inspired by Archie's popularity, the University of Nevada System Computing Services developed Veronica, a search tool for Gopher, in 1993. Jughead, another Gopher search tool, came later.

The Development of Search Engines

Development (1)

The history of Excite can be traced back to February 1993, when six Stanford University students came up with the idea of analyzing the relationships between words in order to search large amounts of information on the Internet more effectively. By mid-1993 the project was fully funded, and they also released a version of their search software for webmasters to use on their own websites, later called Excite for Web Servers.

Note: Excite later became known for concept-based search. In May 2002, Excite, by then owned by InfoSpace, shut down its own search engine and switched to the meta-search engine Dogpile.

Development (2)

In April 1994, Yahoo! was co-founded by two Stanford University doctoral students, Chinese-American Jerry Yang and David Filo. As traffic and the number of listed links grew, the Yahoo! directory began to support simple database searches. Because Yahoo!'s data was entered by hand, it cannot really be classified as a search engine; it was in fact just a searchable directory. Because every website listed in Yahoo! came with a brief description, searches were noticeably more efficient.

Note: Yahoo! later relied in turn on AltaVista, Inktomi and Google to provide its search engine service.

In the 1990s, Yahoo! became almost synonymous with the Internet.

Development (3)

In 1995 a new form of search engine appeared: the meta-search engine. The user submits a search request only once; the meta-search engine converts it and forwards it to several pre-selected independent search engines, then collects and processes all the results they return before presenting them to the user.

The first meta-search engine was MetaCrawler, developed by Erik Selberg, a graduate student at the University of Washington, and Oren Etzioni. Meta-search engines look good in concept, but their results have always been unsatisfactory, so none has ever held a strong market position.
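The merge step a meta-search engine performs can be sketched as follows. This is a minimal illustration under stated assumptions, not MetaCrawler's actual algorithm: the engine functions are hypothetical stubs that return fixed URL lists, and the results are combined with reciprocal rank fusion (RRF), a later fusion technique swapped in here for simplicity, using the commonly cited constant k = 60.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical stand-in for an independent search engine:
# takes a query and returns a ranked list of URLs (best first).
Engine = Callable[[str], List[str]]


def metasearch(query: str, engines: Dict[str, Engine], k: int = 60) -> List[str]:
    """Send one query to several engines and merge their ranked lists.

    Assumed scoring rule (reciprocal rank fusion): each URL scores
    sum(1 / (k + rank)) over every engine that returned it.
    """
    scores: Dict[str, float] = defaultdict(float)
    for engine in engines.values():
        for rank, url in enumerate(engine(query), start=1):
            scores[url] += 1.0 / (k + rank)
    # Duplicate URLs collapse into one entry; higher fused score ranks first,
    # ties broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    engines = {
        "engine_a": lambda q: ["a.com", "b.com", "c.com"],
        "engine_b": lambda q: ["b.com", "d.com"],
    }
    print(metasearch("search history", engines))  # ['b.com', 'a.com', 'd.com', 'c.com']
```

Because "b.com" is returned by both stub engines, it outranks "a.com" even though "a.com" was first in one list; this is the kind of cross-engine agreement a meta-search engine tries to exploit.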

Development (4)

The emergence of intelligent retrieval: word-segmentation, synonym and homophone dictionaries improve retrieval and can further support queries at the knowledge or concept level. By drawing on subject dictionaries, broader/narrower-term dictionaries and dictionaries of related terms at the same level, the system builds a knowledge system or concept network that offers users intelligent suggestions and ultimately helps them get the best possible results.

Example:

(1) A query for "computer" can also retrieve information indexed under its synonyms;

(2) The query scope can be narrowed to "microcomputer" and "server", or broadened to "information technology" or to related areas such as "electronic technology", "software" and "computer application";

(3) It also handles ambiguous terms, such as whether "Apple" refers to the fruit or the computer brand, or how to tell "the Chinese people" apart from "the People's Republic of China". Such cases are resolved by combining an ambiguity knowledge base, full-text indexing, analysis of the user's query context and user relevance feedback, so that the information users need most is returned efficiently and accurately.
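The synonym expansion and scope suggestions described in examples (1) and (2) can be sketched with small lookup tables. The dictionaries SYNONYMS, NARROWER, BROADER and RELATED below are hypothetical toy data (the synonym "pc" is an assumed entry); a real system would load large curated dictionaries and would also need the disambiguation machinery described in (3), which this sketch omits.

```python
from typing import Dict, List, Set

# Hypothetical toy dictionaries, keyed by lower-case term.
SYNONYMS: Dict[str, List[str]] = {"computer": ["pc"]}
NARROWER: Dict[str, List[str]] = {"computer": ["microcomputer", "server"]}
BROADER: Dict[str, List[str]] = {"computer": ["information technology"]}
RELATED: Dict[str, List[str]] = {
    "computer": ["electronic technology", "software", "computer application"],
}


def expand(term: str) -> Set[str]:
    """Terms to match for a query: the term itself plus its synonyms."""
    key = term.lower()
    return {key, *SYNONYMS.get(key, [])}


def suggest(term: str) -> Dict[str, List[str]]:
    """Suggestions shown to the user for narrowing or broadening the query."""
    key = term.lower()
    return {
        "narrower": NARROWER.get(key, []),
        # Build a new list so the shared dictionaries are never mutated.
        "broader": BROADER.get(key, []) + RELATED.get(key, []),
    }


if __name__ == "__main__":
    print(sorted(expand("Computer")))        # ['computer', 'pc']
    print(suggest("computer")["narrower"])   # ['microcomputer', 'server']
```

Synonyms are matched automatically, while narrower and broader terms are only offered as suggestions, since silently adding them would change the scope of the user's query.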

Development (5)

Personalization is one of the key features and an inevitable trend in the future development of search engines. One approach organizes personal information through a search engine's community products (that is, services for registered users) and then feeds these personal factors into retrieval over the search engine's base index, so that each person gets different results. Products such as Yahoo's My Web beta, A9 and Google Search History all followed essentially this path: they analyzed a specific user's search needs within a limited scope, then extended that analysis to similar websites across the Internet to return the results best matched to the user's needs. The other approach is represented by Google Personalized Search, Yahoo MindSet and the well-known front-end clustering engine Vivisimo. Whichever method is used, whether Google having users actively choose the search scope, or Yahoo and Vivisimo regrouping the results so users can find what they need, these remain experiments or ideas and will not become mainstream search products in the near term.

Development (6)

Global grid technology: because there is no unified standard for organizing network information resources, these disordered resources are difficult to search, exchange, share or develop in depth, and they form information islands. Grid technology aims to eliminate these islands and connect all resources on the Internet comprehensively.

China's Global Information Grid.

The word "robot" has a special meaning for programmers. A computer robot is an automated program that repeatedly performs a task at a speed humans cannot match. Because robots built specifically to retrieve information crawl across the Web like spiders, a search engine's robot program is called a spider.

In 1993, Matthew Gray developed the World Wide Web Wanderer, the first "robot" program to measure the size of the World Wide Web by following the links between HTML pages. At first it was used only to count the servers on the Internet; later it was also able to capture web addresses (URLs).
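The link-following behavior of a spider like the Wanderer can be sketched as a breadth-first traversal. To keep the example self-contained, the Web is modeled as a hypothetical in-memory dictionary that maps each URL to the URLs it links to. A real spider would fetch pages over HTTP, parse their HTML for links and follow politeness rules such as robots.txt; none of that is shown here.

```python
from collections import deque
from typing import Dict, List
from urllib.parse import urlparse


def crawl(seed: str, links: Dict[str, List[str]], max_pages: int = 1000) -> List[str]:
    """Visit pages breadth-first from `seed`, following links not yet seen."""
    seen = {seed}
    frontier = deque([seed])
    visited: List[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)  # a real spider would fetch and index the page here
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return visited


def count_hosts(urls: List[str]) -> int:
    """Count distinct servers, in the spirit of the Wanderer's original use."""
    return len({urlparse(u).netloc for u in urls})


if __name__ == "__main__":
    # Hypothetical link graph standing in for the Web.
    web = {
        "http://a.example/": ["http://b.example/", "http://a.example/about"],
        "http://a.example/about": ["http://a.example/"],
        "http://b.example/": ["http://c.example/"],
    }
    pages = crawl("http://a.example/", web)
    print(len(pages), count_hosts(pages))  # 4 3
```

The `seen` set keeps the spider from revisiting pages when links form cycles, as "a.example/about" and "a.example/" do here.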

As described above, Yahoo was co-founded in April 1994 by Stanford doctoral students Jerry Yang (Yang Zhiyuan) and David Filo, and began as a searchable directory rather than a true search engine. Yahoo later announced its acquisition of Inktomi in December 2002 and of Overture (which by then owned Fast and AltaVista) in July 2003, and acquired the Chinese company 3721 in November 2003.

In early 1994, Brian Pinkerton, a student at the University of Washington, started a small project called WebCrawler. When it went live on April 20, 1994, WebCrawler contained content from only 6,000 servers. WebCrawler was the first full-text search engine on the Internet, supporting search over every word in a document. Before it, users could search only by URL and abstract, and the abstract usually came from manual annotations or from programs that automatically extracted the first 100 words of the text.
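The difference between indexing every word and indexing only a URL and abstract comes down to what goes into the inverted index. Below is a minimal sketch of a full-text inverted index that gives multi-word queries AND semantics. The tokenizer (lower-case ASCII letters and digits only) and the sample documents are simplifying assumptions. This is not WebCrawler's actual implementation, which the text does not describe.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Split text into lower-case alphanumeric words (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every word in every document to the set of document IDs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    docs = {
        "d1": "WebCrawler indexed every word of the page",
        "d2": "Older tools indexed only the URL and abstract",
    }
    index = build_index(docs)
    print(sorted(search(index, "every word")))  # ['d1']
```

An abstract-only engine would build the same structure from just the first 100 words or a manual summary, so words appearing later in a page could never match.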

In July 1994, Michael Mauldin of Carnegie Mellon University connected John Leavitt's spider program to his own indexing program, creating Lycos. Besides relevance ranking, Lycos offered prefix matching and word-proximity restrictions. Lycos was the first search engine to show automatically generated page summaries in its results, and its biggest advantage was an index far larger than any other search engine's.

At the end of 1994, Infoseek officially launched. Its friendly interface and many extra features made it an important search engine, on a par with Lycos.

In December 1995, DEC officially released AltaVista. AltaVista was the first search engine to support natural-language queries and the first to implement advanced search syntax (such as AND, OR and NOT). Users could search newsgroups and retrieve articles from the Internet, and could also search for words in image names, titles, Java applets and ActiveX objects. AltaVista also claimed to be the first search engine that let users add URLs to or delete them from its index, with changes taking effect within 24 hours. One of its most interesting new features was finding all websites that link to a given URL. AltaVista also introduced many innovations in its user interface. It placed "tips" beside the search box to help users phrase their queries better; the tips changed frequently, so after a few searches users would discover interesting features they might otherwise never have known about. Other search engines gradually adopted this whole set of features. In 1997, AltaVista released LiveTopics, a graphical display system that helped users find what they wanted among thousands of search results.
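AND, OR and NOT operators map naturally onto set operations over the posting lists of an inverted index. The sketch below assumes a hypothetical index that has already been built. It evaluates a flat query strictly left to right and treats NOT as "and not", so A NOT B means documents containing A but not B. Parentheses and operator precedence, which real query languages usually support, are deliberately left out, and this is not AltaVista's actual parser.

```python
from typing import Dict, Set


def boolean_search(query: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Evaluate e.g. 'web AND search NOT yahoo' left to right over posting sets.

    Assumed simplified grammar: term (OP term)*, with OP in AND, OR, NOT;
    no parentheses and no operator precedence.
    """
    tokens = query.split()
    if not tokens:
        return set()
    if len(tokens) % 2 == 0:
        raise ValueError("query must alternate terms and operators")
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(tokens[0].lower(), set()))
    for op, term in zip(tokens[1::2], tokens[2::2]):
        postings = index.get(term.lower(), set())
        if op.upper() == "AND":
            result &= postings
        elif op.upper() == "OR":
            result |= postings
        elif op.upper() == "NOT":
            result -= postings
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


if __name__ == "__main__":
    # Hypothetical posting sets: word -> IDs of documents containing it.
    index = {"web": {"d1", "d2"}, "search": {"d1", "d3"}, "yahoo": {"d1"}}
    print(sorted(boolean_search("web AND search", index)))  # ['d1']
    print(sorted(boolean_search("web NOT yahoo", index)))   # ['d2']
```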

On September 26, 1995, Eric Brewer, an assistant professor at the University of California, Berkeley, and his doctoral student Paul Gauthier founded Inktomi. On May 20, 1996, Inktomi was incorporated, and the powerful HotBot was unveiled to the world. It claimed to crawl more than 10 million pages a day, so it had far more fresh content than other search engines. HotBot also used cookies to store users' personal search preferences.

In August 1997, the Northern Light search engine officially launched. It was once one of the search engines with the largest databases. It used no stop words and offered excellent current-news coverage, a special collection of more than 7,100 publications, and good advanced search syntax. It was the first to support simple automatic classification of search results.

Before October 1998, Google was just a small Stanford University project called BackRub. Doctoral student Larry Page began researching search engine design in 1995 and registered the domain name google.com on September 15, 1997. At the end of 1997, with the participation of Sergey Brin, Scott Hassan and Alan Steremberg, BackRub began offering a demo. In February 1999, Google moved from its Alpha version to its Beta version. Google regards September 27, 1998 as its birthday. Google judges the importance of web pages on the basis of PageRank, which greatly improved the relevance of its search results. Google's geek culture and its "Don't be evil" motto have earned it a strong reputation and brand. In April 2006, Google announced its Chinese name, "谷歌" (Gǔgē), the first name Google had given itself in a non-English-speaking country.
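The core idea of PageRank, that a page is important if important pages link to it, can be sketched with power iteration. This is the textbook formulation, using an assumed damping factor of 0.85 and a fixed iteration count, run on a hypothetical three-page graph. Pages with no outgoing links spread their rank evenly over all pages. Google's production ranking combined PageRank with many other signals, and none of that is reproduced here.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    ranks = pagerank(graph)
    print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b']
```

Page "c" ranks highest because both other pages link to it, and "a" outranks "b" because it receives all of "c"'s rank. The scores always sum to 1, so they can be read as the long-run probability of a random surfer being on each page.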

Fast (AllTheWeb) was founded in 1997 as a by-product of academic research at the Norwegian University of Science and Technology (NTNU). In May 1999 it released its own search engine, AllTheWeb. Fast aimed to be the largest and fastest search engine in the world, and in later years it came close. AllTheWeb could automatically classify web pages according to the ODP (Open Directory Project), supported Flash, PDF and multilingual search, and also offered news, image, video, MP3 and FTP search, along with very powerful advanced search features. (On February 25, 2003, Fast's Internet search division was acquired by Overture.)

In August 1996, Sohu was founded and began classifying Chinese websites. For a time it was famous for the saying "for maps when you go out, Sohu when you go online". As the number of websites grew rapidly, this manually edited directory could no longer keep up. In August 2004, Sohu launched Sogou, a search site with its own independent domain name, which it called "the third-generation search engine".

Openfind was founded in 1998, and its technology originated in the GAIS laboratory led by Professor Wu Sheng of National Chung Cheng University in Taiwan. Openfind began as a Chinese-only search engine, and at its peak it supplied Chinese search to three well-known portals: Sina, Kimo and Yahoo. After 2000, however, Baidu and Google gradually divided up the market. In June 2002, Openfind released a new beta of its search engine based on the GAIS30 project, launched PolyRank™, announced that it had accumulated 3.5 billion web pages, and began entering the English search market.

In January 2000, two Peking University alumni founded Baidu in Zhongguancun, Beijing: Robin Li (Li Yanhong), holder of a patent on hyperlink analysis and a former senior engineer at Infoseek, and his friend Eric Xu (Xu Yong), a postdoctoral fellow at UC Berkeley. At first Baidu only supplied search technology to portals such as Sohu, Sina and Tom. The beta of the Baidu search engine was released in August 2001, and the engine was officially launched on October 22, 2001, focusing on Chinese search.

Other features of the Baidu search engine include Baidu Snapshot (cached pages), page preview and full-page preview, related searches, spelling-correction hints, MP3 search and Flash search. After the Blitzen project launched in March 2002, technical upgrades accelerated markedly. Baidu later introduced a series of products, including Post Bar (Tieba), Knows (Zhidao), Maps, Sinology, Encyclopedia (Baike), Documents (Wenku), Video and Blog, which were well received by Internet users. On August 5, 2005, Baidu listed on NASDAQ under the ticker BIDU at an issue price of US$27.00. It opened at US$66.00 and closed at US$122.54, a gain of 353.85%, the largest first-day gain for a new US listing in the preceding five years.

On February 23, 2003, the former HC Search began operating independently, and Zhongsou (China Search) was established. In February 2004, Zhongsou released a desktop search engine, Internet Pig 1.0. In March 2006, Zhongsou renamed Internet Pig to IG (Internet Gateway).

In June 2005, Sina officially launched its self-developed search engine iAsk (Aiwen). Since 2007, Sina iAsk has used Google's search engine.

On July 1, 2007, NetEase launched its independently developed Youdao search technology, which merged its former general search and web search. Youdao provides web search, image search and blog search for NetEase. Its web search uses in-house natural language processing and distributed storage and computing technology. Its image search was the first to offer advanced search by camera brand, model and even season. Compared with similar products, its blog search offers comprehensive crawling and timely updates, as well as innovative features such as "article preview" and "blog archive".