Introduction to retrieval

Information: the mode of existence or state of motion of matter, and a universal attribute of things. It generally refers to the meaning carried by data and messages, which reduces uncertainty about the events those messages describe.

Knowledge: the sum of the understanding and experience that people gain in the practice of transforming the world; a systematic body of information that the human brain recombines through thinking.

Intelligence: specific knowledge or information that is activated to solve a specific problem.

(Basic attributes of intelligence: knowledge content, transmissibility and utility)

Literature: all carriers on which knowledge is recorded. (General Rules for Document Description, GB 3792.1-83)

The four basic elements of literature:

(1) The specific knowledge content that is recorded;

(2) The means of recording knowledge, such as text, images, symbols, audio and video;

(3) The material carrier on which knowledge is recorded, such as paper, CD-ROM, video tape, etc.;

(4) The form the record takes, such as books, periodicals, patent specifications, etc.

I. Document types (by carrier type):

Print: traditional books, periodicals, etc.

Audio-visual: records, audio tapes, video tapes, etc.

Microform: microfilm, microfiche, etc.

Digital (or electronic): e-books, electronic journals, databases, etc.

Zero-order literature: original information recorded directly on a carrier without any processing, such as experimental data, experimental records, investigation materials, design sketches, personal notes, information exchanged orally, etc.

Primary literature: original works written by authors on the basis of their own research results (such as the results of experiments, observations or investigations).

For example: monographs, periodical papers, research reports, conference documents, patent documents, dissertations, conference papers, translations, electronic journals, e-books and so on.

Secondary literature: a new form of literature produced by collecting, sorting, processing and recording large numbers of scattered, unordered primary documents in a set order. It provides leads for locating primary literature. Because of this retrieval function, it is also called a retrieval tool or retrieval system.

For example: bibliographic entries (title lists), catalogues, indexes, abstracts, etc.

Bibliographic entry: taking a journal paper as an example, the part after the double slash (underlined in the original) is the source of the document.

Title / author // journal name. Year, volume (issue). – Page numbers
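
The entry pattern above can be sketched as a small formatting function. This is only an illustration: the class name, the field names and the sample data are assumptions made for the example, and a plain hyphen stands in for the dash before the page numbers.

```python
from dataclasses import dataclass


@dataclass
class JournalArticle:
    # Field names are illustrative; they mirror the parts of the entry pattern.
    title: str
    author: str
    journal: str
    year: int
    volume: int
    issue: int
    pages: str


def format_entry(a: JournalArticle) -> str:
    """Render 'Title / author // journal. Year, volume (issue). - pages'."""
    # Everything after the double slash identifies where the paper was published.
    source = f"{a.journal}. {a.year}, {a.volume} ({a.issue}). - {a.pages}"
    return f"{a.title} / {a.author} // {source}"


# Made-up sample data, for illustration only.
example = JournalArticle(
    title="A sample paper title",
    author="A. Author",
    journal="Journal of Examples",
    year=2001,
    volume=12,
    issue=3,
    pages="45-52",
)
print(format_entry(example))
```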

Tertiary literature: documents produced around a given topic by using secondary literature to locate primary literature and then digesting its content. Examples are reviews (surveys), commentaries and progress reports, as well as reference works such as encyclopedias, yearbooks, guides and handbooks.

A survey (review) is a type of literature that comprehensively analyzes and describes the development and current state of research on a subject over a given period, and forecasts future trends.

The concept of retrieval:

Retrieval: the whole process of using retrieval tools to find answers to questions.

Document retrieval: refers to the process of searching for relevant documents with the help of various retrieval tools for the purpose of obtaining documents.

Information retrieval: the activities, processes and methods of finding needed information in an information collection. It has a broad sense and a narrow sense. In the broad sense it also includes information storage, and the two together are called information storage and retrieval. In the narrow sense it refers only to the search process.

A retrieval language is an artificial language used during information storage and retrieval to describe the features of information and to express users' information needs.

It is built by selecting words or symbols that have retrieval significance, based on the external or content features of documents, and compiling them into a dedicated language that serves document retrieval.

The main function of a retrieval language is to describe the external features and content features of documents at multiple levels, and to provide multiple access points so that users can search from different angles.

Classification language:

Chinese Library Classification (CLC)

Universal Decimal Classification (UDC)

Library of Congress Classification (LCC)

Basic classes of the Chinese Library Classification:

A Marxism, Leninism and Mao Zedong Thought

B Philosophy

C Social sciences, general

D Politics, law

E Military science

F Economics

G Culture, science, education and sports

H Language and linguistics

I Literature

J Art

K History, geography

N Natural sciences, general

O Mathematical and physical sciences, chemistry

P Astronomy, earth science

Q Biological sciences

R Medicine and health

S Agricultural science

T Industrial technology

U Transportation

V Aviation and aerospace

X Environmental science, labor protection science (safety science)

Z Comprehensive books
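
Because each basic class is identified by a single letter, the top-level class of a classification number can be read from its first character. The sketch below assumes the standard class letters of the Chinese Library Classification; the function name and the sample number are illustrative only.

```python
# Class letter -> basic class name (letters follow the published scheme).
CLC_CLASSES = {
    "A": "Marxism, Leninism and Mao Zedong Thought",
    "B": "Philosophy",
    "C": "Social sciences, general",
    "D": "Politics, law",
    "E": "Military science",
    "F": "Economics",
    "G": "Culture, science, education and sports",
    "H": "Language and linguistics",
    "I": "Literature",
    "J": "Art",
    "K": "History, geography",
    "N": "Natural sciences, general",
    "O": "Mathematical and physical sciences, chemistry",
    "P": "Astronomy, earth science",
    "Q": "Biological sciences",
    "R": "Medicine and health",
    "S": "Agricultural science",
    "T": "Industrial technology",
    "U": "Transportation",
    "V": "Aviation and aerospace",
    "X": "Environmental science, labor protection science (safety science)",
    "Z": "Comprehensive books",
}


def main_class(class_number: str) -> str:
    """Return the basic class for a classification number such as 'R73'."""
    letter = class_number.strip()[:1].upper()
    if letter not in CLC_CLASSES:
        raise ValueError(f"unknown class letter in {class_number!r}")
    return CLC_CLASSES[letter]


# A sample number beginning with R, used only to show the lookup.
print(main_class("R73"))
```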

Keywords: key technical terms taken from the title, abstract or full text of a document that express its substantive content or can serve as retrieval entry points. They are natural language that has not been standardized, and are also known as free terms.

Subject headings: an artificial language that reflects the subject content of documents and has been strictly standardized. That is, the retrieval term is fixed only after the various synonyms that express the same subject have been unified into a single written form.
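
The standardization step, unifying synonyms under one preferred form, can be sketched as a lookup table. The tiny thesaurus below is hypothetical and exists only for illustration; a real controlled vocabulary is far larger and is maintained by the database producer.

```python
from typing import Optional

# Hypothetical mini-thesaurus: every entry term points to one preferred heading.
THESAURUS = {
    "lung cancer": "lung neoplasms",
    "lung tumor": "lung neoplasms",
    "lung neoplasms": "lung neoplasms",
}


def to_subject_heading(term: str) -> Optional[str]:
    """Map a free-text term to its standardized heading, or None if unknown."""
    return THESAURUS.get(term.strip().lower())


# Both synonyms resolve to the same standardized term.
print(to_subject_heading("Lung cancer"))
print(to_subject_heading("lung tumor"))
```

A free keyword that is missing from the table returns None, which reflects the difference between the two kinds of term: free terms are unrestricted, while standardized terms must come from the agreed list.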

Database and its structure:

A database is a collection of data that serves a specific purpose or a specific data processing system. It can consist of one or more files.

A file is a collection of records in a database.

A record is the basic document unit in the database; one record usually holds the information about a single document.

A field is the basic unit of information that makes up a record. Each field describes one feature of the document, either an external feature or a content feature, such as title, author, publication name, publication year, subject headings, etc. The fields that each describe some feature of a document combine to form a record.
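
The database, file, record and field hierarchy can be sketched with a few data classes. This is a minimal illustration, not the layout of any particular retrieval system; all class names, field names and sample values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """Describes one document: field name -> field value."""
    fields: Dict[str, str]


@dataclass
class File:
    """A collection of records."""
    name: str
    records: List[Record] = field(default_factory=list)


@dataclass
class Database:
    """One or more files serving a specific purpose."""
    name: str
    files: List[File] = field(default_factory=list)


def count_records(db: Database) -> int:
    """Total number of records across all files of the database."""
    return sum(len(f.records) for f in db.files)


# Made-up sample content.
db = Database(
    name="demo",
    files=[
        File(
            name="articles",
            records=[
                Record({"title": "Surgery for lung cancer", "author": "Li", "year": "2001"}),
                Record({"title": "Imaging of lung tumors", "author": "Wang", "year": "2003"}),
            ],
        )
    ],
)
print(count_records(db))  # 2
```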

Boolean logic retrieval is the most widely used retrieval technique in retrieval systems, and the simplest and most basic matching method. Its theoretical basis is set theory and Boolean logic.

Boolean logic retrieval uses Boolean logic expressions to express users' retrieval needs. A Boolean logic expression is a statement of the search request made up of search terms joined by Boolean operators, with parentheses indicating the order of operations.

Example: (lung cancer OR lung tumor) AND surgery
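
Since the theoretical basis is set theory, the example expression maps directly onto set operations: OR is a union and AND is an intersection. The sketch below uses a toy inverted index with made-up document numbers; NOT, shown last, is the usual third Boolean operator and corresponds to set difference.

```python
# Toy inverted index: search term -> set of document numbers (made-up data).
INDEX = {
    "lung cancer": {1, 2, 5},
    "lung tumor": {2, 3},
    "surgery": {2, 3, 4},
}


def docs(term: str) -> set:
    """Documents containing the term; empty set if the term is not indexed."""
    return INDEX.get(term, set())


# (lung cancer OR lung tumor) AND surgery
result = (docs("lung cancer") | docs("lung tumor")) & docs("surgery")
print(sorted(result))  # [2, 3]

# lung cancer NOT surgery
not_result = docs("lung cancer") - docs("surgery")
print(sorted(not_result))  # [1, 5]
```

The parentheses matter: without them, the AND would be applied to "lung tumor" and "surgery" alone in systems that give AND priority over OR.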

Positional retrieval (also called proximity retrieval): positional operators (also called proximity operators) specify how close together the search terms must appear in the original document.

Every positional operator implies the logical operator AND: both of the search terms (or expressions) it connects must appear, and the positional operator additionally constrains the relative position of the two.

Positional retrieval usually works at three levels:

Field-level retrieval: the search terms must appear in the same field, for example in the Medline CD-ROM database;

Subfield- or sentence-level retrieval: the search terms must appear in the same subfield or the same natural sentence, for example near (Medline CD-ROM database);

Word-position retrieval: the relative position of the search terms must satisfy certain conditions, such as how many words (or characters) may separate them, whether they must appear in a given order, or whether the presence of one term excludes another, for example nW and pre (ScienceDirect).
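
Word-position retrieval can be sketched as a check on token positions. The function below is a simplified illustration for single-word terms, not the implementation of any database's operators: it assumes "within n" means at most n words between the two terms, and the `ordered` flag requires the first term to come before the second.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def within(text: str, first: str, second: str, n: int, ordered: bool = False) -> bool:
    """True if both words occur with at most n words between them."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == first.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == second.lower()]
    for i in pos_a:
        for j in pos_b:
            # Ordered search only accepts 'first' appearing before 'second'.
            distance = j - i if ordered else abs(j - i)
            if 1 <= distance <= n + 1:
                return True
    return False


text = "Surgery for lung cancer in elderly patients"
print(within(text, "surgery", "cancer", 2))                # True
print(within(text, "cancer", "surgery", 2, ordered=True))  # False
print(within(text, "surgery", "patients", 2))              # False
```

Both words must be present for the check to succeed, which is the implied AND described above; the distance test is the extra positional restriction.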

Truncation retrieval attaches a truncation symbol to a search term so that the term is matched partially, rather than exactly, against the words in the literature database.

Truncation is divided into unlimited truncation and limited truncation.

Unlimited truncation: one truncation symbol stands for any number of characters; "*" is commonly used. It can be applied at the front, in the middle or at the end of a term.

Limited truncation: one truncation symbol stands for exactly one character; "?" is commonly used. (Truncation symbols are also known as wildcards.)
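
Both kinds of truncation can be sketched by translating the symbols into a regular expression. This is an illustration under stated assumptions: "*" is taken to mean zero or more characters and "?" exactly one character, and the function names and word list are made up. Real systems differ in the symbols they use and in what each symbol matches.

```python
import re
from typing import List


def truncation_to_regex(pattern: str) -> "re.Pattern":
    """Translate '*' and '?' truncation symbols into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")   # unlimited: any number of characters
        elif ch == "?":
            parts.append(r"\w")    # limited: exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def match_words(pattern: str, words: List[str]) -> List[str]:
    """Return the words that match the truncated search term in full."""
    rx = truncation_to_regex(pattern)
    return [w for w in words if rx.fullmatch(w)]


WORDS = ["compute", "computer", "computing", "recompute",
         "color", "colour", "woman", "women"]

print(match_words("comput*", WORDS))   # back truncation
print(match_words("*compute", WORDS))  # front truncation
print(match_words("col*r", WORDS))     # middle truncation
print(match_words("wom?n", WORDS))     # limited truncation
```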

Field-restricted retrieval is a retrieval method that limits the search to one or more specified fields in order to achieve a particular retrieval purpose.
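
Restricting a search to chosen fields can be sketched as follows. The records, field names and function are hypothetical; the point is only that the same term returns fewer, more precise hits when the search is limited to the title field.

```python
from typing import Dict, List, Optional

# Made-up records: each is a mapping of field name -> field value.
RECORDS = [
    {"title": "Surgery for lung cancer", "author": "Li", "year": "2001"},
    {"title": "Imaging of lung tumors", "author": "Cancer Study Group", "year": "2003"},
]


def search(records: List[Dict[str, str]], term: str,
           fields: Optional[List[str]] = None) -> List[Dict[str, str]]:
    """Return records containing the term, optionally only in given fields."""
    term = term.lower()
    hits = []
    for rec in records:
        # With no restriction, every field of the record is searched.
        names = fields if fields is not None else list(rec.keys())
        if any(term in rec.get(name, "").lower() for name in names):
            hits.append(rec)
    return hits


print(len(search(RECORDS, "cancer")))                    # 2 (title or author)
print(len(search(RECORDS, "cancer", fields=["title"])))  # 1 (title only)
```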

Knowledge is an ordered body of information formed when the human brain processes information through thinking; it is a product of information and a part of information. Literature is the carrier of human knowledge and records a part of it. Intelligence is the knowledge that people activate to solve specific problems, and it is a part of knowledge. Documents also contain intelligence, but not all documents are intelligence. Information, knowledge, literature and intelligence are therefore interrelated.