How to search for tags?

1. A tag system is broadly a classification system, and some people call it a folksonomy. However, tags differ from classification by directory structure. First of all, tags can refine a classification at lower cost. Imagine an article covering a wide range of topics, such as one about the achievements of physics since the 20th century. It may involve relativity, quantum mechanics, black hole theory, the Big Bang theory, scientists such as Einstein and Planck, and even the Nobel Prize. With a directory structure, it is impossible to file the article under every aspect it touches, because refining the classification that far would make the whole directory tree enormous, which hinders organizing and searching the data. Tags are different: they classify articles freely, independent of any directory structure. Tags are peers of one another, with no hierarchy between them, but co-occurrence analysis can link tags that often appear together and so produce a classification by association.
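
The co-occurrence idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article names, the tags, and the helper names `tag_cooccurrence` and `related_tags` are all hypothetical, and a real system would likely normalize the counts rather than use a raw threshold.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: each article carries a flat set of tags.
articles = {
    "physics-since-1900": {"physics", "relativity", "einstein", "nobel prize"},
    "einstein-biography": {"einstein", "relativity", "history"},
    "quantum-primer": {"physics", "quantum mechanics", "planck"},
}

def tag_cooccurrence(tagged):
    """Count how often each unordered pair of tags appears on the same article."""
    counts = Counter()
    for tags in tagged.values():
        # Sorting gives every pair a canonical order, e.g. ("einstein", "relativity").
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return counts

def related_tags(tagged, tag, min_count=2):
    """Return the tags that share at least `min_count` articles with `tag`."""
    related = set()
    for (a, b), n in tag_cooccurrence(tagged).items():
        if n >= min_count and tag in (a, b):
            related.add(b if a == tag else a)
    return related
```

With this sample data, `related_tags(articles, "einstein")` links "einstein" to "relativity" because the two tags appear together on two articles, even though neither tag sits above the other in any hierarchy.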

2. Tags can also be described as keyword markers, which help with searching and lookup. But a tag is also different from an ordinary keyword. A keyword search can only find words that are actually mentioned in the article, while a tag can mark a term that does not appear in the article at all. For example, I can tag the article above as "data" or "history", although it would more often be tagged "physics". If I tag it "data", I can link it with all the other informational articles.
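
A minimal sketch of the difference between the two lookups, assuming a hypothetical in-memory list of articles (the titles, bodies, tags, and function names are made up for illustration):

```python
# Hypothetical articles: bodies and tags are illustrative only.
articles = [
    {
        "title": "Physics since 1900",
        "body": "Relativity and quantum mechanics reshaped physics.",
        "tags": {"physics", "history", "data"},
    },
    {
        "title": "Census tables",
        "body": "Population figures by region.",
        "tags": {"data"},
    },
]

def keyword_search(items, word):
    """Match only articles whose body literally contains the word."""
    word = word.lower()
    return [a["title"] for a in items if word in a["body"].lower()]

def tag_search(items, tag):
    """Match articles carrying the tag, whether or not the body mentions it."""
    return [a["title"] for a in items if tag in a["tags"]]
```

Here `keyword_search(articles, "history")` finds nothing because no body contains that word, while `tag_search(articles, "history")` finds the physics article, and `tag_search(articles, "data")` ties both articles together.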

Reference: Really, there is no trace on the net.

So, what is a tag? It is quite simple; read on.

The classical information structure model

In the traditional layout of web pages, we usually use classification to summarize, organize, and store our information. The library is a good example. All information starts from a single root and branches into a tree-like classification, forming a complete and interrelated logical system.

This system is built by manual classification from the very beginning, so when we need to look something up, it takes hardly any effort.

[Example] A blog has a main title and is divided into several categories, and the actual articles are filed under these categories. In general, we do not allow an article to belong to multiple categories at the same time, which simplifies management and keeps retrieval unambiguous.

On the Internet, dmoz and wiki are typical and well-known examples.

The decentralized model of information synthesis

Most seemingly disordered information is expressed in language, and the language indicates what the information is about, so we can obtain related information by extracting the parts these texts have in common (words and phrases). Ordinarily this information is completely loose and unconnected; only when we extract it does it take on a relatively compact organizational structure. Even so, compared with the classical classification structure, this structure is still quite scattered.

[Example] You may have guessed it: Google. Most search engines today rely on this approach, so research on word segmentation has always been both a focus and a pain point for them. Leaving everything else aside, the two contemporary schools of logical positivism and ordinary language philosophy are enough on their own to keep them busy into the next century.

For example, when I say "fuck", how can a search engine that only matches keywords, and does not care how they are actually used in everyday language, know whether I am cursing or stating a fact? What is more, we often face thousands of Google results that have nothing to do with what we originally meant by the keywords.
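
The keyword approach described above can be sketched as a small inverted index. This is a simplified illustration, not how any real search engine works: the documents are hypothetical, and splitting on letters stands in for real word segmentation, which is far harder (especially for Chinese).

```python
import re
from collections import defaultdict

# Hypothetical documents. The regular expression below is a crude stand-in
# for real word segmentation.
docs = {
    1: "The river bank flooded last spring.",
    2: "The bank raised its interest rates.",
    3: "Spring water flows past the mill.",
}

def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Searching this index for "bank" returns both the river document and the finance document. The index records only that the word occurs, not which sense was meant, which is exactly the weakness the example above points at.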

[Introduction]

Logical positivism: it holds that everyday human language is full of fallacies and should be swept away entirely, with a perfect logical language system, like mathematics, rebuilt in its place.

Ordinary language philosophy: it holds that everyday human language is perfectly reasonable and realistic, and that a "perfect" logical language does not exist and does not match reality. The only problem is that people make certain methodological mistakes when using everyday language, and these deserve our attention and study.

(I tend to agree with the latter.)

An information composition model in line with future development

Now, when we take a comprehensive look at the two information composition models above, which are increasingly important in daily life, we find that each has its own advantages and disadvantages.

For the former, the ideas and connotations that language expresses are broad. Simple classification logic cannot interpret and identify all the key points an article touches on, while complex classification falls into a paradox of infinitely fine subdivision.

For the latter, besides the trouble of word segmentation, Google might also have to shoulder the heavy responsibility of teaching everyone to rebuild their everyday language, and of asking everyone to reach the level of Wittgenstein.

Ludwig Wittgenstein? Yes, and this finally brings us back to our point.

Wittgenstein himself was a founding influence on both logical positivism and ordinary language philosophy. In his later thought on everyday language, he put forward a well-known idea: family resemblance (also rendered as "family similarity").

Here is a quote as a general explanation:

Wittgenstein opposed the definition of words from the standpoint of "anti-essentialism". Essentialists believe that things of a kind belong to that kind because they share the same essence (a universal), and that a definition specifies this essence. Wittgenstein, on the other hand, holds that things have no common nature at all, only "family resemblance". The so-called "family resemblance" is not one similarity shared by all, but partial similarity in one respect or another. For example, some members of a family have similar eyes, some have similar expressions, and some have similar faces. Wittgenstein therefore takes a nominalist position: people use general nouns in daily life only for convenience, and metaphysical things such as essences and universals do not exist. If you mistake them for things that exist, you will catch a "philosophical disease".

Okay, see? Those similarities are tags. Essence, anti-essentialism, and family resemblance appear throughout the quotation above, and we can read and understand them as categories, fragments, and tags.
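
One way to make the analogy concrete is a small sketch in which every pair of items shares a tag but no tag is common to all of them. The family members and feature tags below are hypothetical, loosely following the eyes, expressions, and faces in the quotation.

```python
from itertools import combinations

# Hypothetical family members described by flat feature tags.
members = {
    "anna": {"eyes", "expression"},
    "ben": {"expression", "face"},
    "cara": {"face", "eyes"},
}

def common_to_all(tagged):
    """Tags shared by every item: the 'essence' an essentialist would look for."""
    return set.intersection(*tagged.values())

def pairwise_overlap(tagged):
    """Tags shared by each pair of items: the partial, overlapping similarities."""
    return {
        (a, b): tagged[a] & tagged[b]
        for a, b in combinations(sorted(tagged), 2)
    }
```

For this data `common_to_all(members)` is empty, so no single category could hold the whole family, yet `pairwise_overlap(members)` shows that each pair is linked by some tag.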

Tags take the same stance toward traditional taxonomy that ordinary language philosophy takes toward Hegel's system: they require that the goals pursued by classical philosophy, such as universality, unity, and uniqueness, be dismantled and replaced by fragmented structures. The connections between these fragments exist only when people need them.

Tags carry the semantic color of fragments and the combative force of philosophy. They are formed actively rather than passively, and aggregated proactively rather than waiting to be retrieved. Their formation also involves manual screening, which fits the normal use of everyday language comparatively well. Take the information about "SMTH" around the world: some articles never write a single character of "SMTH", yet the facts they describe really are related to it.

Philosophy aside, tags share the characteristics of both traditional classification and search keywords, while eliminating a considerable part of their shortcomings and weaknesses.

It should be pointed out that, judging from existing applications and theoretical analysis, classification, tags, and unordered keywords each suit a different scope. For small, fine-grained amounts of information, classification is enough. For massive, boundless, disordered information, keywords are probably the most widely used and generally accepted organizational model at this stage. Between the two, tags may be the best choice for a sufficiently large body of information with high demands on structure and accuracy.

Use tags in blogs?

If someone uses tags maliciously, the tags become meaningless, much like the tiresome trick of stuffing a page's meta tags with countless keywords completely unrelated to the page itself, or the kids who later scattered redundant links everywhere. Of course, there is no technical obstacle to using tags in an open and popular forum, but it may not be a good idea, especially in China.

Relatively speaking, the owner of a blog takes more responsibility for it and can carefully screen the information published there, or comment on it and pass it along, so building tags on a blog to organize information has real value.

But for personal blogs, tags are of little significance, because a single personal publishing source holds too little information. If a blog needs tags, it must be aimed at a wide range of users. There are two suitable situations: one is a multi-user blog site with tens of thousands of users, and the other is an XML-based aggregation site.

In any case, tags address the problem of organizing large volumes of information: they help users accurately locate the nodes that match the meaning they are after within these huge stores. They do not address the problem of organizing the information on a personal blog.

Tags, disordered information, tags versus classification: the analysis above can be regarded as a typical case of philosophy running ahead of technology. At least as far as domestic applications go, there are no related development projects yet.

BXNA's blog aggregation still depends on classification. It is said to be attempting to move into word segmentation, but I do have doubts about the information BXNA aggregates. Other tag service providers do not directly support blogs.

Who can eat such a big cake? A pioneering technology leader? A provider of blogging software? Or the owner of the capital? Personally, I think an aggregation service provider or blog provider with a broad user base will be the first to release an active aggregation platform based on tags or a passive aggregation platform based on TrackBack, and perhaps in the future it will naturally become the owner of the capital.

Finally, the specific techniques for developing and managing tags are beyond the scope of this article. Please refer to other related articles.