Search result ranking is the core component of a search engine: it largely determines the engine's quality and whether users accept it. Although search engines combine hundreds of ranking factors when ordering real results, the two most important remain the content relevance between the user's query and a web page, and the page's link structure.
Link analysis algorithms are described in detail elsewhere. This chapter focuses on one question: given the user's search terms, how should web pages be ranked from the perspective of content relevance?
Whether a web page's content is judged relevant to a user's query depends on the retrieval model the search engine adopts. Retrieval models have been a research focus since information retrieval was established as a discipline, and many different models have been proposed. This chapter introduces the most important ones: the Boolean model, the vector space model, the probabilistic model, the language model, and the machine learning ranking algorithms that have emerged in recent years.
Although retrieval models vary widely, they occupy the same position and serve the same function inside a search engine. The general framework a search engine uses to compute content similarity works as follows. When a user has an information need, the user composes query terms as a concrete expression of that need, and the search engine builds its own internal representation of the query from those terms.
For the massive collection of web pages or documents, the search system likewise keeps an internal representation of each document. The core task of a search engine is to decide which documents are relevant to the user's need and to output them ranked by degree of relevance. Relevance computation is therefore the process of matching the user's query against document content, and the retrieval model is both the theoretical basis and the core component for computing content relevance.
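The framework above (query representation, document representation, matching, ranked output) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `represent`, `score`, and `rank` are hypothetical, the bag-of-words representation is only one possible choice, and the term-overlap score is a stand-in rather than any of the retrieval models this chapter introduces.

```python
from collections import Counter


def represent(text):
    """Build an internal representation: a bag of lowercase terms.

    Assumption for this sketch: whitespace tokenization is good enough.
    """
    return Counter(text.lower().split())


def score(query_repr, doc_repr):
    """Placeholder relevance score: count of query term occurrences
    that are matched in the document. A real retrieval model would
    replace this function."""
    return sum(min(count, doc_repr[term]) for term, count in query_repr.items())


def rank(query, documents):
    """Score every document against the query and return
    (doc_id, score) pairs, highest score first."""
    query_repr = represent(query)
    scored = [
        (doc_id, score(query_repr, represent(text)))
        for doc_id, text in documents.items()
    ]
    # Sort by descending score; break ties by document id for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


documents = {
    "d1": "search engine ranking",
    "d2": "cooking recipes",
    "d3": "search results",
}
ranking = rank("search ranking", documents)
```

Swapping in a different retrieval model changes only `represent` and `score`; the surrounding flow of building representations, matching, and sorting stays the same, which is the sense in which all retrieval models play the same role in the engine.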
What makes a retrieval model a good one? After the user issues a query Q, the document collection to be searched can be divided along two dimensions, "is relevant or not" and "contains the query terms or not", into four quadrants. Documents in the first quadrant contain the user's query terms and are judged relevant by the user. Documents in the second quadrant do not contain the query terms but are judged relevant. Documents in the third quadrant contain the query terms but are judged irrelevant. Documents in the fourth quadrant neither contain the query terms nor are judged relevant.
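The four-way split described above can be sketched in Python. This is an illustration under stated assumptions, not a method given in the text: the helper names `quadrant` and `partition` are hypothetical, "contains the query terms" is taken to mean "contains at least one query term", and the user's relevance judgments are supplied as a set of document ids.

```python
def quadrant(contains_query, is_relevant):
    """Map the two yes/no dimensions onto the quadrant numbers used in the text."""
    if contains_query and is_relevant:
        return 1  # contains query terms, judged relevant
    if is_relevant:
        return 2  # no query terms, judged relevant
    if contains_query:
        return 3  # contains query terms, judged irrelevant
    return 4      # no query terms, judged irrelevant


def partition(query, documents, relevant_ids):
    """Group document ids by quadrant.

    Assumption: a document "contains the query terms" if it shares
    at least one whitespace-separated term with the query.
    """
    query_terms = set(query.lower().split())
    quadrants = {1: [], 2: [], 3: [], 4: []}
    for doc_id, text in documents.items():
        contains = bool(query_terms & set(text.lower().split()))
        quadrants[quadrant(contains, doc_id in relevant_ids)].append(doc_id)
    return quadrants


documents = {
    "a": "apple pie recipe",
    "b": "fruit tart baking",
    "c": "apple phone review",
    "d": "car engine repair",
}
# Hypothetical user judgments: the user wanted fruit desserts.
result = partition("apple", documents, relevant_ids={"a", "b"})
```

In this toy collection, document `b` lands in the second quadrant (relevant without containing the query term) and document `c` in the third (containing the term without being relevant).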