A Must-Read for SEO Beginners: How Search Engines Work (2022)
How do search engines work?
As mentioned at the start, a search engine has to complete three main tasks before it can serve website information to users: crawling, indexing, and ranking. The technical implementation behind each task is very complex, and as an SEO beginner you probably do not need to study it in depth yet.
1. Search Engine Crawling
To show relevant content to users, a search engine first sends out its "little helpers": crawlers, also called spiders. They continuously fetch new content from the internet and refresh content already stored in the engine's database. That content comes in many forms, such as web pages, PDF files, and MP3 audio files, but in every case the crawler reaches it through a URL.
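The crawl loop described above can be sketched in a few lines of Python. This is a minimal illustration and not how any particular search engine is implemented: the `FAKE_WEB` dictionary, the `example.com` URLs, and the names `LinkParser` and `crawl` are all assumptions made for the example, and the "web" is held in memory so that no network access is needed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch over HTTP.
FAKE_WEB = {
    "https://example.com/": '<a href="/about">About</a> <a href="/files/guide.pdf">Guide</a>',
    "https://example.com/about": '<a href="/">Home</a> <a href="/blog#top">Blog</a>',
    "https://example.com/blog": "<p>No links here.</p>",
}


class LinkParser(HTMLParser):
    """Collects the absolute URLs of all <a href="..."> links on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop the #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def crawl(seed, pages):
    """Breadth-first crawl from `seed`; returns URLs in the order visited."""
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        html = pages.get(url)
        if html is None:
            continue  # non-HTML content (e.g. a PDF): recorded, but not parsed
        parser = LinkParser(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


for url in crawl("https://example.com/", FAKE_WEB):
    print(url)
```

Every piece of content, including the PDF, is discovered only because some page links to its URL, which is the point the paragraph above makes.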
2. Search Engine Indexing
Building the index is a complex process that draws on algorithms, geography, sociological research, and other factors. Search engines use many parameters to decide how content is classified, but the most important one is relevance: the more closely related two pieces of content are, the more likely they are to be placed in the same category.
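To make the idea of an index concrete, here is a minimal sketch of an inverted index, a common textbook structure that maps each word to the pages containing it. The article does not say which structure any real search engine uses, so treat this as an assumption for illustration; `tokenize`, `build_index`, and the sample pages are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


# Hypothetical crawled pages: URL -> extracted text.
docs = {
    "https://example.com/seo-basics": "SEO basics for beginners",
    "https://example.com/crawling": "How crawling works in SEO",
    "https://example.com/recipes": "Easy dinner recipes",
}

index = build_index(docs)
print(sorted(index["seo"]))
```

Pages that share terms end up under the same index entries, which is a very rough stand-in for the "related content is grouped together" behavior described above.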
3. Search Engine Ranking
When a user types keywords into the search box, the search engine quickly looks up matching content in its massive index database and sorts it by relevance and other parameters. The top-ranked results are usually the ones the search engine considers the most relevant answers to the user's query.
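As a rough illustration of "sort by relevance", the sketch below scores each page by how often the query words appear in it and returns the matches from highest to lowest score. Real ranking combines many more signals, as the paragraph above notes; raw term counting is a stand-in chosen for simplicity, and `rank` and the sample pages are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs):
    """Return the URLs that match the query, most relevant first.

    Relevance here is just the number of times the query terms occur in the
    page text, which is an assumption made for this example.
    """
    query_terms = tokenize(query)
    scored = []
    for url, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _score, url in scored]


# Hypothetical indexed pages: URL -> extracted text.
pages = {
    "https://example.com/a": "SEO tips and more SEO basics",
    "https://example.com/b": "Latest SEO news",
    "https://example.com/c": "Cooking recipes",
}

print(rank("seo tips", pages))
# ['https://example.com/a', 'https://example.com/b']
```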
Can search engines find you?
As noted above, your website can appear in search engine results pages (SERPs) only if search engine spiders have crawled and indexed it. If you already have a website, you can use the "site:" command (for example, site:example.com) to check which of its pages have been indexed.
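If you check several sites regularly, the "site:" query can be turned into a search URL with a small helper. This is only a convenience sketch: the function name `site_query_url` is hypothetical, and the default base URL assumes Google's familiar `/search?q=` format, so substitute the search page of whichever engine you use.

```python
from urllib.parse import urlencode, urlsplit


def site_query_url(site, search_base="https://www.google.com/search"):
    """Build a search URL for the query `site:<domain>`.

    `site` may be a bare domain ("example.com") or a full URL.
    `search_base` is an assumption; change it for other search engines.
    """
    if "://" in site:
        domain = urlsplit(site).hostname
    else:
        domain = site.split("/")[0]
    return f"{search_base}?{urlencode({'q': f'site:{domain}'})}"


print(site_query_url("https://example.com/blog"))
# https://www.google.com/search?q=site%3Aexample.com
```

Opening the printed URL in a browser shows the pages the engine currently lists for that domain.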
Common reasons a site is not indexed:
- The site is new and search engines have not indexed it yet
- The site has no inbound links from other sites
- The site's directory structure is too deep or too complex
- The site contains code that blocks search engine crawlers
- The site may have been penalized by the search engine
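One common form of "code that blocks crawlers" is a robots meta tag such as `<meta name="robots" content="noindex">` in a page's `<head>`. The sketch below checks a page's HTML for that tag. It is a simplified check under stated assumptions: it looks only at the generic `robots` meta name, not crawler-specific names or HTTP headers, and `RobotsMetaParser` and `blocks_indexing` are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def blocks_indexing(html):
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(blocks_indexing(page))  # True
```

A leftover `noindex` tag from a staging or development version of a site is an easy thing to miss, so it is worth checking when pages do not show up in the index.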
Important note: we sometimes focus so much on getting search engines to crawl our content that we forget to keep them away from content that should not be crawled, such as duplicate pages, URLs with search parameters, the company's contact details, and visitor messages. Having this kind of content indexed is of little value.
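A common way to keep crawlers away from such pages is a robots.txt file. The sketch below uses Python's standard `urllib.robotparser` module to test a sample rule set before it is deployed. The rules, paths, and `example.com` URLs are assumptions chosen to mirror the examples in the note above, and `load_rules` is a hypothetical helper. Python's parser matches plain path prefixes, so the sample rules avoid wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers out of internal search results,
# the contact page, and the visitor message board.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /contact
Disallow: /messages
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
for url in (
    "https://example.com/blog/how-search-works",
    "https://example.com/search?q=seo",
    "https://example.com/contact",
):
    print(url, rules.can_fetch("*", url))
```

Running a check like this against the URLs you do want indexed helps catch a rule that is broader than intended.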
