First Google crawls the web
Google’s work begins with crawling.
Because there is no central registry of every web resource in the world, Google has to explore the web itself on a regular basis. To do this, it uses an automated program called a web crawler; Google’s crawler is known as Googlebot.
Googlebot regularly travels across the Internet looking for new or recently updated webpages. This process is called crawling, and Googlebot typically discovers pages in two ways.
First, Googlebot revisits pages it discovered during previous crawls. It follows the links it finds there and reads the site’s XML sitemap if one has been submitted. Any newly found pages are added to the list of pages to crawl later.
Second, Googlebot crawls pages that site owners submit through Google Search Console. This gives the crawler another batch of URLs for its crawl queue.
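A sitemap is a plain XML file that lists page URLs in `<loc>` elements under the sitemaps.org namespace. The minimal sketch below uses only Python’s standard library to extract those URLs so they could be queued for crawling. The sample sitemap and its `example.com` URLs are made up. Real crawlers also handle sitemap index files and compressed sitemaps, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return the <loc> URLs listed in a <urlset> sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/new-post</loc><lastmod>2022-01-15</lastmod></url>
</urlset>"""

if __name__ == "__main__":
    crawl_queue = urls_from_sitemap(SAMPLE_SITEMAP)
    print(crawl_queue)  # ['https://example.com/', 'https://example.com/blog/new-post']
```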
Normally, Googlebot will crawl all the new pages it finds. However, a page will not be crawled if:
- it’s disallowed from crawling in the site’s robots.txt file;
- it can’t be accessed by an anonymous user, e.g., pages that require logging in.
If a page is a duplicate of another page, Googlebot will visit it less frequently to make crawling more efficient.
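Here is a toy breadth-first crawler in Python that makes the crawl loop concrete. It rests on stated assumptions and is not how Googlebot is built. The hypothetical `LINKS` dictionary stands in for fetching pages and extracting their links, and `ROBOTS_TXT` is an invented robots.txt file. Real crawl scheduling (revisit frequency, duplicate handling, crawl budget) is far more complex and not public. The sketch does show two of the rules described above: newly found links join the queue, and URLs disallowed by robots.txt are skipped.

```python
from collections import deque
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical link graph: page URL -> links found on that page.
# In a real crawler these would come from fetching and parsing each page.
LINKS = {
    "https://example.com/": ["/blog/", "/private/admin"],
    "https://example.com/blog/": ["/blog/post-1", "/"],
    "https://example.com/blog/post-1": [],
}


def crawl(seeds: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Breadth-first crawl from seed URLs, honoring robots.txt rules."""
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT.splitlines())
    queue = deque(seeds)  # the crawl queue (frontier)
    seen = set(seeds)     # URLs already queued, so each is crawled once
    crawled = []
    while queue:
        url = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt: don't crawl it
        crawled.append(url)
        for href in LINKS.get(url, []):
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled


if __name__ == "__main__":
    print(crawl(["https://example.com/"]))
```

Starting from the home page, the crawler returns the home page, `/blog/`, and `/blog/post-1`. It never crawls `/private/admin`, because the robots.txt file disallows it.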
If you want to study the topic in depth, read our guide on how the Google crawler works.
Then Google adds the pages to the index
After Googlebot finds a new page, Google tries to understand what the page is about. This process is called indexing. It involves a thorough analysis of page elements such as text content, meta tags and attributes, images, and videos.
As a rule, newly discovered and crawled pages then get indexed. The main exception is a page that carries a noindex directive, either in a robots meta tag or in an X-Robots-Tag HTTP header. Google won’t index such a page.
Once a page has been processed, Google stores it in the Google index, the database behind Google Search. The index currently contains hundreds of billions of webpages.
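A noindex directive can arrive in two places: a robots meta tag in the page’s HTML, or an `X-Robots-Tag` HTTP header. The sketch below checks both using Python’s standard `html.parser`. It is simplified in several ways, all assumptions on my part: it treats `none` as implying `noindex`, it honors only the `robots` and `googlebot` meta names, and it ignores header values scoped to a specific crawler, such as `googlebot: noindex`.

```python
from html.parser import HTMLParser

# Directives that keep a page out of the index ("none" means noindex, nofollow).
NOINDEX_VALUES = {"noindex", "none"}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindex(html: str, headers: dict[str, str]) -> bool:
    """True if the X-Robots-Tag header or a robots meta tag says noindex."""
    for name, value in headers.items():
        # Simplification: crawler-scoped values such as "googlebot: noindex" are ignored.
        if name.lower() == "x-robots-tag":
            if any(d.strip().lower() in NOINDEX_VALUES for d in value.split(",")):
                return True
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return bool(parser.directives & NOINDEX_VALUES)
```

For example, `is_noindex('<meta name="robots" content="noindex, follow">', {})` is true, while an ordinary page with no such tag or header is not.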
Once this new page is indexed, it is ready to be served to searchers.
When Google gets a query, it serves the search results
Every time a user enters a query into the Search box, Google turns to its index to find and serve the most relevant results. The process is called “serving” and includes eight steps.
1. Defining context and narrowing down the index
By the time you submit your query, Google has already factored in a few things that help it narrow down the index and filter out irrelevant results.
Here’s what Google checks even before you hit Enter:
- Google checks your location to deliver content relevant to your area. So, when you search for a vegan cafe nearby, you will see a Local pack (a map with three local businesses listed) even if you don’t specify the location.
- Google analyzes the language of the query. If you search in German, you’ll get results in German regardless of your location and the preferred language specified in Search settings.
- Google looks at your device type. If you’re using a phone, Google will display mobile-friendly pages first. The device type also determines which SERP features you’ll see. E.g., featured snippets and ads are returned more often on desktop, while some other features are unique to mobile search.
- Google sticks to your search settings. If you turn on SafeSearch, Google won’t show you explicit results. Likewise, if you turn on Show personal results, you will get personal answers and recommendations based on the information in your Google account.
2. Identifying the meaning and intent of the query
Once you submit your query, Google has to understand its actual meaning. Users don’t always spell words correctly or phrase queries the way webmasters do.
To that end, Google first recognizes new words and corrects spelling mistakes. It uses natural language understanding models to decipher unknown words, typos, and conceptual mistakes. It does this mainly by looking at the query as a whole rather than focusing on one word.
Next, Google identifies the meaning and intent of the query. In the early days, Google matched words in queries to words on pages without understanding what they meant. That changed with the introduction of the Hummingbird algorithm in 2013. With it, Google entered the era of semantic search and began interpreting the meaning of the whole query rather than individual keywords. Hummingbird paved the way for the AI systems behind Google’s biggest breakthroughs in natural language processing.
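Google’s models for this aren’t public. For contrast, here is a much older and simpler technique: a dictionary-based corrector. It replaces each unknown word with the closest known word by Levenshtein edit distance and breaks ties by word frequency. The vocabulary and frequency counts are invented for illustration. Note that it corrects one word at a time, which is exactly the limitation that looking at the whole query is meant to overcome.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical vocabulary with made-up frequency counts.
VOCAB = {"vegan": 120, "cafe": 300, "care": 500, "near": 800, "me": 1000}


def correct_word(word: str, max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the word itself if nothing is close."""
    if word in VOCAB:
        return word
    # Sort key: smallest distance first, then the most frequent word.
    dist, _, best = min((edit_distance(word, w), -freq, w) for w, freq in VOCAB.items())
    return best if dist <= max_distance else word


def correct_query(query: str) -> str:
    return " ".join(correct_word(w) for w in query.lower().split())
```

With this vocabulary, `correct_query("vegn cafe nera me")` returns `"vegan cafe near me"`.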
I’ll be honest with you. SEOs around the globe are trying to figure out which AI algorithms Google uses, but the topic is anything but clear. Maybe Google doesn’t want to share its trade secrets, or maybe its spokespeople simply aren’t fully in the know. Either way, the most authoritative and clearly worded reading on the topic is this post by Barry Schwartz.
Barry emphasizes three semantic processing systems: RankBrain, Neural Matching, and BERT. They were launched at different times and their aims overlap. To keep things simple, I’ve separated their spheres of influence:
| RankBrain (2015) | Neural Matching (2018) | BERT (2019) |
|---|---|---|
| Matching queries to specific real-world concepts | Matching queries to their synonyms | Matching words in the query to their syntactic roles |
| Example: If you search for “what’s the title of the consumer at the highest level of a food chain,” Google’s systems know the concept of a food chain may have to do with animals, not human consumers. By understanding and matching these words to their related concepts, RankBrain helps Google understand that you’re looking for what’s commonly referred to as an “apex predator.” | Example: If you search for “insights how to manage a green,” Google applies its synonym system to identify broader meanings behind the words (like management, leadership, personality, and more) and works out that you are looking for management tips based on a popular, color-based personality guide. | Example: If you search for “can you get medicine for someone pharmacy,” BERT helps Google understand that you’re trying to figure out whether you can pick up medicine for someone else. Before BERT, Google overlooked the short preposition “for,” mostly surfacing results about how to fill a prescription. |
By applying these three AI systems, plus some undisclosed “dark art,” Google works out the meaning of the query and moves on to the next stage.
3. Checking if the query requires new content
Once Google grasps the meaning and intent of your search query, it then checks if you’re looking for something that requires the most recent and up-to-date information (news, politics, events, etc.).
To detect whether you need current information, Google applies its Query Deserves Freshness (QDF) model to your query. The model treats a topic as hot if news sites and blogs are actively writing about it, or simply if search volume for the topic rises. When Google concludes that you want the freshest information on a topic, it rewards up-to-date content with higher rankings.
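Google hasn’t published how QDF actually works, so what follows is only a toy illustration of the two signals described above, not Google’s model. A query is flagged as deserving fresh results when the latest day’s search volume, or the latest day’s count of newly published articles, jumps well above its recent average. The 3× spike factor, the floor of 1 for empty baselines, and the sample numbers are all arbitrary assumptions.

```python
from statistics import mean


def is_spiking(series: list[float], spike_factor: float = 3.0) -> bool:
    """True if the last value is at least spike_factor times the mean of the earlier values.

    The baseline is floored at 1 so that a topic going from zero to a handful of
    mentions isn't flagged on noise alone (an arbitrary choice).
    """
    *history, latest = series  # the series must contain at least one value
    baseline = mean(history) if history else 0.0
    return latest >= spike_factor * max(baseline, 1.0)


def deserves_freshness(daily_searches: list[float], daily_new_articles: list[float]) -> bool:
    """Toy QDF check: fresh if search volume OR publishing activity spikes."""
    return is_spiking(daily_searches) or is_spiking(daily_new_articles)
```

For example, searches of `[100, 110, 95, 105, 900]` trigger the check, because 900 is far above the trailing average of 102.5. A flat series like `[100, 110, 95, 105, 100]` does not.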
For example, when you search for “prince harry and meghan”, you probably expect to see some news about them. So, Google shows Top Stories with the latest news about the couple at the top of the SERP.
4. Checking if the query is Your Money or Your Life
Along with the QDF check, Google examines your query to see whether it is one for which returning unreliable content would be unacceptable. Such queries and pages are called Your Money or Your Life (YMYL). They typically cover topics such as health, safety, and finance.
The Medic update made it possible to distinguish YMYL queries and match them to the right content. If Google decides that a query requires YMYL content, it evaluates the expertise, authoritativeness, and trustworthiness (E-A-T) of the relevant pages, their creators, and the websites as a whole. Pages with stronger E-A-T ultimately rank higher.
For example, if you search for “stock exchange”, the first SERP will mainly consist of highly trusted pages like Nasdaq, London Stock Exchange, New York Stock Exchange, etc.
5. Defining what the SERP will look like
Depending on the type of query, the SERP may look different. For example, along with the ten blue links, it may show ads, Knowledge Graph results, a map, and so on.
So, before Google returns its final SERP, it decides which types of search results will be most suitable. In practice, the SERP structure depends heavily on search intent.
There is also a noticeable difference between how Google chooses what SERP features to show for mobile and desktop search.
For example, the mobile SERP has the following unique features: Broaden this search and Refine this search (predictive features), a Knowledge Panel with the View in 3D feature, Short Videos, and Web Stories.
Meanwhile, some features are shown more often on desktop, e.g., ads and featured snippets. Here is an example of how different the first SERP for the same query can look:
The logic behind this difference lies in how we use the two types of devices. At a desktop, we have more time to study text content. On our phones, by contrast, we expect to find information as quickly as possible. So Google “equips” the mobile SERP with more predictive and visual features.
6. Picking the most relevant pages for each type of search results
After Google grasps the concepts in the query and in the pages, it looks at how well the information on a page matches the query. To assess relevance, Google analyzes text, images, and videos, as well as meta elements such as the title, meta description, and alt text.
More relevant pages, i.e., those that best meet the user’s needs, rank higher. Keep in mind, though, that relevance, while vital, is not the only ranking factor. High positions on the SERP come from a combination of many factors.
7. Balancing pages’ relevance and importance
Google ranks pages by prioritizing the most reliable, high-quality content. At this stage, it tries to strike the right balance between relevance and authoritativeness.
First, Google assesses the quality of the page’s content by identifying signals of expertise, authoritativeness, and trustworthiness on the given topic. This process includes the following:
- Estimating PageRank. Google checks whether other prominent websites link to or cite the page’s content. The number of links matters too: the more backlinks a page gets from quality sites, the better its chances of ranking at the top.
- Detecting spam and other deceptive or manipulative behavior with anti-spam algorithms. Naturally, anything that violates Google’s guidelines won’t rank high.
- Checking if a site is safe. Google considers HTTPS a gold standard as it provides encryption, data integrity, and authentication. If the page provides a safe user experience, it’s rewarded.
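The original PageRank formula is public, even though the version Google uses today isn’t. Below is a minimal power-iteration sketch of that classic formula, run on a made-up four-page site with the commonly cited damping factor of 0.85. It illustrates the PageRank bullet above. A page’s score comes from the scores of the pages linking to it, with each linking page’s score split among its outgoing links, so a link from a well-linked page is worth more. This is a teaching sketch, not a description of Google’s current ranking.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Classic PageRank computed by power iteration.

    `links` maps each page to the pages it links to. Rank held by pages with no
    outgoing links ("dangling" pages) is spread evenly over all pages, so the
    scores always sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        dangling_rank = sum(rank[p] for p in pages if not links.get(p))
        incoming = dict.fromkeys(pages, 0.0)
        for source in pages:
            targets = links.get(source, [])
            for target in targets:
                # Each page splits its current rank evenly among its outgoing links.
                incoming[target] += rank[source] / len(targets)
        rank = {
            p: (1 - damping) / n + damping * (incoming[p] + dangling_rank / n)
            for p in pages
        }
    return rank


# Hypothetical four-page site.
SITE = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "post"],
    "post": ["home"],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(SITE).items(), key=lambda kv: -kv[1]):
        print(f"{page:6} {score:.3f}")
```

On this site, "home" ends up with the highest score because every other page links to it. "post" scores lower than "blog" because its only inbound link comes from "blog," which splits its rank between two links.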
Because Google places great value on user experience, it also checks the page’s usability, i.e., how easy the page is to navigate and use. This process is also fairly complex and includes the following:
- Checking the page for intrusive interstitials. If there are popups that prevent users from consuming the main content, the page is not going to rank high.
- Checking if the site is designed for all device types. Web content should be equally easy to consume, be it mobile, tablet, or desktop.
- Taking into account the site’s Core Web Vitals. Loading speed, interactivity, and visual stability determine how engaged your visitors will be and how favorably Google will treat your content.
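Google publishes “good” and “poor” thresholds for each Core Web Vital and assesses them at the 75th percentile of page loads. The sketch below rates field measurements against those published thresholds: LCP 2.5 s / 4 s, FID 100 ms / 300 ms, CLS 0.1 / 0.25. The sample values are hypothetical, and the nearest-rank percentile is just one simple definition among several. How much these ratings weigh in ranking isn’t public.

```python
import math

# Published Core Web Vitals thresholds:
# (upper bound for "good", upper bound for "needs improvement").
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # Largest Contentful Paint, seconds
    "FID": (100, 300),   # First Input Delay, milliseconds
    "CLS": (0.1, 0.25),  # Cumulative Layout Shift, unitless score
}


def percentile_75(samples: list[float]) -> float:
    """Nearest-rank 75th percentile (one simple definition among several)."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def rate(metric: str, value: float) -> str:
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"


if __name__ == "__main__":
    # Hypothetical field data for one page.
    field_data = {
        "LCP": [1.8, 2.1, 2.4, 3.9],
        "FID": [40, 80, 150, 420],
        "CLS": [0.02, 0.05, 0.3, 0.4],
    }
    for metric, samples in field_data.items():
        p75 = percentile_75(samples)
        print(metric, p75, rate(metric, p75))
```

With this sample data, LCP rates "good" (2.4 s), FID "needs improvement" (150 ms), and CLS "poor" (0.3).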
Obviously, pages that provide both quality and usability tend to rank higher in search results.
8. Returning the result to users
Once your query has been analyzed from all angles and the AI algorithms have done their job, Google returns the most relevant search results. As the image below shows, this whole process takes a fraction of a second.
Fun fact #1: In the time you’ve spent reading this guide so far, Google could have processed 38 million queries.
Fun fact #2: You may think that you’ve just figured the Google algorithm out. But it’s too early to break out the champagne – the algorithm can change tomorrow.
Google never stops improving its algorithm
Rather than manually tweaking specific search results to make search better, Google constantly changes and adapts its algorithms. For example, in 2020, Google introduced around 4,500 improvements to Search. That works out to roughly 12 changes per day, so Google is a real hard worker.
Below, I’ve broken down Google’s efforts in this regard.
1. Fighting webspam
For Google, fighting spam is a pain in the neck. In 2020 alone, Google said it was finding around 40 billion spammy pages every day.
From Google’s perspective, anything that deceives users and violates Google’s Quality Guidelines is spam. This includes:
- automatically generated content
- sneaky redirects
- link schemes
- thin content
- paid links
- hidden text and links
- doorway pages
- scraped content
- pure affiliate sites
- irrelevant keywords
- pages with malicious behavior
- automated queries
- user-generated spam.
Spam fighting is a multistep process that involves both Google’s AI algorithms and manual review by its spam team.
A large portion of spam pages is filtered out between the crawling and indexing stages. Most of what slips through is caught later by filters at the ranking and serving stage.
Despite how sophisticated current anti-spam algorithms are, some spam pages still make it into SERPs. That is when Google’s spam team comes into play. It reviews spam reports submitted by searchers and takes manual actions against sites that violate Google’s guidelines. As a result, spammy websites get demoted or even removed from search results.
In the unlikely event that you receive a manual action from Google, don’t panic. First, you’ll see a corresponding notification in Search Console. Next, it’s crucial to fix all the issues that might have led to it. Once everything is fixed, submit a reconsideration request in Search Console. If the request is approved, your site is likely to get its rankings back.
2. Testing the algorithm
Naturally, search can’t be improved without tests and experiments. Every new idea Google comes up with is tested rigorously before it’s launched.
To improve search quality, Google works with Search Quality Raters, a group of independent reviewers from around the world. The raters assess how well search works and whether the results satisfy the user’s search intent. They also evaluate the quality of search results based on the expertise, authoritativeness, and trustworthiness of the content. Importantly, they do all this strictly according to Google’s Search Quality Rater Guidelines.
Besides search quality tests, Google also runs side-by-side experiments, again with the help of Quality Raters. Google shows the raters two different sets of search results, one with the proposed change and one without. It then asks the raters which results they prefer and why.
The ratings provided by Quality Raters don’t directly affect the rankings of any page. Instead, they are taken in aggregate to help Google measure how well its search algorithms perform.
On top of that, Google runs live traffic experiments to see how real people interact with a feature under test. It enables the feature for a small group of users and compares the results with a control group that doesn’t get the feature. If the results aren’t good enough, the feature isn’t approved for launch.
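Google doesn’t publish the statistics behind its live traffic experiments. Here is one generic way such a comparison is often done: a two-proportion z-test between the control group and the group that sees the new feature. The test is applied to a hypothetical success metric, for instance the share of searches that end in a click. The metric, the counts, and the 1.96 cutoff (the familiar 95% two-sided critical value) are all illustrative assumptions.

```python
import math


def two_proportion_z(success_a: int, total_a: int, success_b: int, total_b: int) -> float:
    """z statistic for (rate_b - rate_a) using the pooled standard error.

    Assumes both groups are non-empty and the pooled rate is strictly between 0 and 1.
    """
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (rate_b - rate_a) / std_err


def feature_wins(control: tuple[int, int], treatment: tuple[int, int],
                 z_critical: float = 1.96) -> bool:
    """Approve only if the treatment beats the control by a statistically clear margin."""
    z = two_proportion_z(*control, *treatment)
    return z > z_critical


if __name__ == "__main__":
    # Hypothetical results: (searches with a click, total searches).
    control = (4_800, 10_000)
    treatment = (5_000, 10_000)
    print(round(two_proportion_z(*control, *treatment), 2), feature_wins(control, treatment))
```

Here a 48% vs. 50% click rate over 10,000 searches each gives z ≈ 2.83, which clears the cutoff. A gap of 49.5% vs. 50% would not.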
To complete the picture, let’s dive into the latest Google updates.
3. Latest developments
Google updates fall into two broad groups.
The first group is minor updates. They usually go unnoticed by searchers and cause only mild ranking fluctuations for SEOs. Google typically doesn’t provide any details about them.
The second group is Google’s major (core) algorithm updates. These are of particular interest because they can significantly change the game for both users and SEOs. Below, I’ve put together some of the most prominent updates of the last seven years.
Serving quality content:
- Medic update (August 2018). This update was rolled out to better identify the expertise, authoritativeness, and trustworthiness of web content, so that YMYL pages with the strongest E-A-T rise to the top of search results.
- Passage ranking update (February 2021). With it, Google can assess the relevance of a specific passage, rather than the whole page, and rank it individually. This helps surface needle-in-a-haystack information buried deep in long pages.
- Search spam updates (2021). These updates targeted content that violates Google’s webmaster guidelines and were designed to fight spam in web and image results more effectively.
- Link spam update (July 2021). With it, Google can identify and nullify link spam more broadly, across multiple languages. This significantly reduced the effectiveness of deceptive link-building techniques.
- Product reviews updates (2021, 2022). With them, Google can identify high-quality product reviews and reward them with better rankings, giving users more helpful and valuable information.
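Google hasn’t described how it scores passages, so this is only a loose illustration of the passage ranking idea. The sketch splits a page into passages and scores each one against the query on its own, so a single highly relevant paragraph can win even on a long page about a broader topic. Scoring by simple query-term overlap is a deliberately naive stand-in for Google’s models, and the sample page is invented.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens (letters and digits only)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_passage(page_text: str, query: str) -> tuple[float, str]:
    """Score each blank-line-separated passage by the share of query terms it contains."""
    query_terms = tokenize(query)
    passages = [p.strip() for p in page_text.split("\n\n") if p.strip()]
    if not query_terms or not passages:
        return 0.0, ""
    scored = [(len(query_terms & tokenize(p)) / len(query_terms), p) for p in passages]
    return max(scored, key=lambda item: item[0])  # the first passage wins ties


# Hypothetical long page: only the last passage answers the query.
PAGE = """Welcome to our networking handbook, which covers cables, switches, and Wi-Fi.

Chapter 2 compares mesh systems with single access points.

To reset the router password, open the admin panel and choose Security."""
```

For the query "how do I reset my router password," the last passage scores 3/7 (it contains "reset," "router," and "password"). The other passages score 0, so the relevant paragraph is picked out even though the page as a whole is about networking in general.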
Natural language and search intent understanding:
- RankBrain (October 2015). This was Google’s first machine-learning system in Search. It can process even never-before-seen queries and match them to relevant pages more intelligently.
- BERT (October 2019). The introduction of this NLP algorithm changed the way Google understands words in queries. Thanks to it, Google can grasp even the slightest nuances in context and therefore effectively match the queries to the proper results.
- MUM, Multitask Unified Model (June 2021). This new model is many times more powerful than BERT. MUM can understand complex questions and information in different formats (text, photos, video) across multiple languages. Thanks to MUM, Google will learn to answer users’ questions the way real experts would. The update is relatively new, so it will take time for us to see its full potential.
Providing excellent user experience:
- Mobile-friendly updates (2015, 2016). They were meant to boost the rankings of mobile-friendly pages in mobile search results, so that users can easily find relevant pages that are readable without zooming or horizontal scrolling. The updates also made mobile-friendliness a ranking signal for mobile search.
- Accelerated Mobile Pages (AMP) framework (2016). This open-source project was developed to help mobile pages load much faster and was later expanded to desktop sites, emails, ads, etc. With it, page content can be preloaded even before a user opens the page.
- Mobile-first indexing (2019). This is the next stage of the mobile-friendly updates. Google not only rewards mobile-friendly pages with higher rankings but now also primarily uses the mobile version of a site for crawling, indexing, and ranking.
- Page experience updates (2021, 2022). Google added the Core Web Vitals (Largest Contentful Paint, First Input Delay, and Cumulative Layout Shift) as page experience signals for both mobile and desktop search. So, to rank pages, Google now checks whether they load quickly, are mobile-friendly, run on HTTPS, avoid intrusive interstitials, and keep content from shifting as the page loads.
The Google Search algorithm will always be surrounded by mystery, no matter how hard the global SEO community tries to crack it. The reason is that Google wants to prevent third parties from manipulating search results, so it discloses only a fraction of how Search really works.
I hope this article has lifted the veil of secrecy a little and helped you understand the basics of how Google and its algorithm work. If you have any questions, you’re welcome to ask them in the comments.