You may not be familiar with the term “crawl budget”, so here is a simple explanation: a crawl budget is the number of pages Google and other search engines will visit on your website in a day. This number could be anywhere from four to four thousand; it’s not always the same number of pages, but it typically stays within a similar range each time. It is important to know how many pages of your website are being crawled (especially if you have a larger website, where it’s unlikely to be every page) so you know which pages need to be prioritised for the crawl and what happens to new content and pages you’ve added.
What Affects A Crawl Budget?
There are a few factors to consider when looking at what impacts a website’s crawl budget:
- The age of the domain – The older the domain, the better.
- The number of links to the website – The more (good quality) websites that link to your website, the better.
- Content quality and freshness – Having good quality content on your website, regularly updating existing pages and adding new content.
If you have a lot of products or pages on your website, or if pages take a long time to load for the search engine, the website may not be fully crawled in a single day – in other words, your crawl budget runs out. Therefore, it is important to prioritise what content the crawler sees first. You can use Google Search Console to see how many of your pages Google is crawling in a day. Because Google has different ranking factors for desktop, mobile and maps, each one has its own crawler, and your website may be crawled at a different rate by each.
How Do You Prioritise Content?
You can use robots.txt to block a crawler from accessing certain pages and areas of your website. This can be done by editing the robots.txt file directly, or you can test it through Google Search Console. Be careful when using it, however, because a misconfigured robots.txt file can block your website from Google entirely. Robots.txt is used to tell search crawlers what to do and, more importantly, what not to do. Crawlers don’t technically have to obey it, but it has long been used as a guideline for how to behave on a website.
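As a rough illustration, a robots.txt file along these lines could steer crawlers away from low-priority areas of a site (the paths and sitemap URL shown are hypothetical examples, not recommendations for any specific website):

```
# Applies to all crawlers
User-agent: *
# Hypothetical low-priority areas to keep out of the crawl
Disallow: /internal-search/
Disallow: /checkout/

# Point crawlers at the sitemap listing your priority pages
Sitemap: https://www.example.com/sitemap.xml
```

The file lives at the root of the domain (e.g. example.com/robots.txt), and each Disallow rule tells compliant crawlers to skip that path, leaving more of the crawl budget for the pages you care about.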
As an alternative to the robots.txt file, you can use meta tags to suggest to the crawlers what to do on each page they visit. These robots meta tags can govern what the crawler does on each page:
- Noindex – Do not add this page to the search results.
- Nofollow – Do not follow the links on this page.
- Notranslate – Do not translate the content on this page.
- Nosnippet – Do not show a snippet of this page in the search results.
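For example, a page could carry a robots meta tag like this in its head section (a minimal sketch; multiple directives can be combined in a comma-separated list):

```html
<head>
  <!-- Ask crawlers not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```

Unlike robots.txt, which works at the site level, these tags are set page by page, so they suit cases where only a handful of pages need different treatment.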
Noindex is the most commonly used robots meta tag, telling crawlers not to index the page. This means the crawlers can move on to the next page rather than spend time on the noindex page, helping you control which pages get indexed first.
If you want us to review your crawl budget or advise on what content you need to prioritise, get in touch with our team today.