From June 2021, robots.txt became editable in Shopify. But how do you edit it? What are the use-cases? Should you even change it? I dive in.

An often-cited SEO gripe with Shopify as a platform is the inability to edit robots.txt. Something so straightforward on nearly every other CMS and ecommerce platform has been heavily locked down in Shopify, until now.

Recently (see changelog) Shopify updated their theme system to allow store owners to edit the robots.txt file directly.

Feature announcement Tweet from Shopify CEO in June 2021

If you are new to robots.txt, I’d highly recommend you heed the warnings on this page and in the Shopify documentation. In all likelihood, you don’t need to edit your robots.txt at all.

What is a robots.txt, and what does it do?

robots.txt is used by websites (or website owners) to set a series of crawl allow and disallow rules for bots to follow (more on what I mean by “bots” below).

Nearly every website has robots.txt configured, for example: https://www.nytimes.com/robots.txt

Here’s a very basic robots.txt example:

User-agent: Googlebot
Disallow: /checkout/

User-agent: *
Allow: /

Sitemap: http://www.example.com/sitemap.xml

For this example, the rules can be read to mean something like this:

  • To Googlebot: Please don’t crawl the checkout pages
  • To All Bots: Please go ahead and crawl everything else
  • The sitemap: (list of all pages) is over here

Note: You can request and view a /robots.txt URL easily in a web browser; it will load like a normal page full of plain text. While we may think of robots.txt as a file, technically it is delivered as an HTTP response (not a file), the same as any other web page. It just has a Content-Type: text/plain header instead of text/html.
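
For example, requesting /robots.txt typically returns a response whose headers look something like this (abridged and illustrative), with the plain-text rules as the body:

HTTP/1.1 200 OK
Content-Type: text/plain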

nike.com has a very on-brand robots.txt file with a commented-out first line (#)

It’s important to be upfront and make the clarification that robots.txt is not intended or designed as a way to remove a page from search results. Even though you may have set Disallow, it is still possible for that URL to be indexed, remain indexed, and show up in search results. If you want to prevent a page from getting into the search index, or remove it from the index, then you’re better off using a noindex tag, not a robots.txt Disallow. Crawling and indexing are two different things.
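
For reference, a noindex directive is just a robots meta tag in the page's <head> (or an equivalent X-Robots-Tag HTTP header):

<meta name="robots" content="noindex">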

A bit about “bots”

You can think of a “bot”, “spider” or “crawler” as roughly the same thing in this context. Anytime a computer program or script programmatically accesses your website – that’s a bot. There are good bots and bad bots: good bots follow your rules, bad bots ignore your rules.

Most of the well-known bots like Googlebot, BingBot, Baiduspider, YandexBot, facebookexternalhit are “good bots”. Good meaning that typically they will read the contents of /robots.txt before making any page access requests to crawl your website, or a specific page on your site. Then they will (usually) follow your rules and not access parts of your site that you have “asked” them not to.

Bad or malicious bots are an entirely different topic, and preventing them from accessing your site, stealing your content or throwing off your web-analytics data goes beyond what you can achieve with a robots.txt file. This is because real “bad bots” will simply ignore the robots.txt file; it doesn’t block them, it’s just a request.

For more reading, check out Wikipedia on internet bots

How do you edit robots.txt in Shopify?

You can do this easily in Shopify Admin > Online Store > Themes > Edit Theme. Open robots.txt.liquid and edit it. If the file doesn’t exist, you can create one by using “default” Shopify robots.txt.liquid content from here.

Shopify will process the file through the standard theme templating engine (so Liquid tags and logic can run) and output it under the standard /robots.txt URL route. This means if you’re feeling super advanced, you can even use custom Liquid logic in your robots.txt.liquid file!

WARNING: Don’t break your SEO!

If you make a typo, or are following bad advice, robots.txt could easily stop search engines from crawling your site, which over time will be catastrophic to organic visibility and sales from organic search. If you are not sure exactly what you are changing, why you’re changing it, and what the other options are, then don’t change anything.

Even seasoned professionals can introduce bad robots.txt logic which can have wide-reaching effects. Tools like Screaming Frog let you test-crawl an entire site after a change, to explore and evaluate the effect of robots.txt rules, but they require a bit of hands-on SEO experience to use effectively.

If you’re still hell-bent on changing the robots.txt file, I’d highly recommend that you try to understand what you are changing, why you are changing it and to what end before making non-standard customisations. Then, after changing it, test and validate the “crawlability” of pages. Then, even if you’re sure it’s right, monitor crawl data, coverage and other edge-case issues in GSC and in crawl monitoring tools like ContentKing to watch for unexpected changes or side-effects.

If you just want to prevent a page from showing up in Google, Bing or another search engine, then a noindex tag is usually a better way of doing this. In Shopify, my recommended approach is usually to use the seo.hidden metafield to control noindex tag insertion.
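
If your theme doesn't already handle that metafield, here's a minimal sketch of the kind of snippet that could sit in the <head> of theme.liquid. This is illustrative only, and the exact metafield access syntax varies (with newer metafield definitions you may need .value, e.g. product.metafields.seo.hidden.value):

{%- comment -%}
  Illustrative only: output a noindex tag when the seo.hidden metafield
  is set to 1 on the current product or page.
{%- endcomment -%}
{%- if product.metafields.seo.hidden == 1 or page.metafields.seo.hidden == 1 -%}
  <meta name="robots" content="noindex">
{%- endif -%}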

Also, the official position of Shopify appears to be that they don’t support edits to this file. I take that to imply that if things do go badly, all they will help you with is restoring it to the default. I wouldn’t rely on Shopify support to spoon-feed you through robots.txt customisations. So if you go down this path, you will be on your own, or you may well need to find a dev or an SEO person to help if things don’t quite go right.

Official Shopify robots.txt info:

Highly recommend that you refer to the official Shopify documentation for this, and read their warnings.

Here’s the default / standard Shopify robots.txt.liquid content:

{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.user_agent.value == '*' -%}
    {{ 'Disallow: /*?q=*' }}
  {%- endif -%}

  {%- if group.sitemap != blank -%}
      {{ group.sitemap }}
  {%- endif -%}
{% endfor %}

Note: Even before this June 2021 update, there have been workarounds to edit the robots.txt file on Shopify. Previously, if you really needed to change it, it could be achieved with EdgeSEO techniques like a Cloudflare O2O proxy + Cloudflare Worker to replace the /robots.txt response on the fly at the network edge with your own robots.txt content. This hacky workaround is no longer needed 🙂

Making changes to the standard robots.txt.liquid file

It’s really as simple as opening the robots.txt.liquid file in the theme and editing it. To be more specific, you could add static customisations above or below the existing content, or if you understand Liquid, you could modify the templating logic itself. How best to edit the file in a specific situation really depends on context and is case-by-case.
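
For example, a static addition is just plain text placed below the existing Liquid. A hedged sketch (the /wishlist path here is made up purely for illustration):

# Added below the existing {% endfor %} in robots.txt.liquid
# (/wishlist is a hypothetical path, purely for illustration)
User-agent: *
Disallow: /wishlist

Because lines starting with # are comments in robots.txt, those comment lines will harmlessly appear in the rendered /robots.txt output.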

When should you change robots.txt in Shopify?

Quite honestly, if you have to ask the question, the answer is most likely going to be that you don’t need to change anything. However, there are certain edge-case situations where changing the robots.txt.liquid file is exactly what you need.

A few use-cases for changing robots.txt

1. Allow internal search result pages

So I’m referring to internal site search URLs like this:

outdoorshop.com/search?q=red+umbrellas

The default Shopify robots.txt blocks internal site search result pages with this rule here:

User-agent: *
Disallow: /search

Removing that rule is likely to have no SEO benefit in and of itself, and potentially even carries some risk that it will do more harm than good, for example by introducing crawl and render budget issues.

But in saying that…

Here’s an example of internal search-result pages showing up in Google for “red umbrellas”
Admittedly, I had to skip past rich results like PLAs, map-pack, PAA and images to get to the blue links; organic is so far down these days!
Here’s how that Amazon organic position 1 looks:

Their search result pages have been heavily customized to act as PLPs (Product Listing Pages) in their own right. You have likely even landed on pages like this in the past. Amazon makes incredible use of internal-search result pages and is a good example of how these pages can be leveraged.

To make this work in Shopify

First of all, making these pages deliver any kind of SEO benefit is far more work than a robots.txt change. In this hypothetical, a robots.txt change would really be the last step in a list of other strategic, theme, content and technical adjustments.

  • This entire approach typically only works on high SKU count stores (we’re talking thousands to tens of thousands of products), where the right breadth and depth of category and searched-for product names naturally translates into internal search results landing pages that are useful to both users and Google.
  • Ideally, you would not just allow crawls, but also have a way of scalably linking to the high-value pages via internal pages and sitemaps. It might also be good to create a sitemap_useful_internal_searches.xml of all the valuable pages, and add that to the robots.txt as well. Some of this could potentially be automated through GSC data with some kind of custom service running periodic checks like: “IF a /search page gets organic clicks, THEN add it to the priority list if it’s not already there”.
  • There are a number of scale problems here, such as how to prioritize which internal-SERP pages are important, as well as having a way to detect, and noindex out, negative-sentiment and business-sensitive queries.
  • You’ll likely need to adjust your search.liquid template to get the basics right. I’m talking title tags, meta descriptions and the overall semantic structure of H1s; all need adjustment, at minimum (see the sketch after this list).
  • Ideally this would be used to create a much larger search surface and to discover new opportunities. However, even if a given internal search result starts performing well in organic, you may want a way of reacting to this and porting it over to its own PLP collections page. This would allow you to introduce more control, make it more useful to visitors, and create more unique content, like a description, FAQs, and below-the-grid sections.
  • Link acquisition: often people will share links to search result pages, so you should probably find ways to encourage this, monitor new backlinks, and make sure those pages get prioritised and indexed in Google.
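
As a rough idea of the “basics” mentioned above, here’s a hedged sketch for search.liquid using Shopify’s standard search Liquid object (search.performed, search.terms, search.results_count); title tags and meta descriptions would need equivalent treatment wherever your theme builds the <head>:

{%- comment -%} Illustrative only: give genuine query landing pages a
    descriptive H1 instead of a generic "Search results" heading. {%- endcomment -%}
{%- if search.performed and search.results_count > 0 -%}
  <h1>{{ search.terms | escape | capitalize }}</h1>
{%- else -%}
  <h1>Search our store</h1>
{%- endif -%}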

So this approach is not going to be suitable for 99.9% of Shopify sites out there. It requires quite a bit of upfront planning, strategy, SEO capability and technical implementation. If you simply allow /search pages to be crawlable without any strategy, you are not likely to see any benefit.
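
If you have worked through all of the above and still want /search crawled, the robots.txt.liquid change itself is small. A hedged sketch, modifying the inner rules loop of the default template shown earlier (this assumes each rule object exposes directive and value properties):

  {%- for rule in group.rules -%}
    {%- comment -%} Skip the default rule that blocks internal search;
        output every other default rule unchanged. {%- endcomment -%}
    {%- unless rule.directive == 'Disallow' and rule.value == '/search' -%}
      {{ rule }}
    {%- endunless -%}
  {%- endfor -%}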

2. Allow multi tag pages

You could make multi-tag page URLs (the ones that contain a ‘+’ symbol) crawlable and indexable. These pages are disallowed in robots for a reason: they are typically thin, duplicate content. But you may have a reason, and a way, to make the content not thin by creatively editing the appropriate collections template(s), and then want to have the pages crawled and indexed.
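
Mechanically, this is the same unless-filter pattern as the /search sketch above, just targeting the tag-combination rules instead. On most stores these look like Disallow: /collections/*+* plus URL-encoded %2B variants, but check your own /robots.txt output before relying on that:

    {%- comment -%} Illustrative only: skip the default rules that block
        multi-tag collection URLs (consider the %2B-encoded variants too). {%- endcomment -%}
    {%- unless rule.directive == 'Disallow' and rule.value contains '+' -%}
      {{ rule }}
    {%- endunless -%}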

3. Disallow duplicate collections page URLs being crawled

I saw this mentioned elsewhere, so I’m covering it, but only to try and talk you out of it. I don’t see robots.txt as the correct fix to deal with this problem.

What I’m referring to, of course, is how Shopify can have multiple URLs for the exact same product. These path variations inevitably get created when a product appears in multiple collections.

For example, say we have a product with the handle apple:

/products/apple
/collections/fruit/products/apple
/collections/food/products/apple

Now, you could add a line into your robots.txt file to prevent ALL those path variation URLs matching /collections/*/products/* from being crawled, but you probably shouldn’t.
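
For completeness, that addition would look something like this (shown only so you know what I’m advising against):

User-agent: *
Disallow: /collections/*/products/*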

The reason I say this is because those pages will still persist in the Google index if internal links keep pointing at them. My preferred “fix” here is to instead remove the internal links to the duplicate collection-scoped product URLs by modifying the collection template (collection.liquid, or the product-grid snippet it includes) in the right way.
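
As a hedged sketch of what that change looks like in practice (the file and snippet names vary by theme), the difference comes down to whether the product link is scoped to the collection:

{%- comment -%} Outputs /collections/fruit/products/apple, the
    collection-scoped duplicate URL {%- endcomment -%}
<a href="{{ product.url | within: collection }}">{{ product.title }}</a>

{%- comment -%} Outputs the canonical /products/apple URL {%- endcomment -%}
<a href="{{ product.url }}">{{ product.title }}</a>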

In summary

Shopify has a pretty well thought-out robots.txt config out of the box. If you have a good reason to change it, go ahead. But some of the robots.txt changes I see discussed for Shopify raise alarm bells. If it’s not broken, don’t fix it.

If you have any corrections, suggestions, or other use-cases feel free to leave a comment or fire me an email, hi [at] this domain.