How To Manage Crawler Robots As A Bissetii Hugo Theme User

Like all SEO-enabled sites, your site should have a default robots management policy, both on each page and via a single "robots.txt" file for crawlers to read. Starting from version v1.12.0, Bissetii supports both the page-level meta robot tag and the single "robots.txt".

Customizing robots.txt

Bissetii renders robots.txt differently depending on its version.

Version v1.13.0 and Above

Now that Hugo has fixed its robots.txt placement in https://github.com/gohugoio/hugo/issues/5160 (tested with Hugo version v0.78.2), Bissetii can safely revert to using the Hugo renderer to create robots.txt as documented in Hugo.

To customize robots.txt, create a “User-Agent” data file inside your data/bissetii/robots data directory. The path, depending on your configuration, is as follows:

filepath pattern:    data/bissetii/robots/<User-Agent>.toml
repo-docs:           docs/.data/bissetii/robots/<User-Agent>.toml

The filename serves as the value for the User-agent field. The only exception is all, which is renamed to * (User-agent: *).

You can create as many <User-Agent>.toml files as you want (e.g. all.toml, GoogleBot.toml, etc.). They will all be processed into a single robots.txt file, as illustrated below.
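For example, a data directory with the following hypothetical files would produce one User-agent group per file:

data/bissetii/robots/all.toml        ->  User-agent: *
data/bissetii/robots/GoogleBot.toml  ->  User-agent: GoogleBot
data/bissetii/robots/BingBot.toml    ->  User-agent: BingBot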

By default, Bissetii supplies all.toml to define the sitemap location, as required by most search engine optimization guidelines.
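A minimal sketch of what that default all.toml contains, assuming it only declares the sitemap (check the copy shipped with your Bissetii version for the actual content):

# data/bissetii/robots/all.toml
# The filename "all" is rendered as "User-agent: *".
Sitemap = "{{ .BaseURL }}/sitemap.xml"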

Defining Rules for a User-Agent

You can define the rules for a particular crawler robot using the TOML format. For relative URL construction, Bissetii supplies the {{ .BaseURL }} placeholder, which is replaced with the actual .Site.BaseURL.

# Relative to .Site.BaseURL (placeholder {{ .BaseURL }}) for the "Sitemap" field.
Sitemap = "{{ .BaseURL }}/sitemap.xml"


# Multi-values "Allow" field. The value will be placed in as it is.
# Placeholder can be used.
Allow = [
	"/",
	"/en/",
]


# Multi-values "Disallow" field. The value will be placed in as it is.
# Placeholder can be used.
Disallow = [
	"/en-us/",
	"/zh-cn/",
]


# "Crawl-delay" field.
Crawl-delay = 5

As an example, if the filename is GoogleBot.toml, the above will be converted into:

User-agent: GoogleBot
Allow: /
Allow: /en/
Crawl-delay: 5
Disallow: /en-us/
Disallow: /zh-cn/
Sitemap: https://bissetii.zoralab.com/en/sitemap.xml

Version v1.12.0 to v1.12.5

Bissetii serves robots.txt from static/robots.txt, at the root of the static directory. By default, Bissetii supplies the following content:

User-agent: *

To override Bissetii’s default file, create your own robots.txt at the same path.
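Since these versions serve the file statically, the {{ .BaseURL }} placeholder is not available and URLs must be written in full. A sketch of an overriding static/robots.txt, using an illustrative path and sitemap URL:

User-agent: *
Disallow: /drafts/
Sitemap: https://example.com/sitemap.xml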

The enableRobotsTXT option in config/_default/config.toml is disabled due to Hugo’s multi-language bug. Hence, the robots.txt guide in Hugo’s main documentation does not apply to these Bissetii versions.
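For reference, the setting in config/_default/config.toml looks roughly like this (a sketch; the surrounding configuration is omitted):

# config/_default/config.toml
# Kept disabled in these Bissetii versions due to Hugo's multi-language bug.
enableRobotsTXT = false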

Meta Robot Tag

Bissetii also supports page-specific meta tags for robot management. To add a robot rules tag, add each robot’s rules into the [robots] front-matter TOML table. Example:

+++
...
[robots]
[robots.googleBot]
name = "googleBot"
content = "noindex, nofollow"

[robots.twitterBot]
name = "twitterBot"
content = "noindex"

...
+++

This will render the HTML as:

<meta name="googleBot" content="noindex,nofollow" />
<meta name="twitterBot" content="noindex" />

You can create as many robot meta tags as you need for a specific page.

Epilogue

That’s all for managing crawler robots on your website as a Bissetii Hugo Theme user. If you have any questions, please feel free to raise them in our Issues Section.