Manage Crawler Robots with Bissetii in Hugo
Bissetii strives to be SEO-compatible and friendly to crawler robots by default. This is where Bissetii stands out from other Hugo themes: it supplies a full interface for handling crawlers easily.
Customizing robots.txt
Bissetii supplies a method to customize Hugo's robots.txt file. Depending on the Bissetii version you use, the customization method differs.
The output of robots.txt is always at the root of the website. For example, for this site, it is located at:
https://bissetii.zoralab.com/robots.txt
Version v1.13.0 and Above
To customize robots.txt, you can create your robot TOML data file and place it inside the following directory:
file path: data/bissetii/robots/
repo-docs: docs/.data/bissetii/robots/
The filename (e.g. google from google.toml) is used as the User-Agent field. The only exception is all, which is renamed to *, resulting in User-agent: *.
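As an illustration, assuming two hypothetical data files named google.toml and all.toml, the mapping would be:
data/bissetii/robots/google.toml renders as User-agent: google
data/bissetii/robots/all.toml renders as User-agent: *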
Data File Content
The robot TOML data file content is shown as follows:
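The following is a minimal sketch of such a data file, using the fields described below; the sitemap path, URL lists, and delay value are purely illustrative:
Sitemap = "{{ .BaseURL }}/sitemap.xml"
Allow = [ "/" ]
Disallow = [ "/private/" ]
Crawl-delay = 10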
Sitemap
- COMPULSORY - The URL location of your root sitemap.
- The use of {{ .BaseURL }} is available for multi-facing websites, where Bissetii will replace it with your base URL.

Allow
- COMPULSORY - The allowed URL array list.

Disallow
- OPTIONAL - The disallowed URL array list.
- If the list is empty, avoid declaring Disallow to keep the rendering simple.

Crawl-delay
- OPTIONAL - Specifies the delay timing for the specific crawler.
Say the above data file is named GoogleBot.toml; it will be rendered as:
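A hedged sketch of the rendered output, assuming the illustrative values above and a base URL of https://example.com (the actual field ordering may differ):
User-agent: GoogleBot
Allow: /
Disallow: /private/
Crawl-delay: 10
Sitemap: https://example.com/sitemap.xml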
Version v1.12.5 and Below
Bissetii does not have any processing solution due to Hugo Bug #5160. The only way is to supply your raw robots.txt via your static/ directory.
To override Bissetii's default file, you can create the same robots.txt in the same path.
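A minimal sketch of such a raw static/robots.txt, with purely illustrative rules and sitemap URL:
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml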
The enableRobotsTXT setting in config/_default/config.toml is disabled due to Hugo's multi-language bug. Hence, the robots.txt guide in Hugo's main documentation does not apply to these Bissetii versions.
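For reference, this corresponds to the following line in config/_default/config.toml, sketched here using Hugo's standard option name:
enableRobotsTXT = false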
Page-Specific Crawler Instructions
Bissetii also supports page specific meta tag for ad-hoc robot management. To
add a robot rules tag, you need to add each robot’s rule into the [robot]
table inside the page’s Hugo front matter. Example:
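A hedged reconstruction of such a front matter, based on the field descriptions below and the rendered output shown at the end of this section (the TOML +++ delimiters are assumed):
+++
[robots]

[robots.googleBot]
name = "googleBot"
content = "noindex,nofollow"

[robots.twitterBot]
name = "twitterBot"
content = "noindex"
+++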
[robots]
- COMPULSORY - Denotes the following fields belong to the robots table.

[robots.NAME]
- COMPULSORY - Denotes the following fields belong to the robots.NAME table.
- Provide a TOML-compatible NAME. Otherwise, keep it to the robot name itself.

name
- COMPULSORY - Name of the robot. Will be used as the name= attribute inside the <meta> tag.

content
- COMPULSORY - Instructions for the robot. Will be used as the content= attribute inside the <meta> tag.
The above will be rendered as:
<meta name="googleBot" content="noindex,nofollow" />
<meta name="twitterBot" content="noindex" />
Wrapping Up
That is all for managing crawler robots with Bissetii in Hugo. If you have any questions to ask us directly, please feel free to raise them at our GitLab Issues Section. We will be happy to assist you.