Creating Proper robots.txt and sitemap.xml for WordPress

1. Creating Proper robots.txt

The robots.txt file plays a crucial role in search engine optimization by telling search engine crawlers which parts of your site they may crawl and which they should skip. Here’s how to create a proper robots.txt for WordPress:

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /*/trackback/
Disallow: /*/feed/
Disallow: /*?*
Disallow: /*.inc$
Disallow: /*.php$
Disallow: /*.gz$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.txt$
Disallow: /*.html$
Disallow: /*.pdf$
Allow: /wp-admin/admin-ajax.php

User-agent: Yandex
Disallow: /cgi-bin
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /*/trackback/
Disallow: /*/feed/
Disallow: /*?*
Disallow: /*.inc$
Disallow: /*.php$
Disallow: /*.gz$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.txt$
Disallow: /*.html$
Disallow: /*.pdf$
Allow: /wp-admin/admin-ajax.php

Host: www.example.com
Sitemap: https://www.example.com/sitemap.xml

Explanation:

  • User-agent: * – General rules for all search engines.
  • User-agent: Yandex – Specific rules for the Yandex search engine.
  • Disallow – Blocks crawling of the listed directories and files (note that this does not guarantee de-indexing; use a noindex meta tag for pages that must stay out of search results). Avoid blocking CSS and JavaScript files, as Google needs them to render pages correctly.
  • Host – A Yandex-specific directive naming the site’s preferred domain; it is now deprecated and ignored by other search engines.
  • Sitemap – Specifies the location of the sitemap.xml file.
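Before deploying the file, you can sanity-check individual paths against your rules. The sketch below uses Python’s standard urllib.robotparser; note that it understands simple prefix rules such as /wp-admin/ but not the * and $ wildcard patterns shown above, and the domain is a placeholder:

```python
from urllib.robotparser import RobotFileParser

# A prefix-rule subset of the robots.txt above (urllib.robotparser
# does not support "*" and "$" wildcard patterns).
rules = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /wp-content/plugins/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Paths inside blocked directories are refused; normal posts are allowed.
blocked = parser.can_fetch("*", "https://www.example.com/wp-admin/options.php")
allowed = parser.can_fetch("*", "https://www.example.com/my-first-post/")
print(blocked, allowed)  # False True
```

The same check works against a live site by calling parser.set_url("https://your-domain/robots.txt") followed by parser.read().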

2. Creating a Sitemap (sitemap.xml)

The sitemap.xml file helps search engines discover and index the important pages of your site. WordPress 5.5 and later generates a basic sitemap natively at /wp-sitemap.xml, but a plugin gives you more control and can automatically keep the file up to date.

Example sitemap.xml generated by the Google XML Sitemaps plugin:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>https://www.example.com/</loc>
        <lastmod>2024-05-01</lastmod>
        <changefreq>weekly</changefreq>
        <priority>1.0</priority>
    </url>
    <url>
        <loc>https://www.example.com/about/</loc>
        <lastmod>2024-05-01</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
    </url>
    <url>
        <loc>https://www.example.com/contact/</loc>
        <lastmod>2024-05-01</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
    </url>
</urlset>

Explanation:

  • <urlset> – The root element that wraps the list of URLs.
  • <url> – A container for a single page entry; each listed page gets its own <url> element.
  • <loc> – The full URL of the page.
  • <lastmod> – The date the page was last modified.
  • <changefreq> – The expected frequency of change to the page’s content (always, hourly, daily, weekly, monthly, yearly, never).
  • <priority> – The priority of the page relative to other pages on your site (from 0.0 to 1.0).

Recommendations

  • Keep sitemap.xml up to date: once the Google XML Sitemaps plugin is configured, it regenerates the file automatically whenever you publish new pages or posts.
  • Verify robots.txt: test the file with tools such as the robots.txt report in Google Search Console or Yandex.Webmaster.
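Alongside those manual checks, you can script a quick cross-check that no URL listed in your sitemap is blocked by your own robots.txt. A sketch using only the Python standard library; both documents are inlined placeholders, and in practice you would fetch them from your own domain:

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Placeholder documents; in practice, download them from your site.
robots_txt = """\
User-agent: *
Disallow: /wp-admin/
"""

sitemap_xml = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>https://www.example.com/</loc></url>
    <url><loc>https://www.example.com/wp-admin/leaked-page/</loc></url>
</urlset>
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Every URL listed in the sitemap should be crawlable.
blocked = [
    loc.text
    for loc in ET.fromstring(sitemap_xml).findall("sm:url/sm:loc", NS)
    if not parser.can_fetch("*", loc.text)
]
print(blocked)  # ['https://www.example.com/wp-admin/leaked-page/']
```

A non-empty result means the sitemap advertises pages that crawlers are told not to visit, which is worth fixing on either side.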

These steps will help optimize your site for search engines, improving its visibility and ranking in search results.