The Turnstyle Web Reader dynamically generates sitemaps based on a publisher’s content. Two types of sitemap are generated:
- Publisher Sitemap Index – Located at the root domain, the sitemap index points to the sitemap of each publication (also known as a collection) owned by a publisher.
- Publication Sitemap – Located at the collection level, this sitemap contains an entry for each issue published in that collection.
For demonstration purposes, let’s use Acme Publishing, Inc. In this example, they publish two titles: Coyote Living and Anvil Magazine.
Root Domain: https://www.acmepublishing-digital.com
Collection 1: Coyote Living (https://www.acmepublishing-digital.com/coyoteliving)
Collection 2: Anvil Magazine (https://www.acmepublishing-digital.com/anvilmagazine)
Publisher sitemap index location: https://www.acmepublishing-digital.com/sitemap.xml
Publisher Sitemap Index Example:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex version="2.0" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.acmepublishing-digital.com/coyoteliving/sitemap.xml</loc>
    <lastmod>2019-12-10T18:17:26.000Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.acmepublishing-digital.com/anvilmagazine/sitemap.xml</loc>
    <lastmod>2019-12-10T18:17:26.000Z</lastmod>
  </sitemap>
</sitemapindex>
For each title, a sitemap is dynamically generated with a link to every issue in that collection. For example, Coyote Living has 5 issues published:
Publication Sitemap Example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset version="2.0" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.acmepublishing-digital.com/coyoteliving/dec_jan_2020</loc>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>https://www.acmepublishing-digital.com/coyoteliving/july_2019</loc>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>https://www.acmepublishing-digital.com/coyoteliving/june_2019</loc>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>https://www.acmepublishing-digital.com/coyoteliving/may_2019</loc>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>https://www.acmepublishing-digital.com/coyoteliving/april_2019</loc>
    <changefreq>monthly</changefreq>
  </url>
</urlset>
These sitemaps help search engine crawlers discover each issue and follow any internal links. We recommend following Google’s instructions for best practice.
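To illustrate how the two files relate, here is a minimal Python sketch of the path a crawler could take: read the publisher sitemap index, then read a publication sitemap to list its issue URLs. It is not how any particular search engine works. The XML is embedded as abbreviated copies of the examples above so the sketch runs without network access, and the function names are illustrative assumptions.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace declared by both example documents above.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Abbreviated copies of the examples above (a real crawler would fetch these).
INDEX_XML = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.acmepublishing-digital.com/coyoteliving/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.acmepublishing-digital.com/anvilmagazine/sitemap.xml</loc>
  </sitemap>
</sitemapindex>"""

COYOTE_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.acmepublishing-digital.com/coyoteliving/dec_jan_2020</loc>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>https://www.acmepublishing-digital.com/coyoteliving/july_2019</loc>
    <changefreq>monthly</changefreq>
  </url>
</urlset>"""


def sitemap_locations(index_xml: str) -> list[str]:
    """Return the publication sitemap URLs listed in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [(loc.text or "").strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


def issue_urls(sitemap_xml: str) -> list[str]:
    """Return the issue URLs listed in a publication sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [(loc.text or "").strip() for loc in root.findall("sm:url/sm:loc", NS)]


if __name__ == "__main__":
    for location in sitemap_locations(INDEX_XML):
        print("publication sitemap:", location)
    for url in issue_urls(COYOTE_XML):
        print("issue:", url)
```

The same two functions could be pointed at the live files once they are downloaded, which is a quick way to confirm that every published issue appears in its collection’s sitemap.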
Regarding duplicate content
We understand that omnichannel publishing is the new normal and that publishers will often want to distribute an article from their digital edition to other platforms, including their own website.
Google’s definition of duplicate content is as follows:
“Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.”
That last part is important. If you scrape, copy, and spin existing content (Google calls this copied content) with the intention of deceiving the search engine into giving you a higher ranking, you will be on dangerous ground.
Google says this type of malicious intent might trigger an action:
“Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.”
In our experience, Google crawlers have historically viewed articles in the web reader in the context of the issue they belong to. The article is unique enough so as not to be seen as malicious, even if it is duplicated on other websites.
Although duplicate content does not technically incur a penalty, it can still affect search engine rankings, because search results typically display only one version of the article. This can dilute the visibility of both versions and weaken inbound link equity, since other sites may split their links between the two copies instead of pointing to a single URL.
To avoid duplicate-content issues, best practice is to use canonical URLs. A canonical URL points crawlers to the original article so they know which version should receive SEO credit. For our publishers, the article usually originates in the digital edition, so they set the web reader article as the canonical URL on the reproduced article on their website.
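As a sketch of what this looks like in practice, the reproduced article on the publisher’s website carries a `<link rel="canonical">` element in its `<head>` whose `href` is the web reader article. The Python snippet below uses only the standard library to check that a page declares the expected canonical URL. The article URL and the HTML are hypothetical placeholders, not real Turnstyle paths.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Hypothetical web reader article URL, used for illustration only.
WEB_READER_ARTICLE_URL = (
    "https://www.acmepublishing-digital.com/coyoteliving/july_2019/example-article"
)

# Hypothetical HTML of the reproduced article on the publisher's own website.
REPRODUCED_ARTICLE_HTML = f"""<!DOCTYPE html>
<html>
<head>
  <title>Example article</title>
  <link rel="canonical" href="{WEB_READER_ARTICLE_URL}">
</head>
<body><p>Article text reproduced from the digital edition.</p></body>
</html>"""


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: str | None = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, so split before matching.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> str | None:
    """Return the canonical URL declared in the page, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    declared = find_canonical(REPRODUCED_ARTICLE_HTML)
    print("canonical matches web reader article:", declared == WEB_READER_ARTICLE_URL)
```

A check like this can be run against the publisher’s own pages after publishing to confirm that the canonical element is present and points at the digital edition.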
In some cases, publishers want the reproduced article to be considered the original. However, our web reader is currently engineered in a way that prevents article-specific metadata from being added manually. Alternative methods, such as 301 redirects, nofollow directives, and custom robots.txt entries, are also not available at this time.