Maneuver the crawler to your will

I was reviewing one of our client websites and found a few issues that are considered SEO pitfalls. I would like to share them. I assume most of us already know this, but with the myriad of things we need to keep tabs on, it is worth reminding ourselves just how important the SEO aspect of the website you build actually is. A sitemap is almost a given, and a much needed one, to submit to the search engine giants in one way or another.

But it is essential to remember what not to push to the sitemap, as the crawler is busy doing its thing and indexing as much of your fresh content as it can. A few things to always remember:

  1. Do not add any URLs that either have no content or could lead to a 404. We should not ask the crawler to spend even a millisecond on pages that are not important, and this helps you avoid setting up your own pitfall of a red mark on Google, for instance. You can do this by providing a way to exclude specific content or templates when the sitemap is generated. Easy enough, but often missed. 🙂
  2. If you use folders or intermediate content solely for organizational purposes, make sure proper redirects are in place, so that if someone is being smart or has received an improper URL, your setup still lands the end user where you want them when they pull up your site. Magic, yeah. lol. For the other side of your users, the content authors, make sure insert options and templates are filled in with proper presentation details, so all of that is taken care of in the background and content authoring stays simple and easy peasy.
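
To make the first point concrete, here is a minimal sketch of the exclusion idea in Python. It is not how any of the Sitecore modules work internally. The `Page` fields, the `EXCLUDED_TEMPLATES` names and the example URLs are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical template names that exist only to organize content.
EXCLUDED_TEMPLATES = {"Folder", "Redirect"}


@dataclass
class Page:
    """Hypothetical stand-in for a content item."""
    url: str
    template: str
    has_content: bool
    exclude_from_sitemap: bool = False  # per-item opt-out flag (assumed)


def sitemap_candidates(pages):
    """Keep only pages that have content and are not excluded."""
    return [
        p for p in pages
        if p.has_content
        and not p.exclude_from_sitemap
        and p.template not in EXCLUDED_TEMPLATES
    ]


def build_sitemap(pages):
    """Return sitemap XML containing only the pages worth crawling."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in sitemap_candidates(pages):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")
```

The point of the design is that exclusion happens in one place, by template and by a per-item flag, so content authors do not have to remember to hand-edit the sitemap.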

You do not have such fancy sitemap generation on your Sitecore instance? Look at the following options, which are my favorites. Setting these up should be really simple, but as always the tricky part is maintaining the solution and training whoever uses it on what they can do to keep crawlers in check.

References / Suggestions 

https://marketplace.sitecore.net/Modules/S/SitemapXml.aspx

https://marketplace.sitecore.net/en/Modules/XML_Sitemap_Generator.aspx

https://github.com/JimmieOverby/SitecoreSitemapXML (my personal favorite, with more customization)

There are tons of other modules on the Marketplace; explore more:

https://marketplace.sitecore.net/SearchResults#query=sitemap

Sitecore URL Duplication

Sitecore is amazing when it comes to how it can resolve a URL seamlessly to an item and how we can tune that based on requirements and project demands.
Most of the blogs out there cover the basics. Good reads:
https://jammykam.wordpress.com/2015/07/13/seo-friendly-urls-in-sitecore-prevention-is-better-than-cure/ (my fav)
http://reinoudvandalen.nl/blog/using-replacement-characters-in-sitecore-the-right-way/
https://www.cmsbestpractices.com/add-seo-value-by-replacing-spaces-with-dashes-in-sitecore/

In these you will see how to configure LinkManager so that you always get a good SEO-friendly URL using <encodeNameReplacements>, and how to avoid the side effects of this by disallowing hyphens in item names via “InvalidItemNameChars”.
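
Why would a hyphen in an item name be a problem in the first place? A toy model in Python shows the side effect. This is not Sitecore code. It only assumes the replacement maps a space to a hyphen on the way out and a hyphen back to a space on the way in, and the function names are made up for illustration.

```python
def encode_name(item_name: str) -> str:
    """Outgoing link: spaces in the item name become hyphens."""
    return item_name.replace(" ", "-")


def decode_name(url_segment: str) -> str:
    """Incoming request: hyphens are mapped back to spaces."""
    return url_segment.replace("-", " ")


def round_trips(item_name: str) -> bool:
    """True if the URL built for an item resolves back to the same name."""
    return decode_name(encode_name(item_name)) == item_name
```

In this model `round_trips("test page")` is true, while `round_trips("test-page")` is false: the request for `test-page` is decoded to `test page`, which is not the name of the item. Disallowing the hyphen in item names removes the ambiguity.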

Now, there are other problems we have to deal with. Google does not like duplicate content: when two URLs yield the same result, in our case render the same item content, that counts as duplication.

But although spaces are replaced by hyphens in the links Sitecore generates, nothing stops a user, or some referral link somewhere, from requesting the URL with spaces.
For example, www.domain.com/test-page and www.domain.com/test%20page would yield the same page/content.

Not Good…Google would not like it!

So the good SEO options are to either return a 404 for the space version (%20) above or do a 301 redirect to the hyphenated URL. Canonical links might help a little if you can't do either of the above.

You could also use IIS rewrite rules to replace spaces with ‘-‘ instead of %20.
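
Whichever layer ends up doing the work (application code or a rewrite rule), the decision is the same. Here is a rough Python sketch of it, assuming the only normalization needed is space to hyphen. `redirect_for` and `canonical_link_tag` are hypothetical helper names, not part of Sitecore or IIS.

```python
from html import escape
from urllib.parse import quote, unquote


def redirect_for(path: str):
    """Return (301, target) when the requested path contains spaces, else None."""
    decoded = unquote(path)  # "/test%20page" -> "/test page"
    if " " not in decoded:
        return None  # already the hyphenated version, serve as usual
    return 301, quote(decoded.replace(" ", "-"))


def canonical_link_tag(base_url: str, path: str) -> str:
    """Fallback: point both URL variants at the hyphenated version."""
    target = redirect_for(path)
    canonical = target[1] if target else path
    return '<link rel="canonical" href="{}" />'.format(
        escape(base_url + canonical, quote=True)
    )
```

With this, a request for `/test%20page` gets a 301 to `/test-page`, while `/test-page` itself is served as usual. If a redirect is not possible, both variants can at least emit the same canonical tag.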

Wait for more…