When creating an XML sitemap, you may want to edit it before you upload it to your server. There are several reasons to edit your XML sitemap file manually, including proper indexation, limits on how many URLs get indexed (although search engines have gotten better at indexing larger XML files), and increasing the percentage of pages that get indexed.
There are plenty of XML sitemap generation tools, such as AuditMyPC, XML-Sitemaps, and the Google Sitemap Generator, that will help you create your sitemap. However, many of these tools will not take the following into consideration.
- URL Duplicates: Make sure that you don’t include multiple versions of a URL. For example, if you have both domain.com/services and domain.com/services/, remove one of them so you don’t create canonicalization issues.
- Robots.txt: Many sitemap generators will read your robots.txt file and leave out the directories or URLs you have already disallowed there, but not all of them do. Check that those directories are not being included in your sitemap.
- Error Pages: If you notice 404 error pages in your sitemap, either fix those pages so they can stay in the sitemap or remove them from it.
- Images: There is no need to include a list of all of your images in your XML sitemap. Google will index the images when it indexes the page, so I wouldn’t try to have Google focus on your images.
- Invalid Links: Some sitemap generators have an option to ignore invalid relative links. Turning it on helps you submit only valid links to the search engines.
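The duplicate and robots.txt checks above can be scripted once you have the generator's URL list. Here is a minimal Python sketch, not a definitive tool. The function names `normalize` and `clean_urls` are made up for this example, and it assumes you want the version of each URL without the trailing slash (flip that if your site's canonical URLs end in a slash). It does not check for 404 pages, since that requires requesting each URL.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def normalize(url):
    """Return one canonical form of a URL (assumption: no trailing slash)."""
    parts = urlsplit(url)
    # Keep "/" for the homepage, strip the trailing slash everywhere else.
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment; search engines do not treat it as a separate page.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def clean_urls(urls, robots_lines, user_agent="*"):
    """Remove duplicate URLs and URLs that robots.txt disallows."""
    rules = RobotFileParser()
    rules.parse(robots_lines)  # robots_lines: the robots.txt file as a list of lines

    seen = set()
    kept = []
    for url in urls:
        key = normalize(url)
        if key in seen:
            continue  # duplicate version of a URL we already kept
        if not rules.can_fetch(user_agent, key):
            continue  # disallowed in robots.txt, so keep it out of the sitemap
        seen.add(key)
        kept.append(key)
    return kept


if __name__ == "__main__":
    robots = ["User-agent: *", "Disallow: /private/"]
    urls = [
        "https://domain.com/services",
        "https://domain.com/services/",
        "https://domain.com/private/page",
    ]
    for url in clean_urls(urls, robots):
        print(url)
```

Note that Python's built-in robots.txt parser matches rules by simple path prefix, so it may not treat wildcard rules the same way a search engine does. Treat the output as a first pass and review it by hand.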
Quick Tip: Make sure you don’t block your CSS files in your robots.txt file. Google will want to be able to read your CSS files to ensure that you are not doing anything black hat with how you display your content.
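If you want to double-check this, a short Python sketch like the one below can tell you which stylesheet URLs your robots.txt rules would block. The function name `blocked_assets` and the example URLs are hypothetical, and the parser used here matches rules by simple path prefix, so it is only an approximation of how Google reads the file.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_lines, asset_urls, user_agent="Googlebot"):
    """Return the asset URLs that the given robots.txt lines disallow."""
    rules = RobotFileParser()
    rules.parse(robots_lines)  # robots_lines: the robots.txt file as a list of lines
    return [url for url in asset_urls if not rules.can_fetch(user_agent, url)]


if __name__ == "__main__":
    robots = ["User-agent: *", "Disallow: /css/"]
    assets = [
        "https://domain.com/css/style.css",
        "https://domain.com/js/app.js",
    ]
    for url in blocked_assets(robots, assets):
        print("Blocked:", url)
```

Any URL it reports is worth unblocking in robots.txt before you submit the sitemap.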