
Troubleshooting Search Engine Problems

To help combat the perennial problems that webmasters of every ilk face with search engines, especially Google, here is a 14-point website tune-up plan that may help:


  1. Get the server response headers for the homepage - the response code should be 200; for a missing page it should be 404.
    Resolve any canonical domain issues before they become a headache (www vs non-www - make a decision) by 301-redirecting the one you drop to the one you keep. A tutorial for Apache servers follows below.
  2. Run a spider simulator for the homepage and others - if not enough text is found, then there's no content to index.
  3. Run Xenu Link Sleuth starting at the homepage and check the broken links and the number of true links found. If the crawl never completes, that in itself indicates a problem. Compare the list of good links found with your sitemap.
  4. Run a web page spam detector - more than 100 outbound links on a page can signal a link farm, and linking to too many unrelated sites can be a liability.
  5. Validate your homepage and other pages - a proper doctype and charset help, and broken code, especially at the block level, can prevent bots from crawling.
  6. Pages not reachable through a normal crawl (orphan pages, or pages reachable only through JavaScript, Ajax or Flash navigation) will not be crawled or indexed properly, if at all, even if they are listed in the sitemap - make sure the sitemap is correctly laid out.
  7. Affiliate links present on pages - alarm bells will go off. Make sure you have plenty of original, unique content to supplement the products available through the affiliate links, and use rel="nofollow" on all such links. Forget cloaking them with any sort of redirection; that is sneaky. Remember that affiliate links are on a par with paid links.
  8. Page titles, text headings, proper use and distribution of keywords and key phrases (don't keyword stuff, don't spam), anchor text, alt text for images, increased use of CSS - these are all part of the internal optimization of pages. Optimize images and other media files so pages load fast.
  9. Watch your page layout and page size carefully. Putting ads (whether AdSense or any others) in the prime viewing area of a page (e.g. above the fold, above the top menu, in the left navigation area or smack bang where one expects actual content) signals a low-quality site and a poor user experience. Having ads disguised as regular content and website links is a bad signal. Penalty material.
  10. Get, or better, attract relevant, quality incoming links - but not from link farms. Don't buy rank-passing links. Forget about blog and forum signatures, unless your post is truly appropriate and relevant to that forum or blog. Don't spam. Be careful with any SEO specialists you hire: make sure you know exactly what they do and that they don't break any guidelines. The statistics are grim.
  11. Find out the indexing situation by searching Google.com for site:example.com and site:www.example.com, and investigate omitted (previously called supplemental) pages. When Google indicates similar pages, it is usually because they have the same title or description tag as pages already listed, and that should be fixed. Or they have very thin content and don't deserve to be ranked. Or they are among those blocked in robots.txt. Some may benefit from a robots "noindex" meta tag instead of being blocked in robots.txt. It depends.
  12. Have dead URLs that are currently indexed removed from the Google index by requesting removal, unless you have appropriate equivalent new URLs to which you can 301 redirect them.
  13. The file robots.txt is your friend. Tweak it well, streamline it and make sure you know why you are blocking what you are blocking.
  14. Make use of Google Webmaster Tools.
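
Point 1 of the plan can be checked with a short script rather than an online header checker. What follows is only a minimal Python sketch under stated assumptions: the names `fetch_status` and `audit_statuses` are made up for illustration, www.example.com stands in for your own domain, and the www host is assumed to be the one you decided to keep.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so that a 301 stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx, and for 3xx when the
        # redirect is not followed; the status code is on the exception.
        return err.code


def audit_statuses(home_status, missing_status, alt_host_status):
    """Compare observed status codes with what point 1 expects.

    Returns a list of problems; an empty list means all three checks passed.
    """
    problems = []
    if home_status != 200:
        problems.append(f"homepage returned {home_status}, expected 200")
    if missing_status != 404:
        problems.append(f"missing page returned {missing_status}, expected 404")
    if alt_host_status != 301:
        problems.append(f"non-canonical host returned {alt_host_status}, expected 301")
    return problems
```

A typical call would pass `fetch_status("http://www.example.com/")`, `fetch_status` of some URL that should not exist on the site, and `fetch_status("http://example.com/")` to `audit_statuses`. A missing page that answers 200 (a "soft 404") shows up as a problem in the returned list.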

Finally, remember CRAWLABILITY: it is the number 1 technical requirement for a site to even start to be indexed. View Matt Cutts's video, where he explains a lot of the concepts involved in SEO. He mentions crawlability at around 1 min and again at around 3 min into the video.
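
To get a rough feel for crawlability, you can look at a page the way a bot that does not run scripts might: only the plain text and the ordinary `<a href>` links count. The sketch below uses Python's standard `html.parser` for that. It is an approximation and not how any real search engine works; `SpiderView` and `spider_report` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class SpiderView(HTMLParser):
    """Collect the text and <a href> links visible without running scripts."""

    SKIPPED = {"script", "style"}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.words = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            # Script-driven, in-page and mail links give a crawler nothing to follow.
            if href and not href.startswith(("javascript:", "#", "mailto:")):
                self.links.append(urljoin(self.page_url, href))

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore anything inside <script> or <style>.
        if not self._skip_depth:
            self.words.extend(data.split())


def spider_report(html, page_url):
    """Return word, link and outbound-link counts for one page of HTML."""
    view = SpiderView(page_url)
    view.feed(html)
    view.close()
    site_host = urlparse(page_url).netloc.lower()
    outbound = [u for u in view.links if urlparse(u).netloc.lower() != site_host]
    return {"words": len(view.words), "links": len(view.links), "outbound": len(outbound)}
```

A very low word count suggests there is little text to index (point 2), an outbound count above 100 is the link-farm signal from point 4, and links that exist only in script-driven menus will simply be missing from the report (point 6).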

301 Redirection on an Apache Server

NB: There are more comprehensive tutorials at faqhowto.info.

I will provide here the 3 kinds of 301 redirect most often needed, plus a combo. Others have done it, and maybe better, but I have pieced these together and tested them on my own server, so I know they work.

The following directives are to be added to an .htaccess file, which is then uploaded to the root folder (as a text or ASCII file transfer).

NB: If your site is meant to work with https, then replace http:// with https:// in all urls in the examples given below.

  1. Redirecting from an old domain to a new domain called www.newdomain.com, preserving the same website internal structure, same page names, so only the domain changes. This is to go into the .htaccess file on the OLD domain:

    RewriteEngine on
    RewriteRule (.*) http://www.newdomain.com/$1 [R=301,L]

  2. Redirecting non-www URLs to www URLs on the same domain www.example.com:

    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

    Or, to consolidate all parked domains, IP-based addresses and hosting-userid-based addresses into one main domain:

    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

  3. Redirecting oldpg.html to newpg.html on the same domain www.example.com (keep all on one line):

    RedirectMatch 301 ^/oldpg\.html$ http://www.example.com/newpg.html

  4. Several directives involving url rewriting can be combined. For instance this useful combo will redirect non-www to www urls and /index.html to / (website root or folder root):

    RewriteEngine on
    RewriteBase /

    ### re-direct index.html to root / ###
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html\ HTTP/
    RewriteRule ^(.*)index\.html$ /$1 [R=301,L]

    ### re-direct non-www to www ###
    RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

But a word of caution - the above may interfere with certain addressing schemes. In particular, if you are using FrontPage, you will need to make adjustments. Refer to this article for more information.
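
Before uploading the combo rules it can help to reason through what they should do for a given request. The following Python sketch mirrors the intent of the two rules (index.html first, then non-www to www) so that the expected redirect chain can be worked out offline. It is a simplified model and not a mod_rewrite emulator: `expected_redirect` and `redirect_chain` are hypothetical helper names, plain http is assumed, and the first rule is assumed to redirect on the host that was requested.

```python
import re
from urllib.parse import urlsplit

CANONICAL_HOST = "www.example.com"


def expected_redirect(host, path):
    """Return the URL a request should be 301-redirected to, or None.

    host is the requested host name; path starts with "/".
    """
    # First rule: /index.html and /folder/index.html go to / and /folder/.
    match = re.match(r"^(.*)index\.html$", path)
    if match:
        return f"http://{host}{match.group(1)}"
    # Second rule: any host other than the canonical one goes to www
    # (case-insensitive, like the [NC] flag).
    if not re.match(r"^www\.example\.com$", host, re.IGNORECASE):
        return f"http://{CANONICAL_HOST}{path}"
    return None


def redirect_chain(host, path, max_hops=5):
    """Follow expected_redirect until the request would be served as-is."""
    hops = []
    for _ in range(max_hops):
        target = expected_redirect(host, path)
        if target is None:
            break
        hops.append(target)
        parts = urlsplit(target)
        host, path = parts.netloc, parts.path or "/"
    return hops
```

In this model, `redirect_chain("example.com", "/index.html")` takes two hops, first to `http://example.com/` and then to `http://www.example.com/`, while a request for `/page.html` on www.example.com yields an empty list. Comparing such expectations with what the live server actually returns is a quick way to spot a rule that loops or never fires.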
