Only Validation + Navigation = Crawlability
Troubleshooting Search Engine Problems
To help combat the perennial problems webmasters of every ilk face with search engines, especially Google, here is a 13-point website tune-up plan that may help:
- Check the server response headers for the homepage: it should return status code 200, while a missing page should return 404. Resolve any canonical domain issue (www vs. non-www: make a decision) before it becomes a headache, by 301-redirecting the version you drop to the one you keep.
- Run a spider simulator on the homepage and other pages: if it finds too little text, there is nothing to index.
- Run Xenu Link Sleuth starting at the homepage and check the broken links and the number of valid links it finds. If the crawl never completes, that points to a problem. Compare the list of good links found with your sitemap.
- Run a web page spam detector: more than 100 outbound links on a page can signal a link farm, and linking to too many unrelated sites can be a liability.
- Validate your homepage and other pages: a proper doctype and charset help, and broken markup, especially at the block level, can prevent bots from crawling.
- Pages that cannot be reached through a normal crawl (without JavaScript or Flash) will not be crawled or indexed even if they are listed in the sitemap, so make sure the sitemap is laid out correctly.
- Affiliate links on your pages will set alarm bells ringing.
- Page titles, headings in the text, proper use and distribution of keywords and key phrases, anchor text, alt text for images and increased use of CSS are all part of on-page (internal) optimization.
- Get relevant, quality incoming links, but not from link farms.
- Check where you stand in Google by searching for site:example.com and site:www.example.com, and investigate supplemental pages. When Google indicates that similar pages were omitted, it is usually because they share the same title or description tag as pages already listed; fix that.
- Find out the full extent of indexing across the major datacenters; you can spot-check individual datacenters from the list provided.
- First have dead URLs that are still indexed removed from the Google index by requesting removal; then, if needed (e.g. if you get 404 error reports for them), 301-redirect them to equivalent new pages on the site.
- The robots.txt file is your friend: use it to keep bots out of the areas you do not want crawled.
Finally, remember: CRAWLABILITY is the number 1 technical requirement for a site even to begin being indexed. Watch Matt Cutts' video, in which he explains many of the concepts involved in SEO. He mentions CRAWLABILITY at around 1 minute and again at around 3 minutes into the video.
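The first point of the checklist above (status codes and the canonical domain) is easy to script. The sketch below uses only Python's standard http.client and deliberately does not follow redirects, so a 301 shows up as a 301. The helper name `head_status` is made up for illustration, and it assumes the server answers HEAD requests the same way it answers GET, which most servers do but not all.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url, timeout=10):
    """Send a HEAD request and return (status code, Location header or None).

    Redirects are NOT followed: a 301 is reported as 301 together with
    its target, not as the status of the page it points to.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)  # netloc may include ":port"
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Run it against http://www.example.com/, a page that does not exist, and http://example.com/: if www is the version you keep, you would hope to see 200, 404, and a 301 whose Location points at the www address.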
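For the spider-simulator point in the checklist above, a rough approximation is to strip the markup and count the words that a bot ignoring JavaScript and styles would see. This is a minimal sketch on top of Python's html.parser, not a replica of any search engine's parser; `visible_words` is a hypothetical name, and text split across inline tags may be counted as separate words.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text outside <script> and <style> (title text is included)."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a tag listed in SKIP
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_words(html):
    """Return the list of words a simple, non-JavaScript bot would find."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).split()
```

If `len(visible_words(page_html))` comes out at a handful of words for a page you consider content-rich, the content is probably locked inside scripts, Flash or images.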
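The 100-outbound-links rule of thumb from the checklist can be checked with a similar parser. This sketch assumes that example.com and www.example.com count as the same site and treats every other http(s) host as outbound; the threshold is the checklist's rule of thumb, not a published search-engine limit, and the helper names are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

LINK_FARM_THRESHOLD = 100  # rule of thumb from the checklist, not an official limit


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def _site(host):
    # Treat example.com and www.example.com as the same site.
    host = (host or "").lower()
    return host[4:] if host.startswith("www.") else host


def outbound_links(html, page_url):
    """Return absolute URLs of http(s) links pointing to another site."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    own = _site(urlsplit(page_url).hostname)
    out = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlsplit(absolute)
        if parts.scheme in ("http", "https") and _site(parts.hostname) != own:
            out.append(absolute)
    return out


def looks_like_link_farm(html, page_url):
    return len(outbound_links(html, page_url)) > LINK_FARM_THRESHOLD
```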
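The "similar pages" problem from the checklist usually comes down to repeated titles or description tags, so a quick check is to group pages by those values. This sketch assumes the HTML has already been downloaded into a {url: html} dict (how you fetch the pages is left open) and reports only titles or descriptions shared by more than one page; `duplicate_heads` is a hypothetical name.

```python
from collections import defaultdict
from html.parser import HTMLParser


class _HeadInfo(HTMLParser):
    """Pull the <title> text and the meta description out of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (a.get("name") or "").lower() == "description":
            self.description = (a.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def duplicate_heads(pages):
    """Map ("title" | "description", text) -> [urls] for values used more than once."""
    seen = defaultdict(list)
    for url, html in pages.items():
        info = _HeadInfo()
        info.feed(html)
        info.close()
        seen[("title", info.title.strip())].append(url)
        if info.description:
            seen[("description", info.description)].append(url)
    # Ignore empty titles; keep only values shared by two or more pages.
    return {key: urls for key, urls in seen.items() if key[1] and len(urls) > 1}
```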
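To confirm that robots.txt is not blocking pages you want indexed, Python's urllib.robotparser can evaluate the rules offline. Using Googlebot as the default user agent is an assumption, and this parser implements the classic prefix-matching rules, so it may not understand every extension a particular search engine supports.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs from `urls` that robots_txt forbids `agent` to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the text without any network access
    return [u for u in urls if not parser.can_fetch(agent, u)]
```

Feeding it the URLs from your sitemap is a quick way to spot a Disallow line that is broader than intended.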
301 Redirection on an Apache Server
Here are the 3 kinds of 301 redirect most often needed, plus a combo. Others have covered this, maybe better, but I pieced these together and tested them on my own server, so I know they work.
Add the following directives to an .htaccess file and upload it to the root folder (using a text/ASCII-mode file transfer).
- Redirecting from an old domain to a new domain, www.yournewdomain.com, while keeping the site's internal structure and page names, so that only the domain changes. This goes into the .htaccess file on the OLD domain:
Options +FollowSymLinks
RewriteEngine on
RewriteRule (.*) http://www.yournewdomain.com/$1 [R=301,L]
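After the move it is worth spot-checking a few old URLs. This Python sketch only computes where the rule above should send each one (same path, same query string, new host), so you can compare the result with the Location header the old domain actually returns. It ignores percent-encoding subtleties; `moved_domain_target` is a hypothetical name and oldsite.com in the test is a placeholder.

```python
from urllib.parse import urlsplit

# Placeholder: the new domain used in the rule above.
NEW_HOST = "www.yournewdomain.com"


def moved_domain_target(old_url):
    """Expected 301 target of old_url under 'RewriteRule (.*) http://www.yournewdomain.com/$1'."""
    parts = urlsplit(old_url)
    target = "http://" + NEW_HOST + (parts.path or "/")
    if parts.query:
        # mod_rewrite passes the original query string through unchanged
        # when the substitution has no query string of its own.
        target += "?" + parts.query
    return target
```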
- Redirecting non-www URLs to www URLs on the same domain, www.example.com:
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
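The RewriteCond line only lets the rule fire when the Host header is exactly example.com, in any letter case ([NC]). A small Python model of that logic, an illustration with a hypothetical function name rather than anything Apache runs, makes it easy to see which requests get redirected and which are left alone:

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Mirrors: RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
BARE_HOST = re.compile(r"^example\.com$", re.IGNORECASE)


def www_redirect(url):
    """Return the 301 target for url, or None if the rule would not fire."""
    parts = urlsplit(url)
    if not BARE_HOST.match(parts.netloc):  # netloc, like HTTP_HOST, includes any port
        return None
    # Mirrors: RewriteRule ^(.*)$ http://www.example.com/$1 (query string kept)
    return urlunsplit(("http", "www.example.com", parts.path or "/", parts.query, ""))
```

Note that a request for example.com:8080 would not match either the model or the real rule, because of the $ anchor.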
- Redirecting old-page.html to new-page.html on the same domain, www.example.com (keep it all on one line):
RedirectMatch 301 ^/old-page\.html$ http://www.example.com/new-page.html
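When the dead-URL cleanup from the checklist leaves many old pages to redirect, writing each RedirectMatch line by hand invites mistakes such as an unescaped dot (in a regular expression "." matches any character). Here is a sketch of a Python generator, with hypothetical function names, that escapes the old path and prints lines you can paste into .htaccess:

```python
# Characters with a special meaning in Apache's (PCRE) regular expressions.
_REGEX_META = set(".^$*+?()[]{}|\\")


def literal_pattern(path):
    """Escape regex metacharacters so the pattern matches only this path."""
    return "".join("\\" + ch if ch in _REGEX_META else ch for ch in path)


def redirectmatch_lines(moves, site="http://www.example.com"):
    """Build one 'RedirectMatch 301' line per {old path: new path} pair."""
    return ["RedirectMatch 301 ^%s$ %s%s" % (literal_pattern(old), site, new)
            for old, new in moves.items()]


if __name__ == "__main__":
    print("\n".join(redirectmatch_lines({"/old-page.html": "/new-page.html"})))
```

Paths containing spaces are not handled, since Apache would split the directive at the space.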
- Several URL-rewriting directives can be combined. For instance, this useful combo redirects non-www URLs to www URLs and /index.html to / (the website root or the folder root):
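<IfModule mod_rewrite.c>
# +Indexes (directory listings) is not needed for redirects, so only FollowSymLinks is enabled
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
### re-direct index.html to root / (or to the folder root) ###
# THE_REQUEST is the raw request line, so only an index.html the visitor actually typed is redirected;
# the server's own DirectoryIndex lookup of index.html does not match, which prevents a redirect loop.
# (.*/)? stops names such as myindex.html from being caught as well.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.*/)?index\.html\ HTTP/
RewriteRule ^(.*/)?index\.html$ /$1 [R=301,L]
### re-direct non-www to www ###
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
####
</IfModule>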
But a word of caution: the above may interfere with certain addressing schemes. I personally could not use it as-is on this site, for instance: it interfered badly with my cPanel access to some functions, and it took me ages to figure out what was wrong. In the end, it's a choice you have to make.
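Whichever combination of rules you settle on, it helps to write down the address each URL should finally end up at (www host, no trailing index.html) and compare it with what the server actually returns after following its redirects. A minimal Python sketch of that end state, assuming www.example.com is the host you keep; `canonical_url` is a hypothetical name:

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # assumption: the host you decided to keep


def canonical_url(url):
    """Final address a request should settle on: www host, no trailing index.html."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]  # /docs/index.html -> /docs/
    host = parts.netloc.lower()
    if host == "example.com":
        host = CANONICAL_HOST
    return urlunsplit((parts.scheme, host, path, parts.query, ""))
```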

