About 2 weeks ago, Technorati stopped indexing my blog 'zine. I have an RSS subscription to a couple of Technorati tags, and noticed my entries stopped appearing. I thought it odd, but didn't bother to inquire further, until a few days ago. In response to my email, Technorati tech support says:
Your blog has over 490 XHTML validation errors that may prevent search engines such as Technorati from fully indexing your blog's content. Please review your site's markup using the W3C Validator to ensure maximum search engine visibility.
http://validator.w3.org/check?uri=http%3A%2F%2Fb12partners.net%2Fmt%2F
That sounds like a lot of errors, but after looking at what is classified as an error, the excuse seems less plausible. I suppose this is why I never became a programmer: arbitrary rules (of any sort) really irk me. For instance, one of the validation errors flagged is: 'img' tags without 'alt' attribute
Line 219, column 75: required attribute "alt" not specified
...m/images/P/630601232X.01._SCMZZZZZZZ_.jpg"
The attribute given above is required for an element that you've used, but you have omitted it. For instance, in most HTML and XHTML document types the "type" attribute is required on the "script" element and the "alt" attribute is required for the "img" element.
This is an Amazon image (for the DVD of American Pimp), inserted by my blog web 'zine creation tool, ecto. I suppose I could tweak my template to add a placeholder 'alt' attribute to every Amazon image (since this information is not inserted automatically), but why should its absence invalidate my Technorati indexing?
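For what it's worth, the template tweak mentioned above could also be done as a post-processing pass. Here is a minimal sketch in Python, assuming the page is available as a string; the function name `add_missing_alt` and the empty default are hypothetical choices, not anything ecto or Movable Type provides. It is a regex heuristic, so it will miss unusual markup (for example, a `>` inside an attribute value).

```python
import re

# Matches a whole <img ...> tag, case-insensitively, including self-closing ones.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# Heuristic check for an existing alt attribute (whitespace, then alt=).
HAS_ALT = re.compile(r"\salt\s*=", re.IGNORECASE)


def add_missing_alt(html, default_alt=""):
    """Return html with a default alt attribute added to <img> tags lacking one."""

    def fix(match):
        tag = match.group(0)
        if HAS_ALT.search(tag):
            return tag  # already has an alt attribute; leave it untouched
        # Insert right after "<img" so both <img ...> and <img ... /> work.
        return tag[:4] + ' alt="{}"'.format(default_alt) + tag[4:]

    return IMG_TAG.sub(fix, html)


if __name__ == "__main__":
    print(add_missing_alt('<p><img src="cover.jpg"></p>'))
```

An empty `alt=""` satisfies the validator, though a real description (the DVD title, say) would be more useful to readers who cannot see the image.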
Or, 'p' tag inside some enclosing element:
Line 141, column 73: document type does not allow element "p" here; missing one of "object", "applet", "map", "iframe", "button", "ins", "del" start-tag
...le="text-align:right;font-size:10px;"><p>Technorati Tags: <a href="http://techno
The mentioned element is not allowed to appear in the context in which you've placed it; the other mentioned elements are the only ones that are both allowed there and can contain the element mentioned. This might mean that you need a containing element, or possibly that you've forgotten to close a previous element.
One possible cause for this message is that you have attempted to put a block-level element (such as "p" or "table") inside an inline element (such as "a", "span", or "font").
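To make the validator's complaint concrete, here is a rough sketch of the nesting rule it describes, written with Python's standard `html.parser`. The element sets are deliberately incomplete and the names `NestingChecker` and `find_block_in_inline` are made up for illustration; this is not how the W3C Validator works (it checks against the full DTD), only an approximation of the one rule quoted above.

```python
from html.parser import HTMLParser

# Small, deliberately incomplete sets, for illustration only.
INLINE = {"a", "span", "font", "em", "strong", "b", "i"}
BLOCK = {"p", "div", "table", "blockquote", "ul", "ol", "h1", "h2", "h3"}
VOID = {"img", "br", "hr", "input", "meta", "link"}  # elements with no end tag


class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements, outermost first
        self.problems = []  # (line, offset, block_tag, inline_ancestor)

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK:
            # Flag the nearest open inline ancestor, if there is one.
            for parent in reversed(self.stack):
                if parent in INLINE:
                    line, offset = self.getpos()
                    self.problems.append((line, offset, tag, parent))
                    break
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def find_block_in_inline(html):
    """Return a list of block-level elements found inside inline elements."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    sample = '<span style="font-size:10px;"><p>Technorati Tags</p></span>'
    for line, offset, tag, parent in find_block_in_inline(sample):
        print("line", line, "offset", offset, ":", tag, "inside", parent)
```

Note that under this rule a "p" inside a "blockquote" is not flagged, since both are block-level; the error arises only when the enclosing element is inline.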
What's the difference? It seems like a cop-out to say, as Technorati seems to be saying, “if you put a paragraph tag inside a blockquote, we will no longer index your page. So there!” I think Movable Type is actually adding the 'p' tag; I know I didn't type it myself. I realize that standards are useful for setting rules to ensure that browsers can actually render the page correctly. However, as far as I can tell, every browser currently in use can display this page 99% as intended. Technorati should be as robust as IE 5, for God's sake. Search engines like Google, Yahoo, even Mark Cuban's Ice Rocket are able to figure out what the heck is on my web zine, even with missing 'alt' attributes. A quarter of my hits are from 'Google Images'; missing attributes don't seem to be a big problem for Google.
Technorati is a free service, so I don't have a vested interest in getting my page properly indexed; I just used to think Technorati was a cool idea. That's a little harsh. I still like the idea of Technorati; I'm just disappointed in its performance.
---
update: perhaps something else was wrong on Technorati's side; per the comments, everything seems fine now. D would say it was a Mercury-retrograde-related glitch. Who knows. Mini-crisis averted.
update again 8/1/05: Technorati still doesn't seem to index my site with any regularity. Whatever. Wake me when Technorati is out of beta....
Tags: Techno-babble
You make a good point. I'm going to ask our spider engineers to find out what is going on here; we really should be able to index your blog without any problems.
Dave
Thanks, Dave. I appreciate that. I do realize that my Movable Type template is cobbled together and probably does contain some errors. If I could find relatively quick solutions, I'd implement the changes. I just used various browsers on various platforms to check whether the page rendered mostly correctly. Once it did, I stopped tweaking it.
We are indexing you fine, and your recent posts show up:
http://technorati.com/search/hipster-weepy
I'm not sure if there was a specific markup error before, but we are making sense of your blog now.
Sorry for any missed indexing earlier.
[cool. Thanks for looking into it, who knows what happened. -Seth]
Hmmm, seems not to work at all again. Even this link no longer finds anything (Hipster-weepy http://technorati.com/search/hipster-weepy). What's up with that?