September 30th, 2007 Curt Monash
I just picked out a few of the many unreviewed sites in my DMOZ categories to evaluate, and listed most of those I reviewed.
How did I choose which ones to screen? Mainly, I picked out ones whose focused descriptions, titles, and so on made them seem likely to be listable (that info is essentially all I see on the page where the submitted sites are linked). I correctly guessed that I’d be able to quickly understand what I was seeing, judge whether to list the site, write the official site description, and so on.
Posted in Directories, Directories and filtering, ODP and DMOZ, Search engine optimization (SEO) | 2 Comments »
August 31st, 2007 Curt Monash
Give or take a corrected typo, here’s a challenge to DMOZ bashers I just wrote in the flame war thread.
If you want to do something that is:
A. Correct
B. Credible
C. Potentially useful
just go find a specific category with terrible listings, and publicize the fact with overwhelmingly clear proof of your assessment.
If that’s not EASY for you to do … then maybe DMOZ isn’t so bad after all, eh?
In particular, I’d encourage you to post a version of the category that is clearly better than what is currently there.
Posted in Directories, Directories and filtering, ODP and DMOZ, Social software and media | 1 Comment »
August 31st, 2007 Curt Monash
My latest thoughts about DMOZ and the ODP may be found in this blog comment thread.
The gist is:
- DMOZ has many problems, such as categories that are at least five years out of date.
- Recent corrupt listings are NOT high on the list of problems.
- In fact, the attention paid to avoiding such corruption is a terrible drain on ODP resources.
- There are a lot of liars and/or idiots bashing DMOZ in the website owner community.
- robjones is a sarcastic jerk, but he’s our sarcastic jerk.
Or something like that. As I said, it’s a flame war …
Anyhow, I’m flying off on a two-week snorkeling trip Saturday, and should be much mellower soon.
Posted in Directories, Directories and filtering, ODP and DMOZ, Search engine optimization (SEO), Social software and media | 8 Comments »
July 22nd, 2007 Curt Monash
It was tough to judge user demand at the recent Text Analytics Summit because, well, very few users showed up. And frankly, I wasn’t as aggressive at pumping vendors for trends as I sometimes am. That said, I have talked with most text analytics vendors recently,* and here are my impressions of what’s going on. Any contrary – or confirming! – opinions would be most welcome.
*Factiva is the most significant exception. Hint, hint.
If you think about it, text analytics is a “secret ingredient” in search, antispam, and data cleaning,* and this dominates all other uses of the technology. A significant minority of the research effort at companies that do any kind of text filtering is – duh – text analytics. Cold comfort for specialist text analytics vendors, to be sure, but that’s the way it is.
*I.e., part of the “T” in “ETL” (Extract/Transform/Load).
Text-analytics-enhanced custom publishing will surely at some point become a must-have for business and technical publishers. However, it appears that we’re not quite there yet, as large publishers make do with simple-minded search and the like. In what I suspect is a telling market commentary, there’s no headlong rush among vendors to dump text mining for custom publishing, notwithstanding the examples of nStein and (sort of) ClearForest. I don’t want to be overly negative – either my friends at Mark Logic are doing just fine or else they’re putting up a mighty brave front – but I don’t think the nonspecialist publishing market is there yet.
Posted in ClearForest and Reuters, Factiva and Dow Jones, Mark Logic, SAS, Search and text storage, Spam and antispam, Text Analytics Summit, Text mining, Voice of the Customer, nStein | 1 Comment »
June 6th, 2007 Curt Monash
Akismet recently upgraded so that you can see all the spam it’s holding, not just the last 150 messages. This made me a lot happier — but ironically I quickly gave up, and decided to trust Akismet without checking. Why? Well, Akismet sequesters 15 days of spam, and I currently have the following numbers of messages stashed away in it:
That’s over 800 spam per day across four blogs. And when I did check, I almost never found a false positive, except occasionally a trackback of my own.
More problematic is my e-mail. Eudora flags pretty much everything that isn’t from an established sender as spam. So along with my 300+ true spam, I get a number of false positives per day, some of which have turned into paying customer relationships. So THAT spam directory I do check carefully …
Posted in Blogosphere, Spam and antispam | No Comments »
April 17th, 2007 Curt Monash
In a recent post on the Monash Report, I drew a distinction between two aspects of the Internet: Jeffersonet and Edisonet. Jeffersonet deals in thoughts and ideas and research and scholarship and news and politics, and in commerce too. It’s what makes people so passionate about the Internet’s democracy-enhancing nature. It’s what needs to be protected by extreme network neutrality. And it’s modest enough in its bandwidth requirements that net neutrality is completely workable. (Edisonet, by way of contrast, comprises advanced applications in entertainment, teleconferencing, etc. that probably do require new capital investment and tiered pricing schemes.)
And if there’s one application that’s at the core of Jeffersonet, it’s search. No matter how much scary posturing telecom CEOs do – and no matter how profitable or monopolistic Google becomes – telecom carriers must never be allowed to show any preference among search engines! At least, that’s the case for text-centric search engines such as Google, Yahoo, and Microsoft run today. The reason is simple: The democratic part of the Internet only works so long as things can be found. And search will long be a huge part of how to find them. So search engine vendors must never be able to succeed based on a combination of good-enough results plus superior marketing and business development. They always have to be kept afraid of competition from engines that provide better actual search engine results.
Posted in Censorship, Google, Search and text storage, Social software and media | No Comments »
March 26th, 2007 Curt Monash
Andrew Orlowski is an over-the-top jerk, and a pretty sloppy reporter and analyst to boot. But he occasionally makes a good point even so. In the most recent instance, he confronted Tim Berners-Lee. As the article makes clear, Berners-Lee reacted badly to Orlowski, reflecting an attitude that is probably shared by 99% of the people who encounter the guy, and in the future will probably be adopted by sentient computers as well. Even so, Orlowski’s underlying point is valid: If the Semantic Web is going to be any more spam-free than the current Web, nobody has adequately explained why.
Posted in Ontologies and context identification, Spam and antispam | 2 Comments »
February 7th, 2007 Curt Monash
Joost de Valk makes an interesting suggestion, namely that Wikipedia should drop all external links other than to DMOZ, and rely on DMOZ as the outside link directory. As a division of labor, it makes perfect sense. However, it’s a total non-starter until at least two problems are solved.
Posted in Directories, Directories and filtering, ODP and DMOZ, Ontologies and context identification, Spam and antispam | 4 Comments »
February 6th, 2007 Curt Monash
- DMOZ is dead. Fiction!
- New site submissions are being processed. Partial fact.
- Pending site submissions were lost in the outage. Partial fact.
- Other non-public ODP data was lost in the outage too. Partial fact.
- New editor applications aren’t being processed yet. Fact.
- ODP editors are corrupt. Fiction!
- The ODP is secretive and deceptive. Largely fiction.
- If a DMOZ category doesn’t have a listed editor, it’s unlikely to get much attention. Part fact, part fiction.
- ODP editors hate search engine optimization. Partial fact.
- ODP editors hate SEOs. Partial fact.
I shall explain.
Posted in Directories, Directories and filtering, ODP and DMOZ, Search engine optimization (SEO) | 5 Comments »
February 6th, 2007 Curt Monash
Before saying anything about the Open Directory Project or the DMOZ directory it produces, I should offer several disclaimers.
- No editor speaks for the ODP, let alone for Time Warner/AOL/Netscape.
- No single editor’s opinions or choices control any edits in DMOZ, even if s/he is the sole listed editor of a category. Any of us can be overruled on any editing decision at any time.
- I’m effectively as new as they come, or at least was at the time DMOZ editing came back online (late December). There have been no new editors since the well-publicized outage, and I had next to no involvement with the project prior to the outage.
- Notwithstanding point #2, I’m quite opinionated, which I’m sure surprises approximately nobody. And my opinions quite often differ from those of the ODP mainstream.
Posted in Directories and filtering, ODP and DMOZ | 1 Comment »