March 5th, 2008 Curt Monash
Google has begun to introduce a feature whereby, if your search obviously leads you to a single site (e.g., you searched on a company name), you get a second search box to search only within that site. More details at Google and Search Engine Land. Basically, this is Google Site Search made a lot easier to use.
I think this could be a really big deal. Read the rest of this entry »
Posted in Enterprise search, Google, Search and text storage, Specialized search engines | 4 Comments »
February 28th, 2008 Curt Monash
Questions come up here from time to time about code search engines, a subject I have not researched. Well, here’s a quick link listing some leading code search engines, both Web (guess who?) and internal. Most interesting may be that the list is so short.
Posted in Search and text storage, Specialized search engines | No Comments »
December 2nd, 2007 Curt Monash
Danny Sullivan thinks blended vertical search — which he’s calling Search 3.0 — is a game changer. (In this context, “vertical” search denotes alternate result types such as video, image, map coordinates, or product listings.) In saying that, he’s focused on search marketers, who now have a lot more ways to try to get their messages onto Google searchers’ top result pages. But I presume what he’s really saying is that there will be a feedback effect — if Google tells all web searchers about videos and product listings, then internet marketers will be more motivated to post videos and product listings, and hence there will be more interesting choices of videos and product listings — which Google will naturally wind up featuring more prominently in its search results. And so on.
Given the YouTube explosion, I find it hard to argue with his claim.
Posted in Google, Search and text storage, Search engine optimization (SEO), Specialized search engines, Structured search | No Comments »
April 30th, 2007 Curt Monash
Baynote sells a recommendation engine whose motto appears to be “popularity implies accuracy.” While that leads to some interesting technological ideas (below), Baynote carries that principle to an unfortunate extreme in its marketing, which is jam-packed with inaccurate buzzspeak. While most of that is focused on a few trendy meme-oriented books, the low point of my briefing today was probably the insistence, against pushback, that “95%” of Google’s results depend on “PageRank.” (I think what Baynote really meant is “all off-page factors combined,” but anyhow I sure didn’t get the sense that accuracy was an important metric for them in setting their briefing strategy. And by the way, one reason I repeat the company’s name rather than referring to Baynote by a pronoun is that on-page factors DO matter in search engine rankings.)
That said, here’s the essence of Baynote’s story, as best I could figure it out.
Read the rest of this entry »
Posted in Baynote, Google, Ontologies and context identification, Search and text storage, Search engine optimization (SEO), Social software and media, Specialized search engines | 3 Comments »
February 7th, 2007 Curt Monash
I just did some Technorati searches, and my blog posts come up near the top of the search results for a bunch of small companies’ names and similar words — Attensity, ClearForest, Netezza, DATAllegro, Crossbeam, DMOZ, ODP, and surely many others.
But judging by my referrer logs, nobody cares. I get lots of visitors via classic search engines — largely Google, but also the others — but bubkus from Technorati.
Posted in Search and text storage, Specialized search engines | 4 Comments »
January 31st, 2007 Curt Monash
According to Steven Arnold, FirstGov (which has been renamed USASearch.gov) is by far the most effective US government-specific search engine. But there’s something odd about it; whatever the query, it’s determined to give no more than a little over 100 results. Queries for which I’ve noted results in this quantity range include Bush (and this covers all family members), Cheney (ditto), Kennedy (ditto), Condaleeza, Scalia, Coolidge, Red Sox, big dig, Burlingame, Redmond, Pluto, ethanol, spotted owl, and topology. The only ones I’ve found so far coming out above that results range, perhaps inevitably, are death (137) and taxes (177).
Read the rest of this entry »
Posted in Convera, Search and text storage, Specialized search engines | No Comments »
January 23rd, 2007 Curt Monash
Popular on Digg, for obvious reasons, is a post showing that Google is better for searching Digg than Digg’s own search engine. No shock there. If I want to search Wikipedia for information on astrowidgets, I’ll just google on the phrase wikipedia astrowidgets. That works much better than Wikipedia’s own search.
Speaking of which — if you want to search for my writing, I’m using Google web search technology too. It works like a charm.
Posted in Google, Search and text storage, Specialized search engines | 2 Comments »
October 22nd, 2006 Curt Monash
OK. I have a vision of one way search could evolve, which I think deserves consideration on at least a “concept-car” basis. This is all speculative; I haven’t discussed it at length with the vendors who’d need to make it happen, nor checked the technical assumptions carefully myself. So I could well be wrong. Indeed, I’ve at least half-changed my mind multiple times this weekend, just in the drafting of this post. Oh yeah, I’m also mixing several subjects together here. All in all, this is not my crispest post …
Anyhow, the core idea is that large enterprises spider and index a subset of the Web, and use that for most of their employees’ web search needs. Key benefits would include:
- Filtering out spam hits. This is obviously important for search, and in some cases could help with public-web text mining as well. It should be OK to be more aggressive on spam-site filtering in an enterprise-specific index than it is in general web search.
- Filtering out malicious/undesirable downloads of various sorts. I’m thinking mainly of malware/spyware here, but of course the same filtering can also be used for netnannying, porn-prevention, and the like. Again, this is more easily done for the enterprise market than for the search world at large. (I think Google could blow Websense out of the water any time they wanted to, except, of course, for the not-so-small matter of not being seen as participating in the censorship business. But that’s a separate discussion.)
- Capturing employees’ search strings. This could be useful for various purposes, including discerning their interests, and building the corporate ontology for internal web search.
- Freshness control. If there’s a site you really care about, you can make sure it’s re-indexed frequently.
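To make the four benefits above a bit more concrete, here is a toy Python sketch of what such an enterprise-curated index might track. Everything in it is an assumption for illustration only: the class name `EnterpriseWebIndex`, its fields, the `.example` hostnames, and the naive substring matching are all hypothetical, and a real deployment would sit on top of an actual crawler and search engine rather than an in-memory dictionary.

```python
from collections import Counter
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class EnterpriseWebIndex:
    """Toy model of an enterprise-specific web index (all names hypothetical)."""

    blocked_hosts: set = field(default_factory=set)      # spam, malware, or policy blocks
    recrawl_hours: dict = field(default_factory=dict)    # host -> re-index interval in hours
    default_recrawl_hours: int = 168                     # assumed default: weekly
    pages: dict = field(default_factory=dict)            # url -> page text
    query_log: Counter = field(default_factory=Counter)  # captured search strings

    def add_page(self, url, text):
        """Index a page unless its host is on the block list."""
        host = urlparse(url).hostname or ""
        if host in self.blocked_hosts:
            return False
        self.pages[url] = text
        return True

    def recrawl_interval(self, url):
        """Freshness control: sites you care about get shorter intervals."""
        host = urlparse(url).hostname or ""
        return self.recrawl_hours.get(host, self.default_recrawl_hours)

    def search(self, query):
        """Log the search string, then return URLs whose text contains every term."""
        normalized = query.lower()
        self.query_log[normalized] += 1
        terms = normalized.split()
        return sorted(
            url for url, text in self.pages.items()
            if all(term in text.lower() for term in terms)
        )


index = EnterpriseWebIndex(
    blocked_hosts={"spam.example"},
    recrawl_hours={"news.example": 1},
)
index.add_page("http://news.example/a", "Ethanol subsidies debated")
index.add_page("http://spam.example/b", "Cheap ethanol pills")  # rejected: blocked host
print(index.search("ethanol"))
print(index.query_log.most_common(1))
```

The point of the sketch is only that all four ideas hang off the same object: the block list handles both spam and undesirable downloads, the query log is where interest mining or ontology building would start, and the per-host interval is the freshness knob.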
Read the rest of this entry »
Posted in Convera, Directories and filtering, Enterprise search, FAST, Google, IBM and UIMA, Search and text storage, Spam and antispam, Specialized search engines, Text mining, Web site filtering | 1 Comment »
October 3rd, 2006 Curt Monash
Last July I wrote about Google’s text-based project management system. Dave Kellogg of Mark Logic offers links to discussion of a related Google project, and adds news of his own — Mark Logic built a text-based bug tracking system in its own MarkLogic technology.
Posted in Enterprise search, Google, Mark Logic, Search and text storage, Specialized search engines | No Comments »
August 26th, 2006 Curt Monash
I talked again with Mark Logic, makers of MarkLogic Server, and they continue to have an interesting story. Basically, their technology is better search/retrieval through XML. The retrieval part is where their major differentiation lies. Accordingly, their initial market focus (they’re up to 46 customers now, including lots of big names) is on custom publishing. And by the way, they’re a good partner for fact-extraction companies, at least in the case of ClearForest.
Here, as best I understand, is the story of the custom publishing business.
Read the rest of this entry »
Posted in ClearForest and Reuters, Mark Logic, Search and text storage, Specialized search engines | 2 Comments »