Specialized search

Analysis of efforts in “vertical” search, single-site search, and other forms of specialized search engine.

September 20, 2009

Data marts in the world of text

CMS (Content Management System) and search expert Alan Pelz-Sharpe recently decried “Shadow IT”, by which he seems to mean departmental proliferation of data stores outside the control of the IT department. In other words, he’s talking about data marts, only for documents rather than tabular data.

Notwithstanding the manifest virtues of centralization, there are numerous reasons you might want data marts, in the tabular and document worlds alike. For example:

Price/performance. Your main/central data manager might be too expensive to support additional large specialized databases. Or different databases and applications might have profiles different enough that they get the best price/performance from different kinds of data managers. This is particularly prevalent in the relational world, where column stores, sequentially-oriented row stores, and random I/O-oriented row stores each have compelling use cases.
Different SLAs (Service-Level Agreements). Similarly, different applications may have very different requirements for uptime, response time, and the like. (In the relational world, think of operational data stores.)
Different security requirements. Different subsets of the data may need different levels of security. This is particularly prevalent in the document world, where security problems are not as well solved as in the tabular arena, and where it’s common for a search engine to index across different corpora with radically different levels of sensitivity.
Integrated application and user interfaces. In the relational world, there’s a pretty clean separation between data management and interface logic; most serious business intelligence tools can talk to most DBMS. The document world is quite different. Some search engines bundle, for example, various kinds of faceted or parameterized search interfaces. What’s more, in public-facing search, a major differentiator is the facilities that the product offers for skewing search results.
Different text applications require different thesauruses or taxonomy management systems. Ideally, those should all be integrated — but the requisite technology still doesn’t exist.

Bottom line: Text data marts, much like relational data marts, are almost surely here to stay.

Related link

The future of data marts

Categories: Enterprise search, Ontologies, Search engines, Specialized search, Structured search

2 Comments

March 31, 2009

Twitter shows some directions for growth

TechCrunch pointed out a Twitter jobs page. The specific job TechCrunch mentioned* isn’t up there any more, but at the moment I write this, 18 others are (copied below). That’s considerable growth, given that the same page says Twitter has fewer than 30 current employees. Note the emphasis on search and the mention of Japan.

*Care and feeding of celebrity tweeters. Celebrity tweeting is actually a subject I’ve written about, and even been interviewed about, several times.

As of this writing, the full list is: Read more

Categories: Microblogging, Search engines, Social software and online media, Specialized search, Twitter

1 Comment

December 29, 2008

Where “semantic” technology is or isn’t important

At Lynda Moulton’s behest, I spoke a couple of times recently on the subject of where “semantic” technology is or isn’t likely to be important. One of those talks was at the Gilbane conference in early December. The slides were based on my previously posted deck from a June talk in which I gave a text analytics market overview. The actual Gilbane slides may be found here.

My opinions about the applicability of semantic technology include:

The big bucks in web search are for “transactional” web search, and semantics isn’t the issue there. (Slides 3-4)
When UIs finally go beyond the simple search box — e.g. to clusters/facets or to voice — semantics should have a role to play. (Slide 5)
Public-facing site search depends — more than any other area of text analytics — on hand-tagging. (Slide 7)
“Enterprise” search that searches specialized external databases could benefit from semantic technologies. (Slide 8)
True enterprise search could benefit from semantic technologies in multiple ways, but has other problems as well. (Slides 10-11)
Semantics — specifically extraction — is central to custom publishing. (Slide 12 — upon review I regret using the word “sophisticated”)
Semantics is central to text mining. (Slide 18)
Semantics could play a big role in all sorts of exciting future developments. (Slide 19)

So what would your list be like?

Categories: Enterprise search, Ontologies, Search engines, Specialized search, Structured search

5 Comments

November 11, 2008

Lukewarm review of Yahoo mobile search

Stephen Shankland reviewed Yahoo’s mobile voice search, which works by taking voice input and returning results onscreen (in his case on his Blackberry Pearl). He found:

There are plenty of times when voice is a more convenient form of input than typing.
Voice recognition was good but far from perfect.
Editing search strings was annoyingly difficult.
Search results themselves aren’t 100% perfect.

No big surprises there. 😀

Categories: Language recognition, Search engines, Specialized search, Speech recognition, Yahoo

October 28, 2008

Google and the Authors Guild establish an ASCAP for books

Most of the coverage of the Google/Authors Guild settlement today seems to focus on Google’s side of things. But I think the authors’ side is much more important. This deal paves the way for traditional publishers to become quaint and useless — and not a moment too soon.

Below are some quotes — fair use!! 🙂 — from the Authors Guild official statement on the deal (emphasis mine): Read more

Categories: Google, Search engines, Social software and online media, Specialized search

July 9, 2008

Google Health spoof

FutureFeedForward is on a roll:

MOUNTAIN VIEW–Information search giant Google, Inc. announced Thursday the release of Google Body, a search service aiming to index the internal and external anatomy of every living creature on the planet. …

Early testers have remarked upon a fuzzy-logic “match my organ” feature, which helps users get in touch with the nearest, most suitable donor for multiple organ systems. …

Responding to criticism from privacy groups, Google’s Hind pointed to the program’s opt-out policy. “We are very concerned about user privacy, and that’s why we will not make publicly available any information about anybody who lets us know they do not want to participate by wearing an Opt-Out headband when in public. Google archives information about those individuals, but does not make it searchable.” The yellow and black vinyl headbands can be requested free of charge by writing to the company at its Mountain View headquarters.

Categories: Fun stuff, Google, Humor, Search engines, Specialized search

March 5, 2008

Google could dominate single-site search

Google has begun to introduce a feature whereby, if your search obviously leads you to a single site (e.g., you searched on a company name), you get a second search box to search only within that site. More details at Google and Search Engine Land. Basically, this is Google Site Search made a lot easier to use.

I think this could be a really big deal. Read more

Categories: Enterprise search, Google, Search engines, Specialized search

4 Comments

February 28, 2008

Code search options

Questions come up here from time to time about code search engines, a subject I have not researched. Well, here’s a quick link listing some leading code search engines, both Web (guess who?) and internal. Most interesting may be that the list is so short.

Categories: Search engines, Specialized search

December 2, 2007

Danny Sullivan thinks blended vertical search is a game-changer

Danny Sullivan thinks blended vertical search — which he’s calling Search 3.0 — is a game changer. (In this context, “vertical” search denotes alternate result types such as video, image, map coordinates, or product listings.) In saying that, he’s focused on search marketers, who now have a lot more ways to try to get their messages onto Google searchers’ top result pages. But I presume what he’s really saying is that there will be a feedback effect — if Google tells all web searchers about videos and product listings, then internet marketers will be more motivated to post videos and product listings, and hence there will be more interesting choices of videos and product listings — which Google will naturally wind up featuring more prominently in its search results. And so on.

Given the YouTube explosion, I find it hard to argue with his claim.

Categories: Google, Search engine optimization (SEO), Search engines, Specialized search, Structured search

April 30, 2007

Wise Crowds of Long-Tailed Ants, or something like that

Baynote sells a recommendation engine whose motto appears to be “popularity implies accuracy.” While that leads to some interesting technological ideas (below), Baynote carries that principle to an unfortunate extreme in its marketing, which is jam-packed with inaccurate buzzspeak. While most of that is focused on a few trendy meme-oriented books, the low point of my briefing today was probably the insistence, against pushback, that “95%” of Google’s results depend on “PageRank.” (I think what Baynote really meant is “all off-page factors combined,” but anyhow I sure didn’t get the sense that accuracy was an important metric for them in setting their briefing strategy. And by the way, one reason I repeat the company’s name rather than referring to Baynote by a pronoun is that on-page factors DO matter in search engine rankings.)

That said, here’s the essence of Baynote’s story, as best I could figure it out. Read more

Categories: Baynote, Google, Ontologies, Search engine optimization (SEO), Search engines, Social software and online media, Software as a Service (SaaS), Specialized search

4 Comments

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.
