Analysis of efforts in “vertical” search, single-site search, and other forms of specialized search engines.
CMS (content management system) and search expert Alan Pelz-Sharpe recently decried “Shadow IT”, by which he seems to mean departmental proliferation of data stores outside the control of the IT department. In other words, he’s talking about data marts, only for documents rather than tabular data.
Notwithstanding the manifest virtues of centralization, there are numerous reasons you might want data marts, in the tabular and document worlds alike. For example:
- Price/performance. Your main/central data manager might be too expensive to support additional large specialized databases. Or different databases and applications might have profiles different enough that each gets its best price/performance from a different kind of data manager. This is particularly prevalent in the relational world, where column stores, sequentially-oriented row stores, and random I/O-oriented row stores each have compelling use cases.
- Different SLAs (Service-Level Agreements). Similarly, different applications may have very different requirements for uptime, response time, and the like. (In the relational world, think of operational data stores.)
- Different security requirements. Different subsets of the data may need different levels of security. This is particularly prevalent in the document world, where security problems are not as well solved as in the tabular arena, and where it’s common for a search engine to index across different corpuses with radically different levels of sensitivity.
- Integrated application and user interfaces. In the relational world, there’s a pretty clean separation between data management and interface logic; most serious business intelligence tools can talk to most DBMS. The document world is quite different. Some search engines bundle, for example, various kinds of faceted or parameterized search interfaces. What’s more, in public-facing search, a major differentiator is the facilities that the product offers for skewing search results.
- Different text applications require different thesauruses or taxonomy management systems. Ideally, those should all be integrated — but the requisite technology still doesn’t exist.
Bottom line: Text data marts, much like relational data marts, are almost surely here to stay.
Categories: Enterprise search, Ontologies, Search engines, Specialized search, Structured search
TechCrunch pointed out a Twitter jobs page. The specific job TechCrunch mentioned* isn’t listed there anymore, but as I write this, 18 others are (copied below). That’s considerable growth, given that the same page says Twitter has fewer than 30 current employees. Note the emphasis on search and the mention of Japan.
As of this writing, the full list is:
Categories: Microblogging, Search engines, Social software and online media, Specialized search, Twitter
At Lynda Moulton’s behest, I recently spoke a couple of times on where “semantic” technology is or isn’t likely to be important. One of those talks was at the Gilbane conference in early December. Its slides were adapted from my previously posted deck for a June talk I gave, an overview of the text analytics market. The actual Gilbane slides may be found here.
My opinions about the applicability of semantic technology include:
- The big bucks in web search are for “transactional” web search, and semantics isn’t the issue there. (Slides 3-4)
- When UIs finally go beyond the simple search box — e.g. to clusters/facets or to voice — semantics should have a role to play. (Slide 5)
- Public-facing site search depends — more than any other area of text analytics — on hand-tagging. (Slide 7)
- “Enterprise” search that searches specialized external databases could benefit from semantic technologies. (Slide 8)
- True enterprise search could benefit from semantic technologies in multiple ways, but has other problems as well. (Slides 10-11)
- Semantics — specifically extraction — is central to custom publishing. (Slide 12 — upon review I regret using the word “sophisticated”)
- Semantics is central to text mining. (Slide 18)
- Semantics could play a big role in all sorts of exciting future developments. (Slide 19)
So what would your list look like?
Categories: Enterprise search, Ontologies, Search engines, Specialized search, Structured search
Stephen Shankland reviewed Yahoo’s mobile voice search, which works by taking voice input and returning results onscreen (in his case on his BlackBerry Pearl). He found:
- There are plenty of times when voice is a more convenient form of input than typing.
- Voice recognition was good but far from perfect.
- Editing search strings was annoyingly difficult.
- Search results themselves aren’t 100% perfect.
No big surprises there.
Categories: Language recognition, Search engines, Specialized search, Speech recognition, Yahoo
Most of the coverage of the Google/Authors Guild settlement today seems to focus on Google’s side of things. But I think the authors’ side is much more important. This deal paves the way for traditional publishers to become quaint and useless — and not a moment too soon.
Below are some quotes — fair use!! — from the Authors Guild official statement on the deal (emphasis mine):
Categories: Google, Search engines, Social software and online media, Specialized search
MOUNTAIN VIEW–Information search giant Google, Inc. announced Thursday the release of Google Body, a search service aiming to index the internal and external anatomy of every living creature on the planet. …
Early testers have remarked upon a fuzzy-logic “match my organ” feature, which helps users get in touch with the nearest, most suitable donor for multiple organ systems. …
Responding to criticism from privacy groups, Google’s Hind pointed to the program’s opt-out policy. “We are very concerned about user privacy, and that’s why we will not make publicly available any information about anybody who lets us know they do not want to participate by wearing an Opt-Out headband when in public. Google archives information about those individuals, but does not make it searchable.” The yellow and black vinyl headbands can be requested free of charge by writing to the company at its Mountain View headquarters.
Google has begun to introduce a feature whereby, if your search obviously leads you to a single site (e.g., you searched on a company name), you get a second search box to search only within that site. More details at Google and Search Engine Land. Basically, this is Google Site Search made a lot easier to use.
I think this could be a really big deal.
Questions come up here from time to time about code search engines, a subject I have not researched. Well, here’s a quick link listing some leading code search engines, both Web (guess who?) and internal. The most interesting thing may be that the list is so short.
Danny Sullivan thinks blended vertical search, which he’s calling Search 3.0, is a game changer. (In this context, “vertical” search denotes alternate result types such as video, image, map coordinates, or product listings.) In saying that, he’s focused on search marketers, who now have many more ways to try to get their messages onto Google searchers’ top result pages. But I presume what he’s really saying is that there will be a feedback effect: if Google shows all web searchers videos and product listings, internet marketers will be more motivated to post videos and product listings; that in turn will yield more interesting videos and product listings to choose from, which Google will naturally wind up featuring more prominently in its search results. And so on.
Given the YouTube explosion, I find it hard to argue with his claim.
Categories: Google, Search engine optimization (SEO), Search engines, Specialized search, Structured search
Baynote sells a recommendation engine whose motto appears to be “popularity implies accuracy.” While that leads to some interesting technological ideas (below), Baynote carries that principle to an unfortunate extreme in its marketing, which is jam-packed with inaccurate buzzspeak. Most of that is focused on a few trendy meme-oriented books, but the low point of my briefing today was probably the insistence, despite pushback, that “95%” of Google’s results depend on “PageRank.” (I think what Baynote really meant is “all off-page factors combined,” but anyhow I sure didn’t get the sense that accuracy was an important metric for them in setting their briefing strategy. And by the way, one reason I repeat the company’s name rather than referring to Baynote by a pronoun is that on-page factors DO matter in search engine rankings.)
That said, here’s the essence of Baynote’s story, as best I could figure it out.
Categories: Baynote, Google, Ontologies, Search engine optimization (SEO), Search engines, Social software and online media, Software as a Service (SaaS), Specialized search