Comments on: 19 bullet points about the difference between enterprise and web search

By: Nothing Like a Move by Microsoft to Stir up Analysis and Expectations

Nothing Like a Move by Microsoft to Stir up Analysis and Expectations — Thu, 21 Nov 2019 22:10:56 +0000

[…] include links to the pieces that influenced the following comments. But one by Curt Monash in his piece on January 14 summarized the state of this industry and its already long history. It is noteworthy […]

By: Wally

Wally — Fri, 26 Sep 2008 17:19:57 +0000

You guys are geeks.

By: 19 bullet points about the difference between enterprise and web search | Text Technologies :: Kelvin Tan - Lucene Nutch Consulting

Tue, 17 Jun 2008 14:47:09 +0000

[…] http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/ […]

By: How text search has evolved over the past 15 years | Text Technologies

How text search has evolved over the past 15 years | Text Technologies — Sun, 15 Jun 2008 07:26:54 +0000

[…] “Looking at this list, you can see that the conceptual changes (breakthroughs?), with the exception of better phrase handling, are primarily focused around Web searches. When dealing with one-of-a-kind document collections behind the corporate firewall, many of these developments turn out not to add much to older approaches. So, at least for enterprise search, I too remain partial to some of the older products you mention, though I am disappointed that most of the old-time vendors have not updated their approaches beyond adding taxonomy support.” [CAM] Yep, web search and enterprise search are very different things. […]

By: DBMS2 — DataBase Management System Services » Blog Archive » The 4 main approaches to datatype extensibility

Fri, 25 Apr 2008 04:10:38 +0000

[…] Text search is a huge business on the web, and a separate big business in enterprises. And text doesn’t fit well into the relational paradigm at […]

By: Spectate Swamp

Spectate Swamp — Fri, 07 Mar 2008 23:27:34 +0000

I use my Desktop Search to search source code at a telephone billing software company.

It is a non-indexing search. The first step is to “Merge/Append” all the source code into
1 file. Then search that file. When merging the files, a start and stop header is put
in the merged file for each one. When a match is found, the originating file name is displayed in the
form title bar. It searches text at 20,000,000 cps. Any system worth its salt can export
data to text. I have all my emails since 1996 in large text files. I can even use the
search to extract lists of email addresses.
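
A minimal sketch of the merge-then-scan idea described above, in Python. The header strings (`<<<START `, `<<<STOP `) and the function names are assumptions made up for illustration; the comment does not say what the real tool's headers look like or how it reports matches.

```python
# Hypothetical header prefixes; the real tool's format is not given.
START_PREFIX = "<<<START "
STOP_PREFIX = "<<<STOP "


def merge_sources(files):
    """Append every (name, text) pair into one string.

    Each file's text is wrapped in a start and a stop header so that
    the originating file can be recovered during the search.
    """
    out = []
    for name, text in files:
        out.append(START_PREFIX + name)
        out.append(text.rstrip("\n"))
        out.append(STOP_PREFIX + name)
    return "\n".join(out) + "\n"


def search_merged(merged, needle):
    """Linear scan with no index.

    Yields (file name, line number within that file, matching line).
    """
    current = None  # name of the file whose body we are inside
    lineno = 0
    for line in merged.splitlines():
        if line.startswith(START_PREFIX):
            current = line[len(START_PREFIX):]
            lineno = 0
        elif line.startswith(STOP_PREFIX):
            current = None
        elif current is not None:
            lineno += 1
            if needle in line:
                yield current, lineno, line


files = [
    ("billing.c", "int rate;\nint total;\n"),
    ("tax.c", "float rate;\n"),
]
merged = merge_sources(files)
for name, lineno, line in search_merged(merged, "rate"):
    print(f"{name}:{lineno}: {line}")
```

One weakness of this sketch is that a source line that happens to begin with one of the header prefixes would be mistaken for a header, so a real tool would need headers that cannot occur in the data.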

The search has evolved to randomly play mpg video and mp3 audio as well as pictures.

I have been arguing search with everybody on the net, for years now.

http://channel9.msdn.com/showuserthreads.aspx?userid=31672

http://forums.thedailywtf.com/forums/t/7593.aspx

By: David Eddy

David Eddy — Sat, 19 Jan 2008 03:08:33 +0000

Curt –

That’s precisely why I’m interested in enterprise search.

I was a Y2K inventory/impact analysis tool vendor, getting into the market in late 1994. We had a tool that explicitly handled “odd” languages (beyond the biggies of COBOL & PL/1)… things like EasyTrieve, EasyTrieve Plus (they’re not related), Natural and others long forgotten. It was a hoot chasing down folks who had these bizarre languages I’d never heard of before. Ever heard of Extracto?

I knew we were onto something interesting/challenging when Capers Jones sent me his Function Points languages list in 1995. There were 400+ software languages on the list. By 2005 that list had expanded to 650 before being “pruned” back to a more manageable 500.

To the best of my knowledge none of the Y2K inventory/impact analysis tools have survived. I know ours didn’t (I know of a single surviving site).

We got to 1/1/00. The world didn’t end. The tools & systems inventory knowledge went into the bit-bucket. End of story.

It’s my belief that most “civilians” see Y2K as a giant techie hoax. I’m sure a lot of IT departments did not cover themselves in glory in the eyes of business executives for heavily porking up IT budgets under the dodge of “we need it for Y2K.”

The business value of actively maintaining a complete, accurate & edge-to-edge inventory of an organization’s applications portfolio (with the additional benefit of being able to trace how the pieces are interrelated) is a very hard sell.

The high-school dropout running the local 7-11 knows how many candy bars & jugs of milk he has on hand (inventory). Why doesn’t IT keep an inventory? There was a news item last year about EDS doing an outsourcing contract for the US Navy. They went in believing there were 5,000 systems. EDS ultimately found 100,000+.

What is different now is that we have the delight of Google… which means people now want to have the same ease-of-use access to knowledge/answers/information behind the firewall. The fact that serious analysts clearly emphasize that Google & enterprise search are not even remotely comparable problems just falls on the floor as useless noise.

Thanks for being interested.

– David

By: Curt Monash

Curt Monash — Sat, 19 Jan 2008 00:54:16 +0000

David,

Why don’t you take a look at the tools that purported to automate the finding (if not fixing) of Y2K 2-character data fields? That was, er, 8+ years ago, so they’ve had a lot of time to evolve since then.

CAM

By: David Eddy

David Eddy — Fri, 18 Jan 2008 04:51:53 +0000

Curt –

> a specialized tool is needed. It’s not realistic to ask one search product to find EVERYTHING.

Again, agreed.

The ultimate enterprise search tool is obviously going to need a passel of highly specialized tools under the covers. Making it look as easy & slick as Google is going to be interesting. A major challenge, of course, is that most enterprises have only the foggiest idea of their applications inventory.

First, source code has to be considered a valuable & important BUSINESS search resource before we start thinking about what sort of exotic tools are needed.

It is my argument (clearly a voice of one) that the corporate knowledge buried in source code needs to be recognized as worthwhile to mine… rather than to leave it walking around in the heads of soon-to-retire experts. Currently, changing systems is slow, expensive, manual work, far too often highly dependent upon domain experts. It is my belief that enterprise search could be a significant help in whittling away at the “80% of my IT budget goes to legacy systems” problem.

Obviously (after a lot of non-obvious rat holes) you have to approach enterprise search with a “white list” approach… first pass you identify what it is you’re trying to read (e.g. PowerPoint, MSWord, COBOL, dBase, etc.), second pass you process it with the appropriate reader. If you can’t identify what it is, then don’t try to read more than a few lines. Probably best not to rely on extensions (.exe, .doc) as gospel as to what the document truly is.
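
The two-pass “white list” idea above could look roughly like the following Python sketch. The function names, the `READERS` table, and the five-line cutoff are all hypothetical. The byte signatures for PDF, ZIP-based Office files, legacy OLE2 Office files, and DOS/Windows executables are the well-known ones; the COBOL check is only a crude heuristic and will miss programs that abbreviate the division header.

```python
def identify(head: bytes):
    """First pass: guess the format from leading bytes, never the extension."""
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"PK\x03\x04"):
        return "zip-container"  # .docx and .pptx are ZIP archives
    if head.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole2"  # legacy .doc and .ppt
    if head.startswith(b"MZ"):
        return "executable"
    # Crude heuristic, an assumption for illustration only.
    text = head.decode("ascii", errors="ignore").upper()
    if "IDENTIFICATION DIVISION" in text:
        return "cobol"
    return None


# The white list: only formats with a registered reader get fully read.
# A real system would register one reader per supported format.
READERS = {
    "cobol": lambda data: data.decode("ascii", errors="replace"),
}


def extract_text(data: bytes, max_unknown_lines: int = 5) -> str:
    """Second pass: use the matching reader, else read only a few lines."""
    kind = identify(data[:512])
    reader = READERS.get(kind)
    if reader is not None:
        return reader(data)
    # Unidentified or unsupported: do not read more than a few lines.
    lines = data.splitlines()[:max_unknown_lines]
    return b"\n".join(lines).decode("ascii", errors="replace")


# An executable renamed to report.doc is still identified by content.
print(identify(b"MZ\x90\x00\x03"))
```

Because `identify` only ever sees bytes, the file name plays no part in the decision, which is the point of not treating extensions as gospel.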

I’m not aware of any application development tools that bring semantic understanding to the table. But then maybe I’m quibbling over the definition of “semantics.”

Development tools (Xcode/ObjectiveC being my most current knowledge) that I’m familiar with are equally happy with:

a = b * c or

weeklyPay = hoursWorked * payRate.

If an Eclipse plug-in has brought something more robust to the table, please point me in the right direction.

– David

By: Curt Monash

Curt Monash — Thu, 17 Jan 2008 20:23:59 +0000

What I’m saying, David, is that a specialized tool is needed. It’s not realistic to ask one search product to find EVERYTHING.

General search products don’t even work well across the full range I think they should cover. And the specific problem you’re referring to falls outside that range.

Configuration management and app dev tools have ever more understanding of software’s syntax and semantics. I’d start from them as a base, rather than from traditional inverted-file text-string indexing.

CAM