<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: 19 bullet points about the difference between enterprise and web search</title>
	<atom:link href="http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/</link>
	<description>Understanding technology ... in both senses of the phrase</description>
	<pubDate>Fri, 05 Sep 2008 13:46:20 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: 19 bullet points about the difference between enterprise and web search &#124; Text Technologies :: Kelvin Tan - Lucene Nutch Consulting</title>
		<link>http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-42889</link>
		<dc:creator>19 bullet points about the difference between enterprise and web search &#124; Text Technologies :: Kelvin Tan - Lucene Nutch Consulting</dc:creator>
		<pubDate>Tue, 17 Jun 2008 14:47:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-42889</guid>
		<description>[...] http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/ [...]</description>
		<content:encoded><![CDATA[<p>[...] <a href="http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/"  rel="nofollow">http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: How text search has evolved over the past 15 years &#124; Text Technologies</title>
		<link>http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-42729</link>
		<dc:creator>How text search has evolved over the past 15 years &#124; Text Technologies</dc:creator>
		<pubDate>Sun, 15 Jun 2008 07:26:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-42729</guid>
		<description>[...] “Looking at this list, you can see that the conceptual changes (breakthroughs?), with the exception of better phrase handling, are primarily focused around Web searches. When dealing with one-of-a-kind document collections behind the corporate firewall, many of these developments turn out not to add much to older approaches. So, at least for enterprise search, I too remain partial to some of the older products you mention, though I am disappointed that most of the old-time vendors have not updated their approaches beyond adding taxonomy support.” [CAM] Yep, web search and enterprise search are very different things. [...]</description>
		<content:encoded><![CDATA[<p>[...] “Looking at this list, you can see that the conceptual changes (breakthroughs?), with the exception of better phrase handling, are primarily focused around Web searches. When dealing with one-of-a-kind document collections behind the corporate firewall, many of these developments turn out not to add much to older approaches. So, at least for enterprise search, I too remain partial to some of the older products you mention, though I am disappointed that most of the old-time vendors have not updated their approaches beyond adding taxonomy support.” [CAM] Yep, web search and enterprise search are very different things. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: DBMS2 &#8212; DataBase Management System Services &#187; Blog Archive &#187; The 4 main approaches to datatype extensibility</title>
		<link>http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-38421</link>
		<dc:creator>DBMS2 &#8212; DataBase Management System Services &#187; Blog Archive &#187; The 4 main approaches to datatype extensibility</dc:creator>
		<pubDate>Fri, 25 Apr 2008 04:10:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-38421</guid>
		<description>[...] Text search is a huge business on the web, and a separate big business in enterprises. And text doesn&#8217;t fit well into the relational paradigm at [...]</description>
		<content:encoded><![CDATA[<p>[...] Text search is a huge business on the web, and a separate big business in enterprises. And text doesn&#8217;t fit well into the relational paradigm at [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Spectate Swamp</title>
		<link>http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-35731</link>
		<dc:creator>Spectate Swamp</dc:creator>
		<pubDate>Fri, 07 Mar 2008 23:27:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-35731</guid>
		<description>I use my Desktop Search to search source code at a Telephone billing software company.

It is a non-indexing search. The first step is to "Merge/Append" all the source code into
1 file. Then search that file. When the files are merged, a start and stop header is put
in the merged file. When a match is found the originating file name is displayed in the
form title bar. It searches text at 20,000,000 cps. Any system worth its salt can export
data to text. I have all my emails since 1996 in large text files. I can even use the
search to extract lists of email addresses. 

The search has evolved to randomly play mpg video and mp3 audio as well as pictures.

I have been arguing search with everybody on the net, for years now.

http://channel9.msdn.com/showuserthreads.aspx?userid=31672

http://forums.thedailywtf.com/forums/t/7593.aspx</description>
		<content:encoded><![CDATA[<p>I use my Desktop Search to search source code at a Telephone billing software company.</p>
<p>It is a non-indexing search. The first step is to &#8220;Merge/Append&#8221; all the source code into<br />
1 file. Then search that file. When the files are merged, a start and stop header is put<br />
in the merged file. When a match is found the originating file name is displayed in the<br />
form title bar. It searches text at 20,000,000 cps. Any system worth its salt can export<br />
data to text. I have all my emails since 1996 in large text files. I can even use the<br />
search to extract lists of email addresses. </p>
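<p>The merge-then-scan idea described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Desktop Search program: the header strings and the function names <code>merge_files</code> and <code>search_merged</code> are assumptions made up for the example.</p>

```python
# Hypothetical header markers; the real tool's header format is not given.
START = "=== START "
STOP = "=== STOP "


def merge_files(paths, merged_path):
    """Append every source file into one file, wrapped in start/stop header lines."""
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(START + path + "\n")
            with open(path, encoding="utf-8", errors="replace") as src:
                for line in src:
                    # Normalise line endings so each header sits on its own line.
                    out.write(line.rstrip("\n") + "\n")
            out.write(STOP + path + "\n")


def search_merged(merged_path, needle):
    """Scan the merged file linearly and return (originating file, line) pairs."""
    hits = []
    current = None  # file named by the most recent start header
    with open(merged_path, encoding="utf-8") as merged:
        for line in merged:
            text = line.rstrip("\n")
            if text.startswith(START):
                current = text[len(START):]
            elif text.startswith(STOP):
                current = None
            elif needle in text:
                hits.append((current, text))
    return hits
```

<p>A source line that happens to begin with one of the header strings would confuse this sketch; a real tool would need a marker that cannot occur in the data.</p>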
<p>The search has evolved to randomly play mpg video and mp3 audio as well as pictures.</p>
<p>I have been arguing search with everybody on the net, for years now.</p>
<p><a href="http://channel9.msdn.com/showuserthreads.aspx?userid=31672" onclick="javascript:pageTracker._trackPageview('/outbound/comment/channel9.msdn.com');" rel="nofollow">http://channel9.msdn.com/showuserthreads.aspx?userid=31672</a></p>
<p><a href="http://forums.thedailywtf.com/forums/t/7593.aspx" onclick="javascript:pageTracker._trackPageview('/outbound/comment/forums.thedailywtf.com');" rel="nofollow">http://forums.thedailywtf.com/forums/t/7593.aspx</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Eddy</title>
		<link>http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-31313</link>
		<dc:creator>David Eddy</dc:creator>
		<pubDate>Sat, 19 Jan 2008 03:08:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-31313</guid>
		<description>Curt -

That's precisely why I'm interested in enterprise search.   

I was a Y2K inventory/impact analysis tool vendor, getting into the market in late 1994.  We had a tool that explicitly handled "odd" languages (beyond the biggies of COBOL &#38; PL/1)... things like EasyTrieve, EasyTrieve Plus (they're not related), Natural and others long forgotten.  It was a hoot chasing down folks who had these bizarre languages I'd never heard of before.  Ever heard of Extracto?  

I knew we were onto something interesting/challenging when Capers Jones sent me his Function Points languages list in 1995.  There were 400+ software languages on the list.  By 2005 that list had expanded to 650 before being "pruned" back to a more manageable 500.

To the best of my knowledge none of the Y2K inventory/impact analysis tools have survived.  I know ours didn't (I know of a single surviving site).  

We got to 1/1/00.  The world didn't end.  The tools &#38; systems inventory knowledge went into the bit-bucket.  End of story.  

It's my belief that most "civilians" see Y2K as a giant techie hoax.  I'm sure a lot of IT departments did not cover themselves in glory in the eyes of business executives for heavily porking up IT budgets under the dodge of "we need it for Y2K."

The business value of actively maintaining a complete, accurate &#38; edge-to-edge inventory of an organization's applications portfolio (with the additional benefit of being able to trace how the pieces are interrelated) is a very hard sell.  

The high-school dropout running the local 7-11 knows how many candy bars &#38; jugs of milk he has on hand (inventory).  Why doesn't IT keep an inventory?  There was a news item last year about EDS doing an outsourcing contract for the US Navy.  They went in believing there were 5,000 systems.  EDS ultimately found 100,000+.



What is different now is that we have the delight of Google... which means people now want to have the same ease-of-use access to knowledge/answers/information behind the firewall.  The fact that serious analysts clearly emphasize that Google &#38; enterprise search are not even remotely comparable problems just falls on the floor as useless noise.

Thanks for being interested.

- David</description>
		<content:encoded><![CDATA[<p>Curt -</p>
<p>That&#8217;s precisely why I&#8217;m interested in enterprise search.   </p>
<p>I was a Y2K inventory/impact analysis tool vendor, getting into the market in late 1994.  We had a tool that explicitly handled &#8220;odd&#8221; languages (beyond the biggies of COBOL &amp; PL/1)&#8230; things like EasyTrieve, EasyTrieve Plus (they&#8217;re not related), Natural and others long forgotten.  It was a hoot chasing down folks who had these bizarre languages I&#8217;d never heard of before.  Ever heard of Extracto?  </p>
<p>I knew we were onto something interesting/challenging when Capers Jones sent me his Function Points languages list in 1995.  There were 400+ software languages on the list.  By 2005 that list had expanded to 650 before being &#8220;pruned&#8221; back to a more manageable 500.</p>
<p>To the best of my knowledge none of the Y2K inventory/impact analysis tools have survived.  I know ours didn&#8217;t (I know of a single surviving site).  </p>
<p>We got to 1/1/00.  The world didn&#8217;t end.  The tools &amp; systems inventory knowledge went into the bit-bucket.  End of story.  </p>
<p>It&#8217;s my belief that most &#8220;civilians&#8221; see Y2K as a giant techie hoax.  I&#8217;m sure a lot of IT departments did not cover themselves in glory in the eyes of business executives for heavily porking up IT budgets under the dodge of &#8220;we need it for Y2K.&#8221;</p>
<p>The business value of actively maintaining a complete, accurate &amp; edge-to-edge inventory of an organization&#8217;s applications portfolio (with the additional benefit of being able to trace how the pieces are interrelated) is a very hard sell.  </p>
<p>The high-school dropout running the local 7-11 knows how many candy bars &amp; jugs of milk he has on hand (inventory).  Why doesn&#8217;t IT keep an inventory?  There was a news item last year about EDS doing an outsourcing contract for the US Navy.  They went in believing there were 5,000 systems.  EDS ultimately found 100,000+.</p>
<p>What is different now is that we have the delight of Google&#8230; which means people now want to have the same ease-of-use access to knowledge/answers/information behind the firewall.  The fact that serious analysts clearly emphasize that Google &amp; enterprise search are not even remotely comparable problems just falls on the floor as useless noise.</p>
<p>Thanks for being interested.</p>
<p>- David</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-31303</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Sat, 19 Jan 2008 00:54:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-31303</guid>
		<description>David,

Why don't you take a look at the tools that purported to automate the finding (if not fixing) of Y2K 2-character data fields?  That was, er, 8+ years ago, so they've had a lot of time to evolve since then.

CAM</description>
		<content:encoded><![CDATA[<p>David,</p>
<p>Why don&#8217;t you take a look at the tools that purported to automate the finding (if not fixing) of Y2K 2-character data fields?  That was, er, 8+ years ago, so they&#8217;ve had a lot of time to evolve since then.</p>
<p>CAM</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Eddy</title>
		<link>http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-31250</link>
		<dc:creator>David Eddy</dc:creator>
		<pubDate>Fri, 18 Jan 2008 04:51:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-31250</guid>
		<description>Curt -

&#62;
&#62; a specialized tool is needed. It’s not realistic to ask one search product to find EVERYTHING.
&#62;

Again, agreed.  

The ultimate enterprise search tool is obviously going to need a passel of highly specialized tools under the covers.  Making it look as easy &#38; slick as Google is going to be interesting.  A major challenge, of course, is that most enterprises have only the foggiest idea of their applications inventory.

First, source code has to be considered a valuable &#38; important BUSINESS search resource before we start thinking about what sort of exotic tools are needed.

It is my argument (clearly a voice of one) that the corporate knowledge buried in source code needs to be recognized as worthwhile to mine... rather than to leave it walking around in the heads of soon-to-retire experts.  Currently, changing systems is slow, expensive, manual work, far too often highly dependent upon domain experts.  It is my belief that enterprise search could be a significant help in whittling away at the "80% of my IT budget goes to legacy systems" problem.


Obviously (after a lot of non-obvious rat holes) you have to approach enterprise search with a "white list" approach... first pass you identify what it is you're trying to read (e.g. PowerPoint, MSWord, COBOL, dBase, etc.), second pass you process it with the appropriate reader.  If you can't identify what it is, then don't try to read more than a few lines.  Probably best not to rely on extensions (.exe, .doc) as gospel as to what the document truly is. 



I'm not aware of any application development tools that bring semantic understanding to the table.  But then maybe I'm quibbling over the definition of "semantics."  

Development tools (Xcode/ObjectiveC being my most current knowledge) that I'm familiar with are equally happy with:

a = b * c        or

weeklyPay = hoursWorked * payRate.


If an Eclipse plug-in has brought something more robust to the table, please point me in the right direction.

- David</description>
		<content:encoded><![CDATA[<p>Curt -</p>
<p>&gt;<br />
&gt; a specialized tool is needed. It’s not realistic to ask one search product to find EVERYTHING.<br />
&gt;</p>
<p>Again, agreed.  </p>
<p>The ultimate enterprise search tool is obviously going to need a passel of highly specialized tools under the covers.  Making it look as easy &amp; slick as Google is going to be interesting.  A major challenge, of course, is that most enterprises have only the foggiest idea of their applications inventory.</p>
<p>First, source code has to be considered a valuable &amp; important BUSINESS search resource before we start thinking about what sort of exotic tools are needed.</p>
<p>It is my argument (clearly a voice of one) that the corporate knowledge buried in source code needs to be recognized as worthwhile to mine&#8230; rather than to leave it walking around in the heads of soon-to-retire experts.  Currently, changing systems is slow, expensive, manual work, far too often highly dependent upon domain experts.  It is my belief that enterprise search could be a significant help in whittling away at the &#8220;80% of my IT budget goes to legacy systems&#8221; problem.</p>
<p>Obviously (after a lot of non-obvious rat holes) you have to approach enterprise search with a &#8220;white list&#8221; approach&#8230; first pass you identify what it is you&#8217;re trying to read (e.g. PowerPoint, MSWord, COBOL, dBase, etc.), second pass you process it with the appropriate reader.  If you can&#8217;t identify what it is, then don&#8217;t try to read more than a few lines.  Probably best not to rely on extensions (.exe, .doc) as gospel as to what the document truly is. </p>
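<p>The two-pass &#8220;white list&#8221; idea above might look something like the following Python sketch. It is an assumption-laden illustration rather than any product's method: the names <code>identify</code>, <code>process</code>, <code>SIGNATURES</code> and <code>READERS</code> are hypothetical, the readers are stubs, and the COBOL check is a crude text heuristic. The leading-byte signatures for PDF, ZIP-based containers and legacy Office (OLE2) files are the commonly documented ones.</p>

```python
# Pass 1 data: leading-byte signatures, checked instead of trusting extensions.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip-container"),  # ZIP; also the wrapper for .docx/.pptx
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc/.ppt
]


def identify(head):
    """First pass: classify a document from its first few bytes only."""
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    # Crude heuristic for COBOL source (an assumption for this example).
    if b"IDENTIFICATION DIVISION" in head.upper():
        return "cobol"
    return None  # unknown: not on the white list


# Pass 2 data: hypothetical stub readers, one per recognised type.
READERS = {
    "pdf": lambda data: f"pdf:{len(data)}",
    "zip-container": lambda data: f"zip-container:{len(data)}",
    "ole2": lambda data: f"ole2:{len(data)}",
    "cobol": lambda data: f"cobol:{len(data)}",
}


def process(data, head_size=512):
    """Second pass: hand the document to its reader, or skip it if unidentified."""
    kind = identify(data[:head_size])
    reader = READERS.get(kind)
    if reader is None:
        return None  # nothing beyond the sniffed prefix is examined
    return reader(data)
```

<p>Under this sketch a Windows executable renamed to .doc still starts with the bytes MZ, matches nothing on the list, and is skipped.</p>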
<p>I&#8217;m not aware of any application development tools that bring semantic understanding to the table.  But then maybe I&#8217;m quibbling over the definition of &#8220;semantics.&#8221;  </p>
<p>Development tools (Xcode/ObjectiveC being my most current knowledge) that I&#8217;m familiar with are equally happy with:</p>
<p>a = b * c        or</p>
<p>weeklyPay = hoursWorked * payRate.</p>
<p>If an Eclipse plug-in has brought something more robust to the table, please point me in the right direction.</p>
<p>- David</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Curt Monash</title>
		<link>http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-31222</link>
		<dc:creator>Curt Monash</dc:creator>
		<pubDate>Thu, 17 Jan 2008 20:23:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-31222</guid>
		<description>What I'm saying, David, is that a specialized tool is needed.   It's not realistic to ask one search product to find EVERYTHING.

General search products don't even work well across the full range I think they should cover.  And the specific problem you're referring to falls outside that range.

Configuration management and app dev tools have ever more understanding of software's syntax and semantics.  I'd start from them as a base, rather than from traditional inverted-file text-string indexing.

CAM</description>
		<content:encoded><![CDATA[<p>What I&#8217;m saying, David, is that a specialized tool is needed.   It&#8217;s not realistic to ask one search product to find EVERYTHING.</p>
<p>General search products don&#8217;t even work well across the full range I think they should cover.  And the specific problem you&#8217;re referring to falls outside that range.</p>
<p>Configuration management and app dev tools have ever more understanding of software&#8217;s syntax and semantics.  I&#8217;d start from them as a base, rather than from traditional inverted-file text-string indexing.</p>
<p>CAM</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Eddy</title>
		<link>http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-31218</link>
		<dc:creator>David Eddy</dc:creator>
		<pubDate>Thu, 17 Jan 2008 19:59:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-31218</guid>
		<description>Curt -

&#62;
&#62; There’s very little overlap between how that’s done in human languages and how it’s done in computer languages.
&#62;

Acknowledged.

Obviously there's been a huge amount of work done with human languages—proximity, stemming, probabilities, etc.—that simply will not work when applied to software languages.

My point is that the rules of the business are deeply buried and highly fragmented in extremely difficult to comprehend software, not in scores of well written document formats (MSWord, email, PowerPoint, etc.) intended for human consumption.

Enterprise search is going about the problem by looking for the lost keys under the street light ("Because that's where the light is... but I lost the keys over by the car which is in the dark.") simply because it's easy.

My primary beef here is that software (source code) is simply not considered to be a document... and therefore is not worthy of being brought to the search table.

How are people going to know what's really happening "behind the firewall" if source code is not included in the searching process?  

When the CEO issues the command "I want social security number either encrypted or removed from use where not necessary." you're going to rely on what your easily findable word processing format documents tell you?  Surely you're not telling me that?

- David</description>
		<content:encoded><![CDATA[<p>Curt -</p>
<p>&gt;<br />
&gt; There’s very little overlap between how that’s done in human languages and how it’s done in computer languages.<br />
&gt;</p>
<p>Acknowledged.</p>
<p>Obviously there&#8217;s been a huge amount of work done with human languages—proximity, stemming, probabilities, etc.—that simply will not work when applied to software languages.</p>
<p>My point is that the rules of the business are deeply buried and highly fragmented in extremely difficult to comprehend software, not in scores of well written document formats (MSWord, email, PowerPoint, etc.) intended for human consumption.</p>
<p>Enterprise search is going about the problem by looking for the lost keys under the street light (&#8220;Because that&#8217;s where the light is&#8230; but I lost the keys over by the car which is in the dark.&#8221;) simply because it&#8217;s easy.</p>
<p>My primary beef here is that software (source code) is simply not considered to be a document&#8230; and therefore is not worthy of being brought to the search table.</p>
<p>How are people going to know what&#8217;s really happening &#8220;behind the firewall&#8221; if source code is not included in the searching process?  </p>
<p>When the CEO issues the command &#8220;I want social security number either encrypted or removed from use where not necessary.&#8221; you&#8217;re going to rely on what your easily findable word processing format documents tell you?  Surely you&#8217;re not telling me that?</p>
<p>- David</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Text Technologies&#187;Blog Archive &#187; Lynda Moulton on enterprise search</title>
		<link>http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-31214</link>
		<dc:creator>Text Technologies&#187;Blog Archive &#187; Lynda Moulton on enterprise search</dc:creator>
		<pubDate>Thu, 17 Jan 2008 18:11:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.texttechnologies.com/2008/01/14/enterprise-search-versus-web-search/#comment-31214</guid>
		<description>[...] search quite similarly, as I discovered when she called me yesterday to praise my post on the many differences between enterprise and web search, and followed up with this one of her own. One of Lynda&#8217;s big themes is that large [...]</description>
		<content:encoded><![CDATA[<p>[...] search quite similarly, as I discovered when she called me yesterday to praise my post on the many differences between enterprise and web search, and followed up with this one of her own. One of Lynda&#8217;s big themes is that large [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

