<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Petabyte-scale search scalability</title>
	<atom:link href="http://www.texttechnologies.com/2006/08/02/petabyte-scale-search-scalability/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.texttechnologies.com/2006/08/02/petabyte-scale-search-scalability/</link>
	<description>Understanding technology ... in both senses of the phrase</description>
	<pubDate>Mon, 13 Oct 2008 19:37:05 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: DBMS2 &#8212; DataBase Management System Services &#187; Blog Archive &#187; Teradata apparently has crossed the petabyte barrier</title>
		<link>http://www.texttechnologies.com/2006/08/02/petabyte-scale-search-scalability/#comment-38417</link>
		<dc:creator>DBMS2 &#8212; DataBase Management System Services &#187; Blog Archive &#187; Teradata apparently has crossed the petabyte barrier</dc:creator>
		<pubDate>Fri, 25 Apr 2008 04:08:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.texttechnologies.com/2006/08/02/petabyte-scale-search-scalability/#comment-38417</guid>
		<description>[...] of user data in a single instance. He wouldn&#8217;t disclose any names, but I&#8217;d guess one is eBay, who he did confirm is a customer. The intelligence area is another one where I&#8217;d speculate [...]</description>
		<content:encoded><![CDATA[<p>[...] of user data in a single instance. He wouldn&#8217;t disclose any names, but I&#8217;d guess one is eBay, who he did confirm is a customer. The intelligence area is another one where I&#8217;d speculate [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: xlrdbms</title>
		<link>http://www.texttechnologies.com/2006/08/02/petabyte-scale-search-scalability/#comment-30494</link>
		<dc:creator>xlrdbms</dc:creator>
		<pubDate>Sat, 05 Jan 2008 05:27:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.texttechnologies.com/2006/08/02/petabyte-scale-search-scalability/#comment-30494</guid>
		<description>With Web 2.0 companies capturing 10-50 TB of data a day (growing exponentially), petabytes aren't all that much after all. It's all about value density coupled with mixed workload. At a recent XLDB workshop at Stanford it has become very clear that the designs currently in the works are for 10-100PB in a SINGLE rdbms to be deployed over the next 2-3 years. The 1-2PB systems are there right now; growing by 1-2 orders of magnitude will require new concepts and potentially the marriage of multiple technologies. It's not going to be Vendor A vs Vendor B, but most likely Vendor X AND Vendor Y. Which brings up the point of consolidation in this industry. Now that the BI folks are consolidating, it will only be natural to see rdbms vendors do the same.
There is a need for 100PB+ solutions out there, but the current mindset of vendors is either too immature or too greedy to make that a reality.</description>
		<content:encoded><![CDATA[<p>With Web 2.0 companies capturing 10-50 TB of data a day (growing exponentially), petabytes aren&#8217;t all that much after all. It&#8217;s all about value density coupled with mixed workload. At a recent XLDB workshop at Stanford it has become very clear that the designs currently in the works are for 10-100PB in a SINGLE rdbms to be deployed over the next 2-3 years. The 1-2PB systems are there right now; growing by 1-2 orders of magnitude will require new concepts and potentially the marriage of multiple technologies. It&#8217;s not going to be Vendor A vs Vendor B, but most likely Vendor X AND Vendor Y. Which brings up the point of consolidation in this industry. Now that the BI folks are consolidating, it will only be natural to see rdbms vendors do the same.<br />
There is a need for 100PB+ solutions out there, but the current mindset of vendors is either too immature or too greedy to make that a reality.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: DBMS2 &#8212; DataBase Management System Services&#187;Blog Archive &#187; Really big databases</title>
		<link>http://www.texttechnologies.com/2006/08/02/petabyte-scale-search-scalability/#comment-4783</link>
		<dc:creator>DBMS2 &#8212; DataBase Management System Services&#187;Blog Archive &#187; Really big databases</dc:creator>
		<pubDate>Fri, 23 Feb 2007 05:04:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.texttechnologies.com/2006/08/02/petabyte-scale-search-scalability/#comment-4783</guid>
		<description>[...] Business Intelligence Lowdown has a well-dugg post listing what it claims are the 10 largest databases in the world. The accuracy leaves much to be desired, as is illustrated by the fact that #10 on the list is only 20 terabytes, while entirely unmentioned is eBay&#8217;s 2-petabyte database (mentioned here, and also here). Only one phone company was listed, and no credit-raters. And for some databases listed, the size given seemed too low. E.g., for Google I&#8217;d guess that the average page size it indexes is 10K+ (vs. the 100ish max), even with all the junk stuff in there, so it&#8217;s in the 100s of terabytes at a minimum, for raw data alone before considering indexes and so on. [...]</description>
		<content:encoded><![CDATA[<p>[...] Business Intelligence Lowdown has a well-dugg post listing what it claims are the 10 largest databases in the world. The accuracy leaves much to be desired, as is illustrated by the fact that #10 on the list is only 20 terabytes, while entirely unmentioned is eBay&#8217;s 2-petabyte database (mentioned here, and also here). Only one phone company was listed, and no credit-raters. And for some databases listed, the size given seemed too low. E.g., for Google I&#8217;d guess that the average page size it indexes is 10K+ (vs. the 100ish max), even with all the junk stuff in there, so it&#8217;s in the 100s of terabytes at a minimum, for raw data alone before considering indexes and so on. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Text Technologies&#187;Blog Archive &#187; Introduction to FAST</title>
		<link>http://www.texttechnologies.com/2006/08/02/petabyte-scale-search-scalability/#comment-1684</link>
		<dc:creator>Text Technologies&#187;Blog Archive &#187; Introduction to FAST</dc:creator>
		<pubDate>Thu, 03 Aug 2006 02:16:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.texttechnologies.com/2006/08/02/petabyte-scale-search-scalability/#comment-1684</guid>
		<description>[...] FAST, aka Fast Search &#38; Transfer (www.fastsearch.com) is a pretty interesting and important company. They have 3500 enterprise customers, a rapidly growing $100 million revenue run rate, and a quarter billion dollars in the bank. Their core business is of course enterprise search, where they boast great scalability, based on a Google-like grid architecture, which they fondly think is actually more efficient than Google’s. Beyond that, they’ve verticalized search, exploiting the modularity of their product line to better serve a variety of niche markets. And they’re active in elementary fact/entity extraction as well. Oh yes – they also have forms of guided navigation, taxonomy-awareness, and probably everything else one might think of as a checkmark item for a search or search-like product. [...]</description>
		<content:encoded><![CDATA[<p>[...] FAST, aka Fast Search &#38; Transfer (www.fastsearch.com) is a pretty interesting and important company. They have 3500 enterprise customers, a rapidly growing $100 million revenue run rate, and a quarter billion dollars in the bank. Their core business is of course enterprise search, where they boast great scalability, based on a Google-like grid architecture, which they fondly think is actually more efficient than Google’s. Beyond that, they’ve verticalized search, exploiting the modularity of their product line to better serve a variety of niche markets. And they’re active in elementary fact/entity extraction as well. Oh yes – they also have forms of guided navigation, taxonomy-awareness, and probably everything else one might think of as a checkmark item for a search or search-like product. [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

