WordType Designs
Driven To Distractions©
The Sound of One Hand Clapping©


Archive Date
[ 17-02-2005 ]
Category
[ Information Technologies ]
Sub-Category
[ Computers ]

      [http://www.eweek.com/article2/0,1759,1764741,00.asp

      Database Legend: How Real-Time Data Analysis Will Transform Society
      By Lisa Vaas
      February 15, 2005

      Mike Stonebraker is a database superstar. Not only is the former UC/Berkeley computer science professor the father of the popular relational databases Ingres and Postgres, he was also the founder of Illustra Information Technologies Inc., acquired by Informix, which in turn was acquired by IBM.

      The next project for this database pioneer takes shape in the form of StreamBase Systems Inc., a company that's churning out software designed to process, analyze and act on real-time data "within milliseconds of its arrival." Stonebraker is StreamBase's founder and chief technology officer.

      StreamBase announced its Stream Processing Engine at the DEMO conference on Monday in Scottsdale, Ariz. eWEEK.com Database Editor Lisa Vaas recently got a chance to talk with Stonebraker about the issue of real-time data analysis, about how it leaves relational databases in its dust and, most importantly, how this cutting-edge technology is poised to transform our society. Financial services comes to mind, of course, but what really fires up Stonebraker are prospects like revolutionizing the care of emergency-room patients, the care of soldiers on the front lines or simply the ability to find your child when she's lost at Disney World.

      You've said that streaming data on the fly is something that ordinary relational databases can't handle. Why?
      Here's a quick, simple little problem. This was a pilot we were asked to do early on. [It was] a large mutual funds company. They subscribe to every feed on the planet, [including feeds such as Reuters]. They have a current application that watches each feed to determine if the data is late, so they can say, "Don't trust Reuters now, the feed is screwed up."

      They defined "late" as [when the] inter-arrival time of ticks between the same stocks is greater than a certain number. You see an IBM tick, and if you don't see another IBM tick in x seconds, it's an indication of late data.

      They wanted to issue an alarm if you saw a late tick. Then they wanted to say, "If you see 100 late ticks that are coming from the feed vendor, then ring the red telephone."
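      The feed-lateness check described above can be sketched in a few lines of C++. This is a minimal sketch, not StreamBase's or the fund company's code: the `Tick` record, the `LateFeedMonitor` class and its parameters are hypothetical, and it assumes lateness is detected when the next tick for the same symbol on the same feed arrives (a production system would likely also need a timer to catch a symbol that never ticks again).

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <unordered_map>

// Hypothetical tick record: which feed delivered it, the stock symbol,
// and its arrival time in seconds.
struct Tick {
    std::string feed;
    std::string symbol;
    double arrival;
};

class LateFeedMonitor {
public:
    // maxGapSeconds is the "x seconds" threshold; lateTicksToEscalate is 100 in the interview.
    LateFeedMonitor(double maxGapSeconds, int lateTicksToEscalate)
        : maxGap_(maxGapSeconds), escalateAt_(lateTicksToEscalate) {
        assert(maxGapSeconds > 0 && lateTicksToEscalate > 0);
    }

    // Processes one tick. Prints an alarm for a late tick and returns true
    // exactly once per feed: when that feed's late-tick count reaches the threshold.
    bool onTick(const Tick& t) {
        const std::string key = t.feed + "|" + t.symbol;  // gaps are tracked per feed and symbol
        bool escalate = false;
        auto it = lastArrival_.find(key);
        if (it != lastArrival_.end() && t.arrival - it->second > maxGap_) {
            int& late = lateTicks_[t.feed];
            ++late;
            std::cout << "alarm: late " << t.symbol << " tick from " << t.feed << '\n';
            escalate = (late == escalateAt_);  // "ring the red telephone"
        }
        lastArrival_[key] = t.arrival;
        return escalate;
    }

    int lateCount(const std::string& feed) const {
        auto found = lateTicks_.find(feed);
        return found == lateTicks_.end() ? 0 : found->second;
    }

private:
    double maxGap_;
    int escalateAt_;
    std::unordered_map<std::string, double> lastArrival_;  // last arrival per feed|symbol
    std::unordered_map<std::string, int> lateTicks_;       // late ticks per feed
};
```

      With `LateFeedMonitor monitor(x, 100);`, each call to `onTick` raises a per-tick alarm on a late arrival and returns `true` once a feed has accumulated 100 late ticks, the point at which the red telephone would ring.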

      The current application is written on top of bare metal in C++. They were unhappy with the performance of the current application, and it was hard to maintain. And expensive.

      On this application, they said, "How fast can you go?" We processed about 150,000 messages per second on this, on a $1,500 PC, a commodity piece of hardware. Their current production application does about 3,000 messages per second. The best we could get out of one of the very popular relational databases was 900 messages per second.

      Elephants store data
      In round numbers, we're two orders of magnitude faster than the elephants. And the two orders of magnitude are on identical hardware. If you normalize for the clock speed of their production application's hardware vs. ours, we're one order of magnitude faster.
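      Checking the round numbers against the throughput figures quoted in the previous answer (the clock-speed normalization isn't given, so that step can't be reproduced here):

$$\frac{150{,}000}{900} \approx 167 \approx 10^{2.2} \qquad \frac{150{,}000}{3{,}000} = 50 \approx 10^{1.7}$$

      The first ratio is the gap to the relational database on identical hardware; the second is the raw gap to the existing C++ production application.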

      What accounts for this speed gain?
      There are three big reasons: One, the elephants store the data. There's no need to store the data. One of the characteristics of real-time streaming data is that it's like IT sushi. It has high value right now, and the value decays very quickly. There's no need to keep the data around for the long term in some sort of repository. Doing that just takes up time, latency and resources.

      Reason No. 2: When you're looking for the inter-arrival time between ticks, that's a time-series notion. When you're doing real-time stream processing, we have time-oriented primitives in the bottom of the screen. … We have extended SQL to something we call StreamSQL, which has extra stuff in it. … We've had to add another notion to SQL, the notion of time windows. You can do SQL-like calculations over time windows. Do them in real time as data is flying by. …
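      The interview doesn't show StreamSQL syntax, so rather than guess at it, here is a plain C++ sketch of the idea behind a time window: an aggregate (here, an average) computed only over values that arrived within the last N seconds and updated incrementally as each value flies by. The `TimeWindowAverage` class is hypothetical, is not StreamBase's implementation, and assumes timestamps arrive in non-decreasing order.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>

// Incrementally maintained average over a sliding time window of fixed width.
class TimeWindowAverage {
public:
    explicit TimeWindowAverage(double widthSeconds) : width_(widthSeconds) {
        assert(widthSeconds > 0);
    }

    // Adds a value observed at time t (seconds, assumed non-decreasing) and
    // returns the average of all values with timestamps in (t - width, t].
    double add(double t, double value) {
        window_.emplace_back(t, value);
        sum_ += value;
        // Evict values that have slid out of the window.
        while (!window_.empty() && window_.front().first <= t - width_) {
            sum_ -= window_.front().second;
            window_.pop_front();
        }
        return sum_ / static_cast<double>(window_.size());
    }

    std::size_t size() const { return window_.size(); }

private:
    double width_;
    double sum_ = 0.0;
    std::deque<std::pair<double, double>> window_;  // (timestamp, value)
};
```

      Calling `add` once per arriving tick yields a rolling N-second average without keeping any of the stream beyond the current window, in line with the point that there's no need to store the data.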

      [Finally,] if you want to count to 100, which is what this [application] had to do in order to decide to ring the red phone, the most efficient way to do that is with four lines of C++. In this application, it makes sense to mix small amounts of code in a general-purpose environment with database-oriented processing steps. We can do that in our architecture: freely intermix C++ with our StreamSQL primitives. The relational guys all run client/server, and C++ code has to run in the client in a separate place from the server. So the client/server architecture slows you down on this style of application.

      What types of enterprises need this type of fast analysis?
      Financial services, industrial process control, monitoring oil refineries, the government. Military and homeland security are full of this style of application. We've been talking to one of the three-letter agencies, the guys who won't give you their business cards. They're monitoring Arabic chatter. When the czar of homeland security says, "The chatter has changed," there's a real-time system processing incoming feeds, computing statistics on incoming Arabic language streams, to actually determine that. They started yakking with us on piloting that application.

      Another example: network monitoring, for DoS [denial of service] attacks. Fraud detection.

      Financial firms seek to thwart identity theft.
      Another very large financial services company is exploring piloting another application with us. They're terrified the really bad guys, who do credit card fraud and identity theft, will target financial services. This company wants to monitor their worldwide network and watch application-level events. For example, they want to watch every log-in to their systems and watch for suspicious events such as the same user logged in more than once from two IP addresses more than a mile apart.
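      A minimal sketch of the log-in check, under loud assumptions: the `Login` record, the `ConcurrentLoginWatch` class and its one-mile threshold parameter are hypothetical, and each IP address is assumed to have already been geolocated to a latitude and longitude by some upstream service. Distance uses the haversine great-circle formula.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical log-in event. Assumes the IP address has already been geolocated
// upstream to a latitude/longitude in degrees.
struct Login {
    std::string user;
    std::string ip;
    double lat;
    double lon;
};

constexpr double kPi = 3.14159265358979323846;
constexpr double kEarthRadiusMiles = 3958.8;  // mean Earth radius

double toRadians(double deg) { return deg * kPi / 180.0; }

// Great-circle distance in miles via the haversine formula.
double milesBetween(double lat1, double lon1, double lat2, double lon2) {
    double dLat = toRadians(lat2 - lat1);
    double dLon = toRadians(lon2 - lon1);
    double a = std::sin(dLat / 2) * std::sin(dLat / 2) +
               std::cos(toRadians(lat1)) * std::cos(toRadians(lat2)) *
                   std::sin(dLon / 2) * std::sin(dLon / 2);
    // Clamp guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * kEarthRadiusMiles * std::asin(std::min(1.0, std::sqrt(a)));
}

class ConcurrentLoginWatch {
public:
    explicit ConcurrentLoginWatch(double maxMiles = 1.0) : maxMiles_(maxMiles) {
        assert(maxMiles > 0);
    }

    // Records a log-in. Returns true if the same user already has an active
    // session from a different IP address more than maxMiles away.
    bool onLogin(const Login& login) {
        std::vector<Login>& sessions = active_[login.user];
        bool suspicious = false;
        for (const Login& s : sessions) {
            if (s.ip != login.ip &&
                milesBetween(s.lat, s.lon, login.lat, login.lon) > maxMiles_) {
                suspicious = true;
            }
        }
        sessions.push_back(login);
        return suspicious;
    }

    // Ends every active session the user has from the given IP address.
    void onLogout(const std::string& user, const std::string& ip) {
        auto it = active_.find(user);
        if (it == active_.end()) return;
        std::vector<Login>& sessions = it->second;
        sessions.erase(std::remove_if(sessions.begin(), sessions.end(),
                                      [&ip](const Login& s) { return s.ip == ip; }),
                       sessions.end());
    }

    std::size_t activeSessions(const std::string& user) const {
        auto it = active_.find(user);
        return it == active_.end() ? 0 : it->second.size();
    }

private:
    double maxMiles_;
    std::unordered_map<std::string, std::vector<Login>> active_;  // open sessions per user
};
```

      Keeping only currently open sessions per user bounds the state to what is live right now, rather than storing the full log-in history.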

      RFID [radio frequency ID] must pose big opportunities for this type of real-time data analysis, right?
      What's coming is a microsensor revolution. The cost of microsensors is being driven down at a vast rate. … One of my favorite applications: I have kids, I've taken them to Disneyland and Disney World. It's a stressful situation. It's a crowded place, and you don't want to lose your kids, and it's awfully easy to lose them. The paper wristband you wear will turn into an electronic tag, and that will allow parents to dock at a kiosk so you can say, "Exactly where are my kids, so I can go get them?"

      Another example: Mass General Hospital in Boston is very interested in getting hospital personnel to wear electronic tags. Now, if there's a code blue, they issue a global alarm, and everybody lines up at the door of the person who has the emergency. If they knew where everybody was, they could dispatch the right person more efficiently.

      The military is very interested in tagging all soldiers and all vehicles [so they can] monitor medical vital signs in real time.

      There will be incredible social good from medical monitoring that will be possible from wireless technology downstream of cheap microprocessing technology.

      The current database vendors are all selling one-size-fits-all, with a single engine being good for everything. I think at least in streaming data it isn't true, since there's just a huge performance problem with the one-size-fits-all model. … The one-size-fits-all paradigm is getting stretched. It will be interesting to see how it unfolds in the next few years.

      Copyright © 1996-2005 Ziff Davis Publishing Holdings Inc. All Rights Reserved. eWEEK and Spencer F. Katt are trademarks of Ziff Davis Publishing Holdings, Inc. Reproduction in whole or in part in any form or medium without express written permission of Ziff Davis Media Inc. is prohibited.]





Copyright and Fair Use Information: The contents of this web site are protected by international copyright laws and may not be reproduced in any form or manner whatsoever, if for the purpose of resale or solicitation of a donation. The essays included here may be reproduced only if: 1) they are not altered in any way; 2) they are accompanied by this copyright page; and 3) they are given freely and without charge.
Fair use: The fair use of copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified in above sections, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is fair use, the factors to be considered include: (1) the purpose and character of the use, including whether the use is of a commercial nature or is for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market value of the copyrighted work.

Copyright © 2025. All Rights Reserved
HAG122125 (1998 -2026)