Hadoop

Among the things that interest me most about cloud computing are the new engineering challenges it presents. For instance, take my favorite thought experiment: persistent communications. Persistent communications means recording, storing and indexing all of a company's verbal communications. Its value is easier to see when you consider the not-so-hidden costs of managing and training call center support teams. If you could capture everything said by both the customer and the agent, you could analyze the types of support calls, recall solutions that were never properly documented, and so on. (I realize there would be big political pushback against this sort of activity, but it's a thought experiment.)

One of the first obvious reactions would be “Sounds like a lot of data to store,” and I would say, oh yeah. If you’re considering a company that does this for others in the cloud, then I would say “oh oh yeah.” Tons of data.
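To put a rough number on “tons of data,” here is a back-of-envelope sketch. Every input is a hypothetical assumption chosen only for illustration: a 500-agent call center, 6 talk-hours per agent per day, 250 working days a year, and audio stored as a single mono channel at 64 kbit/s (the G.711 telephone codec rate), with no further compression.

```python
# Back-of-envelope storage estimate for "persistent communications".
# All inputs below are hypothetical, for illustration only.

SECONDS_PER_HOUR = 3600
BITS_PER_BYTE = 8


def yearly_audio_bytes(agents: int, talk_hours_per_day: float,
                       days_per_year: int, bitrate_bps: int) -> float:
    """Bytes of recorded audio per year for a call center of a given size."""
    seconds = agents * talk_hours_per_day * SECONDS_PER_HOUR * days_per_year
    return seconds * bitrate_bps / BITS_PER_BYTE


# Hypothetical call center: 500 agents, 6 talk-hours/day, 250 days/year,
# one mono channel at 64 kbit/s (G.711 rate), stored uncompressed.
one_company = yearly_audio_bytes(500, 6, 250, 64_000)
print(f"one company: {one_company / 1e12:.1f} TB/year")

# Hypothetical cloud provider recording for 1,000 such companies.
print(f"provider: {one_company * 1000 / 1e15:.1f} PB/year")
```

Under these assumptions a single company produces about 21.6 TB of audio a year, and a provider serving a thousand of them about 21.6 PB, before counting transcripts or search indexes.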

That’s when I start thinking about Hadoop. Hadoop is an open-source Apache Software Foundation project that lets applications store and process vast amounts of data across clusters of commodity machines. It’s scalable, economical, efficient and reliable. Recently, Dave Rosenberg wrote about Hadoop breaking data-sorting world records:

Hadoop is the only open-source software to ever win the GraySort competition, adding another notch to last year’s win at the Terasort competition, where Hadoop sorted 1 terabyte of data in 209 seconds. That beat the previous record of 297 seconds in the terabyte sort benchmark.

Within the rules for the 2009 Gray sort, our 500 GB sort set a new record for the minute sort and the 100 TB sort set a new record of 0.578 TB/minute. The 1 PB sort ran after the 2009 deadline, but improves the speed to 1.03 TB/minute. The 62 second terabyte sort would have set a new record, but the terabyte benchmark that we won last year has been retired.
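The quoted rates are easier to appreciate as wall-clock times. The sketch below simply divides data size by the quoted throughput; it assumes decimal units (1 PB = 1000 TB), which the quote doesn't specify, and because the quoted rates are rounded, the resulting times are approximate.

```python
# Convert the quoted sort rates into implied wall-clock times.
# Assumption: decimal units (1 PB = 1000 TB); the quoted rates are rounded.


def minutes_to_sort(data_tb: float, rate_tb_per_min: float) -> float:
    """Implied sort time in minutes for data_tb terabytes at a given rate."""
    return data_tb / rate_tb_per_min


for label, data_tb, rate in [("100 TB", 100, 0.578), ("1 PB", 1000, 1.03)]:
    minutes = minutes_to_sort(data_tb, rate)
    print(f"{label}: ~{minutes:.0f} min (~{minutes / 60:.1f} h)")

# Terabyte sort: 297 s (prior record) -> 209 s (Hadoop) -> 62 s (2009 run).
print(f"209 s vs. 297 s: {297 / 209:.2f}x faster")
print(f"62 s vs. 209 s: {209 / 62:.2f}x faster")
```

By this arithmetic the 100 TB sort took roughly 173 minutes (about 2.9 hours), the 1 PB sort roughly 971 minutes (about 16.2 hours), and the 62-second terabyte sort was about 3.4 times faster than Hadoop's own 209-second record.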

Is this a solution that could have come from closed-source, proprietary vendors? Maybe, but now that Hadoop is both open and a world-record holder, proprietary providers seem to have something to worry about.
