Friday, October 14, 2011

Big data has arrived.   At BIDMC, I oversee 1.5 petabytes of clinical and administrative data.   At HMS, I oversee nearly 3 petabytes of research data.

As Blackberry's recent outage illustrates, depending on a single monolithic infrastructure has its risks, and the impact of a failure can be enormous.

How can we leverage commodity hardware infrastructure, reduce risk, and meet user demands for mining big data?   Apache Hadoop is a cool technology worth knowing about.

Hadoop is an open source framework that allows for the distributed processing of large data sets across clusters of computers, designed to scale from a single server to thousands of machines.  Rather than rely on hardware to deliver high availability, Hadoop detects failures in software and automatically falls back to redundant copies of the data.  The Hadoop library includes:

*The Hadoop Distributed File System (HDFS), which splits user data into blocks and stores replicated copies of those blocks across servers in a cluster.

*MapReduce, a parallel distributed processing system that takes advantage of the distribution and replication of data in HDFS to spread execution of any job across many nodes in a cluster.
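To make those two pieces concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API and does not touch a real cluster; the function names (place_blocks, map_phase, run_word_count), the node names, the round-robin placement, and the tiny block size are all assumptions made for illustration. It splits records into blocks, assigns each block to more than one node, then runs a word count as map, shuffle, and reduce steps, reading each block from whichever replica is still alive.

```python
from collections import defaultdict


def place_blocks(records, nodes, block_size=2, replication=2):
    """Split records into blocks and assign each block to several nodes.

    Placement is simple round-robin (an illustrative assumption, not
    Hadoop's actual placement policy). Requires replication <= len(nodes)
    so that the replicas of a block land on distinct nodes.
    """
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    blocks = [records[i:i + block_size]
              for i in range(0, len(records), block_size)]
    placement = {
        block_id: [nodes[(block_id + r) % len(nodes)]
                   for r in range(replication)]
        for block_id in range(len(blocks))
    }
    return blocks, placement


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def run_word_count(blocks, placement, failed_nodes=frozenset()):
    """Run map, shuffle, and reduce over every block with a live replica."""
    grouped = defaultdict(list)
    for block_id, block in enumerate(blocks):
        live = [n for n in placement[block_id] if n not in failed_nodes]
        if not live:
            raise RuntimeError(f"block {block_id} has no live replica")
        # A real cluster would run the map task on one of the live nodes;
        # this sketch simply processes the block in-process.
        for line in block:
            for key, value in map_phase(line):
                grouped[key].append(value)  # shuffle: group values by key
    # Reduce step: sum the counts collected for each word.
    return {key: sum(values) for key, values in grouped.items()}
```

With three nodes and a replication factor of two, marking any single node as failed still yields the full word count, because every block has a second copy elsewhere. Only when all replicas of a block are lost does the job fail.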

Microsoft has just introduced support for Hadoop in SQL Server 2012 as part of their end-to-end Big Data roadmap.

A fault-tolerant distributed file system for big data that runs on commodity hardware and is even integrated into mainstream data mining tools like SQL Server.  That's cool!
