Big Data – Doing the Research

Nope, I’m not an expert here, but I keep seeing the phrase “big data” thrown around (very much like the term “cloud”), and these terms are often left open to interpretation. My advice is to spend a few minutes researching them (especially if you’re a decision maker listening to market hype).
The most succinct definition of “big data” IMHO is simply this: large datasets that require substantial resources to conduct ETL/simulation/analysis in a timely fashion. That’s it.

What I think people (including myself) can get hung up on are the ancillary associations with big data, such as business intelligence (BI), storage architecture/availability, and parallelism, among others. All of these will factor into purchase decisions. However, these ancillary associations should not be mixed in with the basic definition of “big data”.

For example, the decision to purchase a BI solution depends on how much pre-processed information (structure) you end up with at the end of your ETL process or simulation. Lower-end BI solutions may be perfectly fine in this case. Disk performance versus cost is not part of the “big data” definition as such, but it remains a substantial factor in evolving a workable solution (e.g., comparing IO costs between solid-state disks and magnetic platters). This next link is an interesting presentation related to the pharmaceutical industry on the choice of database software. Highly recommended, even if you may not agree with everything said.

 

link: www.youtube.com/watch?v=QQdbTpvjITM

Another way I see “big data” is simply as a question of how much you are willing to spend to reach your goal. It’s old-school thinking, but it fits nicely here. The difference now is that the allied technologies that handle “big data” have only recently evolved into more cost-friendly solutions.

Also see http://www.greenplum.com/products/greenplum-database

Still confused? This is a fun (and basic) video that talks about what big data is about:

Big data purchase decisions may include solutions for processing massive amounts of “social” data. Examples:

Greenplum: http://gigaom.com/cloud/emc-greenplum-puts-a-social-spin-on-big-data/

Storm: http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html

Background Information: http://bigdataintegration.blogspot.com/2012/01/how-would-your-enterprises-social-graph.html

That’s it…Cheers!
