

Escalating Hype About Big Data By @Nimbix | @CloudExpo

Imagine the insights we'll be able to gain about society, humanity & the universe thanks to the increasing volume of cloud data

Three Shades of Data in the Public Cloud

There is continued and escalating hype about "Big Data" and the cloud services that empower big data analytics. To understand and demystify this hype, whether the data in question is Big Data or traditional, structured or unstructured, it helps to categorize cloud data into three primary forms.

Aggregate Cloud Data

The aggregate form of cloud data, whether public, hybrid, or private, is especially popular with multi-site organizations. Data collected in different locations needs to be processed at some point, and using the cloud to aggregate it in a common location makes sense for later processing or analytics.

Aggregate cloud data has been around since before cloud computing as we know it today. A very popular use case is organizations with branch offices, such as retail and financial firms, where branch-level processes upload transactions or other information on a regular basis. With the widespread availability of cloud data services, what used to happen periodically now happens closer to real time. Once the data is aggregated, higher-level reports and analytics, such as comparisons between branches or regions, become possible.
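As a rough illustration of this kind of branch roll-up, the Python sketch below sums uploaded transactions by branch and by region. The record fields (branch, region, amount) and the sample values are assumptions made up for the example, not data from any real system.

```python
from collections import defaultdict

# Hypothetical transaction records uploaded by branch offices.
# Field names and values are illustrative assumptions.
transactions = [
    {"branch": "Dallas-01", "region": "South", "amount": 120.0},
    {"branch": "Dallas-01", "region": "South", "amount": 80.5},
    {"branch": "Austin-02", "region": "South", "amount": 200.0},
    {"branch": "Boston-01", "region": "North", "amount": 50.25},
]


def totals_by(records, key):
    """Sum the 'amount' field of the records, grouped by the given key."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)


# Once everything sits in one place, cross-branch comparisons are simple.
branch_totals = totals_by(transactions, "branch")
region_totals = totals_by(transactions, "region")

for region, total in sorted(region_totals.items()):
    print(f"{region}: {total:.2f}")
```

The same grouping function serves both the branch-level and region-level views, which is the practical payoff of aggregating the data in a common location first.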

Basically, the cloud enables a more seamless migration of data. In the past, client/server models were popular. Today, cloud data storage is API-driven and in many cases widely distributed across geographies. This makes it possible for remote branches to store information with minimal latency, knowing that the cloud data services will seamlessly present a unified view of the storage from any other location.

Anonymized Cloud Data

In many cases, information may be too sensitive to store in the cloud. On the other hand, Big Compute in the public cloud is rapidly emerging as a way to process massive amounts of information without costly up-front capital and labor investments in infrastructure. An ideal hybrid cloud use case is to filter out sensitive information from data sets shortly after capture, then leverage the public cloud to perform the complex data analytics. For example, when healthcare organizations analyze terabytes' worth of medical data to identify healthcare patterns and predict susceptibility to disease, the patients' personal information (names, addresses, Social Security numbers, and so on) is not relevant to the analytics and can be "scrubbed" before pushing the anonymized set to secure cloud data storage. From there, High Performance Data Analysis in the public cloud can expedite the analytics.

Filtering as an early step in Big Data/Analytics workflows can also greatly reduce the amount of data to process.  This speeds both data transfer and computation, reducing costs along the way.
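The filtering step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the field names (name, address, ssn) and the sample records are hypothetical, and dropping direct identifiers is only the simplest form of anonymization, since the remaining fields can sometimes still identify a person in combination.

```python
import json

# Hypothetical field names treated as direct identifiers (an assumption).
SENSITIVE_FIELDS = {"name", "address", "ssn"}


def anonymize(record):
    """Return a copy of the record without the sensitive fields."""
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}


# Made-up sample records standing in for captured medical data.
records = [
    {"name": "Jane Doe", "address": "1 Main St", "ssn": "000-00-0000",
     "age": 54, "diagnosis": "hypertension"},
    {"name": "John Roe", "address": "2 Oak Ave", "ssn": "000-00-0001",
     "age": 61, "diagnosis": "diabetes"},
]

anonymized = [anonymize(r) for r in records]

# Compare serialized sizes to see how much less data would be transferred.
raw_bytes = len(json.dumps(records).encode("utf-8"))
anon_bytes = len(json.dumps(anonymized).encode("utf-8"))
print(f"raw: {raw_bytes} bytes, anonymized: {anon_bytes} bytes")
```

Because the filter runs before upload, the smaller payload is what gets transferred and processed, which is where the savings in transfer time and compute come from.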

Data Originating in the Cloud

A recent study from Nasuni estimated that there is already over 1 exabyte (more than 1 billion gigabytes) of cloud data. While this 2013 measurement included private sources as well, Gartner predicts that by 2016, more than half of all enterprise data will live in the public cloud. This isn't a surprise, considering the popularity of public cloud in the CRM space alone. But it's not just customer information and sales metrics. Think of social media and how it benefits predictive analytics: online behavior patterns can be great indicators of purchasing habits, for example. Not to mention the massive amount of reference data that grows by the minute.

The real beauty of data originating in the public cloud is its accessibility. All cloud data services have APIs, and in many cases, either full or filtered information is freely available (again, very common with social media). In the not-too-distant future, it really won't matter where the data comes from, as long as it's accessible in the public cloud. Analytics platforms can already process unstructured data from multiple sources simultaneously.

Just imagine the insights we'll be able to gain about society, humanity, and even the universe thanks to the ever-increasing volume of cloud data!

More Stories By Leo Reiter

Leo Reiter is CTO of Nimbix, a provider of cloud-based High Performance Computing and Big Data platforms and applications that help organizations solve their most complex problems faster and more easily.