Head in the Cloud

Yesterday at the Pervasive Software Metamorphosis event I spoke on a panel hosted by the esteemed Ray Wang. The panel, held for an audience of journalists and analysts, covered the cloud and big data.

Here’s my take on a range of topics we discussed:

1) Amazon Out(r)age

It happened and we should all learn from it. If you run revenue-generating, transactional apps on Amazon and didn’t have redundancy built in, you were in serious trouble and still are. For companies that can afford it, this means a backup cloud with replicated data; for others, it means paying Amazon more in order to spread your risk across multiple regional datacenters and many availability zones.

If you don’t have a transactional application you still need a strategy for dealing with an infrastructure failure, but a failure is potentially less fatal for you than it is for apps that bring in revenue. Basically, you have a little more breathing room.

This isn’t about cloud vs. no cloud, and the reason you have not read about a backlash fueled by the old guard is that everyone has skin in the cloud game now. Why would any company market against its own strategic initiatives, even if it means scoring points against Amazon?

2) Database Architecture

There was a brief conversation about big data as it relates to database architecture, as in relational vs. non-relational (e.g. Hadoop). This is an important debate but it’s still the wrong one… whatever database architecture you commit to, you still need database architects to design and then manage the database interactions.

It also seems like we are missing the bigger picture, which is what comes after the disk drive. In my mind the more interesting trend is the move to entirely in-memory architectures for data management. This matters not just for the significant performance benefits these architectures deliver, along with the gains in power management and thermal load, but also for reliability.

On performance alone this approach more than justifies the shift. Not only does it make what you already do faster, but the added performance also creates new business application opportunities, such as pattern recognition on very large datasets. I predict that within this decade we will see a wholesale movement toward in-memory architectures for network storage, and that disk drives will predominantly serve archival and backup needs.

3) Big Data

This is, in my mind, the most exciting trend of the last decade. We generate so much more data than was ever imagined, and I am quite sure that in the years ahead we will continue to say this, even in light of what we are already seeing. The web is shifting from pages to streams, and these streams are incredibly expansive as each iteration adds more content and even more metadata.

Think about all the stuff that is attached to something as compact as a tweet… time/date, location, profile, bits for favorited, promoted, retweeted, searched, and much more. I don’t profess to be an expert on Twitter’s data model, but it’s a great example of how a very small piece of unstructured text can balloon into a much bigger data object.
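
To make that ballooning concrete, here is a rough sketch in Python. The field names and values are invented for illustration and are not Twitter’s actual data model; the point is only to compare the size of the text with the size of the whole record once the metadata is attached.

```python
import json

# Hypothetical tweet-like record. Every field name and value here is
# made up for illustration; this is NOT Twitter's real schema.
tweet = {
    "text": "Heading to a panel on the cloud and big data.",
    "created_at": "2011-01-01T12:00:00Z",
    "location": {"lat": 0.0, "lon": 0.0},
    "profile": {"id": 12345, "screen_name": "example_user"},
    "favorited": False,
    "promoted": False,
    "retweeted": True,
}

def size_in_bytes(value):
    """Return the size of a value once serialized as UTF-8 encoded JSON."""
    return len(json.dumps(value).encode("utf-8"))

# Size of the message itself versus the full record with its metadata.
text_bytes = len(tweet["text"].encode("utf-8"))
record_bytes = size_in_bytes(tweet)
ratio = record_bytes / text_bytes

print(f"text: {text_bytes} bytes, full record: {record_bytes} bytes")
print(f"the record is {ratio:.1f}x the size of the text alone")
```

Even with only a handful of invented fields, the serialized record is several times larger than the text it carries, and a real record would likely carry far more.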

We are also becoming adept at dealing with media types in our online interactions, and because we suffer no penalty for data storage usage, we save everything. We have cameras that make sharing 8 megapixel images and very large video files effortless. These images and videos are then propagated through the many social networks that we participate in.

Business applications are not immune to this either, for the same reasons. Storage is cheap, relatively speaking, and processing performance means an app can access and manipulate large amounts of data. Gone are the days when compactness and efficiency were required not just for neatness but for the economics of running applications on expensive hardware. Today we capture everything, even if we don’t know what we will do with that data.

The good news is that cloud integration capabilities, Webhooks, and sophisticated API data pumping stations mean that all this data can be exploited for purposes other than those it was originally captured for. New applications will develop as a result of the ability to easily hook into these large public data sources.

Cloud integration of data is a necessary topic to discuss, because being able to get data doesn’t mean that you have the right to use it. Data ownership and regulatory-driven privacy issues are a major concern, and the lack of a unifying standard is admittedly an obstacle. Datamarts will evolve as a result of these issues, serving as brokers or syndicators of data streams.