3 Big Data Lessons from a “Big 3” Company – Ford


In a recent interview on the Gigaom Structure Show podcast, Ford Data Science Leader Michael Cavaretta discussed lessons learned in harnessing the power of big data (large, complex data sets that require advanced processing) and how best to put it to use. “Big data” at Ford can mean anything from data coming off the car and truck production lines to customer feedback at dealer locations and customer service centers. In this interview, he talked …

Only one valid big data prediction for 2014: It’s a big opportunity for CIOs


This is the time of year when forecasters tap their crystal balls to tell us which technologies will shine and which will fade. Given that big data became so prominent in our lives in 2013, it’s no wonder it is playing a big role in these predictions. But did anyone predict last year that big data (and metadata, and Hadoop, and so on) would be discussed so frequently in mainstream, mass-audience publications in 2013? And that this sudden …

Top Performance Problems discussed at the Hadoop and Cassandra Summits


In the last couple of weeks, my colleagues and I attended the Hadoop and Cassandra Summits in the San Francisco Bay Area. It was rewarding to talk to so many experienced Big Data technologists in such a short time frame – thanks to DataStax and Hortonworks for hosting these great events! It was also great to see that performance is becoming an important topic in the community at large. We got a lot of feedback on typical Big Data performance …

Speeding up a Pig+HBase MapReduce job by a factor of 15


The other day I ran a Pig script. Nothing fancy: I loaded some data into HBase and then ran a second Pig job to do some aggregations. I knew the data loading would take some time, as it was several GB of data, but I expected the second aggregation job to run much faster. It ran for over 15 hours and still had not finished. That seemed far too long, so I terminated it. I was using Amazon Elastic MapReduce as my Hadoop environment, so I …

Lessons learned from real-world Big Data implementations

Over the last few weeks I attended several cloud and Big Data conferences. The Big Data Innovation conference in Boston in particular gave me a lot of insight. Some people consider only the technology side of Big Data tools like Hadoop or Cassandra. The real driver, however, is something else: business analysts are discovering Big Data technologies as a means to leverage tons of existing data and ask questions about customer behavior and all sorts of relationships to …