Five Big Mistakes You Can Make with Big Data
- Ignore the source of your data. Don’t pay attention to the origins of your data; it will just slow you down. You might find that the data was obtained with the understanding that it would be used only for specific purposes, and what you are doing isn’t one of them.
- Don’t relate your data. Just load your data and let the software figure it out. It can! Relationships and keys will sort themselves out. That’s what Big Data is all about: you don’t need to understand how it all relates.
- Keep it old. Data is data and you have a lot of it! Just pull extracts when you can and analyze them. Avoid any method that would keep your data current.
- The more you have the better. Ignore those who caution you to trim your data. The whole point of Big Data is to have a lot: get all the data you can and hoard it. You never know what you might need.
- Seize upon any pattern. This is actually made easier by practicing the first four mistakes. All sorts of patterns will emerge and you can run with them!
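To see why that last mistake is so tempting, here is a rough sketch of how patterns appear in data that contains none. It assumes nothing more than columns of random noise; the function names, column counts and seed are mine, chosen only for illustration.

```python
import math
import random
from itertools import combinations


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def strongest_chance_pattern(n_columns=40, n_rows=20, seed=0):
    """Return the largest |correlation| between any two columns of pure noise."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_columns)]
    return max(abs(pearson(a, b)) for a, b in combinations(columns, 2))


if __name__ == "__main__":
    pairs = math.comb(40, 2)
    best = strongest_chance_pattern()
    print(f"Strongest 'pattern' among {pairs} pairs of random columns: r = {best:.2f}")
```

None of these columns has anything to do with any other, yet with hundreds of pairs to choose from, the best-looking one will usually seem worth a slide in a presentation. The more unrelated data you hoard, the more of these you will find.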
People often think of Big Data as gold, when in fact it’s really ore that needs to be refined. You can’t (or at least you shouldn’t) ignore the source of your data, its relationships and its freshness. Also, “Big Data” can be a cover for avoiding the work needed to really understand and refine the data. But no one wants to do that; they just want something to give them the answers. Well, if you follow my suggestions above, you will get answers, most likely the wrong ones!
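As a minimal sketch of what that refining might look like, the snippet below checks source, relationships and freshness before any analysis runs. The records, field names (`consented_uses`, `customer_id`, `extracted`) and the 90-day threshold are all hypothetical, made up for this example rather than taken from any real system.

```python
from datetime import date, timedelta

# Hypothetical records; every field name and value here is an assumption.
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"id": 10, "customer_id": 1, "extracted": date(2018, 6, 1),
     "consented_uses": {"billing", "analytics"}},
    {"id": 11, "customer_id": 3, "extracted": date(2018, 6, 1),
     "consented_uses": {"billing", "analytics"}},
    {"id": 12, "customer_id": 2, "extracted": date(2017, 1, 15),
     "consented_uses": {"billing", "analytics"}},
    {"id": 13, "customer_id": 2, "extracted": date(2018, 6, 1),
     "consented_uses": {"billing"}},
]


def refine(orders, customers, purpose, today, max_age_days):
    """Split orders into usable rows and rejected rows with a reason."""
    known_customers = {c["id"] for c in customers}
    cutoff = today - timedelta(days=max_age_days)
    kept, rejected = [], {}
    for order in orders:
        if purpose not in order["consented_uses"]:
            # Source: the data was not obtained for this purpose.
            rejected[order["id"]] = "source does not permit this use"
        elif order["customer_id"] not in known_customers:
            # Relationships: the key points at nothing we know about.
            rejected[order["id"]] = "no matching customer"
        elif order["extracted"] < cutoff:
            # Freshness: the extract is older than we are willing to trust.
            rejected[order["id"]] = "stale extract"
        else:
            kept.append(order)
    return kept, rejected


kept, rejected = refine(orders, customers, "analytics", date(2018, 6, 30), 90)
print([order["id"] for order in kept])  # [10]
print(rejected)
```

Only one of the four rows survives, and each rejection comes with a reason you can act on. That is less data, but it is data you can defend.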
Big Data has a lot of promise, but for now I suggest we focus on what I would call “Little Data.” Think of Little Data as the real gold you get after you take the time to do your refining. If you avoid the mistakes I note above, you’ll strike gold. In other words, you’ll not only have data you can understand, but it will be legal, ethical and relevant data. And, most importantly, it will provide information you can actually use.
By Mark Schettenhelm