Making Data work – a dive into Strata Conference and Big Data

This post gives an overview of the Strata Conference 2013 in London and, hopefully, a short introduction to Big Data in general. In short: you need data to succeed, and you need to make your data work for you.

What is Big Data?

You can tell Big Data is big right now when a conference like Strataconf fills up with interesting keynotes, sessions, speakers and participants. But what's with all the buzz? Is Big Data like teenage sex?

In a way, Doug Cutting's opening keynote at Strataconf gives part of the answer when he states that "Hadoop dominates Big Data". Doug may be a bit biased, being the creator of Hadoop, but everybody seems to be using it. For a couple of years now you haven't needed a supercomputer to analyze huge amounts of data. Apache Hadoop plus a bunch of servers gives you the poor man's version of HPC and storage. An added advantage is that this setup scales well: to attack even larger sets of data, just add hardware nodes (or extend your cloud setup). Hadoop is an open source framework for storage and processing of large-scale data sets. Apache Hadoop is actually a number of modules, all open source. The setup is also available as packaged distributions, e.g. from Cloudera and MapR, which ship their own versions of one or several Hadoop modules.
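To make the processing side a bit more concrete, here is a minimal sketch of the map, shuffle and reduce steps behind Hadoop's MapReduce module, written as a word count in plain Python. It runs on one machine and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are made up for illustration. On a real cluster the map and reduce steps would be spread over many nodes, which is why adding hardware helps.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files spread over a cluster.
    sample = ["big data is big", "data beats opinion"]
    print(word_count(sample))
```

Each call to `map_phase` only looks at one line and each call to `reduce_phase` only looks at one key, so in principle both steps can run in parallel on separate machines.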

This leads us to one part of the Big Data buzz: the ability to handle large amounts of information. The information is also often unstructured, at least to start with, and it may come from different sources: applications, log files, devices or even sensors. It tends to stay unstructured for longer periods of time, since we don't know all the use cases up front. Finally, the field of Big Data is characterized by an opportunistic or exploratory approach.
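One way to picture data that stays unstructured until a use case shows up is to keep the raw records as they arrived and only interpret them when they are read. The sketch below assumes three made-up record shapes (JSON from an application, a space-separated log line, JSON from a sensor); the names `RAW_EVENTS`, `parse_event` and `count_by_source` are hypothetical and only serve the illustration.

```python
import json

# Hypothetical raw records from different sources, stored exactly as received.
RAW_EVENTS = [
    '{"source": "app", "user": "u1", "action": "play"}',
    "2013-11-12 10:15:02 GET /video/42 200",
    '{"source": "sensor", "temp_c": 21.5}',
]


def parse_event(raw):
    """Interpret one raw record at read time; keep unknown shapes as raw text."""
    try:
        record = json.loads(raw)
        if isinstance(record, dict):
            return record
    except json.JSONDecodeError:
        pass  # Not JSON, fall through to the log-line format.

    parts = raw.split()
    if len(parts) == 5 and parts[4].isdigit():
        date, time, method, path, status = parts
        return {
            "source": "log",
            "timestamp": f"{date} {time}",
            "method": method,
            "path": path,
            "status": int(status),
        }
    return {"source": "unknown", "raw": raw}


def count_by_source(raw_events):
    """An exploratory question asked after the fact: how many records per source?"""
    counts = {}
    for raw in raw_events:
        source = parse_event(raw).get("source", "unknown")
        counts[source] = counts.get(source, 0) + 1
    return counts


if __name__ == "__main__":
    print(count_by_source(RAW_EVENTS))
```

Because the raw text is never thrown away, a new question later on only needs a new parsing function, not a new round of data collection.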

The conference

One great thing about Strataconf was the blend of different problem areas and domains. The sessions and keynotes spanned areas like journalism, healthcare, the Internet of Things, social media and the more general open data concept. There was also a balance between practical use cases in these areas and more technical sessions on tools and frameworks.

A number of subjects and roles are related to, or part of, Big Data. All of the following were mentioned during Strataconf.

Open Data

The movement of sharing an organization's data with the rest of the world, most commonly via an open API. This way, others outside the organization can explore new possibilities with the information, maybe in combination with other organizations' data, and open up new opportunities. During Strataconf the NHS (UK National Health Service) outlined their ambitions, in which opening up their data is crucial at a time when budgets are shrinking. According to the NHS this would be the key to improving overall customer satisfaction.

Data Scientist

The craftsmanship of data analysis and more. "Sexiest job of the 21st century", according to Harvard Business Review in 2012. A paper handed out by O'Reilly identifies four different types of Data Scientists:

Data Businesspeople: Focus on the organization and how data projects generate value and profit.

Data Developers: Deal with the technical problem of managing data: how to get and store data and how to learn from it. This is the group focusing on developing with the Hadoop framework.

Data Researchers: Statistics experts.

Data Creatives: The mixture of everything above, from data extraction to visualization.

Read the full report here

Data Journalism

Find the story in the data (where the data could be big). Data journalism has a lot in common with Big Data in general because of the exploratory approach, though it is often accomplished with simpler tools, e.g. spreadsheets. The simple tools approach was covered by Claire Miller from WalesOnline.

(Off topic: One of the more amusing keynotes actually dealt with spreadsheets in general. Have a look at the presentation)

Lean Analytics

If you're a fan of Lean Startup, this is for you. The idea behind Lean Analytics is to use data to figure out what you should be working on in your startup or product development. As with Lean Startup, build -> measure -> learn is an important concept, and Lean Analytics focuses on the measure part by giving guidelines on what to measure. This is described in the book written by Alistair Croll and Benjamin Yoskovitz. At the conference, Alistair gave some good advice on choosing the right metrics to follow. The presentation is available on this page.

Summary

There is a lot going on in the Big Data field everywhere, and of course also at Aftonbladet and Schibsted. The Strata conference gave us inspiring examples of how we can take advantage of concepts in the field of Big Data and how we can improve our use of tools and techniques. We will use these findings in product development, operational decision support and more.

