Although the word Hadoop may not ring a bell to you, it is probably one of the main reasons why you have heard – and continuously hear – buzzwords such as “Big Data” and “Internet of Things”.
What is Hadoop?
Hadoop is an open-source platform for distributed storage and processing of very large amounts of data.
In fact, Hadoop has become the main standard for big data processing.
Data volumes are increasing: according to IBM research, ninety percent of the world’s data was created over the last two years. Essentially, Hadoop accomplishes two tasks: massive data storage and faster processing.
On the one hand, it makes it possible to store files bigger than what fits on any single node or server. On the other hand, it can process huge amounts of data, or at least provides a framework for processing that data.
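To make these two tasks a bit more concrete, here is a minimal single-process sketch in Python, not Hadoop code. It assumes a toy model in which a file is cut into fixed-size blocks that are spread over several nodes, and a word count is computed block by block and then merged, in the spirit of the map and reduce steps of Hadoop's processing framework. All function names, the round-robin placement and the tiny block size are hypothetical choices made for illustration; a real cluster also handles replication, scheduling and failures, which are ignored here.

```python
from collections import Counter
from functools import reduce


def split_into_blocks(lines, block_size):
    """Cut the input into fixed-size blocks (here: groups of lines)."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def place_blocks(blocks, nodes):
    """Assign blocks to nodes round-robin (a hypothetical placement policy)."""
    placement = {node: [] for node in nodes}
    for index, block in enumerate(blocks):
        placement[nodes[index % len(nodes)]].append(block)
    return placement


def map_block(block):
    """Map step: count the words of one block, using only that block's data."""
    return Counter(word.lower() for line in block for word in line.split())


def reduce_counts(partials):
    """Reduce step: merge the per-block counts into one overall count."""
    return reduce(lambda left, right: left + right, partials, Counter())


def word_count(lines, nodes, block_size=2):
    """Store the lines as blocks across nodes, then count words block by block."""
    placement = place_blocks(split_into_blocks(lines, block_size), nodes)
    partials = [map_block(block)
                for blocks in placement.values()
                for block in blocks]
    return reduce_counts(partials)


if __name__ == "__main__":
    sample = ["big data", "Big Data tools", "data"]
    print(word_count(sample, nodes=["node-1", "node-2"]))
```

No single node needs to hold the whole file, and each map step only touches the block it was given, which is what lets the same idea scale out to many machines.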
Even though Hadoop is nearly 10 years old, it has only recently started becoming popular in industry and it’s still quite far from being mainstream. However, interest in Hadoop-related technologies is continuously increasing and Hadoop-related talent is trending: highly skilled people in this area are highly valued and hard to find.
The Hadoop Summit, organized by Hortonworks, is one of the biggest industry conferences focusing on Apache Hadoop and similar big data technologies. Hadoop community members, users, programmers, industry partners and researchers participate in this 2-day event to share their experiences and knowledge, seek Hadoop talent to hire, promote their products and do a lot of networking.
The summit takes place both in North America and Europe. In fact, Hortonworks reported having about 15-20 percent of its business and employees in Europe.
The event kicked off on Monday, April 13 with several pre-conference events, such as training sessions and affiliated meetups, followed by the main 2-day conference on April 15-16.
Each main conference day started with quite lengthy keynotes, followed by talks in 6 parallel sessions, covering the following topics:
– Committer track: technical presentations by committers of Apache Hadoop and related projects
– Data science and Hadoop
– Hadoop Governance, Security & Operations
– Hadoop Access Engines
– Applications of Hadoop and the Data Driven Business
– The Future of Apache Hadoop
We analysed the talk titles (abstracts) to find the most popular topics and, not surprisingly, these mostly included the words Hadoop, Data, Apache and Analytics.
According to the organizers, there were 351 submissions from 163 organizations. Yet the diversity and variety in the agenda were rather limited. Almost 1 out of 5 speakers came from Hortonworks (the organizers), while there were only 5 women among the speakers. That’s an extremely disappointing percentage of around 5%. The attendee count was also impressive (1,300, as reported by the organizers), but equally impressive was the lack of women among the attendees.
What got people talking, however, was the immense amount of data reported by some of the participating companies. Among others, Yahoo reported 600 petabytes of data, a 43,000-server cluster and 1 million Hadoop jobs per day; Pinterest talked about 40 petabytes of data on Amazon S3 and a 2,000-node Hadoop cluster; and Spotify reported having 13 petabytes of data stored in Hadoop.
As anticipated, the keynotes were very much focused on proving the business value of Hadoop and on promoting products and services. Apart from some awkward role playing and the sales pitches, there was little technical value in most of them, with the Yahoo keynote being an important exception.
Streaming and real-time processing were certainly two of the hottest topics at the summit, covered by several talks each day. Real-time processing, i.e. processing data the moment it reaches your system and being able to immediately make decisions based on occurring events, is, without question, the next (if not current) big thing. I hope, though, that speakers get a bit more creative with their use-cases: out of the 5 streaming talks I attended, 4 used the “anomaly detection” use-case as their walk-through example and motivation.
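For readers unfamiliar with that walk-through example, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what any of the speakers presented: each value is examined the moment it arrives and flagged if it lies far from the mean of the last few values. The function name, the window size and the threshold are all hypothetical, and a production system would read from a real event stream rather than a list.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(stream, window=5, threshold=3.0):
    """Yield (index, value) for values far from the mean of the recent window.

    Each value is checked as soon as it arrives, using only the previous
    `window` values, so a decision is made without waiting for the full data.
    """
    recent = deque(maxlen=window)
    for index, value in enumerate(stream):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Skip the check when the window is constant (sigma == 0).
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield index, value
        recent.append(value)


if __name__ == "__main__":
    readings = [10, 11, 10, 12, 11, 50, 11, 10]  # made-up sensor readings
    for index, value in detect_anomalies(readings):
        print(f"anomaly at position {index}: {value}")
```

Because the function is a generator, it reacts to each event as it is consumed instead of collecting the whole input first, which is the essential difference from batch processing.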
The party venue and theme perfectly fit the male-dominated audience: the party was held in an automotive museum, the famous Brussels Autoworld, certainly a spectacular place if you like cars… and motorbikes… and even more cars. Cars and motorbikes aside, we really enjoyed the food, even though vegetarian and vegan attendees probably felt neglected.
Hadoop Summit was a successful event, judging by its numbers, technical content and overall organization. We look forward to seeing how the organizers build on this success by improving speaker and attendee diversity and inclusivity at the next events.