Big data is creating big demand for Apache Hadoop, the open source framework that's known for its ability to store, process and analyze huge amounts of data using commodity servers. With Hadoop, enterprises are able to collect more data, retain it longer and perform analyses that weren't practical in the past because of cost, complexity and a lack of tools.
It's an appealing platform, but it requires expertise. As enterprises weigh the potential for Hadoop in their data architecture plans, IT pros are scrambling to sharpen their skills. Hadoop training courses and certification programs are available from companies including Cloudera, Hortonworks, IBM and MapR. But if you're not ready to commit to formal, paid training courses, there are also free resources that can get a newcomer started on Hadoop or broaden a veteran's skills.
Hortonworks, a Yahoo spinoff that offers a Hadoop distribution and commercial support services, hosts a weekly introductory webinar, Introduction to Hortonworks Data Platform, that covers topics including how to install and provision Hadoop across clusters of machines; the relationship with related Apache Hadoop projects such as Pig, Hive, Oozie and HBase; tools for monitoring clusters; and data sharing between Hadoop and other enterprise data systems.
Hadoop Essentials, a six-part recorded webinar series from Cloudera (which offers a Hadoop distribution, support and services), explores traditional large-scale computing systems, alternative approaches, and how Apache Hadoop addresses particular issues.
Sarah Sproehnle, senior director of educational services at Cloudera, recommends another video for users interested in learning Hadoop: Introduction to Apache MapReduce and HDFS, which explains how the two components work together to create a scalable, powerful system.
For users who prefer to start their learning with a document rather than a video, Cloudera's Hadoop Tutorial describes the user-facing facets of the Apache Hadoop MapReduce framework.
MapR, which provides a free M3 distribution for Apache Hadoop, offers a number of training videos through its MapR Academy group. Some of the most popular, according to the company, are: Writing MapReduce Applications, which covers the concepts and components of MapReduce; Why Hadoop?, which introduces Hadoop and discusses the problems MapReduce solves; Intro to Cluster Administration, which addresses how to manage Hadoop cluster users and groups; NFS Concepts, a survey of methods and strategies for setting up NFS for MapReduce; and Enterprise Hadoop, which tackles the challenges of evolving Hadoop for the enterprise.
Another player in the Hadoop education arena is online educational site Big Data University, which currently offers all of its courses for free.
Geared toward beginners, Hadoop Fundamentals I from Big Data University focuses on the basics of Hadoop, including the Hadoop architecture, HDFS, MapReduce, Pig, Hive, JAQL, Flume and other related Hadoop technologies. The course lets users practice with hands-on labs on a Hadoop cluster running in the cloud, on a supplied VMware image, or on a local installation.
The next step is Hadoop Fundamentals II, which picks up where the first part leaves off, adding details about Pig, Hive, JAQL and Flume, and exploring analytic technologies.
Big Data University's Hadoop Reporting and Analysis aims to teach participants to build their own Hadoop reports using technologies such as HBase and Hive and to learn how and when to select different reporting techniques, including direct batch reports, live exploration and indirect batch analysis.
Ann Bednarz covers IT careers, outsourcing and Internet culture for Network World. Follow Ann on Twitter at @annbednarz and check out her blog, Occupational Hazards.