About this Course
3.6
45 ratings
12 reviews
Learn to analyze big data using Apache Spark's distributed computing framework. In a series of focused, practical tasks, you will start by launching a Spark cluster on Amazon's EC2 cloud computing platform. As you progress to working with real data, you will gain exposure to a variety of useful tools, including RDFLib and SPARQL. The practical tasks on this course make use of data from the Gutenberg Project, the world's largest open collection of ebooks, which offers no end of opportunity for engaging and novel analyses. As the taught material and example code are given in Python, it is strongly recommended that all students have previous Python programming experience. Furthermore, launching and interacting with a cluster on EC2 requires basic knowledge of the Unix command line, and some experience with a command-line editor such as vim or nano would also be advantageous. With these minimal prerequisites, this course is designed to get you up and running in Spark as quickly and painlessly as possible, so that by the end you will be comfortable and competent enough to start engineering your own big data solutions....
100% online courses

Start right away and learn at your own pace.

Flexible deadlines

Reset deadlines to fit your schedule.

Intermediate level

Suggested: 4 weeks of study, 3-6 hours/week

Approx. 23 hours to complete

English

Subtitles: English

Syllabus - What you will learn in this course

Section 1
9 hours to complete

Getting Started in Spark on EC2

This week, you'll gain essential background knowledge along with the practical skills needed to run applications in Apache Spark. You'll also take the steps necessary to launch a Spark cluster on the Amazon EC2 cloud computing platform....
8 videos (Total: 47 min), 13 readings, 11 quizzes

8 videos:
  • Introduction (2 min)
  • Create a normal AWS account (6 min)
  • Launch a Spark cluster on EC2 (12 min)
  • What is Spark? (6 min)
  • Fundamentals (10 min)
  • Setting up your development environment (4 min)
  • Summary

13 readings:
  • About this course (10 min)
  • Week 1 Resource zip (10 min)
  • Tips for following this lesson (10 min)
  • Create a normal AWS account with billing alarm (10 min)
  • Tips for following this video (10 min)
  • Launch a Spark cluster on EC2 (10 min)
  • Additional guidance for starter accounts (10 min)
  • Accessing the pyspark interactive shell (10 min)
  • Tips for following this lesson (10 min)
  • How to install Spark locally (10 min)
  • Tips for following this lesson (10 min)
  • Setting up your development environment (10 min)
  • Submitting applications to a cluster (10 min)

8 practice exercises:
  • Prerequisite Skills Quiz (8 min)
  • Week 1 Introduction Quiz (6 min)
  • Lesson 1.1 Practice Quiz (8 min)
  • Lesson 1.2 Practice Quiz (8 min)
  • Lesson 1.3 Practice Quiz (4 min)
  • Lesson 1.4 Practice Quiz (6 min)
  • Lesson 1.5 Practice Quiz (4 min)
  • Week 1 Summary Quiz (22 min)

Section 2
4 hours to complete

Reading and Writing Data

This week you'll learn how to read and write data in Spark. The techniques you'll be shown can be used with data stored locally, or in combination with the Amazon S3 cloud storage service. To help get you started, we'll also show you how to upload a subset of the Gutenberg Project dataset onto Amazon S3....
4 videos (Total: 20 min), 8 readings, 6 quizzes

4 videos:
  • Reading and writing RDDs (7 min)
  • Reading data from Amazon S3 with boto3 (5 min)
  • 2.4 Writing objects to Amazon S3 (Spark methods) (5 min)

8 readings:
  • Week 2 Resources zip (10 min)
  • Get the Gutenberg project dataset (10 min)
  • Tips for following this lesson (10 min)
  • Using Spark methods to read and write data on S3 (10 min)
  • Tips for following this lesson (10 min)
  • Using boto3 to read data from Amazon S3 (10 min)
  • Tips for following this lesson (10 min)
  • Configuring Spark for accessing S3 (10 min)

5 practice exercises:
  • Lesson 2.1 Practice Quiz (4 min)
  • Lesson 2.2 Practice Quiz (6 min)
  • Lesson 2.3 Practice Quiz (6 min)
  • Lesson 2.4 Practice Quiz (6 min)
  • Week 2 Summary Quiz (16 min)

Section 3
3 hours to complete

Tools for Working with Data

This week you'll get to grips with some useful tools in preparation for working with the Gutenberg Project dataset. In this week's assessment, you will exercise your data wrangling skills to produce a catalogue index file from the Gutenberg Project metadata, a resource that should prove useful in your final assessment....
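As a rough illustration of what a catalogue index built from RDF metadata might look like, the sketch below pulls an ID, title and author out of a single RDF/XML record and writes them as a CSV row. It uses only the Python standard library rather than RDFLib, which is the tool the course teaches. The sample record is hand-written and simplified, and the function names (`extract_record`, `write_index`) and the CSV columns are assumptions made for this example, not the course's own solution.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace prefixes used by the sample record below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# Hand-written, simplified record (an assumption for illustration only).
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/11">
    <dcterms:title>Alice's Adventures in Wonderland</dcterms:title>
    <dcterms:creator>
      <pgterms:agent>
        <pgterms:name>Carroll, Lewis</pgterms:name>
      </pgterms:agent>
    </dcterms:creator>
  </pgterms:ebook>
</rdf:RDF>
"""


def extract_record(xml_text):
    """Return the id, title and author of the first ebook in an RDF/XML string."""
    root = ET.fromstring(xml_text)
    ebook = root.find("pgterms:ebook", NS)
    about = ebook.get("{%s}about" % NS["rdf"])  # e.g. "ebooks/11"
    return {
        "id": about.rsplit("/", 1)[-1],
        "title": ebook.findtext("dcterms:title", default="", namespaces=NS),
        "author": ebook.findtext(
            "dcterms:creator/pgterms:agent/pgterms:name",
            default="",
            namespaces=NS,
        ),
    }


def write_index(records, out):
    """Write the records to a file-like object as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=["id", "title", "author"])
    writer.writeheader()
    writer.writerows(records)


# Build a one-row index in memory; a real run would loop over many records.
buffer = io.StringIO()
write_index([extract_record(SAMPLE_RDF)], buffer)
index_csv = buffer.getvalue()
print(index_csv)
```

With many metadata files, the same per-record function could be applied to each file and the rows collected into one index. Whether that is done in plain Python or distributed with Spark is a design choice this sketch leaves open.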
4 videos (Total: 23 min), 3 readings, 5 quizzes

4 videos:
  • What is RDF? (5 min)
  • Using RDFLib (8 min)
  • Summary

3 readings:
  • Week 3 Resources zip (10 min)
  • Tips for following this lesson (10 min)
  • Tips for following this lesson (10 min)

4 practice exercises:
  • Lesson 3.1 Practice Quiz (4 min)
  • Lesson 3.2 Practice Quiz (4 min)
  • Lesson 3.3 Practice Quiz (4 min)
  • Week 3 Summary Quiz (20 min)

Section 4
4 hours to complete

Programming in Spark

This week you'll learn Spark programming in some detail, in preparation for working with the Gutenberg collection of ebooks. The areas that will be covered should lead you to write much more efficient and successful Spark applications....
8 videos (Total: 42 min), 5 readings, 7 quizzes

8 videos:
  • 4.1 Working with data frames (9 min)
  • Pipelines and caching (7 min)
  • Spark performance (6 min)
  • Spark configuration (5 min)
  • Spark examples (8 min)
  • Summary
  • Summary

5 readings:
  • Week 4 Resources zip (10 min)
  • Tips for following this lesson (10 min)
  • Tips for following this lesson (10 min)
  • Tips for following this lesson (10 min)
  • Tips for following this lesson (10 min)

6 practice exercises:
  • Lesson 4.1 Practice Quiz (6 min)
  • Lesson 4.2 Practice Quiz (4 min)
  • Lesson 4.3 Practice Quiz (4 min)
  • Lesson 4.4 Practice Quiz (4 min)
  • Lesson 4.5 Practice Quiz (4 min)
  • Week 4 Summary Quiz (20 min)

Top reviews

by CC, May 30th 2018

Good Practice session on AWS platform and thorough explanation from the mentors. Thanks a lot.

Instructors

Dr Sorrel Harriet

Lecturer
Computing

Christophe Rhodes

Senior Lecturer
Department of Computing, Goldsmiths

About University of London

The University of London is a federal University which includes 18 world leading Colleges. Our distance learning programmes were founded in 1858 and have enriched the lives of thousands of students, delivering high quality University of London degrees wherever our students are across the globe. Our alumni include 7 Nobel Prize winners. Today, we are a global leader in distance and flexible study, offering degree programmes to over 50,000 students in over 180 countries. To find out more about studying for one of our degrees where you are, visit www.london.ac.uk...

Frequently Asked Questions

  • Once you enroll for a Certificate, you’ll have access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

  • When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

Have more questions? Visit the Learner Help Center.