
Large Scale ETL Design, Optimization and Implementation Based On Spark and AWS Platform

Time: Tuesday 15 August 2017, 13:00 - 15:00

Kungliga Tekniska högskolan
KTH Kista Degree projects, Master-level (Examensarbete, Master)

Location: Ada room

Info:

Student:  Di Zhu
Date and Time:  13:00, Tuesday, 15th August, 2017
Place: Ada room
Examiner:  Šarūnas Girdzijauskas
Supervisor:  Vladimir Vlassov
Title: Large Scale ETL Design, Optimization and Implementation Based On Spark and AWS Platform
Opponent: Xiaoxu Gao


Abstract

The amount of data generated by users of Internet products is growing exponentially. A single product may produce billions of events every day, and extracting insights from this data is essential for applications such as system monitoring, fraud detection, user behavior analysis, and feature verification. This growth brings technical challenges. The heterogeneity and volume of the data, together with the varied requirements for using it along different dimensions, complicate the design of data pipelines, transformations, and persistence in a data warehouse. There are traditional ways to build ETL pipelines, ranging from mainframes and RDBMSs to MapReduce and Hive. With the emergence and popularization of the Spark framework and Amazon Web Services (AWS), however, this process can evolve into a more robust, efficient, less costly, and easier-to-implement architecture for collecting data, building dimensional models, and performing analytics on massive data. Drawing on the setting of a car transportation company that receives billions of user behavior events every day, this work explores a way of building and optimizing ETL pipelines based on AWS and Spark, and compares it with the current main data pipelines in terms of efficiency, robustness, ease of maintenance, and other aspects.


Last modified 2017-08-08 19:35
