Course Description |
:
This course is an introduction to the concepts of "Big Data" and "data analysis". It introduces Hadoop, one of the most common frameworks, which has made big data analysis easier and more accessible. By the end of this course, students are expected to: first, describe the Big Data landscape, including examples of real-world big data problems and the three key sources of Big Data (people, organizations, and sensors); second, explain the V’s of Big Data (volume, velocity, variety, veracity, and value) and why each affects data collection, monitoring, storage, analysis, and reporting; third, have some practical experience with commonly used tools and techniques for (big) data processing; fourth, know the basics of distributed file systems, databases, and computing; and fifth, have gained practical data processing skills with the MapReduce framework / Apache Hadoop, Apache Spark, the H2O framework, and TensorFlow. |