Nov 25, 2024
STSCI 5065 - Big Data Management and Analysis
Spring. 3 credits. Student option grading.
Prerequisite: knowledge of a general-purpose programming language, such as Java, Python, Ruby, or C++, or concurrent enrollment in STSCI 4060; STSCI 5060 or basic SQL knowledge; STSCI 5010 or basic knowledge of SAS programming; STSCI 4520 or STSCI 4030 or basic knowledge of R programming. Permission of instructor required. Enrollment preference given to: students in the MPS program in Applied Statistics.
X. Yang.
Concepts, challenges, and industry trends of big data, with a focus on the Hadoop system. Topics include: basics of the Apache Hadoop platform and the Hadoop ecosystem; the Hadoop Distributed File System (HDFS); MapReduce, a parallel programming model for distributed processing of large data sets, or an alternative to it; common big data tools, such as Pig (a procedural data processing language for Hadoop parallel computation), Hive (a declarative SQL-like language for running Hadoop jobs), HBase (a popular NoSQL database that runs on top of HDFS), and YARN; case studies; and integration of Hadoop with statistical software packages, e.g., SAS and R.