Courses of Study 2020-2021 [ARCHIVED CATALOG]
Nov 25, 2024


STSCI 5065 - Big Data Management and Analysis


     
Spring. 3 credits. Student option grading.

Prerequisite: knowledge of a general-purpose programming language, such as Java, Python, Ruby, or C++, or concurrent enrollment in STSCI 4060; STSCI 5060 or basic SQL knowledge; STSCI 5010 or basic knowledge of SAS programming; STSCI 4520 or STSCI 4030 or basic knowledge of R programming. Permission of instructor required. Enrollment preference given to: students in the MPS program in Applied Statistics.

X. Yang.

Concepts, challenges, and industry trends of big data, with a focus on the Hadoop system. Topics include: basics of the Apache Hadoop platform and the Hadoop ecosystem; the Hadoop Distributed File System (HDFS); MapReduce (or an alternative to it), a parallel programming model for distributed processing of large data sets; common big data tools, such as Pig (a procedural data processing language for Hadoop parallel computation), Hive (a declarative SQL-like language for handling Hadoop jobs), HBase (a widely used NoSQL database), and YARN; case studies; and integration of Hadoop with statistical software packages, e.g., SAS and R.
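
As an illustration of the MapReduce programming model named above, the following is a minimal single-process sketch in Python of the classic word-count example. It is not course material and does not use Hadoop itself; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this illustration. On a real cluster, the framework would run the map and reduce steps in parallel across many machines and perform the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map step: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle step: group all values by key. In a distributed framework
    # this grouping happens between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce step: combine all counts for one word into a single total.
    return key, sum(values)


def word_count(lines):
    # Run the three phases in sequence in a single process (illustration only).
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

Because each line is mapped independently and each word is reduced independently, both steps can be distributed across machines, which is the property that makes the model suitable for large data sets.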


