Courses of Study 2024-2025 
    
    Oct 16, 2024  

CS 5740 - Natural Language Processing


     


Fall, Spring. 3-4 credits, variable. Letter grades only (no audit).

Fall (Ithaca) (4 credits):

  • Strong programming skills are important. Three semesters of programming classes are strongly recommended (e.g., completion of CS 3110). CS 2110 may suffice if you could have completed the assignments successfully and easily on your own.
  • Python experience.
  • PyTorch experience (e.g., through CS 4780) is not required, but some students report that it is very helpful.
  • Comfort with elementary probability.
  • Clear understanding of matrix and vector operations.
  • Familiarity with differentiation.

Spring (New York City) (3 credits):

  • CS 4780/CS 5780 or CS 5785 or CS 5781 or equivalent machine learning course experience.
  • Strong experience with Python.
  • Familiarity with a numerical library (e.g., NumPy).
  • Experience with a neural network framework (e.g., PyTorch, TensorFlow).
  • Strong understanding of foundational CS concepts such as memory requirements and computational complexity.  
  • Students need to be comfortable with calculus and probability, primarily differentiation and basic discrete distributions.
Fall: Ithaca; Spring: New York City. Co-meets with COGST 4740/CS 4740/LING 4474 (Fall only).

Fall: L. Lee; Spring: Y. Artzi.

This course is an introduction to natural language processing (NLP), the goal of which is to enable computers to use human languages as input, output, or both. NLP is at the heart of many of today’s most exciting technological achievements, including machine translation, automatic conversational assistants, and Internet search. The course will introduce core problems and methodologies in NLP, including machine learning, problem design, and evaluation methods.

Ithaca only: Expect each of the roughly four connected programming assignments to take tens of hours, although this time is spread over multiple weeks. In addition to implementing core algorithms, the assignments require writing code to massage raw-ish data into different formats and to provide other accessory functions, and they call for substantial independent reading of documentation.


