Python: Introduction to Machine Learning for Text Classification

Machine Learning (ML) has transformative applications across government, healthcare, education, marketing, business, and life sciences. By employing machine learning techniques, computers can learn from past data and make predictions about new data. 
 
This course guides participants through building their own text-based machine learning models. It focuses on two applications, text clustering and text classification, which are fundamental techniques for organizing and categorizing text data. These core skills form the foundation for more advanced applications such as content analysis, sentiment analysis, and document organization.
  
Through hands-on exercises using IPython notebooks, participants will learn to apply ML techniques to real-world text data, understand when different approaches work best, and avoid common pitfalls in ML projects.

This course is offered in collaboration with the Linguistic Research Infrastructure (LiRI) at UZH.

General information

Duration: 12 hours

The course combines the following activities:
  • Guided coding workshops: Implement text preprocessing and ML pipelines in IPython notebooks 
  • Hands-on data exploration and visualization: Apply clustering techniques to discover patterns in unlabeled text datasets 
  • Model building exercises: Train and compare Logistic Regression and Random Forest classifiers on real-world data 
  • Evaluation practice: Calculate and interpret confusion matrices and precision/recall metrics across different scenarios
  • Hyperparameter experimentation: Optimize model performance by systematically varying hyperparameters
  • Error analysis: Understand common pitfalls in applied ML (e.g., overfitting and underfitting, class imbalance)
  • Mini-project: Conceptualize and present a complete text classification pipeline using provided or personal datasets 
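
To illustrate the evaluation practice listed above, here is a minimal plain-Python sketch of how a confusion matrix, precision, and recall are computed from gold labels and predictions. The helper names (`confusion_matrix`, `precision_recall`) and the spam/ham data are hypothetical and not taken from the course materials; in practice, libraries such as scikit-learn provide ready-made versions of these metrics in `sklearn.metrics`.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Hypothetical helper: cell [i][j] counts items whose true label is
    labels[i] and whose predicted label is labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def precision_recall(y_true, y_pred, positive):
    """Hypothetical helper: precision and recall for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # wrongly flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up gold labels and predictions for a small spam/ham classifier
y_true = ["spam", "ham", "spam", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "ham"]
labels = ["spam", "ham"]

print(confusion_matrix(y_true, y_pred, labels))           # [[2, 1], [0, 3]]
print(precision_recall(y_true, y_pred, positive="spam"))  # (1.0, 0.6666666666666666)
```

Here precision for "spam" is 1.0 (every message predicted as spam really is spam), while recall is about 0.67 (one spam message was missed). This is why the two metrics are usually reported together.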
Participants are expected to have sound knowledge of Python, including its syntax, data structures, and control structures, working with libraries and files, and writing functions.
We recommend that participants complete the course "Python: Introduction to Natural Language Processing" before taking this course.
The course is intended for students and employees of the University of Zurich.
By the end of the course, participants will be able to: 
  • Transform raw text into numerical representations suitable for machine learning 
  • Apply unsupervised clustering techniques to explore and discover patterns in text data 
  • Select and justify appropriate supervised learning approaches for text classification tasks 
  • Train, evaluate, and interpret machine learning models using standard metrics 
  • Identify and address common issues in model performance 
  • Implement their own text classification pipeline from data to results 
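
As a rough illustration of what a "numerical representation" of text can mean, the sketch below builds bag-of-words count vectors in plain Python: each document becomes a list of word counts over a shared vocabulary. This is an assumed example rather than the course's own code. The function names are hypothetical, and tools such as scikit-learn's `CountVectorizer` and `TfidfVectorizer` implement more complete versions.

```python
import re


def tokenize(text):
    """Hypothetical helper: lowercase the text and split it into runs of
    letters, digits, or underscores (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def build_vocabulary(documents):
    """Hypothetical helper: give each distinct token a column index (alphabetical order)."""
    tokens = {tok for doc in documents for tok in tokenize(doc)}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}


def vectorize(document, vocabulary):
    """Hypothetical helper: count each vocabulary token; unknown tokens are ignored."""
    vector = [0] * len(vocabulary)
    for tok in tokenize(document):
        if tok in vocabulary:
            vector[vocabulary[tok]] += 1
    return vector


corpus = ["The cat sat on the mat.", "The dog sat."]
vocab = build_vocabulary(corpus)
vectors = [vectorize(doc, vocab) for doc in corpus]

print(vocab)    # {'cat': 0, 'dog': 1, 'mat': 2, 'on': 3, 'sat': 4, 'the': 5}
print(vectors)  # [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
print(vectorize("A cat and a dog!", vocab))  # [1, 1, 0, 0, 0, 0]
```

Words that never appeared in the original documents (here "a" and "and") are simply dropped, which is one reason why the choice of vocabulary matters for any classifier trained on these vectors.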
Course materials will be provided throughout the course, and the code snippets for each section will be shared during the lessons.
In each section, participants work through hands-on exercises in IPython notebooks. At the end of each section, they can also complete one or two tasks to consolidate what they have learned.

Dates

Code: HS26-APTC-01
Instructors: Gerold Schneider, Tilia Ellendorff, Tannon Kew
Dates:
  • Wed 02 September 2026 (09:00-12:00)
  • Fri 04 September 2026 (09:00-12:00)
  • Wed 09 September 2026 (09:00-12:00)
  • Fri 11 September 2026 (09:00-12:00)
Venue: Universität Zürich Irchel

Course registration begins on 1 February for the spring semester and on 1 September for the autumn semester.