Python: Introduction to Machine Learning for Text Classification

Machine Learning (ML) has transformative applications across government, healthcare, education, marketing, business, and life sciences. By employing machine learning techniques, computers can learn from past data and make predictions about new data.

This course guides participants through building their own text-based Machine Learning models. It focuses on two applications, text clustering and text classification, which are fundamental techniques for organizing and categorizing text data. These core skills form the foundation for more advanced applications, including content analysis, sentiment analysis, and document organization.

Through hands-on exercises using IPython notebooks, participants will learn to apply ML techniques to real-world text data, understand when different approaches work best, and avoid common pitfalls in ML projects.

This course is offered in collaboration with the Linguistic Research Infrastructure (LiRI) at UZH.

General information

Duration	12 hours

Content

Guided coding workshops: Implement text preprocessing and ML pipelines in IPython notebooks
Hands-on data exploration and visualization: Apply clustering techniques to discover patterns in unlabeled text datasets
Model building exercises: Train and compare Logistic Regression and Random Forest classifiers on real-world data
Evaluation practice: Calculate and interpret confusion matrices and precision-recall metrics across different scenarios
Hyperparameter experimentation: Tune hyperparameters to optimize model performance
Error analysis: Understand common pitfalls in applied ML (e.g., overfitting, underfitting, and class imbalance)
Mini-project: Conceptualize and present a complete text classification pipeline using provided or personal datasets
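
As an illustration of the evaluation practice listed above, the following minimal sketch computes confusion-matrix counts, precision, and recall for one class in plain Python. The labels are invented, and the function names confusion_counts and precision_recall are hypothetical; they are not taken from the course materials, which may well rely on a library for this step.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN with one class treated as 'positive'."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def precision_recall(counts):
    """Return (precision, recall); 0.0 when a denominator is zero."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical gold labels and model predictions for six short texts.
y_true = ["spam", "ham", "spam", "spam", "ham", "ham"]
y_pred = ["spam", "spam", "spam", "ham", "ham", "ham"]

counts = confusion_counts(y_true, y_pred, positive="spam")
precision, recall = precision_recall(counts)
print(dict(counts), round(precision, 3), round(recall, 3))
```

Precision is the share of predicted positives that are correct, while recall is the share of actual positives that were found. With imbalanced classes the two can diverge sharply, which is why they are read alongside the confusion matrix rather than accuracy alone.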

Requirements

Participants are expected to have a sound knowledge of Python (syntax, data structures, control structures, working with libraries and files, and writing functions).
It is recommended that participants take Python: Introduction to Natural Language Processing before taking this course.

Target audience

Students and employees of the University of Zurich.

Goals

By the end of the course, participants will be able to:

Transform raw text into numerical representations suitable for machine learning
Apply unsupervised clustering techniques to explore and discover patterns in text data
Select and justify appropriate supervised learning approaches for text classification tasks
Train, evaluate, and interpret machine learning models using standard metrics
Identify and address common issues in model performance
Implement their own text classification pipeline from data to results
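
The first goal, turning raw text into numbers, can be pictured with a simple bag-of-words representation. The sketch below is only one possible such representation, written in plain Python under the assumption that word counts are sufficient; the toy documents and the helper names tokenize, build_vocabulary, and vectorize are hypothetical and do not come from the course materials.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = {tok for doc in documents for tok in tokenize(doc)}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}


def vectorize(text, vocabulary):
    """Return a list of token counts, one entry per vocabulary item."""
    counts = Counter(tokenize(text))
    # Dicts keep insertion order, so positions match the stored indices.
    return [counts[tok] for tok in vocabulary]


# Two toy documents (made up for illustration).
documents = ["The cat sat.", "The dog sat on the cat."]
vocabulary = build_vocabulary(documents)
vectors = [vectorize(doc, vocabulary) for doc in documents]

print(vocabulary)
for doc, vec in zip(documents, vectors):
    print(vec, doc)
```

Each document becomes a fixed-length vector whose positions correspond to vocabulary entries, which is the kind of input that clustering and classification algorithms expect.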

Preparation

Course materials will be distributed throughout the course.
The code snippets for each section will be provided during the lessons.

Teaching methods

Participants work through hands-on exercises in IPython notebooks that cover the content of each section. At the end of each section, they can complete one or two tasks to consolidate what they have learned.

Dates

Code	Instructors	Dates	Available seats	Venue
HS26-APTC-01	Schneider Gerold, Ellendorff Tilia, Kew Tannon	Wed 02 September 2026 (09:00am - 12:00pm); Fri 04 September 2026 (09:00am - 12:00pm); Wed 09 September 2026 (09:00am - 12:00pm); Fri 11 September 2026 (09:00am - 12:00pm)	10	Universität Zürich Irchel