Syntax and semantics of programming languages particularly suited to data science, e.g., Python. Routines for importing, combining, converting and selecting data. Algorithms for handling missing values, discretisation and dimensionality reduction. Algorithms for supervised machine learning, e.g., naive Bayes, decision trees and random forests. Algorithms for unsupervised machine learning, e.g., k-means clustering. Libraries for data analysis. Evaluation methods and performance metrics. Visualisation and analysis of the results of data analysis.
Course structure
Ten lectures (non-mandatory)
One mandatory seminar
Four assignments, of which one is to be presented at the seminar
Course literature
I. Witten, E. Frank, M. Hall and C. Pal, Data Mining: Practical Machine Learning Tools and Techniques (4th ed.), Morgan Kaufmann, 2016. ISBN: 9780128042915.
J. VanderPlas, Python Data Science Handbook: Essential Tools for Working with Data (1st ed.), O'Reilly Media Inc., 2016. ISBN: 9781491912058.
Required equipment
Own computer