IL2230 Hardware Architectures for Deep Learning 7,5 hp

Course memo Autumn 2023-50923

Version 1 – 11/16/2023, 3:18:41 PM

Course offering

TEBSM (Start date 30 Oct 2023, English)

Language Of Instruction

English

Offered By

EECS/Electrical Engineering

Headings denoted with an asterisk ( * ) are retrieved from the course syllabus version Autumn 2021

Content and learning outcomes

Course contents

The course consists of two modules. Module I introduces basic knowledge in machine learning and algorithms for deep learning. Module II focuses on specialised hardware implementation architectures for deep learning algorithms and new brain-like computer system architectures. In addition to presenting the relevant background knowledge, the course contains laboratory and project assignments that build an understanding of how the algorithms are applied to real problems, and that contrast and evaluate alternative implementation architectures in terms of performance, cost, and reliability.

Module I: Algorithms for deep learning

Module I introduces basic machine learning algorithms, basic neural network algorithms, and algorithms for deep learning. Among the many machine learning algorithms, this module introduces linear regression, polynomial regression, and logistic regression, which are fundamental and most relevant to neural networks. For neural networks, we consider perceptrons, multi-layer perceptrons, and in particular the back-propagation algorithm. After presenting traditional statistical machine learning and neural networks, this module further exemplifies deep learning algorithms, specifically Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).
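As an illustration of why logistic regression is described as fundamental to neural networks, the sketch below trains a single sigmoid neuron with stochastic gradient descent, which is the same computation as logistic regression. It is a minimal sketch and not course material: the function names, the learning rate, the number of epochs, and the toy logical-AND data set are all assumptions chosen for the example.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_neuron(samples, labels, lr=0.5, epochs=2000):
    """Fit weights and bias of one sigmoid neuron by per-sample gradient descent.

    lr and epochs are illustrative assumptions, not values from the course.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the cross-entropy loss w.r.t. the pre-activation
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x) -> int:
    """Classify x as 1 when the neuron output is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical toy data: the logical AND of two binary inputs.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 0, 0, 1]

w, b = train_logistic_neuron(X, Y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

A multi-layer perceptron stacks many such neurons, and back-propagation extends the same gradient computation through the layers.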

Module II: Architecture specialization for deep learning

Module II examines specialised hardware-based implementation architectures for deep learning algorithms. From a broad spectrum of potential hardware architectures, design alternatives such as GPGPUs, domain-specific processors, and FPGA/ASIC-based accelerators are presented, together with their advantages and disadvantages. In particular, limitations and design alternatives for using deep learning algorithms in embedded resource-constrained systems will be discussed. Furthermore, this module will discuss new architectures in deep learning for computer system design, such as brain-like computer system architectures. A case study with analysis, evaluation, and application of a deep learning architecture will be carried out.

Intended learning outcomes

After passing the course, the student should be able to

  • describe and explain basic neural networks and deep learning algorithms and their relations
  • explain and justify the hardware design space for deep learning algorithms
  • choose and apply an appropriate deep learning algorithm to solve real problems with artificial intelligence in embedded systems
  • analyse and evaluate hardware implementation alternatives for deep learning algorithms
  • suggest and justify an implementation architecture for applications with deep learning in embedded resource constrained systems
  • discuss and comment on new hardware implementation architectures for deep learning and new brain-like computer system architectures that utilise new devices and new concepts

in order to

  • understand the necessity, importance, and potential of accelerating deep learning algorithms with low power consumption through specialized hardware architecture
  • discuss, suggest and evaluate specialised hardware architectures to implement deep learning algorithms and utilise deep learning concepts in resource constrained reliable systems.

Learning activities

The course consists of 11 lectures (2 hours each), 3 labs (4 hours each), 2 seminars (4 hours each), and 1 exercise (2 hours). In total, it has 44 hours of scheduled activities. The labs and seminars are group work. The exercise is individual work.

Detailed plan

Lecture 1. Course introduction and motivation
Content: This lecture introduces the course objectives, course content and structure, and course assessment. We also motivate hardware acceleration for deep learning. A hardware accelerator demo for real-time object detection will be shown in the first lecture.
Preparations: Visit and review all content in the Canvas course room. Read the slides for Lecture 1 in the Canvas course room.

Lecture 2. Linear regression and logistic regression
Content: This lecture introduces two basic statistical learning models, starting from linear regression and moving on to logistic regression.
Preparations: Preview the slides for Lecture 2 in the Canvas course room.

Lecture 3. Perceptron and Multi-Layer Perceptron (MLP)
Content: This lecture discusses the general concepts of artificial neural networks (ANNs), from the perceptron to the multi-layer perceptron (MLP), with particular attention to network training and inference.
Preparations: Preview the slides for Lecture 3 in the Canvas course room.

Lecture 4. CNN (Convolutional Neural Network)
Content: This lecture presents the Convolutional Neural Network as one very successful example of Deep Neural Networks (DNNs).
Preparations: Preview the slides for Lecture 4 in the Canvas course room.

Lab 1. Handwritten Digits Recognition from MLP to CNN
Content: This lab gives you the skills and knowledge needed to perform basic deep-learning tasks in PyTorch. In particular, you will implement handwritten digit recognition using an MLP and a CNN and compare their performance.
Preparations: Try to finish as many of the lab tasks as you can.

Lecture 5. RNN (Recurrent Neural Network)
Content: This lecture presents another important category of DNN, one with feedback in its structure: the Recurrent Neural Network (RNN), which models neuron interactions over time with a memory effect. This includes both conventional RNNs and LSTM.
Preparations: Preview the slides for Lecture 5 in the Canvas course room.

Lecture 6. Hardware acceleration for deep learning: Challenges and Overview; Model minimization I
Content: This lecture discusses the efficiency challenges (performance, power/energy, resource) of executing deep learning algorithms on hardware, and opens the problem space for hardware acceleration of deep learning algorithms. We discuss model minimization issues such as network reduction, data quantization, compression, and fixed-point operations for efficient hardware implementations of neural network algorithms.
Preparations: Preview the slides for Lecture 6 in the Canvas course room.

Lecture 7. Model minimization II
Content: This lecture continues the discussion of the latest model minimization techniques: network pruning, data quantization and approximation, and network sparsity.
Preparations: Preview the slides for Lecture 7 in the Canvas course room.

Lecture 8. Hardware Specialization I
Content: This lecture discusses hardware specializations for neural network algorithms, focusing on digital hardware design organization and computing architecture design principles. It also investigates network sparsity and sparsity acceleration.
Preparations: Preview the slides for Lecture 8 in the Canvas course room.

Seminar I. Deep Learning and Minimization of Neural Network Models
Content: A workshop in a conference setting. Each student group is both a presenter (presenting its assigned paper) and an opponent (asking questions to another group).
Preparations: Read the assigned paper, prepare presentation slides as a group, and prepare questions for another group.

Lab 2. Hardware Design, Implementation and Evaluation of Artificial Neuron
Content: In this lab, the tasks are to design three RTL models (three alternative ways) for implementing an N-input artificial neuron. After you have verified that they function correctly, you take the designs through logic synthesis.
Preparations: Try to finish as many of the lab tasks as you can.

Lecture 9. Hardware specialization II
Content: This lecture continues the discussion of the latest techniques used for hardware acceleration: the tile-based architecture, data flow schemes, and the ASIP (application-specific instruction-set processor).
Preparations: Preview the slides for Lecture 9 in the Canvas course room.

Lecture 10. Model-to-Architecture Mapping and EDA; Technology-Driven DL Acceleration and Brain-like Computing
Content: This lecture discusses model-to-architecture mapping, its optimization, and Electronic Design Automation (EDA). It also gives an outlook on efficient hardware acceleration of neural networks, with a focus on the impact of technologies such as embedded DRAM, 3D stacking, and memristors. We also touch upon neuromorphic computing with spiking neural networks.
Preparations: Preview the slides for Lecture 10 in the Canvas course room.

Exercise
Content: This is an exercise Q & A session. The exercise questions are collected in an exercise compendium. The questions cover all lectures.
Preparations: Finish the exercise questions individually before the exercise session.

Lab 3
Content: For this lab, you can choose one of two tasks: (A) Hardware design, implementation and evaluation of MLP; (B) Transfer Learning, Network Pruning and Quantization.
Preparations: Try to finish as many of the lab tasks as you can.

Seminar II. Case studies of deep learning hardware accelerators
Content: A workshop in a conference setting. Each student group is both a presenter (presenting its assigned paper) and an opponent (asking questions to another group).
Preparations: Read the assigned paper, prepare presentation slides as a group, and prepare questions for another group.
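
The plan mentions fixed-point operations (Lecture 6) and an N-input artificial neuron (Lab 2). The two ideas can be combined in a small behavioural model, sketched below in Python rather than RTL. It is only an illustration under stated assumptions: the 8-bit signed format with 7 fractional bits, the ReLU activation, the function names, and the example values are hypothetical and are not taken from the lab instructions.

```python
def quantize(values, frac_bits=7, total_bits=8):
    """Round real values to signed fixed-point integers, saturating at the range limits.

    The Q1.7 format (8 bits, 7 fractional) is an assumption for this sketch.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]


def neuron_fixed_point(x_q, w_q, bias_q, frac_bits=7):
    """N-input neuron using integer multiply-accumulate followed by ReLU.

    Each product carries 2 * frac_bits fractional bits, so the bias is
    shifted left by frac_bits to align it with the accumulator.
    """
    acc = bias_q << frac_bits
    for xi, wi in zip(x_q, w_q):
        acc += xi * wi
    acc = max(acc, 0)  # ReLU (assumed activation)
    return acc / float(1 << (2 * frac_bits))  # convert back to a real number


def neuron_float(x, w, bias):
    """Floating-point reference for the same neuron."""
    return max(sum(xi * wi for xi, wi in zip(x, w)) + bias, 0.0)


# Hypothetical inputs, chosen to be exactly representable with 7 fractional bits.
x = [0.5, -0.25, 0.75]
w = [0.5, 0.5, -0.25]
bias = 0.125

y_float = neuron_float(x, w, bias)
y_fixed = neuron_fixed_point(quantize(x), quantize(w), quantize([bias])[0])
print(y_float, y_fixed)
```

With values that are not exactly representable in the chosen format, the two results differ by a quantization error, which is the trade-off between precision and hardware cost that model minimization addresses.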


Preparations before course start

Specific preparations

Literature

No information inserted

Software

  • Hardware (ASIC/FPGA) synthesis tool with license.
  • Programming language: Python.
  • Deep Learning framework: PyTorch.

Examination and completion

Grading scale

A, B, C, D, E, FX, F

Examination

  • LAB1 - Laboratory work, 3.0 credits, Grading scale: P, F
  • TEN1 - Written exam, 4.5 credits, Grading scale: A, B, C, D, E, FX, F

Based on recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with documented disability.

The examiner may apply another examination format when re-examining individual students.

The section below is not retrieved from the course syllabus:

Laboratory work ( LAB1 )

Written exam ( TEN1 )

  • You need to complete both LAB1 and TEN1 in order to complete the course.
  • Upon the completion of both LAB1 and TEN1, the grade of the written examination will be the course grade.

Ethical approach

  • All members of a group are responsible for the group's work.
  • In any assessment, every student shall honestly disclose any help received and sources used.
  • In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.

Further information

Changes of the course before this course offering

Minor adjustments to the detailed plan are possible, but the detailed plan will be settled before the course offering starts.
