News feed
In the news feed you will find updates to pages, the schedule, and posts from teachers (when they also need to reach previously registered students).
May 2014
Here is an interesting course I can recommend:
Introduction to High-Performance Computing
PDC Summer School
KTH Royal Institute of Technology, Stockholm, Sweden
August 18-29, 2014
http://www.pdc.kth.se/education/summer-school
May 2012
Given that:
*A GPU contains multiple SIMD processors.
*Each SIMD processor contains multiple lanes.
*Each SIMD processor is assigned a single thread block (by the thread block scheduler).
The question is which one of these two alternatives is correct:
-Alt1 (parallel execution of threads): Each lane runs a single thread among all threads in the thread block -> to execute completely, each thread takes as many clock cycles as there are elements in the vector that it reads from/writes to.
-Alt2 ("sequential-alternating" execution of threads): Each thread occupies all lanes in a single SIMD processor -> each thread takes round_up(<nr_of_elements_in_the_vector>/<nr_of_lanes_per_SIMD_processor>) clock cycles to finish execution (not necessarily consecutive) -> the thread scheduler (in each SIMD processor) alternates between different threads even if a single thread has not finished all its cycles. So threads do not execute in parallel.
(PS. Alt1 is what I understood from the GPU class/slides; Alt2 is what I understood from the book)
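To make the two cycle counts in the question concrete, here is a minimal Python sketch. It only models the simplified assumption used above (one element per lane per clock cycle) and says nothing about real GPU timing. The function names and the example sizes (32 elements, 16 lanes) are made up for illustration.

```python
def cycles_alt1(elements_per_thread: int) -> int:
    """Alt1: one thread per lane, so a thread needs one cycle per element it handles."""
    return elements_per_thread


def cycles_alt2(vector_length: int, lanes_per_simd: int) -> int:
    """Alt2: one thread spans all lanes, so it needs round_up(vector_length / lanes) cycles."""
    # Integer ceiling division, avoids floating point.
    return (vector_length + lanes_per_simd - 1) // lanes_per_simd


# Hypothetical sizes, chosen only for illustration.
print(cycles_alt1(32))      # 32
print(cycles_alt2(32, 16))  # 2
print(cycles_alt2(33, 16))  # 3 (the last cycle uses only one lane)
```

The difference between the alternatives is where the parallelism sits: in Alt1 the lanes work on different threads at the same time, in Alt2 they work on different elements of the same thread.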
Alt1 is the correct alternative. I think the book was a bit unclear about that, or maybe I missed something in it; in any case, the slides are clearer and have more figures.
Thank you Artur for answering the question and for the slides.