[SYCL] Data Parallel C++'s Table of Contents
jhson989
2022. 2. 16. 00:42
[Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL] by James Reinders, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, Xinmin Tian (Apress, 2020).
- Chapter 1 : Introduction
- Advice for building parallel programs
- Chapter 2 : Where Code Executes
- SYCL: a parallel programming framework for heterogeneous processors (CPU, GPU, and FPGA)
- Chapter 3 : Data Management
- Buffer
- Unified shared memory
- Chapter 4 : Expressing Parallelism
- Chapter 5 : Error Handling
- Chapter 6 : Unified Shared Memory
- Chapter 7 : Buffers
- Chapter 8 : Scheduling Kernels and Data Movement
- Chapter 9 : Communication and Synchronization
- Communication within a work-group : Barriers, local memory
- Sub-groups : Warp (NVIDIA), Wavefront (AMD)
- Sub-group collective functions : Broadcast, Votes, Shuffles, Loads and Stores
- Chapter 10 : Defining Kernels
- Lambda expression
- Function object (functor)
- Interoperability with Other APIs : OpenCL
- Chapter 11 : Vectors
- Chapter 12 : Device Information
- Device query API
- Chapter 13 : Practical Tips
- Chapter 14 : Common Parallel Patterns
- Map : No data dependences and high scalability
- Stencil : Data dependences and high data reuse
- Reduction : Data dependences
- Scan / Pack / Unpack : Limited scalability
- DPC++ built-in Libraries
- Chapter 15 : Programming for GPUs
- GPU hardware: Simple cores, easy switching
- GPU kernel execution model: SIMD, SPMD, Distributed memory
- Optimization techniques for memory-bound problems: Global memory coalesced access, Local memory bank conflicts, etc.
- Chapter 16 : Programming for CPUs
- CPU hardware: cc-NUMA system, SIMD execution
- CPU parallelism levels: Instruction-level (Out-of-order, SIMD), Thread-level (multi-processing)
- CPU kernel optimization techniques: Thread affinity, First touch, Vectorization
- Chapter 17 : Programming for FPGAs
- Basic concepts of FPGA parallelization: Pipelining (task parallelism) > Data parallelism
- Chapter 18 : Libraries
- Chapter 19 : Memory Model and Atomics
- Race conditions & memory consistency model
- Barriers & memory fences
- Atomic operations