[SYCL] Data Parallel C++'s Table of Contents

jhson989 2022. 2. 16. 00:42

[Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL] by James Reinders, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, Xinmin Tian (Apress, 2020).

Data Parallel C++

  • Chapter 1 : Introduction
    • Advice for building parallel programs
  • Chapter 2 : Where Code Executes
    • SYCL: a parallel programming framework for heterogeneous processors (CPUs, GPUs, and FPGAs)
  • Chapter 3 : Data Management
    • Buffer
    • Unified shared memory
  • Chapter 4 : Expressing Parallelism
  • Chapter 5 : Error Handling
  • Chapter 6 : Unified Shared Memory
  • Chapter 7 : Buffers
  • Chapter 8 : Scheduling Kernels and Data Movement
  • Chapter 9 : Communication and Synchronization
    • Communication within a work-group : Barriers, local memory
    • Sub-groups : Analogous to warps (NVIDIA) and wavefronts (AMD)
    • Sub-group collective functions : Broadcast, votes, shuffles, loads and stores
  • Chapter 10 : Defining Kernels
    • Lambda expression
    • Function object (functor)
    • Interoperability with Other APIs : OpenCL
  • Chapter 11 : Vectors
  • Chapter 12 : Device Information
    • Device query API
  • Chapter 13 : Practical Tips
  • Chapter 14 : Common Parallel Patterns
    • Map : No data dependences and high scalability 
    • Stencil : Data dependences and high data reuse
    • Reduction : Data dependences
    • Scan / Pack / Unpack : Limited scalability
    • DPC++ built-in Libraries
  • Chapter 15 : Programming for GPUs
    • GPU hardware: Simple cores, low-cost thread switching
    • GPU kernel execution model: SIMD, SPMD, distributed memory
    • Optimizing memory-bound problems: Coalesced global memory access, avoiding local memory bank conflicts, etc.
  • Chapter 16 : Programming for CPUs
    • CPU hardware: cc-NUMA systems, SIMD execution
    • CPU parallelism levels: Instruction-level (out-of-order execution, SIMD), thread-level (multiprocessing)
    • CPU kernel optimization techniques: Thread affinity, first-touch memory placement, vectorization
  • Chapter 17 : Programming for FPGAs
    • Basic FPGA parallelization concept: Pipelining (task parallelism) is favored over data parallelism
  • Chapter 18 : Libraries
  • Chapter 19 : Memory Model and Atomics
    • Race condition & memory consistency model
    • Barrier & memory fence
    • Atomic operation
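The Chapter 14 patterns (map, stencil, reduction, scan, pack) can be illustrated with a minimal serial sketch in standard C++17. This is not SYCL or DPC++ code, and the function names (`map_square`, `stencil_3pt`, `reduce_sum`, `inclusive_prefix_sum`, `pack_positive`) are hypothetical. Each loop stands in for work that a SYCL kernel would spread across work-items. For example, a map would assign one work-item per element in a `parallel_for`.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Map: each output depends only on the matching input, so iterations are
// independent (in SYCL, one work-item per element).
std::vector<float> map_square(const std::vector<float>& in) {
  std::vector<float> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] * in[i];
  return out;
}

// Stencil (1D, 3-point average): each output reads its neighbours, so adjacent
// outputs reuse the same inputs. Boundary elements are copied unchanged.
std::vector<float> stencil_3pt(const std::vector<float>& in) {
  std::vector<float> out(in);
  for (std::size_t i = 1; i + 1 < in.size(); ++i)
    out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
  return out;
}

// Reduction: combine all elements with an associative operator.
// std::reduce may reorder the additions, as a parallel reduction would.
float reduce_sum(const std::vector<float>& in) {
  return std::reduce(in.begin(), in.end(), 0.0f);
}

// Scan (inclusive prefix sum): out[i] = in[0] + ... + in[i].
std::vector<int> inclusive_prefix_sum(const std::vector<int>& in) {
  std::vector<int> out(in.size());
  std::inclusive_scan(in.begin(), in.end(), out.begin());
  return out;
}

// Pack: keep elements that satisfy a predicate (here: > 0), preserving order.
// Each kept element's output index is the exclusive prefix sum of 0/1 flags.
std::vector<int> pack_positive(const std::vector<int>& in) {
  std::vector<int> flags(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) flags[i] = in[i] > 0 ? 1 : 0;
  std::vector<int> pos(in.size());
  std::exclusive_scan(flags.begin(), flags.end(), pos.begin(), 0);
  std::size_t count =
      in.empty() ? 0 : static_cast<std::size_t>(pos.back() + flags.back());
  std::vector<int> out(count);
  for (std::size_t i = 0; i < in.size(); ++i)
    if (flags[i]) out[pos[i]] = in[i];
  return out;
}
```

The pack function shows why scan and pack are grouped together. Once every element's output index comes from an exclusive scan of the flags, all writes are independent. A parallel pack therefore costs about as much as the scan it relies on.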