[SYCL] Data Parallel C++'s Table of Contents
jhson989 2022. 2. 16. 00:42

[Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL] by James Reinders, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, Xinmin Tian (Apress, 2020).
- Chapter 1 : Introduction
  - Advice for building parallel programs
- Chapter 2 : Where Code Executes
  - SYCL : parallel programming framework for heterogeneous processors (CPU, GPU, and FPGA)
- Chapter 3 : Data Management
  - Buffer
  - Unified shared memory
- Chapter 4 : Expressing Parallelism
- Chapter 5 : Error Handling
- Chapter 6 : Unified Shared Memory
- Chapter 7 : Buffers
- Chapter 8 : Scheduling Kernels and Data Movement
- Chapter 9 : Communication and Synchronization
  - Communication within a work-group : Barrier, local memory
  - Sub-groups : Warp (NVIDIA), Wavefront (AMD)
  - Sub-group collective functions : Broadcast, Votes, Shuffles, Loads and Stores
- Chapter 10 : Defining Kernels
  - Lambda expression
  - Function object (functor)
  - Interoperability with other APIs : OpenCL
- Chapter 11 : Vectors
- Chapter 12 : Device Information
  - Device query API
- Chapter 13 : Practical Tips
- Chapter 14 : Common Parallel Patterns
  - Map : No data dependences and high scalability
  - Stencil : Data dependences and high data reuse
  - Reduction : Data dependences
  - Scan / Pack / Unpack : Limited scalability
  - DPC++ built-in libraries
- Chapter 15 : Programming for GPUs
  - GPU hardware : Simple cores, easy switching
  - GPU kernel execution model : SIMD, SPMD, distributed memory
  - Optimization techniques for memory-bound problems : Coalesced global memory access, local memory bank conflicts, etc.
- Chapter 16 : Programming for CPUs
  - CPU hardware : cc-NUMA system, SIMD execution
  - CPU parallelism levels : Instruction-level (out-of-order, SIMD), thread-level (multi-processing)
  - CPU kernel optimization techniques : Thread affinity, first touch, vectorization
- Chapter 17 : Programming for FPGAs
  - Basic concept of FPGA parallelization : Pipelining (task parallelism) > Data parallelism
- Chapter 18 : Libraries
- Chapter 19 : Memory Model and Atomics
  - Race condition & memory consistency model
  - Barrier & memory fence
  - Atomic operation
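
The Chapter 14 patterns differ mainly in how each output element depends on the input. As a rough illustration, the sketch below writes them as sequential standard C++ functions. It is not SYCL code and not taken from the book; the function names (`map_square`, `stencil_sum3`, `reduce_sum`, `scan_inclusive`, `pack_even`) are hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Map: out[i] depends only on in[i], so every element can be
// computed independently of the others.
std::vector<int> map_square(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(),
                   [](int x) { return x * x; });
    return out;
}

// Stencil: out[i] reads a fixed neighborhood of the input (3 points here),
// so neighboring outputs reuse the same input values.
// The two boundary elements are simply copied from the input.
std::vector<int> stencil_sum3(const std::vector<int>& in) {
    std::vector<int> out(in);
    for (std::size_t i = 1; i + 1 < in.size(); ++i) {
        out[i] = in[i - 1] + in[i] + in[i + 1];
    }
    return out;
}

// Reduction: all input elements are combined into a single value.
int reduce_sum(const std::vector<int>& in) {
    return std::accumulate(in.begin(), in.end(), 0);
}

// Inclusive scan: out[i] = in[0] + ... + in[i], so each output
// depends on every earlier input.
std::vector<int> scan_inclusive(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin());
    return out;
}

// Pack: keep only the elements that satisfy a predicate (even values here),
// preserving their original order.
std::vector<int> pack_even(const std::vector<int>& in) {
    std::vector<int> out;
    std::copy_if(in.begin(), in.end(), std::back_inserter(out),
                 [](int x) { return x % 2 == 0; });
    return out;
}
```

For the input `{1, 2, 3, 4}`, `reduce_sum` returns `10`, `scan_inclusive` returns `{1, 3, 6, 10}`, and `pack_even` returns `{2, 4}`. In a parallel implementation, the map loop can be split across work-items with no coordination, while the reduction, scan, and pack need some form of communication between them, which is consistent with the scalability notes in the outline.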