Programming Massively Parallel Processors (English Edition, Original 3rd Edition)

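Synopsis:
This book introduces the fundamental concepts of parallel programming and GPU architecture and explores in detail the techniques used to build parallel programs, covering performance, floating-point formats, parallel patterns, and dynamic parallelism. It is suitable for both professionals and students. Case studies demonstrate the development process, starting from the details of computational thinking and ending with efficient parallel program examples. The new edition updates the discussion of CUDA, covers newer libraries such as cuDNN, and moves material that is no longer as important into the appendices. It also adds two new chapters on parallel patterns and updates the case studies to reflect current industry practice.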
Table of contents:
Preface
Acknowledgements
CHAPTER 1 Introduction
1.1 Heterogeneous Parallel Computing
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism
1.4 Speeding Up Real Applications
1.5 Challenges in Parallel Programming
1.6 Parallel Programming Languages and Models
1.7 Overarching Goals
1.8 Organization of the Book
References
CHAPTER 2 Data Parallel Computing
2.1 Data Parallelism
2.2 CUDA C Program Structure
2.3 A Vector Addition Kernel
2.4 Device Global Memory and Data Transfer
2.5 Kernel Functions and Threading
2.6 Kernel Launch
2.7 Summary
Function Declarations
Kernel Launch
Built-in (Predefined) Variables
Run-time API
2.8 Exercises
References
CHAPTER 3 Scalable Parallel Execution
3.1 CUDA Thread Organization
3.2 Mapping Threads to Multidimensional Data
3.3 Image Blur: A More Complex Kernel
3.4 Synchronization and Transparent Scalability
3.5 Resource Assignment
3.6 Querying Device Properties
3.7 Thread Scheduling and Latency Tolerance
3.8 Summary
3.9 Exercises
CHAPTER 4 Memory and Data Locality
4.1 Importance of Memory Access Efficiency
4.2 Matrix Multiplication
4.3 CUDA Memory Types
4.4 Tiling for Reduced Memory Traffic
4.5 A Tiled Matrix Multiplication Kernel
4.6 Boundary Checks
4.7 Memory as a Limiting Factor to Parallelism
4.8 Summary
4.9 Exercises
CHAPTER 5 Performance Considerations
5.1 Global Memory Bandwidth
5.2 More on Memory Parallelism
5.3 Warps and SIMD Hardware
5.4 Dynamic Partitioning of Resources
5.5 Thread Granularity
5.6 Summary
5.7 Exercises
References
CHAPTER 6 Numerical Considerations
6.1 Floating-Point Data Representation
Normalized Representation of M
Excess Encoding of E
6.2 Representable Numbers
6.3 Special Bit Patterns and Precision in IEEE Format
6.4 Arithmetic Accuracy and Rounding
6.5 Algorithm Considerations
6.6 Linear Solvers and Numerical Stability
6.7 Summary
6.8 Exercises
References
CHAPTER 7 Parallel Patterns: Convolution
7.1 Background
7.2 1D Parallel Convolution—A Basic Algorithm
7.3 Constant Memory and Caching
7.4 Tiled 1D Convolution with Halo Cells
7.5 A Simpler Tiled 1D Convolution—General Caching
7.6 Tiled 2D Convolution with Halo Cells
7.7 Summary
7.8 Exercises
CHAPTER 8 Parallel Patterns: Prefix Sum
8.1 Background
8.2 A Simple Parallel Scan
8.3 Speed and Work Efficiency
8.4 A More Work-Efficient Parallel Scan
8.5 An Even More Work-Efficient Parallel Scan
8.6 Hierarchical Parallel Scan for Arbitrary-Length Inputs
8.7 Single-Pass Scan for Memory Access Efficiency
8.8 Summary
8.9 Exercises
References
CHAPTER 9 Parallel Patterns: Parallel Histogram Computation
9.1 Background
9.2 Use of Atomic Operations
9.3 Block versus Interleaved Partitioning
9.4 Latency versus Throughput of Atomic Operations
9.5 Atomic Operation in Cache Memory
9.6 Privatization
9.7 Aggregation
9.8 Summary
9.9 Exercises
Reference
CHAPTER 10 Parallel Patterns: Sparse Matrix Computation
10.1 Background
10.2 Parallel SpMV Using CSR
10.3 Padding and Transposition
10.4 Using a Hybrid Approach to Regulate Padding
10.5 Sorting and Partitioning for Regularization
10.6 Summary
10.7 Exercises
References
CHAPTER 11 Parallel Patterns: Merge Sort
11.1 Background
11.2 A Sequential Merge Algorithm
11.3 A Parallelization Approach
11.4 Co-Rank Function Implementation
11.5 A Basic Parallel Merge Kernel
11.6 A Tiled Merge Kernel
11.7 A Circular-Buffer Merge Kernel
11.8 Summary
11.9 Exercises
Reference
CHAPTER 12 Parallel Patterns: Graph Search
12.1 Background
12.2 Breadth-First Search
12.3 A Sequential BFS Function
12.4 A Parallel BFS Function
12.5 Optimizations
Memory Bandwidth
Hierarchical Queues
Kernel Launch Overhead
Load Balance
12.6 Summary
12.7 Exercises
References
CHAPTER 13 CUDA Dynamic Parallelism
13.1 Background
13.2 Dynamic Parallelism Overview
13.3 A Simple Example
13.4 Memory Data Visibility
Global Memory
Zero-Copy Memory
Constant Memory
Local Memory
Shared Memory
Texture Memory
13.5 Configurations and Memory Management
Launch Environment Configuration
Memory Allocation and Lifetime
Nesting Depth
Pending Launch Pool Configuration
Errors and Launch Failures
13.6 Synchronization, Streams, and Events
Synchronization
Synchronization Depth
Streams
Events
13.7 A More Complex Example
Linear Bezier Curves
Quadratic Bezier Curves
Bezier Curve Calculation (Without Dynamic Parallelism)
Bezier Curve Calculation (With Dynamic Parallelism)
Launch Pool Size
Streams
13.8 A Recursive Example
13.9 Summary
13.10 Exercises
References
A13.1 Code Appendix
CHAPTER 14 Application Case Study—non-Cartesian Magnetic Resonance Imaging
14.1 Background
14.2 Iterative Reconstruction
14.3 Computing FHD
Step 1: Determine the Kernel Parallelism Structure
Step 2: Getting Around the Memory Bandwidth Limitation
Step 3: Using Hardware Trigonometry Functions
Step 4: Experimental Performance Tuning
14.4 Final Evaluation
14.5 Exercises
References
CHAPTER 15 Application Case Study—Molecular Visualization and Analysis
15.1 Background
15.2 A Simple Kernel Implementation
15.3 Thread Granularity Adjustment
15.4 Memory Coalescing
15.5 Summary
15.6 Exercises
References
CHAPTER 16 Application Case Study—Machine Learning
16.1 Background
16.2 Convolutional Neural Networks
ConvNets: Basic Layers
ConvNets: Backpropagation
16.3 Convolutional Layer: A Basic CUDA Implementation of Forward Propagation
16.4 Reduction of Convolutional Layer to Matrix Multiplication
16.5 cuDNN Library
16.6 Exercises
References
CHAPTER 17 Parallel Programming and Computational Thinking
17.1 Goals of Parallel Computing
17.2 Problem Decomposition
17.3 Algorithm Selection
17.4 Computational Thinking
17.5 Single Program, Multiple Data, Shared Memory and Locality
17.6 Strategies for Computational Thinking
17.7 A Hypothetical Example: Sodium Map of the Brain
17.8 Summary
17.9 Exercises
References
CHAPTER 18 Programming a Heterogeneous Computing Cluster
18.1 Background
18.2 A Running Example
18.3 Message Passing Interface Basics
18.4 Message Passing Interface Point-to-Point Communication
18.5 Overlapping Computation and Communication
18.6 Message Passing Interface Collective Communication
18.7 CUDA-Aware Message Passing Interface
18.8 Summary
18.9 Exercises
Reference
CHAPTER 19 Parallel Programming with OpenACC
19.1 The OpenACC Execution Model
19.2 OpenACC Directive Format
19.3 OpenACC by Example
The OpenACC Kernels Directive
The OpenACC Parallel Directive
Comparison of Kernels and Parallel Directives
OpenACC Data Directives
OpenACC Loop Optimizations
OpenACC Routine Directive
Asynchronous Computation and Data
19.4 Comparing OpenACC and CUDA
Portability
Performance
Simplicity
19.5 Interoperability with CUDA and Libraries
Calling CUDA or Libraries with OpenACC Arrays
Using CUDA Pointers in OpenACC
Calling CUDA Device Kernels from OpenACC
19.6 The Future of OpenACC
19.7 Exercises
CHAPTER 20 More on CUDA and Graphics Processing Unit Computing
20.1 Model of Host/Device Interaction
20.2 Kernel Execution Control
20.3 Memory Bandwidth and Compute Throughput
20.4 Programming Environment
20.5 Future Outlook
References
CHAPTER 21 Conclusion and Outlook
21.1 Goals Revisited
21.2 Future Outlook
Appendix A: An Introduction to OpenCL
Appendix B: THRUST: a Productivity-oriented Library for CUDA
Appendix C: CUDA Fortran
Appendix D: An introduction to C++ AMP
Index
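About the authors:
By David B. Kirk and Wen-mei W. Hwu (United States). David B. Kirk is a member of the US National Academy of Engineering and an NVIDIA Fellow, and formerly served as NVIDIA's chief scientist. He led the development of NVIDIA's graphics technology and is one of the founders of CUDA technology. In 2002 he received the ACM SIGGRAPH Computer Graphics Achievement Award in recognition of his outstanding contributions to bringing high-performance computer graphics systems to the mass market. He holds a PhD in computer science from the California Institute of Technology.
Wen-mei W. Hwu is the AMD Jerry Sanders Chair Professor in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign and chief scientist of the Parallel Computing Institute, where he leads the research of the IMPACT group and the CUDA Center of Excellence. He has made outstanding contributions to compiler design, computer architecture, microarchitecture, and parallel computing. He is an IEEE Fellow and an ACM Fellow and has received numerous awards, including the ACM SIGARCH Maurice Wilkes Award. He is also a co-founder and CTO of MulticoreWare. He holds a PhD in computer science from the University of California, Berkeley.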
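Website ratings
Book variety: 6 points
Completeness of book information: 3 points
Site update speed: 5 points
Ease of use: 9 points
Book legibility: 3 points
Book format compatibility: 5 points
Presence of ads: 9 points
Loading speed: 8 points
Security: 7 points
Stability: 3 points
Search function: 7 points
Download convenience: 7 points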
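Download tags
- Good service (394+)
- Foolproof service (665+)
- Wide selection (448+)
- Thumbs up (418+)
- Classic (177+)
- pdf (377+)
- No watermark (215+)
- No ads (545+)
- Many neutral reviews (600+)
- A bargain (343+)
- Engaging (654+)
- Worth buying (583+)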
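Download reviews
- User 堵***洁:
Works well; I support it.
- User 丁***菱:
Good, good, good, good.
- User 田***珊:
It's fine, but some books can't be found through search.
- User 步***青:
... Good.
- User 堵***格:
OK, not bad.
- User 潘***丽:
You can convert formats online here; just pick one. Their converter is very convenient.
- User 孙***美:
Keep it up! Showing my support. Not bad, easy to use. Everyone should give it a try.
- User 居***南:
May I ask, can formats be converted online?
- User 国***舒:
Neutral review. Pay a little, and if a book can be found here, you get it; if it can't, other sites may not have it either.
- User 家***丝:
Great, awesome!
- User 郗***兰:
The site experience is good.
- User 扈***洁:
Not bad, pretty good.
- User 冯***卉:
I heard it has more than ten million books built in; not sure whether that's true.
- User 温***欣:
Fine, fine, fine.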
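Book ratings
Plot: 5 points
Characterization: 3 points
Thematic depth: 4 points
Writing style: 6 points
Use of language: 8 points
Fluency of prose: 6 points
Conveyance of ideas: 9 points
Depth of knowledge: 4 points
Breadth of knowledge: 4 points
Practicality: 5 points
Chapter division: 8 points
Structure and layout: 5 points
Novelty and originality: 9 points
Emotional resonance: 9 points
Engagement: 3 points
Real-world relevance: 5 points
Immersion: 4 points
Factual accuracy: 7 points
Cultural contribution: 6 points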