Re: [Question] Using CUDA shared memory for element-wise multiplication is slower than glo …
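The main problem is in the Matrix_Point_Multiplication_SM you wrote.
What is the for loop there for?
Work that only needs to be done once shouldn't be repeated several times.
I changed Matrix_Point_Multiplication_SM to the version below.
Launch it with: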
// The tiles are declared statically inside the kernel, so no dynamic
// shared-memory size is passed as a third launch argument.
Matrix_Point_Multiplication_SM<<<blocks, threads>>>( d_input1, d_input2, d_output_s );
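// ------------------- Using shared memory ------------------- //
__global__ void Matrix_Point_Multiplication_SM( float *Ma, float *Nb, float *Pc )
{
    // One BLOCK_SIZE x BLOCK_SIZE tile per block for each input matrix
    __shared__ float Msm[ BLOCK_SIZE ][ BLOCK_SIZE ];
    __shared__ float Nsm[ BLOCK_SIZE ][ BLOCK_SIZE ];

    int ty = threadIdx.y;
    int tx = threadIdx.x;
    int row = blockIdx.y * BLOCK_SIZE;   // first row of this block's tile
    int col = blockIdx.x * BLOCK_SIZE;   // first column of this block's tile

    // Skip threads that fall outside the NNy x NNx matrix
    if( (row+ty < NNy) && (col+tx < NNx) )
    {
        // Copy this thread's elements into shared memory
        Msm[ ty ][ tx ] = Ma[ (row+ty)*NNx + (col+tx) ];
        Nsm[ ty ][ tx ] = Nb[ (row+ty)*NNx + (col+tx) ];

        // Multiply from shared memory. Each thread reads only the elements
        // it wrote itself, so no __syncthreads() is needed here.
        Pc[ (row+ty)*NNx + (col+tx) ] = Msm[ ty ][ tx ] * Nsm[ ty ][ tx ];
    }
}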
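The launch above uses blocks and threads without showing how they are sized, and the bounds check in the kernel only works if the grid covers the whole matrix. Here is a rough host-side sketch of both pieces, assuming row-major storage with NNx columns and NNy rows as in the kernel's indexing. The helper names blocks_for and point_multiply_reference are made up for illustration and are not from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of blocks needed to cover n elements with
// tiles of `block` elements. Rounds up so a partial last tile is covered;
// the kernel's bounds check then masks off the extra threads.
inline unsigned int blocks_for(unsigned int n, unsigned int block)
{
    return (n + block - 1) / block;
}

// Hypothetical CPU reference: element-wise product of two row-major
// matrices with nx columns and ny rows, for checking the GPU output.
inline std::vector<float> point_multiply_reference(const std::vector<float>& a,
                                                   const std::vector<float>& b,
                                                   std::size_t nx, std::size_t ny)
{
    assert(a.size() == nx * ny && b.size() == nx * ny);
    std::vector<float> c(nx * ny);
    for (std::size_t i = 0; i < nx * ny; ++i)
        c[i] = a[i] * b[i];
    return c;
}
```

Assuming blocks and threads are dim3 values, they could be set up as dim3 threads( BLOCK_SIZE, BLOCK_SIZE ) and dim3 blocks( blocks_for( NNx, BLOCK_SIZE ), blocks_for( NNy, BLOCK_SIZE ) ). After copying d_output_s back to the host, comparing it element by element against point_multiply_reference gives a quick correctness check.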
--
※ Posted from: 批踢踢實業坊 (ptt.cc)
◆ From: 122.120.44.12