[Question] CUDA matrix multiplication: a few elements come out wrong

Board: C_and_CPP (C/C++)  Author: (C將。)  Time: 2009/12/15 23:03  Pushes: 3 (3 push, 0 boo, 10 neutral)
13 comments, 3 participants, thread 1/1
Problem:
I recently started learning CUDA and wrote a matrix multiplication exercise by following the tutorial from 國網中心 (NCHC). Some elements of the result come out wrong, and with the same input and the same code the output can differ from run to run (it alternates among a few versions).

These are my two input matrices:

M =
1.000000 3.000000 2.000000 0.000000
1.000000 0.000000 2.000000 2.000000
2.000000 0.000000 1.000000 1.000000
1.000000 3.000000 1.000000 3.000000

N =
3.000000 2.000000 3.000000 0.000000
3.000000 0.000000 2.000000 1.000000
0.000000 2.000000 1.000000 0.000000
2.000000 3.000000 3.000000 2.000000

Expected correct result:

P =
12.000000 6.000000 11.000000 3.000000
7.000000 12.000000 11.000000 4.000000
8.000000 9.000000 10.000000 2.000000
18.000000 13.000000 19.000000 9.000000

Wrong results produced by the program (it mostly alternates among these versions):

P =
12.000000 6.000000 11.000000 3.000000
8.000000 5.000000 11.000000 4.000000
8.000000 5.000000 10.000000 0.000000
9.000000 5.000000 17.000000 5.000000

P =
12.000000 6.000000 11.000000 3.000000
8.000000 5.000000 11.000000 4.000000
8.000000 9.000000 8.000000 3.000000
18.000000 13.000000 5.000000 3.000000

P =
12.000000 6.000000 11.000000 3.000000
8.000000 5.000000 11.000000 4.000000
8.000000 9.000000 10.000000 2.000000
7.000000 4.000000 19.000000 9.000000

P =
12.000000 6.000000 7.000000 6.000000
7.000000 12.000000 7.000000 6.000000
8.000000 9.000000 8.000000 3.000000
18.000000 13.000000 5.000000 3.000000

Development platform:
Windows XP, VS2005
GeForce GT 220, CUDA 2.3

Problematic code:
I think the code itself should be fine, but I'm posting the part that processes the data anyway, in case I wrote something wrong:

    // Matrix multiplication kernel - per thread code
    __global__ void MatrixMulKernel(float* Md, float* Nd, float* Pd, int Width)
    {
        // Block ID
        int bx = blockIdx.x;
        int by = blockIdx.y;

        // Thread ID
        int tx = threadIdx.x;
        int ty = threadIdx.y;

        // Pvalue stores the element of the block sub-matrix
        // that is computed by the thread - automatic variable!
        float Pvalue = 0;

        // Loop over all the sub-matrices of M and N
        // required to compute the block sub-matrix
        for (int m = 0; m < Width/TILE_WIDTH; ++m) {
            // Get a pointer to the current sub-matrix Msub of M
            float *Mdsub = GetSubMatrix(Md, m, by, Width);

            // Get a pointer to the current sub-matrix Nsub of N
            float *Ndsub = GetSubMatrix(Nd, bx, m, Width);

            __shared__ float Mds[TILE_WIDTH][TILE_WIDTH];
            __shared__ float Nds[TILE_WIDTH][TILE_WIDTH];

            // each thread loads one element of the sub-matrix
            Mds[ty][tx] = GetMatrixElement(Mdsub, tx, ty, Width);

            // each thread loads one element of the sub-matrix
            Nds[ty][tx] = GetMatrixElement(Ndsub, tx, ty, Width);

            // synchronize to make sure the sub-matrices are loaded
            // before starting the computation
            __syncthreads();

            // each thread computes one element of the block sub-matrix
            for (int k = 0; k < TILE_WIDTH; ++k)
                Pvalue += Mds[ty][k] * Nds[k][tx];

            // Synchronize to make sure that the preceding
            // computation is done before loading two new
            // sub-matrices of M and N in the next iteration
            __syncthreads();
        }

        // Get a pointer to the block sub-matrix of P
        float *Psub = GetSubMatrix(Pd, bx, by, Width);

        // Write the block sub-matrix to device memory;
        // each thread writes one element
        SetMatrixElement(Psub, tx, ty, Pvalue, Width);
    }

    // Pointer to the TILE_WIDTH x TILE_WIDTH sub-matrix at tile column x, tile row y
    __device__ float* GetSubMatrix(float* Md, int x, int y, int Width)
    {
        return (Md + y*TILE_WIDTH*Width + x*TILE_WIDTH);
    }

    // Read element (column x, row y) of a sub-matrix stored row-major with stride Width
    __device__ float GetMatrixElement(float* Mdsub, int x, int y, int Width)
    {
        return *(Mdsub + y*Width + x);
    }

    // Write element (column x, row y) of a sub-matrix stored row-major with stride Width
    __device__ void SetMatrixElement(float* Psub, int x, int y, float Pvalue, int Width)
    {
        *(Psub + y*Width + x) = Pvalue;
    }

Additional notes:
From other posts I've read, it seems hardware-related factors can also affect the result, but I don't know how to deal with that. Any help would be appreciated, thanks!!

--
※ Posted from: 批踢踢實業坊 (ptt.cc)
◆ From: 163.22.18.83
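Because the GPU output changes between runs, a deterministic host-side reference is a convenient thing to compare it against. The following is a minimal sketch in plain C++ (host code only, so it could also live in the same .cu file). The names matMulReference and firstMismatch are hypothetical helpers that do not appear in the original post, and the row-major layout is an assumption taken from the kernel's y*Width + x indexing.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): plain CPU product
// P = M * N for square, row-major width x width matrices.
std::vector<float> matMulReference(const std::vector<float>& M,
                                   const std::vector<float>& N,
                                   int width)
{
    assert(width > 0);
    assert(M.size() == static_cast<std::size_t>(width) * width);
    assert(N.size() == static_cast<std::size_t>(width) * width);

    std::vector<float> P(M.size(), 0.0f);
    for (int row = 0; row < width; ++row) {
        for (int col = 0; col < width; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < width; ++k)
                sum += M[row * width + k] * N[k * width + col];
            P[row * width + col] = sum;
        }
    }
    return P;
}

// Hypothetical helper: returns the index of the first element where the two
// arrays differ by more than tol, or -1 if they agree everywhere.
int firstMismatch(const std::vector<float>& a,
                  const std::vector<float>& b,
                  float tol = 1e-5f)
{
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::fabs(a[i] - b[i]) > tol)
            return static_cast<int>(i);
    return -1;
}
```

One way to use it: copy the device result back to the host with cudaMemcpy, then pass it together with the output of matMulReference to firstMismatch. A return value other than -1 gives the linear index (row * Width + column) of the first element that disagrees.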

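1F (12/16 11:58): The kernel part looks fine, and it also gave the correct result when I tested it.

2F (12/16 11:58): How about using the paste site from the pinned post to share the complete code?

3F (12/16 11:59): That would make it easier to track down where the problem is.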

4F (12/16 13:15): Try running it in emu mode with the debugger on.

5F (12/16 17:23): l, I've posted the code: http://paste.plurk.com/show/105356/

6F (12/16 17:24): Sorry for the trouble, and thanks.

7F (12/16 19:25): About the dimension declaration dim3 dimBlock(Width, Width);

8F (12/16 19:25): be especially careful here.. use dim3 dimBlock(TILE_WIDTH, TILE_WIDTH);

9F (12/17 12:28): May I ask why this line needs special care?!

10F (12/17 13:29): Because your main matrix multiplication kernel is fine; the problem is in the dimension declaration.

11F (12/17 13:29): That's why I'm pointing you to that part. This kind of problem is usually hard to spot.

12F (12/17 21:39): l, sorry, I still don't quite understand this part. Are there any articles or tutorials on it?!

13F (12/18 16:41): l, I see what you mean now. The program is fixed. Thanks!! ^^
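The dimension issue raised in the comments can be explored on the host without a GPU. The sketch below, in plain C++, replays the kernel's index arithmetic (GetSubMatrix followed by SetMatrixElement) for every block and thread of a launch and counts how many threads write each element of P. It assumes TILE_WIDTH = 2 and a grid of Width/TILE_WIDTH blocks per side; neither value is stated in the thread, and LaunchReport and countWrites are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical report type (not from the original post).
struct LaunchReport {
    std::vector<int> writesPerElement; // number of threads writing each P element
    int outOfRangeWrites;              // writes whose linear offset is past the end of P
    int sharedOutOfBounds;             // threads with tx or ty outside the shared tile
};

// Replays the kernel's output indexing for a square launch:
// gridDim x gridDim blocks, each with blockDim x blockDim threads.
LaunchReport countWrites(int width, int tileWidth, int blockDim, int gridDim)
{
    assert(width > 0 && tileWidth > 0 && blockDim > 0 && gridDim > 0);

    LaunchReport report{std::vector<int>(width * width, 0), 0, 0};
    for (int by = 0; by < gridDim; ++by)
      for (int bx = 0; bx < gridDim; ++bx)
        for (int ty = 0; ty < blockDim; ++ty)
          for (int tx = 0; tx < blockDim; ++tx) {
              // Mds[ty][tx] and Nds[ty][tx] are TILE_WIDTH x TILE_WIDTH arrays.
              if (tx >= tileWidth || ty >= tileWidth)
                  ++report.sharedOutOfBounds;

              // Psub = Pd + by*TILE_WIDTH*Width + bx*TILE_WIDTH,
              // then the thread writes *(Psub + ty*Width + tx).
              int offset = (by * tileWidth + ty) * width + bx * tileWidth + tx;
              if (offset < width * width)
                  ++report.writesPerElement[offset];
              else
                  ++report.outOfRangeWrites;
          }
    return report;
}
```

Under these assumptions, countWrites(4, 2, 2, 2) models dim3 dimBlock(TILE_WIDTH, TILE_WIDTH) and reports exactly one writer per element, while countWrites(4, 2, 4, 2) models dim3 dimBlock(Width, Width) and reports several threads writing the same element, writes past the end of P, and threads indexing outside the shared tiles. Overlapping writers of that kind would be consistent with output that changes from run to run, although the poster's actual launch code is only available through the paste link.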
Article ID (AID): #1B9wLBHO (C_and_CPP)