[Question] Converting UTF-8 to decimal
Development platform (Platform): (Ex: VC++, GCC, Linux, ...)
Code::Blocks, C++
The character "A" is encoded in UTF-8 as a single byte.
Hex: 41
Decimal: 65
The character "您" is encoded in UTF-8 as three bytes.
Hex: E6 82 A8
Decimal: 230 130 168
UTF8.txt contains: 您 \n A (您 on the first line, A on the second)
I want to convert the contents of UTF8.txt into their decimal byte values.
#include <stdlib.h>
#include <fstream>
#include <iostream>
#include <cstdlib>
using namespace std;
int main(void)
{
    int x; int y;
    char txt[80] = "";
    ifstream ifile("C:\\Users\\Gon\\Desktop\\UTF8.txt", ios::binary);
    if (ifile.is_open())
    {
        while (!ifile.eof())
        {
            ifile >> txt;            // read one whitespace-separated token into txt
            cout << txt << endl;
            x = char(txt[0]);        // first byte of the token
            switch (x)
            {
            case 0-127:
                cout << "1st byte~ " << x << endl;
                break;
            case 240-247:
                cout << "1st byte~ " << x << endl;
                y = char(txt[1]);
                cout << "2nd byte~ " << y << endl;
                y = char(txt[2]);
                cout << "3rd byte~ " << y << endl;
                break;
            default:                 // print the first 6 bytes of txt
                cout << "1st byte~ " << x << endl;
                y = char(txt[1]);
                cout << "2nd byte~ " << y << endl;
                y = char(txt[2]);
                cout << "3rd byte~ " << y << endl;
                y = char(txt[3]);
                cout << "4th byte~ " << y << endl;
                y = char(txt[4]);
                cout << "5th byte~ " << y << endl;
                y = char(txt[5]);
                cout << "6th byte~ " << y << endl;
            }
        }
    }
    else
        cout << "fail to open file" << endl;
    ifile.close(); // close file
    system("pause");
    return 0;
}
The result I want is:
您
1st byte~ 230
2nd byte~ 130
3rd byte~ 168
A
1st byte~ 65
But the actual output is:
您
1st byte~ -26
2nd byte~ -126
3rd byte~ -88
4th byte~ 0
5th byte~ 0
6th byte~ 0
A
1st byte~ 65
2nd byte~ 0
3rd byte~ -88
4th byte~ 0
5th byte~ 0
6th byte~ 0
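A few questions:
1. The 1st byte of "A" is 65, so it should go to case 0-127, but it actually goes to the default case. Why?
2. "A" comes out as a single byte with the correct value, 65, but "您" comes out as three bytes whose values are completely wrong. How do I fix this?
Could someone please point out where the problem is? Thanks!!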
--
※ Posted on: 批踢踢實業坊 (ptt.cc), from: 61.231.54.192
※ Article URL: https://www.ptt.cc/bbs/C_and_CPP/M.1473664008.A.DD8.html
Push
09/12 15:15, 1F
Changed both x and y, and all the values are correct now. Thanks!!
But the problem of always ending up in the default case is still there.
→
09/12 15:21, 2F
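I just checked 洪維恩's example, and it does use this syntax.
It has: case 6 ... 8: cout << "summer" << endl; // months 6 to 8 are summer
So I changed my case 0-127 to case 0 ... 127:
and case 240-247 to case 240 ... 247:
Now "A" goes to the correct case, but "您" still goes to the wrong (default) case.
The problem is still there.
It turns out I wrote the wrong case range: it should be 224 ... 239. Sorry!!
Problem solved.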
→
09/12 15:43, 3F
→
09/12 16:30, 4F
→
09/12 16:34, 5F
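Because a Chinese character such as 您 takes 3 bytes in UTF-8.
I print 6 bytes so I can also check whether the 4th, 5th and 6th bytes are all 0; if they are, I take that to mean the UTF-8 file was read correctly.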
※ Edited by: ReiFu21 (61.231.54.192), 09/12/2016 17:37:14
→
09/12 19:15, 6F
→
09/12 19:15, 7F
→
09/13 19:28, 8F
→
09/13 19:28, 9F