Re: [Share] java nio performance tuning

Board: java  Author: sbrhsieh (十年一夢)  Time: 2013/10/06 00:48

3 comments, 2 participants; post 5 of 6 in the thread

※ Quoting dryman (dryman):
: http://www.idryman.org/blog/2013/09/28/java-fast-io-using-java-nio-api/
: [omitted]

I ran some tests of my own, so here are my numbers and observations.

Notes:
- The test file is 4*26Mb of randomly generated values (M=K^2, K=1024, b=byte).
- Where a method has a parameter named bufSize, the actual argument in the tests is 64K.
- Each figure comes from running the measured method 10 times first, then running it 10 more times and averaging those last ten runs.
- This way of testing most likely involves the OS-level cache.
- To suit the algorithm being used, some methods are given a byte array whose length equals the test file length + 64Kb.

My versions of the original author's strategy 2 and strategy 3, with their numbers:

Record: 1096 ms
Code:
    public static final void readAll(int[] arr, InputStream in, int bufSize)
            throws IOException {
        DataInputStream dataIn = new DataInputStream(
                new BufferedInputStream(in, bufSize));
        try {
            // one readInt() call per element
            for (int i = 0; i < arr.length; ++i) {
                arr[i] = dataIn.readInt();
            }
        } finally {
            dataIn.close();
        }
    }

Record: 126 ms
Code:
    public static final void readAllNIO(int[] arr, File input)
            throws IOException {
        FileChannel channel = new FileInputStream(input).getChannel();
        try {
            ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);
            int offset = 0;
            int n;
            while ((n = channel.read(buffer)) != -1) {
                buffer.flip();
                // assumes n is a multiple of 4 (see the remark below)
                buffer.asIntBuffer().get(arr, offset, n >> 2);
                buffer.clear();
                offset += n >> 2;
            }
        } finally {
            channel.close();
        }
    }

Strictly speaking, strategy 3 (readAllNIO) is not entirely correct: it assumes that every read from the channel consumes a number of bytes that is a multiple of 4. The other tests below therefore also take this assumption for granted (applied to InputStream).

I ran some further tests, and I think they show that the bottleneck of strategy 2 is not IO. Simply reading every byte of the file into a byte array, one block at a time, is actually quite fast.

Record: 78 ms
Code:
    public static final void readAllBytes(byte[] arr, InputStream in)
            throws IOException {
        in = new BufferedInputStream(in, 64 * 1024);
        try {
            int n, offset = 0;
            // read up to 64K per call
            while ((n = in.read(arr, offset, 64 * 1024)) != -1)
                offset += n;
        } finally {
            in.close();
        }
    }

With a small change so that each call reads at most 4 bytes, the per-iteration work over a very large number of iterations costs far more time, about 70% of the time taken by readAll.

Record: 758 ms
Code:
    public static final void readAllBytes3(byte[] arr, InputStream in)
            throws IOException {
        InputStream dataIn = new BufferedInputStream(in, 64 * 1024);
        int offset = 0;
        int n;
        try {
            // read at most 4 bytes per call
            while ((n = dataIn.read(arr, offset, 4)) != -1) {
                offset += n;
            }
        } finally {
            dataIn.close();
        }
    }

Implementing it with DataInputStream::readFully is a little worse still.

Record: 940 ms
Code:
    public static final void readAllBytes2(byte[] arr, InputStream in)
            throws IOException {
        DataInputStream dataIn = new DataInputStream(
                new BufferedInputStream(in, 64 * 1024));
        int offset = 0;
        try {
            while (true) {
                dataIn.readFully(arr, offset, 4);
                offset += 4;
            }
        } catch (EOFException e) {
            // end of file reached: this is the normal loop exit
        } finally {
            in.close();
        }
    }

readAllNIO reads a whole block of data at a time and composes that block into int value(s) in one batch. To make the comparison fairer, I think the non-NIO counterpart should also read a whole block at a time and then compose the whole block in one batch. Here is the case where I use an nio Buffer for the composing:

Record: 820 ms
Code:
    public static final void readAll2(int[] arr, InputStream in, int bufSize)
            throws IOException {
        InputStream dataIn = new BufferedInputStream(in, bufSize);
        byte[] block = new byte[bufSize];
        int n, offset = 0;
        try {
            while ((n = dataIn.read(block)) != -1) {
                // compose the block through a non-direct (heap) buffer view
                ByteBuffer.wrap(block, 0, n).asIntBuffer().get(
                        arr, offset, n >> 2);
                offset += n >> 2;
            }
        } finally {
            dataIn.close();
        }
    }

Finally, since readAllNIO uses a direct Buffer, perhaps the bulk get implementations of direct and non-direct Buffers differ a lot (possibly at the native level), so I wrote my own native method for the composing part. This was tested on Windows 7 with the Sun 1.6.0_45 32-bit client VM. Because the processor is little-endian, the byte order has to be handled. If byte order did not need handling, memcpy could be used directly, and performance would be almost the same as readAllBytes.

Record: 235 ms
Code:
    public static final void readAll3(int[] arr, InputStream in, int bufSize)
            throws IOException {
        InputStream dataIn = new BufferedInputStream(in, bufSize);
        byte[] block = new byte[bufSize];
        int n, offset = 0;
        try {
            while ((n = dataIn.read(block)) != -1) {
                arraycopy(block, 0, arr, offset, n >> 2);
                offset += n >> 2;
            }
        } finally {
            dataIn.close();
        }
    }

    public static final void arraycopy(byte[] src, int srcOffset,
            int[] dest, int destOffset, int numsInt) {
        if (srcOffset + numsInt * 4 > src.length)
            throw new IllegalArgumentException("source array is too short");
        if (destOffset + numsInt > dest.length)
            throw new IllegalArgumentException(
                    "destination array is too short");
        arraycopyImpl(src, srcOffset, dest, destOffset, numsInt);
    }

    private static native void arraycopyImpl(byte[] src, int srcOffset,
            int[] dest, int destOffset, int numsInt);

    /* native method implementation */
    #include <memory.h>
    #include <winsock2.h>
    #include "cc_ptt_java_IOSpeed.h"

    // RAII wrapper: pins a primitive array on construction and
    // releases it with the given mode on destruction.
    class PrimitiveArray {
    public:
        PrimitiveArray(JNIEnv *env, jarray arr, jint releaseMode)
            : _env(env), _array(arr), _address(0), _mode(releaseMode) {
            jboolean isCopy = JNI_FALSE;
            _address = env->GetPrimitiveArrayCritical(_array, &isCopy);
        }
        ~PrimitiveArray() {
            _env->ReleasePrimitiveArrayCritical(_array, _address, _mode);
        }
        operator jbyte *() const {
            return reinterpret_cast<jbyte *>(_address);
        }
    private:
        JNIEnv *_env;
        jarray _array;
        void* _address;
        jint _mode;
    };

    JNIEXPORT void JNICALL Java_cc_ptt_java_IOSpeed_arraycopyImpl(JNIEnv *env,
            jclass klass, jbyteArray srcArray, jint srcOffset,
            jintArray destArray, jint destOffset, jint numsInt) {
        PrimitiveArray source(env, srcArray, JNI_ABORT);  // read-only
        PrimitiveArray dest(env, destArray, 0);           // copy back on release
        u_long *srcP = reinterpret_cast<u_long *>(
                source + srcOffset * sizeof (jbyte));
        u_long *end = srcP + numsInt;
        u_long *destP = reinterpret_cast<u_long *>(
                dest + destOffset * sizeof(jint));
        // big-endian file data to host byte order, one 32-bit word at a time
        while (srcP < end)
            *destP++ = ntohl(*srcP++);
    }

Having seen all this, everyone can judge for themselves how much NIO really helps. As for me, in line with what I said in my earlier comments, I remain skeptical of the claim that "NIO improves performance by tens of times".

--
※ Origin: 批踢踢實業坊 (ptt.cc)
◆ From: 218.164.110.25
※ Edited: sbrhsieh, from 218.164.110.25 (10/06 01:05)
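
A follow-up on the remark that readAllNIO assumes every channel read returns a multiple of 4 bytes: below is a minimal sketch of one way to drop that assumption, by carrying leftover bytes over to the next read with ByteBuffer.compact(). The class name ReadIntsCompact, the method name readAllNIOCompact and the returned count are made up for illustration and are not part of the original tests. The sketch was not benchmarked, so no timing is claimed for it.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper class, for illustration only.
class ReadIntsCompact {
    /**
     * Fills arr with big-endian ints read from the file and returns how
     * many ints were stored. Trailing bytes that do not form a complete
     * int are ignored.
     */
    static int readAllNIOCompact(int[] arr, File input) throws IOException {
        FileInputStream in = new FileInputStream(input);
        try {
            FileChannel channel = in.getChannel();
            ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);
            int offset = 0;
            while (offset < arr.length && channel.read(buffer) != -1) {
                buffer.flip();
                // The view covers only the complete ints currently buffered.
                IntBuffer ints = buffer.asIntBuffer();
                int count = Math.min(ints.remaining(), arr.length - offset);
                ints.get(arr, offset, count);
                offset += count;
                // Skip the consumed bytes, then move any leftover bytes
                // to the front so the next read appends after them.
                buffer.position(buffer.position() + count * 4);
                buffer.compact();
            }
            return offset;
        } finally {
            in.close(); // also closes the channel
        }
    }
}
```

The only differences from readAllNIO are that the number of ints is taken from the buffer contents rather than from n >> 2, and that clear() is replaced by position() plus compact(), so a partial int at the end of one read is completed by the next one.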