Reading files quickly
Java offers several classes for reading files: with or without buffering, with random access, with or without thread safety, and with memory mapping. Some of these are much faster than others.
FileInputStream with byte reading
FileInputStream opens a file by name or by File object. Its read() method reads the file one byte at a time. FileInputStream uses synchronization to make it thread safe.
FileInputStream f = new FileInputStream(name);
int b;
long checkSum = 0L;
// read() returns the next byte (0..255), or -1 at end of file.
while ((b = f.read()) != -1) {
    checkSum += b;
}
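The snippets in this article leave out imports, exception handling, and closing the stream. A minimal self-contained sketch of the byte-at-a-time loop might look like the following. The class name ByteReadChecksum and the method name checksum are hypothetical, chosen only for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical wrapper class; only the read loop comes from the article.
class ByteReadChecksum {
    // Sums every byte of the file, issuing one read() call per byte.
    static long checksum(String name) throws IOException {
        // try-with-resources closes the stream even if read() throws.
        try (FileInputStream f = new FileInputStream(name)) {
            long checkSum = 0L;
            int b;
            // read() returns the next byte as 0..255, or -1 at end of file.
            while ((b = f.read()) != -1) {
                checkSum += b;
            }
            return checkSum;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(checksum(args[0]));
    }
}
```

The same try-with-resources pattern applies to every other stream and channel shown here, since they all implement Closeable.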
FileInputStream with byte array reading
FileInputStream performs an I/O operation on each read and synchronizes on all method calls to make it thread safe. To reduce this overhead, read multiple bytes at a time into a byte array.
FileInputStream f = new FileInputStream(name);
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead;
// read(...) returns the number of bytes placed in barray, or -1 at end of file.
while ((nRead = f.read(barray, 0, SIZE)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
}
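One detail is worth checking when comparing this loop with the byte-at-a-time version: read() returns each byte as an int in the range 0 to 255, whereas the elements of a byte[] are signed (-128 to 127). For a file containing bytes of 0x80 or above, summing barray[i] directly therefore gives a different total. A sketch of an array-based loop that masks each element so that both variants agree follows; the class and method names are hypothetical, and the 8192-byte buffer size is an arbitrary assumption.

```java
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical class for illustration.
class ArrayReadChecksum {
    static final int SIZE = 8192; // assumed buffer size

    static long checksum(String name) throws IOException {
        try (FileInputStream f = new FileInputStream(name)) {
            byte[] barray = new byte[SIZE];
            long checkSum = 0L;
            int nRead;
            while ((nRead = f.read(barray, 0, SIZE)) != -1) {
                for (int i = 0; i < nRead; i++) {
                    // Mask to 0..255 so the sum matches the read() variant.
                    checkSum += barray[i] & 0xFF;
                }
            }
            return checkSum;
        }
    }
}
```

Whether the signed or unsigned sum is wanted depends on the application; the point is only that the two loops are not interchangeable without the mask.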
BufferedInputStream with byte reading
BufferedInputStream handles the buffering for FileInputStream. It wraps the input stream, creates an internal byte array (usually 8 KB), and fills it with larger reads from the underlying stream. The read() method then returns each byte from the buffer. BufferedInputStream uses synchronization to be thread safe.
BufferedInputStream f = new BufferedInputStream(
        new FileInputStream(name));
int b;
long checkSum = 0L;
// Each read() is served from the internal buffer, not from the file.
while ((b = f.read()) != -1) {
    checkSum += b;
}
BufferedInputStream with byte array reading
BufferedInputStream synchronizes on all method calls to make it thread safe. To reduce the synchronization and method call overhead, make fewer read() calls by reading multiple bytes at a time.
BufferedInputStream f = new BufferedInputStream(
        new FileInputStream(name));
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead;
while ((nRead = f.read(barray, 0, SIZE)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
}
RandomAccessFile with byte reading
RandomAccessFile opens a file by name or by File object. It can read, write, or both, at any position you choose within the file. The read() method reads the next byte from the current file position. RandomAccessFile is thread safe.
// The constructor requires an access mode; "r" opens the file read-only.
RandomAccessFile f = new RandomAccessFile(name, "r");
int b;
long checkSum = 0L;
while ((b = f.read()) != -1) {
    checkSum += b;
}
RandomAccessFile with byte array reading
Like FileInputStream, RandomAccessFile performs an I/O operation on every read and synchronizes on all method calls to be thread safe. To reduce this overhead, make fewer method calls by reading the bytes into an array and processing them from the array.
// The constructor requires an access mode; "r" opens the file read-only.
RandomAccessFile f = new RandomAccessFile(name, "r");
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead;
while ((nRead = f.read(barray, 0, SIZE)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
}
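The loops above read the file front to back, so they do not exercise what sets RandomAccessFile apart: moving the file pointer with seek(). A small sketch, assuming a hypothetical helper readAt that returns up to len bytes starting at a given offset:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

// Hypothetical class for illustration.
class RandomAccessRead {
    // Reads up to len bytes starting at offset; returns fewer at end of file.
    static byte[] readAt(String name, long offset, int len) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(name, "r")) {
            f.seek(offset); // move the file pointer before reading
            byte[] buf = new byte[len];
            int total = 0;
            // A single read() may return fewer bytes than requested, so loop.
            while (total < len) {
                int n = f.read(buf, total, len - total);
                if (n == -1) {
                    break; // end of file
                }
                total += n;
            }
            return Arrays.copyOf(buf, total);
        }
    }
}
```

For example, readAt(name, 1024, 16) would return the 16 bytes that start 1 KB into the file, without reading anything before them.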
FileChannel with ByteBuffer and byte retrieval
FileInputStream and RandomAccessFile can return a FileChannel for lower-level I/O operations. FileChannel's read() method fills a ByteBuffer created using the allocate() method of the ByteBuffer class. ByteBuffer's get() method retrieves the next byte from the buffer. ByteBuffer is not thread safe and uses no synchronization; FileChannel itself is documented as safe for use by multiple concurrent threads.
FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocate(SIZE);
long checkSum = 0L;
int nRead;
while ((nRead = ch.read(bb)) != -1) {
    if (nRead == 0) {
        continue;
    }
    // Expose exactly the bytes that were just read.
    bb.position(0);
    bb.limit(nRead);
    while (bb.hasRemaining()) {
        checkSum += bb.get();
    }
    bb.clear();
}
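The position(0) / limit(nRead) pair works here because the buffer is cleared before every read. The more common idiom is flip(), which sets the limit to the current position and the position to zero, so it stays correct even when the buffer already held data before the read. A sketch under stated assumptions (hypothetical class name, arbitrary SIZE, and bytes masked to 0..255 so the sum matches the stream-based read() loops):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical class for illustration.
class ChannelFlipChecksum {
    static final int SIZE = 8192; // assumed buffer size

    static long checksum(String name) throws IOException {
        try (FileInputStream f = new FileInputStream(name);
             FileChannel ch = f.getChannel()) {
            ByteBuffer bb = ByteBuffer.allocate(SIZE);
            long checkSum = 0L;
            while (ch.read(bb) != -1) {
                bb.flip(); // limit = bytes written so far, position = 0
                while (bb.hasRemaining()) {
                    // get() returns a signed byte; mask it to 0..255.
                    checkSum += bb.get() & 0xFF;
                }
                bb.clear(); // make the whole buffer writable again
            }
            return checkSum;
        }
    }
}
```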
FileChannel with ByteBuffer and byte array retrieval
To reduce the overhead of one method call per byte, retrieve an array of bytes at a time. The array and the ByteBuffer can have different sizes.
FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocate(BIGSIZE);
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead, nGet;
while ((nRead = ch.read(bb)) != -1) {
    if (nRead == 0) {
        continue;
    }
    bb.position(0);
    bb.limit(nRead);
    // Drain the buffer in chunks of at most SIZE bytes.
    while (bb.hasRemaining()) {
        nGet = Math.min(bb.remaining(), SIZE);
        bb.get(barray, 0, nGet);
        for (int i = 0; i < nGet; i++) {
            checkSum += barray[i];
        }
    }
    bb.clear();
}
FileChannel with array ByteBuffer and byte array access
A ByteBuffer created using the allocate() method uses internal storage to hold the bytes. Instead, call the wrap() method to wrap a ByteBuffer around your own byte array. This lets you access the array directly after each read, reducing both method call overhead and data copying.
FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
byte[] barray = new byte[SIZE];
ByteBuffer bb = ByteBuffer.wrap(barray);
long checkSum = 0L;
int nRead;
while ((nRead = ch.read(bb)) != -1) {
    // The channel wrote straight into barray; no get() calls needed.
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
    bb.clear();
}
FileChannel with direct allocation of ByteBuffer
A ByteBuffer created with the allocateDirect() method can use native storage in the JVM or operating system directly. This can avoid copying the data through an intermediate buffer on its way to your application, saving some overhead.
FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocateDirect(SIZE);
long checkSum = 0L;
int nRead;
while ((nRead = ch.read(bb)) != -1) {
    bb.position(0);
    bb.limit(nRead);
    while (bb.hasRemaining()) {
        checkSum += bb.get();
    }
    bb.clear();
}
FileChannel with direct allocation of ByteBuffer and byte array retrieval
Here too, you can retrieve byte arrays to reduce method call overhead. The buffer size may differ from the array size.
FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocateDirect(BIGSIZE);
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead, nGet;
while ((nRead = ch.read(bb)) != -1) {
    if (nRead == 0) {
        continue;
    }
    bb.position(0);
    bb.limit(nRead);
    // Copy out of the direct buffer in chunks of at most SIZE bytes.
    while (bb.hasRemaining()) {
        nGet = Math.min(bb.remaining(), SIZE);
        bb.get(barray, 0, nGet);
        for (int i = 0; i < nGet; i++) {
            checkSum += barray[i];
        }
    }
    bb.clear();
}
FileChannel with MappedByteBuffer and byte retrieval
FileChannel's map() method can return a MappedByteBuffer that maps part or all of a file into the application's address space. This gives more direct access to the file without an intermediate buffer. Call the get() method of MappedByteBuffer to retrieve the next byte.
FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
// MapMode is a nested class of FileChannel, so it is named through the type.
MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY,
        0L, ch.size());
long checkSum = 0L;
while (mb.hasRemaining()) {
    checkSum += mb.get();
}
FileChannel with MappedByteBuffer and byte array reading
As before, retrieve byte arrays to reduce method call overhead.
FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
// MapMode is a nested class of FileChannel, so it is named through the type.
MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY,
        0L, ch.size());
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nGet;
while (mb.hasRemaining()) {
    nGet = Math.min(mb.remaining(), SIZE);
    mb.get(barray, 0, nGet);
    for (int i = 0; i < nGet; i++) {
        checkSum += barray[i];
    }
}
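Both mapped examples map the whole file in one call. FileChannel.map() rejects sizes above Integer.MAX_VALUE, so a file of 2 GB or more has to be mapped in pieces. A sketch of that approach follows; the class name, the chunk parameter (the window size in bytes), and the SIZE value are assumptions made for illustration, and bytes are masked to 0..255.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical class for illustration.
class MappedChunkChecksum {
    static final int SIZE = 8192; // assumed array size

    // Maps the file in windows of at most `chunk` bytes (chunk must be > 0).
    static long checksum(String name, int chunk) throws IOException {
        try (FileInputStream f = new FileInputStream(name);
             FileChannel ch = f.getChannel()) {
            byte[] barray = new byte[SIZE];
            long checkSum = 0L;
            long fileSize = ch.size();
            for (long pos = 0; pos < fileSize; pos += chunk) {
                // The last window may be shorter than chunk.
                long len = Math.min((long) chunk, fileSize - pos);
                MappedByteBuffer mb =
                        ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (mb.hasRemaining()) {
                    int nGet = Math.min(mb.remaining(), SIZE);
                    mb.get(barray, 0, nGet);
                    for (int i = 0; i < nGet; i++) {
                        checkSum += barray[i] & 0xFF;
                    }
                }
            }
            return checkSum;
        }
    }
}
```

In practice the window would be large, for instance a few hundred megabytes, since each map() call has a setup cost of its own.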
FileReader and BufferedReader
Both classes read characters instead of bytes. For this reason they need to transform the bytes into characters, taking more time than any of the strategies shown above.
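For comparison, a character-based read might be sketched as follows. The class name, the buffer size, and the choice of UTF-8 are assumptions; the article does not say which charset or reader configuration was measured. InputStreamReader is used here so the charset is explicit; FileReader is its convenience subclass for files.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical class for illustration.
class ReaderChecksum {
    static final int SIZE = 8192; // assumed char array size

    // Sums the char values of the file after decoding it as UTF-8.
    static long checksum(String name) throws IOException {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new FileInputStream(name), StandardCharsets.UTF_8))) {
            char[] carray = new char[SIZE];
            long checkSum = 0L;
            int nRead;
            // Every char returned here has gone through the charset decoder.
            while ((nRead = r.read(carray, 0, SIZE)) != -1) {
                for (int i = 0; i < nRead; i++) {
                    checkSum += carray[i];
                }
            }
            return checkSum;
        }
    }
}
```

Note that this sums decoded characters, not raw bytes, so for non-ASCII text the result differs from the byte-based checksums.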
Fastest
If we had to pick the fastest strategy, it would be one of these:
- FileChannel with MappedByteBuffer and byte array reading.
- FileChannel with direct allocation of ByteBuffer and byte array retrieval.
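To check the ranking on your own machine, a crude timing harness can be sketched as below. Everything in it (the ReadTimer class, the Strategy interface, the 8192-byte buffer, the best-of-N approach) is an assumption for illustration and not the method used to produce the ranking above. It makes no attempt to control JIT warm-up or the operating system's file cache, so treat the numbers as rough indications only.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical harness for illustration.
class ReadTimer {
    // A read strategy takes a file name and returns a checksum.
    interface Strategy {
        long checksum(String name) throws IOException;
    }

    static long byteAtATime(String name) throws IOException {
        try (FileInputStream f = new FileInputStream(name)) {
            long sum = 0L;
            int b;
            while ((b = f.read()) != -1) {
                sum += b;
            }
            return sum;
        }
    }

    static long bufferedArray(String name) throws IOException {
        try (BufferedInputStream f =
                     new BufferedInputStream(new FileInputStream(name))) {
            byte[] barray = new byte[8192];
            long sum = 0L;
            int nRead;
            while ((nRead = f.read(barray, 0, barray.length)) != -1) {
                for (int i = 0; i < nRead; i++) {
                    sum += barray[i] & 0xFF;
                }
            }
            return sum;
        }
    }

    // Returns {checksum, best elapsed nanoseconds over `runs` runs (runs >= 1)}.
    static long[] time(Strategy s, String name, int runs) throws IOException {
        long best = Long.MAX_VALUE;
        long sum = 0L;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            sum = s.checksum(name);
            best = Math.min(best, System.nanoTime() - start);
        }
        return new long[] {sum, best};
    }

    public static void main(String[] args) throws IOException {
        long[] a = time(ReadTimer::byteAtATime, args[0], 5);
        long[] b = time(ReadTimer::bufferedArray, args[0], 5);
        System.out.printf("byte-at-a-time: %d ns (sum %d)%n", a[1], a[0]);
        System.out.printf("buffered array: %d ns (sum %d)%n", b[1], b[0]);
    }
}
```

Returning the checksum alongside the time makes it easy to confirm that two strategies actually read the same data before comparing their speed.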