What is the best (fastest) way to read a file from a web server?


I need to read a file from a web server, but storing its content in a byte array is taking too long. Does anyone know a faster way to do this? Here is my code. Thanks in advance.

try {
    url = new URL(surl);

    urlConnection = (HttpURLConnection) url.openConnection();
    InputStream input = new BufferedInputStream(urlConnection.getInputStream());
    int b = input.read();
    List<Byte> bytes = new LinkedList<Byte>();
    while (b != -1) {
        bytes.add((byte) b);
        b = input.read();
    }
    byte[] array = new byte[bytes.size()];

    // HERE IS THE PROBLEM, IT IS TAKING TOO LONG!
    for (int i = 0; i < bytes.size(); i++) {
        array[i] = bytes.get(i).byteValue();
    }

    String str = new String(array);
    myreturn = str;
} catch (IOException e) {
    // error handling was not shown in the original code
}

asked by anonymous 26.10.2015 / 15:15

2 answers


Reading files quickly

In Java, there are several classes for reading files: with and without buffering, with random access, thread-safe, and memory-mapped. Some of them are much faster than others.

FileInputStream with byte reading

FileInputStream opens a file by name or from a File object. Its read() method reads the file one byte at a time.

FileInputStream uses synchronization to make it thread safe.

FileInputStream f = new FileInputStream(name);
int b;
long checkSum = 0L;
while ((b = f.read()) != -1) {
    checkSum += b;
}

FileInputStream with byte array reading

FileInputStream performs an I/O operation on every read and synchronizes all method calls to be thread safe. To reduce this overhead, read multiple bytes at a time into a byte array buffer.

FileInputStream f = new FileInputStream(name);
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead;
while ((nRead = f.read(barray, 0, SIZE)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
}

BufferedInputStream with byte reading

BufferedInputStream does the buffering of a FileInputStream for you. It wraps the input stream, creates an internal byte array (8 KB by default), and fills it as you read. The read() method then returns each byte from that buffer.

BufferedInputStream uses synchronization to be thread safe.

BufferedInputStream f = new BufferedInputStream(
    new FileInputStream(name));
int b;
long checkSum = 0L;
while ((b = f.read()) != -1) {
    checkSum += b;
}

BufferedInputStream with byte array reading

BufferedInputStream synchronizes its methods to be thread safe. To reduce synchronization and method-call overhead, make fewer calls to read() by reading multiple bytes at a time.

BufferedInputStream f = new BufferedInputStream(
    new FileInputStream(name));
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead;
while ((nRead = f.read(barray, 0, SIZE)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
}

RandomAccessFile with byte reading

RandomAccessFile opens a file by name or from a File object. It can read, write, or both, at any position you choose within the file. The read() method reads the byte at the current file position.

RandomAccessFile is thread safe.

RandomAccessFile f = new RandomAccessFile(name, "r");
int b;
long checkSum = 0L;
while ((b = f.read()) != -1) {
    checkSum += b;
}

RandomAccessFile with byte array reading

Like FileInputStream, RandomAccessFile suffers from an I/O operation on every access and from synchronization on all method calls for thread safety. To reduce this bottleneck, make fewer method calls by reading the bytes into an array and processing them from the array.

RandomAccessFile f = new RandomAccessFile(name, "r");
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead;
while ((nRead = f.read(barray, 0, SIZE)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
}

FileChannel with ByteBuffer and byte gets

FileInputStream and RandomAccessFile can return a FileChannel for lower-level I/O operations. The read() method of FileChannel fills a ByteBuffer created with the allocate() method of the ByteBuffer class. The get() method of ByteBuffer retrieves the next byte from the buffer.

FileChannel and ByteBuffer are not thread safe.

FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocate(SIZE);
long checkSum = 0L;
int nRead;
while ((nRead = ch.read(bb)) != -1) {
    if (nRead == 0) {
        continue;
    }
    bb.position(0);
    bb.limit(nRead);
    while (bb.hasRemaining()) {
        checkSum += bb.get();
    }
    bb.clear();
}

FileChannel with ByteBuffer and byte array gets

To reduce the overhead of one method call per byte, retrieve an array of bytes at a time. The array and the ByteBuffer can have different sizes.

FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocate(BIGSIZE);
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead, nGet;
while ((nRead = ch.read(bb)) != -1) {
    if (nRead == 0) {
        continue;
    }
    bb.position(0);
    bb.limit(nRead);
    while(bb.hasRemaining()) {
        nGet = Math.min(bb.remaining(), SIZE);
        bb.get(barray, 0, nGet);
        for (int i = 0; i < nGet; i++) {
            checkSum += barray[i];
        }
    }
    bb.clear();
}

FileChannel with array-backed ByteBuffer and byte array access

A ByteBuffer created with the allocate() method uses its own internal storage for the bytes. Instead, call the wrap() method to wrap a ByteBuffer around your own byte array. This lets you access the array directly after each read, reducing method-call overhead and data copying.

FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
byte[] barray = new byte[SIZE];
ByteBuffer bb = ByteBuffer.wrap(barray);
long checkSum = 0L;
int nRead;
while ((nRead = ch.read(bb)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
    bb.clear();
}

FileChannel with direct allocation of ByteBuffer

A ByteBuffer created with the allocateDirect() method may use native memory outside the Java heap, closer to the operating system. This can reduce the copying of data into your application's arrays, avoiding some overhead.

FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocateDirect(SIZE);
long checkSum = 0L;
int nRead;
while ((nRead = ch.read(bb)) != -1) {
    bb.position(0);
    bb.limit(nRead);
    while (bb.hasRemaining()) {
        checkSum += bb.get();
    }
    bb.clear();
}

FileChannel with direct allocation of ByteBuffer and byte array gets

Of course, you can also retrieve byte arrays to reduce method-call overhead. The buffer size may differ from the array size.

FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocateDirect(BIGSIZE);
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead, nGet;
while ((nRead = ch.read(bb)) != -1) {
    if (nRead == 0) {
        continue;
    }
    bb.position(0);
    bb.limit(nRead);
    while(bb.hasRemaining()) {
        nGet = Math.min(bb.remaining(), SIZE);
        bb.get(barray, 0, nGet);
        for (int i = 0; i < nGet; i++) {
            checkSum += barray[i];
        }
    }
    bb.clear();
}

FileChannel with MappedByteBuffer and byte gets

The map() method of FileChannel returns a MappedByteBuffer that maps part or all of the file into the application's memory space. This gives more direct access to the file without an intermediate buffer. Call the get() method of MappedByteBuffer to retrieve the next byte.

FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY,
    0L, ch.size());
long checkSum = 0L;
while (mb.hasRemaining()) {
    checkSum += mb.get();
}

FileChannel with MappedByteBuffer and byte array gets

And again, retrieve byte arrays to reduce method-call overhead.

FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY,
    0L, ch.size());
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nGet;
while (mb.hasRemaining()) {
    nGet = Math.min(mb.remaining(), SIZE);
    mb.get(barray, 0, nGet);
    for (int i = 0; i < nGet; i++) {
        checkSum += barray[i];
    }
}

FileReader and BufferedReader

Both classes read characters instead of bytes. For this reason they need to transform the bytes into characters, taking more time than any of the strategies shown above.
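For comparison with the byte-based strategies, here is a minimal sketch of the character-based approach, reading chunks of chars into an array through a BufferedReader. It assumes the file is UTF-8 encoded; the class name CharChecksum and the 8 KB chunk size are hypothetical. The extra cost is the decoding step that turns bytes into chars before the loop ever sees them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class CharChecksum {

    // Decodes the file as UTF-8 (an assumption) and sums the char values,
    // reading up to 8192 chars per call instead of one char at a time.
    static long checkSum(Path file) throws IOException {
        long checkSum = 0L;
        char[] carray = new char[8192];
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int nRead;
            while ((nRead = reader.read(carray, 0, carray.length)) != -1) {
                for (int i = 0; i < nRead; i++) {
                    checkSum += carray[i];
                }
            }
        }
        return checkSum;
    }
}
```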

Fastest

Of the strategies above, the fastest is one of these:

  • FileChannel with MappedByteBuffer and byte array gets.
  • FileChannel with direct allocation of ByteBuffer and byte array gets.
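The snippets above leave out opening and closing the file. A self-contained sketch of the first option, opening the channel with FileChannel.open() and closing it with try-with-resources, might look like this. The class name MappedChecksum and the 8 KB chunk size are assumptions, not values from the measurements above.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedChecksum {

    private static final int SIZE = 8192; // chunk size: an arbitrary assumption

    // Maps the whole file read-only and sums its (signed) bytes,
    // copying SIZE bytes at a time out of the mapped buffer.
    static long checkSum(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, 0L, ch.size());
            byte[] barray = new byte[SIZE];
            long checkSum = 0L;
            while (mb.hasRemaining()) {
                int nGet = Math.min(mb.remaining(), SIZE);
                mb.get(barray, 0, nGet);
                for (int i = 0; i < nGet; i++) {
                    checkSum += barray[i];
                }
            }
            return checkSum;
        }
    }
}
```

A single map() call is limited to Integer.MAX_VALUE bytes, so a larger file would have to be mapped in several regions.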
26.10.2015 / 16:15

TL;DR

The fastest way depends on the purpose of the program. If the goal is to load everything into memory, just read the stream more efficiently.

Reading file in String

The fastest way I know to load a local file into a String is as simple as this:

String conteudo = new String(Files.readAllBytes(Paths.get("meu.txt")));
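The one-liner above decodes the bytes with the platform's default charset. A minimal sketch that passes the charset explicitly, assuming the file is UTF-8 (the class name LocalFileToString is hypothetical); on Java 11 and later, Files.readString(Path) does the same UTF-8 read in one call.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LocalFileToString {

    // Loads the whole file into memory and decodes it as UTF-8.
    // UTF-8 is an assumption: pass the file's real encoding instead.
    static String read(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```

Usage: `String conteudo = LocalFileToString.read(Paths.get("meu.txt"));`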

However, this does not work for remote files accessed via HTTP.

Reading URL in String

In this case, the fastest approach is to keep using an InputStream but read the bytes more efficiently.

As reported elsewhere, the most efficient method is to use sun.misc.IOUtils.readFully():

InputStream input = new URL("http://www.textfiles.com/humor/mel.txt").openStream();
String conteudo = new String(IOUtils.readFully(input, -1, true));

Risks and alternatives

Of course, using an internal implementation of a proprietary JDK is not always a good idea. The method may change or cease to exist in some future release.

The good news is that it is easy to replace with an alternative. One of them is the Apache Commons IO library, whose IOUtils.toString() method also does the work in one step:

String conteudo = IOUtils.toString(input, "UTF-8");

The Google Guava library does something similar with ByteStreams.toByteArray():

String conteudo = new String(ByteStreams.toByteArray(input));

In Java 9, no additional library will be needed, as the InputStream class will gain methods for bulk byte copying, such as readAllBytes() and transferTo().
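With Java 9 or later, a minimal sketch using only InputStream.readAllBytes() could look like this. UTF-8 is an assumption (ideally the charset would come from the response's Content-Type header), and the class name UrlToString is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class UrlToString {

    // Java 9+: readAllBytes() reads until the end of the stream.
    // Decoding as UTF-8 is an assumption.
    static String read(InputStream input) throws IOException {
        return new String(input.readAllBytes(), StandardCharsets.UTF_8);
    }

    // Opens the URL, reads the whole body, and closes the stream.
    static String read(URL url) throws IOException {
        try (InputStream input = url.openStream()) {
            return read(input);
        }
    }
}
```

Usage: `String conteudo = UrlToString.read(URI.create("http://www.textfiles.com/humor/mel.txt").toURL());`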

Considerations

First, you don't need the absolute fastest method, because the performance bottleneck will certainly be the download itself. So I would recommend using a library rather than the slightly faster internal JDK method.

Second, the current implementation is slow because it uses resources inefficiently: it reads everything into a list, copies it all into an array, and then copies it all again into a String. That takes at least 3 times more memory than necessary. Worse, LinkedList.get(i) walks the list node by node on every call, so the copy loop marked as the problem takes quadratic time.
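Before Java 9, and without any library, the question's code can be fixed by copying the stream in chunks into a ByteArrayOutputStream instead of one boxed Byte per list node. A minimal sketch; the class and method names are hypothetical, and UTF-8 is an assumed encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class StreamToString {

    // Reads up to 8 KB per call and appends it to a growable byte array.
    static byte[] readAll(InputStream input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int nRead;
        while ((nRead = input.read(buffer)) != -1) {
            out.write(buffer, 0, nRead);
        }
        return out.toByteArray();
    }

    // Hypothetical replacement for the question's code; assumes an http(s) URL
    // and a UTF-8 response body.
    static String download(String surl) throws IOException {
        HttpURLConnection urlConnection =
                (HttpURLConnection) URI.create(surl).toURL().openConnection();
        try (InputStream input = urlConnection.getInputStream()) {
            return new String(readAll(input), StandardCharsets.UTF_8);
        } finally {
            urlConnection.disconnect();
        }
    }
}
```

Each byte is copied once into the output buffer and once more by toByteArray(), with no per-byte objects and no list traversal.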

Third, we often don't need to load the entire file into memory at all. If, after this routine, you write the content to a file, it would be more efficient to read and write at the same time. A very simple way is to use IOUtils.copy from the Apache library.
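If you prefer to stay within the JDK, Files.copy(InputStream, Path, CopyOption...) (Java 7+) also streams the data to disk in chunks. This is a swapped-in JDK method rather than the Apache one mentioned above; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class DownloadToFile {

    // Streams the bytes straight to the target file, so the whole content
    // never has to fit in memory. Returns the number of bytes copied.
    static long copy(InputStream input, Path target) throws IOException {
        return Files.copy(input, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Opens the URL and copies its body to the target file.
    static long download(URL url, Path target) throws IOException {
        try (InputStream input = url.openStream()) {
            return copy(input, target);
        }
    }
}
```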

And be careful with your imports, because several libraries have classes called IOUtils. In this example alone we saw two.

        
30.10.2015 / 04:50