How to check a file's type in JavaScript without relying on the extension?

4

Example:

<input type="file" name="meu arquivo">

I know there is a way to read the extension from the file name, but my goal is to stop the client from changing the extension (for example, renaming a PDF to TXT) and submitting the file anyway. I want to accept .ret, .data, .txt, and other formats. Does anyone know if such madness is possible?

    
asked by anonymous 16.07.2014 / 16:16

2 answers

5

TL;DR: There is no ready-made solution for this right now.

Long answer: Yes, it is possible, but not with the tools currently available.

Most current MIME type detection solutions rely on the file extension; FileType.JS, for example, does just that.

To determine the correct content type, you need to open the file and look for the headers of known formats, also known as magic bytes:

Executables                 Mnemonic        Signature (bytes)
DOS Executable              "MZ"            0x4D 0x5A
PE32 Executable             "MZ"...."PE.."  0x4D 0x5A ... 0x50 0x45 0x00 0x00
Mach-O Executable (32 bit)  "FEEDFACE"      0xFE 0xED 0xFA 0xCE
Mach-O Executable (64 bit)  "FEEDFACF"      0xFE 0xED 0xFA 0xCF
ELF Executable              ".ELF"          0x7F 0x45 0x4C 0x46

Archives and containers     Mnemonic        Signature (bytes)
Zip Archives                "PK.."          0x50 0x4B 0x03 0x04
Rar Archives                "Rar!...."      0x52 0x61 0x72 0x21 0x1A 0x07 0x01 0x00
Ogg Container               "OggS"          0x4F 0x67 0x67 0x53
Matroska/EBML Container     N/A             0x1A 0x45 0xDF 0xA3

Image file formats          Mnemonic        Signature (bytes)
PNG Image                   ".PNG...."      0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A
BMP Image                   "BM"            0x42 0x4D
GIF Image                   "GIF87a"        0x47 0x49 0x46 0x38 0x37 0x61
GIF Image                   "GIF89a"        0x47 0x49 0x46 0x38 0x39 0x61

(More information on Wikipedia.)
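As a rough illustration of how such a table can be used, here is a minimal sketch that compares the first bytes of a buffer against a few of the signatures listed above. The names `SIGNATURES` and `detectSignature` are made up for this example, and only a subset of the table is included.

```javascript
// A few signatures copied from the table above; each entry is [label, bytes].
// Longer signatures come first so they are tried before shorter ones.
var SIGNATURES = [
  ['PNG Image', [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]],
  ['GIF Image', [0x47, 0x49, 0x46, 0x38, 0x37, 0x61]],
  ['GIF Image', [0x47, 0x49, 0x46, 0x38, 0x39, 0x61]],
  ['Zip Archive', [0x50, 0x4B, 0x03, 0x04]],
  ['ELF Executable', [0x7F, 0x45, 0x4C, 0x46]],
  ['DOS Executable', [0x4D, 0x5A]],
  ['BMP Image', [0x42, 0x4D]]
];

// Returns the label of the first signature that prefixes `bytes`
// (an array or Uint8Array), or null when nothing matches.
function detectSignature(bytes) {
  for (var i = 0; i < SIGNATURES.length; i++) {
    var label = SIGNATURES[i][0];
    var signature = SIGNATURES[i][1];
    if (bytes.length < signature.length) continue;
    var matches = signature.every(function (value, index) {
      return bytes[index] === value;
    });
    if (matches) return label;
  }
  return null;
}
```

For instance, `detectSignature(new Uint8Array([0x47, 0x49, 0x46, 0x38, 0x39, 0x61]))` returns `'GIF Image'`, while a buffer holding plain text returns `null`.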

One of the new possibilities offered by the HTML5 specification is the File API, which lets you open files and look for these magic bytes. You will still run into some problems:

  • Some formats have no magic bytes, so their content cannot be identified without some kind of parsing: for example, JavaScript versus HTML versus plain text files.

  • Some formats share the same magic bytes: for example, every ZIP-based format, such as DOCX or JAR, starts with the Zip signature.
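
To make the File API idea concrete, here is a minimal sketch for the case from the question (a PDF renamed to .txt). It assumes a browser that supports `Blob.prototype.arrayBuffer()`; the helper names `readHeader`, `startsWith` and `isPdf` are hypothetical, and the PDF signature ("%PDF-") is an addition that is not in the table above.

```javascript
// Reads only the first `length` bytes of a File or Blob.
// Resolves to a Uint8Array (assumes Blob.prototype.arrayBuffer is available).
function readHeader(file, length) {
  return file.slice(0, length).arrayBuffer().then(function (buffer) {
    return new Uint8Array(buffer);
  });
}

// True when `bytes` begins with every byte in `signature`.
function startsWith(bytes, signature) {
  if (bytes.length < signature.length) return false;
  return signature.every(function (value, index) {
    return bytes[index] === value;
  });
}

// "%PDF-" in ASCII; added for this example, not part of the table above.
var PDF_SIGNATURE = [0x25, 0x50, 0x44, 0x46, 0x2D];

// Resolves to true when the file content starts like a PDF,
// whatever extension the file name carries.
function isPdf(file) {
  return readHeader(file, PDF_SIGNATURE.length).then(function (bytes) {
    return startsWith(bytes, PDF_SIGNATURE);
  });
}
```

In a page this could be called as `isPdf(input.files[0]).then(function (result) { /* reject the upload when result is true */ })`. Because the check runs on the client, it can be bypassed, so the same test would still be needed on the server.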

16.07.2014 / 17:34
5

You can retrieve the MIME type as follows:

// MIME type reported by the browser for the first selected file
var mimeType = document.getElementById('fileUploader').files[0].type;
if (mimeType === "text/plain") { /* ... */ }
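
A slightly fuller sketch of the same idea, checking the reported type against a list of accepted types when the user picks a file. The allow-list and the function names are hypothetical. Two caveats apply: browsers typically report an empty string when they cannot determine a type (likely for extensions such as .ret), and as far as I know the reported value is usually inferred from the file name rather than from the content, so this check alone does not catch a renamed file.

```javascript
// Hypothetical allow-list; adjust to the types your application accepts.
var ALLOWED_TYPES = ['text/plain', 'application/pdf'];

// `file` is a File object (or anything with a string `type` property).
function isAllowedType(file, allowedTypes) {
  // An empty type means the browser could not determine one.
  if (!file.type) return false;
  return allowedTypes.indexOf(file.type) !== -1;
}

// Browser wiring, assuming `input` is an <input type="file"> element.
function attachTypeCheck(input) {
  input.addEventListener('change', function () {
    var file = input.files[0];
    if (file && !isAllowedType(file, ALLOWED_TYPES)) {
      input.value = ''; // clear the rejected selection
    }
  });
}
```

It would be attached with `attachTypeCheck(document.getElementById('fileUploader'))`.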


If you want to map the MIME type to an extension such as txt or zip, you can use FileTypes.JS:

var mimeType = document.getElementById('fileUploader').files[0].type;
// Map the MIME type to its usual extension with FileTypes.JS
if (Stretchr.Filetypes.extensionFor(mimeType) === "txt") { /* ... */ }
    
16.07.2014 / 16:48