How do antivirus programs scan my program?

29

I had a class in college that left me puzzled. My teacher was talking about the differences between interpreted and compiled languages, and stressed that code written in interpreted languages can be stolen, while with compiled languages this does not happen. That raised a series of doubts for me, the main one being:

If my code is compiled and nobody can tell how it was written, how do antivirus programs know that it can be dangerous?

    
asked by anonymous 19.10.2017 / 20:36

2 answers

34
  

... my teacher was talking about the differences between interpreted and compiled languages, and stressed that code written in interpreted languages can be stolen, while with compiled languages this does not happen.

I will give your teacher the benefit of the doubt and assume that the statement arrived here in this form through a game of telephone.

If you have a program on your computer, you have the source code. No exceptions.

The build process generates executables or libraries (for example, .dll files on Windows), which contain what is called "machine language" rather than human-readable text. Indeed, if you try to open these files, you will find that they are unreadable and look nothing like the source files. However, take this with you for life: there is no compiled code that cannot be decompiled.

Want an example? Use C# to generate an executable or a .dll file. Then open the file with ILSpy.

Some people believe that you can make code more "protected" with a technique called obfuscation, which scrambles the code that decompilers like the one mentioned above produce. But even obfuscation does not protect anyone against "theft", since a truly motivated and dedicated programmer can reconstruct the code anyway.

The only way to ensure that source code will never be read is to not deliver it to anyone. Leave the code on a server and grant access to your system over the internet. Only those who have access to the server's hard drive will have access to your source code. It's not 100% safe, but it's as close as you can get.

Relevant edit: Someone commented on this answer:

  

But if any program can be "decompiled", why don't we have the Windows source code, for example?

Look, child, we do. It's even quite hilarious. My favorite is Windows 2000, which has several gems written in the comments in the code. Happy reading. (This passage was struck out because that code was leaked, not obtained through reverse engineering, and comments are not included in compiled code.)

For example, and speaking of ILSpy again: many things in Windows use .NET, which currently ships with the system. You can open ILSpy and use File > Open from GAC to see the decompiled source of the platform's main libraries.

For other system libraries, you can try a C/C++ decompiler like Snowman. But try opening only small DLLs, otherwise the system hangs (to open large DLLs, you need a plugin). Tip: on Windows 8, you can try decompiling this:

  

c:\windows\system32\AltTab.dll

As for antivirus programs, they do not care about your source code. They look at the actions your program performs, regardless of how it was written. Every program interacts with the operating system through requests. For example: Windows, tell me what time it is; Linux, send the byte 00101000 to serial port 2; Solaris, write this to that memory address; and so on.
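To make the idea of "requests to the operating system" concrete, here is a minimal sketch in Python (my choice of language for illustration, not something the answer prescribes). The function names are hypothetical, and a pipe stands in for the serial port so the example runs anywhere. Whatever language a program is written in, this kind of request is what an observer outside the program gets to see.

```python
import os
import time


def ask_os_for_time() -> float:
    # time.time() ultimately asks the operating system for the current clock value.
    return time.time()


def send_byte_through_os(value: int) -> bytes:
    # os.pipe, os.write and os.read are thin wrappers around OS requests.
    # A pipe is used here as a stand-in for a device such as a serial port.
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, bytes([value]))  # "OS, send this byte"
        return os.read(read_fd, 1)          # "OS, give me one byte back"
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(ask_os_for_time())
    print(send_byte_through_os(0b00101000))
```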

Antivirus programs specifically look for programs that pull shady tricks such as:

  • trying to read the state of the browser program;
  • impersonating the user to perform operations that require action by a human being (such as pressing the OK button on Windows permission prompts);
  • forcing actions for which there is no permission;
  • sending data to known malicious web addresses;

And so on.

This involves identifying patterns and, nowadays, some artificial intelligence.

A sad realization is that from time to time I see someone asking here on SOpt how to do something that an antivirus will clearly regard as malware behavior. For example: Simulate an "ok" via command line. Too often people do not think about the consequences that certain actions would have for people's security and privacy if they were possible.

See also how Windows recognizes an application as safe: Installer recognized as a virus.

    
19.10.2017 / 20:46
22

I decided to answer because there seem to be some doubts about Renan's answer.

To be clear: antivirus software does not need to care about source code.

There are two main strategies for detecting a virus.

  • One is to look for a signature in the executable code itself: the scanner checks whether the file contains a certain sequence of bytes that is known to belong to a virus.
  • The other is to check whether there are calls to certain APIs, or code patterns, that could be used to cause harm. This is why certain applications trigger false positives.

There are certainly other strategies: the antivirus can also check things during execution or intercept certain API calls.
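The two strategies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signature name and its byte sequence are made up, the list of watched API names is a tiny sample of real Win32 function names chosen by me, and real scanners are far more elaborate than a substring search.

```python
from typing import Dict, List

# Hypothetical signature database: detection name -> byte sequence (invented for this sketch).
SIGNATURES: Dict[str, bytes] = {
    "Example.TestSig.A": bytes.fromhex("deadbeef4242"),
}

# Win32 API names a heuristic might treat as worth a closer look (illustrative sample only).
WATCHED_APIS: List[bytes] = [b"WriteProcessMemory", b"CreateRemoteThread"]


def match_signatures(data: bytes, signatures: Dict[str, bytes]) -> List[str]:
    # Strategy 1: report every known byte sequence that occurs in the binary.
    return [name for name, pattern in signatures.items() if pattern in data]


def referenced_apis(data: bytes, watched: List[bytes]) -> List[str]:
    # Strategy 2 (very simplified): report watched API names that appear in the binary.
    # A legitimate program can reference these too, hence the false positives.
    return [api.decode("ascii") for api in watched if api in data]


if __name__ == "__main__":
    sample = b"\x00\x01" + bytes.fromhex("deadbeef4242") + b"\x00CreateRemoteThread\x00"
    print(match_signatures(sample, SIGNATURES))
    print(referenced_apis(sample, WATCHED_APIS))
```

Note that neither function needs the source code; both work directly on the bytes of the compiled file.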

What Renan said is that everything an application does is available for inspection. Every instruction the processor will execute and everything the application invokes is encoded in the binary. All those bytes have a meaning that can be understood by whoever knows the encoding (the processor, for example); they are not random or encrypted. It is just a bit harder for a human to understand.
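As a rough illustration that those bytes have a fixed meaning, here is a toy decoder in Python for three x86 opcodes (B8 followed by a 32-bit little-endian value is "mov eax, imm32", C3 is "ret", 90 is "nop"). The function name is hypothetical, and a real disassembler handles thousands of encodings; this sketch only shows that the mapping from bytes to instructions is mechanical.

```python
import struct
from typing import List


def disassemble(code: bytes) -> List[str]:
    # Decode a tiny subset of x86 machine code into readable text.
    out: List[str] = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0xB8 and i + 5 <= len(code):
            # B8 is followed by a 32-bit little-endian immediate value.
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"mov eax, {imm}")
            i += 5
        elif op == 0xC3:
            out.append("ret")
            i += 1
        elif op == 0x90:
            out.append("nop")
            i += 1
        else:
            # Anything this toy decoder does not know is shown as a raw byte.
            out.append(f"db 0x{op:02x}")
            i += 1
    return out


if __name__ == "__main__":
    # Machine code for a function that returns 42.
    print(disassemble(bytes.fromhex("b82a000000c3")))
```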

  

If my code is compiled and nobody can tell how it was written, how do antivirus programs know that it can be dangerous?

You can find out how it is written (in binary form); what you cannot know is the exact source code that produced that binary.

Decompile

If you have a program on your computer, you do not have the source code, but you can get something close to the source code that generated that binary. You will not get exactly the same thing: comments and local symbol names will be missing, public symbols may even have been changed, and the exact flow will not be the same; it will just produce the same result.

Decompiling is especially feasible in languages that use bytecode and metadata. But when the code is obfuscated, it becomes much harder to get usable results.

But it is not exactly an easy process, and in most cases it is far from producing good results.

The purpose of an antivirus is not to get the source code; it is just to understand what the binary does.
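As a small illustration of why bytecode plus metadata is friendlier to decompilation, here is a sketch using Python's own bytecode and the standard `dis` module (Python is my choice for the example; the answer itself mentions C# and ILSpy). The function `check_license` is hypothetical. Its compiled form still carries the local variable names and the string constant, while the comment is gone.

```python
import dis


def check_license(key):
    # This comment exists only in the source file, not in the compiled code.
    secret = "abc123"
    return key == secret


code = check_license.__code__

# Metadata stored next to the bytecode: local names and constants survive compilation.
local_names = code.co_varnames
constants = code.co_consts

# The instruction stream itself, as a list of opcode names.
opnames = [ins.opname for ins in dis.get_instructions(check_license)]

if __name__ == "__main__":
    print(local_names)
    print(constants)
    dis.dis(check_license)
```

Native machine code usually keeps far less of this information, which is one reason decompiled C or C++ tends to look much less like the original.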

Source protection

This idea of interpreted versus compiled languages is already wrong.

It is possible to steal code from any application.

This idea of stealing source code comes from naive people and laymen. Good code is too complex for the naive to understand, and specialists have no interest in stealing it. Bad code would only be of interest to very weak programmers. Hint: the vast majority of code ever written is very crude and serves as a reference for no one. In general, people who write code want to protect themselves.

As a curiosity, the Windows code was leaked, not reverse-engineered, which is why even the comments are there.

    
24.10.2017 / 15:04