Why is "SELECT * FROM table" bad?

55

It is often said that getting all the columns of a table through the SELECT * FROM tabela command is bad practice. Any recommendations without explanation are not helpful. So ...

If it is bad practice, there must be a way to use it properly. There must be a situation where it does not cause problems. If it was always bad, there probably would not even be any use.

What problems are caused by your use?

Do these problems always occur? Is there a way around? Or are there situations that make no difference?

Are the recommendations valid for any database?

    
asked by anonymous 16.06.2014 / 21:26

8 answers

61

It's not bad / bad to use wildcards (*). It depends on your goal.

  • I want to be a good programmer (probably your case)

    A good programmer knows that he or she will need all columns in a row (for example, in a "show all data" style list) a or only a few (in a subquerie , for example) and will ONLY select the ones you need.

    As a good programmer, you know which columns will be added to your tables, and you know that the only way to make your system scalable - working without changes - is to use a wildcard (*). A framework usually uses the DB metadata to find out which columns exist, or even have it in its settings. But this information is cached by someone, whether on a host running applications, PHP, or the DB host. This task is always with someone.

But in joins , where we join different tables, columns with equal names may appear (now and in the future), and the best way to do this is by specifying column-by-column, nicknames "for columns with the same name.

  • Short money

Well, you are either someone who is responsible for the budget of the projects, or is aware that you should minimize money spent on the DB. There are servers that limit traffic in terms of queries , and others that limit quantity of traffic data . In general, you will only be aware of (*) when you can not traffic many bits, or even when the requirement-gathering step (steps prior to programming) has established that there will be many records in the BD or there will be heavy network usage. / p>

Ah, if there is heavy network traffic, the number of queries becomes a problem, because you save a lot on busy data, but you do not have much to do with multiple / concurrent access. you already enter into theories CAP / ACID , but it is no longer part of the basic programming scope, and in general it does not use DBMSs and is used in SQL.)

The price of the Internet is decreasing, and prices are falling - from BD servers; so you will less and less worry about how your system will access the DB and more on how it presents itself to the user.

  • Well, I want my client (or boss / my company) in my hand (can not).

Well, in the old days, some programmers had the bad faith to ignore the programming principles and wanted to "keep the job / customer." They thought that making changes to the system difficult would be the way, and in that way, they would never use wildcard. Because that has changed a lot: what matters is the data that is in the DB, not the system that uses it.

If you study hard, you'll find it's easy to find out all about a DB-which you did not do-and then you'll know how the system should work. This is called reverse engineering. Also, seeing the system working, there is reengineering, which is to make one system emulate another.

That is, there is no more dependency on the guy who created the query without wildcard , and you will be fired or you will tarnish your reputation in the market with other clients

Summarizing : There is no basis in this note now (before in the era of the chipped bit, yes), and you will probably always use wildcard , except in joins , where columns with equal names may appear in different tables and handling this is more difficult.

I think the people who "denied" did not understand. Everything depends on the goal , and it can change. Today, I handed it to the client and he is 100% fit for what he asked for. But tomorrow may not be (customer calls for maintenance). If we always think of "a system prepared for the future" we will fill the solution of theorems, creating a large, complex and complex software that will give a negative note to you today. Robust software should be used for problems that require robustness, but not for cases where the system will most likely not have larger requirements.

    
29.06.2014 / 07:50
28

SELECT * FROM Tabela is considered a bad practice because the fields that should be designed to return the query are not specified.

Whenever you use * , it is necessary for the database to understand which fields exist in the table to assemble the correct query (a query in the table metadata). This generates a certain overhead for the database.

If you need to query values, you should theoretically know which values you want to return, this optimizes the query. The cost of returning 1, 2, or 15 fields is different, so it is good practice and performance is gained when the query return fields are entered.

Example:

Suppose you have a table with 10 rows and 10 columns. You must select in this table. Suppose:

SELECT * FROM Tabela10por10;

If each field in the table is 32 bytes in size, its return will be the size: 32 * 10 * 10 = 3200 bytes .

If what you need is just one field, then the best would be:

SELECT campo1 FROM Tabela10por10;

The result for this search is: 32 * 1 * 10 = 320 bytes. Only 10% of what returned in the non-optimized query.

To understand more :

To understand optimizations of this type, you need to understand the Relational algebra for database .

Edited

  

...

...

...

Previously, it had been assumed that reporting the fields you want to project in the query would prevent you from querying the metadata. @Trinidad corrected this misconception. It is stated then, according to @Trinidad, that informing the fields that you want to select / design does not prevent the query of the DBMS in metadata.

  

This generates a certain overhead for the database.

This overhead always occurs, since the DBMS metadata is always queried. According to @Trinidad.

Addition

The semantics

As a complement to the answer, I believe that one of the biggest problems with select * from Tabela is the semantic meaning of the query. When you spell:

SELECT 
  Nome 
FROM 
  Cliente;

Of course, what you want is the customer. Even in more complicated queries it is possible to understand the purpose of the query, for example:

SELECT 
  SUM(Total) AS TotalDaNota
FROM 
  ItemNotaFiscal;

SELECT * does not tell us anything about the purpose of the query (the semantics - the meaning), which makes it very difficult, especially, to maintain the system.

    
16.06.2014 / 21:53
23

"What problems are caused by your use?"

The whole problem is that you will be (lazy) not specifying which columns you want from your SELECT clause:

  • Quantity Inefficiency : You will be manipulating a volume of data you do not need. Think of the network transmission from the database to the application, or the data set generated in the application itself; you may be consuming more memory and more network bandwidth without realizing it.
  • Indexing Inefficiency : You may only need to use columns indexed by the database; in this case if only these columns were selected the operation would be less costly.
  • Column name confusion : Columns with the same name (very common in extended joins) may be involved in your SELECT . This can cause problems not only for some human by reading the result of your query as well as some system interpreting that result.
  • Can not explain the meaning of the query : It is clearer to someone else (or even to you later) what that query is doing. A simple example is: imagine that a column is to be removed from the database. The usual way is to pass query by query seeing which ones use such a column; in the case of these queries with asterisk not only check the query, but it is necessary to check all manipulation done with the selected result (verifying if such a column is even used).

"Or are there situations that make no difference?"

The only situation I've heard that says it makes no difference is in the case of:

SELECT COUNT(*) FROM ...

But I've also heard the opposite (which makes a difference). Maybe this behavior is different from SGDB to SGDB (it may vary even between versions of it).

Would not worry about small tests; in production I have always used SELECT COUNT(0) FROM ... as a precaution.

"Are the recommendations valid for any database?"

When we talk about the main SGDBs, certainly yes; it is rare to get away from it.

But nowadays there are a multitude of database systems; give a check when this type of doubt arises which database behavior you are using.

    
16.06.2014 / 22:05
11

Many have already given the most important reasons:

  • impairs performance because it returns more data than necessary, including non-indexable ones.
  • is less legible, it is not known that data is really relevant in return.
  • does not spell out the name of which column, which is important when joining. DBMSs generally know how to deal with this, but it is less clear.

There is, however, yet another very important reason:

  • The code that uses the search result can, in some scenarios, be coupled to the order of the columns in the database.

If you have the table below ...

CREATE TABLE pessoa (
    id       INTEGER PRIMARY KEY,
    nome     VARCHAR(100),
    endereco VARCHAR(200)
);

... and you use an API that returns the result sets as a numeric array, as below ...

ResultSet results = query.execute('SELECT * FROM pessoa');

for (Result result : results) {
    String name = result.get(1);
    String endereco = result.get(2);
    // ... mais código aqui
}

... your code will depend on whether the first column of the table is the id, the second is the name, and the third is the address. This may, however, change; for example,

ALTER TABLE pessoa ADD COLUMN sobrenome AFTER nome;

Ready: Your code above no longer works.

This may not be as important if the results work as dictionaries / maps - String nome = result.get('nome'); - but still the above scenario is important.

(Note that almost everything here is pseudocode, I do not guarantee the code will run :))

    
17.06.2014 / 19:45
5

When you use a SELECT clause, you can specify the list of columns or expressions:

select Coluna, Coluna, ..., Coluna from Tabela

Or * (asterisk) as selection list:

select * from Tabela

Avoid the use of asterisks (*) in SELECT statements, mainly due to performance reasons , since the search in all fields will be costly as the size of the table increases.

Use this during development / testing , to visualize part of the table contents for query development, using for example TOP to display only the desired number of records:

SELECT TOP (100) * FROM tabela
    
16.06.2014 / 21:49
5
  

What problems are caused by your use?

From the technical point of view there is a loss of performance because:

  • may require reading more information in mass storage
  • Can increase memory consumption and decrease chances of caching other things (yes, this can affect the performance of other queries
  • Increasing memory consumption may require that the selection of results be played into the mass storage temporarily, which is much more expensive. And it can inhibit the use of certain more performative algorithms when there is risk of not being all in memory
  • The transmission of so much data takes more time and can cost more if the network has measurement (go, put it in the cloud! :))

But there are rare cases that the opposite may occur and a query that could be reused is not because it takes fewer columns. So it's not an absolute truth.

In terms of performance, what is the difference between using * and listing all columns? I doubt it is nonzero, and if it has it's ridiculous, it's just because of the metadata query.

Real problem

But in fact the biggest problem is not technical, it's conceptual: it's not thinking about what you're doing . It's bad to use * because people do it out of laziness. If she knows what she's doing, what the implications are, if that's what she wants, it's okay for someone to use.

And the biggest problem is that people do not know all the implications of using it. Besides performance, it does not understand that this makes the application dependent on the database structure. If it changes the order of the columns in the table, or if the database does this differently for some reason, its application may no longer work (it is true that most applications are very bureaucratic and do not use mechanisms that would be affected by this). Of course it depends on the application. When you conceptualize wrong something can go wrong in the future.

It may sound like a jerk, but wanting all columns from the database is semantically very different from wanting each column, without exception . Use what you want. Of course, if the person fails to interpret the text, he can not decide correctly what to do. IF the person is not thorough, make mistakes.

Of course there are contexts to do in one way or another will not change anything. As any recommendation that looks like "good practice" needs context.

What I understand is that if the person does not know what they are doing by default they should not use the *, because it tends to cause more damage than using. But the people who most need to follow the safest standard by default are just the ones who will least opt for it.

So using the column names is as bad as an asterisk if you do not think about what you're doing.

  

Do these problems always occur? Is there a way around? Or are there situations that make no difference?

The technical issue even occurs virtually always, but in low volumes it is tiny and even if it is proportionally large, the whole is low and does not matter much.

The clear conceptual that always occurs, just does not mean that it will always bring real problems. There are lucky people. Or they work on other problems so much more serious that it becomes irrelevant.

  

Are the recommendations valid for any database?

The conceptual ones, yes, of course.

The technique is not necessarily, although it is highly likely.

I answered because I think something like that was missing and deep down I wanted to know. Today I have more confidence that this is the most correct.

    
24.11.2017 / 12:49
4

It really is better to use the names instead of *, but this does not mean that the command can not be used or that it is bad. The * is very useful when you make a query only to consult directly in the DBMS (Database Manager System).

Now if you are going to work with this data with a programming language it is much better to put the fields for performance and organization. (This does not mean you can not use a * (all)), because on very small systems it may be much easier to put the * than all the fields you want (Not that it is the best solution).

Now if you are in a company, or if you are doing a serious project it would be unfeasible to put all the fields since you will not use all of them.

O * is used to query DBMS directly, makes it easier for you to type and is very useful.

Now from the time you query on a system, a query that returns data faster is much better.

 select Host, User, Password from user

In this way queries will return much faster, as the fields are easier to see (this second option is more personal freshness, does not interfere with system performance, just in organizing the code). The only relevant reason for not using * is that not using the data returns much faster, improving system performance.

    
23.07.2015 / 16:26
0

They say that using SELECT * FROM tabela is bad because you usually do not need all columns and all rows in the table .

If the table has a few rows, using query like this has no onus, but look at this example: The guy has a table of more than 10,000 records, with 25 columns, makes SELECT * FROM tabela to load into a javascript component, which displays only 4 columns, and display on the screen. This is a mistake.

Why?

  • Because one person does not view 10,000 records at a time.
  • Because the user's browser will crash.
  • Because the cost of bringing 10,000 records from the server to the person's machine will slow the execution.

How do you get out of this impasse?

  • Include in the place of the asteristic the names of the fields you want.
  • Use the WHERE clause to filter and bring only the records that matter.
  • Create indexes for the columns used in WHERE clauses.
  • Paging the data using LIMIT, passing the start point and the amount of records that should be brought into the query.
01.06.2017 / 12:56