Query on two tables without INNER JOIN

Question

Query on two tables without INNER JOIN

Navigation

#1 by (4 votes)
#2 by (3 votes)
#3 by (2 votes)

10

I wonder, is there a difference in the performance of these two queries? in MySql

SELECT employee.*, company.name  FROM 
company,
employee
WHERE
employee.company_id = company.id AND
company.id = '365'

or

SELECT employee.*, company.name  FROM 
company 
INNER JOIN employee ON employee.company_id  = company.id
WHERE
company.id = '365'

mysql sql desempenho

asked by anonymous 26.03.2014 / 19:43

3 answers

3

It makes no difference. The only difference is that in the first query you have an implicit JOIN

26.03.2014 / 19:55

2

Apparently it does not make any difference when the searches are small, because for the user the final result is the same. However, INNER JOIN was developed not only to visually improve code syntax, but also to get a small gain in performance that varies from server to server, but usually is about 1%.

When using INNER JOIN , before the server runs SELECT , it organizes the statement into a plan, which is a set of optimized steps to efficiently collect data.

Not to mention that INNER JOIN is part of and recommended by the ANSI standard.

27.03.2014 / 01:02

How to develop programs for Android through Python? How to analyze performance impact of a code snippet in ADVPL?

score 4 · Accepted Answer

This answer may not be correct for this case, but I think it's worth sharing the knowledge.

To my knowledge and experience, using INNER JOIN is actually tying the search method and filter that the database will use between the tables. In a way it's like imposing on the DB: first get the "father" record and then the "son" record. If you are sure of the relationship, knowing the content of the data, and that this content is not dynamic enough to affect how you built your relationship, INNER JOIN will always perform well.

But everything is not perfect and it is common to see monstrous or "analytic" SQLs under data, relatively complex relationships, and table data that change over time (I explain below)

When you use the "non-ansi" pattern (without the JOIN clause), you leave it open for the engine optimizer to choose the data search mode and it does this based on the tables involved and especially in the statistics collected from the data. (in the case of MySQL, ANALYSE command). Depending on how join, filters, indices, and data sampling, the optimizer may choose to first fetch the child record and then the parent record ... something that INNER JOIN would not allow (not for technical reasons but because it is written in the ANSI rules, however I have doubts if MySQL respects these rules 100%).

Copying and translating (google translator) the ANALYZE command, MySQL manual :

MySQL uses the distribution of stored keys to decide the order in which tables should be joined when you perform a join in something other than a constant. In addition, the of the keys can be used when deciding which indexes to use for a specific table in a query.

Update Statistics

That's why it's common to see in any bank, when someone complains of slowness the question is: have you updated the statistics? This is true for MySQL, Oracle, SQL SERVER, Informix, Dessert, etc ...

Data Mutation ...

This example is what I see most common to happen and easy to exemplify. Let's suppose a table of purchase orders for a site, where the daughter table of purchased items has status fields (Delivered, Pending, Canceled).

The amount of ordering with children in Pending status is relative to what you filter.

If you have a database with a good history (thousands of records), naturally the volume of status in Pending will be low ... but if you look at the last 2 months, it will probably be very high. Depending on what you are looking for in your SQL, by relating the Parent and Child table, filtering the date range and the status engine you may find it more advantageous and quick to start fetching your child's records.

I'll try to "draw" the situation: Imagine:

Orders table, with a 5 year history and 100,000 order records.
Daughter table of articles, with 800,000 records (average of 8 articles per request). This table has the status column and an index about this status.
This means that: 800,000 articles / 5 years = 160,000 articles per year
160,000 / 12 months = 13,300 articles per month.

Assuming the company maintains a good quality of delivery and you will hardly see orders with more than 2 months with items in status Pending ...
(this means at most 27,000 records of articles )

You want to fetch all pending items ...
But you wrote SQL like this:
select * from pedido, artigos where pedido.pedido_id = artigos.pedido_id and artigos.status = 'P'
Due to data sampling statistics, there is a great chance that the bank will choose to start the filter by the table table because it will fetch only 27,000 records in the first step and then much less records in the .

But if you write SQL in ANSI mode or if for some reason the bank chooses to start the search for the Orders table, it will fetch the 100,000 records from the Order the 800,000 records in the articles table and then filter the status. This is because the filter is on the second level of the search ...

Of course a period filter would help, but there are cases where business logic would not allow what eventually forces the developer to write the above SQL.

Again ... I have doubts if this explanation applies 100% to MySQL.