Mysql slow query: JOIN + multiple WHERES + ORDER BY

long time lurker, first question!

I am struggling to optimize this query, which selects the lowest priced items that match the chosen filters:

SELECT product_info.*, MIN(product_all.sale_price) as sale_price, product_all.buy_link
FROM product_info
NATURAL JOIN (SELECT * FROM product_all WHERE product_all.date = '2010-09-30') as product_all
WHERE (product_info.category = 2  
AND product_info.gender = 'W' )
GROUP BY product_all.prod_id
ORDER BY MIN(product_all.sale_price) ASC LIMIT 13

Its explain:

| id | select_type | table        | type   | possible_keys                                             | key     | key_len | ref                 | rows   | Extra                           |  
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+  
|  1 | PRIMARY     | <derived2>   | ALL    | NULL                                                     | NULL    | NULL    | NULL                | 89801  | Using temporary; Using filesort | 
|  1 | PRIMARY     | product_info | eq_ref | PRIMARY,category_prod_id_retail_price,category_ret...     | PRIMARY | 4       | product_all.prod_id | 1      | Using where                     | 
|  2 | DERIVED     | product_all  | ref    | date_2                                                    | date_2  | 3       |                     | 144107 |                                 | 

I've tried eliminating the subquery, which intuitively seems better but in practice takes even longer:

SELECT product_info.*, MIN(product_all.sale_price) as sale_price, product_all.buy_link
FROM product_info
NATURAL JOIN product_all
WHERE (product_all.date = '2010-09-30'
AND product_info.category = 2 
AND product_info.gender = 'W' )
GROUP BY product_all.prod_id
ORDER BY MIN(product_all.sale_price) ASC LIMIT 13

And its explain:

| id | select_type | table        | type | possible_keys                                             | key                      | key_len | ref                               | rows | Extra                                        |  
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+  
|  1 | SIMPLE      | product_info | ref  | PRIMARY,category_prod_id_retail_price,category_ret...     | category_retail_price    | 5       | const                             | 269  | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | product_all  | ref  | PRIMARY,prod_id,date_2                                    | prod_id                  | 4       | equipster_db.product_info.prod_id | 141  | Using where                                  | 

Here are the tables:

CREATE TABLE `product_all` (
`prod_id` INT( 10 ) NOT NULL PRIMARY KEY ,
`ref_id` INT( 10) NOT NULL PRIMARY KEY ,
`date` DATE NOT NULL ,
`buy_link` BLOB NOT NULL ,
`sale_price` FLOAT NOT NULL
) ENGINE = MYISAM ;


CREATE TABLE `product_info` (
`prod_id` INT( 10 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`prod_name` VARCHAR( 200 ) NOT NULL,
`brand` VARCHAR( 50 ) NOT NULL,
`retail_price` FLOAT NOT NULL
`category` INT( 3 ) NOT NULL,
`gender` VARCHAR( 1 ) NOT NULL,
`type` VARCHAR( 10 ) NOT NULL
) ENGINE = MYISAM ;

My Questions:
-which query structure seems optimal?
-what indices would optimize this query?
-less importantly: how does the indexing approach change when adding or removing WHERE clauses or using a different ORDER BY, such as sorting by % off:

ORDER BY (1-(MIN(product_all.sale_price)/product_info.retail_price)) DESC  

edit: both queries' natural join acts on prod_id (one record in product_info can have multiple instances in product_all, which is why they need to be grouped)


Indices make a massive difference in mysql, one query that took 15 minutes with a wrong set of indices took .2 seconds with the right ones, but its finding the right balance that is generally the issue. Naturally without some sample data its really hard to say if the below solution will save you any time, but in theory it should.

To answer your questions, I would redesign the tables like so:

CREATE TABLE `product_all` ( 
`prod_id` INT( 10 ) NOT NULL, 
`ref_id` INT( 10) NOT NULL, 
`date` DATE NOT NULL , 
`buy_link` BLOB NOT NULL , 
`sale_price` FLOAT NOT NULL,
PRIMARY KEY (prod_id, ref_id) ,
INDEX date_Index (`date` ASC),
UNIQUE INDEX prod_price_Index (prod_id ASC, sale_price ASC)
) ENGINE = MYISAM ; 


CREATE TABLE `product_info` ( 
`prod_id` INT( 10 ) NOT NULL AUTO_INCREMENT, 
`prod_name` VARCHAR( 200 ) NOT NULL, 
`brand` VARCHAR( 50 ) NOT NULL, 
`retail_price` FLOAT NOT NULL, 
`category` INT( 3 ) NOT NULL, 
`gender` VARCHAR( 1 ) NOT NULL, 
`type` VARCHAR( 10 ) NOT NULL,
PRIMARY KEY (prod_id) ,
UNIQUE INDEX prod_id_name_Index (prod_id ASC, prod_name ASC),
INDEX category_Index (category ASC),
INDEX gender_Index (gender ASC)
) ENGINE = MYISAM ;

SELECT product_info.*, MIN(product_all.sale_price) as sale_price, product_all.buy_link         
FROM product_info         
NATURAL JOIN (SELECT * FROM product_all WHERE product_all.date = '2010-09-30') as product_all         
WHERE (product_info.category = 2           
AND product_info.gender = 'W' )         
GROUP BY product_all.prod_id         
ORDER BY MIN(product_all.sale_price) ASC LIMIT 13        

The performance gain here is gained my indexing the main fields that are being joined upon and are featured in the where clause. Personally I would go with your first query as when you think about it that should perform better.

As far as I understand whats happening in the first and second query:

  • The first query is being filtered by a sub-query prior to doing the natural join, that means its only joining in the resulting data and not the whole table.
  • The second query is joining the entire second table and then filtering the resulting rows of the whole lot back to what you want.
  • As a rule of thumb normally you want to add indices on your major joining fields and also the fields that you use the most in where clauses. I've also put some unique indices on some of the fields that you will want to query regularly, such as prod_id_name_Index.

    If this doesn't improve your performance if you could maybe post some dummy data to play with I might be able to get a faster solution that I can benchmark.

    Here is an article that goes through indexing for performance in mysql, worth a read if you want to know more.

    Good luck!

    EDIT: Your final question I missed the first time, the answer is that if your indexing the main joining fields then changes to the where will only impact the overall performance slightly, but the unique indices I've put on the tables should account for the majority of things you'll want to base queries upon. The main thing to remember is if you query or join upon a field frequently then it should really be indexed, but minor queries and changes to the order by you should just not worry about in terms of realigning your indexing strategy.


    Performance wise, its never a good thing to use

    select *
    

    You should use the individual column names instead.

    select column1,column2 etc...
    

    Personally I'm a sql minimalist and avoid any kind of sub queries or joins that can't be index to index columns.

    If that's not really possible I'll probably run the subqueries individually to collect my keys, sort them client site, and then build a where in (...) clause.

    JohnVD makes a lot of good points but if you're going to need to make a unique key including product_name you should really see if that can be normalized out into an it.

    Indexing varchar columns is something to steer away from at all costs if possible. Each index entry is as big as the maximum size of the column, even if they're usually only a fraction of that. And if you're using a charset like utf-8 then the size is ~ maxlen+3.

    With your limit it seems the order by is needed. But just as an FYI when you're doing a group by, if you're going to consume the whole result set then tack on an ORDER BY NULL. Run the two variants through explain to see why; the order by null eliminates an implied filesort and you can sort client side. (This isn't possible if you're doing group by with rollup though)

    链接地址: http://www.djcxy.com/p/48254.html

    上一篇: 从文件中读取long []的最快方法?

    下一篇: Mysql慢速查询:JOIN +多个WHERES + ORDER BY