My sub query is adding 20 seconds to the execution time. How can I speed it up?

I have a table of sent SMS text messages which must join to a delivery receipt table to get the latest status of a message.

There are 997,148 sent text messages.

I am running this query:

SELECT
    m.id,
    m.user_id,
    m.api_key,
    m.to,
    m.message,
    m.sender_id,
    m.route,
    m.submission_reference,
    m.unique_submission_reference,
    m.reason_code,
    m.timestamp,
    d.id AS dlrid,
    d.dlr_status
FROM
    messages_sent m
LEFT JOIN
    delivery_receipts d
ON
    d.message_id = m.id
AND
    d.id = (SELECT MAX(id) FROM delivery_receipts WHERE message_id = m.id)

Which returns 997,148 results including the latest status of each message.

This takes 22.8688 seconds to execute.

Here is the SQL for messages_sent :

CREATE TABLE IF NOT EXISTS `messages_sent` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL,
`api_key` varchar(40) NOT NULL,
`to` varchar(15) NOT NULL,
`message` text NOT NULL,
`type` enum('sms','mms') NOT NULL DEFAULT 'sms',
`sender_id` varchar(15) NOT NULL,
`route` tinyint(1) unsigned NOT NULL,
`supplier` tinyint(1) unsigned NOT NULL,
`submission_reference` varchar(40) NOT NULL,
`unique_submission_reference` varchar(40) NOT NULL,
`reason_code` tinyint(1) unsigned NOT NULL,
`reason` text NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `api_key` (`api_key`),
KEY `sender_id` (`sender_id`),
KEY `route` (`route`),
KEY `submission_reference` (`submission_reference`),
KEY `reason_code` (`reason_code`),
KEY `timestamp` (`timestamp`),
KEY `to` (`to`),
KEY `unique_submission_reference` (`unique_submission_reference`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1000342 ;

And for delivery_receipts :

CREATE TABLE IF NOT EXISTS `delivery_receipts` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`message_id` int(10) unsigned NOT NULL,
`dlr_id` bigint(20) unsigned NOT NULL,
`dlr_status` tinyint(2) unsigned NOT NULL,
`dlr_substatus` tinyint(2) unsigned NOT NULL,
`dlr_final` tinyint(1) unsigned NOT NULL,
`dlr_refid` varchar(40) NOT NULL,
`dlr_phone` varchar(12) NOT NULL,
`dlr_charge` tinyint(3) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `message_id` (`message_id`),
KEY `dlr_status` (`dlr_status`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1468592 ;

Here is an EXPLAIN of the SQL:


There is a trick.

Instead with picking MAX element with subquery you join with interesting table twice like this:

SELECT
    m.id,
    m.user_id,
    m.api_key,
    m.to,
    m.message,
    m.sender_id,
    m.route,
    m.submission_reference,
    m.unique_submission_reference,
    m.reason_code,
    m.timestamp,
    d.id AS dlrid,
    d.dlr_status
FROM
    messages_sent m
JOIN
    delivery_receipts d
ON
    d.message_id = m.id
LEFT JOIN
    delivery_receipts d1
ON
    d1.message_id = m.id
    AND
    d1.id > d.id
WHERE
    d1.id IS NULL

The second time table is joined it has additional condition that field that you want to pick MAX of should be higher than in the first table. And filter out all rows except the ones that do not have other row that's higher.

This way only max rows remain.

I changed your LEFT JOIN to JOIN. I'm not sure if you need LEFT JOIN there. Even if you it should still work.

Amazingly this is much faster than subquery.

You might want to try out other variant of the same idea:

SELECT
    m.id,
    m.user_id,
    m.api_key,
    m.to,
    m.message,
    m.sender_id,
    m.route,
    m.submission_reference,
    m.unique_submission_reference,
    m.reason_code,
    m.timestamp,
    d.id AS dlrid,
    d.dlr_status
FROM
    messages_sent m
JOIN
(
SELECT d0.* FROM
    delivery_receipts d0
LEFT JOIN
    delivery_receipts d1
ON
    d1.message_id = d0.message_id
    AND
    d1.id > d0.id
WHERE
    d1.id IS NULL
) d
ON
    d.message_id = m.id

Make sure you have multicolumn index for fields message_id and id in table delivery_receipts maybe such:

ALTER TABLE  `delivery_receipts` 
ADD INDEX  `idx` (  `message_id` ,  `id` );

The slowdown seems large, but I'm afraid there is not much room for improvement if you need to stick with this query.

One problem is the reporting of d.dlr_status . Try to remove this from the list of reported columns and see if the query time improves.

You would get the best possible performance if everything was stored in messages_sent . This won't be NF anymore, but it's an option if you need performance. To achieve this, create id and dlr_status columns in messages_sent and add appropriate INSERT , UPDATE and DELETE triggers to delivery_receipts . The triggers would update the corresponding columns in messages_sent -- it's a trade-off between query time and update time.


You can "cache" part of the computation in the delivery_receipts table, just add is_last_status boolean to the delivery_receipts table. Using simple triggers you can change the value every insert of new receipt.

Than the select query becomes much simpler:

SELECT
  m.id,
  m.user_id,
  m.api_key,
  m.to,
  m.message,
  m.sender_id,
  m.route,
  m.submission_reference,
  m.unique_submission_reference,
  m.reason_code,
  m.timestamp,
  d.id AS dlrid,
  d.dlr_status
FROM
  messages_sent m
LEFT JOIN
  delivery_receipts d
ON
  d.message_id = m.id
WHERE
  d.is_last_status = true

If mysql would support partial indexes the query could be speed up even more.

链接地址: http://www.djcxy.com/p/55052.html

上一篇: 请求对象,有什么优点和缺点?

下一篇: 我的子查询添加20秒的执行时间。 我如何加快速度?