This post has already been read 2639 times!

Parents without children

Given tables parent(id INT), child(id INT,parent_id INT), how do we find parents with no children? It's the All X for which there is no Y pattern, which can be written as an exclusion join...

SELECT parent.id
FROM parent
LEFT JOIN child ON parent.id = child.parent_id
WHERE child.parent_id IS NULL;

or with a NOT EXISTS subquery, which is logically equivalent to the exclusion join, but usually performs much slower:

SELECT parent.id AS ParentID
FROM parent
WHERE NOT EXISTS (
SELECT parent.id
FROM parent
JOIN child ON parent.ID = child.parent_id
);

Parties who have contracts with one another

You have a parties table that holds info on peoples' names etc, and a contracts table where each row has clientID and contractorID value pointing at a parties.partyID value--that is, each contracts row points at two parties rows. You want to list the names of all contractors and their clients.

SELECT clientpartyID,
pCli.name AS Client,
contractorpartyID,
pCon.name AS Contractor
FROM contracts
INNER JOIN parties AS pCli
ON contracts.clientpartyID = pCli.partyID
INNER JOIN parties AS pCon
ON contracts.contractorpartyID = pCon.partyID;

The unbearable slowness of IN()

You track orders and their items in orders and orderdetails tables, as in the NorthWind database. How many of your orders have been for multiple items? We can use the standard SQL IN() operator to answer the question:

SELECT orderID
FROM orders
WHERE orderID IN (
SELECT orderID
FROM orderdetails
GROUP BY orderID
HAVING COUNT(orderID) > 1
);

The [Not] Exists query pattern

Given a table employee( employeeID INT, mgr_employeeID INT, salary DECIMAL(10,2)), find the managers who earn less than one or more of their subordinates.

We can write this query directly from the logic of its spec...

SELECT DISTINCT employeeID
FROM employee AS e
WHERE EXISTS (
SELECT employeeID
FROM employee AS m
WHERE m.mgr_employeeID = e.employeeID AND e.salary > m.salary
);

What exams did a student not register for?

from the Artful Common Queries page


We have a students table, an exams table, and a registrations table in which each row registers one student for one exam. How do we find the exams for which a particular student is not registered?

DROP TABLE IF EXISTS students, exams, registrations;
CREATE TABLE students (
sid int(10) unsigned PRIMARY KEY auto_increment,
firstname varchar(45) NOT NULL default '',
lastname varchar(45) NOT NULL default ''
);
INSERT INTO students VALUES
(1, 'Jack', 'Malone'),(2, 'Hiro', 'Nakamura'),(3, 'Bree', 'Van de Kamp'),
(4, 'Susan', 'Mayer'),(5, 'Matt', 'Parkman'),(6, 'Claire', 'Bennet');

CREATE TABLE exams (
eid int(10) unsigned PRIMARY KEY auto_increment,
exam_name varchar(45) NOT NULL default '',
active smallint(5) unsigned NOT NULL default '0'
);
INSERT INTO exams VALUES
(1, 'Javascript Expert', 1),(2, 'Lost Survival Course', 0),(3, 'Zend PHP Certification', 1),
(4, 'Superhero Advanced Skills', 1),(5, 'Desperation Certificate', 1);

CREATE TABLE registrations (
registration_id int(11) PRIMARY KEY auto_increment,
eid int(10) unsigned NOT NULL default '0',
sid int(10) unsigned NOT NULL default '0',
registration_date datetime NOT NULL default '0000-00-00 00:00:00'
);
INSERT INTO registrations (registration_id, eid, sid, registration_date) VALUES
(1, 5, 14, '2007-10-25 00:00:00'),(2, 5, 3, '0000-00-00 00:00:00'),
(3, 5, 4, '2007-10-23 00:00:00'),(4, 4, 2, '2007-10-16 00:00:00'),
(5, 4, 5, '2007-10-22 00:00:00'),(6, 4, 6, '2007-10-23 00:00:00'),
(7, 5, 2, '2007-10-23 00:00:00');

This is another instance of the All X for which there is no Y pattern, easily solved with either a Not Exists subquery or an Exclusion join. Exclusion joins usually perform better, but here is a wrinkle (thanks to Pascal Mitride for this example): we might expect to join registrations to students to get student info into the result, yet the registrations table will be the object of the exclusion join, so how do we retrieve the required student info?

A solution is to left join exams to a subquery: students left join registrations, restricted by a WHERE clause specifying the target student. The subquery encapsulates conditions on the joined table. Then we can impose the IS NULL condition on the subquery:

SELECT e.exam_name FROM exams AS e       -- all exams
LEFT JOIN (
SELECT eid                             -- Hiro's exams
FROM students s LEFT JOIN registrations r ON s.sid=r.sid
WHERE s.firstname='Hiro' AND s.lastname='Nakamura'
) AS tmp ON e.eid=tmp.eid
WHERE tmp.eid IS NULL;                   -- exclusion
+------------------------+
| exam_name              |
+------------------------+
| Javascript Expert      |
| Lost Survival Course   |
| Zend PHP Certification |
+------------------------+

A query showing Hiro's registration or not for all exams proves our logic correct:

SELECT e.exam_name,IF(tmp.sid IS NULL, 'No', 'Yes') AS 'Hiro registered'
FROM exams e
LEFT JOIN (
SELECT eid
FROM students s LEFT JOIN registrations r ON s.sid=r.sid
WHERE s.firstname='Hiro' AND s.lastname='Nakamura'
) tmp ON e.eid=tmp.eid;
+---------------------------+-----------------+
| exam_name                 | Hiro registered |
+---------------------------+-----------------+
| Javascript Expert         | No              |
| Lost Survival Course      | No              |
| Zend PHP Certification    | No              |
| Superhero Advanced Skills | Yes             |
| Desperation Certificate   | Yes             |
+---------------------------+-----------------+

Beginners often have trouble working out how to write this kind of query, especially when the LEFT JOIN condition requires comparison with a literal value on the right side, thereby making the LEFT JOIN, effectively, an INNER JOIN and undermining the query design.

The method is to subtract one set (here, the exams Hiro registered for) from another (all exams). Start by writing a query expression for the set to be subtracted, then write the query expression to be subtracted from, then LEFT JOIN the two, then add the IS NULL condition:

1. Write a query expression to retrieve rows which positively match the exclusion condition. In simple cases, this may be a simple table reference. Here we need it to be an inner query for the exams Hiro Nakamura did register for...

SELECT eid
FROM students s LEFT JOIN registrations r ON s.sid=r.sid
WHERE s.firstname='Hiro' AND s.lastname='Nakamura'

2. Write a query to retrieve the rows from which the result of [1] is to be subtracted--here, simply all exams...

SELECT e.exam_name FROM exams e

3. Left join [2] to [1] on the appropriate key...

SELECT e.exam_name FROM exams AS e
LEFT JOIN (
SELECT eid
FROM students s LEFT JOIN registrations r ON s.sid=r.sid
WHERE s.firstname='Hiro' AND s.lastname='Nakamura'
) AS tmp ON e.eid=tmp.eid

4. Add a WHERE clause condition requiring that some column in the inner query evaluates to NULL.

WHERE tmp.eid IS NULL;

List NULLs at end of query output

If ordering by col...

... ORDER BY IF(col IS NULL, 0, 1 ), col ...

Parents with and without children

You have parties and contracts tables. Every contracts row has a contractorpartyID value which references a row in parties, and a clientpartyID value which also references a row in parties. How to list all parties and their contracts, showing blanks as empty strings rather than NULLs?

SELECT parties.partyID,
IFNULL(contractorpartyID,'') AS contractor,
IFNULL(clientpartyID,'') AS client
FROM parties
LEFT JOIN contractor_client ON partyID=contractorpartyID
ORDER BY partyID;
+---------+------------+--------+
| partyID | contractor | client |
+---------+------------+--------+
|       1 |            |        |
|       2 | 2          | 1      |
|       3 |            |        |
+---------+------------+--------+

 

Emulate Row_Number()

from the Artful Common Queries page


ISO SQL defines a ROW_NUMBER() OVER function with an optional PARTITION clause for generating a derived row number column in a resultset. Several RDBMSs—including DB2, Oracle and SQL Server—implement it. Here is the simplest possible example. Given a table with two columns i and j, generate a resultset that has a derived sequential row_number column taking the values 1,2,3,... for a defined ordering of j which resets to 1 when the value of i changes:

DROP TABLE IF EXISTS test;
CREATE TABLE test(i int,j int);
INSERT INTO test
VALUES (3,31),(1,11),(4,14),(1,13),(2,21),(1,12),(2,22),(3,32),(2,23),(3,33);

The result must look like this:

+------+------+------------+
| i    | j    | row_number |
+------+------+------------+
|    1 |   11 |          1 |
|    1 |   12 |          2 |
|    1 |   13 |          3 |
|    2 |   21 |          1 |
|    2 |   22 |          2 |
|    2 |   23 |          3 |
|    3 |   31 |          1 |
|    3 |   32 |          2 |
|    3 |   33 |          3 |
|    4 |   14 |          1 |
+------+------+------------+

The SQL Server query for this is:

SELECT
i, j,
ROW_NUMBER() OVER ( PARTITION BY i ORDER BY j ) AS row_number
FROM test;

Can we have this result in MySQL? Here are two ways to get it. First, with user variables:

SET @iprev=0, @jprev=0;
SELECT i, j, row_number
FROM (
SELECT j, @jprev := if(@iprev = i, @jprev+1, 1) AS row_number, @iprev := i AS i
FROM test
ORDER BY i,j
) AS tmp;

The second method uses a join and aggregation:

SELECT a.i, a.j, count(*) as row_number
FROM test a
JOIN test b ON a.i=b.i AND a.j >= b.j
GROUP BY a.i, a.j ;

To simplify generalising these query patterns to multiple ordering columns, use self-documenting column names:

DROP TABLE IF EXISTS test;
CREATE TABLE test(partition int, ord1 int, ord2 int);
INSERT INTO test
VALUES (3,31,55),(1,11,19),(4,14,23),(2,22,8),(2,22,42),(1,13,56),
(2,21,77),(2,21,7),(1,12,17),(2,23,92),(3,32,24),(3,33,62);

The user variable solution easily handles multiple ordering columns: just add the ordering column names to the outer query's SELECT list and the inner query's ORDER BY list:

SET @partitionPrev=0, @ordPrev=0;
SELECT partition, ord1, ord2, row_number
FROM (
SELECT
ord1, ord2,
@ordPrev := If(@partitionPrev = partition, @ordPrev+1, 1) as row_number,
@partitionPrev := partition AS partition
FROM test
ORDER BY partition, ord1, ord2
) AS tmp;
+-----------+------+------+------------+
| partition | ord1 | ord2 | row_number |
+-----------+------+------+------------+
|         1 |   11 |   19 |          1 |
|         1 |   12 |   17 |          2 |
|         1 |   13 |   56 |          3 |
|         2 |   21 |    7 |          1 |
|         2 |   21 |   77 |          2 |
|         2 |   22 |    8 |          3 |
|         2 |   22 |   42 |          4 |
|         2 |   23 |   92 |          5 |
|         3 |   31 |   55 |          1 |
|         3 |   32 |   24 |          2 |
|         3 |   33 |   62 |          3 |
|         4 |   14 |   23 |          1 |
+-----------+------+------+------------+

In the aggregating solution for multiple columns, the theta join gets more complex as ordering columns are added:

SELECT a.partition, a.ord1, a.ord2, count(*) as row_number
FROM test a
JOIN test b
ON a.partition=b.partition
AND (a.ord1>b.ord1 OR (a.ord1=b.ord1 AND a.ord2>=b.ord2))
GROUP BY a.partition, a.ord1, a.ord2 ;

Next row

You have a table of names, you have retrieved a row with name $name, and you want the row for the next name in name order. MySQL LIMIT syntax makes this very easy:

SELECT *
FROM tbl
WHERE name > $name
ORDER BY name
LIMIT 1

Order by leading digits

To have column values 1abc,10abc,8abc appear in the expected order 1abc,8abc,10abc, take advantage of a trick built into MySQL string parsing ...

SELECT '3xyz'+0;
+----------+
| '3xyz'+0 |
+----------+
|        3 |
+----------+

to write ...

SELECT ...
...
ORDER BY colname+0, colname;

Order by month name

The MySQL FIELD(str,str1,str2,...,strN) function returns 1 if str=str1, 2 if str=str2, etc., so ...

SELECT .
ORDER BY FIELD(month,'JAN','FEB','MAR',...,'NOV','DEC') .

will order query output from a legacy table in month-number order.

Order by numerics then alphas

Given a column type with values 1,2,3,a,b,c, how to get the order 3,2,1,c,b,a?

ORDER BY type RLIKE '^[0-9]+$' DESC, `type` DESC 

Suppress repeating ordering values

from the Artful Common Queries page


You have tables tracking authors and their books, for example:

CREATE TABLE author (
id int(4) NOT NULL auto_increment PRIMARY KEY,
name text NOT NULL
);
INSERT INTO author (id, name)
VALUES (1,'Brad Phillips'),(2,'Don Charles'),(3,'Kur Silver');
CREATE TABLE book (
id int(4) NOT NULL auto_increment PRIMARY KEY,
name text NOT NULL
);
INSERT INTO book (id, name)
VALUES (1,'MySQL in a bucket '),(2,'Databases for Delinquents'),
(3,'Design Patterns'),(4,'PHP Professional'),(5,'Java Script Programming');
CREATE TABLE book_author (
book_id int(4) NOT NULL default '0',
author_id int(4) NOT NULL default '0'
);
INSERT INTO book_author (book_id, author_id)
VALUES (1,1), (1,2), (2,3), (4,1), (3,1), (5,2);

You want to list authors' books while suppressing repeating authors' names. A simple solution is to use MySQL's extremely useful GROUP_CONCAT() function to group books by author:

SELECT
a.name AS Author,
GROUP_CONCAT(b.name ORDER BY b.name) AS Books
FROM book_author AS ba
JOIN book AS b ON ba.book_id=b.id
JOIN author AS a ON ba.author_id=a.id
GROUP BY a.name;

For a neater-looking result:
1. Retrieve authors and their books.
2. Order them
3. Use a variable to remember and suppress repeating author names:

SET @last='';
SELECT
IF(r.author=@last,'',@last:=r.author) AS Author,
r.book AS Book
FROM (
SELECT DISTINCT a.name AS author,b.name AS book
FROM book_author AS ba
JOIN book AS b ON ba.book_id=b.id
JOIN author AS a ON ba.author_id=a.id
ORDER BY a.name,b.name
) AS r;
+---------------+---------------------------+
| author        | book                      |
+---------------+---------------------------+
| Brad Phillips | Design Patterns           |
|               | MySQL in a bucket         |
|               | PHP Professional          |
| Don Charles   | Java Script Programming   |
|               | MySQL in a bucket         |
| Kur Silver    | Databases for Delinquents |
+---------------+---------------------------+

Pagination

Suppose you have a phone book of names, addresses, etc. You are displaying 20 rows per page, you're on page 100, and you want to display page 99. How do you do this knowing only what page you are on?Assuming...

  • 1-based page numbers
  • you are on page P
  • each page shows N rows

then the general formula for translating a 1-based page number into a first LIMIT argument is ...

MAX(0,P-1) * N

which for the 99th 20-row page evaluates to 1960, and the second argument to LIMIT is just N, so to see page 99, write...

SELECT ... LIMIT (1960, N);

The trouble with this is scaling. MySQL doesn't optimise LIMIT at all well. SELECT ... LIMIT 1000000,20LIMIT takes.

What's the alternative? Build pagination logic into the WHERE clause, and ensure sure there is a covering index for the paginating column. On a table of 100,000 indexed random integers, SELECT ... WHERE ... for the last 20 integers in the table is twice as fast as the comparable LIMIT query. With a million integers, it's more than 500 times faster!

Suppose your interface calls for showing only 20 rows per page on a given order. Retrieve the twenty rows, plus the row just before the set if it exists, plus the next row after those twenty if it exists. When the user clicks the Previous Page button, adjust the WHERE clause to specify rows where the key value is <= the row just before the current set and ORDER BY the current index DESC LIMIT 20; likewise when the user clicks the Next Page button, have the WHERE clause specify key values >= that of the row just after the set, and ORDER BY the current index ASC LIMIT 20.

All possible recipes with given ingredients

from the Artful Common Queries page


We have tables for recipes (r), ingredients required for recipes (ri), and ingredients now available in the pantry (p). In table p there may be many rows for a given recipe, each specifying one ingredient.

drop table if exists r,p,ri;
create table r(id int);
insert into r values(1),(2),(3);
create table p(id int);
insert into p values(1),(2),(3);
create table ri(rid int,pid int);
insert into ri values (1,1),(1,2),(2,1),(2,4),(3,5),(3,6),(3,7);
select id as recipes from r;
+---------+
| recipes |
+---------+
|       1 |
|       2 |
|       3 |
+---------+
select id as 'available ingredients' from p;
+-----------------------+
| available ingredients |
+-----------------------+
|                     1 |
|                     2 |
|                     3 |
+-----------------------+
select rid as recipe, pid as ingredient from ri;
+--------+------------+
| recipe | ingredient |
+--------+------------+
|      1 |          1 |
|      1 |          2 |
|      2 |          1 |
|      2 |          4 |
|      3 |          5 |
|      3 |          6 |
|      3 |          7 |
+--------+------------+

Given our ingredients, what recipes can we make? Inspection shows the answer is recipe #1.

SQL has no universal quantifier, so how do we proceed? 'All A is B' is logically equivalent to the double negative 'there is no A that is not B', so we can reformulate the requirement ...

list the recipes for which we have all ingredients

into terms SQL can handle ...

list the recipes for which there is no ingredient we do not have

A double negative, so a double query. One inner query, one outer. Tackle the inner one first: find the recipes for which we are missing an ingredient.

That's a straight exclusion join, i.e., a left join on ingredient from 'required' to 'available', plus a where clause that restricts the resultset to nulls on the right ('available') side of the join:

SELECT DISTINCT rid AS 'Recipes for which the pantry is missing some ingredients'
FROM ri
LEFT JOIN p ON ri.pid=p.id
WHERE p.id IS NULL;
+----------------------------------------------------------+
| Recipes for which the pantry is missing some ingredients |
+----------------------------------------------------------+
|                                                        2 |
|                                                        3 |
+----------------------------------------------------------+

Our outer query has to find the recipes which are not in this list. That's another exclusion join, this time from recipes to the above derived table:

SELECT r.id
FROM r
LEFT JOIN (
SELECT DISTINCT rid
FROM ri
LEFT JOIN p ON ri.pid=p.id
WHERE p.id IS NULL
) AS rno ON r.id = rno.rid
WHERE rno.rid IS NULL;
+------+
| id   |
+------+
|    1 |
+------+

It's an example of relational division, one of Codd's eight basic relational operations. Dividing a divisor table into a dividend table yields a quotient or results table:

dividend ÷ divisor = quotient

As in arithmetic, multiplication reverses it:

divisor * quotient = dividend

+-----------+
+-----+     +------+     | table AxB |
|  A  |     |  B   |     +-----+-----+
+-----+     +------+     |key_a|key_b|
|key_a|     |key_b |     +-----+-----+
+-----+     +------+     |  2  |  1  |
|  2  |     |  1   |     |  2  |  7  |
|  4  |     |  7   |     |  2  |  3  |
+-----+     |  3   |     |  4  |  1  |
+------+     |  4  |  7  |
|  4  |  3  |
+-----+-----+

When we multiply (CROSS JOIN) tables A and B to yield AxB, AxB gets a row combining every row of A with every row of B, and all the columns from A and B. When we reverse that operation, dividing AxB by B, we get back A by listing distinct B values associated with A values in AxB.

All X for which all Y are Z (relational division)

from the Artful Common Queries page


You have an election database with tables listing political parties, election districts, and candidates running for parties in those districts. You want to know which parties have candidates running in all districts. Under Aggregates we show a GROUP BY solution (here).

If there are reasons not to aggregate, relational division can solve the problem. The basic idea in relational division is that, aside from aggregation, SQL has no direct way to express "all Xs for which all Y are Z", but does have a NOT EXISTS operator, so we can express "all Xs for which all Y are Z" in SQL as a double negative: "all Xs for which no Y is not Z". Once you think of formulating the question this way, the query almost writes itself:

SELECT DISTINCT party FROM parties
WHERE NOT EXISTS (
SELECT * FROM districts
WHERE NOT EXISTS (
SELECT * FROM candidates
WHERE candidates.party=parties.party AND candidates.district=districts.district
)
);

Why is it called relational division? See the All possible recipes with given ingredients entry. Here the dividend is candidates, the divisor is districts and the quotient is a party count.

Most NOT EXISTS() queries can be translated into exclusion joins, which are often much faster. An exclusion join from A to B excludes A rows for which the LEFT JOIN condition finds NULLs in B. The query we are translating has two NOT EXISTS clauses, so we need two exclusion joins:

SELECT p.party
FROM parties p
LEFT JOIN (
SELECT a.party
FROM (
SELECT DISTINCT party,district
FROM parties CROSS JOIN districts
) a
LEFT JOIN candidates c ON a.party=c.party AND a.district=c.district
WHERE c.party IS NULL
) b ON p.party=b.party
WHERE b.party IS NULL;

Like numeric division, relational division has a gotcha: divide by zero. If the divisor table has zero rows, the quotient counts all distinct dividend instances. If that is not what you want, use aggregation.

Most "all Xs for which all Y are Z" queries can be written in any of these three ways. Try each one to see which performs best for your problem.

Who makes all the parts for a given assembly?

One way to arrive at the answer is by asking: What are the assembly-supplier pairs such that no part of the assembly is not made by the supplier? That's relational division again, formulated for two tables by Stephen Todd. Given assemblyparts(assembly,part) and partsuppliers(part,supplier) tables, here is a query that Joe Celko credits to Pierre Mullin.

SELECT DISTINCT
AP1.assembly,
SP1.supplier
FROM AssemblyParts AS AP1, PartSuppliers AS SP1
WHERE NOT EXISTS (
SELECT *
FROM AssemblyParts AS AP2
WHERE AP2.assembly = AP1.assembly
AND NOT EXISTS (
SELECT SP2.part
FROM PartSuppliers AS SP2
WHERE SP2.part = AP2.part AND SP2.supplier = SP1.supplier
)
);

Find adjacent unbooked theatre seats

from the Artful Common Queries page


A theatre booking service is often asked to book adjecent seats:

drop table if exists seats;
create table seats(row char(1),seat int,booked tinyint);
insert into seats values
('i',1,0),('i',2,0),('i',3,0),('i',4,0),('i',5,0),('i',6,0),('i',7,0),('i',8,0),
('i',9,0),('i',10,0),('i',11,0),('i',12,0),('i',13,0),('i',14,0),('i',15,0),
('j',1,1),('j',2,0),('j',3,1),('j',4,0),('j',5,0),('j',6,0),('j',7,1),('j',8,0),
('j',9,0),('j',10,0),('j',11,0),('j',12,0),('j',13,1),('j',14,0),('j',15,0);

The simplest method is a self-join:

-- two adjacent seats
select a.row,a.seat,b.seat
from seats a
join seats b on a.row=b.row and a.seat=b.seat-1
where a.booked=0 and b.booked=0
limit 1;
+------+------+------+
| row  | seat | seat |
+------+------+------+
| i    |    1 |    2 |
+------+------+------+

-- three adjacent seats
select a.row,a.seat,b.seat,c.seat
from seats a
join seats b on a.row=b.row and a.seat=b.seat-1
join seats c on a.row=c.row and b.seat=c.seat-1
where a.booked=0 and b.booked=0 and c.booked=0
limit 1;
+------+------+------+------+
| row  | seat | seat | seat |
+------+------+------+------+
| i    |    1 |    2 |    3 |
+------+------+------+------+

But that approach requires a different query for each different number of adjacent seats. The Find blocks of unused numbers query pattern can be used to find all empty seats. Optionally you can add a HAVING clause specifying the number of adjacent seats required:

SELECT e.row, firstUnused, IF(mincseat IS NULL, IFNULL(dseat,firstUnused),mincseat-1) AS lastUnused
FROM (
SELECT first.row, first.seat AS firstUnused, MIN(c.seat) AS mincseat, MAX(d.seat) AS dseat
FROM (
SELECT a.row, a.seat
FROM seats a
LEFT JOIN seats b ON a.row=b.row AND a.seat=b.seat + 1
WHERE a.booked=0 AND (b.seat IS NULL OR b.booked=1)
) AS first
LEFT JOIN seats c ON first.seat < c.seat AND c.booked=1 AND c.row = first.row
LEFT JOIN seats d ON first.seat < d.seat AND d.booked=0 AND d.row = first.row
GROUP BY firstUnused
) AS e
HAVING lastUnused-firstUnused>=N;   -- N=no.of required adjacent seats

Find blocks of unused numbers

from the Artful Common Queries page


In a table of sequential IDs with no missing values, some are used and some are not. Find the blocks of unused IDs, if any:

DROP TABLE IF EXISTS tbl;
CREATE TABLE tbl(id INT,used BOOL);
INSERT INTO tbl VALUES (1,1),(2,1),(3,0),(4,1),(5,0),(6,1),(7,1),(8,1),
(9,0),(10,0),(11,1),(12,1),(13,0),(14,0),(15,0);
SELECT * FROM tbl;
+------+------+
| id   | used |
+------+------+
|    1 |    1 |
|    2 |    1 |
|    3 |    0 |
|    4 |    1 |
|    5 |    0 |
|    6 |    1 |
|    7 |    1 |
|    8 |    1 |
|    9 |    0 |
|   10 |    0 |
|   11 |    1 |
|   12 |    1 |
|   13 |    0 |
|   14 |    0 |
|   15 |    0 |
+------+------+

The first ID in any unused sequence has used=0 and either no immediate predecessor, or an immediate predecessor where used=1. The last ID of any unused sequence either has no successor or the successor has used=1. So:

1. Find the first first ID of every unused sequence by left joining each row with used=0 to the immediate predecessor row, conditioning the result on the predecessor row not existing or having used=1.

2. As a basis for finding the last ID of every unused sequence that is followed by a row with used=1, left join first unused rows to rows with larger IDs and used=1.

3. As a basis for finding the last ID of an unused sequence which is also the largest ID in the table, left join first unused rows to rows with larger IDs and used=0.

4. For each first unused ID, the last unused ID in its sequence is one less than the smallest used ID greater than the first ID if it exists, otherwise it is the maximum unused ID greater than the first ID.

SELECT firstUnused, IF(mincid IS NULL, IFNULL(did,firstUnused),mincid-1) AS lastUnused
FROM (
SELECT first.id AS firstUnused, MIN(c.id) AS mincid, MAX(d.id) AS did
FROM (
SELECT a.id
FROM tbl a
LEFT JOIN tbl b ON a.id=b.id + 1
WHERE a.used=0 AND (b.id IS NULL OR b.used=1)
) AS first
LEFT JOIN tbl c ON first.id<c.id AND c.used=1
LEFT JOIN tbl d ON first.id<d.id AND d.used=0
GROUP BY firstUnused
) AS e;
+-------------+------------+
| firstUnused | lastUnused |
+-------------+------------+
|           3 |          3 |
|           5 |          5 |
|           9 |         10 |
|          13 |         15 |
+-------------+------------+

Thanks to Don Armstrong for finding a case where our previous algorithm failed.

Find missing numbers in a sequence

from the Artful Common Queries page


You have a table tbl(id int) with values (1,2,4,18,19,20,21), and you wish to find the first missing number in its sequence of id values:

SELECT t1.id+1 AS Missing
FROM tbl AS t1
LEFT JOIN tbl AS t2 ON t1.id+1 = t2.id
WHERE t2.id IS NULL
ORDER BY id LIMIT 1;
+---------+
| Missing |
+---------+
|       3 |
+---------+

For all the gaps, including gaps of more than 1 value, you need something a little more baroque...

SELECT
a.id+1 AS 'Missing From',
MIN(b.id) - 1 AS 'To'
FROM tbl AS a, tbl AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING `Missing From` < MIN(b.id);
+--------------+------+
| Missing From | To   |
+--------------+------+
|            3 |    3 |
|            5 |   17 |
+--------------+------+

We often need such lists, so the query is a natural for a stored procedure that finds missing sequence values in any table:

DROP PROCEDURE IF EXISTS MissingInSeq;
DELIMITER |
CREATE PROCEDURE MissingInSeq( db VARCHAR(64), tbl VARCHAR(64), col VARCHAR(64) )
BEGIN
SET @sql = CONCAT( "SELECT  a.", col,
"+1 AS 'Missing From',",
"MIN(b.",
col,
") - 1 AS 'To' FROM ",
db,
".",
tbl,
" AS a,",
db,
".",
tbl,
" AS b WHERE a.",
col,
" < b.",
col,
" GROUP BY a.",
col,
" HAVING a.",
col,
" < MIN(b.",
col,
") - 1"
);
-- SELECT @sql;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DROP PREPARE stmt;
END;
|
DELIMITER ;

Find missing values in a range

You have a table named tbl with an integer primary key named id, and you need to know what key values between 0 and 999999 are missing in the table.

The simplest solution is an exclusion join from a virtual table of sequential numbers to the target table:

create or replace view v as
select 0 i union select 1 union select 2 union select 3 union select 4
union select 5 union select 6 union select 7 union select 8 union select 9;

select x.i
from (
select a.i*100000 + b.i*10000 + c.i*1000 + d.i*100 + e.i*10 + f.i as i
from v a
join v b
join v c
join v d
join v e
join v f
) x
left join tbl on x.i=tbl.id
where tbl.id is null
order by x.i;

Find previous and next values in a sequence

from the Artful Common Queries page


Given a table t(ID int, ...), how would we display each ID and its next highest value? A simple method uses aggregation on a non-equi-self-join to find the smallest value of all greater values for each value:

SELECT id, MIN(b.id) AS Next
FROM t AS a
JOIN t AS b ON b.id > a.id
GROUP BY  a.id;

The same logic can retrieve next lowest values.

Suppose we wish to track daily changes in closing price:

drop table if exists t;
create table t (date date, closing_price decimal(8,2) );
insert into t values
('2009-01-02', 5),('2009-01-03', 5.5),('2009-01-04', 4),('2009-01-05', 6);

We can use the above non-equi-self-join to compute the ratios:

SELECT
seq.Next AS date, t.closing_price/seq.closing_price AS ChangeRatio
FROM t
JOIN (
SELECT a.date, a.closing_price, MIN(b.date) AS Next
FROM t AS a
JOIN t AS b ON b.date > a.date
GROUP BY  a.date
) seq ON t.date=seq.next;
+------------+-------------+
| date       | ChangeRatio |
+------------+-------------+
| 2009-01-03 |    1.100000 |
| 2009-01-04 |    0.727273 |
| 2009-01-05 |    1.500000 |
+------------+-------------+

Here is another algorithm, by Baron Schwartz (xaprb.com), for retrieving the previous and next column values in a sequence, given a particular column value thisvalue. The previous value is the maximum value less than thisvalue, and the next value is the minimum value greater than thisvalue:

SELECT
IF(col > thisvalue,'next','prev') AS Direction,
IF(col > thisvalue,MIN(col),MAX(col)) AS 'Prev/Next'
FROM tablename
WHERE col <> thisvalue
GROUP BY SIGN(col - thisvalue);

So, to find the previous and next order ids in the Northwind database table orders (nwib.orders), starting from order number 10800:

SELECT
IF(orderid > 10800,'next','prev') AS Direction,
IF(orderid > 10800,MIN(orderid),MAX(orderid)) AS 'Prev/Next'
FROM nwib.orders
WHERE orderid <> 10800
GROUP BY SIGN(orderid - 10800);
+-----------+-----------+
| Direction | Prev/Next |
+-----------+-----------+
| prev      | 10799     |
| next      | 10801     |
+-----------+-----------+

This is a natural for a stored procedure:

DROP PROCEDURE IF EXISTS PrevNext;
DELIMITER |
CREATE PROCEDURE PrevNext(
IN db CHAR(64), IN tbl CHAR(64), IN col CHAR(64), IN seq INT
)
BEGIN
IF db IS NULL OR db = '' THEN
SET db = SCHEMA();
END IF;
SET @sql = CONCAT( "SELECT ",
" IF(", col, " > ", seq,",'next','prev') AS Direction,",
" IF(", col, " > ", seq, ",MIN(", col, "),MAX(", col, ")) AS 'Prev/Next'",
" FROM ", db, ".", tbl,
" WHERE ", col, " <> ", seq,
" GROUP BY SIGN(", col, " - ", seq, ")" );
PREPARE stmt FROM @sql;
EXECUTE  stmt;
DEALLOCATE PREPARE stmt;
END;
|
DELIMITER ;

Or, it can be embedded in the FROM clause of another query, for example ...

SELECT o2.OrderID,o2.Value,o.customerid
FROM orders o
JOIN (
SELECT 'This' AS 'OrderId', 10800 AS 'Value'
UNION
SELECT
IF( orderid > 10800, 'Next', 'Prev') AS Which,
IF( orderid > 10800, MIN(orderid), MAX(orderid )) AS 'Value'
FROM orders
WHERE orderid <> 10800
GROUP BY SIGN( orderid - 10800 )
) AS o2 ON o.orderid=o2.value
ORDER BY o.orderid;
+---------+-------+------------+
| OrderID | Value | customerid |
+---------+-------+------------+
| Prev    | 10799 | KOENE      |
| This    | 10800 | SEVES      |
| Next    | 10801 | BOLID      |
+---------+-------+------------+

Find row with next value of specified column

from the Artful Common Queries page


Sometimes we need next values of a column on a given row ordering. Oracle has a LEAD(...) OVER(...) construct which simplifies this query. MySQL does not. The logic is:

1. Form a resultset consisting of all relevant rows joined with all relevant rows that have greater values in the ordering columns. For example, if the table has these rows:

+------+
|    2 |
|    4 |
|    6 |
|    8 |
|   10 |
+------+

Then the resultset is

+------+------+
|    2 |    4 |
|    2 |    6 |
|    2 |    8 |
|    2 |   10 |
|    4 |    6 |
|    4 |    8 |
|    4 |   10 |
|    6 |    8 |
|    6 |   10 |
|    8 |   10 |
+------+------+

2. For each row in this resultset, find the row with the lowest ordering value amongst the higher values. For the example the result of this from the above resultset is

+-----+-----+
|   2 |   4 |
|   4 |   6 |
|   6 |   8 |
|   8 |  10 |
+-----+-----+

The fastest way to do step #2 is a self-exclusion join (see self-exclusion join examples elsewhere on this page).

Put it together:

drop table if exists t;
create table t(id int);
insert into t values(2),(4),(6),(8),(10);
select x.aid as id,x.bid as nextvalue
from (
select a.id as aid,b.id as bid
from t a
join t b on a.id<b.id
) x
left join (
select a.id as aid,b.id as bid
from t a
join t b on a.id<b.id
) y on x.aid=y.aid and x.bid>y.bid
where y.bid is null
order by x.aid,x.bid;
+------+-----------+
| id   | nextvalue |
+------+-----------+
|    2 |         4 |
|    4 |         6 |
|    6 |         8 |
|    8 |        10 |
+------+-----------+

Modify the algorithm to suit for next lowest. Here is an example where the sequencing column is a timestamp:

drop table if exists t;
create table t(d timestamp,i int);
insert into t values
('2009-12-1 00:00:00',1),('2009-12-3 00:00:00',3),('2009-12-5 00:00:00',5),('2009-12-8 00:00:00',8);

select x.*
from (
select a.d as thisdate, a.i as thisvalue, b.d as nextdate, b.i as nextvalue
from t a
join t b on a.d < b.d
) x
left join (
select a.d as thisdate, b.d as nextdate
from t a
join t b on a.d < b.d
) y on x.thisdate = y.thisdate and x.nextdate > y.nextdate
where y.nextdate is null
order by x.thisdate, x.nextdate;
+---------------------+-----------+---------------------+-----------+
| thisdate            | thisvalue | nextdate            | nextvalue |
+---------------------+-----------+---------------------+-----------+
| 2009-12-01 00:00:00 |         1 | 2009-12-03 00:00:00 |         3 |
| 2009-12-03 00:00:00 |         3 | 2009-12-05 00:00:00 |         5 |
| 2009-12-05 00:00:00 |         5 | 2009-12-08 00:00:00 |         8 |
+---------------------+-----------+---------------------+-----------+

Find sequence starts and ends

from the Artful Common Queries page


To find the first and last values of column value sequences in a table like this ...

drop table if exists t;
create table t(id int);
insert into t values(1),(2),(3),(4),(6),(7),(8);

... an exclusion join on the previous sequential value finds the first value of each sequence, and the minimum next value from a left join and an exclusion join on the previous sequential value finds the end of each sequence:

SELECT
a.id AS Start,
MIN( c.id ) AS End
FROM tbl AS a
LEFT JOIN tbl AS b ON a.id = b.id + 1
LEFT JOIN tbl AS c ON a.id <= c.id
LEFT JOIN tbl AS d ON c.id = d.id - 1
WHERE b.id IS NULL
AND c.id IS NOT NULL
AND d.id IS NULL
GROUP BY a.id;
+-------+------+
| Start | End  |
+-------+------+
|     1 |    4 |
|     6 |    8 |
+-------+------+

Thanks to Scott Noyes for noticing that a.id<c.id fails to pick up sequences of 1 followed by skips of 1, but a.id<=c.id does.

To see how that query works, look at the output of this version of the query with exclusion and aggregation clauses removed:

SELECT a.id AS aid,b.id AS bid, c.id AS c.id, d.di AS did
FROM tbl AS a
LEFT JOIN tbl AS b ON a.id = b.id + 1
LEFT JOIN tbl AS c ON a.id <= c.id
LEFT JOIN tbl AS d ON c.id = d.id - 1
ORDER BY a.id,b.id,c.id,d.id;
+------+------+------+------+
| aid  | bid  | cid  | did  |
+------+------+------+------+
|    1 | NULL |    1 |    2 |
|    1 | NULL |    2 |    3 |
|    1 | NULL |    3 |    4 |
|    1 | NULL |    4 | NULL |  <-- end of sequence starting with 1
|    1 | NULL |    6 |    7 |
|    1 | NULL |    7 |    8 |
|    1 | NULL |    8 | NULL |
|    2 |    1 |    2 |    3 |
|    2 |    1 |    3 |    4 |
|    2 |    1 |    4 | NULL |
|    2 |    1 |    6 |    7 |
|    2 |    1 |    7 |    8 |
|    2 |    1 |    8 | NULL |
|    3 |    2 |    3 |    4 |
|    3 |    2 |    4 | NULL |
|    3 |    2 |    6 |    7 |
|    3 |    2 |    7 |    8 |
|    3 |    2 |    8 | NULL |
|    4 |    3 |    4 | NULL |
|    4 |    3 |    6 |    7 |
|    4 |    3 |    7 |    8 |
|    4 |    3 |    8 | NULL |
|    6 | NULL |    6 |    7 |
|    6 | NULL |    7 |    8 |
|    6 | NULL |    8 | NULL |  <-- end of sequence starting with 6
|    7 |    6 |    7 |    8 |
|    7 |    6 |    8 | NULL |
|    8 |    7 |    8 | NULL |
+------+------+------+------+

A variant of the problem: when some IDs are used and some are not, find blocks of unused IDs:

DROP TABLE IF EXISTS tbl;
CREATE TABLE tbl(id INT,used BOOL);
INSERT INTO tbl VALUES(1,1),(2,0),(3,0),(4,1),(5,0),(6,0);
SELECT a.id AS Start, MIN( c.id ) AS End
FROM tbl AS a
LEFT JOIN tbl AS b ON a.id=b.id + 1 AND a.used=0 AND b.used=0
LEFT JOIN tbl AS c ON a.id<=c.id AND a.used=0 AND c.used=0
LEFT JOIN tbl AS d ON c.id=d.id-1 AND c.used=0 AND d.used=0
WHERE b.id IS NULL
AND c.id IS NOT NULL
AND d.id IS NULL
GROUP BY a.id;
+-------+------+
| Start | End  |
+-------+------+
|     2 |    3 |
|     5 |    6 |
+-------+------+

Here's another variation on the pattern from a MySQL forum. You have a history of prescription dose changes ...

DROP TABLE IF EXISTS dose_change;
CREATE TABLE dose_change (
oid INTEGER UNSIGNED PRIMARY KEY AUTO_INCREMENT,
dose_date DATETIME NOT NULL,
dose INTEGER UNSIGNED,
);
INSERT INTO dose_change (dose_date, dose) values
('2000-01-01', 10),('2000-01-02', 10),('2000-01-03', 20),('2000-01-04', 20),
('2000-01-05', 10),('2000-01-06', 10),('2000-01-07', 10),('2000-01-08', NULL),
('2000-01-09', NULL),('2000-01-10', 30),('2000-01-11', 30),('2000-01-12', 30),
('2000-01-13', 10),('2000-01-14', 20),('2000-01-15', 10),('2000-01-16', NULL),
('2000-01-17', 10);
SELECT * FROM dose_change;
+-----+---------------------+------+
| oid | dose_date           | dose |
+-----+---------------------+------+
|   1 | 2000-01-01 00:00:00 |   10 |
|   2 | 2000-01-02 00:00:00 |   10 |
|   3 | 2000-01-03 00:00:00 |   20 |
|   4 | 2000-01-04 00:00:00 |   20 |
|   5 | 2000-01-05 00:00:00 |   10 |
|   6 | 2000-01-06 00:00:00 |   10 |
|   7 | 2000-01-07 00:00:00 |   10 |
|   8 | 2000-01-08 00:00:00 | NULL |
|   9 | 2000-01-09 00:00:00 | NULL |
|  10 | 2000-01-10 00:00:00 |   30 |
|  11 | 2000-01-11 00:00:00 |   30 |
|  12 | 2000-01-12 00:00:00 |   30 |
|  13 | 2000-01-13 00:00:00 |   10 |
|  14 | 2000-01-14 00:00:00 |   20 |
|  15 | 2000-01-15 00:00:00 |   10 |
|  16 | 2000-01-16 00:00:00 | NULL |
|  17 | 2000-01-17 00:00:00 |   10 |
+-----+---------------------+------+

... and you want the dosage history:

2000-01-01 - 2000-01-03, 10
2000-01-03 - 2000-01-05, 20
2000-01-05 - 2000-01-08, 10
2000-01-10 - 2000-01-13, 30
2000-01-13 - 2000-01-14, 10
2000-01-14 - 2000-01-15, 20
2000-01-15 - 2000-01-16, 10
2000-01-17 - null      , 10

Forum contributor "laptop alias" posted this solution:

SELECT a.dose_date AS start
, MIN(DATE(c.dose_date)) + INTERVAL 1 DAY AS end
, a.dose
FROM
( SELECT x.dose_date, x.dose, COUNT(*) id
FROM dose_change x
JOIN dose_change y
ON y.dose_date <= x.dose_date
GROUP BY x.oid
) AS a
LEFT JOIN
( SELECT x.dose_date, x.dose, COUNT(*) id
FROM dose_change x
JOIN dose_change y
ON y.dose_date <= x.dose_date
GROUP BY x.oid
) AS b ON a.id = b.id + 1 AND b.dose = a.dose
LEFT JOIN
( SELECT x.dose_date, x.dose, COUNT(*) id
FROM dose_change x
JOIN dose_change y
ON y.dose_date <= x.dose_date
GROUP BY x.oid
) AS c ON a.id <= c.id AND c.dose = a.dose
LEFT JOIN
( SELECT x.dose_date, x.dose, COUNT(*) id
FROM dose_change x
JOIN dose_change y
ON y.dose_date <= x.dose_date
GROUP BY x.oid
) AS d ON c.id = d.id - 1 AND d.dose = c.dose
WHERE b.id IS NULL AND c.id IS NOT NULL AND d.id IS NULL
GROUP BY start;
+---------------------+------------+------+
| start               | end        | dose |
+---------------------+------------+------+
| 2000-01-01 00:00:00 | 2000-01-03 |   10 |
| 2000-01-03 00:00:00 | 2000-01-05 |   20 |
| 2000-01-05 00:00:00 | 2000-01-08 |   10 |
| 2000-01-10 00:00:00 | 2000-01-13 |   30 |
| 2000-01-13 00:00:00 | 2000-01-14 |   10 |
| 2000-01-14 00:00:00 | 2000-01-15 |   20 |
| 2000-01-15 00:00:00 | 2000-01-16 |   10 |
| 2000-01-17 00:00:00 | 2000-01-18 |   10 |
+---------------------+------------+------+

Tom Melly found this simpler but slower solution:

SELECT
a.dose_date AS StartDate,
a.dose AS Dose,
( SELECT b.dose_date
FROM dose_change AS b
WHERE b.dose_date > a.dose_date AND (b.dose <> a.dose OR b.dose IS NULL)
ORDER BY b.dose_date ASC LIMIT 1
) AS StopDate
FROM dose_change AS a
WHERE Coalesce(
(SELECT c.dose
FROM dose_change AS c
WHERE c.dose_date <= a.dose_date
ORDER BY c.dose_date DESC LIMIT 1,1
), -99999
) <> a.dose
AND a.dose IS NOT NULL
ORDER BY a.dose_date ASC;

Find specific sequences

from the Artful Common Queries page


You have a table which tracks hits on named web pages...

CREATE TABLE hits (
id INT NOT NULL DEFAULT 0,
page CHAR(128) DEFAULT '',
time TIMESTAMP NOT NULL DEFAULT 0,
PRIMARY KEY(id, time)
)

where id is unique to a session. Here is a bit of sample data:

INSERT INTO hits VALUES
(1, 'A', TIMESTAMPADD(SECOND,10,NOW())),
(1, 'B', TIMESTAMPADD(SECOND,20,NOW())),
(2, 'A', TIMESTAMPADD(SECOND,40,NOW())),
(1, 'A', TIMESTAMPADD(SECOND,50,NOW())),
(1, 'C', TIMESTAMPADD(SECOND,60,NOW())),
(3, 'A', TIMESTAMPADD(SECOND,110,NOW())),
(3, 'A', TIMESTAMPADD(SECOND,120,NOW())),
(3, 'C', TIMESTAMPADD(SECOND,130,NOW())),
(2, 'C', TIMESTAMPADD(SECOND,90,NOW())),
(2, 'A', TIMESTAMPADD(SECOND,100,NOW()));

You desire a count of the number of sessions where a user moved from one particular page directly to another, for example from 'A' to 'C'.

To find the next hit in a given session, scope on id, order by time, and limit the output to one row. Then simply count the rows meeting the page criteria:

SELECT
COUNT(DISTINCT h1.id) AS 'Moves from A to C'
FROM hits AS h1
WHERE
h1.page = 'A'
AND 'C' = (
SELECT h2.page
FROM hits AS h2
WHERE h2.id = h1.id
AND h2.time > h1.time
ORDER BY h2.time LIMIT 1
);
-------------------
| Moves from A to C |
-------------------
|                 3 |
-------------------

Find the next value after a sequence

from the Artful Common Queries page


Given a sequence of values on a given ordering, what is the next value? It's a common requirement (eg in DNA sequencing). A MySQL Forum user posted this example:

drop table if exists data;
create table data(id smallint unsigned primary key auto_increment, val smallint);
insert into data (val) values
(8),(21),(28),(29),(31),(32),(27),(20),(31),(1),(18),(35),
(18),(30),(22),(9),(2),(8),(33),(8),(19),(31),(6),(31),(14),(5),
(26),(29),(34),(34),(19),(27),(29),(3),(21),(18),(31),(5),(18),
(34),(4),(15),(12),(20),(28),(31),(13),(22),(19),(30),(0),(2),
(30),(28),(2),(10),(27),(9),(23),(28),(29),(16),(16),(31),(35),(18),
(2),(15),(1),(30),(15),(11),(17),(26),(35),(1),(22),(19),(23),(1),
(18),(35),(28),(13),(9),(14);

What value immediately follows the sequence 23,28,29,16,16,31,35?

A simple solution is to form a comma-separated list of values on the given ordering, then locate the target sequence and the next value after it by counting commas:

SET @list = (SELECT Group_Concat(val ORDER BY id) FROM data);
SET @target = '23,28,29,16,16,31,35';
SELECT @pos := Locate( @target, @list, 1 );
SELECT @remainder := SubStr( @list, @pos + Char_length( @target ) + 1 );
SELECT Substr( @remainder, 1, Locate( ',', @remainder ) - 1 ) as NextValue;
+-----------+
| NextValue |
+-----------+
| 18        |
+-----------+

Note that a bug in the MySQL implementation of SET requires that we use SELECT @var := ... syntax after the first two SET assignments.

The calculation seems a natural for a stored function, except that MySQL functions do not support PREPAREd statements. The logic is easily encapsulated in a stored procedure:

drop procedure if exists nextvalue;
delimiter go
create procedure nextvalue( tbl char(64), ordcol char(64), valcol char(64) )
begin
SET @sql = Concat( 'SELECT Group_Concat(', valcol, ' ORDER BY ', ordcol, ') INTO @list', ' FROM ', tbl );
PREPARE stmt FROM @sql; EXECUTE stmt;
SET @target = '23,28,29,16,16,31,35';
SELECT Locate( @target, @list, 1 ) INTO @pos;
SELECT SubStr( @list, @pos + Char_length( @target ) + 1 ) INTO @remainder;
DROP PREPARE stmt;
SELECT Substr( @remainder, 1, Locate( ',', @remainder ) - 1 );
end;
go
delimiter ;
call nextvalue('data','id','val');

Gaps in a time series

from the Artful Common Queries page


Advanced time series analysis generally requires custom software, but straightforward SQL queries can answer simple time series questions. You have a jobtimes table with columns ID, job, machine, start_time, and stop_time. You wish to know which machines have had gaps between activity periods. It's a version of "Find available booking periods":

drop table jobtimes;
create table jobtimes(id int, machine smallint, start_time timestamp, stop_time timestamp);
insert into jobtimes values(1,1,'2011-7-1 08:00:00', '2011-7-1 10:00:00');
insert into jobtimes values(2,1,'2011-7-1 11:00:00', '2011-7-1 14:00:00');
insert into jobtimes values(3,2,'2011-7-1 08:00:00', '2011-7-1 09:00:00');
insert into jobtimes values(4,2,'2011-7-1 09:00:00', '2011-7-1 10:00:00');
insert into jobtimes values(5,3,'2011-7-1 08:00:00', '2011-7-1 08:30:00');
insert into jobtimes values(6,3,'2011-7-1 10:00:00', '2011-7-1 12:00:00');
select * from jobtimes;
+------+---------+---------------------+---------------------+
| id   | machine | start_time          | stop_time           |
+------+---------+---------------------+---------------------+
|    1 |       1 | 2011-07-01 08:00:00 | 2011-07-01 10:00:00 |
|    2 |       1 | 2011-07-01 11:00:00 | 2011-07-01 14:00:00 |
|    3 |       2 | 2011-07-01 08:00:00 | 2011-07-01 09:00:00 |
|    4 |       2 | 2011-07-01 09:00:00 | 2011-07-01 10:00:00 |
|    5 |       3 | 2011-07-01 08:00:00 | 2011-07-01 08:30:00 |
|    6 |       3 | 2011-07-01 10:00:00 | 2011-07-01 12:00:00 |
+------+---------+---------------------+---------------------+

SELECT
a.machine,
a.stop_time AS 'Unused From',
Min(b.start_time) AS 'To'
FROM jobtimes AS a
JOIN jobtimes AS b ON a.machine=b.machine AND a.stop_time < b.start_time
GROUP BY a.stop_time
HAVING a.stop_time < MIN(b.start_time)
ORDER BY machine;
+---------+---------------------+---------------------+
| machine | Unused From         | To                  |
+---------+---------------------+---------------------+
|       1 | 2011-07-01 10:00:00 | 2011-07-01 11:00:00 |
|       3 | 2011-07-01 08:30:00 | 2011-07-01 10:00:00 |
+---------+---------------------+---------------------+

Make values of a column sequential

You have a table tbl with an integer primary key column keycol which is not a key in another table, and which you wish to make perfectly sequential starting with 1.

SET @i=0;
UPDATE tbl SET keycol=(@i:=@i+1); 

Track stepwise project completion

from the Artful Common Queries page


A master table has one row for each project, and the number of sequential steps required to complete each project. A detail table has one row per project per completed step:

DROP TABLE IF EXISTS t1 ;
CREATE TABLE t1 (
id INT, projectname CHAR(2), projectsteps INT
);
INSERT INTO t1 VALUES
(1, 'xx', 3),
(2, 'yy', 3),
(3, 'zz', 5);

DROP TABLE IF EXISTS t2;
CREATE TABLE t2 (
id INT, projectID INT, xid INT
);
INSERT INTO t2 VALUES
(1, 1, 1),
(2, 1, 2),
(3, 2, 1),
(4, 1, 3),
(5, 3, 2),
(6, 1, 2),
(7, 2, 1),
(8, 2, 1);

The requirement is for a query which, for every project, reports 'OK' if there is at least one detail row for every project step, or otherwise reports the number of the last sequential completed step:

Here is one way to build such a query:

1. Join t1 to t2 on projectID.

2. Left Join t2 to itself on projectID and integer succession.

3. Add a WHERE condition which turns the left self-join into an exclusion join that finds the first missing sequential xid value.

4. To the SELECT list add this item:

IF( a.xid < p.projectstep,a.xid,'OK' ) AS StepState

so when the first sequential missing xid is not less than the number of project steps, display 'Ok', otherwise display the xid value before the first missing xid value.

5. Remove dupes with a GROUP BY clause.

SELECT
p.projectname,p.projectsteps,a.xid,
IF(a.xid < p.projectsteps, a.xid, 'OK') AS CompletionState
FROM t1 p
JOIN t2 a ON p.id = a.projectID
LEFT JOIN t2 AS b ON a.projectID = b.projectID AND a.xid+1 = b.xid
WHERE b.xid IS NULL
GROUP BY p.projectname;
+-------------+--------------+------+-----------------+
| projectname | projectsteps | xid  | CompletionState |
+-------------+--------------+------+-----------------+
| xx          |            3 |    3 | OK              |
| yy          |            3 |    1 | 1               |
| zz          |            5 |    2 | 2               |
+-------------+--------------+------+-----------------+

Winning Streaks

from the Artful Common Queries page


Given a table of IDs and won-lost results, how do we find the longest winning streak?

drop table if exists results;
create table results(id int,result char(1));
insert into results values
(1,'w'),(2,'l'),(3,'l'),(4,'w'),(5,'w'),(6,'w'),(7,'l'),(8,'w'),(9,'w');
select * from results;
+------+--------+
| id   | result |
+------+--------+
|    1 | w      |
|    2 | l      |
|    3 | l      |
|    4 | w      |
|    5 | w      |
|    6 | w      |
|    7 | l      |
|    8 | w      |
|    9 | w      |
+------+--------+

We can find streaks of two with a left join on a.id=b.id+1. To count streak lengths, initialise a counter to 0 then increment it for every hit:

set @count=0;
select a.id, a.result, b.result, @count := IF(a.result = b.result, @count + 1, 1) as Streak
from results a
left join results b on a.id = b.id + 1
where a.result = 'w';

The longest winning streak is the longest such streak found:

set @count=0;
select MAX(@count:=IF(a.result = b.result, @count + 1, 1)) as LongestStreak
from results a
left join results b on a.id = b.id + 1
where a.result = 'w';
+---------------+
| LongestStreak |
+---------------+
|             3 |
+---------------+

That solution is from a response by Jon Roshko to a question by Ed Ball on the MySQL Newbie Forum. Scott Noyes points out that our query pattern for sequence starts and ends also works for winning streaks:

SELECT MIN( c.id ) - a.id + 1 as LongestStreak
FROM results AS a
LEFT JOIN results AS b ON a.id = b.id + 1 AND b.result = 'w'
LEFT JOIN results AS c ON a.id <= c.id AND c.result = 'w'
LEFT JOIN results AS d ON c.id = d.id - 1 AND d.result = 'w'
WHERE
a.result = 'w'
AND b.id IS NULL
AND c.id IS NOT NULL
AND d.id IS NULL
GROUP BY a.id
ORDER BY LongestStreak DESC LIMIT 1;

Great circle distance

from the Artful Common Queries page


Find the distance in kilometres between two points on the surface of the earth. This is just the sort of problem stored functions were made for. For a first order approximation, ignore deviations of the earth's surface from the perfectly spherical. Then the distance in radians is given by a number of trigonometric formulas. ACOS and COS behave reasonably:

COS(lat1-lat2)*(1+COS(lon1-lon2)) - COS(lat1+lat2)*(1-COS(lon1-lon2))
rads = ACOS( --------------------------------------------------------------------- )
2

We need to convert degrees latitude and longitude to radians, and we need to know the length in km of one radian on the earth's surface, which is 6378.388. The function:

set log_bin_trust_function_creators=TRUE;

DROP FUNCTION IF EXISTS GeoDistKM;
DELIMITER |
CREATE FUNCTION GeoDistKM( lat1 FLOAT, lon1 FLOAT, lat2 FLOAT, lon2 FLOAT ) RETURNS float
BEGIN
DECLARE pi, q1, q2, q3 FLOAT;
DECLARE rads FLOAT DEFAULT 0;
SET pi = PI();
SET lat1 = lat1 * pi / 180;
SET lon1 = lon1 * pi / 180;
SET lat2 = lat2 * pi / 180;
SET lon2 = lon2 * pi / 180;
SET q1 = COS(lon1-lon2);
SET q2 = COS(lat1-lat2);
SET q3 = COS(lat1+lat2);
SET rads = ACOS( 0.5*((1.0+q1)*q2 - (1.0-q1)*q3) );
RETURN 6378.388 * rads;
END;
|
DELIMITER ;

-- toronto to montreal (505km):
select geodistkm(43.6667,-79.4167,45.5000,-73.5833);
+----------------------------------------------+
| geodistkm(43.6667,-79.4167,45.5000,-73.5833) |
+----------------------------------------------+
|                           505.38836669921875 |
+----------------------------------------------+

(Setting log_bin_trust_function_creators is the most convenient way to step round determinacy conventions implemented since 5.0.6.)

 

Moving average

from the Artful Common Queries page


Given a table of dates and daily values, retrieve their moving 5-day average:

DROP TABLE IF EXISTS t;
CREATE TABLE t (dt DATE, qty INT);
INSERT INTO t VALUES
('2007-1-1',5),('2007-1-2',6),('2007-1-3',7),('2007-1-4',8),('2007-1-5',9),
('2007-1-6',10),('2007-1-7',11),('2007-1-8',12),('2007-1-9',13);

SELECT
a.dt,
a.qty,
Round( ( SELECT SUM(b.qty) / COUNT(b.qty)
FROM t AS b
WHERE DATEDIFF(a.dt, b.dt) BETWEEN 0 AND 4
), 2 ) AS '5dayMovingAvg'
FROM t AS a
ORDER BY a.dt;
+------------+------+---------------+
| dt         | qty  | 5dayMovingAvg |
+------------+------+---------------+
| 2007-01-01 |    5 |          5.00 |
| 2007-01-02 |    6 |          5.50 |
| 2007-01-03 |    7 |          6.00 |
| 2007-01-04 |    8 |          6.50 |
| 2007-01-05 |    9 |          7.00 |
| 2007-01-06 |   10 |          8.00 |
| 2007-01-07 |   11 |          9.00 |
| 2007-01-08 |   12 |         10.00 |
| 2007-01-09 |   13 |         11.00 |
+------------+------+---------------+

Correlated subqueries can be slow, and we can get rid of this one:

SELECT a.dt, format(sum(b.qty)/count(b.qty),2)  5dayMovAvg
FROM t a
JOIN t b on datediff(a.dt,b.dt) BETWEEN 0 AND 4
GROUP BY a.dt;

The first method generalises easily to moving averages for multiple items:

DROP TABLE IF EXISTS t;
CREATE TABLE t (item int, dt DATE, qty INT);
INSERT INTO t VALUES (1,'2007-1-1',5),
(1,'2007-1-2',6),
(1,'2007-1-3',7),
(1,'2007-1-4',8),
(1,'2007-1-5',9),
(1,'2007-1-6',10),
(1,'2007-1-7',11),
(1,'2007-1-8',12),
(1,'2007-1-9',13),
(2,'2007-1-1',6),
(2,'2007-1-2',7),
(2,'2007-1-3',8),
(2,'2007-1-4',9),
(2,'2007-1-5',10),
(2,'2007-1-6',11),
(2,'2007-1-7',12),
(2,'2007-1-8',13),
(2,'2007-1-9',14);
SELECT
t1.item,t1.dt,
( SELECT SUM(t2.qty) / COUNT(t2.qty)
FROM t AS t2
WHERE item=t1.item AND DATEDIFF(t1.dt, t2.dt) BETWEEN 0 AND 4
) AS '5dayMovingAvg'
FROM t AS t1
GROUP BY t1.item,t1.dt;
+------+------------+---------------+
| item | dt         | 5dayMovingAvg |
+------+------------+---------------+
|    1 | 2007-01-01 |        5.0000 |
|    1 | 2007-01-02 |        5.5000 |
|    1 | 2007-01-03 |        6.0000 |
|    1 | 2007-01-04 |        6.5000 |
|    1 | 2007-01-05 |        7.0000 |
|    1 | 2007-01-06 |        8.0000 |
|    1 | 2007-01-07 |        9.0000 |
|    1 | 2007-01-08 |       10.0000 |
|    1 | 2007-01-09 |       11.0000 |
|    2 | 2007-01-01 |        6.0000 |
|    2 | 2007-01-02 |        6.5000 |
|    2 | 2007-01-03 |        7.0000 |
|    2 | 2007-01-04 |        7.5000 |
|    2 | 2007-01-05 |        8.0000 |
|    2 | 2007-01-06 |        9.0000 |
|    2 | 2007-01-07 |       10.0000 |
|    2 | 2007-01-08 |       11.0000 |
|    2 | 2007-01-09 |       12.0000 |
+------+------------+---------------+

or more simply...

SELECT t1.item,t1.dt, AVG(t2.qty) AS 5DayAvg
FROM t t1
JOIN t t2 ON t1.item=t2.item AND DATEDIFF(t1.dt, t2.dt) BETWEEN 0 AND 4
GROUP BY t1.item,t1.dt;

See http://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average for formulas for other kinds of moving averages. For example if you run this exponential moving average on the sample we started with, you'll see that it tends to logarthmicise the moving average ...

SET @weight=7;
SELECT
dt,
qty,
@x:=Round((@weight*@x+qty)/10,2) as ExpMovingAvg
FROM t
JOIN (
SELECT @x:=1
) AS dummy
ORDER BY dt;
+------------+------+--------------+
| dt         | qty  | ExpMovingAvg |
+------------+------+--------------+
| 2007-01-01 |    5 |         1.20 |
| 2007-01-02 |    6 |         1.44 |
| 2007-01-03 |    7 |         1.71 |
| 2007-01-04 |    8 |         2.00 |
| 2007-01-05 |    9 |         2.30 |
| 2007-01-06 |   10 |         2.61 |
| 2007-01-07 |   11 |         2.93 |
| 2007-01-08 |   12 |         3.25 |
| 2007-01-09 |   13 |         3.58 |
+------------+------+--------------+

and you can adjust that tendency by varying the @weight.

Multiple sums across a join

You have a parties table that holds info on peoples' names etc, and a contracts table in which each row defines one contract, identifying a client as clientpartyID and a contractor as contractorpartyID, each of these a foreign key referencing parties.partyID. You want a list of parties showing how many contracts they have participated in as client, and how many they've participated in as contractor.

SELECT
p.partyID,
p.name,
(SELECT COUNT(*) FROM contractor_client c1 WHERE c1.clientpartyID = p.partyID )
AS ClientDeals,
(SELECT COUNT(*) FROM contractor_client c2 WHERE c2.contractorpartyID = p.partyID)
AS ContractorDeals
FROM parties p
ORDER BY partyID;

Percentiles

from the Artful Common Queries page


In the Sakila table film, retrieve a top-down percentile ranking of film lengths:

SELECT
a.film_id ,
ROUND( 100.0 * ( SELECT COUNT(*) FROM film AS b WHERE b.length <= a.length ) / total.cnt, 1 )
AS percentile
FROM film a
CROSS JOIN (
SELECT COUNT(*) AS cnt
FROM film
) AS total
ORDER BY percentile DESC;

If there are NULLs, filter them out before computing percentiles.

On his blog, Roland Bouman shows a much faster query; here is a version retrieving the first film at or above the 90th percentile:

SELECT
SUM(g1.r) sr,
g2.length l,
SUM(g1.r)/(SELECT COUNT(*) FROM film) p
FROM (
SELECT COUNT(*) r, length
FROM film
GROUP BY length
) g1
JOIN (
SELECT COUNT(*) r, length
FROM film
GROUP BY length
) g2 ON g1.length < g2.length
GROUP BY g2.length
HAVING p >= 0.9
ORDER BY p LIMIT 1

Random row selection

When your web page loads it is to provide a randomly selected Murphy's Law from your murphy table (id int, text law)):

SELECT law
FROM murphy
ORDER BY RAND()
LIMIT 1;

Running sums, chequebooks

from the Artful Common Queries page


A user variable can maintain a per-row cumulative sum of column values. Initialise it, then adjust its value as desired in the appropriate SELECT expression:

SET @total=0;
SELECT id, value, @total:=@total+value AS RunningSum
FROM tbl;

Chequebook balancing programs often use this pattern. This one tracks the running balance and how much money from the most recent deposit remains:

DROP TABLE IF EXISTS chequebook;
CREATE TABLE chequebook (
entry_date timestamp default now() PRIMARY KEY,
entry_item varchar(48) NOT NULL DEFAULT '',
entry_amount decimal(10,2) NOT NULL DEFAULT 0.00
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO chequebook (entry_date,entry_item,entry_amount) VALUES
('2010-01-02 12:34:00','Deposit A',215.56),
('2010-01-02 21:44:00','Withdrawal A' ,-23.34),
('2010-01-03 10:44:00','Withdrawal B',-150.15),
('2010-01-03 15:44:00','Deposit B',154.67),
('2010-01-04 18:44:00','Withdrawal C',-65.09),
('2010-01-05 08:44:00','Withdrawal D',-74.23),
('2010-01-06 14:44:00','Deposit C',325.12),
('2010-01-06 20:44:00','Withdrawal E',-80.12),
('2010-01-07 04:44:00','Withdrawal F',-110.34),
('2010-01-07 16:44:00','Withdrawal G',-150.25),
('2010-01-08 16:44:00','Withdrawal H',-23.90),
('2010-01-08 21:44:00','Withdrawal I',-75.66),
('2010-01-08 22:44:00','Deposit C',275.78),
('2010-01-09 11:44:00','Withdrawal K',-85.99),
('2010-01-09 21:44:00','Withdrawal J',-100.00);

set @depos=0;
set @total=0;
select
entry_date,
entry_item,
entry_amount,
if( entry_amount>0, @depos:=entry_amount, @depos:=@depos+entry_amount ) as depos_bal,
@total:=@total+entry_amount as net_bal
from chequebook
order by entry_date;
+---------------------+--------------+--------------+-----------+---------+
| entry_date          | entry_item   | entry_amount | depos_bal | net_bal |
+---------------------+--------------+--------------+-----------+---------+
| 2010-01-02 12:34:00 | Deposit A    |       215.56 |    215.56 |  215.56 |
| 2010-01-02 21:44:00 | Withdrawal A |       -23.34 |    192.22 |  192.22 |
| 2010-01-03 10:44:00 | Withdrawal B |      -150.15 |     42.07 |   42.07 |
| 2010-01-03 15:44:00 | Deposit B    |       154.67 |    154.67 |  196.74 |
| 2010-01-04 18:44:00 | Withdrawal C |       -65.09 |     89.58 |  131.65 |
| 2010-01-05 08:44:00 | Withdrawal D |       -74.23 |     15.35 |   57.42 |
| 2010-01-06 14:44:00 | Deposit C    |       325.12 |    325.12 |  382.54 |
| 2010-01-06 20:44:00 | Withdrawal E |       -80.12 |    245.00 |  302.42 |
| 2010-01-07 04:44:00 | Withdrawal F |      -110.34 |    134.66 |  192.08 |
| 2010-01-07 16:44:00 | Withdrawal G |      -150.25 |    -15.59 |   41.83 |
| 2010-01-08 16:44:00 | Withdrawal H |       -23.90 |    -39.49 |   17.93 |
| 2010-01-08 21:44:00 | Withdrawal I |       -75.66 |   -115.15 |  -57.73 |
| 2010-01-08 22:44:00 | Deposit C    |       275.78 |    275.78 |  218.05 |
| 2010-01-09 11:44:00 | Withdrawal K |       -85.99 |    189.79 |  132.06 |
| 2010-01-09 21:44:00 | Withdrawal J |      -100.00 |     89.79 |   32.06 |
+---------------------+--------------+--------------+-----------+---------+

If your platform does not permit multiple queries per connection, and if you can tolerate the O(N2) inefficiency of a self-join, a self-join does the same job as an accumulating user variable:

SELECT c.id, c.value, d.RunningSum
FROM tbl c
JOIN (
SELECT a.id, SUM(b.value) AS RunningSum
FROM tbl a
LEFT JOIN tbl b ON b.id <= a.id
GROUP BY a.id
) d USING (id);

Sum across categories

from the Artful Common Queries page


You often need to sum across several categories to total customer purchase amounts, salesperson sales amounts, political party election spending, etc.

For this example assume three tables: candidates, parties and ridings. You want to get the total amount spent in all ridings by every party in one output row. Here is the schema:

CREATE TABLE candidates (
id int(11) NOT NULL default '0',
`name` char(10) ,
riding char(12) ,
party char(12) ,
amt_spent decimal(10,0) NOT NULL default '0',
PRIMARY KEY  (id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO candidates
VALUES (1,'Anne Jones','Essex','Liberal','5000'),
(2,'Mary Smith','Malton','Liberal','7000'),
(3,'Sara Black','Riverdale','Liberal','15000'),
(4,'Paul Jones','Essex','Socialist','3000'),
(5,'Ed While','Essex','Conservative','10000'),
(6,'Jim kelly','Malton','Liberal','9000'),
(7,'Fred Price','Riverdale','Socialist','4000');

CREATE TABLE ridings (
riding char(10) NOT NULL default '',
PRIMARY KEY  (riding)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO ridings VALUES ('Essex'),('Malton'),('Riverdale');

CREATE TABLE parties (
party char(12) NOT NULL default '',
PRIMARY KEY  (party)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO parties VALUES ('Conservative'),('Liberal'),('Socialist');

And here is the query that does it:

SELECT
SUM(amt_spent) AS Total,
(SUM(amt_spent)-SUM(CASE WHEN data.party='Conservative' THEN NULL ELSE amt_spent END)) AS Cons,
(SUM(amt_spent)-SUM(CASE WHEN data.party='Liberal' THEN NULL ELSE amt_spent END)) AS Lib,
(SUM(amt_spent)-SUM(CASE WHEN data.party='Socialist' THEN NULL ELSE amt_spent END)) AS Soc
FROM
(SELECT * FROM candidates
INNER JOIN parties ON candidates.party=parties.party
INNER JOIN ridings ON candidates.riding=ridings.riding) AS data

------- ------- ------- ------
| Total | Cons  | Lib   | Soc  |
------- ------- ------- ------
| 53000 | 10000 | 36000 | 7000 |
------- ------- ------- ------

Top ten

We often want to know the top 1, 2, 10 or whatever values from a query. This is dead simple in MySQL. However many JOINs and WHEREs the query has, simply ORDER BY the column(s) whose highest values are sought, and LIMIT the resultset:

SELECT (somecolumn), (othercolumns) ...
FROM (some tables) ...
ORDER BY somecolumn DESC
LIMIT 10;

A cursor if necessary, but not necessarily a cursor

from the Artful Common Queries page


You have photos (id INT, photo BLOB, tally INT) and votes(id INT, userID INT, photoID INT) tables. You wish to update photos.tally values from counts per photo in the votes table. You can use a cursor to walk the photos table, updating the tally as you go:

DROP TABLE IF EXISTS photos;
CREATE TABLE photos (id INT, photo BLOB, tally INT);
INSERT INTO photos VALUES(1,'',0),(2,'',0);
DROP TABLE IF EXISTS VOTES;
CREATE TABLE VOTES( userID INT, photoID INT);
INSERT INTO votes VALUES (1,1),(2,1),(2,2);

DROP PROCEDURE IF EXISTS updatetallies;
DELIMITER //
CREATE PROCEDURE updatetallies()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE pid INT;
DECLARE cur1 CURSOR FOR SELECT id FROM photos;
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = 1;
OPEN cur1;
FETCH cur1 INTO pid;
WHILE done = 0 DO
UPDATE photos
SET tally = (SELECT COUNT(*) FROM votes WHERE photoid = pid )
WHERE id = pid;
FETCH cur1 INTO pid;
END WHILE;
CLOSE cur1;
SELECT id,tally FROM photos;
END //
DELIMITER ;
CALL updatetallies();
+------+-------+
| id   | tally |
+------+-------+
|    1 |     2 |
|    2 |     1 |
+------+-------+

but a simple join does exactly the same job at much less cost:

UPDATE photos
SET tally = (
SELECT COUNT(*) FROM votes WHERE votes.photoid = photos.id
);

Before you burden your app with a cursor, see if you can simplify the processing to a straightforward join.

Emulate sp_exec

Sometimes it is desirable to call multiple stored procedures in one command. In SQL Server this can be done with sp_exec. In MySQL we can easily write such an sproc that calls as many sprocs as we please, for example...

USE sys;
DROP PROCEDURE IF EXISTS sp_exec;
DELIMITER |
CREATE PROCEDURE sp_exec( p1 CHAR(64), p2 CHAR(64) )
BEGIN
-- permit doublequotes to delimit data
SET @sqlmode=(SELECT @@sql_mode);
SET @@sql_mode='';
SET @sql = CONCAT( "CALL ", p1 );
PREPARE stmt FROM @sql;
EXECUTE stmt;
DROP PREPARE stmt;
SET @sql = CONCAT( "CALL ", p2 );
PREPARE stmt FROM @sql;
EXECUTE stmt;
DROP PREPARE stmt;
SET @@sql_mode=@sqlmode;
END;
|
DELIMITER ;

Variable-length argument for query IN() clause

To have an sproc accept a variable-length parameter list for an IN(...) clause in a query, code the sproc to PREPARE the query statement:

DROP PROCEDURE IF EXISTS passInParam;
DELIMITER |
CREATE PROCEDURE passInParam( IN qry VARCHAR(100), IN param VARCHAR(1000) )
BEGIN
SET @qry = CONCAT( qry, param, ')' );
PREPARE stmt FROM @qry;
EXECUTE stmt;
DROP PREPARE stmt;
END;
|
DELIMITER ;

For this example, the query string should be of the form:

SELECT ... FROM ... WHERE ... IN (

but so long as it has those elements, it can be as complex as you like. When you call the sproc:
1. Quote each argument with a pair of single quotes,
2. Separate these quoted arguments with commas,
3. Surround the whole param string with another set of single quotes:

CALL passInParam( 'SELECT * FROM tbl WHERE colval IN (', ('''abc'',''def'',''ghi''' ));

Count delimited substrings

from the Artful Common Queries page


Here is a function to count substrings delimited by a constant delimiting string:

DROP FUNCTION IF EXISTS strcount;
SET GLOBAL log_bin_trust_function_creators=1;
DELIMITER |
CREATE FUNCTION strCount( pDelim VARCHAR(32), pStr TEXT) RETURNS int(11)
BEGIN
DECLARE n INT DEFAULT 0;
DECLARE pos INT DEFAULT 1;
DECLARE strRemain TEXT;
SET strRemain = pStr;
SET pos = LOCATE( pDelim, strRemain );
WHILE pos != 0 DO
SET n = n + 1;
SET pos = LOCATE( pDelim, strRemain );
SET strRemain = SUBSTRING( strRemain, pos+1 );
END WHILE;
RETURN n;
END |
DELIMITER ;

-- example call:
SET @str  = "The quick brown fox jumped over the lazy dog";
SET @delim = " ";
SELECT strCount(@delim,@str);

Count substrings

To count instances of a search string in a target string ...

  • in the target string, replace the search string with a single character,
  • subtract the length of the modified target string from the length of the original target string,
  • divide that by the length of the search string:


SET @str  = "The quick brown fox jumped over the lazy dog";
SET @find = "the";
SELECT ROUND(((LENGTH(@str) - LENGTH(REPLACE(LCASE(@str), @find, '')))/LENGTH(@find)),0)
AS COUNT;
+-------+
| COUNT |
+-------+
|     2 |
+-------+

Note that REPLACE() does a case-sensitive search; to get a case-insensitive result you must coerce target and search strings to one case.

To remove decimals from the result:

SELECT CAST((LENGTH(@str) - LENGTH(REPLACE(LCASE(@str)), @find, '')))/LENGTH(@find) AS SIGNED) AS COUNT;

Levenshtein distance

from the Artful Common Queries page


The Levenshtein distance between two strings is the minimum number of operations needed to transform one string into the other, where an operation may be insertion, deletion or substitution of one character. Jason Rust published this MySQL algorithm for it at http://www.codejanitor.com/wp/.

CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) )
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
DECLARE s1_char CHAR;
-- max strlen=255
DECLARE cv0, cv1 VARBINARY(256);
SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
IF s1 = s2 THEN
RETURN 0;
ELSEIF s1_len = 0 THEN
RETURN s2_len;
ELSEIF s2_len = 0 THEN
RETURN s1_len;
ELSE
WHILE j <= s2_len DO
SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
END WHILE;
WHILE i <= s1_len DO
SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
WHILE j <= s2_len DO
SET c = c + 1;
IF s1_char = SUBSTRING(s2, j, 1) THEN
SET cost = 0; ELSE SET cost = 1;
END IF;
SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
IF c > c_temp THEN SET c = c_temp; END IF;
SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
IF c > c_temp THEN
SET c = c_temp;
END IF;
SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
END WHILE;
SET cv1 = cv0, i = i + 1;
END WHILE;
END IF;
RETURN c;
END;

Helper function:

CREATE FUNCTION levenshtein_ratio( s1 VARCHAR(255), s2 VARCHAR(255) )
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE s1_len, s2_len, max_len INT;
SET s1_len = LENGTH(s1), s2_len = LENGTH(s2);
IF s1_len > s2_len THEN
SET max_len = s1_len;
ELSE
SET max_len = s2_len;
END IF;
RETURN ROUND((1 - LEVENSHTEIN(s1, s2) / max_len) * 100);
END;

Proper case

from the Artful Common Queries page


The basic idea is...

  • lower-case the string
  • upper-case the first character if it is a-z, and any other a-z character that follows a punctuation character

Here is the function. To make it work with strings long than 128 characters, change its input and return declarations accordingly:

DROP FUNCTION IF EXISTS proper;
SET GLOBAL  log_bin_trust_function_creators=TRUE;
DELIMITER |
CREATE FUNCTION proper( str VARCHAR(128) )
RETURNS VARCHAR(128)
BEGIN
DECLARE c CHAR(1);
DECLARE s VARCHAR(128);
DECLARE i INT DEFAULT 1;
DECLARE bool INT DEFAULT 1;
DECLARE punct CHAR(18) DEFAULT ' ()[]{},.-_\'!@;:?/'; -- David Rabby & Lenny Erickson added \'
SET s = LCASE( str );
WHILE i <= LENGTH( str ) DO -- Jesse Palmer corrected from < to <= for last char
BEGIN
SET c = SUBSTRING( s, i, 1 );
IF LOCATE( c, punct ) > 0 THEN
SET bool = 1;
ELSEIF bool=1 THEN
BEGIN
IF c >= 'a' AND c <= 'z' THEN
BEGIN
SET s = CONCAT(LEFT(s,i-1),UCASE(c),SUBSTRING(s,i+1));
SET bool = 0;
END;
ELSEIF c >= '0' AND c <= '9' THEN
SET bool = 0;
END IF;
END;
END IF;
SET i = i+1;
END;
END WHILE;
RETURN s;
END;
|
DELIMITER ;
select proper("d'arcy");
+------------------+
| proper("d'arcy") |
+------------------+
| D'Arcy           |
+------------------+

But there are always exceptions, for example some guy with that name will want it spelled d'Arcy.

Comments are closed.

Post Navigation