User Tools

Site Tools


mysql:finding_and_deleting_orphaned_rows

MySQL - Finding and Deleting Orphaned Rows

If you store data with a parent-child relationship in MySQL tables, then you may want to identify and/or delete orphaned rows from the child table. An orphaned row is defined as a child row that references a non-existent parent row. If you enforce the parent-child relationship with a foreign key constraint on the child table then orphans are less likely to occur, but you can still end up with orphans; for example if a user disables foreign_key_checks in a session and deletes a parent row. If you choose not to create foreign keys in your database in order to avoid the performance overhead, then you are at higher risk of having orphaned rows.

Finding orphaned rows

Finding orphaned rows is fairly simple either using an exclusion join or a NOT EXISTS query.

Using the sakila database, here are two queries to count rows in the payment table that have an invalid rental_id:

-- count orphaned payments using exclusion join
select count(*) 
from sakila.payment as child
  left outer join sakila.rental as parent on parent.rental_id = child.rental_id 
where child.rental_id is not null 
  and parent.rental_id is null;
 
-- count orphaned payments using NOT EXISTS
select count(*) 
from sakila.payment as child 
where child.rental_id is not null 
  and not exists (
  select NULL from sakila.rental as parent where parent.rental_id = child.rental_id
);

What about NULLs?

The treatment of child rows with a NULL parent_id will vary depending on the requirements of your application, but in general I do not count NULL values as orphans. Thus in my queries above I explicitly check that child.rental_id is not NULL.

What not to do

I’ve also seen people write queries like the following to find orphaned rows. In general this will be slower, especially if you are running an older version of MySQL, so I do not recommend this method:

-- count orphaned payments using NOT IN
select count(*) 
from sakila.payment as child 
where rental_id not in (select rental_id from sakila.rental);

Next Steps

What should I do with orphaned rows once I find them? That depends on my specific application. The correct response could be one of the following:

  • Insert the missing parent row(s)
  • Update the child row(s) to reference an existing parent row
  • Delete the orphaned row(s) from the child table

For the purposes of this post, I will assume that I want to delete the orphaned row(s).

Deleting orphaned rows

It’s fairly simple to take the queries I used to find orphaned rows and rewrite them as delete statements:

-- delete orphaned payments using exclusion join
delete child.*
from sakila.payment as child
  left outer join sakila.rental as parent on parent.rental_id = child.rental_id 
where child.rental_id is not null 
  and parent.rental_id is null;
 
-- delete orphaned payments using NOT EXISTS
delete child.*
from sakila.payment as child 
where child.rental_id is not null 
  and not exists (
  select NULL from sakila.rental as parent where parent.rental_id = child.rental_id
);

Those queries are sufficient for deleting a small number of rows. If I need to delete a large number of orphaned rows then I will break the delete statement into multiple chunks. Here’s an example using common_schema:

call common_schema.run("split(
  sakila.payment: delete sakila.payment.* 
    from sakila.payment  
    left outer join sakila.rental on rental.rental_id = payment.rental_id 
    where payment.rental_id is not null   
    and rental.rental_id is null
) pass;");

Dynamically generated orphan queries

If you use foreign keys to maintain referential integrity then it’s possible to dynamically generate queries like the ones from earlier in this post by querying information_schema.

For example, here’s a query on information_schema to generate queries to count orphaned rows based on all 22 foreign keys in the sakila schema:

select concat('select count(*) as "',
  table_schema,'.',table_name,
  ' -> ',
  referenced_table_schema,'.',referenced_table_name,  
  ' (',
  constraint_name,
  ') orphan count" from ',
  table_schema,'.',table_name,
  ' as child left outer join ',referenced_table_schema,'.',referenced_table_name,' as parent on ',
  group_concat(concat('parent.',referenced_column_name,' = child.',column_name) separator ' and '),
  ' where ',
  group_concat(concat('child.',column_name,' is not null') separator ' and '),
  ' and ',
  group_concat(concat('parent.',referenced_column_name,' is null') separator ' and '),
  ';') as sql_stmt
from
  information_schema.key_column_usage 
where
  referenced_table_name is not null
  and table_schema = 'sakila'
group by table_schema,table_name,referenced_table_schema,referenced_table_name,constraint_name
order by table_schema,table_name,referenced_table_schema,referenced_table_name,constraint_name;

I wrote the query in such a way that it should work with multiple column foreign keys even though the sakila schema doesn’t have any such foreign keys.

Using dynamic queries makes it easier to script the process of counting orphans, and potentially even deleting them.

Testing it out

The sakila database doesn’t have any orphaned payment rows by default (unless you count the 5 rows with a NULL rental_id), so I’m going to create some in order to test out my queries.

First I delete 1,000 random rows from the rental table with foreign_key_checks disabled in order to orphan 1,000 payment rows:

mysql> set foreign_key_checks = 0;
Query OK, 0 rows affected (0.00 sec)
 
mysql> delete from sakila.rental 
    -> order by rand() 
    -> limit 1000;
Query OK, 1000 rows affected (0.05 sec)

Now I count the orphaned payment rows using the dynamically generated orphan counter from my previous example:

mysql> select count(*) as "sakila.payment -> sakila.rental (fk_payment_rental) orphan count" 
    -> from sakila.payment as child 
    -> left outer join sakila.rental as parent on parent.rental_id = child.rental_id 
    -> where child.rental_id is not null 
    -> and parent.rental_id is null;
+------------------------------------------------------------------+
| sakila.payment -> sakila.rental (fk_payment_rental) orphan count |
+------------------------------------------------------------------+
|                                                             1000 |
+------------------------------------------------------------------+
1 row in set (0.01 sec)

And I delete the 1,000 orphaned rows using one of the delete statements from my previous examples:

mysql> -- delete orphaned payments using exclusion join
mysql> delete child.*
    -> from sakila.payment as child
    ->   left outer join sakila.rental as parent on parent.rental_id = child.rental_id 
    -> where child.rental_id is not null 
    ->   and parent.rental_id is null;
Query OK, 1000 rows affected (0.09 sec)

Re-running the orphan counter shows that there are now 0 orphaned payment rows:

mysql> select count(*) as "sakila.payment -> sakila.rental (fk_payment_rental) orphan count" 
    -> from sakila.payment as child 
    -> left outer join sakila.rental as parent on parent.rental_id = child.rental_id 
    -> where child.rental_id is not null 
    -> and parent.rental_id is null;
+------------------------------------------------------------------+
| sakila.payment -> sakila.rental (fk_payment_rental) orphan count |
+------------------------------------------------------------------+
|                                                                0 |
+------------------------------------------------------------------+
1 row in set (0.02 sec)
mysql/finding_and_deleting_orphaned_rows.txt · Last modified: 2020/07/15 10:30 by 127.0.0.1

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki