igorclauss.de open_in_new

SQL - Find and replace all occurrences of a substring in a database text field


Given a long enough amount of time the situation might arise where we have a database with entries containing outdated substrings. Sounds incomprehensible? Imagine having URLs in a blog posts table. The domain might change over the years and ofcourse we want our posts to lead to the new URL. Having a huge amount of rows in the posts table we might not know all occurrences of the old URL in the dataset and for sure we lazy people don't want to change them manually.

Fortunately there are different solutions to approach this problem. Let's dig into it.

Changing the data manually

We might just have a small dataset in this case changing the data manually by using SQL search querys would surely be the fastest solution. But only if the dataset is small, if it exceeds an amount of posts it will be hell.

We need a different solution for this case.

Find and replace with a texteditor

When working on a search and replace script it is highly recomended to backup the database before editing. A typo or error can cause several issues, which we wouldn't even know where to search at.

This said, we can use a SQL dump to perform string operations. Just open your backup with an texteditor (e.g. Visual Studio Code) and use the inbuilt search and replace functionality. After changes are made we can upload the backup to the live site.

But to be honest this solution kind of makes me feel unconfortable. Maybe because it means killing a database and resurrecting it by backup, where this is not needed, only to change some values. Hey, we're working with an SQL database, there must be a SQL statement to solve this.

SQL statement to find and replace all occurrences of a substring in a database text field

This surely will be the best solution for the general case, just define the find-string and the replace-string and let the database do the work. SQL provides a REPLACE function to perform this kind of string operations.

UPDATE table SET column = REPLACE(column, 'old string', 'new string');

It's as easy as that. Done.

Be carefull when defining the old string!

What about replacement by regular expretions?

There have been multiple plugins built to allow this in the past. There are even pretty popular once for *cough* Wordpress *cough*. But you might won't need to do the search for one, because luckily MySQL implemented REGEXP_REPLACE starting from version 8.0 natively.

 REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])

Quoting the MySQL documentation:

occurrences in the string expr that match the regular expression specified by the pattern pat with the replacement string repl, and returns the resulting string. If expr, pat, or repl is NULL, the return value is NULL.

Digging into it further we can read more about the optional arguments:

  • pos: The position in expr at which to start the search. If omitted, the default is 1.
  • occurrence: Which occurrence of a match to replace. If omitted, the default is 0 (which means “replace all occurrences”).
  • match_type: A string that specifies how to perform matching.

Possible match types are:

  • c: Case sensitive matching.
  • i: Case insensitive matching.
  • m: Multiple-line mode. Recognize line terminators within the string. The default behavior is to match line terminators only at the start and end of the string expression.
  • n: The . character matches line terminators. The default is for . matching to stop at the end of a line.
  • u: Unix-only line endings. Only the newline character is recognized as a line ending by the ., ^, and $ match operators.

This seems kind of complicated to me. Let's try it by example.

UPDATE table SET column = REGEXP_REPLACE(column, '/our-regex-pattern/', 'new string', 1, 0, 'c');

This will replace all case sensitive matches by /our-regex-pattern/ on column by new string starting at the first character.


Einen Kommentar posten

Kommentare zu diesem Beitrag