Has it really been over a week since last I blogged? Oh well ... I've been busy both at work and at home.

Something I've been working on of late is a better routine to split a string into individual words in T-SQL. You might remember my previous effort, which works really well. This one works even better (although it may be a little slower), because it allows for any number of "delimiter" characters (in the @delims variable).

I know a few of my readers are absolute SQL guns, so if there's something in here that doesn't look right, or could work better, feel free to call me on it in the comments section.

Update! It turns out that I need to move the hyphen character in @delims to the end of the string - otherwise it's treating it as a range operator, and matching everything between the characters on either side of it!

CREATE FUNCTION SplitWords(@text varchar(8000))
RETURNS @words TABLE (
pos smallint,
value varchar(8000)
)
AS
BEGIN
SET @text = RTRIM(@text)
 
DECLARE @delims varchar(10)
SET @delims = ' ,:/-'
DECLARE @pos smallint, @i smallint, @s varchar(8000)
 
SET @pos = 1
 
WHILE @pos < LEN(@text)
AND CHARINDEX(SUBSTRING(@text, @pos, 1), @delims) > 0
SET @pos = @pos + 1
 
WHILE @pos <= LEN(@text)
BEGIN
SET @i = PATINDEX('%[' + @delims + ']%',
SUBSTRING(@text, @pos, len(@text) - @pos + 1))
IF @i > 0
BEGIN
SET @i = @i + @pos - 1
IF @i > @pos
BEGIN
-- @i now holds the earliest delimiter in the string
SET @s = SUBSTRING(@text, @pos, @i - @pos)
INSERT INTO @words
VALUES (@pos, @s)
END
SET @pos = @i + 1
 
WHILE @pos < LEN(@text)
AND CHARINDEX(SUBSTRING(@text, @pos, 1), @delims) > 0
SET @pos = @pos + 1
END
ELSE
BEGIN
SET @s = SUBSTRING(@text, @pos, LEN(@text) - @pos + 1)
INSERT INTO @words
VALUES (@pos, @s) SET @pos = LEN(@text) + 1
END
END
-- remove common words that we don't want to search for
DELETE FROM @words
WHERE value IN ('an', 'the', 'of', 'and', '&')
RETURN
END