Tuesday, February 05, 2008

PostgreSQL 8.3 on OS X: I like the full text indexing/search features

I built the latest version from source, with one problem: I was only able to install readline from source using "--disable-shared" so I ended up also building PostgreSQL statically linked - oh well so much for being in hurry, I have 2 gigs of RAM on my MacBook, so what is a little memory between friends :-)

I have been waiting for version 8.3 because of the full text indexing/search features. Here is the Text Search documentation - enjoy! Here is a little sample of the SQL extensions to support indexing and search:
test=# create table test (id integer, name varchar(30), email varchar(30));
CREATE TABLE

test=# create index test_name_idx on test using gin(to_tsvector('english', name));
CREATE INDEX
test=# insert into test values (1, 'Mark Watson', 'mark@mark.com');
INSERT 0 1
test=# insert into test values (2, 'Carol Watson', 'carol@mark.com');
INSERT 0 1
test=# select * from test where to_tsvector(name) @@ to_tsquery('mark');
id | name | email
----+-------------+---------------
1 | Mark Watson | mark@mark.com
(1 row)

test=# select * from test where to_tsvector(name) @@ to_tsquery('watsons');
id | name | email
----+--------------+----------------
1 | Mark Watson | mark@mark.com
2 | Carol Watson | carol@mark.com
(2 rows)

test=# test=# select * from test where to_tsvector(name) @@ to_tsquery('mark & watson');
id | name | email
----+-------------+---------------
1 | Mark Watson | mark@mark.com
(1 row)

test=# select * from test where to_tsvector(name) @@ to_tsquery('mark | watson');
id | name | email
----+--------------+----------------
1 | Mark Watson | mark@mark.com
2 | Carol Watson | carol@mark.com
(2 rows)

test=#
Obviously, if you were creating a new table with many rows, add the index after the data is added to the table. "gin" refers to a complete inverted word index. Specifying 'english' ensures that a word stemmer if used that understands English language conventions. Note that a search for 'watsons' matches because the search terms are stemmed before search.

The search syntax looks odd, but I expect to get used to it quickly. For Rails: I use "acts_like_ferret" a lot; I'll wait a month to see if any handy plugin is written for PostgreSQL specific search - I would rather that someone else write it. I need to check out acts_as_tsearch, but I don't think that it is updated yet to work with the final 8.3 release.

No comments: