Just yesterday on hackers, a post by Kevin Grittner mentioned using pg_trgm to find similar sentences in a text. His example used “War and Peace” (which is in the public domain). When tuning queries, you might want to always use the same set of examples, so that you can easily tell whether a change actually brings a gain for your query. Thinking about it, freely available classic books are a great way to evaluate the performance of a text search algorithm or of a given application implementation. They could even be useful for comparing the performance of several database systems on index scans: because the data is the same, only the search speed counts, assuming that all the systems have been tuned to scan the data the same way (for example, with all data in memory).