PostgreSQL full text search tokenizer


I've just run into an issue. I'm trying to set up full text search on localized content (Russian in particular). The problem is that the default configuration (as well as a custom one) does not deal with letter case. For example:

select * from to_tsvector('test_russian', 'На рынке появились новые рублевые облигации');
> 'На':1 'новые':4 'облигации':6 'появились':3 'рублевые':5 'рынке':2

'На' is a stop word and should be removed, but it is not even lowercased in the result vector. If I pass a lowercased string, everything works properly:

select * from to_tsvector('test_russian', 'на рынке появились новые рублевые облигации');
> 'новые':4 'облигации':6 'появились':3 'рублевые':5 'рынке':2

Sure, I can pass pre-lowercased strings, but the manual says:

"The simple dictionary template operates by converting the input token to lower case and checking it against a file of stop words."

The test_russian config looks like this:

create text search configuration test_russian (copy = 'russian');

create text search dictionary russian_simple (
    template = pg_catalog.simple,
    stopwords = russian
);

create text search dictionary russian_snowball (
    template = snowball,
    language = russian,
    stopwords = russian
);

alter text search configuration test_russian
    alter mapping for word
    with russian_simple, russian_snowball;

But I get the same results with the built-in russian config.

I tried ts_debug, and the tokens are treated as word, as expected.
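For anyone reproducing this, the check described above can be sketched roughly as follows. It assumes the test_russian configuration and the russian_simple dictionary defined earlier in this post. ts_debug shows how each token is classified and which dictionary produced its lexemes, and ts_lexize runs a single dictionary on its own, which helps narrow the problem down to the lowercasing step.

```sql
-- Show the token type, the dictionaries consulted, and the resulting lexemes
-- for every token in the sample sentence.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('test_russian', 'На рынке появились новые рублевые облигации');

-- Run the simple dictionary alone on the capitalized and lowercased forms.
-- A token recognized as a stop word yields an empty array.
SELECT ts_lexize('russian_simple', 'На') AS capitalized,
       ts_lexize('russian_simple', 'на') AS lowercased;
```

If the two ts_lexize columns differ, the dictionary is not folding case for this input, which points at the database locale rather than at the configuration itself.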

Any ideas?

Problem solved. The reason was that the database had been initialized with the default ("C") ctype and collate. I used

initdb --locale=UTF-8 --lc-collate=UTF-8 --encoding=UTF-8 -U pgsql *pgsql data dir*

to recreate the instance, and

create database "scratch"
    owner "postgres"
    encoding 'UTF8'
    lc_collate = 'ru_RU.UTF-8'
    lc_ctype = 'ru_RU.UTF-8';

to recreate the database, and now the simple dictionary works.
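A quick way to check whether a database is affected is to look at its collate and ctype settings and to try lower() on a Cyrillic string. This is only a sketch: it reads the current database's row from pg_database, and it assumes that in a UTF8 database with "C" ctype lower() leaves non-ASCII letters unchanged, which matches the behaviour described above.

```sql
-- Inspect the locale settings the current database was created with.
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = current_database();

-- Check whether case folding works for Cyrillic input in this database.
-- If the result still starts with a capital letter, stop word matching
-- on capitalized tokens is likely to fail in the same way.
SELECT lower('На рынке') AS lowered;
```

If datctype shows C, recreating the database with a suitable lc_ctype, as done above, is the fix that worked here.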

