python - How can I decode a JSON-like string in Cyrillic?
I'm trying to create a simple spider in Scrapy for an adverts site. The problem is that the adverts are in Cyrillic, and I end up with strings like this:
1-\u043a\u043e\u043c\u043d\u0430\u0442\u043d\u0430\u044f \u043a\u0432\u0430\u0440\u0442\u0438\u0440\u0430
Here's the spider's code:
def parse_advert(self, response):
    x = HtmlXPathSelector(response)
    advert = AdvertItem()
    advert['title'] = x.select("//h1/text()").extract()
    advert['phone'] = "111111111111"
    advert['text'] = "text text text text text text"
    filename = response.url.split("/")[-2]
    open(filename, 'wb').write(str(advert['title']))
Is there a way to "translate" the string on the fly?
Thanks.
Use str.decode('unicode-escape'):
>>> print r'1-\u043a\u043e\u043c\u043d\u0430\u0442\u043d\u0430\u044f \u043a\u0432\u0430\u0440\u0442\u0438\u0440\u0430'
1-\u043a\u043e\u043c\u043d\u0430\u0442\u043d\u0430\u044f \u043a\u0432\u0430\u0440\u0442\u0438\u0440\u0430
>>> print r'1-\u043a\u043e\u043c\u043d\u0430\u0442\u043d\u0430\u044f \u043a\u0432\u0430\u0440\u0442\u0438\u0440\u0430'.decode('unicode-escape')
1-комнатная квартира
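The session above is Python 2, where str has a decode method. In Python 3, str has no decode, so a rough equivalent is to encode the text to bytes first and then decode those bytes with the unicode_escape codec. The sketch below assumes the input is pure ASCII with literal \uXXXX sequences, as in the question; the helper name decode_unicode_escapes is made up for illustration and is not part of Scrapy or the standard library.

```python
def decode_unicode_escapes(text):
    # Hypothetical helper: turns literal \uXXXX sequences into real characters.
    # Assumes `text` is ASCII-only; anything else raises UnicodeEncodeError
    # instead of being silently mangled.
    return text.encode("ascii").decode("unicode_escape")


# Raw string, so the backslashes are kept literally, as in the scraped output.
escaped = r"1-\u043a\u043e\u043c\u043d\u0430\u0442\u043d\u0430\u044f \u043a\u0432\u0430\u0440\u0442\u0438\u0440\u0430"
decoded = decode_unicode_escapes(escaped)
print(decoded)  # 1-комнатная квартира
```

The escapes in the question most likely come from calling str() on the list that extract() returns, which in Python 2 yields the list's repr. Joining the list items and writing the result encoded as UTF-8 would probably avoid the escapes in the first place, so no decoding step would be needed.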