PHP: Compare 200,000 texts from a DB
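I have a MySQL database and a Solr instance that holds an exact copy of the MySQL data.
We have a lot of duplicated texts (or texts that are more than 80% similar), and these need to be bundled together. I know PHP has functions that calculate similarity (such as similar_text() and levenshtein()).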
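To make the 80% criterion concrete, here is a minimal bundling sketch, written in Python purely for illustration (it is not PHP's API). It assumes difflib's `SequenceMatcher.ratio()` as the similarity score, which is similar in spirit to the percentage from PHP's similar_text() but computed differently, and a hypothetical greedy rule: each text joins the first bundle whose first member is at least 80% similar, otherwise it starts a new bundle.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.8  # "more than 80% similar", taken from the question


def similarity(a, b):
    """Similarity in [0, 1] using difflib's ratio (an assumption, not similar_text)."""
    return SequenceMatcher(None, a, b).ratio()


def bundle(texts, threshold=THRESHOLD):
    """Hypothetical greedy bundling: each text joins the first bundle whose
    first member (its representative) is at least `threshold` similar."""
    bundles = []
    for text in texts:
        for group in bundles:
            if similarity(group[0], text) >= threshold:
                group.append(text)
                break
        else:
            # No existing bundle was similar enough: start a new one.
            bundles.append([text])
    return bundles
```

For example, `bundle(["hello world", "hello world!", "goodbye"])` puts the first two texts together and leaves "goodbye" on its own. Note that each text is still compared against every bundle, which is exactly the cost that becomes a problem at 200,000 rows.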
My problem is as follows:
How can I smartly (read: as fast as possible) compare every new text with the existing texts in the DB? I don't think it's practical to read all 200,000 records from the DB/Solr, compare them against one new item, and then do the same thing again for the next one.
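One way to avoid a full scan is to discard candidates that cannot possibly reach the threshold before running the expensive comparison. This sketch assumes the similarity is defined as 1 - lev(a, b) / max(len(a), len(b)). Because the Levenshtein distance is never smaller than the difference in lengths, an 80% match forces the other text's length into the range [0.8·n, 1.25·n], where n is the length of the new text. The sorted `(length, id)` index and the function names are hypothetical.

```python
import bisect
import math
from fractions import Fraction

# Assumed measure: sim(a, b) = 1 - lev(a, b) / max(len(a), len(b)).
# Since lev(a, b) >= |len(a) - len(b)|, sim >= t implies
# t * len(new) <= len(other) <= len(new) / t.
THRESHOLD = Fraction(4, 5)  # 80%, kept exact to avoid float rounding


def length_window(n, threshold=THRESHOLD):
    """Inclusive range of lengths that can still reach the threshold."""
    return math.ceil(threshold * n), math.floor(n / threshold)


def candidate_ids(new_text, index, threshold=THRESHOLD):
    """`index` is a hypothetical list of (length, id) tuples sorted by length.

    Returns only the ids whose text length lies inside the window; the full
    similarity check then runs on these candidates instead of every row.
    """
    lo, hi = length_window(len(new_text), threshold)
    # (lo,) sorts before any (lo, id), so this finds the first length >= lo.
    start = bisect.bisect_left(index, (lo,))
    # Likewise, the first entry with length >= hi + 1 ends the window.
    end = bisect.bisect_left(index, (hi + 1,))
    return [doc_id for _, doc_id in index[start:end]]
```

In MySQL the same window could be applied with a `WHERE ... BETWEEN` clause on an indexed length column (one you would need to add), so only texts in that range are fetched. It is only a prefilter: texts of similar length still need the full comparison.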
You could install a MySQL plugin that implements the Levenshtein distance algorithm and let the database do the comparison.
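For reference, here is a sketch (in Python, for illustration) of the computation such a plugin would perform: the classic dynamic-programming Levenshtein distance, plus a normalization to a 0..1 score so it can be checked against the 80% threshold. The normalization 1 - distance / longer length is one common choice and an assumption here; the answer does not specify one.

```python
def levenshtein(a, b):
    """Edit distance (insert, delete, substitute, each costing 1),
    computed with two rolling rows of the classic DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Assumed normalization: 1.0 for identical strings, lower as edits grow."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest
```

Running this inside MySQL avoids transferring 200,000 rows to PHP for every new text, although the server still evaluates the function for each row it has to examine.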