php - Compare 200.000 texts from DB


I have a MySQL database and a Solr instance that holds an exact copy of the MySQL data.

We have a lot of duplicated texts (or texts that are more than 80% similar). These texts need to be bundled together. I know that PHP has functions to calculate similarity.
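For reference, the PHP functions alluded to here are most likely similar_text() and levenshtein(). The snippet below is a minimal sketch of how the 80% rule could be checked for a single pair of texts. The two sample strings are made up for illustration, and turning the Levenshtein distance into a ratio by dividing by the longer length is an assumption, not something stated in the question. Note that before PHP 8.0, levenshtein() rejected strings longer than 255 characters.

```php
<?php
// Hypothetical sample texts; in practice these would come from MySQL or Solr.
$existing = 'The quick brown fox jumps over the lazy dog';
$incoming = 'The quick brown fox jumped over the lazy dog';

// similar_text() returns the number of matching characters and writes
// a percentage (0 to 100) into its third argument.
similar_text($existing, $incoming, $percent);

// levenshtein() returns the minimum number of single-character
// insertions, deletions and substitutions between the two strings.
$distance = levenshtein($existing, $incoming);

// Assumed normalisation: 1 minus distance divided by the longer length.
$ratio = 1 - $distance / max(strlen($existing), strlen($incoming));

// Apply the 80% threshold from the question.
$isDuplicate = $percent >= 80.0;

printf("similar_text: %.1f%%, levenshtein: %d, ratio: %.2f\n", $percent, $distance, $ratio);
```

Both functions are expensive on long inputs, so running them pairwise against every stored text is exactly the cost the question is trying to avoid.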

My problem is as follows:

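How can I compare every new text with the existing texts in the database in a smart (read: as fast as possible) way? I don't think it is good practice to read 200.000 records from the DB/Solr to compare them with one new item, and then do the same thing again for the next one.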

You could install a MySQL plugin that implements the Levenshtein distance algorithm in MySQL and let the database do the comparison.
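As a rough sketch of what that could look like from PHP, the helper below only builds a parameterised query and does not touch a database. Everything named in it is an assumption for illustration: the table `texts`, its columns `id` and `body`, and a user-defined function called `LEVENSHTEIN(a, b)` that the installed plugin would have to provide under that name. The length filter relies on the fact that the edit distance can never be smaller than the difference in length, so rows that are much shorter or longer than the new text can be skipped before the expensive function runs.

```php
<?php
declare(strict_types=1);

/**
 * Build a query asking MySQL for stored texts that are at least
 * $minPercent similar to $newText.
 *
 * Assumptions (hypothetical, not from the original post): a table `texts`
 * with columns `id` and `body`, and a UDF named LEVENSHTEIN(a, b).
 * Similarity is taken as 1 - distance / longer length.
 */
function buildNearDuplicateQuery(string $newText, int $minPercent = 80): array
{
    // strlen() counts bytes; for multibyte text, mb_strlen() would match
    // MySQL's CHAR_LENGTH() more closely.
    $length = strlen($newText);

    $sql = 'SELECT id, body FROM texts'
         . ' WHERE CHAR_LENGTH(body) BETWEEN :min_len AND :max_len'
         . ' AND LEVENSHTEIN(body, :new_text) * 100'
         . ' <= :max_diff_pct * GREATEST(CHAR_LENGTH(body), :new_len)';

    $params = [
        // Shortest and longest stored text that can still reach the threshold
        // (integer arithmetic: rounded up and rounded down respectively).
        ':min_len'      => intdiv($length * $minPercent + 99, 100),
        ':max_len'      => intdiv($length * 100, $minPercent),
        ':new_text'     => $newText,
        ':max_diff_pct' => 100 - $minPercent,
        ':new_len'      => $length,
    ];

    return [$sql, $params];
}

// Example: a new text of 100 characters.
[$sql, $params] = buildNearDuplicateQuery(str_repeat('a', 100));
```

With an existing PDO connection the result could be run as `$stmt = $pdo->prepare($sql); $stmt->execute($params);`. The database still has to evaluate the function for every row that passes the length filter, so this moves the work to MySQL rather than removing it.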

