python - Remove all elements which occur in less than 1% or more than 60% of the list
If I have a list of strings:
['fsuy3,fsddj4,fsdg3,hfdh6,gfdgd6,gfdf5', 'fsuy3,fsuy3,fdfs4,sdgsdj4,fhfh4,sds22,hhgj6,xfsd4a,asr3'] (the real list is much larger)
how can I remove the words that occur in less than 1% or more than 60% of the strings?
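You can use collections.Counter to count how often each element occurs, and then keep only the elements whose share of the list is between 1% and 60%:

    from collections import Counter

    counts = Counter(mylist)
    newlist = [s for s in mylist if 0.01 <= counts[s] / len(mylist) <= 0.60]

(In Python 2.x, use float(counts[s]) / len(mylist) to avoid integer division.)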
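A minimal self-contained sketch of the whole-element filter, assuming Python 3 and inclusive bounds (an element occurring in exactly 1% or exactly 60% of the list is kept). The helper name filter_by_frequency and its low/high parameters are hypothetical, chosen only for illustration:

```python
from collections import Counter

def filter_by_frequency(items, low=0.01, high=0.60):
    """Hypothetical helper: keep items whose share of the list lies in [low, high]."""
    if not items:
        return []
    counts = Counter(items)  # element -> number of occurrences
    n = len(items)
    return [s for s in items if low <= counts[s] / n <= high]

# 'a' makes up 70% of this list, so every 'a' is dropped.
mylist = ['a'] * 7 + ['b'] * 2 + ['c']
print(filter_by_frequency(mylist))  # ['b', 'b', 'c']
```

The Counter is built once, so the whole filter runs in linear time even for a big list.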
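The code above compares whole strings. If you mean the comma-separated words inside each string, you can use a similar approach: split each string into words, then count, for each word, how many strings it appears in (set() ensures a word repeated within one string is counted only once):

    words = [l.split(',') for l in mylist]
    counts = Counter(word for l in words for word in set(l))
    newlist = [[s for s in l if 0.01 <= counts[s] / len(mylist) <= 0.60] for l in words]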
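A runnable sketch of the per-word version applied to the sample data from the question, assuming Python 3 and inclusive bounds. The helper name filter_words is hypothetical. Each word is counted once per string (document frequency), so fsuy3, which appears twice in the second string, is not overcounted:

```python
from collections import Counter

def filter_words(strings, low=0.01, high=0.60):
    """Hypothetical helper: drop words that appear in fewer than `low`
    or more than `high` of the strings (bounds treated as inclusive)."""
    rows = [s.split(',') for s in strings]
    n = len(rows)
    if n == 0:
        return []
    # Document frequency: each word counts at most once per string.
    doc_freq = Counter(word for row in rows for word in set(row))
    return [[w for w in row if low <= doc_freq[w] / n <= high] for row in rows]

mylist = ['fsuy3,fsddj4,fsdg3,hfdh6,gfdgd6,gfdf5',
          'fsuy3,fsuy3,fdfs4,sdgsdj4,fhfh4,sds22,hhgj6,xfsd4a,asr3']
# fsuy3 is in 2 of 2 strings (100%), so it is removed; every other word is in 1 of 2 (50%).
print(filter_words(mylist))
```

If you need comma-separated strings back rather than lists, join each row with ','.join(row).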