python - Take certain words and print the frequency of each phrase/word?


I have a file that has a list of bands, along with the album and the year it was produced. I need to write a function that will go through this file, find the different names of the bands, and count how many times each of those bands appears in the file.

The file looks like this:

    beatles - revolver (1966)
    nirvana - nevermind (1991)
    beatles - sgt pepper's lonely hearts club band (1967)
    u2 - joshua tree (1987)
    beatles - beatles (1968)
    beatles - abbey road (1969)
    guns n' roses - appetite destruction (1987)
    radiohead - ok computer (1997)
    led zeppelin - led zeppelin 4 (1971)
    u2 - achtung baby (1991)
    pink floyd - dark side of moon (1973)
    michael jackson -thriller (1982)
    rolling stones - exile on main street (1972)
    clash - london calling (1979)
    u2 - can't leave behind (2000)
    weezer - pinkerton (1996)
    radiohead - bends (1995)
    smashing pumpkins - mellon collie , infinite sadness (1995)
    . . .

The output has to be in descending order of frequency and look like this:

    band1: number1
    band2: number2
    band3: number3

Here is the code I have so far:

    def read_albums(filename) :
        file = open("albums.txt", "r")
        bands = {}
        for line in file :
            words = line.split()
            for word in words:
                if word in '-' :
                    del(words[words.index(word):])
            string1 = ""
            for i in words :
                list1 = []
                string1 = string1 + i + " "
                list1.append(string1)
            for k in list1 :
                if (k in bands) :
                    bands[k] = bands[k] +1
                else :
                    bands[k] = 1

        for word in bands :
            frequency = bands[word]
            print(word + ":", len(bands))

I think there's an easier way to do this, but I'm not sure. Also, I'm not sure how to sort a dictionary by frequency. Do I need to convert it to a list?

You are right, there is an easier way, with Counter:

    from collections import Counter

    with open('bandfile.txt') as f:
        # The band name is the text before the first '-'; blank lines are skipped
        counts = Counter(line.split('-')[0].strip() for line in f if line.strip())

    for band, count in counts.most_common():
        print("{0}:{1}".format(band, count))
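To try the idea without a file on disk, here is a minimal sketch that applies the same Counter approach to a few lines copied from the sample in the question. The names count_bands and sample_lines are made up for illustration, and str.partition is swapped in for split so that only the first '-' separates the band from the album.

```python
from collections import Counter


def count_bands(lines):
    """Count how often each band name appears (hypothetical helper)."""
    # The band name is everything before the first '-'.
    # Lines that are empty after stripping whitespace are skipped.
    return Counter(
        line.partition('-')[0].strip()
        for line in lines
        if line.strip()
    )


# A few lines copied from the sample file in the question,
# plus one blank line to show that it is ignored.
sample_lines = [
    "beatles - revolver (1966)\n",
    "nirvana - nevermind (1991)\n",
    "u2 - joshua tree (1987)\n",
    "\n",
    "beatles - abbey road (1969)\n",
    "michael jackson -thriller (1982)\n",
    "u2 - achtung baby (1991)\n",
    "beatles - beatles (1968)\n",
]

counts = count_bands(sample_lines)

# most_common() yields (band, count) pairs, highest count first.
for band, count in counts.most_common():
    print("{0}: {1}".format(band, count))
```

Because open() returns an iterable of lines, the same function should also accept a file object in place of the list.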

What is this doing: line.split('-')[0].strip() for line in f if line.strip()?

That expression is a compact form of the following loop:

    temp_list = []
    for line in f:
        if line.strip():  # make sure we skip blank lines
            bits = line.split('-')
            temp_list.append(bits[0].strip())

    counts = Counter(temp_list)

Unlike the loop above, it doesn't create an intermediate list. Instead, it creates a generator expression, a more memory-efficient way to step through things, which is passed as the argument to Counter.
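As for the original question about sorting a plain dictionary by frequency: the dictionary does not have to be converted by hand, because sorted() can order its (key, value) pairs and returns them as a new list. A minimal sketch, assuming a bands dictionary shaped like the one built in the question, with counts taken from the sample lines shown there:

```python
# Counts for four of the bands in the sample file from the question,
# stored the way the question's `bands` dictionary would hold them.
bands = {"nirvana": 1, "beatles": 4, "radiohead": 2, "u2": 3}

# sorted() returns a new list of (band, count) pairs. Sorting on the
# count (item[1]) with reverse=True gives descending frequency.
by_frequency = sorted(bands.items(), key=lambda item: item[1], reverse=True)

for band, count in by_frequency:
    print("{0}: {1}".format(band, count))
```

This is roughly what Counter.most_common() provides out of the box, so it is mainly useful when the counts already live in an ordinary dict.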

