Python - Always run a constant number of subprocesses in parallel
I want to use subprocesses to let 20 instances of a script I wrote run in parallel. Let's say I have a big list of about 100,000 URLs: my program should make sure that at any time 20 instances of my script are working on that list. I wanted to code it as follows:
    urllist = [url1, url2, url3, .. , url100000]
    i = 0
    while number_of_subprocesses < 20 and i < 100000:
        subprocess.Popen(['python', 'script.py', urllist[i]])
        i = i + 1

My script writes into a database or a text file. It doesn't output anything and doesn't need any input other than the URL.
My problem is that I wasn't able to find out how to get the number of active subprocesses. I'm a novice programmer, so every hint and suggestion is welcome. I was also wondering how I can make the while loop check the conditions again once 20 subprocesses are running. I thought of maybe putting another while loop around it, like:
    while i < 100000:
        while number_of_subprocesses < 20:
            subprocess.Popen(['python', 'script.py', urllist[i]])
            i = i + 1
        if number_of_subprocesses == 20:
            time.sleep(1)  # wait some time until checking again

Or is there a better way than a while loop that keeps checking the number of subprocesses?
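One way to get the number_of_subprocesses from the loop above, offered as a sketch rather than a finished solution: keep every Popen object in a list and drop the ones whose poll() no longer returns None. The helper name prune_finished is made up for this example, and the child command just sleeps in place of script.py.

```python
import subprocess
import sys


def prune_finished(procs):
    """Return only those Popen objects whose child is still running.

    Popen.poll() returns None while the child is alive and its exit code
    once it has finished, so len() of the result is the number of
    active subprocesses.
    """
    return [p for p in procs if p.poll() is None]


if __name__ == "__main__":
    # Stand-in for script.py: a child that just sleeps for one second.
    cmd = [sys.executable, "-c", "import time; time.sleep(1)"]
    procs = [subprocess.Popen(cmd) for _ in range(3)]
    procs = prune_finished(procs)
    print("active:", len(procs))
    for p in procs:
        p.wait()  # block until this child exits
    print("active after wait:", len(prune_finished(procs)))  # prints 0
```

In the loop above you would call procs = prune_finished(procs) before each comparison and use len(procs) in place of number_of_subprocesses.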
I also considered using the multiprocessing module, but I found it more convenient to call script.py through subprocess instead of calling a function through multiprocessing.
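If you would like a pool to manage the 20 slots without turning script.py into a function, one option (not from the original post) is concurrent.futures.ThreadPoolExecutor: each worker thread runs one subprocess with subprocess.run() and waits for it, so at most max_workers children are alive at once. Threads are enough here because the real work happens in the child processes. The helper names run_command and run_all are hypothetical, and the demo commands stand in for script.py.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd):
    """Run one command to completion and return its exit code.

    subprocess.run() blocks until the child exits, so each worker
    thread owns exactly one running subprocess at a time.
    """
    return subprocess.run(cmd).returncode


def run_all(commands, max_workers=20):
    """Run all commands with at most max_workers of them alive at once.

    As soon as a worker's subprocess exits, the executor hands that
    worker the next command, so the pool stays full until the list runs out.
    Returns the exit codes in the same order as commands.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_command, commands))


if __name__ == "__main__":
    # With the question's names this would be:
    # run_all([["python", "script.py", url] for url in urllist], max_workers=20)
    demo = [[sys.executable, "-c", "import sys; sys.exit(%d)" % n] for n in range(5)]
    print(run_all(demo, max_workers=2))  # prints [0, 1, 2, 3, 4]
```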
Maybe someone can help me and point me in the right direction. Thanks a lot!
Taking a different approach from the above, since it seems that a callback can't be passed to Popen as a parameter:
    import subprocess
    import time

    urllist = []  # fill in with the list of URLs from the question

    nexturlno = 0
    maxprocesses = 20
    maxurls = len(urllist)
    processes = []

    def startnew():
        """Start a new subprocess if there is work to do."""
        global nexturlno
        if nexturlno < maxurls:
            proc = subprocess.Popen(['python', 'script.py', urllist[nexturlno]])
            print("Started process for", urllist[nexturlno])
            nexturlno += 1
            processes.append(proc)

    def checkrunning():
        """Check the running processes and start new ones if there are spare slots."""
        # Check the processes in reverse order so that deleting one
        # does not shift the indices of those not yet checked.
        for p in range(len(processes) - 1, -1, -1):
            if processes[p].poll() is not None:  # poll() returns None while still running
                del processes[p]  # finished: remove it from the list
        while len(processes) < maxprocesses and nexturlno < maxurls:  # more work and spare slots
            startnew()

    if __name__ == "__main__":
        checkrunning()  # start the first maxprocesses processes
        while len(processes) > 0:  # something is still going on
            time.sleep(0.1)  # you may wish to change this interval
            checkrunning()
        print("Done!")
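If the 0.1 second polling loop bothers you, the missing on-exit callback can be emulated instead. This is a different sketch, not the answer's method: each child gets a small waiter thread that blocks on proc.wait() and then calls the callback, and a semaphore with one permit per slot keeps at most max_procs children alive. The names popen_with_callback and run_with_slots are hypothetical.

```python
import subprocess
import sys
import threading


def popen_with_callback(cmd, on_exit):
    """Start cmd and call on_exit(proc) from a helper thread once it exits.

    subprocess.Popen has no completion-callback parameter, so a small
    waiter thread blocks on proc.wait() and then invokes the callback.
    Returns the Popen object and the waiter thread.
    """
    proc = subprocess.Popen(cmd)

    def waiter():
        proc.wait()
        on_exit(proc)

    thread = threading.Thread(target=waiter, daemon=True)
    thread.start()
    return proc, thread


def run_with_slots(commands, max_procs):
    """Run every command, keeping at most max_procs children alive at once.

    The main loop takes a semaphore permit before starting a child and
    the child's exit callback gives it back. Returns the exit codes in
    the same order as commands.
    """
    slots = threading.BoundedSemaphore(max_procs)
    exit_codes = [None] * len(commands)
    waiters = []

    for i, cmd in enumerate(commands):
        slots.acquire()  # blocks while max_procs children are still running

        def on_exit(proc, i=i):
            exit_codes[i] = proc.returncode
            slots.release()  # free the slot for the next command

        try:
            _, thread = popen_with_callback(cmd, on_exit)
        except OSError:
            slots.release()  # the child never started, so give the slot back
            raise
        waiters.append(thread)

    for thread in waiters:
        thread.join()  # wait for the last children and their callbacks
    return exit_codes


if __name__ == "__main__":
    # With the question's names this would be:
    # run_with_slots([["python", "script.py", url] for url in urllist], 20)
    demo = [[sys.executable, "-c", "import sys; sys.exit(%d)" % n] for n in range(5)]
    print(run_with_slots(demo, 2))  # prints [0, 1, 2, 3, 4]
```

The trade-off is one extra thread per running child (20 at most, since a new child is only started after a slot frees up) in exchange for reacting to each exit immediately instead of on the next poll.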