Notes and takeaways from my process of learning Python web scraping, organized so I can look things up later, and hopefully useful for discussion with others as well.
```python
import multiprocessing
import time

def worker_1(interval):
    print("worker_1")
    time.sleep(interval)
    print("end worker_1")

def worker_2(interval):
    print("worker_2")
    time.sleep(interval)
    print("end worker_2")

def worker_3(interval):
    print("worker_3")
    time.sleep(interval)
    print("end worker_3")

if __name__ == "__main__":
    p1 = multiprocessing.Process(target=worker_1, args=(2,))
    p2 = multiprocessing.Process(target=worker_2, args=(3,))
    p3 = multiprocessing.Process(target=worker_3, args=(4,))
    p1.start()
    p2.start()
    p3.start()
    print("The number of CPU is:" + str(multiprocessing.cpu_count()))
    for p in multiprocessing.active_children():
        print("child p.name:" + p.name + "\tp.id" + str(p.pid))
    print("END!!!!!!!!!!!!!!!!!")
```
The result is shown in the figure below:
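A minimal variant of the example above (my own sketch, not from the original post): the three workers can also be started in a loop and then joined one by one, so the parent deterministically waits for every child before exiting.

```python
import multiprocessing
import time

def worker(name, interval):
    # each child sleeps for `interval` seconds, then reports
    time.sleep(interval)
    print("end " + name)

def run_all():
    procs = []
    for name, interval in [("worker_1", 0.1), ("worker_2", 0.2), ("worker_3", 0.3)]:
        p = multiprocessing.Process(target=worker, args=(name, interval), name=name)
        p.start()
        procs.append(p)
    for p in procs:
        p.join()  # wait for every child before continuing
    # exitcode 0 means the child terminated normally
    return [p.exitcode for p in procs]

if __name__ == "__main__":
    print(run_all())
```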
When running this in Python 3's IDLE on Windows, the child processes' output never appears, as if the children never ran. One explanation of this problem (from a Stack Overflow comment) is:
Well, IDLE is a strange thing. In order to “capture” everything what you write using print statements or sys.stdout.write, IDLE “overrides” sys.stdout and replaces it with an object that passes everything back to IDLE so it can print it. I guess when you are starting a new process from multiprocessing, this hackery is not inherited by the child process, therefore you don’t see anything in IDLE. But I’m just guessing here, I don’t have a Windows machine at the moment to check it. – Tamás May 6 '10 at 9:10
In other words, because IDLE replaces sys.stdout and a child process started on Windows does not inherit that replacement, there is no clean way around this inside IDLE; the examples behave normally only when run from the command line. So for this section we will run everything in a console.
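If you do have to work inside IDLE, one common workaround (my own suggestion, not from the original post) is to have the child write to a real file instead of sys.stdout, so its output survives regardless of how the parent's stdout has been replaced. The log path below is hypothetical.

```python
import multiprocessing
import os
import tempfile

# hypothetical log path; any writable file works
LOG = os.path.join(tempfile.gettempdir(), "mp_child.log")

def worker():
    # write to a real file instead of sys.stdout, so the message
    # is visible even when IDLE has swapped out the parent's stdout
    with open(LOG, "a") as f:
        f.write("hello from pid {0}\n".format(os.getpid()))

def run():
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
    with open(LOG) as f:
        return f.read()

if __name__ == "__main__":
    print(run())
```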
First, without setting the child process as a daemon:
```python
import multiprocessing
import time

# without daemon
def worker(interval):
    print("work start:{0}".format(time.ctime()))
    time.sleep(interval)
    print("work end:{0}".format(time.ctime()))

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker, args=(3,))
    p.start()
    print("end!")
```
The result is shown in the figure below:
We can see that the parent process finishes first and the child process then runs to completion. Now let's set the child process as a daemon:

```python
import multiprocessing
import time

# with daemon
def worker(interval):
    print("work start:{0}".format(time.ctime()))
    time.sleep(interval)
    print("work end:{0}".format(time.ctime()))

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker, args=(3,))
    p.daemon = True
    p.start()
    print("end!")
```
The result is shown in the figure below:
As you can see, when the child is a daemon process, it is terminated as soon as the parent exits, so it never finishes its work. If we still want the daemon child to run to completion, we can wait for it explicitly:

```python
import multiprocessing
import time

# wait for the daemon child to finish before the parent exits
def worker(interval):
    print("work start:{0}".format(time.ctime()))
    time.sleep(interval)
    print("work end:{0}".format(time.ctime()))

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker, args=(3,))
    p.daemon = True
    p.start()
    p.join()
    print("end!")
```
The result is shown in the figure below:
The effect of p.join() is to make the parent wait until the child process has finished before it continues. Next, a process pool:

```python
import multiprocessing
import os, time, random

def Lee():
    print("\nRun task Lee-%s" % (os.getpid()))  # os.getpid() returns the current process ID
    start = time.time()
    time.sleep(random.random() * 10)  # random.random() returns a float in [0, 1)
    end = time.time()
    print('Task Lee, runs %0.2f seconds.' % (end - start))

def Marlon():
    print("\nRun task Marlon-%s" % (os.getpid()))
    start = time.time()
    time.sleep(random.random() * 40)
    end = time.time()
    print('Task Marlon runs %0.2f seconds.' % (end - start))

def Allen():
    print("\nRun task Allen-%s" % (os.getpid()))
    start = time.time()
    time.sleep(random.random() * 30)
    end = time.time()
    print('Task Allen runs %0.2f seconds.' % (end - start))

def Frank():
    print("\nRun task Frank-%s" % (os.getpid()))
    start = time.time()
    time.sleep(random.random() * 20)
    end = time.time()
    print('Task Frank runs %0.2f seconds.' % (end - start))

if __name__ == '__main__':
    function_list = [Lee, Marlon, Allen, Frank]
    print("parent process %s" % (os.getpid()))
    pool = multiprocessing.Pool(2)
    for func in function_list:
        pool.apply_async(func)  # apply_async submits a task; as soon as a worker finishes, the pool starts the next queued task
    print('Waiting for all subprocesses done...')
    pool.close()  # after close(), no new tasks may be submitted to the pool
    pool.join()   # close() must be called before join(), otherwise join() raises; join() waits for all child processes to finish
    print('All subprocesses done.')
```
The result is shown in the figure below:
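apply_async also returns an AsyncResult object; if the tasks returned values (the ones above only print), the parent could collect them with .get(). A minimal sketch, using a hypothetical square task of my own rather than the tasks above:

```python
import multiprocessing

def square(x):
    # a stand-in task that returns a value
    return x * x

def run_pool():
    # a pool of 2 workers processing 4 tasks
    with multiprocessing.Pool(2) as pool:
        results = [pool.apply_async(square, (i,)) for i in range(4)]
        # .get() blocks until the corresponding task finishes, then returns its value
        return [r.get() for r in results]

if __name__ == "__main__":
    print(run_pool())
```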
Reprinted from: http://wdazi.baihongyu.com/