
Python multiple URL requests


Hi guys, I'm new to Python (I started less than two weeks ago), so I need some advice and tips :p

What is the fastest and most efficient way to perform about 1500 API requests?

  • Execute them all with async functions and await the results?

  • Split them into lists of 300 URLs and put each list in a Thread that executes them in an async loop?

  • Do the same as the second suggestion, but with Processes instead of Threads?

For the moment this works for me, but it takes about 8 s to execute 1400 API requests, while a single request without threads takes 9 s. Am I doing something wrong?!

Fetch a single URL (I tried passing the session as a parameter, but I got an error once I reached 700 requests):

import curio
import curio_http

# Fetch one URL; each call opens its own session, since sharing one
# session across ~700 concurrent requests raised errors (see above)
async def fetch_one(url):
    async with curio_http.ClientSession() as session:
        response = await session.get(url)
        content = await response.json()
        return content
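On its own, this coroutine can be driven with curio.run (the URL below is a made-up placeholder):

content = curio.run(fetch_one("https://api.example.com/item/1"))  # hypothetical endpoint
print(content)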

Fetch a list of URLs inside one async loop:

async def fetchMultiURLs(url_list):
    tasks = []
    responses = []
    for url in url_list:
        # spawn starts each request immediately and returns a Task handle
        task = await curio.spawn(fetch_one(url))
        tasks.append(task)

    for task in tasks:
        # join waits for the task to finish and returns its result
        content = await task.join()
        responses.append(content)
    return responses
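Running the whole batch in a single async loop is essentially the first suggestion; a minimal sketch, assuming a placeholder URL list:

URLS = ["https://api.example.com/item/%d" % i for i in range(1500)]  # hypothetical endpoints
responses = curio.run(fetchMultiURLs(URLS))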

Create threads and put an async loop inside each one, depending on the number of URLs and X URLs per loop:

For example, MultiFetch(URLS[600], 200), i.e. a list of 600 URLs with X = 200, will create 3 threads, each executing 200 requests in its own async loop.

from threading import Thread

def MultiFetch(URLS, X):
    MyThreadsList = []
    MyThreadsResults = []
    # Number of threads: ceil(len(URLS) / X)
    N_Threads = len(URLS) // X if len(URLS) % X == 0 else len(URLS) // X + 1

    def Worker(chunk):
        # Thread.join() returns None, so each thread appends its own
        # result list to a shared list instead (list.append is thread-safe)
        MyThreadsResults.append(curio.run(fetchMultiURLs(chunk)))

    for i in range(N_Threads):
        MyThreadsList.append(Thread(target=Worker, args=(URLS[i*X:i*X+X],)))
        MyThreadsList[i].start()
    for i in range(N_Threads):
        MyThreadsList[i].join()
    return MyThreadsResults
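To measure the timings quoted above, a simple stopwatch around the call is enough (placeholder URLs; X = 200 as in the example):

import time

URLS = ["https://api.example.com/item/%d" % i for i in range(1400)]  # hypothetical endpoints
start = time.perf_counter()
results = MultiFetch(URLS, 200)  # 7 threads of 200 requests each
print("fetched in %.2f s" % (time.perf_counter() - start))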

1 Answer


    Finally I found a solution :) Fetching 1400 URLs takes 2.2 s.

    I used the third suggestion (an async loop inside each process).

    # Fetch 1 URL

    # Imports for the whole solution
    import curio
    import curio_http
    from multiprocessing import Pool

    async def fetch_one(url):
        async with curio_http.ClientSession() as session:
            response = await session.get(url)
            content = await response.json()
            return content
    

    # Fetch X URLs inside one async loop

    async def fetchMultiURLs(url_list):
        tasks = []
        responses = []
        for url in url_list:
            task = await curio.spawn(fetch_one(url))
            tasks.append(task)

        for task in tasks:
            content = await task.join()
            responses.append(content)
        return responses
    

    # I tried to use a lambda instead of this function, but it didn't work (Pool.map has to pickle the function it sends to the worker processes, and lambdas can't be pickled)

    def RuningCurio(X):
        # Module-level wrapper that Pool.map can pickle; runs one async loop
        return curio.run(fetchMultiURLs(X))
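
    A quick way to see why the lambda version fails (a sketch, not part of the original answer): multiprocessing pickles the function it sends to worker processes, and pickle can serialize a module-level function by name but not an anonymous lambda.

    import pickle
    pickle.dumps(RuningCurio)  # works: resolvable by name at module level
    pickle.dumps(lambda X: curio.run(fetchMultiURLs(X)))  # raises PicklingError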
    

    # Create processes and async loops depending on the number of URLs / X URLs per loop

    # In my case (I'm using a VPS), a single process can easily fetch 700 links in under 1 s, so don't use multiple processes below that number of URLs (just call the fetchMultiURLs function directly)

    def MultiFetch(URLS, X):
        MyListofLists = []
        LengthURLs = len(URLS)
        # Number of processes: ceil(LengthURLs / X)
        N_Process = LengthURLs // X if LengthURLs % X == 0 else LengthURLs // X + 1
        for i in range(N_Process):  # split URLS into a list of lists ([[1,2,3],[4,5,6],[7,8,9]])
            MyListofLists.append(URLS[i*X:i*X+X])
        with Pool(N_Process) as P:  # one process (and one async loop) per chunk
            return P.map(RuningCurio, MyListofLists)
    

    # I'm fetching 2100 URLs in 1.1 s; I hope this solution helps you guys
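
    A usage sketch (placeholder URLs; 700 per process, as the comment above suggests):

    if __name__ == '__main__':  # guard required when multiprocessing spawns workers
        URLS = ["https://api.example.com/item/%d" % i for i in range(2100)]  # hypothetical endpoints
        results = MultiFetch(URLS, 700)  # 3 processes, 700 URLs each
        print(sum(len(chunk) for chunk in results), "responses fetched")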
