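I want to scrape multiple pages by repeatedly following one particular link. For example, I want to pick the link at a given position on the page and follow it for a given number of iterations. After each iteration, the scraped result must replace the URL that the user originally entered. This is what I have: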

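# Imports are not shown in the original post; these two are assumed.
import urllib
from bs4 import BeautifulSoup

#url = raw_input('Enter - ')
url = 'http://www.columbia.edu/kermit/k95.html'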
itr = raw_input('Enter iteration: ')
i = int(itr)

n = raw_input('Enter Number: ')
n = int(n)

html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
tags = soup('a')

print 'Link:' , url
while i > 0:
    i = i - 1
    if i == 0:
        break
    for tag in tags:  
        me = tag.get('href', None)
        #Just to make sure the link/content match print tag.contents[0]
        link = tags[(n - 1)]
        #print link 
    links = link.get('href', None)
    print 'Link:', links

Enter - http://www.columbia.edu/~fdc/
Enter count: 4
Enter Position: 9
Link: http://www.columbia.edu/~fdc/
Link: http://www.columbia.edu/kermit/k95.html
Link: http://www.columbia.edu/kermit/k95.html (Should be k95faq.html)
Link: http://www.columbia.edu/kermit/k95.html (Should be ckfaq.html)

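I get the number of iterations I want and the specific link, but on each iteration I need the first URL (the user input) to be replaced by the link stored in the variable "links".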

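For example, the user enters a URL such as http://www.columbia.edu/~fdc/, asking for the 9th link on the page and 4 iterations. The first iteration returns http://www.columbia.edu/kermit/k95.html as "links". I want the second iteration to give me the 9th link on the page that "links" points to, which should be k95faq.html.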
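
The behaviour described above seems to require fetching and parsing the page again inside the loop, because the code in the post parses only the first page and then reads the same `tags` list on every pass. Below is a minimal sketch of that idea, not a definitive fix. It assumes Python 3 and swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra packages. The names `AnchorCollector`, `fetch_html`, and `follow_links` are hypothetical and do not come from the post. Here `count` is the number of hops, whereas the loop in the post prints `count` lines including the starting URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # Anchors without an href are stored as None, so positions
            # count every <a> tag, like soup('a') in the post.
            self.hrefs.append(dict(attrs).get('href'))


def fetch_html(url):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read().decode('utf-8', errors='replace')


def follow_links(start_url, position, count, fetch=fetch_html):
    """Follow the link at 1-based `position`, up to `count` times.

    Returns the list of URLs visited, beginning with start_url.
    """
    visited = [start_url]
    url = start_url
    for _ in range(count):
        parser = AnchorCollector()
        parser.feed(fetch(url))  # parse the *current* page, not the first one
        if position > len(parser.hrefs):
            break  # this page has fewer links than requested
        href = parser.hrefs[position - 1]
        if href is None:
            break  # the anchor at this position has no href
        url = urljoin(url, href)  # resolve relative links against the page
        visited.append(url)
    return visited
```

With the values from the example, a call might look like `follow_links('http://www.columbia.edu/~fdc/', 9, 3)`, printing each entry of the returned list with a `Link:` prefix. The `fetch` parameter exists only so the loop can be tried against canned HTML without network access.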