EDIT: Figured it out. I just did the following:
import sys
sys.setrecursionlimit(1500) #This increases the recursion limit, ultimately moving
#up the ceiling on the stack so it doesn't overflow.
See this post for more information: What is the maximum recursion depth in Python, and how to increase it?
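For context, here is a minimal sketch of that fix (Python 3 print syntax, unlike the Python 2 code in the question below):

```python
import sys

# The default recursion limit is typically 1000, though it can vary by build.
print(sys.getrecursionlimit())

# Raise the ceiling before running the scraping loop. Note that raising it
# too far risks crashing the interpreter with a real C-stack overflow
# instead of a clean RecursionError, so increase it conservatively.
sys.setrecursionlimit(1500)
print(sys.getrecursionlimit())  # 1500
```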
--------------Original Question-----------------
I was successfully using re.findall to pull dates in the format I was searching for, but once I reached the 33rd link I got a "Maximum recursion depth exceeded while calling a Python object" error, and it kept pointing at the dates = re.findall(regex, str(webpage)) line.
From what I've read, I need to use a loop in my code so that I can get away from the recursion, but as a newbie, I'm unsure how to change the piece of code dealing with the RegEx and re.findall from recursive to iterative. Thanks in advance for any insight.
import urllib2
from bs4 import BeautifulSoup as BS
import re
#All code is correct between imports and the start of the For loop
for url in URLs:
    ...
    #Open and read the URL and specify html.parser as the parsing agent so that the parsing method remains uniform across systems
    webpage = BS(urllib2.urlopen(req).read(), "html.parser")
    #Create a list to store the dates to be searched
    regex = []
    #Append to a list those dates that have the end year "2011"
    regex.append("((?:January|February|March|April|May|June|July|August|September|October|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Sept|Oct|Nov|Dec)[\.]*[,]*[ ](?:0?[1-9]|[12][0-9]|3[01])[,|\.][ ](?:(?:20|'|`)[1][1]))")
    #Join all the dates matched on the webpage from the regex by a comma
    regex = ','.join(regex)
    #Find the matching date format from the opened webpage
    #[Recursion depth error happens here]
    dates = re.findall(regex, str(webpage))
    #If there aren't any dates that match, then go to the next link
    if dates == []:
        print "There was no matching date found in row " + CurrentRow
        j += 1
        continue
    #Print the dates that match the RegEx and the row that they are on
    print "A date was found in the link at row " + CurrentRow
    print dates
    j += 1
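For reference, the pattern from the question can be exercised on a small sample string to see which date formats it matches (Python 3 syntax; the sample text is made up for illustration):

```python
import re

# The same pattern as in the question: month names or abbreviations,
# a day, and a year ending in "2011".
regex = (r"((?:January|February|March|April|May|June|July|August|September"
         r"|October|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep"
         r"|Sept|Oct|Nov|Dec)[\.]*[,]*[ ](?:0?[1-9]|[12][0-9]|3[01])"
         r"[,|\.][ ](?:(?:20|'|`)[1][1]))")

sample = "Posted March 15, 2011 and updated Jan. 5, 2011; archived June 3, 2012."
# findall returns the text of the single capture group, i.e. the whole date.
print(re.findall(regex, sample))
# ['March 15, 2011', 'Jan. 5, 2011'] -- the 2012 date is not matched
```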
2 Answers
I don't think
regex.append("...")
is doing what you think it is.
After the call to the append method, regex is a one-element list holding your regular expression. The join that follows suggests to me that you thought it would be a multi-element list.
Once you fix that, I suspect your code will work better.
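A minimal demonstration of the point above (the "pattern-a" string is a stand-in for the question's regex): append() puts the whole pattern string into the list as a single element, so the subsequent join has nothing to join and simply returns that one string unchanged.

```python
regex = []
regex.append("pattern-a")

# One element, not one element per date format.
print(len(regex))        # 1

# Joining a one-element list is a no-op: no comma is ever inserted.
print(','.join(regex))   # pattern-a
```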
Following up on my comment, what you could do is create many different patterns and iterate over each of them, instead of using one pattern with many different OR clauses. Something like that might work. It is a more iterative approach, but it is very slow, because it runs re.findall once per month pattern for EVERY SINGLE WEBPAGE. As you can see, with the at least 33 links in your question, re.findall will run 24*33 times. Also, I'm not a Python expert by any means, and I'm not even entirely sure this solution will get rid of your problem completely.
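The per-pattern loop this answer describes might look like the following sketch (Python 3 syntax; find_2011_dates and the months list are my own illustrative names, not from the original, and the day/year portion is taken unchanged from the question's regex):

```python
import re

# One entry per month name, full and abbreviated forms.
months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December",
          "Jan", "Feb", "Mar", "Apr", "Jun", "Jul", "Aug", "Sep",
          "Sept", "Oct", "Nov", "Dec"]

def find_2011_dates(text):
    # Apply one small pattern per month instead of one large alternation,
    # collecting all matches. This trades one findall call for ~24 calls
    # per page, which is why it is slow across many links.
    dates = []
    for month in months:
        pattern = (month + r"[\.]*[,]*[ ](?:0?[1-9]|[12][0-9]|3[01])"
                   r"[,|\.][ ](?:(?:20|'|`)[1][1])")
        dates.extend(re.findall(pattern, text))
    return dates

print(find_2011_dates("Released May 2, 2011 and Oct. 31, 2011."))
# ['May 2, 2011', 'Oct. 31, 2011']
```

Note that the results come back grouped by month order rather than by position in the page, since each month's pattern is run over the whole text in turn.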