清風竹林

Python Challenge lv4: follow the chain

Puzzle link: http://www.pythonchallenge.com/pc/def/linkedlist.php

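To be honest, it took some googling before I understood what the puzzle asks for: repeatedly fetch a web page from the server, then find the address of the next link in its source. One thing to watch out for: although the page source is very simple, not every number in it is valid, so you need a regular expression that matches the correct pattern. For this puzzle, r'nothing is (\d+)' is a usable pattern. I first used ''.join([x for x in text if x.isdigit()]) to glue all the digits together, and only after following the chain past 4000 pages without reaching an end did I realize I had been fooled...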
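To make the difference concrete, here is a minimal sketch comparing the two extraction approaches on a single line of text. The sample string is made up for illustration and is not an actual server response, and the helper names next_by_regex and next_by_digits are hypothetical.

```python
import re

# Hypothetical page text, invented for illustration only.
sample = "Line 2 of 3: and the next nothing is 44827"

pattern = re.compile(r'nothing is (\d+)')

def next_by_regex(text):
    """Return the number that follows 'nothing is', or None if absent."""
    match = pattern.search(text)
    return match.group(1) if match else None

def next_by_digits(text):
    """Glue together every digit in the text, valid or not."""
    return ''.join(ch for ch in text if ch.isdigit())

print(next_by_regex(sample))   # 44827
print(next_by_digits(sample))  # 2344827
```

The regex only accepts a number in the expected position, while the digit join silently mixes in any stray digits and still yields something that looks like a valid value.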

import re
import urllib.request

if __name__ == '__main__':
    url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='
    index = '17675'
    counter = 1
    pattern = re.compile(r'nothing is (\d+)')

    while True:
        try:
            request = urllib.request.Request(url + index)
            # My PC has to go through a proxy to connect;
            # remove this line if yours does not.
            request.set_proxy('172.16.0.252:80', 'http')
            response = urllib.request.urlopen(request)
            content = response.read().decode()
            response.close()

            print(counter, content)

            # Stop once the page no longer names a next value.
            result = pattern.search(content)
            if not result:
                break

            index = result.group(1)
            counter += 1
        except Exception as ex:
            print(ex)
            break
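The loop can also be tried without touching the network. The following is only a sketch under assumptions: the function follow_chain, its fetch parameter, and the max_steps guard are not part of the original program, and the fake pages are built from the last two hops of the output listed in this post.

```python
import re

PATTERN = re.compile(r'nothing is (\d+)')

def follow_chain(fetch, start, max_steps=1000):
    """Follow the chain from `start` using `fetch(index) -> page text`.

    Returns (text of the first page without a match, number of pages fetched).
    `max_steps` is an assumed safety limit so a bad pattern cannot loop forever.
    """
    index = start
    for step in range(1, max_steps + 1):
        content = fetch(index)
        match = PATTERN.search(content)
        if match is None:
            return content, step
        index = match.group(1)
    raise RuntimeError('chain did not end within max_steps')

# Fake pages standing in for the server (hypothetical test data).
fake_pages = {
    '27719': 'and the next nothing is 65667',
    '65667': 'peak.html',
}

final, steps = follow_chain(fake_pages.__getitem__, '27719')
print(steps, final)  # 2 peak.html
```

Passing the fetch step in as a function keeps the chain-following logic separate from the proxy and HTTP details, so the same loop could be pointed at the real server by supplying a function that wraps urllib.request.urlopen.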

Program output:
1 and the next nothing is 8511
2 and the next nothing is 89456
3 and the next nothing is 43502
4 and the next nothing is 45605
5 and the next nothing is 12970
6 and the next nothing is 91060
7 and the next nothing is 27719
8 and the next nothing is 65667
9 peak.html

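This gives the address of the next puzzle, peak.html. (Note: my initial index is 17675, which is not the value the puzzle originally gives. I simply picked a number from the later part of the chain, so don't be alarmed.)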

Posted on 2009-05-11 16:05 by 李現民. Category: python

# re: Python Challenge lv4: follow the chain [not logged in] 2011-05-31 20:17 simon

# re: Python Challenge lv4: follow the chain 2011-06-01 10:04 李現民
