清風竹林



Python Challenge lv4: follow the chain

Challenge link: http://www.pythonchallenge.com/pc/def/linkedlist.php

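To be honest, it took some googling before I understood what the challenge is asking for: keep fetching a web page from the server, then find the address of the next link in its source. One thing to watch out for: although the page source is very simple, not every number in it is valid, so you need a regular expression with the right pattern. For this challenge, r'nothing is (\d+)' is a pattern that works. My first attempt used ''.join([x for x in text if x.isdigit()]), which glues every digit on the page together; I had followed the chain past 4000 pages with no end in sight before I realized I had been tricked...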
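To see why gluing the digits together goes wrong, here is a small offline sketch. The sample page text is made up for illustration (I am only assuming a page can contain a stray number next to the real link), and the helper names glue_digits and next_nothing are hypothetical.

```python
import re

# Hypothetical page text: it contains a stray number besides the real link.
sample = 'Your input was 12345. and the next nothing is 8511'

def glue_digits(text):
    # The first (wrong) approach: concatenate every digit on the page.
    return ''.join(x for x in text if x.isdigit())

def next_nothing(text):
    # The working approach: only accept the number after "nothing is".
    match = re.search(r'nothing is (\d+)', text)
    return match.group(1) if match else None

print(glue_digits(sample))   # 123458511
print(next_nothing(sample))  # 8511
```

The glued result is still a number, so the server may happily answer for it, which is presumably how a chain can run on without ever reaching the end. The complete script below uses the regular-expression approach.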

import re
import urllib.request

if __name__ == '__main__':
    url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='
    index = '17675'
    counter = 1
    # Only a number that follows "nothing is" counts as the next link.
    pattern = re.compile(r'nothing is (\d+)')
    while True:
        try:
            request = urllib.request.Request(url + index)
            # My PC must use a proxy to connect; drop this line if yours does not.
            request.set_proxy('172.16.0.252:80', 'http')
            response = urllib.request.urlopen(request)
            content = response.read().decode()
            response.close()
            print(counter, content)
            result = pattern.search(content)
            # A page without the pattern marks the end of the chain.
            if not result:
                break
            index = result.group(1)
            counter += 1
        except Exception as ex:
            print(ex)
            break

Program output:
1 and the next nothing is 8511
2 and the next nothing is 89456
3 and the next nothing is 43502
4 and the next nothing is 45605
5 and the next nothing is 12970
6 and the next nothing is 91060
7 and the next nothing is 27719
8 and the next nothing is 65667
9 peak.html

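This gives the address of the next challenge: peak.html. (Note: my initial index is 17675, which is not the value the challenge originally gives. I simply picked a number from the later part of the address list, so don't worry about that.)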
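The loop above can also be factored into a function that takes the page-fetching step as a parameter, so the chain logic can be tried without network access. This is only a sketch: follow_chain, fake_fetch, FAKE_PAGES and the max_steps guard are hypothetical names that are not in my original script, and the fake pages are taken from the tail of the program output above.

```python
import re

PATTERN = re.compile(r'nothing is (\d+)')

def follow_chain(fetch, start, max_steps=1000):
    """Follow 'nothing is N' links until a page has no such link.

    fetch: callable taking an index string and returning the page text.
    Returns the list of page texts visited, in order.
    """
    pages = []
    index = start
    for _ in range(max_steps):  # guard against chasing a chain forever
        content = fetch(index)
        pages.append(content)
        result = PATTERN.search(content)
        if not result:
            break
        index = result.group(1)
    return pages

# Hypothetical stand-in for the server, built from the output shown above.
FAKE_PAGES = {
    '27719': 'and the next nothing is 65667',
    '65667': 'peak.html',
}

def fake_fetch(index):
    return FAKE_PAGES[index]

pages = follow_chain(fake_fetch, '27719')
print(pages[-1])  # peak.html
```

With a step limit like this, a wrong pattern shows up as a chain that hits max_steps instead of one that quietly keeps running.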

posted on 2009-05-11 16:05 by 李現民, category: python


