清風竹林

Python Challenge lv4: follow the chain

Problem link: http://www.pythonchallenge.com/pc/def/linkedlist.php

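To be honest, it took some googling before I understood what this puzzle asks for: repeatedly fetch a web page from the server, then find the address of the next link in its source. One thing to watch out for: although the page source is very simple, not every number in it is valid, so you need a regular expression that matches the right pattern. For this puzzle, r'nothing is (\d+)' is a workable pattern. At first I used ''.join([x for x in text if x.isdigit()]) to glue all the digits on the page together, and only after following the chain past 4000 pages without reaching the end did I realize I had been fooled...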
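To make the difference between the two approaches concrete, here is a small sketch that runs both on a made-up page body. The sample string is an assumption for illustration (it is not copied from the server), and the helper names `join_digits` and `next_nothing` are hypothetical.

```python
import re

# Hypothetical page body: it contains a decoy number as well as the real link.
sample = "Your hands are getting tired 3 times. and the next nothing is 8511"

def join_digits(text):
    """Naive approach: glue every digit on the page together."""
    return ''.join(x for x in text if x.isdigit())

def next_nothing(text):
    """Regex approach: take only the number that follows 'nothing is'."""
    match = re.search(r'nothing is (\d+)', text)
    return match.group(1) if match else None

print(join_digits(sample))   # 38511 (the decoy digit is glued onto the real number)
print(next_nothing(sample))  # 8511
```

With the naive version, a single stray digit is enough to send the chain to a wrong page, which is why the pattern has to anchor on the surrounding words.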

import re
import urllib.request

if __name__ == '__main__':
    url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='
    index = '17675'
    counter = 1
    # Only the number that follows "nothing is" is the next link in the chain.
    pattern = re.compile(r'nothing is (\d+)')
    while True:
        try:
            request = urllib.request.Request(url + index)
            # my pc must use proxy to connect (remove this line if yours does not)
            request.set_proxy('172.16.0.252:80', 'http')
            response = urllib.request.urlopen(request)
            content = response.read().decode()
            response.close()
            print(counter, content)
            result = pattern.search(content)
            # A page without a next number marks the end of the chain.
            if not result:
                break
            index = result.group(1)
            counter += 1
        except Exception as ex:
            print(ex)
            break

Program output:
1 and the next nothing is 8511
2 and the next nothing is 89456
3 and the next nothing is 43502
4 and the next nothing is 45605
5 and the next nothing is 12970
6 and the next nothing is 91060
7 and the next nothing is 27719
8 and the next nothing is 65667
9 peak.html

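This gives the address of the next puzzle: peak.html. (Note: my initial index is 17675, which is not the value the puzzle originally starts with. I simply picked a number from the later part of the chain, so don't worry about the difference.)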
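The same loop can also be written so that the chain-following logic is separate from the network access, which makes it easy to try offline. The following is only a sketch under assumptions: `follow_chain`, `fetch`, `max_steps`, and the fake `pages` dictionary are hypothetical names and data introduced for illustration, not part of the script above or of the puzzle itself.

```python
import re

PATTERN = re.compile(r'nothing is (\d+)')

def follow_chain(start, fetch, max_steps=1000):
    """Follow 'nothing is N' links from start until a page has no match.

    fetch is any callable that maps an index string to the page text.
    Returns the text of the last page fetched (the one without a next number).
    """
    index = start
    for _ in range(max_steps):
        content = fetch(index)
        result = PATTERN.search(content)
        if not result:
            return content
        index = result.group(1)
    raise RuntimeError('chain did not end within max_steps pages')

# Fake pages standing in for the server (made-up data for illustration).
pages = {
    '1': 'and the next nothing is 2',
    '2': 'decoy 999. and the next nothing is 3',
    '3': 'peak.html',
}

print(follow_chain('1', pages.__getitem__))  # peak.html
```

To run it against the real server, `fetch` could be a small function that opens url + index with urllib.request.urlopen and decodes the response, as the script above does. The `max_steps` cap guards against following a wrong pattern for thousands of pages.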

posted on 2009-05-11 16:05 by 李現民 | Category: python

Comments

# re: Python Challenge lv4: follow the chain [not logged in] 2011-05-31 20:17 simon

I have a question:
How did you know to use "nothing" in the pattern?

# re: Python Challenge lv4: follow the chain 2011-06-01 10:04 李現民

@simon
Look at the source of the puzzle page: it contains a link. Click it and you will see "and the next nothing is 92512". Replace the 12345 in linkedlist.php?nothing=12345 in the URL with that number, press Enter again, and you will spot the pattern.
