
Extracting web page links and titles with Python; how to get the current page's title in JavaScript

2023-06-26 09:32:44 | Internet | Unknown | Development

Extracting web page links and titles with Python

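Method 1: BeautifulSoup version
I wrote this quickly. It only scrapes the links; adding the titles keeps raising errors and I have not found the cause yet, so I am pasting it as it is for now (Method 2 does not have this problem).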
# Python 2, using BeautifulSoup 3
from BeautifulSoup import BeautifulSoup
import urllib2
import re

def grabHref(url, localfile):
    html = urllib2.urlopen(url).read()
    # Decode the page as gb2312, then re-encode as utf-8
    html = unicode(html, 'gb2312', 'ignore').encode('utf-8', 'ignore')
    content = BeautifulSoup(html).findAll('a')
    myfile = open(localfile, 'w')
    pat = re.compile(r'href="([^"]*)"')
    pat2 = re.compile(r'/tools/')  # keep only links under /tools/
    for item in content:
        h = pat.search(str(item))
        if h is None:  # <a> tag without an href attribute
            continue
        href = h.group(1)
        if pat2.search(href):
            # Title output, disabled because it raises errors:
            # s = BeautifulSoup(item)
            # myfile.write(s.a.string)
            # myfile.write('\n')
            myfile.write(href)
            myfile.write('\n')
            # print s.a.string
            print href
    myfile.close()

def main():
    url = "http://www.freebuf.com/tools"
    localfile = 'aHref.txt'
    grabHref(url, localfile)

if __name__ == "__main__":
    main()
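
Method 1 leaves the title lines commented out, so it only ever writes the href. As a possible way to get both pieces in one pass, here is a minimal Python 3 sketch that uses only the standard library's html.parser instead of BeautifulSoup. It is not the original author's code: LinkCollector, extract_links, the must_contain parameter and the sample HTML are all hypothetical names made up for illustration, and it parses a string rather than fetching the page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is collected as well.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, must_contain="/tools/"):
    """Return (href, text) pairs whose href contains must_contain."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [(href, text) for href, text in parser.links if must_contain in href]


# Hypothetical sample markup, standing in for a downloaded page.
sample = (
    '<a href="/tools/1.html"><b>Scanner</b> v2</a>'
    '<a href="/news/2.html">News</a>'
    '<a name="top">anchor</a>'
    '<a href="/tools/3.html">Fuzzer</a>'
)

for href, text in extract_links(sample):
    print(href, text)
```

Because the link text is gathered from every text node inside the <a>, a link whose label is wrapped in nested tags still yields a plain string rather than an empty value.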

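Method 2: regex version
Because Method 1 has the problem described above and can only retrieve the links to the download pages, I switched to the re module instead. The code is as follows: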

# Python 2
import urllib2
import re

url = 'http://www.freebuf.com/tools'
# Group 1 is the href value, group 2 is the link text
find_re = re.compile(r'href="([^"]*)".+?>(.+?)</a>')
pat2 = re.compile(r'/tools/')  # keep only links under /tools/
html = urllib2.urlopen(url).read()
html = unicode(html, 'utf-8', 'ignore').encode('gb2312', 'ignore')
myfile = open('aHref.txt', 'w')
for x in find_re.findall(html):
    if pat2.search(str(x)):
        print >>myfile, x[0], x[1]
myfile.close()

print 'Done!'
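
The loop above reads each match as x[0] and x[1] because re.findall returns one tuple per match when the pattern has two capture groups. The small Python 3 sketch below shows that behaviour on an in-memory string. The pattern is an assumption written for this example (it is similar in spirit to the one above, not a copy of it), and LINK_RE, find_links and the sample markup are hypothetical names.

```python
import re

# Group 1: the href value.
# Group 2: the text between the end of the opening tag and "</a>".
# This exact pattern is an assumption made for the sketch.
LINK_RE = re.compile(r'href="([^"]*)"[^>]*>(.+?)</a>', re.S)


def find_links(html, must_contain="/tools/"):
    """Return (href, title) tuples whose href contains must_contain."""
    return [
        (href, title)
        for href, title in LINK_RE.findall(html)
        if must_contain in href
    ]


# Hypothetical sample markup, standing in for a downloaded page.
sample = (
    '<a href="/tools/1.html" class="x">Scanner</a>\n'
    '<a href="/news/2.html">News</a>'
)

for href, title in find_links(sample):
    print(href, title)
```

A regular expression like this works on simple, regular markup; link text that itself contains tags comes back with those tags still in it.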

How to get the current page's title in JavaScript

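Do you want the title of the current page, or the title of the page behind a URL typed into a text box?
In the first case, document.title is all you need.
In the second case, you have to fetch the content of the given URL via Ajax and then parse the title out of it.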
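
The parsing step in the second case is not specific to JavaScript: once the page content is available as a string, the title is the text of its <title> element. Here is a minimal sketch of that step, written in Python to match the rest of this page rather than in JavaScript. TitleParser, extract_title and the sample markup are hypothetical names, and fetching the page is left out.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Capture the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False   # set once the first <title> has been closed
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    @property
    def title(self):
        return "".join(self._parts).strip()


def extract_title(html):
    """Return the page title, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


# Hypothetical sample markup, standing in for the fetched page content.
sample = "<html><head><title> Tools list </title></head><body><h1>Hi</h1></body></html>"
print(extract_title(sample))
```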
