
How to extract the links of all resources loaded by a page with Python Selenium; how to run Python Selenium

2023-04-26 02:37:18 Internet / Unknown / Development

How to extract the links of all resources loaded by a page with Python Selenium

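I recently needed to write a crawler, and one step in its logic is to obtain the links of every resource that a page (an HTML5 page) loads.
(PS: Python does have a third-party package, Ghost.py, that can do this, but it did not work well when I tried it, probably because Ghost.py's WebKit has poor HTML5 support.)
So I chose Selenium, but I could not find a method on Selenium's webdriver that returns all resource-loading links.
The selenium package contains a selenium module. Reading its source, I saw a get_all_links method, but I never found out how that module is meant to be used.
Any help is appreciated. Thanks, everyone.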

If that method does not work out, just do it yourself:

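from selenium.webdriver.common.by import By
# find_elements (plural) returns a list; find_element returns only the first match
all_links = browser.find_elements(By.XPATH, "//a")
for a in all_links:
    # get_attribute is the Python method name (getAttribute is the Java one)
    print(a.get_attribute("href"))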
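The loop above only covers <a> tags, while the question asks about every resource the page loads (scripts, stylesheets, images and so on). One possible approach is to ask the browser's Resource Timing API through execute_script. What follows is a minimal sketch, not something built into Selenium: the helper name collect_resource_urls is made up for illustration, and it assumes the browser exposes window.performance.getEntriesByType. It accepts any object with an execute_script method.

```python
def collect_resource_urls(driver):
    """Return the de-duplicated URLs of resources the current page has loaded.

    `driver` is assumed to be a Selenium WebDriver, but any object with an
    `execute_script` method works. The helper name is hypothetical.
    """
    # Resource Timing entries expose the resource URL in their `name` field.
    script = (
        "return window.performance.getEntriesByType('resource')"
        ".map(function (entry) { return entry.name; });"
    )
    urls = driver.execute_script(script) or []

    # Keep the first occurrence of each URL, preserving load order.
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique
```

Call it after browser.get(...) has returned, for example urls = collect_resource_urls(browser). The browser's resource timing buffer is limited in size, so very heavy pages may not report every entry.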

How to run Python Selenium

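1. Download Python 2.7 and install it with the default options.
2. After installation, add Python to the environment variables (Computer -> Properties -> Advanced -> Environment Variables -> PATH under System variables), with the value: C:\Python27
3. Download SetupTools from the official Python website, unpack it, and install it.
4. Once SetupTools is installed, a Scripts directory appears under the Python installation directory.
5. Add that directory to PATH as well: C:\Python27\Scripts
6. Open a cmd prompt, change to C:\Python27\Scripts, and run "easy_install pip" to install pip.
7. Once pip is installed, run pip install -U selenium to download and install the latest Selenium version.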
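After the installation steps, a quick way to confirm that the package landed in the interpreter you intend to use is a small check script. This is only a sketch: the function name describe_selenium_install is made up for illustration, and it reports whatever version string the installed package exposes.

```python
import sys


def describe_selenium_install():
    """Report whether the selenium package is importable in this interpreter."""
    try:
        import selenium
    except ImportError:
        return "selenium is not installed for " + sys.executable
    # __version__ is read defensively in case the package does not expose it.
    version = getattr(selenium, "__version__", "unknown")
    return "selenium " + str(version) + " is installed for " + sys.executable


print(describe_selenium_install())
```

If the message names a different python.exe than expected, PATH probably points at another installation.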

How to connect Python to Selenium

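from selenium import webdriver
import os

def openBrowser():
    # chromedriver must be downloaded separately; give the path where you placed it
    chromedriver = r"C:\Users\Sigma\AppData\Local\Google\Chrome\Application\chromedriver.exe"
    if not os.path.exists(chromedriver):
        chromedriver = r"C:\Program Files\Google\Chrome\Application\chromedriver.exe"
    os.environ["webdriver.chrome.driver"] = chromedriver
    # Selenium 3 style call. Selenium 4 expects
    # webdriver.Chrome(service=Service(chromedriver)), with Service imported
    # from selenium.webdriver.chrome.service
    browser = webdriver.Chrome(chromedriver)
    # In Selenium 2, Firefox needed no separate driver: if Firefox was installed,
    # selenium found it automatically. Selenium 3 and later use geckodriver.
    # browser = webdriver.Firefox()

    return browser

def closeBrowser(browser):
    browser.close()
    killAllDriver()

def killAllDriver():
    # Windows only: force-kill any chromedriver.exe processes left behind
    cmd = "taskkill /F /IM chromedriver.exe"
    os.system(cmd)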