
python selenium get html



Selenium automates web browsers, and one of the things it can do is fetch the HTML source of a webpage. This guide shows how to do that with Python’s Selenium module.

Selenium’s page_source attribute returns the HTML of the page currently loaded in the browser. If you’re new to Selenium, the course below is a good way to build a foundation.

Related Course:
Browser Automation with Python Selenium

Setting up Selenium

If you haven’t yet set up Selenium on your system, follow the steps below:

  1. Install the Selenium Python package:

    pip install selenium
  2. Depending on your browser (this example uses Chromium), you may need to add its install directory to your PATH:

    export PATH=$PATH:/usr/lib/chromium/
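
Before moving on, it can help to confirm that both setup steps took effect. The following is a minimal sketch using only the standard library; the helper names (selenium_installed, find_on_path) and the candidate binary names are assumptions for illustration and may differ on your distribution.

```python
import importlib.util
import shutil


def selenium_installed():
    """Return True if the selenium package can be found by the import system."""
    return importlib.util.find_spec("selenium") is not None


def find_on_path(candidates):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print("selenium installed:", selenium_installed())
    # Hypothetical binary names; adjust them for your system.
    print("browser:", find_on_path(["chromium", "chromium-browser", "google-chrome"]))
    print("driver:", find_on_path(["chromedriver"]))
```

If the browser line prints None, the PATH export from step 2 (or the binary_location setting used later) is probably what needs adjusting.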

Extracting the HTML Source

Start by importing webdriver from the Selenium package. This example uses Chromium, but Selenium supports other browsers as well.

After launching the browser, load the desired URL with the get() method. The page’s HTML is then available through the page_source attribute.

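from selenium import webdriver

# Configure Chromium before launching it
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--test-type')
options.binary_location = "/usr/bin/chromium"

# Pass options via `options=`; the old `chrome_options=` keyword is
# deprecated and no longer accepted by recent Selenium 4 releases
driver = webdriver.Chrome(options=options)
driver.get('https://python.org')

# page_source holds the HTML of the currently loaded page
html = driver.page_source
print(html)

# Close the browser once the HTML has been captured
driver.quit()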

Running the script prints the page’s source code, which is stored in the html variable.
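
If you fetch pages in more than one place, the get() and page_source steps can be wrapped in small helpers. The sketch below is one possible arrangement, not part of Selenium itself: the names managed and fetch_html are hypothetical, and the helpers only assume an object with get(), page_source, and quit(), which a Selenium driver provides.

```python
from contextlib import contextmanager


@contextmanager
def managed(driver):
    """Yield the driver and always call quit() afterwards, even on errors."""
    try:
        yield driver
    finally:
        driver.quit()


def fetch_html(driver, url):
    """Load url in the given driver and return the page's HTML source."""
    driver.get(url)
    return driver.page_source


# Example usage with a real browser (requires selenium and Chromium):
#
#   from selenium import webdriver
#   with managed(webdriver.Chrome()) as driver:
#       html = fetch_html(driver, 'https://python.org')
```

Taking the driver as a parameter keeps the helpers independent of any particular browser, and the context manager ensures the browser window is closed even if loading the page raises an exception.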

Figure: Selenium launching the Chromium browser
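
Once the source is held in a string, it can be processed like any other text. As a rough illustration, the sketch below saves the HTML to disk and reads the page title with the standard library’s html.parser. The sample string is a stand-in for driver.page_source, and TitleParser, extract_title, and save_html are hypothetical helper names.

```python
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the stripped text of the first <title>, or '' if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


def save_html(html, path):
    """Write the HTML to path as UTF-8."""
    Path(path).write_text(html, encoding="utf-8")


# Stand-in for driver.page_source
sample = "<html><head><title>Welcome to Python.org</title></head><body></body></html>"
print(extract_title(sample))  # prints: Welcome to Python.org
```

For heavier parsing, a dedicated HTML library is usually more convenient; the standard-library parser is used here only to keep the example dependency-free.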

For more hands-on Selenium examples and code snippets, you can Download Selenium Examples here.
