Python & Selenium Automation Tutorial for Beginners

What is Selenium?

Selenium automates browsers. It is a cross-platform portable framework for testing web applications, without the need to use or learn a specific testing language. This means that you can write tests using Selenium with any major programming language, such as Python, C#, Java, Ruby, and others.

Why is Selenium important?

Selenium is great for testing web applications and finding usability bugs within them, but it is also frequently used in automating tasks that require the use of a web browser, or for crawling and scraping data off the web.

Selenium has several components. The most widely used one is WebDriver, an API for driving a real browser, such as Chrome or Firefox, through code. WebDriver can also run the browser in headless mode, which means the browser has no visible user interface and is controlled entirely by your script.

Being able to control a browser through code is very powerful.

Getting Started

In How to Automate Filling In Web Forms with Python I used JavaScript and Python to automate a manual web form data entry process, by extracting data from PDFs. The only downside of that approach was that we had to open the browser’s developer tools to paste the generated JavaScript code, so the fields on the web page could get filled in.

The idea now is to take that solution to the next level and use Selenium WebDriver to help us fill in the values of our web form automatically, without having to open the browser’s developer tools manually.

But before we do that, let’s do a test drive with Selenium WebDriver and see how we can program some basic interactions with a headless browser.

1. Selenium Test Drive

First download our original Python code from my last article. To get started, we need to first install the driver for the browser we will be using, which is Chrome.

We can get the Chrome driver from the ChromeDriver website. Choose the current stable release that matches both your operating system and the version of Chrome installed on your machine. On Windows the file is called chromedriver.exe. On a Mac, the file is simply called chromedriver.

Place the downloaded driver (an executable) in the folder that you will use to follow along with this project. If Selenium later reports that the chromedriver executable needs to be in PATH, add that folder to your PATH.

Then, within your editor of choice (in my case, I use Visual Studio Code, also known as VS Code), create a new file, name it gettingstarted.py, and add the following code.

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
opts = Options()
opts.add_argument('--headless')  # set_headless() was removed in Selenium 4
assert '--headless' in opts.arguments  # Operating in headless mode
browser = Chrome(options=opts)
browser.get('https://onemonth.com')

Then, open a terminal window within VS Code (make sure you have Python 3.6 or above installed) and enter the following command: pip install selenium. If you are on a Mac, you can use the terminal within VS Code or any bash shell.

That command needs both Python and pip. When you install Python, pip is installed automatically. If you are on a Mac, this article explains how to install Python and pip.
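
To confirm that the install worked, we can ask Python which version of a package it sees. The snippet below is a small sketch that uses only the standard library (Python 3.8 or above). The helper name installed_version is made up for this example and is not part of Selenium.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == '__main__':
    found = installed_version('selenium')
    if found is None:
        print('selenium is not installed; run: pip install selenium')
    else:
        print('selenium ' + found)
```

Knowing the version is handy because some method names changed between Selenium 3 and Selenium 4.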

Once Selenium has been installed, we can execute the gettingstarted.py script. So far, we have created a headless Chrome browser and navigated to https://onemonth.com.

The script can be executed by running the following command from within the VS Code terminal window: python gettingstarted.py.

If you are running this on Windows and have the Windows Defender software active, you might see a firewall prompt asking whether to allow the driver to communicate on your network. If so, click on Allow access.

So, let’s review what we have done so far in the gettingstarted.py code. We have created an Options instance and used it to activate the headless mode when we passed it to the Chrome constructor.

This is equivalent to opening Chrome and then navigating to that site, but without having Chrome visible to us.

2. Inspecting the DOM

At this stage, we have managed to load the main web page of One Month. We can query the DOM using methods within our browser object. But how do we know what to query?

The best way is to manually open a web browser and use its developer tools to inspect the contents of the page.

We want to get hold of the Browse All Courses button so we can check all the courses available on the website.

By inspecting One Month’s home page with the browser’s developer tools, we can see that the button element has a class attribute of “btn btn-success btn-lg center”.

This can be obtained by right-clicking on the button and choosing the Inspect option from the browser’s menu.

The button’s class name can be seen on the right-hand side. The CSS selector can be seen below highlighted in purple.

We’ll be using the CSS selector going forward, as this is the easiest way to access an HTML element when using Selenium.

So, let’s take a shortcut, and go directly to https://onemonth.com/courses and continue our gettingstarted.py script. Modify the last instruction we wrote for the script as follows:

browser.get('https://onemonth.com/courses')

What we want to do is print out the names of all the courses listed on that web page. Inspecting one of the course titles shows that each one is an h4 tag that matches the CSS selector h4.card-title, highlighted in purple.

Here’s a close-up of the CSS selector for the h4 tag we are interested in, highlighted in purple.

3. Updating gettingstarted.py

So, to get a list of all the courses listed on that web page, we would need to add the following lines to gettingstarted.py.

from selenium.webdriver.common.by import By  # Goes with the other imports at the top
courses = browser.find_elements(By.CSS_SELECTOR, 'h4.card-title')
for c in courses:
    print(c.text)

This code retrieves every element on the page that matches the CSS selector h4.card-title (that is, every h4 tag with the class card-title), each of which corresponds to a course. It then prints the text property of each element, which gives us the course names.
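
The text property of an element can come back empty (for example, when the element is hidden) or padded with whitespace. As a rough sketch, the hypothetical helper below tidies the list before printing. It only relies on each element having a text attribute, so the example uses simple stand-in objects with made-up course names instead of a live browser.

```python
from types import SimpleNamespace


def course_names(elements):
    """Return the stripped, non-empty text of each element, in page order."""
    names = []
    for element in elements:
        text = element.text.strip()
        if text:
            names.append(text)
    return names


# Stand-ins for the elements Selenium returns; only .text matters here.
sample = [
    SimpleNamespace(text=' Python '),
    SimpleNamespace(text=''),
    SimpleNamespace(text='HTML & CSS'),
]
for name in course_names(sample):
    print(name)
```

In the real script we would pass the courses list to course_names instead of the sample list.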

It’s important to prevent headless browser instances, and the driver processes behind them, from piling up in our machine’s memory, so it is good practice to shut the browser down before the script ends. Calling quit ends the whole session, whereas close only closes the current window. This can be done as follows.

browser.quit()
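
If the script raises an error halfway through, the line that shuts the browser down is never reached. One possible way around that, sketched below, is a small context manager that always calls quit. The name managed_browser and its make_browser parameter are assumptions made for this example, not part of Selenium.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(make_browser):
    """Create a browser with make_browser() and always quit it afterwards."""
    browser = make_browser()
    try:
        yield browser
    finally:
        # Runs even if the code inside the with block raises an error.
        browser.quit()


# Usage with Selenium (commented out, since it needs Chrome and its driver):
# with managed_browser(lambda: Chrome(options=opts)) as browser:
#     browser.get('https://onemonth.com/courses')
```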

The finalized gettingstarted.py script looks like this.

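from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
opts = Options()
opts.add_argument('--headless')  # set_headless() was removed in Selenium 4
assert '--headless' in opts.arguments  # Operating in headless mode
browser = Chrome(options=opts)
browser.get('https://onemonth.com/courses')
# Each course title is an h4 tag with the class card-title
courses = browser.find_elements(By.CSS_SELECTOR, 'h4.card-title')
for c in courses:
    print(c.text)
browser.quit()  # End the session so no browser process is left behind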

We can execute the script by running the following command from the built-in terminal in VS Code: python gettingstarted.py.

Running this script returns the list of courses available on the One Month website. Please note that on the screenshot below, only some of the results can be seen. Awesome!

4. Improving myview.py

The script that we wrote in the How to Automate Filling In Web Forms with Python article is called myview.py.

What we are going to do is to make some modifications to the existing code, so that we don’t need to open the browser’s developer tools manually and paste the generated JavaScript code, to fill in the web page fields.

We’ll do that step directly, by using Selenium to open the browser and assign the field values, automatically.

So, let’s add the following import statements to our existing myview.py script.

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
import time

Then, let’s create a new function called automateBrowser, which we can place just before the execute function.

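def automateBrowser(code):
    opts = Options()  # No headless argument, so the browser window is visible
    browser = Chrome(options=opts)
    browser.get('file:///C:/Projects/Projects/One%20Month/Income%20Form.htm')
    time.sleep(3)  # Pause so the empty form can be seen
    s = ''
    for c in code:
        s += c  # Build one script string from the generated lines
    browser.execute_script(s)
    time.sleep(5)  # Pause so the filled-in form can be seen
    browser.quit()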

This function essentially opens the Income Form.htm (Income%20Form.htm) file (which resides in the same folder as the myview.py file).

Please make sure that you change the folder location (file:///C:/…) to the path where the Income Form.htm file resides on your machine.
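
Instead of typing the file:/// address by hand, we could let Python build it. This is only a sketch: the helper name local_file_url is made up for this example, and it assumes the form lives in the folder you pass in (or in the current working directory if you pass none).

```python
from pathlib import Path


def local_file_url(file_name, folder=None):
    """Build a file:// URL for file_name inside folder (default: current directory)."""
    base = Path(folder) if folder is not None else Path.cwd()
    # as_uri() needs an absolute path and percent-encodes spaces as %20.
    return (base / file_name).resolve().as_uri()


print(local_file_url('Income Form.htm'))
```

The result could then be passed to browser.get in place of the hard-coded address.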

Then, we loop through the generated JavaScript code (which is a list of strings), and we concatenate each code line (c) into a single script string (s), which we execute directly in the browser by calling the browser.execute_script method.
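
The same concatenation can be written with str.join, which is the more common Python idiom for gluing a list of strings together. The sketch below assumes a hypothetical helper called run_script_lines. It accepts any object that has an execute_script method, which is why the example can use a stand-in object rather than a real browser.

```python
def run_script_lines(browser, code):
    """Join the generated JavaScript lines and run them as one script."""
    script = ''.join(code)
    browser.execute_script(script)
    return script


class RecordingBrowser:
    """Stand-in for a Selenium browser that only records what it is asked to run."""

    def __init__(self):
        self.scripts = []

    def execute_script(self, script):
        self.scripts.append(script)


demo = RecordingBrowser()
lines = ["document.title = 'a';\n", "document.title = 'b';\n"]
print(run_script_lines(demo, lines))
```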

Finally, we wait for five seconds, so we can see the fields on the web page getting filled in automatically, and we close the browser.
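
Fixed pauses are fine for watching a demo, but when a script needs to wait for the page itself, polling for a condition is usually more reliable than guessing a number of seconds. Selenium ships its own WebDriverWait class in selenium.webdriver.support.ui for this purpose. The hand-rolled helper below is only a simplified stand-in to show the idea, and the name wait_until is an assumption made for this example.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Call condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %s seconds' % timeout)
        time.sleep(interval)


# Usage with Selenium (commented out, since it needs a running browser):
# wait_until(lambda: browser.execute_script('return document.readyState') == 'complete')
```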

Notice that we have removed the instruction that enabled headless mode in the gettingstarted.py script, which made the browser invisible. This time we want to see the browser window.

We then need to call the automateBrowser function from the execute function as follows (the two automateBrowser(code) calls are the new lines).

def execute(args):
    try:
        fl = readList('myview.ini')
        fl_ext = readList('myview_ext.ini')
        if len(args) == 2:
            # A single PDF file was passed on the command line
            pdf_file_name = args[1]
            items = get_form_fields(pdf_file_name)
            code = createBrowserScript(fl, fl_ext, items, pdf_file_name)
            automateBrowser(code)
        else:
            # No file given, so process every PDF in the current folder
            files = [f for f in os.listdir('.')
                     if os.path.isfile(f) and f.endswith('.pdf')]
            for f in files:
                items = get_form_fields(f)
                code = createBrowserScript(fl, fl_ext, items, f)
                automateBrowser(code)
    except Exception as msg:
        print('An error occurred... :( ' + str(msg))

Notice as well that the createBrowserScript function now returns the generated lines, which we store in the code variable and pass to the automateBrowser function.

We can see the changes to the createBrowserScript function below: it now collects the generated lines in res and returns them.

def createBrowserScript(fl, fl_ext, items, pdf_file_name):
    res = []
    if pdf_file_name and len(fl) > 0:
        of = os.path.splitext(pdf_file_name)[0] + '.txt'
        all_lines = []
        for k, v in items.items():
            print(k + ' -> ' + v)
            if v in ['/Yes', '/On']:
                all_lines.append("document.getElementById('" + k + "').checked = true;\n")
            elif v in ['/0'] and k in fl_ext:
                all_lines.append("document.getElementById('" + k + "').checked = true;\n")
            elif v in ['/No', '/Off', '']:
                all_lines.append("document.getElementById('" + k + "').checked = false;\n")
            elif v in [''] and k in fl_ext:
                # Never reached: an empty value is already handled by the branch above
                all_lines.append("document.getElementById('" + k + "').checked = false;\n")
            elif k in fl:
                selectListOption(all_lines, k, v)
            else:
                all_lines.append("document.getElementById('" + k + "').value = '" + v + "';\n")
        # Keep writing the generated JavaScript to a .txt file, as before
        with open(of, 'w') as outF:
            outF.writelines(all_lines)
        res = all_lines
    return res
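
One thing to watch for when building JavaScript by gluing strings together: a value that contains a single quote (such as O'Brien) would end the JavaScript string early and break the script. A possible safeguard, sketched below, is to let json.dumps produce the quoted text, since a JSON string is also a valid JavaScript string literal. The helper name set_value_line is made up for this example and is not part of the original myview.py.

```python
import json


def set_value_line(field_id, value):
    """Return one JavaScript line that sets a form field, with both strings escaped."""
    return 'document.getElementById(%s).value = %s;\n' % (
        json.dumps(field_id),
        json.dumps(value),
    )


print(set_value_line('name', "O'Brien"), end='')
# prints: document.getElementById("name").value = "O'Brien";
```

The final else branch of createBrowserScript could append the result of set_value_line(k, v) instead of building the line by hand.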

If we now run the updated myview.py script from the built-in terminal within VS Code, we should see the following execution.

The PDF file will be read, the data extracted, and then the web browser will be shown, with the web page fields empty. This can be seen in the screenshot below.

Then, after three seconds, the fields will be automatically filled in, as can be seen in the screenshot that follows.

Then, five seconds after the web page fields have been filled in automatically, the browser window will close. That’s it!

Now you know some of the basics of using Python and Selenium for web automation.

Download the Complete Python & Selenium Project Files

Written by Chris Castiglione based on the research by Ed Freitas.