Web Scraping Using an Automated Browser

Sometimes when we scrape the web, we need to automate a real web browser to gather information from each page. This is especially true when the site we want to scrape loads its content dynamically with JavaScript.

We will solve this by installing Google Chrome and using a tool called Chromedriver. Chrome has to be installed manually, but Chromedriver will be handled by a very handy Python package we have already installed (chromedriver-autoinstaller).

Installation is operating-system specific, so follow the instructions for your platform below.

Department Managed Laptops

Head over to the Software Center and install Google Chrome.

Mac Users

We need an up-to-date version of the web browser Google Chrome, which we will install via Homebrew. Enter the following into the terminal and press Return:

brew install --cask google-chrome

Verify the install (the Homebrew cask does not add a google-chrome command to your PATH, so use the full application path):

"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --version

which should yield output similar to:

Google Chrome 103.0.5060.53

Linux Users

We need an up-to-date version of Google Chrome and some additional Linux packages.

First, install the additional Linux packages by entering the following into the terminal:

sudo apt-get install libxss1 libappindicator1 libindicator7

Now let's download the latest stable version of Google Chrome using the terminal:

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb

And now install it (the second command resolves any missing dependencies that dpkg reports):

sudo dpkg -i google-chrome*.deb
sudo apt-get install -f

Verify the install:

google-chrome --version

which should yield output similar to:

Google Chrome 103.0.5060.53

Windows Users

Anyone with admin rights can install Google Chrome with:

winget install -e --id Google.Chrome

Verify the installation by opening Chrome.