Web Scraping Using an Automated Browser
Sometimes when we scrape the web, we need our computer to open a web browser automatically and gather information from each page. This is especially true when the site we want to scrape loads its content dynamically with JavaScript.
We will solve this by installing Google Chrome and using a tool called Chromedriver. The former has to be installed manually, but the latter is handled by a very handy Python package we have already installed (chromedriver-autoinstaller).
Installing this software is operating-system specific, so the instructions below are too.
Department Managed Laptops
Head over to the Software Center and install Google Chrome.
Mac Users
We need an up-to-date version of the web browser Google Chrome, which we will install via Homebrew.
Enter the following into the terminal and hit Return:
brew install --cask google-chrome
Verify the install:
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --version
which should yield output similar to:
Google Chrome 103.0.5060.53
Linux Users
We need an up-to-date version of Google Chrome and some additional Linux packages.
First, install the additional Linux packages by entering the following into the terminal:
sudo apt-get install libxss1 libappindicator1 libindicator7
Now let's download the latest stable version of Google Chrome using the terminal:
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
And now install it:
sudo dpkg -i google-chrome*.deb
sudo apt-get install -f
Verify the install:
google-chrome --version
which should yield output similar to:
Google Chrome 103.0.5060.53
Windows Users
Anyone with admin rights can install Google Chrome with:
winget install -e --id Google.Chrome
Verify the installation by opening Chrome.