Web Scraping Using an Automated Browser
Sometimes when we scrape the web, we need to automate our computer to open a web browser to gather information from each page. This is especially true when the site we want to scrape has content that is loaded dynamically with javascript.
We will install one package to help us here: Chromedriver
Installing this stuff is operating system specific, hence so are the instructions below.
Mac Users
Make sure your homebrew
package is up-to-date. To do so, open a terminal and enter
brew update
Chromedriver
-
We assume you have Google Chrome installed. If not, do this first.
-
Install
chromedriver
via homebrew:
brew cask install chromedriver
- Verify your install, by entering the following in your terminal. The expected output is
ChromeDriver 2.4X.X
chromedriver --version
Windows Users
Chromedriver
- Install Google Chrome from here
- Download the windows version of Chromedriver from here.
- Extract the contents from the zip file, and extract them into a new directory under
C:\chromedriver
- Make sure that the chromedriver.exe file is directly under the PATH you specified, i.e. under C:\chromedriver. If your zip unpacker created a new folder with a different name inside your specified folder, move the .exe file to
C:\chromedriver
. - Add the directory
C:\chromedriver
to your PATH as described before. - If this went successfully, open a new Cygwin session, and enter
chromedriver --version
, you should get output that looks likeChromeDriver 2.4X.XX
Linux Users
Chromedriver
- Open a terminal session
- Install Google Chrome for Debian/Ubuntu by pasting the following and then pressing
Return
sudo apt-get install libxss1 libappindicator1 libindicator7
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome*.deb
sudo apt-get install -f
- Install
xvfb
so chrome can run 'headless' by pasting the following and then pressingReturn
sudo apt-get install xvfb
- Install Chromedriver by pasting the following and then pressing
Return
:
sudo apt-get install unzip
wget -N https://chromedriver.storage.googleapis.com/2.41/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
chmod +x chromedriver
sudo mv -f chromedriver /usr/local/share/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver
- Your install worked, you should get
ChromeDriver 2.4X.XX
returned if the installation was successful
chromedriver --version
Hat-tip
We borrowed quite liberally from Christopher Su to for instructions on installing Chrome and Chromedriver.