Web scraper using Puppeteer - How to log in to a website?

Web scrapers are very useful when you want to extract data from other websites. Some people use them to copy a site's entire content, others to pull out specific, useful data. One of the most common use cases is collecting leads for cold calls or emails, which makes web scrapers valuable for any SaaS or B2B business looking for leads. In this article, I will show how you can log in to a website, in particular LinkedIn.

I strongly recommend creating a new profile for trying out this code, so that your main LinkedIn account does not get blocked.

In this blog, I will be talking about LinkedIn, but you can apply the same methodology to any other website to write a scraper that extracts meaningful data for your use.

Q1. How to create a Node app with Puppeteer?

Create a folder named scraper and add a package.json file to it. Paste the following into package.json:

{
  "name": "scraper",
  "version": "0.0.0",
  "private": true,
  "scripts": {
    "start": "node scraper.js"
  },
  "dependencies": {}
}

To install Puppeteer, run the command below. This may take a while, because it also downloads a copy of Chromium, which is around 100 MB in size.

npm install --save puppeteer

Now you are ready to start writing your scraper with Puppeteer.

Q2. How to log in to LinkedIn?

First, we will create a constants file to hold the credentials used to log in to LinkedIn. These need to be your actual LinkedIn username (email) and password. Create a file named constants.js and paste this code, replacing the values with your own credentials:

module.exports = {
  username: 'abcd@abc.com',
  password: 'abc123'
};
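Hard-coding a password in a source file makes it easy to share or commit it by accident. As an alternative, here is a minimal sketch of constants.js that reads the credentials from environment variables instead. The variable names LINKEDIN_USERNAME and LINKEDIN_PASSWORD and the loadCredentials helper are hypothetical choices for this example, not something LinkedIn or Puppeteer requires.

```javascript
// constants.js (alternative sketch): read credentials from environment variables.
// LINKEDIN_USERNAME / LINKEDIN_PASSWORD are hypothetical names chosen for this example.
function loadCredentials(env = process.env) {
  const username = env.LINKEDIN_USERNAME;
  const password = env.LINKEDIN_PASSWORD;
  // Fail early with a clear message instead of typing "undefined" into the form.
  if (!username || !password) {
    throw new Error('Set LINKEDIN_USERNAME and LINKEDIN_PASSWORD before running the scraper.');
  }
  return {username, password};
}

module.exports = {loadCredentials};
```

With this version, scraper.js would load the values with `const C = require('./constants').loadCredentials();`, and on macOS or Linux you could run it as `LINKEDIN_USERNAME=you@example.com LINKEDIN_PASSWORD=your-password npm start`.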

Next, create a file named scraper.js in the same folder and paste the following code:

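const puppeteer = require('puppeteer');
const C = require('./constants');

// CSS selectors of the login form, copied from the browser's developer tools.
// LinkedIn changes its markup from time to time, so re-check these if the script fails.
const USERNAME_SELECTOR = '#login-email';
const PASSWORD_SELECTOR = '#login-password';
const CTA_SELECTOR = '#login-submit';

// Launch a headless Chromium instance and open a new tab.
async function startBrowser() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  return {browser, page};
}

async function closeBrowser(browser) {
  return browser.close();
}

async function playTest(url) {
  const {browser, page} = await startBrowser();
  try {
    await page.setViewport({width: 1366, height: 768});
    await page.goto(url);

    // Focus each field and type the credentials from constants.js.
    await page.click(USERNAME_SELECTOR);
    await page.keyboard.type(C.username);
    await page.click(PASSWORD_SELECTOR);
    await page.keyboard.type(C.password);

    // Start waiting for the navigation before clicking, so it cannot be missed.
    await Promise.all([
      page.waitForNavigation(),
      page.click(CTA_SELECTOR),
    ]);

    // Save what the browser shows after logging in.
    await page.screenshot({path: 'linkedin.png'});
  } finally {
    // Always close the browser, even if a step above throws.
    await closeBrowser(browser);
  }
}

playTest('https://www.linkedin.com/').catch((err) => {
  console.error(err);
  process.exit(1);
});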
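The code above clicks a field and types into it immediately, which can fail if the page has not finished rendering the login form. A slightly more defensive variant, sketched below, waits for each selector with page.waitForSelector before typing with page.type (both are standard Puppeteer Page methods). The fillField name and the 10-second default timeout are hypothetical choices for this example.

```javascript
// Hypothetical helper: wait for an input to be visible, then type into it.
// `page` is a Puppeteer Page; waitForSelector() and type() are standard Page methods.
async function fillField(page, selector, text, timeoutMs = 10000) {
  // Rejects with a timeout error if the selector never becomes visible,
  // which usually means the selector is out of date.
  await page.waitForSelector(selector, {visible: true, timeout: timeoutMs});
  // page.type() focuses the element matching `selector` and types `text` into it.
  await page.type(selector, text);
}
```

Inside playTest, the click/type pairs could then become `await fillField(page, USERNAME_SELECTOR, C.username);` and `await fillField(page, PASSWORD_SELECTOR, C.password);`. An outdated selector then fails with a timeout at the exact field that is missing, instead of somewhere later in the script.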

To type the username and password, we need the CSS selector of each input field. To copy a selector, right-click the field on the page and choose Inspect; then, in the Elements panel, right-click the highlighted element and choose Copy > Copy selector, as shown:

Copy selector for scraper

Copying it gives you the CSS selector of the selected element, which is #login-email in our case. Find the CSS selectors of the password field and the login button the same way. Now, if you run the code with npm start and open linkedin.png, you should see your LinkedIn page as it appears after logging in.
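Checking the screenshot by eye works for a first run, but a scraper that runs unattended needs to decide for itself whether the login succeeded. One rough heuristic, sketched below, is to inspect page.url() once the navigation has finished. The path patterns are assumptions based on LinkedIn's observed behaviour (a successful login usually lands on a /feed/ page, while an extra security check goes to a /checkpoint/ page) and may change at any time; loginStatus is a hypothetical helper.

```javascript
// Hypothetical helper: classify the URL reached after submitting the login form.
// The path patterns are assumptions about LinkedIn's behaviour and may change.
function loginStatus(url) {
  const {pathname} = new URL(url);
  if (pathname.startsWith('/feed')) return 'logged-in';       // usual landing page after login
  if (pathname.startsWith('/checkpoint')) return 'challenge'; // extra verification requested
  return 'unknown';                                           // check the screenshot manually
}
```

In playTest, you could call `console.log('Login status:', loginStatus(page.url()));` right after the navigation completes, and stop early on anything other than 'logged-in'.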



How much is a great User Experience worth to you?


Browsee helps you understand your user's behaviour on your site. It's the next best thing to talking to them.
