Skip to main content

Building a Website Screenshot API with Puppeteer and Google Cloud Functions

· 4 min read
Umut YILDIRIM
Fullstack Developer

Website Screenshot API

In this blog post, I describe the steps I took to set up this API, let’s dive in!

Puppeteer

Puppeteer is a node package that allows you to control a headless chrome browser using Javascript. A headless chrome browser is just a browser without a window.

I can use this package to spin up a headless chrome instance, navigate to a website and take a screenshot.

To start I’m going to create a local node project and install the puppeteer package.

npm init
npm install puppeteer

Now I can create a file called index.js and add the following code.

const puppeteer = require('puppeteer');

takeScreenshot()
.then(() => {
console.log("Screenshot taken");
})
.catch((err) => {
console.log("Error occured!");
console.dir(err);
});

async function takeScreenshot() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://medium.com", {waitUntil: 'networkidle2'});


const buffer = await page.screenshot({
path: './screenshot.png'
});

await page.close();
await browser.close();
}

Note that I am making the takeScreenshot() function async. This way I can use the await keyword in the function to wait for all the promises.

After running the code I get the following screenshot! 🎉

Screenshot of Medium

Google Cloud Functions

So I now have a local script that I can call to take a screenshot, but I want to build an API. The next logical step is to put this script on a server somewhere.

I don’t want to worry about my server running out of memory, so I’m going to put it on Google Cloud Functions. This way it can handle a huge number of requests without me having to worry about buying more RAM memory.

Once I have the cloud function running, I can call it with an HTTP request — meaning that I will have a working screenshot API 🚀

Let’s port the previous code to the Google Cloud Function format. The cloud function I created is async and called run().

So far I have a working screenshot API. But I’m going to extend it by uploading the screenshots directly to Google Storage.

I’m going to use the @google-cloud/storage npm package for this. Note that I have created a Google Cloud Storage bucket called screenshot-api checkout this page for how to set up a storage bucket.

const puppeteer = require('puppeteer');
const { Storage } = require('@google-cloud/storage');

const GOOGLE_CLOUD_PROJECT_ID = "portfolio-umut-yildirim";
const BUCKET_NAME = "screenshot-jobs-portfolio-umut-yildirim";

exports.run = async (req, res) => {
res.setHeader("content-type", "application/json");

try {
const buffer = await takeScreenshot(req.body);

let screenshotUrl = await uploadToGoogleCloud(buffer, req.body.name+".png");

res.status(200).send(JSON.stringify({
'screenshotUrl': screenshotUrl
}));

} catch(error) {
res.status(422).send(JSON.stringify({
error: error.message,
}));
}
};

async function uploadToGoogleCloud(buffer, filename) {
const storage = new Storage({
projectId: GOOGLE_CLOUD_PROJECT_ID,
});

const bucket = storage.bucket(BUCKET_NAME);

const file = bucket.file(filename);
await uploadBuffer(file, buffer, filename);

await file.makePublic();

return `https://${BUCKET_NAME}.storage.googleapis.com/${filename}`;
}

async function takeScreenshot(params) {
const browser = await puppeteer.launch({
args: ['--no-sandbox']
});
const page = await browser.newPage();
await page.goto(params.url, {waitUntil: 'networkidle2'});

const buffer = await page.screenshot();

await page.close();
await browser.close();

return buffer;
}

async function uploadBuffer(file, buffer, filename) {
return new Promise((resolve) => {
file.save(buffer, { destination: filename }, () => {
resolve();
});
})
}

The new result — My postman client is showing the URL to the screenshot 🚀

Note that in the code above each screenshot is saved as screenshot.png on Google Storage. In the real world, you would need to generate a random id for each image.

Conclusion

Here’s the source of a Google Cloud function that, using Puppeteer, takes a screenshot of a given website and store the resulting screenshot in a bucket on Google Cloud Storage. This was a fun project to do.

You can find the source code of the completed Google Cloud function and package.json here.

Thanks for reading!