Downloading PDF files using Puppeteer #8010

Open
ags4436 opened this issue Feb 14, 2022 · 9 comments

Comments

ags4436 commented Feb 14, 2022

I am trying to download PDF files using Puppeteer. When a .pdf URL is opened, Chrome displays the PDF in the tab; I want the PDF to be downloaded instead. This can be achieved by changing the browser's user preferences: sites follow this setting automatically when you visit them, and the default is "Open PDFs in Chrome".

I found that we can use puppeteer-extra for downloading the PDF files. Is that a legitimate npm package that I could use in my project instead of puppeteer-core?

Are there any alternatives available from Puppeteer itself?

Collaborator

OrKoN commented Feb 14, 2022

I believe it should be possible to use CDP's setDownloadBehavior (https://chromedevtools.github.io/devtools-protocol/tot/Browser/#method-setDownloadBehavior), but there is no dedicated API in Puppeteer for this yet:

// page._client is a private Puppeteer API and may change between releases.
await page._client.send('Page.setDownloadBehavior', {
  behavior: 'allow',
  downloadPath: tempFolderPath,
})

Author

ags4436 commented Feb 14, 2022

This only lets us set the download path used when files are downloaded. PDFs still open in the tab itself in this case as well.

Collaborator

OrKoN commented Feb 14, 2022

I see. It looks like puppeteer-extra writes those settings into the user profile directory, which seems a bit fragile to me: it could break if the structure of that directory changes. I am not aware of any other way to achieve this, though. Most likely it is possible to replicate what puppeteer-extra does and create the profile with the right setting, but I think using puppeteer-extra is fine too.
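
Replicating that approach by hand might look like the sketch below. It assumes that Chrome reads its settings from Default/Preferences inside the user data directory and that plugins.always_open_pdf_externally is the preference behind the "Download PDFs" setting. Both are Chrome internals rather than a documented Puppeteer API, so treat them as assumptions to verify. writePdfDownloadProfile is a hypothetical helper name.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: creates a fresh profile directory whose Preferences
// file asks Chrome to download PDFs instead of opening them in the tab.
function writePdfDownloadProfile(downloadPath) {
  const userDataDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pptr-profile-'));
  const defaultDir = path.join(userDataDir, 'Default');
  fs.mkdirSync(defaultDir, { recursive: true });

  const preferences = {
    // Assumed Chrome preference names; verify against your Chrome version.
    plugins: { always_open_pdf_externally: true },
    download: { default_directory: downloadPath, prompt_for_download: false },
  };
  fs.writeFileSync(
    path.join(defaultDir, 'Preferences'),
    JSON.stringify(preferences)
  );
  return userDataDir;
}

// Usage, assuming puppeteer is installed:
// const userDataDir = writePdfDownloadProfile('/tmp/downloads');
// const browser = await puppeteer.launch({ userDataDir });
```

The profile is created in a temporary directory so that an existing Chrome profile is never modified. This carries the same fragility mentioned above if Chrome changes its profile layout.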


MrXyfir commented Feb 26, 2022

Really surprised puppeteer still doesn't support this. Playwright does.

jrandolf added the bug label Jun 7, 2022

nros commented Jul 14, 2022

Since PR 8506 (release 14.4.0), _client is a method rather than a property, so call _client() instead:

await page._client().send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath: tempFolderPath,
})

See: Page.ts
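
A script that has to run on Puppeteer versions from both sides of that change could pick the right form at runtime. The following is only a sketch: getPageClient is a hypothetical helper name, and _client remains a private API that may change again.

```javascript
// Hypothetical helper (not part of Puppeteer): returns the page's private
// CDP client whether _client is a property (before 14.4.0) or a method.
function getPageClient(page) {
  return typeof page._client === 'function' ? page._client() : page._client;
}

// Usage inside an async function, with tempFolderPath defined by you:
// await getPageClient(page).send('Page.setDownloadBehavior', {
//   behavior: 'allow',
//   downloadPath: tempFolderPath,
// });
```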

@Korak-997

I had to tweak a lot until I found this solution 😄, but it works perfectly for me.

It downloads the PDF automatically, which means you do not need to do anything else.

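// Wait until the PDF object is rendered.
await page.waitForSelector('SELECTOR FOR PDF OBJECT')

// Set the download path.
// On Puppeteer 14.4.0 and later, call page._client() instead of page._client.
await page._client.send("Page.setDownloadBehavior", {
  behavior: "allow",
  downloadPath: PATH,
})

// Create a temporary link with a download attribute and click it,
// so the PDF is downloaded instead of being shown in the tab.
await page.evaluate(() => {
  const pdfObj = document.querySelector("SELECTOR FOR PDF OBJECT")
  const btn = document.createElement("a")
  btn.setAttribute("href", pdfObj.data)
  btn.setAttribute("download", "FILE_NAME")
  btn.click()
})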


stale bot commented Sep 16, 2022

We're marking this issue as unconfirmed because it has not had recent activity and we weren't able to confirm it yet. It will be closed if no further activity occurs within the next 30 days.

Collaborator

OrKoN commented Sep 19, 2022

We will definitely need a better API for this.

Contributor

Kle0s commented Sep 19, 2022

In reply to @Korak-997's solution above:

This is also not a perfect solution, since technically you should use Browser.setDownloadBehavior, but accessing browser._connection no longer works either (I think since release 14.4.0). Is there a way to get around that and obtain the CDPSession used by the browser, as there is for the page (Page._client())?
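
One possible route through public API is to open a CDP session on the browser target and send Browser.setDownloadBehavior there. This is a sketch rather than a confirmed answer from the maintainers: it assumes that browser.target().createCDPSession() yields a browser-level session in your Puppeteer version, and allowBrowserDownloads is a hypothetical helper name.

```javascript
// Hypothetical helper: enables downloads for the whole browser through a
// CDP session attached to the browser target (no private _connection access).
async function allowBrowserDownloads(browser, downloadPath) {
  const session = await browser.target().createCDPSession();
  await session.send('Browser.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath,
  });
  return session;
}

// Usage inside an async function, assuming puppeteer is installed:
// const browser = await puppeteer.launch();
// await allowBrowserDownloads(browser, '/tmp/downloads');
```

As noted earlier in the thread, setting the download behavior alone may not stop Chrome from opening PDFs in the tab, so this may need to be combined with one of the other approaches.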
