PDF generation on Umbraco Cloud

Hi hive mind,

I have a requirement to generate customised PDFs server-side, probably using HTML as the source (HTML→PDF), within an Umbraco Cloud solution. Does anyone have experience with this and can suggest libraries known to work on Umbraco Cloud?

I know I want to have my cake and eat it too… but a library that is open source, or at least not too expensive, would also help this client.

I am also in two minds about the architecture for this, as it could be done in process within Umbraco or as an external service. Any thoughts or experience?

Thanks

Rather than a library that converts HTML to PDF, the modern alternative is to use PuppeteerSharp (on NuGet, 20.2.2) to drive the native save-to-PDF feature of Chrome or Firefox.

https://discord-chats.umbraco.com/t/22731634/solved-pdf-generator

Though you'd set this up on another server, so there's a little legwork, and depending on whether you already have an alternative server there could be a cost involved. (You can't run the required Chrome binary in Azure App Service, but you can do this via an app container on Azure.)

You could also use the native Puppeteer via JavaScript if you don't want to use .NET, I think…

Hi lakesol 👋

We've had a few of these requests too, but in all cases we ended up with me running a script locally that fetched the ~100 pages and generated the PDFs (using wkhtmltopdf), then uploading them to a folder, from where they are served on the site as static PDFs.
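A rough idea of what such a local batch script could look like in Node, assuming wkhtmltopdf is installed and on the PATH. The function names, the file-naming scheme and the example URLs are made up for illustration and are not the script described above.

```javascript
const { execFileSync } = require('node:child_process');
const path = require('node:path');

// Turn a page URL into a flat PDF file name, e.g. /products/widget/ -> products-widget.pdf
function pdfFileName(pageUrl) {
  const { pathname } = new URL(pageUrl);
  const slug = pathname.split('/').filter(Boolean).join('-') || 'home';
  return `${slug}.pdf`;
}

// Convert every page by shelling out to wkhtmltopdf (assumed to be on PATH).
function generateAll(pageUrls, outputDir) {
  for (const pageUrl of pageUrls) {
    const target = path.join(outputDir, pdfFileName(pageUrl));
    execFileSync('wkhtmltopdf', ['--quiet', pageUrl, target]);
  }
}
```

Calling `generateAll(['https://example.com/products/widget/'], './pdfs')` would write `./pdfs/products-widget.pdf`, ready to be uploaded as a static file.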

This was based on whether they actually needed to be 100% up to date the moment something changed in Umbraco (spoiler: they did not), and on how often they were requested.

Your mileage may vary of course. I just wanted to throw this in, in case you don't actually need them to be generated on the fly by some kind of robot/machine setup.

/Chriztian

@lakesol Despite not knowing the entire scope of your request (how dynamic, how often, for whom) I’d like to offer you some advice. I work with budget-constrained clients as well, many nonprofits. Their need to generate PDFs was out of convenience, and it wasn’t very often. I built a report for them out of HTML and a print-friendly stylesheet, coached them to use Print → Save as PDF in their browsers, and they were happy.

I'm sure there are open source NuGet packages you can try, but perhaps your issue can be solved through client training/education.

We have a few sites using Rotativa.

It’s basically a wrapper of wkhtmltopdf (WebKit html to pdf). It works great, but the rendering engine is a bit out of date.

IIRC you can't run Puppeteer or headless Chrome in Azure Web Apps / Umbraco Cloud.

Thanks for the information. I had seen this tool, but ideally I was looking for something that can work in process, as future requirements may mean we want to personalise the PDF documents. But at the moment this may be the best option.

Thanks, this architecture may be a good approach if we cannot run in process, i.e. hook into the publish pipeline to trigger an out-of-process update of the PDFs.

Many thanks

Doesn't Rotativa require access to wkhtmltopdf.exe? Have you got that working in process on an Azure App Service (either inside or outside of Cloud)? I thought anything relying on a binary executable was off the cards.

So with Puppeteer you get it to consume a page from your site and save it as a PDF, basically as if you had visited the page.

Simplistically, you can use an altTemplate and pass query params in the URL you give to Puppeteer. You can even pass the ASP.NET Identity cookie for use with protected content pages, to get your personalised PDFs.
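As a minimal sketch of that idea using the JavaScript Puppeteer package (assumed to be installed), something like the following could work. The template alias, the query parameter names and the cookie details are placeholders, not values from a real site.

```javascript
// Build the URL the headless browser will visit: the normal page URL
// plus Umbraco's altTemplate query parameter and any personalisation params.
function buildPdfSourceUrl(pageUrl, templateAlias, params = {}) {
  const url = new URL(pageUrl);
  url.searchParams.set('altTemplate', templateAlias);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Load the page in headless Chrome and return the PDF bytes.
// authCookie is optional, e.g. { name, value, domain } for a protected page.
async function renderPdf(sourceUrl, authCookie) {
  const puppeteer = require('puppeteer'); // loaded lazily; assumed installed
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    if (authCookie) {
      await page.setCookie(authCookie);
    }
    await page.goto(sourceUrl, { waitUntil: 'networkidle0' });
    return await page.pdf({ format: 'A4', printBackground: true });
  } finally {
    await browser.close();
  }
}
```

For example, `buildPdfSourceUrl('https://example.com/report/', 'pdf', { userId: 42 })` gives `https://example.com/report/?altTemplate=pdf&userId=42`, which is then handed to `renderPdf`.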

I'll toss in one other approach that we use to generate hundreds of PDFs a day: a third-party conversion API. We use Cloudmersive's Convert API, but I know there are other offerings as well. I've no special attachment to Cloudmersive; it's just one we've used successfully for 4+ years now. The cost is nominal, about 12 USD/month.

In our case we use the HTML-string-to-PDF endpoint with embedded CSS, and it gives us fine-grained control over both layout and content. Every PDF is unique for us.

They offer a .NET client, so integrating into Umbraco (or any .NET app) is super simple. The sample below uses their JavaScript client.

var CloudmersiveConvertApiClient = require('cloudmersive-convert-api-client');
var defaultClient = CloudmersiveConvertApiClient.ApiClient.instance;

// Configure API key authorization: Apikey
var Apikey = defaultClient.authentications['Apikey'];
Apikey.apiKey = 'YOUR API KEY';

var apiInstance = new CloudmersiveConvertApiClient.ConvertWebApi();

// HtmlToPdfRequest | HTML to PDF request parameters
var input = new CloudmersiveConvertApiClient.HtmlToPdfRequest();

var callback = function(error, data, response) {
  if (error) {
    console.error(error);
  } else {
    console.log('API called successfully. Returned data: ' + data);
  }
};

apiInstance.convertWebHtmlToPdf(input, callback);

I do it in a similar way but with SelectPdf via their API

No real reason to choose them but again - it’s quietly done a job for not much money for a few years now.

Same idea: generate an HTML string using a view (I built a simple view-render service based on https://stackoverflow.com/a/57888901) and post it to them via another service. That returns a memory stream, which I serve back in response to a POST request.

return new FileStreamResult(memoryStream, "application/pdf");
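The same post-the-HTML, get-bytes-back flow can be sketched in JavaScript (Node 18+ for the built-in `fetch`). The endpoint, the JSON body shape and the `Apikey` header name here are assumptions for illustration, not the actual SelectPdf or Cloudmersive contract, so check your provider's documentation.

```javascript
// Build the HTTP request for a hosted HTML-to-PDF service.
// The body shape and header name are hypothetical placeholders.
function buildConversionRequest(html, apiKey) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Apikey: apiKey },
    body: JSON.stringify({ html }),
  };
}

// Post the HTML and return the PDF as a Buffer.
// A non-2xx response is raised like any other external API error.
async function convertHtmlToPdf(endpoint, html, apiKey) {
  const response = await fetch(endpoint, buildConversionRequest(html, apiKey));
  if (!response.ok) {
    throw new Error(`PDF conversion failed with HTTP ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}
```

Keeping the request builder separate from the network call makes it easy to swap providers later without touching the code that renders the HTML.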

Running a browser in-process on a webserver can be rather problematic for performance.

If not puppeteer, an API as suggested by @cheeseytoastie and @paulsterling will be a better approach.

Compatibility is a moving target and styles can be hard to debug since what the engine renders vs. what you would see from Chromium/Firefox will be different - especially if you start using some of the more modern CSS in anger.

Do the personalization when you generate the HTML, and just have Puppeteer/API/whatever do the conversion bit.
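To illustrate that split, here is a small sketch where all personalisation happens while building the HTML string, with embedded CSS, and the result is then passed unchanged to whichever converter you choose. The certificate layout and field names are invented for the example.

```javascript
// Escape user-supplied values so they cannot inject markup into the document.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Produce a complete, self-contained HTML document for one recipient.
function renderCertificateHtml({ name, course }) {
  return `<!doctype html>
<html>
<head>
<style>
  body { font-family: sans-serif; margin: 2cm; }
  h1 { font-size: 24pt; }
</style>
</head>
<body>
  <h1>Certificate for ${escapeHtml(name)}</h1>
  <p>Completed: ${escapeHtml(course)}</p>
</body>
</html>`;
}
```

The converter never needs to know who the PDF is for; it only sees finished HTML.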

FWIW, I use SelectPdf for my custom PDF requirements.
They have a good HTML→PDF API, as well as a .NET library.
I find the HTML→PDF API accurate, and it uses the Chrome engine, so it is up to date with most CSS.

Throwing my “pdf gen” hat in the ring; for smaller usage I spun up a super simple API that uses a wkhtml C# wrapper, accepts a URL from a POST request, generates said PDF then sends it back.

Needs to be hosted somewhere (I had a little Windows VPS running) and the page you want to convert needs to be public facing.
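A comparable service could be sketched in Node rather than C#, assuming wkhtmltopdf is installed on the host. The JSON request shape and the host allowlist are assumptions for the example, and authentication is left out for brevity.

```javascript
const http = require('node:http');
const { execFile } = require('node:child_process');

// Accept only HTTPS URLs whose host is on the allowlist.
function isAllowedSource(rawUrl, allowedHosts) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  return url.protocol === 'https:' && allowedHosts.includes(url.hostname);
}

// Build (but do not start) a server that answers POST {"url": "..."} with a PDF.
function createPdfServer(allowedHosts) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST') {
      res.writeHead(405).end();
      return;
    }
    let body = '';
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      let target;
      try {
        target = JSON.parse(body).url;
      } catch {
        target = undefined; // malformed JSON is rejected below
      }
      if (!isAllowedSource(target, allowedHosts)) {
        res.writeHead(400).end('URL not allowed');
        return;
      }
      // "-" tells wkhtmltopdf to write the PDF to stdout.
      execFile('wkhtmltopdf', ['--quiet', target, '-'],
        { encoding: 'buffer', maxBuffer: 64 * 1024 * 1024 },
        (error, pdf) => {
          if (error) {
            res.writeHead(500).end('PDF generation failed');
            return;
          }
          res.writeHead(200, { 'Content-Type': 'application/pdf' }).end(pdf);
        });
    });
  });
}
```

Starting it is then a matter of `createPdfServer(['www.example.com']).listen(8080)`, with the allowlist keeping the service from rendering arbitrary sites.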

I like it. How much does this cost a month, and do you have to keep an eye on it in case it crashes?

I like the fire and forget approach of the API but costs are always a consideration - especially with charity clients.

I have a mail relay setup on the actual API that sends an email if there’s an issue or if it’s requested from a domain I didn’t expect (it’s server to server with basic auth, but you never know…)

Other than that I send a 500 back to the caller so I just handle that as if I were handling any other external API error.

I use eUK for some of my hosting, so I already had a server there. I'm sure there will be cheaper, but a bobby-basic VPS will set you back £25+VAT.

If you're already on Azure…
For Azure Container Apps (the modern serverless app container platform):

  • Pricing is usage-based: You get a free grant each month (180,000 vCPU-seconds, 360,000 GiB-seconds, 2 million requests).
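As a back-of-the-envelope check against those grant figures, the following sketch estimates the compute a monthly PDF workload would use. It assumes the container scales to zero and is only billed while rendering; any idle time with a running replica would add to these numbers. The workload values in the example are invented.

```javascript
// Monthly free grant figures quoted above.
const FREE_VCPU_SECONDS = 180000;
const FREE_GIB_SECONDS = 360000;

// Estimate usage for a workload and compare it with the free grant.
function fitsFreeGrant({ pdfsPerMonth, secondsPerPdf, vcpu, memoryGiB }) {
  const activeSeconds = pdfsPerMonth * secondsPerPdf;
  const vcpuSeconds = activeSeconds * vcpu;
  const gibSeconds = activeSeconds * memoryGiB;
  return {
    vcpuSeconds,
    gibSeconds,
    fits: vcpuSeconds <= FREE_VCPU_SECONDS && gibSeconds <= FREE_GIB_SECONDS,
  };
}
```

For instance, 3,000 PDFs a month at 5 seconds each on 1 vCPU with 2 GiB of memory comes to 15,000 vCPU-seconds and 30,000 GiB-seconds, well inside the grant under these assumptions.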

Thanks for the input. Yes, we would always have the HTML personalised, and the PDF conversion “dumb” and just convert formats.

This has all been great feedback.

For the SelectPdf advocates, how are you finding its performance when converting to PDF? I did a test locally and found it quite slow. Hence I am looking at solutions where the PDF generation can be done out of process.

I have just done some testing on Puppeteer too. Performance seems quite good and the output seems easy enough to control, although header/footer support is more complex to set up than in SelectPdf.