
Building PDF Report Engines with Rust and LaTeX

by Royce Carbowitz
Infrastructure
Rust
LaTeX
AWS Lambda

Every SaaS platform eventually hits the moment where someone asks “can we get this as a PDF?” It might be a client who needs a formatted deliverable for their stakeholders. It might be a compliance requirement that demands archival-quality documents. It might be a sales team that wants to attach branded reports to proposals. Whatever the trigger, PDF generation is one of those capabilities that sounds simple until you actually try to build it at production quality.

At Pinpoint, our QA testing platform, the ask was concrete: teams running security and regression tests needed professional reports they could hand to clients and executives. These reports had to carry the Pinpoint brand, include detailed severity tables, render consistently across every PDF viewer, and handle anything from a single-bug finding to a consolidated report spanning hundreds of issues. Browser-based PDF generation using tools like Puppeteer or wkhtmltopdf was never going to meet our quality bar.

I built our report engine using Rust for the orchestration layer and LuaLaTeX for the typesetting, running the entire pipeline inside AWS Lambda. This post covers the architectural decisions behind that choice, the template management patterns that make branded reports maintainable, the hardest problems you will encounter in server-side PDF generation, and the real-world results this approach has delivered.

Why Generate PDF Reports Server-Side Instead of Client-Side?

Server-side generation produces identical output regardless of the recipient’s device, browser, or operating system. This consistency is not a convenience but a requirement for any document that carries professional or legal weight. When a QA team delivers a security audit report to a client, that report needs to look the same whether the client opens it on a MacBook, a Windows desktop, or a tablet. Client-side PDF generation using browser rendering engines introduces variability because each browser interprets CSS differently, handles page breaks according to its own logic, and produces different font rendering depending on the system’s installed fonts.

Branded output is another compelling reason to keep generation on the server. When the PDF engine lives in your infrastructure, you control every pixel. Custom fonts load from your own font files rather than falling back to whatever the client’s system provides. Color profiles match your brand specifications exactly. Margins, headers, footers, and page numbering follow templates that your design team approved. None of this is reliably achievable when you delegate rendering to a browser engine running on hardware you have never seen.

Professional formatting at the level that LaTeX provides is simply unavailable through CSS-based rendering. LaTeX’s typesetting algorithms handle ligatures, kerning, hyphenation, and justification with a sophistication that web browsers do not attempt. The difference is visible in dense technical reports where tables span multiple pages, footnotes need precise placement, and mathematical notation or code samples require specialized formatting. For Pinpoint’s reports, which include severity scoring tables, screenshot annotations, and detailed reproduction steps, LaTeX produces output that looks like it came from a professional publisher rather than a web application.

Archival quality matters for documents that enter regulatory or compliance workflows. PDF/A, the ISO standard for long-term archival, requires embedded fonts, device-independent color, and specific metadata structures. LaTeX can produce PDF/A-compliant documents natively with the right configuration, while browser-based generators require extensive post-processing to meet the standard. For clients in financial services or healthcare, where Pinpoint’s testing reports become part of audit trails, this compliance capability is table stakes.

Finally, server-side generation decouples the document from the user interface. The same API that generates a report for download can also generate it for email attachment, for archival storage, or for integration into a third-party document management system. The report becomes a data product with its own lifecycle, independent of the web application that requested it.

Why Pair Rust with LaTeX for Document Generation?

Rust and LaTeX serve fundamentally different roles in the pipeline, and their combination produces better results than either could achieve alone. Rust handles the orchestration: receiving the API request, querying the database for report data, transforming that data into LaTeX-compatible structures, invoking the LaTeX compiler, managing temporary files, uploading the result to S3, and returning a presigned download URL. LaTeX handles the typesetting: taking structured data and a template, then producing a pixel-perfect PDF according to rules that have been refined over four decades of academic and professional publishing.

I chose LuaLaTeX specifically rather than pdfLaTeX or XeLaTeX because of its modern font handling capabilities. LuaLaTeX supports OpenType and TrueType fonts natively through the fontspec package, which means I can use the same brand fonts that our design team specified without converting them to LaTeX’s legacy Type 1 format. The Lua scripting engine embedded in LuaLaTeX also provides an escape hatch for complex data transformations that would be cumbersome in pure TeX macros, though I rarely need it because Rust handles most of the data processing before the template stage.

The separation of concerns between Rust and LaTeX is the architectural decision I am most satisfied with. The Rust layer knows nothing about typography, and the LaTeX layer knows nothing about APIs or databases. The contract between them is a populated template file: Rust fills in the placeholders, and LaTeX renders the result. This clean boundary means I can change the report’s visual design without modifying any Rust code, and I can restructure the data pipeline without touching any LaTeX templates. In practice, our designer iterates on report layouts by editing LaTeX files and previewing locally, while I work on the infrastructure independently.

Rust’s suitability for the orchestration role goes beyond performance. The type system catches data transformation errors at compile time, which is critical when the consequence of a malformed template is a LaTeX compilation failure inside a Lambda function with no debugging tools available. The compiler cannot tell a sanitized String from raw user input on its own, but typed transformations, rather than ad hoc string concatenation, close most of that gap. I built a dedicated sanitization layer that escapes LaTeX special characters in user-provided content, and by wrapping its output in a dedicated type, the type system guarantees that every user-facing string passes through it before reaching a template.
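A minimal sketch of that newtype pattern, with illustrative names (the production type and its escaping rules differ in detail):

```rust
/// Newtype wrapper for LaTeX-safe text. The only constructor runs the
/// escaping logic, so template-facing code that accepts `Escaped` can
/// never receive raw user input by accident. Names are illustrative.
pub struct Escaped(String);

impl Escaped {
    pub fn from_user_input(raw: &str) -> Self {
        let mut out = String::with_capacity(raw.len());
        for c in raw.chars() {
            match c {
                // Characters with special meaning in LaTeX get a backslash
                '&' | '%' | '$' | '#' | '_' | '{' | '}' => {
                    out.push('\\');
                    out.push(c);
                }
                // These three need replacement commands, not a backslash
                '\\' => out.push_str(r"\textbackslash{}"),
                '~' => out.push_str(r"\textasciitilde{}"),
                '^' => out.push_str(r"\textasciicircum{}"),
                c => out.push(c),
            }
        }
        Escaped(out)
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

Because template-population APIs accept only `Escaped` values, forgetting to sanitize a user string becomes a compile error rather than a broken PDF.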

The alternative approaches I evaluated before settling on this architecture all fell short in specific ways. Puppeteer running in Lambda worked for simple documents but produced inconsistent page breaks and could not match LaTeX’s typographic quality. Generating PDFs directly with a Rust library like printpdf or genpdf gave me programmatic control but required implementing layout logic that LaTeX already solves. Hosted PDF services like DocRaptor and Prince XML delivered good quality but introduced an external dependency and per-document costs that would not scale. The Rust plus LaTeX combination gave me full control, professional quality, and predictable costs.

How Does a Serverless PDF Generation Architecture Work?

The architecture starts with an API Gateway endpoint that receives a POST request containing the report parameters: the project identifier, the bug identifiers to include, the report type (single finding or consolidated), and any customization options like whether to include screenshots or reproduction steps. API Gateway validates the request structure and forwards it to a Lambda function written in Rust using the lambda_http crate.
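In practice the request body looks something like the following; field names and values here are illustrative, not the actual API contract:

```json
{
  "project_id": "proj_123",
  "bug_ids": ["bug_001", "bug_002", "bug_003"],
  "report_type": "consolidated",
  "options": {
    "include_screenshots": true,
    "include_reproduction_steps": true
  }
}
```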

The Rust Lambda function performs three phases of work. In the data assembly phase, it queries the database (RDS via the sqlx crate) for all the information needed to populate the report: project metadata, bug details with severity scores, tester notes, screenshots, and client information. It transforms this data into a structured representation that maps directly to the LaTeX template’s placeholder variables. Every string that originated from user input passes through the sanitization layer that escapes LaTeX special characters like ampersands, percent signs, underscores, and curly braces.

In the template rendering phase, the function selects the appropriate LaTeX template based on the report type, substitutes the placeholder variables with the assembled data, and writes the populated template to a temporary directory. The function then invokes the LuaLaTeX compiler as a subprocess, pointing it at the populated template file. LuaLaTeX compiles the document and produces a PDF file in the same temporary directory. If the compilation fails, the function captures the log output and returns a structured error message that identifies the specific template line where compilation broke.
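The rendering phase reduces to two small pieces: a placeholder substitution pass and a subprocess invocation. A sketch, assuming a `{{name}}` placeholder convention (the actual marker syntax is an implementation detail; anything that cannot appear in valid LaTeX works):

```rust
use std::collections::HashMap;
use std::process::Command;

/// Replace `{{name}}` placeholders with pre-sanitized values.
fn fill_template(template: &str, values: &HashMap<&str, String>) -> String {
    let mut out = template.to_string();
    for (key, value) in values {
        // Build the literal marker "{{key}}" and substitute it
        out = out.replace(&format!("{{{{{}}}}}", key), value);
    }
    out
}

/// Run LuaLaTeX on the populated template in a scratch directory.
/// `-interaction=nonstopmode` stops the compiler from waiting for
/// terminal input on errors; `-halt-on-error` makes failures explicit.
fn compile(
    workdir: &std::path::Path,
    tex_file: &str,
) -> std::io::Result<std::process::Output> {
    Command::new("lualatex")
        .current_dir(workdir)
        .args(["-interaction=nonstopmode", "-halt-on-error", tex_file])
        .output()
}
```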

In the delivery phase, the function uploads the generated PDF to an S3 bucket organized by project and date, generates a presigned URL with a configurable expiration time (defaulting to 24 hours), and returns the URL to the caller. The presigned URL approach means that the caller receives a download link immediately without the function needing to stream the file contents through API Gateway, which has a response size limit that large reports could exceed.

Cold start optimization is critical for a Lambda function that bundles a LaTeX distribution. The Lambda deployment package includes the LuaLaTeX compiler, essential LaTeX packages, and the brand fonts, all packaged into a Lambda layer. I stripped the TeX Live distribution down to only the packages required by our templates, reducing the layer size from over 2 GB to approximately 350 MB. The Rust binary itself adds only about 15 MB. First invocation cold starts take roughly 3 to 4 seconds, which is acceptable because report generation is not a latency-sensitive operation. Subsequent warm invocations complete in under a second for single-bug reports and scale linearly with the number of findings in consolidated reports.

I also implemented a warming strategy that sends a lightweight health check request every five minutes to keep the Lambda instance warm during business hours. This eliminates cold starts for the vast majority of real user requests, because report generation follows predictable business-hour patterns. Outside of business hours, the function scales to zero and incurs no cost.

What Template Management Patterns Work for Branded Reports?

Template management is the area where report engines either become maintainable assets or unmaintainable liabilities. I designed Pinpoint’s template system around a principle of layered composition, where each report type is assembled from reusable components rather than existing as a monolithic template file.

The base layer is a document class file that defines the page geometry, font selections, color definitions, and common macros. This file encodes the brand identity: Pinpoint’s primary colors, heading fonts, body text fonts, and standard margins. Every report type inherits from this base, so changing a brand color or updating a font propagates automatically across all report types. The document class also defines macros for common elements like severity badges, which render as colored boxes with white text that visually encode the severity level of each finding.
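The badge macro can be sketched along these lines; the colors, hex values, and the `\sevbadge` name are illustrative, not Pinpoint’s actual brand definitions:

```latex
% Inside the shared document class: one brand color per severity level
\RequirePackage{xcolor}
\definecolor{sevcritical}{HTML}{C0392B}
\definecolor{sevhigh}{HTML}{E67E22}
\definecolor{sevmedium}{HTML}{D4A017}
\definecolor{sevlow}{HTML}{27AE60}

% \sevbadge{high} renders a colored box with white uppercase text.
% "sev#1" resolves the color name from the raw severity string, so the
% Rust layer supplies only "critical", "high", "medium", or "low".
\newcommand{\sevbadge}[1]{%
  \colorbox{sev#1}{\color{white}\sffamily\bfseries\small\MakeUppercase{#1}}%
}
```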

The cover page template is a standalone component that receives the project name, client name, report date, and report type as parameters. It produces a full-bleed cover with the Pinpoint logo, report title, and metadata positioned according to the brand guidelines. By keeping the cover page separate, I can update its design independently, and the same cover component works for both single-finding and consolidated report types.

Severity tables present the most interesting template challenge because they need to handle variable-length data while maintaining consistent visual formatting. Each bug finding has a severity level, a title, a description, reproduction steps, and optional screenshots. The template uses LaTeX’s longtable environment to allow tables that span multiple pages, with headers repeated on each page for readability. I built a custom LaTeX command that takes a severity level as input and renders the appropriate colored badge, so the Rust code only needs to supply the raw severity string rather than embedding formatting logic.
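A trimmed sketch of such a table, using the `longtable` package; the column widths and the `\sevbadge` command name are illustrative stand-ins for the real badge macro described above:

```latex
\begin{longtable}{p{2.5cm} p{5cm} p{7.5cm}}
  \textbf{Severity} & \textbf{Finding} & \textbf{Summary} \\ \hline
\endhead   % everything above repeats as the header on every page
  \hline \multicolumn{3}{r}{\emph{continued on next page}} \\
\endfoot
  \hline
\endlastfoot
  \sevbadge{high} & SQL injection in search &
    User input reaches the query builder unescaped. \\
  \sevbadge{low}  & Missing cache headers &
    Static assets are refetched on every page load. \\
\end{longtable}
```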

The table of contents is auto-generated by LaTeX based on the report’s section structure. For consolidated reports that might contain fifty or more findings, the table of contents provides essential navigation. Each finding becomes a section, and LaTeX generates page number references automatically. This is one of many areas where LaTeX’s built-in capabilities eliminate work that would require custom code in a programmatic PDF library.

I maintain two primary report templates: a single-bug report for delivering individual findings with full detail, and a consolidated report that aggregates multiple findings into a comprehensive document with an executive summary, severity distribution chart, and detailed findings organized by severity level. Both templates share the base document class and cover page component, reducing the total template code to roughly 400 lines of LaTeX across all files. The Rust code that populates these templates is even shorter, because most of the visual logic lives in the LaTeX layer where it belongs.

What Are the Hardest Problems in Server-Side PDF Generation?

Font embedding and licensing represent the single most frustrating problem in the entire pipeline. LaTeX can embed fonts into PDFs, but the process requires the fonts to be installed correctly in the TeX Live tree, configured in a font map file, and referenced by name in the template. LuaLaTeX with fontspec simplifies this considerably compared to pdfLaTeX, but you still need to verify that the embedded fonts pass validation when the PDF is opened in strict readers like Adobe Acrobat. I spent two full days debugging a font that appeared correctly in Preview on macOS but rendered as Times New Roman in Acrobat on Windows; the culprit was a font descriptor flag that the older LuaLaTeX version was not setting correctly.

Licensing adds another layer of complexity. Many commercial fonts restrict embedding or require a separate license for server-side use. Pinpoint’s brand fonts required an extended license that explicitly permitted embedding in generated documents distributed to third parties. I verified this before writing any code, because discovering a licensing conflict after building the entire system would have meant redesigning the brand’s typography. For anyone starting a similar project, I strongly recommend auditing font licenses before selecting your brand fonts, or choosing fonts with permissive licenses like those available through Google Fonts or the SIL Open Font License.

LaTeX dependency management inside Lambda is a packaging puzzle. A full TeX Live installation weighs over 4 GB and includes thousands of packages, most of which your templates will never use. I built a custom TeX Live installation script that installs only the base scheme and then adds individual packages one at a time based on what the templates require. The resulting installation fits within Lambda’s layer size constraints while still containing everything needed for compilation. When a template change requires a new LaTeX package, I add it to the installation script, rebuild the layer, and redeploy. The whole process is automated through CI, but it required significant trial and error to identify the minimal set of dependencies for each template.
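The trimming process can be driven by an install-tl profile plus targeted tlmgr installs. A sketch of the setup, where the package list is illustrative rather than the actual minimal dependency set:

```shell
# texlive.profile: pin install-tl to the base scheme, skip docs/sources
cat > texlive.profile <<'EOF'
selected_scheme scheme-basic
tlpdbopt_install_docfiles 0
tlpdbopt_install_srcfiles 0
EOF
./install-tl -profile texlive.profile

# Then add only the packages the templates require (illustrative list):
tlmgr install fontspec xcolor geometry booktabs
```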

Large document compilation time becomes a concern for consolidated reports with many findings. A report containing 100 bug findings with embedded screenshots can take 15 to 20 seconds to compile, approaching Lambda’s execution timeout if configured conservatively. I addressed this in two ways. First, I increased the Lambda timeout to 60 seconds for the report generation function, which provides comfortable headroom. Second, I optimized the template to minimize the number of LaTeX compilation passes required. LaTeX normally requires two or three passes to resolve cross-references and generate the table of contents, but I restructured the template to pre-compute page references where possible, reducing most reports to a single pass.

Error handling when LaTeX compilation fails requires special attention because LaTeX error messages are notoriously opaque. A missing closing brace can produce 200 lines of error output before the actual cause appears. I built a log parser in Rust that extracts the first meaningful error from LuaLaTeX’s output, identifies the source line number, and maps it back to the original template section. The structured error response includes the template section name, the approximate source of the failure, and the raw LaTeX error message for debugging. This parser has saved me hours of debugging time because I can usually identify the problem from the API response without needing to reproduce it locally.
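The core of the parser is small, because LaTeX marks errors with a leading `!` and reports the offending source line on a following `l.<n>` line. A sketch under those assumptions (the function name is illustrative):

```rust
/// Extract the first meaningful error from a LuaLaTeX log, returning
/// the message and, when present, the source line number from the
/// `l.<n>` marker that follows it.
fn first_latex_error(log: &str) -> Option<(String, Option<u32>)> {
    let mut lines = log.lines();
    while let Some(line) = lines.next() {
        if let Some(msg) = line.strip_prefix("! ") {
            // Scan a few lines ahead for the `l.<n>` position marker
            let line_no = lines
                .clone()
                .take(10)
                .find_map(|l| {
                    l.strip_prefix("l.")
                        .and_then(|rest| rest.split_whitespace().next())
                        .and_then(|n| n.parse::<u32>().ok())
                });
            return Some((msg.to_string(), line_no));
        }
    }
    None
}
```

The real parser goes further and maps the line number back to the template section that produced it, but the extraction step above does most of the work.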

S3 upload and presigned URL generation are straightforward individually but introduce failure modes that need handling. The upload can fail if the S3 bucket’s permissions are misconfigured or if the network connection drops mid-transfer. The presigned URL can become invalid if the IAM role that generated it loses its signing permissions before the URL expires. I handle upload failures with retries and exponential backoff, and I set presigned URL expirations conservatively at 24 hours with an option for the caller to request shorter durations. The S3 bucket is configured with lifecycle rules that automatically delete reports older than 30 days, keeping storage costs proportional to active usage.
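The 30-day cleanup is a single lifecycle rule on the bucket; the key prefix here is illustrative:

```json
{
  "Rules": [
    {
      "ID": "expire-generated-reports",
      "Status": "Enabled",
      "Filter": { "Prefix": "reports/" },
      "Expiration": { "Days": 30 }
    }
  ]
}
```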

What Results Has This Approach Delivered at Pinpoint?

The report engine generates professional PDF documents that QA teams use to deliver findings to their clients and internal stakeholders. The reports carry Pinpoint’s branding with custom fonts, a designed cover page, and severity-coded tables that make it easy to scan a document and understand the distribution of findings at a glance. Before this system existed, testers were assembling reports manually in Google Docs, which consumed hours of non-testing time and produced inconsistent formatting across different team members.

The consolidated report format has proven especially valuable for teams running comprehensive security audits. Instead of delivering findings one at a time, the QA team can trigger a single consolidated report that aggregates all findings from a testing cycle into a structured document with an executive summary, severity breakdown, and detailed reproduction steps for each issue. Clients receive one polished deliverable instead of a stream of individual notifications, which has improved how our testing results are perceived by stakeholders who expect formal documentation.

The Lambda-based architecture keeps costs near zero for the sporadic generation pattern that characterizes report creation. Reports are generated in bursts around testing milestones rather than continuously, so the function spends most of its time scaled to zero. When a team completes a testing cycle and generates reports, the function scales up to handle the burst and then scales back down. Our monthly Lambda costs for report generation are consistently under five dollars, which includes the compute time, the S3 storage for generated reports, and the API Gateway requests. Compared to hosted PDF generation services that charge per document, the serverless approach provides effectively unlimited capacity at negligible marginal cost.

The quality of the generated documents has become a differentiator for Pinpoint’s service. When prospective clients evaluate QA testing platforms, the professionalism of the deliverables matters alongside the testing methodology. A LaTeX-generated report with proper typography, consistent branding, and structured formatting conveys a level of seriousness that a Google Docs export or an HTML-to-PDF conversion cannot match. Multiple clients have specifically mentioned the report quality during sales conversations, which validates the engineering investment.

The system has generated thousands of reports since launch without a single user-facing failure. The combination of Rust’s compile-time guarantees, thorough input sanitization, and defensive error handling means the most common failure mode during development (unsanitized LaTeX special characters in user content) was eliminated before the system reached production. The only operational issues have been related to Lambda layer updates when TeX Live releases break backward compatibility, which happens roughly once a year and requires rebuilding the layer with updated package versions. Each incident has been resolved within a few hours because the build process is fully automated.

If I were starting this project again, I would make the same core architectural decisions. Rust for orchestration, LuaLaTeX for typesetting, Lambda for compute, and S3 for storage form a stack that is inexpensive to operate, straightforward to maintain, and capable of producing output quality that surpasses anything I could achieve with browser-based rendering. The upfront investment in understanding LaTeX’s template system and building the Lambda layer paid for itself within the first month of operation, when the manual report-assembly hours it eliminated exceeded the engineering hours it took to build.

Need to automate professional report generation for your platform? Schedule a conversation and let’s discuss how a Rust and LaTeX pipeline can deliver branded, production-quality documents at scale.
