From hours to minutes: Building an AI-powered PDF importer for local government with LocalGov Drupal

Created
Fri, 17/10/2025 - 23:01
Updated
Fri, 17/10/2025 - 23:01

Guest blog post by Angie Forson, Web and Digital Programme Lead, Southwark Council.

The Web and Digital team at Southwark Council, along with our partners at Chicken, is building an AI-powered PDF importer for the LocalGov Drupal Publication Module. Together, we’re unlocking a faster, more accessible, and more collaborative future for publishing. 

Why this matters 

Manual PDF conversion can take hours – sometimes days. With our importer, it happens in minutes – often under one minute. Multiply that across thousands of PDFs, and the time savings are game-changing. 

I’m excited about the impact this product will have — not just for our users, but also in transforming how we design, build, and create content internally. We’re shaping a future where services start with HTML-first thinking.

Evelyn Francourt, User Experience Lead 

Understanding the workflow 

We upload a PDF to the module, which then kicks off the import process in the background.

The result is the HTML representation of the PDF content, which is then saved into a Drupal Publication. We can then review and publish the Publication.  

Each import process is logged so that any errors can be reviewed and fixed. 
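To make the workflow concrete, here is a minimal Python sketch of the same pattern. An upload adds a job to a queue, a background worker runs the import, and every run keeps a log entry with its status and any errors. The real module is a Drupal (PHP) module. `ImportLog`, `convert_pdf`, `worker` and `run_imports` are hypothetical names used only for illustration, and `convert_pdf` stands in for the whole import pipeline.

```python
import logging
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class ImportLog:
    """Hypothetical record of one import run: its status and any errors."""
    filename: str
    status: str = "queued"
    errors: list[str] = field(default_factory=list)


def convert_pdf(filename: str) -> str:
    """Placeholder for the real import pipeline (extract, transform, save)."""
    if not filename.lower().endswith(".pdf"):
        raise ValueError(f"not a PDF: {filename}")
    return f"<h1>{filename}</h1>"


def worker(jobs: queue.Queue, results: dict[str, str]) -> None:
    """Background worker: take queued imports one at a time and log the outcome."""
    while True:
        log = jobs.get()
        try:
            log.status = "running"
            results[log.filename] = convert_pdf(log.filename)
            log.status = "done"
        except Exception as exc:
            log.status = "failed"
            log.errors.append(str(exc))  # kept so an editor can review and fix it
            logging.error("Import of %s failed: %s", log.filename, exc)
        finally:
            jobs.task_done()


def run_imports(filenames: list[str]) -> tuple[list[ImportLog], dict[str, str]]:
    jobs: queue.Queue = queue.Queue()
    results: dict[str, str] = {}
    threading.Thread(target=worker, args=(jobs, results), daemon=True).start()
    logs = [ImportLog(name) for name in filenames]
    for log in logs:
        jobs.put(log)  # an upload would return to the user at this point
    jobs.join()  # demo only: wait for the background work so we can inspect the logs
    return logs, results


logs, results = run_imports(["budget.pdf", "notes.docx"])
for log in logs:
    print(log.filename, log.status, log.errors)
```

On a real site the upload request would return as soon as the job is queued. The sketch waits with `jobs.join()` only so the demo can inspect the logs.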

How the technology works 

Each PDF goes through a three-step ETL (extract, transform, load) process, called an “import pipeline” in the module: 

  1. Extract: A PDF parser pulls content from the PDF. The default is the smalot PDF parser. 
  2. Transform: AI converts the parsed content into properly tagged HTML with logical pagination. The module currently uses Claude Sonnet. 
  3. Save: The clean HTML is saved as pages in Drupal, ready to publish. 
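The three steps can be sketched as a small pipeline object that chains an extractor, a transformer and a saver. This is an illustrative Python sketch, not the module's code. The interfaces, the stub classes and the returned publication ID are all assumptions: the extractor stands in for a PDF parser such as smalot, and the transformer stands in for the call to the AI model.

```python
import html
from dataclasses import dataclass
from typing import Protocol


class Extractor(Protocol):
    def extract(self, pdf_bytes: bytes) -> str: ...


class Transformer(Protocol):
    def transform(self, text: str, prompt: str) -> str: ...


class Saver(Protocol):
    def save(self, markup: str) -> int: ...


@dataclass
class ImportPipeline:
    extractor: Extractor
    transformer: Transformer
    saver: Saver
    prompt: str  # each pipeline carries its own AI prompt

    def run(self, pdf_bytes: bytes) -> int:
        text = self.extractor.extract(pdf_bytes)                # 1. Extract
        markup = self.transformer.transform(text, self.prompt)  # 2. Transform
        return self.saver.save(markup)                          # 3. Save


class PlainTextExtractor:
    """Stub for a real PDF parser; here the 'PDF' is already plain UTF-8 text."""
    def extract(self, pdf_bytes: bytes) -> str:
        return pdf_bytes.decode("utf-8")


class ParagraphTransformer:
    """Stub for the AI step: wraps each non-empty line in <p> and ignores the prompt."""
    def transform(self, text: str, prompt: str) -> str:
        return "".join(f"<p>{html.escape(line)}</p>" for line in text.splitlines() if line.strip())


class InMemorySaver:
    """Stub for saving a Publication; returns a hypothetical sequential ID."""
    def __init__(self) -> None:
        self.publications: list[str] = []

    def save(self, markup: str) -> int:
        self.publications.append(markup)
        return len(self.publications)


saver = InMemorySaver()
pipeline = ImportPipeline(PlainTextExtractor(), ParagraphTransformer(), saver,
                          prompt="Convert to accessible, properly tagged HTML.")
print(pipeline.run(b"Council budget\n\nSpending & income\n"))
```

Because the pipeline only depends on the three small interfaces, any step can be replaced without touching the others.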

Built for flexibility 

We can build as many import pipelines as needed, each with its own custom AI prompt. This is useful for handling different types of PDF content or layouts.

Furthermore, the pipeline uses a plugin architecture, where each step can be swapped out. Councils can use different extractors, AI models, or output to different Drupal content types to suit their needs. 
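A rough sketch of that plugin idea follows, assuming each step is looked up by a plugin ID and each pipeline is just configuration: which plugins to use, plus its own prompt. The registries, plugin IDs, prompts and `run_pipeline` function are hypothetical. In the module itself the plugin architecture lives in Drupal (PHP), not in Python dictionaries.

```python
from typing import Callable

# Hypothetical plugin registries, keyed by plugin ID.
EXTRACTORS: dict[str, Callable[[bytes], str]] = {}
TRANSFORMERS: dict[str, Callable[[str, str], str]] = {}


def register(registry: dict, plugin_id: str):
    """Decorator that adds a function to a registry under the given plugin ID."""
    def decorator(fn):
        registry[plugin_id] = fn
        return fn
    return decorator


@register(EXTRACTORS, "plain_text")
def extract_plain_text(pdf_bytes: bytes) -> str:
    return pdf_bytes.decode("utf-8")  # stand-in for a real PDF parser


@register(TRANSFORMERS, "paragraph")
def transform_paragraph(text: str, prompt: str) -> str:
    # Stand-in for an AI model plugin: a real one would send `prompt` and `text` to the model.
    return f"<!-- prompt: {prompt} -->\n<p>{text}</p>"


# Each pipeline is configuration only: which plugins to use and which prompt to send.
PIPELINES: dict[str, dict[str, str]] = {
    "reports": {"extractor": "plain_text", "transformer": "paragraph",
                "prompt": "Rebuild tables with <table>, <th> and <td>."},
    "minutes": {"extractor": "plain_text", "transformer": "paragraph",
                "prompt": "Start a new section for each agenda item."},
}


def run_pipeline(name: str, pdf_bytes: bytes) -> str:
    config = PIPELINES[name]
    text = EXTRACTORS[config["extractor"]](pdf_bytes)
    return TRANSFORMERS[config["transformer"]](text, config["prompt"])


print(run_pipeline("minutes", b"Item 1: Apologies"))
```

With this shape, swapping the AI model or adding a pipeline for a new kind of document becomes a configuration change rather than a change to the pipeline code.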

This project is a great example of AI working alongside and empowering content creators, and Drupal as a platform supports this really well.

Farez Rahman, Drupal Developer 

Agile, user-centred delivery 

We’re delivering this project the way we deliver our best work – agile and user-centred by design.  
 
We have adapted our delivery to meet the challenges of designing an innovative product. Our team has had to continuously refine requirements and acceptance criteria to ensure the tool meets real user needs and delivers meaningful outcomes.  

Working on this AI product is an incredible experience — each day comes with new challenges, unexpected turns, and fresh opportunities to innovate. The pace of change made the whole process an absolute adrenaline rush.

Giorgi Bujiashvili, Delivery Manager

What we’ve achieved so far 

As Chicken fast-tracks development, we’ve been testing and refining prompts across a wide range of PDFs to prove what’s possible: 

  • import images, URLs and linked text 
  • rebuild tables with correct HTML tags 
  • apply accurate heading hierarchies (H1, H2, H3) 
  • remove unwanted hard returns from PDF text
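The importer relies on the AI prompt to remove hard returns. Purely as a swapped-in illustration, a simple rule-based clean-up in Python might look like the sketch below. It treats blank lines as paragraph breaks, rejoins words hyphenated across a line break, and joins the remaining wrapped lines. The function name and rules are assumptions, not the module's behaviour.

```python
import re


def remove_hard_returns(text: str) -> str:
    """Join lines wrapped by the PDF layout, keeping blank-line paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        # Rejoin words hyphenated across a line break, e.g. "coun-\ncil" -> "council".
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Replace the remaining single line breaks (and surrounding spaces) with one space.
        para = re.sub(r"\s*\n\s*", " ", para)
        cleaned.append(para.strip())
    return "\n\n".join(cleaned)


raw = "Residents can apply for the coun-\ncil tax\nreduction online.\n\nPayments are due\nmonthly."
print(remove_hard_returns(raw))
```

Rules like these are brittle: they would also merge list items printed on separate lines, and would turn a genuinely hyphenated compound split across lines (such as "user-centred") into one word. That illustrates why a context-aware approach can do better.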

We’ve also cracked the pagination challenge. Early versions mirrored PDFs page-by-page, causing awkward breaks mid-paragraph or mid-list. Now the importer processes the entire document at once and, with the right AI prompt, inserts page breaks at logical, user-friendly points such as topic changes or new sections.
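One way to turn a whole-document response into Publication pages is for the prompt to ask the model to emit a marker at each logical break, and then to split on that marker. The post does not say how the module represents page breaks. The marker `<!-- pagebreak -->`, the title rule and `split_into_pages` in this Python sketch are hypothetical.

```python
import re

# Hypothetical marker that the prompt asks the model to emit at each logical break.
PAGE_BREAK = "<!-- pagebreak -->"


def split_into_pages(markup: str) -> list[dict[str, str]]:
    """Split AI-generated HTML into pages, titling each from its first h1/h2."""
    pages = []
    for chunk in markup.split(PAGE_BREAK):
        chunk = chunk.strip()
        if not chunk:
            continue  # ignore empty pages, e.g. a marker at the very start or end
        match = re.search(r"<h[12][^>]*>(.*?)</h[12]>", chunk, re.S)
        title = match.group(1).strip() if match else f"Page {len(pages) + 1}"
        pages.append({"title": title, "body": chunk})
    return pages


doc = ("<h2>Overview</h2><p>Intro.</p>\n<!-- pagebreak -->\n"
       "<h2>Fees</h2><ul><li>A</li><li>B</li></ul>\n<!-- pagebreak -->\n")
for page in split_into_pages(doc):
    print(page["title"])
```

Because the model sees the entire document, the markers can fall between sections rather than wherever the original PDF page happened to end.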

Built with (and for) the community 

This project has been co-designed with content designers, developers, and the LocalGov Drupal community.

Together, we’re shaping a scalable, open-source tool that other councils can adopt, adapt, and improve.

Angie Forson, Web and Digital Programme Lead 

A leap forward in accessible publishing 

The AI PDF Importer isn’t just a tool – it’s a step change in accessible, open-source publishing for local government. Following this release, it will be open and shareable with the LocalGov Drupal community for other councils to adopt and iterate on. 

If you’re interested in supporting or scaling this project, contact Angie Forson – Angie.Forson@southwark.gov.uk. Let’s change the game together.