docx-wasm: working with Word documents in Node.js

Native Documents is pleased to announce the first release of docx-wasm, our general-purpose library for working with Word documents in Node.js.

Developers working in Java or .NET have long had decent libraries for manipulating Office documents. In Java, there is docx4j and Apache POI. For .NET, Microsoft itself offers its Open XML SDK. In addition to these open source offerings, there are commercial products such as Aspose.

These libraries variously provide low-level and higher-level APIs for manipulating Open XML files. For example, with the Java libraries and Aspose, there are higher-level APIs to extract text, insert a paragraph, convert to PDF, and so on. With docx4j, POI and the Open XML SDK, you can also manipulate the XML directly, which means that in principle you can do anything the file formats allow (which includes creating documents Office can’t open!).

JavaScript is the most popular programming language on GitHub and Stack Overflow, and yet it has no libraries of comparable breadth and depth for working with Word documents.

Yes, there are some simple libraries for creating documents, but if you want to do something more complex, you’ll be disappointed.

Take docx to PDF conversion, for example. On npm, you can find projects which try to solve this by first converting to HTML (i.e. with loss of fidelity), by relying on some remote API, or by using LibreOffice or unoconv.

To go directly from Word to PDF, you need a library capable of Word document layout. And up until now, that’s been missing.

Here at Native Documents, we’ve now delivered this missing piece, opening the way for things like PDF conversion without nasty “hacks”.

The secret to our approach is given away by the name we chose for our library.

We’ve taken the Word-compatible layout engine we developed for Word document viewing, editing and PDF conversion, and compiled it to WebAssembly (the newish binary format for executing code in Node.js and on the web).
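
To make the idea concrete, here is a minimal sketch of how Node.js loads and runs a WebAssembly binary. It is not docx-wasm itself: the bytes below are a hand-assembled toy module that exports a single `add` function, chosen purely for illustration, and the helper name `loadToyModule` is hypothetical. The only API used is the standard `WebAssembly` object built into Node.js.

```javascript
// Toy WebAssembly module, hand-assembled for illustration only (an assumption,
// not part of docx-wasm). It exports one function: add(a: i32, b: i32) -> i32.
const toyModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" is function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0, local.get 1, i32.add, end
]);

// Hypothetical helper: compile and instantiate a module synchronously,
// then hand back its exports as ordinary JavaScript functions.
function loadToyModule(bytes) {
  const wasmModule = new WebAssembly.Module(bytes);
  const instance = new WebAssembly.Instance(wasmModule, {});
  return instance.exports;
}

const { add } = loadToyModule(toyModuleBytes);
console.log(add(2, 3)); // prints 5
```

Larger modules are usually compiled asynchronously with `WebAssembly.instantiate`, but the principle is the same: the compiled code runs in-process inside Node.js, and no external program or service is involved.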

A single code base is used across environments and use-cases, which means a layout fix for our viewer, say, is likely to bring the same improvement to our PDF output.

In docx-wasm, this code runs wherever you choose to run Node.js. Document manipulation happens locally, so your sensitive documents remain under your control and aren’t entrusted to some random service halfway across the planet.

This release focuses on converting Word documents to PDF. Since our code can handle both docx files and the older binary .doc format, we’ve also included binary .doc to docx conversion.

To give it a try against your documents, fire up our sample code in Node.