Skip to content

Validate XML files against their referenced XML Schemas concurrently and fast

Notifications You must be signed in to change notification settings

FranklinChen/validate-xml-rust

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Validate XML against XML Schema using Rust

Prerequisites

Cargo for Rust is required.

libxml2 needs to be installed.

Installation

$ cargo install --path .

will install validate-xml into $HOME/.cargo/bin.

Usage

Basic usage:

$ validate-xml root_dir 2> log.txt

Detailed usage:

$ validate-xml --help
Validate XML files concurrently and downloading remote XML Schemas only once.

Usage:
  validate-xml [--extension=<extension>] <dir>
  validate-xml (-h | --help)
  validate-xml --version

Options:
  -h --help                Show this screen.
  --version                Show version.
  --extension=<extension>  File extension of XML files [default: cmdi].

Performance

This was written to be super fast:

  • Concurrency, using number of cores available.
  • Use of C FFI with libxml2 library.
  • Downloading of remote XML Schema files only once per run, and caching to memory.
  • Caching of libxml2 schema data structure for reuse in multiple concurrent parses.

On my machine, it takes a few seconds to validate a sample set of 20,000 XML files. This is hundreds of times faster than the first attempt, which was a shell script that sequentially runs xmllint (after first pre-downloading remote XML Schema files and passing the local files to xmllint with --schema). The concurrency is a big win, as is the reuse of the libxml2 schema data structure across threads and avoiding having to spawn xmllint as a heavyweight process.

Comparison

If I had known about Python's lxml binding to C libxml2, I might not have bothered this Rust program. I found out about it and wrote the Python version of this program, which looks very similar (but without all the types): https://github.com/FranklinChen/validate-xml-python

However, as the Rust ecosystem fills in gaps in available libraries, I think I would actually be pretty happy to write in Rust instead of Python.