digital-preservation/utf8-validator

UTF-8 Validator

A UTF-8 Validation Tool which may be used as either a command line tool or as a library embedded in your own program.

Released under the BSD 3-Clause Licence.

Use from the Command Line

You can either download the application as a ZIP file or build it from the source code. If you download the ZIP file, extract it to wherever you keep applications on your computer. You can then run either bin/validate.sh (Linux/Mac/Unix) or bin\validate.bat (Windows).

For example, to report all validation errors:

$ cd /opt/utf8-validator-1.2
$ bin/validate.sh /tmp/my-file.txt

For example, to report only the first validation error and then exit:

$ cd /opt/utf8-validator-1.2
$ bin/validate.sh --fail-fast /tmp/my-file.txt

Command Line Exit Codes

  • 0 Success
  • 1 Invalid Arguments provided to the application
  • 2 File was not valid UTF-8
  • 4 IO Error, e.g. could not read file
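
If you call the command line tool from another JVM program instead of embedding the library, the exit codes above can be interpreted along these lines. This is only a sketch: the class name `ValidateExitCodes` and its methods are hypothetical, and the path to the validate script is whatever applies to your installation.

```java
import java.io.IOException;

class ValidateExitCodes {

    /** Maps an exit code from the list above to a short description. */
    static String describe(final int exitCode) {
        switch (exitCode) {
            case 0: return "Success";
            case 1: return "Invalid arguments provided to the application";
            case 2: return "File was not valid UTF-8";
            case 4: return "IO error, e.g. could not read file";
            default: return "Unrecognised exit code " + exitCode;
        }
    }

    /** Runs the given validate script on one file and returns its exit code. */
    static int run(final String script, final String file)
            throws IOException, InterruptedException {
        final Process process = new ProcessBuilder(script, file)
                .inheritIO()   // let the tool's own output reach the console
                .start();
        return process.waitFor();
    }

    public static void main(final String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ValidateExitCodes <path-to-validate-script> <file>");
            return;
        }
        System.out.println(describe(run(args[0], args[1])));
    }
}
```

Any code not in the documented list is reported as unrecognised, not guessed at.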

Use as a Library

The UTF-8 Validator is written in Java and can be used from any JVM language (Java, Scala, Clojure, etc.). We use the Maven build system, and our artifacts are published to Maven Central.

If you are using Maven, you can simply add this to the dependencies section of your pom.xml:

<dependency>
    <groupId>uk.gov.nationalarchives</groupId>
    <artifactId>utf8-validator</artifactId>
    <version>1.2</version>
</dependency>

Alternatively, if you are using sbt, you can add this to your library dependencies:

"uk.gov.nationalarchives" % "utf8-validator" % "1.2"

To use the library, implement the simple interface uk.gov.nationalarchives.utf8.validator.ValidationHandler (or use uk.gov.nationalarchives.utf8.validator.PrintingValidationHandler if it suits you). The interface has a single method, which is called whenever the validator finds a validation error. You can then instantiate Utf8Validator and validate either a java.io.File or a java.io.InputStream. For example:

ValidationHandler handler = new ValidationHandler() {
	@Override
	public void error(final String message, final long byteOffset) throws ValidationException {
		System.err.println("[Error][@" + byteOffset + "] " + message);
	}
};

File f = ... //your file here

new Utf8Validator(handler).validate(f);
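
A handler does not have to print. As a rough sketch, one that collects the errors for later inspection might look like the following. The nested `ValidationHandler` and `ValidationException` types here are local stand-ins that only mirror the method signature shown above, so that the sketch compiles without the library on the classpath; in a real project you would import the library's own types instead. `CollectingHandler`, `errors()` and `isValid()` are hypothetical names and not part of the library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CollectingHandlerSketch {

    // Stand-in for the library's exception type (assumption: replace with the real import).
    static class ValidationException extends Exception {
        ValidationException(final String message) {
            super(message);
        }
    }

    // Stand-in mirroring the single-method interface shown above.
    interface ValidationHandler {
        void error(String message, long byteOffset) throws ValidationException;
    }

    /** Hypothetical handler that records every reported error. */
    static final class CollectingHandler implements ValidationHandler {
        private final List<String> errors = new ArrayList<>();

        @Override
        public void error(final String message, final long byteOffset) {
            errors.add("[Error][@" + byteOffset + "] " + message);
        }

        /** Read-only view of the errors recorded so far. */
        List<String> errors() {
            return Collections.unmodifiableList(errors);
        }

        /** True while no error has been reported. */
        boolean isValid() {
            return errors.isEmpty();
        }
    }

    public static void main(final String[] args) {
        final CollectingHandler handler = new CollectingHandler();
        // In real use the validator makes this call; it is simulated here
        // with a made-up message and offset.
        handler.error("example message", 12L);
        for (final String error : handler.errors()) {
            System.err.println(error);
        }
    }
}
```

With the real library, an instance of such a handler would be passed to the Utf8Validator constructor in the same way as the anonymous handler above, and inspected after validate returns.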

Building from Source Code