chicken-abnf

Parser combinators for Augmented BNF grammars (RFC 4234)

Documentation

The abnf library provides a collection of combinators for constructing parsers for Augmented Backus-Naur Form (ABNF) grammars, as defined in RFC 4234.

Library Procedures

The combinator procedures in this library are based on the interface provided by the lexgen library.

Terminal values and core rules

(char CHAR) => MATCHER

Procedure char builds a pattern matcher function that matches a single character.

(lit STRING) => MATCHER

lit matches a literal string (case-insensitive).

The following primitive parsers match the rules described in RFC 4234, Section 6.1.

(alpha STREAM-LIST) => STREAM-LIST

Matches any character of the alphabet.

(binary STREAM-LIST) => STREAM-LIST

Matches [0..1].

(decimal STREAM-LIST) => STREAM-LIST

Matches [0..9].

(hexadecimal STREAM-LIST) => STREAM-LIST

Matches [0..9] and [A..F,a..f].

(ascii-char STREAM-LIST) => STREAM-LIST

Matches any 7-bit US-ASCII character except for NUL (ASCII value 0).

(cr STREAM-LIST) => STREAM-LIST

Matches the carriage return character.

(lf STREAM-LIST) => STREAM-LIST

Matches the line feed character.

(crlf STREAM-LIST) => STREAM-LIST

Matches the Internet newline.

(ctl STREAM-LIST) => STREAM-LIST

Matches any US-ASCII control character. That is, any character with a decimal value in the range of [0..31,127].

(dquote STREAM-LIST) => STREAM-LIST

Matches the double quote character.

(htab STREAM-LIST) => STREAM-LIST

Matches the tab character.

(lwsp STREAM-LIST) => STREAM-LIST

Matches linear white-space. That is, any number of consecutive wsp, optionally followed by a crlf and (at least) one more wsp.

(sp STREAM-LIST) => STREAM-LIST

Matches the space character.

(vspace STREAM-LIST) => STREAM-LIST

Matches any printable ASCII character. That is, any character in the decimal range of [33..126].

(wsp STREAM-LIST) => STREAM-LIST

Matches space or tab.

(quoted-pair STREAM-LIST) => STREAM-LIST

Matches a quoted pair. Any characters (excluding CR and LF) may be quoted.

(quoted-string STREAM-LIST) => STREAM-LIST

Matches a quoted string. The backslash and double quote characters must be escaped inside a quoted string; CR and LF are not allowed at all.

The following additional procedures are provided for convenience:

(set CHAR-SET) => MATCHER

Matches any character from an SRFI-14 character set.

(set-from-string STRING) => MATCHER

Matches any character from a set defined as a string.
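As a rough illustration of the terminal constructors described above, the sketch below defines three small matchers. The names comma, get-keyword and vowel are hypothetical and exist only for this example; it also assumes that (import abnf) is enough to bring the constructors into scope, as in the date and time example later in this README.

```scheme
(import abnf)

;; A single comma character, built with char.
(define comma (char #\,))

;; The keyword "GET"; lit is documented above as case-insensitive.
(define get-keyword (lit "GET"))

;; Any one character taken from the string "aeiou".
(define vowel (set-from-string "aeiou"))
```

Each definition yields a matcher that can be passed to the operators described in the next section.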

Operators

(concatenation MATCHER-LIST) => MATCHER

concatenation matches an ordered list of rules. (RFC 4234, Section 3.1)

(alternatives MATCHER-LIST) => MATCHER

alternatives matches any one of the given list of rules. (RFC 4234, Section 3.2)

(range C1 C2) => MATCHER

range matches a range of characters. (RFC 4234, Section 3.4)

(variable-repetition MIN MAX MATCHER) => MATCHER

variable-repetition matches between MIN and MAX consecutive elements that match the given rule. (RFC 4234, Section 3.6)

(repetition MATCHER) => MATCHER

repetition matches zero or more consecutive elements that match the given rule.

(repetition1 MATCHER) => MATCHER

repetition1 matches one or more consecutive elements that match the given rule.

(repetition-n N MATCHER) => MATCHER

repetition-n matches exactly N consecutive occurrences of the given rule. (RFC 4234, Section 3.7)

(optional-sequence MATCHER) => MATCHER

optional-sequence matches the given optional rule. (RFC 4234, Section 3.8)
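The operators above map fairly directly onto ABNF notation. The following sketch translates a few small rules, with the ABNF form shown in the comments. The rule names (sign, integer, octet, dotted-quad, lowercase) are made up for illustration. The sketch follows the calling convention used in this README's date and time example, where matchers are passed to concatenation and alternatives as separate arguments, and it assumes that range takes two characters.

```scheme
(import abnf)

;; sign = "+" / "-"
(define sign (alternatives (char #\+) (char #\-)))

;; integer = [sign] 1*DIGIT
(define integer
  (concatenation
   (optional-sequence sign)
   (repetition1 decimal)))

;; octet = 1*3DIGIT
(define octet (variable-repetition 1 3 decimal))

;; dotted-quad = octet 3("." octet)
(define dotted-quad
  (concatenation
   octet
   (repetition-n 3 (concatenation (char #\.) octet))))

;; lowercase = %x61-7A (assumes C1 and C2 are given as characters)
(define lowercase (range #\a #\z))
```

Note that octet only constrains the number of digits; checking that the value is at most 255 would have to happen outside the grammar.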

(pass) => MATCHER

This matcher returns without consuming any input.

(bind F P) => MATCHER

Given a rule P and function F, returns a matcher that first applies P to the input stream, then applies F to the returned list of consumed tokens, and returns the result and the remainder of the input stream.

Note: this combinator will signal failure if the input stream is empty.

(bind* F P) => MATCHER

The same as bind, but will signal success if the input stream is empty.

(drop-consumed P) => MATCHER

Given a rule P, returns a matcher that always returns an empty list of consumed tokens when P succeeds.
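A minimal sketch of bind and drop-consumed follows. The helper count-consumed and the rule names are hypothetical. The sketch assumes that F should return a list that stands in for the consumed tokens; it deliberately avoids depending on the order in which consumed tokens are handed to F, since that order is not specified here.

```scheme
(import abnf)

;; Hypothetical helper: replace whatever was consumed with its length.
;; Returning a list is an assumption about what F is expected to produce.
(define (count-consumed consumed)
  (list (length consumed)))

;; One or more digits, reported as a count rather than the digits themselves.
(define digit-count (bind count-consumed (repetition1 decimal)))

;; A comma that must be present in the input but leaves no consumed tokens.
(define separator (drop-consumed (char #\,)))
```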

Abbreviated syntax

abnf supports the following abbreviations for commonly used combinators:

  • :: abbreviates concatenation
  • :? abbreviates optional-sequence
  • :! abbreviates drop-consumed
  • :s abbreviates lit
  • :c abbreviates char
  • :* abbreviates repetition
  • :+ abbreviates repetition1
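To show how some of the abbreviations read in practice, the sketch below defines a few matchers twice, once with the full combinator names and once with the short forms. All names ending in -long and -short are hypothetical; each pair is expected to describe the same rule, assuming the abbreviations are plain aliases for the combinators they stand for.

```scheme
(import abnf)

;; A colon that is matched but dropped from the consumed tokens.
(define colon-long  (drop-consumed (char #\:)))
(define colon-short (:! (:c #\:)))

;; One or more decimal digits.
(define digits-long  (repetition1 decimal))
(define digits-short (:+ decimal))

;; An optional month literal followed by any amount of whitespace
;; is left out here; only single-combinator pairs are shown.
(define month-long  (optional-sequence (lit "Jan")))
(define month-short (:? (:s "Jan")))

;; Zero or more space or tab characters.
(define blanks-long  (repetition wsp))
(define blanks-short (:* wsp))
```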

Examples

The following parser libraries have been implemented with abnf, in order of complexity:

  • csv
  • internet-timestamp
  • json-abnf
  • mbox
  • smtp
  • internet-message
  • mime

Parsing date and time

(import abnf)

(define fws
  (concatenation
   (optional-sequence 
    (concatenation
     (repetition wsp)
     (drop-consumed 
      (alternatives crlf lf cr))))
   (repetition1 wsp)))

(define (between-fws p)
  (concatenation
   (drop-consumed (optional-sequence fws)) p 
   (drop-consumed (optional-sequence fws))))

;; Date and Time Specification from RFC 5322 (Internet Message Format)

;; The following abnf parser combinators parse a date and time
;; specification of the form
;;
;;   Thu, 19 Dec 2002 20:35:46 +0200
;;
;; where the weekday specification is optional.
			     
;; Match the abbreviated weekday names

(define day-name 
  (alternatives
   (lit "Mon")
   (lit "Tue")
   (lit "Wed")
   (lit "Thu")
   (lit "Fri")
   (lit "Sat")
   (lit "Sun")))

;; Match a day-name, optionally wrapped in folding whitespace

(define day-of-week (between-fws day-name))


;; Match a four digit decimal number

(define year (between-fws (repetition-n 4 decimal)))

;; Match the abbreviated month names

(define month-name (alternatives
		    (lit "Jan")
		    (lit "Feb")
		    (lit "Mar")
		    (lit "Apr")
		    (lit "May")
		    (lit "Jun")
		    (lit "Jul")
		    (lit "Aug")
		    (lit "Sep")
		    (lit "Oct")
		    (lit "Nov")
		    (lit "Dec")))

;; Match a month-name, optionally wrapped in folding whitespace

(define month (between-fws month-name))


;; Match a one or two digit number

(define day (concatenation
	     (drop-consumed (optional-sequence fws))
	     (alternatives 
	      (variable-repetition 1 2 decimal)
	      (drop-consumed fws))))

;; Match a date of the form dd Mon yyyy, e.g. 19 Dec 2002
(define date (concatenation day month year))

;; Match a two-digit number 

(define hour      (repetition-n 2 decimal))
(define minute    (repetition-n 2 decimal))
(define isecond   (repetition-n 2 decimal))

;; Match a time-of-day specification of hh:mm or hh:mm:ss.

(define time-of-day (concatenation
		     hour (drop-consumed (char #\:))
		     minute (optional-sequence 
			     (concatenation (drop-consumed (char #\:))
 					 isecond))))

;; Match a timezone specification of the form
;; +hhmm or -hhmm 

(define zone (concatenation 
	      (drop-consumed fws)
	      (alternatives (char #\-) (char #\+))
	      hour minute))

;; Match a time-of-day specification followed by a zone.

(define itime (concatenation time-of-day zone))

(define date-time (concatenation
		   (optional-sequence
		    (concatenation
		     day-of-week
		     (drop-consumed (char #\,))))
		   date
		   itime
		   (drop-consumed (optional-sequence fws))))

(define (err s)
  (print "lexical error on stream: " s)
  `(error))

(print (lex date-time err "Thu, 19 Dec 2002 20:35:46 +0200"))

Version History

  • 8.3 Removed unneeded dependency on yasos [thanks to Mario Domenech Goulart]
  • 8.0 Ported to CHICKEN 5 and yasos collections interface
  • 7.0 Added bind* variant of bind [thanks to Peter Bex]
  • 6.0 Using utf8 for char operations
  • 5.1 Improvements to the CharLex->CoreABNF constructor
  • 5.0 Synchronized with lexgen 5
  • 3.2 Removed invalid identifier :|
  • 3.0 Implemented typeclass interface
  • 2.9 Bug fix in consumed-objects (reported by Peter Bex)
  • 2.7 Added abbreviated syntax (suggested by Moritz Heidkamp)
  • 2.6 Bug fixes in consumer procedures
  • 2.5 Removed procedure memo
  • 2.4 Moved the definition of bind and drop to lexgen
  • 2.2 Added pass combinator
  • 2.1 Added procedure variable-repetition
  • 2.0 Updated to match the interface of lexgen 2.0
  • 1.3 Fix in drop
  • 1.2 Added procedures bind drop consume collect
  • 1.1 Added procedures set and set-from-string
  • 1.0 Initial release

License

Copyright 2009-2021 Ivan Raikov

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

A full copy of the GPL license can be found at http://www.gnu.org/licenses/.