ProtonMail/go-rfc5322

Outline

The rfc5322 package implements a parser for address-list and date-time strings, as defined in RFC 5322. It also supports encoded words (RFC 2047) and internationalized tokens (RFC 6532).

Generated code

The lexer and parser are generated using ANTLR4. The grammar is defined in the g4 files:

  • RFC5322Parser.g4 defines the parser grammar,
  • RFC5322Lexer.g4 defines the lexer grammar.

These grammars are derived from the ABNF grammar provided in the RFCs mentioned above, albeit with some relaxations added to support "nonstandard" (and in some cases, bad) input.

Running go generate generates a parser which recognises strings conforming to the grammar:

  • parser/rfc5322_lexer.go
  • parser/rfc5322parser_base_listener.go
  • parser/rfc5322_parser.go
  • parser/rfc5322parser_listener.go

The generated parser can then be used to convert a valid address/date into an abstract syntax tree.

Parsing

Once we have an abstract syntax tree, we must turn it into something usable, namely a mail.Address or time.Time.

The generated code in the parser directory implements a walker. This walker walks over the abstract syntax tree, calling one callback when entering each node and another when exiting it. By default, the callbacks are no-ops unless they are overridden.
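The "no-op unless overridden" behaviour can be illustrated without ANTLR at all. The following is a minimal sketch, not the generated code: the node, listener, baseListener, tracer, walk and traceKinds names are all hypothetical stand-ins chosen for this example. It shows how embedding a base type with empty callbacks lets a custom type override only the callbacks it cares about.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a generated parse-tree context.
type node struct {
	kind     string
	children []*node
}

// listener mirrors the general shape of a generated listener interface.
type listener interface {
	EnterNode(n *node)
	ExitNode(n *node)
}

// baseListener provides no-op callbacks, like a generated base listener.
type baseListener struct{}

func (baseListener) EnterNode(*node) {}
func (baseListener) ExitNode(*node)  {}

// tracer embeds baseListener and overrides only EnterNode;
// ExitNode remains the inherited no-op.
type tracer struct {
	baseListener
	seen []string
}

func (t *tracer) EnterNode(n *node) { t.seen = append(t.seen, n.kind) }

// walk visits the tree depth-first, firing the enter callback before
// the children and the exit callback after them.
func walk(l listener, n *node) {
	l.EnterNode(n)
	for _, c := range n.children {
		walk(l, c)
	}
	l.ExitNode(n)
}

// traceKinds returns the node kinds in the order they were entered.
func traceKinds(root *node) []string {
	t := &tracer{}
	walk(t, root)
	return t.seen
}

func main() {
	tree := &node{kind: "dateTime", children: []*node{{kind: "day"}, {kind: "year"}}}
	fmt.Println(traceKinds(tree)) // [dateTime day year]
}
```

Because tracer embeds baseListener, it satisfies the listener interface even though it defines only one of the two callbacks itself.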

walker.go

The walker type extends the base walker, overriding the default no-op callbacks to do something specific when entering and exiting certain nodes.

The goal of the walker is to traverse the syntax tree, picking out relevant information from each node's text. For example, when parsing a mailbox node, the relevant information to pick out from the parse tree is the name and address of the mailbox. This information can appear in a number of different ways, e.g. it might be RFC2047 word-encoded, it might be a string with escaped chars that need to be handled, it might have comments that should be ignored, and so on.
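As an illustration of one of these cases, an RFC 2047 encoded word in a display name can be decoded with the Go standard library's mime.WordDecoder. This is only a sketch of the decoding step in isolation; whether this package decodes encoded words this way is an assumption, and decodeDisplayName is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"mime"
)

// decodeDisplayName decodes any RFC 2047 encoded words found in raw.
// Text that contains no encoded words is returned unchanged.
// (Hypothetical helper, not part of the rfc5322 package.)
func decodeDisplayName(raw string) (string, error) {
	dec := new(mime.WordDecoder)
	return dec.DecodeHeader(raw)
}

func main() {
	name, err := decodeDisplayName("=?UTF-8?Q?J=C3=B6rg_M=C3=BCller?=")
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(name) // Jörg Müller
}
```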

So while walking the syntax tree, each node needs to ask its children what their "value" is. The mailbox needs to ask its child nodes (either a nameAddr node or an addrSpec node) what the name and address are. If the child node is a nameAddr, it needs to ask its displayName child what the name is and the angleAddr what the address is; these in turn ask word nodes, addrSpec nodes, etc.

Each child node is responsible for telling its parent what its own value is. The parent is responsible for assembling the children into something useful.

Ideally, this would be done with the visitor pattern. Unfortunately, the generated parser only provides a walker interface, so we use a stack to turn the walker into a kind of visitor: a node is pushed onto the stack when we enter it and popped off when we exit it.
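A stack of this kind could look roughly like the sketch below. The method names enter, exit and parent match those used in the examples in this document, but their bodies here are an assumption about how such a stack might be written, not a copy of walker.go.

```go
package main

import "fmt"

// walker holds the stack of nodes that are currently being visited.
type walker struct {
	nodes []interface{}
}

// enter pushes a node onto the stack.
func (w *walker) enter(n interface{}) {
	w.nodes = append(w.nodes, n)
}

// exit pops the most recently entered node off the stack and returns it.
func (w *walker) exit() interface{} {
	n := w.nodes[len(w.nodes)-1]
	w.nodes = w.nodes[:len(w.nodes)-1]
	return n
}

// parent returns the node on top of the stack, or nil if the stack is empty.
// Called right after exit, this is the parent of the node just popped.
func (w *walker) parent() interface{} {
	if len(w.nodes) == 0 {
		return nil
	}
	return w.nodes[len(w.nodes)-1]
}

func main() {
	w := &walker{}
	w.enter("mailbox")     // entering the parent node
	w.enter("displayName") // entering a child node
	child := w.exit()      // exiting the child pops it
	fmt.Println(child, "reports to", w.parent()) // displayName reports to mailbox
}
```

Storing nodes as interface{} is what lets a child hand itself to a parent without knowing the parent's concrete type.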

parser.go

This file implements two methods, ParseAddressList(string) ([]*mail.Address, error) and ParseDateTime(string) (time.Time, error).

These methods set up a parser from the raw input, start the walker, and convert the walker result into an object of the correct type.

Example: Parsing dateTime

Parsing a date-time is rather simple. The implementation begins in date_time.go. The abridged code is below:

type dateTime struct {
	year   int
	...
}

func (dt *dateTime) withYear(year *year) {
	dt.year = year.value
}

...

func (w *walker) EnterDateTime(ctx *parser.DateTimeContext) {
	w.enter(&dateTime{
		loc: time.UTC,
	})
}

func (w *walker) ExitDateTime(ctx *parser.DateTimeContext) {
	dt := w.exit().(*dateTime)
	w.res = time.Date(dt.year, ...)
}

As you can see, when the walker reaches a dateTime node, it pushes a dateTime object onto the stack:

w.enter(&dateTime{
	loc: time.UTC,
})

and when it leaves a dateTime node, it pops it off the stack, converts it from interface{} to the concrete type, and uses the parsed dateTime values (day, month, year, and so on) to construct a Go time.Time value, which it sets as the walker result:

dt := w.exit().(*dateTime)
w.res = time.Date(dt.year, ...)

These parsed values were collected while the walker was visiting the children of the date-time node.

Let's see how the walker discovers the year. Here is the abridged code of what happens when the walker enters a year node:

type year struct {
	value int
}

func (w *walker) EnterYear(ctx *parser.YearContext) {
	var text string

	for _, digit := range ctx.AllDigit() {
		text += digit.GetText()
	}

	val, err := strconv.Atoi(text)
	if err != nil {
		w.err = err
	}

	w.enter(&year{
		value: val,
	})
}

When entering the year node, it collects all the raw digits, which are strings, then converts them to an integer, and sets that as the year's integer value while pushing it onto the stack.

When exiting, it pops the year off the stack and hands it to the parent, which is now on top of the stack. It doesn't know the concrete type of the parent; it only checks whether the parent is expecting a year node, that is, whether it implements withYear:

func (w *walker) ExitYear(ctx *parser.YearContext) {
	type withYear interface {
		withYear(*year)
	}

	res := w.exit().(*year)

	if parent, ok := w.parent().(withYear); ok {
		parent.withYear(res)
	}
}

In our case, the dateTime is expecting a year node because it implements withYear,

func (dt *dateTime) withYear(year *year) {
	dt.year = year.value
}

and that is how the dateTime data members are collected.

About

An RFC5322 address/date parser written in Go
