Fix dotenv parsing inconsistency
Cherry-picked from:

Add standard newline/quoting behavior to dotenv store (getsops#622)
4507019

Closes issue:
https://github.com/arrikto/rok/issues/5563

Rationale
=========

The dotenv store as it exists right now performs splitting on newlines
to determine where a new key-value pair or comment begins. This works
remarkably well, up until you need to handle values that contain
newlines.

While I couldn't find an official dotenv file format spec, I sampled a
number of open-source dotenv parsers and it seems that they typically
apply the following rules:

Comments:

* Comments may be written by starting a line with the `#` character.

Newline handling:

* If a value is unquoted or single-quoted and contains the character
  sequence `\n` (`0x5c6e`), it IS NOT decoded to a line feed (`0x0a`).

* If a value is double-quoted and contains the character sequence `\n`
  (`0x5c6e`), it IS decoded to a line feed (`0x0a`).

Whitespace trimming:

* For comments, the whitespace immediately after the `#` character and any
  trailing whitespace is trimmed.

* If a value is unquoted and contains any leading or trailing whitespace, it
  is trimmed.

* If a value is either single- or double-quoted and contains any leading or
  trailing whitespace, it is left untrimmed.

Quotation handling:

* If a value is surrounded by single- or double-quotes, the quotation marks
  are interpreted and not included in the value.

* Any number of single-quote characters may appear in a double-quoted
  value, or within a single-quoted value if they are escaped (e.g.,
  `'foo\'bar'`).

* Any number of double-quote characters may appear in a single-quoted
  value, or within a double-quoted value if they are escaped (e.g.,
  `"foo\"bar"`).

Because single- and double-quoted values may contain actual newlines,
we cannot split our input data on newlines, as a newline may fall in the
middle of a quoted value. This, along with the other rules around handling
quoted values, prompted me to try to implement a more robust parsing
solution. This commit is my first stab at that.
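
To make the failure mode concrete, here is a minimal sketch (not part of this commit) of what newline-based splitting does to a double-quoted value that spans two lines. The input data and the `splitLines` helper are made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// splitLines mimics the line-based approach: every newline is treated
// as a record boundary, whether or not it sits inside quotes.
func splitLines(data []byte) [][]byte {
	return bytes.Split(data, []byte("\n"))
}

func main() {
	// Two logical key-value pairs; the first value contains a real newline.
	data := []byte("FOO=\"first\nsecond\"\nBAR=baz")

	for i, line := range splitLines(data) {
		fmt.Printf("%d: %q\n", i, line)
	}
}
```

The two pairs come back as three fragments, and the first value is cut in half at the embedded newline, with its closing quote stranded on a line of its own.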

Special Considerations
======================

This is _not_ a backwards-compatible change:

* The `dotenv` files produced by this version of SOPS _cannot_ be read
  by an earlier version.

* The `dotenv` files produced by an earlier version of SOPS _can_ be
  read by this version, with the understanding that the semantics around
  quotations and newlines have changed.

Examples
========

The examples below show how double-quoted values are passed to the
running environment:

```console
$ echo 'FOO="foo\\nbar\\nbaz"' > plaintext.env
$ sops -e --output ciphertext.env plaintext.env
$ sops exec-env ciphertext.env 'env | grep FOO | xxd'
00000000: 464f 4f3d 666f 6f5c 6e62 6172 5c6e 6261  FOO=foo\nbar\nba
00000010: 7a0a                                     z.
```

```console
$ echo 'FOO="foo\nbar\nbaz"' > plaintext.env
$ sops -e --output ciphertext.env plaintext.env
$ sops exec-env ciphertext.env 'env | grep -A2 FOO | xxd'
00000000: 464f 4f3d 666f 6f0a 6261 720a 6261 7a0a  FOO=foo.bar.baz.
```
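
The decoding behind the second hexdump (a `0x0a` byte between the words) can be sketched in isolation. The snippet below is an illustrative restatement of the quoting rules listed in the rationale, not the code from this commit; `decodeQuoted` is a hypothetical helper that receives only the text between the quotes.

```go
package main

import (
	"fmt"
	"strings"
)

// decodeQuoted decodes the body of a quoted value. quote is '\'' or '"'.
// Rules applied:
//   - an escaped quote character yields the bare quote character;
//   - `\n` becomes a line feed only inside double quotes;
//   - any other backslash sequence is kept verbatim.
func decodeQuoted(body string, quote byte) string {
	var b strings.Builder
	escaping := false

	for i := 0; i < len(body); i++ {
		c := body[i]
		if !escaping && c == '\\' {
			escaping = true
			continue
		}
		if escaping {
			switch {
			case c == 'n' && quote == '"':
				c = '\n' // decode to a line feed
			case c != quote:
				b.WriteByte('\\') // not a recognized escape: keep the backslash
			}
			escaping = false
		}
		b.WriteByte(c)
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", decodeQuoted(`foo\nbar`, '"'))  // line feed decoded
	fmt.Printf("%q\n", decodeQuoted(`foo\nbar`, '\'')) // backslash and n kept
	fmt.Printf("%q\n", decodeQuoted(`foo\"bar`, '"'))  // escaped quote unwrapped
}
```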
Karl Schriek committed Mar 8, 2022
1 parent 96daa0b commit 6fe356f
Showing 4 changed files with 392 additions and 62 deletions.
20 changes: 12 additions & 8 deletions cmd/sops/subcommand/exec/exec.go
@@ -1,13 +1,14 @@
 package exec

 import (
-	"bytes"
+	"fmt"
 	"io/ioutil"
 	"os"
 	"runtime"
 	"strings"

 	"go.mozilla.org/sops/v3/logging"
+	"go.mozilla.org/sops/v3/stores/dotenv"

 	"github.com/sirupsen/logrus"
 )
@@ -85,15 +86,18 @@ func ExecWithEnv(opts ExecOpts) error {
 	}

 	env := os.Environ()
-	lines := bytes.Split(opts.Plaintext, []byte("\n"))
-	for _, line := range lines {
-		if len(line) == 0 {
-			continue
-		}
-		if line[0] == '#' {
+	store := dotenv.Store{}
+
+	branches, err := store.LoadPlainFile(opts.Plaintext)
+	if err != nil {
+		log.Fatal(err)
+	}
+
+	for _, item := range branches[0] {
+		if item.Value == nil {
 			continue
 		}
-		env = append(env, string(line))
+		env = append(env, fmt.Sprintf("%s=%s", item.Key, item.Value))
 	}

 	cmd := BuildCommand(opts.Command)
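
The effect of this hunk is that environment entries are built from parsed items rather than from raw lines, so a decoded value may contain a real line feed. A minimal, self-contained sketch of that loop follows; the `item` type is a hypothetical stand-in for `sops.TreeItem`, and `appendEnv` is an illustrative helper that does not exist in the commit.

```go
package main

import "fmt"

// item is a hypothetical stand-in for sops.TreeItem. In the hunk above,
// entries with a nil Value are skipped; here they represent comments.
type item struct {
	Key   interface{}
	Value interface{}
}

// appendEnv appends one KEY=VALUE entry per non-comment item.
func appendEnv(env []string, items []item) []string {
	for _, it := range items {
		if it.Value == nil {
			continue // comment: nothing to export
		}
		env = append(env, fmt.Sprintf("%s=%s", it.Key, it.Value))
	}
	return env
}

func main() {
	items := []item{
		{Key: "a comment", Value: nil},
		{Key: "FOO", Value: "foo\nbar"}, // value already decoded by the parser
	}
	for _, e := range appendEnv(nil, items) {
		fmt.Printf("%q\n", e)
	}
}
```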
311 changes: 311 additions & 0 deletions stores/dotenv/parser.go
@@ -0,0 +1,311 @@
package dotenv

// The dotenv parser is designed around the following rules:
//
// Comments:
//
// * Comments may be written by starting a line with the `#` character.
// End-of-line comments are not currently supported, as there is no way to
// encode a comment's position in a `sops.TreeItem`.
//
// Newline handling:
//
// * If a value is unquoted or single-quoted and contains the character
// sequence `\n` (`0x5c6e`), it IS NOT decoded to a line feed (`0x0a`).
//
// * If a value is double-quoted and contains the character sequence `\n`
// (`0x5c6e`), it IS decoded to a line feed (`0x0a`).
//
// Whitespace trimming:
//
// * For comments, the whitespace immediately after the `#` character and any
// trailing whitespace is trimmed.
//
// * If a value is unquoted and contains any leading or trailing whitespace, it
// is trimmed.
//
// * If a value is either single- or double-quoted and contains any leading or
// trailing whitespace, it is left untrimmed.
//
// Quotation handling:
//
// * If a value is surrounded by single- or double-quotes, the quotation marks
// are interpreted and not included in the value.
//
// * Any number of single-quote characters may appear in a double-quoted
// value, or within a single-quoted value if they are escaped (i.e.,
// `'foo\'bar'`).
//
// * Any number of double-quote characters may appear in a single-quoted
// value, or within a double-quoted value if they are escaped (i.e.,
// `"foo\"bar"`).

import (
	"bytes"
	"fmt"
	"io"
	"regexp"
	"strings"

	"go.mozilla.org/sops/v3"
)

var KeyRegexp = regexp.MustCompile(`^[A-Za-z_]+[A-Za-z0-9_]*$`)

func parse(data []byte) (items []sops.TreeItem, err error) {
	reader := bytes.NewReader(data)

	for {
		var b byte
		var item *sops.TreeItem

		b, err = reader.ReadByte()

		if err != nil {
			break
		}

		if isWhitespace(b) {
			continue
		}

		if b == '#' {
			item, err = parseComment(reader)
		} else {
			reader.UnreadByte()
			item, err = parseKeyValue(reader)
		}

		if err != nil {
			break
		}

		if item == nil {
			continue
		}

		items = append(items, *item)
	}

	if err == io.EOF {
		err = nil
	}

	return
}

func parseComment(reader io.ByteScanner) (item *sops.TreeItem, err error) {
	var builder strings.Builder
	var whitespace bytes.Buffer

	for {
		var b byte
		b, err = reader.ReadByte()

		if err != nil {
			break
		}

		if b == '\n' {
			break
		}

		if isWhitespace(b) {
			whitespace.WriteByte(b)
			continue
		}

		if builder.Len() == 0 {
			whitespace.Reset()
		}

		_, err = io.Copy(&builder, &whitespace)

		if err != nil {
			break
		}

		builder.WriteByte(b)
	}

	if builder.Len() == 0 {
		return
	}

	item = &sops.TreeItem{Key: sops.Comment{builder.String()}, Value: nil}
	return
}

func parseKeyValue(reader io.ByteScanner) (item *sops.TreeItem, err error) {
	var key, value string

	key, err = parseKey(reader)
	if err != nil {
		return
	}

	value, err = parseValue(reader)
	if err != nil {
		return
	}

	item = &sops.TreeItem{Key: key, Value: value}
	return
}

func parseKey(reader io.ByteScanner) (key string, err error) {
	var builder strings.Builder

	for {
		var b byte
		b, err = reader.ReadByte()

		if err != nil {
			break
		}

		if b == '=' {
			break
		}

		builder.WriteByte(b)
	}

	key = builder.String()

	if !KeyRegexp.MatchString(key) {
		err = fmt.Errorf("invalid dotenv key: %q", key)
	}

	return
}

func parseValue(reader io.ByteScanner) (value string, err error) {
	var first byte
	first, err = reader.ReadByte()

	if err != nil {
		return
	}

	if first == '\'' {
		return parseSingleQuoted(reader)
	}

	if first == '"' {
		return parseDoubleQuoted(reader)
	}

	reader.UnreadByte()
	return parseUnquoted(reader)
}

func parseSingleQuoted(reader io.ByteScanner) (value string, err error) {
	var builder strings.Builder
	escaping := false

	for {
		var b byte
		b, err = reader.ReadByte()

		if err != nil {
			break
		}

		if !escaping && b == '\'' {
			break
		}

		if !escaping && b == '\\' {
			escaping = true
			continue
		}

		if escaping && b != '\'' {
			builder.WriteByte('\\')
		}

		escaping = false
		builder.WriteByte(b)
	}

	value = builder.String()
	return
}

func parseDoubleQuoted(reader io.ByteScanner) (value string, err error) {
	var builder strings.Builder
	escaping := false

	for {
		var b byte
		b, err = reader.ReadByte()

		if err != nil {
			break
		}

		if !escaping && b == '"' {
			break
		}

		if !escaping && b == '\\' {
			escaping = true
			continue
		}

		if escaping && b == 'n' {
			b = '\n'
		} else if escaping && b != '"' {
			builder.WriteByte('\\')
		}

		escaping = false
		builder.WriteByte(b)
	}

	value = builder.String()
	return
}

func parseUnquoted(reader io.ByteScanner) (value string, err error) {
	var builder strings.Builder
	var whitespace bytes.Buffer

	for {
		var b byte
		b, err = reader.ReadByte()

		if err != nil {
			break
		}

		if b == '\n' {
			break
		}

		if isWhitespace(b) {
			whitespace.WriteByte(b)
			continue
		}

		if builder.Len() == 0 {
			whitespace.Reset()
		}

		_, err = io.Copy(&builder, &whitespace)

		if err != nil {
			break
		}

		builder.WriteByte(b)
	}

	value = builder.String()
	return
}

func isWhitespace(b byte) bool {
	return b == ' ' || b == '\t' || b == '\r' || b == '\n'
}
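
Both `parseComment` and `parseUnquoted` trim whitespace by holding it in a side buffer and flushing it only when a later non-whitespace byte arrives. The sketch below restates that technique on a plain string so it can be run on its own. It is a simplified illustration rather than the commit's code: `trimDeferred` and `isSpace` are hypothetical names, and it uses an explicit reset where the parser relies on `io.Copy` draining the buffer.

```go
package main

import (
	"fmt"
	"strings"
)

func isSpace(b byte) bool {
	return b == ' ' || b == '\t' || b == '\r' || b == '\n'
}

// trimDeferred returns line without leading or trailing whitespace while
// keeping interior whitespace intact. Whitespace is parked in a pending
// buffer and written out only if a non-whitespace byte follows it.
func trimDeferred(line string) string {
	var out strings.Builder
	var pending strings.Builder

	for i := 0; i < len(line); i++ {
		b := line[i]
		if isSpace(b) {
			pending.WriteByte(b)
			continue
		}
		if out.Len() > 0 {
			out.WriteString(pending.String()) // interior whitespace is kept
		}
		pending.Reset() // leading whitespace is dropped; flushed whitespace is cleared
		out.WriteByte(b)
	}
	// Whatever is still pending here is trailing whitespace and is discarded.
	return out.String()
}

func main() {
	fmt.Printf("%q\n", trimDeferred("  foo  bar \t"))
}
```

Trailing whitespace never reaches the output because nothing follows it to trigger a flush, which avoids a second pass over the value.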
