Skip to content

Merge shell scripts using a special comment tag, allowing for modular shell script development

License

Notifications You must be signed in to change notification settings

wpyoga/merge-shell

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 

Repository files navigation

merge-shell

This utility merges modular shell scripts into one big shell script for easy distribution. The merged shell script is a singular shell script that can be streamed using curl and executed directly by sh, like this:

$ curl https://example.com/script.sh | sh

I have set up a few forked repositories to showcase merge-shell functionality:

Overview

Sometimes, shell utilities are developed in a monolithic manner, with a big script containing all the code to the project. It's understandable that some developers have chosen this development pattern, because it's easier to distribute a single shell script, rather than multiple files. A user can simply use a one-liner like this to execute the script directly:

$ curl https://github.com/wpyoga/script-example/my_script.sh | sh

The shell script can be very long, sometimes reaching thousands of lines in a single file. Functions are often interspersed with calling code. This kind of development pattern may cause a few problems:

  • the code becomes difficult to reason about
  • developers have to navigate a big shell script just to find and edit a section of the code
  • as a result, it is difficult to maintain the code, so bug fixes and feature additions become slower over time
  • it is often impossible to do unit tests
  • another side effect is that new developers can sometimes struggle to understand the code, thus raising the bar for potential contributors
  • during development, accidental keystrokes changing unintended parts of the script might go unnoticed, especially when the changeset is quite large -- leading to hard-to-find bugs

In contrast, projects written in other languages are often written in a modular way. Projects written in compiled languages (C, Java, Rust, ...) are most of the time modular, and even scripting languages like Python and JavaScript are usually written in a modular way. This really helps with the readability and maintainability of the code.

It's not impossible to develop shell scripts in a modular way, and I've actually read about another approach of self-extracting shell script. In that scenario, script components are packaged in a compressed tarball, which is then embedded as a base64-encoded string inside the script. When the script is run, it decodes the string and extracts the archive, the contents of which is then executed. I remember that it was hosted on GitHub -- however, after searching for a few days, I still couldn't find the project. Please let me know if you have information about that project.

Mechanism

Our solution, merge-shell takes a single shell script as its argument, parses it, and prints the merged script to stdout. It works by parsing annotations inside comments, so that the split scripts are still valid shell scripts, and can work just like the merged script. This is useful during development, enabling the developer to test the script without merging them.

There are currently 3 defined annotations:

  • @MERGE for merging sub-scripts into the main script
  • @HEREDOC and @HEREDOC-END to maintain proper indentation for here-documents
  • @MULTILINE and @MULTILINE-END to maintain proper indentation for multi-line strings, respectively

With @HEREDOC and @MULTILINE annotations, merge-shell does not need to parse shell syntax to recognize here-documents and multi-line strings.

Please let me know if you see any other shell constructs that need their own custom annotations.

@MERGE

The @MERGE annotation means that the following dot line is to be merged into the main script. The merging is done recursively, and the indentation is carried over, so that the sourced file will be indented just like the dot line. The annotation itself is not carried over to the output script.

The @MERGE annotation should match this regex:

^\s*#\s*@MERGE\s*$

The dot line should match this regex:

^\s*\.\s+[a-zA-Z0-9_\.][a-zA-Z0-9_\.\-]+\s*$

If the next line does not match the regex, then it will be treated as any other line.

Example usage:

# @MERGE
. options.sh

if [ -f "/etc/myfile.conf" ]; then
    # @MERGE
    . load-custom.sh
fi

Notes:

  • In this mode, merge-shell recursively merges each sourced script, recursively indented according to the source line indentation.
  • For each sourced script, the shebang and one empty line immediately following the shebang are discarded.
  • For simplicity, the sourced script file name can only consist of letters, numbers, underscore, dot, or dash. And it cannot start with a dash.
  • If the @MERGE annotation is not specified, then the file will not be merged. This may be useful when sourcing config files at runtime.

@HEREDOC and @HEREDOC-END

This annotation marks the start and end of a here-document. Between @HEREDOC and @HEREDOC-END, only the first line is indented, and all the other lines are not indented. Thus ensuring that the here-document contents are correct, and that the here-document EOF marker can be identified correctly.

The annotations should match these regexes:

^\s*#\s*@HEREDOC\s*$
^\s*#\s*@HEREDOC-END\s*$

Note that if the @HEREDOC-END annotation is not given, then the rest of the sub-script file is treated as a here-document, thus not indented at all.

Example usage:

    # @HEREDOC
    cat >custom.conf <<EOF
[default]
option1 = true
option2 = 2.5
EOF
    # @HEREDOC-END

Notes:

  • Avoid having a line that matches the HEREDOC-END line inside the here-document, otherwise the HEREDOC mode will end prematurely, and the following lines will be indented, thus breaking the here-document syntax.

@MULTILINE and @MULTILINE-END

This annotation has the same effect as @HEREDOC -- only the first line is indented. The annotation should match these regular expressions:

^\s*#\s*@MULTILINE\s*$
^\s*#\s*@MULTILINE-END\s*$

Example usage:

    # @MULTILINE
    echo '
# ip config
ip 192.168.0.105
netmask 255.255.255.0' >ip-set.conf
    # @MULTILINE-END

Caveats

Files that don't have an ending newline won't be read properly. According to POSIX:

3.206 Line

A sequence of zero or more non- <newline> characters plus a terminating <newline> character.

If the last line does not have a terminating <newline> character, it won't be recognized as a line, thus read won't be able to read it, and will not be processed. The proper solution is to always have a terminating <newline>.

This problem is observed on files created using Windows. To avoid it, always add an extra newline at the end of source files, like this:

line 1
line 2
last line is an empty line

Instead of just this:

line 1
line 2
last line without extra newline

Alternative Implementation(s)

It might be a good idea to reimplement this using Python.

TODO

The code needs to be refactored.

About

Merge shell scripts using a special comment tag, allowing for modular shell script development

Resources

License

Stars

Watchers

Forks

Packages

No packages published