Skip to content

Latest commit

 

History

History
229 lines (140 loc) · 9.12 KB

.verb.md

File metadata and controls

229 lines (140 loc) · 9.12 KB

globbing

Cheatsheet and introduction to "globbing", a programming concept that involves the use of wildcards and special characters for matching and filtering.

Table of contents

Contributing

We love contributors!

Pull requests to add documentation, links to tools, corrections or anything else are warmly accepted and gratefully appreciated!

Too busy to contribute directly, but still want to show your support? Please consider starring this project or tweeting about it!

What is "globbing"?

The term "globbing", also referred to as "glob matching" or "filepath expansion", is a programming concept that describes the process of using wildcards, referred to as "glob patterns" or "globs", for matching file paths or other similar sets of strings.

Similar to regular expressions, but much simpler and limited in scope, glob patterns are defined using a combination of special characters, or wildcards, alongside literal (non-matching) characters. For example, the glob pattern *.txt would match all files in a directory with a .txt file extension.

Globbing syntax

TODO: describe wildcards vs. extended globbing, and segue to following sections

  • wildcards
  • extended globbing

Wildcards

The commonly supported characters supported across globbing implementations for basic wildcard matching are *, ? and a simplified version of regex brackets for matching any of a given set of characters.

Although many different globbing implementations exist across a number of different languages and environments, the following table summarizes the most commonly supported basic globbing features.

Character Description
* Matches any character zero or more times, except for /.
** Matches any character zero or more times, including /.
? Matches any character one time
[abc] Matches any of the specified characters (in this case, a, b or c)

Special exceptions:

  • * typically does not match dotfiles (file names starting with a .) unless explicitly enabled by the user via options
  • ? also typically does not match the leading dot
  • More than two stars in a glob path segment are typically interpreted as a single star (e.g. /***/ is the same as /*/)

Implementations

Globbing is typically enabled using third-party libraries. A notable exception, [bash][], provides built-in support for basic globbing.

TODO: add list of implementations

Additional reading

Extended globbing

In addition to wildcard matching, extended globbing describes the addition of more advanced matching features, such as:

  • brace expansion
  • extglobs
  • POSIX character classes
  • regular expressions

TBC...

brace expansion

In simple cases, brace expansion appears to work the same way as the logical OR operator. For example, (a|b) will achieve the same result as {a,b}.

Here are some powerful features unique to brace expansion (versus character classes):

  • range expansion: a{1..3}b/*.js expands to: ['a1b/*.js', 'a2b/*.js', 'a3b/*.js']
  • nesting: a{c,{d,e}}b/*.js expands to: ['acb/*.js', 'adb/*.js', 'aeb/*.js']

TBC...

Visit the [braces][braces] library for more examples and information about brace expansion.

extglobs

TBC...

As described by the bash man page:

pattern regex equivalent matches
?(foo) (foo)? zero or one occurrence of the given patterns
*(foo) (foo)* zero or more occurrences of the given patterns
+(foo) (foo)+ one or more occurrences of the given patterns
@(foo) (foo) * one of the given patterns
!(foo) ^(?:(?!(foo)$).*?)$ anything except one of the given patterns

* Note that @ isn't a RegEx character.

Example implementations

  • extglob - extended glob parser and matcher for node.js

POSIX character classes

POSIX character classes, or "bracket expressions", provide a way of defining regular expressions using something closer to plain english.

For example, the pattern [[:alpha:][:digit:]] would match a1, but not aa.

TBC...

Example implementations

  • expand-brackets, node.js API for parsing and matching POSIX bracket expressions

Regular expressions

This section describes matching using regular expressions as it relates to globbing.

regex character classes

When regex character classes are used in glob patterns, with the exception of brace expansion ({a,b}, {1..5}, etc), most of the special characters convert directly to regex, so you can expect them to follow the same rules and produce the same results as regex.

For example, given the list: ['a.js', 'b.js', 'c.js', 'd.js', 'E.js']:

  • [ac].js: matches both a and c, returning ['a.js', 'c.js']
  • [b-d].js: matches from b to d, returning ['b.js', 'c.js', 'd.js']
  • [b-d].js: matches from b to d, returning ['b.js', 'c.js', 'd.js']
  • a/[A-Z].js: matches and uppercase letter, returning ['a/E.md']

However, there is

Learn about [regex character classes][character-classes].

regex groups

Given ['a.js', 'b.js', 'c.js', 'd.js', 'E.js']:

  • (a|c).js: would match either a or c, returning ['a.js', 'c.js']
  • (b|d).js: would match either b or d, returning ['b.js', 'd.js']
  • (b|[A-Z]).js: would match either b or an uppercase letter, returning ['b.js', 'E.js']

As with regex, parentheses can be nested, so patterns like ((a|b)|c)/b will work. But it might be easier to achieve your goal using brace expansion.

Common options

The following options are commonly available on various globbing implementations.

Option name Description
extglob Enable extended globs. In addition to the traditional globs (using wildcards: *, *, ? and [...]), extended globs add (almost) the expressive power of regular expressions, allowing the use of patterns like `foo/!(a
dotglob Allows files beginning with . to be included in matches. This option is automatically enabled if the glob pattern begins with a dot. Aliases: dot (supported by: [minimatch][], [micromatch][])
failglob report an error when no matches are found
globignore allows you to specify patterns a glob should not match Aliases: ignore (supported by: [minimatch][], [micromatch][])
globstar recursively match directory paths (enabled by default in [minimatch][] and [micromatch][], but not in [bash][])
nocaseglob perform case-insensitive pathname expansion
nocasematch perform case-insensitive matching. Aliases: nocase (supported by: [minimatch][], [micromatch][])
nullglob when enabled, the pattern itself will be returned when no matches are found. Aliases: nonull (supported by: [minimatch][], [micromatch][])

Caveats

WIP

Like regular expressions, glob patterns are a type of regular language that must first be interpreted by a computer program before any actual matching takes place. This introduces two areas of risk:

  • interpreting patterns:
  • performing matches:

Interpreting patterns

TODO (The process of parsing and compiling a glob pattern into a regular expression...)

Performing matches

TODO (The process of matching the compiled regular expression against a set of strings)

Risks

  • Intentional denial-of-service (DoS) attacks from hackers
  • Unintentional denial-of-service (DoS) attacks from agressively greedy wildcard patterns

Resources

Learn more about globbing.

Guides and documentation

Tools and software

Name Programming language
[bash-glob][] JavaScript/node.js
[brace-expansion][] JavaScript/node.js
[braces][] JavaScript/node.js
[expand-brackets][] JavaScript/node.js
[glob][] JavaScript/node.js
[micromatch][] JavaScript/node.js
[minimatch][] JavaScript/node.js
[nanomatch][] JavaScript/node.js

Related concepts

The following concepts are similar to or include the concept of globbing:

TODO

  • cheatsheet
  • what is globbing
  • wildcards
  • extglobs
  • posix brackets
  • braces
  • options