Near-Future Work #370

Open
milseman opened this issue Apr 29, 2022 · 0 comments

I want to gather up many areas of near-future work that we've been clarifying through the proposal reviews.

Loose categorization:

Language and integration

  • Ability to use a String-backed, CaseIterable enum as a regex component
  • Define error types for compilation errors and type mismatches
  • Callouts from literals
  • A Regex-backed enum that will construct a ChoiceOf all cases in order

API

  • Ability to map over a regex, perhaps per-capture, to supply post-processing transforms at regex declaration time
  • A modifier on a regex to convert it to matches-anywhere semantics
    • E.g. regex.matchingAnywhere => Regex { /.*?/ ; regex ; /.*/ }.
    • But we'd preserve the matched range, i.e. reset start/end position
  • Character alignment queries
    • API for whether start/end is Character-aligned for whole match and each capture
  • API to query options (e.g. is this case insensitive?)
  • API for (?n); it could be nice to strip out captures you don't care about, especially for type-erased regexes.
    • Compilation error if there are back-references or if it changes the semantics of the program
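
To illustrate the matches-anywhere item above: today the choice between whole-string and search semantics is made at the call site, not on the regex. The following is a minimal sketch using the existing `wholeMatch(of:)` and `firstMatch(of:)` methods, assuming Swift 5.7 or later (and a macOS 13 / iOS 16 deployment target on Apple platforms). The proposed `regex.matchingAnywhere` modifier does not exist yet and is not used here.

```swift
// Assumes Swift 5.7+; on Apple platforms the Regex APIs need macOS 13 / iOS 16.
let text = "abc 123 def"
let digits = #/\d+/#   // extended-delimiter regex literal, Regex<Substring>

// Whole-string semantics: the regex must consume the entire input.
let whole = text.wholeMatch(of: digits)

// Search semantics: the match may start anywhere, and the reported range
// covers only the matched text, not the skipped prefix or suffix.
let found = text.firstMatch(of: digits)
let matchedText = found.map { String(text[$0.range]) }
```

A `matchingAnywhere` modifier would presumably let a regex carry the second behavior with it, so that even `wholeMatch(of:)` reports only the inner range.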

Algorithms

  • Add a replace(_:withTemplate:) method that recognizes $1 or \1 placeholders
  • A separator-preserving split variant
  • Suffix / from-the-end operations (trim, etc.)
  • Customize search
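
As a rough illustration of the separator-preserving split item above, here is one way such a variant could behave, built on the existing `ranges(of:)` algorithm. The name `splitPreservingSeparators` is hypothetical, and the sketch assumes Swift 5.7 or later and a separator regex that never matches the empty string.

```swift
// Assumes Swift 5.7+ (macOS 13 / iOS 16 deployment target on Apple platforms).
// `splitPreservingSeparators` is a hypothetical name, not an existing API.
extension String {
    func splitPreservingSeparators(_ separator: some RegexComponent) -> [Substring] {
        var pieces: [Substring] = []
        var cursor = startIndex
        // ranges(of:) yields the non-overlapping separator matches in order.
        for range in ranges(of: separator) {
            if cursor < range.lowerBound {
                pieces.append(self[cursor..<range.lowerBound])  // text before the separator
            }
            pieces.append(self[range])                          // the separator itself
            cursor = range.upperBound
        }
        if cursor < endIndex {
            pieces.append(self[cursor...])                      // trailing text
        }
        return pieces
    }
}

let pieces = "a, b,c".splitPreservingSeparators(#/,\s*/#)
// pieces == ["a", ", ", "b", ",", "c"]
```

Whether separators should come back as separate elements (as here) or stay attached to the neighboring piece is a design choice the real API would need to settle.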

String and Unicode

  • Add unsupported Unicode properties to Unicode.Properties and support in regexes
  • Add Unicode.AllScalars as a public type (semi-tangential)
  • Add var Substring.range: Range<String.Index> to simplify getting the range of a capture group
  • Inits for making an NFC string from UTF-8
  • String.lines() and String.words()
  • Add option for canonical equivalence in scalar-semantic mode
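
The `Substring.range` item above can be sketched in a few lines. This is only an illustration of the proposed property, relying on the fact that a `Substring` shares indices with its base string. It is not part of the standard library today.

```swift
// Sketch of the proposed property; `range` is not a standard library member today.
extension Substring {
    /// The range this substring occupies within its base string.
    var range: Range<String.Index> { startIndex..<endIndex }
}

let line = "key=value"
let separator: Character = "="
let value = line.split(separator: separator)[1]   // Substring sharing indices with `line`
let valueRange = value.range
let offset = line.distance(from: line.startIndex, to: valueRange.lowerBound)
// line[valueRange] == "value", offset == 4
```

For a capture group, this would replace spelling out `capture.startIndex..<capture.endIndex` by hand.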

Dynamic Regex API

  • Add a capture-description API to all regexes
    • some RandomAccessCollection (RAC) of captures, each of which has a type and optionality
  • Missing match conversions (ARO = AnyRegexOutput)
    • Regex<T>.Match.init?(_:ARO)
    • Regex<T>.Match.init?(_:Regex<ARO>.Match)

Builders

  • A high-level helper for separated/quoted repetitions, e.g. Repeat(separator: \.whitespace) { ... }
  • A helper for repeated matching lookahead and negative lookahead, e.g. Repeat(while:) and Repeat(whileNot:)
    • Until(negLookaheadCondition) { ... }
  • A func compile() throws to explicitly trigger compilation and get errors, such as quantifying the unquantifiable
    • This is useful when composing regexes together to check the final result instead of trapping at run time.
  • Default Reference capture type to Substring.self
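
For the separated-repetition helper above, the pattern it would encapsulate can already be written by hand with the builder DSL. The following is a minimal sketch, assuming Swift 5.7 or later. The function name `separated(_:by:)` is hypothetical, and the sketch is restricted to capture-free `Regex<Substring>` components to keep the types simple.

```swift
import RegexBuilder

// Hypothetical helper: one `element`, then zero or more (`separator`, `element`) pairs.
// Restricted to capture-free regexes for simplicity.
func separated(_ element: Regex<Substring>, by separator: Regex<Substring>) -> Regex<Substring> {
    Regex {
        element
        ZeroOrMore {
            separator
            element
        }
    }
}

let numberList = separated(#/\d+/#, by: #/,\s*/#)
let accepted = "1, 2,3".wholeMatch(of: numberList) != nil   // well-formed list
let rejected = "1, 2,".wholeMatch(of: numberList) == nil    // trailing separator
```

A built-in `Repeat(separator:)` would presumably also handle captures and repetition bounds, which this sketch leaves out.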

Engine

  • Engine limiters, low-level backtracking control and timeouts
  • Provide a way to access all values of a repeated capture (e.g. subscribe)
  • Conditionals (?(x)...) (requires updated parsing)
  • Quoted string inside custom character classes (e.g. [a-z\q{ch}])

Parser

  • Support for duplicate group names through (?J) (requires figuring out typed captures)
  • Support for branch reset alternations (?|) (parsing is implemented, but requires figuring out typed captures)
  • Parsing of conditionals (?(x)...) in accordance with what is in the syntax proposal (we currently parse the condition differently)
    • Including interpolation conditions (?(?{...}))
    • Conditional conditions don't capture on their own, only for child nodes, e.g. (?((x))x). .NET also forbids named capture conditions; we should ban them too.
    • Stop parsing named reference conditions for (?(x)...)
    • Don't allow (?(DEFINE)) to have a false branch
  • Support for regex property values \p{key=/regex/}
  • Support for transform matching, e.g. \p{toNFKC_Casefold=@toNFKC@}
  • Support for alternative character property separators?
    • UTS#18 suggests key≠value, key!=value
    • Perl allows key:value
  • Support a** syntax as explicitly eager quantification
    • I.e. it's not affected by the API that changes the default quantification kind, and (probably) not affected by (?U)