Idea: add string match for easily handling sub-string. [merged, looking for feedback] #113

Open
ccqpein opened this issue Dec 29, 2023 · 10 comments

ccqpein commented Dec 29, 2023

I was wondering whether cl-str could "pattern match" a string like some other languages' match case.

So I wrote my own version (see the example below). What do you think? Does it fit cl-str's purpose? I checked the docs and there is already a string-case. Should I rename the macro? Thanks!

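;; Expand one branch of str-match into a form that tries the pattern and,
;; on success, returns the value of the branch body from BLOCK.
(defun expand-match-branch (str block patterns forms)
  (case patterns
    ;; Default branch. The keys are written unquoted: 'otherwise would read
    ;; as the list (quote otherwise), which never matches a symbol.
    ((t otherwise) `(progn ,@forms))
    ;; Strings become literal regex parts, symbols become capture groups.
    (t (loop with regex = '("^")
            and vars = '()
            for x in patterns
            do (cond ((stringp x)
                      (push x regex))
                     ((symbolp x)
                      (push "(.*)" regex)
                      (push x vars))
                     (t (error "only symbol and string allowed in patterns")))
            finally (push "$" regex)
            finally (return (let ((whole-str (gensym))
                                  (regs (gensym)))
                              `(multiple-value-bind (,whole-str ,regs)
                                   (cl-ppcre:scan-to-strings
                                    ,(apply #'str:concat (reverse regex))
                                    ,str)
                                 (declare (ignore ,whole-str))
                                 ;; REGS is NIL when the regex did not match.
                                 (when ,regs
                                   (let ,(reverse vars)
                                     ,@(loop for ind from 0 below (length vars)
                                             collect `(setf ,(nth ind (reverse vars))
                                                            (elt ,regs ,ind)))
                                     (return-from ,block
                                       (progn ,@forms)))))))))))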

(defmacro str-match (str &rest match-branches)
  (let ((block-sym (gensym)))
    `(block ,block-sym
       ,@(loop for statement in match-branches
               collect (expand-match-branch
                        str
                        block-sym
                        (nth 0 statement)
                        (cdr statement))))))

CL-USER> (macroexpand-1 '(str-match sss
                           (("a" b "c") (parse-integer b))
                           (("a" x "c" y "b")
                            (print (parse-integer x))
                            (print (parse-integer y))
                            (list (parse-integer x) (parse-integer y)))
                           (t (print "aa"))))
(BLOCK #:G415
  (MULTIPLE-VALUE-BIND (#:G416 #:G417)
      (CL-PPCRE:SCAN-TO-STRINGS "^a(.*)c$" SSS)
    (DECLARE (IGNORE #:G416))
    (WHEN #:G417
      (LET (B)
        (SETF B (ELT #:G417 0))
        (RETURN-FROM #:G415 (PROGN (PARSE-INTEGER B))))))
  (MULTIPLE-VALUE-BIND (#:G418 #:G419)
      (CL-PPCRE:SCAN-TO-STRINGS "^a(.*)c(.*)b$" SSS)
    (DECLARE (IGNORE #:G418))
    (WHEN #:G419
      (LET (X Y)
        (SETF X (ELT #:G419 0))
        (SETF Y (ELT #:G419 1))
        (RETURN-FROM #:G415
          (PROGN
           (PRINT (PARSE-INTEGER X))
           (PRINT (PARSE-INTEGER Y))
           (LIST (PARSE-INTEGER X) (PARSE-INTEGER Y)))))))
  (PROGN (PRINT "aa")))
T
CL-USER> (str-match "a1c5b"
           (("a" b "c") (parse-integer b))
           (("a" x "c" y "b")
            (print (parse-integer x))
            (print (parse-integer y))
            (list (parse-integer x) (parse-integer y)))
           (t (print "aa")))

1 
5 
(1 5)

vindarel commented Jan 3, 2024

Nice, that is pretty interesting.

With some indentation the snippet becomes

(str-match "a1c5b"
           (("a" b "c")
            (parse-integer b))
           (("a" x "c" y "b")
            (print (parse-integer x))
            (print (parse-integer y))
            (list (parse-integer x) (parse-integer y)))
           (t (print "aa")))

so by using &body instead of &rest we get this indentation:

(str-match "a1c5b"
  (("a" b "c")
   (parse-integer b))
  (("a" x "c" y "b")
   (print (parse-integer x))
   (print (parse-integer y))
   (list (parse-integer x) (parse-integer y)))
  (t (print "aa")))

Would you not use the Trivia library for pattern matching? It probably does this, and more.

What pattern matching features are users going to ask for after we introduce this one?

> like some other languages' match case.

What are your favourite examples?

(and yes "string-match" might be better)

ccqpein commented Jan 4, 2024

> so by using &body instead of &rest we get this indentation:

Nice catch!

> Would you not use the Trivia library for pattern matching? It probably does this, and more.

I am going to check it now.

ccqpein commented Jan 5, 2024

I checked Trivia. It works well when I pattern match a list, like

(trivia:match '(1 2 3)
  ((list* 1 x _)
   x)
  ((list* _ x)
   x)) ;; => 2

but I ran into an issue when I tried a string pattern. I am not sure whether that is because I am using SBCL (maybe because of this?)

Besides, I can match the whole string, like

(trivia:match "a1c5b" ("a1c5b" 1))
;; or
(trivia:match "ab" ((vector #\a #\b) 1))

but not this:

(trivia:match "a1c5b" ((string "a1c" "5b") 1))

So it looks like I can only bind characters, not sub-strings as in my proposal.

@vindarel

Let's use and try this macro. I'm interested in everybody's feedback.

A silly test: I match as in your example, but I don't use the matched variables, so I get style warnings:

(match "a1c5b"
  (("a" i "c")
   (print "got axc"))
  (("a" x "c" y "b")
   (print "got axcyb"))
  (t (print "default")))
;; =>
;; ;   The variable I is assigned but never read.
;; (and for x and y)

Would it be possible to avoid the warnings? Using a _ placeholder?
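
One way the `_` placeholder idea could work is sketched below. This is a hypothetical helper, not the code from PR #114: `placeholder-p` and `branch-regex-and-vars` are names invented for illustration. A `_` in the pattern contributes a non-capturing group, so it matches text without producing a variable, and the remaining capture groups still line up with the named variables.

```lisp
;; Hypothetical sketch, not the implementation merged in cl-str.
;; Uses only standard Common Lisp; the regex syntax "(?:...)" is the
;; Perl-style non-capturing group that cl-ppcre understands.

(defun placeholder-p (x)
  "True for a symbol named _, whatever package it was read in."
  (and (symbolp x) (string= (symbol-name x) "_")))

(defun branch-regex-and-vars (patterns)
  "Return two values: the anchored regex string for PATTERNS and the
list of variables to bind, in the order of their capture groups."
  (let ((parts (list "^"))
        (vars '()))
    (dolist (x patterns)
      (cond ((stringp x)
             (push x parts))
            ;; Must be tested before the generic symbol case.
            ((placeholder-p x)
             (push "(?:.*)" parts))     ; match, but capture nothing
            ((symbolp x)
             (push "(.*)" parts)
             (push x vars))
            (t (error "only symbols and strings are allowed in patterns"))))
    (push "$" parts)
    (values (apply #'concatenate 'string (reverse parts))
            (reverse vars))))

;; (branch-regex-and-vars '("a" _ "c" y "b"))
;; => "^a(?:.*)c(.*)b$", (Y)
```

Named variables that the body never reads would still warn with the `let` plus `setf` expansion shown earlier. Binding them directly and adding `(declare (ignorable ...))` for them is one possible way around that.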

ccqpein commented Jan 24, 2024

Yes, I just tried it on my side. I will open a PR soon.

ccqpein commented Jan 25, 2024

Opened PR #114.

@vindarel

I tried this more on an AOC problem (day 19), and OMG this match macro felt so powerful. Easier and faster than searching for the right regexp.

@vindarel

Another quick test:

(str::match "123 hello 456"
  (("\\d+" s "\\d+")
   s)
  (t "nothing"))
;; => " hello 45"

I didn't expect to see "45". The first number regex was matched correctly, but not the second?

(str::match "123 hello 456"
  (("\\d+" s "\\d*")
   s)
  (t "nothing"))
;; => " hello 456"

Here I didn't expect "456".

ccqpein commented Jan 31, 2024

@vindarel I just figured out that fixing this issue requires a non-greedy regex. I fixed it in the latest commit. Good catch!
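
The greedy versus non-greedy difference can be reproduced directly with cl-ppcre. The sketch below assumes cl-ppcre is loaded (for example with `(ql:quickload "cl-ppcre")`). The regexes are written by hand to mirror the two tests above, and the exact regex generated by the fixing commit may differ. `first-register` is a helper name invented for this example.

```lisp
;; Sketch assuming cl-ppcre is available; FIRST-REGISTER is a
;; hypothetical helper, not part of cl-str or cl-ppcre.

(defun first-register (regex string)
  "Return the first capture group of REGEX matched against STRING,
or NIL when there is no match."
  (let ((registers (nth-value 1 (cl-ppcre:scan-to-strings regex string))))
    (and registers (aref registers 0))))

;; Greedy (.*) grabs as much as it can, then gives back only what the
;; trailing \d+ strictly needs: a single digit.
(first-register "^\\d+(.*)\\d+$" "123 hello 456")   ; => " hello 45"

;; With \d* the tail may be empty, so greedy (.*) keeps everything.
(first-register "^\\d+(.*)\\d*$" "123 hello 456")   ; => " hello 456"

;; Non-greedy (.*?) takes as little as possible, leaving the digits
;; to the trailing number pattern.
(first-register "^\\d+(.*?)\\d+$" "123 hello 456")  ; => " hello "
(first-register "^\\d+(.*?)\\d*$" "123 hello 456")  ; => " hello "
```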

ccqpein commented Feb 9, 2024

PR #114 is merged. I am not sure whether we should keep this idea issue open for potential future changes. I leave that decision to the repo owner.

vindarel changed the title from "Idea: add string match for easily handling sub-string." to "Idea: add string match for easily handling sub-string. [merged, looking for feedback]" on Feb 9, 2024