emailbook.janet (including benchmark) #1419
-
I was looking near the top of the

    (try (import ./deps/base64 :as base64)
      ([err fib] (import spork/base64)))
    (try (use judge)
      ([err fib] (defn test [&] nil)))

I don't think that will work. I tried putting the following in a file named

    (try
      (use judge)
      ([_]
        (defn test [&] nil)))
    (print test)

When I ran it with

There might be a better way to address the

    (defn test [&] :a)
    (protect (use judge))
    (print (test :smile))

Maybe not the prettiest thing. Perhaps @ianthehenry has a better idea.
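A minimal sketch of the scoping behaviour behind this, assuming that a `def` inside the catch clause of `try` is local to the `try` form rather than a top-level binding. The names `inner-fallback`, `visible-after-try`, and `fallback-test` are hypothetical, and the failed import is simulated with `error` so the snippet does not depend on judge being installed or not.

```janet
# A def inside the catch clause is scoped to the try form.
(try
  (error "simulated failed import")
  ([_] (def inner-fallback :inside-try)))

# Look the symbol up in the current environment: no top-level binding exists.
(def visible-after-try (not (nil? (dyn 'inner-fallback))))

# Defining the fallback first keeps it visible, whatever happens afterwards.
(defn fallback-test [&] :fallback)
(def [ok result] (protect (error "simulated failed import")))

(print visible-after-try)
(print ok)
(print (fallback-test :smile))
```

Defining the fallback at the top level first and only then attempting the import means a successful import can shadow it, while a failed one leaves it in place.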
-
Hey cool! Some notes on the implementation:

You could use compound PEGs instead of unquoting components:

    (def bracket-email
      ~{:main (sequence "<" :plain-email ">")
        :plain-email (sequence (some :address-char) "@" (some :address-char))
        :address-char (if-not (+ :s (set ",<>@\"'")) 1)})

For small examples this doesn't really matter, but with this approach you only compile the

I do see that you dynamically create some PEGs at runtime that reference these, though, so maybe that's tricky? But you might also have a better time if you avoid dynamically creating PEGs, e.g. your

    (def peg (peg/compile ~(* "foo: " (cmt (* (argument 0) '(to -1)) ,=))))
    (peg/match peg "foo: bar" 0 "bar")
    (peg/match peg "foo: bar" 0 "baz")

That specific pattern is simple enough that not using a PEG at all would probably be faster, but maybe it would be useful elsewhere in the script? I kinda think moving value checks outside of the PEG would be even better, though. E.g. parse a string, then compare it against the expected mailbox in

Running

    (def pattern (peg/compile
      ~{:main (* :charset '(to "?=") "?=")
        :charset (* "=?" (+ "UTF" "utf") "-8?B?")}))
    (defn decode-utf8-base64 [line]
      (peg/replace-all pattern (fn [_ bytes] (base64/decode bytes)) line))

Having read through the whole script, I see that you have a lot of dynamically constructed PEGs, so it might be tricky to precompile all of them. But if you wanted to try to optimize it, I think that would be a good place to look: make PEGs just do parsing, and do value checking outside of that. This probably won't make much difference if you're running this as an interpreted script (though that depends on how many times you call those functions).

You use

Entirely subjective, but:

    (if quoted-mailbox
      (break (string (first quoted-mailbox) (second quoted-mailbox))))

Could be:

    (if-let [[first second] quoted-mailbox]
      (break (string first second)))

And another entirely subjective thing:

    (def mailbox-sanitized (sanitize (decode-iso8859-q (decode-utf8-q (decode-utf8-base64 mailbox)))))
    (def mailbox-sanitized (-> mailbox decode-utf8-base64 decode-utf8-q decode-iso8859-q sanitize))

I would expect that

Meta thing, but you could use GNU
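The suggestion above to let PEGs only parse and to check values outside of them could look roughly like this. It is only a sketch: `address-peg` and `same-address?` are hypothetical names that do not come from the script, the grammar reuses the `:address-char` rule from the first snippet, and the case-insensitive comparison is an assumption made for illustration.

```janet
# Compiled once at load time; it only parses and knows nothing about expected values.
(def address-peg
  (peg/compile
    ~{:main (* (capture (* :part "@" :part)) -1)
      :part (some :address-char)
      :address-char (if-not (+ :s (set ",<>@\"'")) 1)}))

# The value check happens in plain Janet, after parsing.
(defn same-address? [candidate expected]
  (def captures (peg/match address-peg candidate))
  (and (not (nil? captures))
       (= (string/ascii-lower (first captures))
          (string/ascii-lower expected))))

(print (same-address? "Jane@Example.org" "jane@example.org"))
```

Because the expected value never enters the grammar, the same compiled PEG can be reused for every mailbox.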
-
saikyun/janet-profiling helped me to identify further bottlenecks.
-
Hi all,
this is my first real Janet project: emailbook-janet: A minimalistic address book for e-mails only (mainly for aerc)
Background
I use aerc for e-mails. aerc doesn't have a built-in address book or auto-completion for e-mail addresses, but you can configure an external tool for this job. When I searched for such a tool, I found and picked aercbook: Minimalistic address book for aerc. aercbook does a great job, but it doesn't always behave the way I want. For example, it doesn't autocomplete e-mail addresses when there is a display name for the contact. (Of course, it inserts the e-mail address as well, but you can't search for it.) This is why I first wrote a single shell script wrapper to grep the address book. Then I reimplemented everything as a (POSIX) shell script (with different behavior here and there): emailbook: A minimalistic address book for e-mails only (mainly for aerc). I'm happy with it. It works well and is fast enough for what it was made for: parsing a single e-mail for e-mail addresses and filtering the address book.
But then I tried parsing thousands of old e-mails on my hard disk at once, and that showed how slow the shell script implementation is. I thought this might be a good use case for Janet. I simply wanted to see how easy or hard it would be to rewrite this in Janet in general and to replace all these regular expressions with parsing expression grammars. And, of course, how well it would perform.
Benchmark
Parse 1000 e-mails using these tools:
Two methods:
and pipe this stream into a single instance of the tool. (This does not work for aercbook because it stops after finding one To:, From:, ... header field.)
UPDATE: In case of emailbook-janet, this loop is the bottleneck. This is why I have added a new option --parse-files to read a list of filenames from stdin. emailbook-janet then opens these files directly. (This applies both to the script and the compiled version.)
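To illustrate the idea behind --parse-files, here is a rough sketch of reading a list of filenames and opening each file directly. It is not the actual emailbook-janet implementation: `read-lines` and `each-listed-file` are hypothetical helpers, and the handler stands in for whatever parsing the tool does.

```janet
# Collect all lines from a file handle such as stdin until end of input.
(defn read-lines [handle]
  (def lines @[])
  (loop [line :iterate (file/read handle :line)]
    (array/push lines line))
  lines)

# Call handler with the path and contents of every listed file,
# one path per line, skipping blank lines.
(defn each-listed-file [lines handler]
  (each line lines
    (def path (string/trim line))
    (unless (empty? path)
      (handler path (slurp path)))))

# Hypothetical usage from a main function:
# (each-listed-file (read-lines stdin)
#                   (fn [path text] (print path ": " (length text) " bytes")))
```

Reading the paths in-process avoids starting one process per e-mail, which fits the observation above that the surrounding loop was the bottleneck.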
Durations in seconds
Updated numbers after applying the ideas from the comments and also improving the shell script emailbook:
As you can see, Janet does a good job!
Review
If you are more experienced with Janet than I am (the chances for this are good), maybe you could have a look at my Janet script and tell me how I could do better.
Regarding the custom argument parsing: I'm planning to replace it with ianthehenry/cmd: command-line argument parser for Janet.