Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add dictionary .IfContainsKey() .IfNotContainsKey() #40

Open
amantinband opened this issue Mar 16, 2022 · 17 comments
Open

add dictionary .IfContainsKey() .IfNotContainsKey() #40

amantinband opened this issue Mar 16, 2022 · 17 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@amantinband
Copy link
Owner

Add the following extensions methods for dictionaries:

var dictionary = new Dictionary<string, string>
{
    ["key1"] = "value1",
    ["key2"] = "value2",
};

dictionary.Throw().IfContainsKey("key1");
dictionary.Throw().IfNotContainsKey("key3");

And the corresponding dictionary properties extension methods:

var dictionary = new Dictionary<string, string>
{
    ["key1"] = "value1",
    ["key2"] = "value2",
};

var foo = new { Values = dictionary };

foo.Throw().IfContainsKey(f => f.Values, "key1");
foo.Throw().IfNotContainsKey(f => f.Values, "key3");

We don't have dictionary validation yet, so this will hopefully be the start of a bigger feature 🚀

@amantinband amantinband added enhancement New feature or request good first issue Good for newcomers labels Mar 16, 2022
@beton-kun
Copy link

I want to develop this issue

@AndreasHoersken
Copy link

Oh boy - I'm sorry @beton-kun I did not see your first comment there today.

@AndreasHoersken
Copy link

I closed my first PR as the code only works with IDictionary-interface and not with the IDictionary<,> generic interface. The sample given in #40 works as the generic class Dictionary<,> implements the non-generic interface.

@beton-kun : Your comment was right, where is it? As an alternative to another generic parameter perhaps we could have two extension-methods (one for IDictionary, one for IDictionary<,>) or infer the generic arguments with reflection somehow? Any thoughts?

@beton-kun
Copy link

@AndreasHoersken listen, gtfo from my issue

@AndreasHoersken
Copy link

AndreasHoersken commented Mar 21, 2022

@beton-kun are you serious? No collaboration here? Have fun then. Looking forward to your solution.

@amantinband
Copy link
Owner Author

@AndreasHoersken listen, gtfo from my issue

+

image

=
I've blocked @beton-kun.

This is a library from the community for the community. No bullying or disrespectful comments will be tolerated.

@AndreasHoersken, let me poke at this a little and we'll collaborate on a solution for this?

@AndreasHoersken
Copy link

Wow. Strange guy. Thanks @mantinband . Sorry for the inconvenience. I'll be happy to work on it further. If you have any ideas on the solution, please let me know. I can only work on it in the evenings though.

@amantinband
Copy link
Owner Author

@AndreasHoersken I see the issue you ran into. I had a similar issue with #20.

There is a way to work around this, I prototyped it in the following PR: #25

Basically what we would have to do is:

  1. Update the Validatable struct to inherit from some interface which defines TValue as covariant
public readonly record struct Validatable<TValue>(TValue Value, string ParamName, ExceptionCustomizations? ExceptionCustomizations = null)
    : IValidatable<TValue> {}

public interface IValidatable<out TValue> {}
  1. Recieve the validatable as the interface. something like this:
    public static IValidatable<IDictionary<TKey, TValue>> IfContainsKey<TKey, TValue>(this IValidatable<IDictionary<TKey, TValue>> validatable, TKey key)
    {
        Console.WriteLine($"Contains key: {validatable.Value.ContainsKey(key)}");
        return validatable;
    }
  1. Then, all the following are supported:
IDictionary<string, string> dictionary1 = new Dictionary<string, string>
{
    { "key", "value" },
    { "key2", "value2" },
};

Dictionary<string, string> dictionary2 = new()
{
    { "key", "value" },
    { "key2", "value2" },
};

var dictionary3 = new ReadOnlyDictionary<string, string>(dictionary1);
var dictionary4 = new ConcurrentDictionary<string, string>(dictionary1);
// etc

The only issue with this solution is that when passing the Validatable struct to the method, it will be boxed, which will have runtime consequences..
It also means we cannot use this method interchangeably with the extension methods that receive a Validatable, i.e this won't work:

dictionary.Throw().IfContainsKey("key").If{other-methods}(dictionary2);

Unless we change all places to accept an IValidatable and then we have the issue that the Validatable will always be boxed, which again, isn't great.

I wonder if there is some way around the boxing, or maybe a different solution I'm missing. What do you think?

@AndreasHoersken
Copy link

@mantinband - Thank you for your deliberate analysis!

I tried out two things myself

public static ref readonly Validatable<TValue> IfContainsKey<TValue,TDictKey,TDictValue>(this in Validatable<TValue> validatable, TDictKey dictKey)
        where TValue : IDictionary<TDictKey,TDictValue>
        where TDictKey : notnull

and

    public static ref readonly Validatable<IDictionary<TDictKey, TDictValue>> IfContainsKey<TDictKey, TDictValue>(this in Validatable<IDictionary<TDictKey, TDictValue>> validatable, TDictKey dictKey)
        where TDictKey : notnull
    {
        Validator.ThrowIfContainsKey(validatable.Value, dictKey, validatable.ParamName, validatable.ExceptionCustomizations);

        return ref validatable;
    }

Both do not work because the missing covariant interface. What I do not get is why the implementation for collections does work, as it uses the same strategy:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static ref readonly Validatable<TValue> IfContains<TValue, TElement>(
        this in Validatable<TValue> validatable,
        TElement element)
        where TValue : notnull, IEnumerable<TElement?>
        where TElement : notnull
    {
        Validator.ThrowIfContainsElement(
            validatable.Value,
            element,
            validatable.ParamName,
            validatable.ExceptionCustomizations);

        return ref validatable;
    }

Even with subclasses of IEnumerable, i.e. List. Why isn't the missing covariant interface a problem here?

So after all, I think we need the covariant interface. Regarding your points:

  1. Boxing / Unboxing. It is not ideal as we have performance & memory overhead, but I do not think it is to bad in the context of error-handling either.
    Why is Validatable implemented as record struct after all? If it is performance, I personally would choose records or a simple classes over the boxing / unboxing - penalties we get. A little side effect here would be a better backportability to older .net-Versions.

  2. Regarding the changes: If we switched to the interface, I would do it for the whole solution to preserve the possibility of method-chaining. One reason more against the record structs.

@mantinband: What do you think? Is there another reason for using record structs beside performance benefits?

@amantinband
Copy link
Owner Author

@AndreasHoersken first of all thanks for taking the time to think about these issues, I'm happy to have you working with me on this project 🙏🏽

  1. How does it work with collections - Excellent question, actually 🤓😂 It was a lot of trial and error solving the similar collection issue.
  2. class vs struct: The reason behind using a struct is performance. I think for many projects, the decision on which validation library to use comes down to what has the least performance overhead.
    From the anecdotal benchmarks I ran a few weeks ago comparing Throw against the most common validation libraries, we were considerably faster than some and not slower than any. Following your reply I ran some benchmarks comparing using struct vs using a class & accepting it via the covariant interface 👇🏽:
    image
    struct not only has no GC overhead, it is also twice as fast.
    (I create a gist if you want to play around with it: https://gist.github.com/mantinband/f86084c72a414dbafa784a2b66033e6e)
  3. backward compatibility: My main motivation to create Throw was to produce a meaningful error response (String should not be longer than 3 characters. (Parameter 'user: u => u.Name') without having to pass any nameofs (nameof(user)) or messages (user: u => u.Name). This wasn't possible before .NET 6 introduced the CallerArgumentExpression, so unfortunately this library can't be backward compatible.

What do you think?

@AndreasHoersken
Copy link

AndreasHoersken commented Mar 25, 2022

@mantinband: I think I can learn a lot here, so its my pleasure :-). Thanks again for taking your time to explain things :-).

Ad 1.) I would have hoped that, if we understand the "simple case" with one generic param in collections, we could get a simple solution here... I will have another look on that over the weekend. Perhaps this one works because IEnumerable<> is itself covariant. That would not apply to IDictionary<>.

Ad 2.) Benchmark:
a) In your Benchmark, no boxing / unboxing does occur, does it? This would be interesting for our decision how to proceed on this issue. I will add this to your gist and share the results over the weekend.
b) Even if the record struct is twice as fast as the record class in this (simple) benchmark, I'm asking myself if it would matter in a real world scenario. So if I think of a real world case in a cloud environment with much communication going over the network, I don't belief that 2 ns is a critical point here. Same holds for 99.9% of projects, I think.
If, on the other side, we have a performance-critical usecase, I would think that you would not use validation-methods, where i.e. whole collections are enumerated to get the count (if I could get it with a list.length otherwise). I even would argue that you would omit exception-handling all together and move it to a non performance-critical place, if possible.

All in all, it feels a little bit like a case of premature optimization. But then again, I don't know the domain.
Perhaps #35 would be a good starter to go into this topic a little further.

Ad 3.) Thats interesting. I did not have the CallerArgumentExpression in mind.


Edit: The statement regarding enumerating the whole collections ist not correct. I did not see the pattern matching expression in GetCollectionCount, but the pattern matching also comes to a price (see next comment).

@AndreasHoersken
Copy link

AndreasHoersken commented Mar 26, 2022

@mantinband : Here are my results:

Method Mean Error StdDev Ratio RatioSD Gen 0 Allocated
ThrowStruct 36.81 ns 0.250 ns 0.234 ns 1.00 0.00 - -
ThrowStructOneBoxing 69.80 ns 0.758 ns 0.633 ns 1.90 0.01 0.0254 80 B
ThrowStructAllBoxing 123.48 ns 2.504 ns 2.219 ns 3.35 0.06 0.0253 80 B
ThrowRecord 90.65 ns 1.799 ns 2.637 ns 2.48 0.08 0.0254 80 B
ThrowStructRandom 23,274.13 ns 83.025 ns 69.330 ns 632.51 4.14 1.3123 4,144 B
ThrowRecordRandom 23,045.62 ns 144.028 ns 120.270 ns 626.29 4.01 1.3428 4,224 B

I added the following results to your tests:
1.) ThrowStructOneBoxing / ThrowStructAllBoxing: Add boxing to the last element of the chain / to the whole chain. One can easily see, that the record-results lie in between those two results.
2.) ThrowStructRandom / ThrowRecordRandom: Here, a 1000 element long, randomly generated string is used. Interestingly enough, struct and record are almost identical in speed. What I don't quite understand is, why there is allocated memory for the struct.


I ran a second set of tests, where I removed the usages of list.length with the GetCollectionCount implementation to see the impact of the pattern matching. That also represents the current implementation of the throw-library better.

Method Mean Error StdDev Ratio RatioSD Gen 0 Allocated
ThrowStruct 37.31 ns 0.193 ns 0.171 ns 1.00 0.00 - -
ThrowStructOneBoxing 77.62 ns 1.335 ns 1.183 ns 2.08 0.04 0.0254 80 B
ThrowStructAllBoxing 147.80 ns 2.965 ns 3.414 ns 3.99 0.10 0.0253 80 B
ThrowRecord 137.66 ns 1.068 ns 0.892 ns 3.69 0.03 0.0253 80 B
ThrowStructRandom 23,199.00 ns 162.635 ns 144.172 ns 621.87 4.69 1.3123 4,144 B
ThrowRecordRandom 23,352.66 ns 193.710 ns 171.719 ns 625.99 5.10 1.3428 4,224 B

3.) The patternmatching has almost no effect on the structs, whereas the record-implementations perform a little bit worse.
4.) The results for the longer, random-strings do not differ significantly.


Getting back to our originating discussion about the containskey-implementation, I, personally would go with the interface and the record. I think the library will be used if you have a rich feature-set which is easy and concise to use and has only little dependencies. Performance comes second and we can optimize for it when we have real world use cases.

But, if you are sure we have to keep the highest performance level in mind, we should only use the boxing / unboxing where it is not possible otherwise as in the dictionary case.

@mantinband: What do you think?

I will work on the test-implementation a little bit more and share it with #35. e

@dwakel
Copy link
Contributor

dwakel commented Apr 2, 2022

Learning a lot here!!!

@amantinband
Copy link
Owner Author

amantinband commented Apr 11, 2022

Thanks for the benchmarks @AndreasHoersken, will you be able to share the source code?

Regarding runtime implications, I agree that for most projects and for the majority of scenarios inside them this optimization is premature and wouldn't have a noticeable impact. That said, I think it is important for the success of the library for the following reasons:

  1. Many repositories (especially across MS) have argument validation on many/all methods. Combined, I would estimate that for a single request there would be around 100 argument validations executed for simple CRUD services and hundreds per request on more complex projects.
  2. My main performance concern is not the validation, but rather the GC. We would be creating many objects in Gen 0 which means more GC. I haven't profiled this, but I'm sure this hurts the overall QPS.
  3. When working on the original spec I asked around to get some feedback. The main concern was usually around performance. I think when adding a library that is used so widely across a project, even if irrational, a twice-as-fast library has a big advantage.

Would love to hear your opinion. I also hope to play around with this in the next couple of days, maybe there is something creative that can be done here that I've missed

@AndreasHoersken
Copy link

Hi @mantinband, here you can find the code: https://gist.github.com/AndreasHoersken/4de67a3da3a91b5ae6b4f7afe41977fa
I changed the tests a little. In my tests above, the throw*random-methods created the random string in the benchmark-method itself. To make it clearer that the longer runtimes are due to this "workload" and not due to the random generated string, I renamed them with the "-workload" suffix and added methods which work on the same string without the string-generation in the benchmark method.

Method Mean Error StdDev Ratio RatioSD Gen 0 Allocated
ThrowStruct 37.79 ns 0.235 ns 0.220 ns 1.00 0.00 - -
ThrowStructOneBoxing 80.39 ns 0.837 ns 0.699 ns 2.13 0.02 0.0254 80 B
ThrowStructAllBoxing 166.89 ns 3.355 ns 8.660 ns 4.19 0.12 0.0253 80 B
ThrowRecord 132.78 ns 2.684 ns 7.915 ns 3.60 0.14 0.0253 80 B
ThrowStructRandomW 40.02 ns 0.837 ns 0.653 ns 1.06 0.02 - -
ThrowRecordRandom 125.98 ns 1.730 ns 1.534 ns 3.33 0.05 0.0253 80 B
ThrowStructRandomWorkload 22,711.39 ns 225.619 ns 176.149 ns 600.24 4.88 1.4343 4,536 B
ThrowRecordRandomWorkload 23,484.05 ns 435.699 ns 447.431 ns 621.04 13.68 1.4648 4,616 B

@truegoodwill
Copy link
Contributor

Why not just write the functionality once for each type, treating IDictionary and IDictionay<,> as differently to each other as int and bool?

public static ref readonly Validatable<TValue> IfContainsKey<TValue, TKey>(
    this in Validatable<TValue> validatable,
    TKey key,
    [CallerArgumentExpression("key")] string? keyParamName = null)
    where TValue : IDictionary
    where TKey : notnull
{
    if (validatable.Value.Contains(key))
    {
        // throw
    }

    return ref validatable;
}

public static ref readonly Validatable<TValue> IfContainsKey<TValue, TKey, TDictValue>(
    this in Validatable<TValue> validatable,
    TKey key,
    [CallerArgumentExpression("key")] string? keyParamName = null)
    where TValue : IDictionary<TKey, TDictValue>
    where TKey : notnull
{
    if (validatable.Value.ContainsKey(key))
    {
        // throw
    }

    return ref validatable;
}

@amantinband
Copy link
Owner Author

@truegoodwill that's probably the best solution here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

5 participants