-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: rueidisprob package and standard bloom filter #493
Conversation
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #493 +/- ##
========================================
Coverage 95.50% 95.51%
========================================
Files 78 80 +2
Lines 32941 33090 +149
========================================
+ Hits 31460 31605 +145
- Misses 1275 1278 +3
- Partials 206 207 +1 ☔ View full report in Codecov by Sentry. |
Hi @proost, There are a couple of things we can address and discuss about:
|
@rueian |
rueidisprob/bloomfilter.go
Outdated
table.insert(bitfieldArgs, index) | ||
end | ||
return redis.call('BITFIELD', unpack(bitfieldArgs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious about whether would it be better if we calculate key existences in Lua and return just a bool array instead of returning every bits? I am not sure which one will be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is trade-off. if we calculate on redis, we should use more CPU, but it can reduce bandwidth per request. In this situation, I prefer use less CPU. because bit array size is small event if keys and hash iteration size is large. but if you think bandwidth is more valuable than CPU, i will change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can leave this for future evaluation. I may be wrong, but my guess is that the calculation on redis is cheap because it just loops through the return of BITFIELD and sums up bits for every hash iteration. If we do that, we can save the bandwidth by hash iteration times. This requires more evaluation for verification.
I also have another idea: Can we use a normal GET to fetch the whole bitmap and do the calculation on the client side? By that, we can leverage client-side caching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I change it.
Still, I don't think it is good to depend on Lua script heavily. scaling out servers is easy, but scaling out redis is not easy.
I found another implementation of bloom filter using redis: https://github.com/igrigorik/bloomfilter-rb
rueidisprob/bloomfilter.go
Outdated
if oneBits == hashIterations then | ||
table.insert(result, 1) | ||
else | ||
table.insert(result, 0) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if oneBits == hashIterations then | |
table.insert(result, 1) | |
else | |
table.insert(result, 0) | |
end | |
table.insert(result, oneBits == hashIterations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lua false
is nil in redis. It can be confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning Nils is actually good I think. It requires fewer bytes to transmit. A nil message takes 3 bytes to transmit, but an integer 0 takes 4 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
56f7d9d. I changed it.
rueidisprob/bloomfilter.go
Outdated
func (c *bloomFilter) indexes(keys []string) []string { | ||
allIndexes := make([]string, 0, len(keys)*int(c.hashIterations)) | ||
for _, key := range keys { | ||
indices := indexes([]byte(key), c.hashIterations, c.size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse the indices
for every key to save some allocations? Or how about letting the indexes
function write to the allIndexes
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @proost, thanks! This looks good to me. I also made a small benchmark between the latest commit and the previous commit d118842:
The CPU loading of the Redis server in these two cases was both ~97% and ~98%. The latest feat-rueidisprob had slightly heavy CPU loading on the Redis, but it had -58.61% and -17.78% reduction in Benchmark Code: package main
import (
"context"
"math/rand/v2"
"strconv"
"testing"
"github.com/redis/rueidis"
"github.com/redis/rueidis/rueidisprob"
)
func Benchmark(b *testing.B) {
c, err := rueidis.NewClient(rueidis.ClientOption{InitAddress: []string{"127.0.0.1:6379"}})
if err != nil {
panic(err)
}
defer c.Close()
bf, err := rueidisprob.NewBloomFilter(c, "bf", 1000000, 0.01)
if err != nil {
panic(err)
}
if err := bf.Add(context.Background(), "1"); err != nil {
panic(err)
}
keys := make([]string, 10)
for i := range keys {
keys[i] = strconv.Itoa(rand.IntN(10000))
}
b.ReportAllocs()
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
if _, err := bf.ExistsMulti(context.Background(), keys); err != nil {
panic(err)
}
}
})
b.StopTimer()
} |
But the tests failed for unknown reasons. Maybe some tests use too much memory causing redis crashes on the circleci. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that test failures may be caused by not yet initialized redis cluster. Adding a sleep 5 after docker-compose up -d in the dockertest.sh could be helpful.
cd $CIRCLE_WORKING_DIRECTORY/rueidisprob | ||
list=$(go list ./... | circleci tests split --split-by=timings) | ||
echo "Test Packages: $list" | ||
for n in {1..5}; do ../dockertest.sh $list && break; done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that test failures may be caused by not yet initialized redis cluster. Adding a sleep 5
after docker-compose up -d
in the dockertest.sh
could be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. some time is needed to cluster configuration. deterministic approach is required, later we can do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@proost LGTM. Thank you for this great work!
@rueian |
discussion: #486
standard bloom filter for
rueidisprob