Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: rueidisprob package and standard bloom filter #493

Merged
merged 20 commits into from
Mar 16, 2024

Conversation

proost
Copy link
Contributor

@proost proost commented Mar 10, 2024

discussion: #486

standard bloom filter for rueidisprob

@codecov-commenter
Copy link

codecov-commenter commented Mar 10, 2024

Codecov Report

Attention: Patch coverage is 97.31544% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 95.51%. Comparing base (d83782c) to head (cfcfad5).
Report is 6 commits behind head on main.

Files Patch % Lines
rueidisprob/bloomfilter.go 97.22% 3 Missing and 1 partial ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #493    +/-   ##
========================================
  Coverage   95.50%   95.51%            
========================================
  Files          78       80     +2     
  Lines       32941    33090   +149     
========================================
+ Hits        31460    31605   +145     
- Misses       1275     1278     +3     
- Partials      206      207     +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rueian
Copy link
Collaborator

rueian commented Mar 10, 2024

Hi @proost,

There are a couple of things we can address and discuss about:

  1. Please add a test entry to the .circleci/config.yml, for example
    - run: # test ./rueidisotel/go.mod
    command: |
    cd $CIRCLE_WORKING_DIRECTORY/rueidisotel
    list=$(go list ./... | circleci tests split --split-by=timings)
    echo "Test Packages: $list"
    for n in {1..5}; do ../dockertest.sh $list && break; done
    no_output_timeout: 15m
  2. The filter key and the counter key should be located in the same Redis slot by using a tag so that these Lua scripts can work in cluster mode.
  3. The current implementation increments the counter no matter what, but that is inaccurate. Can we check the SETBIT results before incrementing the count? In other words, if the sum of the return values of SETBIT equals to the number of hash iterations, we should not increment the counter.
  4. We use SETBIT one by one for each index. Can we make it a batch for a key with BITFIELD + multiple SETs?
  5. We should not use EVAL every time. That hurts performance. Can we leverage the rueidis.NewLuaScript?

@proost
Copy link
Contributor Author

proost commented Mar 10, 2024

@rueian
Yes, you're right. I was wrong many points. i fixed it.

table.insert(bitfieldArgs, index)
end
return redis.call('BITFIELD', unpack(bitfieldArgs))
Copy link
Collaborator

@rueian rueian Mar 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious about whether would it be better if we calculate key existences in Lua and return just a bool array instead of returning every bits? I am not sure which one will be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is trade-off. if we calculate on redis, we should use more CPU, but it can reduce bandwidth per request. In this situation, I prefer use less CPU. because bit array size is small event if keys and hash iteration size is large. but if you think bandwidth is more valuable than CPU, i will change it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can leave this for future evaluation. I may be wrong, but my guess is that the calculation on redis is cheap because it just loops through the return of BITFIELD and sums up bits for every hash iteration. If we do that, we can save the bandwidth by hash iteration times. This requires more evaluation for verification.

I also have another idea: Can we use a normal GET to fetch the whole bitmap and do the calculation on the client side? By that, we can leverage client-side caching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7491a90

OK. I change it.

Still, I don't think it is good to depend on Lua script heavily. scaling out servers is easy, but scaling out redis is not easy.
I found another implementation of bloom filter using redis: https://github.com/igrigorik/bloomfilter-rb

@proost proost requested a review from rueian March 10, 2024 14:23
Comment on lines 73 to 77
if oneBits == hashIterations then
table.insert(result, 1)
else
table.insert(result, 0)
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if oneBits == hashIterations then
table.insert(result, 1)
else
table.insert(result, 0)
end
table.insert(result, oneBits == hashIterations)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lua false is nil in redis. It can be confused.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning Nils is actually good I think. It requires fewer bytes to transmit. A nil message takes 3 bytes to transmit, but an integer 0 takes 4 bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

56f7d9d. I changed it.

@proost proost requested a review from rueian March 12, 2024 14:22
func (c *bloomFilter) indexes(keys []string) []string {
allIndexes := make([]string, 0, len(keys)*int(c.hashIterations))
for _, key := range keys {
indices := indexes([]byte(key), c.hashIterations, c.size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse the indices for every key to save some allocations? Or how about letting the indexes function write to the allIndexes directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@proost proost requested a review from rueian March 14, 2024 15:11
@rueian rueian added the feature label Mar 14, 2024
@rueian
Copy link
Collaborator

rueian commented Mar 14, 2024

Hi @proost, thanks! This looks good to me. I also made a small benchmark between the latest commit and the previous commit d118842:

goos: darwin
goarch: arm64
pkg: cmd/bloom
    │ d118842c3649    │        feat-rueidisprob         │
    │     sec/op      │   sec/op     vs base            │
-10       48.08µ ± 0%   48.12µ ± 0%  ~ (p=0.183 n=20)

    │ d118842c3649    │            feat-rueidisprob            │
    │      B/op       │     B/op      vs base                  │
-10      8.749Ki ± 1%   3.621Ki ± 0%  -58.61% (p=0.000 n=20)

    │ d118842c3649     │           feat-rueidisprob           │
    │    allocs/op     │ allocs/op   vs base                  │
-10        90.00 ± 0%   74.00 ± 0%  -17.78% (p=0.000 n=20)

The CPU loading of the Redis server in these two cases was both ~97% and ~98%. The latest feat-rueidisprob had slightly heavy CPU loading on the Redis, but it had -58.61% and -17.78% reduction in B/op and allocs/op.

Benchmark Code:

package main

import (
	"context"
	"math/rand/v2"
	"strconv"
	"testing"

	"github.com/redis/rueidis"
	"github.com/redis/rueidis/rueidisprob"
)

func Benchmark(b *testing.B) {
	c, err := rueidis.NewClient(rueidis.ClientOption{InitAddress: []string{"127.0.0.1:6379"}})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	bf, err := rueidisprob.NewBloomFilter(c, "bf", 1000000, 0.01)
	if err != nil {
		panic(err)
	}

	if err := bf.Add(context.Background(), "1"); err != nil {
		panic(err)
	}

	keys := make([]string, 10)
	for i := range keys {
		keys[i] = strconv.Itoa(rand.IntN(10000))
	}

	b.ReportAllocs()
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			if _, err := bf.ExistsMulti(context.Background(), keys); err != nil {
				panic(err)
			}
		}
	})
	b.StopTimer()
}

@rueian
Copy link
Collaborator

rueian commented Mar 14, 2024

But the tests failed for unknown reasons. Maybe some tests use too much memory causing redis crashes on the circleci.

@proost
Copy link
Contributor Author

proost commented Mar 15, 2024

@rueian
a2df9bb
I want to add this package to main README file. is it ok?

@rueian
Copy link
Collaborator

rueian commented Mar 15, 2024

@rueian a2df9bb I want to add this package to main README file. is it ok?

Sure. Please go ahead.

Copy link
Collaborator

@rueian rueian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that test failures may be caused by not yet initialized redis cluster. Adding a sleep 5 after docker-compose up -d in the dockertest.sh could be helpful.

cd $CIRCLE_WORKING_DIRECTORY/rueidisprob
list=$(go list ./... | circleci tests split --split-by=timings)
echo "Test Packages: $list"
for n in {1..5}; do ../dockertest.sh $list && break; done
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that test failures may be caused by not yet initialized redis cluster. Adding a sleep 5 after docker-compose up -d in the dockertest.sh could be helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cfcfad5

Yes. some time is needed to cluster configuration. deterministic approach is required, later we can do it.

@proost proost requested a review from rueian March 16, 2024 08:00
Copy link
Collaborator

@rueian rueian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@proost LGTM. Thank you for this great work!

@rueian rueian merged commit 236216b into redis:main Mar 16, 2024
1 check passed
@proost
Copy link
Contributor Author

proost commented Mar 17, 2024

@rueian
Thank you for accepting proposal! I will send Counting Bloom Filter Design to Discussion soon.

@proost proost deleted the feat-rueidisprob branch March 17, 2024 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants