feat: rueidisprob package and standard bloom filter #493

proost · 2024-03-10T06:16:17Z

discussion: #486

standard bloom filter for rueidisprob

codecov-commenter · 2024-03-10T06:32:38Z

Codecov Report

Attention: Patch coverage is 97.31544% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 95.51%. Comparing base (d83782c) to head (cfcfad5).
Report is 6 commits behind head on main.

Files	Patch %	Lines
rueidisprob/bloomfilter.go	97.22%	3 Missing and 1 partial ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@           Coverage Diff            @@
##             main     #493    +/-   ##
========================================
  Coverage   95.50%   95.51%            
========================================
  Files          78       80     +2     
  Lines       32941    33090   +149     
========================================
+ Hits        31460    31605   +145     
- Misses       1275     1278     +3     
- Partials      206      207     +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

rueian · 2024-03-10T07:28:38Z

Hi @proost,

There are a couple of things we can address and discuss about:

Please add a test entry to the .circleci/config.yml, for example

rueidis/.circleci/config.yml

Lines 61 to 67 in b08dc62

    
                 - run: # test ./rueidisotel/go.mod 
        
                     command: | 
        
                       cd $CIRCLE_WORKING_DIRECTORY/rueidisotel 
        
                       list=$(go list ./... | circleci tests split --split-by=timings) 
        
                       echo "Test Packages: $list" 
        
                       for n in {1..5}; do ../dockertest.sh $list && break; done 
        
                     no_output_timeout: 15m

The filter key and the counter key should be located in the same Redis slot by using a tag so that these Lua scripts can work in cluster mode.
The current implementation increments the counter no matter what, but that is inaccurate. Can we check the SETBIT results before incrementing the count? In other words, if the sum of the return values of SETBIT equals to the number of hash iterations, we should not increment the counter.
We use SETBIT one by one for each index. Can we make it a batch for a key with BITFIELD + multiple SETs?
We should not use EVAL every time. That hurts performance. Can we leverage the rueidis.NewLuaScript?

proost · 2024-03-10T11:25:22Z

@rueian
Yes, you're right. I was wrong many points. i fixed it.

rueidisprob/bloomfilter.go

rueian · 2024-03-10T12:42:45Z

rueidisprob/bloomfilter.go

+	table.insert(bitfieldArgs, index)
+end
+
+return redis.call('BITFIELD', unpack(bitfieldArgs))


I am curious about whether would it be better if we calculate key existences in Lua and return just a bool array instead of returning every bits? I am not sure which one will be better.

there is trade-off. if we calculate on redis, we should use more CPU, but it can reduce bandwidth per request. In this situation, I prefer use less CPU. because bit array size is small event if keys and hash iteration size is large. but if you think bandwidth is more valuable than CPU, i will change it.

We can leave this for future evaluation. I may be wrong, but my guess is that the calculation on redis is cheap because it just loops through the return of BITFIELD and sums up bits for every hash iteration. If we do that, we can save the bandwidth by hash iteration times. This requires more evaluation for verification.

I also have another idea: Can we use a normal GET to fetch the whole bitmap and do the calculation on the client side? By that, we can leverage client-side caching.

7491a90

OK. I change it.

Still, I don't think it is good to depend on Lua script heavily. scaling out servers is easy, but scaling out redis is not easy.
I found another implementation of bloom filter using redis: https://github.com/igrigorik/bloomfilter-rb

rueidisprob/bloomfilter.go

rueian · 2024-03-11T15:54:04Z

rueidisprob/bloomfilter.go

+		if oneBits == hashIterations then
+			table.insert(result, 1)
+		else
+			table.insert(result, 0)
+		end


Suggested change

if oneBits == hashIterations then

table.insert(result, 1)

else

table.insert(result, 0)

end

table.insert(result, oneBits == hashIterations)

lua false is nil in redis. It can be confused.

Returning Nils is actually good I think. It requires fewer bytes to transmit. A nil message takes 3 bytes to transmit, but an integer 0 takes 4 bytes.

56f7d9d. I changed it.

rueidisprob/bloomfilter.go

rueian · 2024-03-12T16:24:18Z

rueidisprob/bloomfilter.go

+func (c *bloomFilter) indexes(keys []string) []string {
+	allIndexes := make([]string, 0, len(keys)*int(c.hashIterations))
+	for _, key := range keys {
+		indices := indexes([]byte(key), c.hashIterations, c.size)


Can we reuse the indices for every key to save some allocations? Or how about letting the indexes function write to the allIndexes directly?

rueidisprob/go.mod

rueian · 2024-03-14T16:00:32Z

Hi @proost, thanks! This looks good to me. I also made a small benchmark between the latest commit and the previous commit d118842:

goos: darwin
goarch: arm64
pkg: cmd/bloom
    │ d118842c3649    │        feat-rueidisprob         │
    │     sec/op      │   sec/op     vs base            │
-10       48.08µ ± 0%   48.12µ ± 0%  ~ (p=0.183 n=20)

    │ d118842c3649    │            feat-rueidisprob            │
    │      B/op       │     B/op      vs base                  │
-10      8.749Ki ± 1%   3.621Ki ± 0%  -58.61% (p=0.000 n=20)

    │ d118842c3649     │           feat-rueidisprob           │
    │    allocs/op     │ allocs/op   vs base                  │
-10        90.00 ± 0%   74.00 ± 0%  -17.78% (p=0.000 n=20)

The CPU loading of the Redis server in these two cases was both ~97% and ~98%. The latest feat-rueidisprob had slightly heavy CPU loading on the Redis, but it had -58.61% and -17.78% reduction in B/op and allocs/op.

Benchmark Code:

package main

import (
	"context"
	"math/rand/v2"
	"strconv"
	"testing"

	"github.com/redis/rueidis"
	"github.com/redis/rueidis/rueidisprob"
)

func Benchmark(b *testing.B) {
	c, err := rueidis.NewClient(rueidis.ClientOption{InitAddress: []string{"127.0.0.1:6379"}})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	bf, err := rueidisprob.NewBloomFilter(c, "bf", 1000000, 0.01)
	if err != nil {
		panic(err)
	}

	if err := bf.Add(context.Background(), "1"); err != nil {
		panic(err)
	}

	keys := make([]string, 10)
	for i := range keys {
		keys[i] = strconv.Itoa(rand.IntN(10000))
	}

	b.ReportAllocs()
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			if _, err := bf.ExistsMulti(context.Background(), keys); err != nil {
				panic(err)
			}
		}
	})
	b.StopTimer()
}

rueian · 2024-03-14T16:25:27Z

But the tests failed for unknown reasons. Maybe some tests use too much memory causing redis crashes on the circleci.

proost · 2024-03-15T15:14:00Z

@rueian
a2df9bb
I want to add this package to main README file. is it ok?

rueian · 2024-03-15T16:35:49Z

@rueian a2df9bb I want to add this package to main README file. is it ok?

Sure. Please go ahead.

rueian

I found that test failures may be caused by not yet initialized redis cluster. Adding a sleep 5 after docker-compose up -d in the dockertest.sh could be helpful.

rueian · 2024-03-15T16:59:14Z

.circleci/config.yml

+            cd $CIRCLE_WORKING_DIRECTORY/rueidisprob
+            list=$(go list ./... | circleci tests split --split-by=timings)
+            echo "Test Packages: $list"
+            for n in {1..5}; do ../dockertest.sh $list && break; done


I found that test failures may be caused by not yet initialized redis cluster. Adding a sleep 5 after docker-compose up -d in the dockertest.sh could be helpful.

cfcfad5

Yes. some time is needed to cluster configuration. deterministic approach is required, later we can do it.

rueian

@proost LGTM. Thank you for this great work!

proost · 2024-03-17T09:14:16Z

@rueian
Thank you for accepting proposal! I will send Counting Bloom Filter Design to Discussion soon.

feat: rueidisprob

0b3f688

proost added 3 commits March 10, 2024 16:58

ci: update rueidisprob

8ccce64

fix: apply hash tag

d825f12

fix: wrong operations

1d6e25c

rueian reviewed Mar 10, 2024

View reviewed changes

rueidisprob/bloomfilter.go Outdated Show resolved Hide resolved

rueian reviewed Mar 10, 2024

View reviewed changes

rueidisprob/bloomfilter.go Show resolved Hide resolved

rueian reviewed Mar 10, 2024

View reviewed changes

rueidisprob/bloomfilter.go Outdated Show resolved Hide resolved

rueian reviewed Mar 10, 2024

View reviewed changes

proost added 2 commits March 10, 2024 23:10

perf: shorter Lua script

299d097

refactor: error when rate is zero

e7cd23c

proost requested a review from rueian March 10, 2024 14:23

proost added 2 commits March 11, 2024 00:15

test: fix wrong test

d118842

refactor: use lua script to calculate exists count

7491a90

rueian reviewed Mar 11, 2024

View reviewed changes

proost added 2 commits March 12, 2024 22:53

perf: use allocation once

8941769

perf: use ToArray

d037dad

proost requested a review from rueian March 12, 2024 14:22

rueian reviewed Mar 12, 2024

View reviewed changes

rueian reviewed Mar 13, 2024

View reviewed changes

rueidisprob/go.mod Outdated Show resolved Hide resolved

rueidisprob/go.mod Outdated Show resolved Hide resolved

proost added 3 commits March 15, 2024 00:04

perf: use bool

56f7d9d

perf: less allocations

56fc836

chore: change to go version lower

4ed476f

proost requested a review from rueian March 14, 2024 15:11

rueian added the feature label Mar 14, 2024

test: unify setup & cleanup

8330e05

doc: add redisprob to main README

a2df9bb

proost added 3 commits March 16, 2024 00:32

test: add ping during setup

4b4214e

test: make higher rate

bccfee7

test: add close

4d2563d

test: bigger than 32

c78bd00

rueian reviewed Mar 15, 2024

View reviewed changes

ci: add sleep to wait for redis cluster initialization

cfcfad5

proost requested a review from rueian March 16, 2024 08:00

rueian approved these changes Mar 16, 2024

View reviewed changes

rueian merged commit 236216b into redis:main Mar 16, 2024
1 check passed

proost deleted the feat-rueidisprob branch March 17, 2024 09:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: rueidisprob package and standard bloom filter #493

feat: rueidisprob package and standard bloom filter #493

proost commented Mar 10, 2024

codecov-commenter commented Mar 10, 2024 •

edited

Loading

rueian commented Mar 10, 2024

proost commented Mar 10, 2024 •

edited

Loading

rueian Mar 10, 2024 •

edited

Loading

proost Mar 10, 2024

rueian Mar 10, 2024

proost Mar 11, 2024

rueian Mar 11, 2024

proost Mar 12, 2024

rueian Mar 12, 2024

proost Mar 14, 2024

rueian Mar 12, 2024

proost Mar 14, 2024

rueian commented Mar 14, 2024

rueian commented Mar 14, 2024

proost commented Mar 15, 2024

rueian commented Mar 15, 2024

rueian left a comment •

edited

Loading

rueian Mar 15, 2024

proost Mar 16, 2024

rueian left a comment

proost commented Mar 17, 2024

feat: rueidisprob package and standard bloom filter #493

feat: rueidisprob package and standard bloom filter #493

Conversation

proost commented Mar 10, 2024

codecov-commenter commented Mar 10, 2024 • edited Loading

Codecov Report

rueian commented Mar 10, 2024

proost commented Mar 10, 2024 • edited Loading

rueian Mar 10, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rueian commented Mar 14, 2024

rueian commented Mar 14, 2024

proost commented Mar 15, 2024

rueian commented Mar 15, 2024

rueian left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rueian left a comment

Choose a reason for hiding this comment

proost commented Mar 17, 2024

codecov-commenter commented Mar 10, 2024 •

edited

Loading

proost commented Mar 10, 2024 •

edited

Loading

rueian Mar 10, 2024 •

edited

Loading

rueian left a comment •

edited

Loading