Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow performance populating tree #48

Open
phemmer opened this issue Feb 27, 2023 · 11 comments
Open

Slow performance populating tree #48

phemmer opened this issue Feb 27, 2023 · 11 comments

Comments

@phemmer
Copy link

phemmer commented Feb 27, 2023

I was looking at various modules that can provide a fast lookup for whether an IP is in a list, and while testing this module, it turned out to have significantly slower load time compared to the other alternatives I looked at. The lookup is indeed faster, but not enough to justify the increased load time, which for me is prohibitively slow (20x slower than the fastest alternative, and takes over a minute to load a list of about 5 million IPs). I just wanted to create this issue so you're aware, and in case there's something which can be done to increase performance.

Benchmark code
package main

import (
	"encoding/binary"
	"math/rand"
	"net"
	"strconv"
	"testing"

	"github.com/asergeyev/nradix"
	"github.com/infobloxopen/go-trees/iptree"
	"github.com/yl2chen/cidranger"
)


var rng = rand.New(rand.NewSource(0))

func randIP(bits int) net.IP {
	var ipa [4]byte
	ip := net.IP(ipa[:])
	binary.BigEndian.PutUint32(ip[:], uint32(rng.Intn(1<<bits)<<(32-bits)))
	return ip
}

var IPs []string
func init() {
	for len(IPs) < 100000 {
		ip := randIP(24)
		mask := strconv.Itoa(rand.Intn(25) + 8)
		IPs = append(IPs, ip.String() + "/" + mask)
	}
}

func BenchmarkRangerLoad(b *testing.B) {
	for n := 0; n < b.N; n++ {
		ranger := cidranger.NewPCTrieRanger()
		for _, ipStr := range IPs {
			_, ipnet, _ := net.ParseCIDR(ipStr)
			ranger.Insert(cidranger.NewBasicRangerEntry(*ipnet))
		}
	}
}

func BenchmarkInfobloxLoad(b *testing.B) {
	for n := 0; n < b.N; n++ {
		ipt := iptree.NewTree()
		for _, ipStr := range IPs {
			_, ipnet, _ := net.ParseCIDR(ipStr)
			ipt.InplaceInsertNet(ipnet, nil)
		}
	}
}

func BenchmarkNRadixLoad(b *testing.B) {
	for n := 0; n < b.N; n++ {
		tree := nradix.NewTree(0)
		for _, ipStr := range IPs {
			tree.AddCIDR(ipStr, nil)
		}
	}
}

func BenchmarkRangerRead(b *testing.B) {
	b.StopTimer()
	ranger := cidranger.NewPCTrieRanger()
	for _, ipStr := range IPs {
		_, ipnet, _ := net.ParseCIDR(ipStr)
		ranger.Insert(cidranger.NewBasicRangerEntry(*ipnet))
	}
	b.StartTimer()

	for n := 0; n < b.N; n++ {
		ranger.Contains(randIP(32))
	}
}

func BenchmarkInfobloxRead(b *testing.B) {
	b.StopTimer()
	ipt := iptree.NewTree()
	for _, ipStr := range IPs {
		_, ipnet, _ := net.ParseCIDR(ipStr)
		ipt.InplaceInsertNet(ipnet, nil)
	}
	b.StartTimer()

	for n := 0; n < b.N; n++ {
		ipt.GetByIP(randIP(32))
	}
}

func BenchmarkNRadixRead(b *testing.B) {
	b.StopTimer()
	tree := nradix.NewTree(0)
	for _, ipStr := range IPs {
		tree.AddCIDR(ipStr, nil)
	}
	b.StartTimer()

	for n := 0; n < b.N; n++ {
		ip := randIP(32)
		tree.FindCIDR(ip.String() + "/32")
	}
}
BenchmarkRangerLoad-16      	       2	 797361462 ns/op
BenchmarkInfobloxLoad-16    	      14	  88521051 ns/op
BenchmarkNRadixLoad-16      	      27	  40973037 ns/op
BenchmarkRangerRead-16      	 8325171	       156.6 ns/op
BenchmarkInfobloxRead-16    	 2999112	       450.3 ns/op
BenchmarkNRadixRead-16      	 3116860	       362.2 ns/op

 

Edit: Update: In the end I ended up with a private fork that strips out a ton of code from cidranger to get the performance up. Mostly I removed support for all the different address types and used only netip types, as well as adding a loader which uses the last insert point as the starting point to search for a location for the new node. This got load times over 20x faster, and lookups 1.5x faster.
https://gist.github.com/phemmer/6231b12d5207ea93a1690ddc44a2c811

@AmitThakkar
Copy link

@phemmer What are the other alternatives you found?

@phemmer
Copy link
Author

phemmer commented Mar 7, 2024

They're listed in the benchmark code on the issue description.

@ldkingvivi
Copy link
Contributor

ldkingvivi commented Mar 23, 2024

@phemmer Find func may not seem do what you want, it doesn't return most specific result. Try two entry in trie like 8.8.8.0/24 and 8.8.8.0/25 each with some value, and Find "8.8.8.8", it will show up /24 value instead of /25 value.

I think the original fn just Contain, which return bool and doesn't matter if that's more specific or not, and your Find fn try to return most specific entry(smallest prefix), which require follow the children path

maybe change the find fn to

func (p *Trie) find(number netip.Addr) any {
	if !netContains(p.network, number) {
		return nil
	}

	if p.network.Bits() == 128 {
		return nil
	}
	bit := p.discriminatorBitFromIP(number)
	child := p.children[bit]
	if child != nil {
           if r := child.find(number); r != nil{
		return r
           }
        }
    
        if p.value != nil {
		return p.value
	}

	return nil
}

ldkingvivi added a commit to ldkingvivi/cidranger that referenced this issue Mar 23, 2024
@gaby
Copy link

gaby commented Mar 24, 2024

@phemmer I'm guessing your gist is under MIT License, since it's based on cidranger. I can develop/publish a go module with this code, tests, benchmarks, docs and give you credit in the README. Let me know if that is fine with you.

@gaby
Copy link

gaby commented Mar 25, 2024

@ldkingvivi Seems like you integrated those changes already and added even more functionality. The Contains() bool function is very useful!

@phemmer
Copy link
Author

phemmer commented Mar 25, 2024

Yes, you may do what you want with it.

Also as a side note, I'd encourage discussion on the gist itself, as it's linked to from places other than here.

I also updated it to address the issue regarding find(). Solution was basically the same as what @ldkingvivi proposed.
I also included another fix when matching against v4 /32 or v6 /128 addresses.

@phemmer
Copy link
Author

phemmer commented Mar 26, 2024

Since there seems to be moderate interest in my implementation, I've published it as a full package here: https://github.com/phemmer/go-iptrie/

@gaby
Copy link

gaby commented Mar 26, 2024

@phemmer Awesome, from my local benchmarks it seems the changes made by @ldkingvivi make it even faster.

See here: https://github.com/ldkingvivi/cidranger/blob/master/iptire/trie.go

@ldkingvivi
Copy link
Contributor

ldkingvivi commented Mar 26, 2024

@phemmer my change was mainly adding Entry with generic, so containing and cover networking will all return something instead of pure prefix, and of course some tests and benchmark using existing way how cidrange tested/benchmakred. Do you want me create a PR to your repo? I will likely have time during weekend.

ldkingvivi@d5fcd23

@phemmer
Copy link
Author

phemmer commented Mar 26, 2024

@phemmer my change was mainly adding Entry with generic

I had considered using generics, but stuck with any types as the usage is simpler (in some scenarios). I think using generics is a valid design that some might prefer, but I think it should likely be a new package entirely. It's possible one might be able to implement a design using any types on top of a generics implementation. But I didn't feel the effort was justified. If you do have a solid design that still allows both to be used, then it might be worth considering.

containing and cover networking will all return something instead of pure prefix

The Contains method stops searching on the first (largest) network it finds. Thus it's not going to be have access to the data from the most specific network (largest prefix) entry. If you do want the data from the largest network that matches, you can use the FindLargest method instead. I don't want to change Contains to be anything other than a boolean.

Having CoveredNetworks return the data, or something which allows obtaining both the data and the network might be valid. However I feel like a good design that allows retrieving the data along with the network itself would be rather complicated. Though I could be wrong, as I may not have considered all possibilities. I'm instead considering ripping out CoveredNetworks entirely. It's there because it was in cidranger. However I don't know of any valid use cases for it (if you have one, I'd love to hear it). So rather than expand its capability, I'm more inclined to remove it until someone actually needs it. I would rather provide minimal functionality, and expand it in the future, than have to maintain useless functionality in perpetuity.

and of course some tests and benchmark using existing way how cidrange tested/benchmakred

I omitted tests from my gist because gists are meant to be concise, not because they didn't exist. They exist, and were published in the package.

As far as benchmarks, the package also contains a benchmarking suite. If you don't feel the benchmarks are meaningful, I'd be willing to consider additional scenarios. I do know that the random networks aren't likely representative of real-world scenarios, as it creates too many networks which are too large. I've considered solving that using a curve to make smaller prefixes more rare. But I don't think it'd change the overall results significantly, so I haven't bothered.

Do you want me create a PR to your repo? I will likely have time during weekend.

If you like, though I'd recommend proposing specific changes for discussion first. The proposals should be in that repo as well, as this one should be for discussions about cidranger only, and I feel we've hijacked it enough.

@ldkingvivi
Copy link
Contributor

ldkingvivi commented Mar 26, 2024

I didn't referring to Contains, but ContainingNetworks. Also for the comments about tests and benchmark, I just simply mentioned the fact what I added, nothing criticize your gist. At this point, I will just continue use my branch directly, since I do have use case to fetch data other than just prefix, as well as generic for my use case.

One thing I agree, we probably hijacked this place enough, I will stop doing that now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants