octodns code logic slow when dealing with large number of domains/zones? #1166

Open
quistian opened this issue Apr 26, 2024 · 9 comments

@quistian

I've recently set up a scenario in which I have one authoritative name server with over 50,000 zones.
(It's currently running on BlueCat Integrity 9.4.) I tried syncing one of the source zones to another destination (I tried both the local YAML driver and another BlueCat server). In both cases, the command:

$ octodns-sync --config ./config/bc2yaml.yml 278.privatelink.postgres.database.azure.com. --doit

command took well over 5 minutes to complete.

I was wondering whether, by the nature of octoDNS's logic and code, things slow down considerably given N zones, where N is a very large number (relatively speaking)?

@brianeclow
Contributor

tl;dr: I have found this as well on the 1k zones I manage. I added logic to the plan/apply step that limits the zones applied to those with requested changes within a given zone set.

I manage around 1k zones across multiple providers with some cross-synchronization, and I only run the "full" synchronization via an automated sweep to "true up" all the zones and all providers. I split zone synchronization along usage and need boundaries, then only plan/apply changes when a zone within a given boundary has been altered. This more closely mirrors how BIND can either reload everything or, if you indicate a zone, only update that zone.
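A minimal sketch of that "only sync what changed" idea, assuming zones live in a YamlProvider-style directory with one `<zone>.yaml` file per zone and that the changed paths come from something like `git diff --name-only`. The helper `zones_from_changed_files` is hypothetical; its output could be passed as positional zone arguments to `octodns-sync`.

```python
from pathlib import PurePosixPath


def zones_from_changed_files(changed_paths, zone_dir):
    """Map changed zone files to octoDNS zone names (hypothetical helper).

    Assumes one file per zone named '<zone name>.yaml', e.g.
    'config/zones/example.com.yaml' -> 'example.com.'.
    """
    zone_dir = PurePosixPath(zone_dir)
    zones = set()
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # Ignore anything outside the zone directory or that isn't YAML.
        if path.parent != zone_dir or path.suffix != '.yaml':
            continue
        # Strip '.yaml' and add the trailing dot octoDNS uses for zone names.
        zones.add(path.stem + '.')
    return sorted(zones)


changed = ['config/zones/example.com.yaml', 'README.md',
           'config/zones/example.org.yaml']
print(zones_from_changed_files(changed, 'config/zones'))
# ['example.com.', 'example.org.']
```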

@ross
Contributor

ross commented Apr 26, 2024

What does bc2yaml.yml look like? Does it list all 50k zones explicitly, or does it configure the zones dynamically, sourcing them from BlueCat?

The planning phase of things has the ability to run in parallel, but the initial configuration and apply phases are serial.

The largest setup I personally ran was 100s of zones. I think I did some testing in the distant past with 1000+, but I don't remember the details, and at that point there weren't major problems with whatever it was that led me to do that testing.

At GitHub we had a single large config file, but split up our runs by target using the --target option. That was less about reducing the number of zones in play and more about completely parallelizing the process of syncing data to lots of places, internally and externally.
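A rough sketch of splitting runs by target, assuming `octodns-sync` accepts `--config-file` (as used later in this thread) and the `--target` option mentioned above. The helper names and the target names in the usage note are hypothetical placeholders; each target gets its own process so the runs proceed in parallel.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_sync_commands(config_file, targets, doit=False):
    """Build one octodns-sync invocation per target (hypothetical helper)."""
    commands = []
    for target in targets:
        cmd = ['octodns-sync', f'--config-file={config_file}',
               f'--target={target}']
        if doit:
            cmd.append('--doit')
        commands.append(cmd)
    return commands


def run_all(commands, max_parallel=4):
    """Run each command in its own process; return exit codes in input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode,
                             commands))
```

For example, `run_all(build_sync_commands('config/production.yaml', ['internal', 'external'], doit=True))` would start one sync per target. Threads are enough here because each one just waits on its child process.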

Anyway, I can poke around with some synthetic local tests at large scale, but I can't do so with real providers since they cost $, and at such large numbers, lots of $.

@quistian
Author

bc2yaml.yml has a dynamic configuration. I assume things would be sped up if the zones were listed statically in the config file?

@ross
Contributor

ross commented Apr 26, 2024

To be honest, I don't know, especially since I'm not familiar with BlueCat. Transferring 50k zone names out of an API wouldn't be fast, but it doesn't seem like it should take 5m either.

I probably won't get to sit down and try things until tomorrow. I'll know more then.
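One way to try the static option is to list only the zone(s) being synced explicitly, so nothing has to be enumerated from the source first. Below is a hypothetical helper that renders such a `zones:` section as text; `bluecat` and `yaml` are placeholder provider names. Whether this actually speeds things up hasn't been tested here.

```python
def render_static_zones(zone_names, sources, targets):
    """Render an octoDNS-style 'zones:' section listing each zone explicitly."""
    lines = ['zones:']
    for name in zone_names:
        if not name.endswith('.'):
            name += '.'  # octoDNS zone names carry a trailing dot
        lines.append(f'  {name}:')
        lines.append('    sources:')
        lines.extend(f'      - {source}' for source in sources)
        lines.append('    targets:')
        lines.extend(f'      - {target}' for target in targets)
    return '\n'.join(lines) + '\n'


print(render_static_zones(['278.privatelink.postgres.database.azure.com'],
                          ['bluecat'], ['yaml']))
```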

@quistian
Author

Thanks... Let me know what you find.

@ross
Copy link
Contributor

ross commented Apr 26, 2024

OK, first up I created a generator script to set up a config for large YAML-to-YAML syncs:

https://gist.github.com/ross/0ff009f4b558921ec034063ff4cc1100

That creates 1000 YAML files and a dynamic config for them. As you can see below, the run takes only 4 seconds to complete, and that's loading all 1000 zones (4 records each), computing the plans, and then applying them by writing out 1000 output YAML files.

(env) coho:octodns ross$ rm -rf tmp/in/ tmp/out/ && ./tmp/gen-zones.py 1000 dynamic
(env) coho:octodns ross$ cat tmp/large.yaml

manager:
  max_workers: 1

providers:
  in:
    class: octodns.provider.yaml.YamlProvider
    directory: tmp/in

  out:
    class: octodns.provider.yaml.YamlProvider
    directory: tmp/out

zones:
  '*':
    sources:
      - in
    targets:
      - out
(env) coho:octodns ross$ cat tmp/in/0d5cbb8e98c546bf8ec78de83e6e0fe5.com.yaml
---
? ''
: - type: A
    value: 1.2.3.4
  - type: NS
    values:
      - ns1.0d5cbb8e98c546bf8ec78de83e6e0fe5.com.
      - ns2.0d5cbb8e98c546bf8ec78de83e6e0fe5.com.

ns1:
  type: A
  value: 2.3.4.5
ns2:
  type: A
  value: 3.4.5.6
(env) coho:octodns ross$ PYTHONPATH=. ./octodns/cmds/sync.py --config-file=tmp/large.yaml --doit
2024-04-26T14:28:05  [140704294879488] INFO  Manager __init__: config_file=tmp/large.yaml, (octoDNS 1.6.1)
2024-04-26T14:28:05  [140704294879488] INFO  Manager _config_executor: max_workers=1
2024-04-26T14:28:05  [140704294879488] INFO  Manager _config_include_meta: include_meta=False
2024-04-26T14:28:05  [140704294879488] INFO  Manager _config_enable_checksum: enable_checksum=False
2024-04-26T14:28:05  [140704294879488] INFO  Manager _config_auto_arpa: auto_arpa=False
2024-04-26T14:28:05  [140704294879488] INFO  Manager __init__: global_processors=[]
2024-04-26T14:28:05  [140704294879488] INFO  Manager __init__: global_post_processors=[]
2024-04-26T14:28:05  [140704294879488] INFO  Manager __init__: provider=in (octodns.provider.yaml 1.6.1)
2024-04-26T14:28:05  [140704294879488] INFO  Manager __init__: provider=out (octodns.provider.yaml 1.6.1)
2024-04-26T14:28:05  [140704294879488] INFO  Manager sync: eligible_zones=[], eligible_targets=[], dry_run=False, force=False, plan_output_fh=<stdout>, checksum=None
2024-04-26T14:28:05  [140704294879488] INFO  Manager sync:     sources=['in']
2024-04-26T14:28:05  [140704294879488] INFO  Manager sync:   dynamic zone=*, sources=None
2024-04-26T14:28:05  [140704294879488] INFO  Manager sync:     adding dynamic zone=010b200966e24e359de74952bef1ddeb.com.
2024-04-26T14:28:05  [140704294879488] INFO  Manager sync:     adding dynamic zone=014f379ef9524d7692ec52809e41dd91.com.
2024-04-26T14:28:05  [140704294879488] INFO  Manager sync:     adding dynamic zone=0180a5b725604e99bf61534c87e871d6.com.
...
2024-04-26T14:28:09  [140704294879488] INFO  YamlProvider[out] apply: making 4 changes to fee1a62c67c243239e664fa7eca38e97.com.
2024-04-26T14:28:10  [140704294879488] INFO  YamlProvider[out] apply: making 4 changes to ff60e2b7f98f4a248c5e26a5deee64db.com.
2024-04-26T14:28:10  [140704294879488] INFO  YamlProvider[out] apply: making 4 changes to ffa3929fe3c14cdba1aafb7fe624921f.com.
2024-04-26T14:28:10  [140704294879488] INFO  Manager sync:   4000 total changes
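The generator script in the gist isn't reproduced here. As a rough stand-in (not the gist's code), this sketch writes N random `.com` zones in the same shape as the file shown above, plus a dynamic config like `tmp/large.yaml`. The function name `generate` is hypothetical.

```python
import uuid
from pathlib import Path

# Same record layout as the sample zone file above.
ZONE_TEMPLATE = """---
? ''
: - type: A
    value: 1.2.3.4
  - type: NS
    values:
      - ns1.{zone}.
      - ns2.{zone}.

ns1:
  type: A
  value: 2.3.4.5
ns2:
  type: A
  value: 3.4.5.6
"""

CONFIG_TEMPLATE = """manager:
  max_workers: 1

providers:
  in:
    class: octodns.provider.yaml.YamlProvider
    directory: {base}/in

  out:
    class: octodns.provider.yaml.YamlProvider
    directory: {base}/out

zones:
  '*':
    sources:
      - in
    targets:
      - out
"""


def generate(base, count):
    """Write `count` random .com zones plus a dynamic config under `base`."""
    base = Path(base)
    in_dir = base / 'in'
    in_dir.mkdir(parents=True, exist_ok=True)
    (base / 'out').mkdir(exist_ok=True)
    for _ in range(count):
        zone = f'{uuid.uuid4().hex}.com'
        (in_dir / f'{zone}.yaml').write_text(ZONE_TEMPLATE.format(zone=zone))
    (base / 'large.yaml').write_text(CONFIG_TEMPLATE.format(base=base))
```

Calling `generate('tmp', 1000)` would produce a layout comparable to the one used in the run above.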

Extrapolating to 50k zones, we'd expect 4 × 50 = 200 seconds, which is a bit over 3 minutes. BUT that's doing a full sync and creating ALL the zones.

A 10k run takes about 40s so that math seems to hold.

(env) coho:octodns ross$ date; PYTHONPATH=. ./octodns/cmds/sync.py --config-file=tmp/large.yaml --doit > /tmp/run.log 2>&1; date
Fri Apr 26 14:37:50 PDT 2024
Fri Apr 26 14:38:27 PDT 2024
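The extrapolation above is a simple linear model: time per zone stays constant as the zone count grows. As arithmetic (the function name is just for illustration):

```python
def extrapolate_seconds(measured_seconds, measured_zones, target_zones):
    """Linear estimate; assumes per-zone cost doesn't change with N."""
    return measured_seconds * target_zones / measured_zones


print(extrapolate_seconds(4, 1_000, 10_000))  # 40.0 (the 10k run took ~37s)
print(extrapolate_seconds(4, 1_000, 50_000))  # 200.0 (a bit over 3 minutes)
```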

If, however, I specify a single zone to sync, as you did in the original issue, it only takes about 5s to run:

2024-04-26T14:40:29  [140704294879488] INFO  Manager __init__: config_file=tmp/large.yaml, (octoDNS 1.6.1)
...
2024-04-26T14:40:34  [140704294879488] INFO  Manager sync:   4 total changes

Note this is with the default manager.max_workers=1, so everything is done completely serially, but with the local YamlProvider that won't actually make a difference (it'll actually slow things down a bit because of the context switches). With a remote API there's real waiting/blocking on responses, so parallelism can speed up the planning process immensely, but that only applies to a full sync; if you're specifying a single domain it shouldn't matter.
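A toy model of why extra workers help when the time is spent waiting on a remote API. This is not octoDNS's executor: `fake_populate` is a hypothetical stand-in that just sleeps to simulate network latency, and the thread pool plays the role of `max_workers`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_populate(zone, latency=0.02):
    """Stand-in for a provider call that mostly waits on a remote API."""
    time.sleep(latency)
    return zone


def plan_all(zones, max_workers):
    """Run fake_populate for every zone; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order regardless of completion order.
        results = list(pool.map(fake_populate, zones))
    return results, time.perf_counter() - start


zones = [f'zone{i}.com.' for i in range(16)]
_, serial = plan_all(zones, max_workers=1)
_, parallel = plan_all(zones, max_workers=8)
print(f'1 worker: {serial:.2f}s, 8 workers: {parallel:.2f}s')
```

With purely local, CPU-bound work (like the YamlProvider run above) there is no waiting to overlap, which is why extra workers don't help there.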

Conclusion

I'd expect a full sync of 50k zones to take 2-3m plus whatever time is added for IO to the remote provider's API. max_workers can help keep that as close to the 2-3m as possible, but it's not going to drop under that for such a large number of zones.

However, you appear to only be syncing a single zone, and that should be really fast. The only thing I can think of that would be slow in that case is dynamically pulling the list of 50k zones back from BlueCat, since that has to happen first, before it can find the one you specified and (quickly) proceed to process it while ignoring the others.
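A simplified model of that ordering (not octoDNS's Manager code): a dynamic `'*'` entry is expanded by listing every zone from the source, and only afterwards is the result filtered down to the zone named on the command line. The function `select_zones` is hypothetical.

```python
def select_zones(config_zones, list_zones, eligible_zones):
    """Toy model of dynamic zone expansion followed by eligibility filtering.

    config_zones: zone names from the config; '*' marks a dynamic entry.
    list_zones: callable returning every zone name the source knows about.
    eligible_zones: zones named on the command line (empty means all).
    """
    zones = []
    for name in config_zones:
        if name == '*':
            # The full listing happens before any filtering, so its cost is
            # paid even when only one zone was requested.
            zones.extend(list_zones())
        else:
            zones.append(name)
    if eligible_zones:
        zones = [zone for zone in zones if zone in eligible_zones]
    return zones


calls = []


def list_zones():
    calls.append(1)
    return [f'{i}.example.com.' for i in range(50_000)]


print(select_zones(['*'], list_zones, {'278.example.com.'}))
# ['278.example.com.']
print(len(calls))  # 1: all 50k names were fetched to keep just one
```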

@ross
Contributor

ross commented Apr 26, 2024

I was going to peek at the BlueCat provider, but I can't find it in search results or anywhere else. Got a link?

@quistian
Copy link
Author

There's some not quite polished code here:

https://github.com/quistian/octodns-bluecat-v1-api

@ross
Copy link
Contributor

ross commented Apr 27, 2024

🆒

Looks like it's fetching all 50k in one go
https://github.com/quistian/octodns-bluecat-v1-api/blob/407edb3e5ccb82adb342367f8acaad1070cefed4/octodns_bluecat_v1/__init__.py#L394

Do you know how long the call is taking? There's a debug log before & after

https://github.com/quistian/octodns-bluecat-v1-api/blob/407edb3e5ccb82adb342367f8acaad1070cefed4/octodns_bluecat_v1/__init__.py#L297

https://github.com/quistian/octodns-bluecat-v1-api/blob/407edb3e5ccb82adb342367f8acaad1070cefed4/octodns_bluecat_v1/__init__.py#L302

Running things with --debug should give you the timing on it.
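If the existing debug lines aren't enough, a small decorator like this hypothetical `timed` could be wrapped around the provider's zone-listing call to log the elapsed time directly. It's generic Python, not part of octoDNS or the BlueCat provider; `list_all_zones` is a placeholder for the real API call.

```python
import functools
import logging
import time

log = logging.getLogger('timing')


def timed(fn):
    """Log how long each call to `fn` takes, at DEBUG level."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            log.debug('%s took %.3fs', fn.__name__,
                      time.perf_counter() - start)
    return wrapper


logging.basicConfig(level=logging.DEBUG)


@timed
def list_all_zones():
    # Hypothetical stand-in for the BlueCat "list every zone" request.
    time.sleep(0.1)
    return ['example.com.']


list_all_zones()  # logs the elapsed time via the 'timing' logger
```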

It does look like it's probably returning a plain, newline-separated list of 50k domain names, so the transfer should at least be fast, provided the server side services the request quickly.
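For a sense of scale on the client side: if the response really is one name per line (a guess, not confirmed against BlueCat), turning it into octoDNS-style zone names is a single linear pass. The helper `parse_zone_list` is hypothetical.

```python
def parse_zone_list(body):
    """Turn a newline-separated list of domain names into zone names.

    Assumes one name per line; blank lines are skipped and a trailing dot is
    added where missing, since octoDNS zone names end in '.'.
    """
    zones = []
    for line in body.splitlines():
        name = line.strip()
        if not name:
            continue
        zones.append(name if name.endswith('.') else name + '.')
    return zones


print(parse_zone_list('example.com\n\nexample.org.\n'))
# ['example.com.', 'example.org.']
```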

Nothing else really stands out. I don't see anything obvious there that would cause client-side delays in running:

$ octodns-sync --config ./config/bc2yaml.yml 278.privatelink.postgres.database.azure.com. --doit

That seems to be requesting a list of all zone names in a single call (which may or may not take a long time) and then, as populate is run for the specified zone, it seems to make a single call for all the records in that zone.

Conclusion

Nothing stands out. The only thing I can see taking a long time would be the call to get all the zones. No clue though, not having used BlueCat.

I would expect a full sync (no FQDN provided on the command line) to take a long time to run, but a single-zone sync should be reasonably fast, probably limited by how fast the domain list comes back from BlueCat. Probably best to get a solid idea of what's up there first.
