Remove provider checks at startup #5337
base: master
Conversation
Force-pushed from 4d95fd8 to 64963c5
Force-pushed from b486f8a to 1c43770
Force-pushed from 9ad4249 to d6c76f8
Force-pushed from f15c7dd to 002def6
Force-pushed from cdfca0f to c4886a3
Force-pushed from c4886a3 to c9784a8
Force-pushed from c828c7d to d3a0e68
Force-pushed from 808fd97 to ec0306a
Not a full review (it's Friday evening and I am running out of time), but I wanted to provide some feedback on the PR. It feels like this could be simplified quite a bit by restructuring how it works, and changing the code so that it can rely on basic assumptions like 'the store is there and we can talk to it'.

To elaborate a little more on what I mean: my basic intuition is that whatever structs we use today to represent endpoints would now become enums that roughly look like

    enum Endpoint {
        NotReady(details),
        Ready(details),
        // .. maybe other states to help with bookkeeping
    }

and methods on `Endpoint` that, for example, make a request might become fallible, i.e. go from `fn foo(&self) -> Something` to `fn foo(&self) -> Result<Something, Error>`, where `foo` then needs to check for the `NotReady` case, and either transition to the `Ready` case if the endpoint looks ok or return an error if it doesn't. That's a little too simplistic, though, since it involves mutation.

But most of these operations happen on a pool of endpoints, and the manager for that pool, like `ProviderManager`, can mediate the mutation. Instead of calling methods directly on the endpoint, call them on the manager, which can then call `Endpoint::get_ready(self) -> Result<Self, Error>` and hide the complexity of readiness checking, and of going from not ready to ready, from the outside world. In that world, you'd call `foo` on the provider manager, maybe with some hints on how to select an appropriate endpoint, and it would succeed if it could find or make an endpoint ready, and fail if none of them are ready.
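To make the suggestion above concrete, here is a minimal, hypothetical sketch (all names and fields are illustrative, not the actual graph-node types): the `Endpoint` enum carries its state, and the manager mediates the `NotReady` to `Ready` transition so callers only ever see ready endpoints.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Endpoint {
    NotReady { url: String },
    Ready { url: String },
}

impl Endpoint {
    // Transition to Ready if the endpoint passes its check. A real
    // implementation would probe the provider here and could fail.
    fn get_ready(&mut self) -> Result<(), String> {
        if let Endpoint::NotReady { url } = self {
            let url = url.clone();
            *self = Endpoint::Ready { url };
        }
        Ok(())
    }
}

struct ProviderManager {
    endpoints: Vec<Endpoint>,
}

impl ProviderManager {
    // Callers never deal with readiness: the manager tries to make an
    // endpoint ready and fails only if none of them can be.
    fn foo(&mut self) -> Result<String, String> {
        for ep in &mut self.endpoints {
            if ep.get_ready().is_ok() {
                if let Endpoint::Ready { url } = ep {
                    return Ok(url.clone());
                }
            }
        }
        Err("no endpoint could be made ready".to_string())
    }
}
```

In this shape, the rest of the codebase calls `foo` on the manager and never pattern-matches on endpoint state itself.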
chain/ethereum/src/network.rs
Outdated
            manager: ProviderManager::default(),
            call_only_adapters: vec![],
            retest_percent: DEFAULT_ADAPTER_ERROR_RETEST_PERCENT,
        }
    }
}
Why do we need all these default implementations? I think the original one was already a bad idea, and trying to keep up with it just forces us to have more and more basically meaningless default impls. I would always try to make sure that when a struct is constructed, it also has meaningful functionality.

Playing with the code a little, it seems that the associated `Client` type for `Blockchain` needs to be `Default` because `ChainClient::new_firehose` needs a client to pass into `ChainClient::new`. I think this could all be avoided if `new_firehose` returned `Result<Self, ..>` and just barfed if the `firehose_available` check fails; that avoids a misconfiguration leaving us with a dummy rpc endpoint.
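A sketch of that suggestion (struct fields and error type are hypothetical, modeled on the discussion rather than the real `ChainClient`): the constructor is fallible, so a misconfigured chain fails loudly instead of producing a dummy client via `Default`.

```rust
#[derive(Debug)]
struct ChainClient {
    firehose_endpoints: Vec<String>,
}

impl ChainClient {
    // Fallible constructor: refuse to build a client when the
    // firehose_available-style check fails, instead of falling
    // back to a meaningless Default value.
    fn new_firehose(endpoints: Vec<String>) -> Result<Self, String> {
        if endpoints.is_empty() {
            return Err("no firehose endpoints configured".to_string());
        }
        Ok(ChainClient { firehose_endpoints: endpoints })
    }
}
```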
I usually add these kinds of defaults when the setup is not super straightforward, so that anyone using the struct doesn't have to dive into how it works just to use it.
Turns out `new` was almost never used, so I just removed it along with the default for `EthereumNetworkAdapters`.
Nice. In general, it doesn't make sense to have a default implementation if it can't uphold the same assumptions as an object constructed 'the normal way'. Once you've managed to construct an object, there should be invariants that are always true; otherwise, when you read code that uses that object, you always have to wonder whether this is a real one or just a dummy. But it seems in this case that's now moot.
chain/ethereum/src/network.rs
Outdated
    // pub fn remove(&mut self, provider: &str) {
    //     self.manager.get_all_sync(&self.chain_id).unwrap_or_default()
    //         .retain(|adapter| adapter.adapter.provider() != provider);
    // }
Left over from testing?
It was left over from another layer around endpoints that was removed 😛
chain/ethereum/src/network.rs
Outdated
    //         .ok_or(anyhow!("network not supported: {}", &network_name))
    //         .and_then(|adapters| adapters.cheapest_with(requirements))
    //     }
    // }
Is this code still needed?
graph/src/blockchain/mod.rs
Outdated
@@ -510,18 +512,36 @@ impl BlockchainKind {

 /// A collection of blockchains, keyed by `BlockchainKind` and network.
 #[derive(Default, Debug, Clone)]
-pub struct BlockchainMap(HashMap<(BlockchainKind, String), Arc<dyn Any + Send + Sync>>);
+pub struct BlockchainMap(pub HashMap<(BlockchainKind, ChainId), Arc<dyn Any + Send + Sync>>);
Do we need to make this `pub`? Could we get by with adding some accessor method to `BlockchainMap` instead?
Added an `iter` accessor.
graph/src/components/adapter.rs
Outdated
use crate::components::store::{BlockStore as BlockStoreTrait, ChainStore as ChainStoreTrait};

const VALIDATION_ATTEMPT_TTL_SECONDS: i64 = 60 * 5;
It would be better to make that a `Duration`. That'll also save you the `_SECONDS` postfix ;)
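The suggested change, sketched: with `std::time::Duration` the unit lives in the type rather than in the constant's name.

```rust
use std::time::Duration;

// The unit is carried by the type, so no _SECONDS suffix is needed,
// and call sites can't accidentally mix up seconds and milliseconds.
const VALIDATION_ATTEMPT_TTL: Duration = Duration::from_secs(60 * 5);
```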
graph/src/components/adapter.rs
Outdated
            .ok_or_else(|| IdentValidatorError::UnavailableStore(chain_id.clone()))?;
        let store_ident = network_chain.chain_identifier();

        if store_ident == &ChainIdentifier::default() {
This should be something like `store_ident.is_unset()`, and would ideally be implemented by making `ChainIdentifier` an enum that has an `Unset` variant; or, even better, make `ChainStore::chain_identifier` return a `Result<Option<ChainIdentifier>, Error>`. Relying on the convention that the default value means 'not really there' is very brittle.
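One way the enum suggestion could look (the fields here are illustrative, not the actual graph-node definition): "unset" becomes a distinct variant, so the check is explicit instead of a comparison against `Default`.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ChainIdentifier {
    // The ident has never been fetched from an endpoint.
    Unset,
    // A real ident, as reported by a provider.
    Set {
        net_version: String,
        genesis_block_hash: String,
    },
}

impl ChainIdentifier {
    fn is_unset(&self) -> bool {
        matches!(self, ChainIdentifier::Unset)
    }
}
```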
The reason I went with this is basically that the store has no way of expressing "unset": the value won't be null, since the field is not nullable. That means returning an `Option` would just hide this assumption further down in the store. The main issue with that, IMO, is that the store should not know anything about set/unset unless that can be expressed in the data model.

While this is not strictly true, because the store sets the default (since it isn't passed one), I feel the better place for all these decisions is the `IdentValidator`, which would be responsible for all of this. I agree relying on default values is a bit brittle, but in this case it's a complexity trade-off to avoid a migration in a somewhat critical place.
I see two options for addressing that:

- Change the data model to allow `null` in those columns. That then requires that all code handle the possibility that this is `null`.
- Insist that for new chains we can actually talk to an endpoint to set these fields; if not, the chain is ignored like it is today. I think that would be a reasonable requirement, that adding a new chain requires a bit more oversight. It's also not that big a deal if the new chain gets disabled, as there can not be any subgraphs for it yet.

My preference would be (2)
I looked into it, and I think it needs to change a bit again: we will need to remove the assumption that one will always be returned, and then expose a way of adding the store. So I will do that in a follow-up PR. I feel this implementation is ok enough as a first step; could it be merged as is, to make the follow-up easier to review and get merged?
node/src/config.rs
Outdated
            rpc_provider_manager: ProviderManager::new(
                Logger::root(Discard, o!()),
                vec![].into_iter(),
                Arc::new(MockIdentValidator),
Can't this be the real validator? If there are no endpoints, the validator won't have to do anything. I don't like having all these traits floating around where whether they do anything or not depends on setup, and you're never quite sure if the setup did things correctly.
I've only implemented this for `BlockStore` and then the different test/mock implementations. I could add a noop implementation, but otherwise we don't have the necessary information to create it without arguments. This struct is not supposed to be created manually, so the fact that it is generated as a noop is just intended for situations where it is expected to be used as noop/empty. I'm not sure yet another empty implementation is really useful; what do you think?
node/src/config.rs
Outdated
impl Config {
    pub async fn networks(
This module was really only meant for handling the configuration file and any checks that can be performed on the config file without any other setup. It feels like some of the headaches this needs to address stem from the fact that, because `Config` has very little setup around it, the code needs to work both when nothing has been set up and when we actually have things like a store available.

It might be better to restructure this so that all this network setup code gets run from, say, a `NetworkSetup` struct that gets a config and whatever else is needed (like the `BlockStore`) and can then do its thing knowing that the store already exists.
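A minimal sketch of that restructuring (all types here are stand-ins, not the real graph-node definitions): because `NetworkSetup` can only be constructed with a store, its methods never have to handle the half-configured case.

```rust
struct Config {
    chain_names: Vec<String>,
}

// Stand-in for the real BlockStore.
struct BlockStore;

struct NetworkSetup<'a> {
    config: &'a Config,
    #[allow(dead_code)]
    store: &'a BlockStore,
}

impl<'a> NetworkSetup<'a> {
    // Constructing NetworkSetup requires a store, so the "no store yet"
    // state is unrepresentable inside this type.
    fn new(config: &'a Config, store: &'a BlockStore) -> Self {
        NetworkSetup { config, store }
    }

    // Any setup logic here may assume self.store exists.
    fn chain_count(&self) -> usize {
        self.config.chain_names.len()
    }
}
```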
This is basically what `Networks` is. I could move it to its own module and add a `from_config` instead of having `config.networks()`; this was just for convenience. I don't feel strongly either way.
Yes, I think putting it into its own module would be better. It also seems to me that requiring a `BlockStore` here would simplify the logic.
    EthereumAdapterSelector, EthereumBlockRefetcher, EthereumRuntimeAdapterBuilder,
    EthereumStreamBuilder,
};
use ethereum::{BlockIngestor, EthereumNetworks};
Nice! I like that this tightens up `main` overall; it had way too much control flow mixed with intricate but somewhat boring detail.
Force-pushed from 9671102 to 4beac8f
Thanks for the review! There are a couple of nuances in how we use the adapters. For example, we have different types for firehose and for rpc; one of these layers takes care of things like failing over to a different adapter if one has a high error rate, or handling a usage cap per adapter, and they also handle details like turning a single adapter into a pool. These two types (firehose and rpc) have very different interfaces, so merging them is not something I would want to tackle in a PR like this, though longer term there could be some benefits to being able to go back and forth between rpc and firehose where that is possible.

The notion of readiness is already encapsulated in the `ProviderManager`: when calling `get_all`, the caller already gets ready-to-use adapters, so the rest of the code has no notion of readiness. The reason I decided to encapsulate this in a deeper layer instead of a decorator is that on firehose we turn one definition into a connection pool to avoid hitting some limits (I can find the issue later, but why we do it isn't relevant to this conversation). A decorator approach would therefore cause the same check to happen 20 (the default) times for each adapter instead of once per adapter. This is also similar to how we handle connection failover on firehose and rpc, for the same-ish reason.

If I understand correctly, the suggestion would be to expose the `ProviderManager`, essentially inverting the layers relative to where they are now. I.e. as a user of an adapter

instead of

I like that this unifies the types for adapters behind a single manager type. On the other hand, it may have other implications for how it is implemented: right now it uses async, and there are some cases where we can't use that (which is why I added some escape hatches). The fact that it's a bigger interface is not great, but it may open up some behaviour we can't mock with the current implementation. My gut feeling is that this is quite a large change, but probably worth exploring further. Hopefully this is what you meant 😄
I was a little unclear in my initial message: I didn't mean that we necessarily only have one
I agree we can have a similar approach for those and just remove them from the specific implementations. I don't think I will try to tackle that in this PR, but happy to chase it separately 😛
Force-pushed from 4beac8f to 9174aa7
Force-pushed from cac3a06 to b029424
Force-pushed from b029424 to 4017d9c
Fixes #3937

- … `ChainIdentifier::default()` on start … `ProviderManager`
- `ProviderManager` … if the ident returned is not the expected value.
- `ChainClient` … and tries to get an adapter with each poll; this means it won't stop if the first poll fails
- … `Networks` type.

Future improvements: …