Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Lock] MongoDbStore handle duplicate querystring keys in mongodb uri when stripping #37868

Merged
merged 1 commit into from
Aug 19, 2020

Conversation

kralos
Copy link

@kralos kralos commented Aug 18, 2020

Q A
Branch? 5.1
Bug fix? yes
New feature? no
Deprecations? no
Tickets Fix #37864
License MIT
Doc PR

Allow duplicate querystring keys when stripping collection. readPreferenceTags is currently allowed to be specified twice so re-assembling the querystring with http_build_query will also strip duplicated readPreferenceTags. Use preg_match instead.

@kralos kralos marked this pull request as ready for review August 18, 2020 01:14
@kralos kralos requested a review from jderusse as a code owner August 18, 2020 01:14
@kralos kralos changed the title #37864 handle duplicate querystring keys in mongodb uri when stripping [Lock] MongoDbStore handle duplicate querystring keys in mongodb uri when stripping Aug 18, 2020
@kralos
Copy link
Author

kralos commented Aug 18, 2020

FYI: CI is failing on Validator, Lock is passing

Copy link
Member

@fabpot fabpot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @jmikola @jderusse Can you confirm that this is what you want?

$pathDb = ltrim($parsedUrl['path'] ?? '', '/') ?: null;
if (null !== $pathDb) {
$this->options['database'] = $pathDb;
}

$matches = [];
if (preg_match('/^(.*\?.*)collection=([^&#]*)&?(([^#]*).*)$/', $uri, $matches)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this regexp does not handle dsn like:

  • ?foo_collection=x&collection=y
  • #collection=x

Should we handle such situation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since parse_url() is called above, is there any reason not to restrict this matching to the already-parsed query component in $parsedUrl['query']? That would avoid the need to check for ? and # in this pattern.

Anecdotally, MongoDB's connection string format does not use fragments (i.e. #) and I can't speak for how all drivers might interpret a fragment. I expect some drivers might capture it as a value for a query key (e.g. ?foo=bar#fragment might parse bar#fragment as the value) -- but I think that's all outside the scope of this PR.

As for stripping collection=<value> from the returned string, I think we can again leverage the result of parse_url() to rebuild the original string. I understand simply replacing collection=<value> anywhere in the $uri argument could pose a problem if it hypothetically appeared in a username/password component. One suggestion in my original comment was to use the i modifier for case-insensitive match since connection string options are technically case-insensitive. I realize it may be intentionally to not do so here, since collection= is just being defined for this Symfony class. My second suggestion was to interpret the value as any sequence of characters other than &, since that would indicate the next connection string option. I think that's still reasonable if we do the matching on the query field returned by parse_url().

I also tested parse_url() with two types of non-standard (outside of MongoDB) connection strings: replica sets and sharded clusters which have a comma-delimited list of host/port combinations, and the custom SRV scheme:

>>> parse_url('mongodb://user:password@mongodb0.example.com:27017,mongodb1.example.com:27017/?replicaSet=myRepl')
=> [
     "scheme" => "mongodb",
     "host" => "mongodb0.example.com:27017,mongodb1.example.com",
     "port" => 27017,
     "user" => "user",
     "pass" => "password",
     "path" => "/",
     "query" => "replicaSet=myRepl",
   ]
>>> parse_url('mongodb+srv://server.example.com/')
=> [
     "scheme" => "mongodb+srv",
     "host" => "server.example.com",
     "path" => "/",
   ]

I don't believe any data is lost here (despite the odd way host is parsed in the first example). Reassembling can use the following concatenation:

  • scheme + ://
  • If user is present: user
  • If password is present: : + password
  • If user was present: @
  • host (potentially includes multiple hosts and ports for all but the last host)
  • If port is present: : + port (port for the last host if there were multiple)
  • If path is present: path
  • If query is present: ? + query
  • If fragment is present: # + fragment

Although parse_url() does not seem to require it, MongoDB drivers do tend to require a path separator / between the hosts and query components; however, I don't think you need to care about that. If path isn't picked up by parse_url, you can leave that component out of the returned string and let the driver raise an error that it normally would had skimUri never been invoked.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jderusse

I'll add that test case and support it. Although it's not a thing, if i can get the strip to be more robust it will better handle any new options introduced to mongodb uris in the future.

@jmikola

Since parse_url() is called above, is there any reason not to restrict this matching to the already-parsed query component in $parsedUrl['query']?

Because it would introduce a lot more complexity than it eliminates, reconstructing a uri from the output of parse_url is quite involved as you pointed out in your concatenation list. I want to keep this method as simple and non-invasive as possible.

MongoDB's connection string format does not use fragments

I'm aware but I thought it was easy to support fragments in case one day mongodb uris used them... just to be more robust.

One suggestion in my original comment was to use the i modifier

As you pointed out, it could appear in the user/pass component. I'd rather be really strict about collection being lower and therefore less likely to strip the wrong part of the uri. Since we've made it up I think it's our prerogative.

My second suggestion was to interpret the value as any sequence of characters other than &

Yeah I think that's good but since i'm interpreting the whole uri, i'm also looking for not #. This looks safe in accordance with RFC 3986.

MongoDB drivers do tend to require a path separator / between the hosts and query components; however, I don't think you need to care about that.

This is one of the reasons why I chose to replace in the uri instead of disassemble / reassemble the uri. The less I change the uri the better. If a developer configures a uri in their application as mongodb://localhost/ for application storage, then appends %mongodb.uri%?collection=lock for Lock I want my stripped uri to go back to their original including the / (and vice versa if / was omitted). This will allow the driver to cache the connections based on the uri string as both the lock and application connection uris will hash to the same key. https://www.php.net/manual/en/mongodb.connection-handling.php

The less I touch uri the better. The simpler the implementation the better.

@kralos kralos marked this pull request as draft August 18, 2020 23:23
@kralos kralos marked this pull request as ready for review August 18, 2020 23:50
@fabpot fabpot force-pushed the 37864-mongo-uri-repeated-parameters branch from 9e84cd6 to c1ea9ae Compare August 19, 2020 10:50
@fabpot
Copy link
Member

fabpot commented Aug 19, 2020

Thank you @kralos.

@fabpot fabpot merged commit c426abe into symfony:5.1 Aug 19, 2020
@fabpot fabpot mentioned this pull request Aug 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants