Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to real-utf8 when no charset is provided #1070

Merged
merged 1 commit into from Nov 26, 2019

Conversation

nicolas-grekas
Copy link
Member

Configuring charset in a portable way is a real pain for the community.
See e.g. symfony/recipes#674, which triggered symfony/recipes#689, then https://github.com/symfony/recipes/pull/691/files, but the situation is still not fixed.

This is the only way to fix the issue in a way that would make sense: use real-utf8 by default for all platforms (ie utf8mb4 on Mysql).

Here is the patch doing so, targeting 2.0.1.
I'd happily submit on 1.12 if you think that's possible.

@nicolas-grekas nicolas-grekas force-pushed the utf8default branch 2 times, most recently from 0eb0541 to 0ec46e1 Compare November 25, 2019 22:29
Copy link
Member

@greg0ire greg0ire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be contributed directly in the DBAL, inside DriverManager::getConnection()?

@nicolas-grekas
Copy link
Member Author

Shouldn't this be contributed directly in the DBAL, inside DriverManager::getConnection()?

That is not required, and that is a project with policies that are totally decoupled from Symfony.
I agree it would be great to make this the default of DBAL also, but that's not the goal here.

Copy link
Member

@greg0ire greg0ire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to create an issue in DBAL to make this piece of code redundant, that way it can be done at a slower pace later.

@weaverryan
Copy link
Contributor

I would love to have this out of use-land config. For 2.0 would have been better, but 2.0.1 I think works in practice to finally remove the config need.

Copy link
Member

@fabpot fabpot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great idea!

@alcaeus alcaeus self-assigned this Nov 26, 2019
@alcaeus alcaeus added this to the 2.0.1 milestone Nov 26, 2019
@alcaeus
Copy link
Member

alcaeus commented Nov 26, 2019

I wasn't aware of this problem, so doing this in 2.0.1 makes sense to me. Thanks @nicolas-grekas!

@alcaeus alcaeus merged commit f6c5c76 into doctrine:master Nov 26, 2019
@alcaeus alcaeus added this to 2.0 in Roadmap Nov 26, 2019
}
}

$connection = new $connection($params, $driver, $connection->getConfiguration(), $connection->getEventManager());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather use DriverManager::getConnection again rather than assuming the constructor argument here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the "contract" of all connection classes: the wrapper_class parameter already requires this signature.

@greg0ire
Copy link
Member

greg0ire commented Dec 9, 2019

@ggeorgiev-sfly both things were expected when writing the PR, and approved, because we made the (wrong) assumption that no heavy lifting would take place here… it doesn't look like it's the case: https://github.com/doctrine/dbal/blob/7be002cc478ebf70dc9c261b9e8dec225e9d8c2a/lib/Doctrine/DBAL/Connection.php#L163-L195 😅 @nicolas-grekas @alcaeus maybe we should revert this?

@nicolas-grekas
Copy link
Member Author

@greg0ire it would be better to understand what happens.

@greg0ire
Copy link
Member

greg0ire commented Dec 9, 2019

My understanding is that @ggeorgiev-sfly overrode the wrapper class thanks to this option: https://github.com/doctrine/dbal/blob/7be002cc478ebf70dc9c261b9e8dec225e9d8c2a/lib/Doctrine/DBAL/DriverManager.php#L171-L178

But because of https://github.com/doctrine/DoctrineBundle/pull/1070/files#diff-5b0dbf30dd0e8ce451fd72020914e62cR60 , the constructor of the wrapper class is now called twice. While this looks fine for https://github.com/doctrine/dbal/blob/7be002cc478ebf70dc9c261b9e8dec225e9d8c2a/lib/Doctrine/DBAL/Connection.php , it may not always be fine for any user-defined wrapper class.

@nicolas-grekas
Copy link
Member Author

Creating a connection to the backend in the constructor of the wrapped class is a critical issue with that constructor. I'm not sure we should consider this as legit.

@greg0ire
Copy link
Member

greg0ire commented Dec 9, 2019

@ggeorgiev-sfly can you please give more details about what you are doing here? We suspect there might be a better place for you to do what you need than in the constructor of the connection.

@ggeorgiev-sfly
Copy link

Thanks for the quick replies, here are some details regarding my case:
We have a large MySQL db with total of 2050 databases - 1 global master, 1 global slave, 1024 shards master, each having a slave replication. As I cannot inject any custom configuration to the doctrine, cause any extra keys throw an exception - I have to generate a connection data for each single shard, resulting over 8000-9000 lines of yaml configuration just for the shard config. And that is not accounting for the replication, just the shards.
Doctrine currently doesn't have master-slave replication on sharded DBs out of the box, so I have to implement that manually.
The heavy lifting is not like I'm creating driver connections in the constructor, but more like processing and validating all that global & shard data - to be stored in the connection class so they can be switched easily when needed.
I'm creating a custom connection wrapper class, based off the PoolingShardConnection. I saw there that a lot of configuration copying is taking place - meaning all global & shard configuration is copied over and over again on each of the shards, resulting 1024 shards each having the configuration for all other shards from this line: $this->connections[$shard['id']] = array_merge($params, $shard);
I didn't like that big array to be replicated everywhere so I decided to keep only the basic connection data - only what's needed for the connection in the local array - like host, port, user, pass, db, charset.
When I did that - I got an exception that global and shards are not found - and this is when I found out that the Connection class is instantiated twice. I don't really see a practical need to instantiate a class twice. Why don't you just configure the charset before the first instantiation?
In the constructor I try to validate the all the shard and global data is present, and this means validating more than 1000 entries for valid connection data, which I really like not to do twice.
Of course I can just save what I got into the constructor and validate later on, but still calling the constructor twice just to set the charset the second time seems really unnecessary for me here.
My case may be more extreme. Here are some things I like to be different:

  1. Allow custom configuration for the dbal, so we can extend the Doctrine functionality for custom needs more easily.
  2. The $params array initially passed to the Connection wrapper is stripped from charset config - no matter where I try to put it - it never reaches the class initially. Why we're removing that? If we allow to predefine the charset - even with that addition it won't create the connection second time.
  3. I think we can avoid instantiating new Connection class just to get the parameters we instantiated it with. Seems illogical to do so. We already have all the data we instantiate the class with.

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Dec 10, 2019

Fixed in #1087

@ggeorgiev-sfly btw, if you have ideas to improve either the bundle or dbal, please send some PRs :)

@alcaeus
Copy link
Member

alcaeus commented Dec 10, 2019

@ggeorgiev-sfly can you please confirm the PR fixes your issue? Thanks!

@ggeorgiev-sfly
Copy link

ggeorgiev-sfly commented Dec 10, 2019

@alcaeus @nicolas-grekas
Yes it does fix the double instantiation of the wrapper class.
It does instantiate the original Connection class and then the wrapper one, which in itself uses the original class again, making it execute it __construct twice (the original Connection one). Still I wonder why we cannot set charset before that and avoid the explicit charset adjustment completely?
I mean (as I mentioned above) - I couldn't find a way to pass that 'charset' in the $params that go into createConnection initially. Is there a way to do that so I can skip the if completely as I have specified the charset everywhere?

@nicolas-grekas
Copy link
Member Author

The database url can contain the charset too, so we need to parse it before anything.

@rosier
Copy link
Contributor

rosier commented Dec 17, 2019

I like to offer an alternative solution #1098 by setting the default charset in the DI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Roadmap
2.0 (bug fixes only)
Development

Successfully merging this pull request may close these issues.

None yet

9 participants