Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V2 - Validating int->str #6045

Closed
1 task done
vitalik opened this issue Jun 8, 2023 · 51 comments · Fixed by #6737 or #7508
Closed
1 task done

V2 - Validating int->str #6045

vitalik opened this issue Jun 8, 2023 · 51 comments · Fixed by #6737 or #7508
Assignees

Comments

@vitalik
Copy link

vitalik commented Jun 8, 2023

Continue of the #5993

class Data(BaseModel):
    s: str
    f: str
    d: str


print(Data(s=42, f=4.2, d=Decimal('4.2')))

this gives validation error that s f d - Input should be a valid string

this was working in pydantic v1, and was actually very handy when dealing with parsing data of a public API where users send messy data from JS client sides

so would be nice to have some feature to be able to turn on str parsing (globally?)

Affected Components

Update: #6045 (comment)

@iBuitron

This comment was marked as off-topic.

@vitalik vitalik changed the title Validating int->str V2 - Validating int->str Jun 10, 2023
@lig
Copy link
Contributor

lig commented Jun 15, 2023

extra='allow' could be an alternative here for now. it won't convert data to str but at least it will allow storing anything without applying any schema.

@samuelcolvin
Copy link
Member

I really don't think it makes sense as the default in V2 to coerce numbers to strings. Coercing strings to field types is a very special and very common case, it doesn't mean we should support the reverse.

See this bit of the V2 plan for more details.

The only thing I'd be willing to accept would be a compatibility shim (perhaps set via config) that enables converting numbers to strings to be more compatible with V1.

@samuelcolvin
Copy link
Member

But not many people have asked about it, the one person I've spoken to who mentioned this basically said "the new behaviour is right, the only thing that confused me was the documentation being wrong".

We'd need more demand for this before adding the shim.

@vitalik
Copy link
Author

vitalik commented Jun 15, 2023

@samuelcolvin

yeah, well that makes some sense, but my feeling is this will be pop up once it get released to wider audience

like imagine you used pydantic to parse some yaml:

import yaml
from pydantic import BaseModel

YAML = """
details:
  name: 12345
  description: Hello world
 
"""


class Details(BaseModel):
    name: str
    description: str


raw_data = yaml.safe_load(YAML)
print(raw_data)

data = Details(**raw_data["details"])

I do not see any nice way to handle this except change ALL str field to some special string type

@dmontagu
Copy link
Contributor

@vitalik just to be clear, we'd be able to get it to behave the old way (i.e., converting ints to strs, etc.) through just an annotation (i.e., changing the type hint from str to Annotated[str, LenientStr()] or something like that). Yes, you'd need to add the annotation everywhere in your code, but it would at least not be treated as a different type by type checkers. Not sure if that helps anything, just pointing it out since you said "some special string type" — not sure if your concern was that it would be a different type, or just that it would require changes everywhere (in the latter case, this doesn't help).

That said, I do still understand the interest in a "global" flag. (But I also understand Samuel's point about wanting to see more concrete demand first.)

@vitalik
Copy link
Author

vitalik commented Jun 19, 2023

but it would at least not be treated as a different type by type checkers

well that's my issue here - as you can see in my example i'm not initialising model with arguments (so type checkers do not play here)

I'm passing raw yaml data just to parse and validate user input

Basically, I'm inspecting an impact of one of my clients where data parsing is heavily done with pydantic and migrating to v2 could simply lead to validation errors for data that was fine before

Changing all the code from :str to :Annotated[str, force] would be a really painful and annoying process, so it would be nice if you could provide some hook for easy monkey patching this behaviour (at least on a class level)

@dmontagu
Copy link
Contributor

@adriangb I’m wondering if we could override the get_pydantic_core_schema or one of the other magic methods and have it insert annotations for certain types? I guess maybe a prepare pydantic annotations for a BaseModel class? Not sure if that would work on the class level instead of the field level but I’m thinking there might be something here that makes this “configurable” even without the use of config

@adriangb
Copy link
Member

I think we can create or should be able to make changes so we can create a custom type like LaxInt = Annotated[int, ...]. The thing is that users would have to explicitly use that everywhere. I'm also not sure it would work right now, but we can (and should) re-factor internally to make it work.

We have also discussed about making a hook to do some sort of global replacement {int: LaxInt} but that has consequences for references and that sort of thing, so it probably needs a little bit more thought, even if it seems simple to implement.

@dmontagu
Copy link
Contributor

Right I was mostly curious if there might be a way to hook into schema generation for models specifically as a way to handle this. I’ll play with it a bit this afternoon and see if I can find anything that works without code changes today, or at least see how far it is from working.

@adriangb
Copy link
Member

So yeah you can do this:

from typing import Annotated

from pydantic import BeforeValidator, TypeAdapter

LaxStr = Annotated[str, BeforeValidator(lambda x: str(x))]

ta = TypeAdapter(LaxStr)

print(ta.validate_python(123))

(Sorry about any confusion above about LaxInt)

But there's no way to set this globally or recursively for a model.

You could write a function that recursively inspects a CoreSchema and wraps any {'type': 'str', ...} schema with the validator above, then you could make that part of an annotation (Annotated[T, RecursivelyMakeStrLax]), a base class (class LaxStrBaseModel(BaseModel): ... def __get_pydantic_core_schema__(...)) or something like that.

@dmontagu
Copy link
Contributor

dmontagu commented Jun 19, 2023

Yeah that’s what I was thinking. (Though I hadn’t fleshed it out as much, that’s helpful.)

I think that might be enough even if it didn’t do any nested model recursion (though it might need to recurse on core schemas for the purpose of interacting properly with other validators etc., at which point maybe it’s easier to do it fully recursive) since the hypothetical user could just use that as the base class for all their models.

@vitalik
Copy link
Author

vitalik commented Jun 19, 2023

So yeah you can do this:

Caution

Annotated[str, BeforeValidator(lambda x: str(x))]

ta.validate_python(None) # will result -> "None" (as type str)

@dmontagu
Copy link
Contributor

Probably want to use pydantic.v1.validators.str_validator or whatever the function was from v1

@mrenner-ML
Copy link

I am currently migrating v1 -> v2 and am facing issues with this. It would be great to at least have the option to set this in the model configs.

The use case I have for coercing int -> str is we often have IDs that appear as numericals (e.g. 12839384) and are infered as such when using pandas, whereas we want to treat them as string in order to perform string manipulation or join them with other non-numerical IDs. Pydantic v1 allowed me not to worry about this, whereas now I have to coerce all of my IDs separately beforehand.

May just add that I don't really see the benefit of removing this original behavior from v1, and that it will probably cause a lot of headaches when migrating, especially that this is not documented clearly.

@lig
Copy link
Contributor

lig commented Jul 6, 2023

I think this could be another case for a type map we've discussed privately. As far as I remember one of the ideas was to have a global registry that will map an annotation to another one. It could look from a user point of view as something like this:

type_map_registry: dict[type, type] = {str: Annotated[str, Field(force=True)]}

This would force all str fields in all modules to be treated as Annotated[str, Field(force=True)].

@lig
Copy link
Contributor

lig commented Jul 6, 2023

@mrenner-ML Indeed, this isn't stated explicitly in the migration guide. In general, coercing numbers and more complex types to str just hides errors and virtually disables the type checking. Overall, Pydantic V2 is more strict than Pydantic V1. However, there are more ways for customizing behavior in Pydantic V2.

In Pydantic V2 it's possible to achieve the desired behavior using Custom Data Types.

@vitalik
Copy link
Author

vitalik commented Jul 6, 2023

It could look from a user point of view as something like this:

type_map_registry: dict[type, type] = {str: Annotated[str, Field(force=True)]}

yeah, that should work for all of my cases... would also be nice to have this global and per base model class and a concrete model

@lig
Copy link
Contributor

lig commented Jul 6, 2023

@vitalik I'm not sure when such a type map could be implemented. However, PRs are always welcome:)

@vitalik
Copy link
Author

vitalik commented Jul 6, 2023

I can take a look, could you give some idea which module/class this best to add this functionality ?

@lig
Copy link
Contributor

lig commented Jul 6, 2023

I think, pydantic/main.py is a good place to start. This is where BaseModel lives. However, it might lead to pydantic/_internal/_model_construction.py where ModelMetaclass is or to pydantic/_internal/_generate_schema.py where schema is being generated and actual schema variants are being chosen.

@vitalik
Copy link
Author

vitalik commented Jul 8, 2023

with #6535 it would be possible to change annotation behaviour per model or even global

BaseModel.model_config['replace_types'] = {str: LaxStr}

@hampuskraft
Copy link

@wholmen Using LaxStr = Annotated[str, BeforeValidator(lambda x: str(x))] in place of str in models where you want to cast int to str is the current solution, although you might also have luck with LaxStrGenerator above—unless you use pattern, until #6951 is merged.

@David-OConnor
Copy link

What is the way to do this in 2.3.0 with pattern? Thank you!

@15498th
Copy link
Contributor

15498th commented Aug 23, 2023

There seems to be another instance of documentation describing old behavior: https://docs.pydantic.dev/latest/usage/types/standard_types/ (https://github.com/pydantic/pydantic/blob/main/docs/usage/types/standard_types.md?plain=1#L14):

Strings are accepted as-is. int, float, and Decimal are coerced using str(v).

15498th added a commit to 15498th/pydantic that referenced this issue Aug 23, 2023
Remove bit about coercion of numeric types to str since it is not the case for v2 (pydantic#6045)
@David-OConnor
Copy link

Solution: data_id: Union[int, str]

@fortzi
Copy link

fortzi commented Aug 24, 2023

It would be great to have this as a configuration that pydantic loads from the environment/file.

The reason is that patching BaseModel this way has the disadvantage that it must be done before importing any module that uses BaseModel, which adds to the mental load and is easy to mess up without noticing.

Another possible solution is implementing a model validator in a base class that would replace BaseModel. This also has the disadvantage of having yet another thing to remember.

@15498th
Copy link
Contributor

15498th commented Aug 26, 2023

@David-OConnor It might cause things break unexpectedly down the line if code using the model is not prepared to handle int. For example print('Data id is:' + data.data_id) raises TypeError if data_id value is integer.

@samuelcolvin samuelcolvin reopened this Sep 4, 2023
@pydantic-hooky pydantic-hooky bot added the unconfirmed Bug not yet confirmed as valid/applicable label Sep 4, 2023
@samuelcolvin
Copy link
Member

I've re-opened this. On balance I think we should add a config flag to re-instate the V1 behavior since grander changes to allow type replacement is going to take much longer.

@samuelcolvin samuelcolvin removed the unconfirmed Bug not yet confirmed as valid/applicable label Sep 4, 2023
@David-OConnor
Copy link

David-OConnor commented Sep 4, 2023

@David-OConnor It might cause things break unexpectedly down the line if code using the model is not prepared to handle int. For example print('Data id is:' + data.data_id) raises TypeError if data_id value is integer.

The case I'm referring to is parsing an int or string-of-int into int. So, downstream only sees and handles ints.

@CallumAtCarter
Copy link

CallumAtCarter commented Sep 8, 2023

Edit: Updating to 2.3.0 fixed this issue.


I'm unsure if this is the same issue, but I'll post the issue I'm having here also. The data should be str throughout. If someone believes this is a different issue (or if I'm just doing this completely wrong) please let me know.

StrippedStr = Annotated[
    str,
    BeforeValidator(lambda v: v.strip() if isinstance(v, str) else v),
]

AlphaStr = Annotated[
    StrippedStr,
    Field(pattern=r"^[a-zA-Z]+$"),
]


class Data(BaseModel):
    a: AlphaStr  # works
    b: AlphaStr | None  # does not work (see error below)
  File "site-packages\pydantic\_internal\_model_construction.py", line 173, in __new__
    complete_model_class(
  File "site-packages\pydantic\_internal\_model_construction.py", line 481, in complete_model_class
    cls.__pydantic_validator__ = SchemaValidator(simplified_core_schema, core_config)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.SchemaError: Invalid Schema:
model.schema.model-fields.fields.a.schema.default.schema.nullable.schema.function-before.pattern
  Extra inputs are not permitted [type=extra_forbidden, input_value='^[a-zA-Z]+$', input_type=str]
    For further information visit https://errors.pydantic.dev/2.1/v/extra_forbidden

@adriangb
Copy link
Member

adriangb commented Sep 8, 2023

@CallumAtCarter I can't reproduce, can you try installing pydantic 2.3.0 or from main? I suspect this got fixed in #6951.

from typing import Annotated

from pydantic import BaseModel, Field, BeforeValidator


StrippedStr = Annotated[
    str,
    BeforeValidator(lambda v: v.strip() if isinstance(v, str) else v),
]

AlphaStr = Annotated[
    StrippedStr,
    Field(pattern=r"^[a-zA-Z]+$"),
]


class Data(BaseModel):
    a: AlphaStr  # works
    b: AlphaStr | None  # does not work (see error below)


Data(a="abc", b=None)
Data(a="abc", b="  def  ")
Data(a="abc", b="!!!")
"""
b
  String should match pattern '^[a-zA-Z]+$' [type=string_pattern_mismatch, input_value='!!!', input_type=str]
    For further information visit https://errors.pydantic.dev/2.3/v/string_pattern_mismatch
"""

But yes this is unrelated to this issue so if that doesn't fix it for you please open a new discussion (and feel free to tag me).

@CallumAtCarter
Copy link

CallumAtCarter commented Sep 8, 2023

@adriangb I should have updated before posting 😔. Updating to 2.3.0 fixed it, thanks!

@berzi
Copy link

berzi commented Sep 21, 2023

@adriangb I really need this feature but I'm not familiar with Pydantic's release schedule. When can I expect this to land in a release?

@lig
Copy link
Contributor

lig commented Sep 21, 2023

As soon as the release is out 🤷🏻‍♂️ Usually, this is not long after a fix is ready.

At the moment, we are planning to make a release happen this week. There is a bunch of things coming to this release. Samuel outlined some priorities here #7324. This doesn't mean that everything from that list will make its way into the next release though.

@durgeksh
Copy link

Hi team,
I am new to Pydantic. My requirements are as below.

  1. Check data type of coulmns
  2. There are 5 columns out of which, I need to validate column 1 for first 500 rows only
  3. Validate the data of one model's column with other model's column (Primary and Foreign key relation)
    Thank you for the wonderful library.

@lig
Copy link
Contributor

lig commented Sep 28, 2023

@durgeksh Hi! It seems like your comment is unrelated to the original issue. I'd recommend asking your question in a new discussion here https://github.com/pydantic/pydantic/discussions

@durgeksh
Copy link

@lig Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment