V2 - Validating int->str #6045

vitalik · 2023-06-08T08:44:48Z

Continue of the #5993

class Data(BaseModel):
    s: str
    f: str
    d: str


print(Data(s=42, f=4.2, d=Decimal('4.2')))

this gives validation error that s f d - Input should be a valid string

this was working in pydantic v1, and was actually very handy when dealing with parsing data of a public API where users send messy data from JS client sides

so would be nice to have some feature to be able to turn on str parsing (globally?)

Affected Components

Data validation/parsing

Update: #6045 (comment)

The text was updated successfully, but these errors were encountered:

lig · 2023-06-15T12:33:05Z

extra='allow' could be an alternative here for now. it won't convert data to str but at least it will allow storing anything without applying any schema.

samuelcolvin · 2023-06-15T15:31:01Z

I really don't think it makes sense as the default in V2 to coerce numbers to strings. Coercing strings to field types is a very special and very common case, it doesn't mean we should support the reverse.

See this bit of the V2 plan for more details.

The only thing I'd be willing to accept would be a compatibility shim (perhaps set via config) that enables converting numbers to strings to be more compatible with V1.

samuelcolvin · 2023-06-15T15:32:20Z

But not many people have asked about it, the one person I've spoken to who mentioned this basically said "the new behaviour is right, the only thing that confused me was the documentation being wrong".

We'd need more demand for this before adding the shim.

vitalik · 2023-06-15T16:04:21Z

@samuelcolvin

yeah, well that makes some sense, but my feeling is this will be pop up once it get released to wider audience

like imagine you used pydantic to parse some yaml:

import yaml
from pydantic import BaseModel

YAML = """
details:
  name: 12345
  description: Hello world
 
"""


class Details(BaseModel):
    name: str
    description: str


raw_data = yaml.safe_load(YAML)
print(raw_data)

data = Details(**raw_data["details"])

I do not see any nice way to handle this except change ALL str field to some special string type

dmontagu · 2023-06-19T16:23:18Z

@vitalik just to be clear, we'd be able to get it to behave the old way (i.e., converting ints to strs, etc.) through just an annotation (i.e., changing the type hint from str to Annotated[str, LenientStr()] or something like that). Yes, you'd need to add the annotation everywhere in your code, but it would at least not be treated as a different type by type checkers. Not sure if that helps anything, just pointing it out since you said "some special string type" — not sure if your concern was that it would be a different type, or just that it would require changes everywhere (in the latter case, this doesn't help).

That said, I do still understand the interest in a "global" flag. (But I also understand Samuel's point about wanting to see more concrete demand first.)

vitalik · 2023-06-19T19:42:34Z

but it would at least not be treated as a different type by type checkers

well that's my issue here - as you can see in my example i'm not initialising model with arguments (so type checkers do not play here)

I'm passing raw yaml data just to parse and validate user input

Basically, I'm inspecting an impact of one of my clients where data parsing is heavily done with pydantic and migrating to v2 could simply lead to validation errors for data that was fine before

Changing all the code from :str to :Annotated[str, force] would be a really painful and annoying process, so it would be nice if you could provide some hook for easy monkey patching this behaviour (at least on a class level)

dmontagu · 2023-06-19T19:45:57Z

@adriangb I’m wondering if we could override the get_pydantic_core_schema or one of the other magic methods and have it insert annotations for certain types? I guess maybe a prepare pydantic annotations for a BaseModel class? Not sure if that would work on the class level instead of the field level but I’m thinking there might be something here that makes this “configurable” even without the use of config

adriangb · 2023-06-19T19:52:58Z

I think we can create or should be able to make changes so we can create a custom type like LaxInt = Annotated[int, ...]. The thing is that users would have to explicitly use that everywhere. I'm also not sure it would work right now, but we can (and should) re-factor internally to make it work.

We have also discussed about making a hook to do some sort of global replacement {int: LaxInt} but that has consequences for references and that sort of thing, so it probably needs a little bit more thought, even if it seems simple to implement.

dmontagu · 2023-06-19T19:55:04Z

Right I was mostly curious if there might be a way to hook into schema generation for models specifically as a way to handle this. I’ll play with it a bit this afternoon and see if I can find anything that works without code changes today, or at least see how far it is from working.

adriangb · 2023-06-19T20:51:51Z

So yeah you can do this:

from typing import Annotated

from pydantic import BeforeValidator, TypeAdapter

LaxStr = Annotated[str, BeforeValidator(lambda x: str(x))]

ta = TypeAdapter(LaxStr)

print(ta.validate_python(123))

(Sorry about any confusion above about LaxInt)

But there's no way to set this globally or recursively for a model.

You could write a function that recursively inspects a CoreSchema and wraps any {'type': 'str', ...} schema with the validator above, then you could make that part of an annotation (Annotated[T, RecursivelyMakeStrLax]), a base class (class LaxStrBaseModel(BaseModel): ... def __get_pydantic_core_schema__(...)) or something like that.

dmontagu · 2023-06-19T20:56:42Z

Yeah that’s what I was thinking. (Though I hadn’t fleshed it out as much, that’s helpful.)

I think that might be enough even if it didn’t do any nested model recursion (though it might need to recurse on core schemas for the purpose of interacting properly with other validators etc., at which point maybe it’s easier to do it fully recursive) since the hypothetical user could just use that as the base class for all their models.

vitalik · 2023-06-19T21:24:03Z

So yeah you can do this:

Caution

Annotated[str, BeforeValidator(lambda x: str(x))]

ta.validate_python(None) # will result -> "None" (as type str)

dmontagu · 2023-06-19T21:26:46Z

Probably want to use pydantic.v1.validators.str_validator or whatever the function was from v1

mrenner-ML · 2023-07-06T08:47:26Z

I am currently migrating v1 -> v2 and am facing issues with this. It would be great to at least have the option to set this in the model configs.

The use case I have for coercing int -> str is we often have IDs that appear as numericals (e.g. 12839384) and are infered as such when using pandas, whereas we want to treat them as string in order to perform string manipulation or join them with other non-numerical IDs. Pydantic v1 allowed me not to worry about this, whereas now I have to coerce all of my IDs separately beforehand.

May just add that I don't really see the benefit of removing this original behavior from v1, and that it will probably cause a lot of headaches when migrating, especially that this is not documented clearly.

lig · 2023-07-06T09:05:30Z

I think this could be another case for a type map we've discussed privately. As far as I remember one of the ideas was to have a global registry that will map an annotation to another one. It could look from a user point of view as something like this:

type_map_registry: dict[type, type] = {str: Annotated[str, Field(force=True)]}

This would force all str fields in all modules to be treated as Annotated[str, Field(force=True)].

lig · 2023-07-06T09:16:25Z

@mrenner-ML Indeed, this isn't stated explicitly in the migration guide. In general, coercing numbers and more complex types to str just hides errors and virtually disables the type checking. Overall, Pydantic V2 is more strict than Pydantic V1. However, there are more ways for customizing behavior in Pydantic V2.

In Pydantic V2 it's possible to achieve the desired behavior using Custom Data Types.

vitalik · 2023-07-06T09:29:47Z

It could look from a user point of view as something like this:
type_map_registry: dict[type, type] = {str: Annotated[str, Field(force=True)]}

yeah, that should work for all of my cases... would also be nice to have this global and per base model class and a concrete model

lig · 2023-07-06T09:33:36Z

@vitalik I'm not sure when such a type map could be implemented. However, PRs are always welcome:)

vitalik · 2023-07-06T09:55:51Z

I can take a look, could you give some idea which module/class this best to add this functionality ?

lig · 2023-07-06T11:07:36Z

I think, pydantic/main.py is a good place to start. This is where BaseModel lives. However, it might lead to pydantic/_internal/_model_construction.py where ModelMetaclass is or to pydantic/_internal/_generate_schema.py where schema is being generated and actual schema variants are being chosen.

vitalik · 2023-07-08T21:27:02Z

with #6535 it would be possible to change annotation behaviour per model or even global

BaseModel.model_config['replace_types'] = {str: LaxStr}

hampuskraft · 2023-08-09T12:45:22Z

@wholmen Using LaxStr = Annotated[str, BeforeValidator(lambda x: str(x))] in place of str in models where you want to cast int to str is the current solution, although you might also have luck with LaxStrGenerator above—unless you use pattern, until #6951 is merged.

David-OConnor · 2023-08-23T20:30:24Z

What is the way to do this in 2.3.0 with pattern? Thank you!

15498th · 2023-08-23T22:44:42Z

There seems to be another instance of documentation describing old behavior: https://docs.pydantic.dev/latest/usage/types/standard_types/ (https://github.com/pydantic/pydantic/blob/main/docs/usage/types/standard_types.md?plain=1#L14):

Strings are accepted as-is. int, float, and Decimal are coerced using str(v).

Remove bit about coercion of numeric types to str since it is not the case for v2 (pydantic#6045)

David-OConnor · 2023-08-24T00:07:15Z

Solution: data_id: Union[int, str]

fortzi · 2023-08-24T07:04:30Z

It would be great to have this as a configuration that pydantic loads from the environment/file.

The reason is that patching BaseModel this way has the disadvantage that it must be done before importing any module that uses BaseModel, which adds to the mental load and is easy to mess up without noticing.

Another possible solution is implementing a model validator in a base class that would replace BaseModel. This also has the disadvantage of having yet another thing to remember.

15498th · 2023-08-26T06:23:54Z

@David-OConnor It might cause things break unexpectedly down the line if code using the model is not prepared to handle int. For example print('Data id is:' + data.data_id) raises TypeError if data_id value is integer.

samuelcolvin · 2023-09-04T09:39:16Z

I've re-opened this. On balance I think we should add a config flag to re-instate the V1 behavior since grander changes to allow type replacement is going to take much longer.

David-OConnor · 2023-09-04T12:22:57Z

@David-OConnor It might cause things break unexpectedly down the line if code using the model is not prepared to handle int. For example print('Data id is:' + data.data_id) raises TypeError if data_id value is integer.

The case I'm referring to is parsing an int or string-of-int into int. So, downstream only sees and handles ints.

CallumAtCarter · 2023-09-08T16:39:52Z

Edit: Updating to 2.3.0 fixed this issue.

I'm unsure if this is the same issue, but I'll post the issue I'm having here also. The data should be str throughout. If someone believes this is a different issue (or if I'm just doing this completely wrong) please let me know.

StrippedStr = Annotated[
    str,
    BeforeValidator(lambda v: v.strip() if isinstance(v, str) else v),
]

AlphaStr = Annotated[
    StrippedStr,
    Field(pattern=r"^[a-zA-Z]+$"),
]


class Data(BaseModel):
    a: AlphaStr  # works
    b: AlphaStr | None  # does not work (see error below)

  File "site-packages\pydantic\_internal\_model_construction.py", line 173, in __new__
    complete_model_class(
  File "site-packages\pydantic\_internal\_model_construction.py", line 481, in complete_model_class
    cls.__pydantic_validator__ = SchemaValidator(simplified_core_schema, core_config)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.SchemaError: Invalid Schema:
model.schema.model-fields.fields.a.schema.default.schema.nullable.schema.function-before.pattern
  Extra inputs are not permitted [type=extra_forbidden, input_value='^[a-zA-Z]+$', input_type=str]
    For further information visit https://errors.pydantic.dev/2.1/v/extra_forbidden

adriangb · 2023-09-08T16:53:03Z

@CallumAtCarter I can't reproduce, can you try installing pydantic 2.3.0 or from main? I suspect this got fixed in #6951.

from typing import Annotated

from pydantic import BaseModel, Field, BeforeValidator


StrippedStr = Annotated[
    str,
    BeforeValidator(lambda v: v.strip() if isinstance(v, str) else v),
]

AlphaStr = Annotated[
    StrippedStr,
    Field(pattern=r"^[a-zA-Z]+$"),
]


class Data(BaseModel):
    a: AlphaStr  # works
    b: AlphaStr | None  # does not work (see error below)


Data(a="abc", b=None)
Data(a="abc", b="  def  ")
Data(a="abc", b="!!!")
"""
b
  String should match pattern '^[a-zA-Z]+$' [type=string_pattern_mismatch, input_value='!!!', input_type=str]
    For further information visit https://errors.pydantic.dev/2.3/v/string_pattern_mismatch
"""

But yes this is unrelated to this issue so if that doesn't fix it for you please open a new discussion (and feel free to tag me).

CallumAtCarter · 2023-09-08T16:59:22Z

@adriangb I should have updated before posting 😔. Updating to 2.3.0 fixed it, thanks!

berzi · 2023-09-21T10:07:05Z

@adriangb I really need this feature but I'm not familiar with Pydantic's release schedule. When can I expect this to land in a release?

lig · 2023-09-21T19:50:23Z

As soon as the release is out 🤷🏻‍♂️ Usually, this is not long after a fix is ready.

At the moment, we are planning to make a release happen this week. There is a bunch of things coming to this release. Samuel outlined some priorities here #7324. This doesn't mean that everything from that list will make its way into the next release though.

durgeksh · 2023-09-26T12:47:44Z

Hi team,
I am new to Pydantic. My requirements are as below.

Check data type of coulmns
There are 5 columns out of which, I need to validate column 1 for first 500 rows only
Validate the data of one model's column with other model's column (Primary and Foreign key relation)
Thank you for the wonderful library.

lig · 2023-09-28T08:04:02Z

@durgeksh Hi! It seems like your comment is unrelated to the original issue. I'd recommend asking your question in a new discussion here https://github.com/pydantic/pydantic/discussions

durgeksh · 2023-09-28T08:09:20Z

@lig Thank you.

vitalik added the feature request label Jun 8, 2023

This comment was marked as off-topic.

Sign in to view

vitalik changed the title ~~Validating int->str~~ V2 - Validating int->str Jun 10, 2023

vitalik mentioned this issue Jun 14, 2023

V2 beta 3 outstanding issues #6096

Closed

9 tasks

Kludex mentioned this issue Jul 5, 2023

Ints are not coerced into strings #6446

Closed

1 task

vitalik mentioned this issue Jul 8, 2023

Add a replace_types option to ConfigDict #6535

Closed

5 tasks

samuelcolvin added this to the 2.0 milestone Jul 10, 2023

This was referenced Jul 14, 2023

Type coercion does not work for str field with strict mode disabled. #6667

Closed

How to patch _internal._fields #6668

Closed

15498th added a commit to 15498th/pydantic that referenced this issue Aug 23, 2023

Update description for string type in Standard Library Types section

9276216

Remove bit about coercion of numeric types to str since it is not the case for v2 (pydantic#6045)

15498th mentioned this issue Aug 23, 2023

Update description for string type in Standard Library Types documentation section #7238

Merged

5 tasks

samuelcolvin reopened this Sep 4, 2023

pydantic-hooky bot assigned hramezani Sep 4, 2023

pydantic-hooky bot added the unconfirmed Bug not yet confirmed as valid/applicable label Sep 4, 2023

samuelcolvin removed the unconfirmed Bug not yet confirmed as valid/applicable label Sep 4, 2023

samuelcolvin unassigned adriangb and hramezani Sep 4, 2023

samuelcolvin mentioned this issue Sep 4, 2023

☝️ Pydantic Priorities #7324

Closed

13 tasks

lig self-assigned this Sep 19, 2023

This was referenced Sep 19, 2023

✨ Implement optional number to str coercion pydantic/pydantic-core#975

Merged

✨ Implement optional number to str coercion #7508

Merged

adriangb closed this as completed in #7508 Sep 21, 2023

fvankrieken mentioned this issue Jan 11, 2024

Compile python requirements NYCPlanning/data-engineering#502

Merged

V2 - Validating int->str #6045

V2 - Validating int->str #6045

Comments

vitalik commented Jun 8, 2023 • edited by lig

Affected Components

This comment was marked as off-topic.

lig commented Jun 15, 2023

samuelcolvin commented Jun 15, 2023

samuelcolvin commented Jun 15, 2023

vitalik commented Jun 15, 2023 • edited

dmontagu commented Jun 19, 2023

vitalik commented Jun 19, 2023

dmontagu commented Jun 19, 2023

adriangb commented Jun 19, 2023

dmontagu commented Jun 19, 2023

adriangb commented Jun 19, 2023

dmontagu commented Jun 19, 2023 • edited

vitalik commented Jun 19, 2023

dmontagu commented Jun 19, 2023

mrenner-ML commented Jul 6, 2023

lig commented Jul 6, 2023

lig commented Jul 6, 2023

vitalik commented Jul 6, 2023

lig commented Jul 6, 2023

vitalik commented Jul 6, 2023

lig commented Jul 6, 2023

vitalik commented Jul 8, 2023

hampuskraft commented Aug 9, 2023

David-OConnor commented Aug 23, 2023

15498th commented Aug 23, 2023

David-OConnor commented Aug 24, 2023

fortzi commented Aug 24, 2023

15498th commented Aug 26, 2023

samuelcolvin commented Sep 4, 2023

David-OConnor commented Sep 4, 2023 • edited

CallumAtCarter commented Sep 8, 2023 • edited

adriangb commented Sep 8, 2023 • edited

CallumAtCarter commented Sep 8, 2023 • edited

berzi commented Sep 21, 2023

lig commented Sep 21, 2023 • edited

durgeksh commented Sep 26, 2023

lig commented Sep 28, 2023

durgeksh commented Sep 28, 2023

vitalik commented Jun 8, 2023 •

edited by lig

vitalik commented Jun 15, 2023 •

edited

dmontagu commented Jun 19, 2023 •

edited

David-OConnor commented Sep 4, 2023 •

edited

CallumAtCarter commented Sep 8, 2023 •

edited

adriangb commented Sep 8, 2023 •

edited

CallumAtCarter commented Sep 8, 2023 •

edited

lig commented Sep 21, 2023 •

edited