
Support numpy types #31

Open
Kludex opened this issue Apr 25, 2023 · 17 comments
Labels: good first issue (Good for newcomers), Types

Comments

@Kludex
Member

Kludex commented Apr 25, 2023

The idea here is to support the numpy types mentioned on https://numpy.org/doc/stable/reference/arrays.scalars.html.

@yezz123
Collaborator

yezz123 commented Apr 26, 2023

+1

@Kludex
Member Author

Kludex commented Apr 26, 2023

Actually, numpy has a pretty big artifact. Should we create pydantic-numpy-types? 😅

@yezz123
Collaborator

yezz123 commented Apr 26, 2023

> Actually, numpy has a pretty big artifact. Should we create pydantic-numpy-types? 😅

Can't deny the same applies to #24, but I guess it would be good to have all of the types in one package!

@Kludex
Member Author

Kludex commented Apr 26, 2023

Maybe an "all" extra could be created so people can do pip install pydantic-extra-types[all], or a "numpy" extra. Or... just don't install numpy, and show a message like "You need to install numpy".

@yezz123
Collaborator

yezz123 commented Apr 26, 2023

> Maybe an "all" extra could be created so people can do pip install pydantic-extra-types[all], or a "numpy" extra. Or... just don't install numpy, and show a message like "You need to install numpy".

I agree with both options: either displaying a message that prompts the user to install numpy, or proceeding with the extra requirements, since that works in our current scenario. This would also allow adding any big package as an extra.

@kroncatti

kroncatti commented Jun 13, 2023

Hey folks,

Just so I understand our objective with the numpy type here a little better. I imagine you intend to have something like this (correct me if I am talking nonsense):

Numpy(value=np.float64(12))

Do we want to have some conversions as well? My first idea was to validate using isinstance(value, np.generic). However, running isinstance(12, np.generic) returns False (the same happens with a Python float). Would we want to convert those elements to numpy when someone invokes the type?

The reason I am asking is the following: if we want to do those conversions, we would probably need to make some decisions (e.g., whether to target int32, int64, and so on).

If my reasoning is not fair, just let me know 😄
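The isinstance behavior described above is easy to check directly (a quick sketch, nothing project-specific):

```python
import numpy as np

# numpy scalars are instances of np.generic...
assert isinstance(np.float64(12), np.generic)
assert isinstance(np.int8(7), np.generic)

# ...but plain Python ints and floats are not, so a bare
# isinstance check would reject them unless we convert first.
assert not isinstance(12, np.generic)
assert not isinstance(12.0, np.generic)
```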

@lig
Contributor

lig commented Jun 14, 2023

@kroncatti Pydantic has two modes of parsing/validation: strict and lax. In lax mode it already coerces types in many cases such as this:

    class Model(BaseModel):
        foo: int
    
    m = Model(foo='12')
    print(m)
    # foo=12
    print(type(m.foo))
    # <class 'int'>

So, it seems natural for Pydantic to coerce values to numpy types on validation.
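As a sketch of what that coercion could look like for a single numpy scalar type (the NpInt64 alias below is illustrative, not an existing pydantic-extra-types name):

```python
from typing import Annotated

import numpy as np
from pydantic import AfterValidator, BaseModel

# Validate as a plain int (lax mode coerces '12' -> 12),
# then convert the validated result to a numpy scalar.
NpInt64 = Annotated[int, AfterValidator(np.int64)]

class Model(BaseModel):
    foo: NpInt64

m = Model(foo='12')
print(type(m.foo))  # <class 'numpy.int64'>
```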

@kroncatti

kroncatti commented Jun 14, 2023

Thanks @lig,

Just to check that I understood properly. If we define:

class Model(BaseModel):
    foo: Numpy

m = Model(foo='12')
print(m)
# foo=12
print(type(m.foo))
# <class 'numpy.int64'>

Is this the expected outcome?

Shouldn't the user have to specify which numpy type we coerce to, such as int64, float64, etc.?

@Kludex
Member Author

Kludex commented Jun 14, 2023

I think the idea here is to support the following types: https://numpy.org/doc/stable/user/basics.types.html#array-types-and-conversions-between-types

We should create the following, and all the analogous:

  • pydantic_extra_types.NumPyFloatHalf / pydantic_extra_types.NumPyFloat16
  • pydantic_extra_types.NumPySingle
  • pydantic_extra_types.NumPyDouble

I guess we also want np.array, np.datetime64, and others.
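For reference, the "half/single/double" names are existing numpy aliases for the sized float types (a quick check, assuming a standard numpy build):

```python
import numpy as np

# numpy exposes C-style names as aliases of the sized scalar types
assert np.half is np.float16    # -> NumPyFloatHalf / NumPyFloat16
assert np.single is np.float32  # -> NumPySingle
assert np.double is np.float64  # -> NumPyDouble
```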

@kroncatti

That makes sense. So we are basically creating one extra type for each of those types instead of a single generic type for all of them. Cool!

Kludex added a commit to Kludex/Kludex that referenced this issue Jun 19, 2023
@frenki123

frenki123 commented Jul 8, 2023

Hey guys,
I want to help with this topic, but first I want to check whether I understand how I should create these new types.
My idea is something like the code below. The only problem is that in strict mode validation passes with int but not with numpy.int8 (probably because I am using int_schema). Do you maybe have other ideas?

from typing import Any

import numpy
from pydantic import GetCoreSchemaHandler
from pydantic_core import core_schema


class NumPyInt8(numpy.int8):
    """
    A numpy.int8 type. The range is between -128 and 127.
    """
    min_value: int = -128
    max_value: int = 127

    @classmethod
    def __get_pydantic_core_schema__(cls, source: type[Any], handler: GetCoreSchemaHandler) -> core_schema.CoreSchema:
        return core_schema.general_after_validator_function(
            cls._transform,
            core_schema.int_schema(le=cls.max_value, ge=cls.min_value),
        )

    @classmethod
    def _transform(cls, scalar: int, _: core_schema.ValidationInfo) -> numpy.int8:
        return numpy.int8(scalar)
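For comparison, a lighter sketch that avoids subclassing the numpy scalar type, using Annotated with range constraints plus an after-validator (the NumPyInt8 alias here is illustrative):

```python
from typing import Annotated

import numpy as np
from pydantic import AfterValidator, BaseModel, Field, ValidationError

# Validate as a range-checked int, then convert to numpy.int8
NumPyInt8 = Annotated[int, Field(ge=-128, le=127), AfterValidator(np.int8)]

class Model(BaseModel):
    n: NumPyInt8

m = Model(n=100)
assert type(m.n) is np.int8

try:
    Model(n=300)  # out of the int8 range -> ValidationError
except ValidationError:
    out_of_range_rejected = True
assert out_of_range_rejected
```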

@frenki123

Hello again,
@yezz123 and @Kludex, can you take a look at my fork for numpy integers?
https://github.com/frenki123/pydantic-extra-types/tree/numpy-int-types

Maybe it can be added to the main branch as a start for support of numpy types.

@GuillaumeQuenneville

One of the issues to deal with is that JSON cannot natively represent all the dtypes of numpy arrays without extra context. So we would have to choose a reasonable schema as a default for the serializing/deserializing to be isomorphic. Namely, we need an opinionated default for representing arrays of complex numbers. There are of course many valid options, so having it be easily overridable would also be useful.

That being said, the way we (as well as the ASE project) have been solving this is with:

{'__ndarray__': (
    obj.shape,
    str(obj.dtype),
    flatobj.tolist()  # flatobj: the flattened (1-D) view of the array
)}

The serializing / deserializing logic is in the file I linked.
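A minimal round trip using that schema might look like the following sketch (function names are illustrative; complex dtypes would still need the extra convention discussed above, since JSON cannot encode complex numbers):

```python
import json

import numpy as np

def ndarray_to_json(obj: np.ndarray) -> str:
    # shape + dtype string + flattened values, as in the schema above
    payload = {'__ndarray__': (list(obj.shape), str(obj.dtype), obj.ravel().tolist())}
    return json.dumps(payload)

def ndarray_from_json(data: str) -> np.ndarray:
    shape, dtype, flat = json.loads(data)['__ndarray__']
    return np.array(flat, dtype=dtype).reshape(shape)

a = np.arange(6, dtype=np.int32).reshape(2, 3)
b = ndarray_from_json(ndarray_to_json(a))
assert (a == b).all() and a.dtype == b.dtype  # shape and dtype survive the trip
```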

@caniko

caniko commented Oct 31, 2023

I suggest you check out my project, pydantic-numpy.

@rbavery

rbavery commented Nov 18, 2023

from pydantic/pydantic#7980

I wonder if it makes more sense to integrate the logic you've designed in pydantic-numpy into pydantic-extra-types.

I'd personally prefer that pydantic/pydantic-extra-types natively support numpy types.

pydantic-numpy requires saving arrays to files rather than serializing and deserializing the numpy types themselves. I think this discussion raises some good points: pydantic/pydantic#4964

My use case is that I want to define a metadata standard for ML models that take large and complex arrays. The metadata just needs to record the numpy type, the order of labeled dimensions, and the shape. Saving out the whole array to load into the NumpyModel isn't preferable: it would be unnecessary storage and susceptible to path errors.
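That metadata-only use case could be sketched as a plain pydantic model, without storing the array at all (ArraySpec and its field names are hypothetical, not part of any existing package):

```python
import numpy as np
from pydantic import BaseModel, field_validator

class ArraySpec(BaseModel):
    """Records an array's dtype, labeled dimension order, and shape -- not its data."""
    dtype: str
    dims: tuple[str, ...]
    shape: tuple[int, ...]

    @field_validator('dtype')
    @classmethod
    def _valid_dtype(cls, v: str) -> str:
        np.dtype(v)  # raises TypeError for an unknown dtype string
        return v

spec = ArraySpec(dtype='float32', dims=('band', 'y', 'x'), shape=(3, 256, 256))
assert np.dtype(spec.dtype) == np.float32
```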

@caniko

caniko commented Nov 19, 2023

The co-author of pydantic-numpy here.

> My use case is that I want to define a metadata standard for ml models that take large and complex arrays.

If you only want validation, and dimension enforcement, just use pydantic_numpy.types, it is compatible with pydantic.BaseModel, even pydantic.dataclass.

> pydantic-numpy requires saving arrays to files rather than serializing and deserializing the numpy types themselves.

@rbavery your statement is false: interaction with numpy files (saving/loading) is an optional quality-of-life feature, and it is only offered by NumpyModel; you can ignore it in your described case.

Please be careful when you make these claims; I'd rather you ask a question than make a claim when you are uncertain.

Update
I wanted to share some updates and thoughts regarding the pydantic-numpy project. Here's what we're looking into:

Refactoring of pydantic-numpy.typing: We're giving the submodule a minor overhaul. The key change is the transition to automatically generated code. This shift is essential since dynamically generated typing hints aren't compatible with static type checkers like MyPy and PyRight. The refactor is compatible with static type checkers, and the code generator's script is available in the repository for reference.

Introducing NumpyModel for File IO: We've added a NumpyModel that supports integrated file IO operations. This should streamline processes that involve NumPy data handling.

Comprehensive Testing for Coverage: We've also put a significant effort into extensive testing to ensure robust coverage and reliability.

Regarding the integration of pydantic-numpy with this repository, I propose keeping them separate for the following reasons:

Complexity Management: Merging pydantic-numpy into this repository would significantly increase the complexity of both codebases. Our goal is to maintain simplicity and clarity in our projects.

Community Feedback: I'm aware of the requests to incorporate NumPy types directly into this repository. While I understand the perspective, I believe maintaining separation is in our best interest for streamlined development and maintenance.

Documentation Update: To make pydantic-numpy more discoverable for those who need it, we're considering adding a small dedicated section about it in the Pydantic documentation.

I hope this update aligns with your goals, and I look forward to your thoughts and feedback.

caniko added a commit to caniko/pydantic-numpy that referenced this issue Nov 19, 2023
Changes made because of an alarming interpretation of the package on pydantic/pydantic-extra-types#31
@rbavery

rbavery commented Nov 19, 2023

woops sorry @caniko, my bad. I read the readme incorrectly. Thanks for correcting.

8 participants