Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Associate spatial functions to each type #396

Open
adrien-berchet opened this issue Aug 7, 2022 · 17 comments
Open

Associate spatial functions to each type #396

adrien-berchet opened this issue Aug 7, 2022 · 17 comments

Comments

@adrien-berchet
Copy link
Member

A nice improvement would be to associate only the relevant functions to each spatial type, instead of associating all functions to all types. With this feature we wouldn't have to specify the target type when it is different from Geometry (as described here: https://geoalchemy-2.readthedocs.io/en/latest/core_tutorial.html#use-raster-functions).

@papacodebear
Copy link

papacodebear commented Jun 4, 2023

@adrien-berchet I've started to dig into this, but I'm noticing a build inconsistency which is preventing progress. The GitHub actions yml is set to test python versions 3.7 - 3.11 https://github.com/geoalchemy/geoalchemy2/blob/master/.github/workflows/test_and_publish.yml#L32, but the pre-commit hooks are configured to use only python 3.8 https://github.com/geoalchemy/geoalchemy2/blob/master/.pre-commit-config.yaml#L2. Can the 3.8 restriction be loosened? My local setup is using 3.11.

@adrien-berchet
Copy link
Member Author

Hi @papacodebear !
Sure, it's no big deal. Just note that in the CI the lint job, which runs the pre-commit hooks, uses Py38. So if you use features from Python > 3.8 the pre-commit hook might be valid on your local setup but not in the CI.

@papacodebear
Copy link

@adrien-berchet Gotcha, sounds good and noted. I'm trying to create a branch, but I don't have permissions to. Would you mind adding me as a contributor? Also, I assume that I should be branching off of and creating PR's into master, correct?

@adrien-berchet
Copy link
Member Author

Could you create a fork on your side to make the branch there? Actually I don't have permissions to add you as a contributor (I will try to ask for it, it would make sense that I get them now).
Yes you should target the master branch.
Thanks!

@adrien-berchet
Copy link
Member Author

Hi @papacodebear, did you have some time to work on this? (just to know, no pressure ☺️ )

@papacodebear
Copy link

papacodebear commented Jul 6, 2023

Hi @adrien-berchet , no worries! Yeah I've started to work on this, but the scope has definitely expanded.

The way I've started to tackle this is to define which functions actually belong to each type, which has been the majority of the effort. I started by taking the "master" list of functions from here, and tried to associate them with the right types, but then realized that PostGIS has also made updates over time, and so pulled the new list of functions directly from PostGIS.

Rasters
Geography
I'm looking at adding box functions as well, but haven't really started this yet. I've been trying to reconcile the existing set of functions with those that are just for the geometry type. I'm trying to prune those functions that were already in the master list here to include only those relevant to geometries, while also including any that may have been added to postgis in the meantime, but this is still in progress. I'm referencing this table .

I sent you an invite to the forked repository if you'd like to take a quick look at the progress. Any word on getting the collaborator access?

Edit: I realize the forked repository is public, my branch is here: https://github.com/papacodebear/geoalchemy2/tree/papacodebear/issue396
Edit: Separated functions are defined here: https://github.com/papacodebear/geoalchemy2/tree/papacodebear/issue396/geoalchemy2/functions

@adrien-berchet
Copy link
Member Author

Very nice!
I didn't know this table (http://postgis.net/docs/manual-3.3/PostGIS_Special_Functions_Index.html#PostGIS_TypeFunctionMatrix), it's very interesting. And indeed, I guess most of the work is to translate this table into actual GeoAlchemy2 functions (and with proper specific compilations for other dialect than Postgis).
I did not read your code in details but I'm wondering one thing: why did you moved spatial types into one separated files?

@papacodebear
Copy link

Apologies for the delay, I moved the spatial types into individual files mostly for readability reasons, using the one class per module pattern. It can be helpful to know what types exist by looking in the directory. It's also helpful to reduce merge conflicts when multiple people are working on different types. I sort of see it as the same benefit as splitting the dialects here: https://github.com/geoalchemy/geoalchemy2/tree/master/geoalchemy2/types/dialects .

It's also going to become more important as we split the spatial functions per type, since the _FUNCTIONS variable is so large, we would want them in separate files for separate types.

I don't want to step on any toes, so let me know if you want me to change it back, but I think there are benefits to splitting it.

@adrien-berchet
Copy link
Member Author

Ok, I see, that's interesting. In our case the type classes are very simple so it may be an overkill but we can still try this new organization.

@papacodebear
Copy link

Hi @adrien-berchet ,

I've split the functions per type, but now I'm running into an issue where I could use some guidance. I'm trying to accommodate the func use case, and having an issue with multiple possible types.

As an example, take ST_Intersection, which applies to both rasters and geometries.

Geometry: http://postgis.net/docs/manual-3.3/ST_Intersection.html
Raster: http://postgis.net/docs/manual-3.3/RT_ST_Intersection.html

If we're in a position where a user could write an expression like

with Session(engine) as session:
    geometry_test = table("geometrytest", column("id"), column("geom"), column("rast"))
    query = select(func.ST_Intersection(geometry_test.c.rast, geometry_test.c.geom))
    geom_row = session.scalars(query).one()

It's not obvious what the return type is unless either the user specifies it, or we keep a list of potential argument types and known return types. For instance the example here should return a set of geometry values, even though the first argument is a raster.

image

By my estimation, it looks like we're having to guess what the return type is based on arguments to the function.

This problem would occur in varying degrees of confusion for the following functions:

  • ST_Intersection
  • ST_Segmentize
  • ST_SetSRID
  • ST_SnapToGrid
  • ST_Transform
  • ST_Union

Should I look into building the list of possible arguments and corresponding return types?

@adrien-berchet
Copy link
Member Author

Hi @papacodebear !
Sorry for the long delay, I had no time to answer when you posted this and then I completely forgot about this...
Anyway, I'm afraid you're right and we'll have to list all possible couples of argument types and return types. Then we'll have to register them with specific internal names. For example, ST_Intersection(geom, rast, band_num=1) could be registered as geom_rast_st_intersection, so the user could call geom.st_intersection(rast, band_num=1) and internally it would call geom_rast_st_intersection. But if the user works with core queries as you showed, I think he will have to specifically call something like select(func.geom_rast_ST_Intersection(geometry_test.c.geom, geometry_test.c.rast, band_num=1)) or select(func.ST_Intersection(geometry_test.c.geom, geometry_test.c.rast, band_num=1, type_=GeometrySet)) (though the GeometrySet type should be defined first).

@mbway
Copy link
Contributor

mbway commented Oct 20, 2023

Responding here to a question asked in #478 as to whether
typing.overload can be used to solve this problem.

I don't think it can be used to solve this problem as @overload is a type-level construct and in order to have the SQL
type be different depending on the input data types would require self.type to be assigned a different value at
runtime.

It looks like this should be possible at least in some situations:

https://github.com/sqlalchemy/sqlalchemy/blob/26f4b3e72f8a250e406f5d9871212b83cbc7cfc5/lib/sqlalchemy/sql/functions.py#L1385

SQLAlchemy provides ReturnTypeFromArgs(GenericFunction) which indicates that runtime introspection is possible. The case here is more complex than simply copying the input type to the output type but I think the same techniques should apply.

@adrien-berchet
Copy link
Member Author

Interesting, thanks!
@papacodebear can I let you see what you can do with this?

@papacodebear
Copy link

@adrien-berchet Sure thing. Let me take a look and see what's possible with it! Thanks for finding this @mbway

@papacodebear
Copy link

Hey @adrien-berchet, sorry for the crazy delay here, but I think I've made quite a bit of progress over the holidays. I think there's going to be a lot of overlap with the work @mbway has done, and I'd love to incorporate what he's done here: #478.

I took a quick look at the suggested ReturnTypeFromArgs(GenericFunction) and found that that simply tries to match the current return type to the return type of the first non-null argument. https://github.com/sqlalchemy/sqlalchemy/blob/main/lib/sqlalchemy/sql/elements.py#L5220 . But to the point, @mbway is right, runtime introspection is possible. So I went back to the idea of creating function "entries" for different argument overloads. The idea being that if we know the types of the arguments the user is supplying, we can infer the return type.

Rather than manually create the argument overloads and their respective return types, I figured out an automatic way of parsing available postgis functions. The documentation isn't readily queryable, but I think I figured out a general way to maintain the list of functions and argument types. This also required adding a few unimplemented types like GeomVal though.

  1. Query the available "ST_" functions and corresponding metadata in a postgis:3.4 instance in a local docker container. This will give the lowercase list of functions with corresponding descriptions, argument types, and return types.
  2. Parse the latest postgis 3.4 manual and search it for the correct capitalization of function names.
  3. Save each set of possible arguments with their corresponding return type hints and instantiation classes. This necessitates changing the structure of the list of functions though. As an example, here's ST_Intersection. https://github.com/papacodebear/geoalchemy2/blob/papacodebear/issue396/geoalchemy2/_functions.py#L2449
  4. When calling a function (e.g. func.ST_Intersection), create a tuple of the supplied arguments, and lookup that set of arguments in the FUNCTIONS dictionary. Set the GenericFunction self.type attribute based on the instantiation type (distinctly not the type hint class, learned the hard way that you can't instantiate a List[]() or a Union[]()) https://github.com/papacodebear/geoalchemy2/blob/papacodebear/issue396/geoalchemy2/functions/__init__.py#L254

It does solve the original issue, but in one case requires providing a little more information about column types. As an example for a GeometryTest table:

class GeometryTest(Base):
    __tablename__ = "geometrytest"
    id = Column(Integer, primary_key=True)
    geom = Column(Geometry)
    rast = Column(Raster)


with Session(engine) as session:
    # This doesn't work, we can't infer the type of the columns. They resolve as NullType(): 
    geometry_test = table("geometrytest", column("id"), column("geom"), column("rast"))
    query = select(func.ST_Intersection(geometry_test.c.rast, geometry_test.c.geom))
    # This works because we specify the type of the columns. We don't have to specify the return type though
    geometry_test = table(
        "geometrytest", column("id"), column("geom", type_=Geometry), column("rast", type_=Raster)
    )
    query = select(func.ST_Intersection(geometry_test.c.rast, geometry_test.c.geom))
    # This works because the type of the columns comes from the table definition
    query = select(func.ST_Intersection(GeometryTest.rast, GeometryTest.geom))
    geom_row = session.scalars(query).one()
    print(geom_row)

Essentially, if using the "table" construct, the column types have to be provided:

A column() / table() construct like that illustrated above can be created in an ad-hoc fashion and is not associated with any MetaData, DDL, or events, unlike its Table counterpart.

It's not cleaned up, but before I go further I wanted to check in and make sure I'm not way out there. I think this could simplify a lot of function maintenance!

Let me know what you think!

@adrien-berchet
Copy link
Member Author

Hi @papacodebear
Thank you very much for this update, it seems very promising! (and no problem for the delay 😉 )

As far as I am concerned, I have no objection about changing the data structure used to store the functions (in your 3.). It's only private internal data, so no package other than GeoAlchemy2 should use it, so we can change it according to our needs.

About the table constructor, I never used it so I'm not sure. How does it behave with GeoAlchemy2==0.14.3? Do we also have to specify the type? Anyway, I would like to not introduce a breaking change in the API for this feature.

Finally, your branch is quite far behind our master branch, it would be nice to try to update it so it will be easier to understand what you changed.

Thanks again!

@papacodebear
Copy link

@adrien-berchet Fair enough! I back-merged from master, so it should be up to date now. Let me know what you think! I haven't incorporated the automated type stub generation yet, but am happy to add it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants