-
Notifications
You must be signed in to change notification settings - Fork 901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: adding a row to a GeoDataFrame downcasts GeoSeries to Series #3119
Comments
It's the
Inside |
This unfortunately isn't new as such, it's just that the
we have the same crash, with or without the Now it looks like we could do something like this: def _constructor_from_mgr(self, mgr, axes):
# analogous logic to _geodataframe_constructor_with_fallback
if not any(isinstance(block.dtype, GeometryDtype) for block in mgr.blocks):
df= pd.DataFrame._from_mgr(mgr, axes)
# stuff like concat to append an object row will upcast GeometryDtype blocks
for c in self.columns[self.dtypes == "geometry"]:
try:
df[c] = _ensure_geometry(df[c], crs=self[c].crs)
except TypeError:
pass
try:
df = df.set_geometry(self._geometry_column_name, crs=self.crs)
except TypeError:
pass
return df
return GeoDataFrame._from_mgr(mgr, axes) which is a bit ugly but seems to work (this is super rough, so there might be a nicer way). Having a look at how the concat logic works, it seems like ideally we'd have a way to tell pandas that the types should be specialised to geometrydtype for both things being concatenated, rather than generalised to object dtype. Currently this seems to happen in core.dtypes.concat::concat_compat and _get_result_dtype, where (For example, trying to take (GeometryDtype, object) -> GeometryDtype breaks trying to access individiual rows, gdf.iloc[0] as the dtype promotion will try and put a row of all types into a GeometryArray and crash, rather than an object numpy array) |
Aside, I was trying to think about how to do this properly, and thought that a Categorical is the best analogue in pandas: import pandas as pd
df = pd.DataFrame({"a":[1,2,3], "b":pd.Categorical(["a", "b", "b"])})
b_ = [6, "b"]
df.loc[5] = b_
print(df.dtypes)
a int64
b object
dtype: object but in this case the categorical dtype also isn't preserved. |
Just adding as a note that @jorisvandenbossche picked up on the warnings PR, this case also regresses: crs = "EPSG:4326"
df = GeoDataFrame(
{
"geometry": [Point(0, 0), Point(1, 1), Point(0, 0)],
"value1": np.arange(3, dtype="int64"),
"value2": np.array([1, 2, 1], dtype="int64"),
},
crs=crs,
)
# dummy test asserting we can access the crs
def func(group):
assert isinstance(group, GeoDataFrame)
assert group.crs == crs
groupby = df.groupby("value2")
groupby_sliced = groupby[df.columns] # -> groupby_sliced._selected_obj._geometry_column_name is where _geometry_column_name is lost.
groupby_sliced.apply(func) still looking into how this happens. |
Yeah, that's a good other example. I think in the end this is something that will have to be solved in pandas to fix it properly (although your PR looks interesting as a workaround, have to take a closer look!)
Yes, I looked into that further and commented about it at #2966 (comment). Summary, it's also "uncovered" by the _constructor change, but something that already existed before for non-"geometry" geometry columns as well. Will do a PR to fix it on the pandas side (it's an easy fix), and have #3130 to test it on our side. |
There's actually a much older regression at play here too. import geopandas
from geodatasets import get_path
nybb = geopandas.read_file(get_path("nybb"))[["BoroCode", "geometry"]]
crs_orig = nybb.crs
# add a new row
nybb.loc[5] = [6, nybb.geometry.iloc[0]]
assert nybb.crs == crs_orig # now fails The CRS is not preserved in this case since 8bac66c |
In the end this is all because pandas doesn't properly preserve the dtype in concat. And only our constructor that detects by default a "geometry" column fixed that again. Will try to think about how to fix it in pandas as well. |
I guess that merging #3080 caused a regression illustrated below.
We lose the active geometry because the
geometry
column is now a Seriescc @m-richards
The text was updated successfully, but these errors were encountered: