Describe the issue:
When assigning a new column to a dask object, it seems like the concrete subtype (e.g. geopandas.GeoDataFrame) is lost.
Minimal Complete Verifiable Example:
import dask.array
import dask.dataframe
import dask_geopandas
import geopandas
import pandas as pd
df = geopandas.GeoDataFrame({"geometry": geopandas.points_from_xy([0, 0], [0, 1])})
ddf = dask_geopandas.from_geopandas(df, npartitions=2)
ddf = ddf.clear_divisions() # this is important
b = dask.dataframe.from_dask_array(dask.array.zeros((2,), chunks=(1, 1)), index=ddf.index)
ddf.assign(a=b).geometry.x.compute() ## error
That raises:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/var/folders/x7/__bs9yvx21qbvzb17sj4qsh40000gn/T/ipykernel_95282/3433075730.py in ?()
8 ddf = dask_geopandas.from_geopandas(df, npartitions=2)
9 ddf = ddf.clear_divisions() # this is important
10
11 b = dask.dataframe.from_dask_array(dask.array.zeros((2,), chunks=(1, 1)), index=ddf.index)
---> 12 ddf.assign(a=b).geometry.x.compute() ## error
~/gh/TomAugspurger/dask-geopandas-spatial-partitioning/.direnv/python-3.12/lib/python3.12/site-packages/dask_expr/_collection.py in ?(self, fuse, concatenate, **kwargs)
476 out = self
477 if not isinstance(out, Scalar) and concatenate:
478 out = out.repartition(npartitions=1)
479 out = out.optimize(fuse=fuse)
--> 480 return DaskMethodsMixin.compute(out, **kwargs)
~/gh/TomAugspurger/dask-geopandas-spatial-partitioning/.direnv/python-3.12/lib/python3.12/site-packages/dask/base.py in ?(self, **kwargs)
368 See Also
369 --------
370 dask.compute
371 """
--> 372 (result,) = compute(self, traverse=False, **kwargs)
373 return result
~/gh/TomAugspurger/dask-geopandas-spatial-partitioning/.direnv/python-3.12/lib/python3.12/site-packages/dask/base.py in ?(traverse, optimize_graph, scheduler, get, *args, **kwargs)
656 keys.append(x.__dask_keys__())
657 postcomputes.append(x.__dask_postcompute__())
658
659 with shorten_traceback():
--> 660 results = schedule(dsk, keys, **kwargs)
661
662 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
~/gh/TomAugspurger/dask-geopandas-spatial-partitioning/.direnv/python-3.12/lib/python3.12/site-packages/pandas/core/generic.py in ?(self, name)
6295 and name not in self._accessors
6296 and self._info_axis._can_hold_identifiers_and_holds_name(name)
6297 ):
6298 return self[name]
-> 6299 return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'x'
geopandas.GeoSeries objects expose .x and .y on the geometry column. After the assign, the computed geometry column is a regular pandas.Series, which has no such attribute, hence the error.
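To illustrate the general failure mode without dask or geopandas, here is a minimal pandas-only sketch. `PointSeries` and its `x` property are hypothetical stand-ins for `geopandas.GeoSeries`, and rebuilding through `pd.Series(...)` is only an assumed analogy for whatever drops the subtype here, not the actual dask-expr code path.

```python
import pandas as pd


class PointSeries(pd.Series):
    """Hypothetical stand-in for geopandas.GeoSeries: adds an ``x`` property."""

    @property
    def _constructor(self):
        # pandas calls this when an operation has to build a new Series.
        return PointSeries

    @property
    def x(self):
        # Illustrative accessor: first coordinate of each (x, y) tuple.
        return pd.Series([p[0] for p in self], index=self.index)


points = PointSeries([(0, 0), (0, 1)])

# Operations routed through ``_constructor`` keep the subclass.
kept = points.iloc[:1]
print(type(kept).__name__)

# Rebuilding through the base class silently drops the subclass.
lost = pd.Series(points)
print(type(lost).__name__)

try:
    lost.x
except AttributeError as exc:
    # Same kind of failure as in the traceback above.
    print(exc)
```

If something similar happens inside the alignment step, the partitions would come back as plain pandas objects even though the collection's metadata still looks like a GeoDataFrame.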
Anything else we need to know?:
Having unknown divisions does seem to be necessary: commenting out the ddf = ddf.clear_divisions() line makes the error go away. So I think we can narrow the search to AssignAlign (and not Assign).
Environment:
- Dask version: 2024.12.0
- dask-expr from main @ d7577a2
- Python version:
- Operating System:
- Install method (conda, pip, source):