Panic when displaying LazyFrame in Jupyter #16252

Closed
2 tasks done
mdavis-xyz opened this issue May 15, 2024 · 3 comments
Labels
bug, needs triage, python

Comments

@mdavis-xyz
Contributor

mdavis-xyz commented May 15, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

Run this in Jupyter (which automatically tries to print whatever the last line returns). Note that the column I'm selecting does not exist.

import polars as pl
(
    pl.LazyFrame(
        {
            "a": [1, 2],
        }
    )
    .with_columns(pl.col("b"))
    .select("b")
)
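
For context on why the failure only shows up at display time: a notebook front end asks the value of the last expression in a cell for a rich representation, and for a LazyFrame that is `_repr_html_` (see the traceback below). The following is a simplified stand-in, not IPython's actual implementation; `display_like_notebook` and `LazyThing` are hypothetical names used only for illustration, and no Polars code is involved.

```python
# Simplified stand-in for how a notebook picks a representation.
# NOT IPython's real implementation; all names here are hypothetical.

def display_like_notebook(obj):
    """Return the rich HTML repr if the object offers one, else repr(obj)."""
    method = getattr(obj, "_repr_html_", None)
    if callable(method):
        # Any error raised here surfaces when the cell output is rendered,
        # not when the object was built.
        return method()
    return repr(obj)


class LazyThing:
    """Hypothetical lazy object: building it is cheap, rendering does the work."""

    def __init__(self, columns, wanted):
        self.columns = columns
        self.wanted = wanted  # deliberately not validated at construction time

    def _repr_html_(self):
        if self.wanted not in self.columns:
            raise LookupError(f"column not found: {self.wanted}")
        return f"<b>SELECT {self.wanted}</b>"


ok = LazyThing(["a"], "a")
bad = LazyThing(["a"], "b")  # no error yet: nothing has been evaluated

print(display_like_notebook(ok))
try:
    display_like_notebook(bad)
except LookupError as exc:
    print("failed while rendering:", exc)
```

Building `bad` succeeds and the error only appears once something asks for its representation, which matches the shape of the report: the cell itself is fine, the display step is what fails.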

Log output


(The verbose flag didn't seem to output anything additional.)

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[2], line 2
      1 import polars as pl
----> 2 (
      3     pl.LazyFrame(
      4         {
      5             "a": [1, 2],
      6         }
      7     )
      8     .with_columns(pl.col("b"))
      9     .select("b")
     10 )

File ~\AppData\Local\anaconda3\Lib\site-packages\IPython\core\displayhook.py:268, in DisplayHook.__call__(self, result)
    266 self.start_displayhook()
    267 self.write_output_prompt()
--> 268 format_dict, md_dict = self.compute_format_data(result)
    269 self.update_user_ns(result)
    270 self.fill_exec_result(result)

File ~\AppData\Local\anaconda3\Lib\site-packages\IPython\core\displayhook.py:157, in DisplayHook.compute_format_data(self, result)
    127 def compute_format_data(self, result):
    128     """Compute format data of the object to be displayed.
    129 
    130     The format data is a generalization of the :func:`repr` of an object.
   (...)
    155 
    156     """
--> 157     return self.shell.display_formatter.format(result)

File ~\AppData\Local\anaconda3\Lib\site-packages\IPython\core\formatters.py:179, in DisplayFormatter.format(self, obj, include, exclude)
    177 md = None
    178 try:
--> 179     data = formatter(obj)
    180 except:
    181     # FIXME: log the exception
    182     raise

File ~\AppData\Local\anaconda3\Lib\site-packages\decorator.py:232, in decorate.<locals>.fun(*args, **kw)
    230 if not kwsyntax:
    231     args, kw = fix(args, kw, sig)
--> 232 return caller(func, *(extras + args), **kw)

File ~\AppData\Local\anaconda3\Lib\site-packages\IPython\core\formatters.py:223, in catch_format_error(method, self, *args, **kwargs)
    221 """show traceback on failed format call"""
    222 try:
--> 223     r = method(self, *args, **kwargs)
    224 except NotImplementedError:
    225     # don't warn on NotImplementedErrors
    226     return self._check_return(None, args[0])

File ~\AppData\Local\anaconda3\Lib\site-packages\IPython\core\formatters.py:344, in BaseFormatter.__call__(self, obj)
    342     method = get_real_method(obj, self.print_method)
    343     if method is not None:
--> 344         return method()
    345     return None
    346 else:

File ~\AppData\Local\anaconda3\Lib\site-packages\polars\lazyframe\frame.py:533, in LazyFrame._repr_html_(self)
    531 def _repr_html_(self) -> str:
    532     try:
--> 533         dot = self._ldf.to_dot(optimized=False)
    534         svg = subprocess.check_output(
    535             ["dot", "-Nshape=box", "-Tsvg"], input=f"{dot}".encode()
    536         )
    537         return (
    538             "<h4>NAIVE QUERY PLAN</h4><p>run <b>LazyFrame.show_graph()</b> to see"
    539             f" the optimized version</p>{svg.decode()}"
    540         )

PanicException: io error: Error
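
A possible reason the `try:` in `_repr_html_` does not absorb this: PyO3 documents its `PanicException` as deriving from `BaseException`, so a guard written as `except Exception` would let it through. The traceback above does not show which `except` clause Polars uses, so treat that part as an assumption. The sketch below uses a hypothetical `FakePanic` class and stand-in functions (no Polars or PyO3 involved) to show the difference between the two guards.

```python
class FakePanic(BaseException):
    """Hypothetical stand-in for a Rust panic surfaced in Python.

    Assumption: like PyO3's PanicException, it derives from BaseException
    rather than Exception.
    """


def render_graph():
    # Stand-in for the failing plan-to-graph call seen in the traceback.
    raise FakePanic("io error: Error")


def repr_with_exception_guard():
    try:
        return render_graph()
    except Exception:
        # Never reached for FakePanic: it is not an Exception subclass.
        return "text fallback"


def repr_with_base_guard():
    try:
        return render_graph()
    except (KeyboardInterrupt, SystemExit):
        raise  # keep interrupts and interpreter exit working
    except BaseException as exc:
        return f"text fallback ({type(exc).__name__}: {exc})"


try:
    repr_with_exception_guard()
    escaped = False
except FakePanic:
    escaped = True

print(escaped)                 # True: the panic-like error escaped the guard
print(repr_with_base_guard())  # text fallback (FakePanic: io error: Error)
```

Catching `BaseException` is usually discouraged, and the second guard is only shown to make the hierarchy visible. The cleaner fix is for the underlying call to return a regular error instead of panicking.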

Issue description

In Jupyter, if I make a mistake in my Polars operations (e.g. selecting a column which doesn't exist), I get a panic when Jupyter tries to display the result.

Note that I can only reproduce the panic when .with_columns() and .select() are chained together. Either one on its own does not cause the panic.

Expected behavior

Since a .collect() would result in an error, I expect that a string representation of the LazyFrame would either:

  • show the same error as a collect
  • show the same output as a valid LazyFrame:
naive plan: (run LazyFrame.explain(optimized=True) to see the optimized plan)

SELECT [col("a")] FROM

WITH_COLUMNS:

[col("b")]

DF ["b"]; PROJECT */1 COLUMNS; SELECTION: "None"

My understanding is that a Panic is never the intended behavior, and that all errors should be handled more gracefully.

Installed versions

--------Version info---------
Polars:               0.20.26
Index type:           UInt32
Platform:             Windows-10-10.0.19045-SP0
Python:               3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          2.2.1
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2023.4.0
gevent:               <not installed>
hvplot:               0.8.4
matplotlib:           3.7.2
nest_asyncio:         1.5.6
numpy:                1.24.3
openpyxl:             3.0.10
pandas:               2.0.3
pyarrow:              11.0.0
pydantic:             1.10.8
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           1.4.39
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           3.2.0
@cmdlineluser
Contributor

Can be reproduced outside of Jupyter by forcing the .to_dot() call:

(
    pl.LazyFrame({"a": [1, 2]})
    .with_columns(pl.col("b"))
    .select("b")
    ._ldf
    .to_dot(optimized=False)
)
# could not determine schema
# thread '<unnamed>' panicked at crates/polars-lazy/src/dot.rs:49:14:
# io error: Error
# PanicException: io error: Error

@cmdlineluser
Contributor

cmdlineluser commented May 15, 2024

I think this is actually fixed on main due to #16237

On main I get the ColumnNotFoundError as expected:

ColumnNotFoundError: b

This error occurred with the following context stack:
	[1] 'with_columns' failed
	[2] 'select' input failed to resolve

@ritchie46
Member

Yes, this is fixed.
