changing pandas dataframe display style in Rmarkdown #783

Open
ofajardo opened this issue Jun 26, 2020 · 13 comments · May be fixed by #1474

Comments

@ofajardo

ofajardo commented Jun 26, 2020

I would like to be able to change the display style of a pandas data frame. This code works in Jupyter, and it would be awesome to get it working in R Markdown. Currently, R Markdown displays an incomplete version of the HTML string instead of the nicely formatted HTML table. R Markdown file attached.

dframe.Rmd.zip

---
title: "rawtest"
output: html_document
---
	
```{r setup, include=FALSE}
library(reticulate)
knitr::opts_chunk$set(echo = TRUE, error=TRUE)
use_python('/opt/conda/bin/python')
```

Displaying a pandas data frame nicely

OK we have a complicated pandas data frame and we want to show it nicely. Passing it to R and using kable
or something like that is not an option because when passing a pandas dataframe with multi-index to R
those indexes will disappear. Let's start by displaying the dataframe:

import pandas as pd
ncols = 3
nrows = 3
row = list(range(1,ncols+1))
table = [row for x in range(nrows) ]
columns = [["","Overall"],["Transplant","false"],["Transplant","true"]]
rows = [["n", ""],["age","mean (SE)"],["age","median (IQR)"]]
custom_df = pd.DataFrame(table)
custom_df.columns = pd.MultiIndex.from_tuples(columns)
custom_df.index = pd.MultiIndex.from_tuples(rows)
custom_df.to_html()

OK not bad (what are those commas before and after the table btw?), but looks boring. Let's try to
beautify with some CSS. OOPS, but the resulting html is not rendered, why?

# Let's apply some nice formatting to the dataframe

table_props = [('font-family', '"Arial", Arial, sans-serif;'),
							 ('font-size', '12pt;'),
							 ('border-collapse', 'collapse;'),
							 ('padding', '0px;'),
							 ('margin', '0px;'),
							 ('margin-bottom', '10pt;')]

tbody_props = [('background', '#fff')]

th_props = [
	('border', '0;'),
	('text-align', 'center;'),
	('padding', '0.5ex 1.5ex;'),
	('margin', '0px;')
	]

# Set CSS properties for td elements in dataframe
td_props = [
	('white-space', 'nowrap;'),
	('border', '0;'),
	('text-align', 'center;'),
	('padding', '0.5ex 1.5ex;'),
	('margin', '0px;')
	]

tr_nthchild_props = [
	('background', '#fff')
	]

thead_first = [
	('border-top', '2pt solid black;')
	]

thead_last = [
	('border-bottom', '1pt solid black;')
	]

tr_last = [
	('border-bottom', '2pt solid black;')
	]

# Set table styles
styles = [
	dict(selector="table", props=table_props),
	dict(selector="tbody", props=tbody_props),
	dict(selector="th", props=th_props),
	dict(selector="td", props=td_props),
	dict(selector="tbody tr:nth-child(odd)", props=tr_nthchild_props),
	dict(selector="thead>tr:first-child>th", props=thead_first),
	dict(selector="thead>tr:last-child>th", props=thead_last),
	dict(selector="tbody>tr:last-child>td", props=tr_last),
	
	]

styled = custom_df.style.set_table_styles(styles)
styled.render()
#styled_html = styled.render()
#styled_html.replace("</style><table", "</style>\n\n<!-- -->\n\n<table")
#styled_html
@m-legrand

m-legrand commented Jul 1, 2020

Even leaving aside the styling, there are two things I find interesting with this issue:

  1. (bug) results='asis' preserves the quotes of any Python output. This makes it unnecessarily complicated to create HTML or Markdown directly from Python. These are the "commas" @ofajardo is seeing.

  2. (feature request (involving knitr?)) For a lot of pandas.DataFrame output, either of the following would often be better than the raw printing:

  • {python, results='asis'} df = ...; df.to_html() (assuming 1. is corrected)
  • {python, results='asis'} df = ...; df.to_markdown() (assuming 1. is corrected)
  • {python} df = ... + {r} py$df (cleanest result when no multi-index)

I could see this getting much cleaner and customizable through an option somewhere, e.g. pandas.df.output being something like "repr" (default), "html", "markdown" or "r".

@hathawayj

Did something change to align with this request? I can't get my pandas dataframes to just print output anymore in my markdown files. The output is always converted to an HTML table unless I wrap a print() around it.

@rleyvasal

It would be great if pandas data frames were shown nicely in R Markdown (R notebooks), the same as they appear in Jupyter notebooks (or better, with an indicator of the datatype for each column). The only reason I don't use RStudio for Python is that I am not able to see the full data frames: they are not scrollable to the left and right. This simple feature is very important for data exploration.

@linogaliana

linogaliana commented Jun 21, 2021

Would it be possible to change the class of the pandas DataFrame returned from Python and provide adapted methods for printing?

When we do

```{python, echo = FALSE}
df = pd.DataFrame(
    {'size': [1.,1.5,1],
    'weight' : [3, 5, 2.5]
    },
    index = ['cat', 'dog', 'koala']
)
```

We end up with an object of class data.frame

```{r}
class(py$df)
# [1] "data.frame"
```

With an additional class, say dataframe.pandas, it would probably be easier to add printing methods (e.g. print.dataframe.pandas.default, print.dataframe.pandas.html, print.dataframe.pandas.markdown) that mimic, at the R level (which would give R Markdown users more control over the output), the behavior of df.to_html or df.to_markdown.

@kevinushey
Collaborator

If I understand correctly, this is an MRE:

---
title: "Pandas Printing"
author: "Kevin Ushey"
date: "`r Sys.Date()`"
output: html_document
---

```{r}
library(reticulate)
use_virtualenv("r-reticulate", required = TRUE)
py_install("pandas")
```

```{python, echo=FALSE}
import pandas as pd

data = {
  'size': [1., 1.5, 1],
  'weight': [3, 5, 2.5]
}

pd.DataFrame(data, index = ['cat', 'dog', 'koala'])
```

When this document is rendered via rmarkdown::render(), you see:

Screen Shot 2022-12-07 at 9 46 07 AM

and so you don't get the nice HTML rendering for the Pandas DataFrame you might've hoped for.

@kevinushey
Collaborator

This is where Pandas DataFrames get handled by the reticulate Python engine:

return(captured)

Note that we don't do anything here; we just use the captured (default) print style for the DataFrame. We considered using the to_html() method in the past, but the rendered HTML is pretty bare-bones and ugly.

Screen Shot 2022-12-07 at 9 52 31 AM

I'm not exactly sure what Jupyter is doing here when rendering DataFrames; presumably they're using their own tooling for rendering to HTML? Or maybe they're using to_markdown() and letting the Markdown renderer produce a nice table?

@linogaliana

linogaliana commented Dec 7, 2022

Thanks @kevinushey for your detailed answer. In my case, moving to Quarto solved the problem since, behind the scenes, this means moving to the Jupyter engine. I guess Quarto now solves most of the cases expressed in this issue. The issue only remains for people mixing R and Python in Quarto or R Markdown.

@linogaliana

linogaliana commented Dec 7, 2022

If it can help, in the past, jupyter was using this css to style the table. However, I have not been able to locate this styling in current jupyter version.

@cscheid
Member

cscheid commented Dec 7, 2022

I don't know how exactly Jupyter does it, but their output is equivalent to Display(Markdown(df.to_markdown())) (or whatever the IPython classes are). So I think that if reticulate could know that it's running inside knitr and output markdown in that case, then the style would match that of Jupyter.

That would mean, in turn, that quarto gets df printing behavior that is consistent across engines (which is the cause of our upstream issue)
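For readers unfamiliar with what a Markdown table string looks like, here is a minimal sketch that builds a pipe table by hand for the cat/dog/koala data used in this thread. The helper `to_pipe_table` is hypothetical and only approximates the layout; `df.to_markdown()` delegates to the tabulate package, and its exact padding differs.

```python
def to_pipe_table(index, columns, rows):
    """Build a Markdown pipe table from row labels, column names, and row values."""
    # Header row: an empty first cell sits above the row labels.
    header = "| | " + " | ".join(columns) + " |"
    # Divider row: one cell per column plus one for the row labels.
    divider = "|" + "---|" * (len(columns) + 1)
    body = [
        "| " + " | ".join([str(label)] + [str(value) for value in values]) + " |"
        for label, values in zip(index, rows)
    ]
    return "\n".join([header, divider] + body)


table = to_pipe_table(
    index=["cat", "dog", "koala"],
    columns=["size", "weight"],
    rows=[[1.0, 3.0], [1.5, 5.0], [1.0, 2.5]],
)
print(table)
```

Because the result is plain Markdown, Pandoc turns it into a regular table, and whatever table styling the output format applies comes along with it.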

@cderv
Contributor

cderv commented Jul 19, 2023

As this came up again on Quarto side, I looked into this a bit. Here are some thoughts and insights

  • results='asis' preserves the quotes of any Python output. This makes it unnecessarily complicated to create HTML or Markdown directly from Python.

    Pandas's to_html() or to_markdown() method creates a string representation of the DataFrame, but printing it correctly as asis output in knitr requires an extra step. See the cat() example, which does this at the R printing step.

  • Using IPython and its display helpers such as HTML() and Markdown() helps format the output correctly. I believe this is what happens in Jupyter.

  • We considered using the to_html() method in the past, but the rendered HTML is pretty bare-bones and ugly.

    Regarding this, I believe this is a matter of CSS. We do a specific processing to add Bootstrap style to Pandoc's table, but it seems this does not catch tables output from Pandas. So it would need a tweak.

    Adding classes to to_html() is also an option - especially when we know we are in Bootstrap document. See example below.

    Quarto is also clever about this: by default it parses HTML tables and applies the same processing as for Markdown tables, so the style is applied (as for any Pandoc table).

  • Or maybe they're using to_markdown() and letting the Markdown rendered produce a nice table?

    So I think that if reticulate could know that it's running inside knitr and output markdown in that case, then the style would match that of Jupyter.

    In that case there is no styling problem, because in R Markdown or Quarto these will be Pandoc tables, and some Bootstrap-based styles are applied to them.
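The quoting described in the first bullet can be reproduced in plain Python. This is a minimal sketch, assuming the engine shows the last expression of a chunk the way the interactive interpreter does, through `repr()`. The `html` string is a stand-in for what `df.to_html()` returns, so pandas is not needed to run it.

```python
import contextlib
import io

# Stand-in for the string that df.to_html() would return (pandas is not imported here).
html = "<table>\n  <tr><td>1</td></tr>\n</table>"

# An auto-printed last expression is shown through repr(): quoted, newlines escaped.
auto_printed = repr(html)

# print() writes the characters themselves, which is what asis output needs.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print(html)
printed = buffer.getvalue()

print(auto_printed)
print(printed, end="")
```

The first form carries the surrounding quotes and literal `\n` sequences into the document; the second is the raw markup. This is why `cat(py$df_html)` on the R side, or an explicit `print()` on the Python side, behaves differently from a bare `df.to_html()` at the end of a chunk.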

Quarto and R Markdown style tables differently, but in the end this comes down to the printing method used at the knitr step. Currently it is the default printing, but it could be improved. AFAIU Jupyter (or nbclient or something else in the stack) registers several representations such as text/html, text/markdown or text/latex and chooses which one to use depending on the output format. At least Quarto leverages that from Jupyter output.

reticulate could do something similar, either sending this information to knitr or making the choice itself based on the output of knitr::pandoc_to(). This is easier with Quarto, where outputting Markdown tables is the simplest option because Quarto does its own processing and styling.

How to explicitly style a Pandas table using HTML(df.to_html()) could also be documented, as this would be the way to do it explicitly with knitr (with results: asis).
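As one possible shape for such documentation, here is a minimal sketch that adds Bootstrap classes to an already generated table string. The helper `add_table_classes` is hypothetical, and the sample string assumes the opening tag that `to_html()` emits by default, which is worth checking against your pandas version. With pandas itself, passing `classes=` to `to_html()` is the more direct route. `table-condensed` is the Bootstrap 3 class name.

```python
import re


def add_table_classes(html, classes):
    """Append CSS classes to the class attribute of the first <table> tag."""
    extra = " ".join(classes)

    def extend(match):
        # group(1): everything up to the opening quote of class="..."
        # group(2): the existing class list, group(3): the closing quote
        return f"{match.group(1)}{match.group(2)} {extra}{match.group(3)}"

    return re.sub(r'(<table\b[^>]*\bclass=")([^"]*)(")', extend, html, count=1)


# Stand-in for df.to_html() output (assumed default opening tag).
html = '<table border="1" class="dataframe">\n  <tbody></tbody>\n</table>'
styled = add_table_classes(html, ["table", "table-condensed"])
print(styled.splitlines()[0])
```

The resulting string still has to reach the document unquoted, for example through `HTML(...)` or `cat()` in an asis chunk.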

this would probably be easier to add some printing methods (e.g. print.dataframe.pandas.default, print.dataframe.pandas.html, print.dataframe.pandas.markdown)

Going through this idea is also a good option for R Markdown.

@kevinushey @t-kalinowski hope this helps. Happy to help make this better. We would love Jupyter and knitr output for Python to be equivalent in Quarto! (part of quarto-dev/quarto-cli#3457)

Examples showing the different points mentioned above

Here are some tests I did with the rendering and different options with R Markdown
https://rpubs.com/cderv/reticulate-rmarkdown-pandas-table-outputs

Rmd Source
---
title: "Pandas Printing"
author: "Kevin Ushey"
date: "`r Sys.Date()`"
output: html_document
---

```{r}
library(reticulate)
use_virtualenv("r-reticulate", required = TRUE)
py_install(c("pandas", "IPython", "tabulate"))
```

```{python, echo=FALSE}
import pandas as pd

data = {
  'size': [1., 1.5, 1],
  'weight': [3, 5, 2.5]
}

df = pd.DataFrame(data, index = ['cat', 'dog', 'koala'])
```

# Default render

```{python}
df
```

# Try HTML

Some quotes are still there, preventing correct printing

```{python}
df.to_html()
```

```{python, results = "asis"}
df.to_html()
```

So it requires some special processing

```{python}
df_html = df.to_html()
```

```{r, results='asis'}
cat(py$df_html)
```

# Using IPython Display helps

```{python, results = "asis"}
from IPython.display import HTML
HTML(df.to_html())
```

# Improve stylings using Bootstrap class 

```{python}
df_html = df.to_html(classes = ["table", "table-condensed"])
```

```{r, results='asis'}
cat(py$df_html)
```

```{python, results = "asis"}
HTML(df.to_html(classes = ["table", "table-condensed"]))
```

# Try Markdown

Still quoting, so it requires some special printing

```{python}
df.to_markdown()
```

```{python, results = "asis"}
df.to_markdown()
```

```{python}
df_markdown = df.to_markdown()
```

```{r, results = "asis"}
cat(py$df_markdown)
```

# Using IPython Display helps

```{python, results = "asis"}
from IPython.display import Markdown
Markdown(df.to_markdown())
```

And same document in Quarto
https://rpubs.com/cderv/reticulate-quarto-pandas-table-outputs

Qmd Source
---
title: "Pandas Printing"
author: "Christophe Dervieux"
date: today
engine: knitr
format: 
  html:
    code-tools: 
      source: true
---

```{r}
library(reticulate)
use_virtualenv("r-reticulate", required = TRUE)
py_install(c("pandas", "IPython", "tabulate"))
```

```{python}
import pandas as pd

data = {
  'size': [1., 1.5, 1],
  'weight': [3, 5, 2.5]
}

df = pd.DataFrame(data, index = ['cat', 'dog', 'koala'])
```

# Default render

```{python}
df
```

# Try HTML

Some quotes are still there, preventing correct printing

```{python}
df.to_html()
```

```{python}
#| output: asis
df.to_html()
```

So it requires some special processing

```{python}
df_html = df.to_html()
```

```{r}
#| output: asis
cat(py$df_html)
```

# Using IPython Display helps

```{python}
#| output: asis
from IPython.display import HTML
HTML(df.to_html())
```

# Improve stylings using Bootstrap class 

```{python}
df_html = df.to_html(classes = ["table", "table_condensed"])
```

```{r}
#| output: asis
cat(py$df_html)
```

```{python}
#| output: asis
HTML(df.to_html(classes = ["table", "table_condensed"]))
```

# Try Markdown

Still quoting, so it requires some special printing

```{python}
df.to_markdown()
```

```{python}
#| output: asis
df.to_markdown()
```

```{python}
df_markdown = df.to_markdown()
```

```{r}
#| output: asis
cat(py$df_markdown)
```

# Using IPython Display helps

```{python}
#| output: asis
from IPython.display import Markdown
Markdown(df.to_markdown())
```

@cderv
Contributor

cderv commented Jul 19, 2023

Note that I now understand that reticulate catches a Pandas DataFrame before any _repr_html_ or to_html can be used.

reticulate/R/knitr-engine.R

Lines 576 to 580 in a1d7f7f

} else if (inherits(value, "pandas.core.frame.DataFrame")) {
return(captured)
} else if (isHtml && py_has_method(value, "_repr_html_")) {

Regarding quarto-dev/quarto-cli#3457, if the _repr_html_ method were called, we would get the same output as in Jupyter, with the raw HTML table produced, and Quarto would handle them the same.

I confirm that removing else if (inherits(value, "pandas.core.frame.DataFrame")) does give us the same output in Quarto as with Jupyter. Though as discussed before, R Markdown would require some additional CSS or processing to add the Bootstrap class for tables, as is done in R Markdown for Pandoc's tables (and as Quarto also does).
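The effect of that branch order can be illustrated in plain Python. This is only an illustration of the logic quoted above, not reticulate's code: `FakeDataFrame`, `autoprint`, and the `special_case_dataframes` flag are hypothetical names introduced for the sketch.

```python
class FakeDataFrame:
    """Stand-in for a pandas DataFrame: has a text repr and a _repr_html_ method."""

    def __repr__(self):
        return "     size  weight\ncat   1.0     3.0"

    def _repr_html_(self):
        return "<table><tr><td>1.0</td><td>3.0</td></tr></table>"


def autoprint(value, is_html, special_case_dataframes=True):
    """Pick the output for a value, mirroring the order of the branches above."""
    if special_case_dataframes and isinstance(value, FakeDataFrame):
        # The DataFrame branch comes first, so _repr_html_ is never consulted.
        return repr(value)
    if is_html and hasattr(value, "_repr_html_"):
        return value._repr_html_()
    return repr(value)


df = FakeDataFrame()
current = autoprint(df, is_html=True)
patched = autoprint(df, is_html=True, special_case_dataframes=False)
print(current)
print(patched)
```

With the special case in place the plain-text form is returned even for HTML output; without it, the HTML representation is picked up by the generic `_repr_html_` branch.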

@dfalbel
Member

dfalbel commented Sep 1, 2023

I'm leaning towards changing reticulate to produce the Markdown representation when running through knitr. With this change, tables would be displayed like this in RMarkdown:

Screenshot 2023-09-01 at 11 32 37

and

image

It would still not look exactly the same as in Quarto + Jupyter Engine, which is displayed like this:

image

The pro of this approach is that it only requires changing reticulate and no need for special handling from RMarkdown which I think can be tricky to coordinate. Do you think this is a reasonable approach @cderv?

@cderv
Contributor

cderv commented Sep 4, 2023

The pro of this approach is that it only requires changing reticulate and no need for special handling from RMarkdown which I think can be tricky to coordinate.

About this, I don't think anything is needed in rmarkdown or knitr in general for what reticulate is doing. knitr is a toolbox for custom engines to use, and everything that reticulate does in a knitting context is defined inside reticulate.
knitr only calls eng_python() when a Python chunk is seen and reticulate is available.

So this printing issue depends only on how reticulate decides to print content, possibly in eng_python_autoprint(). This function decides when to output an HTML or Markdown representation for tables (and makes similar choices for other types of output).
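A choice of this kind can be sketched in Python as a lookup from output format to the rich representation to try. The method names `_repr_html_`, `_repr_markdown_` and `_repr_latex_` come from IPython's display protocol; the `PREFERRED` mapping, the `Table` class and `choose_representation` are hypothetical and say nothing about how eng_python_autoprint() is actually written.

```python
# Hypothetical preference order of rich representations per output format.
PREFERRED = {
    "html": ["_repr_html_", "_repr_markdown_"],
    "markdown": ["_repr_markdown_"],
    "latex": ["_repr_latex_"],
}


class Table:
    """Stand-in object offering two rich representations besides its repr."""

    def __repr__(self):
        return "a b\n1 2"

    def _repr_html_(self):
        return "<table><tr><td>1</td><td>2</td></tr></table>"

    def _repr_markdown_(self):
        return "| a | b |\n|---|---|\n| 1 | 2 |"


def choose_representation(value, output_format):
    """Return (method name, text) for the first representation that is available."""
    for method_name in PREFERRED.get(output_format, []):
        method = getattr(value, method_name, None)
        if callable(method):
            result = method()
            if result is not None:
                return method_name, result
    # Nothing richer is available: fall back to the plain text repr.
    return "__repr__", repr(value)


for fmt in ("html", "markdown", "latex"):
    print(fmt, "->", choose_representation(Table(), fmt)[0])
```

Falling back to `repr()` keeps the current behavior for any object or format without a richer representation.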

Issues reported against knitr that concern the reticulate Python engine usually need to be fixed in reticulate itself.

However, I may be missing something...

I'm leaning towards changing reticulate to produce the Markdown representation when running through knitr

I guess it would be fine to output only a Markdown table. Quarto parses Markdown tables through Pandoc and does a lot with them, but Quarto also parses HTML tables, so that would be fine too (https://quarto.org/docs/authoring/tables.html)

I believe for Jupyter engine, Quarto will select the HTML output as I explained above: #783 (comment)
so possibly the output would be the same.

But regarding styling, this is only a matter of CSS. We can definitely fix that in Quarto to get the same styling.

Hope it helps.

Happy to discuss, help and test as needed.

@dfalbel dfalbel linked a pull request Sep 6, 2023 that will close this issue

9 participants