Add the fmt_units() method #240

Merged
merged 57 commits into from
Jun 4, 2024

Conversation

rich-iannone
Member

@rich-iannone rich-iannone commented Mar 12, 2024

This PR adds the fmt_units() method, which converts text written in units notation (e.g., "x10^9 / L" or "x10^9 L^-1") to HTML. Like the other fmt_*() methods, it only concerns itself with text transformations in the table body. Other methods will gain the ability to convert units notation to nicely formatted HTML in later PRs.

Here's an example that uses fmt_units() with the illness dataset. The units column already contains strings in units notation, so we just need to point the method at that column:

from great_tables import GT, style, loc
from great_tables.data import illness

(
    GT(illness, rowname_col="test")
    .fmt_units(columns="units")
    .fmt_number(columns=lambda x: x.startswith("day"), decimals=2, drop_trailing_zeros=True)
    .tab_header(title="Laboratory Findings for the YF Patient")
    .tab_spanner(label="Day", columns=lambda x: x.startswith("day"))
    .tab_spanner(label="Normal Range", columns=lambda x: x.startswith("norm"))
    .cols_label(
        norm_l="Lower",
        norm_u="Upper",
        units="Units"
    )
    .opt_vertical_padding(scale=0.4)
    .opt_align_table_header(align="left")
    .tab_options(heading_padding="10px")
    .tab_style(
        locations=loc.body(columns="norm_l"),
        style=style.borders(sides="left")
    )
    .opt_vertical_padding(scale=0.5)
)

illness

Fixes: #211
Partially addresses the .epic issue: #169

Collaborator

@machow machow left a comment

Thanks for putting all this work into the PR, especially into documenting define_units(). The tests are looking really great. I left some comments suggesting ways we might be able to clean up the code a little.

These mostly center on...

  • grouping the logic for a variable definition together (into tighter if/elif/else blocks)
  • creating a UnitDefinition.from_token() method, since the class doesn't seem like it can be instantiated directly.

def __getitem__(self, index: int) -> UnitDefinition:
    return self.units_list[index]

def to_html(self) -> str:
Collaborator

Since this...

  • loops over each element
  • converts each element to html
  • the elementwise conversions are independent

it seems like to_html() should be a method on the elements. (You could still have a to_html() method here that does [x.to_html() for x in self.units_list] etc. on the elements.)
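
A minimal sketch of that split, assuming a heavily simplified UnitDefinition with only unit and exponent fields (the class in the PR tracks more state). The `<sup>` markup and the space used to join elements are illustrative choices, not the library's actual output:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UnitDefinition:
    # Simplified stand-in for illustration; field names are assumptions.
    unit: str
    exponent: Optional[str] = None

    def to_html(self) -> str:
        # Each element renders itself, independently of its neighbours.
        if self.exponent is None:
            return self.unit
        return f"{self.unit}<sup>{self.exponent}</sup>"


@dataclass
class UnitDefinitionList:
    units_list: List[UnitDefinition]

    def __getitem__(self, index: int) -> UnitDefinition:
        return self.units_list[index]

    def to_html(self) -> str:
        # The container only delegates to the elements and joins the pieces.
        return " ".join(unit.to_html() for unit in self.units_list)


units = UnitDefinitionList([UnitDefinition("m"), UnitDefinition("s", "-2")])
print(units.to_html())  # m s<sup>-2</sup>
```

With this shape, the per-element rendering can be unit tested on its own, and the list method stays a one-liner.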

if len(tokens_list) == 0:
    return UnitDefinitionList(units_list=[])

for i in range(len(tokens_list)):
Collaborator

Because...

  • this loop creates a UnitDefinition object for each token
  • UnitDefinition will likely never be instantiated directly (because you have to input token, along with everything else this loop calculates)

It might be good as a constructor on UnitDefinition?

e.g.

@dataclass
class UnitDefinition:

    ...


    @classmethod
    def from_token(cls, token: str) -> UnitDefinition:
        # logic from the loop here ----

        unit_subscript = None
        sub_super_overstrike = False
        chemical_formula = False
        exponent = None

        ...
        
        return cls(token, ...)
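
For a runnable version of the same idea, here is a reduced sketch. It assumes a token has only the shape `unit`, `unit_subscript` or `unit^exponent` (or both), and the regex and field set are hypothetical simplifications rather than the logic from the PR's loop:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, reduced grammar: unit, optional _subscript, optional ^exponent.
_TOKEN_RE = re.compile(r"(?P<unit>[^_^]+)(?:_(?P<sub>[^_^]+))?(?:\^(?P<exp>.+))?")


@dataclass
class UnitDefinition:
    token: str
    unit: str
    unit_subscript: Optional[str] = None
    exponent: Optional[str] = None

    @classmethod
    def from_token(cls, token: str) -> "UnitDefinition":
        # All parsing lives here, so callers never assemble the fields by hand.
        m = _TOKEN_RE.fullmatch(token)
        if m is None:
            raise ValueError(f"cannot parse unit token: {token!r}")
        return cls(
            token=token,
            unit=m["unit"],
            unit_subscript=m["sub"],  # None when the group did not participate
            exponent=m["exp"],
        )


print(UnitDefinition.from_token("t_0^2"))
```

The list-building loop then reduces to something like `[UnitDefinition.from_token(t) for t in tokens_list]`.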

Member Author

Now implemented.

Collaborator

@machow machow left a comment

TODO:

  • need to escape > and <
  • let's rewrite the specification section of define_units() to cover the rules of the DSL
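
For the first TODO item, one possible approach is the standard library's html.escape(). The helper name below is hypothetical, and where exactly the escaping should happen relative to the markdown rendering step is left open here:

```python
import html


def escape_unit_text(text: str) -> str:
    # quote=False leaves " and ' untouched; only &, < and > are replaced.
    return html.escape(text, quote=False)


print(escape_unit_text("a<marquee>123</marquee>b"))
# a&lt;marquee&gt;123&lt;/marquee&gt;b
```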

Unit DSL Rules

From pairing w/ @rich-iannone, here are what seem to be the rules of define_units:

# Within unit rules ----
# 1. ^ creates a superscript
# 2. _ creates a subscript
# 3. subscripts and superscripts may be combined
#   - however, _ inside a superscript does not create a subscript

# 4. use [_subscript^superscript] to create an overstrike

# 5. / at the beginning adds the superscript -1
# 6. hyphen is transformed to minus sign
# 7. x at the beginning transformed to ×
# 8. ascii terms from biology/chemistry turned into TERM FORM (TODO: enumerate via code)

# 9. can create italics with * or _, and can create bold with ** or __
#   - can italicize AND bold together
#   - issue: because we use commonmark, a broader set of behaviors occur
#   - e.g. **m^2**, "a<marquee>123</marquee>b"

# ISSUE: < and > are unescaped

# Special notations ----
# 10. special symbol set surrounded by colons (e.g. :angstrom:)
# 11. chemistry notation: %C6%
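
To make a few of these rules concrete, here is a toy renderer covering only rules 1, 2, 3, 5, 6 and 7. It is an illustration, not the define_units implementation. Several behaviours are assumptions: the function name, the `<sub>`/`<sup>` markup, applying the × substitution only when the x is followed by a digit, applying the minus-sign substitution only inside exponents, and negating an explicit exponent when the token starts with "/":

```python
import re

MINUS = "\u2212"  # Unicode minus sign
TIMES = "\u00d7"  # multiplication sign


def render_token(token: str) -> str:
    """Render one whitespace-free unit token to HTML (toy version)."""
    # Rule 5: a leading "/" inverts the unit.
    inverted = token.startswith("/")
    if inverted:
        token = token[1:]

    # Rules 1-3: split off the exponent first, so that a "_" appearing
    # inside the exponent is left alone; then split off the subscript.
    base, _, exponent = token.partition("^")
    unit, _, subscript = base.partition("_")

    if inverted:
        exponent = f"-{exponent}" if exponent else "-1"

    # Rule 7 (assumed to apply only before a digit, e.g. "x10").
    if re.match(r"x\d", unit):
        unit = TIMES + unit[1:]

    # Rule 6, applied here to exponents only.
    exponent = exponent.replace("-", MINUS)

    out = unit
    if subscript:
        out += f"<sub>{subscript}</sub>"
    if exponent:
        out += f"<sup>{exponent}</sup>"
    return out


print(" ".join(render_token(t) for t in "x10^9 L^-1".split()))
# ×10<sup>9</sup> L<sup>−1</sup>
```

This sketch expects "/" to be attached to its unit (as in "/L"), so a free-standing "/" token would need separate handling.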

sub_super_overstrike = True

# Extract the unit w/o subscript from the string
unit = re.sub(r"(.+?)\[_.+?\^.+?\]", r"\1", token)
Collaborator

We can punt this for a future PR, but it seems like unit, unit_subscript, and exponent could be captured using something like this...

import re

m = re.match(r"(.+?)\[_(.+?)(\^.+?)\]", token)

You can name groups using the ?P<some_name> syntax:

m = re.match(r"(?P<unit>.+?)\[_(?P<unit_subscript>.+?)(?P<exponent>\^.+?)\]", token)

People often break these up across lines using parentheses:

m = re.match(
    (
        r"(?P<unit>.+?)"
        r"\["
        r"_(?P<unit_subscript>.+?)"
        r"(?P<exponent>\^.+?)"
        r"\]"
    ),
    token
)

#m.groups()
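
As a quick check of what the named groups return, here is the same pattern compiled once and applied to a made-up token (the constant name and the sample token are illustrative only). Note that, as written, the exponent group keeps the leading caret:

```python
import re

OVERSTRIKE = re.compile(
    r"(?P<unit>.+?)"
    r"\["
    r"_(?P<unit_subscript>.+?)"
    r"(?P<exponent>\^.+?)"
    r"\]"
)

m = OVERSTRIKE.match("sigma[_0^2]")
if m is not None:
    # groupdict() maps each group name to the text it captured.
    print(m.groupdict())
    # {'unit': 'sigma', 'unit_subscript': '0', 'exponent': '^2'}
```

Moving the `\^` outside the exponent group would drop the caret from the captured value.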

@rich-iannone rich-iannone requested a review from machow June 3, 2024 20:29
Collaborator

@machow machow left a comment

This looks great, thanks for taking the time to make all the changes! One quick thing: it might be helpful to add a reference to define_units() in the docstring of fmt_units() (but we can always punt that to another PR).

@rich-iannone
Member Author

@machow I added a reference to define_units() in the See Also section of the fmt_units() docstring.

@rich-iannone rich-iannone merged commit 02a04ee into main Jun 4, 2024
13 checks passed
@rich-iannone rich-iannone deleted the fmt-units branch June 4, 2024 01:30