Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: more options for how marks and scales represent invalid data (e.g., nulls, NaNs) #9342

Merged
merged 41 commits into from
May 20, 2024

Conversation

kanitw
Copy link
Member

@kanitw kanitw commented May 7, 2024

  • Add more spec options for handling invalid data

    • extend mark.invalid
    • add new config.scale.invalid
  • See docs's pdf file from invalid.md or the actual doc for explanation of the behavior. (Given the complexity of the PR, reviewers should check out the branch and test the spec locally.)

refactor:

  • Adjust how we generate rules for invalid data to handle these options, per channel type.

    • data's invalid filter
    • defined encoding (for breaking paths)
  • Make wrapCondition accept invalidValueRef for "include" mode.

  • Remove the hidden/screen "hide" mode for invalid data (Given this is a hidden feature, it's not a breaking change.)## PR Description

bug fixes:

Known issues / Follow up items:

Checklist

Tests:

  • Integration tests -- added a bunch of examples
  • Unit tests (in progress) -- I'll re-review the code and add more unit tests.

Docs:

  • Has documentation under site/docs/ + examples.

Copy link

cloudflare-pages bot commented May 7, 2024

Deploying vega-lite with  Cloudflare Pages  Cloudflare Pages

Latest commit: 1511d15
Status: ✅  Deploy successful!
Preview URL: https://ac50a1aa.vega-lite.pages.dev
Branch Preview URL: https://kw-9325-better-mark-invalid.vega-lite.pages.dev

View logs

@kanitw kanitw force-pushed the kw/9325-better-mark-invalid branch from 6198c53 to e28256c Compare May 7, 2024 22:02
// While discrete domain scales can handle invalid values, continuous scales can't.
// Thus, for non-path marks, we have to filter null for scales with continuous domains.
// (For path marks, we will use "defined" property and skip these values instead.)
if (hasContinuousDomain(scaleType) && fieldDef.aggregate !== 'count' && !isPathMark(mark)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These rules are now part of getScaleInvalidDataMode

@@ -384,13 +399,41 @@ export function parseData(model: Model): DataComponent {
head = StackNode.makeFromEncoding(head, model) ?? head;
}

let preFilterInvalid: OutputNode | undefined;
Copy link
Member Author

@kanitw kanitw May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: file a separate ticket

Separate from this PR, I have a hunch that this will remain slightly incorrect for stacked chart,
since null values will be "stacked", but it's probably not worth changing the behavior in this PR.

@kanitw kanitw force-pushed the kw/9325-better-mark-invalid branch from e28256c to 9f1bc85 Compare May 7, 2024 22:49
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, in this file, we're creating data sources pre-/post-filter invalid values, depending on the invalid data mode.

return {
...encode,
...wrapCondition(model, valueDef, channel, def => signalOrValueRef(def.value))
...wrapCondition({
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, no behavior change here. I just switch wrapCondition to use parameter argument and make it explicit that guideEncodeEntry won't need the new invalidValueRef added.

@@ -45,7 +45,13 @@ export function description(model: UnitModel) {
const channelDef = encoding.description;

if (channelDef) {
return wrapCondition(model, channelDef, 'description', cDef => textRef(cDef, model.config));
return wrapCondition({
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, no behavior change here. I just switch wrapCondition to use parameter argument and make it explicit that aria won't need the new invalidValueRef added.

const field = model.vgField(channel, {expr: 'datum', binSuffix: model.stack?.impute ? 'mid' : undefined});

// While discrete domain scales can handle invalid values, continuous scales can't.
if (field && hasContinuousDomain(scaleType)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the continuous scale check is now in getScaleInvalidDataMode


return wrapCondition(model, channelDef, vgChannel ?? channel, cDef => {
const invalidValueRef = getConditionalValueRefForIncludingInvalidValue({
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, the main change in this file is adding invalidValueRef here.

@@ -249,6 +251,11 @@ function parseSingleChannelDomain(
const {type} = fieldOrDatumDef;
const timeUnit = fieldOrDatumDef['timeUnit'];

const dataSourceTypeForScaleDomain = getScaleDataSourceForHandlingInvalidValues({
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, this is the main change in this file. Instead of always drawing scale domains from "main" data source, we may choose pre-/post- filtering invalid values, depending on the invalid data mode.

/** @hidden */
export type Hide = 'hide';

export interface MarkInvalidMixins {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, moved to src/invalid.ts + revamped

field: 'mean_Acceleration'
}
]);
expect(props.fill).toEqual({
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intended change -- we no longer relying "null" fill to hide invalid marks in the new "break-paths-keep-domains" that replaces in the old secret "hide" mode.

@@ -72,18 +77,6 @@ export interface TooltipContent {
content: 'encoding' | 'data';
}

/** @hidden */
export type Hide = 'hide';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, the old hide mode was a secret experimental feature (not part of public API) that I experimented with many years ago. It's not documented and thus can be removed.

field: 'Origin'
}
],
fill: {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, intended change -- we no longer relying "null" fill to hide invalid marks in the new "break-paths-keep-domains" that replaces in the old secret "hide" mode.

Same for the rest of this file.

model,
channelDef,
vgChannel,
invalidValueRef,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, the main change in this file is to add invalidValueRef that caller of this method may pass in.

However, I also switch to object argument so it's easy to make explicit when invalidValueRef is not used.

isPath: boolean;
}

export function getDataSourcesForHandlingInvalidValues({
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method includes one of the main rules for defining how these modes affect data sources for scale / marks.

@@ -45,27 +39,7 @@ export function baseEncodeEntry(model: UnitModel, ignore: Ignore) {
};
}

// TODO: mark VgValueRef[] as readonly after https://github.com/vega/vega/pull/1987
function wrapAllFieldsInvalid(model: UnitModel, channel: Channel, valueRef: VgValueRef | VgValueRef[]): VgEncodeEntry {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, all old code in this file were for the legacy secret/hidden "hide" mode that I remove.

The old hide mode hides data by adjusting fill/stroke, which still wastes rendering time to draw transparent marks.
With the new break-paths-keep-domain, we're switching to rely on filters to produce this behavior.

@kanitw kanitw force-pushed the kw/9325-better-mark-invalid branch 10 times, most recently from 264399f to 414e175 Compare May 9, 2024 04:59
@@ -16,10 +16,6 @@
"ops": ["distinct"],
"fields": ["Name"],
"as": ["distinct_Name"]
},
{
Copy link
Member Author

@kanitw kanitw May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, correct change since distinct aggregate can't output nulls.

@kanitw kanitw force-pushed the kw/9325-better-mark-invalid branch from 3c41e6c to bd9378a Compare May 9, 2024 15:08
@kanitw kanitw changed the title feat: more options for handling invalid feat: more options for handling invalid data May 9, 2024
@@ -52,7 +52,8 @@
"y": [
{
"test": "!isValid(datum[\"sum_Worldwide Gross\"]) || !isFinite(+datum[\"sum_Worldwide Gross\"])",
"field": {"group": "height"}
"scale": "y",
Copy link
Member Author

@kanitw kanitw May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, Good change as intended in a later fix commit -- invalid values get placed as zero rather than min. Same for many other diffs.

@@ -32,14 +32,16 @@
"x": [
{
"test": "!isValid(datum[\"IMDB Rating\"]) || !isFinite(+datum[\"IMDB Rating\"])",
"scale": "x",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, good change as intended in a later fix commit -- invalid values get placed as zero rather than min.

  • value: 0 = min
  • scale: 'x', value: '0' -> encode same output as 0

@kanitw kanitw force-pushed the kw/9325-better-mark-invalid branch 4 times, most recently from c038fd2 to e0d2a64 Compare May 9, 2024 22:45
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I adjust the example layout a bit, so they fit better in the doc page.

Also adjusted data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added for showing code only in the new doc page added.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No actual change here, but just change the assertion to be easier to debug as my older code fails the assertion in this file.

@kanitw kanitw force-pushed the kw/9325-better-mark-invalid branch 2 times, most recently from 2407cdd to e25ba16 Compare May 10, 2024 16:00
@kanitw kanitw force-pushed the kw/9325-better-mark-invalid branch from df34cdf to bb8ff99 Compare May 16, 2024 02:40
kanitw and others added 2 commits May 17, 2024 15:02
…nvalid

# Conflicts:
#	build/vega-lite-schema.json
#	src/scale.ts
public domainDefinitelyIncludesZero() {
if (this.get('zero') !== false) {
return true;
public domainHasZero(): 'definitely' | 'definitely-not' | 'maybe' {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm not super sure about the wording here in this union, but non blocking either way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i like it better than boolean | null where null is maybe though. Given it's internal, we can always change later.

src/invalid.ts Outdated
| 'filter'
| 'break-paths-filter-domains'
| 'break-paths-show-domains'
| 'break-paths-and-show-path-domains'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this the only one that includes the word and? Several of these options feel like conjunctive payloads, but this one is special for some reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call! fixed.

@kanitw kanitw changed the title feat: more options for handling invalid data feat: more options for how marks and corresponding scales represent invalid data May 18, 2024
@kanitw kanitw changed the title feat: more options for how marks and corresponding scales represent invalid data feat: more options for how marks and scales represent invalid data (e.g., nulls, NaNs) May 18, 2024
Copy link
Member

@lsh lsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ham and I talked about the design here for a while yesterday and I'm convinced it's extensible and composable. Since he addressed my main nits about syntax, I'm good with these changes.

@kanitw kanitw enabled auto-merge (squash) May 20, 2024 04:23
@kanitw kanitw merged commit 502db1d into main May 20, 2024
8 checks passed
@kanitw kanitw deleted the kw/9325-better-mark-invalid branch May 20, 2024 04:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mark.invalid set the invalid points on y channel at the top
3 participants