[SPARK-47429][SQL] Rename error class to condition #46543

Draft: wants to merge 17 commits into master

Conversation

nchammas (Contributor) commented on May 12, 2024

What changes were proposed in this pull request?

On the Scala/Java side:

  • Rename ErrorClassesJsonReader to ErrorConditionsJsonReader.
  • Rename subClass to subCondition in error-conditions.json and also in some private classes like ErrorInfo.

On the Python side:

  • Rename error_classes.py to error_conditions.py.
  • Rename ErrorClassesReader to ErrorConditionsReader.
  • For PySparkException and PySparkAssertionError, add an error_condition parameter to the interface and deprecate the error_class parameter. (It would have been nice to use @warnings.deprecated() for this, but that's not available until Python 3.13; see the sketch after this list.)
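
To make the deprecation pattern concrete, here is a rough sketch of how `error_condition` could be accepted alongside a deprecated `error_class` parameter. This is not the actual implementation in this PR; the exact signature, warning category, and attribute names are assumptions for illustration only.

```python
import warnings
from typing import Dict, Optional


class PySparkException(Exception):
    """Hypothetical sketch: accept `error_condition` while keeping a deprecated `error_class`."""

    def __init__(
        self,
        message: Optional[str] = None,
        error_class: Optional[str] = None,       # old, deprecated spelling
        error_condition: Optional[str] = None,   # new spelling
        message_parameters: Optional[Dict[str, str]] = None,
    ) -> None:
        if error_class is not None:
            # @warnings.deprecated() would be the natural tool here, but it only
            # arrives in Python 3.13, so fall back to an explicit runtime warning.
            warnings.warn(
                "`error_class` is deprecated; use `error_condition` instead.",
                FutureWarning,
                stacklevel=2,
            )
            if error_condition is None:
                error_condition = error_class
        self.error_condition = error_condition
        self.message_parameters = message_parameters or {}
        super().__init__(message)
```

With this shape, existing callers that pass error_class keep working (with a FutureWarning), while new code can switch to error_condition at its own pace.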

TODO:

  • Review all mentions of SPARK-47429 in the codebase.
  • Confirm we want sub-conditions to be keyed in JSON as sub_condition in Python but subCondition in Scala.
  • Mark public interfaces that use "error class" as deprecated and redirect to new interfaces that use "error condition" instead.
    • ErrorConditionsJsonReader methods like getErrorMessage, isValidErrorClass, etc.
    • SparkThrowable
    • SparkThrowableHelper
    • SparkException
    • SparkThrowable sub-classes like SparkRuntimeException, SparkUpgradeException, etc.
    • KafkaIllegalStateException

Why are the changes needed?

This brings the code and interfaces in line with the updated terminology agreed on in SPARK-46810 and described in the errors README.

Deprecating some of the errorClass / error_class parameters, rather than renaming them in place, means we don't have to execute thousands of renames across the codebase all at once; we can instead migrate incrementally.

Does this PR introduce any user-facing change?

Yes, some evolving and developer APIs are changing.

The public interfaces for Spark's error reporting are also changing, though in a backwards-compatible manner.

How was this patch tested?

CI.

Was this patch authored or co-authored using generative AI tooling?

No.

nchammas (Contributor, Author) commented

@MaxGekk @HyukjinKwon @cloud-fan - Before I finish up this PR, I'd like to check in with you briefly to make sure you like where this is going.

The main idea in this PR is to deprecate public classes and parameters that use the old "error class" terminology and instead direct users to use "error condition". I'm trying wherever possible to do this in a backwards-compatible way.

Here's an example of the approach. In addition to being a bit gentler on users migrating to Spark 4.0, this will also help us avoid having to rename every instance of the errorClass or error_class parameters all across the codebase in one shot.

Does this look good to you so far?

In a similar vein, shall I keep incorrectly-named classes around but turn them into deprecated sub-classes of their correctly-named replacements? For example, ErrorClassesJsonReader would become a sub-class of ErrorConditionsJsonReader.
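
The example above is a Scala class, but to keep the illustration in one language, here is what the same idea could look like for the Python-side rename named in the description (ErrorClassesReader → ErrorConditionsReader). The method shown is a placeholder, not the real reader API; only the deprecated-subclass pattern is the point.

```python
import warnings
from typing import Dict


class ErrorConditionsReader:
    """Correctly named reader (placeholder API, not the real one)."""

    def get_error_message(self, error_condition: str, message_parameters: Dict[str, str]) -> str:
        # Look up the condition's message template and fill in the parameters.
        raise NotImplementedError


class ErrorClassesReader(ErrorConditionsReader):
    """Deprecated alias kept only for backwards compatibility."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "ErrorClassesReader is deprecated; use ErrorConditionsReader instead.",
            FutureWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

One trade-off of this direction: isinstance checks against the old name would no longer match instances of the new class, since the deprecated name becomes the subclass rather than the parent.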
