Update py-pyspark and py-py4j #44263

Merged: 52 commits, May 30, 2024

Conversation

teaguesterling
Contributor

@teaguesterling teaguesterling commented May 18, 2024

  • Update versions for py-pyspark
  • Add a default variant to py-py4j that enforces a Java dependency (disable it to use the system Java). Without this dependency, py4j fails to initialize if Java is missing (or is incompatible with Spark)
  • Add a variant and dependencies to py-pyspark to require Java via py4j
  • Make the explicit py4j and py-pyspark version tracking easier to read and update
  • Add dependencies as noted on the pyspark dependencies page: pyarrow, pandas, numpy, grpcio, grpcio-status, and googleapis-common-protos
  • Add new versions of pyarrow and arrow (for gcc 14 compatibility)
  • Bump versions of 3 packages to meet expectations from Spark: grpcio, grpcio-status, and googleapis-common-protos
  • Add new protobuf versions (for the grpcio-status dependency) and gcc 14 compatibility conflicts
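
One way to keep the py4j and pyspark version pairs easy to read and update is to hold them in a single table and generate the dependency constraints from it. The sketch below is a standalone illustration, not the code from this PR: the variant name `java`, the helper `dependency_specs`, and the listed version pairs are assumptions and should be checked against each pyspark release's own requirements.

```python
# Hypothetical (pyspark, py4j) version pairs, for illustration only.
# Verify each pair against the pinned py4j requirement of that pyspark release.
PYSPARK_TO_PY4J = [
    ("3.5.1", "0.10.9.7"),
    ("3.3.0", "0.10.9.5"),
    ("3.1.2", "0.10.9"),
]


def dependency_specs(pairs, java=True):
    """Return (dependency_spec, when_condition) tuples, one per pyspark version.

    The variant name "java" is an assumption; "+java" requests the variant
    and "~java" disables it, following Spack spec syntax.
    """
    variant = "+java" if java else "~java"
    return [(f"py-py4j@{py4j}{variant}", f"@{pyspark}") for pyspark, py4j in pairs]


if __name__ == "__main__":
    # Print the constraints in the shape a recipe loop would use them.
    for spec, when in dependency_specs(PYSPARK_TO_PY4J):
        print(f'depends_on("{spec}", when="{when}")')
```

In a real package.py, the same table could drive a loop that passes each generated string to depends_on(..., when=...), so adding a release means adding one row.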

Not included (but probably should be): allow py-pyspark to be built from source (or provided as a virtual package from spark built from source).

I'm not sure about the best way to set defaults for py4j & java.
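
For the case where the Java variant is disabled and a system Java is expected, a quick pre-flight check can show whether a Java executable is visible at all. This is a rough sketch under stated assumptions: the helper name `find_java` is hypothetical, it assumes a POSIX layout (`$JAVA_HOME/bin/java`), and it does not reproduce Spark's or py4j's exact lookup logic or check version compatibility.

```python
import os
import shutil


def find_java(environ=None):
    """Return the path to a java executable, or None if none is found.

    Looks at $JAVA_HOME/bin/java first, then falls back to searching PATH.
    `environ` defaults to the current process environment.
    """
    environ = os.environ if environ is None else environ

    java_home = environ.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate

    # shutil.which returns None when the command is not found on the given path.
    return shutil.which("java", path=environ.get("PATH"))


if __name__ == "__main__":
    java = find_java()
    print(java if java else "no java executable found")
```

A check like this only reports presence; whether the Java it finds is compatible with a given Spark release still has to be verified separately.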

spackbot-app bot commented May 18, 2024

Hi @teaguesterling! I noticed that the following package(s) don't yet have maintainers:

  • py-py4j
  • py-pyspark

Are you interested in adopting any of these package(s)? If so, simply add the following to the package class:

    maintainers("teaguesterling")

If not, could you contact the developers of this package and see if they are interested? You can quickly see who has worked on a package with spack blame:

$ spack blame py-py4j

Thank you for your help! Please don't add maintainers without their consent.

You don't have to be a Spack expert or package developer in order to be a "maintainer," it just gives us a list of users willing to review PRs or debug issues relating to this package. A package can have multiple maintainers; just add a list of GitHub handles of anyone who wants to volunteer.

@teaguesterling teaguesterling changed the title from "Update py pyspark and py py4j" to "Update py-pyspark and py-py4j" May 18, 2024
@teaguesterling
Contributor Author

Confirmed the 8 new version sha256s.
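
For anyone who wants to repeat that kind of check by hand, a downloaded archive can be hashed locally and compared with the value recorded in the recipe (Spack's `spack checksum` command automates fetching and hashing). The sketch below is a minimal illustration; the helper names `sha256_of` and `matches` are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected.strip().lower()
```

Reading in chunks keeps memory use flat even for large source tarballs.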

Just to make sure it's not missed: for the packages where I was originally only adding older versions, I also added a few newer ones:

  • py-googleapis-common-protos
  • py-grpcio (Latest versions as well as latest version aligning with py-grpcio-status)
  • py-grpcio-status (Latest version on pypi)

Contributor

@tldahlgren tldahlgren left a comment

Looking good. Have a few additional tweaks.

@teaguesterling
Contributor Author

@spackbot fix style

spackbot-app bot commented May 23, 2024

Let me see if I can fix that for you!

spackbot-app bot commented May 23, 2024

I was able to run spack style --fix for you!

spack style --fix
==> Running style checks on spack
  selected: isort, black, flake8, mypy
==> Modified files
  var/spack/repos/builtin/packages/arrow/package.py
  var/spack/repos/builtin/packages/py-googleapis-common-protos/package.py
  var/spack/repos/builtin/packages/py-grpcio-status/package.py
  var/spack/repos/builtin/packages/py-grpcio/package.py
  var/spack/repos/builtin/packages/py-protobuf/package.py
  var/spack/repos/builtin/packages/py-py4j/package.py
  var/spack/repos/builtin/packages/py-pyarrow/package.py
  var/spack/repos/builtin/packages/py-pyspark/package.py
==> Running isort checks
  isort checks were clean
==> Running black checks
reformatted var/spack/repos/builtin/packages/py-pyspark/package.py
reformatted var/spack/repos/builtin/packages/py-pyarrow/package.py
All done! ✨ 🍰 ✨
2 files reformatted, 6 files left unchanged.
  black checks were clean
==> Running flake8 checks
  flake8 checks were clean
==> Running mypy checks
lib/spack/spack/version/version_types.py:145: error: Argument 2 to "StandardVersion" has incompatible type "*Tuple[Tuple[Any, ...], Tuple[Any, ...]]"; expected "Tuple[Tuple[Any, ...], Tuple[Any, ...]]"  [arg-type]
lib/spack/spack/version/version_types.py:452: error: Argument 2 to "StandardVersion" has incompatible type "*Tuple[Tuple[Any, ...], Tuple[Any, ...]]"; expected "Tuple[Tuple[Any, ...], Tuple[Any, ...]]"  [arg-type]
lib/spack/spack/version/version_types.py:481: error: Argument 2 to "StandardVersion" has incompatible type "*Tuple[Tuple[Any, ...], Tuple[Any, ...]]"; expected "Tuple[Tuple[Any, ...], Tuple[Any, ...]]"  [arg-type]
Found 3 errors in 1 file (checked 625 source files)
  mypy found errors
Keep in mind that I cannot fix your flake8 or mypy errors, so if you have any you'll need to fix them and update the pull request. If I was able to push to your branch, if you make further changes you will need to pull from your updated branch before pushing again.

I've updated the branch with style fixes.

@spackbot-app spackbot-app bot added the patch label May 27, 2024
@teaguesterling
Contributor Author

While looking at some of the CI checks, I found that at least version 1.48 had unresolved issues. After digging around a bit, I was able to add patches and dependencies to resolve them.

@tldahlgren tldahlgren enabled auto-merge (squash) May 29, 2024 22:00
@tldahlgren tldahlgren merged commit 6753605 into spack:develop May 30, 2024
13 of 14 checks passed
3 participants