Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add python 3.8, 3.9 support and migration to pytest #3048

Merged
merged 6 commits into from Apr 6, 2021

Conversation

hirosassa
Copy link
Contributor

@hirosassa hirosassa commented Feb 24, 2021

Description

fix #3046, related to #2886, #2939

This PR migrates from nose to pytest and adds python 3.8 and 3.9 support.

Details are as follows

  • migrate from nose to pytest
    • place test/conftest.py to change test root
    • replace @attr decorator into pytest.mark to group tests
    • fix a few imports e.g. here
    • attach namespace on some test tasks to isolate each test e.g. here
  • add py38 and py39 CI on tox and travis

Motivation and Context

pytest migration

nose is not in active development and several deprecation warnings are occurred.

add python 3.8, 3.9 tests

easy to follow latest python versions

Have you tested this? If so, how?

I ran tests locally and on travis, too

@hirosassa
Copy link
Contributor Author

hirosassa commented Mar 11, 2021

@honnix @dlstadther @Tarrasch
cc @ravwojdyla
Does anybody know how to run CI with py38 environment on travis?
Timeouts on tox's installation step will always be occurred.

https://travis-ci.org/github/spotify/luigi/jobs/762401923

(I'm sorry for very very dirty and many commits. I'll squash later.)

@ravwojdyla
Copy link

@hirosassa 🤔 is py38 also timing out locally?

@hirosassa
Copy link
Contributor Author

@ravwojdyla No, This is just occurred only on travis.

@ravwojdyla
Copy link

@hirosassa interesting. Travis has limited support for debugging builds: https://docs.travis-ci.com/user/running-build-in-debug-mode/. But I imagine we can't use that here, in the meantime we could also add -vvvv to tox commands in ./scripts/ci/conditional_tox.sh to hopefully see more output and maybe the cause?

@hirosassa
Copy link
Contributor Author

@ravwojdyla Thank you! I'll try -vvv option.

@hirosassa
Copy link
Contributor Author

@ravwojdyla The cause of timeout is resolving dependencies on python 3.8 and I fixed it. Thanks!
I'll struggle against some failure tests.
https://travis-ci.org/github/spotify/luigi/builds/762528137

@ravwojdyla
Copy link

@hirosassa thanks for the update. Oh wow, there is quite a lot of failures on py38!

FYI @lallea since you have mentioned you've been using luigi on py38 in #3046 (comment). There seem to be quite a number of test failures, hopefully that doesn't affect your workloads, just wanted to let you know.

@ravwojdyla
Copy link

@hirosassa what do you think is the right way forward?

@lallea
Copy link
Contributor

lallea commented Mar 12, 2021

@ravwojdyla Thanks for the heads up. Lots of failures, indeed, and on core things. Surprising, since we have not seen any trouble. Works on our machines. :-)

@hirosassa
Copy link
Contributor Author

hirosassa commented Mar 12, 2021

@ravwojdyla @lallea I fixed almost all of the failures now (most of failures comes from coverage bug on python 3.8 and fixed by update it).

ref
nedbat/coveragepy#745

https://travis-ci.org/github/spotify/luigi/builds/762532477
https://travis-ci.org/github/spotify/luigi/jobs/762532486

@ravwojdyla
Copy link

That's a huge relief! Really appreciate your work @hirosassa!

@hirosassa
Copy link
Contributor Author

hirosassa commented Mar 20, 2021

@ravwojdyla @honnix @dlstadther
almost all of the tests passed, left one same error for each python versions. Do you have any idea to solve following test's error?
It is not occurred in my local environment.

test/scheduler_test.py::StableDoneCooldownSecsTest::test_sending_same_task_twice_without_cooldown_leads_to_double_run

=================================== FAILURES ===================================

_ StableDoneCooldownSecsTest.test_sending_same_task_twice_without_cooldown_leads_to_double_run _

self = <scheduler_test.StableDoneCooldownSecsTest testMethod=test_sending_same_task_twice_without_cooldown_leads_to_double_run>

@with_config({'scheduler': {'stable_done_cooldown_secs': '0'}})

def test_sending_same_task_twice_without_cooldown_leads_to_double_run(self):

    second_run_result = self.get_second_run_result_on_double_run()
  self.assertEqual(second_run_result.scheduling_succeeded, False)

E AssertionError: True != False

test/scheduler_test.py:366: AssertionError

----------------------------- Captured stdout call -----------------------------

INFO: No prior state file exists at /var/lib/luigi-server/state.pickle. Starting with empty state

INFO: Scheduler starting up

INFO: logging already configured

DEBUG: Starting new HTTP connection (1): localhost:8082

INFO: 200 POST /api/ping (127.0.0.1) 2.42ms

DEBUG: http://localhost:8082 "POST /api/ping HTTP/1.1" 200 34

INFO: 200 POST /api/ping (127.0.0.1) 1.43ms

DEBUG: http://localhost:8082 "POST /api/ping HTTP/1.1" 200 34

INFO: 200 POST /api/add_task (127.0.0.1) 2.13ms

DEBUG: http://localhost:8082 "POST /api/add_task HTTP/1.1" 200 18

INFO: 200 POST /api/add_worker (127.0.0.1) 1.44ms

DEBUG: http://localhost:8082 "POST /api/add_worker HTTP/1.1" 200 18

INFO: 200 POST /api/get_work (127.0.0.1) 1.43ms

DEBUG: http://localhost:8082 "POST /api/get_work HTTP/1.1" 200 327

INFO: logging already configured

INFO: 200 POST /api/ping (127.0.0.1) 2.26ms

DEBUG: http://localhost:8082 "POST /api/ping HTTP/1.1" 200 34

DEBUG: Starting new HTTP connection (1): localhost:8082

INFO: 200 POST /api/ping (127.0.0.1) 1.39ms

DEBUG: http://localhost:8082 "POST /api/ping HTTP/1.1" 200 34

INFO: 200 POST /api/ping (127.0.0.1) 1.44ms

DEBUG: http://localhost:8082 "POST /api/ping HTTP/1.1" 200 34

INFO: 200 POST /api/add_task (127.0.0.1) 1.64ms

DEBUG: http://localhost:8082 "POST /api/add_task HTTP/1.1" 200 18

INFO: 200 POST /api/get_work (127.0.0.1) 1.32ms

DEBUG: http://localhost:8082 "POST /api/get_work HTTP/1.1" 200 154

INFO: 200 POST /api/ping (127.0.0.1) 1.40ms

DEBUG: http://localhost:8082 "POST /api/ping HTTP/1.1" 200 34

INFO: 200 POST /api/add_task (127.0.0.1) 1.59ms

DEBUG: http://localhost:8082 "POST /api/add_task HTTP/1.1" 200 18

INFO: 200 POST /api/add_worker (127.0.0.1) 1.47ms

DEBUG: http://localhost:8082 "POST /api/add_worker HTTP/1.1" 200 18

INFO: 200 POST /api/get_work (127.0.0.1) 1.44ms

DEBUG: http://localhost:8082 "POST /api/get_work HTTP/1.1" 200 154

INFO: Scheduler instance shutting down

WARNING: Failed saving scheduler state

Traceback (most recent call last):

File "/home/travis/build/spotify/luigi/luigi/scheduler.py", line 445, in dump

with open(self._state_path, 'wb') as fobj:

FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/luigi-server/state.pickle'

----------------------------- Captured stderr call -----------------------------

@ravwojdyla
Copy link

@hirosassa great job!!

left one same error for each python versions

So this smells like a regression from the package updates maybe? That test was introduced by b4e4433 as a fix for #2644:

class FailingOnDoubleRunTask(luigi.Task):
time_to_check_secs = 1
time_to_run_secs = 2
output_dir = luigi.Parameter(default="")
def __init__(self, *args, **kwargs):
super(FailingOnDoubleRunTask, self).__init__(*args, **kwargs)
self.file_name = os.path.join(self.output_dir, "AnyTask")
def complete(self):
time.sleep(self.time_to_check_secs) # e.g., establish connection
exists = os.path.exists(self.file_name)
time.sleep(self.time_to_check_secs) # e.g., close connection
return exists
def run(self):
time.sleep(self.time_to_run_secs)
if os.path.exists(self.file_name):
raise FileAlreadyExists(self.file_name)
open(self.file_name, 'w').close()
class StableDoneCooldownSecsTest(unittest.TestCase):
def setUp(self):
self.p = tempfile.mkdtemp()
def tearDown(self):
shutil.rmtree(self.p)
def run_task(self):
return luigi.build([FailingOnDoubleRunTask(output_dir=self.p)],
detailed_summary=True,
parallel_scheduling=True,
parallel_scheduling_processes=2)
@with_config({'worker': {'keep_alive': 'false'}})
def get_second_run_result_on_double_run(self):
server_process = Process(target=luigi.server.run)
process = Process(target=self.run_task)
try:
# scheduler is started
server_process.start()
# first run is started
process.start()
time.sleep(FailingOnDoubleRunTask.time_to_run_secs + FailingOnDoubleRunTask.time_to_check_secs)
# second run of the same task is started
second_run_result = self.run_task()
return second_run_result
finally:
process.join(1)
server_process.terminate()
server_process.join(1)
@with_config({'scheduler': {'stable_done_cooldown_secs': '5'}})
def test_sending_same_task_twice_with_cooldown_does_not_lead_to_double_run(self):
second_run_result = self.get_second_run_result_on_double_run()
self.assertEqual(second_run_result.scheduling_succeeded, True)
@with_config({'scheduler': {'stable_done_cooldown_secs': '0'}})
def test_sending_same_task_twice_without_cooldown_leads_to_double_run(self):
second_run_result = self.get_second_run_result_on_double_run()
self.assertEqual(second_run_result.scheduling_succeeded, False)

has a number of knobs to tweak like the time_to_check_secs, time_to_run_secs, parallel_scheduling_processes. Maybe adding a couple of log messages between the steps would make it a bit more obvious why the task if failing. Also tweaking the knobs to more extreme values just for the debugging purposes?

@ravwojdyla
Copy link

@hirosassa

WARNING: Failed saving scheduler state
Traceback (most recent call last):
File "/home/travis/build/spotify/luigi/luigi/scheduler.py", line 448, in dump
with open(self._state_path, 'wb') as fobj:
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/luigi-server/state.pickle'

Does this directory: /var/lib/luigi-server exist at the time of the test run?

@hirosassa hirosassa requested a review from a team as a code owner March 23, 2021 13:11
@honnix honnix removed their request for review March 23, 2021 13:35
@hirosassa
Copy link
Contributor Author

@ravwojdyla Thank you for your suggestion. I checked existence and writability of the directory by this code, and found the following logs

INFO: state path: /var/lib/luigi-server/state.pickle, exists?: False, writable?: False
INFO: state path dir: /var/lib/luigi-server, exists?: False, writable?: False
INFO: state path root dir: /var/lib, exists?: True, writable?: False

However, there's no mkdir or preparation of the directories 🤔

@ravwojdyla
Copy link

@hirosassa that actually might be a red herring. So it seems like test_sending_same_task_twice_without_cooldown_leads_to_double_run is flaky on the travis CI only. Very weird. To limit this down, maybe we should try to run only a single environment instead of the test full matrix (essentially I wonder if there's some interaction there).

@hirosassa
Copy link
Contributor Author

@ravwojdyla I found the reason test_sending_same_task_twice_without_cooldown_leads_to_double_run fails!
I removed coverage command in 644b445 , then all the core test passed.

@ravwojdyla
Copy link

@hirosassa great news! I wonder if this is a known issue(?) and what's the proper way forward?

@hirosassa
Copy link
Contributor Author

@ravwojdyla finally! 🎉
https://travis-ci.org/github/spotify/luigi/builds/764698712

@hirosassa
Copy link
Contributor Author

@Tarrasch @honnix @dlstadther cc @ravwojdyla @lallea
This PR is ready for review. Please check!

@ravwojdyla
Copy link

Great work @hirosassa!!

@hirosassa hirosassa changed the title [WIP] add python 3.8, 3.9 support and migration to pytest add python 3.8, 3.9 support and migration to pytest Mar 28, 2021
Copy link

@ravwojdyla ravwojdyla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hirosassa looks great, and again great work figuring out all the weird errors!! I only have some questions inline, plus would suggest to open an issue to update test documentation to document pytest usage going forward.

from hdfs import InsecureClient
from hdfs.ext.kerberos import KerberosClient
except ImportError:
raise unittest.SkipTest('Unable to load hdfs module')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably also use pytest.importorskip?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestion! fixed in d4de5fe

@@ -27,6 +27,7 @@


class DummyTask(luigi.Task):
task_namespace = 'customized_run'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this now required? Is this related to path setup between nose and pytest?

Copy link
Contributor Author

@hirosassa hirosassa Mar 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is for test isolation. There are several DummyTask in the test directory.
Since pytest loads all of the test at the collection phase of the test, the tasks will be conflicted (debugger shows that result).
So I add this namespace to isolate each Task.

Same reason as followings;
https://github.com/spotify/luigi/pull/3048/files#diff-5c8d3f52b8e7946dd9520aeb1b25f24ff2f5c0513c6205922ca8a7a28aec4cd8R32
https://github.com/spotify/luigi/pull/3048/files#diff-487da8575cd13d85c3d44e6b43d7988a2aea0c81479d298248820eb27be5da9bR221

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment in a542dee

@@ -1,25 +1,41 @@
[tox]
envlist = py{35,36,37}-{cdh,hdp,core,contrib,apache,aws,gcloud,postgres,unixsocket,azureblob,dropbox}, visualiser, docs, flake8
envlist = py{35,36,37,38,39}-{cdh,hdp,core,contrib,apache,aws,gcloud,postgres,unixsocket,azureblob,dropbox}, visualiser, docs, flake8

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

tox.ini Outdated
LUIGI_CONFIG_PATH={toxinidir}/test/testconfig/luigi.cfg
TEST_VISUALISER=1
commands =
python --version
nosetests -v --tests=test/visualiser
pytest -m "not minicluster and not gcloud and not postgres and not unixsocket" test/visualiser

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why these specific filters here?

Copy link
Contributor Author

@hirosassa hirosassa Mar 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This filter is counter part of NOSE_EVAL_ATTER above, but it is not needed. removed it in
013b75f

@ravwojdyla
Copy link

👋 @lallea do you mind taking a look (review) as well (if you have time) please. I assume you would like to have this in as well :)

Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me.

Could you also update the Readme.md to denote update to Python versions being tested and "supported"?

@hirosassa
Copy link
Contributor Author

@dlstadther Thank you for the review! added in f7053d9

@lallea
Copy link
Contributor

lallea commented Apr 3, 2021

All looks good to me, with the reservation that I know little about pytest.

Thanks @hirosassa for putting down all this work!

@ravwojdyla
Copy link

👋 @honnix @narape what's the best way to request review/merge from the Spotify Team?

@narape
Copy link
Contributor

narape commented Apr 6, 2021

👍 LGTM, @hirosassa Thanks for the contribution

@narape narape merged commit 7fe53ed into spotify:master Apr 6, 2021
@hirosassa hirosassa deleted the add-python-version branch April 6, 2021 20:43
@hirosassa
Copy link
Contributor Author

Thanks for your comments and reviews! @ravwojdyla @lallea @dlstadther @narape

@ravwojdyla
Copy link

@hirosassa thanks for all you great work!

Btw, just noticed that it might be also worth updating the setup.py to include python 3.{8,9}. Let me know if that's something you would like to handle as well.

@hirosassa
Copy link
Contributor Author

@ravwojdyla Oh, I missed it. I'll submit small PR to fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

python 3.8 support
5 participants