add python 3.8, 3.9 support and migration to pytest #3048

hirosassa · 2021-02-24T23:10:44Z

Description

fix #3046, related to #2886, #2939

This PR migrates from nose to pytest and adds python 3.8 and 3.9 support.

Details are as follows

migrate from nose to pytest
- place test/conftest.py to change test root
- replace @attr decorator into pytest.mark to group tests
- fix a few imports e.g. here
- attach namespace on some test tasks to isolate each test e.g. here
add py38 and py39 CI on tox and travis
- update test dependencies
- update virtualenv to resolve cryptography installation problem
  - diff
- replace hypothesis.extra.datetime into hypothesis.strategies because of it's deprecation
  - diff

Motivation and Context

pytest migration

nose is not in active development and several deprecation warnings are occurred.

add python 3.8, 3.9 tests

easy to follow latest python versions

Have you tested this? If so, how?

I ran tests locally and on travis, too

hirosassa · 2021-03-11T02:24:56Z

@honnix @dlstadther @Tarrasch
cc @ravwojdyla
Does anybody know how to run CI with py38 environment on travis?
Timeouts on tox's installation step will always be occurred.

https://travis-ci.org/github/spotify/luigi/jobs/762401923

(I'm sorry for very very dirty and many commits. I'll squash later.)

ravwojdyla · 2021-03-11T08:18:14Z

@hirosassa 🤔 is py38 also timing out locally?

hirosassa · 2021-03-11T09:28:22Z

@ravwojdyla No, This is just occurred only on travis.

ravwojdyla · 2021-03-11T09:55:46Z

@hirosassa interesting. Travis has limited support for debugging builds: https://docs.travis-ci.com/user/running-build-in-debug-mode/. But I imagine we can't use that here, in the meantime we could also add -vvvv to tox commands in ./scripts/ci/conditional_tox.sh to hopefully see more output and maybe the cause?

hirosassa · 2021-03-11T09:57:30Z

@ravwojdyla Thank you! I'll try -vvv option.

hirosassa · 2021-03-11T22:36:59Z

@ravwojdyla The cause of timeout is resolving dependencies on python 3.8 and I fixed it. Thanks!
I'll struggle against some failure tests.
https://travis-ci.org/github/spotify/luigi/builds/762528137

ravwojdyla · 2021-03-12T06:56:47Z

@hirosassa thanks for the update. Oh wow, there is quite a lot of failures on py38!

FYI @lallea since you have mentioned you've been using luigi on py38 in #3046 (comment). There seem to be quite a number of test failures, hopefully that doesn't affect your workloads, just wanted to let you know.

ravwojdyla · 2021-03-12T06:57:23Z

@hirosassa what do you think is the right way forward?

lallea · 2021-03-12T07:54:01Z

@ravwojdyla Thanks for the heads up. Lots of failures, indeed, and on core things. Surprising, since we have not seen any trouble. Works on our machines. :-)

hirosassa · 2021-03-12T08:29:36Z

@ravwojdyla @lallea I fixed almost all of the failures now (most of failures comes from coverage bug on python 3.8 and fixed by update it).

ref
nedbat/coveragepy#745

https://travis-ci.org/github/spotify/luigi/builds/762532477
https://travis-ci.org/github/spotify/luigi/jobs/762532486

ravwojdyla · 2021-03-12T12:04:31Z

That's a huge relief! Really appreciate your work @hirosassa!

hirosassa · 2021-03-20T08:40:48Z

@ravwojdyla @honnix @dlstadther
almost all of the tests passed, left one same error for each python versions. Do you have any idea to solve following test's error?
It is not occurred in my local environment.

test/scheduler_test.py::StableDoneCooldownSecsTest::test_sending_same_task_twice_without_cooldown_leads_to_double_run


=================================== FAILURES ===================================
_ StableDoneCooldownSecsTest.test_sending_same_task_twice_without_cooldown_leads_to_double_run _
self = <scheduler_test.StableDoneCooldownSecsTest testMethod=test_sending_same_task_twice_without_cooldown_leads_to_double_run>
@with_config({'scheduler': {'stable_done_cooldown_secs': '0'}})

def test_sending_same_task_twice_without_cooldown_leads_to_double_run(self):

    second_run_result = self.get_second_run_result_on_double_run()


  self.assertEqual(second_run_result.scheduling_succeeded, False)


E       AssertionError: True != False
test/scheduler_test.py:366: AssertionError
----------------------------- Captured stdout call -----------------------------
INFO: No prior state file exists at /var/lib/luigi-server/state.pickle. Starting with empty state
INFO: Scheduler starting up
INFO: logging already configured
DEBUG: Starting new HTTP connection (1): localhost:8082
INFO: 200 POST /api/ping (127.0.0.1) 2.42ms
DEBUG: http://localhost:8082 "POST /api/ping HTTP/1.1" 200 34
INFO: 200 POST /api/ping (127.0.0.1) 1.43ms
DEBUG: http://localhost:8082 "POST /api/ping HTTP/1.1" 200 34
INFO: 200 POST /api/add_task (127.0.0.1) 2.13ms
DEBUG: http://localhost:8082 "POST /api/add_task HTTP/1.1" 200 18
INFO: 200 POST /api/add_worker (127.0.0.1) 1.44ms
DEBUG: http://localhost:8082 "POST /api/add_worker HTTP/1.1" 200 18
INFO: 200 POST /api/get_work (127.0.0.1) 1.43ms
DEBUG: http://localhost:8082 "POST /api/get_work HTTP/1.1" 200 327
INFO: logging already configured
INFO: 200 POST /api/ping (127.0.0.1) 2.26ms
DEBUG: http://localhost:8082 "POST /api/ping HTTP/1.1" 200 34
DEBUG: Starting new HTTP connection (1): localhost:8082
INFO: 200 POST /api/ping (127.0.0.1) 1.39ms
DEBUG: http://localhost:8082 "POST /api/ping HTTP/1.1" 200 34
INFO: 200 POST /api/ping (127.0.0.1) 1.44ms
DEBUG: http://localhost:8082 "POST /api/ping HTTP/1.1" 200 34
INFO: 200 POST /api/add_task (127.0.0.1) 1.64ms
DEBUG: http://localhost:8082 "POST /api/add_task HTTP/1.1" 200 18
INFO: 200 POST /api/get_work (127.0.0.1) 1.32ms
DEBUG: http://localhost:8082 "POST /api/get_work HTTP/1.1" 200 154
INFO: 200 POST /api/ping (127.0.0.1) 1.40ms
DEBUG: http://localhost:8082 "POST /api/ping HTTP/1.1" 200 34
INFO: 200 POST /api/add_task (127.0.0.1) 1.59ms
DEBUG: http://localhost:8082 "POST /api/add_task HTTP/1.1" 200 18
INFO: 200 POST /api/add_worker (127.0.0.1) 1.47ms
DEBUG: http://localhost:8082 "POST /api/add_worker HTTP/1.1" 200 18
INFO: 200 POST /api/get_work (127.0.0.1) 1.44ms
DEBUG: http://localhost:8082 "POST /api/get_work HTTP/1.1" 200 154
INFO: Scheduler instance shutting down
WARNING: Failed saving scheduler state
Traceback (most recent call last):
File "/home/travis/build/spotify/luigi/luigi/scheduler.py", line 445, in dump
with open(self._state_path, 'wb') as fobj:

FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/luigi-server/state.pickle'
----------------------------- Captured stderr call -----------------------------

ravwojdyla · 2021-03-22T07:33:20Z

@hirosassa great job!!

left one same error for each python versions

So this smells like a regression from the package updates maybe? That test was introduced by b4e4433 as a fix for #2644:

luigi/test/scheduler_test.py

Lines 304 to 366 in 7508cdf

    
           class FailingOnDoubleRunTask(luigi.Task): 
        
               time_to_check_secs = 1 
        
               time_to_run_secs = 2 
        
               output_dir = luigi.Parameter(default="") 
        
               def __init__(self, *args, **kwargs): 
        
                   super(FailingOnDoubleRunTask, self).__init__(*args, **kwargs) 
        
                   self.file_name = os.path.join(self.output_dir, "AnyTask") 
        
               def complete(self): 
        
                   time.sleep(self.time_to_check_secs)  # e.g., establish connection 
        
                   exists = os.path.exists(self.file_name) 
        
                   time.sleep(self.time_to_check_secs)  # e.g., close connection 
        
                   return exists 
        
               def run(self): 
        
                   time.sleep(self.time_to_run_secs) 
        
                   if os.path.exists(self.file_name): 
        
                       raise FileAlreadyExists(self.file_name) 
        
                   open(self.file_name, 'w').close() 
        
           class StableDoneCooldownSecsTest(unittest.TestCase): 
        
               def setUp(self): 
        
                   self.p = tempfile.mkdtemp() 
        
               def tearDown(self): 
        
                   shutil.rmtree(self.p) 
        
               def run_task(self): 
        
                   return luigi.build([FailingOnDoubleRunTask(output_dir=self.p)], 
        
                                      detailed_summary=True, 
        
                                      parallel_scheduling=True, 
        
                                      parallel_scheduling_processes=2) 
        
               @with_config({'worker': {'keep_alive': 'false'}}) 
        
               def get_second_run_result_on_double_run(self): 
        
                   server_process = Process(target=luigi.server.run) 
        
                   process = Process(target=self.run_task) 
        
                   try: 
        
                       # scheduler is started 
        
                       server_process.start() 
        
                       # first run is started 
        
                       process.start() 
        
                       time.sleep(FailingOnDoubleRunTask.time_to_run_secs + FailingOnDoubleRunTask.time_to_check_secs) 
        
                       # second run of the same task is started 
        
                       second_run_result = self.run_task() 
        
                       return second_run_result 
        
                   finally: 
        
                       process.join(1) 
        
                       server_process.terminate() 
        
                       server_process.join(1) 
        
               @with_config({'scheduler': {'stable_done_cooldown_secs': '5'}}) 
        
               def test_sending_same_task_twice_with_cooldown_does_not_lead_to_double_run(self): 
        
                   second_run_result = self.get_second_run_result_on_double_run() 
        
                   self.assertEqual(second_run_result.scheduling_succeeded, True) 
        
               @with_config({'scheduler': {'stable_done_cooldown_secs': '0'}}) 
        
               def test_sending_same_task_twice_without_cooldown_leads_to_double_run(self): 
        
                   second_run_result = self.get_second_run_result_on_double_run() 
        
                   self.assertEqual(second_run_result.scheduling_succeeded, False)

has a number of knobs to tweak like the time_to_check_secs, time_to_run_secs, parallel_scheduling_processes. Maybe adding a couple of log messages between the steps would make it a bit more obvious why the task if failing. Also tweaking the knobs to more extreme values just for the debugging purposes?

ravwojdyla · 2021-03-23T08:40:09Z

@hirosassa

WARNING: Failed saving scheduler state
Traceback (most recent call last):
File "/home/travis/build/spotify/luigi/luigi/scheduler.py", line 448, in dump
with open(self._state_path, 'wb') as fobj:
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/luigi-server/state.pickle'

Does this directory: /var/lib/luigi-server exist at the time of the test run?

hirosassa · 2021-03-23T14:11:11Z

@ravwojdyla Thank you for your suggestion. I checked existence and writability of the directory by this code, and found the following logs

INFO: state path: /var/lib/luigi-server/state.pickle, exists?: False, writable?: False
INFO: state path dir: /var/lib/luigi-server, exists?: False, writable?: False
INFO: state path root dir: /var/lib, exists?: True, writable?: False

However, there's no mkdir or preparation of the directories 🤔

ravwojdyla · 2021-03-24T08:30:13Z

@hirosassa that actually might be a red herring. So it seems like test_sending_same_task_twice_without_cooldown_leads_to_double_run is flaky on the travis CI only. Very weird. To limit this down, maybe we should try to run only a single environment instead of the test full matrix (essentially I wonder if there's some interaction there).

hirosassa · 2021-03-24T13:44:58Z

@ravwojdyla I found the reason test_sending_same_task_twice_without_cooldown_leads_to_double_run fails!
I removed coverage command in 644b445 , then all the core test passed.

ravwojdyla · 2021-03-24T16:58:56Z

@hirosassa great news! I wonder if this is a known issue(?) and what's the proper way forward?

hirosassa · 2021-03-27T23:56:58Z

@ravwojdyla finally! 🎉
https://travis-ci.org/github/spotify/luigi/builds/764698712

hirosassa · 2021-03-28T02:39:09Z

@Tarrasch @honnix @dlstadther cc @ravwojdyla @lallea
This PR is ready for review. Please check!

ravwojdyla · 2021-03-28T08:47:54Z

Great work @hirosassa!!

ravwojdyla

@hirosassa looks great, and again great work figuring out all the weird errors!! I only have some questions inline, plus would suggest to open an issue to update test documentation to document pytest usage going forward.

ravwojdyla · 2021-03-29T08:14:04Z

test/contrib/hdfs/webhdfs_client_test.py

+    from hdfs import InsecureClient
+    from hdfs.ext.kerberos import KerberosClient
+except ImportError:
+    raise unittest.SkipTest('Unable to load hdfs module')


You could probably also use pytest.importorskip?

Thank you for your suggestion! fixed in d4de5fe

ravwojdyla · 2021-03-29T08:15:33Z

test/customized_run_test.py

@@ -27,6 +27,7 @@


 class DummyTask(luigi.Task):
+    task_namespace = 'customized_run'


why is this now required? Is this related to path setup between nose and pytest?

This line is for test isolation. There are several DummyTask in the test directory.
Since pytest loads all of the test at the collection phase of the test, the tasks will be conflicted (debugger shows that result).
So I add this namespace to isolate each Task.

Same reason as followings;
https://github.com/spotify/luigi/pull/3048/files#diff-5c8d3f52b8e7946dd9520aeb1b25f24ff2f5c0513c6205922ca8a7a28aec4cd8R32
https://github.com/spotify/luigi/pull/3048/files#diff-487da8575cd13d85c3d44e6b43d7988a2aea0c81479d298248820eb27be5da9bR221

I added a comment in a542dee

ravwojdyla · 2021-03-29T08:16:51Z

tox.ini

@@ -1,25 +1,41 @@
 [tox]
-envlist = py{35,36,37}-{cdh,hdp,core,contrib,apache,aws,gcloud,postgres,unixsocket,azureblob,dropbox}, visualiser, docs, flake8
+envlist = py{35,36,37,38,39}-{cdh,hdp,core,contrib,apache,aws,gcloud,postgres,unixsocket,azureblob,dropbox}, visualiser, docs, flake8


ravwojdyla · 2021-03-29T08:19:20Z

tox.ini

    LUIGI_CONFIG_PATH={toxinidir}/test/testconfig/luigi.cfg
    TEST_VISUALISER=1
 commands =
    python --version
-    nosetests -v --tests=test/visualiser
+    pytest -m "not minicluster and not gcloud and not postgres and not unixsocket" test/visualiser


why these specific filters here?

This filter is counter part of NOSE_EVAL_ATTER above, but it is not needed. removed it in
013b75f

ravwojdyla · 2021-03-31T15:36:08Z

👋 @lallea do you mind taking a look (review) as well (if you have time) please. I assume you would like to have this in as well :)

dlstadther

Changes look good to me.

Could you also update the Readme.md to denote update to Python versions being tested and "supported"?

hirosassa · 2021-04-02T21:44:26Z

@dlstadther Thank you for the review! added in f7053d9

lallea · 2021-04-03T22:08:06Z

All looks good to me, with the reservation that I know little about pytest.

Thanks @hirosassa for putting down all this work!

ravwojdyla · 2021-04-05T15:28:28Z

👋 @honnix @narape what's the best way to request review/merge from the Spotify Team?

narape · 2021-04-06T16:34:32Z

👍 LGTM, @hirosassa Thanks for the contribution

hirosassa · 2021-04-07T06:16:51Z

Thanks for your comments and reviews! @ravwojdyla @lallea @dlstadther @narape

ravwojdyla · 2021-04-07T08:42:10Z

@hirosassa thanks for all you great work!

Btw, just noticed that it might be also worth updating the setup.py to include python 3.{8,9}. Let me know if that's something you would like to handle as well.

hirosassa · 2021-04-07T20:56:30Z

@ravwojdyla Oh, I missed it. I'll submit small PR to fix this.

hirosassa requested review from dlstadther, honnix and Tarrasch as code owners February 24, 2021 23:10

hirosassa mentioned this pull request Mar 19, 2021

Added Meltwater to list of Luigi users in README.rst #3052

Merged

hirosassa requested a review from a team as a code owner March 23, 2021 13:11

honnix removed their request for review March 23, 2021 13:35

hirosassa force-pushed the add-python-version branch from 0047737 to b575b37 Compare March 28, 2021 00:41

migrate to pytest and add py38 and py39 environment

93b4741

hirosassa force-pushed the add-python-version branch from b575b37 to 93b4741 Compare March 28, 2021 01:05

trigger ci

26e3337

hirosassa changed the title ~~[WIP] add python 3.8, 3.9 support and migration to pytest~~ add python 3.8, 3.9 support and migration to pytest Mar 28, 2021

ravwojdyla approved these changes Mar 29, 2021

View reviewed changes

hirosassa added 3 commits March 30, 2021 20:36

use importorskip

d4de5fe

add comment

a542dee

remove filter

013b75f

dlstadther reviewed Apr 2, 2021

View reviewed changes

add python versions on README

f7053d9

dlstadther approved these changes Apr 2, 2021

View reviewed changes

narape merged commit 7fe53ed into spotify:master Apr 6, 2021

hirosassa deleted the add-python-version branch April 6, 2021 20:43

hirosassa mentioned this pull request Apr 7, 2021

add python 3.8, 3.9 on setup.py #3059

Merged

		@@ -27,6 +27,7 @@


		class DummyTask(luigi.Task):
		task_namespace = 'customized_run'

add python 3.8, 3.9 support and migration to pytest #3048

add python 3.8, 3.9 support and migration to pytest #3048

Conversation

hirosassa commented Feb 24, 2021 • edited

Description

Motivation and Context

pytest migration

add python 3.8, 3.9 tests

Have you tested this? If so, how?

hirosassa commented Mar 11, 2021 • edited

ravwojdyla commented Mar 11, 2021

hirosassa commented Mar 11, 2021

ravwojdyla commented Mar 11, 2021

hirosassa commented Mar 11, 2021

hirosassa commented Mar 11, 2021

ravwojdyla commented Mar 12, 2021

ravwojdyla commented Mar 12, 2021

lallea commented Mar 12, 2021

hirosassa commented Mar 12, 2021 • edited

ravwojdyla commented Mar 12, 2021

hirosassa commented Mar 20, 2021 • edited

ravwojdyla commented Mar 22, 2021

ravwojdyla commented Mar 23, 2021

hirosassa commented Mar 23, 2021

ravwojdyla commented Mar 24, 2021

hirosassa commented Mar 24, 2021

ravwojdyla commented Mar 24, 2021

hirosassa commented Mar 27, 2021

hirosassa commented Mar 28, 2021

ravwojdyla commented Mar 28, 2021

ravwojdyla left a comment

Choose a reason for hiding this comment

ravwojdyla Mar 29, 2021

Choose a reason for hiding this comment

hirosassa Mar 30, 2021

Choose a reason for hiding this comment

ravwojdyla Mar 29, 2021

Choose a reason for hiding this comment

hirosassa Mar 30, 2021 • edited

Choose a reason for hiding this comment

hirosassa Mar 30, 2021

Choose a reason for hiding this comment

ravwojdyla Mar 29, 2021

Choose a reason for hiding this comment

ravwojdyla Mar 29, 2021

Choose a reason for hiding this comment

hirosassa Mar 30, 2021 • edited

Choose a reason for hiding this comment

ravwojdyla commented Mar 31, 2021

dlstadther left a comment

Choose a reason for hiding this comment

hirosassa commented Apr 2, 2021

lallea commented Apr 3, 2021

ravwojdyla commented Apr 5, 2021

narape commented Apr 6, 2021

hirosassa commented Apr 7, 2021

ravwojdyla commented Apr 7, 2021

hirosassa commented Apr 7, 2021

hirosassa commented Feb 24, 2021 •

edited

hirosassa commented Mar 11, 2021 •

edited

hirosassa commented Mar 12, 2021 •

edited

hirosassa commented Mar 20, 2021 •

edited

hirosassa Mar 30, 2021 •

edited

hirosassa Mar 30, 2021 •

edited