Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: runtime HEATMAP/DXT cross-check log #22

Merged
merged 1 commit into from Feb 18, 2022

Conversation

tylerjereddy
Copy link
Collaborator

  • add a carefully crafted darshan log using latest darshan
    main branch that contains "runtime" HEATMAP module
    and DXT tracing, to facilitate the writing of carefully
    designed tests for internal consistency of data structures
    (and to facilitate rapid visual inspection as well)

  • see the README for a very detailed description of how
    the data was produced (this is perhaps a bit too detailed,
    though I'm mostly doing this so that more folks can generate
    carefully-crafted testing data using pure Python without
    waiting for me for example)

Here is a sample comparison of the DXT vs. runtime heatmaps in a hacked version of Jakob's branch from gh-615, with a few debug prints as well, to show data structure similarity on top of the obvious visual similarities. I hope it is clear that the write activity, which I intentionally made diagonal for visual inspection, matches.

image

image

Debug prints:

dxt heatmap nonzero_rows: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31]
dxt heatmap NUM nonzero_rows: 32
dxt heatmap nonzero_columns: [  4  10  17  23  29  35  42  48  55  60  66  73  80  86  92  99 105 111
 118 123 129 137 143 150 156 163 169 174 182 188 193 199]
dxt heatmap NUM nonzero_columns: 32
runtime heatmap module: POSIX
operation: read
runtime heatmap nonzero_rows: []
runtime heatmap NUM nonzero_rows: 0
runtime heatmap nonzero_columns: []
runtime heatmap NUM nonzero_columns: 0
operation: write
runtime heatmap nonzero_rows: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31]
runtime heatmap NUM nonzero_rows: 32
runtime heatmap nonzero_columns: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31]
runtime heatmap NUM nonzero_columns: 32
Report generated successfully. 
Saving report at location: /Users/treddy/LANL/rough_work/darshan/diagonal_heatmap_testing/runtime_and_dxt_heatmaps_diagonal_write_only_report.html

If developers would like to modify the application to, for example, produce read events as well, I think I've provided sufficient detail for folks to make that adjustment (feel free to do that and push in here/adjust README, etc.). If not, this may provide sufficient detail for initial flushing of runtime HEATMAP code paths on the Python side in gh-615, and for subsequent more detailed testing/cross-comparison with the DXT heatmap results later on.

@tylerjereddy
Copy link
Collaborator Author

If reviewers want to look in to the CI issues, that may be helpful as well (if it isn't obvious, you can start by diffing the current logs with previous successful logs for the repo, etc.)

* add a carefully crafted darshan log using latest `darshan`
`main` branch that contains "runtime" `HEATMAP` module
and DXT tracing, to facilitate the writing of carefully
designed tests for internal consistency of data structures
(and to facilitate rapid visual inspection as well)

* see the `README` for a very detailed description of how
the data was produced (this is perhaps a bit too detailed,
though I'm mostly doing this so that more folks can generate
carefully-crafted testing data using pure Python without
waiting for me for example)
@nawtrey
Copy link
Contributor

nawtrey commented Feb 18, 2022

At a glance the CI errors don't seem directly related to this PR, it's acting like darshan isn't installed:

________________ ERROR collecting darshan/tests/test_summary.py ________________
ImportError while importing test module '/home/runner/work/darshan-logs/darshan-logs/scratch/darshan/darshan-util/pydarshan/darshan/tests/test_summary.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
darshan/tests/test_summary.py:12: in <module>
    from darshan.cli import summary
E   ModuleNotFoundError: No module named 'darshan.cli'; 'darshan' is not a package

I'm guessing this is because the CI operates a little differently in this repo vs. the main one (i.e. pytest --import-mode=importlib vs. pytest --pyargs darshan), but I'll have to double check that.

I was expecting a failure somewhere in test_report.py or test_summary.py (since they both contain tests that iterate over all of the logs), but the CI hasn't made it that far yet. Running pytest --pyargs darshan -n 6 locally on pydarshan-devel, I get something closer to what I would expect:

    def log_get_generic_record(log, mod_name, dtype='numpy'):
        """
        Returns a dictionary holding a generic darshan log record.
    
        Args:
            log: Handle returned by darshan.open
            mod_name (str): Name of the Darshan module
    
        Return:
            dict: generic log record
    
        Example:
    
        The typical darshan log record provides two arrays, on for integer counters
        and one for floating point counters:
    
        >>> darshan.log_get_generic_record(log, "POSIX", "struct darshan_posix_file **")
        {'counters': array([...], dtype=int64), 'fcounters': array([...])}
    
        """
        modules = log_get_modules(log)
        if mod_name not in modules:
            return None
>       mod_type = _structdefs[mod_name]
E       KeyError: 'HEATMAP'

darshan/backend/cffi_backend.py:334: KeyError
===================================================================== short test summary info ======================================================================
FAILED darshan/tests/test_report.py::test_jobid_type_all_logs_repo_files - KeyError: 'HEATMAP'
FAILED darshan/tests/test_summary.py::test_main_all_logs_repo_files - KeyError: 'HEATMAP'
============================================================ 2 failed, 300 passed in 148.07s (0:02:28) =============================================================

And on Jakob's branch: 299 passed in 128.17s (0:02:08)

@shanedsnyder
Copy link
Contributor

I tried reproducing the CI steps locally on my machine and hit the same errors. As @nawtrey points out, I can work around the issue just by dropping the --import-mode=importlib argument to pytest.

I noticed this comparing these CI failures to the last success:

Last success:

platform linux -- Python 3.6.15, pytest-6.2.5, py-1.11.0, pluggy-1.0.0

This job:

platform linux -- Python 3.6.15, pytest-7.0.1, pluggy-1.0.0

So maybe it's related to a newer pytest version? Anything we might expect to change with --import-mode=importlib on newer versions?

@shanedsnyder
Copy link
Contributor

Was also able to quickly confirm that pytest --import-mode=importlib works fine here if I uninstall pytest 7.0.1 and pin to 6.2.5. Nothing sticks out as obvious scanning the changelog for pytest 7.0.1: https://github.com/pytest-dev/pytest/releases

@nawtrey
Copy link
Contributor

nawtrey commented Feb 18, 2022

I looked through the pytest changelog here and didn't find anything obvious either. I did find a similar issue that might give us an idea how to resolve this. Specifically this comment seems to describe a similar experience.

Like I noted above, locally pytest --pyargs darshan works fine, but pytest --import-mode=importlib raises the same errors as the CI. I think there is an issue here because I don't see a reason why import-mode=importlib shouldn't work, whether or not we invoke--pyargs. The pytest devs seemed to have addressed the linked issue very recently, so maybe this will sort itself out upstream.

In the mean time I tried making some changes to the pytest.ini, but I couldn't find a way to escape the import errors. We could just pin the older pytest version until they have a fix for this, I suppose.

@tylerjereddy
Copy link
Collaborator Author

We could just pin the older pytest version until they have a fix for this, I suppose.

Sounds good, feel free to do that and maybe open an issue reminding us to try unpinning the future.

We could also just use pytest --pyargs darshan in a temporary directory like the main repo, but the failure here has me thinking that it doesn't hurt to have the repos using the slightly different approach for coverage of our bases anyway.

@shanedsnyder
Copy link
Contributor

I'll submit a PR with pytest pinned and open an issue so that we revisit this at some point.

There are additional issues on #23 I need to look into, but we should be good on this PR. Should I merge despite pydarshan failures related to heatmap bindings? Or hold off until darshan-hpc/darshan#615 is in?

@tylerjereddy
Copy link
Collaborator Author

Should I merge despite pydarshan failures related to heatmap bindings? Or hold off until darshan-hpc/darshan#615 is in?

I'd probably vote for merging, then using the log in this PR as a test case for flushing the code paths in Jakob's PR (I already pasted a draft "sanity check" style test over there, so worst case scenario we do something like that initially instead of the deeper testing I had in mind for the future with the diagonals..).

I may be biased though. Once thing I do want to make sure of is that gh-23 confusion isn't an indication of a deeper problem with the runtime heatmap parsing infra.

@nawtrey
Copy link
Contributor

nawtrey commented Feb 18, 2022

Should I merge despite pydarshan failures related to heatmap bindings? Or hold off until darshan-hpc/darshan#615 is in?

I would also vote for merging this, we are probably better off making sure darshan-hpc/darshan#615 is well tested than worrying too much about a CI failure in this repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants