Fix hdf5 tile data after animation stops #1371

pford · 2024-04-17T21:52:24Z

Description

What is implemented or fixed? Mention the linked issue(s), if any.
Fixes odd small pixel tiles seen in hdf5 image after animation stops #1368
How does this PR solve the issue? Give a brief summary.
By default, image quality is decreased during animation for better speed. For each channel, tile data for HDF5 images is requested from the tile cache, encoded so that NaN values may be restored, and then compressed. However, the encoding step changes the cached tile data (passed with a shared_ptr) by replacing a NaN values in a block with the average of finite values in that block. When the cached tile data is requested again after animation stops, for image data with a higher compression quality, the image data is distorted since some values are no longer NaN. To fix this, the tile data is assigned to a new vector for encoding and compression, leaving the cached data unchanged.
Are there any companion PRs (frontend, protobuf)?
No
Is there anything else that testers should know (e.g. exactly how to reproduce the issue)?
Zoom in an HDF5 image to an area with NaN values, then start and stop animation. There should be no defect in the image after the animation is stopped.

Checklist

changelog updated / ~~no changelog update needed~~
e2e test passing / ~~corresponding fix added~~ / ~~new e2e test created~~
ICD test passing / corresponding fix added / new ICD test created
~~protobuf updated to the latest dev commit~~ / no protobuf update needed
~~protobuf version bumped~~ / protobuf version not bumped
added reviewers and assignee
added ZenHub estimate, milestone, and release

pford · 2024-04-17T22:01:58Z

Not sure if this should be code reviewed or tested first since it involves copying data, which we try to avoid.

confluence · 2024-05-02T11:48:29Z

In this case I think copying is unavoidable -- the compression function should not be modifying data in the tile cache!

I do remember adding a tile pool to the tile cache to avoid constantly creating and destroying vector objects, since this was causing quite a significant slowdown. I wonder if a tile pool would be appropriate here as well (I think that we should test the performance impact of this change first).

If yes, then we could either create a new tile pool to be used by this function, or add a GetCopy function to the tile cache which performs the copy internally and uses the tile cache's own tile pool (we may then need to adjust the tile cache pool size when we start and stop the animation, but maybe the tile requirements aren't going to exceed the default size, which IIRC is enough to fit one row and one column of 2x2 tile chunks).

confluence · 2024-05-02T12:02:42Z

Actually, now I see that the current change is in FillRasterTileData, which means that the copy is being performed regardless of how the data was obtained in GetRasterTileData (which means that it will also affect cases where the data does not need to be copied). It's GetRasterTileData which should be modified -- there's already a new vector which is created there (for the cases which don't use the tile cache), which could be used to copy the cached data.

Perhaps adding a new tile pool in GetRasterTileData, and using it in all cases instead of creating a new tile vector, would be a good idea.

pford · 2024-05-07T23:05:54Z

@confluence I added TileCache::GetCopy. If I added a TilePool to Frame, wouldn't it have the same problem as the TileCache pool?

confluence · 2024-05-08T08:48:58Z

@confluence I added TileCache::GetCopy.

Currently GetCopy is creating a new vector. I was originally suggesting using the tile cache's pool to recycle a vector object to use for the copy (i.e. using _pool->Pull() to get a vector pointer).

If I added a TilePool to Frame, wouldn't it have the same problem as the TileCache pool?

I'm not sure what problem you mean here.

I think I've confused the issue by conflating multiple suggestions -- for clarity, I'm talking about two separate things:

In GetRasterTileData we already create a new vector to read data into, in the cases that don't use the tile cache. I would suggest that we use this vector as well in the tile cache case, for consistency, instead of creating a vector in a different place. Perhaps the GetCopy function should be called CopyInto and take that vector as an input/output parameter -- but maybe this function isn't necessary at all, and the copy can just be inlined in GetRasterTileData. (I originally suggested the tile cache function to enable the use of the tile cache's pool, before I realised that the tile cache case could be integrated more generically with the other cases.)
Additionally, as a performance improvement, I think it would be a good idea to use a tile pool for that new vector in GetRasterTileData, instead of repeatedly creating and destroying it. My updated suggestion for this is to add a separate tile pool object to the frame, to be used here in all the cases, not just the tile cache case. I haven't checked if there are other places in the frame where a new tile vector is created (which could make use of the same pool).

confluence · 2024-05-10T23:24:08Z

@pford I was suggesting to use only a tile pool in the frame for all the tiles, not the tile cache -- we only added the full-resolution tile cache to HDF5 files because in the HDF5 schema the main dataset is chunked, which makes it efficient to read a chunk (which is 2x2 tiles) at a time. For FITS files and other formats, reading a chunk at a time is not efficient, which is why we still use the full channel cache for those formats. I don't think it makes sense to copy tiles read from the full channel cache into the tile cache (this is currently bypassing the tile cache's tile pool, and I believe that downsampled tiles are also being saved -- this cache is only written to handle full-resolution tiles).

I am suggesting only that the frame gets its own tile pool object (separate to the tile cache's tile pool), and uses that pool to recycle the tile pointers (instead of creating new pointers with newly allocated data). This wouldn't be caching any tile data; just keeping a buffer of reusable objects allocated in memory. They would be overwritten with new data every time they are used (so it would be fine to use them for all tiles in GetRasterTileData, whether they're downsampled or not).

So the frame would have a std::shared_ptr<TilePool> _pool, with some reasonable capacity (how many tiles are likely to be allocated at the same time during an animation?), and in GetRasterTileData instead of creating a new tile_data vector and later a new pointer with make_shared, we'd get a pointer from the pool with _pool->Pull(), and write to that pointer's data. The pool class handles the recycling internally (with a custom deleter).

So it would be just this small change, plus the copying in the tile cache case.

This reverts commit 8f0bf53.

This reverts commit 4c47ec8.

pford · 2024-05-13T18:40:14Z

I added a TilePool to Frame with the capacity set to the number of omp threads, since Session fills the tiles in parallel using omp. The TilePool is created the first time tiles are requested, since tiles are not used when Frame holds the downsampled image for PV preview.

confluence

Other than the place the pool is created, I think this is good now. Using the number of threads to calculate the pool capacity makes sense. Perhaps it should be the number of threads plus one or two, to avoid race conditions in which a new tile is needed before any tiles have been returned to the pool?

src/Frame/Frame.cc

github-actions · 2024-05-13T22:42:07Z

Package	Line Rate	Health
src.Cache	72%	➖
src.DataStream	44%	➖
src.FileList	67%	➖
src.Frame	36%	❌
src.HttpServer	42%	➖
src.ImageData	28%	❌
src.ImageFitter	83%	✔
src.ImageGenerators	43%	➖
src.ImageStats	75%	✔
src.Logger	37%	❌
src.Main	52%	➖
src.Region	69%	➖
src.Session	4%	❌
src.Table	52%	➖
src.ThreadingManager	67%	➖
src.Timer	85%	✔
src.Util	40%	➖
Summary	46% (8614 / 18797)	➖

confluence · 2024-05-14T07:30:29Z

@kswang1029 in addition to testing the bug fix, please check the performance impact for animation (of both HDF5 files and other file types). I expect this to slightly decrease performance for HDF5 files but also slightly increase performance across the board, and I'd like to find out what the overall impact is (and to make sure that something unexpected isn't happening).

kswang1029

@pford this looks good. @confluence As for performance test, please refer to the following table. I test the performance with a custom e2e test which has a 30s animation playback and then I count the stopped channel index and hence an average fps. I do not see obvious performance improvement/degradation.

confluence · 2024-05-20T09:25:11Z

@kswang1029 @pford in that case I'm happy to merge this.

kswang1029 · 2024-05-20T09:27:37Z

@kswang1029 @pford in that case I'm happy to merge this.

the effect (if any) may be more prominent using "large" cube but I do not have the resources to test with, unfortunately.

Pam Harris added 2 commits April 17, 2024 15:10

Make copy of tile data before encoding nans

78b96be

Update changelog

2021493

pford added this to the v5.0-beta milestone Apr 17, 2024

pford requested review from veggiesaurus, confluence, loveluthien and kswang1029 April 17, 2024 21:52

pford assigned confluence Apr 17, 2024

Pam Harris added 2 commits May 7, 2024 13:03

Merge branch 'dev' into pam/1368_hdf5_animation

8a2f3cf

Add TileCache GetCopy and use in Frame when tile data changed

4c47ec8

Use tile cache in Frame for all tiles

8f0bf53

Pam Harris added 3 commits May 10, 2024 17:52

Revert "Use tile cache in Frame for all tiles"

18b731f

This reverts commit 8f0bf53.

Revert "Add TileCache GetCopy and use in Frame when tile data changed"

640aee0

This reverts commit 4c47ec8.

Add tile pool to Frame

418b79a

confluence reviewed May 13, 2024

View reviewed changes

src/Frame/Frame.cc Outdated Show resolved Hide resolved

Init tile pool in Frame constructor

4ec5289

confluence approved these changes May 14, 2024

View reviewed changes

confluence added the awaiting testing For pull requests that require testing label May 14, 2024

kswang1029 approved these changes May 20, 2024

View reviewed changes

kswang1029 added the awaiting code review For pull requests that require code review label May 20, 2024

confluence merged commit ce8f6c0 into dev May 20, 2024
14 checks passed

confluence deleted the pam/1368_hdf5_animation branch May 20, 2024 10:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix hdf5 tile data after animation stops #1371

Fix hdf5 tile data after animation stops #1371

pford commented Apr 17, 2024 •

edited by kswang1029

pford commented Apr 17, 2024

confluence commented May 2, 2024

confluence commented May 2, 2024

pford commented May 7, 2024

confluence commented May 8, 2024

confluence commented May 10, 2024 •

edited

pford commented May 13, 2024

confluence left a comment

github-actions bot commented May 13, 2024

confluence commented May 14, 2024

kswang1029 left a comment

confluence commented May 20, 2024

kswang1029 commented May 20, 2024 •

edited

Fix hdf5 tile data after animation stops #1371

Fix hdf5 tile data after animation stops #1371

Conversation

pford commented Apr 17, 2024 • edited by kswang1029

pford commented Apr 17, 2024

confluence commented May 2, 2024

confluence commented May 2, 2024

pford commented May 7, 2024

confluence commented May 8, 2024

confluence commented May 10, 2024 • edited

pford commented May 13, 2024

confluence left a comment

Choose a reason for hiding this comment

github-actions bot commented May 13, 2024

confluence commented May 14, 2024

kswang1029 left a comment

Choose a reason for hiding this comment

confluence commented May 20, 2024

kswang1029 commented May 20, 2024 • edited

pford commented Apr 17, 2024 •

edited by kswang1029

confluence commented May 10, 2024 •

edited

kswang1029 commented May 20, 2024 •

edited