Add an option to cache the whole cube image #1276

markccchiang · 2023-06-28T02:47:00Z

Description

What is implemented or fixed? Mention the linked issue(s), if any.
Cache all cube image data #1286
This is an attempt to cache the whole cube image if its memory footprint is under a specific quota. Users can set the memory quota with the command-line option --full_image_cache_size_available when starting the backend. For example, ./carta_backend --full_image_cache_size_available 1000 means that the backend can use at most 1000 MB of memory for the cache. If a cube image we open needs less than 1000 MB of memory, all of its data will be cached. The upper limit of the full image cache is defined as 90% of the RAM capacity of the user's computer. If users set a full image cache size that exceeds the upper limit, the backend will reset it to the upper limit. By default, the full image cache is 0 MB. This feature does not apply to HDF5 images (it is only available for CASA or FITS files). The cube image cache will be used in the following calculations:

image rendering, including animation
cube histogram calculations
contours
vector field calculations
spatial profiles
spectral profiles for cursor/point region
spectral profiles for the other region types, including their statistical calculations
PV image generator
above functions for computed stokes data
🚧 image moments (This item will be done in a separate PR after this PR is merged)
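
The quota rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (ClampCacheQuotaMb, CubeSizeMb, ShouldCacheCube) are hypothetical and not taken from the PR, pixels are assumed to be stored as 32-bit floats, and 1 MB is taken as 10^6 bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration; the PR's actual code may differ.

// Clamp the requested quota (MB) to the range [0, 90% of the system RAM].
inline double ClampCacheQuotaMb(double requested_mb, double system_ram_mb) {
    assert(system_ram_mb >= 0.0);
    const double upper_limit_mb = 0.9 * system_ram_mb;
    return std::clamp(requested_mb, 0.0, upper_limit_mb);
}

// Memory needed to hold a cube of 32-bit floats with the given shape, in MB
// (assumption: 1 MB = 10^6 bytes).
inline double CubeSizeMb(std::size_t width, std::size_t height, std::size_t depth, std::size_t num_stokes) {
    const double bytes = static_cast<double>(width) * height * depth * num_stokes * sizeof(float);
    return bytes / 1.0e6;
}

// A cube is cached only if the file is not HDF5 and the cube is under the quota.
// With the default quota of 0 MB nothing is cached.
inline bool ShouldCacheCube(double cube_mb, double quota_mb, bool is_hdf5) {
    return !is_hdf5 && cube_mb < quota_mb;
}
```

With these assumptions, a 100 x 100 x 100 single-stokes cube needs 4 MB and would be cached under a 1000 MB quota, while a request for 20000 MB on a 16000 MB machine would be reset to 14400 MB.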

How does this PR solve the issue? Give a brief summary.
Create an ImageCache struct under the Frame class and use it to manage the channel or cube image data cache, which is filled with data read through casacore.
Are there any companion PRs (frontend, protobuf)?
No.
Is there anything else that testers should know (e.g. exactly how to reproduce the issue)?
Performance with cached cube images is expected to be better than without the cache (i.e., the current dev branch), especially when accessing spectral profiles or other data along the z-direction.

Checklist

changelog updated ~~/ no changelog update needed~~
e2e test passing / ~~added corresponding fix~~
~~protobuf updated to the latest dev commit /~~ no protobuf update needed
added reviewers and assignee
added ZenHub estimate, milestone, and release

ajm-ska · 2023-07-01T02:48:53Z

@markccchiang Just to mention, I am aware that the macOS12 unit tests are failing here. They failed 4 times in a row when I made GitHub Actions rerun them. When I let GitHub Actions rerun the previous commit, it passed the first time.

The error is: Error: Timeout of 120000ms hit.

I have tried building and running it manually with the Address Sanitizer flags, as that is what GitHub Actions has always been doing. The runtime is quite variable: 152.68s, 65.16s, 144.22s.
To make things more confusing, I tried the previous commit (b7fb274), and the unit test runtimes appear to be more consistent but all longer than 2 minutes: 152.19s, 151.23s, 150.13s. So it seems that GitHub Actions was very lucky to pass twice on the previous commit (and maybe lucky to pass on earlier commits too)!

I figured a better test would be to compare it with the 'dev' branch. I get: 90.50s, 91.35s, 91.51s.

Therefore, something in this branch has slowed down the unit tests, and they can no longer consistently finish within 2 minutes. It seems to be an issue only on macOS12, which is an original M1 Mac.

I have not gone so far as to identify which commit in your branch caused the unit tests to slow down, because I guess we first need to know whether this is considered an actual issue.
It may show up later in the performance testing.
The one 65.16s runtime looks great, but unfortunately, it is not consistent.

If it is not considered an issue, you can get around the GitHub Actions failure by increasing the timeout in .github/workflows/continuous_integration.yml, line 130, from timeout_minutes: 2 to perhaps 3.

markccchiang · 2023-07-01T03:17:23Z

> @markccchiang Just to mention, I am aware the macOS12 unit tests are failing here. They consistently failed 4 times in a row when I made Github Actions rerun them. When I let Github Actions re-run the previous commit, it passed the first time.
>
> The error is: Error: Timeout of 120000ms hit.

@ajm-asiaa Thank you for the reminder and the investigation! You are right. This happens because the sample image files I generated are too large: I was running performance tests for the cached cube image functionality inside the unit tests. I will reduce the size of the sample image files, so this problem will be solved without changing the timeout setting.

kswang1029 · 2023-07-18T04:32:10Z

@markccchiang here are a few findings I have observed so far:

1. If we have two images loaded and matched, the backend crashes when we enable multiple-profile plot mode in the spectral profiler to see region spectra.
2. If we load three images (20, 20, and 90 MB), the backend log shows the following (I set --reserved_memory 4):

[2023-07-18 04:28:28.210] [CARTA] [info] 4 GB of reserved memory are available.
[2023-07-18 04:28:28.213] [CARTA] [info] Cache the whole cube image data.
[2023-07-18 04:28:28.287] [CARTA] [info] 3 GB of reserved memory are available.
[2023-07-18 04:28:28.673] [CARTA] [info] Cache the whole cube image data.
[2023-07-18 04:28:28.681] [CARTA] [info] 2 GB of reserved memory are available.
[2023-07-18 04:28:28.935] [CARTA] [info] Cache the whole cube image data.
[2023-07-18 04:28:28.943] [CARTA] [info] 1 GB of reserved memory are available.

This does not seem right. The actual RAM usage shown in the macOS Activity Monitor is the right amount.
3. In the backend log, maybe we should show more digits when displaying the amount of available memory (e.g., 2.34 GB).
4. When we launch carta_backend with the --reserved_memory flag, we should also add an [info] log.

Will have more tests soon.

kswang1029 · 2023-07-18T04:51:09Z

@markccchiang Overall, this does improve UX significantly in many I/O-intensive features. @veggiesaurus please test it as well if possible. We also need a strategy for RAM management in the controller deployment. Currently the backend flag --reserved_memory does the trick. Not sure about the controller side. 🤔

veggiesaurus · 2023-07-18T07:31:44Z

We might also need the ability to control the total amount of memory that cached cubes can take up. For example, if I load 10 images each with 10 GB, would that allow it, and use up 100 GB of memory?

kswang1029 · 2023-07-18T07:44:09Z

> We might also need the ability to control the total amount of memory that cached cubes can take up. For example, if I load 10 images each with 10 GB, would that allow it, and use up 100 GB of memory?

@markccchiang could you check whether the current setup limits the total amount of caching memory, rather than the memory for each individual image? I also found that we need to improve the [info] logs when loading images that can or cannot be cached. A use case could be that I reserve 4 GB as image cache and then load the following images in order:

a.fits 3 GB
b.fits 2 GB
c.fits 500 MB
In this case, a.fits will be cached, b.fits will not, and c.fits will be cached. In any case, the [info] log needs to be more informative about whether the image being loaded will be cached.
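
The use case above, with a budget shared by all loaded images, could be tracked along these lines. This is a hypothetical sketch: the class CacheBudget and its methods are illustrative names and not part of the backend, and the sketch is not thread-safe. It keeps the remaining budget as a floating-point number of GB, so the log can show two decimal places instead of whole gigabytes.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical budget tracker shared by all images in a session (illustrative only).
class CacheBudget {
public:
    explicit CacheBudget(double total_gb) : _available_gb(total_gb) {
        assert(total_gb >= 0.0);
    }

    // Reserve memory for an image if it fits in what is left; returns whether it will be cached.
    bool TryReserve(double image_gb) {
        if (image_gb > _available_gb) {
            return false;
        }
        _available_gb -= image_gb;
        return true;
    }

    // Return memory to the budget when a cached image is closed.
    void Release(double image_gb) {
        _available_gb += image_gb;
    }

    double AvailableGb() const {
        return _available_gb;
    }

    // Build an [info]-style message that states the caching decision and the remaining budget.
    std::string Describe(const std::string& name, double image_gb, bool cached) const {
        char buffer[256];
        std::snprintf(buffer, sizeof(buffer), "%s (%.2f GB) %s; %.2f GB of reserved memory are available.",
            name.c_str(), image_gb, cached ? "is cached" : "is not cached", _available_gb);
        return std::string(buffer);
    }

private:
    double _available_gb;
};
```

Starting from CacheBudget budget(4.0), calling TryReserve with 3.0, 2.0, and 0.5 in that order accepts a.fits, rejects b.fits, and accepts c.fits, leaving 0.5 GB available.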

confluence

This is a partial review, which includes some general remarks about the current structure of the code, and ways in which it could be refactored to divide responsibility between these classes more consistently.

confluence · 2024-04-30T09:44:49Z

src/Cache/StokesCache.h

+    void UpdateValidity(int stokes) override;
+
+private:
+    bool FillCubeImageCache(std::unique_ptr<float[]>& stokes_data, int stokes);


This should be renamed to FillStokesCache.

confluence · 2024-04-30T09:45:46Z

test/TestCubeImageCache.cc

@@ -0,0 +1,772 @@
+/* This file is part of the CARTA Image Viewer: https://github.com/CARTAvis/carta-backend


This test file should be renamed (probably to TestImageCaches), and the class and test names should be updated to match the new names in the code.

confluence · 2024-04-30T13:41:52Z

src/Frame/Frame.h

+    casacore::IPosition OriginalImageShape() const; // Image shape from the original file
+    size_t Width() const;                           // length of x axis
+    size_t Height() const;                          // length of y axis
+    size_t Depth() const;                           // length of z axis
+    size_t NumStokes() const;                       // if no stokes axis, number of stokes = 1
+    int XAxis() const;
+    int YAxis() const;
+    int ZAxis() const;
+    int StokesAxis() const;
+    int CurrentZ() const;
+    int CurrentStokes() const;


It's fine to add all of these getters for consistency, even though some of these values aren't currently accessed from outside the frame class, but it isn't necessary to use them inside the frame class. It's fine to use the bare properties (e.g. _width rather than Width()). This will also greatly simplify the diff of the frame class.

confluence · 2024-04-30T13:48:19Z

src/Frame/Frame.cc

-    casacore::IPosition start(_image_shape.size());
+    casacore::IPosition start(OriginalImageShape().size());
    start = 0;
-    casacore::IPosition end(_image_shape);
+    casacore::IPosition end(OriginalImageShape());
    end -= 1; // last position, not length

    // Slice x axis
-    if (_x_axis >= 0) {
+    if (XAxis() >= 0) {
        int start_x(x_range.from), end_x(x_range.to);

        // Normalize x constants
        if (start_x == ALL_X) {
            start_x = 0;
        }
        if (end_x == ALL_X) {
-            end_x = _width - 1;
+            end_x = Width() - 1;


This is an example of where the properties should still be used (rather than the getters). All of these little changes should be reverted.

There are a few places in the original code where getters are used instead of properties, and for consistency I think that we should update them, but I would prefer to do that in a separate PR (to keep unrelated changes out of this PR).

confluence · 2024-04-30T13:51:38Z

src/Frame/Frame.cc

+bool Frame::UpdateChannelCache() {
+    return _image_cache->UpdateChannelCache(CurrentZ(), CurrentStokes());
 }


Since this is now a one-liner, and only used in a few places in Frame, I would prefer to remove this wrapper function and call the function on the cache directly.

confluence · 2024-04-30T15:51:07Z

src/Cache/ChannelCache.cc

+float ChannelCache::GetValue(int x, int y, int z, int stokes) {
+    bool write_lock(false);
+    queuing_rw_mutex_scoped cache_lock(&_cache_mutex, write_lock);
+    return _channel_data[(_width * y) + x];
+}


This function is used in two different ways: from Frame, to access a single value (where this lock is required), and inside a loop in other cache functions (where this lock should not be used; the calling function should acquire the lock once).

But this is part of a broader issue with the way that the cache functions are currently used from frame, which I will describe in more detail below.
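
The two usage patterns described above can be separated roughly as in the following sketch. It is a simplified, hypothetical class (ChannelCacheSketch, GetValueUnlocked, and SumRow are illustrative names), and std::shared_mutex is used here only as a stand-in for the backend's queuing_rw_mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <shared_mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified channel cache; not the backend's actual class.
class ChannelCacheSketch {
public:
    ChannelCacheSketch(std::size_t width, std::size_t height, std::vector<float> data)
        : _width(width), _height(height), _channel_data(std::move(data)) {
        assert(_channel_data.size() == _width * _height);
    }

    // Single-value access from outside: acquires a read lock for this one call.
    float GetValue(std::size_t x, std::size_t y) const {
        std::shared_lock<std::shared_mutex> lock(_cache_mutex);
        return GetValueUnlocked(x, y);
    }

    // Bulk access: acquires the read lock once, then uses the unlocked accessor in the loop.
    double SumRow(std::size_t y) const {
        std::shared_lock<std::shared_mutex> lock(_cache_mutex);
        double sum = 0.0;
        for (std::size_t x = 0; x < _width; ++x) {
            sum += GetValueUnlocked(x, y);
        }
        return sum;
    }

private:
    // The caller must already hold _cache_mutex.
    float GetValueUnlocked(std::size_t x, std::size_t y) const {
        assert(x < _width && y < _height);
        return _channel_data[(_width * y) + x];
    }

    std::size_t _width;
    std::size_t _height;
    std::vector<float> _channel_data;
    mutable std::shared_mutex _cache_mutex;
};
```

Keeping the unlocked accessor private means code outside the cache cannot skip the lock, while loops inside the cache pay for it only once.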

confluence · 2024-04-30T15:54:46Z

src/Cache/StokesCache.cc

+}
+
+bool StokesCache::ChannelDataAvailable(int z, int stokes) const {
+    return (z == ALL_Z) && (stokes == _frame->CurrentStokes()) && _stokes_image_cache_valid;


Surely the (z == ALL_Z) check here is redundant? If the stokes matches, data is available for any z value.

confluence · 2024-04-30T16:13:12Z

src/Frame/Frame.cc

+    if (ImageCacheAvailable()) {
+        cursor_value_with_current_stokes = _image_cache->GetValue(x, y, CurrentZ(), CurrentStokes());


There are several places in Frame where the cache is used like this. It doesn't make sense to check first if the cache is valid and then acquire the cache lock and get the data. The check should be performed after the lock is acquired, in the same function as the data retrieval.

The correct thing to do is probably to refactor all instances like this so that the public function on the cache returns a boolean and sets an input/output parameter instead, checks internally whether the data is available, and returns false if it isn't. Some cache functions already have this kind of interface, but it's not consistent.

In this specific case, there should be two separate cache functions for getting a value: a public one, which acquires the lock and checks if the data is available, and a protected internal function which just fetches the data.

It would also be helpful to unify the behaviour of functions which fetch data only if it exists in the cache, and functions which update the cache if necessary and then fetch data. Currently the frame explicitly updates the channel cache in several places. This should be the internal responsibility of the cache -- the frame should only invalidate the cache when the channel or stokes changes. Functions for fetching data from the cache should have an update flag which determines whether the cache should be updated on demand when that function is called. The frame would then set that flag appropriately.
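
A rough sketch of the interface proposed above follows: the availability check happens after the lock is acquired, the function returns a boolean and fills an output parameter, and an update flag controls whether the cache is refreshed on demand. Everything here is hypothetical (OnDemandChannelCache, the Loader callback, and the parameter names are assumptions), and a plain std::mutex stands in for the backend's read/write mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch of a cache that validates and updates itself under its own lock.
class OnDemandChannelCache {
public:
    // Callback that reads one channel from the image (stands in for the real loader).
    using Loader = std::function<std::vector<float>(int z, int stokes)>;

    OnDemandChannelCache(std::size_t width, Loader loader) : _width(width), _loader(std::move(loader)) {}

    // The frame only invalidates the cache when the channel or stokes changes.
    void Invalidate() {
        std::lock_guard<std::mutex> lock(_mutex);
        _valid = false;
    }

    // Returns false if the value is not available. The check is made while the lock is held,
    // and the cache is refilled on demand only if update_if_needed is set.
    bool GetValue(std::size_t x, std::size_t y, int z, int stokes, float& value, bool update_if_needed) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (!Available(z, stokes)) {
            if (!update_if_needed) {
                return false;
            }
            _data = _loader(z, stokes);
            _z = z;
            _stokes = stokes;
            _valid = true;
        }
        const std::size_t index = (_width * y) + x;
        if (index >= _data.size()) {
            return false;
        }
        value = _data[index];
        return true;
    }

private:
    // The caller must already hold _mutex.
    bool Available(int z, int stokes) const {
        return _valid && (z == _z) && (stokes == _stokes);
    }

    std::size_t _width;
    Loader _loader;
    std::mutex _mutex;
    std::vector<float> _data;
    int _z = -1;
    int _stokes = -1;
    bool _valid = false;
};
```

With this shape, the caller never needs a separate "is the cache available" query, so the check and the read cannot be separated by another thread changing the cache in between.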

confluence · 2024-04-30T16:16:00Z

src/Frame/Frame.cc

+                } else {
+                    // Required stokes is not the current stokes or the stokes needs to be computed
+                    if (ImageCacheAvailable(CurrentZ(), stokes)) {
+                        get_spatial_profile_from_cache(stokes);


ImageCacheAvailable is called a second time here inside get_spatial_profile_from_cache. But this is an example of logic which should be refactored.

github-actions · 2024-05-08T08:25:19Z

Package	Line Rate	Health
src.Cache	50%	➖
src.DataStream	44%	➖
src.FileList	67%	➖
src.Frame	44%	➖
src.HttpServer	42%	➖
src.ImageData	32%	❌
src.ImageFitter	83%	✔
src.ImageGenerators	43%	➖
src.ImageStats	81%	✔
src.Logger	37%	❌
src.Main	53%	➖
src.Region	71%	➖
src.Session	4%	❌
src.Table	52%	➖
src.ThreadingManager	67%	➖
src.Timer	85%	✔
src.Util	41%	➖
Summary	48% (9091 / 19126)	➖

markccchiang added 7 commits June 19, 2023 13:22

Add an option and allow the Frame to cache whole cube image data

c71cc85

Add an option to set reserved memory for Session

93279b7

Apply the cube image cache in cube histogram calculations

dc9bea9

Apply the cube image cache to get the cursor spectral profile

32db94c

Apply the cube image cache to get point region spectral profile

17805f5

Merge branch 'dev' of https://github.com/CARTAvis/carta-backend into …

b7fb274

…mark/opt_to_cache_image_cubes

Improve cube histogram calculations when using the whole cube image c…

3acbd1e

…ache

markccchiang added 17 commits July 1, 2023 23:48

Refactor testing codes and reduce the size of sample image files

2a28d9d

Apply cube image cache in region spectral profile calculations

4a1d916

Refactor testing codes

a21c40c

Minor code changes

1a68805

Solve merge conflicts

cf8b021

Parallelize statistical calculations for region spectral profiles

58d9c3b

Slightly refactor the code

3d9a896

Enable to get the cache of cube histogram data

606dd3e

Minor code changes and modify the testing code

d64dd65

Add a tester for the consistency of image pixel data

56bd96e

Apply cube image caches in rendering computed stokes pixel data

2db074c

Minor code changes

7d6e9d5

Apply cube image cache on region spectral profiles for computed stokes

ba83bdc

Merge branch 'dev' of https://github.com/CARTAvis/carta-backend into …

34063b9

…mark/opt_to_cache_image_cubes

Improve performances for computed stokes spectral profiles

7dbc599

Cache the whole image data only for non-HDF5 files or HDF5 without ti…

65f1407

…le cache and mip data

Change the unit of reserved memory from MB to GB

0bcbb4d

markccchiang added 16 commits April 18, 2024 13:21

Add the mutex of image cache size on ImageCache::GetImageCache and re…

f800d1c

…move it from constructors of FullImageCache and CubeImageCache

Remove the method ImageCache::GetBeamArea

42682b8

Rename the method AssignFullImageCacheSizeAvailable as SetFullImageCa…

3117e70

…cheSize from ImageCache

Rename the class ChannelImageCache as ChannelCache

5559770

Rename the class CubeImageCache as StokesCache

46e0a2b

Set current z and stokes on the Frame, and rename the function SetIma…

61330b9

…geChannels as UpdateValidity from ImageCache

Move back the method GetImageSlicer from ImageCache to Frame

f59eefc

Rename the method FillImageCache as UpdateChannelCache from Frame

91d48f1

Move the image cache mutex from Frame to ImageCache

06fa0ef

Remove the sub-string Cached from the name of ImageCache methods

468c22a

Remove redundant headers

1b47a4b

Define two versions of the GetImageData and ImageCacheAvailable metho…

69c070f

…ds from Frame

Remove a check in the GetValue method from ChannelCache

29f4a8e

Remove a check from FillCubeImageCache and GetValue methods from Stok…

2b75cf1

…esCache

Remove methods CheckCurrentZ, CheckCurrentStokes, IsCurrentChannel, a…

26574f1

…nd IsCurrentStokes from Frame

Fix the header and headerguards styles

2398469

confluence requested changes Apr 30, 2024

markccchiang added 11 commits May 6, 2024 14:01

Merge branch 'dev' of https://github.com/CARTAvis/carta-backend into …

701a170

…mark/opt_to_cache_image_cubes

Rename the method FillCubeImageCache as FillStokesCache from StokesCache

d115ea5

Rename the tester TestCubeImageCache as TestImageCaches

0d22e11

Use properties rather than getters inside the Frame class

93a3ad2

Remove the wrapper function UpdateChannelCache from the Frame

41d77ad

Remove a redundant check from StokesCache::ChannelDataAvailable

af10a9f

Remove checks for image cache available from Frame and refactor codes

9f0447f

Merge branch 'dev' of https://github.com/CARTAvis/carta-backend into …

d9c0c37

…mark/opt_to_cache_image_cubes

Remove a redundant lock and add a channel data available check on Sto…

7c177ea

…kesCache

Move the calling of updating channel cache from Frame to ImageCache

aa6ec75

Clarify the valid flags in ImageCache

d918036

markccchiang closed this May 27, 2024
