Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PickShiftedUDF #1635

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

sivborg
Copy link

@sivborg sivborg commented May 3, 2024

Contributor Checklist:

Reviewer Checklist:

  • /azp run libertem.libertem-data passed
  • No import of GPL code from MIT code

This is an addition of a new PickUDF that is able to shift the data that is retrieved. This is useful because it makes it possible to correct raw data for unintended shifts, for example descan which can obstruct electromagnetic signal in STEM-DPC. This functions as the previous PickUDF, but you can give corrective regression parameters indicating how much to shift the data. This can be found for example by using the CoMUDF:

from libertem import api
from libertem.udf.com import CoMUDF
from libertem.udf.raw import PickShiftedUDF

ctx = api.Context()

dataset = ctx.load('dask', dask_array=s.data, sig_dims=2)

comudf = CoMUDF.with_params(cy=128, cx=128, r=1000, regression=1)
comresult = ctx.run_udf(dataset=dataset, udf=comudf)

pickudf = PickShiftedUDF(regression_coefficients=comresult["regression"].data)
result = ctx.run_udf(udf=pickudf, dataset=dataset, roi=(10,10))

The way this is implemented is with scipy.ndimage.shift, done as a postprocessing step in the UDF. The shifts are found with the self.meta.coordinates attribute, which is passed around as a result buffer. So perhaps there is a cleaner way of doing this. The UDF seems to work well for my practical purposes and I have set up some accompanying tests.

Copy link

codecov bot commented May 3, 2024

Codecov Report

Attention: Patch coverage is 84.78261% with 7 lines in your changes are missing coverage. Please review.

Project coverage is 69.89%. Comparing base (59529c1) to head (791ce78).

❗ Current head 791ce78 differs from pull request most recent head d3dd4a6. Consider uploading reports for the commit d3dd4a6 to get more accurate results

Files Patch % Lines
src/libertem/udf/raw.py 84.78% 3 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1635      +/-   ##
==========================================
+ Coverage   69.85%   69.89%   +0.03%     
==========================================
  Files         325      325              
  Lines       27255    27299      +44     
  Branches     3139     3147       +8     
==========================================
+ Hits        19039    19080      +41     
  Misses       7647     7647              
- Partials      569      572       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@uellue uellue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good idea and a good start!

For changelog you can add a file in docs/source/changelog/features/. There are already two, you can just follow the pattern. :-)

Comment on lines +88 to +99
Parameters
----------
regression_coefficients : (3, 2) matrix
A vector of regression coefficients for x- and y-directions.
The regression coefficients make up two planes that describe
the descan and are used to remove it.
The coefficients are arranged in the following manner:
((y, x), (dy/dy, dx/dy), (dy/dx, dx/dx)) where y and x are
the shifts at coordinates (0, 0). The displayed image will be
shifted in x- and y-directions by these parameters. By default
`None`, which corresponds to no shifts.
'''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about supporting the same options as CoMUDF, or even use the CoMParams class:

class CoMParams(NamedTuple):
"""
Used to instantiate :class:`~libertem.udf.com.CoMUDF`.
See :meth:`~libertem.udf.com.CoMUDF.with_params`
for descriptions of each parameter. In most cases use this
method rather than instantiating this class directly.
"""
cy: Optional[float] = None
cx: Optional[float] = None
r: float = float('inf')
ri: Union[float, None] = 0.
scan_rotation: float = 0.
flip_y: bool = False
regression: RegressionOptionsT = RegressionOptions.NO_REGRESSION

scan_rotation and flip_y would apply perfectly for this use case and should be easy to implement with the scipy.ndimage functions.

In particular when dealing with DPC, it can help if in the picked frames everything points in the right directions relative to the scan! By using the same parameters, one can use them for both. As a side effect one could use this to create a corrected dataset! Could be merged with corresponding code in pyxem as a perspective?

Comment on lines +128 to +152
def get_results(self):

intensity = self.results.get_buffer('intensity').data

regression_coefficients = np.array(self.params.regression_coefficients)
# Only do shifts if nonzero regression coefficients
if regression_coefficients.any():

coordinates = self.results.get_buffer('coordinates').raw_data

prefix_ones = np.ones((*coordinates.shape[:-1], 1))
coordinates = np.concatenate((prefix_ones, coordinates), axis=-1)
shifts = np.dot(coordinates, self.params.regression_coefficients)

shifted = []
for intens, shift in zip(intensity, shifts):
shifted.append(scipy.ndimage.shift(intens, -shift, mode="constant"))

if shifted:
intensity = np.stack(shifted)

return {
"intensity": intensity,
"coordinates": self.results.get_buffer('coordinates').raw_data
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using process_frame() and doing all this there? It would be easier to read, and the processing would be performed on the workers, not on the main process.

Later one could look into using process_partition() or postprocess() to process larger chunks, which would be beneficial for using GPUs.

Comment on lines +47 to +90


def test_pick_shifted(lt_ctx, delayed_ctx):
data = _mk_random(size=(16, 16, 16, 16), dtype="float32")
# data = np.ones((16, 16, 16, 16), dtype="float32")
# data = np.arange(0, 16*16*16*16, dtype="float32").reshape((16, 16, 16, 16))
dataset = MemoryDataSet(data=data, tileshape=(3, 7, 16),
num_partitions=7, sig_dims=2)
roi = np.random.choice([True, False], size=dataset.shape.nav)
roi[0] = True

regression_coefficients = np.array([(3, 3), (0, 0), (0, 0)])

udf = PickShiftedUDF(regression_coefficients=regression_coefficients)
res = lt_ctx.run_udf(dataset=dataset, udf=udf, roi=roi)
res_delayed = delayed_ctx.run_udf(dataset=dataset, udf=udf, roi=roi)

assert np.allclose(data[roi][..., 3:, 3:], res['intensity'].data[..., :-3, :-3], 1e-15)
assert np.allclose(data[roi][..., 3:, 3:], res_delayed['intensity'].data[..., :-3, :-3], 1e-15)

assert data.dtype == res['intensity'].data.dtype
assert data.dtype == res_delayed['intensity'].data.dtype


def test_pick_shifted_empty_roi(lt_ctx, delayed_ctx):
data = _mk_random(size=(16, 16, 16, 16), dtype="float32")
dataset = MemoryDataSet(data=data, tileshape=(3, 7, 7),
num_partitions=7, sig_dims=2)
roi = np.zeros(dataset.shape.nav, dtype=bool)

regression_coefficients = np.array([(3, 3), (0, 0), (0, 0)])

udf = PickShiftedUDF(regression_coefficients=regression_coefficients)
res = lt_ctx.run_udf(dataset=dataset, udf=udf, roi=roi)
res_delayed = delayed_ctx.run_udf(dataset=dataset, udf=udf, roi=roi)

assert np.allclose(data[roi], res['intensity'].data)
assert np.allclose(data[roi], res_delayed['intensity'].data)

assert data[roi].shape == res['intensity'].data.shape
assert data[roi].shape == res_delayed['intensity'].data.shape

assert data.dtype == res['intensity'].data.dtype
assert data.dtype == res_delayed['intensity'].data.dtype
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regression coefficients should all be different to make sure we are not mixing them anywhere. In particular, the dy/dx ... coefficients require tests.

We should also confirm that the coefficients act the same as in CoMUDF, in particular if we support more parameters, as suggested above. A simple and comprehensive test would be comparing CoMUDF on an uncorrected dataset vs. CoMUDF on a corrected dataset.



class PickShiftedUDF(UDF):
'''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'''
'''
.. versionadded:: 0.15.0

Comment on lines 29 to 31
def __init__(self):
super().__init__()
def __init__(self, **kwargs):
super().__init__(**kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, why this change? The PickShiftedUDF is subclassed from UDF, so PickUDF doesn't need to pass parameters through. If one wanted a common subclass to share code, it would be cleaner to have a BasePickUDF with the common parts, and derive both PickUDF and PickShiftedUDF from that, each with a final constructor. That way the parameters are clear for autocompletion etc.

@uellue
Copy link
Member

uellue commented May 6, 2024

/azp run libertem.libertem-data

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sk1p sk1p added this to the 0.15 milestone May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants