OpenCL accelerate fails with confusing reason if policy prevents benchmark execution due to xc: coder #6917

Open
GreenReaper opened this issue Dec 4, 2023 · 1 comment

Comments

GreenReaper commented Dec 4, 2023

ImageMagick version

7.1.1-21 Q16 x86_64 21667

Operating system

Linux

Operating system, version and so on

Debian Bookworm, Linux 6.5.13, OpenCL 3.0 Intel 22.43.24595 ICD Loader 2.3.1 on UHD Graphics P630 (Xeon E-2274G)

Description

After finding out that our new server's GPU supported OpenCL, and that it appeared to effectively save one of four cores (and offer a ~20% speedup for image scaling), I was very excited.

However I found out with intel_gpu_top that the Render/3D cores were not being used in production once the system had been fully set up. Curious, I tried to run it manually and ran into this headscratcher:

convert: MagickCore/effect.c:781: BlurImage: Assertion `image != (const Image *) NULL' failed.

This was rather confusing, as I wasn't trying to blur the image, and further error messages would have been useful to help identify the root cause - a policy restriction impacting a component of the OpenCL benchmark.

Steps to Reproduce

Rather, the operation was along the lines of a scaling in linear colour space:

MAGICK_OCL_DEVICE=true convert PNG:test.png -gamma 0.454545454545455 -filter Lanczos2 -resize '920x920>' -gamma 2.2 -sampling-factor 1x1 -quality 95 -strip +profile '*' -depth 8 PNG:/tmp/screen.png

Eventually I found the very helpful -debug option, which helped me track down the blur in question: it runs as part of the OpenCL benchmark and uses xc:, assuming that it works:

...[loads test file successfully and looks for output file]...
0:00.741 2.380u 7.1.1 Accelerate convert[629080]: opencl.c/GetOpenCLCacheDirectory/373/Accelerate
  Using cache directory: "/home/greenreaper/.cache/ImageMagick"
0:00.766 2.490u 7.1.1 Accelerate convert[629080]: opencl.c/LoadOpenCLDevices/2435/Accelerate
  Found device: Intel(R) UHD Graphics P630 [0x3e96] (Intel(R) OpenCL HD Graphics)
0:00.766 2.490u 7.1.1 Accelerate convert[629080]: opencl.c/BenchmarkOpenCLDevices/1203/Accelerate
  Starting benchmark
0:00.766 2.490u 7.1.1 Policy convert[629080]: policy.c/IsRightsAuthorized/656/Policy
  Domain: Module; rights=Unrecognized; pattern="XC" ...
0:00.766 2.490u 7.1.1 Policy convert[629080]: policy.c/IsRightsAuthorized/656/Policy
  Domain: Coder; rights=Read; pattern="XC" ...
...[locale lookup follows, presumably for error message]...
0:00.768 2.490u 7.1.1 Exception convert[629080]: constitute.c/IsCoderAuthorized/454/Exception
  attempt to perform an operation not allowed by the security policy `XC'
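A minimal sketch of how a denial like the one above could be pulled out of a long debug log. The function name denied_patterns is hypothetical, and the event list in the comment (Accelerate,Policy,Exception) is simply the set of event types visible in the log above. It is an assumption that the debug output arrives on stderr, hence the 2>&1.

```shell
# Hypothetical helper: read ImageMagick debug output on stdin and print each
# pattern named in a "not allowed by the security policy" message, once.
denied_patterns() {
  sed -n "s/.*not allowed by the security policy \`\([^']*\)'.*/\1/p" | sort -u
}

# Sample input copied from the log above. With a real run this might look like
# (assumption: debug output goes to stderr):
#   convert -debug Accelerate,Policy,Exception ... 2>&1 | denied_patterns
denied_patterns <<'EOF'
0:00.766 2.490u 7.1.1 Accelerate convert[629080]: opencl.c/BenchmarkOpenCLDevices/1203/Accelerate
  Starting benchmark
0:00.768 2.490u 7.1.1 Exception convert[629080]: constitute.c/IsCoderAuthorized/454/Exception
  attempt to perform an operation not allowed by the security policy `XC'
EOF
# prints: XC
```

Filtering for the exception text rather than for the assertion message is deliberate: the assertion names BlurImage, while the exception names the coder that the policy actually blocked.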

The exception referred to our security policy, which consisted of the following:

<policymap>
  <policy domain="Undefined" rights="none"/>
  <policy domain="resource" name="temporary-path" value="/var/tmp"/>
  <policy domain="resource" name="memory" value="4GiB"/>
  <policy domain="resource" name="map" value="4GiB"/>
  <policy domain="resource" name="area" value="144MP"/>
  <policy domain="resource" name="disk" value="16GiB"/>
  <policy domain="delegate" rights="none" pattern="*" />
  <policy domain="coder" rights="none" pattern="*" />
  <policy domain="coder" rights="read|write" pattern="{GIF,JPEG,PNG,PNG00}" />
  <policy domain="path" rights="none" pattern="@*" />
  <policy domain="path" rights="none" pattern="|*" />
</policymap>

(It is possible there are other deficiencies in this policy, which was established in a hurry after seeing what happened to other sites a few years ago, along with compiling our own build to reduce surface area - I just include it because it led to the issue.)
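A sketch of one way to experiment with the coder rules above without editing the installed policy: write a trimmed copy into a scratch directory, add the read-only XC entry that is described later in this report, and point MAGICK_CONFIGURE_PATH at it. The variable name policy_dir and the trimmed rule set are illustrative only. Whether a policy in this directory takes precedence over a system-wide one depends on the build and search order, so treat that as an assumption and check the output of -list policy.

```shell
# Scratch directory for a test policy (hypothetical location).
policy_dir=$(mktemp -d)

# Trimmed copy of the coder rules from the policy above, plus read-only XC.
cat > "$policy_dir/policy.xml" <<'EOF'
<policymap>
  <policy domain="coder" rights="none" pattern="*" />
  <policy domain="coder" rights="read|write" pattern="{GIF,JPEG,PNG,PNG00}" />
  <!-- read-only, so the OpenCL benchmark can load xc: -->
  <policy domain="coder" rights="read" pattern="{XC}" />
</policymap>
EOF

# Show which policies ImageMagick reports, if it is installed on this machine.
if command -v convert >/dev/null 2>&1; then
  MAGICK_CONFIGURE_PATH="$policy_dir" convert -list policy || true
fi
```

Granting only read rights keeps xc: unavailable as an output format while still letting the benchmark create its test image.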

I am not sure if this problem would also apply to the currently-suggested module-based web server security policy, or websafe policy, i.e.

<policy domain="delegate" rights="none" pattern="*" />
<policy domain="module" rights="none" pattern="*" />
<policy domain="module" rights="read | write" pattern="{GIF,JPEG,PNG,WEBP}" />

XC is not mentioned in the OpenCL requirements. We probably didn't notice this earlier because the home directory for www-data was unset (it was in the render group to access the device node) so it would have failed even earlier on with something like:

0:00.871 2.440u 7.1.1 Accelerate convert[772933]: opencl.c/GetOpenCLCacheDirectory/369/Accelerate
  Cannot use cache directory: "/root/ImageMagick"

This can be duplicated with e.g. MAGICK_OPENCL_CACHE_DIR=/root MAGICK_OCL_DEVICE=true on a regular user. It's a bit insidious because the convert then works, it's just not accelerated. Perhaps this is better than not working, though.
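Because an unusable cache directory only shows up in the debug log, a pre-flight check can catch it before deployment. The sketch below assumes, based on the two log lines above, that ImageMagick appends an ImageMagick subdirectory to the configured base. The function name opencl_cache_ok is hypothetical, and as a side effect it creates that subdirectory if it can.

```shell
# Hypothetical pre-flight check: succeed only if an "ImageMagick" subdirectory
# can be created (or already exists) and is writable under the given base.
opencl_cache_ok() {
  cache_dir="$1/ImageMagick"
  mkdir -p "$cache_dir" 2>/dev/null && [ -w "$cache_dir" ]
}

# Example: pick a scratch base and export it only if it is usable.
candidate=$(mktemp -d)
if opencl_cache_ok "$candidate"; then
  export MAGICK_OPENCL_CACHE_DIR="$candidate"
  echo "cache base usable: $candidate"
else
  echo "cache base not usable: $candidate" >&2
fi
```

The check has to run as the account that will run convert (www-data in our case), since the result depends on that account's permissions.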

After defining a cache directory and adding <policy domain="coder" rights="read" pattern="{XC}" /> to the policy, we got a better experience:

$ time MAGICK_OCL_DEVICE=GPU convert -verbose PNG:test.png -gamma 0.454545454545455 -filter Lanczos2 -resize '920x920>' -gamma 2.2 -sampling-factor 1x1 -quality 95 -strip +profile '*' -depth 8 PNG:/tmp/screen2.png
PNG:test.png=>test.png PNG 2600x3120 2600x3120+0+0 8-bit TrueColor sRGB 1.55623MiB 0.260u 0:00.272
PNG:test.png=>/tmp/screen2.png PNG 2600x3120=>767x920 767x920+0+0 8-bit sRGB 1.55623MiB 1.540u 0:00.423

real    0m1.335s
user    0m4.457s
sys     0m0.118s

$ time MAGICK_OCL_DEVICE=CPU convert -verbose PNG:test.png -gamma 0.454545454545455 -filter Lanczos2 -resize '920x920>' -gamma 2.2 -sampling-factor 1x1 -quality 95 -strip +profile '*' -depth 8 PNG:/tmp/screen2.png
PNG:test.png=>test.png PNG 2600x3120 2600x3120+0+0 8-bit TrueColor sRGB 1.55623MiB 0.240u 0:00.258
PNG:test.png=>/tmp/screen2.png PNG 2600x3120=>767x920 767x920+0+0 8-bit sRGB 1.55623MiB 1.100u 0:00.326

real    0m1.384s
user    0m5.500s
sys     0m0.069s

$ time convert -verbose PNG:test.png -gamma 0.454545454545455 -filter Lanczos2 -resize '920x920>' -gamma 2.2 -sampling-factor 1x1 -quality 95 -strip +profile '*' -depth 8 PNG:/tmp/screen2.png
PNG:test.png=>test.png PNG 2600x3120 2600x3120+0+0 8-bit TrueColor sRGB 1.55623MiB 0.260u 0:00.263
PNG:test.png=>/tmp/screen2.png PNG 2600x3120=>767x920 767x920+0+0 8-bit sRGB 1.55623MiB 1.020u 0:00.323

real    0m1.399s
user    0m5.515s
sys     0m0.069s

The run time here is dominated by PNG compression; the saving in CPU time is even more noticeable with a 15300x8925 JPG.

$ time MAGICK_OCL_DEVICE=GPU convert JPEG:test.jpg -gamma 0.454545454545455 -filter Lanczos2 -resize '920x920>' -gamma 2.2 -sampling-factor 1x1 -quality 95 -strip +profile '*' -depth 8 JPEG:screen.jpg

real    0m5.233s
user    0m10.870s
sys     0m0.245s

$ time convert JPEG:test.jpg -gamma 0.454545454545455 -filter Lanczos2 -resize '920x920>' -gamma 2.2 -sampling-factor 1x1 -quality 95 -strip +profile '*' -depth 8 JPEG:screen.jpg

real    0m6.549s
user    0m28.351s
sys     0m0.225s
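To put the two JPEG runs above in relative terms, the percentage saved can be worked out from the real and user figures. This is plain arithmetic on the numbers already shown; the function name pct_saved is hypothetical.

```shell
# Percentage of time saved when going from a baseline to a new measurement.
# Both arguments are in seconds.
pct_saved() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (1 - new / base) * 100 }'
}

pct_saved 6.549 5.233     # wall-clock (real): prints 20.1
pct_saved 28.351 10.870   # CPU time (user):   prints 61.7
```

So on this file the wall-clock gain is about a fifth, while the CPU time spent drops by well over half.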

FWIW, a JPG of 16000x9000 was not accelerated and it was not clear why, although looking at the source there are many potential exit points. Debug just showed that it opened and closed a pixel cache, where normally it would have had this in the middle:

Accelerate convert[1754212]: opencl.c/AcquireOpenCLKernel/726/Accelerate  Using kernel: ResizeHorizontalFilter
Accelerate convert[1754212]: opencl.c/AcquireOpenCLKernel/726/Accelerate  Using kernel: ResizeVerticalFilter

before doing it the regular way. We have few files that size, and it worked on 15300x8925 and 10963x12350, so not really an issue. The iGPU's own limit on JPEG output is 16k so perhaps it's running into that. Anyway, just mentioning that in passing - the issue here is that Accelerate failed in a confusing way, and from the error message it wasn't clear that it was OpenCL-related, let alone policy. Possibly this is because OpenCL was added before security policies became popular, so xc: would very rarely be restricted.
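Since an unaccelerated run still succeeds, the only sign is the absence of the "Using kernel" lines quoted above. A sketch of a check for them follows. The function name used_kernels is hypothetical, and it only looks for the text shown in this report, so it says nothing about why a kernel was not used.

```shell
# Hypothetical helper: read ImageMagick debug output on stdin and print the
# name of each OpenCL kernel reported as used, once.
used_kernels() {
  sed -n 's/.*Using kernel: \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# Sample input copied from the two log lines above.
used_kernels <<'EOF'
Accelerate convert[1754212]: opencl.c/AcquireOpenCLKernel/726/Accelerate  Using kernel: ResizeHorizontalFilter
Accelerate convert[1754212]: opencl.c/AcquireOpenCLKernel/726/Accelerate  Using kernel: ResizeVerticalFilter
EOF
# prints:
# ResizeHorizontalFilter
# ResizeVerticalFilter
```

An empty result from a run where acceleration was expected is the case worth alerting on, for example in a deployment smoke test.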

GreenReaper changed the title from "OpenCL accelerate fails with confusing reason if policy prevents benchmark execution due to xc: delegate" to "OpenCL accelerate fails with confusing reason if policy prevents benchmark execution due to xc: coder" on Dec 4, 2023
@urban-warrior
Member

Thank you for reporting the issue. We have successfully reproduced it and are actively working on a patch to resolve it. You can expect this patch to be merged into the main Git branch later today. As part of our commitment to quality, this fix will also be included in the upcoming beta releases of ImageMagick by tomorrow. Your patience and feedback are greatly appreciated.
