gdImageXbmCtx is still locale-dependent #798

TysonAndre · 2021-12-22T14:04:54Z

Describe the bug

			/* only in C-locale isalnum() would work */
			if (!isupper(name[i]) && !islower(name[i]) && !isdigit(name[i])) {
				name[i] = '_';
			}

The functions isupper and islower are locale-dependent. In the locale "de_DE" (ISO-8859-1, not UTF-8) and other locales, various bytes return true for islower. This would be incorrect for utf-8 where characters span multiple bytes.

aa: ª
b5: µ
ba: º
df: ß
e0: à
e1: á
e2: â
....

Expected behavior
Expected: \xaa replaced with _ after LC_CTYPE=de_DE on a system with that locale installed

Actual results
Observed(expected): That byte is not replaced.

Fix: Check for the specific characters instead of using islower/isupper/isdigit?

Environment (please complete the following information):

OS: Linux Mint
sudo locale-gen de_DE; sudo locale-gen de_DE.UTF-8

Additional information
https://www.cplusplus.com/reference/cctype/islower/

Notice that what is considered a letter may depend on the locale being used; In the default "C" locale, a lowercase letter is any of: a b c d e f g h i j k l m n o p q r s t u v w x y z.

Other locales may consider a different selection of characters as lowercase characters, but never characters that returns true for iscntrl, isdigit, ispunct or isspace.

Noticed while working on php/php-src#7802

The text was updated successfully, but these errors were encountered:

cmb69 · 2021-12-22T14:38:11Z

Thanks for reporting!

I don't think we support EBCDIC, in which case it should be trivial to fix this.

vapier · 2021-12-22T16:01:26Z

irony is that the buggy code came from PHP

i guess the point of the logic is to try and create an identifier that is valid in X BitMap. that is based on C, but it's unclear which standard, so guessing the fact that newer C accepts Unicode names won't help.

do we really need to output the name here in the first place ? what if we just hardcoded it to gd_xbm_generated ?

cmb69 · 2021-12-22T16:10:57Z

what if we just hardcoded it to gd_xbm_generated ?

If somebody is really using this for including multiple XBMs in some C compilation unit, that won't work. On the other hand, I wonder whether anybody uses XBM at all nowadays.

vapier · 2021-12-22T16:17:25Z

If somebody is really using this for including multiple XBMs in some C compilation unit, that won't work.

that's already not working:

if people pass /foo/bar.xbm and /foo/foo/bar.xbm. the func does a basename & chops the extension.
if they pass bl ah.xbm and bl?ah.xbm, the normalization logic turns those both into bl_ah.
it has fallback logic to hardcode image in some cases. if you use /foo/.xbm, that is turned into image.

our choices:

hardcode the name and wait for someone to complain.
push this back on the user. change file_name to XBM identifier and remove the text Illegal characters are automatically stripped..

cmb69 · 2021-12-22T16:33:17Z

Oh, right, we never use that as filename, so I'd go with the latter (rename parameter, and don't mangle it anymore).

vapier · 2021-12-22T17:12:49Z

@TysonAndre feel like sending a PR for that idea ? :)

pierrejoye · 2021-12-23T04:26:39Z

@TysonAndre good catch :)

If API remains compatible (API and ABI), it can be done in 2.x, if not it has to wait 3.0

pierrejoye · 2021-12-23T04:29:32Z

Also we could manually check for non ascii and replaced with _ if not ascii. The name of the generated code (the #define) has to be ascii anyway for those using it in X apps. Nobody I know uses XBM for anything else. And going to the troubles of allowing the filename to be different from the define is not worth it, and I am not sure all compilers support random filenames for source codes either :)

vapier · 2021-12-23T05:44:38Z

we wouldn't really be breaking any API calls that aren't already broken

pierrejoye · 2021-12-23T05:59:52Z

@vapier I don't see how the api is broken but the unclear behavior for the name argument.

What would you suggest? I am not sure to grasp the comments here about changing the name and the likes :)

cmb69 · 2021-12-23T11:01:11Z

libgd/src/gd_xbm.c

Line 225 in 0f9dd96

    
           BGD_DECLARE(void) gdImageXbmCtx(gdImagePtr image, char* file_name, int fg, gdIOCtx * out)

The parameter is called file_name although it is never used as filename, but is rather used as identifier (prefix) for the generated C code. Renaming the parameter, and simultaniously dropping any name mangling, would solve the issue, but would be a BC break.

pierrejoye · 2021-12-23T11:42:16Z

ah that correct. It was because the convention meant to have the same filename as the define.

I am fine dropping the mangling. We could also simply rejecting non ASCII letters, underscore and digit first (what the C allows) , that can be considered as a bug fix as technically the file would be invalid.

More work but fully compatible, mangle all non allowed characters using our own functions and update the documentation.

I do favor the rejecting solution, less magic for the users.

pierrejoye · 2021-12-24T04:45:41Z

exactly

I tend to:

remove the mangling
reject invalid non a-z A-Z 0-9 and _ chars
update docs to make the specs and behavior clear

thoughts?

vapier added bug documentation labels Dec 22, 2021

vapier added this to the GD 2.4 milestone Dec 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gdImageXbmCtx is still locale-dependent #798

gdImageXbmCtx is still locale-dependent #798

TysonAndre commented Dec 22, 2021

cmb69 commented Dec 22, 2021

vapier commented Dec 22, 2021

cmb69 commented Dec 22, 2021

vapier commented Dec 22, 2021

cmb69 commented Dec 22, 2021

vapier commented Dec 22, 2021

pierrejoye commented Dec 23, 2021

pierrejoye commented Dec 23, 2021 •

edited

vapier commented Dec 23, 2021

pierrejoye commented Dec 23, 2021

cmb69 commented Dec 23, 2021

pierrejoye commented Dec 23, 2021

pierrejoye commented Dec 24, 2021 •

edited

gdImageXbmCtx is still locale-dependent #798

gdImageXbmCtx is still locale-dependent #798

Comments

TysonAndre commented Dec 22, 2021

cmb69 commented Dec 22, 2021

vapier commented Dec 22, 2021

cmb69 commented Dec 22, 2021

vapier commented Dec 22, 2021

cmb69 commented Dec 22, 2021

vapier commented Dec 22, 2021

pierrejoye commented Dec 23, 2021

pierrejoye commented Dec 23, 2021 • edited

vapier commented Dec 23, 2021

pierrejoye commented Dec 23, 2021

cmb69 commented Dec 23, 2021

pierrejoye commented Dec 23, 2021

pierrejoye commented Dec 24, 2021 • edited

pierrejoye commented Dec 23, 2021 •

edited

pierrejoye commented Dec 24, 2021 •

edited