Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gdImageXbmCtx is still locale-dependent #798

Open
TysonAndre opened this issue Dec 22, 2021 · 13 comments
Open

gdImageXbmCtx is still locale-dependent #798

TysonAndre opened this issue Dec 22, 2021 · 13 comments

Comments

@TysonAndre
Copy link

Describe the bug

			/* only in C-locale isalnum() would work */
			if (!isupper(name[i]) && !islower(name[i]) && !isdigit(name[i])) {
				name[i] = '_';
			}

The functions isupper and islower are locale-dependent. In the locale "de_DE" (ISO-8859-1, not UTF-8) and other locales, various bytes return true for islower. This would be incorrect for utf-8 where characters span multiple bytes.

aa: ª
b5: µ
ba: º
df: ß
e0: à
e1: á
e2: â
....

Expected behavior
Expected: \xaa replaced with _ after LC_CTYPE=de_DE on a system with that locale installed

Actual results
Observed(expected): That byte is not replaced.

Fix: Check for the specific characters instead of using islower/isupper/isdigit?

Environment (please complete the following information):

  • OS: Linux Mint
  • sudo locale-gen de_DE; sudo locale-gen de_DE.UTF-8

Additional information
https://www.cplusplus.com/reference/cctype/islower/

Notice that what is considered a letter may depend on the locale being used; In the default "C" locale, a lowercase letter is any of: a b c d e f g h i j k l m n o p q r s t u v w x y z.

Other locales may consider a different selection of characters as lowercase characters, but never characters that returns true for iscntrl, isdigit, ispunct or isspace.

Noticed while working on php/php-src#7802

@cmb69
Copy link
Contributor

cmb69 commented Dec 22, 2021

Thanks for reporting!

I don't think we support EBCDIC, in which case it should be trivial to fix this.

@vapier
Copy link
Member

vapier commented Dec 22, 2021

irony is that the buggy code came from PHP

i guess the point of the logic is to try and create an identifier that is valid in X BitMap. that is based on C, but it's unclear which standard, so guessing the fact that newer C accepts Unicode names won't help.

do we really need to output the name here in the first place ? what if we just hardcoded it to gd_xbm_generated ?

@cmb69
Copy link
Contributor

cmb69 commented Dec 22, 2021

what if we just hardcoded it to gd_xbm_generated ?

If somebody is really using this for including multiple XBMs in some C compilation unit, that won't work. On the other hand, I wonder whether anybody uses XBM at all nowadays.

@vapier
Copy link
Member

vapier commented Dec 22, 2021

If somebody is really using this for including multiple XBMs in some C compilation unit, that won't work.

that's already not working:

  • if people pass /foo/bar.xbm and /foo/foo/bar.xbm. the func does a basename & chops the extension.
  • if they pass bl ah.xbm and bl?ah.xbm, the normalization logic turns those both into bl_ah.
  • it has fallback logic to hardcode image in some cases. if you use /foo/.xbm, that is turned into image.

our choices:

  • hardcode the name and wait for someone to complain.
  • push this back on the user. change file_name to XBM identifier and remove the text Illegal characters are automatically stripped..

@cmb69
Copy link
Contributor

cmb69 commented Dec 22, 2021

Oh, right, we never use that as filename, so I'd go with the latter (rename parameter, and don't mangle it anymore).

@vapier
Copy link
Member

vapier commented Dec 22, 2021

@TysonAndre feel like sending a PR for that idea ? :)

@vapier vapier added this to the GD 2.4 milestone Dec 22, 2021
@pierrejoye
Copy link
Contributor

@TysonAndre good catch :)

If API remains compatible (API and ABI), it can be done in 2.x, if not it has to wait 3.0

@pierrejoye
Copy link
Contributor

pierrejoye commented Dec 23, 2021

Also we could manually check for non ascii and replaced with _ if not ascii. The name of the generated code (the #define) has to be ascii anyway for those using it in X apps. Nobody I know uses XBM for anything else. And going to the troubles of allowing the filename to be different from the define is not worth it, and I am not sure all compilers support random filenames for source codes either :)

@vapier
Copy link
Member

vapier commented Dec 23, 2021

we wouldn't really be breaking any API calls that aren't already broken

@pierrejoye
Copy link
Contributor

@vapier I don't see how the api is broken but the unclear behavior for the name argument.

What would you suggest? I am not sure to grasp the comments here about changing the name and the likes :)

@cmb69
Copy link
Contributor

cmb69 commented Dec 23, 2021

BGD_DECLARE(void) gdImageXbmCtx(gdImagePtr image, char* file_name, int fg, gdIOCtx * out)

The parameter is called file_name although it is never used as filename, but is rather used as identifier (prefix) for the generated C code. Renaming the parameter, and simultaniously dropping any name mangling, would solve the issue, but would be a BC break.

@pierrejoye
Copy link
Contributor

ah that correct. It was because the convention meant to have the same filename as the define.

I am fine dropping the mangling. We could also simply rejecting non ASCII letters, underscore and digit first (what the C allows) , that can be considered as a bug fix as technically the file would be invalid.

More work but fully compatible, mangle all non allowed characters using our own functions and update the documentation.

I do favor the rejecting solution, less magic for the users.

@pierrejoye
Copy link
Contributor

pierrejoye commented Dec 24, 2021

exactly

I tend to:

  • remove the mangling
  • reject invalid non a-z A-Z 0-9 and _ chars
  • update docs to make the specs and behavior clear

thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants