Provide parity for both escC and unescC to support more awkward filenames #220

Earnestly · 2023-08-28T23:30:00Z

I have been experimenting with awkward filenames, using -@ and #[CSTR] to support them. When my filename contains, for example, \b and \f, and I attempt to store it in the xmp-dc:source tag while creating a MIE file, the backspace and form feed are converted into dots (.), making it impossible to recover the original filename.

Looking at the codebase, I notice that there is support for a much wider range of typical C-style escapes when unescaping than when escaping:

```perl
# lookup for C-style escape sequences
my %escC = ( "\n" => '\n', "\r" => '\r', "\t" => '\t', '\\' => '\\\\');
my %unescC = ( a => "\a", b => "\b", f => "\f", n => "\n", r => "\r",
               t => "\t", 0 => "\0", '\\' => '\\' );
```
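To make the asymmetry concrete, here is a minimal sketch that applies the two tables exactly as quoted. The helpers `escape_with_table` and `unescape_with_table` are hypothetical illustrations, not ExifTool functions. The unescaper understands `\b`, but the escaper leaves a backspace as a raw byte, so presumably whatever later masks non-printable bytes with `.` has nothing protecting it.

```perl
use strict;
use warnings;

# The two tables exactly as quoted from the codebase.
my %escC   = ( "\n" => '\n', "\r" => '\r', "\t" => '\t', '\\' => '\\\\' );
my %unescC = ( a => "\a", b => "\b", f => "\f", n => "\n", r => "\r",
               t => "\t", 0 => "\0", '\\' => '\\' );

# Hypothetical helper: escape only the characters %escC has entries for.
sub escape_with_table {
    my ($str) = @_;
    $str =~ s/([\n\r\t\\])/$escC{$1}/g;
    return $str;
}

# Hypothetical helper: expand backslash sequences that %unescC knows about;
# unknown sequences are kept verbatim.
sub unescape_with_table {
    my ($str) = @_;
    $str =~ s/\\(.)/exists $unescC{$1} ? $unescC{$1} : "\\$1"/gse;
    return $str;
}

# The tab is turned into the two characters \t, but the backspace (0x08)
# survives escaping as a raw control byte.
my $name    = "a\bb\tc";
my $escaped = escape_with_table($name);
print join(' ', map { sprintf('%02x', ord($_)) } split //, $escaped), "\n";
```

Running it prints the escaped string as hex bytes, where the untouched `08` sits next to the `5c 74` (`\t`) produced for the tab.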

Is there a reason why escC couldn't be brought to parity with unescC (or even derived from it)?

I.e.

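```perl
# lookup for C-style escape sequences
my %unescC = ( a => "\a", b => "\b", f => "\f", n => "\n", r => "\r",
               t => "\t", 0 => "\0", '\\' => '\\' );
# a plain `reverse %unescC` would map "\b" to 'b', so prefix the backslash
my %escC = map { ($unescC{$_} => "\\$_") } keys %unescC;
```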
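Note that `reverse %unescC` on its own yields pairs like `"\b" => 'b'` (the bare letter), not the escape sequence `'\b'`. A minimal sketch of deriving the escape table so that it is a true inverse, with a check that every derived sequence decodes back to its original byte:

```perl
use strict;
use warnings;

my %unescC = ( a => "\a", b => "\b", f => "\f", n => "\n", r => "\r",
               t => "\t", 0 => "\0", '\\' => '\\' );

# Build byte => escape-sequence pairs. The inner parentheses make Perl
# parse the braces as a BLOCK rather than an anonymous hash.
my %escC = map { ($unescC{$_} => "\\$_") } keys %unescC;

# Every escape sequence should decode back to the byte it came from.
for my $char (keys %escC) {
    my $letter = substr($escC{$char}, 1);    # e.g. 'b' from '\b'
    die "mismatch for byte " . ord($char) unless $unescC{$letter} eq $char;
}
print scalar(keys %escC), " escapes derived\n";
```

Deriving one table from the other keeps the two from drifting apart when a new escape is added.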

However, just adding more mappings to escC or editing unescapeChar doesn't seem to be enough: my local tests still show characters without a recognised escape being turned into dots.


StarGeekSpaceNerd · 2023-08-28T23:37:28Z

Try the -b (-binary) option. From the docs:

> This option is mainly used for extracting embedded images or other binary data, but it may also be useful for some text strings since control characters (such as newlines) are not replaced by '.' as they are in the default output.

Anything else will require Phil's attention, but he is currently away until mid-September.

Earnestly · 2023-08-28T23:46:22Z

It seems -b doesn't help with this. A simple demonstration (using bash for its $'...' quoting feature):

```
$ printf content\\n > $'\a\f\n\b\r\t\e\v\\\"'

$ exiftool -tagsfromfile $'\a\f\n\b\r\t\e\v\\\"' -o test.mie -xmp-dc:source'<${filename}'
    1 image files created

$ exiftool -p '${source}' test.mie | od -An -tc
   .   .  \n   .  \r  \t   .   .   \   "  \n
```

The expected result would have been more like:

```
   \a  \f  \n  \b  \r  \t 033  \v   \   "  \n
```

I threw in the \033 (or \e) escape to see how it might handle arbitrary bytes, since a filename can contain any byte except NUL and /.
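One way arbitrary bytes could be handled is a numeric fallback: use a named escape where the table has one, and `\xHH` (exactly two hex digits) for any other control byte. This is a minimal sketch under that assumption; `escape_c`, `unescape_c` and the `\xHH` choice are hypothetical, not ExifTool's behaviour. Fixed-width hex avoids the ambiguity a variable-length octal form would have after `\0`. Perl has no `"\v"` escape, so the vertical tab is written as `\x0b`.

```perl
use strict;
use warnings;

my %unescC = ( a => "\a", b => "\b", f => "\f", n => "\n", r => "\r",
               t => "\t", 0 => "\0", '\\' => '\\' );
my %escC = map { ($unescC{$_} => "\\$_") } keys %unescC;

# Named escape if the table has one, otherwise \xHH (e.g. ESC -> \x1b).
sub escape_char {
    my ($c) = @_;
    return $escC{$c} if exists $escC{$c};
    return sprintf '\\x%02x', ord $c;
}

# Hypothetical escaper: every control byte (0x00-0x1F, 0x7F) and the
# backslash itself are escaped; everything else passes through.
sub escape_c {
    my ($str) = @_;
    $str =~ s/([\x00-\x1F\x7F\\])/escape_char($1)/ge;
    return $str;
}

# Single-character escapes from %unescC; unknown ones are kept verbatim.
sub unescape_char {
    my ($c) = @_;
    return exists $unescC{$c} ? $unescC{$c} : "\\$c";
}

# Hypothetical unescaper: \xHH first, then a single-character escape.
sub unescape_c {
    my ($str) = @_;
    $str =~ s/\\(?:x([0-9A-Fa-f]{2})|(.))/defined $1 ? chr(hex $1) : unescape_char($2)/gse;
    return $str;
}

# Same bytes as the bash $'\a\f\n\b\r\t\e\v\\\"' filename in the demo.
my $name    = "\a\f\n\b\r\t\e\x0b\\\"";
my $escaped = escape_c($name);
print "$escaped\n";    # prints \a\f\n\b\r\t\x1b\x0b\\"
print unescape_c($escaped) eq $name ? "round trip ok\n" : "round trip FAILED\n";
```

Because the backslash is always escaped, the escaped form never contains an unknown sequence, so decoding is unambiguous.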

PS: I'll look into applying https://exiftool.org/faq.html#Q21 to see if that helps.

boardhead · 2023-09-19T13:16:50Z

This topic must be handled carefully because code injection from malicious file names is a real possibility. At the moment, ExifTool doesn't do more than necessary because this is the safest way to proceed. I would have to dedicate a good block of time to expanding this to cover all possible characters/escapes, and without a real-life use case I don't know if this would be a worthwhile way to spend my time. Your tests seem to be theoretical -- have you seen file names like this in the wild?

Earnestly · 2023-09-19T20:04:47Z

Rarely, but I do try to write software that handles data types as they are (on Unix, at least, a filename is defined as any sequence of bytes except / and NUL). As a compromise, I currently include a check that excludes filenames containing most of these problem characters.

As prior art, ImageMagick also interprets filenames but provides the -define filename:literal=true option to disable the feature. Does ExifTool differ here in that it doesn't store the bytes literally?

(I don't really mind if it can't print them nicely using C-style escapes, but I would want the bytes that go in to be the same as the bytes that come out, even if they have to pass through an encode/decode layer, e.g. to and from XML entities. As far as possible, anyway.)
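Since xmp-dc:source is stored as XMP, i.e. XML, one caveat applies to the XML-entity route. XML 1.0 only permits TAB, LF and CR below 0x20, and its well-formedness rules reject character references to the other C0 controls too (`&#x7;` is not allowed). A minimal sketch, assuming the XML 1.0 `Char` production and restricted to ASCII, that lists which filename bytes would need some other encoding layer before reaching XML. `xml10_char_ok` is a hypothetical helper.

```perl
use strict;
use warnings;

# XML 1.0 "Char" production restricted to ASCII: TAB, LF, CR and 0x20-0x7F
# are allowed; other C0 controls are not, even as numeric references.
sub xml10_char_ok {
    my ($ord) = @_;
    return $ord == 0x09 || $ord == 0x0A || $ord == 0x0D
        || ($ord >= 0x20 && $ord <= 0x7F);
}

# Bytes a Unix filename may contain: anything except NUL and '/'.
# Only 0x01-0x7F is checked; bytes >= 0x80 raise separate encoding questions.
my @filename_bytes    = grep { $_ != ord('/') } 0x01 .. 0x7F;
my @needs_extra_layer = grep { !xml10_char_ok($_) } @filename_bytes;

printf "%d ASCII filename bytes cannot be carried by XML 1.0 references:\n",
    scalar @needs_extra_layer;
print join(' ', map { sprintf('0x%02x', $_) } @needs_extra_layer), "\n";
```

Those bytes include \a, \b, \f, \v and \e from the demo filename, which is one argument for a C-style (or similar) escape layer on top of XML rather than XML entities alone.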
