Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug or not, when using xmltoolkit/xmlservice with invalid characters in IBM i database field #178

Open
ChriMech opened this issue Jul 22, 2022 · 11 comments

Comments

@ChriMech
Copy link

I am using php xmltoolkit with xmlservice to fetch text from an ibm i database file through a call to an RPG programm. Everything works fine, unless the character database field contains "invalid" characters below EBCDIC x'40' (i.e x'3F'). This surely should not be, but somehow the database has thousands of records with one or more of this non-text characters within a text field, which should include an article description.

When the RPG Program call fetches such a text field through an RPG program parameter, i get this :
Warning: simplexml_load_string(): Entity: line 36: parser error : CData section not finished Articl in /QOpenSys/pkgs/share/zendphp/libraries/xmltoolkit/1.8.5/ToolkitServiceXML.php on line 807
Warning: simplexml_load_string(): <data var='TEXT' type='50a' ><![CDATA[Article �« description]]></data> in /QOpenSys/pkgs/share/zendphp/libraries/xmltoolkit/1.8.5/ToolkitServiceXML.php on line 807
Warning: simplexml_load_string(): ^ in /QOpenSys/pkgs/share/zendphp/libraries/xmltoolkit/1.8.5/ToolkitServiceXML.php on line 807
Warning: simplexml_load_string(): Entity: line 36: parser error : PCDATA invalid Char value 26 in /QOpenSys/pkgs/share/zendphp/libraries/xmltoolkit/1.8.5/ToolkitServiceXML.php on line 807

The invalid characters show as �«

My question: Is this a bug in xmlservice or ToolkitServiceXML.php or do you have any suggestions, on how i should deal with the situation?
Thank you very much,
Christian

@alanseiden
Copy link
Collaborator

@ChriMech You could specify the binary data type, which returns data as a hex string. In your PHP code, you could then convert the hex characters into something usable.

static function AddParameterBin($io, $size , $comment, $varName = '', $value='',$dimension =0)

$params[] = $conn->AddParameterBin('out', 50, 'TEXT', 'mytext', '');

Try this and let us know if you receive hex characters that you can work with.

@alanseiden
Copy link
Collaborator

alanseiden commented Jul 22, 2022

If you are curious what XMLSERVICE does for binary type, here is where some of the relevant XMLSERVICE code for the binary type (return as hex) is done:
https://github.com/IBM/xmlservice/blob/20800d6be4aa4748b6eb226896c899c9c7e752ab/src/plugconv.rpgle#L1208
It shows the example node:

<data type='5b'>F0F1F2CDEF</data>

@ChriMech
Copy link
Author

ChriMech commented Aug 4, 2022

Thank you, but to be honest, I had something different in my mind, i.e. a suggestion for a future improvement of XMLSERVICE.

Sometimes by mistake, text within character based Database fields (A-Z, etc.) contain some Bytes with EBCDIC Codes below x'40'. This is often caused by improper data import from non IBM i data sources to IBM i tables, when character conversion from UTF-8 to EBCDIC CCSID 1141 is not done correctly with special german or european characters like ä, ö, ü.

In many cases this is not a big problem, because most IBM i related functions substitute “bad” characters with a replacement symbol (mainly a solid block or a square). This is true within all 5250 greenscreen applications, but also when using SQL to show table contents, or when using SQL from within PHP scripts. Even the remote program call APIs included in IBM i Client Access perform this substitution automatically, not to mention the old Net.Data scripting language on IBM i.

Only XMLSERVICE does nothing like this, but produces invalid xml when such bad characters are encountered. This invalid xml eventually results in fatal errors within the toolkit.

So my proposal for improvement of XMLSERVICE is:
When preparing the XML to output data for a character typed program parameter, check to make sure the XML will be valid. If needed, replace invalid characters with a substitution (maybe even a blank). I don’t know whether this check should be done before or after translation from IBM i to webserver character set. Probably the easiest will be to modify the translation table to achieve this, but I am not the expert in this.

@alanseiden
Copy link
Collaborator

@ChriMech No guarantees, but would these options to ignore errors help on the PHP side? Perhaps you could test your XML example with this command:
$xml = simplexml_load_string('your XML string here', "SimpleXMLElement", LIBXML_NOERROR | LIBXML_ERR_NONE);

@alanseiden
Copy link
Collaborator

@ChriMech Have you had a chance to test a "bad" XML string with the command above? If it helps, we could incorporate these options into the toolkit. Thank you.

@ChriMech
Copy link
Author

ChriMech commented Mar 6, 2023

@alanseiden I use the php xmltoolkit together with CW to call RPG programs. To do the test you suggested, i modified the php code in .../xmltoolkit/1.8.5/ToolkitServiceXML.php on line 807 (within function getParamsFromXml() ) to read like this:
$xmlobj = simplexml_load_string($xml, "SimpleXMLElement", LIBXML_NOERROR | LIBXML_ERR_NONE);

With this, the "ugly" PHP warnings don't come up anymore, but the function call to getParamsFromXml() returns false and produces the bad.xml file. (see attached zip file bad.zip). Consequently the toolkit program call fails.

The invalid character within the xml string is a x'1A' shown in lines 36 and 50 of bad.xml.
So just ignoring the incorrect XML characters while doing the simplexml_load_string(), does not look like an option.
I still think XMLSERVICE on IBM i should check for invalid characters when preparing the xml for charcter typed fields. But I don't dare to review the XMLSERVICE source code myself.

@alanseiden
Copy link
Collaborator

Thanks! We'll review your results.

@alanseiden
Copy link
Collaborator

Although I understand the intent behind this issue, the difficulty of deciding what is "bad" data in a particular character set will make this request unlikely to be implemented unless someone submits a PR that can be tested under various character encodings.

@ChriMech
Copy link
Author

ChriMech commented Feb 6, 2024

I don't object closing the issue. It really is a rare case.

  • I thought it wouldn't be so difficult, since every EBCDIC encoded byte below x'40' is "bad" when doing a character translation from any EBCDIC character set into anything else.
  • It was shocking for me to find out, that xmlservice would produce invalid XML

@alanseiden
Copy link
Collaborator

@ChriMech I also encountered this issue when creating the CW and calling various "Q*" system APIs. Certain data structure fields produced binary data. I handled them using the "hole" or "binary" data types. The difficulty would be if we could not predict the binary data in advance, as in your case.

@ChriMech I don't mind keeping the issue open if we want to resolve it on the PHP side, but if it's going to be an xmlservice enhancement then we should move it to the xmlservice project. Am I correct in thinking you feel it should be part of xmlservice? I do recall, for comparison, that the Easycom toolkit replaced binary characters with spaces. In the xmlservice project, we need to get momentum to improve the test suite structure.

@alanseiden
Copy link
Collaborator

I'm going to reopen this issue. We have some ideas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants