Clarify and improve parsing of the .properties file format #685

Open
pnoltes opened this issue Nov 9, 2023 · 2 comments

pnoltes commented Nov 9, 2023

Clarify and improve parsing of the .properties file format

Apache Celix uses .properties files to store and read configurable framework and bundle parameters.

The Apache Celix .properties file format is based on the Java .properties file format.
There is no RFC or clear specification document for the Java .properties file format, but there is a Wikipedia page: https://en.wikipedia.org/wiki/.properties

Example .properties file from the Wikipedia page:

# You are reading a comment in ".properties" file.
! The exclamation mark can also be used for comments.
# Lines with "properties" contain a key and a value separated by a delimiting character.
# There are 3 delimiting characters: '=' (equal), ':' (colon) and whitespace (space, \t and \f).
website = https://en.wikipedia.org/
language : English
topic .properties files
# A word on a line will just create a key with no value.
empty
# White space that appears between the key, the value and the delimiter is ignored.
# This means that the following are equivalent (other than for readability).
hello=hello
hello = hello
# Keys with the same name will be overwritten by the key that is the furthest in a file.
# For example the final value for "duplicateKey" will be "second".
duplicateKey = first
duplicateKey = second
# To use the delimiter characters inside a key, you need to escape them with a \.
# However, there is no need to do this in the value.
delimiterCharacters\:\=\ = This is the value for the key "delimiterCharacters\:\=\ "
# Adding a \ at the end of a line means that the value continues to the next line.
multiline = This line \
continues
# If you want your value to include a \, it should be escaped by another \.
path = c:\\wiki\\templates
# This means that if the number of \ at the end of the line is even, the next line is not included in the value. 
# In the following example, the value for "evenKey" is "This is on one line\".
evenKey = This is on one line\\
# This line is a normal comment and is not included in the value for "evenKey"
# If the number of \ is odd, then the next line is included in the value.
# In the following example, the value for "oddKey" is "This is line one and\# This is line two".
oddKey = This is line one and\\\
# This is line two
# White space characters are removed before each line.
# Make sure to add your spaces before your \ if you need them on the next line.
# In the following example, the value for "welcome" is "Welcome to Wikipedia!".
welcome = Welcome to \
          Wikipedia!
# If you need to add newlines and carriage returns, they need to be escaped using \n and \r respectively.
# You can also optionally escape tabs with \t for readability purposes.
valueWithEscapes = This is a newline\n and a carriage return\r and a tab\t.
# You can also use Unicode escape characters (maximum of four hexadecimal digits).
# In the following example, the value for "encodedHelloInJapanese" is "こんにちは".
encodedHelloInJapanese = \u3053\u3093\u306b\u3061\u306f
# But with more modern file encodings like UTF-8, you can directly use supported characters.
helloInJapanese = こんにちは
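
The odd/even backslash rule from the example above decides whether a physical line continues onto the next one. A minimal sketch in C, assuming the line terminator has already been stripped; `line_continues` is a hypothetical helper name, not an existing Apache Celix function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `line` (without its line terminator)
 * ends in an odd number of backslashes, meaning the final backslash is an
 * unescaped continuation marker and the next line belongs to this value. */
bool line_continues(const char* line) {
    assert(line != NULL);
    size_t len = strlen(line);
    size_t backslashes = 0;
    /* Count the run of backslashes at the very end of the line. */
    while (backslashes < len && line[len - 1 - backslashes] == '\\') {
        backslashes++;
    }
    return (backslashes % 2) == 1;
}
```

With this rule, the "multiline" and "oddKey" lines continue, while the "evenKey" line ends with an escaped backslash and does not.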

Clarify Apache Celix .properties file format

Write down a specification for the Apache Celix .properties file format. This does not have to be formal (EBNF), but it should make clear to users what is and is not possible with .properties files.

Refactor .properties file parsing

The current parsing of .properties files is not ideal (parsing can lead to many allocations) and should be improved.
The manifest_readFromStream function can be used as input for the refactoring.

A possible choice could be to support multiple key/value delimiters (= and :) and then extract and extend manifest_readFromStream so that it can be used for both properties and manifest handling.

Add support for typed property value format

Update the format so that storage of typed properties (bool, long, version, double) is also possible.

@pnoltes pnoltes added the kind/improvement Categorizes issue or PR as related to improvements. label Nov 9, 2023
@pnoltes pnoltes self-assigned this Nov 21, 2023
@pnoltes pnoltes added this to the 3.0.0 milestone Nov 21, 2023

PengZheng commented Dec 9, 2023

I found the following nice explanation, "The Java Properties File Format", in addition to the documentation of java.util.Properties.load(java.io.Reader):

The Java Properties File Format

A Java style properties file contains key value pairs (properties) in a file with ISO-8859-1 encoding (code page 28591). The file usually has a “.properties” file extension and consists of a series of lines (terminated by CRLF or CR or LF) each a key value pair, a comment or a blank line.

Leading whitespace (spaces, tabs ‘\t’, form feeds ‘\f’) is ignored at the start of any line – and a line that is empty or contains only whitespace is treated as blank and ignored.

A line where the first non-whitespace character is a ‘#’ or ‘!’ is a comment line and the rest of the line is ignored.

If the first non-whitespace character is not ‘#’ or ‘!’ then it is the start of a key. A key is all the characters up to the first whitespace or a key/value separator (‘=’ or ‘:’). The separator is optional. Any whitespace after the key or after the separator (if present) is ignored.

The first non-whitespace character after the separator (or after the key if no separator) begins the value. The value may include whitespace, separators, or comment characters.
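
The key/value splitting rules above can be sketched in C without any allocation. This is an illustrative sketch, not the Celix parser: `split_property_line` and `is_props_space` are hypothetical names, the input is assumed to be one logical line (continuations already joined), and escapes in the key are skipped over but not decoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Whitespace as defined by the .properties format: space, tab, form feed. */
static bool is_props_space(char c) {
    return c == ' ' || c == '\t' || c == '\f';
}

/* Hypothetical helper: splits one logical line into key and value.
 * Returns false for blank lines and comment lines ('#' or '!').
 * On success, *key points at the first key character, *keyLen is the raw
 * key length (escapes still present) and *value points at the first value
 * character (possibly the terminating NUL for a key without a value). */
bool split_property_line(const char* line, const char** key,
                         size_t* keyLen, const char** value) {
    assert(line != NULL && key != NULL && keyLen != NULL && value != NULL);
    const char* p = line;
    while (is_props_space(*p)) p++;                 /* leading whitespace */
    if (*p == '\0' || *p == '#' || *p == '!') return false;

    *key = p;
    while (*p != '\0' && !is_props_space(*p) && *p != '=' && *p != ':') {
        if (*p == '\\' && p[1] != '\0') p++;        /* keep escaped char in key */
        p++;
    }
    *keyLen = (size_t)(p - *key);

    while (is_props_space(*p)) p++;                 /* whitespace after key */
    if (*p == '=' || *p == ':') {                   /* optional separator */
        p++;
        while (is_props_space(*p)) p++;             /* whitespace after separator */
    }
    *value = p;
    return true;
}
```

Because the function only returns pointers into the input line, a caller can decide separately when (and whether) to copy or unescape the key and value.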

The following special cases are defined:

'\t' - horizontal tab.
'\f' - form feed.
'\r' - return
'\n' - new line
'\\' - add escape character.
'\ ' - add space in a key or at the start of a value.
'\!', '\#' - add comment markers at the start of a key.
'\=', '\:' - add a separator in a key.

Any Unicode character may be inserted in either key or value using the following escape:

'\uXXXX' - where XXXX represents the unicode character code as 4 hexadecimal digits.
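
The escape rules listed above can be sketched as a single decoding pass in C. The function name `unescape_property` is hypothetical, and emitting '\uXXXX' as UTF-8 is an assumption made for this sketch (Java itself stores UTF-16 code units); surrogate pairs are not combined and '\u0000' is rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the numeric value of a hex digit, or -1 if `c` is not one. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Encodes a code point <= 0xFFFF as UTF-8; returns the number of bytes. */
static size_t put_utf8(unsigned cp, char* out) {
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical helper: decodes the escapes in `in` into `out` (capacity
 * `outCap`, including the terminating NUL). Returns the decoded length,
 * or -1 if `out` is too small or a \u escape is malformed. */
long unescape_property(const char* in, char* out, size_t outCap) {
    assert(in != NULL && out != NULL);
    size_t n = 0;
    while (*in != '\0') {
        char buf[3];
        size_t len = 1;
        if (*in != '\\') {
            buf[0] = *in++;
        } else {
            in++;                          /* skip the backslash */
            if (*in == '\0') break;        /* dangling backslash is dropped */
            switch (*in) {
                case 't': buf[0] = '\t'; in++; break;
                case 'f': buf[0] = '\f'; in++; break;
                case 'r': buf[0] = '\r'; in++; break;
                case 'n': buf[0] = '\n'; in++; break;
                case 'u': {
                    unsigned cp = 0;
                    for (int i = 1; i <= 4; i++) {
                        int h = hex_value(in[i]);
                        if (h < 0) return -1;   /* fewer than 4 hex digits */
                        cp = cp * 16 + (unsigned)h;
                    }
                    if (cp == 0) return -1;     /* would truncate the C string */
                    len = put_utf8(cp, buf);
                    in += 5;
                    break;
                }
                default: buf[0] = *in++; break; /* '\\', '\ ', '\=', '\:', ... */
            }
        }
        if (n + len >= outCap) return -1;       /* keep room for the NUL */
        for (size_t i = 0; i < len; i++) out[n++] = buf[i];
    }
    if (n >= outCap) return -1;
    out[n] = '\0';
    return (long)n;
}
```

Decoding into a caller-provided buffer keeps the sketch allocation-free; the decoded text is never longer than the escaped input, so a buffer of `strlen(in) + 1` bytes is always sufficient.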

Finally, longer lines can be broken by putting an escape at the very end of the line. Any leading space (unless escaped) is skipped at the beginning of the following line.

Examples

a-key = a-value
a-key : a-value
a-key=a-value
a-key a-value

All the above will result in the same key/value pair – key “a-key” and value “a-value”.

! comment...
# another comment...

The above are two examples of comments. Yes, you can add comments to Java .properties files – so please do!

Hong\ Kong = Near China

The above shows how to embed a space in a key – the key is “Hong Kong” and the value is “Near China”. Without the ‘\’ escape, the key is “Hong” and the value is “Kong = Near China” (it wouldn’t be the first time I’ve seen it done…).

a-longer-key-example = a really long value that is \
split over two lines.

An example of a long line split into two.


PengZheng commented Dec 9, 2023

Based on the above article, I found that whitespace at the start of a value is not escaped correctly by our implementation.
The following modified test case will fail on keyA (note the value is " valueA" rather than "valueA"):

TEST_F(PropertiesTestSuite, StoreTest) {
    const char* propertiesFile = "resources-test/properties_out.txt";
    celix_autoptr(celix_properties_t) properties = celix_properties_create();
    celix_properties_set(properties, "keyA", " valueA");
    celix_properties_set(properties, "keyB", "valueB");
    celix_properties_store(properties, propertiesFile, nullptr);

    celix_autoptr(celix_properties_t) properties2 = celix_properties_load(propertiesFile);
    EXPECT_EQ(celix_properties_size(properties), celix_properties_size(properties2));
    EXPECT_STREQ(celix_properties_get(properties, "keyA", ""), celix_properties_get(properties2, "keyA", ""));
    EXPECT_STREQ(celix_properties_get(properties, "keyB", ""), celix_properties_get(properties2, "keyB", ""));
}
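
One way to make such a round trip work is to escape leading spaces when a value is written, so that a loader following the rules quoted above does not strip them. The following is a minimal sketch in C under that assumption; `escape_property_value` is a hypothetical name and this is not the actual Celix store implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: writes an escaped copy of `value` into `out`
 * (capacity `outCap`, including the terminating NUL). Leading spaces are
 * written as "\ " so they survive loading; tabs, form feeds, CR, LF and
 * backslashes are always escaped. Returns the escaped length, or -1 if
 * `out` is too small. */
long escape_property_value(const char* value, char* out, size_t outCap) {
    assert(value != NULL && out != NULL);
    size_t n = 0;
    bool leading = true;   /* true until the first non-space character */
    for (const char* p = value; *p != '\0'; p++) {
        char esc = 0;      /* character to emit after a backslash, 0 = none */
        switch (*p) {
            case ' ':  if (leading) esc = ' '; break;
            case '\t': esc = 't'; break;
            case '\f': esc = 'f'; break;
            case '\r': esc = 'r'; break;
            case '\n': esc = 'n'; break;
            case '\\': esc = '\\'; break;
            default: break;
        }
        if (*p != ' ') leading = false;
        size_t need = (esc != 0) ? 2 : 1;
        if (n + need >= outCap) return -1;   /* keep room for the NUL */
        if (esc != 0) {
            out[n++] = '\\';
            out[n++] = esc;
        } else {
            out[n++] = *p;
        }
    }
    if (n >= outCap) return -1;
    out[n] = '\0';
    return (long)n;
}
```

For the failing case above, the value " valueA" would be stored as "\ valueA", while "valueB" is written unchanged.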

pnoltes added a commit that referenced this issue Apr 14, 2024
Also: Rename the nested / flat encoding style flag.
pnoltes added a commit that referenced this issue Apr 15, 2024