Add support for full regular expressions in PatternMatchingCompositeLineMapper #4412

Open
fmbenhassine opened this issue Jul 4, 2023 Discussed in #4344 · 3 comments · May be fixed by #4492

Comments

@fmbenhassine
Contributor

Discussed in #4344

Originally posted by jmresler April 8, 2023

Hi All,
I'm working on rewriting a batch process that is quite old.
It processes multiple files which have essentially the same data, but one has an additional text field at the end.
The files are comma separated value (CSV) files.

Without going too far into it, I was under the impression that PatternMatchingCompositeLineMapper would support full regular expressions. After all, Java has a good pattern matching library in the Pattern and Matcher classes of java.util.regex, and the String class's "matches" method is built on top of those tools.

What I have discovered, though, is that the supported syntax is limited to the '*' (star) and '?' (question mark) wildcards.
From reviewing the code, it looks like a very limited, Ant-style pattern matching capability rather than regular expressions.

The result is a very inelegant solution that requires long runs of '?' with intermittent '*' to cope with whitespace of possibly unknown length.
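
To make the limitation concrete, here is a minimal sketch of what the two wildcards amount to when written as a regular expression. The class GlobToRegex and its toRegex method are hypothetical names made up for this illustration; they are not part of Spring Batch, and the translation only aims to mirror the '*' and '?' semantics described in the Javadoc quoted below.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Spring Batch.
final class GlobToRegex {

    // Translates '*' to ".*" and '?' to "."; every other character is quoted
    // so that regex metacharacters in the wildcard pattern match literally.
    static Pattern toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // DOTALL lets '.' match line terminators too, as the wildcards do.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    public static void main(String[] args) {
        String threeFields = "1,Alice,10.5";
        String fourFields = "1,Alice,10.5,late payment";

        // The wildcard pattern cannot tell the two record layouts apart,
        // because '*' also matches commas.
        Pattern glob = toRegex("*,*,*");
        System.out.println(glob.matcher(threeFields).matches());
        System.out.println(glob.matcher(fourFields).matches());

        // A regular expression can pin down the exact number of fields.
        Pattern exactlyThree = Pattern.compile("[^,]*(?:,[^,]*){2}");
        System.out.println(exactlyThree.matcher(threeFields).matches());
        System.out.println(exactlyThree.matcher(fourFields).matches());
    }
}
```

The wildcard pattern accepts both sample lines, while the regular expression accepts only the three-field line. That is the kind of distinction needed to route two nearly identical CSV layouts to different tokenizers.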

Granted, I could write a custom line tokenizer, but according to "The Definitive Guide to Spring Batch", that stretches the object beyond its intended concern and is not recommended. My understanding is that the author of that book is also head of the Spring Batch project.

Any chance someone would be willing to implement java.util.regex.Pattern matching functionality?

       	/**
	 * Lifted from AntPathMatcher in Spring Core. Tests whether or not a string
	 * matches against a pattern. The pattern may contain two special
	 * characters:<br>
	 * '*' means zero or more characters<br>
	 * '?' means one and only one character
	 * 
	 * @param pattern pattern to match against. Must not be <code>null</code>.
	 * @param str string which must be matched against the pattern. Must not be
	 * <code>null</code>.
	 * @return <code>true</code> if the string matches against the pattern, or
	 * <code>false</code> otherwise.
	 */
	public static boolean match(String pattern, String str) {
		int patIdxStart = 0;
		int patIdxEnd = pattern.length() - 1;
		int strIdxStart = 0;
		int strIdxEnd = str.length() - 1;
		char ch;

		boolean containsStar = pattern.contains("*");

		if (!containsStar) {
			// No '*'s, so we make a shortcut
			if (patIdxEnd != strIdxEnd) {
				return false; // Pattern and string do not have the same size
			}
			for (int i = 0; i <= patIdxEnd; i++) {
				ch = pattern.charAt(i);
				if (ch != '?') {
					if (ch != str.charAt(i)) {
						return false;// Character mismatch
					}
				}
			}
			return true; // String matches against pattern
		}

		if (patIdxEnd == 0) {
			return true; // Pattern contains only '*', which matches anything
		}

		// Process characters before first star
		while ((ch = pattern.charAt(patIdxStart)) != '*' && strIdxStart <= strIdxEnd) {
			if (ch != '?') {
				if (ch != str.charAt(strIdxStart)) {
					return false;// Character mismatch
				}
			}
			patIdxStart++;
			strIdxStart++;
		}
		if (strIdxStart > strIdxEnd) {
			// All characters in the string are used. Check if only '*'s are
			// left in the pattern. If so, we succeeded. Otherwise failure.
			for (int i = patIdxStart; i <= patIdxEnd; i++) {
				if (pattern.charAt(i) != '*') {
					return false;
				}
			}
			return true;
		}

		// Process characters after last star
		while ((ch = pattern.charAt(patIdxEnd)) != '*' && strIdxStart <= strIdxEnd) {
			if (ch != '?') {
				if (ch != str.charAt(strIdxEnd)) {
					return false;// Character mismatch
				}
			}
			patIdxEnd--;
			strIdxEnd--;
		}
		if (strIdxStart > strIdxEnd) {
			// All characters in the string are used. Check if only '*'s are
			// left in the pattern. If so, we succeeded. Otherwise failure.
			for (int i = patIdxStart; i <= patIdxEnd; i++) {
				if (pattern.charAt(i) != '*') {
					return false;
				}
			}
			return true;
		}

		// Process pattern between stars. patIdxStart and patIdxEnd always
		// point to a '*'.
		while (patIdxStart != patIdxEnd && strIdxStart <= strIdxEnd) {
			int patIdxTmp = -1;
			for (int i = patIdxStart + 1; i <= patIdxEnd; i++) {
				if (pattern.charAt(i) == '*') {
					patIdxTmp = i;
					break;
				}
			}
			if (patIdxTmp == patIdxStart + 1) {
				// Two stars next to each other, skip the first one.
				patIdxStart++;
				continue;
			}
			// Find the pattern between patIdxStart & patIdxTmp in str between
			// strIdxStart & strIdxEnd
			int patLength = (patIdxTmp - patIdxStart - 1);
			int strLength = (strIdxEnd - strIdxStart + 1);
			int foundIdx = -1;
			strLoop: for (int i = 0; i <= strLength - patLength; i++) {
				for (int j = 0; j < patLength; j++) {
					ch = pattern.charAt(patIdxStart + j + 1);
					if (ch != '?') {
						if (ch != str.charAt(strIdxStart + i + j)) {
							continue strLoop;
						}
					}
				}

				foundIdx = strIdxStart + i;
				break;
			}

			if (foundIdx == -1) {
				return false;
			}

			patIdxStart = patIdxTmp;
			strIdxStart = foundIdx + patLength;
		}

		// All characters in the string are used. Check if only '*'s are left
		// in the pattern. If so, we succeeded. Otherwise failure.
		for (int i = patIdxStart; i <= patIdxEnd; i++) {
			if (pattern.charAt(i) != '*') {
				return false;
			}
		}

		return true;
	}
	
	
	
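	/**
	 * Proposed, but possibly oversimplified, full-regex match functionality.
	 * Requires <code>import java.util.regex.Pattern;</code>.
	 *
	 * @param regex regular expression to match against. Must not be <code>null</code>.
	 * @param str string which must match the whole expression. Must not be
	 * <code>null</code>.
	 * @return <code>true</code> if the entire string matches the expression, or
	 * <code>false</code> otherwise.
	 */
	public static boolean matchUsingFullRegex(final String regex, final String str) {

		if (regex == null)
			throw new NullPointerException("Regular expression cannot be null");

		if (str == null)
			throw new NullPointerException("String to match cannot be null");

		// Pattern.matches compiles the expression on every call.
		return Pattern.matches(regex, str);
	}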
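
As a rough sketch of how a regex-based variant could pick a delegate per line, the example below registers precompiled patterns in order and returns the delegate of the first one that matches the whole line. RegexDispatcher, register, and find are hypothetical names invented for this illustration. They are not Spring Batch APIs, and this is not necessarily how any eventual change to PatternMatchingCompositeLineMapper works.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch; RegexDispatcher is not a Spring Batch class.
final class RegexDispatcher<T> {

    // Registration order is preserved, so the first matching pattern wins.
    private final List<Map.Entry<Pattern, T>> delegates = new ArrayList<>();

    RegexDispatcher<T> register(String regex, T delegate) {
        // Compile once at registration time instead of once per line.
        delegates.add(new AbstractMap.SimpleImmutableEntry<>(Pattern.compile(regex), delegate));
        return this;
    }

    Optional<T> find(String line) {
        for (Map.Entry<Pattern, T> entry : delegates) {
            // matches() requires the whole line to match the expression.
            if (entry.getKey().matcher(line).matches()) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The String values stand in for tokenizers or field set mappers.
        RegexDispatcher<String> dispatcher = new RegexDispatcher<String>()
                .register("[^,]*(?:,[^,]*){2}", "threeFieldMapper")
                .register("[^,]*(?:,[^,]*){3}", "fourFieldMapper");

        System.out.println(dispatcher.find("1,Alice,10.5"));
        System.out.println(dispatcher.find("1,Alice,10.5,late payment"));
        System.out.println(dispatcher.find("no commas here"));
    }
}
```

Compiling each expression once at registration avoids the per-call compilation that Pattern.matches performs, which may matter when a job reads millions of lines.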
@fmbenhassine
Contributor Author

fmbenhassine commented Jul 4, 2023

@jmresler you are welcome to open a PR with the suggested change.

@injae-kim
Contributor

May I handle this issue? Please assign it to me! I'll create a PR this weekend.

@injae-kim
Contributor

I opened PR #4492. PTAL when you have time 🙇
