Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FlatFileItemWriter creates empy line at eof. Could we allow option to disable? #3863

Closed
Jenson3210 opened this issue Mar 11, 2021 · 7 comments · May be fixed by #3895
Closed

FlatFileItemWriter creates empy line at eof. Could we allow option to disable? #3863

Jenson3210 opened this issue Mar 11, 2021 · 7 comments · May be fixed by #3895
Labels
in: infrastructure status: waiting-for-reporter Issues for which we are waiting for feedback from the reporter type: feature

Comments

@Jenson3210
Copy link

Expected Behavior

To have an option in the FlatFileItemWriterBuilder that allows you to disable the empty line at eof. .noEmptyLineAtEndOfFile() in the example.

new FlatFileItemWriterBuilder<Metadata>()
                .name("writer")
                .resource(new FileUrlResource("fileuri"))
                .noEmptyLineAtEndOfFile()
                .formatted()
                .format("%-50.50s"))
                .names("field")
                .build();

could result in a file that has no last empty line by altering the writer doWrite method.

Current Behavior

The for loop appends this.lineseparator at end of String that will be written to file.

@Override
public String doWrite(List<? extends T> items) {
	StringBuilder lines = new StringBuilder();
	
	for (T item : items) {
		//BECAUSE OF THIS LOOP, THE LINE SEPARATOR IS ALWAYS ADDED, ALSO ON LAST LINE
		lines.append(this.lineAggregator.aggregate(item)).append(this.lineSeparator);
	}

	return lines.toString();
}

Context

  • I tried decorating the writer to have my own doWrite method, but no getters are available for the lineAggregator or lineSeparator.
  • Recreating the builder / writer just for this one function is a bit overhead.
  • We can also adapt our existing file consumers (non-spring batch) but todays file creator cycles used in-house (non-spring batch) create no empty line.
@parikshitdutta
Copy link
Contributor

parikshitdutta commented Mar 19, 2021

Hi @benas,
Since this is awaiting triage, wanted to check with you.

The following test from FlatFileItemWriterTests shows, the trailing separator being added. And there is no easier way to disable that for now.

@Test
public void testWriteRecordWithrecordSeparator() throws Exception {
writer.setLineSeparator("|");
writer.open(executionContext);
writer.write(Arrays.asList(new String[] { "1", "2" }));
String lineFromFile = readLine();
assertEquals("1|2|", lineFromFile);
}

I can work on it, if you and team find value.

@fmbenhassine
Copy link
Contributor

I don't see an obvious way to detect that we are writing the last chunk to not add a line separator at the end. I'm probably missing a clever way to do that, but even if we try to invert the code to something like this:

@Override
public String doWrite(List<? extends T> items) {
	StringBuilder lines = new StringBuilder();
	
	for (T item : items) {
		//BECAUSE OF THIS LOOP, THE LINE SEPARATOR IS ALWAYS ADDED, ALSO ON LAST LINE
--		lines.append(this.lineAggregator.aggregate(item)).append(this.lineSeparator);
++              lines.append(this.lineSeparator).append(this.lineAggregator.aggregate(item));
	}

	return lines.toString();
}

the problem remains, but at the other end of the file (ie the head/beginning of the file).

@Jenson3210 @parikshitdutta Do you agree?

@fmbenhassine fmbenhassine added status: waiting-for-reporter Issues for which we are waiting for feedback from the reporter and removed status: waiting-for-triage Issues that we did not analyse yet labels Apr 15, 2021
@Jenson3210
Copy link
Author

As you might have guessed based on my limited description, I do not have that much experience with Spring Batch.
We currently have another system for batch applications (custom framework), moving slowly towards Spring Batch.
For now, I have solved it by not handling empty lines in the consumer.

But I thought it was still valid to have this option though. If to hard to implement, there are options to handle this behaviour in consuming steps too...

@benas Am I correct to understand that it would be too hard because this one piece can be called in chuncks, and thus passed multiple times? I assume this because if not passing multiple times, we could fix it without an enhanced for-loop, as you most certainly are aware of 😃

@fmbenhassine
Copy link
Contributor

My question is more about the feasibility of the feature rather than its added value. In order to add (or not) a line separator at the end of the file, Spring Batch needs to know if the current chunk to write is the last one (ie know the total number of items upfront). This information is not available (unless we ask the user to provide it in some way, but this is another story) which makes me wonder if this is even possible in the first place.

Probably there is a way to invert the current behaviour to not add a line separator by default and use a footer callback that adds a final separator at the end, but I'm not sure if this would work or if it would be backward compatible. I'm open to review a PR if there is a way (that I'm missing) to implement this feature.

@parikshitdutta
Copy link
Contributor

parikshitdutta commented Apr 18, 2021

@benas, I agree this is more about feasibility.

Probably by updating AbstractFileItemWriter this might be achieved, but I don't think, we can achieve that without touching low level concepts such as FileChannel, lastMarkedByteOffsetPosition, truncate() etc, if at all that is possible, and that too in an "elegant" way!

As far as using FooterCallback is concern, the trailing lineSeparator or empty line is only a problem when there is no footer, and with FooterCallback.writeFooter(Writer writer) we can skip adding the trailing line break if we don't want it.

The other could be by introducing callback interfaces, but I believe that would be over engineering for something that can be handled in the client code.

@fmbenhassine
Copy link
Contributor

@Jenson3210

I tried decorating the writer to have my own doWrite method, but no getters are available for the lineAggregator or lineSeparator.

The lineAggregator and lineSeparator have protected access in FlatFileItemWriter, so you should be able to access them in your extension. Please note that creating a custom implementation is recommended in favor of overriding the built-in implementation (see this FAQ).

@benas Am I correct to understand that it would be too hard because this one piece can be called in chuncks, and thus passed multiple times? I assume this because if not passing multiple times, we could fix it without an enhanced for-loop, as you most certainly are aware of

The difficulty to address this is due to the way items are written according to the chunk-oriented processing model. Even without an enhanced loop, I believe the problem will remain in one of the ends of the file (and in between chunks). Here is a repo where I tried this approach.

Based on the previous comments, I'm closing this issue for now. But again, I'm open to review a PR that could implement this feature in a backward compatible way.

@parikshitdutta
Copy link
Contributor

parikshitdutta commented Apr 23, 2021

Hi @benas, I got some time to look into the issue, please check my submitted PR #3895. I will look forward to your feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: infrastructure status: waiting-for-reporter Issues for which we are waiting for feedback from the reporter type: feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants