Performance issue on paste of very large data #14

Open
VidyadheeshMN opened this issue Feb 21, 2022 · 4 comments
Labels
help wanted Extra attention is needed known issue Known issue

Comments


VidyadheeshMN commented Feb 21, 2022

When a large amount of data is pasted, for example a 30-page Word document, it takes almost 3 minutes to finish creating the pages. With more than 100 pages, the UI hangs completely. This can be reproduced by copy-pasting a Word document of more than 20-30 pages into the editor. Even 10-15 pages take ~25 seconds to paste. I traced the issue to the fit_content_over_pages function: the recursive call to move_children_forward_recursively is causing the problem.
The demo link reproduces the problem too. Please suggest a fix for this performance issue if possible.
Suggestion -
Would the following approach work? Create a new div, use it to move content word by word into the current page, then move all the remaining content to the next page at once instead of word by word, and repeat the same for the overflowing page.
Unfortunately, I tried to modify the move_children_forward_recursively and fit_content_over_pages functions, but I have relatively little experience and was not able to implement this logic. It would help me if you could at least validate the new logic or help me implement it.

Owner

motla commented Feb 21, 2022

Yes, this is a known issue, as stated in the README file. But since you described it precisely and have already dug into the code, I'll keep this issue open for discussion. Any idea to improve performance is welcome.

I spent a lot of time trying to improve this when making this lib as a summer project 2 years ago, but I couldn't find a JavaScript way to know in advance if a certain HTML content would overflow a <div> page.
The only solution I found is to actually render the HTML in the <div> page and to measure whether the page content height is greater than its min-height. If it overflows, I believe the only way to know at which word the content stops fitting, and where the rest should be split off to the next page, is to remove words one by one. To speed up the process, we start by acting on the document paragraphs which overflow (e.g. <p>, <div>, ...), then on their child elements, down to word-by-word iteration, which is the most intensive.
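As an illustration of the cost of the word-by-word step, here is a minimal sketch of a binary search over the split point. This is not the library's code and not what fit_content_over_pages does: `findSplitIndex` and the `fits` predicate are hypothetical names, and the sketch assumes that if n words fit on a page, any smaller number of words also fits. In a browser, `fits(n)` would render the first n words into the page element and compare its clientHeight with the page height; here a fake predicate stands in for the DOM so the sketch runs anywhere.

```javascript
// Hypothetical helper (not part of the library): returns the largest number
// of leading words for which `fits(n)` is true.
// Assumption: `fits` is monotonic (if n words fit, fewer words fit too)
// and `fits(0)` is true (an empty page never overflows).
function findSplitIndex(wordCount, fits) {
  let lo = 0;          // `lo` words are known to fit
  let hi = wordCount;  // more than `hi` words are known not to fit
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (fits(mid)) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

// Stand-in for "render n words and measure the page": pretend 700 words fit.
let calls = 0;
const fakeFits = (n) => { calls += 1; return n <= 700; };

const split = findSplitIndex(1000, fakeFits);
console.log(split);  // 700
console.log(calls);  // number of simulated render-and-measure steps
```

Each call to `fits` still costs a render and a measurement, but the number of calls grows with the logarithm of the word count rather than with the number of words removed.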

So the algorithm is this:

At every user input of the web browser contenteditable (adding/removing characters, pasting elements), the input event is called:

```javascript
// Input event
async input (e) {
  if(!e) return; // check that event is set
  await this.fit_content_over_pages(); // fit content according to modifications
  this.emit_new_content(); // emit content modification
  if(e.inputType != "insertText") this.process_current_text_style(); // update current style if it has changed
},
```

This handler calls fit_content_over_pages to update the document in real time (at every user input). A single character change can modify the layout of every page after that character, and even the page before it: for example, if you remove a character from the first word of a page, that word may now fit back on the previous page.

In this function, among other things, we find the page(s) that have been modified by the user and try a "back-propagation" of the content from the next page (if some content was removed, part of the next page's content may now fit). This is done by the move_children_backwards_with_merging function:

```javascript
// BACKWARD-PROPAGATION
// check if content doesn't overflow, and that next page exists and has the same content_idx
if(page_elt.clientHeight <= this.pages_height && next_page && next_page.content_idx == page.content_idx) {
  // try to append every node from the next page until it doesn't fit
  move_children_backwards_with_merging(page_elt, next_page_elt, () => !next_page_elt.childNodes.length || (page_elt.clientHeight > this.pages_height));
}
```

Then, if the page overflows, we start a "forward-propagation": it simply moves the last elements of the page recursively to the next one until the page fits.

```javascript
// FORWARD-PROPAGATION
// check if content overflows
if(page_elt.clientHeight > this.pages_height) {
  // ...
}
```
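To make the forward-propagation idea concrete, here is a simplified model of it, not the library's implementation. It assumes each page is just an array of block heights in pixels, so no DOM is involved; `forwardPropagate` and its parameters are hypothetical names. The real code also descends into child elements and words, which this sketch deliberately leaves out: a single block taller than the page simply stays where it is.

```javascript
// Simplified model (assumption): a page is an array of block heights in px.
// Moves trailing blocks to the next page while the current page overflows.
function forwardPropagate(pages, pageHeight) {
  const total = (blocks) => blocks.reduce((sum, h) => sum + h, 0);
  for (let i = 0; i < pages.length; i++) {
    // Keep at least one block per page so an oversized block cannot loop forever.
    while (total(pages[i]) > pageHeight && pages[i].length > 1) {
      if (i + 1 === pages.length) pages.push([]); // create the next page on demand
      pages[i + 1].unshift(pages[i].pop());       // last block becomes first of next page
    }
  }
  return pages;
}

const result = forwardPropagate([[300, 300, 300, 300]], 700);
console.log(JSON.stringify(result)); // [[300,300],[300,300]]
```

In the model, `total` is a cheap sum. In the browser the equivalent check is a clientHeight read after each move, which is where the time goes on large pastes.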

So yeah, I know this is far from ideal, because every call to the DOM can take a few milliseconds. The more words and pages you have, the more intensive it is. But this library was more of an experiment to see whether such a page-splitting algorithm could be implemented using just JavaScript and the native contenteditable browser implementations.

Professional document editors like Google Docs use their own proprietary algorithms, which compute the size of every word (according to font family and size), tables, images and all, and lay them out in pure code without any DOM interaction. Unfortunately, I guess such algorithms (Google's is called "kix") are not open. And I guess there are no JavaScript commands to get word metrics directly (maybe using the Canvas API, but it's another full month of development at least).
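For what the Canvas API idea might look like, here is a minimal, hedged sketch of greedy line breaking driven by a width-measuring callback. It is not part of the library: `layoutLines` and `measure` are hypothetical names. In a browser, `measure` could wrap `CanvasRenderingContext2D.measureText(text).width`. The sketch measures words independently, so it ignores kerning across word boundaries, hyphenation, inline styles, tables and images, which is a large part of why a real implementation is so much more work.

```javascript
// Hypothetical sketch: break words into lines using a width-measuring callback.
function layoutLines(words, maxWidth, measure) {
  const spaceWidth = measure(" ");
  const lines = [];
  let current = [];
  let currentWidth = 0;
  for (const word of words) {
    const w = measure(word);
    const needed = current.length === 0 ? w : currentWidth + spaceWidth + w;
    if (current.length > 0 && needed > maxWidth) {
      lines.push(current.join(" ")); // the word does not fit: close the line
      current = [word];
      currentWidth = w;
    } else {
      current.push(word);            // fits (or is alone on an empty line)
      currentWidth = needed;
    }
  }
  if (current.length > 0) lines.push(current.join(" "));
  return lines;
}

// In a browser, `measure` could be built on the Canvas API:
//   const ctx = document.createElement("canvas").getContext("2d");
//   ctx.font = "16px serif";
//   const measure = (text) => ctx.measureText(text).width;
// Here a fake monospace measurer (8 px per character) keeps the sketch runnable anywhere.
const fakeMeasure = (text) => text.length * 8;

const lines = layoutLines("the quick brown fox jumps".split(" "), 80, fakeMeasure);
console.log(lines); // [ 'the quick', 'brown fox', 'jumps' ]
```

With a known line height, the number of lines per page follows from simple division, so page breaks could in principle be computed without touching the DOM.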

I put this library online anyway because it can serve many use cases, from invoice editing to manufacturing sheets. It is definitely not suited to writing books, though.

Personally, I'm done trying to make it work for large documents, because I don't have the use case. However, anybody can fork it and contribute; it is open for improvement.

@motla motla added help wanted Extra attention is needed known issue Known issue labels Feb 21, 2022

ghost commented Jun 1, 2023

> ...And I guess there are no JavaScript commands to get words metrics directly (maybe using the Canvas API but its another full month of development at least).

First of all, thank you for sharing your work. Really cool project.

Since the issue is mainly present when we copy/paste large chunks of words, I was just wondering:
Can't we just access the metrics via the Clipboard API, on the @paste event? Did you or someone else try that already?
Or is that a stupid idea? (Yes, I know I should just go and try it.)

Owner

motla commented Jun 1, 2023

@SylvainDelmote I don't get your idea. What I meant by "metrics" is the size in pixels that a word would take, according to its CSS font properties. I had in mind an algorithm that lays out the document outside the DOM, splits it into pages and then applies this layout to the DOM, without having to check for page overlap afterwards. It seems easy to achieve, but from what I tried, it is not.

@elvishuges

@motla does this project allow the use of Web Workers? I ask because when I show a loader before the heavy processing in the fit_content_over_pages function, the loader does not appear.
