Performance issue on paste of very large data #14

VidyadheeshMN · 2022-02-21T09:45:57Z

When a large amount of data, for example a 30-page Word document, is copy-pasted, it takes almost 3 minutes to finish creating the pages, and with more than 100 pages the UI hangs completely. This can be reproduced by copy-pasting a Word document of more than 20-30 pages into the editor; even 10-15 pages take ~25 seconds to paste. I identified the issue in the fit_content_over_pages function: the recursive call to move_children_forward_recursively is causing the problem.
Even the demo link reproduces this. Please suggest a fix for this performance issue if possible.
Suggestion:
Would the following approach work? Create a new div and use it to move content word by word into the current page, then move all the remaining content into the next page at once instead of word by word, and repeat the same for the overflowing page.
Unfortunately, I tried to modify the move_children_forward_recursively and fit_content_over_pages functions, but I have relatively little experience and could not implement this logic successfully. It would help me if you could at least validate the new logic or help me implement it further.
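A minimal sketch of the core of this suggestion, under stated assumptions: instead of removing words one at a time, find the split point with a binary search and then move everything after it in one go. `findSplitIndex` and `fitsOnPage` are hypothetical names; `fitsOnPage(n)` stands for "render only the first n words on the page and check its height", which in the editor would be one clientHeight read. Binary search is a swapped-in technique, not what the library currently does, and it assumes fitting is monotonic (if n words fit, fewer words also fit).

```javascript
// Hypothetical helper: returns the largest n in [0, wordCount] such that
// fitsOnPage(n) is true. Assumes fitting is monotonic: if the first n words
// fit, any shorter prefix fits too. Each fitsOnPage call stands for one
// render-and-measure step, which is the expensive part in the editor.
function findSplitIndex(wordCount, fitsOnPage) {
  let lo = 0;         // the first `lo` words are assumed to fit (0 always does)
  let hi = wordCount; // no prefix longer than `hi` words can fit
  while (lo < hi) {
    // Round up so the loop always makes progress when hi === lo + 1
    const mid = Math.ceil((lo + hi) / 2);
    if (fitsOnPage(mid)) {
      lo = mid;       // mid words fit: the answer is mid or larger
    } else {
      hi = mid - 1;   // mid words overflow: the answer is smaller than mid
    }
  }
  return lo;
}

// Usage with a fake measurement: pretend the page holds exactly 37 words.
let calls = 0;
const split = findSplitIndex(10000, (n) => { calls++; return n <= 37; });
// Words [0, split) stay on the page; everything after moves in one operation.
console.log(split, calls);
```

With 10,000 words this needs about 14 measurements instead of up to 10,000. Once the split index is known, the words after it could be wrapped in a single node and moved to the next page in one DOM operation, as the suggestion proposes.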


motla · 2022-02-21T13:57:11Z

Yes, this is a known issue, as stated in the README file. But since you described it precisely and have already dug into the code, I'm keeping this issue open for discussion. Any idea to improve performance is welcome.

I spent a lot of time trying to improve this when I made this library as a summer project 2 years ago, but I couldn't find a way in JavaScript to know in advance whether a given HTML content would overflow a <div> page.
The only solution I found is to actually render the HTML in the <div> page and measure whether the page content height is greater than its min-height. If it overflows, I believe the only way to find the word at which it fits, and to send the rest of the content to the next page, is to remove words one by one. To speed up the process, we start by acting on the document blocks that overflow (e.g. <p>, <div>, ...), then on their child elements, and go down to word-by-word iteration, which is the most intensive.
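To make the cost of this approach concrete, here is a simplified, DOM-free model of the coarse-to-fine split described above. It is an illustration under assumptions, not the library's code: `splitPageLinear` is a hypothetical name, a page is an array of paragraphs, a paragraph is an array of words, and `measure(page)` plays the role of rendering the page and reading its height. The function counts measurements, since each one would be a DOM layout in the editor.

```javascript
// DOM-free model of the coarse-to-fine forward split (illustration only).
// A page is an array of paragraphs, a paragraph is an array of words, and
// measure(page) stands in for "render the page and read its height".
function splitPageLinear(page, maxHeight, measure) {
  let measurements = 0;
  const overflows = () => {
    measurements++; // in the editor, each check would force a layout
    return measure(page) > maxHeight;
  };
  const moved = []; // content sent to the next page, in document order

  while (overflows()) {
    const last = page.pop();
    if (!overflows()) {
      // The page break falls inside `last`: put it back and remove its
      // words one by one until the page fits (the expensive phase).
      page.push(last);
      const tail = [];
      while (last.length > 0 && overflows()) tail.unshift(last.pop());
      if (last.length === 0) page.pop(); // drop a paragraph that was emptied
      if (tail.length > 0) moved.unshift(tail);
      break;
    }
    moved.unshift(last); // the whole paragraph is past the break: move it at once
  }
  return { moved, measurements };
}

// Usage: fake measure with 10 words per line and page height counted in lines.
const linesOf = (pg) => pg.reduce((h, para) => h + Math.ceil(para.length / 10), 0);
const page = [Array.from({ length: 3000 }, (_, i) => "w" + i)]; // one pasted paragraph
const result = splitPageLinear(page, 40, linesOf);
console.log(page[0].length, result.moved[0].length, result.measurements);
```

In this model, a single 3,000-word pasted paragraph on a page that holds 400 words takes 2,603 measurements, almost all of them in the word-by-word phase.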

So the algorithm is this:

At every user input in the browser's contenteditable (adding/removing characters, pasting elements), the input event handler is called:

vue-document-editor/src/DocumentEditor/DocumentEditor.vue, lines 254 to 260 in caf7763:

```javascript
// Input event
async input (e) {
  if(!e) return; // check that event is set
  await this.fit_content_over_pages(); // fit content according to modifications
  this.emit_new_content(); // emit content modification
  if(e.inputType != "insertText") this.process_current_text_style(); // update current style if it has changed
},
```

This event calls fit_content_over_pages to update the document in real time (at every user input). A single character change can modify the layout of the pages after this character, and even of the previous page: if you remove a character from the first word of a page, that word may now fit back on the previous page.

In this function, amongst other things, we find the page(s) that have been modified by the user and try a "back-propagation" of the content from the next page (if some content was removed, maybe part of the next page's content can fit). This is the move_children_backwards_with_merging function:

vue-document-editor/src/DocumentEditor/DocumentEditor.vue, lines 205 to 210 in caf7763:

```javascript
// BACKWARD-PROPAGATION
// check if content doesn't overflow, and that next page exists and has the same content_idx
if(page_elt.clientHeight <= this.pages_height && next_page && next_page.content_idx == page.content_idx) {
  // try to append every node from the next page until it doesn't fit
  move_children_backwards_with_merging(page_elt, next_page_elt, () => !next_page_elt.childNodes.length || (page_elt.clientHeight > this.pages_height));
```
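For readers who want to experiment outside the browser, here is a simplified model of the back-propagation step in this excerpt. Assumptions: pages are arrays of item heights, `measure` sums them, and `pullBackUntilFull` is a hypothetical name. In the excerpt, the callback appears to act as a stop condition (next page empty, or current page overflowing); this sketch instead returns the overflowing item to the next page immediately, and it does not model the paragraph merging suggested by the function name.

```javascript
// DOM-free model of back-propagation (hypothetical, simplified).
// Pages are arrays of item heights; measure(page) returns the page height.
// Items are pulled from the start of nextPage onto the end of page until
// nextPage is empty or the page would overflow.
function pullBackUntilFull(page, nextPage, maxHeight, measure) {
  let pulled = 0;
  while (nextPage.length > 0) {
    page.push(nextPage.shift()); // try to fit the next page's first item
    if (measure(page) > maxHeight) {
      nextPage.unshift(page.pop()); // it does not fit: give it back
      break;
    }
    pulled++;
  }
  return pulled;
}

// Usage: the current page (height 3) has room for 2 more units.
const sum = (items) => items.reduce((a, b) => a + b, 0);
const current = [1, 1, 1];
const next = [1, 1, 1, 1];
const pulled = pullBackUntilFull(current, next, 5, sum);
console.log(pulled, current.length, next.length);
```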

Then, if the page overflows, we start a "forward-propagation": it simply moves the last elements of the page recursively to the next one until the page fits.

vue-document-editor/src/DocumentEditor/DocumentEditor.vue, lines 216 to 218 in caf7763:

```javascript
// FORWARD-PROPAGATION
// check if content overflows
if(page_elt.clientHeight > this.pages_height) {
```
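A simplified model of forward propagation cascading across pages, to show why very long pastes hang. Assumptions: each page is an array of item heights, items move one at a time from the end of a page to the start of the next one, and `propagateForward` is a hypothetical name; the real code works on DOM nodes and splits elements, which this model ignores.

```javascript
// DOM-free model of forward propagation cascading across pages (simplified).
// Each page is an array of item heights. While a page overflows, its last
// item moves to the front of the next page (created if needed). Returns the
// number of single-item moves; each would be a DOM mutation plus a height
// check in the editor.
function propagateForward(pages, maxHeight) {
  const height = (page) => page.reduce((a, b) => a + b, 0);
  let moves = 0;
  // pages.length is re-read every iteration, so newly created pages are visited
  for (let i = 0; i < pages.length; i++) {
    // Keep at least one item so a single oversized item cannot loop forever
    while (height(pages[i]) > maxHeight && pages[i].length > 1) {
      if (i + 1 === pages.length) pages.push([]);
      pages[i + 1].unshift(pages[i].pop());
      moves++;
    }
  }
  return moves;
}

// Usage: paste 1,000 lines of height 1 into the first page; a page holds 10.
const pasted = [Array(1000).fill(1)];
const moveCount = propagateForward(pasted, 10);
console.log(pasted.length, moveCount);
```

In this model the paste produces 100 pages and 49,500 single-item moves: content destined for page k is moved k - 1 times, so the work grows roughly with the square of the page count. Moving each overflowing tail as one batch, as suggested at the top of the thread, would reduce this to one batch move per page.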

So yeah, I know this is far from ideal, because every call to the DOM can take a few milliseconds. The more words and pages you have, the more intensive it gets. But this library was more of an experiment to see whether such a page-splitting algorithm could be implemented using only JavaScript and the browsers' native contenteditable implementations.

Professional document editors like Google Docs use their own proprietary algorithms, which compute the size of every word (according to font family and size), tables, images and so on, and lay them out in pure code without any DOM interaction. Unfortunately, I guess such algorithms (Google's is called "kix") are not open. And I guess there are no JavaScript commands to get word metrics directly (maybe using the Canvas API, but that's another full month of development at least).
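On the word-metrics point: browsers do expose text measurement through the Canvas API (CanvasRenderingContext2D.measureText() returns a TextMetrics object with a width). Below is a hedged sketch of greedy line breaking driven by such measurements, kept DOM-free by injecting the measuring function. `breakIntoLines`, a single font per paragraph, a fixed space width and the absence of hyphenation are all simplifying assumptions; results may not match the browser's CSS line breaking exactly.

```javascript
// Greedy line breaking from word widths, without touching the DOM.
// In a browser, measureWidth could be backed by a canvas context, e.g.
//   const ctx = document.createElement("canvas").getContext("2d");
//   ctx.font = "16px serif";
//   const measureWidth = (word) => ctx.measureText(word).width;
// Simplifications: one font, one space width, no hyphenation or kerning.
function breakIntoLines(words, maxWidth, measureWidth, spaceWidth) {
  const lines = [];
  let current = [];
  let currentWidth = 0;
  for (const word of words) {
    const w = measureWidth(word);
    const needed = current.length === 0 ? w : currentWidth + spaceWidth + w;
    if (current.length > 0 && needed > maxWidth) {
      lines.push(current); // line is full: start a new one with this word
      current = [word];
      currentWidth = w;
    } else {
      // A word wider than maxWidth still gets a line of its own (it overflows)
      current.push(word);
      currentWidth = needed;
    }
  }
  if (current.length > 0) lines.push(current);
  return lines;
}

// Usage with a fake monospace font: 10px per character, 10px spaces.
const monoWidth = (word) => word.length * 10;
const words = "the quick brown fox jumps over the lazy dog".split(" ");
const lines = breakIntoLines(words, 100, monoWidth, 10);
console.log(lines.map((line) => line.join(" ")));
```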

I put this library online anyway because it can serve in many cases, from invoice editing to manufacturing sheets. It is definitely not suited for writing books, though.

Personally, I've stopped looking for a way to make it work for large documents, because I don't have the use case. However, anybody can fork it and contribute; it is open to improvement.

ghost · 2023-06-01T18:41:08Z

...And I guess there are no JavaScript commands to get words metrics directly (maybe using the Canvas API but its another full month of development at least).

First of all, thank you for sharing your work; really cool project.

Since the issue mainly shows up when we copy/paste large chunks of text, I was wondering: can't we access the metrics via the Clipboard API, on the @paste event? Did you or someone else already try that?
Or is that a stupid idea? (Yes, I know I should just go and try it.)

motla · 2023-06-01T21:18:30Z

@SylvainDelmote I don't get your idea. What I meant by "metrics" is the size in pixels that a word would take according to its CSS font properties. I had in mind an algorithm that lays out the document outside the DOM, splits it into pages and then applies this layout to the DOM, without having to check for page overflow afterwards. It seems easy to achieve, but from what I tried, it is not.
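The "split it into pages" half of this idea is plain arithmetic once line heights are known. A minimal sketch, assuming every line's height has already been computed outside the DOM, that a line is never split across pages, and ignoring widow/orphan rules, tables and images; `paginateLines` is a hypothetical name. The DOM would then be written once per page, rather than measured after every move.

```javascript
// Group lines into pages given each line's height (computed beforehand).
// Returns an array of pages, each an array of line indices, so the DOM can
// be built from the layout afterwards.
function paginateLines(lineHeights, pageHeight) {
  const pages = [];
  let current = [];
  let used = 0;
  lineHeights.forEach((h, i) => {
    if (current.length > 0 && used + h > pageHeight) {
      pages.push(current); // this line does not fit: start a new page
      current = [];
      used = 0;
    }
    current.push(i); // an oversized line still gets placed on its own page
    used += h;
  });
  if (current.length > 0) pages.push(current);
  return pages;
}

// Usage: 1,000 lines of 20px on pages 1000px tall.
const lineHeights = Array(1000).fill(20);
const pages = paginateLines(lineHeights, 1000);
console.log(pages.length, pages[0].length);
```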

elvishuges · 2023-12-19T17:08:47Z

@motla does this project allow using Web Workers? When I show a loader before the heavy processing in fit_content_over_pages, the loader does not appear.
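Two points that may explain this, from general browser behavior rather than from this library: Web Workers have no access to the DOM, so fit_content_over_pages, which reads clientHeight, could not run in a worker as written; and the browser cannot repaint while a long synchronous task occupies the main thread, so a loader inserted just before the heavy work is likely never painted. A common workaround is to split the work into chunks and yield to the event loop between them. This is a generic sketch; `processInChunks` and `yieldToEventLoop` are hypothetical names, and adapting fit_content_over_pages to work in chunks is not shown.

```javascript
// Yield to the event loop so pending tasks (repaint, input, timers) can run.
const yieldToEventLoop = () => new Promise((resolve) => setTimeout(resolve, 0));

// Process items in chunks, yielding between chunks. processItem and
// chunkSize are hypothetical placeholders for the real per-item work.
async function processInChunks(items, processItem, chunkSize = 50) {
  for (let start = 0; start < items.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, items.length);
    for (let i = start; i < end; i++) processItem(items[i], i);
    if (end < items.length) await yieldToEventLoop(); // let the UI update
  }
}

// Usage: a timer queued before the work runs between two chunks, which a
// single synchronous loop would not allow until the very end.
const log = [];
setTimeout(() => log.push("timer"), 0);
processInChunks([1, 2, 3, 4, 5], (x) => log.push(x), 2).then(() => {
  console.log(log.join(","));
});
```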

motla added the "help wanted" (Extra attention is needed) and "known issue" (Known issue) labels on Feb 21, 2022

motla mentioned this issue on Feb 21, 2022 in "lost page when paste data from clipboard" #13 (Closed)
