Missing words in final layout #159

nextensible · 2019-08-19T07:33:23Z

This is a known issue:

Note: if a word cannot be placed in any of the positions attempted along the spiral, it is not included in the final word layout. This may be addressed in a future release.

How much work would it be (for you / for a dev new to the project) to implement this? Which approach would you suggest? Can you give some hints where to start? A simple option could be, as soon as a word cannot be placed, "zoom out" (e.g. by decreasing the font size) and restart the whole process – until all words can be placed. But I assume that you as the author would be able to come up with a better approach?

naholyr · 2019-10-08T12:44:12Z

It's not even as simple as decreasing the font size, because positioning involves randomness: you could have all words visible in one layout and, just a refresh later, be missing a few. So decreasing the font size:

is not guaranteed to work
may not be needed, and so may trigger unnecessarily

However, you can try to implement your own heuristic, the simplest one being "try again until everything is visible":

// Before update, store expected number of words
const expected = words.length;

// Trigger redraw
layout.words(words);

// The drawing function
const draw = words => {
  // words = computed layout, it contains the *actually displayed* words
  if (words.length < expected) {
    // try again (same `layout` object as above)
    layout.stop();
    layout.start();
    return;
  }

  …
}

Nothing stops you from changing the configuration (for example, decreasing the font size) before calling layout.start() again, but I must admit I'm not sure which approach works best. I was thinking more about increasing the layout size and resizing the SVG afterwards.

localpcguy · 2020-02-12T19:36:03Z

I ran into this bug and effectively did what @naholyr suggests: I run through a loop reducing the font size (one wrinkle: we had a font-size range, so we needed to reduce sizes across the range), then redraw and check the expected number of words against the drawn number of words. I set a maximum of 10 reductions, after which it just displays what it can at that point, so it doesn't reduce words to illegible sizes.
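A minimal sketch of that shrink-and-retry loop, under some loud assumptions: `fitByShrinking` and `runLayout` are hypothetical names, `runLayout(words)` stands in for one complete layout pass that returns the words actually placed, and each word carries a numeric `size`. A real layout that reports its result through an event or callback would need to be wrapped accordingly.

```javascript
// Hypothetical helper, not part of any library.
// `runLayout(words)` is assumed to run one full layout pass synchronously
// and return the array of words that were actually placed.
function fitByShrinking(words, runLayout, options = {}) {
  const { maxAttempts = 10, factor = 0.9 } = options;
  let current = words;
  let placed = runLayout(current);

  for (
    let attempt = 0;
    attempt < maxAttempts && placed.length < words.length;
    attempt++
  ) {
    // Scale every word by the same factor so the whole font-size range
    // shrinks together instead of only the largest words.
    current = current.map((w) => ({ ...w, size: w.size * factor }));
    placed = runLayout(current);
  }

  // After maxAttempts reductions, return whatever could be placed.
  return { placed, complete: placed.length === words.length };
}
```

Capping the number of reductions keeps the smallest words from becoming illegible; the `complete` flag lets the caller decide what to do when some words are still missing.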

adrianhelvik · 2020-06-22T20:40:54Z

I solved this with binary search. There are still edge cases, though. And another thing: always use a seeded RNG. Predictable randomness is a must-have for debuggability.
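The comment does not show code, so the following is only a sketch of what the two ideas could look like together. `mulberry32` is a well-known small seeded PRNG; `largestFittingScale` and its `fits(scale)` callback are hypothetical names, where `fits` is assumed to run the layout at a given font scale and report whether every word was placed. Binary search also assumes that a smaller scale fits whenever a larger one does, which a randomized layout does not strictly guarantee (one likely source of the edge cases mentioned).

```javascript
// Seeded PRNG (mulberry32): the same seed always yields the same sequence
// of numbers in [0, 1), which makes a layout run reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: binary-search the largest scale in [lo, hi] for which
// `fits(scale)` returns true. Returns null if no tested scale fits.
function largestFittingScale(fits, lo = 0, hi = 1, steps = 12) {
  let best = null;
  for (let i = 0; i < steps; i++) {
    const mid = (lo + hi) / 2;
    if (fits(mid)) {
      best = mid; // mid fits: try larger scales
      lo = mid;
    } else {
      hi = mid; // mid does not fit: try smaller scales
    }
  }
  return best;
}
```

If the layout accepts a custom random function (d3-cloud documents a `random` accessor, as far as I know), passing a freshly seeded generator to each `fits` run keeps every attempt comparable and repeatable.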

hiniestic · 2021-08-17T15:14:58Z

I ran into the same error, with words missing from the chart, and I found that the issue happens when a padding value is set.

adrianhelvik · 2021-08-25T09:31:50Z

I ended up developing a custom wordcloud algorithm instead.

consoleLogIt · 2022-05-04T08:59:10Z

Hi @adrianhelvik, I am having a similar issue and feel a custom word-cloud algorithm is the solution. Can you please share resources or anything of that sort to get started?

adrianhelvik · 2022-05-10T16:20:27Z

Hmm. I'm considering open sourcing it. But it could be a competitive advantage as it's quite frankly a lot better than what our competitors are offering, so I must talk to my supervisors before sharing it.

The algorithm is like this:

Render sprites. Scale the words according to their weights first.
1. Optimize width: Use ctx.measureText to get the width of the word.
2. Optimize height: Use a larger height than you expect the word to have and remove empty rows from the pixel grid.
3. Render the word with some stroke (this leaves some space between the words).
4. Store non-white pixels in a Set.

Generate word cloud
1. Pick a moderately small size to start building the cloud.
  1. Create a new Uint8ClampedArray(width * height) to store pixel values. You could use a Set for this too, as you only need to store 0 for empty and 1 for occupied. I believe a Set would be better tbh.
2. Position the sprites.
  1. For each word:
    1. Start out with a small width/height (e.g. 100/100) for your coordinate possibility space.
    2. Try to place the word at a random coordinate among the allowed coordinates.
    3. If, within the coordinate space, you fail more than ‹configurable› times, increase the size by a ‹configurable› amount.
    4. If you fall outside the full cloud bounds, clone all pixels to the center of a new Uint8ClampedArray sized for a grid of (width * 1.2) by (height * 1.2) and resume positioning words. I believe negative indices in a Set would be more performant though.
3. Once the cloud is done (or even when a word is done), you have the x,y coordinates for each word and can render them into a canvas or an SVG.
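
The positioning step can be sketched roughly as below. This is not the author's implementation (which was not shared), only an illustration under stated assumptions: sprites are stand-in filled rectangles rather than rendered text, occupancy is kept in a Set of "x,y" keys (the Set variant mentioned above, so negative coordinates work and no array has to be cloned when the cloud grows), and `rectSprite`, `collides`, `placeSprites` and the option names are all hypothetical.

```javascript
const key = (x, y) => x + "," + y;

// Stand-in for a rendered word: a filled w-by-h block of pixel offsets.
// A real sprite would hold only the non-white pixels of the drawn text.
function rectSprite(w, h) {
  const pixels = [];
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) pixels.push([x, y]);
  }
  return { w, h, pixels };
}

// True if any pixel of the sprite, offset by (ox, oy), is already taken.
function collides(occupied, sprite, ox, oy) {
  return sprite.pixels.some(([dx, dy]) => occupied.has(key(ox + dx, oy + dy)));
}

// `random` must return numbers in [0, 1); pass a seeded generator for
// reproducible clouds.
function placeSprites(sprites, random, options = {}) {
  const { start = 100, maxFails = 50, grow = 20 } = options;
  const occupied = new Set();
  const positions = [];

  for (const sprite of sprites) {
    let size = start; // side of the candidate window, centered on the origin
    let fails = 0;
    for (;;) {
      const x = Math.floor((random() - 0.5) * size - sprite.w / 2);
      const y = Math.floor((random() - 0.5) * size - sprite.h / 2);
      if (!collides(occupied, sprite, x, y)) {
        for (const [dx, dy] of sprite.pixels) occupied.add(key(x + dx, y + dy));
        positions.push({ x, y });
        break;
      }
      // Too many failed attempts in this window: widen it and keep trying.
      if (++fails > maxFails) {
        size += grow;
        fails = 0;
      }
    }
  }
  return positions;
}
```

String keys keep the sketch short but are slow for large clouds; packing coordinates into a single integer key would be a natural next step.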

adrianhelvik · 2022-05-10T16:22:09Z

And do not do it synchronously. Split things up using requestAnimationFrame, or even better use a web worker. I use events to emit placed words to the renderer, so I can display the cloud as it's being built.
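
One way to split the work up, sketched with hypothetical names (`placeInChunks`, `placeOne`, `onPlaced`, `onDone`, `chunkSize`, `schedule`): process a few words per frame and hand each placed word to the renderer as soon as it is ready. The scheduler is injectable, and falling back to setTimeout outside the browser is an assumption of this sketch, not something from the thread.

```javascript
// Hypothetical helper: place words a few at a time so the UI stays responsive.
// `placeOne(word)` is assumed to return the placed word, or null on failure.
function placeInChunks(words, placeOne, onPlaced, options = {}) {
  const {
    chunkSize = 5,
    // Assumption: use requestAnimationFrame when available, else setTimeout.
    schedule = typeof requestAnimationFrame === "function"
      ? (fn) => requestAnimationFrame(fn)
      : (fn) => setTimeout(fn, 0),
    onDone = () => {},
  } = options;

  let index = 0;

  function step() {
    const end = Math.min(index + chunkSize, words.length);
    for (; index < end; index++) {
      const placed = placeOne(words[index]);
      if (placed) onPlaced(placed); // emit immediately so the renderer can draw
    }
    if (index < words.length) schedule(step);
    else onDone();
  }

  schedule(step);
}
```

A web worker version would run the same loop off the main thread and post each placed word back as a message instead of calling `onPlaced` directly.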

Also, I'd suggest trying to exclude the previously tested rectangle when placing a word. I don't it's the biggest performance bottleneck of the algorithm.

agatapst mentioned this issue Nov 26, 2020

fix(plugin-chart-word-cloud): ensure top results are always displayed apache-superset/superset-ui#841

Merged

zhangzhonghe mentioned this issue Aug 4, 2021

🐛 [BUG] Word cloud chart renders incorrectly antvis/G2Plot#2754

Closed
