This repository has been archived by the owner on Jul 28, 2019. It is now read-only.

Out of memory occurs when compiling a very large dictionary file #2

Open
uyu423 opened this issue Jan 28, 2019 · 3 comments

Comments


uyu423 commented Jan 28, 2019

I tried to compile the Korean dictionary into a trie.gz file, but the process ran out of memory on every attempt.

The dic file has 102252 lines and the aff file has 122134 lines.

I tried raising the heap limit with the --max_old_space_size option, but that only made the run take longer; it still ran out of memory.

node --max_old_space_size=32000 ./node_modules/.bin/cspell-tools compile-trie ./ko-aff-dic-0.7.1/ko.dic
Compile:
 output: default
 compress: true
 files:
  ./ko-aff-dic-0.7.1/ko.dic


Process "./ko-aff-dic-0.7.1/ko.dic" to "ko-aff-dic-0.7.1/ko.trie.gz"

<--- Last few GCs --->

[60298:0x104000000]   822651 ms: Mark-sweep 31996.2 (32543.8) -> 31995.9 (32544.3) MB, 59351.9 / 0.0 ms  (+ 0.1 ms in 27 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 59359 ms) (average mu = 0.120, current mu = 0.002)
[60298:0x104000000]   890885 ms: Mark-sweep 31997.6 (32544.3) -> 31997.5 (32545.8) MB, 68226.9 / 0.1 ms  (+ 0.0 ms in 26 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 68233 ms) (average mu = 0.059, current mu = 0.000)

<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x9332f2dbe3d]
Security context: 0x300afab9e6e1 <JSObject>
    1: DoJoin(aka DoJoin) [0x300afab85e89] [native array.js:~87] [pc=0x9332f2e633a](this=0x300a638026f1 <undefined>,l=0x300aadd4b2b9 <JSArray[2]>,m=2,A=0x300a638028c9 <true>,w=0x300a63809d61 <String[1]:  >,v=0x300a638029a1 <false>)
    2: Join(aka Join) [0x300afab85ed9] [native array.js:~112] [pc=0x9332f6a0478](this=0x300a638026f1 <undefined>,l=0x300aadd4b2b9...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x10003ae75 node::Abort() [/Users/yowu/.nvs/default/bin/node]
 2: 0x10003b07f node::OnFatalError(char const*, char const*) [/Users/yowu/.nvs/default/bin/node]
 3: 0x1001a7ae5 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/Users/yowu/.nvs/default/bin/node]
 4: 0x100572ef2 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/Users/yowu/.nvs/default/bin/node]
 5: 0x1005759c5 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [/Users/yowu/.nvs/default/bin/node]
 6: 0x10057186f v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/Users/yowu/.nvs/default/bin/node]
 7: 0x10056fa44 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/Users/yowu/.nvs/default/bin/node]
 8: 0x10057c2dc v8::internal::Heap::AllocateRawWithLigthRetry(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/Users/yowu/.nvs/default/bin/node]
 9: 0x10057c35f v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/Users/yowu/.nvs/default/bin/node]
10: 0x10054e1e4 v8::internal::Factory::NewRawTwoByteString(int, v8::internal::PretenureFlag) [/Users/yowu/.nvs/default/bin/node]
11: 0x10082784d v8::internal::Runtime_StringBuilderJoin(int, v8::internal::Object**, v8::internal::Isolate*) [/Users/yowu/.nvs/default/bin/node]
12: 0x9332f2dbe3d
[1]    60298 abort      node --max_old_space_size=32000 ./node_modules/.bin/cspell-tools compile-trie

If the cost of processing the dic file together with the aff file is the problem, let me know which code is responsible and I would be glad to help improve it.
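One way to check whether affix expansion is what blows up is to estimate how many word forms the two files would generate before compiling anything. The following is a rough Python sketch, not code from cspell-tools: the function names are hypothetical, it assumes comma-separated numeric flags (`FLAG num`; check the FLAG line of the aff file), and it ignores prefix/suffix cross products and continuation flags, so the real number can be far higher than this estimate.

```python
from collections import Counter


def split_num_flags(flags):
    """Assumption: flags are comma-separated numbers, e.g. 'walk/1,2'."""
    return [f for f in flags.split(",") if f]


def count_rules_per_flag(aff_lines):
    """Count SFX/PFX rule lines per flag, skipping each flag's header line."""
    counts = Counter()
    seen_header = set()
    for line in aff_lines:
        parts = line.split()
        if len(parts) < 4 or parts[0] not in ("SFX", "PFX"):
            continue
        key = (parts[0], parts[1])
        if key not in seen_header:
            # The first line for a flag is its header: "SFX <flag> <Y|N> <count>".
            seen_header.add(key)
            continue
        counts[parts[1]] += 1
    return counts


def estimate_forms(dic_lines, rule_counts, split_flags=split_num_flags):
    """Lower-bound estimate: each stem plus one form per rule of each of its flags."""
    total = 0
    for i, line in enumerate(dic_lines):
        fields = line.split()
        if not fields or (i == 0 and fields[0].isdigit()):
            continue  # blank line, or the entry count on the first line
        _, _, flags = fields[0].partition("/")
        total += 1 + sum(rule_counts.get(f, 0) for f in split_flags(flags))
    return total


# Tiny made-up example data, only to show the call shape.
example_aff = ["SFX 1 Y 2", "SFX 1 0 s .", "SFX 1 0 ed .", "SFX 2 Y 1", "SFX 2 0 ing ."]
example_dic = ["2", "walk/1,2", "cat/1"]
print(estimate_forms(example_dic, count_rules_per_flag(example_aff)))
```

For real files, the two line lists can come from `open(path, encoding="utf-8")`. If the estimate runs into the hundreds of millions, holding every generated form in memory at once is unlikely to work at any heap size.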

As a test, I also compiled the en_US hunspell dictionary. It compiled in a very short time.

The hunspell dictionary I used comes from https://github.com/spellcheck-ko/hunspell-dict-ko/releases

I would like to help the many Korean developers who use Code Spell Checker.
Please help. Thank you.

uyu423 (Author) commented Jan 29, 2019

I set --max_old_space_size to 70000 and let the compile run for 80 minutes, but it still ran out of memory. Sad.

$ node --max_old_space_size=70000 ./node_modules/.bin/cspell-tools compile-trie ./ko-aff-dic-0.7.1/ko.dic
Compile:
 output: default
 compress: true
 files:
  ./ko-aff-dic-0.7.1/ko.dic


Process "./ko-aff-dic-0.7.1/ko.dic" to "ko-aff-dic-0.7.1/ko.trie.gz"

<--- Last few GCs --->

[61972:0x103800c00]  2429772 ms: Mark-sweep 69892.7 (71183.8) -> 69892.7 (71185.3) MB, 132854.4 / 0.1 ms  (average mu = 0.099, current mu = 0.000) allocation failure scavenge might not succeed
[61972:0x103800c00]  2699584 ms: Mark-sweep 69894.1 (71185.3) -> 69894.0 (71186.8) MB, 269797.7 / 0.0 ms  (average mu = 0.035, current mu = 0.000) allocation failure scavenge might not succeed


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x36f4c45be3d]
Security context: 0x2a7707d1e6e1 <JSObject>
    1: DoJoin(aka DoJoin) [0x2a7707d05e89] [native array.js:~87] [pc=0x36f4c4670fa](this=0x2a77155026f1 <undefined>,l=0x2a84e9088039 <JSArray[2]>,m=2,A=0x2a77155028c9 <true>,w=0x2a7715509d61 <String[1]:  >,v=0x2a77155029a1 <false>)
    2: Join(aka Join) [0x2a7707d05ed9] [native array.js:~112] [pc=0x36f4c81d758](this=0x2a77155026f1 <undefined>,l=0x2a84e9088039...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x10003ae75 node::Abort() [/Users/yowu/.nvs/default/bin/node]
 2: 0x10003b07f node::OnFatalError(char const*, char const*) [/Users/yowu/.nvs/default/bin/node]
 3: 0x1001a7ae5 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/Users/yowu/.nvs/default/bin/node]
 4: 0x100572ef2 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/Users/yowu/.nvs/default/bin/node]
 5: 0x1005759c5 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [/Users/yowu/.nvs/default/bin/node]
 6: 0x10057186f v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/Users/yowu/.nvs/default/bin/node]
 7: 0x10056fa44 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/Users/yowu/.nvs/default/bin/node]
 8: 0x10057c2dc v8::internal::Heap::AllocateRawWithLigthRetry(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/Users/yowu/.nvs/default/bin/node]
 9: 0x10057c35f v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/Users/yowu/.nvs/default/bin/node]
10: 0x10054e1e4 v8::internal::Factory::NewRawTwoByteString(int, v8::internal::PretenureFlag) [/Users/yowu/.nvs/default/bin/node]
11: 0x10082784d v8::internal::Runtime_StringBuilderJoin(int, v8::internal::Object**, v8::internal::Isolate*) [/Users/yowu/.nvs/default/bin/node]
12: 0x36f4c45be3d
[1]    61972 abort      node --max_old_space_size=70000 ./node_modules/.bin/cspell-tools compile-trie
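The "Last few GCs" lines are worth reading closely: in both runs each mark-sweep leaves the heap almost exactly as large as it found it, which suggests the heap is full of live data rather than garbage, so a larger limit only postpones the failure. A small Python sketch that pulls the numbers out of such a line (the regular expression and the function name are assumptions based on the log format shown in this issue, not a general V8 log parser):

```python
import re

# Matches "<time> ms: Mark-sweep <before> (<total>) -> <after> (<total>) MB, <pause> /"
GC_LINE = re.compile(
    r"(\d+) ms: Mark-sweep (\d+\.\d+) \(\d+\.\d+\) -> (\d+\.\d+) \(\d+\.\d+\) MB, (\d+\.\d+) /"
)


def summarize_gc(line):
    """Return the time, memory freed (MB), and pause (s) of one mark-sweep log line."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    before, after, pause_ms = float(m.group(2)), float(m.group(3)), float(m.group(4))
    return {
        "at_ms": int(m.group(1)),
        "freed_mb": before - after,
        "pause_s": pause_ms / 1000.0,
    }


sample = (
    "[61972:0x103800c00]  2699584 ms: Mark-sweep 69894.1 (71185.3) -> "
    "69894.0 (71186.8) MB, 269797.7 / 0.0 ms  (average mu = 0.035, current mu = 0.000)"
)
info = summarize_gc(sample)
print(round(info["freed_mb"], 1), round(info["pause_s"]))  # prints: 0.1 270
```

In other words, the last collection in the 70000 MB run paused for about 270 seconds and freed roughly 0.1 MB.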

uyu423 (Author) commented Jan 29, 2019

Is there a way to use the hunspell dic and aff files in cspell without compiling?

Jason3S (Owner) commented May 13, 2019

Sorry for not responding sooner. This is a known issue with very large word lists. If you have another way to convert the hunspell files into a plain list of words, that might work better.
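As an illustration of what "another way" could look like, here is a minimal Python sketch that expands suffix rules one word at a time through a generator, so the full list never has to sit in memory. It is not how cspell-tools works, and it is deliberately incomplete: the function names are hypothetical, flags are assumed to be comma-separated numbers, and prefixes, cross products, and continuation flags are ignored.

```python
import re


def split_num_flags(flags):
    """Assumption: flags are comma-separated numbers, e.g. 'pony/1,2'."""
    return [f for f in flags.split(",") if f]


def parse_suffix_rules(aff_lines):
    """Map each flag to a list of (strip, add, condition) suffix rules."""
    rules = {}
    seen_header = set()
    for line in aff_lines:
        parts = line.split()
        if len(parts) < 4 or parts[0] != "SFX":
            continue
        flag = parts[1]
        if flag not in seen_header:
            seen_header.add(flag)  # header line: "SFX <flag> <Y|N> <count>"
            continue
        strip = "" if parts[2] == "0" else parts[2]
        add = parts[3].split("/")[0]  # drop continuation flags (not handled here)
        add = "" if add == "0" else add
        cond = parts[4] if len(parts) > 4 else "."
        rules.setdefault(flag, []).append((strip, add, re.compile(f"(?:{cond})$")))
    return rules


def expand(dic_lines, rules, split_flags=split_num_flags):
    """Yield each stem followed by its suffixed forms, one word at a time."""
    for i, line in enumerate(dic_lines):
        fields = line.split()
        if not fields or (i == 0 and fields[0].isdigit()):
            continue  # blank line, or the entry count on the first line
        word, _, flags = fields[0].partition("/")
        yield word
        for flag in split_flags(flags):
            for strip, add, cond in rules.get(flag, ()):
                if word.endswith(strip) and cond.search(word):
                    yield word[: len(word) - len(strip)] + add


# Tiny made-up example data, only to show the call shape.
example_aff = ["SFX 1 Y 2", "SFX 1 y ies [^aeiou]y", "SFX 1 0 s [aeiou]y"]
example_dic = ["2", "pony/1", "day/1"]
for form in expand(example_dic, parse_suffix_rules(example_aff)):
    print(form)
```

In practice the loop at the end would write each form to an output file instead of printing it, which keeps memory use roughly constant no matter how many forms the dictionary generates.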
