[Question] Using Juman++ in dart using dart:ffi #145

Open
CaptainDario opened this issue Feb 15, 2022 · 10 comments

Comments

@CaptainDario

I am developing a Flutter app and would like to include morphological analysis. I am considering writing a package to use Juman++ from Dart via dart:ffi, which allows calling native C code from Dart.
I am wondering whether there is platform-dependent code or anything similar that would prevent this project from running on iOS/Android. Are the ML models used also available for TensorFlow Lite?
Any help or suggestions would be very much appreciated.

@eiennohito
Contributor

There should not be any platform-dependent code in Juman++, and it runs natively on M1-based Macs without problems. There is, however, no C API, only a C++ one. I am not familiar with Dart and its FFI, but most FFI interactions happen via a C API. Finally, the Juman++ model is not just a neural net, so TensorFlow Lite compatibility does not really make sense in this context.

@CaptainDario
Author

CaptainDario commented Feb 15, 2022

Thanks for your reply!

I have never tried to use C++ with dart:ffi, but it looks like C++ should be usable if the C++ symbols are marked as extern "C". Could that cause any problems? If not, I would give it a shot, because Juman++ seems to work much better than MeCab for analyzing random texts from the internet.

Sorry about the stupid TensorFlow assumption. Somehow I thought the RNN used TensorFlow, but from briefly looking at the code it seems Juman++ uses a custom-built RNN.

@eiennohito
Contributor

eiennohito commented Feb 15, 2022

It is not just a matter of extern "C" to expose a C++ API as C. In its current state, the API is probably unusable from C.
First, all Juman++ strings (e.g. dictionary fields and morpheme surfaces) are not null-terminated strings, but slices. The abstraction for them (StringPiece) would not be easily usable from C. Making a C API is in the backlog (#61), but I never had the time or the need for it myself.
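
To illustrate the point about slices versus null-terminated strings, here is a minimal sketch of how a pointer-plus-length view could be carried across a C boundary. Every name in it (`toy_slice`, `toy_slice_copy`, `toy_first_surface`) is hypothetical and made up for this illustration; none of it is Juman++ code or its StringPiece type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

extern "C" {

// Hypothetical C-compatible slice: pointer plus byte length, no trailing '\0'.
typedef struct toy_slice {
  const char* data;
  size_t size;
} toy_slice;

// Copies a slice into a caller-owned buffer and null-terminates the copy.
// Returns the slice's byte length so the caller can detect truncation.
size_t toy_slice_copy(toy_slice s, char* out, size_t out_size) {
  assert(out != nullptr && out_size > 0);
  size_t n = s.size < out_size - 1 ? s.size : out_size - 1;
  std::memcpy(out, s.data, n);
  out[n] = '\0';
  return s.size;
}

}  // extern "C"

// Toy stand-in for an analyzer result: the first `bytes` bytes of `text`,
// returned as a view into the original buffer rather than a copy.
toy_slice toy_first_surface(const std::string& text, size_t bytes) {
  assert(bytes <= text.size());
  return toy_slice{text.data(), bytes};
}
```

A caller that treats `data` as a C string would read past the end of the slice, because the byte after it belongs to the rest of the input. A binding therefore has to either pass the length along or copy into a terminated buffer as `toy_slice_copy` does.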

@CaptainDario
Author

CaptainDario commented Feb 15, 2022

Thanks again for the reply.
I think the extern "C" is just to make Dart aware of the C++ code. The docs say that C++ code should work. There is also an OpenCV dart:ffi version, and its author wrote a blog post on how to use C++. Therefore it seems possible to use Juman++ via dart:ffi then. I think I will give it a shot and come back here to ask if there are any problems directly related to Juman++.

One more question: because download size matters on mobile, should I also expect a size of 300 MB?

@eiennohito
Contributor

eiennohito commented Feb 15, 2022

Yes, the model is pretty large; it is an unfortunate tradeoff with analysis accuracy here. I am not really sure that Juman++ is a good fit for mobile if analysis accuracy is not of utmost importance.

extern "C" only changes the symbol mangling for C++ symbols, and the Juman++ API surface consists of more than the simple functions shown in the Dart FFI example.
Also, OpenCV should have a stable C API regardless of the implementation language.
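
As a rough sketch of what a C API over a class-based C++ library usually involves beyond extern "C", the common approach is an opaque handle plus plain functions that forward to the object. Everything below is hypothetical: `ToyAnalyzer` and the `toy_analyzer_*` functions are invented for illustration, and the "analysis" just splits on spaces; it does not reflect the actual Juman++ classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical C++ class standing in for a stateful analyzer (not Juman++'s API).
class ToyAnalyzer {
 public:
  // "Analyzes" by replacing ASCII spaces with newlines: one token per line.
  const std::string& analyze(const std::string& input) {
    result_.clear();
    for (char c : input) result_ += (c == ' ') ? '\n' : c;
    result_ += '\n';
    return result_;
  }

 private:
  std::string result_;  // owns the last result so the C side can borrow it
};

extern "C" {

// Opaque handle: C callers only ever see a pointer to an incomplete type.
typedef struct toy_analyzer toy_analyzer;

toy_analyzer* toy_analyzer_new(void) {
  return reinterpret_cast<toy_analyzer*>(new ToyAnalyzer());
}

// Input is passed as pointer + length; the returned null-terminated string
// stays valid until the next call on the same handle or until destroy.
const char* toy_analyzer_analyze(toy_analyzer* h, const char* utf8, size_t len) {
  assert(h != nullptr && utf8 != nullptr);
  ToyAnalyzer* a = reinterpret_cast<ToyAnalyzer*>(h);
  return a->analyze(std::string(utf8, len)).c_str();
}

void toy_analyzer_destroy(toy_analyzer* h) {
  delete reinterpret_cast<ToyAnalyzer*>(h);
}

}  // extern "C"
```

Only the three `toy_analyzer_*` functions have unmangled names that an FFI can look up; the class itself stays hidden behind the handle. A real wrapper would also need to catch C++ exceptions before they cross the C boundary, which this sketch omits.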

@CaptainDario
Author

I think the size would not be too much of an issue as long as it does not cross the gigabyte mark.

That sounds discouraging, are there plans for a C-API?

@CaptainDario
Author

CaptainDario commented Feb 15, 2022

After further investigation, you are right: dart:ffi can only bind to C APIs.
As a few people are already asking for a C API, are there any plans for one in the near future?

@eiennohito
Contributor

Unfortunately, it is very low on my list of priorities; I probably won't work on it in the foreseeable future.

@CaptainDario
Author

That is sad to hear.
I am quite clueless about C++ programming, but would it be possible to have a binding/C API for just the main entry point? I basically want to use this library as an off-the-shelf component, as shown in the docs:

echo "魅力がたっぷりと詰まっている" | jumanpp

But if that also has some big hindrances, I will stick to MeCab.

@eiennohito
Contributor

The simplest entry point is something like https://github.com/eiennohito/jumanpp-t9/blob/master/src/jumanpp_t9.cc, and sure, a C API can be done. I don't think I will work on it in the near future, though. MeCab is probably your best bet, as it has a C API. If I implement a C API in the future, I will probably make it MeCab-compatible, because MeCab is the de facto standard.
