Design and document a stable C API #61

Open
DoumanAsh opened this issue Jan 4, 2018 · 15 comments

@DoumanAsh
Contributor

DoumanAsh commented Jan 4, 2018

For Juman++ to be widely usable, we want a documented, stable C API and an option to build a dynamically linked library.
That library should probably use -fvisibility=hidden with explicit visibility on exported symbols on Unix systems, and __declspec(dllimport/dllexport) on Windows.
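
A minimal sketch of such an export macro, for illustration only. The names JUMANPP_API, JUMANPP_BUILDING_DLL, and jumanpp_api_version are hypothetical and do not exist in Juman++; the build system is assumed to define JUMANPP_BUILDING_DLL only while compiling the library itself.

```cpp
// Hypothetical: normally passed by the build system (-DJUMANPP_BUILDING_DLL)
// when compiling the library, and left undefined for consumers.
#define JUMANPP_BUILDING_DLL

#if defined(_WIN32)
#  if defined(JUMANPP_BUILDING_DLL)
#    define JUMANPP_API __declspec(dllexport)
#  else
#    define JUMANPP_API __declspec(dllimport)
#  endif
#elif defined(__GNUC__) || defined(__clang__)
   // Overrides -fvisibility=hidden for this one symbol.
#  define JUMANPP_API __attribute__((visibility("default")))
#else
#  define JUMANPP_API
#endif

extern "C" {
// Exported: part of the public C API.
JUMANPP_API int jumanpp_api_version(void);
}

// Not annotated: stays hidden when compiled with -fvisibility=hidden.
int internal_helper() { return 1; }

extern "C" int jumanpp_api_version(void) { return internal_helper(); }
```

Only functions carrying the macro would appear in the shared library's symbol table; everything else, including bundled third-party code, would stay internal.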

The minimal API should be:

  1. Loading a model using a config file
  2. Analyzing a sentence
  3. Accessing the analysis top-1, top-N and dictionary data
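
The three steps above could look roughly like the following sketch. Every name here (jpp_analyzer, jpp_load, jpp_analyze, jpp_top1_size, jpp_top1_surface, jpp_free) is hypothetical, and the implementation is a stand-in that splits on ASCII spaces instead of running a real analysis. Top-N and dictionary access are omitted.

```cpp
#include <stddef.h>
#include <string>
#include <vector>

extern "C" {
typedef struct jpp_analyzer jpp_analyzer;  // opaque handle, layout hidden from C

// 1. Load a model using a config file; returns NULL on failure.
jpp_analyzer* jpp_load(const char* config_path);
// 2. Analyze one sentence given as pointer + length; returns 0 on success.
int jpp_analyze(jpp_analyzer* a, const char* text, size_t len);
// 3. Access the top-1 result: morpheme count and each surface string.
size_t jpp_top1_size(const jpp_analyzer* a);
const char* jpp_top1_surface(const jpp_analyzer* a, size_t i, size_t* len_out);
void jpp_free(jpp_analyzer* a);
}

// Stand-in state: a real library would hold the model and lattice here.
struct jpp_analyzer {
  std::string config;
  std::vector<std::string> top1;
};

extern "C" jpp_analyzer* jpp_load(const char* config_path) {
  if (config_path == nullptr) return nullptr;
  try {
    return new jpp_analyzer{config_path, {}};
  } catch (...) {
    return nullptr;  // never let an exception reach the C caller
  }
}

extern "C" int jpp_analyze(jpp_analyzer* a, const char* text, size_t len) {
  if (a == nullptr || text == nullptr) return -1;
  try {
    a->top1.clear();
    std::string cur;
    for (size_t i = 0; i < len; ++i) {
      if (text[i] != ' ') {
        cur.push_back(text[i]);
      } else if (!cur.empty()) {
        a->top1.push_back(cur);
        cur.clear();
      }
    }
    if (!cur.empty()) a->top1.push_back(cur);
    return 0;
  } catch (...) {
    return -1;
  }
}

extern "C" size_t jpp_top1_size(const jpp_analyzer* a) {
  return a == nullptr ? 0 : a->top1.size();
}

extern "C" const char* jpp_top1_surface(const jpp_analyzer* a, size_t i,
                                        size_t* len_out) {
  if (a == nullptr || len_out == nullptr || i >= a->top1.size()) return nullptr;
  *len_out = a->top1[i].size();
  return a->top1[i].data();  // valid until the next jpp_analyze or jpp_free
}

extern "C" void jpp_free(jpp_analyzer* a) { delete a; }
```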

Additional things (low priority):

  1. Access to the lattice?

No exceptions should cross the API boundary, because C callers cannot handle them.
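
One common way to enforce this is to wrap every exported function body in a catch-all guard that turns exceptions into status codes. The sketch below assumes hypothetical names throughout (jpp_status, guarded, parse_config, jpp_check_config, jpp_last_error); none of them come from Juman++.

```cpp
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

enum jpp_status { JPP_OK = 0, JPP_ERR_ALLOC = 1, JPP_ERR_FAILED = 2 };

namespace {
// Per-thread message for the most recent failure.
thread_local std::string last_error;

// Runs fn and converts any exception into a status code.
template <typename Fn>
int guarded(Fn&& fn) noexcept {
  try {
    fn();
    return JPP_OK;
  } catch (const std::bad_alloc&) {
    return JPP_ERR_ALLOC;  // do not allocate a message when out of memory
  } catch (const std::exception& e) {
    try { last_error = e.what(); } catch (...) {}
    return JPP_ERR_FAILED;
  } catch (...) {
    return JPP_ERR_FAILED;
  }
}

// Stand-in for internal C++ code that reports errors by throwing.
void parse_config(const char* path) {
  if (path == nullptr || *path == '\0') {
    throw std::invalid_argument("empty config path");
  }
}
}  // namespace

extern "C" int jpp_check_config(const char* path) {
  return guarded([&] { parse_config(path); });
}

extern "C" const char* jpp_last_error(void) { return last_error.c_str(); }
```

A C caller would then check the returned status and, on failure, read the message through jpp_last_error().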

One problem remains: what to do with strings.
Dictionary strings in Juman++ are never zero-terminated (they are length-prefixed). The API should probably provide non-zero-terminated strings by default (exposing some StringPiece-like C struct) along with an easy way to convert such strings to zero-terminated ones.
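
A minimal sketch of what such a struct and conversion helper could look like. The names jpp_string_piece and jpp_string_piece_to_cstr are hypothetical, and the choice of malloc/free for ownership is an assumption made for the example.

```cpp
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

extern "C" {
// A non-owning view of bytes that are NOT zero-terminated.
typedef struct jpp_string_piece {
  const char* data;  // points into library-owned memory
  size_t size;       // length in bytes
} jpp_string_piece;

// Copies the view into a new zero-terminated buffer.
// The caller releases it with free(). Returns NULL if allocation fails.
char* jpp_string_piece_to_cstr(jpp_string_piece sp);
}

extern "C" char* jpp_string_piece_to_cstr(jpp_string_piece sp) {
  char* out = static_cast<char*>(malloc(sp.size + 1));
  if (out == nullptr) return nullptr;
  if (sp.size != 0) memcpy(out, sp.data, sp.size);
  out[sp.size] = '\0';
  return out;
}
```

Callers that only need to print a piece can skip the copy with printf("%.*s", (int)sp.size, sp.data).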

@eiennohito
Contributor

YES! I need to do this! Definitely!

But right now I invite you to look into the main launcher https://github.com/ku-nlp/jumanpp/blob/master/src/jumandic/main/jumanpp.cc

@DoumanAsh
Contributor Author

Yes, that's a good starting point, though since I'm on Windows I currently cannot use jumanpp because of #31.

It did give me the idea to write a cross-platform mmap for C++, though.

@eiennohito
Contributor

eiennohito commented Jan 4, 2018

Yes, if you can work on that, it would be wonderful.
The C++ code itself should be compilable with MSVC 2017 (I haven't tried), or definitely with the Windows builds of clang/mingw64.

There are three pieces of Unix API that I use: mmap (the wrapper is designed to be easy to implement for Windows, though), posix_memalign (in util/memory; on Linux I use it with 2 MB alignment and madvise to force transparent huge pages), and unlink in tests.
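
For readers unfamiliar with the posix_memalign plus madvise combination, here is a minimal POSIX/Linux sketch. It is not the code from util/memory; alloc_huge_aligned and is_2mb_aligned are hypothetical helper names, and MADV_HUGEPAGE is only a hint that the kernel may ignore.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

static const size_t kHugePage = (size_t)2 << 20;  // 2 MiB

// Allocates at least `bytes`, aligned to 2 MiB and rounded up to whole
// 2 MiB units. Release with free(). Returns NULL on failure.
void* alloc_huge_aligned(size_t bytes) {
  if (bytes == 0) bytes = 1;
  size_t rounded = (bytes + kHugePage - 1) / kHugePage * kHugePage;
  void* p = nullptr;
  if (posix_memalign(&p, kHugePage, rounded) != 0) return nullptr;
#ifdef MADV_HUGEPAGE
  // Ask the kernel to back this range with transparent huge pages.
  // Failure is not fatal: the memory is still usable with normal pages.
  (void)madvise(p, rounded, MADV_HUGEPAGE);
#endif
  return p;
}

bool is_2mb_aligned(const void* p) {
  return reinterpret_cast<uintptr_t>(p) % kHugePage == 0;
}
```

On Windows, a port would need a different primitive, since neither posix_memalign nor madvise is available there.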

@eiennohito
Contributor

The other thing is that, on Windows, filesystem access should use the W versions of the API (converting paths from/to UTF-8).
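
A minimal sketch of that idea for opening files: keep UTF-8 paths everywhere and convert to UTF-16 only at the Windows call site. The helper name open_utf8 is hypothetical; MultiByteToWideChar and _wfopen are the standard Win32/CRT functions.

```cpp
#include <cstdio>
#ifdef _WIN32
#  include <windows.h>
#  include <string>
#endif

// Opens a file whose path is UTF-8 encoded on every platform.
// Returns NULL on failure, like fopen.
std::FILE* open_utf8(const char* utf8_path, const char* mode) {
#ifdef _WIN32
  // Converts a zero-terminated UTF-8 string to UTF-16.
  auto widen = [](const char* s) -> std::wstring {
    // With length -1 the returned count includes the terminating zero.
    int n = MultiByteToWideChar(CP_UTF8, 0, s, -1, nullptr, 0);
    if (n <= 0) return std::wstring();
    std::wstring w(static_cast<size_t>(n), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, s, -1, &w[0], n);
    w.resize(static_cast<size_t>(n) - 1);  // drop the terminator
    return w;
  };
  std::wstring wpath = widen(utf8_path);
  std::wstring wmode = widen(mode);
  if (wpath.empty() || wmode.empty()) return nullptr;
  return _wfopen(wpath.c_str(), wmode.c_str());
#else
  // POSIX file APIs take bytes, so UTF-8 paths pass through unchanged.
  return std::fopen(utf8_path, mode);
#endif
}
```

If a helper like this were exposed through a C API, it would also need a guard against std::wstring allocation failures.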

eiennohito changed the title from "Documentation on library interface?" to "Design and document a stable C API" on Mar 23, 2018
@eiennohito
Contributor

eiennohito commented Mar 23, 2018

@DoumanAsh @kou what do you think?
I've repurposed this issue to better reflect what actually needs to be done.

@DoumanAsh
Contributor Author

@eiennohito Shouldn't it then just be about the API in general? After all, there will also be a C++ API? 🤔

@eiennohito
Contributor

eiennohito commented Mar 23, 2018

The C++ API won't have stability guarantees (I will try not to break it much, though) and won't support dynamic linking or systemwide installation as a library (basically, the current status quo).

@DoumanAsh
Contributor Author

Ah, that's understandable, of course.

I guess the first thing we need to do is take a look at the existing public API and see how it is meant to be used.
I understand the goal of mimicking the MeCab API, but I think it wouldn't hurt to do our own thing if that makes for a better API.

I've been somewhat busy lately, but I'll try to take a look at the existing API later on and give some input and suggestions on it.

@kou

kou commented Mar 23, 2018

> C++ API won't support dynamic linking/systemwide installation as a library (basically, the current status-quo).

I'm a bit confused.
Does it mean that the C API will support dynamic linking but the C++ API won't? If so, will the C API use the C++ API via static linking?

@eiennohito
Contributor

eiennohito commented Mar 23, 2018

Yes, the C API will support consuming Juman++ as a dynamic object. I plan to build it with a slightly different set of compile flags.

In theory it is possible to build even the current version of Juman++ with -fPIC and link everything as dynamic libraries, and things will probably even work, but there could be a lot of not-very-fun problems with C++, dynamic linking, initialization order of globals, and other runtime stuff.
E.g., exposing C++ classes across DLL boundaries on Windows can lead to serious problems.

The second problem is symbol visibility. We don't want to export our versions of Eigen and fmtlib to the outside world, because that would confuse other software using its own version of Eigen. On Unix systems we should use -fvisibility=hidden for that and explicitly define the exported symbols.

The C API library will cleanly solve these problems by providing a single shared object with only the C API marked as visible and all C++ state contained inside that object.

@kou

kou commented Mar 23, 2018

Thanks. I understand your explanation.

@kou

kou commented Mar 24, 2018

I'll try creating a C API and send feedback to Juman++ in a few months.

@eiennohito
Contributor

I may start on that earlier, but I'll post here when I begin.

@eiennohito
Contributor

eiennohito commented Jun 12, 2018

Nothing about the C API yet, but I am writing a small tutorial on "How to use Juman++ to create your own morphological analyzers".

https://github.com/eiennohito/jumanpp-t9

@kou

kou commented Dec 28, 2018

Sorry, I couldn't work on this this year...
