If one takes a look at the definition of is_possible_narrative_text it seems that a quick temporary solution would be to at least use language_checks in line 90 such that it instead becomes:
if "eng" in languages and language_checks and (sentence_count(text, 3) < 2) and (not contains_verb(text)):
To Reproduce
from unstructured.partition.text_type import is_possible_narrative_text
text = "Dette er et eksempel på en kort sætning."
is_possible_narrative_text(text)
which returns False right now. With the above quick-fix, it would return True as expected.
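To make the effect of the extra language_checks condition easier to see without installing the library, here is a minimal, self-contained sketch. It is not the library's implementation: narrative_gate, sentence_count_stub, contains_verb_stub and the tiny verb list are hypothetical stand-ins I made up for illustration, and only the shape of the condition is taken from the line quoted above.

```python
from typing import Sequence

# Hypothetical, tiny English-only verb list standing in for a real verb check.
ENGLISH_VERBS = {"is", "are", "was", "were", "has", "have"}


def contains_verb_stub(text: str) -> bool:
    """Toy English-only verb check: True if any token is in ENGLISH_VERBS."""
    return any(tok.strip(".,!?").lower() in ENGLISH_VERBS for tok in text.split())


def sentence_count_stub(text: str) -> int:
    """Toy sentence counter: number of non-empty chunks after splitting on '.'."""
    return sum(1 for chunk in text.split(".") if chunk.strip())


def narrative_gate(
    text: str,
    languages: Sequence[str] = ("eng",),
    language_checks: bool = False,
    patched: bool = False,
) -> bool:
    """Sketch of the condition under discussion (assumed shape, not library code).

    With patched=False the English-only heuristic runs whenever "eng" is in
    languages. With patched=True it also requires language_checks to be set.
    """
    english_gate = "eng" in languages
    if patched:
        english_gate = english_gate and language_checks
    if english_gate and sentence_count_stub(text) < 2 and not contains_verb_stub(text):
        # A single short sentence with no detected English verb is rejected.
        return False
    return True


danish = "Dette er et eksempel på en kort sætning."
print(narrative_gate(danish))                # English heuristic rejects the Danish text
print(narrative_gate(danish, patched=True))  # heuristic skipped, text accepted
```

In this sketch the Danish sentence is rejected only because the English-only verb check finds no verb; once the check is gated behind language_checks, the default call no longer applies it.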
Hi @cm-halfspace - thanks for reporting this! We'll look at this as soon as we can, or happy to review if you want to open a PR with your suggested change.
Describe the bug
When I partition a Danish .docx file I notice some weird classifications of the element types.
I think this is related to the fact that the languages list is not being set in _parse_paragraph_text_for_element_type, e.g. in the call is_possible_narrative_text(text).