ZeroDivisionError: division by zero in _calc_discounted_normalised_rank #213

sumitkumarjethani · 2022-04-11T11:47:28Z

Hi,

I use this library together with spaCy to extract the most important words. However, when using spaCy's Catalan model, the algorithm raises the following error:

`File "/code/app.py", line 20, in getNlpEntities
    entities = runTextRankEntities(hl, contents['contents'], algorithm, num)
File "/code/nlp/textRankEntities.py", line 51, in runTextRankEntities
    doc = nlp(joined_content)
File "/usr/local/lib/python3.9/site-packages/spacy/language.py", line 1022, in __call__
    error_handler(name, proc, [doc], e)
File "/usr/local/lib/python3.9/site-packages/spacy/util.py", line 1617, in raise_error
    raise e
File "/usr/local/lib/python3.9/site-packages/spacy/language.py", line 1017, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))  # type: ignore[call-arg]
File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 253, in __call__
    doc._.phrases = doc._.textrank.calc_textrank()
File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 363, in calc_textrank
    nc_phrases = self._collect_phrases(self.doc.noun_chunks, self.ranks)
File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 548, in _collect_phrases
    return {
File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 549, in
    span: self._calc_discounted_normalised_rank(span, sum_rank)
File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 592, in _calc_discounted_normalised_rank
    phrase_rank = math.sqrt(sum_rank / (len(span) + non_lemma))
ZeroDivisionError: division by zero`
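
For context, the last frame of the traceback divides by `len(span) + non_lemma`, so the error can only occur when that sum is zero, for example for an empty span with no non-lemma tokens. Why such a span would appear with the Catalan model is not established in this report. The following is only a minimal sketch of a guard around that expression: the helper name `discounted_normalised_rank` and the choice to return `0.0` are assumptions for illustration, not pytextrank's actual code or fix.

```python
import math


def discounted_normalised_rank(sum_rank: float, span_len: int, non_lemma: int) -> float:
    """Evaluate the expression from the failing frame, guarding the denominator.

    Hypothetical helper for illustration; not part of pytextrank's API.
    """
    denominator = span_len + non_lemma
    if denominator == 0:
        # Assumption: a span with nothing to count contributes no rank.
        return 0.0
    return math.sqrt(sum_rank / denominator)
```

With this guard, a zero denominator yields a rank of `0.0` instead of an exception. Whether such spans should instead be filtered out earlier is a separate design question.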


ceteri · 2022-04-11T16:51:31Z

Hi @sumitkumarjethani, thank you for this report. Let's get it fixed!

Could you please provide:

- the code for app.py, or at least the body of the runTextRankEntities() function
- example data in which the exception occurs
- how spaCy and the Catalan model were installed
- the versions used for spaCy and the Catalan language model
- your operating system and version

Many thanks!
Paco

sumitkumarjethani · 2022-04-12T12:34:02Z

Yeah sure!

Code used for execution: The original code is quite modular, so I am providing a simplified version that should be runnable locally (don't panic if it doesn't work, as I wrote it directly on GitHub).

import logging

import spacy
import pytextrank  # importing pytextrank registers the "textrank" pipeline factory

logger = logging.getLogger(__name__)


def getTextRankEntities(doc):
    """Return the TextRank entities (ranked phrases) found in doc."""
    entities = []

    for phrase in doc._.phrases:
        phrase_dict = {}

        phrase_dict['entitie'] = phrase.text
        phrase_dict['score'] = phrase.rank
        phrase_dict['n_gram'] = len(phrase.text.split())
        phrase_dict['count'] = phrase.count

        entities.append(phrase_dict)
    return entities


def runTextRankEntities(content):
    """Main function to run TextRank entity extraction."""
    # Put the path or name of the Catalan pipeline here.
    nlp = spacy.load("models/ca_core_news_lg-3.2.0/ca_core_news_lg/ca_core_news_lg-3.2.0")
    nlp.add_pipe("textrank")

    logger.info("Extracting entities with textrank algorithm")
    doc = nlp(content)
    entities = getTextRankEntities(doc)
    logger.info("Entities extracted")
    return entities

As for example data in which the exception occurs, I am afraid I cannot provide it. However, you can create a string of Catalan text and pass it to runTextRankEntities(content).
spaCy installation: pip install spacy
Catalan model installation: downloaded with wget from the spacy-models repo: https://github.com/explosion/spacy-models/releases/download/ca_core_news_lg-3.2.0/ca_core_news_lg-3.2.0.tar.gz
spaCy version: 3.2.3 | Catalan language model version: 3.2.0
OS: Windows 10 Home
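
Since the original data cannot be shared, one way to arrive at a shareable example is to run candidate texts through the pipeline one at a time and keep only those that raise the error. The sketch below assumes `nlp` is any callable that processes a string (such as the loaded spaCy pipeline); the function name `find_failing_texts` is hypothetical and not part of spaCy or pytextrank.

```python
from typing import Callable, Iterable, List


def find_failing_texts(nlp: Callable[[str], object], texts: Iterable[str]) -> List[str]:
    """Return the texts for which calling nlp raises ZeroDivisionError.

    Hypothetical helper for narrowing down a reproducible example.
    """
    failing = []
    for text in texts:
        try:
            nlp(text)
        except ZeroDivisionError:
            # Keep the input so it can be inspected or shortened later.
            failing.append(text)
    return failing
```

Any text this returns could then be shortened by hand until the smallest failing snippet remains, which may be easier to share than the full dataset.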

If you need anything else, please let me know and I will respond as soon as possible.

Thank you very much

ceteri self-assigned this Apr 11, 2022

ceteri added the bug label Apr 11, 2022

ceteri added this to In progress in pytextrank Apr 11, 2022

ceteri added the help wanted and good first issue labels Jul 25, 2022
