ZeroDivisionError: division by zero in _calc_discounted_normalised_rank #213

Open
sumitkumarjethani opened this issue Apr 11, 2022 · 2 comments
@sumitkumarjethani

Hi,

I use this library together with spaCy to extract the most important words. However, when using spaCy's Catalan model, the algorithm fails with the following error:

`File "/code/app.py", line 20, in getNlpEntities
    entities = runTextRankEntities(hl, contents['contents'], algorithm, num)
  File "/code/nlp/textRankEntities.py", line 51, in runTextRankEntities
    doc = nlp(joined_content)
  File "/usr/local/lib/python3.9/site-packages/spacy/language.py", line 1022, in __call__
    error_handler(name, proc, [doc], e)
  File "/usr/local/lib/python3.9/site-packages/spacy/util.py", line 1617, in raise_error
    raise e
  File "/usr/local/lib/python3.9/site-packages/spacy/language.py", line 1017, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))  # type: ignore[call-arg]
  File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 253, in __call__
    doc._.phrases = doc._.textrank.calc_textrank()
  File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 363, in calc_textrank
    nc_phrases = self._collect_phrases(self.doc.noun_chunks, self.ranks)
  File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 548, in _collect_phrases
    return {
  File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 549, in <dictcomp>
    span: self._calc_discounted_normalised_rank(span, sum_rank)
  File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 592, in _calc_discounted_normalised_rank
    phrase_rank = math.sqrt(sum_rank / (len(span) + non_lemma))
ZeroDivisionError: division by zero`
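
The last frame shows the failing expression: the denominator `len(span) + non_lemma` must be zero, which can only happen when the span has no tokens and `non_lemma` is also zero (assuming `non_lemma` is a non-negative count). Below is a minimal sketch of a guarded version of that calculation. The name `safe_phrase_rank` and its parameters are hypothetical stand-ins for illustration, not pytextrank's actual API, and returning 0.0 for an empty span is only one possible choice.

```python
import math


def safe_phrase_rank(sum_rank: float, span_len: int, non_lemma: int) -> float:
    """Hypothetical guarded form of sqrt(sum_rank / (span_len + non_lemma))."""
    denominator = span_len + non_lemma
    if denominator == 0:
        # Empty span with nothing discounted: no meaningful rank, so return 0.0
        return 0.0
    return math.sqrt(sum_rank / denominator)


print(safe_phrase_rank(0.5, 2, 0))  # 0.5
print(safe_phrase_rank(0.3, 0, 0))  # 0.0
```

Skipping empty spans before they reach the calculation would be an equally valid design; the sketch only illustrates where the zero denominator comes from.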

@ceteri ceteri self-assigned this Apr 11, 2022
@ceteri ceteri added the bug label Apr 11, 2022
@ceteri ceteri added this to In progress in pytextrank Apr 11, 2022
@ceteri
Collaborator

ceteri commented Apr 11, 2022

Hi @sumitkumarjethani, thank you for this report. Let's get it fixed!

Could you please provide:

  • the code for app.py, or at least the body of the runTextRankEntities() function
  • example data in which the exception occurs
  • how spaCy and the Catalan model were installed
  • versions used for spaCy and the Catalan language model
  • your operating system and version

Many thanks!
Paco
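
One way to gather the version details requested above is a short script like the following. It is only a sketch: the function name `collect_versions` is hypothetical, and the distribution names passed to it are assumptions. A model loaded from a local directory rather than installed with pip will show up as "not installed".

```python
import platform
from importlib import metadata


def collect_versions(packages=("spacy", "pytextrank", "ca_core_news_lg")):
    """Return a dict of environment details useful for a bug report."""
    report = {
        "python": platform.python_version(),
        "os": f"{platform.system()} {platform.release()}",
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # No installed distribution was found under this name
            report[name] = "not installed"
    return report


for key, value in collect_versions().items():
    print(f"{key}: {value}")
```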

@sumitkumarjethani
Author

Yeah sure!

  1. Code used for execution: The original code is quite modular, so I am providing a close approximation that can be run locally (don't panic if it doesn't work, as I wrote it directly on GitHub).

import logging

import spacy
import pytextrank  # noqa: F401  (importing registers the "textrank" pipeline factory)

logger = logging.getLogger(__name__)


def getTextRankEntities(doc):
    """Returns TextRank entities."""
    entities = []

    for phrase in doc._.phrases:
        phrase_dict = {}

        phrase_dict['entitie'] = phrase.text
        phrase_dict['score'] = phrase.rank
        phrase_dict['n_gram'] = len(phrase.text.split())
        phrase_dict['count'] = phrase.count

        entities.append(phrase_dict)
    return entities


def runTextRankEntities(content):
    """Main function to run TextRank entities."""
    # Put the path or name of your Catalan pipeline here.
    nlp = spacy.load("models/ca_core_news_lg-3.2.0/ca_core_news_lg/ca_core_news_lg-3.2.0")
    nlp.add_pipe("textrank")

    logger.info("Extracting entities with textrank algorithm")
    doc = nlp(content)
    entities = getTextRankEntities(doc)
    logger.info("Entities extracted")
    return entities

  2. Example data: I am afraid I cannot provide the data in which the exception occurs. However, you can create a string with text in Catalan and pass it to the function runTextRankEntities(content).
  3. spaCy was installed with the following command: pip install spacy
  4. The spaCy Catalan model was downloaded with wget from the repo: https://github.com/explosion/spacy-models/releases/download/ca_core_news_lg-3.2.0/ca_core_news_lg-3.2.0.tar.gz
  5. spaCy version: 3.2.3 | spaCy Catalan language model version: 3.2.0
  6. OS: Windows 10 Home
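
Since the sample text cannot be shared, a small diagnostic can help narrow things down locally. The sketch below rests on an unconfirmed assumption: that the Catalan pipeline yields zero-length noun chunks, which would make `len(span)` zero in the failing expression. The helper `find_empty_spans` is hypothetical and works on anything that supports `len()`.

```python
def find_empty_spans(spans):
    """Return (index, span) pairs for spans that contain no tokens."""
    return [(i, span) for i, span in enumerate(spans) if len(span) == 0]


# Stand-in data: plain lists play the role of spaCy spans here.
sample = [["la", "casa"], [], ["un", "gat", "negre"]]
print(find_empty_spans(sample))  # [(1, [])]
```

With the Catalan pipeline loaded without the "textrank" component, calling `find_empty_spans(list(doc.noun_chunks))` on the failing text would show whether any empty chunks are produced.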

If you need anything else, please let me know and I will respond as soon as possible.

Thank you very much

@ceteri ceteri added the help wanted and good first issue labels Jul 25, 2022