
BLEURT is failing to produce results #119

Open

Santhanreddy71 opened this issue Oct 12, 2022 · 4 comments

Santhanreddy71 commented Oct 12, 2022

I was trying the BLEURT example from the README file, but it fails with an error. Please let me know what the issue is.

Error :

ImportError                               Traceback (most recent call last)
<ipython-input-16-ed14e2ab4c7e> in <module>
----> 1 bleurt = Bleurt.construct()
      2 score = bleurt.compute(predictions=predictions, references=references)

~\anaconda3\lib\site-packages\jury\metrics\_core\auxiliary.py in construct(cls, task, resulting_name, compute_kwargs, **kwargs)
     99         subclass = cls._get_subclass()
    100         resulting_name = resulting_name or cls._get_path()
--> 101         return subclass._construct(resulting_name=resulting_name, compute_kwargs=compute_kwargs, **kwargs)
    102 
    103     @classmethod

~\anaconda3\lib\site-packages\jury\metrics\_core\base.py in _construct(cls, resulting_name, compute_kwargs, **kwargs)
    235         cls, resulting_name: Optional[str] = None, compute_kwargs: Optional[Dict[str, Any]] = None, **kwargs
    236     ):
--> 237         return cls(resulting_name=resulting_name, compute_kwargs=compute_kwargs, **kwargs)
    238 
    239     @staticmethod

~\anaconda3\lib\site-packages\jury\metrics\_core\base.py in __init__(self, resulting_name, compute_kwargs, **kwargs)
    220     def __init__(self, resulting_name: Optional[str] = None, compute_kwargs: Optional[Dict[str, Any]] = None, **kwargs):
    221         compute_kwargs = self._validate_compute_kwargs(compute_kwargs)
--> 222         super().__init__(task=self._task, resulting_name=resulting_name, compute_kwargs=compute_kwargs, **kwargs)
    223 
    224     def _validate_compute_kwargs(self, compute_kwargs: Dict[str, Any]) -> Dict[str, Any]:

~\anaconda3\lib\site-packages\jury\metrics\_core\base.py in __init__(self, task, resulting_name, compute_kwargs, config_name, keep_in_memory, cache_dir, num_process, process_id, seed, experiment_id, max_concurrent_cache_files, timeout, **kwargs)
    100         self.resulting_name = resulting_name if resulting_name is not None else self.name
    101         self.compute_kwargs = compute_kwargs or {}
--> 102         self.download_and_prepare()
    103 
    104     @abstractmethod

~\anaconda3\lib\site-packages\evaluate\module.py in download_and_prepare(self, download_config, dl_manager)
    649             )
    650 
--> 651         self._download_and_prepare(dl_manager)
    652 
    653     def _download_and_prepare(self, dl_manager):

~\anaconda3\lib\site-packages\jury\metrics\bleurt\bleurt_for_language_generation.py in _download_and_prepare(self, dl_manager)
    120         global bleurt
    121         try:
--> 122             from bleurt import score
    123         except ModuleNotFoundError:
    124             raise ModuleNotFoundError(

ImportError: cannot import name 'score' from 'bleurt' (unknown location)
@devrimcavusoglu (Member)

Hi @Santhanreddy71. Can you provide your jury version and your OS info? Can you also verify that you have the bleurt package installed? Jury uses lazy package importing to reduce the dependency load, so to use some metrics the related packages must be installed first, e.g. for bleurt. You can install bleurt as follows:

pip install git+https://github.com/devrimcavusoglu/bleurt.git

Normally this is stated in the error message for metrics that need an additional package. If this does not solve your problem, try with the latest version, and comment again if it remains unsolved.
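As a quick sanity check (a sketch on my part, not from the original thread), you can verify that the lazily imported dependency is actually available before constructing the metric:

# Sanity check: this is the exact import jury performs lazily for BLEURT
from bleurt import score

import jury
bleurt = jury.load_metric("bleurt")  # should now construct without the ImportError above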

Santhanreddy71 commented Oct 13, 2022

Thanks for writing back.
I am working on Windows and the jury version is jury==2.2.2. I installed bleurt from the link you mentioned and checked.
That solved the import error, but now I am facing another issue with bleurt.

Error message:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-4-e0b245a06c35> in <module>
      2 references = [["the cat is playing on the mat.", "The cat plays on the mat."]]
      3 bleurt = jury.load_metric("bleurt", config_name="bleurt-tiny-512")
----> 4 results = bleurt.compute(predictions=predictions, references=references)
      5 print(results)

~\anaconda3\lib\site-packages\evaluate\module.py in compute(self, predictions, references, **kwargs)
    442             inputs = {input_name: self.data[input_name] for input_name in self._feature_names()}
    443             with temp_seed(self.seed):
--> 444                 output = self._compute(**inputs, **compute_kwargs)
    445 
    446             if self.buf_writer is not None:

~\anaconda3\lib\site-packages\jury\metrics\_core\base.py in _compute(self, predictions, references, **kwargs)
    320         eval_params.pop("reduce_fn")
    321         predictions, references = Collator(predictions), Collator(references)
--> 322         result = self.evaluate(predictions=predictions, references=references, reduce_fn=reduce_fn, **eval_params)
    323         return {self.resulting_name: result}
    324 

~\anaconda3\lib\site-packages\jury\metrics\_core\base.py in evaluate(self, predictions, references, **kwargs)
    274         else:
    275             eval_fn = self._compute_multi_pred_multi_ref
--> 276         return eval_fn(predictions=predictions, references=references, **kwargs)
    277 
    278 

~\anaconda3\lib\site-packages\jury\metrics\bleurt\bleurt_for_language_generation.py in _compute_multi_pred_multi_ref(self, predictions, references, reduce_fn, **kwargs)
    191             for pred in preds:
    192                 pred = [pred] * len(refs)
--> 193                 pred_score = self.scorer.score(references=refs, candidates=pred)
    194                 pred_scores.append(reduce_fn(pred_score))
    195             reduced_score = float(reduce_fn(pred_scores))

~\AppData\Roaming\Python\Python38\site-packages\bleurt\score.py in score(self, references, candidates, batch_size, *args)
    213           "segment_ids": segment_ids
    214       }
--> 215       predict_out = self._predictor.predict(tf_input)
    216       batch_results = predict_out.tolist()
    217       all_results.extend(batch_results)

~\AppData\Roaming\Python\Python38\site-packages\bleurt\score.py in predict(self, input_dict)
     65 
     66   def predict(self, input_dict):
---> 67     predictions = self._bleurt_model_ops(
     68         input_ids=tf.constant(input_dict["input_ids"]),
     69         input_mask=tf.constant(input_dict["input_mask"]),

~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in __call__(self, *args, **kwargs)
   1705       TypeError: If the arguments do not match the function's signature.
   1706     """
-> 1707     return self._call_impl(args, kwargs)
   1708 
   1709   def _call_impl(self, args, kwargs, cancellation_manager=None):

~\anaconda3\lib\site-packages\tensorflow\python\eager\wrap_function.py in _call_impl(self, args, kwargs, cancellation_manager)
    244       return self._call_flat(args, self.captured_inputs)
    245     else:
--> 246       return super(WrappedFunction, self)._call_impl(
    247           args, kwargs, cancellation_manager)
    248 

~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in _call_impl(self, args, kwargs, cancellation_manager)
   1723             raise structured_err
   1724 
-> 1725       return self._call_with_flat_signature(args, kwargs, cancellation_manager)
   1726 
   1727   def _call_with_flat_signature(self, args, kwargs, cancellation_manager):

~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in _call_with_flat_signature(self, args, kwargs, cancellation_manager)
   1772                         "got {} ({})".format(self._flat_signature_summary(), i,
   1773                                              type(arg).__name__, str(arg)))
-> 1774     return self._call_flat(args, self.captured_inputs, cancellation_manager)
   1775 
   1776   def _call_with_structured_signature(self, args, kwargs, cancellation_manager):

~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1961         and executing_eagerly):
   1962       # No tape is watching; skip to running the function.
-> 1963       return self._build_call_outputs(self._inference_function.call(
   1964           ctx, args, cancellation_manager=cancellation_manager))
   1965     forward_backward = self._select_forward_and_backward_functions(

~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
    589       with _InterpolateFunctionError(self):
    590         if cancellation_manager is None:
--> 591           outputs = execute.execute(
    592               str(self.signature.name),
    593               num_outputs=self._num_outputs,

~\anaconda3\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     57   try:
     58     ctx.ensure_initialized()
---> 59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:

InvalidArgumentError: cannot compute __inference_pruned_4212 as input #0(zero-based) was expected to be a int64 tensor but is a int32 tensor [Op:__inference_pruned_4212]

@devrimcavusoglu (Member)

Hi @Santhanreddy71, I just tried BLEURT on a Windows machine, and indeed I got the same InvalidArgumentError you mentioned. The cause seems to be the TF version/installation on Windows or a new commit to the bleurt repo. I'll investigate the issue and try to fix it ASAP. Thanks for the heads up! 👍
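For illustration only (my guess at the mechanism, not a confirmed diagnosis): on Windows, NumPy's default integer dtype is int32 rather than int64, so the id/mask arrays fed into the exported BLEURT graph can end up as int32 tensors while the graph signature expects int64. A minimal standalone sketch of that dtype behaviour:

# Hypothetical illustration of the Windows int32/int64 mismatch
import numpy as np
import tensorflow as tf

ids = np.array([[101, 2023, 102]])     # default int dtype: int32 on Windows, int64 on Linux
print(ids.dtype)
print(tf.constant(ids).dtype)          # tf.constant keeps the NumPy dtype

ids64 = ids.astype(np.int64)           # an explicit cast would match an int64 graph signature
print(tf.constant(ids64).dtype)

If that is indeed the cause, casting the scorer inputs to int64 (either in the bleurt fork or before they reach the predictor) could be one local workaround, but that is untested here.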


devrimcavusoglu commented Oct 16, 2022

@Santhanreddy71 Btw, in the meantime you can use a Unix-like OS to compute BLEURT, or you can use Colab to compute the metrics; try the following on Google Colab if you want.

# Cell 1
!pip install jury
!pip install git+https://github.com/devrimcavusoglu/bleurt.git

# Cell 2
import jury

mt_predictions = [
    ["the cat is on the mat", "There is cat playing on the mat"], 
    ["Look! a wonderful day."]
]
mt_references = [
    ["the cat is playing on the mat.", "The cat plays on the mat."],
    ["Today is a wonderful day", "The weather outside is wonderful."],
]

bleurt = jury.load_metric("bleurt")
bleurt.compute(predictions=mt_predictions, references=mt_references)
>>> {'bleurt': {'score': -0.37700408697128296,
  'scores': [0.2734588384628296, -1.0274670124053955],
  'checkpoint': 'bleurt-base-128'}}

NOTE: After executing Cell 1 (the installations), you may need to restart the runtime ("Runtime > Restart Runtime"); after restarting, you can execute Cell 2 directly without running Cell 1 again.
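If the default bleurt-base-128 checkpoint is too heavy for the Colab runtime, the config_name argument from the earlier snippet in this thread should work the same way there (untested on my side, so treat it as a sketch), reusing mt_predictions and mt_references from Cell 2:

# Sketch: pick a smaller BLEURT checkpoint via config_name
bleurt_tiny = jury.load_metric("bleurt", config_name="bleurt-tiny-512")
print(bleurt_tiny.compute(predictions=mt_predictions, references=mt_references))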
