Pedro Scocco edited this page Aug 26, 2015 · 22 revisions

Context

Some metrics, such as code similarity, are not defined for every file in a given source code repository. They exist only to point out issues in specific regions of the code. This behaviour breaks Mezuro's aggregation and value interpretation features, so it is a special case that must be handled accordingly.

Example

One example of this is the Flay metric, which points out code duplications in Ruby. Its YAML output from MetricFu has the following structure:

:flay:
  :total_score: '445'
  :matches:
  - :reason: 1) Similar code found in :defn (mass = 72)
    :matches:
    - :name: app/controllers/projects_controller.rb
      :line: '18'
    - :name: app/controllers/repositories_controller.rb
      :line: '29'
  - :reason: 2) Similar code found in :defn (mass = 70)
    :matches:
    - :name: app/controllers/projects_controller.rb
      :line: '30'
    - :name: app/controllers/repositories_controller.rb
      :line: '41'

In summary we have:

  • A list of reasons
    • The reason attribute contains a message about the issue
      • The message may say whether the duplicated code is merely similar or identical. Similar code is less severe than identical code.
      • The message also carries a mass value, a numerical quantification of the duplication severity.
    • Each reason has a matches attribute, which is a list of entries containing:
      • file name
      • line
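As a minimal sketch of how this structure could be flattened into one record per match, the following Ruby reads the example output above. The Hotspot struct, its field names, and the parse_flay helper are hypothetical and only serve the illustration; they are not part of Mezuro or MetricFu. The keys in the output are Ruby symbols, so the sketch explicitly permits Symbol when loading.

```ruby
require 'yaml'

# Hypothetical value object for illustration; not Mezuro's actual schema.
Hotspot = Struct.new(:file, :line, :message, :mass, :group)

# The example output shown above, embedded so the sketch is self-contained.
FLAY_YAML = <<~YAML
  :flay:
    :total_score: '445'
    :matches:
    - :reason: 1) Similar code found in :defn (mass = 72)
      :matches:
      - :name: app/controllers/projects_controller.rb
        :line: '18'
      - :name: app/controllers/repositories_controller.rb
        :line: '29'
    - :reason: 2) Similar code found in :defn (mass = 70)
      :matches:
      - :name: app/controllers/projects_controller.rb
        :line: '30'
      - :name: app/controllers/repositories_controller.rb
        :line: '41'
YAML

# Hypothetical helper: returns one Hotspot per match, tagged with the
# index of the reason it came from so related matches stay together.
def parse_flay(yaml_text)
  data = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
  data[:flay][:matches].each_with_index.flat_map do |reason, group|
    message = reason[:reason].to_s
    # The mass is only available inside the message text.
    mass = message[/mass = (\d+)/, 1].to_i
    reason[:matches].map do |match|
      Hotspot.new(match[:name], match[:line].to_i, message, mass, group)
    end
  end
end

hotspots = parse_flay(FLAY_YAML)
hotspots.each { |h| puts "#{h.file}:#{h.line} (mass #{h.mass}, group #{h.group})" }
```

Each record keeps the line and the message, and the group index records which matches were reported together.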

Requirements

Looking at this output, we can derive the following requirements, which are not yet supported:

  • Additional attributes
    • line
    • message
  • One result may relate to another

Some warnings:

  • One file may have many relations to other files and to itself.

First solution

We separate this kind of metric from the others, splitting the results into two distinct sets, as illustrated below:

Hotspot UML and ER

MetricResult becomes a superclass that is extended by:

  • TreeResult, which keeps the default behaviour that MetricResult already has
  • HotspotResult, which holds the new attributes line (an integer) and message (a text)

To represent the relations between HotspotResults, there will be an associative class, RelatedHotspotResults, which is a many-to-many (n by n) relation in the relational model.

With this, we are not going to aggregate or interpret HotspotResults. They only hold the information needed to list them.
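The hierarchy described above can be sketched in plain Ruby as follows. This is an illustration under assumptions, not the actual models: the real classes would be persisted, and the metric_name and value attributes, the aggregatable? predicate, and the related_to helper are hypothetical names introduced here.

```ruby
# Plain-Ruby stand-ins for the models; persistence is left out.
class MetricResult
  attr_reader :metric_name

  def initialize(metric_name)
    @metric_name = metric_name
  end

  # Hypothetical predicate: results are aggregated by default.
  def aggregatable?
    true
  end
end

# Keeps the behaviour MetricResult already had.
class TreeResult < MetricResult
  attr_reader :value

  def initialize(metric_name, value)
    super(metric_name)
    @value = value
  end
end

# Holds the new attributes and opts out of aggregation.
class HotspotResult < MetricResult
  attr_reader :line, :message

  def initialize(metric_name, line, message)
    super(metric_name)
    @line = line
    @message = message
  end

  def aggregatable?
    false
  end
end

# Associative class: one instance per (hotspot, related hotspot) pair.
RelatedHotspotResults = Struct.new(:hotspot, :related)

# Hypothetical lookup: scans every relation to find the related hotspots.
def related_to(hotspot, relations)
  relations.select { |r| r.hotspot.equal?(hotspot) }.map(&:related)
end

message = '1) Similar code found in :defn (mass = 72)'
projects = HotspotResult.new('flay', 18, message)
repositories = HotspotResult.new('flay', 29, message)
relations = [
  RelatedHotspotResults.new(projects, repositories),
  RelatedHotspotResults.new(repositories, projects)
]

puts related_to(projects, relations).map(&:line).inspect # prints [29]
```

Note that every pair of related results needs its own RelatedHotspotResults entry in this form.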

Pros and Cons

This solution is attractive because it keeps using the existing implementations of ModuleResult and MetricResult while still leaving room for future adaptations that may be needed to support different metrics.

On the other hand, RelatedHotspotResults has n^2 space complexity, since the number of relations grows quadratically with the number of related results, and the search complexity may be high as well.

Side effects

This solution, focused on the KalibroProcessor side, leads to additional adaptations that may be necessary for KalibroClient (HotspotMetric) and for KalibroConfiguration (HotspotMetricConfiguration).

The main reason for these is to have an explicit way to tell which metrics can be used by compound metrics and which ones should not receive Ranges, ReadingGroups, and Readings.

First Solution Review

After further discussion, we considered grouping HotspotResults by giving each one a belongs_to relation to RelatedHotspotResults. This way we would no longer have a quadratic number of relations.

We would still be able to reach the related results through a has_many relation on HotspotResult, using the Rails feature Has Many Through.
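A rough plain-Ruby emulation of this grouping is sketched below. It assumes each HotspotResult points at a single RelatedHotspotResults group and reaches its siblings through that group. The attribute names (related_group, hotspot_results), the related_hotspot_results method, and the pair_count helper are hypothetical; in Rails the same shape would roughly correspond to belongs_to on HotspotResult plus has_many with the through option.

```ruby
# Stand-in for the grouping record: one per reported duplication.
class RelatedHotspotResults
  attr_reader :hotspot_results

  def initialize
    @hotspot_results = []
  end
end

class HotspotResult
  attr_reader :line, :message, :related_group

  def initialize(line, message, related_group)
    @line = line
    @message = message
    # Emulates belongs_to: each hotspot references exactly one group.
    @related_group = related_group
    related_group.hotspot_results << self
  end

  # Emulates has_many through the group: every other member is related.
  def related_hotspot_results
    related_group.hotspot_results.reject { |other| other.equal?(self) }
  end
end

# Number of pairwise relations that k mutually related hotspots would
# need, compared with the single group record used here.
def pair_count(k)
  k * (k - 1) / 2
end

group = RelatedHotspotResults.new
message = '2) Similar code found in :defn (mass = 70)'
projects = HotspotResult.new(30, message, group)
repositories = HotspotResult.new(41, message, group)

puts projects.related_hotspot_results.map(&:line).inspect # prints [41]
puts pair_count(10) # prints 45
```

With ten mutually related hotspots, storing each pair would take 45 relations, whereas the grouped form needs one group record and one reference per hotspot.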

Future features

There are some features that are desired but we will leave them out for now:

  • Store the hotspot code regions, not just the lines
  • Give grades for hotspot metrics
  • In some way, combine the hotspot grade with the tree result grade to calculate the repository's global grade