Term inconsistency: log / raw data / archive #22190

Open
matomoto opened this issue May 6, 2024 · 4 comments
Labels
To Triage An issue awaiting triage by a Matomo core team member

Comments

@matomoto

matomoto commented May 6, 2024

There is a widespread terminology inconsistency in Matomo: the terms log, raw data, and archive are mixed in many places.

Example:

UI → Administration → System → General Settings →

Delete old visitor logs and reports
You can configure Matomo to regularly delete old raw data and/or aggregated reports to keep your database small or to meet privacy regulations such as GDPR.
Click here to access the 'Delete old visitor logs and reports' settings.

linked to:

UI → Administration → Privacy → Anonymize data →
Regularly delete old raw data
You can configure Matomo to regularly delete old raw data and/or aggregated reports to keep your database small or to meet privacy regulations such as GDPR.
Regularly delete old raw data from the database
The raw data contains all details about each individual visit and each action your visitors took. When you delete raw data, the deleted information won't be available anymore in the visitor log. Also if you later decide to create a segment, the segmented reports won't become available for the time frame that has been deleted since all aggregated reports are generated from this raw data.

By the way: the last paragraph uses the term visitor log. This is another problem. It should be visits log. And ...
UI → Dashboard → Sidebar (left) → Visitors → Visits Log
The list header Visitors is wrong. It should be Visits. Visitors is only correct for the Visitor Profile.

Back to the issue:
In global.ini.php, the section name [Deletelogs] is used, with setting names like delete_logs_*.

The database tables use archive and are named archive_blob_[year]_[month], but the table rows use log and are named log_*.

In the forum, the term raw data is mostly used.

Question: Is there a difference between log, raw data, and archive?
If not: please use only one term everywhere. (On the last one: this is a big problem with the database.)

@matomoto matomoto added the To Triage An issue awaiting triage by a Matomo core team member label May 6, 2024
@sgiehl
Member

sgiehl commented May 6, 2024

log and raw data are the same. An archive, however, relates to processed reports.

@matomoto
Author

matomoto commented May 6, 2024

The term report / reports is not a problem, but

archive relates to processed reports.

It is possible to delete / invalidate reports and leave the tracking raw data untouched. So, archive is not the raw data?

I don't really understand it. Do you mean that the database tables named archive_blob_* only relate to the blob data in the rows named log_*?

The question behind the question is whether commands touch the actual tracking raw data. Example: purge-old-archive-data uses the term archive, and since the database tables are named archive_*, does it relate to the blobs and also purge the blobs? It is a really uncertain matter. See here: #19851 (comment)

@sgiehl
Member

sgiehl commented May 6, 2024

All reports (archives stored in archive_blob_* / archive_numeric tables) are built using raw data (log data stored in log_* tables). So as long as there is raw data for a certain period available, the reports can be rebuilt at any time.

@matomoto
Author

matomoto commented May 6, 2024

Follow-up question: Do only the database tables

  • log_action
  • log_conversion
  • log_conversion_item
  • log_link_visit_action
  • log_profiling
  • log_visit

contain the actual tracking raw data?

So, when all archive_blob_* and all archive_numeric_* tables are TRUNCATEd, is it possible to re-archive all reports? Sorry for my question, but maybe I misunderstood it. I had thought that the blobs contained the tracking raw data (User Agent). That was wrong.

Is this correct:

  • raw data = log
  • archive = reports
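
Going by the answers earlier in this thread, that mapping can be written down as a lookup by table-name prefix. This is only a sketch: `classify_table` is a hypothetical helper, not part of Matomo, and it assumes table names are given without any installation-specific prefix.

```python
def classify_table(table_name: str) -> str:
    """Map a table name to the term used for it in this thread."""
    if table_name.startswith("log_"):
        return "raw data"  # also called "log" data
    if table_name.startswith("archive_"):
        return "reports"  # also called "archives"
    return "other"


# "site" is just an arbitrary name that matches neither prefix.
examples = [
    "log_visit",
    "log_link_visit_action",
    "archive_blob_2024_05",
    "archive_numeric_2024_05",
    "site",
]
for name in examples:
    print(f"{name}: {classify_table(name)}")
```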

If so, then a global hint explaining this is missing, especially where only one term is used and it points to the other (UI: Delete old visitor logs and reports → Regularly delete old raw data / Delete old aggregated report data).

Correction to my original post: the database tables archive_blob_* don't contain log_* table rows.
