contrib: add pure python statefile parser #1789

Open · wants to merge 2 commits into master
Conversation

@arogge (Member) commented Apr 23, 2024

This PR adds a python module to read Bareos filedaemon statefiles.

Please check

  • A short description and the purpose of this PR are present above this paragraph
  • Your name is present in the AUTHORS file (optional)

If you have any questions or problems, please give a comment in the PR.

Helpful documentation and best practices

Checklist for the reviewer of the PR (will be processed by the Bareos team)

Make sure you check/merge the PR using devtools/pr-tool to have some simple automated checks run and a proper changelog record added.

General
  • Is the PR title usable as CHANGELOG entry?
  • Purpose of the PR is understood
  • Commit descriptions are understandable and well formatted
  • Required backport PRs have been created
Source code quality
  • Source code changes are understandable
  • Variable and function names are meaningful
  • Code comments are correct (logically and spelling)
  • Required documentation changes are present and part of the PR
Tests
  • Decision taken that a test is required (if not, then remove this paragraph)
  • The choice of the type of test (unit test or systemtest) is reasonable
  • Testname matches exactly what is being tested
  • On a fail, output of the test leads quickly to the origin of the fault

@bruno-at-bareos (Contributor):

Should we anticipate using that decoding for bareos-sd as well?

@arogge (Member, Author) commented Apr 24, 2024

I haven't tested it, but maybe it just works?
The problem with parsing the SD statefile is that looking at the last 10 jobs of an SD is usually a lot less useful (simply because so many more jobs run there).

@sebsura (Contributor) left a comment:
Great work. I found a bug and added some suggestions as well.

Comment on lines 63 to 64
"Q", # job start time
"Q", # job end time
@sebsura (Contributor):

Suggested change:
- "Q", # job start time
- "Q", # job end time
+ "q", # job start time
+ "q", # job end time

time_t (and utime_t) are actually signed! The u stands for unix, I believe.

@arogge (Member, Author):

Some useless trivia on the usability of time_t: GNU libc guarantees that time_t is signed and that negative values are handled correctly, even though they are undefined as of POSIX.
POSIX only requires that time_t is "an integer type" representing seconds since the epoch, so a system using int8_t would still be compliant, even though it couldn't represent any time after 1970-01-01T00:02:07Z.
Last but not least, ISO C defines time_t as an integer or floating type that you can pass to the library functions that accept a time_t.

But yes, in our case "q" is more correct than "Q" 😃
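As a quick illustration (not part of the PR), packing a pre-epoch timestamp with the signed "q" format and reading it back with the unsigned "Q" format shows why the distinction matters:

```python
import struct

# -1 corresponds to 1969-12-31T23:59:59Z, one second before the epoch.
raw = struct.pack("=q", -1)

# The signed format round-trips correctly...
assert struct.unpack("=q", raw)[0] == -1

# ...while the unsigned format silently wraps to a huge bogus value.
assert struct.unpack("=Q", raw)[0] == 2**64 - 1
```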


results = []
offset = last_job_offset + NUM_FMT_LEN
for _ in range(1, num_recs):
@sebsura (Contributor):

Suggested change:
- for _ in range(1, num_recs):
+ for _ in range(0, num_recs):

Why are we skipping the last one?

@arogge (Member, Author):

um... off-by-one error? 🤦‍♂️
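For the record, range(1, num_recs) performs one fewer iteration than range(0, num_recs), which is why one record gets dropped (a minimal illustration, independent of the parser code):

```python
num_recs = 5

# range(1, n) starts at 1, so it yields only n - 1 values.
assert list(range(1, num_recs)) == [1, 2, 3, 4]

# range(0, n) (equivalently range(n)) yields all n values.
assert list(range(0, num_recs)) == [0, 1, 2, 3, 4]
```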

from dataclasses import dataclass
from datetime import datetime

HEADER_FMT = "".join(
@sebsura (Contributor) commented Apr 25, 2024:

I think giving the specifiers useful names makes this much clearer. What do you think?

def padding(count):
    return str(count) + "x"

def cstring(max_size):
    return str(max_size) + "s"

def repeat(type, count):
    return str(count) + type

# using standard sizes

i32 = "i"
i64 = "q"

u32 = "I"
u64 = "Q"

HEADER_FMT = "".join(
    (
        "=",             # native byte-order with standard sizes
        cstring(14),     # magic
        padding(2),
        i32,             # version
        padding(4),
        u64,             # last job offset
        u64,             # end of list
        repeat(u64, 19), # reserved
    )
)
HEADER_LEN = calcsize(HEADER_FMT)

NUM_FMT = "".join(
    (
        "=", # native byte-order with standard sizes
        i32, # number of result records
    )
)
NUM_FMT_LEN = calcsize(NUM_FMT)

RESULT_FMT = "".join(
    (
        "=",           # native byte-order with standard sizes
        padding(16),
        i32,           # errors
        i32,           # job type
        i32,           # job status
        i32,           # job level
        u32,           # job id
        u32,           # volume session id
        u32,           # volume session time
        u32,           # job files
        u64,           # job bytes
        u64,           # job start time
        u64,           # job end time
        cstring(128),  # job name
    )
)
RESULT_LEN = calcsize(RESULT_FMT)

@sebsura (Contributor):

Of course, job [start|end] time should be i64 here as well.

@arogge (Member, Author):

I'm not saying this is wrong. However, you will have to understand struct.pack()/struct.unpack() anyway, so I'm not convinced an extra level of indirection will really improve readability.
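For what it's worth (an illustration, not part of the PR), both spellings assemble the identical struct format string, so the trade-off is purely about readability:

```python
from struct import calcsize

# Raw spelling of the suggested HEADER_FMT...
raw_fmt = "=14s2xi4xQQ19Q"

# ...and the same format assembled from named pieces.
named_fmt = "".join(("=", "14s", "2x", "i", "4x", "Q", "Q", "19Q"))

assert raw_fmt == named_fmt
assert calcsize(raw_fmt) == calcsize(named_fmt)
```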

@bruno-at-bareos (Contributor):

> I haven't tested it, but maybe it just works? The problem with parsing the SD statefile is that looking at the last 10 jobs of an SD is usually a lot less useful (simply because so many more jobs run there).

It just works when the state file is not empty. So it's good enough for contrib code.

@sebsura (Contributor) commented May 14, 2024

We should probably write at least one systemtest for this. Nothing major. A simple backup + read state file and check for consistency should be enough.

@arogge (Member, Author) commented Jun 6, 2024

> We should probably write at least one systemtest for this. Nothing major. A simple backup + read state file and check for consistency should be enough.

But if we do that, then this isn't example code for contrib anymore. Then we should clean it up, make sure it is portable, and put it into the scripts directory.
