core hours reporting is incorrect in endpoint reports #859

Open
benclifford opened this issue Jul 26, 2022 · 0 comments
Labels
bug Something isn't working

benclifford commented Jul 26, 2022

Describe the bug
When a manager shuts down, its core hours are removed from both core hours fields, new_core_hrs and total_core_hrs.

As a result, new_core_hrs goes negative when those hours are subtracted, and total_core_hrs no longer reflects the total core hours, because the contribution of the shut-down manager is gone.

That makes these values somewhat awkward to use.

In this example, observe negative new_core_hrs and that total_core_hrs has gone back to 0:

1658831219.077176 2022-07-26 12:26:59 DEBUG MainProcess-119578 MainThread-139748935808832 funcx_endpoint.endpoint.interchange:362 _main_loop Publishing message b'\x01{"message_type":"ep_status_report","data":{"endpoint_id":"ee4737d1-cb0e-4048-93e7-b69fede8e4e3","ep_status_report":{"task_id":-2,"info":{"total_cores":0,"total_mem":0,"new_core_hrs":-0.33205335670047337,"total_core_hrs":0,"managers":0,"active_managers":0,"total_workers":0,"idle_workers":0,"pending_tasks":0,"outstanding_tasks":{},"worker_mode":"no_container","scheduler_mode":"hard","scaling_enabled":true,"mem_per_worker":null,"cores_per_worker":1.0,"prefetch_capacity":10,"max_blocks":1,"min_blocks":0,"max_workers_per_node":Infinity,"nodes_per_block":1}},"task_statuses":{}}}'

This is because, on each iteration, total_core_hrs is recalculated from scratch by summing over only the currently active managers:

        for manager in self._ready_manager_queue:
            ...
            core_hrs += (active_dur * total_cores) / 3600
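
To make the arithmetic concrete, here is a minimal standalone sketch of that pattern. It is not the interchange code: `Manager`, `NaiveReporter`, and the assumption that new_core_hrs is the difference between the current sum and the previously reported sum are all hypothetical, chosen only because they reproduce the symptom described above.

```python
from dataclasses import dataclass, field


@dataclass
class Manager:
    """Hypothetical stand-in for one entry in the ready-manager queue."""
    cores: int
    reg_time: float  # seconds; when the manager registered


@dataclass
class NaiveReporter:
    """Recomputes the total from the currently active managers only."""
    managers: dict = field(default_factory=dict)
    last_total: float = 0.0

    def report(self, now):
        # Same shape as the loop above: sum over whatever is active right now.
        total = sum(
            (now - m.reg_time) * m.cores / 3600 for m in self.managers.values()
        )
        # Assumption: the "new" value is the delta against the previous report.
        new = total - self.last_total
        self.last_total = total
        return new, total


reporter = NaiveReporter()
reporter.managers["m1"] = Manager(cores=4, reg_time=0.0)

# After 30 minutes, one 4-core manager has accumulated 2 core hours.
before = reporter.report(now=1800.0)

# The manager shuts down and drops out of the queue.
del reporter.managers["m1"]
after = reporter.report(now=3600.0)

print(before)  # (2.0, 2.0)
print(after)   # (-2.0, 0)
```

Once the only manager leaves, the sum is empty, so the total falls back to 0 and the delta is the negative of everything previously reported.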

To Reproduce
Watch the ep_status_report messages that an endpoint publishes before and after a manager is shut down.
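
When watching the endpoint log, a small helper can pull the two fields out of each "Publishing message" line. This is a rough sketch: the function name `core_hours_from_log_line` is made up here, and `SAMPLE` is a trimmed copy of the log line quoted above (several info fields dropped for brevity). It relies on Python's `json.loads` accepting the non-standard `Infinity` token, which it does by default.

```python
import json

# Trimmed copy of the log line from this report; some "info" fields omitted.
SAMPLE = r"""Publishing message b'\x01{"message_type":"ep_status_report","data":{"endpoint_id":"ee4737d1-cb0e-4048-93e7-b69fede8e4e3","ep_status_report":{"task_id":-2,"info":{"total_cores":0,"total_mem":0,"new_core_hrs":-0.33205335670047337,"total_core_hrs":0,"managers":0,"max_workers_per_node":Infinity}},"task_statuses":{}}}'"""


def core_hours_from_log_line(line):
    """Return (new_core_hrs, total_core_hrs), or None if the line has no status report."""
    start = line.find("{")
    end = line.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        payload = json.loads(line[start:end + 1])
    except json.JSONDecodeError:
        return None
    if payload.get("message_type") != "ep_status_report":
        return None
    info = payload["data"]["ep_status_report"]["info"]
    return info["new_core_hrs"], info["total_core_hrs"]


result = core_hours_from_log_line(SAMPLE)
new_hrs, total_hrs = result
if new_hrs < 0:
    print(f"negative new_core_hrs: {new_hrs} (total_core_hrs={total_hrs})")
```

Run against the full log, any line where the first value is negative, or where the second value is lower than in the previous report, marks a manager shutdown that triggered the bug.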

Expected behavior
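The total_core_hrs field should report the cumulative core hours, including hours from managers that have since shut down, and new_core_hrs should never be negative.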
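
One possible way to get that behaviour, sketched under assumptions rather than as a proposed patch: bank a manager's hours in a running counter at the moment it is removed, and add that counter to the sum over active managers. `Manager`, `CumulativeReporter`, `remove_manager`, and `retired_core_hrs` are hypothetical names, and new_core_hrs is again assumed to be the delta against the previous report.

```python
from dataclasses import dataclass, field


@dataclass
class Manager:
    """Hypothetical stand-in for one entry in the ready-manager queue."""
    cores: int
    reg_time: float  # seconds; when the manager registered


@dataclass
class CumulativeReporter:
    """Keeps hours from departed managers so the total never decreases."""
    managers: dict = field(default_factory=dict)
    retired_core_hrs: float = 0.0
    last_total: float = 0.0

    def remove_manager(self, name, now):
        # Bank the manager's hours before it disappears from the queue.
        m = self.managers.pop(name)
        self.retired_core_hrs += (now - m.reg_time) * m.cores / 3600

    def report(self, now):
        live = sum(
            (now - m.reg_time) * m.cores / 3600 for m in self.managers.values()
        )
        total = self.retired_core_hrs + live
        new = total - self.last_total  # assumed definition of new_core_hrs
        self.last_total = total
        return new, total


reporter = CumulativeReporter()
reporter.managers["m1"] = Manager(cores=4, reg_time=0.0)

first = reporter.report(now=1800.0)
reporter.remove_manager("m1", now=1800.0)
second = reporter.report(now=3600.0)

print(first)   # (2.0, 2.0)
print(second)  # (0.0, 2.0)
```

With the banked counter, the total stays at 2 core hours after the shutdown and the delta is 0 rather than negative.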

Environment
funcX endpoint main at fd9c2cf

benclifford added the bug label Jul 26, 2022