Deadlock in 0.20.0 and trunk #5662

Open
Tracked by #5572
SludgePhD opened this issue May 4, 2024 · 4 comments
Labels
type: bug Something isn't working

Comments

@SludgePhD
Contributor

Description

create_bind_group (8x):

#7  lock_shared_slow () at src/raw_rwlock.rs:719
#8  0x00005c2732adfb7a in lock_shared () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot-0.12.1/src/raw_rwlock.rs:109
#9  read<parking_lot::raw_rwlock::RawRwLock, wgpu_core::storage::Storage<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lock_api-0.4.11/src/rwlock.rs:459
#10 read<wgpu_core::storage::Storage<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/lock/vanilla.rs:81
#11 read<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/registry.rs:142
#12 create_bind_group<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/device/resource.rs:2208
#13 0x00005c2732aa08d0 in device_create_bind_group<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/device/global.rs:1130
#14 0x00005c2732b5e8c4 in device_create_bind_group () at src/backend/wgpu_core.rs:1055
#15 0x00005c2732b6b7f8 in device_create_bind_group<wgpu::backend::wgpu_core::ContextWgpuCore> () at src/context.rs:2240
#16 0x00005c2732b978d1 in create_bind_group () at src/lib.rs:2650

command_encoder_end_render_pass:

(2x)
#7  lock_shared_slow () at src/raw_rwlock.rs:719
#8  0x00005c2732ab6796 in lock_shared () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot-0.12.1/src/raw_rwlock.rs:109
#9  read<parking_lot::raw_rwlock::RawRwLock, wgpu_core::storage::Storage<wgpu_core::binding_model::BindGroup<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lock_api-0.4.11/src/rwlock.rs:459
#10 read<wgpu_core::storage::Storage<wgpu_core::binding_model::BindGroup<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/lock/vanilla.rs:81
#11 read<wgpu_core::binding_model::BindGroup<wgpu_hal::vulkan::Api>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/registry.rs:142
#12 command_encoder_run_render_pass_impl<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/command/render.rs:1386
#13 0x00005c2732b68001 in command_encoder_run_render_pass<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/command/render.rs:1311
#14 command_encoder_end_render_pass () at src/backend/wgpu_core.rs:1933
#15 0x00005c2732b6dac0 in command_encoder_end_render_pass<wgpu::backend::wgpu_core::ContextWgpuCore> () at src/context.rs:2771

#7  lock_shared_slow () at src/raw_rwlock.rs:719
#8  0x00005c2732ab6892 in lock_shared () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot-0.12.1/src/raw_rwlock.rs:109
#9  read<parking_lot::raw_rwlock::RawRwLock, wgpu_core::storage::Storage<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lock_api-0.4.11/src/rwlock.rs:459
#10 read<wgpu_core::storage::Storage<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/lock/vanilla.rs:81
#11 read<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/registry.rs:142
#12 command_encoder_run_render_pass_impl<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/command/render.rs:1389
#13 0x00005c2732b68001 in command_encoder_run_render_pass<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/command/render.rs:1311
#14 command_encoder_end_render_pass () at src/backend/wgpu_core.rs:1933
#15 0x00005c2732b6dac0 in command_encoder_end_render_pass<wgpu::backend::wgpu_core::ContextWgpuCore> () at src/context.rs:2771

device_create_buffer (6x):
#7  lock_exclusive_slow () at src/raw_rwlock.rs:633
#8  0x00005c2732c1a6d0 in lock_exclusive () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot-0.12.1/src/raw_rwlock.rs:73
#9  write<parking_lot::raw_rwlock::RawRwLock, wgpu_core::storage::Storage<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lock_api-0.4.11/src/rwlock.rs:491
#10 write<wgpu_core::storage::Storage<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/lock/vanilla.rs:85
#11 assign<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/registry.rs:94
#12 0x00005c2732a9b7ce in device_create_buffer<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/device/global.rs:260
#13 0x00005c2732b6138b in device_create_buffer () at src/backend/wgpu_core.rs:1251

bind_group_drop:
#7  lock_exclusive_slow () at src/raw_rwlock.rs:633
#8  0x00005c2732c1d4d8 in lock_exclusive () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot-0.12.1/src/raw_rwlock.rs:73
#9  write<parking_lot::raw_rwlock::RawRwLock, wgpu_core::storage::Storage<wgpu_core::binding_model::BindGroup<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lock_api-0.4.11/src/rwlock.rs:491
#10 write<wgpu_core::storage::Storage<wgpu_core::binding_model::BindGroup<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/lock/vanilla.rs:85
#11 unregister<wgpu_core::binding_model::BindGroup<wgpu_hal::vulkan::Api>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/registry.rs:166
#12 0x00005c2732a95346 in bind_group_drop<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/device/global.rs:1165
#13 0x00005c2732b6ce58 in bind_group_drop<wgpu::backend::wgpu_core::ContextWgpuCore> () at src/context.rs:2541

create_bind_group exclusive (2x)
#6  wait_for_readers () at src/raw_rwlock.rs:1013
#7  0x00005c2732eb0647 in lock_exclusive_slow () at src/raw_rwlock.rs:644
#8  0x00005c2732c1b5d0 in lock_exclusive () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot-0.12.1/src/raw_rwlock.rs:73
#9  write<parking_lot::raw_rwlock::RawRwLock, wgpu_core::storage::Storage<wgpu_core::binding_model::BindGroup<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lock_api-0.4.11/src/rwlock.rs:491
#10 write<wgpu_core::storage::Storage<wgpu_core::binding_model::BindGroup<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/lock/vanilla.rs:85
#11 assign<wgpu_core::binding_model::BindGroup<wgpu_hal::vulkan::Api>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/registry.rs:94
#12 0x00005c2732aa0a3a in device_create_bind_group<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/device/global.rs:1135
#13 0x00005c2732b5e8c4 in device_create_bind_group () at src/backend/wgpu_core.rs:1055

#7  lock_shared_slow () at src/raw_rwlock.rs:719
#8  0x00005c2732c1f658 in lock_shared () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot-0.12.1/src/raw_rwlock.rs:109
#9  read<parking_lot::raw_rwlock::RawRwLock, wgpu_core::storage::Storage<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lock_api-0.4.11/src/rwlock.rs:459
#10 read<wgpu_core::storage::Storage<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/lock/vanilla.rs:81
#11 read<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/registry.rs:142
#12 get<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/registry.rs:139
#13 0x00005c2732a9612e in buffer_map_async_inner<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/device/global.rs:2420
#14 buffer_map_async<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/device/global.rs:2389
#15 0x00005c2732b6494b in buffer_map_async () at src/backend/wgpu_core.rs:1512

command_encoder_end_compute_pass:
#8  0x00005c2732bb68a3 in lock_shared () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot-0.12.1/src/raw_rwlock.rs:109
#9  read<parking_lot::raw_rwlock::RawRwLock, wgpu_core::storage::Storage<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lock_api-0.4.11/src/rwlock.rs:459
#10 read<wgpu_core::storage::Storage<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/lock/vanilla.rs:81
#11 read<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/registry.rs:142
#12 resolve_compute_command_ids<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/command/compute_command.rs:82
#13 0x00005c2732ac0486 in command_encoder_run_compute_pass_with_unresolved_commands<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/command/compute.rs:313
#14 command_encoder_run_compute_pass<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/command/compute.rs:298
#15 0x00005c2732b67417 in command_encoder_end_compute_pass () at src/backend/wgpu_core.rs:1849

#8  0x00005c2732bb6634 in lock_shared () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot-0.12.1/src/raw_rwlock.rs:109
#9  read<parking_lot::raw_rwlock::RawRwLock, wgpu_core::storage::Storage<wgpu_core::binding_model::BindGroup<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lock_api-0.4.11/src/rwlock.rs:459
#10 read<wgpu_core::storage::Storage<wgpu_core::binding_model::BindGroup<wgpu_hal::vulkan::Api>>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/lock/vanilla.rs:81
#11 read<wgpu_core::binding_model::BindGroup<wgpu_hal::vulkan::Api>> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/registry.rs:142
#12 resolve_compute_command_ids<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/command/compute_command.rs:83
#13 0x00005c2732ac0486 in command_encoder_run_compute_pass_with_unresolved_commands<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/command/compute.rs:313
#14 command_encoder_run_compute_pass<wgpu_hal::vulkan::Api> () at /home/sludge/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/command/compute.rs:298
#15 0x00005c2732b67417 in command_encoder_end_compute_pass () at src/backend/wgpu_core.rs:1849

This might be a duplicate of one of the known deadlock issues tracked in #5572; I'm not sure yet.

Repro steps
Closed source project, so not available.

Expected vs observed behavior
No deadlock vs Yes deadlock

Platform
Linux, Vulkan. wgpu 0.20.0 is affected (and is where the backtraces are from), but trunk also deadlocks in a similar way.

@Wumpf added the "type: bug" label May 4, 2024
@ErichDonGubler
Member

CC @jimblandy, who's been working on issues like this recently.

@jimblandy
Member

Could you pull out from those stacks the locks each thread is holding, if any?

@jimblandy
Member

Actually, it might suffice simply to know which lock each thread is trying to acquire, and I could figure out which other ones it must be holding.

@SludgePhD
Contributor Author

The deadlock appears to be caused by:

  • command_encoder_end_compute_pass acquires the buffer read lock before the bind group read lock here:
    let buffers_guard = hub.buffers.read();
    let bind_group_guard = hub.bind_groups.read();
  • command_encoder_end_render_pass acquires the bind group read lock before the buffer read lock here:
    let bundle_guard = hub.render_bundles.read();
    let bind_group_guard = hub.bind_groups.read();
    let render_pipeline_guard = hub.render_pipelines.read();
    let query_set_guard = hub.query_sets.read();
    let buffer_guard = hub.buffers.read();

In the backtraces above, there is one thread in the first location holding the buffers lock and trying to acquire the bind_groups lock, and one thread in the second location holding most locks (including the bind_group one) and trying to acquire the buffers lock.

Although these are all RwLocks and all of these acquisitions are read locks, several other threads are trying to acquire write locks on both the bind_group and buffer storages. Because the RwLock implementation in parking_lot is fair, a new read-lock attempt blocks while a writer is waiting, and each waiting writer is in turn blocked by a read lock that is already held, which completes the deadlock.
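To make the cycle described above easier to see, here is a minimal sketch that models it as a wait-for graph and walks it until a party repeats. The party names (`compute_pass`, `bind_groups_writer`, and so on) and the helper functions are made up for illustration; this is not wgpu code, and it assumes each blocked party waits on exactly one other party.

```rust
use std::collections::HashMap;

/// Follows the "waits on" edges from `start`. Returns the cycle if the walk
/// comes back to a party it has already visited, or None if the chain ends.
fn find_cycle<'a>(
    waits_on: &HashMap<&'a str, &'a str>,
    start: &'a str,
) -> Option<Vec<&'a str>> {
    let mut path = vec![start];
    let mut current = start;
    while let Some(&next) = waits_on.get(current) {
        if let Some(pos) = path.iter().position(|&n| n == next) {
            // `next` is already on the path: everything from `pos` on is the cycle.
            return Some(path[pos..].to_vec());
        }
        path.push(next);
        current = next;
    }
    None // The chain ended at a party that is not waiting on anyone.
}

/// Hypothetical model of the situation in this issue, assuming a fair RwLock
/// where a new reader queues behind a waiting writer.
fn deadlock_scenario() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        // Holds buffers (read), wants bind_groups (read): queued behind the writer.
        ("compute_pass", "bind_groups_writer"),
        // Wants bind_groups (write): waits for the render pass to drop its read lock.
        ("bind_groups_writer", "render_pass"),
        // Holds bind_groups (read), wants buffers (read): queued behind the writer.
        ("render_pass", "buffers_writer"),
        // Wants buffers (write): waits for the compute pass to drop its read lock.
        ("buffers_writer", "compute_pass"),
    ])
}

fn main() {
    match find_cycle(&deadlock_scenario(), "compute_pass") {
        Some(cycle) => println!("wait-for cycle: {}", cycle.join(" -> ")),
        None => println!("no cycle"),
    }
}
```

Without the two writer entries there are no edges between the two passes at all, which matches the observation that two readers alone would not block each other.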

It sounds like rank::REGISTRY_STORAGE should be split into one rank per resource to catch mistakes like this, maybe?
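As a rough sketch of what one rank per resource could catch, the snippet below tracks the ranks held by the current thread and rejects any acquisition that does not go strictly upward. Everything here is hypothetical: the `Rank` enum, the `acquire`/`release` helpers, and the choice to rank buffers below bind groups are assumptions for illustration, not wgpu-core's actual lock rank machinery.

```rust
use std::cell::RefCell;

/// Hypothetical per-resource ranks. Locks must be taken in increasing order.
/// Putting Buffers before BindGroups is an arbitrary choice for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Rank {
    Buffers,
    BindGroups,
}

thread_local! {
    /// Ranks currently held by this thread.
    static HELD: RefCell<Vec<Rank>> = RefCell::new(Vec::new());
}

/// Records an acquisition, or returns an error if it breaks the rank order.
fn acquire(rank: Rank) -> Result<(), String> {
    HELD.with(|held| {
        let mut held = held.borrow_mut();
        if let Some(highest) = held.iter().copied().max() {
            if rank <= highest {
                return Err(format!("acquiring {:?} while holding {:?}", rank, highest));
            }
        }
        held.push(rank);
        Ok(())
    })
}

/// Forgets the most recent acquisition of `rank`, if any.
fn release(rank: Rank) {
    HELD.with(|held| {
        let mut held = held.borrow_mut();
        if let Some(pos) = held.iter().rposition(|&r| r == rank) {
            held.remove(pos);
        }
    });
}

/// Mirrors the compute pass order from the comment above: buffers, then bind groups.
fn compute_pass_order() -> Result<(), String> {
    acquire(Rank::Buffers)?;
    acquire(Rank::BindGroups)?;
    release(Rank::BindGroups);
    release(Rank::Buffers);
    Ok(())
}

/// Mirrors the render pass order: bind groups, then buffers.
fn render_pass_order() -> Result<(), String> {
    acquire(Rank::BindGroups)?;
    let result = acquire(Rank::Buffers);
    if result.is_ok() {
        release(Rank::Buffers);
    }
    release(Rank::BindGroups);
    result
}

fn main() {
    println!("compute pass order: {:?}", compute_pass_order());
    println!("render pass order:  {:?}", render_pass_order());
}
```

With a single shared rank for all registry storages, both orders look the same to the checker; with distinct ranks, whichever order disagrees with the chosen ranking is flagged on the first run, before any deadlock has to happen.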
