Description of the bug
Running simple SVE code with gather operations, I get the following message:
gem5.opt: build/ARM/cpu/o3/rename_map.cc:85: gem5::o3::SimpleRenameMap::RenameInfo gem5::o3::SimpleRenameMap::rename(const gem5::RegId&): Assertion `arch_reg.getNumPinnedWrites() == 0' failed.
Program aborted at tick 257229480756
Digging into the bug, I generated a simple Exec trace. It shows that after the instruction ld1 {z21}, p0/z, [x4, z32] executes, the kernel takes a page fault. Once the page fault has been handled, the program resumes and tries to execute the faulting instruction again. This second time, gem5 aborts with the assertion above.
Explanation of the Bug
When SVE gather instructions reach Rename, they use pinned registers to avoid renaming the same destination register for each micro-op (muop). The relevant code is:
SimpleRenameMap::rename(const RegId& arch_reg)
{
    PhysRegIdPtr renamed_reg;
    // Record the current physical register that is renamed to the
    // requested architected register.
    PhysRegIdPtr prev_reg = map[arch_reg.index()];
    if (arch_reg.is(InvalidRegClass)) {
        assert(prev_reg->is(InvalidRegClass));
        renamed_reg = prev_reg;
    } else if (prev_reg->getNumPinnedWrites() > 0) {
        // Do not rename if the register is pinned
        assert(arch_reg.getNumPinnedWrites() == 0);  // Prevent pinning the
                                                     // same register twice
        DPRINTF(Rename, "Renaming pinned reg, numPinnedWrites %d\n",
                prev_reg->getNumPinnedWrites());
        renamed_reg = prev_reg;
        renamed_reg->decrNumPinnedWrites();
    } else {
        renamed_reg = freeList->getReg();
        map[arch_reg.index()] = renamed_reg;
        renamed_reg->setNumPinnedWrites(arch_reg.getNumPinnedWrites());
        renamed_reg->setNumPinnedWritesToComplete(
            arch_reg.getNumPinnedWrites() + 1);
    }
I checked the Rename trace and confirmed that the register was renamed correctly the first time. Rename processed the 16 muops that make up the vector instruction, decrementing numPinnedWrites from 16 to 0. However, a page fault is raised when the 11th muop executes. After the exception, the remaining muops are squashed from the pipeline, and gem5 executes the following code:
void
DynInst::setSquashed()
{
    status.set(Squashed);
    if (!isPinnedRegsRenamed() || isPinnedRegsSquashDone())
        return;
    // This inst has been renamed already so it may go through rename
    // again (e.g. if the squash is due to memory access order violation).
    // Reset the write counters for all pinned destination register to ensure
    // that they are in a consistent state for a possible re-rename. This also
    // ensures that dest regs will be pinned to the same phys register if
    // re-rename happens.
    for (int idx = 0; idx < numDestRegs(); idx++) {
        PhysRegIdPtr phys_dest_reg = renamedDestIdx(idx);
        if (phys_dest_reg->isPinned()) {
            phys_dest_reg->incrNumPinnedWrites();
            if (isPinnedRegsWritten())
                phys_dest_reg->incrNumPinnedWritesToComplete();
        }
    }
    setPinnedRegsSquashDone();
}
This code increments numPinnedWrites back up to 5, on the expectation that the instruction will resume from the faulting muop.
The second time the SVE gather is executed, rename takes the second branch (else if (prev_reg->getNumPinnedWrites() > 0)) rather than the final else. Since the first muop has arch_reg.getNumPinnedWrites() set to 16, the assertion fails and gem5 aborts.
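To make the sequence easier to follow, here is a minimal standalone sketch of the counter behaviour described above. It is a toy model, not gem5 code: ToyPhysReg, RenameOutcome, toyRename, toySquash and replayAfterFault are hypothetical names, a single struct stands in for whichever physical register is currently mapped, and the counts (16 and 5) are simply the values observed in the trace. The failing assertion is modelled as a return value instead of an abort.

```cpp
#include <cassert>

// Toy stand-in for the physical register's pinned-write counter
// (hypothetical; this is not gem5's PhysRegId).
struct ToyPhysReg {
    int numPinnedWrites = 0;
};

enum class RenameOutcome { AllocatedNew, ReusedPinned, WouldAssert };

// Mirrors the two non-invalid branches of the rename logic quoted above.
// archPinnedWrites plays the role of arch_reg.getNumPinnedWrites().
RenameOutcome toyRename(ToyPhysReg &reg, int archPinnedWrites)
{
    if (reg.numPinnedWrites > 0) {
        // Register is still pinned: the real code asserts that the
        // incoming arch reg does not ask to pin it again.
        if (archPinnedWrites != 0)
            return RenameOutcome::WouldAssert;
        --reg.numPinnedWrites;
        return RenameOutcome::ReusedPinned;
    }
    // Not pinned: model taking a fresh register and pinning it.
    reg.numPinnedWrites = archPinnedWrites;
    return RenameOutcome::AllocatedNew;
}

// Mirrors the per-muop counter increment done on squash.
void toySquash(ToyPhysReg &reg)
{
    ++reg.numPinnedWrites;
}

// Replays the reported sequence: rename, fault, squash, re-rename.
RenameOutcome replayAfterFault()
{
    ToyPhysReg reg;

    // First pass: the leading muop pins the register with 16 writes.
    RenameOutcome first = toyRename(reg, 16);
    assert(first == RenameOutcome::AllocatedNew);
    (void)first;

    // The following muops reuse the pinned register, 16 -> 0.
    for (int i = 0; i < 16; ++i)
        toyRename(reg, 0);
    assert(reg.numPinnedWrites == 0);

    // Page fault: the squashed muops bump the counter back up to 5.
    for (int i = 0; i < 5; ++i)
        toySquash(reg);

    // Second pass restarts from the leading muop, which again asks for
    // 16 pinned writes while the register still looks pinned.
    return toyRename(reg, 16);
}
```

In this model replayAfterFault() ends in WouldAssert, because the restart begins with the leading muop while the counter left behind by the squash is still non-zero.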
Affects version
Observed in gem5 v22.0.0; the same code exists in gem5 v23.0.0.
gem5 Modifications
No modification
To Reproduce
Steps to reproduce the behavior. Please assume starting from a clean repository:
1. Compile gem5 with the ARM ISA.
2. Run the simulation in FS mode with an SpMV code vectorized with SVE and a vector length of 2048 bits.
Host Operating System
Ubuntu 22.04
Host ISA
X86
Compiler used
GCC-10
I need to look at this more carefully.
An initial solution (a bit of a hack) would be to handle squashing due to a fault differently from squashing for simple re-execution (in the first case we flush the pipeline completely). For example in: