gem5 Version 23.1 is our first release where the development has been on GitHub.
During this release, 362 pull requests were merged, comprising 416 commits from 51 unique contributors.
Significant API and user-facing changes
The gem5 build is now configured with kconfig
- Most gem5 builds without customized options (excluding double-dash options), e.g., build/X86/gem5.opt, are backwards compatible and require no changes to your current workflows.
- All of the default builds in build_opts are unchanged and still available.
- However, if you want to specialize your build, for example to use a customized Ruby protocol, the command scons PROTOCOL=<PROTOCOL_NAME> build/ALL/gem5.opt will no longer work. You now have to use scons <kconfig command> to change the Ruby protocol. The double-dash options (--without-tcmalloc, --with-asan, and so on) continue to work as normal.
- For more details, refer to the kconfig documentation.
Standard library improvements
WorkloadResource added to resource specialization
- The Workload and CustomWorkload classes are now deprecated. They have been transformed into wrappers for the obtain_resource function and the WorkloadResource class in resource.py, respectively.
- Code utilizing the older API will continue to function as expected but will trigger a warning message. To update code using the Workload class, change the call from Workload(id='resource_id', resource_version='1.0.0') to obtain_resource(id='resource_id', resource_version='1.0.0'). Similarly, to update code using the CustomWorkload class, change the call from CustomWorkload(function=func, parameters=params) to WorkloadResource(function=func, parameters=params).
- Workload resources in gem5 can now be directly acquired using the obtain_resource function, just like other resources.
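The deprecation pattern described above can be sketched in plain Python. This is a minimal illustration, not gem5's implementation: the bodies of obtain_resource and WorkloadResource below are simplified stand-ins (assumptions) so that the snippet runs outside gem5. Only the names and call signatures are taken from the notes above.

```python
import warnings


class WorkloadResource:
    """Simplified stand-in for the new-style class (assumption, not gem5 code)."""

    def __init__(self, function, parameters):
        self.function = function
        self.parameters = parameters


def obtain_resource(id, resource_version=None):
    """Simplified stand-in: returns a plain record instead of a real resource."""
    return {"id": id, "resource_version": resource_version}


def Workload(id, resource_version=None):
    """Deprecated entry point: emits a warning, then forwards the call."""
    warnings.warn(
        "Workload is deprecated; use obtain_resource instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return obtain_resource(id=id, resource_version=resource_version)


class CustomWorkload(WorkloadResource):
    """Deprecated wrapper: emits a warning, then behaves like WorkloadResource."""

    def __init__(self, function, parameters):
        warnings.warn(
            "CustomWorkload is deprecated; use WorkloadResource instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(function=function, parameters=parameters)


# Old-style and new-style calls yield the same result; only the old one warns.
old_style = Workload(id="resource_id", resource_version="1.0.0")
new_style = obtain_resource(id="resource_id", resource_version="1.0.0")
```

Because the old names only forward to the new ones, migrating a script is a matter of swapping the call, as described in the list above.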
Introducing Suites
Suites are a new category of resource introduced in gem5. For details, see the suite documentation.
Other API changes
- All resource objects now have their own id and category. Each resource class has its own __str__() function, which returns its information in the form category(id, version), for example BinaryResource(id='riscv-hello', resource_version='1.0.0').
- Users can use the GEM5_RESOURCE_JSON and GEM5_RESOURCE_JSON_APPEND environment variables to, respectively, overwrite all the data sources with the provided JSON file or append a JSON file to all the data sources. More information can be found here.
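The difference between the two environment variables can be sketched as follows. This is an illustrative model only: resolve_sources and the "default-database" entry are hypothetical names, not gem5 APIs, and the behavior when both variables are set (override first, then append) is an assumption of this sketch rather than something stated in these notes.

```python
import os
from typing import List, Mapping, Optional


def resolve_sources(
    default_sources: List[str],
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the data sources to consult (hypothetical helper)."""
    env = os.environ if environ is None else environ
    override = env.get("GEM5_RESOURCE_JSON")
    extra = env.get("GEM5_RESOURCE_JSON_APPEND")

    # GEM5_RESOURCE_JSON replaces every configured source.
    sources = [override] if override else list(default_sources)
    # GEM5_RESOURCE_JSON_APPEND adds one more source to the list.
    if extra:
        sources.append(extra)
    return sources


defaults = ["default-database"]  # hypothetical default source name
print(resolve_sources(defaults, {"GEM5_RESOURCE_JSON": "my.json"}))
# ['my.json']
print(resolve_sources(defaults, {"GEM5_RESOURCE_JSON_APPEND": "extra.json"}))
# ['default-database', 'extra.json']
```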
Other user-facing changes
- Added support for clang 15 and clang 16
- gem5 no longer supports building on Ubuntu 18.04
- GCC 7, GCC 9, and clang 6 are no longer supported
- Two DRAMInterface stats have changed names (bytesRead and bytesWritten), for instance board.memory.mem_ctrl.dram.bytesRead and board.memory.mem_ctrl.dram.bytesWritten. These are now named dramBytesRead and dramBytesWritten so they don't collide with the stats of the same name in AbstractMemory.
- The stats for NVMInterface (bytesRead and bytesWritten) have likewise been changed to nvmBytesRead and nvmBytesWritten.
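Post-processing scripts that look these stats up by name need to be updated. The sketch below shows one possible way to translate an old stat path into its new form. The helper updated_stat_path is hypothetical (it is not part of gem5), and the caller must say which interface type the stat belongs to, because the path alone does not reveal it.

```python
# Old-name -> new-name pairs, taken from the two items above.
RENAMED_STATS = {
    "DRAMInterface": {
        "bytesRead": "dramBytesRead",
        "bytesWritten": "dramBytesWritten",
    },
    "NVMInterface": {
        "bytesRead": "nvmBytesRead",
        "bytesWritten": "nvmBytesWritten",
    },
}


def updated_stat_path(path: str, interface: str) -> str:
    """Rewrite the last component of a stat path if it was renamed.

    Hypothetical helper: `interface` is "DRAMInterface" or "NVMInterface".
    """
    prefix, _, leaf = path.rpartition(".")
    new_leaf = RENAMED_STATS[interface].get(leaf, leaf)
    return f"{prefix}.{new_leaf}" if prefix else new_leaf


print(updated_stat_path("board.memory.mem_ctrl.dram.bytesRead", "DRAMInterface"))
# board.memory.mem_ctrl.dram.dramBytesRead
```

Stats that were not renamed pass through unchanged, so the helper can be applied to every name a script reads.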
Full-system GPU model improvements
- Support for ROCm up to the latest version, 5.7.1.
- Various changes to enable PyTorch/TensorFlow simulations.
- New packer disk image script containing ROCm 5.4.2, PyTorch 2.0.1, and TensorFlow 2.11.
- GPU instructions can now perform atomics on host addresses.
- The provided config scripts can now run KVM on more restrictive setups.
- Added support for checkpointing and restoring between kernels in GPUFS, including adding various AQL, HSA Queue, VMID map, MQD attributes, GART translations, and PM4Queues to GPU checkpoints.
- Moved the GPU cache recorder code to RubyPort instead of Sequencer/GPUCoalescer to allow checkpointing to occur.
- Added support for flushing GPU caches, as well as cache cooldown/warmup support, for checkpoints.
- Updated vega10_kvm.py to add checkpointing instructions.
SE mode GPU model improvements
- Started adding support for mmap'ing inputs for GPUSE tests, which reduces their runtime by 8-15% per run.
GPU model improvements
- Updated GPU VIPER and Coalescer support to ensure correct replacement policy behavior when multiple requests from the same CU concurrently access the same line
- Fixed a bug in GPU VIPER to resolve a race condition for loads that bypass the TCP (L1D$)
- Fixed a bug with MRU replacement policy updates in the GPU SQC (I$)
- Updated GPU and Ruby debug prints to resolve various small errors
- Added configurable numbers of banks for the GPU L1 and L2, and configurable L2 latencies
- Added decodings for new MI100 VOP2 instructions
- Added GPU GLC Atomic Resource Constraints to better model how atomic resources are shared at the GPU TCC (L2$)
- Updated the GPU tester to work with both requests that bypass all caches (SLC) and requests that bypass only the TCP (L1D$)
- Fixed how the write mask works for GPU WB L2 caches
- Added support for WB and WT GPU atomics
- Added configurable support to better model the latency of GPU atomic requests
- Fixed the GPU's default number of HW barriers per CU to better model the amount of concurrency GPU CUs should have
RISC-V RVV 1.0 implemented
This was a huge undertaking by a large number of people!
They include Adrià Armejach, who pushed it over the finish line; Xuan Hu, who pushed the most recent version to Gerrit that Adrià picked up;
Jerin Joy, who did much of the initial work; and many others who contributed to the implementation, including Roger Chang and Hoa Nguyen, who put significant effort into testing and reviewing the code.
- Most of the instructions in the 1.0 spec are implemented
- Works in both FS and SE mode
- Compatible with the Simple, O3, and Minor CPU models
- Users can specify the width of the vector units
- Future improvements
  - Widening/narrowing instructions are not implemented
  - The model for executing memory instructions is not very high performance
  - The statistics are not correct for counting vector instruction execution
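To see what a configurable vector width changes in practice, recall that the RVV 1.0 specification defines the maximum number of elements per vector operation as VLMAX = LMUL × VLEN / SEW. The sketch below evaluates that formula. It assumes the configurable width mentioned above corresponds to VLEN, and the helper name vlmax is hypothetical (it is not a gem5 API).

```python
from fractions import Fraction


def vlmax(vlen_bits: int, sew_bits: int, lmul=Fraction(1)) -> int:
    """VLMAX = LMUL * VLEN / SEW, per the RVV 1.0 specification.

    vlen_bits: vector register width in bits (VLEN)
    sew_bits:  selected element width in bits (SEW)
    lmul:      register grouping factor; may be fractional, e.g. Fraction(1, 2)
    """
    return int(Fraction(lmul) * vlen_bits / sew_bits)


print(vlmax(256, 32))                  # 8 elements of 32 bits in 256 bits
print(vlmax(128, 64, Fraction(1, 2)))  # 1
print(vlmax(512, 8, 8))                # 512
```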
ArmISA changes/improvements
- Architectural support for the following extensions:
  - FEAT_TLBIRANGE
  - FEAT_FGT
  - FEAT_TCR2
  - FEAT_SCTLR2
- Arm support for SVE instructions improved
- Fixed some FEAT_SEL2-related issues
- Removed support for Arm Jazelle and ThumbEE
- Implementation of an Arm Capstone Disassembler
Other notable changes/improvements
- Improvements to the CHI coherence protocol implementation
- Far atomics implemented in CHI
- Ruby now supports using the prefetchers from the classic caches, if the protocol supports it. CHI has been extended to support the classic prefetchers.
- Fixed a bug in the RISC-V TLB so that misses and hits are counted correctly
- Added new RISC-V Zcb instructions #399
- RISC-V can now use a separate binary for the bootloader and kernel in FS mode
- DRAMSys integration updated to latest DRAMSys version (5.0)
- Improved support for RISC-V privilege modes
- Fixed bug in switching CPUs with RISC-V
- CPU branch predictor refactoring to prepare for decoupled front end support
- Perf is now optional when using the KVM CPU model
- Improvements to the gem5-SST bridge including updating to SST 13.0
- Improved formatting of documentation in stdlib
- The style checks now use isort for Python imports by default
- Many, many testing improvements during the migration to GitHub Actions
- Fixed the elastic trace replaying logic (TraceCPU)
Known Bugs/Issues
- RISC-V RVV Bad execution of riscv rvv vss instruction
- RISC-V Vector Extension float32_t bugs/unsupported widening instructions
- Implement AVX xsave/xstor to avoid workaround when checkpointing
- Adding Vector Segmented Loads/Stores to RISC-V V 1.0 implementation
- Integer overflow in AddrRange subset check
- RISCV64 TLB refuses to access upper half of physical address space
- Bug when trying to restore checkpoints in SPARC: “panic: panic condition !pte occurred: Tried to execute unmapped address 0.”
- BaseCache::recvTimingResp can trigger an assertion error from getTarget() due to MSHR in senderState having no targets