[scheduler, dma, maxwell] Reduce CPU stalls in the GPU command processing pipeline through multiple targeted optimizations (#3296)

- Scheduler: Reduced the lock scope so that command preparation can proceed in parallel across channels
- DmaPusher: Added command prefetching (16-command lookahead) to improve the cache hit rate
- Maxwell3D: Pre-allocated the macro parameter vectors to eliminate dynamic allocations, and unrolled the dirty register tracking loop for better cache locality
- MacroEngine: Added a last-executed macro cache to skip hash table lookups on the hot path
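For the Scheduler change, the following is a minimal sketch of a narrowed lock scope. It assumes that each channel's command preparation is independent and that only the hand-off to a shared submission queue needs serialization. `NarrowLockScheduler`, `PreparedBatch`, and the placeholder transform in `Prepare` are hypothetical and do not reflect the scheduler's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical prepared command batch for one channel (illustration only).
struct PreparedBatch {
    int channel_id;
    std::vector<std::uint32_t> words;
};

class NarrowLockScheduler {
public:
    void Submit(int channel_id, const std::vector<std::uint32_t>& raw) {
        assert(channel_id >= 0);
        // The expensive per-channel work runs before the lock is taken,
        // so threads serving different channels do not block each other here.
        PreparedBatch batch{channel_id, Prepare(raw)};

        // Only the push onto the shared queue is serialized.
        std::lock_guard<std::mutex> lock{queue_mutex_};
        queue_.push_back(std::move(batch));
    }

    std::size_t QueuedBatches() {
        std::lock_guard<std::mutex> lock{queue_mutex_};
        return queue_.size();
    }

    std::size_t QueuedWords() {
        std::lock_guard<std::mutex> lock{queue_mutex_};
        std::size_t total = 0;
        for (const auto& batch : queue_) total += batch.words.size();
        return total;
    }

private:
    // Hypothetical placeholder for real command translation.
    static std::vector<std::uint32_t> Prepare(const std::vector<std::uint32_t>& raw) {
        std::vector<std::uint32_t> out;
        out.reserve(raw.size());
        for (std::uint32_t word : raw) out.push_back(word | 0x80000000u);
        return out;
    }

    std::mutex queue_mutex_;
    std::vector<PreparedBatch> queue_;
};
```

Several threads can call `Submit` concurrently in this sketch; they contend only for the brief queue push rather than for the whole preparation step.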
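For the DmaPusher change, a 16-command lookahead can be sketched as a loop that issues a prefetch hint for the entry 16 positions ahead of the one being processed. This assumes GCC or Clang, where `__builtin_prefetch` is available; on other compilers the hint is a no-op in this sketch. `CommandHeader`, `ProcessCommands`, `kPrefetchDistance`, and the checksum standing in for real dispatch are hypothetical names, not the DmaPusher's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical command header layout (illustration only).
struct CommandHeader {
    std::uint32_t method;
    std::uint32_t argument;
};

constexpr std::size_t kPrefetchDistance = 16; // the 16-command lookahead

inline void PrefetchRead(const void* p) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p, 0 /* read */, 3 /* high temporal locality */);
#else
    (void)p; // no portable prefetch hint here; fall back to a no-op
#endif
}

// Processes commands in order while asking the CPU to start loading the
// entry kPrefetchDistance positions ahead, so it is more likely to be
// cached by the time the loop reaches it. Prefetching is only a hint.
std::uint64_t ProcessCommands(const std::vector<CommandHeader>& cmds) {
    std::uint64_t checksum = 0;
    const std::size_t n = cmds.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kPrefetchDistance < n) {
            PrefetchRead(&cmds[i + kPrefetchDistance]);
        }
        // Stand-in for real method dispatch.
        checksum += static_cast<std::uint64_t>(cmds[i].method) * 31u + cmds[i].argument;
    }
    return checksum;
}
```

The bounds check keeps the prefetch address inside the vector; the right distance depends on per-command cost and memory latency, so 16 is a tuning value rather than a universal constant.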
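For the Maxwell3D change, the two techniques can be sketched separately: a parameter buffer that reserves its capacity once and reuses it through `clear()`, and a dirty-flag loop manually unrolled by four. `MacroParamBuffer`, `ForEachDirty`, `kMaxMacroParams`, and `kNumDirtyFlags` are hypothetical, and the sizes are illustrative rather than Maxwell3D's real values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sizes for illustration only.
constexpr std::size_t kMaxMacroParams = 256;
constexpr std::size_t kNumDirtyFlags = 64;
static_assert(kNumDirtyFlags % 4 == 0, "unroll factor must divide the flag count");

class MacroParamBuffer {
public:
    MacroParamBuffer() { params_.reserve(kMaxMacroParams); } // single allocation up front

    // clear() leaves capacity unchanged, so starting a new macro call frees nothing.
    void Begin() { params_.clear(); }

    void Push(std::uint32_t value) {
        assert(params_.size() < kMaxMacroParams); // stay within the reserved capacity
        params_.push_back(value);
    }

    const std::vector<std::uint32_t>& Params() const { return params_; }
    std::size_t Capacity() const { return params_.capacity(); }

private:
    std::vector<std::uint32_t> params_;
};

// Visits and clears every set flag, handling four adjacent flags per
// iteration, which cuts the number of loop-condition checks by four.
template <typename Fn>
void ForEachDirty(std::array<bool, kNumDirtyFlags>& flags, Fn&& on_dirty) {
    for (std::size_t i = 0; i < kNumDirtyFlags; i += 4) {
        if (flags[i + 0]) { on_dirty(i + 0); flags[i + 0] = false; }
        if (flags[i + 1]) { on_dirty(i + 1); flags[i + 1] = false; }
        if (flags[i + 2]) { on_dirty(i + 2); flags[i + 2] = false; }
        if (flags[i + 3]) { on_dirty(i + 3); flags[i + 3] = false; }
    }
}
```

Whether manual unrolling helps in practice depends on the compiler, which may already unroll such loops on its own; the `static_assert` removes the need for a remainder loop in this sketch.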
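For the MacroEngine change, a last-executed cache can be sketched as a remembered key and element pointer checked before falling back to the hash table. This assumes macros are stored in a `std::unordered_map`; `MacroCache`, `CompiledMacro`, and the lookup counter are hypothetical and only illustrate the idea, not the engine's actual types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical compiled-macro type (illustration only).
using CompiledMacro = std::function<std::uint32_t(const std::vector<std::uint32_t>&)>;

class MacroCache {
public:
    void Add(std::uint32_t key, CompiledMacro macro) {
        // References to unordered_map elements stay valid across rehashing,
        // so a cached pointer survives later insertions of other keys.
        macros_[key] = std::move(macro);
    }

    void Remove(std::uint32_t key) {
        if (has_last_ && last_key_ == key) has_last_ = false; // drop the stale fast-path entry
        macros_.erase(key);
    }

    std::uint32_t Execute(std::uint32_t key, const std::vector<std::uint32_t>& params) {
        if (!has_last_ || last_key_ != key) {
            ++hash_lookups_; // slow path: hash table lookup
            last_macro_ = &macros_.at(key);
            last_key_ = key;
            has_last_ = true;
        }
        // Fast path: the same macro as last time, no hashing needed.
        assert(last_macro_ != nullptr);
        return (*last_macro_)(params);
    }

    std::size_t HashLookups() const { return hash_lookups_; }

private:
    std::unordered_map<std::uint32_t, CompiledMacro> macros_;
    CompiledMacro* last_macro_ = nullptr;
    std::uint32_t last_key_ = 0;
    bool has_last_ = false;
    std::size_t hash_lookups_ = 0;
};
```

The cache pays off when the same macro is called repeatedly in a row; the only invalidation it needs in this sketch is erasure of the cached entry itself.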

Co-authored-by: lizzie <lizzie@eden-emu.dev>
Reviewed-on: https://git.eden-emu.dev/eden-emu/eden/pulls/3296
Reviewed-by: Maufeat <sahyno1996@gmail.com>
Reviewed-by: DraVee <dravee@eden-emu.dev>
Co-authored-by: CamilleLaVey <camillelavey99@gmail.com>
Co-committed-by: CamilleLaVey <camillelavey99@gmail.com>
CamilleLaVey 2026-01-18 03:45:18 +01:00 committed by crueter
parent 6ec6ca7c37
commit 51cc1bc6be
GPG key ID: 425ACD2D4830EBC6
4 changed files with 100 additions and 30 deletions


@@ -14,6 +14,10 @@
#include "video_core/rasterizer_interface.h"
#include "video_core/texture_cache/util.h"
#ifdef _MSC_VER
#include <intrin.h>
#endif
namespace Tegra {
constexpr u32 MacroRegistersStart = 0xE00;