ubuntu:gpu:troubleshooting:error_ring_gfx_0.0.0_timeout

Differences

This shows you the differences between two versions of the page.

ubuntu:gpu:troubleshooting:error_ring_gfx_0.0.0_timeout [2023/06/05 13:03] – created peter
ubuntu:gpu:troubleshooting:error_ring_gfx_0.0.0_timeout [2023/06/05 13:22] (current) peter
Line 4: Line 4:
  
 <code>
-[   85.861734] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=13365, emitted seq=13367 [   85.862162] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kwin_x11 pid 819 thread kwin_x11:cs0 pid 838
+[   85.861734] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=13365, emitted seq=13367
+[   85.862162] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kwin_x11 pid 819 thread kwin_x11:cs0 pid 838
 </code>
  
Line 19: Line 20:
 ----
  
 +===== An alternative workaround =====
 +
 +Use the **amdgpu.ppfeaturemask** parameter to narrow down which power feature is causing problems.
 +
 +  * The bits in that parameter are defined by the **PP_FEATURE_MASK** enum here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/amd_shared.h#n199
 +
 +<code bash>
 +cat /sys/class/drm/card0/device/pp_features
 +</code>
 +
 +On the affected system this returns:
 +
 +<code bash>
 +features high: 0x0003ebb8 low: 0x71ffffff
 +No. Feature               Bit : State
 +00. FW_DATA_READ         ( 0) : enabled
 +01. DPM_GFXCLK           ( 1) : enabled
 +02. DPM_GFX_POWER_OPTIMIZER ( 2) : enabled
 +03. DPM_UCLK             ( 3) : enabled
 +04. DPM_FCLK             ( 4) : enabled
 +05. DPM_SOCCLK           ( 5) : enabled
 +06. DPM_MP0CLK           ( 6) : enabled
 +07. DPM_LINK             ( 7) : enabled
 +08. DPM_DCN              ( 8) : enabled
 +09. VMEMP_SCALING        ( 9) : enabled
 +10. VDDIO_MEM_SCALING    (10) : enabled
 +11. DS_GFXCLK            (11) : enabled
 +12. DS_SOCCLK            (12) : enabled
 +13. DS_FCLK              (13) : enabled
 +14. DS_LCLK              (14) : enabled
 +15. DS_DCFCLK            (15) : enabled
 +16. DS_UCLK              (16) : enabled
 +17. GFX_ULV              (17) : enabled
 +18. FW_DSTATE            (18) : enabled
 +19. GFXOFF               (19) : enabled
 +20. BACO                 (20) : enabled
 +21. MM_DPM               (21) : enabled
 +22. SOC_MPCLK_DS         (22) : enabled
 +23. BACO_MPCLK_DS        (23) : enabled
 +24. THROTTLERS           (24) : enabled
 +25. SMARTSHIFT           (25) : disabled
 +26. GTHR                 (26) : disabled
 +27. ACDC                 (27) : disabled
 +28. VR0HOT               (28) : enabled
 +29. FW_CTF               (29) : enabled
 +30. FAN_CONTROL          (30) : enabled
 +31. GFX_DCS              (31) : disabled
 +32. GFX_READ_MARGIN      (32) : disabled
 +33. LED_DISPLAY          (33) : disabled
 +34. GFXCLK_SPREAD_SPECTRUM (34) : disabled
 +35. OUT_OF_BAND_MONITOR  (35) : enabled
 +36. OPTIMIZED_VMIN       (36) : enabled
 +37. GFX_IMU              (37) : enabled
 +38. BOOT_TIME_CAL        (38) : disabled
 +39. GFX_PCC_DFLL         (39) : enabled
 +40. SOC_CG               (40) : enabled
 +41. DF_CSTATE            (41) : enabled
 +42. GFX_EDC              (42) : disabled
 +43. BOOT_POWER_OPT       (43) : enabled
 +44. CLOCK_POWER_DOWN_BYPASS (44) : disabled
 +45. DS_VCN               (45) : enabled
 +46. BACO_CG              (46) : enabled
 +47. MEM_TEMP_READ        (47) : enabled
 +48. ATHUB_MMHUB_PG       (48) : enabled
 +49. SOC_PCC              (49) : enabled
 +</code>
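A listing this long is easier to compare between boots when it is filtered by state. The following is a minimal sketch, assuming the output format shown above; ''pp_features_by_state'' is a hypothetical helper name, not part of any amdgpu tooling. Note that the bit numbers in this listing appear to be firmware feature indices and should not be assumed to match the **PP_FEATURE_MASK** bits one-to-one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of features whose state matches $1
# ("enabled" or "disabled"). Reads pp_features-style text on stdin.
pp_features_by_state() {
    awk -v want="$1" '
        # Feature rows start with "NN. "; the state is the last field,
        # the feature name is the second field.
        /^[0-9]+\. / && $NF == want { print $2 }
    '
}

# Only query the real file when it exists and is readable (card0 is the
# path used above; other systems may expose the GPU as a different card).
pp_file=/sys/class/drm/card0/device/pp_features
if [ -r "$pp_file" ]; then
    pp_features_by_state disabled < "$pp_file"
fi
```

Saving the result before and after changing the boot parameter and comparing the two lists with ''diff'' shows which features actually changed state.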
 +
 +----
 +
 +==== Change Kernel Boot Parameters ====
 +
 +Try adding **amdgpu.ppfeaturemask** to the kernel boot parameters in **/etc/default/grub**:
 +
 +<file bash /etc/default/grub>
 +GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.ppfeaturemask=0xfffd3fff"
 +</file>
 +
 +Then update GRUB:
 +
 +<code bash>
 +sudo update-grub
 +</code>
 +
 +Then reboot, and check if the fault happens again.
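Before testing, it is worth confirming that the parameter actually reached the kernel. This is a minimal sketch that reads the running kernel's command line from ''/proc/cmdline''; ''ppfeaturemask_from_cmdline'' is a hypothetical helper name used only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of amdgpu.ppfeaturemask found in a
# kernel command line supplied on stdin (prints nothing if it is absent).
ppfeaturemask_from_cmdline() {
    # Put each parameter on its own line, then keep only the value part.
    tr ' ' '\n' | sed -n 's/^amdgpu\.ppfeaturemask=//p'
}

# Check the command line of the currently running kernel.
if [ -r /proc/cmdline ]; then
    ppfeaturemask_from_cmdline < /proc/cmdline
fi
```

If nothing is printed after the reboot, the change to **/etc/default/grub** was probably not applied, for example because ''update-grub'' was not run.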
 +
 +<WRAP info>
 +**NOTE:**  The mask 0xfffd3fff is 0xffffffff with the three bits below cleared, so it disables these features. The fault may be caused by one or more of them:
 +
 +<code>
 +PP_OVERDRIVE_MASK = 0x4000,
 +PP_GFXOFF_MASK = 0x8000,
 +PP_STUTTER_MODE = 0x20000,
 +</code>
 +
 +</WRAP>
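To narrow the fault down to a single feature, the same arithmetic can be repeated with one bit cleared at a time. The sketch below assumes a base mask of 0xffffffff, which is what the value 0xfffd3fff implies; ''clear_pp_bits'' is a hypothetical helper name, and the default mask on a given kernel may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: clear the given feature bits from a base mask and
# print the result as a 32-bit hex value suitable for amdgpu.ppfeaturemask.
# Usage: clear_pp_bits BASE_MASK BIT_MASK...
clear_pp_bits() {
    local mask=$(( $1 ))
    shift
    local bit
    for bit in "$@"; do
        mask=$(( mask & ~(bit) ))
    done
    # Keep only the low 32 bits, since shell arithmetic is wider than that.
    printf '0x%08x\n' "$(( mask & 0xffffffff ))"
}

# All three suspects disabled at once (the value used in the grub example).
clear_pp_bits 0xffffffff 0x4000 0x8000 0x20000   # prints 0xfffd3fff

# One suspect at a time, for narrowing the fault down.
clear_pp_bits 0xffffffff 0x4000    # PP_OVERDRIVE_MASK only: 0xffffbfff
clear_pp_bits 0xffffffff 0x8000    # PP_GFXOFF_MASK only:    0xffff7fff
clear_pp_bits 0xffffffff 0x20000   # PP_STUTTER_MODE only:   0xfffdffff
```

Booting with each single-bit mask in turn, and noting which one stops the timeout, points to the feature responsible.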
 +
 +----
 +
 +===== Use Mesa Environment Parameters to identify the cause =====
 +
 +Add **RADV_DEBUG=hang** to **/etc/environment**, log out and back in so the variable takes effect, then try to trigger the fault again.
 +
 +This dumps a report to $HOME/radv_dumps_<pid>_<time> if a GPU hang is detected.
 +
 +Report the error to the Mesa team and include the dump.
 +
 +<WRAP info>
 +**NOTE:**  See https://docs.mesa3d.org/envvars.html
 +</WRAP>
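After a hang, the dump directories can be located with a short helper. This is a minimal sketch that assumes the dumps land in ''$HOME'' under the ''radv_dumps_'' prefix described above; ''list_radv_dumps'' is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list RADV hang dump directories under a base
# directory (default: $HOME), newest first. Prints nothing if none exist.
list_radv_dumps() {
    local base="${1:-$HOME}"
    # When nothing matches, the glob is passed through unexpanded and ls
    # fails; the error is silenced so the helper simply prints nothing.
    ls -1dt "$base"/radv_dumps_* 2>/dev/null || true
}

list_radv_dumps
```

The variable can also be set for a single program instead of system-wide, for example ''RADV_DEBUG=hang some_vulkan_app'', where ''some_vulkan_app'' is a placeholder for the application that triggers the fault.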
 +
 +----
 +
 +===== References =====
 +
 +  * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/amd_shared.h#n199
 +  * https://docs.mesa3d.org/envvars.html
  
ubuntu/gpu/troubleshooting/error_ring_gfx_0.0.0_timeout.1685970187.txt.gz · Last modified: 2023/06/05 13:03 by peter
