* Re: Linux Mint 20.04 5.11 issue
       [not found] <3e50d54ee11131494a8dcd75cdff5f366dd90930.camel@razorwired.com>
@ 2021-07-29 15:14 ` Felix Kuehling
  2021-07-29 16:04   ` Tim Cahill
  0 siblings, 1 reply; 4+ messages in thread
From: Felix Kuehling @ 2021-07-29 15:14 UTC (permalink / raw)
  To: Tim Cahill, amd-gfx

On 2021-07-28 at 12:10 p.m., Tim Cahill wrote:
> Hi Felix,

I'm not sure why you're calling me out by name. I'm not working on
anything obviously related to your crashes.

Anyway, I took a quick look at the backtraces. They all point at libgdk.
Two of them are segfaults, one is an abort. It's not clear how these
would be related to the GPU driver. That said, when you boot with
nomodeset, the GPU driver and all HW acceleration are completely
disabled. If that makes the problem disappear, the GPU driver is clearly
involved in the problem in some way.

The abort points at a problem while freeing memory. This could be caused
by a double-free problem in some unrelated code, possibly related to the
GPU driver. This would be a problem in a user mode component (maybe
Mesa), not the kernel mode driver.

I believe the messages you're seeing when you move the mouse are the
result of runtime power management that puts the GPU to sleep when it's
idle and reinitializes it when it's needed. You have 2 GPUs in your
laptop, an integrated Renoir GPU in the Ryzen CPU, and an external
Navi10 GPU for higher gaming performance. The GPU that goes to sleep and
wakes up is the external Navi10 GPU.
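
One rough way to check how often this suspend/resume cycle happens is to
count the "PSP is resuming" lines in the kernel log. The following is a
minimal Python sketch, not a driver tool: the function names are made up
for illustration, and it assumes every resume logs that line in the same
format as the dmesg excerpt quoted below.

```python
import re

# Matches kernel log lines such as "[13164.399579] [drm] PSP is resuming..."
# (format assumed from the dmesg excerpt in this thread).
RESUME_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+\[drm\] PSP is resuming")


def resume_timestamps(dmesg_text):
    """Return the timestamps (seconds since boot) of each logged resume."""
    return [float(m.group(1)) for m in RESUME_RE.finditer(dmesg_text)]


def resumes_per_minute(timestamps):
    """Average resume rate over the observed window, or 0.0 with too few samples."""
    if len(timestamps) < 2:
        return 0.0
    span = timestamps[-1] - timestamps[0]
    return 0.0 if span <= 0 else (len(timestamps) - 1) * 60.0 / span


# Small demonstration on lines in the format quoted in this thread.
sample = (
    "[13164.399579] [drm] PSP is resuming...\n"
    "[13170.722306] [drm] free PSP TMR buffer\n"
)
print(len(resume_timestamps(sample)), "resume(s) found in the sample")
```

To use it on a live system, pass the saved output of dmesg to
resume_timestamps(). A high rate while the desktop is idle would suggest
that something keeps touching the Navi10 GPU.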

The OpenGL renderer string specifies "RENOIR". Therefore I'm surprised
that the Navi10 GPU wakes up when you move the mouse. Ideally it
shouldn't be used at all when you're just using the desktop.
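
To see which GPU is actually awake at a given moment, the standard PCI
runtime PM attribute in sysfs can be read. This is only a sketch: the
helper name is hypothetical, and the PCI addresses are assumed from the
bus IDs in the inxi output below (with PCI domain 0000), so they may
differ on another machine.

```python
from pathlib import Path


def runtime_status(pci_address, sysfs_root="/sys"):
    """Return the runtime PM state of a PCI device, or None if unavailable.

    Reads the power/runtime_status attribute, which typically holds
    "active" or "suspended".
    """
    path = Path(sysfs_root) / "bus/pci/devices" / pci_address / "power/runtime_status"
    try:
        return path.read_text().strip()
    except OSError:
        return None


# Addresses assumed from the inxi output in this thread (bus IDs 03:00.0
# and 08:00.0, PCI domain 0000).
GPUS = {"Navi10": "0000:03:00.0", "Renoir": "0000:08:00.0"}

for name, address in GPUS.items():
    print(name, address, runtime_status(address))
```

If the Navi10 entry flips to "active" whenever the mouse moves, that
would match the resume messages in dmesg.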

If you suspect that runtime power management is responsible for your
problems, you could disable it with amdgpu.runpm=0 on the kernel command
line. That means the Navi10 GPU won't go into the low power mode and
will drain your battery more quickly. So this is not a permanent
solution, just an experiment to narrow down the problem.
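
After rebooting, it is worth confirming that the parameter actually
reached the kernel. A small Python sketch under some assumptions: the
helper names are hypothetical, and the parsing simply splits on
whitespace, so quoted values containing spaces are not handled.

```python
def kernel_param(cmdline, name):
    """Look up `name` on a kernel command line.

    Returns the text after "=", "" for a bare flag such as nomodeset, or
    None if the parameter is absent. If it is repeated, the last
    occurrence is returned.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def read_cmdline(path="/proc/cmdline"):
    """Return the running kernel's command line, or "" if it cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


cmdline = read_cmdline()
print("amdgpu.runpm:", kernel_param(cmdline, "amdgpu.runpm"))
print("nomodeset:", kernel_param(cmdline, "nomodeset"))
```

On Ubuntu-based systems the parameter is usually added to
GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub
and a reboot.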

Regards,
  Felix


>
> I'm not sure how to do this as I haven't had to report a bug before.
> I've looked at a variety of bug reporting sites (such as the Mate
> project's) to see if anyone else is running into the same issues, and
> haven't seen anything at all similar to the issue I'm having. Since I
> had issues with AMD drivers with my distro (info below) and some
> consistent and high-volume dmesg content shows up, I've decided that I
> should start here with the AMD kernel team.
>
> I have a fairly new MSI laptop with the following configuration:
>
> [code]
> System:    Kernel: 5.11.0-25-generic x86_64 bits: 64 compiler: N/A Desktop: MATE 1.24.0 wm: marco
>            dm: LightDM Distro: Linux Mint 20.2 Uma base: Ubuntu 20.04 focal
> Machine:   Type: Laptop System: Micro-Star product: Alpha 17 A4DEK v: REV:1.0 serial: <filter>
>            Chassis: type: 10 serial: <filter>
>            Mobo: Micro-Star model: MS-17EK v: REV:1.0 serial: <filter> UEFI: American Megatrends
>            v: E17EKAMS.101 date: 10/26/2020
> Battery:   ID-1: BAT1 charge: 66.2 Wh condition: 67.0/65.7 Wh (102%) volts: 12.4/10.8
>            model: MSI Corp. MS-17EK serial: N/A status: Unknown
> CPU:       Topology: 8-Core model: AMD Ryzen 7 4800H with Radeon Graphics bits: 64 type: MT MCP
>            arch: Zen rev: 1 L2 cache: 4096 KiB
>            flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 92630
>            Speed: 4278 MHz min/max: 1400/2900 MHz Core speeds (MHz): 1: 4280 2: 1865 3: 1397
>            4: 2188 5: 1489 6: 2265 7: 1907 8: 1906 9: 1729 10: 1397 11: 1397 12: 1397 13: 1397
>            14: 1397 15: 1907 16: 1740
> Graphics:  Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
>            vendor: Micro-Star MSI driver: amdgpu v: kernel bus ID: 03:00.0 chip ID: 1002:731f
>            Device-2: AMD Renoir vendor: Micro-Star MSI driver: amdgpu v: kernel bus ID: 08:00.0
>            chip ID: 1002:1636
>            Display: x11 server: X.Org 1.20.9 driver: amdgpu,ati
>            unloaded: fbdev,modesetting,radeon,vesa compositor: marco resolution: 1920x1080~144Hz
>            OpenGL: renderer: AMD RENOIR (DRM 3.40.0 5.11.0-25-generic LLVM 11.0.0)
>            v: 4.6 Mesa 20.2.6 direct render: Yes
> Audio:     Device-1: AMD Navi 10 HDMI Audio vendor: Micro-Star MSI driver: snd_hda_intel v: kernel
>            bus ID: 03:00.1 chip ID: 1002:ab38
>            Device-2: AMD Raven/Raven2/FireFlight/Renoir Audio Processor vendor: Micro-Star MSI
>            driver: N/A bus ID: 08:00.5 chip ID: 1022:15e2
>            Device-3: AMD Family 17h HD Audio vendor: Micro-Star MSI driver: snd_hda_intel
>            v: kernel bus ID: 08:00.6 chip ID: 1022:15e3
>            Sound Server: ALSA v: k5.11.0-25-generic
> Network:   Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel bus ID: 04:00.0
>            chip ID: 8086:2723
>            IF: wlp4s0 state: up mac: <filter>
>            Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Micro-Star MSI
>            driver: r8169 v: kernel port: f000 bus ID: 05:00.0 chip ID: 10ec:8168
>            IF: eno1 state: down mac: <filter>
> Drives:    Local Storage: total: 476.94 GiB used: 89.79 GiB (18.8%)
>            ID-1: /dev/nvme0n1 vendor: Kingston model: OM8PCP3512F-AI1 size: 476.94 GiB
>            speed: 31.6 Gb/s lanes: 4 serial: <filter>
> Partition: ID-1: / size: 466.30 GiB used: 89.28 GiB (19.1%) fs: ext4 dev: /dev/dm-1
>            ID-2: /boot size: 704.5 MiB used: 519.7 MiB (73.8%) fs: ext4 dev: /dev/nvme0n1p2
>            ID-3: swap-1 size: 980.0 MiB used: 0 KiB (0.0%) fs: swap dev: /dev/dm-2
> USB:       Hub: 1-0:1 info: Full speed (or root) Hub ports: 4 rev: 2.0 chip ID: 1d6b:0002
>            Device-1: 1-3:2 info: SteelSeries ApS SteelSeries KLC type: HID
>            driver: hid-generic,usbhid rev: 2.0 chip ID: 1038:1122
>            Device-2: 1-4:3 info: Acer HD Webcam type: Video driver: uvcvideo rev: 2.0
>            chip ID: 5986:211c
>            Hub: 2-0:1 info: Full speed (or root) Hub ports: 2 rev: 3.1 chip ID: 1d6b:0003
>            Hub: 3-0:1 info: Full speed (or root) Hub ports: 4 rev: 2.0 chip ID: 1d6b:0002
>            Device-3: 3-3:2 info: Intel type: Bluetooth driver: btusb rev: 2.0 chip ID: 8087:0029
>            Hub: 4-0:1 info: Full speed (or root) Hub ports: 2 rev: 3.1 chip ID: 1d6b:0003
> Sensors:   System Temperatures: cpu: 46.5 C mobo: N/A
>            Fan Speeds (RPM): N/A
>            GPU: device: amdgpu temp: 0 C fan: 65535 device: amdgpu temp: 31 C
> Repos:     No active apt repos in: /etc/apt/sources.list
>            Active apt repos in: /etc/apt/sources.list.d/official-package-repositories.list
>            1: deb http: //mirrors.seas.harvard.edu/linuxmint-packages uma main upstream import backport
>            2: deb http: //mirror.us-ny2.kamatera.com/ubuntu focal main restricted universe multiverse
>            3: deb http: //mirror.us-ny2.kamatera.com/ubuntu focal-updates main restricted universe multiverse
>            4: deb http: //mirror.us-ny2.kamatera.com/ubuntu focal-backports main restricted universe multiverse
>            5: deb http: //security.ubuntu.com/ubuntu/ focal-security main restricted universe multiverse
>            6: deb http: //archive.canonical.com/ubuntu/ focal partner
> Info:      Processes: 372 Uptime: 2h 44m Memory: 15.10 GiB used: 1.15 GiB (7.6%) Init: systemd
>            v: 245 runlevel: 5 Compilers: gcc: 9.3.0 alt: 9 Client: Unknown python3.8 client
>            inxi: 3.0.38
> [/code]
>
>
> If I am using it interactively, I get random crashes that seem to hit
> elements of Mate (mate-panel, etc.) consistently, just not
> predictably. LibreOffice applications, xed, Firefox, and Evolution
> seem to be more prone to crashing the X session. I can easily move to
> tty1, log in, and kill services running in tty7, as the crashes don't
> appear to completely kill tty7. Sometimes, I can kill Mate and launch a
> new instance to salvage the tty7 session. However, I usually end up
> having to kill the root PID of the X Window session in order to log
> back in. But I think this is related to the AMD GPU driver because
> every time I simply move the mouse in the tty7 session, I get the
> following in dmesg:
>
> [13164.399550] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
> [13164.399579] [drm] PSP is resuming...
> [13164.486593] [drm] reserve 0xa00000 from 0x800f400000 for PSP TMR
> [13164.678788] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
> [13164.702624] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
> [13164.702639] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
> [13164.702648] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000036, smu fw if version = 0x00000037, smu fw version = 0x002a3f00 (42.63.0)
> [13164.702664] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
> [13164.746143] amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
> [13164.768978] [drm] kiq ring mec 2 pipe 1 q 0
> [13164.779651] [drm] VCN decode and encode initialized successfully(under DPG Mode).
> [13164.779758] [drm] JPEG decode initialized successfully.
> [13164.779779] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
> [13164.779783] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
> [13164.779784] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
> [13164.779785] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
> [13164.779786] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
> [13164.779787] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
> [13164.779788] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
> [13164.779789] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
> [13164.779790] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
> [13164.779792] amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
> [13164.779793] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
> [13164.779803] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
> [13164.779804] amdgpu 0000:03:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
> [13164.779805] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
> [13164.779806] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
> [13164.779807] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
> [13164.783807] amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
> [13170.722306] [drm] free PSP TMR buffer
>
> If I boot with nomodeset, I can operate fine, just without screen
> brightness control, etc. It just seems strange that an event like this
> is generated all the time.
>
> I only get sporadic crashes, though. Humorously, I've been running
> only Firefox, the crash reporter, and Mate Terminal this morning and
> it has run fine for over 4 hours. There were times when I wouldn't run
> anything at all and it would lock up on me. So I just can't find any
> common denominator for this (using vi in a terminal to type this; going
> to copy-paste into the email client [Evolution] once I'm done).
>
> I've attached 3 crash reports that were captured on the system over
> the last couple of days. I apologize in advance - profusely! - if the
> problem turns out to be somewhere else.
>
> Thanks,
> Tim 
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: Linux Mint 20.04 5.11 issue
  2021-07-29 15:14 ` Linux Mint 20.04 5.11 issue Felix Kuehling
@ 2021-07-29 16:04   ` Tim Cahill
  2021-07-30 12:08     ` Tim Cahill
  0 siblings, 1 reply; 4+ messages in thread
From: Tim Cahill @ 2021-07-29 16:04 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx



I apologize if the name callout is disconcerting. I was trying to
follow instructions for sending bugs and saw your name listed as the
owner of this code area.

FYI, I've done some more troubleshooting and tinkering regarding the
crashing, and Mate seems to be at the center of all the issues. As a
result, I also opened an issue with the Mate Desktop team
(https://github.com/mate-desktop/mate-panel/issues/1242). Mate also has
a power management component, which is probably responsible for the
excess logging and the confusion over Navi10. However, I have no way
to vouch for how accurately the Mate PM applet gathered data for its
instantiation. I have no external devices connected that I'm aware
would use it, since I thought that was via HDMI. I *do* have a Jabra
Evolve2 headset that uses the USB Type-C connector, but I assume that's
not using the GPU.

The issue documentation I left with Mate notes that if I launch apps
from a terminal that is NOT launched from the Mate panel (right-click
on the desktop instead to open a terminal), the parent for all the apps
(Firefox, Evolution, etc.) is separate from Mate (at least separate
from mate-panel). Everything has worked fine (except for the constant
logging of the wake-up action) since I've done that (and turned off the
screensaver and screensaver lock). So, I'm not sure what else to do at
this point. Please advise if I should do anything on the driver side.
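
For anyone who wants to check the parent relationship described above,
the process ancestry can be walked through /proc. This is only a sketch
with made-up helper names. It assumes the Linux /proc/<pid>/stat layout
and that the panel process is named mate-panel.

```python
import os


def parse_stat(stat_text):
    """Extract (comm, ppid) from the contents of /proc/<pid>/stat.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the last ")" rather than on whitespace.
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    comm = stat_text[open_paren + 1:close_paren]
    fields = stat_text[close_paren + 1:].split()
    return comm, int(fields[1])  # fields[0] is the state, fields[1] the ppid


def ancestors(pid, proc_root="/proc"):
    """Return the command names from `pid` up to the top of the process tree."""
    chain = []
    while pid > 0:
        try:
            with open(os.path.join(proc_root, str(pid), "stat")) as f:
                comm, ppid = parse_stat(f.read())
        except OSError:
            break
        chain.append(comm)
        pid = ppid
    return chain


chain = ancestors(os.getpid())
print(" <- ".join(chain))
print("descends from mate-panel:", "mate-panel" in chain)
```

Run from a terminal opened via the panel and from one opened via the
desktop right-click menu, the two chains should differ if the
observation above holds.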

Thanks,
Tim


* Re: Linux Mint 20.04 5.11 issue
  2021-07-29 16:04   ` Tim Cahill
@ 2021-07-30 12:08     ` Tim Cahill
  2021-08-04 12:53       ` Tim Cahill
  0 siblings, 1 reply; 4+ messages in thread
From: Tim Cahill @ 2021-07-30 12:08 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx


Posted the following comment to the Mate-desktop issue:

Had another hang with the same configuration as a YouTube video played
via a USB headphone (Jabra40). I was able to recover by killing Firefox,
in which the video was playing. The video became choppy and garbled and
then stopped. The stderr is below:

ALSA lib conf.c:5187:(snd_config_expand) Unknown parameters 1
ALSA lib control.c:1379:(snd_ctl_open_noupdate) Invalid CTL sysdefault:1
ALSA lib conf.c:5187:(snd_config_expand) Unknown parameters 2
ALSA lib control.c:1379:(snd_ctl_open_noupdate) Invalid CTL sysdefault:2
ALSA lib pcm_dmix.c:1089:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dmix.c:1089:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dmix.c:1089:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dmix.c:1089:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dmix.c:1089:(snd_pcm_dmix_open) unable to open slave

On re-launch of Firefox from a terminal window, the following appeared:

[GFX1-]: More than 1 GPU from same vendor detected via PCI, cannot deduce device
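
That last message is consistent with both GPUs in this laptop carrying
AMD's PCI vendor ID (1002). The sketch below shows one way to list PCI
display controllers grouped by vendor from sysfs. It is an illustration
with a hypothetical function name, not the logic Firefox itself uses.

```python
from collections import defaultdict
from pathlib import Path


def display_devices_by_vendor(sysfs_root="/sys"):
    """Group PCI display controllers (base class 0x03) by vendor ID."""
    groups = defaultdict(list)
    base = Path(sysfs_root) / "bus/pci/devices"
    if not base.is_dir():
        return {}
    for dev in sorted(base.iterdir()):
        try:
            pci_class = int((dev / "class").read_text().strip(), 16)
            vendor = (dev / "vendor").read_text().strip()
        except (OSError, ValueError):
            continue
        if pci_class >> 16 == 0x03:  # base class 0x03 = display controller
            groups[vendor].append(dev.name)
    return dict(groups)


for vendor, devices in display_devices_by_vendor().items():
    note = "(multiple GPUs from one vendor)" if len(devices) > 1 else ""
    print(vendor, devices, note)
```

On this machine I would expect both 0000:03:00.0 and 0000:08:00.0 to be
listed under vendor 0x1002, assuming the bus IDs from the inxi output.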
On Thu, 2021-07-29 at 12:04 -0400, Tim Cahill wrote:
> I apologize if the name callout is disconcerting. I was trying to
> follow instructions for sending bugs and saw your name listed as the
> owner of this code area. 
> FYI, I'd done some more troubleshooting and tinkering regarding the
> crashing and Mate seems to be at the center of all the issues. As a
> result, I also opened an Issue with the Mate Desktop team (
> https://github.com/mate-desktop/mate-panel/issues/1242). Mate also
> has a power management component, which is probably responsible for
> the excess logging and the confusion over Navil10. However, I have no
> way to vouch for now accurately the Mate PM applet gathered data for
> its instantiation. I have no external devices connected that I'm
> aware would use it since I thought that was via HDMI. I *do* have a
> Jabra Evolve2 headset that uses the TypeC USB connector, but I assume
> that's not using the GPU.
> The issue documentation I left with Mate notes that if I launch apps
> from a terminal that is NOT launched from the Mate panel (right-click 
> on desktop instead to open terminal), the parent for all the apps
> (Firefox, Evolution, etc.) is separate from Mate (at least separate
> from mate-panel). Everything has worked fine (except for the constant
> logging of the wake-up action) since I've done that (and turned off
> the screensaver and screensaver lock). So, I'm not sure what else to
> do at this point. Please advise if I should do anything on the driver
> side.
> Thanks,Tim 
> On Thu, 2021-07-29 at 11:14 -0400, Felix Kuehling wrote:
> > Am 2021-07-28 um 12:10 p.m. schrieb Tim Cahill:
> > > Hi Felix,
> > 
> > I'm not sure why you're calling me out by name. I'm not working
> > onanything obviously related to your crashes.
> > Anyway, I took a quick look at the backtraces. They all point at
> > libgdk.Two of them are segfaults, one is an abort. It's not clear
> > how thesewould be related to the GPU driver. That said, when you
> > boot withnomodeset, the GPU driver and all HW acceleration is
> > completelydisabled. If that makes the problem disappear, the GPU
> > driver is clearlyinvolved in the problem in some way.
> > The abort points at a problem while freeing memory. This could be
> > causedby a double-free problem in some unrelated code, possibly
> > related to theGPU driver. This would be a problem in a user mode
> > component (maybeMesa), not the kernel mode driver.
> > I believe the messages you're seeing when you move the mouse are
> > theresult of runtime power management that puts the GPU to sleep
> > when it'sidle and reinitializes it when it's needed. You have 2
> > GPUs in yourlaptop, an integrated Renoir GPU in the Ryzen CPU, and
> > an externalNavi10 GPU for higher gaming performance. The GPU that
> > goes to sleep andwakes up is the external Navi10 GPU.
> > The OpenGL renderer string specifies "RENOIR". Therefore I'm
> > surprisedthat the Navi10 GPU wakes up when you move the mouse.
> > Ideally itshouldn't be used at all when you're just using the
> > desktop.
> > If you suspect that runtime power management is responsible for
> > yourproblems, you could disable it with amdgpu.runpm=0 on the
> > kernel commandline. That means the Navi10 GPU won't go into the low
> > power mode anddrain your battery more quickly. So this is not a
> > permanent solution.Just an experiment to narrow down the problem.
> > Regards,  Felix
> > 
> > > I'm not sure how to do this as I haven't had to report a bug
> > > before.I've looked to a variety of bug reporting sites to see if
> > > anyone elseis running into the same issues that I'm having (such
> > > as the Mateproject) and haven't seen anything at all similar to
> > > the issue I'mhaving. Since I had issues with AMD drivers with my
> > > distro (infobelow) and some consistent and high volume dmesg
> > > content shows up,I've decided that I should start here with the
> > > AMD kernel team.
> > > I have a fairly new MSI laptop with the following configuration:
> > > [code]
> > > System:    Kernel: 5.11.0-25-generic x86_64 bits: 64 compiler: N/A Desktop: MATE 1.24.0 wm: marco
> > >            dm: LightDM Distro: Linux Mint 20.2 Uma base: Ubuntu 20.04 focal
> > > Machine:   Type: Laptop System: Micro-Star product: Alpha 17 A4DEK v: REV:1.0 serial: <filter>
> > >            Chassis: type: 10 serial: <filter>
> > >            Mobo: Micro-Star model: MS-17EK v: REV:1.0 serial: <filter> UEFI: American Megatrends
> > >            v: E17EKAMS.101 date: 10/26/2020
> > > Battery:   ID-1: BAT1 charge: 66.2 Wh condition: 67.0/65.7 Wh (102%) volts: 12.4/10.8
> > >            model: MSI Corp. MS-17EK serial: N/A status: Unknown
> > > CPU:       Topology: 8-Core model: AMD Ryzen 7 4800H with Radeon Graphics bits: 64 type: MT MCP
> > >            arch: Zen rev: 1 L2 cache: 4096 KiB
> > >            flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 92630
> > >            Speed: 4278 MHz min/max: 1400/2900 MHz Core speeds (MHz): 1: 4280 2: 1865 3: 1397
> > >            4: 2188 5: 1489 6: 2265 7: 1907 8: 1906 9: 1729 10: 1397 11: 1397 12: 1397 13: 1397
> > >            14: 1397 15: 1907 16: 1740
> > > Graphics:  Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT /5700/5700 XT]
> > >            vendor: Micro-Star MSI driver: amdgpu v: kernel bus ID: 03:00.0 chip ID: 1002:731f
> > >            Device-2: AMD Renoir vendor: Micro-Star MSI driver: amdgpu v: kernel bus ID: 08:00.0
> > >            chip ID: 1002:1636
> > >            Display: x11 server: X.Org 1.20.9 driver: amdgpu,ati
> > >            unloaded: fbdev,modesetting,radeon,vesa compositor: marco resolution: 1920x1080~144Hz
> > >            OpenGL: renderer: AMD RENOIR (DRM 3.40.0 5.11.0-25-generic LLVM 11.0.0)
> > >            v: 4.6 Mesa 20.2.6 direct render: Yes
> > > Audio:     Device-1: AMD Navi 10 HDMI Audio vendor: Micro-Star MSI driver: snd_hda_intel v: kernel
> > >            bus ID: 03:00.1 chip ID: 1002:ab38
> > >            Device-2: AMD Raven/Raven2/FireFlight/Renoir Audio Processor vendor: Micro-Star MSI
> > >            driver: N/A bus ID: 08:00.5 chip ID: 1022:15e2
> > >            Device-3: AMD Family 17h HD Audio vendor: Micro-Star MSI driver: snd_hda_intel
> > >            v: kernel bus ID: 08:00.6 chip ID: 1022:15e3
> > >            Sound Server: ALSA v: k5.11.0-25-generic
> > > Network:   Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel bus ID: 04:00.0
> > >            chip ID: 8086:2723
> > >            IF: wlp4s0 state: up mac: <filter>
> > >            Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Micro-Star MSI
> > >            driver: r8169 v: kernel port: f000 bus ID: 05:00.0 chip ID: 10ec:8168
> > >            IF: eno1 state: down mac: <filter>
> > > Drives:    Local Storage: total: 476.94 GiB used: 89.79 GiB (18.8%)
> > >            ID-1: /dev/nvme0n1 vendor: Kingston model: OM8PCP3512F-AI1 size: 476.94 GiB
> > >            speed: 31.6 Gb/s lanes: 4 serial: <filter>
> > > Partition: ID-1: / size: 466.30 GiB used: 89.28 GiB (19.1%) fs: ext4 dev: /dev/dm-1
> > >            ID-2: /boot size: 704.5 MiB used: 519.7 MiB (73.8%) fs: ext4 dev: /dev/nvme0n1p2
> > >            ID-3: swap-1 size: 980.0 MiB used: 0 KiB (0.0%) fs: swap dev: /dev/dm-2
> > > USB:       Hub: 1-0:1 info: Full speed (or root) Hub ports: 4 rev: 2.0 chip ID: 1d6b:0002
> > >            Device-1: 1-3:2 info: SteelSeries ApS SteelSeries KLC type: HID
> > >            driver: hid-generic,usbhid rev: 2.0 chip ID: 1038:1122
> > >            Device-2: 1-4:3 info: Acer HD Webcam type: Video driver: uvcvideo rev: 2.0
> > >            chip ID: 5986:211c
> > >            Hub: 2-0:1 info: Full speed (or root) Hub ports: 2 rev: 3.1 chip ID: 1d6b:0003
> > >            Hub: 3-0:1 info: Full speed (or root) Hub ports: 4 rev: 2.0 chip ID: 1d6b:0002
> > >            Device-3: 3-3:2 info: Intel type: Bluetooth driver: btusb rev: 2.0 chip ID: 8087:0029
> > >            Hub: 4-0:1 info: Full speed (or root) Hub ports: 2 rev: 3.1 chip ID: 1d6b:0003
> > > Sensors:   System Temperatures: cpu: 46.5 C mobo: N/A
> > >            Fan Speeds (RPM): N/A
> > >            GPU: device: amdgpu temp: 0 C fan: 65535 device: amdgpu temp: 31 C
> > > Repos:     No active apt repos in: /etc/apt/sources.list
> > >            Active apt repos in: /etc/apt/sources.list.d/official-package-repositories.list
> > >            1: deb http: //mirrors.seas.harvard.edu/linuxmint-packages uma main upstream import backport
> > >            2: deb http: //mirror.us-ny2.kamatera.com/ubuntu focal main restricted universe multiverse
> > >            3: deb http: //mirror.us-ny2.kamatera.com/ubuntu focal-updates main restricted universe multiverse
> > >            4: deb http: //mirror.us-ny2.kamatera.com/ubuntu focal-backports main restricted universe multiverse
> > >            5: deb http: //security.ubuntu.com/ubuntu/ focal-security main restricted universe multiverse
> > >            6: deb http: //archive.canonical.com/ubuntu/ focal partner
> > > Info:      Processes: 372 Uptime: 2h 44m Memory: 15.10 GiB used: 1.15 GiB (7.6%) Init: systemd
> > >            v: 245 runlevel: 5 Compilers: gcc: 9.3.0 alt: 9 Client: Unknown python3.8 client
> > >            inxi: 3.0.38
> > > [/code]
> > > 
> > > If I am using it interactively, I get random crashes that seem
> > > to hit elements of mate (mate-panel, etc.) consistently - just not
> > > predictably. LibreOffice applications, xed, Firefox, and Evolution
> > > seem to be more prone to crashing the X session. I can easily move
> > > to tty1, login, and kill services running in tty7 as the
> > > crashes don't appear to completely kill tty7. Sometimes, I can
> > > kill mate and launch a new instance to salvage the tty7 session.
> > > However, I usually end up having to kill the root pid of the
> > > xwindows session in order to re-login. But I think this is related
> > > to the AMD GPU driver because every time I simply move the mouse
> > > in tty7 session, I get the following in dmesg:
> > > 
> > > [13164.399550] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
> > > [13164.399579] [drm] PSP is resuming...
> > > [13164.486593] [drm] reserve 0xa00000 from 0x800f400000 for PSP TMR
> > > [13164.678788] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
> > > [13164.702624] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
> > > [13164.702639] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
> > > [13164.702648] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000036, smu fw if version = 0x00000037, smu fw version = 0x002a3f00 (42.63.0)
> > > [13164.702664] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
> > > [13164.746143] amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
> > > [13164.768978] [drm] kiq ring mec 2 pipe 1 q 0
> > > [13164.779651] [drm] VCN decode and encode initialized successfully(under DPG Mode).
> > > [13164.779758] [drm] JPEG decode initialized successfully.
> > > [13164.779779] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
> > > [13164.779783] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
> > > [13164.779784] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
> > > [13164.779785] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
> > > [13164.779786] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
> > > [13164.779787] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
> > > [13164.779788] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
> > > [13164.779789] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
> > > [13164.779790] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
> > > [13164.779792] amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
> > > [13164.779793] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
> > > [13164.779803] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
> > > [13164.779804] amdgpu 0000:03:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
> > > [13164.779805] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
> > > [13164.779806] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
> > > [13164.779807] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
> > > [13164.783807] amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
> > > [13170.722306] [drm] free PSP TMR buffer
> > > 
> > > If I boot with nomodeset, I can operate fine - just without
> > > screen brightness control, etc. It just seems strange that an event
> > > is generated like this all the time.
> > > 
> > > I only get sporadic crashes, though. Humorously, I've been
> > > running only Firefox, crash reporter and Mate Terminal this morning
> > > and it's run fine for over 4 hours. There were times when I
> > > wouldn't run anything at all and it'd lock up on me. So I just
> > > can't find any common denominator for this (using vi in terminal
> > > to type this - going to copy-paste into email client [Evolution]
> > > once I'm done this).
> > > 
> > > I've attached 3 crash reports that were captured on the system
> > > over the last couple days. I apologize in advance - profusely! - if
> > > the problem turns out to be somewhere else.
> > > 
> > > Thanks,
> > > Tim
> > > _______________________________________________
> > > amd-gfx mailing list
> > > amd-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> 
> 
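
For the amdgpu.runpm=0 experiment suggested in the quoted reply, it may help to confirm what the kernel is actually doing before and after the change. The following is a minimal Python sketch, not part of any driver tooling. It assumes the standard PCI runtime-PM sysfs attribute (power/runtime_status) and the 0000:03:00.0 address that the dmesg lines report for the Navi10 GPU. The function names and the sysfs_root parameter are illustrative only.

```python
from pathlib import Path


def runtime_pm_status(pci_addr, sysfs_root="/sys"):
    """Return the runtime PM state of a PCI device, or None if unreadable.

    Reads <sysfs_root>/bus/pci/devices/<pci_addr>/power/runtime_status,
    which typically holds a word such as "active" or "suspended".
    """
    status_file = (Path(sysfs_root) / "bus" / "pci" / "devices"
                   / pci_addr / "power" / "runtime_status")
    try:
        return status_file.read_text().strip()
    except OSError:
        return None


def runpm_param(cmdline):
    """Return the amdgpu.runpm value from a kernel command line string,
    or None if the parameter was not given."""
    for token in cmdline.split():
        if token.startswith("amdgpu.runpm="):
            return token.split("=", 1)[1]
    return None


def read_cmdline(path="/proc/cmdline"):
    """Return the running kernel's command line, or "" if it cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    # 0000:03:00.0 is the Navi10 address taken from the dmesg output;
    # adjust it for other machines.
    print("amdgpu.runpm on command line:", runpm_param(read_cmdline()))
    print("Navi10 runtime status:", runtime_pm_status("0000:03:00.0"))
```

With runtime PM enabled, the status would be expected to flip between suspended and active as the GPU sleeps and wakes; with amdgpu.runpm=0 it should stay active.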

[-- Attachment #2: Type: text/html, Size: 20819 bytes --]


* Re: Linux Mint 20.04 5.11 issue
  2021-07-30 12:08     ` Tim Cahill
@ 2021-08-04 12:53       ` Tim Cahill
  0 siblings, 0 replies; 4+ messages in thread
From: Tim Cahill @ 2021-08-04 12:53 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx

[-- Attachment #1: Type: text/plain, Size: 16160 bytes --]

When launching Chromium from a terminal window, I get the following
output:

[2606:2626:0804/084411.009101:ERROR:nss_util.cc(286)] After loading Root Certs, loaded==false: NSS error code: -8018
mesa: for the --simplifycfg-sink-common option: may only occur zero or one times!
mesa: for the --global-isel-abort option: may only occur zero or one times!
mesa: for the --amdgpu-atomic-optimizations option: may only occur zero or one times!
mesa: for the --structurizecfg-skip-uniform-regions option: may only occur zero or one times!
[2636:2636:0804/084411.912737:ERROR:sandbox_linux.cc(374)] InitializeSandbox() called with multiple threads in process gpu-process.

I got the above after rebooting this morning after another Marco crash
(https://termbin.com/xy80). Any insight into whether or not this is a
software, driver, or hardware issue is appreciated.

Thanks,
Tim

On Fri, 2021-07-30 at 08:08 -0400, Tim Cahill wrote:
> Posted the following comment to the Mate-desktop issue:
> 
>   Had another hang with the same configuration as a youtube video
>   played via a USB headphone (Jabra40). I was able to recover by
>   killing Firefox, in which the video was playing. The video became
>   choppy and garbled and then stopped. The stderr is below:
> 
> ALSA lib conf.c:5187:(snd_config_expand) Unknown parameters 1
> ALSA lib control.c:1379:(snd_ctl_open_noupdate) Invalid CTL sysdefault:1
> ALSA lib conf.c:5187:(snd_config_expand) Unknown parameters 2
> ALSA lib control.c:1379:(snd_ctl_open_noupdate) Invalid CTL sysdefault:2
> ALSA lib pcm_dmix.c:1089:(snd_pcm_dmix_open) unable to open slave
> ALSA lib pcm_dmix.c:1089:(snd_pcm_dmix_open) unable to open slave
> ALSA lib pcm_dmix.c:1089:(snd_pcm_dmix_open) unable to open slave
> ALSA lib pcm_dmix.c:1089:(snd_pcm_dmix_open) unable to open slave
> ALSA lib pcm_dmix.c:1089:(snd_pcm_dmix_open) unable to open slave
> 
> On re-launch of Firefox from terminal window, the following appeared:
> 
> [GFX1-]: More than 1 GPU from same vendor detected via PCI, cannot deduce device
> 
> On Thu, 2021-07-29 at 12:04 -0400, Tim Cahill wrote:
> > I apologize if the name callout is disconcerting. I was trying to
> > follow instructions for sending bugs and saw your name listed as
> > the owner of this code area.
> > 
> > FYI, I'd done some more troubleshooting and tinkering regarding the
> > crashing and Mate seems to be at the center of all the issues. As a
> > result, I also opened an Issue with the Mate Desktop team
> > (https://github.com/mate-desktop/mate-panel/issues/1242). Mate also
> > has a power management component, which is probably responsible for
> > the excess logging and the confusion over Navi10. However, I have
> > no way to vouch for how accurately the Mate PM applet gathered data
> > for its instantiation. I have no external devices connected that
> > I'm aware would use it since I thought that was via HDMI. I *do*
> > have a Jabra Evolve2 headset that uses the TypeC USB connector, but
> > I assume that's not using the GPU.
> > 
> > The issue documentation I left with Mate notes that if I launch
> > apps from a terminal that is NOT launched from the Mate panel
> > (right-click on desktop instead to open terminal), the parent for
> > all the apps (Firefox, Evolution, etc.) is separate from Mate (at
> > least separate from mate-panel). Everything has worked fine (except
> > for the constant logging of the wake-up action) since I've done
> > that (and turned off the screensaver and screensaver lock). So, I'm
> > not sure what else to do at this point. Please advise if I should
> > do anything on the driver side.
> > 
> > Thanks,
> > Tim
> > 
> > On Thu, 2021-07-29 at 11:14 -0400, Felix Kuehling wrote:
> > > Am 2021-07-28 um 12:10 p.m. schrieb Tim Cahill:
> > > > Hi Felix,
> > > 
> > > I'm not sure why you're calling me out by name. I'm not working
> > > on anything obviously related to your crashes.
> > > 
> > > Anyway, I took a quick look at the backtraces. They all point at
> > > libgdk. Two of them are segfaults, one is an abort. It's not clear
> > > how these would be related to the GPU driver. That said, when you
> > > boot with nomodeset, the GPU driver and all HW acceleration is
> > > completely disabled. If that makes the problem disappear, the GPU
> > > driver is clearly involved in the problem in some way.
> > > 
> > > The abort points at a problem while freeing memory. This could be
> > > caused by a double-free problem in some unrelated code, possibly
> > > related to the GPU driver. This would be a problem in a user mode
> > > component (maybe Mesa), not the kernel mode driver.
> > > 
> > > I believe the messages you're seeing when you move the mouse are
> > > the result of runtime power management that puts the GPU to sleep
> > > when it's idle and reinitializes it when it's needed. You have 2
> > > GPUs in your laptop, an integrated Renoir GPU in the Ryzen CPU,
> > > and an external Navi10 GPU for higher gaming performance. The GPU
> > > that goes to sleep and wakes up is the external Navi10 GPU.
> > > 
> > > The OpenGL renderer string specifies "RENOIR". Therefore I'm
> > > surprised that the Navi10 GPU wakes up when you move the mouse.
> > > Ideally it shouldn't be used at all when you're just using the
> > > desktop.
> > > 
> > > If you suspect that runtime power management is responsible for
> > > your problems, you could disable it with amdgpu.runpm=0 on the
> > > kernel command line. That means the Navi10 GPU won't go into the
> > > low power mode and drain your battery more quickly. So this is not
> > > a permanent solution. Just an experiment to narrow down the
> > > problem.
> > > 
> > > Regards,
> > >   Felix
> > > 
> > > > I'm not sure how to do this as I haven't had to report a bug
> > > > before. I've looked to a variety of bug reporting sites to see
> > > > if anyone else is running into the same issues that I'm having
> > > > (such as the Mate project) and haven't seen anything at all
> > > > similar to the issue I'm having. Since I had issues with AMD
> > > > drivers with my distro (info below) and some consistent and high
> > > > volume dmesg content shows up, I've decided that I should start
> > > > here with the AMD kernel team.
> > > > 
> > > > I have a fairly new MSI laptop with the following
> > > > configuration:
> > > > [code]
> > > > System:    Kernel: 5.11.0-25-generic x86_64 bits: 64 compiler: N/A Desktop: MATE 1.24.0 wm: marco
> > > >            dm: LightDM Distro: Linux Mint 20.2 Uma base: Ubuntu 20.04 focal
> > > > Machine:   Type: Laptop System: Micro-Star product: Alpha 17 A4DEK v: REV:1.0 serial: <filter>
> > > >            Chassis: type: 10 serial: <filter>
> > > >            Mobo: Micro-Star model: MS-17EK v: REV:1.0 serial: <filter> UEFI: American Megatrends
> > > >            v: E17EKAMS.101 date: 10/26/2020
> > > > Battery:   ID-1: BAT1 charge: 66.2 Wh condition: 67.0/65.7 Wh (102%) volts: 12.4/10.8
> > > >            model: MSI Corp. MS-17EK serial: N/A status: Unknown
> > > > CPU:       Topology: 8-Core model: AMD Ryzen 7 4800H with Radeon Graphics bits: 64 type: MT MCP
> > > >            arch: Zen rev: 1 L2 cache: 4096 KiB
> > > >            flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 92630
> > > >            Speed: 4278 MHz min/max: 1400/2900 MHz Core speeds (MHz): 1: 4280 2: 1865 3: 1397
> > > >            4: 2188 5: 1489 6: 2265 7: 1907 8: 1906 9: 1729 10: 1397 11: 1397 12: 1397 13: 1397
> > > >            14: 1397 15: 1907 16: 1740
> > > > Graphics:  Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT /5700/5700 XT]
> > > >            vendor: Micro-Star MSI driver: amdgpu v: kernel bus ID: 03:00.0 chip ID: 1002:731f
> > > >            Device-2: AMD Renoir vendor: Micro-Star MSI driver: amdgpu v: kernel bus ID: 08:00.0
> > > >            chip ID: 1002:1636
> > > >            Display: x11 server: X.Org 1.20.9 driver: amdgpu,ati
> > > >            unloaded: fbdev,modesetting,radeon,vesa compositor: marco resolution: 1920x1080~144Hz
> > > >            OpenGL: renderer: AMD RENOIR (DRM 3.40.0 5.11.0-25-generic LLVM 11.0.0)
> > > >            v: 4.6 Mesa 20.2.6 direct render: Yes
> > > > Audio:     Device-1: AMD Navi 10 HDMI Audio vendor: Micro-Star MSI driver: snd_hda_intel v: kernel
> > > >            bus ID: 03:00.1 chip ID: 1002:ab38
> > > >            Device-2: AMD Raven/Raven2/FireFlight/Renoir Audio Processor vendor: Micro-Star MSI
> > > >            driver: N/A bus ID: 08:00.5 chip ID: 1022:15e2
> > > >            Device-3: AMD Family 17h HD Audio vendor: Micro-Star MSI driver: snd_hda_intel
> > > >            v: kernel bus ID: 08:00.6 chip ID: 1022:15e3
> > > >            Sound Server: ALSA v: k5.11.0-25-generic
> > > > Network:   Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel bus ID: 04:00.0
> > > >            chip ID: 8086:2723
> > > >            IF: wlp4s0 state: up mac: <filter>
> > > >            Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Micro-Star MSI
> > > >            driver: r8169 v: kernel port: f000 bus ID: 05:00.0 chip ID: 10ec:8168
> > > >            IF: eno1 state: down mac: <filter>
> > > > Drives:    Local Storage: total: 476.94 GiB used: 89.79 GiB (18.8%)
> > > >            ID-1: /dev/nvme0n1 vendor: Kingston model: OM8PCP3512F-AI1 size: 476.94 GiB
> > > >            speed: 31.6 Gb/s lanes: 4 serial: <filter>
> > > > Partition: ID-1: / size: 466.30 GiB used: 89.28 GiB (19.1%) fs: ext4 dev: /dev/dm-1
> > > >            ID-2: /boot size: 704.5 MiB used: 519.7 MiB (73.8%) fs: ext4 dev: /dev/nvme0n1p2
> > > >            ID-3: swap-1 size: 980.0 MiB used: 0 KiB (0.0%) fs: swap dev: /dev/dm-2
> > > > USB:       Hub: 1-0:1 info: Full speed (or root) Hub ports: 4 rev: 2.0 chip ID: 1d6b:0002
> > > >            Device-1: 1-3:2 info: SteelSeries ApS SteelSeries KLC type: HID
> > > >            driver: hid-generic,usbhid rev: 2.0 chip ID: 1038:1122
> > > >            Device-2: 1-4:3 info: Acer HD Webcam type: Video driver: uvcvideo rev: 2.0
> > > >            chip ID: 5986:211c
> > > >            Hub: 2-0:1 info: Full speed (or root) Hub ports: 2 rev: 3.1 chip ID: 1d6b:0003
> > > >            Hub: 3-0:1 info: Full speed (or root) Hub ports: 4 rev: 2.0 chip ID: 1d6b:0002
> > > >            Device-3: 3-3:2 info: Intel type: Bluetooth driver: btusb rev: 2.0 chip ID: 8087:0029
> > > >            Hub: 4-0:1 info: Full speed (or root) Hub ports: 2 rev: 3.1 chip ID: 1d6b:0003
> > > > Sensors:   System Temperatures: cpu: 46.5 C mobo: N/A
> > > >            Fan Speeds (RPM): N/A
> > > >            GPU: device: amdgpu temp: 0 C fan: 65535 device: amdgpu temp: 31 C
> > > > Repos:     No active apt repos in: /etc/apt/sources.list
> > > >            Active apt repos in: /etc/apt/sources.list.d/official-package-repositories.list
> > > >            1: deb http: //mirrors.seas.harvard.edu/linuxmint-packages uma main upstream import backport
> > > >            2: deb http: //mirror.us-ny2.kamatera.com/ubuntu focal main restricted universe multiverse
> > > >            3: deb http: //mirror.us-ny2.kamatera.com/ubuntu focal-updates main restricted universe multiverse
> > > >            4: deb http: //mirror.us-ny2.kamatera.com/ubuntu focal-backports main restricted universe multiverse
> > > >            5: deb http: //security.ubuntu.com/ubuntu/ focal-security main restricted universe multiverse
> > > >            6: deb http: //archive.canonical.com/ubuntu/ focal partner
> > > > Info:      Processes: 372 Uptime: 2h 44m Memory: 15.10 GiB used: 1.15 GiB (7.6%) Init: systemd
> > > >            v: 245 runlevel: 5 Compilers: gcc: 9.3.0 alt: 9 Client: Unknown python3.8 client
> > > >            inxi: 3.0.38
> > > > [/code]
> > > > 
> > > > If I am using it interactively, I get random crashes that seem
> > > > to hit elements of mate (mate-panel, etc.) consistently - just
> > > > not predictably. LibreOffice applications, xed, Firefox, and
> > > > Evolution seem to be more prone to crashing the X session. I can
> > > > easily move to tty1, login, and kill services running in tty7 as
> > > > the crashes don't appear to completely kill tty7. Sometimes, I
> > > > can kill mate and launch a new instance to salvage the tty7
> > > > session. However, I usually end up having to kill the root pid
> > > > of the xwindows session in order to re-login. But I think this
> > > > is related to the AMD GPU driver because every time I simply
> > > > move the mouse in tty7 session, I get the following in dmesg:
> > > > 
> > > > [13164.399550] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
> > > > [13164.399579] [drm] PSP is resuming...
> > > > [13164.486593] [drm] reserve 0xa00000 from 0x800f400000 for PSP TMR
> > > > [13164.678788] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
> > > > [13164.702624] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
> > > > [13164.702639] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
> > > > [13164.702648] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000036, smu fw if version = 0x00000037, smu fw version = 0x002a3f00 (42.63.0)
> > > > [13164.702664] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
> > > > [13164.746143] amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
> > > > [13164.768978] [drm] kiq ring mec 2 pipe 1 q 0
> > > > [13164.779651] [drm] VCN decode and encode initialized successfully(under DPG Mode).
> > > > [13164.779758] [drm] JPEG decode initialized successfully.
> > > > [13164.779779] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
> > > > [13164.779783] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
> > > > [13164.779784] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
> > > > [13164.779785] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
> > > > [13164.779786] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
> > > > [13164.779787] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
> > > > [13164.779788] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
> > > > [13164.779789] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
> > > > [13164.779790] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
> > > > [13164.779792] amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
> > > > [13164.779793] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
> > > > [13164.779803] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
> > > > [13164.779804] amdgpu 0000:03:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
> > > > [13164.779805] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
> > > > [13164.779806] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
> > > > [13164.779807] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
> > > > [13164.783807] amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
> > > > [13170.722306] [drm] free PSP TMR buffer
> > > > 
> > > > If I boot with nomodeset, I can operate fine - just without
> > > > screen brightness control, etc. It just seems strange that an
> > > > event is generated like this all the time.
> > > > 
> > > > I only get sporadic crashes, though. Humorously, I've been
> > > > running only Firefox, crash reporter and Mate Terminal this
> > > > morning and it's run fine for over 4 hours. There were times
> > > > when I wouldn't run anything at all and it'd lock up on me. So I
> > > > just can't find any common denominator for this (using vi in
> > > > terminal to type this - going to copy-paste into email
> > > > client [Evolution] once I'm done this).
> > > > 
> > > > I've attached 3 crash reports that were captured on the system
> > > > over the last couple days. I apologize in advance - profusely! -
> > > > if the problem turns out to be somewhere else.
> > > > 
> > > > Thanks,
> > > > Tim
> > > > _______________________________________________
> > > > amd-gfx mailing list
> > > > amd-gfx@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
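
To put a number on how often the Navi10 GPU is being woken, the "PSP is resuming" lines in the dmesg output quoted above can be counted. The following is a rough Python sketch, assuming raw dmesg text with the default "[seconds.micros]" timestamps. The function names are made up for illustration, and the choice of "PSP is resuming" as the marker of a wake-up is an assumption based on the excerpt, not on the driver source.

```python
import re

# Matches a dmesg timestamp such as "[13164.399579]" followed by the
# "[drm] PSP is resuming" message seen in the report.
RESUME_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s*\[drm\] PSP is resuming")


def resume_times(dmesg_text):
    """Return the timestamps (seconds since boot) of each PSP resume."""
    return [float(m.group(1)) for m in RESUME_RE.finditer(dmesg_text)]


def resumes_per_minute(times):
    """Average resume rate over the observed span.

    Returns None when fewer than two events were seen, since no
    interval can be measured in that case.
    """
    if len(times) < 2:
        return None
    span = times[-1] - times[0]
    if span <= 0:
        return None
    return (len(times) - 1) * 60.0 / span
```

Feeding the output of dmesg into resume_times() and comparing the rate with and without amdgpu.runpm=0, or with and without the Mate panel running, could show whether the wake-ups track mouse movement or something else.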

[-- Attachment #2: Type: text/html, Size: 22135 bytes --]


end of thread, other threads:[~2021-08-05  7:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <3e50d54ee11131494a8dcd75cdff5f366dd90930.camel@razorwired.com>
2021-07-29 15:14 ` Linux Mint 20.04 5.11 issue Felix Kuehling
2021-07-29 16:04   ` Tim Cahill
2021-07-30 12:08     ` Tim Cahill
2021-08-04 12:53       ` Tim Cahill
