dri-devel.lists.freedesktop.org archive mirror
From: Dave Airlie <airlied@gmail.com>
To: Alex Deucher <alexdeucher@gmail.com>
Cc: "Daniel Vetter" <daniel.vetter@ffwll.ch>,
	LKML <linux-kernel@vger.kernel.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: [git pull] drm for 6.1-rc1
Date: Fri, 7 Oct 2022 07:52:35 +1000
Message-ID: <CAPM=9tx8tjzz5q4gkLbh=R+xO5x-8QQOB9E=GAXrV6=-r844-A@mail.gmail.com>
In-Reply-To: <CAPM=9tyL=J26aHdhSSK0jwYQLHBf8jjTMvJmj1cQheUF=wpd-Q@mail.gmail.com>

On Fri, 7 Oct 2022 at 07:41, Dave Airlie <airlied@gmail.com> wrote:
>
> On Fri, 7 Oct 2022 at 06:24, Dave Airlie <airlied@gmail.com> wrote:
> >
> > On Fri, 7 Oct 2022 at 06:14, Alex Deucher <alexdeucher@gmail.com> wrote:
> > >
> > > On Thu, Oct 6, 2022 at 3:48 PM Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > >
> > > > On Thu, Oct 6, 2022 at 12:28 PM Alex Deucher <alexdeucher@gmail.com> wrote:
> > > > >
> > > > > Maybe you are seeing this which is an issue with GPU TLB flushes which
> > > > > is kind of sporadic:
> > > > > https://gitlab.freedesktop.org/drm/amd/-/issues/2113
> > > >
> > > > > Well, that seems to be 5.19, and while timing changes (or whatever
> > > > > other software updates) could have made it start to trigger, this
> > > > > machine has been pretty solid otherwise.
> > > >
> > > > > Are you seeing any GPU page faults in your kernel log?
> > > >
> > > > Nothing even remotely like that "no-retry page fault" in that issue
> > > > report. Of course, if it happens just before the whole thing locks
> > > > up...
> > >
> > > Your chip is too old to support retry faults so it's likely you could
> > > be just seeing a GPU page fault followed by a hang.  Your chip also
> > > lacks a paging queue, so you would be affected by the TLB issue.
> >
> >
> > > Okay, I got my FIJI, running Linus' tree with netconsole, to blow up
> > > like this, running Fedora 36 desktop, Steam, Firefox, and then I ran
> > > poweroff over ssh.
> >
> > [ 1234.778760] BUG: kernel NULL pointer dereference, address: 0000000000000088
> > [ 1234.778782] #PF: supervisor read access in kernel mode
> > [ 1234.778787] #PF: error_code(0x0000) - not-present page
> > [ 1234.778791] PGD 0 P4D 0
> > [ 1234.778798] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [ 1234.778803] CPU: 7 PID: 805 Comm: systemd-journal Not tainted 6.0.0+ #2
> > [ 1234.778809] Hardware name: System manufacturer System Product
> > Name/PRIME X370-PRO, BIOS 5603 07/28/2020
> > [ 1234.778813] RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched]
> > [ 1234.778828] Code: aa 0f 1d ce e9 57 ff ff ff 48 89 d7 e8 9d 8f 3f
> > ce e9 4a ff ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53
> > 48 89 fb <48> 8b af 88 00 00 00 f0 ff 8d f0 00 00 00 48 8b 85 80 01 00
> > 00 f0
> > [ 1234.778834] RSP: 0000:ffffabe680380de0 EFLAGS: 00010087
> > [ 1234.778839] RAX: ffffffffc04e9230 RBX: 0000000000000000 RCX: 0000000000000018
> > [ 1234.778897] RDX: 00000ba278e8977a RSI: ffff953fb288b460 RDI: 0000000000000000
> > [ 1234.778901] RBP: ffff953fb288b598 R08: 00000000000000e0 R09: ffff953fbd98b808
> > [ 1234.778905] R10: 0000000000000000 R11: ffffabe680380ff8 R12: ffffabe680380e00
> > [ 1234.778908] R13: 0000000000000001 R14: 00000000ffffffff R15: ffff953fbd9ec458
> > [ 1234.778912] FS:  00007f35e7008580(0000) GS:ffff95428ebc0000(0000)
> > knlGS:0000000000000000
> > [ 1234.778916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1234.778919] CR2: 0000000000000088 CR3: 000000010147c000 CR4: 00000000003506e0
> > [ 1234.778924] Call Trace:
> > [ 1234.778981]  <IRQ>
> > [ 1234.778989]  dma_fence_signal_timestamp_locked+0x6a/0xe0
> > [ 1234.778999]  dma_fence_signal+0x2c/0x50
> > [ 1234.779005]  amdgpu_fence_process+0xc8/0x140 [amdgpu]
> > [ 1234.779234]  sdma_v3_0_process_trap_irq+0x70/0x80 [amdgpu]
> > [ 1234.779395]  amdgpu_irq_dispatch+0xa9/0x1d0 [amdgpu]
> > [ 1234.779609]  amdgpu_ih_process+0x80/0x100 [amdgpu]
> > [ 1234.779783]  amdgpu_irq_handler+0x1f/0x60 [amdgpu]
> > [ 1234.779940]  __handle_irq_event_percpu+0x46/0x190
> > [ 1234.779946]  handle_irq_event+0x34/0x70
> > [ 1234.779949]  handle_edge_irq+0x9f/0x240
> > [ 1234.779954]  __common_interrupt+0x66/0x100
> > [ 1234.779960]  common_interrupt+0xa0/0xc0
> > [ 1234.779965]  </IRQ>
> > [ 1234.779968]  <TASK>
> > [ 1234.779971]  asm_common_interrupt+0x22/0x40
> > [ 1234.779976] RIP: 0010:finish_mkwrite_fault+0x22/0x110
> > [ 1234.779981] Code: 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 55 41
> > 54 55 48 89 fd 53 48 8b 07 f6 40 50 08 0f 84 eb 00 00 00 48 8b 45 30
> > 48 8b 18 <48> 89 df e8 66 bd ff ff 48 85 c0 74 0d 48 89 c2 83 e2 01 48
> > 83 ea
> > [ 1234.779985] RSP: 0000:ffffabe680bcfd78 EFLAGS: 00000202
> >
> > > I'll see if I can dig into it any further.
>
> I'm kicking the tires on the drm-next tree just prior to submission,
> and in an attempt to make myself look foolish and to tempt fate, it
> seems stable.

Yay, it worked, crashed drm-next. Will start reverting down the rabbit hole.

Dave.
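
For readers following the oops quoted above, here is a minimal sketch (not part of the original message) of how the fault address can be cross-checked against the register dump. It assumes the bracketed byte in the "Code:" line marks the start of the faulting instruction, and it decodes only the single x86-64 encoding that appears there (REX.W 8B /r with an RDI base and a 32-bit displacement). The helper names are hypothetical, and which structure field sits at that offset is not stated in the thread.

```python
import re
import struct

# Lines copied from the oops above, with mail wrapping and timestamps removed.
OOPS = """\
BUG: kernel NULL pointer dereference, address: 0000000000000088
RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched]
Code: aa 0f 1d ce e9 57 ff ff ff 48 89 d7 e8 9d 8f 3f ce e9 4a ff ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53 48 89 fb <48> 8b af 88 00 00 00 f0 ff 8d f0 00 00 00 48 8b 85 80 01 00 00 f0
RAX: ffffffffc04e9230 RBX: 0000000000000000 RCX: 0000000000000018
RDX: 00000ba278e8977a RSI: ffff953fb288b460 RDI: 0000000000000000
CR2: 0000000000000088 CR3: 000000010147c000 CR4: 00000000003506e0
"""


def registers(text):
    """Collect 'NAME: <16 hex digits>' pairs such as RDI or CR2."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def faulting_bytes(text):
    """Return the code bytes starting at the byte marked with <..>."""
    code = re.search(r"^Code: (.*)$", text, re.M).group(1)
    tail = code[code.index("<"):].replace("<", "").replace(">", "")
    return bytes.fromhex(tail)


def rdi_displacement(insn):
    """Decode 'mov disp32(%rdi), %reg' and return disp32.

    Handles only REX.W (0x48) + opcode 0x8B + ModRM with mod=10, rm=111.
    Anything else raises ValueError; this is not a general disassembler.
    """
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W mov r64, r/m64")
    if (modrm >> 6) != 0b10 or (modrm & 0b111) != 0b111:
        raise ValueError("not an RDI-based disp32 operand")
    (disp,) = struct.unpack_from("<i", insn, 3)
    return disp


regs = registers(OOPS)
disp = rdi_displacement(faulting_bytes(OOPS))
print(f"RDI={regs['RDI']:#x} disp={disp:#x} CR2={regs['CR2']:#x}")
print("fault address matches RDI + disp:", regs["RDI"] + disp == regs["CR2"])
```

Under those assumptions, the first argument register (RDI) is zero and the load is at offset 0x88 from it, which is consistent with the reported address of 0x88: drm_sched_job_done appears to have been entered with a NULL pointer.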
