From: "Michael S. Tsirkin" <mst@redhat.com>
To: Andres Freund <andres@anarazel.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Jens Axboe <axboe@kernel.dk>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Guenter Roeck <linux@roeck-us.net>,
	linux-kernel@vger.kernel.org,
	Greg KH <gregkh@linuxfoundation.org>
Subject: Re: upstream kernel crashes
Date: Mon, 15 Aug 2022 03:51:34 -0400
Message-ID: <20220815034532-mutt-send-email-mst@kernel.org>
In-Reply-To: <20220815071143.n2t5xsmifnigttq2@awork3.anarazel.de>

On Mon, Aug 15, 2022 at 12:11:43AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2022-08-14 20:18:44 -0700, Linus Torvalds wrote:
> > On Sun, Aug 14, 2022 at 6:36 PM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > Some of the symptoms could be related to the issue in this thread, hence
> > > listing them here
> > 
> > Smells like slab corruption to me, and the problems may end up being
> > then largely random just depending on who ends up using the allocation
> > that gets trampled on.
> > 
> > I wouldn't be surprised if it's all the same thing - including your
> > network issue.
> 
> Yea. As I just wrote in
> https://postgr.es/m/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de I
> bisected it down to one commit (762faee5a267). With that commit I only see the
> networking issue across a few reboots, but with ebcce4926365 some boots oops
> badly and other times it's "just" network not working.
> 
> 
> [    2.447668] general protection fault, probably for non-canonical address 0xffff000000000800: 0000 [#1] PREEMPT SMP PTI
> [    2.449168] CPU: 1 PID: 109 Comm: systemd-udevd Not tainted 5.19.0-bisect8-00051-gebcce4926365 #8
> [    2.450397] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
> [    2.451670] RIP: 0010:kmem_cache_alloc_node+0x2b4/0x430
> [    2.452399] Code: 01 00 0f 84 e7 fe ff ff 48 8b 50 48 48 8d 7a ff 83 e2 01 48 0f 45 c7 49 89 c7 e9 d0 fe ff ff 8b 45 28 48 8b 7d 00 48 8d 4a 40 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 cd fd ff
> [    2.455454] RSP: 0018:ffffa2b40040bd60 EFLAGS: 00010246
> [    2.456181] RAX: 0000000000000800 RBX: 0000000000000cc0 RCX: 0000000000001741
> [    2.457195] RDX: 0000000000001701 RSI: 0000000000000cc0 RDI: 000000000002f820
> [    2.458211] RBP: ffff8da7800ed500 R08: 0000000000000000 R09: 0000000000000011
> [    2.459183] R10: 00007fd02b8b8b90 R11: 0000000000000000 R12: ffff000000000000
> [    2.460268] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffffffff934bde4b
> [    2.461368] FS:  00007fd02b8b88c0(0000) GS:ffff8da8b7d00000(0000) knlGS:0000000000000000
> [    2.462605] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.463436] CR2: 000055a42d2ee250 CR3: 0000000100328001 CR4: 00000000003706e0
> [    2.464527] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    2.465520] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    2.466509] Call Trace:
> [    2.466882]  <TASK>
> [    2.467218]  copy_process+0x1eb/0x1a00
> [    2.467827]  ? _raw_spin_unlock_irqrestore+0x16/0x30
> [    2.468578]  kernel_clone+0xba/0x400
> [    2.470455]  __do_sys_clone+0x78/0xa0
> [    2.471006]  do_syscall_64+0x37/0x90
> [    2.471526]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [    2.472267] RIP: 0033:0x7fd02bf98cb3
> [    2.472889] Code: 1f 84 00 00 00 00 00 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 41 89 c0 85 c0 75 2a 64 48 8b 04 25 10 00
> [    2.475504] RSP: 002b:00007ffc6a3abf08 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
> [    2.476565] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fd02bf98cb3
> [    2.477554] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
> [    2.478574] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [    2.479608] R10: 00007fd02b8b8b90 R11: 0000000000000246 R12: 0000000000000001
> [    2.480675] R13: 00007ffc6a3ac0c0 R14: 0000000000000000 R15: 0000000000000001
> [    2.481686]  </TASK>
> [    2.482119] Modules linked in:
> [    2.482704] ---[ end trace 0000000000000000 ]---
> [    2.483456] RIP: 0010:kmem_cache_alloc_node+0x2b4/0x430
> [    2.484282] Code: 01 00 0f 84 e7 fe ff ff 48 8b 50 48 48 8d 7a ff 83 e2 01 48 0f 45 c7 49 89 c7 e9 d0 fe ff ff 8b 45 28 48 8b 7d 00 48 8d 4a 40 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 cd fd ff
> [    2.487024] RSP: 0018:ffffa2b40040bd60 EFLAGS: 00010246
> [    2.487817] RAX: 0000000000000800 RBX: 0000000000000cc0 RCX: 0000000000001741
> [    2.488805] RDX: 0000000000001701 RSI: 0000000000000cc0 RDI: 000000000002f820
> [    2.489869] RBP: ffff8da7800ed500 R08: 0000000000000000 R09: 0000000000000011
> [    2.490842] R10: 00007fd02b8b8b90 R11: 0000000000000000 R12: ffff000000000000
> [    2.491905] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffffffff934bde4b
> [    2.492975] FS:  00007fd02b8b88c0(0000) GS:ffff8da8b7d00000(0000) knlGS:0000000000000000
> [    2.494140] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.495082] CR2: 000055a42d2ee250 CR3: 0000000100328001 CR4: 00000000003706e0
> [    2.496080] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    2.497084] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    2.498524] systemd-udevd (109) used greatest stack depth: 13688 bytes left
> [    2.503905] general protection fault, probably for non-canonical address 0xffff000000000000: 0000 [#2] PREEMPT SMP PTI
> [    2.505504] CPU: 0 PID: 13 Comm: ksoftirqd/0 Tainted: G      D           5.19.0-bisect8-00051-gebcce4926365 #8
> [    2.507037] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
> [    2.508313] RIP: 0010:rcu_core+0x280/0x920
> [    2.508968] Code: 3f 00 00 48 89 c2 48 85 c0 0f 84 2b 03 00 00 49 89 dd 48 83 c3 01 0f 1f 44 00 00 48 8b 42 08 48 89 d7 48 c7 42 08 00 00 00 00 <ff> d0 0f 1f 00 65 8b 05 64 f5 ad 6c f6 c4 01 75 97 be 00 02 00 00
> [    2.511684] RSP: 0000:ffffa2b40007fe20 EFLAGS: 00010202
> [    2.512410] RAX: ffff000000000000 RBX: 0000000000000002 RCX: 0000000080170011
> [    2.513497] RDX: ffff8da783372a20 RSI: 0000000080170011 RDI: ffff8da783372a20
> [    2.514604] RBP: ffff8da8b7c2b940 R08: 0000000000000001 R09: ffffffff9353b752
> [    2.515667] R10: ffffffff94a060c0 R11: 000000000009b776 R12: ffff8da78020c000
> [    2.516650] R13: 0000000000000001 R14: ffff8da8b7c2b9b8 R15: 0000000000000000
> [    2.517628] FS:  0000000000000000(0000) GS:ffff8da8b7c00000(0000) knlGS:0000000000000000
> [    2.518840] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.519645] CR2: 0000557194db70f8 CR3: 0000000100364006 CR4: 00000000003706f0
> [    2.520641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    2.521629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    2.522592] Call Trace:
> [    2.522963]  <TASK>
> [    2.523299]  __do_softirq+0xe1/0x2ec
> [    2.523883]  ? sort_range+0x20/0x20
> [    2.524404]  run_ksoftirqd+0x25/0x30
> [    2.524944]  smpboot_thread_fn+0x180/0x220
> [    2.525519]  kthread+0xe1/0x110
> [    2.526001]  ? kthread_complete_and_exit+0x20/0x20
> [    2.526673]  ret_from_fork+0x1f/0x30
> [    2.527182]  </TASK>
> [    2.527518] Modules linked in:
> [    2.528005] ---[ end trace 0000000000000000 ]---
> [    2.528662] RIP: 0010:kmem_cache_alloc_node+0x2b4/0x430
> [    2.529524] Code: 01 00 0f 84 e7 fe ff ff 48 8b 50 48 48 8d 7a ff 83 e2 01 48 0f 45 c7 49 89 c7 e9 d0 fe ff ff 8b 45 28 48 8b 7d 00 48 8d 4a 40 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 cd fd ff
> [    2.532396] RSP: 0018:ffffa2b40040bd60 EFLAGS: 00010246
> [    2.533201] RAX: 0000000000000800 RBX: 0000000000000cc0 RCX: 0000000000001741
> [    2.534376] RDX: 0000000000001701 RSI: 0000000000000cc0 RDI: 000000000002f820
> [    2.535398] RBP: ffff8da7800ed500 R08: 0000000000000000 R09: 0000000000000011
> Begin: Loading essential drivers ... done.
> [    2.536401] R10: 00007fd02b8b8b90 R11: 0000000000000000 R12: ffff000000000000
> [    2.537641] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffffffff934bde4b
> [    2.538737] FS:  0000000000000000(0000) GS:ffff8da8b7c00000(0000) knlGS:0000000000000000
> [    2.540028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.540843] CR2: 0000557194db70f8 CR3: 000000015080c002 CR4: 00000000003706f0
> [    2.541953] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    2.542924] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    2.543902] Kernel panic - not syncing: Fatal exception in interrupt
> [    2.544967] Kernel Offset: 0x12400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [    2.546637] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> 
> 
> If somebody knowledgeable staring at 762faee5a267 doesn't surface something, I
> can create a kernel with some more debugging stuff enabled, if somebody tells
> me what'd work best here.
> 
> 
> Greetings,
> 
> Andres Freund
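Both oopses quoted above report a general protection fault on a "non-canonical address". A minimal Python sketch, assuming x86-64 with 4-level paging (48-bit virtual addresses; 5-level paging would use 57 bits), shows why: an address is canonical only if bits 63 through 47 are all copies of bit 47. In the first oops the faulting bytes `49 8b 1c 04` decode to `mov rbx, [r12+rax]`, and R12 + RAX = 0xffff000000000000 + 0x800 is exactly the reported 0xffff000000000800. The second oops faults on `ff d0` (`call rax`) with RAX = 0xffff000000000000, the same bogus value, which is consistent with the corrupted-pointer theory. The helper names `is_canonical` and `effective_address` are illustrative, not kernel APIs.

```python
# Assumption: x86-64 with 4-level paging, i.e. 48-bit virtual addresses.
# (With 5-level paging this would be 57.)
VA_BITS = 48
MASK64 = (1 << 64) - 1


def is_canonical(addr: int, va_bits: int = VA_BITS) -> bool:
    """True if bits 63..(va_bits-1) are all copies of bit (va_bits-1)."""
    upper = (addr & MASK64) >> (va_bits - 1)  # the top 64 - va_bits + 1 bits
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones


def effective_address(base: int, index: int, scale: int = 1, disp: int = 0) -> int:
    """base + index*scale + disp, wrapped to 64 bits as the CPU does."""
    return (base + index * scale + disp) & MASK64


# Register values copied from the first oops (kmem_cache_alloc_node+0x2b4):
# the faulting instruction `49 8b 1c 04` is `mov rbx, [r12 + rax]`.
r12 = 0xFFFF000000000000
rax = 0x0000000000000800
fault_addr = effective_address(r12, rax)

print(hex(fault_addr), is_canonical(fault_addr))  # 0xffff000000000800 False
# RBP from the same oops, an ordinary kernel pointer, for comparison:
print(hex(0xFFFF8DA7800ED500), is_canonical(0xFFFF8DA7800ED500))  # 0xffff8da7800ed500 True
```

The check treats the address as a sign-extended 48-bit value; kernel pointers (upper bits all ones) and user pointers (upper bits all zeros) both pass, while 0xffff000000000000 has bit 47 clear under a run of set bits and fails.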


It is possible that GCP gets confused if the ring size is smaller than the
device maximum, simply because no one has done that in the past.
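The ring-size situation can be sketched as follows. Per the virtio specification, a device advertises a maximum queue size, the driver may write back a smaller value to reduce memory use, and a split virtqueue's size must be a power of two no larger than 32768. The helper `choose_split_ring_size` below is hypothetical: it is not the kernel's code and not what 762faee5a267 does. It only illustrates the rule a driver follows when shrinking a ring, and the resulting "size below the device maximum" case is the path that may be untested on GCP. The numbers in the usage lines are illustrative, not what GCP actually advertises.

```python
# Upper bound on a split virtqueue's size, per the virtio specification.
MAX_SPLIT_QUEUE_SIZE = 32768


def choose_split_ring_size(device_max: int, requested: int) -> int:
    """Hypothetical helper: pick the queue size a driver would write back.

    The result never exceeds what the device advertised, and it is rounded
    down to a power of two because split virtqueues require one.
    """
    if device_max <= 0:
        raise ValueError("device reports the queue as unavailable")
    if requested <= 0:
        raise ValueError("requested size must be positive")
    size = min(device_max, requested, MAX_SPLIT_QUEUE_SIZE)
    return 1 << (size.bit_length() - 1)  # largest power of two <= size


# Illustrative numbers only.
device_max = 1024
size = choose_split_ring_size(device_max, 256)
print(size, size < device_max)  # 256 True: the "smaller than maximum" case
```

Whenever the chosen size comes out below `device_max`, the driver is relying on the device honoring a shrunken ring rather than its advertised maximum.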

So I pushed just the revert of 762faee5a267 to the test branch.
Could you give it a spin?

-- 
MST



Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220815034532-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=andres@anarazel.de \
    --cc=axboe@kernel.dk \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=martin.petersen@oracle.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.