From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 49D01C433EF
	for <linux-mm@archiver.kernel.org>; Thu, 14 Apr 2022 10:32:39 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id A517A6B0071; Thu, 14 Apr 2022 06:32:38 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id A02776B0073; Thu, 14 Apr 2022 06:32:38 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 8C8BA6B0074; Thu, 14 Apr 2022 06:32:38 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24])
	by kanga.kvack.org (Postfix) with ESMTP id 7B2136B0071
	for <linux-mm@kvack.org>; Thu, 14 Apr 2022 06:32:38 -0400 (EDT)
Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay10.hostedemail.com (Postfix) with ESMTP id 13B542B18
	for <linux-mm@kvack.org>; Thu, 14 Apr 2022 10:32:38 +0000 (UTC)
X-FDA: 79355120796.01.28978C6
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
	by imf14.hostedemail.com (Postfix) with ESMTP id 83846100003
	for <linux-mm@kvack.org>; Thu, 14 Apr 2022 10:32:37 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1649932356;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=QR9KlRpIgLy+NHuJHbBMyshO/OHacsD7KQS8/lCfVus=;
	b=HrjoWAyYRd0P+Xpq5sLAQtxUVdMtFpLGMMnAozzdor0nhwq9DG+tllF0mmQUP2q65v7TH5
	1npnjrwKYdpyg9cqOS+BmIQy4Ze+mJGpIqtCSNX2WlVrcg93BBexB5zTyA02LSzGpNyD4a
	yGct81lonRwQeCQjfW+hrfmgcCfRgkk=
Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com
 [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-651-pcadmk-LNzywEdzeYVG2ug-1; Thu, 14 Apr 2022 06:32:33 -0400
X-MC-Unique: pcadmk-LNzywEdzeYVG2ug-1
Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 14BA21C06912;
	Thu, 14 Apr 2022 10:32:33 +0000 (UTC)
Received: from localhost (ovpn-13-186.pek2.redhat.com [10.72.13.186])
	by smtp.corp.redhat.com (Postfix) with ESMTPS id D125042D3D3;
	Thu, 14 Apr 2022 10:32:31 +0000 (UTC)
Date: Thu, 14 Apr 2022 18:32:28 +0800
From: Baoquan He <bhe@redhat.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: Chris Down <chris@chrisdown.name>, linux-mm@kvack.org,
	kexec@lists.infradead.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>, Christoph Hellwig <hch@lst.de>,
	x86@kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v2] mm/vmalloc: fix spinning drain_vmap_work after
 reading from /proc/vmcore
Message-ID: <Ylf4POlvtmYwkGrI@MiWiFi-R3L-srv>
References: <52f819991051f9b865e9ce25605509bfdbacadcd.1649277321.git.osandov@fb.com>
 <Yk72+9UFwsaFXoZe@chrisdown.name>
 <Yk+l1/xLzhB02aIU@MiWiFi-R3L-srv>
 <Ylb5PLNDFIif6vZ2@relinquished.localdomain>
MIME-Version: 1.0
In-Reply-To: <Ylb5PLNDFIif6vZ2@relinquished.localdomain>
X-Scanned-By: MIMEDefang 2.85 on 10.11.54.9
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Authentication-Results: imf14.hostedemail.com;
	dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=HrjoWAyY;
	spf=none (imf14.hostedemail.com: domain of bhe@redhat.com has no SPF policy when checking 170.10.129.124) smtp.mailfrom=bhe@redhat.com;
	dmarc=pass (policy=none) header.from=redhat.com
X-Stat-Signature: uxcxatcfxgienrijb3gt6ckkb3iwisc9
X-Rspam-User: 
X-Rspamd-Server: rspam12
X-Rspamd-Queue-Id: 83846100003
X-HE-Tag: 1649932357-168162
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On 04/13/22 at 09:24am, Omar Sandoval wrote:
> On Fri, Apr 08, 2022 at 11:02:47AM +0800, Baoquan He wrote:
......
> > Since the Red Hat mail server has an issue, the body of the patch is
> > empty in my mail client, so I am replying here to add my comment.
> > 
> > As I replied to Omar on v1, I think this is a great fix. It would also
> > be great to state whether this is a real issue that is breaking things,
> > in which case add a 'Fixes' tag and Cc stable, like
> > "Cc: <stable@vger.kernel.org> # 5.17", or an improvement found by code
> > inspection.
> > 
> > I am concerned about this because in distros, e.g. in our RHEL 8, we
> > maintain old kernels and backport necessary patches into them; patches
> > with a 'Fixes' tag are definitely good candidates. This is important
> > for LTS kernels too.
> > 
> > Thanks
> > Baoquan
> 
> Hi, Baoquan,
> 
> Sorry I missed your replies. I'll answer your questions from your first
> email.
> 
> > I am wondering if this is a real issue you hit, or whether you just
> > found it by code inspection
> 
> I hit this issue with the test suite for drgn
> (https://github.com/osandov/drgn). We run the test cases in a virtual
> machine on various kernel versions
> (https://github.com/osandov/drgn/tree/main/vmtest). Part of the test
> suite crashes the kernel to run some tests against /proc/vmcore
> (https://github.com/osandov/drgn/blob/13144eda119790cdbc11f360c15a04efdf81ae9a/setup.py#L213,
> https://github.com/osandov/drgn/blob/main/vmtest/enter_kdump.py,
> https://github.com/osandov/drgn/tree/main/tests/linux_kernel/vmcore).
> When I tried v5.18-rc1 configured with !SMP and !PREEMPT, that part of
> the test suite got stuck, which is how I found this issue.
> 
> > I am wondering how your vmcore dumping is handled. Asking this because
> > we usually use makedumpfile utility
> 
> In production at Facebook, we don't run drgn directly against
> /proc/vmcore. We use makedumpfile and inspect the captured file with
> drgn once we reboot.
> 
> > When using makedumpfile, we mmap 4M at a time by default and then
> > process the content, so copy_oldmem_page() may only be called while
> > reading the elfcorehdr and notes.
> 
> We also use vmcore-dmesg
> (https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/vmcore-dmesg)
> on /proc/vmcore before calling makedumpfile. From what I can tell, that
> uses read()/pread()
> (https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/util_lib/elf_info.c),
> so it would also hit this issue.

Thanks for these details and for the great patch. The situation and the
motivation are clear to me now.

We also use vmcore-dmesg to collect the dmesg log before running
makedumpfile. Hitting this may be a low-probability event, but it is
worth adding a Fixes tag just in case.
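
To make the difference between the two access paths concrete, here is a
rough user-space sketch. It is not the actual code of makedumpfile or
vmcore-dmesg. It reads the same byte range once with pread() in
page-sized chunks and once through 4M mmap windows. The helper names and
the ordinary temporary file standing in for /proc/vmcore are assumptions
made for illustration only.

```python
import mmap
import os
import tempfile

WINDOW = 4 * 1024 * 1024  # 4M window, the mmap default mentioned above


def make_sample_file(data):
    """Hypothetical helper: write data to a temp file standing in for /proc/vmcore."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
        return f.name


def read_with_pread(path, offset, length, chunk=4096):
    """Copy a byte range with pread(), one chunk per system call."""
    out = bytearray()
    fd = os.open(path, os.O_RDONLY)
    try:
        while length > 0:
            buf = os.pread(fd, min(chunk, length), offset)
            if not buf:  # reached end of file
                break
            out += buf
            offset += len(buf)
            length -= len(buf)
    finally:
        os.close(fd)
    return bytes(out)


def read_with_mmap(path, offset, length, window=WINDOW):
    """Copy the same byte range by mapping one window-aligned region at a time."""
    out = bytearray()
    size = os.path.getsize(path)
    end = min(offset + length, size)
    with open(path, "rb") as f:
        while offset < end:
            base = offset - (offset % window)   # window-aligned map offset
            map_len = min(window, size - base)  # last window may be short
            with mmap.mmap(f.fileno(), map_len,
                           access=mmap.ACCESS_READ, offset=base) as m:
                stop = min(end - base, map_len)
                out += m[offset - base:stop]
                offset = base + stop
    return bytes(out)
```

Both helpers return the same bytes; what differs is how the data is
reached. As discussed above, only the read()/pread() style of access on
/proc/vmcore is expected to hit this issue for the bulk of the dump.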

> 
> I'll send a v3 adding Fixes: 690467c81b1a ("mm/vmalloc: Move draining
> areas out of caller context"). I don't think a stable tag is necessary
> since this was introduced in v5.18-rc1 and hasn't been backported as far
> as I can tell.
> 
> Thanks,
> Omar
> 
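
For reference, the tag format quoted above can be produced
mechanically. This is a minimal sketch: format_fixes_tag is a
hypothetical helper, not part of any kernel tooling, and it assumes the
usual convention of abbreviating the commit ID to its first 12
characters.

```python
import re


def format_fixes_tag(commit_id, subject, abbrev=12):
    """Build a 'Fixes:' trailer from a commit ID and its one-line subject."""
    # Accept an abbreviated or full lowercase hex ID of at least `abbrev` chars.
    if not re.fullmatch(r"[0-9a-f]{%d,40}" % abbrev, commit_id):
        raise ValueError("commit ID must be %d-40 lowercase hex characters" % abbrev)
    return 'Fixes: %s ("%s")' % (commit_id[:abbrev], subject)


print(format_fixes_tag(
    "690467c81b1a",
    "mm/vmalloc: Move draining areas out of caller context",
))
```

With the commit ID and subject from this thread, the helper yields the
same line that is planned for v3.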

