linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Jan Kara <jack@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 linux-fsdevel@vger.kernel.org, Zi Yan <ziy@nvidia.com>,
	 Yang Shi <shy828301@gmail.com>,
	 Baolin Wang <baolin.wang@linux.alibaba.com>,
	 Oscar Salvador <osalvador@suse.de>,
	Matthew Wilcox <willy@infradead.org>,
	 Bharata B Rao <bharata@amd.com>,
	Alistair Popple <apopple@nvidia.com>,
	 Xin Hao <xhao@linux.alibaba.com>,
	Minchan Kim <minchan@kernel.org>,
	 Mike Kravetz <mike.kravetz@oracle.com>,
	 Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	"Xu, Pengfei" <pengfei.xu@intel.com>,
	 Christoph Hellwig <hch@lst.de>, Stefan Roesch <shr@devkernel.io>,
	 Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH -v5 0/9] migrate_pages(): batch TLB flushing
Date: Tue, 21 Feb 2023 14:25:41 -0800 (PST)	[thread overview]
Message-ID: <20f1628e-96a7-3a5d-fef5-dae31f8eb196@google.com> (raw)
In-Reply-To: <871qmjdsj0.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Tue, 21 Feb 2023, Huang, Ying wrote:
> 
> On second thought, I think that it may be better to provide a fix as
> simple as possible firstly.  Then we can work on a more complex fix as
> we discussed above.  The simple fix is easy to review now.  And, we will
> have more time to test and review the complex fix.
> 
> In the following fix, I disabled the migration batching except for the
> MIGRATE_ASYNC mode, or the split folios of a THP folio.  After that, I
> will work on the complex fix to enable migration batching for all modes.
> 
> What do you think about that?

I don't think there's a need to rush in the wrong fix so quickly.
Your series was in (though sometimes out of) linux-next for some
while, without causing any widespread problems.  Andrew did send
it to Linus yesterday, I expect he'll be pushing it out later today
or tomorrow, but I don't think it's going to cause big problems.
Aiming for a fix in -rc2 would be good.  Why would it be complex?

Hugh

> 
> Best Regards,
> Huang, Ying
> 
> -------------------------------8<---------------------------------
> From 8e475812eacd9f2eeac76776c2b1a17af3e59b89 Mon Sep 17 00:00:00 2001
> From: Huang Ying <ying.huang@intel.com>
> Date: Tue, 21 Feb 2023 16:37:50 +0800
> Subject: [PATCH] migrate_pages: fix deadlock in batched migration
> 
> Two deadlock bugs were reported for the migrate_pages() batching
> series.  Thanks Hugh and Pengfei!  For example, in the following
> deadlock trace snippet,
> 
>  INFO: task kworker/u4:0:9 blocked for more than 147 seconds.
>        Not tainted 6.2.0-rc4-kvm+ #1314
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  task:kworker/u4:0    state:D stack:0     pid:9     ppid:2      flags:0x00004000
>  Workqueue: loop4 loop_rootcg_workfn
>  Call Trace:
>   <TASK>
>   __schedule+0x43b/0xd00
>   schedule+0x6a/0xf0
>   io_schedule+0x4a/0x80
>   folio_wait_bit_common+0x1b5/0x4e0
>   ? __pfx_wake_page_function+0x10/0x10
>   __filemap_get_folio+0x73d/0x770
>   shmem_get_folio_gfp+0x1fd/0xc80
>   shmem_write_begin+0x91/0x220
>   generic_perform_write+0x10e/0x2e0
>   __generic_file_write_iter+0x17e/0x290
>   ? generic_write_checks+0x12b/0x1a0
>   generic_file_write_iter+0x97/0x180
>   ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
>   do_iter_readv_writev+0x13c/0x210
>   ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
>   do_iter_write+0xf6/0x330
>   vfs_iter_write+0x46/0x70
>   loop_process_work+0x723/0xfe0
>   loop_rootcg_workfn+0x28/0x40
>   process_one_work+0x3cc/0x8d0
>   worker_thread+0x66/0x630
>   ? __pfx_worker_thread+0x10/0x10
>   kthread+0x153/0x190
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork+0x29/0x50
>   </TASK>
> 
>  INFO: task repro:1023 blocked for more than 147 seconds.
>        Not tainted 6.2.0-rc4-kvm+ #1314
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  task:repro           state:D stack:0     pid:1023  ppid:360    flags:0x00004004
>  Call Trace:
>   <TASK>
>   __schedule+0x43b/0xd00
>   schedule+0x6a/0xf0
>   io_schedule+0x4a/0x80
>   folio_wait_bit_common+0x1b5/0x4e0
>   ? compaction_alloc+0x77/0x1150
>   ? __pfx_wake_page_function+0x10/0x10
>   folio_wait_bit+0x30/0x40
>   folio_wait_writeback+0x2e/0x1e0
>   migrate_pages_batch+0x555/0x1ac0
>   ? __pfx_compaction_alloc+0x10/0x10
>   ? __pfx_compaction_free+0x10/0x10
>   ? __this_cpu_preempt_check+0x17/0x20
>   ? lock_is_held_type+0xe6/0x140
>   migrate_pages+0x100e/0x1180
>   ? __pfx_compaction_free+0x10/0x10
>   ? __pfx_compaction_alloc+0x10/0x10
>   compact_zone+0xe10/0x1b50
>   ? lock_is_held_type+0xe6/0x140
>   ? check_preemption_disabled+0x80/0xf0
>   compact_node+0xa3/0x100
>   ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
>   ? _find_first_bit+0x7b/0x90
>   sysctl_compaction_handler+0x5d/0xb0
>   proc_sys_call_handler+0x29d/0x420
>   proc_sys_write+0x2b/0x40
>   vfs_write+0x3a3/0x780
>   ksys_write+0xb7/0x180
>   __x64_sys_write+0x26/0x30
>   do_syscall_64+0x3b/0x90
>   entry_SYSCALL_64_after_hwframe+0x72/0xdc
>  RIP: 0033:0x7f3a2471f59d
>  RSP: 002b:00007ffe567f7288 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
>  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a2471f59d
>  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
>  RBP: 00007ffe567f72a0 R08: 0000000000000010 R09: 0000000000000010
>  R10: 0000000000000010 R11: 0000000000000217 R12: 00000000004012e0
>  R13: 00007ffe567f73e0 R14: 0000000000000000 R15: 0000000000000000
>   </TASK>
> 
> The page migration task has held the lock of the shmem folio A, and is
> waiting the writeback of the folio B of the file system on the loop
> block device to complete.  While the loop worker task which writes
> back the folio B is waiting to lock the shmem folio A, because the
> folio A backs the folio B in the loop device.  Thus deadlock is
> triggered.
> 
> In general, if we have locked some other folios except the one we are
> migrating, it's not safe to wait synchronously, for example, to wait
> the writeback to complete or wait to lock the buffer head.
> 
> To fix the deadlock, in this patch, we avoid to batch the page
> migration except for MIGRATE_ASYNC mode or the split folios of a THP
> folio.  In MIGRATE_ASYNC mode, synchronous waiting is avoided.  And
> there isn't any dependency relationship among the split folios of a
> THP folio.
> 
> The fix can be improved via converting migration mode from synchronous
> to asynchronous if we have locked some other folios except the one we
> are migrating.  We will do that in the near future.
> 
> Link: https://lore.kernel.org/linux-mm/87a6c8c-c5c1-67dc-1e32-eb30831d6e3d@google.com/
> Link: https://lore.kernel.org/linux-mm/874jrg7kke.fsf@yhuang6-desk2.ccr.corp.intel.com/
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Reported-by: Hugh Dickins <hughd@google.com>
> Reported-by: "Xu, Pengfei" <pengfei.xu@intel.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Stefan Roesch <shr@devkernel.io>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Xin Hao <xhao@linux.alibaba.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  mm/migrate.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ef68a1aff35c..bc04c34543f3 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1937,7 +1937,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
>  {
>  	int rc, rc_gather;
> -	int nr_pages;
> +	int nr_pages, batch;
>  	struct folio *folio, *folio2;
>  	LIST_HEAD(folios);
>  	LIST_HEAD(ret_folios);
> @@ -1951,6 +1951,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  				     mode, reason, &stats, &ret_folios);
>  	if (rc_gather < 0)
>  		goto out;
> +
> +	if (mode == MIGRATE_ASYNC)
> +		batch = NR_MAX_BATCHED_MIGRATION;
> +	else
> +		batch = 1;
>  again:
>  	nr_pages = 0;
>  	list_for_each_entry_safe(folio, folio2, from, lru) {
> @@ -1961,11 +1966,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  		}
>  
>  		nr_pages += folio_nr_pages(folio);
> -		if (nr_pages > NR_MAX_BATCHED_MIGRATION)
> +		if (nr_pages >= batch)
>  			break;
>  	}
> -	if (nr_pages > NR_MAX_BATCHED_MIGRATION)
> -		list_cut_before(&folios, from, &folio->lru);
> +	if (nr_pages >= batch)
> +		list_cut_before(&folios, from, &folio2->lru);
>  	else
>  		list_splice_init(from, &folios);
>  	rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
> -- 
> 2.39.1


  reply	other threads:[~2023-02-21 22:25 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-13 12:34 [PATCH -v5 0/9] migrate_pages(): batch TLB flushing Huang Ying
2023-02-13 12:34 ` [PATCH -v5 1/9] migrate_pages: organize stats with struct migrate_pages_stats Huang Ying
2023-02-13 12:34 ` [PATCH -v5 2/9] migrate_pages: separate hugetlb folios migration Huang Ying
2023-02-13 12:34 ` [PATCH -v5 3/9] migrate_pages: restrict number of pages to migrate in batch Huang Ying
2023-02-13 12:34 ` [PATCH -v5 4/9] migrate_pages: split unmap_and_move() to _unmap() and _move() Huang Ying
2023-02-13 12:34 ` [PATCH -v5 5/9] migrate_pages: batch _unmap and _move Huang Ying
2023-02-13 12:34 ` [PATCH -v5 6/9] migrate_pages: move migrate_folio_unmap() Huang Ying
2023-02-13 12:34 ` [PATCH -v5 7/9] migrate_pages: share more code between _unmap and _move Huang Ying
2023-02-13 12:34 ` [PATCH -v5 8/9] migrate_pages: batch flushing TLB Huang Ying
2023-02-13 12:34 ` [PATCH -v5 9/9] migrate_pages: move THP/hugetlb migration support check to simplify code Huang Ying
2023-02-17 21:47 ` [PATCH -v5 0/9] migrate_pages(): batch TLB flushing Hugh Dickins
2023-02-20  9:28   ` Huang, Ying
2023-02-21  2:48     ` Hugh Dickins
2023-02-21  3:34       ` Huang, Ying
2023-02-21 22:15         ` Hugh Dickins
2023-02-21  4:30       ` Matthew Wilcox
2023-02-21  4:38         ` Huang, Ying
2023-02-21 14:04       ` Huang, Ying
2023-02-21 22:25         ` Hugh Dickins [this message]
2023-02-22  1:02           ` Huang, Ying
2023-02-27 11:06   ` Jan Kara
2023-02-28  1:13     ` Huang, Ying
2023-02-28  5:59       ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20f1628e-96a7-3a5d-fef5-dae31f8eb196@google.com \
    --to=hughd@google.com \
    --cc=42.hyeyoo@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=bharata@amd.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=minchan@kernel.org \
    --cc=osalvador@suse.de \
    --cc=pengfei.xu@intel.com \
    --cc=shr@devkernel.io \
    --cc=shy828301@gmail.com \
    --cc=tj@kernel.org \
    --cc=willy@infradead.org \
    --cc=xhao@linux.alibaba.com \
    --cc=ying.huang@intel.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).