From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Zhaoyang Huang <huangzhaoyang@gmail.com>,
	"zhaoyang.huang" <zhaoyang.huang@unisoc.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	ke.wang@unisoc.com, steve.kang@unisoc.com,
	baocong.liu@unisoc.com, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC PATCH] mm: move xa forward when run across zombie page
Date: Thu, 20 Oct 2022 13:04:51 +1100	[thread overview]
Message-ID: <20221020020451.GS2703033@dread.disaster.area> (raw)
In-Reply-To: <Y0/kZbIvMgkNhWpM@bfoster>

On Wed, Oct 19, 2022 at 07:49:57AM -0400, Brian Foster wrote:
> On Wed, Oct 19, 2022 at 09:30:42AM +1100, Dave Chinner wrote:
> > On Tue, Oct 18, 2022 at 04:09:17AM +0100, Matthew Wilcox wrote:
> > > On Tue, Oct 18, 2022 at 10:52:19AM +0800, Zhaoyang Huang wrote:
> > > > On Mon, Oct 17, 2022 at 11:55 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Mon, Oct 17, 2022 at 01:34:13PM +0800, Zhaoyang Huang wrote:
> > > > > > On Fri, Oct 14, 2022 at 8:12 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > >
> > > > > > > On Fri, Oct 14, 2022 at 01:30:48PM +0800, zhaoyang.huang wrote:
> > > > > > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > > > > >
> > > > > > > > The RCU stall below is reported where kswapd gets trapped in a
> > > > > > > > livelock while shrinking a superblock's inode list. The direct
> > > > > > > > cause is a zombie page that keeps occupying its xarray slot,
> > > > > > > > making the check-and-retry loop spin permanently. The root cause
> > > > > > > > is not known yet; it may be an xa update without synchronize_rcu()
> > > > > > > > or similar. I would like to suggest skipping this page to break
> > > > > > > > the livelock as a workaround.
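As a simplified illustration - not the actual find_get_entry() code and not
the RFC patch itself, and with the helper name made up for this sketch - the
check-and-retry pattern under discussion looks roughly like this; the comment
marks where the proposed "skip forward" workaround would presumably act:

	/*
	 * Simplified sketch of a page cache lookup retry loop (illustrative
	 * only; error handling and shadow-entry details omitted).
	 */
	static struct folio *lookup_sketch(struct xa_state *xas)
	{
		struct folio *folio;

	repeat:
		xas_reset(xas);
		folio = xas_load(xas);
		if (!folio || xa_is_value(folio))
			return folio;

		/*
		 * A "zombie" page - refcount already zero but still present
		 * in its xarray slot - makes this fail on every pass, so the
		 * loop spins forever under rcu_read_lock(), which is what
		 * shows up as the RCU stall. The RFC's suggestion is, roughly,
		 * to advance the xa_state past such an entry here instead of
		 * retrying the same slot (assumed behaviour, not the actual
		 * patch text).
		 */
		if (!folio_try_get_rcu(folio))
			goto repeat;

		/* The folio may have been replaced while the ref was taken. */
		if (unlikely(folio != xas_reload(xas))) {
			folio_put(folio);
			goto repeat;
		}
		return folio;
	}
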
> > > > > > >
> > > > > > > No, the underlying bug should be fixed.
> > > > >
> > > > >     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > Understood. IMHO, find_get_entry actually works as an open API dealing
> > > > with the page cache of different kinds of address_spaces, which requires
> > > > high robustness to handle any corner case. Taking the current problem as
> > > > an example, the inode with the faulty page (refcount=0) could then remain
> > > > on the sb's list without causing a livelock.
> > > 
> > > But it's a corner case that shouldn't happen!  What else is going on
> > > at the time?  Can you reproduce this problem easily?  If so, how?
> > 
> > I've been seeing this livelock, too. The reproducer is, unfortunately,
> > something I can't share - it's a massive program that triggers a data
> > corruption I'm working on solving. Now that I've mostly fixed the data
> > corruption, long duration test runs end up livelocking in page cache
> > lookup after several hours.
> > 
> > The test is effectively writing a 100MB file with multiple threads
> > doing reverse adjacent racing 1MB unaligned writes. Once the file is
> > written, it is then mmap()d and read back from the filesystem for
> > verification.
> > 
> > This is then run with tens of processes concurrently, all under a
> > massively confined memcg (e.g. 32 processes/files are run in a
> > memcg with only 200MB of memory allowed). This causes writeback,
> > readahead and memory reclaim to race with incoming mmap read faults
> > and writes.  The livelock occurs on file verification and it appears
> > to be an interaction with readahead thrashing.
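A rough sketch of this kind of workload is below - NOT the actual test
program, which isn't public. File size, I/O size, misalignment, file name
and thread count are all illustrative, and the memcg confinement (e.g. ~32
copies of this under a cgroup2 memory.max of ~200MB) is assumed to be set
up outside the program:

	/*
	 * Each thread writes its slice of a ~100MB file back-to-front with
	 * deliberately unaligned 1MB pwrite()s, so writes at slice boundaries
	 * race with the neighbouring thread. The file is then mmap()ed and
	 * read back through page faults for verification.
	 */
	#include <fcntl.h>
	#include <pthread.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#define FILE_SIZE	(100UL * 1024 * 1024)	/* ~100MB */
	#define IO_SIZE		(1024UL * 1024)		/* 1MB writes */
	#define NR_THREADS	4
	#define MISALIGN	512			/* deliberate unalignment */

	static int fd;

	static void *writer(void *arg)
	{
		long id = (long)arg;
		unsigned long slice = FILE_SIZE / NR_THREADS;
		unsigned long start = id * slice;
		char *buf = malloc(IO_SIZE);
		long off;

		memset(buf, 'A' + (int)id, IO_SIZE);

		/* Reverse-order, unaligned 1MB writes across this slice. */
		for (off = start + slice - IO_SIZE; off >= (long)start; off -= IO_SIZE) {
			if (pwrite(fd, buf, IO_SIZE, off + MISALIGN) < 0)
				perror("pwrite");
		}

		free(buf);
		return NULL;
	}

	int main(void)
	{
		pthread_t tid[NR_THREADS];
		unsigned char *map;
		unsigned long i, sum = 0;

		fd = open("testfile", O_RDWR | O_CREAT | O_TRUNC, 0644);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		for (i = 0; i < NR_THREADS; i++)
			pthread_create(&tid[i], NULL, writer, (void *)i);
		for (i = 0; i < NR_THREADS; i++)
			pthread_join(tid[i], NULL);
		fsync(fd);

		/* Verification pass: read the file back via mmap() page faults. */
		map = mmap(NULL, FILE_SIZE, PROT_READ, MAP_SHARED, fd, 0);
		if (map == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		for (i = 0; i < FILE_SIZE; i += 4096)
			sum += map[i];
		printf("touched %lu pages, checksum %lu\n", FILE_SIZE / 4096, sum);

		munmap(map, FILE_SIZE);
		close(fd);
		return 0;
	}

Running many copies of such a program concurrently inside a tightly limited
memcg is what forces writeback, readahead and reclaim to race with the
incoming mmap read faults described above.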
> > 
> > On my test rig, the physical read to write ratio is at least 20:1 -
> > with 32 processes running, the 5s IO rates are:
> > 
> > Device             tps    MB_read/s    MB_wrtn/s    MB_dscd/s    MB_read    MB_wrtn    MB_dscd
> > dm-0          52187.20      3677.42      1345.92         0.00      18387       6729          0
> > dm-0          62865.60      5947.29         0.08         0.00      29736          0          0
> > dm-0          62972.80      5911.20         0.00         0.00      29556          0          0
> > dm-0          59803.00      5516.72       133.47         0.00      27583        667          0
> > dm-0          63068.20      5292.34       511.52         0.00      26461       2557          0
> > dm-0          56775.60      4184.52      1248.38         0.00      20922       6241          0
> > dm-0          63087.40      5901.26        43.77         0.00      29506        218          0
> > dm-0          62769.00      5833.97        60.54         0.00      29169        302          0
> > dm-0          64810.20      5636.13       305.63         0.00      28180       1528          0
> > dm-0          65222.60      5598.99       349.48         0.00      27994       1747          0
> > dm-0          62444.00      4887.05       926.67         0.00      24435       4633          0
> > dm-0          63812.00      5622.68       294.66         0.00      28113       1473          0
> > dm-0          63482.00      5728.43       195.74         0.00      28642        978          0
> > 
> > This is reading and writing the same amount of file data at the
> > application level, but once the data has been written and kicked out
> > of the page cache it seems to require an awful lot more read IO to
> > get it back to the application. i.e. this looks like mmap() is
> > readahead thrashing severely, and eventually it livelocks with this
> > sort of report:
> > 
> > [175901.982484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [175901.985095] rcu:    Tasks blocked on level-1 rcu_node (CPUs 0-15): P25728
> > [175901.987996]         (detected by 0, t=97399871 jiffies, g=15891025, q=1972622 ncpus=32)
> > [175901.991698] task:test_write      state:R  running task     stack:12784 pid:25728 ppid: 25696 flags:0x00004002
> > [175901.995614] Call Trace:
> > [175901.996090]  <TASK>
> > [175901.996594]  ? __schedule+0x301/0xa30
> > [175901.997411]  ? sysvec_apic_timer_interrupt+0xb/0x90
> > [175901.998513]  ? sysvec_apic_timer_interrupt+0xb/0x90
> > [175901.999578]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> > [175902.000714]  ? xas_start+0x53/0xc0
> > [175902.001484]  ? xas_load+0x24/0xa0
> > [175902.002208]  ? xas_load+0x5/0xa0
> > [175902.002878]  ? __filemap_get_folio+0x87/0x340
> > [175902.003823]  ? filemap_fault+0x139/0x8d0
> > [175902.004693]  ? __do_fault+0x31/0x1d0
> > [175902.005372]  ? __handle_mm_fault+0xda9/0x17d0
> > [175902.006213]  ? handle_mm_fault+0xd0/0x2a0
> > [175902.006998]  ? exc_page_fault+0x1d9/0x810
> > [175902.007789]  ? asm_exc_page_fault+0x22/0x30
> > [175902.008613]  </TASK>
> > 
> > Given that filemap_fault on XFS is probably trying to map large
> > folios, I do wonder if this is a result of some kind of race with
> > teardown of a large folio...
> > 
> 
> I somewhat recently tracked down a hugepage/swap problem that could
> manifest as a softlockup in the folio lookup path (due to indefinite
> folio_try_get_rcu() failure):
> 
> https://lore.kernel.org/linux-mm/20220906190602.1626037-1-bfoster@redhat.com/
> 
> It could easily be something different leading to the same side effect,
> particularly since I believe the issue I saw was introduced in v5.19,
> but might be worth a test if you have a reliable reproducer.

The tests have run and, unfortunately, that patch doesn't prevent or fix
the problem either.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 35+ messages
2022-10-14  5:30 [RFC PATCH] mm: move xa forward when run across zombie page zhaoyang.huang
2022-10-14 12:11 ` Matthew Wilcox
2022-10-17  5:34   ` Zhaoyang Huang
2022-10-17  6:58     ` Zhaoyang Huang
2022-10-17 15:55     ` Matthew Wilcox
2022-10-18  2:52       ` Zhaoyang Huang
2022-10-18  3:09         ` Matthew Wilcox
2022-10-18 22:30           ` Dave Chinner
2022-10-19  1:16             ` Dave Chinner
2022-10-19  4:47               ` Dave Chinner
2022-10-19  5:48                 ` Zhaoyang Huang
2022-10-19 13:06                   ` Matthew Wilcox
2022-10-20  1:27                     ` Zhaoyang Huang
2022-10-26 19:49                   ` Matthew Wilcox
2022-10-27  1:57                     ` Zhaoyang Huang
2022-10-19 11:49             ` Brian Foster
2022-10-20  2:04               ` Dave Chinner [this message]
2022-10-20  3:12                 ` Zhaoyang Huang
2022-10-19 15:23             ` Matthew Wilcox
2022-10-19 22:04               ` Dave Chinner
2022-10-19 22:46                 ` Dave Chinner
2022-10-19 23:42                   ` Dave Chinner
2022-10-20 21:52                 ` Matthew Wilcox
2022-10-26  8:38                   ` Zhaoyang Huang
2022-10-26 14:38                     ` Matthew Wilcox
2022-10-26 16:01                   ` Matthew Wilcox
2022-10-28  4:05                     ` Dave Chinner
2022-11-01  7:17                   ` Dave Chinner
2024-04-11  7:04                     ` Zhaoyang Huang
2022-10-21 21:37 Pulavarty, Badari
2022-10-21 22:31 ` Matthew Wilcox
2022-10-21 22:40   ` Pulavarty, Badari
2022-10-31 19:25   ` Pulavarty, Badari
2022-10-31 19:39     ` Hugh Dickins
2022-10-31 21:33       ` Pulavarty, Badari
