Subject: Re: Crashes/hung tasks with z3pool under memory pressure
From: Vitaly Wool
To: Guenter Roeck
Cc: LKML, Andrew Morton, mawilcox@microsoft.com, asavery@chromium.org, gwendal@chromium.org
Date: Tue, 17 Apr 2018 00:14:37 +0200
In-Reply-To: <20180416155832.GB12015@roeck-us.net>
References: <20180412215501.GA16406@roeck-us.net> <20180413173555.GA30587@roeck-us.net> <20180413175615.GA30242@roeck-us.net> <20180416155832.GB12015@roeck-us.net>

On 4/16/18 5:58 PM, Guenter Roeck wrote:
> On Mon, Apr 16, 2018 at 02:43:01PM +0200, Vitaly Wool wrote:
>> Hey Guenter,
>>
>> On 04/13/2018 07:56 PM, Guenter Roeck wrote:
>>
>>> On Fri, Apr 13, 2018 at 05:40:18PM +0000, Vitaly Wool wrote:
>>>> On Fri, Apr 13, 2018, 7:35 PM Guenter Roeck wrote:
>>>>
>>>>> On Fri, Apr 13, 2018 at 05:21:02AM +0000, Vitaly Wool wrote:
>>>>>> Hi Guenter,
>>>>>>
>>>>>> On Fri, Apr 13, 2018 at 00:01, Guenter Roeck wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> we are observing crashes with z3pool under memory pressure. The
>>>>>>> kernel version used to reproduce the problem is
>>>>>>> v4.16-11827-g5d1365940a68, but the problem was also seen with
>>>>>>> v4.14 based kernels.
>>>>>>
>>>>>> just before I dig into this, could you please try reproducing the
>>>>>> errors you see with https://patchwork.kernel.org/patch/10210459/
>>>>>> applied?
>>>>>>
>>>>> As mentioned above, I tested with v4.16-11827-g5d1365940a68, which
>>>>> already includes this patch.
>>>>>
>>>> Bah. Sorry. Expect an update after the weekend.
>>>>
>>> NP; easy to miss. Thanks a lot for looking into it.
>>>
>> I wonder if the following patch would make a difference:
>>
>> diff --git a/mm/z3fold.c b/mm/z3fold.c
>> index c0bca6153b95..5e547c2d5832 100644
>> --- a/mm/z3fold.c
>> +++ b/mm/z3fold.c
>> @@ -887,19 +887,21 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
>>  			goto next;
>>  		}
>>  next:
>> -		spin_lock(&pool->lock);
>>  		if (test_bit(PAGE_HEADLESS, &page->private)) {
>>  			if (ret == 0) {
>> -				spin_unlock(&pool->lock);
>>  				free_z3fold_page(page);
>>  				return 0;
>>  			}
>> -		} else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
>> -			atomic64_dec(&pool->pages_nr);
>> -			spin_unlock(&pool->lock);
>> -			return 0;
>> +		} else {
>> +			spin_lock(&zhdr->page_lock);
>> +			if (kref_put(&zhdr->refcount, release_z3fold_page_locked)) {
>> +				atomic64_dec(&pool->pages_nr);
>> +				return 0;
>> +			}
>> +			spin_unlock(&zhdr->page_lock);
>>  		}
>> +		spin_lock(&pool->lock);
>>  		/*
>>  		 * Add to the beginning of LRU.
>>  		 * Pool lock has to be kept here to ensure the page has
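[A note on the patch just quoted, since the intent may not be obvious: rather than taking pool->lock around the final put, it takes the page's own lock and lets release_z3fold_page_locked() free the page with that lock held, so the lock is consumed on the last put. Below is a rough user-space sketch of that pattern; every name in it is invented for illustration, and pthread/C11 primitives stand in for the kernel's kref and spinlock:

#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct obj {
	atomic_int refcount;		/* analogue of zhdr->refcount */
	pthread_mutex_t lock;		/* analogue of zhdr->page_lock */
};

/*
 * Analogue of release_z3fold_page_locked(): runs with obj->lock held
 * and consumes it, so the caller must not unlock afterwards.
 */
static void release_obj_locked(struct obj *o)
{
	pthread_mutex_unlock(&o->lock);
	pthread_mutex_destroy(&o->lock);
	free(o);
}

/*
 * Analogue of the kref_put() call in the hunk above: returns 1 and
 * frees the object if this was the last reference.
 */
static int put_obj_locked(struct obj *o)
{
	if (atomic_fetch_sub(&o->refcount, 1) == 1) {
		release_obj_locked(o);
		return 1;
	}
	return 0;
}

/* Caller side, mirroring the new "} else {" branch of the patch. */
static int try_drop(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	if (put_obj_locked(o))
		return 0;		/* freed; lock already gone */
	pthread_mutex_unlock(&o->lock);
	return -1;			/* still referenced elsewhere */
}

int main(void)
{
	struct obj *o = malloc(sizeof(*o));

	atomic_init(&o->refcount, 1);
	pthread_mutex_init(&o->lock, NULL);
	return try_drop(o);	/* last reference: frees o, returns 0 */
}

The caller-side rule the sketch encodes: once the put succeeds, do not touch the lock again, because the release callback has already dropped and destroyed it.]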
>
> No, it doesn't. Same crash.
>
> BUG: MAX_LOCK_DEPTH too low!
> turning off the locking correctness validator.
> depth: 48  max: 48!
> 48 locks held by kswapd0/51:
>  #0: 000000004d7a35a9 (&(&pool->lock)->rlock#3){+.+.}, at: z3fold_zpool_shrink+0x47/0x3e0
>  #1: 000000007739f49e (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
>  #2: 00000000ff6cd4c8 (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
>  #3: 000000004cffc6cb (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
> ...
> CPU: 0 PID: 51 Comm: kswapd0 Not tainted 4.17.0-rc1-yocto-standard+ #11
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
> Call Trace:
>  dump_stack+0x67/0x9b
>  __lock_acquire+0x429/0x18f0
>  ? __lock_acquire+0x2af/0x18f0
>  ? __lock_acquire+0x2af/0x18f0
>  ? lock_acquire+0x93/0x230
>  lock_acquire+0x93/0x230
>  ? z3fold_zpool_shrink+0xb7/0x3e0
>  _raw_spin_trylock+0x65/0x80
>  ? z3fold_zpool_shrink+0xb7/0x3e0
>  ? z3fold_zpool_shrink+0x47/0x3e0
>  z3fold_zpool_shrink+0xb7/0x3e0
>  zswap_frontswap_store+0x180/0x7c0
> ...
> BUG: sleeping function called from invalid context at mm/page_alloc.c:4320
> in_atomic(): 1, irqs_disabled(): 0, pid: 51, name: kswapd0
> INFO: lockdep is turned off.
> Preemption disabled at:
> [<0000000000000000>] (null)
> CPU: 0 PID: 51 Comm: kswapd0 Not tainted 4.17.0-rc1-yocto-standard+ #11
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
> Call Trace:
>  dump_stack+0x67/0x9b
>  ___might_sleep+0x16c/0x250
>  __alloc_pages_nodemask+0x1e7/0x1490
>  ? lock_acquire+0x93/0x230
>  ? lock_acquire+0x93/0x230
>  __read_swap_cache_async+0x14d/0x260
>  zswap_writeback_entry+0xdb/0x340
>  z3fold_zpool_shrink+0x2b1/0x3e0
>  zswap_frontswap_store+0x180/0x7c0
>  ? page_vma_mapped_walk+0x22/0x230
>  __frontswap_store+0x6e/0xf0
>  swap_writepage+0x49/0x70
> ...
>
> This is with your patch applied on top of v4.17-rc1.
>
> Guenter

Ugh. Could you please keep that patch and apply this one on top:

diff --git a/mm/z3fold.c b/mm/z3fold.c
index c0bca6153b95..e8a80d044d9e 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -840,6 +840,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 			kref_get(&zhdr->refcount);
 			list_del_init(&zhdr->buddy);
 			zhdr->cpu = -1;
+			break;
 		}
 
 		list_del_init(&page->lru);

Thanks,
Vitaly
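P.S. In case it helps with review: the point of that one-liner is that the LRU scan in z3fold_reclaim_page() trylocks page after page and, without the break, never drops the locks it has already taken, which lines up with the "48 locks held by kswapd0" dump above. Here is a rough user-space model of that scan; all names are invented for illustration and pthread mutexes stand in for zhdr->page_lock (build with cc -pthread):

#include <pthread.h>
#include <stdio.h>

#define NPAGES 48

struct fake_page {
	pthread_mutex_t lock;	/* stands in for zhdr->page_lock */
};

static struct fake_page lru[NPAGES];

int main(void)
{
	struct fake_page *candidate = NULL;
	int held = 0;

	for (int i = 0; i < NPAGES; i++)
		pthread_mutex_init(&lru[i].lock, NULL);

	/* The reclaim scan: walk the LRU and trylock each page. */
	for (int i = 0; i < NPAGES; i++) {
		if (pthread_mutex_trylock(&lru[i].lock) != 0)
			continue;	/* page busy elsewhere, skip it */
		held++;
		candidate = &lru[i];
		/*
		 * The one-liner: stop at the first page we managed to
		 * lock. Comment this break out and every remaining page
		 * gets trylocked too, none of them ever unlocked, which
		 * is the held-lock pile-up lockdep reported.
		 */
		break;
	}

	printf("locks held after scan: %d (want 1)\n", held);
	if (candidate)
		pthread_mutex_unlock(&candidate->lock);
	return 0;
}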