From: Mike Kravetz <mike.kravetz@oracle.com>
To: Waiman Long <longman@redhat.com>, Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH v2] hugetlbfs: Disable softIRQ when taking hugetlb_lock
Date: Wed, 11 Dec 2019 14:04:37 -0800	[thread overview]
Message-ID: <4fbc39a9-2c9c-4c2c-2b13-a548afe6083c@oracle.com> (raw)
In-Reply-To: <20191211194615.18502-1-longman@redhat.com>

Cc: Michal

Sorry for the late reply on this effort.

On 12/11/19 11:46 AM, Waiman Long wrote:
> The following lockdep splat was observed when a certain hugetlbfs test
> was run:
> 
> [  612.388273] ================================
> [  612.411273] WARNING: inconsistent lock state
> [  612.432273] 4.18.0-159.el8.x86_64+debug #1 Tainted: G        W --------- -  -
> [  612.469273] --------------------------------
> [  612.489273] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> [  612.517273] swapper/30/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
> [  612.541273] ffffffff9acdc038 (hugetlb_lock){+.?.}, at: free_huge_page+0x36f/0xaa0
> [  612.576273] {SOFTIRQ-ON-W} state was registered at:
> [  612.598273]   lock_acquire+0x14f/0x3b0
> [  612.616273]   _raw_spin_lock+0x30/0x70
> [  612.634273]   __nr_hugepages_store_common+0x11b/0xb30
> [  612.657273]   hugetlb_sysctl_handler_common+0x209/0x2d0
> [  612.681273]   proc_sys_call_handler+0x37f/0x450
> [  612.703273]   vfs_write+0x157/0x460
> [  612.719273]   ksys_write+0xb8/0x170
> [  612.736273]   do_syscall_64+0xa5/0x4d0
> [  612.753273]   entry_SYSCALL_64_after_hwframe+0x6a/0xdf
> [  612.777273] irq event stamp: 691296
> [  612.794273] hardirqs last  enabled at (691296): [<ffffffff99bb034b>] _raw_spin_unlock_irqrestore+0x4b/0x60
> [  612.839273] hardirqs last disabled at (691295): [<ffffffff99bb0ad2>] _raw_spin_lock_irqsave+0x22/0x81
> [  612.882273] softirqs last  enabled at (691284): [<ffffffff97ff0c63>] irq_enter+0xc3/0xe0
> [  612.922273] softirqs last disabled at (691285): [<ffffffff97ff0ebe>] irq_exit+0x23e/0x2b0
> [  612.962273]
> [  612.962273] other info that might help us debug this:
> [  612.993273]  Possible unsafe locking scenario:
> [  612.993273]
> [  613.020273]        CPU0
> [  613.031273]        ----
> [  613.042273]   lock(hugetlb_lock);
> [  613.057273]   <Interrupt>
> [  613.069273]     lock(hugetlb_lock);
> [  613.085273]
> [  613.085273]  *** DEADLOCK ***
>       :
> [  613.245273] Call Trace:
> [  613.256273]  <IRQ>
> [  613.265273]  dump_stack+0x9a/0xf0
> [  613.281273]  mark_lock+0xd0c/0x12f0
> [  613.297273]  ? print_shortest_lock_dependencies+0x80/0x80
> [  613.322273]  ? sched_clock_cpu+0x18/0x1e0
> [  613.341273]  __lock_acquire+0x146b/0x48c0
> [  613.360273]  ? trace_hardirqs_on+0x10/0x10
> [  613.379273]  ? trace_hardirqs_on_caller+0x27b/0x580
> [  613.401273]  lock_acquire+0x14f/0x3b0
> [  613.419273]  ? free_huge_page+0x36f/0xaa0
> [  613.440273]  _raw_spin_lock+0x30/0x70
> [  613.458273]  ? free_huge_page+0x36f/0xaa0
> [  613.477273]  free_huge_page+0x36f/0xaa0
> [  613.495273]  bio_check_pages_dirty+0x2fc/0x5c0
> [  613.516273]  clone_endio+0x17f/0x670 [dm_mod]
> [  613.536273]  ? disable_discard+0x90/0x90 [dm_mod]
> [  613.558273]  ? bio_endio+0x4ba/0x930
> [  613.575273]  ? blk_account_io_completion+0x400/0x530
> [  613.598273]  blk_update_request+0x276/0xe50
> [  613.617273]  scsi_end_request+0x7b/0x6a0
> [  613.636273]  ? lock_downgrade+0x6f0/0x6f0
> [  613.654273]  scsi_io_completion+0x1c6/0x1570
> [  613.674273]  ? sd_completed_bytes+0x3a0/0x3a0 [sd_mod]
> [  613.698273]  ? scsi_mq_requeue_cmd+0xc0/0xc0
> [  613.718273]  blk_done_softirq+0x22e/0x350
> [  613.737273]  ? blk_softirq_cpu_dead+0x230/0x230
> [  613.758273]  __do_softirq+0x23d/0xad8
> [  613.776273]  irq_exit+0x23e/0x2b0
> [  613.792273]  do_IRQ+0x11a/0x200
> [  613.806273]  common_interrupt+0xf/0xf
> [  613.823273]  </IRQ>

This is interesting.  I'm trying to wrap my head around how we ended up
with a BIO pointing to a hugetlbfs page.  My guess is that user space
passed an address to some system call or driver, which then set up the
I/O.  For the purpose of addressing this issue it does not matter; I am
just a little curious.

> Since hugetlb_lock can be taken from both process and softIRQ contexts,
> we need to protect the lock from nested locking by disabling softIRQ
> using spin_lock_bh() before taking it.
> 
> Currently, only free_huge_page() is known to be called from softIRQ
> context.
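
For reference, the conversion described above amounts to something like
the sketch below.  This is illustrative only, not the actual diff; the
real patch has to cover every hugetlb_lock site in mm/hugetlb.c:

#include <linux/mm.h>
#include <linux/spinlock.h>

/* hugetlb_lock as defined in mm/hugetlb.c */
DEFINE_SPINLOCK(hugetlb_lock);

void free_huge_page(struct page *page)
{
        /*
         * Disable softirqs while holding the lock so a block
         * completion softirq on this CPU cannot re-enter
         * free_huge_page() and deadlock on hugetlb_lock.
         */
        spin_lock_bh(&hugetlb_lock);
        /* ... return the page to the free lists / surplus pool ... */
        spin_unlock_bh(&hugetlb_lock);
}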

We discussed this exact same issue more than a year ago.  See
https://lkml.org/lkml/2018/9/5/398

At that time, the only known caller of put_page for a hugetlbfs page from
softirq context was in powerpc-specific code.  IIRC, Aneesh addressed the
issue last year by modifying that powerpc code.  The more general issue
in the hugetlbfs code was never addressed. :(

That previous thread also discussed whether we should make put_page of
hugetlbfs pages safe only from softirq context or extend it to hardirq
context.  The conclusion (or at least the suggestion from Andrew and
Michal) was that we should modify the code to allow calls from hardirq
context.  The reasoning, IIRC, was that put_page of other pages is
allowed from hardirq context, so hugetlbfs pages should be no different.
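
To make the difference concrete (a sketch only, not a patch):

/* softirq-safe only -- what the posted patch does */
spin_lock_bh(&hugetlb_lock);
/* ... */
spin_unlock_bh(&hugetlb_lock);

/* hardirq-safe -- what was suggested back then */
unsigned long flags;

spin_lock_irqsave(&hugetlb_lock, flags);
/* ... */
spin_unlock_irqrestore(&hugetlb_lock, flags);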

Matthew, do you think that reasoning from last year is still valid?  Should
we be targeting soft or hard irq calls?

One other thing: free_huge_page may also take a subpool-specific lock via
spin_lock().  See hugepage_subpool_put_pages.  That lock would also need
to take irq context into account; see the sketch below.
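
If we go the hardirq route, something like the following would be needed
there as well.  This is an untested sketch; IIRC the real function is
more involved and hands the lock off to unlock_or_release_subpool:

static long hugepage_subpool_put_pages(struct hugepage_subpool *spool,
                                       long delta)
{
        unsigned long flags;
        long ret = delta;

        if (!spool)
                return ret;

        spin_lock_irqsave(&spool->lock, flags);
        /* ... adjust the subpool usage/reservation counts ... */
        spin_unlock_irqrestore(&spool->lock, flags);

        return ret;
}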
-- 
Mike Kravetz
