All of lore.kernel.org
 help / color / mirror / Atom feed
From: Larry Woodman <lwoodman@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Jeremy Fitzhardinge <jeremy@goop.org>,
	"Xen-devel@lists.xensource.com" <Xen-devel@lists.xensource.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	Hugh Dickins <hughd@google.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Jan Beulich <JBeulich@novell.com>, Andi Kleen <ak@suse.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <jweiner@redhat.com>
Subject: Re: [PATCH] fix pgd_lock deadlock
Date: Tue, 15 Feb 2011 14:31:30 -0500	[thread overview]
Message-ID: <4D5AD492.5030500@redhat.com> (raw)
In-Reply-To: <alpine.LFD.2.00.1102152020590.26192@localhost6.localdomain6>


[-- Attachment #1.1: Type: text/plain, Size: 2840 bytes --]

On 02/15/2011 02:26 PM, Thomas Gleixner wrote:
> On Tue, 15 Feb 2011, Andrea Arcangeli wrote:
>
>> Hello,
>>
>> Without this patch we can deadlock in the page_table_lock with NR_CPUS
>> <  4 or THP on, with this patch we hopefully won't deadlock in the
>> pgd_lock (if taken from irq). I can't see anything taking it from irq
>> (maybe aio? to check I also tried the libaio testuite with no apparent
>> VM_BUG_ON triggering), so unless somebody sees it, I think we should
>> apply it. I've been running for a while with this patch applied
>> without apparent problems. Other archs may follow suit if it's proven
>> that there's nothing taking the pgd_lock from irq.
>>
>> ===
>> Subject: fix pgd_lock deadlock
>>
>> From: Andrea Arcangeli<aarcange@redhat.com>
>>
>> It's forbidden to take the page_table_lock with the irq disabled or if there's
>> contention the IPIs (for tlb flushes) sent with the page_table_lock held will
>> never run leading to a deadlock.
> I really read this thing 5 times and still cannot make any sense of it.
>
> You talk about page_table_lock and then fiddle with pgd_lock.
>
> -ENOSENSE
>
> 	tglx
>
>
I put this expanation in the redhat BZ, says it all:


Larry Woodman <mailto:lwoodman@redhat.com> 2011-01-21 15:54:58 EST

The problem is with THP.  The page reclaim code calls page_referenced_one()
which takes the mm->page_table_lock on one CPU before sending an IPI to other
CPU(s):

On CPU1 we take the mm->page_table_lock, send IPIs and wait for a response:
page_referenced_one(...)
         if (unlikely(PageTransHuge(page))) {
                 pmd_t *pmd;

                 spin_lock(&mm->page_table_lock);
                 pmd = page_check_address_pmd(page, mm, address,
                                              PAGE_CHECK_ADDRESS_PMD_FLAG);
                 if (pmd&&  !pmd_trans_splitting(*pmd)&&
                     pmdp_clear_flush_young_notify(vma, address, pmd))
                         referenced++;
                 spin_unlock(&mm->page_table_lock);
         } else {


CPU2 can race in vmalloc_sync_all() because it disables interrupt(preventing a
response to the IPI from CPU1) and takes the pgd_lock then spins in the
mm->page_table_lock which is already held on CPU1.

                 spin_lock_irqsave(&pgd_lock, flags);
                 list_for_each_entry(page,&pgd_list, lru) {
                         pgd_t *pgd;
                         spinlock_t *pgt_lock;

                         pgd = (pgd_t *)page_address(page) + pgd_index(address);

                         pgt_lock =&pgd_page_get_mm(page)->page_table_lock;
                         spin_lock(pgt_lock);


At this point the system is deadlocked.  The pmdp_clear_flush_young_notify
needs to do its PDG business with the page_table_lock held then release that
lock before sending the IPIs to the other CPUs.


Larry


[-- Attachment #1.2: Type: text/html, Size: 3838 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

  reply	other threads:[~2011-02-15 19:31 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-14 20:56 [PATCH] x86: hold mm->page_table_lock while doing vmalloc_sync Jeremy Fitzhardinge
2010-10-14 20:56 ` Jeremy Fitzhardinge
2010-10-15 17:07 ` [Xen-devel] " Jeremy Fitzhardinge
2010-10-15 17:07   ` Jeremy Fitzhardinge
2010-10-19 22:17   ` [tip:x86/mm] x86, mm: Hold " tip-bot for Jeremy Fitzhardinge
2010-10-20 10:36     ` Borislav Petkov
2010-10-20 19:31       ` [tip:x86/mm] x86, mm: Fix incorrect data type in vmalloc_sync_all() tip-bot for tip-bot for Jeremy Fitzhardinge
2010-10-20 19:50         ` Borislav Petkov
2010-10-20 19:53           ` H. Peter Anvin
2010-10-20 20:10             ` Borislav Petkov
2010-10-20 20:13               ` H. Peter Anvin
2010-10-20 22:11                 ` Borislav Petkov
2010-10-20 21:26             ` Ben Pfaff
2010-10-20 19:58       ` tip-bot for Borislav Petkov
2010-10-21 21:06 ` [PATCH] x86: hold mm->page_table_lock while doing vmalloc_sync Jeremy Fitzhardinge
2010-10-21 21:06   ` Jeremy Fitzhardinge
2010-10-21 21:26   ` H. Peter Anvin
2010-10-21 21:26     ` H. Peter Anvin
2010-10-21 21:34     ` Jeremy Fitzhardinge
2011-02-03  2:48   ` Andrea Arcangeli
2011-02-03 20:44     ` Jeremy Fitzhardinge
2011-02-03 20:44       ` Jeremy Fitzhardinge
2011-02-04  1:21       ` Andrea Arcangeli
2011-02-04 21:27         ` Jeremy Fitzhardinge
2011-02-07 23:20           ` Andrea Arcangeli
2011-02-15 19:07             ` [PATCH] fix pgd_lock deadlock Andrea Arcangeli
2011-02-15 19:26               ` Thomas Gleixner
2011-02-15 19:31                 ` Larry Woodman [this message]
2011-02-15 19:54                 ` Andrea Arcangeli
2011-02-15 19:54                   ` Andrea Arcangeli
2011-02-15 20:05                   ` Thomas Gleixner
2011-02-15 20:05                     ` Thomas Gleixner
2011-02-15 20:26                     ` Thomas Gleixner
2011-02-15 20:26                       ` Thomas Gleixner
2011-02-15 22:52                       ` Andrea Arcangeli
2011-02-15 23:03                         ` Thomas Gleixner
2011-02-15 23:03                           ` Thomas Gleixner
2011-02-15 23:17                           ` Andrea Arcangeli
2011-02-16  9:58                             ` Peter Zijlstra
2011-02-16 10:15                               ` Andrea Arcangeli
2011-02-16 10:15                                 ` Andrea Arcangeli
2011-02-16 10:28                                 ` Ingo Molnar
2011-02-16 10:28                                   ` Ingo Molnar
2011-02-16 14:49                                   ` Andrea Arcangeli
2011-02-16 16:26                                     ` Rik van Riel
2011-02-16 16:26                                       ` Rik van Riel
2011-02-16 20:15                                     ` Ingo Molnar
2012-04-23  9:07                                     ` [2.6.32.y][PATCH] " Philipp Hahn
2012-04-23 19:09                                       ` Willy Tarreau
2011-02-16 18:33                     ` [PATCH] " Andrea Arcangeli
2011-02-16 21:34                       ` Konrad Rzeszutek Wilk
2011-02-17 10:19                       ` Johannes Weiner
2011-02-21 14:30                         ` Andrea Arcangeli
2011-02-21 14:53                           ` Johannes Weiner
2011-02-22  7:48                             ` Jan Beulich
2011-02-22 13:49                               ` Andrea Arcangeli
2011-02-22 14:22                                 ` Jan Beulich
2011-02-22 14:22                                   ` Jan Beulich
2011-02-22 14:34                                   ` Andrea Arcangeli
2011-02-22 17:08                                     ` Jeremy Fitzhardinge
2011-02-22 17:08                                       ` Jeremy Fitzhardinge
2011-02-22 17:13                                       ` Andrea Arcangeli
2011-02-22 17:13                                         ` Andrea Arcangeli
2011-02-24  4:22                                   ` Andrea Arcangeli
2011-02-24  8:23                                     ` Jan Beulich
2011-02-24 14:11                                       ` Andrea Arcangeli
2011-02-21 17:40                         ` Jeremy Fitzhardinge
2011-02-03 20:59     ` [PATCH] x86: hold mm->page_table_lock while doing vmalloc_sync Larry Woodman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D5AD492.5030500@redhat.com \
    --to=lwoodman@redhat.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=JBeulich@novell.com \
    --cc=Xen-devel@lists.xensource.com \
    --cc=aarcange@redhat.com \
    --cc=ak@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=jeremy@goop.org \
    --cc=jweiner@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.