From: Dan Williams <dan.j.williams@intel.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>, Andi Kleen <ak@linux.intel.com>,
	Wu Fengguang <fengguang.wu@intel.com>, "hch@lst.de" <hch@lst.de>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"tony.luck@intel.com" <tony.luck@intel.com>
Subject: Re: [PATCH 07/11] mm, madvise_inject_error: fix page count leak
Date: Thu, 24 May 2018 13:55:04 -0700
Message-ID: <CAPcyv4hL5+ZfHjnYYtoioB5AK5Ukpg99d6eYWTKSeJc6uHxkyg@mail.gmail.com>
In-Reply-To: <20180523041954.GA16285@hori1.linux.bs1.fc.nec.co.jp>

On Tue, May 22, 2018 at 9:19 PM, Naoya Horiguchi
<n-horiguchi@ah.jp.nec.com> wrote:
> On Tue, May 22, 2018 at 07:40:09AM -0700, Dan Williams wrote:
>> The madvise_inject_error() routine uses get_user_pages() to look up the
>> pfn and other information for the injected error, but it fails to
>> release that pin.
>>
>> The dax-dma-vs-truncate warning catches this failure with the following
>> signature:
>>
>>  Injecting memory failure for pfn 0x208900 at process virtual address 0x7f3908d00000
>>  Memory failure: 0x208900: reserved kernel page still referenced by 1 users
>>  Memory failure: 0x208900: recovery action for reserved kernel page: Failed
>>  WARNING: CPU: 37 PID: 9566 at fs/dax.c:348 dax_disassociate_entry+0x4e/0x90
>>  CPU: 37 PID: 9566 Comm: umount Tainted: G        W  OE     4.17.0-rc6+ #1900
>>  [..]
>>  RIP: 0010:dax_disassociate_entry+0x4e/0x90
>>  RSP: 0018:ffffc9000a9b3b30 EFLAGS: 00010002
>>  RAX: ffffea0008224000 RBX: 0000000000208a00 RCX: 0000000000208900
>>  RDX: 0000000000000001 RSI: ffff8804058c6160 RDI: 0000000000000008
>>  RBP: 000000000822000a R08: 0000000000000002 R09: 0000000000208800
>>  R10: 0000000000000000 R11: 0000000000208801 R12: ffff8804058c6168
>>  R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000001
>>  FS:  00007f4548027fc0(0000) GS:ffff880431d40000(0000) knlGS:0000000000000000
>>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>  CR2: 000056316d5f8988 CR3: 00000004298cc000 CR4: 00000000000406e0
>>  Call Trace:
>>   __dax_invalidate_mapping_entry+0xab/0xe0
>>   dax_delete_mapping_entry+0xf/0x20
>>   truncate_exceptional_pvec_entries.part.14+0x1d4/0x210
>>   truncate_inode_pages_range+0x291/0x920
>>   ? kmem_cache_free+0x1f8/0x300
>>   ? lock_acquire+0x9f/0x200
>>   ? truncate_inode_pages_final+0x31/0x50
>>   ext4_evict_inode+0x69/0x740
>>
>> Cc: <stable@vger.kernel.org>
>> Fixes: bd1ce5f91f54 ("HWPOISON: avoid grabbing the page count...")
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Andi Kleen <ak@linux.intel.com>
>> Cc: Wu Fengguang <fengguang.wu@intel.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  mm/madvise.c |   11 ++++++++---
>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 4d3c922ea1a1..246fa4d4eee2 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -631,11 +631,13 @@ static int madvise_inject_error(int behavior,
>>
>>
>>       for (; start < end; start += PAGE_SIZE << order) {
>> +             unsigned long pfn;
>>               int ret;
>>
>>               ret = get_user_pages_fast(start, 1, 0, &page);
>>               if (ret != 1)
>>                       return ret;
>> +             pfn = page_to_pfn(page);
>>
>>               /*
>>                * When soft offlining hugepages, after migrating the page
>> @@ -651,17 +653,20 @@ static int madvise_inject_error(int behavior,
>>
>>               if (behavior == MADV_SOFT_OFFLINE) {
>>                       pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
>> -                                             page_to_pfn(page), start);
>> +                                     pfn, start);
>>
>>                       ret = soft_offline_page(page, MF_COUNT_INCREASED);
>> +                     put_page(page);
>>                       if (ret)
>>                               return ret;
>>                       continue;
>>               }
>> +             put_page(page);
>
> We keep the page count pinned after the isolation of the error page
> in order to make sure that the error page is disabled and never reused.
> This does not seem explicit enough, so a comment would be helpful.

As far as I can see, the extra reference count that keeps the page from
being reused should be taken internally by memory_failure(), not assumed
from the inject-error path. I might be overlooking something, but I do
not see who is responsible for taking this extra reference when
memory_failure() is called by the machine check code rather than by
madvise_inject_error().
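
To make the refcount argument concrete, here is a minimal userspace
sketch. It is a toy model, not kernel code: struct toy_page and every
toy_* function are hypothetical names invented for illustration, and the
"handler keeps one reference" behavior is an assumption standing in for
the proposal above, not a description of what memory_failure() does.

```c
#include <assert.h>

/* Toy stand-in for struct page; only the reference count is modeled. */
struct toy_page {
	int refcount;
};

static void toy_get_page(struct toy_page *p)
{
	p->refcount++;
}

static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/*
 * Hypothetical handler: it takes, and deliberately keeps, its own
 * reference so the failed page cannot be reused. It makes no
 * assumption about references held by its caller.
 */
static int toy_memory_failure(struct toy_page *p)
{
	toy_get_page(p);
	return 0;
}

/* Machine-check style caller: no lookup pin, calls the handler directly. */
static int toy_machine_check(struct toy_page *p)
{
	return toy_memory_failure(p);
}

/*
 * Injection style caller: pins the page to look it up, calls the
 * handler, then drops its own pin. Passing leak_pin != 0 models the
 * missing put_page() that the patch addresses.
 */
static int toy_inject_error(struct toy_page *p, int leak_pin)
{
	int ret;

	toy_get_page(p);		/* models the get_user_pages_fast() pin */
	ret = toy_memory_failure(p);
	if (!leak_pin)
		toy_put_page(p);	/* models the added put_page() */
	return ret;
}
```

In this model both callers leave the page with exactly one extra
reference, owned by the handler, while the leaky injection path leaves
two. That is the asymmetry between the two entry points raised above.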

>
> BTW, looking at the kernel message like "Memory failure: 0x208900:
> reserved kernel page still referenced by 1 users", memory_failure()
> considers dev_pagemap pages as "reserved kernel pages" (MF_MSG_KERNEL).
> If the memory error handler recovers a dev_pagemap page in its own
> special way, we can define a new action_page_types entry like MF_MSG_DAX.
> Reporting like "Memory failure: 0xXXXXX: recovery action for dax page:
> Failed" might be helpful from an end user's perspective.

Sounds good, I'll take a look at this.
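
A rough userspace sketch of what such a table entry could look like
follows. The toy_* identifiers are hypothetical and only mirror the
names mentioned in the review. The "dax page" string and the message
format come from the suggestion quoted above, not from any merged code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical message types, modeled on MF_MSG_KERNEL and the proposed MF_MSG_DAX. */
enum toy_mf_msg {
	TOY_MF_MSG_KERNEL,
	TOY_MF_MSG_DAX,
};

/* Human-readable page type names, indexed by message type. */
static const char * const toy_action_page_types[] = {
	[TOY_MF_MSG_KERNEL]	= "reserved kernel page",
	[TOY_MF_MSG_DAX]	= "dax page",
};

/*
 * Format a report line in the style quoted in the review. Returns the
 * snprintf() result, i.e. the length the full line needs.
 */
static int toy_format_report(char *buf, size_t len, unsigned long pfn,
			     enum toy_mf_msg type, const char *result)
{
	return snprintf(buf, len,
			"Memory failure: %#lx: recovery action for %s: %s",
			pfn, toy_action_page_types[type], result);
}
```

Called with pfn 0x208900, TOY_MF_MSG_DAX and "Failed", this yields
"Memory failure: 0x208900: recovery action for dax page: Failed".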
