All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: aneesh.kumar@linux.vnet.ibm.com, Linux-MM <linux-mm@kvack.org>,
	Michal Hocko <mhocko@kernel.org>,
	mm-commits@vger.kernel.org, naoya.horiguchi@nec.com,
	Oscar Salvador <osalvador@suse.de>,
	stable <stable@vger.kernel.org>, Tony Luck <tony.luck@intel.com>
Subject: Re: [patch 23/29] mm, hwpoison: do not lock page again when me_huge_page() successfully recovers
Date: Sat, 13 Mar 2021 11:23:40 -0800	[thread overview]
Message-ID: <CAHk-=wht_gk_d9k+NZs7eJvBeLOQT4xGcykgaCRHuiQ+LbReRw@mail.gmail.com> (raw)
In-Reply-To: <20210313050820.EoPGLS3gj%akpm@linux-foundation.org>

On Fri, Mar 12, 2021 at 9:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Subject: mm, hwpoison: do not lock page again when me_huge_page() successfully recovers

This patch can not possibly be correct, and is dangerous and very very wrong.

>  out:
> -       unlock_page(head);
> +       if (PageLocked(head))
> +               unlock_page(head);

You absolutely CANNOT do things like this. It is fundamentally insane.

You have two situations:

 (a) you know you locked the page.

     In this case an unconditional "unlock_page()" is the only correct
thing to do.

 (b) you don't know whether you maybe unlocked it again.

     In this case SOMEBODY ELSE might have locked the page, and you
doing "if it's locked, then unlock it" is pure and utter drivel, and
fundamentally and seriously wrong. You're unlocking SOMEBODY ELSES
lock, after having already unlocked your own once.

Now, it is possible that this kind of pattern could be ok if you
absolutely 100% know - and it is obvious from the code - that you are
the only thread that can possibly access the page. But if that is the
case, then the page should never have been locked in the first place,
because that would be an insane and pointless operation, since the
whole - and only - point of locking is to enforce some kind of
exclusivity - and if exclusivity is explicit in the code-path, then
locking the page is pointless.

And yes, in this case I can see a remote theoretical model: "all good
cases keep the page locked, and the only trhing that unlocks the page
also guarantees nothing else can possiblly see it".

But no. I don't actually believe this is fuaranteed to the case here,
and even if it were, I don't think this is a code sequence we can or
should accept.

End result: there is no possible situation that I can think of where the above

       if (PageLocked(head))
               unlock_page(head);

kind of sequence can EVER possibly be correct, and it shows a complete
disregard of everything that locking is all about.

Now, the memory error handling may be so special that this effectively
"works" in practice, but there is no way I can apply such a
fundamentally broken patch.

I see two options:

 - make the result of identify_page_state() *tell* you whether the
page was already unlocked (and then make the unlock conditional on
*that*)

   This is valid, because now you removed that whole "somebody else
might have transferred the lock" issue: you know you already unlocked
the page based on the return value, so you don't unlock it again.
That's perfectly valid.

 - probably better: make identify_page_state() itself always unlock
the page, and turn the two different code sequences that currently do

        res = identify_page_state(pfn, p, page_flags);
  out:
        unlock_page(head);
        return res;

into just doing

        return identify_page_state(pfn, p, page_flags);
  out:
        unlock_page(head);
        return -EBUSY;

instead, and move that now pointless "res" variable into the only
if-statement that actually uses it (for other things entirely! It
should be a "enum mf_result" there!)

And yes, that second (and imho better) rule means that now
{page_action()" itself needs to be the thing that unlocks the page,
and that in turn might be done a few different ways:

 (a) either add a separate "MF_UNLOCKED" state bit to the possible
return codes and make that me_huge_page() that unlocks the page set
that bit (NOTE! It still needs to also return either MF_FAILED or
MF_RECOVERED, so this "MF_UNLOCKED" bit really should be a separate
bitmask entirely from the other MF_xyz bits)

 (b) make the rule be that *all* the ->action() functions need to just
unlock the page.

( suspect (b) is the better model to aim for, because honestly, static
code rules for locking are basically almost always superior to dynamic
"based on this flag" locking rules. You can in theory actually have
static tooling check that the locking nests correctly with the
unlocking that way (and it's just conceptually simpler to have a hard
rule about "this function always unlocks the page").

But that existing patch is *so* fundamentally wrong that I cannot
possibly apply it, and I feel bad about the fact that it has made it
all the way to me with sign-offs and reviewed-by's and everything.

                  Linus

  reply	other threads:[~2021-03-13 19:24 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-13  5:06 incoming Andrew Morton
2021-03-13  5:07 ` [patch 01/29] memblock: fix section mismatch warning Andrew Morton
2021-03-13  5:07 ` [patch 02/29] stop_machine: mark helpers __always_inline Andrew Morton
2021-03-13  5:07 ` [patch 03/29] init/Kconfig: make COMPILE_TEST depend on HAS_IOMEM Andrew Morton
2021-03-13  5:07 ` [patch 04/29] mm/page_alloc.c: refactor initialization of struct page for holes in memory layout Andrew Morton
2021-03-13  5:07 ` [patch 05/29] mm/fork: clear PASID for new mm Andrew Morton
2021-03-13  5:07 ` [patch 06/29] hugetlb: dedup the code to add a new file_region Andrew Morton
2021-03-13  5:07 ` [patch 07/29] hugetlb: break earlier in add_reservation_in_range() when we can Andrew Morton
2021-03-13  5:07 ` [patch 08/29] mm: introduce page_needs_cow_for_dma() for deciding whether cow Andrew Morton
2021-03-13  5:07 ` [patch 09/29] mm: use is_cow_mapping() across tree where proper Andrew Morton
2021-03-13  5:07 ` [patch 10/29] hugetlb: do early cow when page pinned on src mm Andrew Morton
2021-03-13  5:07 ` [patch 11/29] mm/highmem.c: fix zero_user_segments() with start > end Andrew Morton
2021-03-13  5:07 ` [patch 12/29] binfmt_misc: fix possible deadlock in bm_register_write Andrew Morton
2021-03-13  5:07 ` [patch 13/29] MAINTAINERS: exclude uapi directories in API/ABI section Andrew Morton
2021-03-13  5:07 ` [patch 14/29] linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP* Andrew Morton
2021-03-13  5:07 ` [patch 15/29] kfence: fix printk format for ptrdiff_t Andrew Morton
2021-03-13  5:07 ` [patch 16/29] kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations Andrew Morton
2021-03-13  5:08 ` [patch 17/29] kfence: fix reports if constant function prefixes exist Andrew Morton
2021-03-13  5:08 ` [patch 18/29] include/linux/sched/mm.h: use rcu_dereference in in_vfork() Andrew Morton
2021-03-13  5:08 ` [patch 19/29] mm/madvise: replace ptrace attach requirement for process_madvise Andrew Morton
2021-03-13  5:08 ` [patch 20/29] kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC Andrew Morton
2021-03-13  5:08 ` [patch 21/29] kasan: fix KASAN_STACK dependency for HW_TAGS Andrew Morton
2021-03-13  5:08 ` [patch 22/29] mm/userfaultfd: fix memory corruption due to writeprotect Andrew Morton
2021-03-13  5:08 ` [patch 23/29] mm, hwpoison: do not lock page again when me_huge_page() successfully recovers Andrew Morton
2021-03-13 19:23   ` Linus Torvalds [this message]
2021-03-13 19:23     ` Linus Torvalds
2021-03-14  6:36     ` HORIGUCHI NAOYA(堀口 直也)
2021-03-13  5:08 ` [patch 24/29] ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls Andrew Morton
2021-03-13  5:08 ` [patch 25/29] ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign Andrew Morton
2021-03-13  5:08 ` [patch 26/29] mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument Andrew Morton
2021-03-13  5:08 ` [patch 27/29] mm/memcg: set memcg when splitting page Andrew Morton
2021-03-13  5:08 ` [patch 28/29] zram: fix return value on writeback_store Andrew Morton
2021-03-13  5:08 ` [patch 29/29] zram: fix broken page writeback Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wht_gk_d9k+NZs7eJvBeLOQT4xGcykgaCRHuiQ+LbReRw@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mm-commits@vger.kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=stable@vger.kernel.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.