All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@ziepe.ca>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Minchan Kim <minchan@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	John Dias <joaodias@google.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH v4] mm: fix is_pinnable_page against on cma page
Date: Tue, 17 May 2022 16:28:25 -0300	[thread overview]
Message-ID: <20220517192825.GM63055@ziepe.ca> (raw)
In-Reply-To: <ef901bdb-6117-bad4-2f26-c80df270b86e@nvidia.com>

On Tue, May 17, 2022 at 11:12:37AM -0700, John Hubbard wrote:
> On 5/17/22 07:00, Jason Gunthorpe wrote:
> > > > It does change the generated code slightly. I don't know if this will
> > > > affect performance here or not. But just for completeness, here you go:
> > > > 
> > > > free_one_page() originally has this (just showing the changed parts):
> > > > 
> > > >      mov    0x8(%rdx,%rax,8),%rbx
> > > >      and    $0x3f,%ecx
> > > >      shr    %cl,%rbx
> > > >      and    $0x7,
> > > > 
> > > > 
> > > > And after applying this diff:
> > > > 
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 0e42038382c1..df1f8e9a294f 100644
> > > > +++ b/mm/page_alloc.c
> > > > @@ -482,7 +482,7 @@ unsigned long __get_pfnblock_flags_mask(const struct
> > > > page *page,
> > > >          word_bitidx = bitidx / BITS_PER_LONG;
> > > >          bitidx &= (BITS_PER_LONG-1);
> > > > 
> > > > -       word = bitmap[word_bitidx];
> > > > +       word = READ_ONCE(bitmap[word_bitidx]);
> > > >          return (word >> bitidx) & mask;
> > > >   }
> > > > 
> > > > 
> > > > ...it now does an extra memory dereference:
> > > > 
> > > >      lea    0x8(%rdx,%rax,8),%rax
> > > >      and    $0x3f,%ecx
> > > >      mov    (%rax),%rbx
> > > >      shr    %cl,%rbx
> > > >      and    $0x7,%ebx
> > 
> > Where is the extra memory reference? 'lea' is not a memory reference,
> > it is just some maths?
> 
> If you compare this to the snippet above, you'll see that there is
> an extra mov statement, and that one dereferences a pointer from
> %rax:
> 
>     mov    (%rax),%rbx

That is the same move as:

   mov    0x8(%rdx,%rax,8),%rbx

Except that the EA calculation was done in advance and stored in rax.

lea isn't a memory reference, it is just computing the pointer value 
that 0x8(%rdx,%rax,8) represents. ie the lea computes

  %rax = %rdx + %rax*8 + 6

Which is then fed into the mov. Maybe it is an optimization to allow
one pipe to do the shr and an other to the EA - IDK, seems like a
random thing for the compiler to do.
> > IMHO putting a READ_ONCE on something that is not a memory load from
> > shared data is nonsense - if a simple == has a stability risk then so
> > does the '(word >> bitidx) & mask'.
> 
> Doing something like this:
> 
>     int __x = y();
>     int x = READ_ONCE(__x);
> 
> is just awful! I agree. Really, y() should handle any barriers, because
> otherwise it really does look pointless, and people reading the code
> need something that is clearer. My first reaction was that this was
> pointless and wrong, and it turns out that that's only about 80% true:
> as long as LTO-of-the-future doesn't arrive, and as long as no one
> refactors y() to be inline.

It is always pointless, because look at the whole flow:

 y():
   word = bitmap[word_bitidx];
   return (word >> bitidx) & mask;

 int __x = y()
 int x = READ_ONCE(__x)

The requirement here is for a coherent read of bitmap[word_bitidx]
that is not mangled by multi-load, load tearing or other theoritcal
compiler problems. Stated more clearly if bitmap[] has values {A,B,C}
then the result of all this in x should always be {A',B',C'} according
to the given maths, never, ever an impossible value D.

The nature of the compiler problems we are trying to prevent is that
the compiler my take a computation that seems deterministic and
corrupt it somehow by making an assumption that the memory is stable
when it is not.

For instance == may not work right if the compiler decides to do:

 if (READ_ONCE(a) == 1 && READ_ONCE(a) == 1)

This also means that >> and & could malfunction. So when LTO expands
and inlines everything and you get this:

 if (READ_ONCE((bitmap[word_bitidx] >> bitidx) & mask) == MIGRATE_CMA)

It is still not safe because the >> and & could have multi-loaded or
torn the read in some way that it is no longer a sane
calculation. (for instance imagine that the compiler decides to read
the value byte-at-a-time without READ_ONCE)

ie the memory may have had A racing turning into C but x computed into
D not A'.

Paul can correct me, but I understand we do not have a list of allowed
operations that are exempted from the READ_ONCE() requirement. ie it
is not just conditional branching that requires READ_ONCE().

This is why READ_ONCE() must always be on the memory load, because the
point is to sanitize away the uncertainty that comes with an unlocked
read of unstable memory contents. READ_ONCE() samples the value in
memory, and removes all tearing, multiload, etc "instability" that may
effect down stream computations. In this way down stream compulations
become reliable.

Jason

  reply	other threads:[~2022-05-17 19:28 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-10 21:17 [PATCH v4] mm: fix is_pinnable_page against on cma page Minchan Kim
2022-05-10 22:56 ` John Hubbard
2022-05-10 23:31   ` Minchan Kim
2022-05-10 23:58     ` John Hubbard
2022-05-11  0:09       ` Minchan Kim
2022-05-11  4:32         ` John Hubbard
2022-05-11 21:46           ` Minchan Kim
2022-05-11 22:25             ` John Hubbard
2022-05-11 22:37               ` Minchan Kim
2022-05-11 22:49                 ` John Hubbard
2022-05-11 23:08                   ` Minchan Kim
2022-05-11 23:13                     ` John Hubbard
2022-05-11 23:15                       ` Minchan Kim
2022-05-11 23:28                         ` Minchan Kim
2022-05-11 23:33                           ` John Hubbard
2022-05-11 23:45                       ` Paul E. McKenney
2022-05-11 23:57                         ` John Hubbard
2022-05-12  0:12                           ` Paul E. McKenney
2022-05-12  0:12                           ` John Hubbard
2022-05-12  0:22                             ` Paul E. McKenney
2022-05-12  0:26                               ` Minchan Kim
2022-05-12  0:34                                 ` John Hubbard
2022-05-12  0:49                                   ` Paul E. McKenney
2022-05-12  1:02                                     ` John Hubbard
2022-05-12  1:03                                     ` Minchan Kim
2022-05-12  1:08                                       ` John Hubbard
2022-05-12  2:18                                         ` John Hubbard
2022-05-12  3:44                                           ` Minchan Kim
2022-05-12  4:47                                             ` John Hubbard
2022-05-17 14:00                                             ` Jason Gunthorpe
2022-05-17 18:12                                               ` John Hubbard
2022-05-17 19:28                                                 ` Jason Gunthorpe [this message]
2022-05-17 20:12                                                   ` John Hubbard
2022-05-17 20:21                                                     ` Paul E. McKenney
2022-05-23 16:33                                                     ` Minchan Kim
2022-05-24  2:55                                                       ` John Hubbard
2022-05-24  5:16                                                         ` Minchan Kim
2022-05-24  6:22                                                           ` John Hubbard
2022-05-24 14:19                                                           ` Jason Gunthorpe
2022-05-24 15:43                                                             ` Minchan Kim
2022-05-24 15:48                                                               ` Jason Gunthorpe
2022-05-24 16:37                                                                 ` Paul E. McKenney
2022-05-24 16:59                                                                   ` Minchan Kim
2022-05-12  3:57                                           ` Paul E. McKenney
2022-05-12  1:03                                   ` Minchan Kim
2022-05-12  0:35                                 ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220517192825.GM63055@ziepe.ca \
    --to=jgg@ziepe.ca \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=joaodias@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=paulmck@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.