Re: [PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma_lock

From: Suren Baghdasaryan <surenb@google.com>
To: Hillf Danton <hdanton@sina.com>
Cc: vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net,
	peterz@infradead.org, hughd@google.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma_lock
Date: Mon, 16 Jan 2023 20:52:45 -0800	[thread overview]
Message-ID: <CAJuCfpFq23m-KYKaDoCS2K2aM8rO7j8aPa0nVPs-_xP2Sf6GGg@mail.gmail.com> (raw)
In-Reply-To: <20230117031632.2321-1-hdanton@sina.com>

On Mon, Jan 16, 2023 at 7:16 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On Mon, 16 Jan 2023 15:08:48 -0800 Suren Baghdasaryan <surenb@google.com>
> > On Mon, Jan 16, 2023 at 6:07 AM Hillf Danton <hdanton@sina.com> wrote:
> > > On Mon, 9 Jan 2023 12:53:36 -0800 Suren Baghdasaryan <surenb@google.com>
> > > >  static inline bool vma_read_trylock(struct vm_area_struct *vma)
> > > >  {
> > > >       /* Check before locking. A race might cause false locked result. */
> > > > -     if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > > > +     if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > > >               return false;
> > >
> > > Add mb to pair with the above wmb like
> >
> > The wmb above is to ensure the ordering between updates of lock_seq
> > and vm_lock->count (lock_seq is updated first and vm_lock->count only
> > after that). The first access to vm_lock->count in this function is
> > atomic_inc_unless_negative() and it's an atomic RMW operation with a
> > return value. According to documentation such functions are fully
> > ordered, therefore I think we already have an implicit full memory
> > barrier between reads of lock_seq and vm_lock->count here. Am I wrong?
>
> No you are not.

I'm not wrong or the other way around? Please expand a bit.

> Revisit it in case of full mb not ensured.
>
> #ifndef arch_atomic_inc_unless_negative
> static __always_inline bool
> arch_atomic_inc_unless_negative(atomic_t *v)
> {
>         int c = arch_atomic_read(v);
>
>         do {
>                 if (unlikely(c < 0))
>                         return false;
>         } while (!arch_atomic_try_cmpxchg(v, &c, c + 1));
>
>         return true;
> }
> #define arch_atomic_inc_unless_negative arch_atomic_inc_unless_negative
> #endif

I think your point is that the full mb is not ensured here, specifically if c<0?

>
> > >         int count = atomic_read(&vma->vm_lock->count);
> > >         for (;;) {
> > >                 int new = count + 1;
> > >
> > >                 if (count < 0 || new < 0)
> > >                         return false;
> > >
> > >                 new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
> > >                 if (new == count)
> > >                         break;
> > >                 count = new;
> > >                 cpu_relax();
> > >         }
> > >
> > >         (wake up waiting readers after taking the lock;
> > >         but the write lock owner before this read trylock should be
> > >         responsible for waking waiters up.)
> > >
> > >         lock acquire for read.
> >
> > This schema might cause readers to wait, which is not an exact
>
> Could you specify a bit more on wait?

Oh, I misunderstood your intent with that for() loop. Indeed, if a
writer took the lock, the count will be negative and it terminates
with a failure. Yeah, I think that would work.

>
> > replacement for down_read_trylock(). The requirement to wake up
> > waiting readers also complicates things
>
> If the writer lock owner is preempted by a reader while releasing lock,
>
>         set count to zero
>                           <-- preempt
>         wake up waiters
>
> then lock is owned by reader but with read waiters.
>
> That is buggy if write waiter starvation is allowed in this patchset.

I don't quite understand your point here. Readers don't wait, so there
can't be "read waiters". Could you please expand with a race diagram
maybe?

>
> > and since we can always fall
> > back to mmap_lock, that complication is unnecessary IMHO. I could use
> > part of your suggestion like this:
> >
> >                  int new = count + 1;
> >
> >                  if (count < 0 || new < 0)
> >                          return false;
> >
> >                  new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
> >                  if (new == count)
> >                          return false;
> >
> > Compared to doing atomic_inc_unless_negative() first, like I did
> > originally, this schema opens a bit wider window for the writer to get
> > in the middle and cause the reader to fail locking but I don't think
> > it would result in any visible regression.
>
> It is definitely fine for writer to acquire the lock while reader is
> doing trylock, no?

Yes.

>