From: Dan Williams <dan.j.williams@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: linux-cxl@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Vishal L Verma <vishal.l.verma@intel.com>,
	"Weiny, Ira" <ira.weiny@intel.com>,
	"Schofield, Alison" <alison.schofield@intel.com>
Subject: Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations
Date: Tue, 30 Mar 2021 12:00:23 -0700
Message-ID: <CAPcyv4igMvwfZNgi-Uap_QUJi+uocMUD3KZBhXUy56AuHZQtqw@mail.gmail.com>
In-Reply-To: <20210330175431.GX2356281@nvidia.com>

On Tue, Mar 30, 2021 at 10:54 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Mar 30, 2021 at 10:31:15AM -0700, Dan Williams wrote:
> > On Tue, Mar 30, 2021 at 10:03 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > On Tue, Mar 30, 2021 at 09:05:29AM -0700, Dan Williams wrote:
> > >
> > > > > If you can't clearly point to the *data* under RCU protection it is
> > > > > being used wrong.
> > > >
> > > > Agree.
> > > >
> > > > The data being protected is the value of
> > > > dev->kobj.state_in_sysfs. The
> > >
> > > So where is that read under:
> > >
> > > +       idx = srcu_read_lock(&cxl_memdev_srcu);
> > > +       rc = __cxl_memdev_ioctl(cxlmd, cmd, arg);
> > > +       srcu_read_unlock(&cxl_memdev_srcu, idx);
> > >
> > > ?
> >
> > device_is_registered() inside __cxl_memdev_ioctl().
>
> Oh, I see, I missed that
>
> > > It can't read the RCU protected data outside the RCU critical region,
> > > and it can't read/write RCU protected data without using the helper
> > > macros which insert the required barriers.
> >
> > The required barriers are there. srcu_read_lock() +
> > device_is_registered() is paired with cdev_device_del() +
> > synchronize_rcu().
>
> RCU needs barriers on the actual load/store just a naked
> device_is_registered() alone is not strong enough.
>
> > > IMHO this can't use 'dev->kobj.state_in_sysfs' as the RCU protected data.
> >
> > This usage of srcu is functionally equivalent to replacing
> > srcu_read_lock() with down_read() and the shutdown path with:
>
> Sort of, but the rules for load/store under RCU are different than for
> load/store under a normal barriered lock. All the data is unstable for
> instance and minimally needs READ_ONCE.

The data is unstable under the srcu_read_lock() until the end of the
next RCU grace period; synchronize_rcu() ensures that all active
srcu_read_lock() sections have completed. Unless Paul and I
misunderstood each other, this scheme of synchronizing object state is
also used in kill_dax(), and I put that comment there the last time
this question was raised. If srcu were being used to synchronize the
liveness of an RCU object like @cxlm or a new ops object, then I would
expect rcu_dereference() + rcu_assign_pointer() around usages of that
object. The liveness of the object in this case is handled outside of
srcu: by a kobject reference, or by an inode reference in the case of
kill_dax().

>
> > cdev_device_del(...);
> > down_write(...);
> > up_write(...);
>
> The lock would have to enclose the store to state_in_sysfs, otherwise
> as written it has the same data race problems.

There's no race above. The rule is that any possible observation of
->state_in_sysfs == 1, or rcu_dereference() != NULL, must be flushed.
After that value transitions to zero, or the RCU object is marked for
deletion, an RCU grace period is needed before that memory can be
freed. If an rwsem is used, the only requirement is that any read-side
sections that might have observed ->state_in_sysfs == 1 have ended,
which is why the down_write() / up_write() pair does not need to
surround the cdev_device_del(). It's sufficient to flush the read side
after the state is known to have changed. There are several examples of
an rwsem being used as a barrier like this:

drivers/mtd/ubi/wl.c:1432:      down_write(&ubi->work_sem);
drivers/mtd/ubi/wl.c-1433-      up_write(&ubi->work_sem);

drivers/scsi/cxlflash/main.c:2229:      down_write(&cfg->ioctl_rwsem);
drivers/scsi/cxlflash/main.c-2230-      up_write(&cfg->ioctl_rwsem);

fs/btrfs/block-group.c:355:     down_write(&space_info->groups_sem);
fs/btrfs/block-group.c-356-     up_write(&space_info->groups_sem);

fs/btrfs/disk-io.c:4189:        down_write(&fs_info->cleanup_work_sem);
fs/btrfs/disk-io.c-4190-        up_write(&fs_info->cleanup_work_sem);

net/core/net_namespace.c:629:   down_write(&pernet_ops_rwsem);
net/core/net_namespace.c-630-   up_write(&pernet_ops_rwsem);
