From: Cornelia Huck <cohuck@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Eric Farman <farman@linux.ibm.com>,
	David Airlie <airlied@linux.ie>,
	Tony Krowiak <akrowiak@linux.ibm.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Daniel Vetter <daniel@ffwll.ch>,
	dri-devel@lists.freedesktop.org,
	Harald Freudenberger <freude@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	intel-gfx@lists.freedesktop.org,
	intel-gvt-dev@lists.freedesktop.org,
	Jani Nikula <jani.nikula@linux.intel.com>,
	Jason Herne <jjherne@linux.ibm.com>,
	Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
	kvm@vger.kernel.org, Kirti Wankhede <kwankhede@nvidia.com>,
	linux-s390@vger.kernel.org,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	Peter Oberparleiter <oberpar@linux.ibm.com>,
	Halil Pasic <pasic@linux.ibm.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Vineeth Vijayan <vneethv@linux.ibm.com>,
	Zhenyu Wang <zhenyuw@linux.intel.com>,
	Zhi Wang <zhi.a.wang@intel.com>, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v2 0/9] Move vfio_ccw to the new mdev API
Date: Fri, 17 Sep 2021 16:37:44 +0200
Message-ID: <87ee9ngtdz.fsf@redhat.com>
In-Reply-To: <20210917125109.GE327412@nvidia.com>

On Fri, Sep 17 2021, Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Fri, Sep 17, 2021 at 01:59:16PM +0200, Cornelia Huck wrote:
>> >  		ret = cio_cancel_halt_clear(sch, &iretry);
>> > -
>> >  		if (ret == -EIO) {
>> >  			pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n",
>> >  			       sch->schid.ssid, sch->schid.sch_no);
>> > -			break;
>> > +			return ret;
>> 
>> Looking at this, I wonder why we had special-cased -EIO -- for -ENODEV
>> we should be done as well, as then the device is dead and we do not need
>> to disable it.
>
> cio_cancel_halt_clear() should probably succeed in that case.

It will actually give us -ENODEV, as the very first call in that
function will already fail.
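
To make the consequence concrete, here is a minimal userspace sketch of a
quiesce loop that treats -ENODEV as terminal in the same way as -EIO. It is
not the vfio-ccw code: struct model_sch, model_cancel_halt_clear() and
model_quiesce() are hypothetical stand-ins, and the only behaviour modelled
is the assumption stated above, namely that the first check in the function
fails with -ENODEV once the device is gone.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a subchannel; not the kernel's struct subchannel. */
struct model_sch {
	bool gone;       /* the device has disappeared */
	int busy_rounds; /* number of further attempts that report -EBUSY */
};

/*
 * Hypothetical model of cio_cancel_halt_clear(): the very first check
 * fails with -ENODEV when the device is gone. This model never returns
 * -EIO; the caller below still handles it, as the real loop does.
 */
static int model_cancel_halt_clear(struct model_sch *sch)
{
	if (sch->gone)
		return -ENODEV;
	if (sch->busy_rounds > 0) {
		sch->busy_rounds--;
		return -EBUSY;
	}
	return 0;
}

/* Quiesce loop: both -EIO and -ENODEV end the loop without retrying. */
static int model_quiesce(struct model_sch *sch)
{
	int ret;

	do {
		ret = model_cancel_halt_clear(sch);
		if (ret == -EIO || ret == -ENODEV)
			return ret;
	} while (ret == -EBUSY);

	return ret;
}
```

With a dead device the loop returns on the first iteration and never reaches
a disable step; with a busy device it retries until the model reports success.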

>
>> > @@ -413,13 +403,28 @@ static void fsm_close(struct vfio_ccw_private *private,
>> >  		spin_unlock_irq(sch->lock);
>> >  
>> >  		if (ret == -EBUSY)
>> > -			wait_for_completion_timeout(&completion, 3*HZ);
>> > +			wait_for_completion_timeout(&completion, 3 * HZ);
>> >  
>> >  		private->completion = NULL;
>> >  		flush_workqueue(vfio_ccw_work_q);
>> >  		spin_lock_irq(sch->lock);
>> >  		ret = cio_disable_subchannel(sch);
>> >  	} while (ret == -EBUSY);
>> > +	return ret;
>> > +}
>> > +
>> > +static void fsm_close(struct vfio_ccw_private *private,
>> > +		      enum vfio_ccw_event event)
>> > +{
>> > +	struct subchannel *sch = private->sch;
>> > +	int ret;
>> > +
>> > +	spin_lock_irq(sch->lock);
>> > +	if (!sch->schib.pmcw.ena)
>> > +		goto err_unlock;
>> > +	ret = cio_disable_subchannel(sch);
>> 
>> cio_disable_subchannel() should be happy to disable an already disabled
>> subchannel, so I guess we can just walk through this and end up in
>> CLOSED state... unless entering with !ena actually indicates that we
>> messed up somewhere else in the state machine. I still need to find time
>> to read the patches.
>
> I don't know, I looked at that ena stuff for a bit and couldn't guess
> what it is trying to do.

It is one of the bits in the pmcw control block that can be modified; if
it is 1, the subchannel is enabled and can be used for I/O, if it is 0,
the subchannel is disabled and all instructions that initiate or stop
I/O will fail. Basically, you enable the subchannel if you actually want
to access the device associated with it. Online/offline for (normal
usage) ccw devices maps (among other things) to associated subchannel
enabled/disabled; for a subchannel that is supposed to be passed via
vfio-ccw, we want to have it enabled so that it is actually usable.
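
As a rough illustration of those semantics, the following userspace sketch
models only the enable bit. The names (struct ena_model and the ena_model_*
helpers) are hypothetical and do not exist in the kernel, and the -ENODEV
return value is chosen for illustration only; the point is merely that I/O
is refused while the bit is 0 and that disabling twice is harmless in this
model.

```c
#include <errno.h>

/* Hypothetical model: only the enable bit of the pmcw is represented. */
struct ena_model {
	unsigned int ena : 1; /* 1 = enabled, 0 = disabled */
};

/* Enable the modelled subchannel so that I/O may be started. */
static void ena_model_enable(struct ena_model *m)
{
	m->ena = 1;
}

/* Disabling an already disabled model is a no-op that still succeeds. */
static int ena_model_disable(struct ena_model *m)
{
	m->ena = 0;
	return 0;
}

/*
 * Stand-in for any instruction that initiates or stops I/O: it fails
 * while the subchannel is disabled (error code is an assumption).
 */
static int ena_model_start_io(const struct ena_model *m)
{
	return m->ena ? 0 : -ENODEV;
}
```

In this model, a caller that wants to pass the subchannel through would
enable it first and only then issue I/O, which matches the description above.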

I think the ena check was inspired by what the ccw bus does. We could
probably just forge ahead in any case, and the functions called in the
css bus would be able to handle it just fine, but I have not
double-checked.

> Arguably the channel should not be ripped away from vfio while the FSM
> is in the open states, so I'm not sure what a lot of this is for.

We could have surprise removal (i.e. a subchannel in active use being
ripped out), as that's what happens on real hardware as well. One way to
trigger it is doing a device_del in QEMU.

