From: Halil Pasic <pasic@linux.ibm.com>
To: Vineeth Vijayan <vneethv@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Pierre Morel <pmorel@linux.ibm.com>,
	Michael Mueller <mimu@linux.ibm.com>,
	linux-s390@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, bfu@redhat.com,
	Halil Pasic <pasic@linux.ibm.com>,
	Peter Oberparleiter <oberpar@linux.ibm.com>
Subject: Re: [PATCH 1/1] virtio/s390: fix vritio-ccw device teardown
Date: Tue, 21 Sep 2021 18:52:22 +0200
Message-ID: <20210921185222.246b15bb.pasic@linux.ibm.com>
In-Reply-To: <05b1ac0e4aa4a1c7df1a8994c898630e9b2e384d.camel@linux.ibm.com>

On Tue, 21 Sep 2021 15:31:03 +0200
Vineeth Vijayan <vneethv@linux.ibm.com> wrote:

> On Tue, 2021-09-21 at 05:25 +0200, Halil Pasic wrote:
> > On Mon, 20 Sep 2021 12:07:23 +0200
> > Cornelia Huck <cohuck@redhat.com> wrote:
> >   
> > > On Mon, Sep 20 2021, Vineeth Vijayan <vneethv@linux.ibm.com> wrote:
> > >   
> > > > On Mon, 2021-09-20 at 00:39 +0200, Halil Pasic wrote:    
> > > > > On Fri, 17 Sep 2021 10:40:20 +0200
> > > > > Cornelia Huck <cohuck@redhat.com> wrote:
> > > > >     
> > > > ...snip...    
> > > > > > > Thanks, if I find time for it, I will try to understand
> > > > > > > this
> > > > > > > better and
> > > > > > > come back with my findings.
> > > > > > >      
> > > > > > > > > > * Can virtio_ccw_remove() get called while !cdev->online and
> > > > > > > > > >   virtio_ccw_online() is running on a different cpu? If yes,
> > > > > > > > > >   what would happen then?
> > > > > > > > 
> > > > > > > > All of the remove/online/... etc. callbacks are invoked
> > > > > > > > via the
> > > > > > > > ccw bus
> > > > > > > > code. We have to trust that it gets it correct :) (Or
> > > > > > > > have the
> > > > > > > > common
> > > > > > > > I/O layer maintainers double-check it.)
> > > > > > > >       
> > > > > > > 
> > > > > > > > Vineeth, what is your take on this? Are the struct ccw_driver
> > > > > > > > virtio_ccw_remove and the virtio_ccw_online callbacks mutually
> > > > > > > > exclusive? Please notice that we may initiate the onlining by
> > > > > > > > calling ccw_device_set_online() from a workqueue.
> > > > > > > 
> > > > > > > @Conny: I'm not sure what is your definition of 'it gets it
> > > > > > > correct'...
> > > > > > > I doubt CIO can make things 100% foolproof in this
> > > > > > > area.      
> > > > > > 
> > > > > > > Not 100% foolproof, but "don't online a device that is in the
> > > > > > > process of going away" seems pretty basic to me.
> > > > > >     
> > > > > 
> > > > > I hope Vineeth will chime in on this.    
> > > > Considering the online/offline processing, 
> > > > The ccw_device_set_offline function or the online/offline is
> > > > handled
> > > > inside device_lock. Also, the online_store function takes care of
> > > > avoiding multiple online/offline processing. 
> > > > 
> > > > Now, when we consider the unconditional remove of the device,
> > > > I am not familiar with the virtio_ccw driver. My assumptions are
> > > > based on how the CIO/dasd drivers work. If I understand correctly,
> > > > the dasd driver sets different flags to make sure that a
> > > > device_open is prevented while the device is in the process of
> > > > going offline.
> > > 
> > > Hm, if we are invoking the online/offline callbacks under the
> > > device
> > > lock already,   
> > 
> > I believe we have a misunderstanding here. I believe that Vineeth is
> > trying to tell us that online_store_handle_online() and
> > online_store_handle_offline() are called under the device lock of
> > the ccw device. Right, Vineeth?
> Yes. I wanted to bring out both scenarios: the set_offline/_online()
> calls and the unconditional-remove call.

I don't understand the paragraph above. I can't map the terms
set_offline/_online() and unconditional-remove call to chunks of code.
:( 

> For set_online, virtio_ccw_online() is also invoked with the ccwlock
> held (ref: ccw_device_set_online).

I don't think virtio_ccw_online() is invoked with the ccwlock held. I
think we call virtio_ccw_online() in this line:
https://elixir.bootlin.com/linux/v5.15-rc2/source/drivers/s390/cio/device.c#L394
and we have released the cdev->ccwlock literally 2 lines above.


> > 
> > Conny, I believe, by online/offline callbacks, you mean
> > virtio_ccw_online() and virtio_ccw_offline(), right?
> > 
> > But the thing is that virtio_ccw_online() may get called (and is
> > typically called, AFAICT) with no locks held via:
> > virtio_ccw_probe() --> async_schedule(virtio_ccw_auto_online, cdev)
> > -*-> virtio_ccw_auto_online(cdev) --> ccw_device_set_online(cdev) -->
> > virtio_ccw_online()
> > 
> > Furthermore after a closer look, I believe because we don't take
> > a reference to the cdev in probe, we may get virtio_ccw_auto_online()
> > called with an invalid pointer (the pointer is guaranteed to be valid
> > in probe, but because of async we have no guarantee that it will be
> > called in the context of probe).
> > 
> > Shouldn't we take a reference to the cdev in probe?  
> We just had a quick look at the virtio_ccw_probe() function.
> Did you mean to have a get_device() during the probe() and put_device()
> just after the virtio_ccw_auto_online() ?

Yes, that would ensure that the cdev pointer is still valid when
virtio_ccw_auto_online() is executed, and that things are cleaned up
properly, I guess. But I'm not 100% sure about all the interactions.
AFAIR ccw_device_set_online(cdev) would bail out if !drv. But then
we have the case where the device has already been assigned to a new
driver (e.g. vfio for dasd).
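
To make the get_device()/put_device() idea concrete, here is a minimal
userspace sketch. It is not kernel code: every fake_* name is a
hypothetical stand-in (fake_get_device()/fake_put_device() for
get_device()/put_device(), struct fake_async for work queued with
async_schedule()), and the deferred call is run by hand on a single
thread. It only shows that a reference taken in probe keeps the pointer
valid until the deferred onlining has run. It says nothing about the
rebind case above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a refcounted struct device. */
struct fake_dev {
	int refcount;
	bool online;
	bool *released;		/* set once the last reference is dropped */
};

/* One deferred call; stands in for work queued by async_schedule(). */
struct fake_async {
	void (*func)(struct fake_dev *dev);
	struct fake_dev *dev;
};

static struct fake_dev *fake_dev_create(bool *released)
{
	struct fake_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcount = 1;	/* reference owned by the creator */
	dev->released = released;
	return dev;
}

static struct fake_dev *fake_get_device(struct fake_dev *dev)
{
	dev->refcount++;
	return dev;
}

static void fake_put_device(struct fake_dev *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0) {
		*dev->released = true;
		free(dev);
	}
}

/* Models virtio_ccw_auto_online(): runs later, outside probe context. */
static void fake_auto_online(struct fake_dev *dev)
{
	dev->online = true;	/* safe: the probe reference pins dev */
	fake_put_device(dev);	/* drop the reference taken in probe */
}

/* Models probe: pin the device first, then defer the onlining. */
static void fake_probe(struct fake_dev *dev, struct fake_async *work)
{
	work->func = fake_auto_online;
	work->dev = fake_get_device(dev);
}

/* Runs the deferred call at some later point. */
static void fake_async_run(struct fake_async *work)
{
	work->func(work->dev);
	work->func = NULL;
	work->dev = NULL;
}
```

If the creator drops its reference between fake_probe() and
fake_async_run(), the object stays alive until fake_auto_online() has
dropped the probe reference. Without the fake_get_device() in
fake_probe(), the deferred call would run on freed memory.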

BTW I believe if we have a problem here, the dasd driver has the same
problem as well. The code looks very, very similar.

And shouldn't this auto-online be common CIO functionality? What is the
reason the char devices don't seem to have it?

Regards,
Halil

