All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFD] uevent handling for subchannels
@ 2020-04-03 10:40 Cornelia Huck
  2020-04-17 12:38 ` Cornelia Huck
  0 siblings, 1 reply; 14+ messages in thread
From: Cornelia Huck @ 2020-04-03 10:40 UTC (permalink / raw)
  To: Peter Oberparleiter, Vineeth Vijayan
  Cc: linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski

Hi,

this is kind-of-a-followup to the uevent patches I sent in
<20200327124503.9794-1-cohuck@redhat.com> last Friday.

Currently, the common I/O layer will suppress uevents for subchannels
that are being registered, delegating generating a delayed ADD uevent
to the driver that actually binds to it and only generating the uevent
itself if no driver gets bound. The initial version of that delaying
was introduced in fa1a8c23eb7d ("s390: cio: Delay uevents for
subchannels"); from what I remember, we were seeing quite bad storms of
uevents on LPARs that had a lot of I/O subchannels with no device
accessible through them.

So while there's definitely a good reason for wanting to delay uevents,
it is also introducing problems. One is udev rules for subchannels that
are supposed to do something before a driver binds (e.g. setting
driver_override to bind an I/O subchannel to vfio_ccw instead of
io_subchannel) are not effective, as the ADD uevent will only be
generated when the io_subchannel driver is already done with doing all
setup. Another one is that only the ADD uevent is generated after
uevent suppression is lifted; any other uevents that might have been
generated are lost.

So, what to do about this, especially in the light of vfio-ccw handling?

One idea I had is to call css_sch_is_valid() from
css_register_subchannel(); this would exclude the largest class of
non-operational subchannels already (those that don't have a valid
device; I'm not quite sure if there's also something needed for EADM
subchannels?) If we got rid of the uevent delaying, we would still get
ADD/REMOVE events for subchannels where the device turns out to be
non-accessible, but I believe (hope) that those are not too many in a
sane system at least. As a bonus, we could also add additional values
from the pmcw to the uevent; the device number, for example, could be
helpful for vfio-ccw matching rules.

A drawback is that we change the timing (not the sequence, AFAICS) of
the uevents, which might break brittle setups.

Thoughts?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFD] uevent handling for subchannels
  2020-04-03 10:40 [RFD] uevent handling for subchannels Cornelia Huck
@ 2020-04-17 12:38 ` Cornelia Huck
  2020-04-20 15:29   ` Vineeth Vijayan
  2020-04-23 15:55   ` Halil Pasic
  0 siblings, 2 replies; 14+ messages in thread
From: Cornelia Huck @ 2020-04-17 12:38 UTC (permalink / raw)
  To: Peter Oberparleiter, Vineeth Vijayan
  Cc: linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski

Friendly ping.

On Fri, 3 Apr 2020 12:40:32 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> Hi,
> 
> this is kind-of-a-followup to the uevent patches I sent in
> <20200327124503.9794-1-cohuck@redhat.com> last Friday.
> 
> Currently, the common I/O layer will suppress uevents for subchannels
> that are being registered, delegating generating a delayed ADD uevent
> to the driver that actually binds to it and only generating the uevent
> itself if no driver gets bound. The initial version of that delaying
> was introduced in fa1a8c23eb7d ("s390: cio: Delay uevents for
> subchannels"); from what I remember, we were seeing quite bad storms of
> uevents on LPARs that had a lot of I/O subchannels with no device
> accessible through them.
> 
> So while there's definitely a good reason for wanting to delay uevents,
> it is also introducing problems. One is udev rules for subchannels that
> are supposed to do something before a driver binds (e.g. setting
> driver_override to bind an I/O subchannel to vfio_ccw instead of
> io_subchannel) are not effective, as the ADD uevent will only be
> generated when the io_subchannel driver is already done with doing all
> setup. Another one is that only the ADD uevent is generated after
> uevent suppression is lifted; any other uevents that might have been
> generated are lost.
> 
> So, what to do about this, especially in the light of vfio-ccw handling?
> 
> One idea I had is to call css_sch_is_valid() from
> css_register_subchannel(); this would exclude the largest class of
> non-operational subchannels already (those that don't have a valid
> device; I'm not quite sure if there's also something needed for EADM
> subchannels?) If we got rid of the uevent delaying, we would still get
> ADD/REMOVE events for subchannels where the device turns out to be
> non-accessible, but I believe (hope) that those are not too many in a
> sane system at least. As a bonus, we could also add additional values
> from the pmcw to the uevent; the device number, for example, could be
> helpful for vfio-ccw matching rules.
> 
> A drawback is that we change the timing (not the sequence, AFAICS) of
> the uevents, which might break brittle setups.
> 
> Thoughts?
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFD] uevent handling for subchannels
  2020-04-17 12:38 ` Cornelia Huck
@ 2020-04-20 15:29   ` Vineeth Vijayan
  2020-04-23 14:52     ` Vineeth Vijayan
  2020-04-23 15:55   ` Halil Pasic
  1 sibling, 1 reply; 14+ messages in thread
From: Vineeth Vijayan @ 2020-04-20 15:29 UTC (permalink / raw)
  To: Cornelia Huck, Peter Oberparleiter, Vineeth Vijayan
  Cc: linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski

Thank you.

Looks like a reasonable idea. But a bit worried about the consequences of
this change. So i am trying to do some tests on LPARs and play around a
bit before i reply to you. For example, testing it on an LPAR which has 
got a large
number of ccw-devices which are DEV_STATE_NOT_OPER.

Let me get back to you after couple of tests.

On 4/17/20 2:38 PM, Cornelia Huck wrote:
> Friendly ping.
>
> On Fri, 3 Apr 2020 12:40:32 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
>
>> Hi,
>>
>> this is kind-of-a-followup to the uevent patches I sent in
>> <20200327124503.9794-1-cohuck@redhat.com> last Friday.
>>
>> Currently, the common I/O layer will suppress uevents for subchannels
>> that are being registered, delegating generating a delayed ADD uevent
>> to the driver that actually binds to it and only generating the uevent
>> itself if no driver gets bound. The initial version of that delaying
>> was introduced in fa1a8c23eb7d ("s390: cio: Delay uevents for
>> subchannels"); from what I remember, we were seeing quite bad storms of
>> uevents on LPARs that had a lot of I/O subchannels with no device
>> accessible through them.
>>
>> So while there's definitely a good reason for wanting to delay uevents,
>> it is also introducing problems. One is udev rules for subchannels that
>> are supposed to do something before a driver binds (e.g. setting
>> driver_override to bind an I/O subchannel to vfio_ccw instead of
>> io_subchannel) are not effective, as the ADD uevent will only be
>> generated when the io_subchannel driver is already done with doing all
>> setup. Another one is that only the ADD uevent is generated after
>> uevent suppression is lifted; any other uevents that might have been
>> generated are lost.
>>
>> So, what to do about this, especially in the light of vfio-ccw handling?
>>
>> One idea I had is to call css_sch_is_valid() from
>> css_register_subchannel(); this would exclude the largest class of
>> non-operational subchannels already (those that don't have a valid
>> device; I'm not quite sure if there's also something needed for EADM
>> subchannels?) If we got rid of the uevent delaying, we would still get
>> ADD/REMOVE events for subchannels where the device turns out to be
>> non-accessible, but I believe (hope) that those are not too many in a
>> sane system at least. As a bonus, we could also add additional values
>> from the pmcw to the uevent; the device number, for example, could be
>> helpful for vfio-ccw matching rules.
>>
>> A drawback is that we change the timing (not the sequence, AFAICS) of
>> the uevents, which might break brittle setups.
>>
>> Thoughts?
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFD] uevent handling for subchannels
  2020-04-20 15:29   ` Vineeth Vijayan
@ 2020-04-23 14:52     ` Vineeth Vijayan
  2020-04-23 16:20       ` Cornelia Huck
  0 siblings, 1 reply; 14+ messages in thread
From: Vineeth Vijayan @ 2020-04-23 14:52 UTC (permalink / raw)
  To: Cornelia Huck, Peter Oberparleiter, Vineeth Vijayan
  Cc: linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski

Hi Corenelia,

few cents on this,

1. css_register_subchannel() is called only for valid subchannels, which 
is taken care in the
css_alloc_subchannel(). So Adding a second css_sch_is_valid() in 
css_register_subchannel()
might not help us here. We still need to find a mechanism to avoid the 
performance impact
because of the uevent-storm from IO-subchannels without any valid device 
on them.

2. We will have to find a way to get the AVAILABLE-VALID-CCW-device 
information from css layer,
which would help vfio-ccw drivers to work with the uevents when it is 
not suppressed.
Then we could also change the way ccw_device_call_sch_unregister() 
works, where
the subchannel-unregister is happening from an upper layer.

Vineeth

On 4/20/20 5:29 PM, Vineeth Vijayan wrote:
> Thank you.
>
> Looks like a reasonable idea. But a bit worried about the consequences of
> this change. So i am trying to do some tests on LPARs and play around a
> bit before i reply to you. For example, testing it on an LPAR which 
> has got a large
> number of ccw-devices which are DEV_STATE_NOT_OPER.
>
> Let me get back to you after couple of tests.
>
> On 4/17/20 2:38 PM, Cornelia Huck wrote:
>> Friendly ping.
>>
>> On Fri, 3 Apr 2020 12:40:32 +0200
>> Cornelia Huck <cohuck@redhat.com> wrote:
>>
>>> Hi,
>>>
>>> this is kind-of-a-followup to the uevent patches I sent in
>>> <20200327124503.9794-1-cohuck@redhat.com> last Friday.
>>>
>>> Currently, the common I/O layer will suppress uevents for subchannels
>>> that are being registered, delegating generating a delayed ADD uevent
>>> to the driver that actually binds to it and only generating the uevent
>>> itself if no driver gets bound. The initial version of that delaying
>>> was introduced in fa1a8c23eb7d ("s390: cio: Delay uevents for
>>> subchannels"); from what I remember, we were seeing quite bad storms of
>>> uevents on LPARs that had a lot of I/O subchannels with no device
>>> accessible through them.
>>>
>>> So while there's definitely a good reason for wanting to delay uevents,
>>> it is also introducing problems. One is udev rules for subchannels that
>>> are supposed to do something before a driver binds (e.g. setting
>>> driver_override to bind an I/O subchannel to vfio_ccw instead of
>>> io_subchannel) are not effective, as the ADD uevent will only be
>>> generated when the io_subchannel driver is already done with doing all
>>> setup. Another one is that only the ADD uevent is generated after
>>> uevent suppression is lifted; any other uevents that might have been
>>> generated are lost.
>>>
>>> So, what to do about this, especially in the light of vfio-ccw 
>>> handling?
>>>
>>> One idea I had is to call css_sch_is_valid() from
>>> css_register_subchannel(); this would exclude the largest class of
>>> non-operational subchannels already (those that don't have a valid
>>> device; I'm not quite sure if there's also something needed for EADM
>>> subchannels?) If we got rid of the uevent delaying, we would still get
>>> ADD/REMOVE events for subchannels where the device turns out to be
>>> non-accessible, but I believe (hope) that those are not too many in a
>>> sane system at least. As a bonus, we could also add additional values
>>> from the pmcw to the uevent; the device number, for example, could be
>>> helpful for vfio-ccw matching rules.
>>>
>>> A drawback is that we change the timing (not the sequence, AFAICS) of
>>> the uevents, which might break brittle setups.
>>>
>>> Thoughts?
>>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFD] uevent handling for subchannels
  2020-04-17 12:38 ` Cornelia Huck
  2020-04-20 15:29   ` Vineeth Vijayan
@ 2020-04-23 15:55   ` Halil Pasic
  2020-04-23 16:27     ` Cornelia Huck
  1 sibling, 1 reply; 14+ messages in thread
From: Halil Pasic @ 2020-04-23 15:55 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Peter Oberparleiter, Vineeth Vijayan, linux-s390, Eric Farman,
	Boris Fiuczynski

On Fri, 17 Apr 2020 14:38:11 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> Friendly ping.
> 

Sorry for the late answer. I prefer to let Vineeth give us his opinion
first. I did invest some 30 minutes in understanding the problem, but
I'm not sure I understood it properly. According to my current
understanding, the current state of affairs is a mess, and the proposed
change wouldn't make the situation substantially cleaner, but it would
help with the problem at hand.

Conny, do you have more background information on uevent suppression
(is there some sort of a generic contract between kernel and userspace
for uevent suppression)?

From a quick grep it seems to me that most of the uses are about being
nice to userspace in a sense, that we want to make sure that when
event is received by userspace it can do it's thing, and does not have
to wait until the kernel has finished with the stuff that needs to be
done to reach a state of affairs that can be considered normal.

Regards,
Halil



> On Fri, 3 Apr 2020 12:40:32 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > Hi,
> > 
> > this is kind-of-a-followup to the uevent patches I sent in
> > <20200327124503.9794-1-cohuck@redhat.com> last Friday.
> > 
> > Currently, the common I/O layer will suppress uevents for subchannels
> > that are being registered, delegating generating a delayed ADD uevent
> > to the driver that actually binds to it and only generating the uevent
> > itself if no driver gets bound. The initial version of that delaying
> > was introduced in fa1a8c23eb7d ("s390: cio: Delay uevents for
> > subchannels"); from what I remember, we were seeing quite bad storms of
> > uevents on LPARs that had a lot of I/O subchannels with no device
> > accessible through them.
[..]
> > Thoughts?
> > 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFD] uevent handling for subchannels
  2020-04-23 14:52     ` Vineeth Vijayan
@ 2020-04-23 16:20       ` Cornelia Huck
  2020-04-27 10:10         ` Peter Oberparleiter
  0 siblings, 1 reply; 14+ messages in thread
From: Cornelia Huck @ 2020-04-23 16:20 UTC (permalink / raw)
  To: Vineeth Vijayan
  Cc: Peter Oberparleiter, Vineeth Vijayan, linux-s390, Eric Farman,
	Halil Pasic, Boris Fiuczynski

On Thu, 23 Apr 2020 16:52:24 +0200
Vineeth Vijayan <vneethv@linux.vnet.ibm.com> wrote:

> Hi Corenelia,
> 
> few cents on this,
> 
> 1. css_register_subchannel() is called only for valid subchannels, which 
> is taken care in the
> css_alloc_subchannel(). So Adding a second css_sch_is_valid() in 
> css_register_subchannel()
> might not help us here. We still need to find a mechanism to avoid the 
> performance impact
> because of the uevent-storm from IO-subchannels without any valid device 
> on them.

Ah, I missed that.

But I'm wondering whether the number of non-operational devices that
will end up not being registered is actually that high in a normal
setup.

The really bad case, as I understand it, is

0    ...    n      ...      m ... 0xffff
<nothing> <dev> <nothing> <dev> <nothing>

where we end up with large numbers of subchannels with !dnv prior to n
and between n and m. (On LPAR; z/VM and QEMU will usually have mostly
consecutive devices-on-subchannels, unless there has been a huge amount
of hotplug been going on.)

In this case, the !dnv check already prevents us from even registering
the device, so the only problematic devices left are those where we
fail to successfully drive I/O to -- are these very common on sane
setups?

(The code has seen some revisions since I introduced that suppression
stuff, maybe I'm missing something.)

> 
> 2. We will have to find a way to get the AVAILABLE-VALID-CCW-device 
> information from css layer,
> which would help vfio-ccw drivers to work with the uevents when it is 
> not suppressed.

But if we bind the subchannel to vfio-ccw, we do not have any ccw
device, right? Or am I misunderstanding?

> Then we could also change the way ccw_device_call_sch_unregister() 
> works, where
> the subchannel-unregister is happening from an upper layer.

Hm, what's the problem here? This seems to be mostly a case of "we did
I/O to the device and it appeared not operational; so we go ahead and
unregister the subchannel"? Childless I/O subchannels are a bit useless.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFD] uevent handling for subchannels
  2020-04-23 15:55   ` Halil Pasic
@ 2020-04-23 16:27     ` Cornelia Huck
  0 siblings, 0 replies; 14+ messages in thread
From: Cornelia Huck @ 2020-04-23 16:27 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Peter Oberparleiter, Vineeth Vijayan, linux-s390, Eric Farman,
	Boris Fiuczynski

On Thu, 23 Apr 2020 17:55:59 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Fri, 17 Apr 2020 14:38:11 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > Friendly ping.
> >   
> 
> Sorry for the late answer. I prefer to let Vineeth give us his opinion
> first. I did invest some 30 minutes in understanding the problem, but
> I'm not sure I understood it properly. According to my current
> understanding, the current state of affairs is a mess, and the proposed
> change wouldn't make the situation substantially cleaner, but it would
> help with the problem at hand.
> 
> Conny, do you have more background information on uevent suppression
> (is there some sort of a generic contract between kernel and userspace
> for uevent suppression)?

I'm not aware of a 'generic contract'; it's probably a lot of 'we
thought it would be a good idea to do it like that'.

> 
> From a quick grep it seems to me that most of the uses are about being
> nice to userspace in a sense, that we want to make sure that when
> event is received by userspace it can do it's thing, and does not have
> to wait until the kernel has finished with the stuff that needs to be
> done to reach a state of affairs that can be considered normal.

The general idea is that once userspace is notified of the existence of
a device, it can go ahead and try to make use of it. This is not always
true; sometimes, there's a logic problem in how devices are
initialized, sometimes, we want indeed to avoid userspace do work that
will turn out to be useless. The latter was the intention with uevent
suppression for I/O subchannels: do not notify userspace of the
existence of an I/O subchannel, unless there's a valid ccw device
behind it. Of course, that leads to trouble for vfio-ccw.

Another idea: Have the add uevent for subchannels in all cases, but
adapt any udev rules that do anything other than set driver_override to
do nothing unless a (to be added) change uevent comes along.

(And yes, I agree that this is a mess.)

> 
> Regards,
> Halil
> 
> 
> 
> > On Fri, 3 Apr 2020 12:40:32 +0200
> > Cornelia Huck <cohuck@redhat.com> wrote:
> >   
> > > Hi,
> > > 
> > > this is kind-of-a-followup to the uevent patches I sent in
> > > <20200327124503.9794-1-cohuck@redhat.com> last Friday.
> > > 
> > > Currently, the common I/O layer will suppress uevents for subchannels
> > > that are being registered, delegating generating a delayed ADD uevent
> > > to the driver that actually binds to it and only generating the uevent
> > > itself if no driver gets bound. The initial version of that delaying
> > > was introduced in fa1a8c23eb7d ("s390: cio: Delay uevents for
> > > subchannels"); from what I remember, we were seeing quite bad storms of
> > > uevents on LPARs that had a lot of I/O subchannels with no device
> > > accessible through them.  
> [..]
> > > Thoughts?
> > >   
> >   
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFD] uevent handling for subchannels
  2020-04-23 16:20       ` Cornelia Huck
@ 2020-04-27 10:10         ` Peter Oberparleiter
  2020-04-30 10:43           ` Cornelia Huck
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Oberparleiter @ 2020-04-27 10:10 UTC (permalink / raw)
  To: Cornelia Huck, Vineeth Vijayan
  Cc: Vineeth Vijayan, linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski

On 23.04.2020 18:20, Cornelia Huck wrote:
> On Thu, 23 Apr 2020 16:52:24 +0200
> Vineeth Vijayan <vneethv@linux.vnet.ibm.com> wrote:
>> Then we could also change the way ccw_device_call_sch_unregister() 
>> works, where
>> the subchannel-unregister is happening from an upper layer.
> 
> Hm, what's the problem here? This seems to be mostly a case of "we did
> I/O to the device and it appeared not operational; so we go ahead and
> unregister the subchannel"? Childless I/O subchannels are a bit useless.

Hey Conny,

sparked by your proposal, Vineeth and myself looked at the corresponding
CIO code and wondered if things couldn't be done in a generally
better/cleaner way. So here we'd like to get your opinion.

In particular, as it is currently, a child-driver (IO subchannel driver,
vfio-ccw, etc.) unregisters a device owned by a parent-device-driver
(CSS), which feels from a high-level-view like a layering violation:
only the parent driver should register and unregister the parent device.
Also in case no subchannel driver is available (e.g. due to
driver_override=none), there would be no subchannel ADD event at all.

So, tapping into you historical expertise about CIO, is there any reason
for doing it this way beyond being nice to userspace tooling that
subchannels with non-working CCW devices are automatically hidden by
unregistering them?

Removing the child-unregisters-parent logic this would also enable
manual rebind of subchannels for which only a different driver than the
default one can successfully talk to the child device, though I'm
unaware of any current application for that.


Regards,
  Peter

-- 
Peter Oberparleiter
Linux on Z Development - IBM Germany

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFD] uevent handling for subchannels
  2020-04-27 10:10         ` Peter Oberparleiter
@ 2020-04-30 10:43           ` Cornelia Huck
  2020-06-29 11:56             ` Cornelia Huck
  0 siblings, 1 reply; 14+ messages in thread
From: Cornelia Huck @ 2020-04-30 10:43 UTC (permalink / raw)
  To: Peter Oberparleiter
  Cc: Vineeth Vijayan, Vineeth Vijayan, linux-s390, Eric Farman,
	Halil Pasic, Boris Fiuczynski

On Mon, 27 Apr 2020 12:10:17 +0200
Peter Oberparleiter <oberpar@linux.ibm.com> wrote:

> On 23.04.2020 18:20, Cornelia Huck wrote:
> > On Thu, 23 Apr 2020 16:52:24 +0200
> > Vineeth Vijayan <vneethv@linux.vnet.ibm.com> wrote:  
> >> Then we could also change the way ccw_device_call_sch_unregister() 
> >> works, where
> >> the subchannel-unregister is happening from an upper layer.  
> > 
> > Hm, what's the problem here? This seems to be mostly a case of "we did
> > I/O to the device and it appeared not operational; so we go ahead and
> > unregister the subchannel"? Childless I/O subchannels are a bit useless.  
> 
> Hey Conny,
> 
> sparked by your proposal, Vineeth and myself looked at the corresponding
> CIO code and wondered if things couldn't be done in a generally
> better/cleaner way. So here we'd like to get your opinion.
> 
> In particular, as it is currently, a child-driver (IO subchannel driver,
> vfio-ccw, etc.) unregisters a device owned by a parent-device-driver
> (CSS), which feels from a high-level-view like a layering violation:
> only the parent driver should register and unregister the parent device.
> Also in case no subchannel driver is available (e.g. due to
> driver_override=none), there would be no subchannel ADD event at all.

Doesn't the base css code generate the uevent in that case?

> 
> So, tapping into you historical expertise about CIO, is there any reason
> for doing it this way beyond being nice to userspace tooling that
> subchannels with non-working CCW devices are automatically hidden by
> unregistering them?

We always had ccw devices behind I/O subchannels, but that has not been
the case since we introduced vfio-ccw, so hopefully everybody can deal
with that. The rationale behind this was that device-less I/O
subchannels were deemed to be useless; I currently can't remember
another reason.

What about EADM, btw? CHSC does not have a device, and message does not
have a driver.

> 
> Removing the child-unregisters-parent logic this would also enable
> manual rebind of subchannels for which only a different driver than the
> default one can successfully talk to the child device, though I'm
> unaware of any current application for that.

Yes.

Let me think about that some more (no clear head currently, sorry.)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFD] uevent handling for subchannels
  2020-04-30 10:43           ` Cornelia Huck
@ 2020-06-29 11:56             ` Cornelia Huck
  2020-07-01  9:23               ` Cornelia Huck
  0 siblings, 1 reply; 14+ messages in thread
From: Cornelia Huck @ 2020-06-29 11:56 UTC (permalink / raw)
  To: Peter Oberparleiter
  Cc: Vineeth Vijayan, Vineeth Vijayan, linux-s390, Eric Farman,
	Halil Pasic, Boris Fiuczynski

On Thu, 30 Apr 2020 12:43:16 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

<It's been some time, but this topic has recently popped up again.>

> On Mon, 27 Apr 2020 12:10:17 +0200
> Peter Oberparleiter <oberpar@linux.ibm.com> wrote:
> 
> > On 23.04.2020 18:20, Cornelia Huck wrote:  
> > > On Thu, 23 Apr 2020 16:52:24 +0200
> > > Vineeth Vijayan <vneethv@linux.vnet.ibm.com> wrote:    
> > >> Then we could also change the way ccw_device_call_sch_unregister() 
> > >> works, where
> > >> the subchannel-unregister is happening from an upper layer.    
> > > 
> > > Hm, what's the problem here? This seems to be mostly a case of "we did
> > > I/O to the device and it appeared not operational; so we go ahead and
> > > unregister the subchannel"? Childless I/O subchannels are a bit useless.    
> > 
> > Hey Conny,
> > 
> > sparked by your proposal, Vineeth and myself looked at the corresponding
> > CIO code and wondered if things couldn't be done in a generally
> > better/cleaner way. So here we'd like to get your opinion.
> > 
> > In particular, as it is currently, a child-driver (IO subchannel driver,
> > vfio-ccw, etc.) unregisters a device owned by a parent-device-driver
> > (CSS), which feels from a high-level-view like a layering violation:
> > only the parent driver should register and unregister the parent device.
> > Also in case no subchannel driver is available (e.g. due to
> > driver_override=none), there would be no subchannel ADD event at all.  
> 
> Doesn't the base css code generate the uevent in that case?

Just checked again, the code in css_register_subchannel() should indeed
take care of the !driver case. But still, even better if we can get rid
of it :)

> 
> > 
> > So, tapping into you historical expertise about CIO, is there any reason
> > for doing it this way beyond being nice to userspace tooling that
> > subchannels with non-working CCW devices are automatically hidden by
> > unregistering them?  
> 
> We always had ccw devices behind I/O subchannels, but that has not been
> the case since we introduced vfio-ccw, so hopefully everybody can deal
> with that. The rationale behind this was that device-less I/O
> subchannels were deemed to be useless; I currently can't remember
> another reason.
> 
> What about EADM, btw? CHSC does not have a device, and message does not
> have a driver.

Just checked EADM; it does not have a child device.

> 
> > 
> > Removing the child-unregisters-parent logic this would also enable
> > manual rebind of subchannels for which only a different driver than the
> > default one can successfully talk to the child device, though I'm
> > unaware of any current application for that.  
> 
> Yes.
> 
> Let me think about that some more (no clear head currently, sorry.)

Ok, so I've resumed the thinking process, and I think getting rid of
the "no I/O subchannel without functional device" approach is a good
idea, and allows us to make handling driver matches more similar to
everyone else.

What changes would be needed?
* The whole logic to suppress uevents for subchannels and generate one
  later will go. (Touches the various subchannel driver, including
  vfio-ccw.)
* ccw_device_todo() can just unregister the ccw device, and there's no
  longer a need for ccw_device_call_sch_unregister(). (IIUC, this also
  covers setting disconnected devices offline.)
* As the I/O subchannel driver now needs to deal with cases where no
  ccw device is available, the code for that needs to be checked.
  (That's probably the most time-consuming task.)

Userspace should be fine with I/O subchannels without ccw device,
that's nothing new.

Does that sound reasonable?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFD] uevent handling for subchannels
  2020-06-29 11:56             ` Cornelia Huck
@ 2020-07-01  9:23               ` Cornelia Huck
  2020-09-14 11:46                 ` Cornelia Huck
  0 siblings, 1 reply; 14+ messages in thread
From: Cornelia Huck @ 2020-07-01  9:23 UTC (permalink / raw)
  To: Peter Oberparleiter
  Cc: Vineeth Vijayan, Vineeth Vijayan, linux-s390, Eric Farman,
	Halil Pasic, Boris Fiuczynski

On Mon, 29 Jun 2020 13:56:31 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> Ok, so I've resumed the thinking process, and I think getting rid of
> the "no I/O subchannel without functional device" approach is a good
> idea, and allows us to make handling driver matches more similar to
> everyone else.

As an aside, there's another odd construct: the I/O subchannel driver
*always* binds to the subchannel device, even if there is a problem,
and schedules an unregistration of the subchannel device on error. This
was introduced because events from machine check handling are not
processed if there isn't a driver (at least I thought back then that it
was a good idea.) I think a more correct way to handle this would be to
do the following:

* If something doesn't work, clean up and return an error in the probe
  function. The subchannel device stays around, it's just not bound.
* Have the css bus do some basic processing for subchannels not bound
  to any driver (e.g., check dnv/w). This would also make it possible
  to unregister dead message subchannels if a machine check is received
  for them (don't know if that's an actual problem in pratice.)

> 
> What changes would be needed?
> * The whole logic to suppress uevents for subchannels and generate one
>   later will go. (Touches the various subchannel driver, including
>   vfio-ccw.)
> * ccw_device_todo() can just unregister the ccw device, and there's no
>   longer a need for ccw_device_call_sch_unregister(). (IIUC, this also
>   covers setting disconnected devices offline.)

I'm actually not sure if unregistration-by-driver is the right thing
for most cases (except for something like disconnected device removal),
that should be done by the bus. Maybe something for later (don't fear,
I don't plan to work on the common I/O layer again :)

> * As the I/O subchannel driver now needs to deal with cases where no
>   ccw device is available, the code for that needs to be checked.
>   (That's probably the most time-consuming task.)

Had a quick look, doesn't actually look too bad (most places already
check for !cdev.)

> 
> Userspace should be fine with I/O subchannels without ccw device,
> that's nothing new.
> 
> Does that sound reasonable?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFD] uevent handling for subchannels
  2020-07-01  9:23               ` Cornelia Huck
@ 2020-09-14 11:46                 ` Cornelia Huck
  2020-09-15 10:25                   ` Vineeth Vijayan
  0 siblings, 1 reply; 14+ messages in thread
From: Cornelia Huck @ 2020-09-14 11:46 UTC (permalink / raw)
  To: Peter Oberparleiter
  Cc: Vineeth Vijayan, Vineeth Vijayan, linux-s390, Eric Farman,
	Halil Pasic, Boris Fiuczynski

<casts "reanimate" on dead thread>

On Wed, 1 Jul 2020 11:23:13 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Mon, 29 Jun 2020 13:56:31 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > Ok, so I've resumed the thinking process, and I think getting rid of
> > the "no I/O subchannel without functional device" approach is a good
> > idea, and allows us to make handling driver matches more similar to
> > everyone else.  
> 
> As an aside, there's another odd construct: the I/O subchannel driver
> *always* binds to the subchannel device, even if there is a problem,
> and schedules an unregistration of the subchannel device on error. This
> was introduced because events from machine check handling are not
> processed if there isn't a driver (at least I thought back then that it
> was a good idea.) I think a more correct way to handle this would be to
> do the following:
> 
> * If something doesn't work, clean up and return an error in the probe
>   function. The subchannel device stays around, it's just not bound.
> * Have the css bus do some basic processing for subchannels not bound
>   to any driver (e.g., check dnv/w). This would also make it possible
>   to unregister dead message subchannels if a machine check is received
>   for them (don't know if that's an actual problem in pratice.)
> 
> > 
> > What changes would be needed?
> > * The whole logic to suppress uevents for subchannels and generate one
> >   later will go. (Touches the various subchannel driver, including
> >   vfio-ccw.)
> > * ccw_device_todo() can just unregister the ccw device, and there's no
> >   longer a need for ccw_device_call_sch_unregister(). (IIUC, this also
> >   covers setting disconnected devices offline.)  
> 
> I'm actually not sure if unregistration-by-driver is the right thing
> for most cases (except for something like disconnected device removal),
> that should be done by the bus. Maybe something for later (don't fear,
> I don't plan to work on the common I/O layer again :)
> 
> > * As the I/O subchannel driver now needs to deal with cases where no
> >   ccw device is available, the code for that needs to be checked.
> >   (That's probably the most time-consuming task.)  
> 
> Had a quick look, doesn't actually look too bad (most places already
> check for !cdev.)
> 
> > 
> > Userspace should be fine with I/O subchannels without ccw device,
> > that's nothing new.
> > 
> > Does that sound reasonable?  

Is anybody looking at this? The delayed uevent handling is a bit of a
mess for management of vfio-ccw devices...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFD] uevent handling for subchannels
  2020-09-14 11:46                 ` Cornelia Huck
@ 2020-09-15 10:25                   ` Vineeth Vijayan
  2020-09-15 10:31                     ` Cornelia Huck
  0 siblings, 1 reply; 14+ messages in thread
From: Vineeth Vijayan @ 2020-09-15 10:25 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Peter Oberparleiter, Vineeth Vijayan, linux-s390, Eric Farman,
	Halil Pasic, Boris Fiuczynski

Hi Conny,
Thank you for the ping. This is pending for a long time and we just resumed
looking at this approach. Me and Peter had a short discussion on this and we
are trying to do some cleaning as well on the CIO layer while working 
with the
new approach.

There are few uncertainties with the new approach, which we would like 
to clarify
and i am trying to test it.For example, the impact on the way 
cio-console driver works.
So, as discussed with Peter, let me make a smaller draft which will 
consolidate
your approach and the "cleaning" part that we would like to do.

As mentioned, i will resume working on this and get back to you with 
myfindings.

Thank you,
Vineeth



On 9/14/20 1:46 PM, Cornelia Huck wrote:
> <casts "reanimate" on dead thread>
>
> On Wed, 1 Jul 2020 11:23:13 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
>
>> On Mon, 29 Jun 2020 13:56:31 +0200
>> Cornelia Huck <cohuck@redhat.com> wrote:
>>
>>> Ok, so I've resumed the thinking process, and I think getting rid of
>>> the "no I/O subchannel without functional device" approach is a good
>>> idea, and allows us to make handling driver matches more similar to
>>> everyone else.
>> As an aside, there's another odd construct: the I/O subchannel driver
>> *always* binds to the subchannel device, even if there is a problem,
>> and schedules an unregistration of the subchannel device on error. This
>> was introduced because events from machine check handling are not
>> processed if there isn't a driver (at least I thought back then that it
>> was a good idea.) I think a more correct way to handle this would be to
>> do the following:
>>
>> * If something doesn't work, clean up and return an error in the probe
>>    function. The subchannel device stays around, it's just not bound.
>> * Have the css bus do some basic processing for subchannels not bound
>>    to any driver (e.g., check dnv/w). This would also make it possible
>>    to unregister dead message subchannels if a machine check is received
>>    for them (don't know if that's an actual problem in pratice.)
>>
>>> What changes would be needed?
>>> * The whole logic to suppress uevents for subchannels and generate one
>>>    later will go. (Touches the various subchannel driver, including
>>>    vfio-ccw.)
>>> * ccw_device_todo() can just unregister the ccw device, and there's no
>>>    longer a need for ccw_device_call_sch_unregister(). (IIUC, this also
>>>    covers setting disconnected devices offline.)
>> I'm actually not sure if unregistration-by-driver is the right thing
>> for most cases (except for something like disconnected device removal),
>> that should be done by the bus. Maybe something for later (don't fear,
>> I don't plan to work on the common I/O layer again :)
>>
>>> * As the I/O subchannel driver now needs to deal with cases where no
>>>    ccw device is available, the code for that needs to be checked.
>>>    (That's probably the most time-consuming task.)
>> Had a quick look, doesn't actually look too bad (most places already
>> check for !cdev.)
>>
>>> Userspace should be fine with I/O subchannels without ccw device,
>>> that's nothing new.
>>>
>>> Does that sound reasonable?
> Is anybody looking at this? The delayed uevent handling is a bit of a
> mess for management of vfio-ccw devices...
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFD] uevent handling for subchannels
  2020-09-15 10:25                   ` Vineeth Vijayan
@ 2020-09-15 10:31                     ` Cornelia Huck
  0 siblings, 0 replies; 14+ messages in thread
From: Cornelia Huck @ 2020-09-15 10:31 UTC (permalink / raw)
  To: Vineeth Vijayan
  Cc: Peter Oberparleiter, Vineeth Vijayan, linux-s390, Eric Farman,
	Halil Pasic, Boris Fiuczynski

On Tue, 15 Sep 2020 12:25:32 +0200
Vineeth Vijayan <vneethv@linux.vnet.ibm.com> wrote:

> Hi Conny,
> Thank you for the ping. This is pending for a long time and we just resumed
> looking at this approach. Me and Peter had a short discussion on this and we
> are trying to do some cleaning as well on the CIO layer while working 
> with the
> new approach.
> 
> There are few uncertainties with the new approach, which we would like 
> to clarify
> and i am trying to test it.For example, the impact on the way 
> cio-console driver works.

Yes, the console might be fun.

> So, as discussed with Peter, let me make a smaller draft which will 
> consolidate
> your approach and the "cleaning" part that we would like to do.
> 
> As mentioned, i will resume working on this and get back to you with 
> myfindings.

Thanks for the update; I'll do my best to find some time to look at any
patches you send.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-09-15 10:31 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-03 10:40 [RFD] uevent handling for subchannels Cornelia Huck
2020-04-17 12:38 ` Cornelia Huck
2020-04-20 15:29   ` Vineeth Vijayan
2020-04-23 14:52     ` Vineeth Vijayan
2020-04-23 16:20       ` Cornelia Huck
2020-04-27 10:10         ` Peter Oberparleiter
2020-04-30 10:43           ` Cornelia Huck
2020-06-29 11:56             ` Cornelia Huck
2020-07-01  9:23               ` Cornelia Huck
2020-09-14 11:46                 ` Cornelia Huck
2020-09-15 10:25                   ` Vineeth Vijayan
2020-09-15 10:31                     ` Cornelia Huck
2020-04-23 15:55   ` Halil Pasic
2020-04-23 16:27     ` Cornelia Huck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.