* [RFD] uevent handling for subchannels @ 2020-04-03 10:40 Cornelia Huck 2020-04-17 12:38 ` Cornelia Huck 0 siblings, 1 reply; 14+ messages in thread From: Cornelia Huck @ 2020-04-03 10:40 UTC (permalink / raw) To: Peter Oberparleiter, Vineeth Vijayan Cc: linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski Hi, this is kind-of-a-followup to the uevent patches I sent in <20200327124503.9794-1-cohuck@redhat.com> last Friday. Currently, the common I/O layer will suppress uevents for subchannels that are being registered, delegating generating a delayed ADD uevent to the driver that actually binds to it and only generating the uevent itself if no driver gets bound. The initial version of that delaying was introduced in fa1a8c23eb7d ("s390: cio: Delay uevents for subchannels"); from what I remember, we were seeing quite bad storms of uevents on LPARs that had a lot of I/O subchannels with no device accessible through them. So while there's definitely a good reason for wanting to delay uevents, it is also introducing problems. One is udev rules for subchannels that are supposed to do something before a driver binds (e.g. setting driver_override to bind an I/O subchannel to vfio_ccw instead of io_subchannel) are not effective, as the ADD uevent will only be generated when the io_subchannel driver is already done with doing all setup. Another one is that only the ADD uevent is generated after uevent suppression is lifted; any other uevents that might have been generated are lost. So, what to do about this, especially in the light of vfio-ccw handling? One idea I had is to call css_sch_is_valid() from css_register_subchannel(); this would exclude the largest class of non-operational subchannels already (those that don't have a valid device; I'm not quite sure if there's also something needed for EADM subchannels?) If we got rid of the uevent delaying, we would still get ADD/REMOVE events for subchannels where the device turns out to be non-accessible, but I believe (hope) that those are not too many in a sane system at least. As a bonus, we could also add additional values from the pmcw to the uevent; the device number, for example, could be helpful for vfio-ccw matching rules. A drawback is that we change the timing (not the sequence, AFAICS) of the uevents, which might break brittle setups. Thoughts? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFD] uevent handling for subchannels 2020-04-03 10:40 [RFD] uevent handling for subchannels Cornelia Huck @ 2020-04-17 12:38 ` Cornelia Huck 2020-04-20 15:29 ` Vineeth Vijayan 2020-04-23 15:55 ` Halil Pasic 0 siblings, 2 replies; 14+ messages in thread From: Cornelia Huck @ 2020-04-17 12:38 UTC (permalink / raw) To: Peter Oberparleiter, Vineeth Vijayan Cc: linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski Friendly ping. On Fri, 3 Apr 2020 12:40:32 +0200 Cornelia Huck <cohuck@redhat.com> wrote: > Hi, > > this is kind-of-a-followup to the uevent patches I sent in > <20200327124503.9794-1-cohuck@redhat.com> last Friday. > > Currently, the common I/O layer will suppress uevents for subchannels > that are being registered, delegating generating a delayed ADD uevent > to the driver that actually binds to it and only generating the uevent > itself if no driver gets bound. The initial version of that delaying > was introduced in fa1a8c23eb7d ("s390: cio: Delay uevents for > subchannels"); from what I remember, we were seeing quite bad storms of > uevents on LPARs that had a lot of I/O subchannels with no device > accessible through them. > > So while there's definitely a good reason for wanting to delay uevents, > it is also introducing problems. One is udev rules for subchannels that > are supposed to do something before a driver binds (e.g. setting > driver_override to bind an I/O subchannel to vfio_ccw instead of > io_subchannel) are not effective, as the ADD uevent will only be > generated when the io_subchannel driver is already done with doing all > setup. Another one is that only the ADD uevent is generated after > uevent suppression is lifted; any other uevents that might have been > generated are lost. > > So, what to do about this, especially in the light of vfio-ccw handling? > > One idea I had is to call css_sch_is_valid() from > css_register_subchannel(); this would exclude the largest class of > non-operational subchannels already (those that don't have a valid > device; I'm not quite sure if there's also something needed for EADM > subchannels?) If we got rid of the uevent delaying, we would still get > ADD/REMOVE events for subchannels where the device turns out to be > non-accessible, but I believe (hope) that those are not too many in a > sane system at least. As a bonus, we could also add additional values > from the pmcw to the uevent; the device number, for example, could be > helpful for vfio-ccw matching rules. > > A drawback is that we change the timing (not the sequence, AFAICS) of > the uevents, which might break brittle setups. > > Thoughts? > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFD] uevent handling for subchannels 2020-04-17 12:38 ` Cornelia Huck @ 2020-04-20 15:29 ` Vineeth Vijayan 2020-04-23 14:52 ` Vineeth Vijayan 2020-04-23 15:55 ` Halil Pasic 1 sibling, 1 reply; 14+ messages in thread From: Vineeth Vijayan @ 2020-04-20 15:29 UTC (permalink / raw) To: Cornelia Huck, Peter Oberparleiter, Vineeth Vijayan Cc: linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski Thank you. Looks like a reasonable idea. But a bit worried about the consequences of this change. So i am trying to do some tests on LPARs and play around a bit before i reply to you. For example, testing it on an LPAR which has got a large number of ccw-devices which are DEV_STATE_NOT_OPER. Let me get back to you after couple of tests. On 4/17/20 2:38 PM, Cornelia Huck wrote: > Friendly ping. > > On Fri, 3 Apr 2020 12:40:32 +0200 > Cornelia Huck <cohuck@redhat.com> wrote: > >> Hi, >> >> this is kind-of-a-followup to the uevent patches I sent in >> <20200327124503.9794-1-cohuck@redhat.com> last Friday. >> >> Currently, the common I/O layer will suppress uevents for subchannels >> that are being registered, delegating generating a delayed ADD uevent >> to the driver that actually binds to it and only generating the uevent >> itself if no driver gets bound. The initial version of that delaying >> was introduced in fa1a8c23eb7d ("s390: cio: Delay uevents for >> subchannels"); from what I remember, we were seeing quite bad storms of >> uevents on LPARs that had a lot of I/O subchannels with no device >> accessible through them. >> >> So while there's definitely a good reason for wanting to delay uevents, >> it is also introducing problems. One is udev rules for subchannels that >> are supposed to do something before a driver binds (e.g. setting >> driver_override to bind an I/O subchannel to vfio_ccw instead of >> io_subchannel) are not effective, as the ADD uevent will only be >> generated when the io_subchannel driver is already done with doing all >> setup. Another one is that only the ADD uevent is generated after >> uevent suppression is lifted; any other uevents that might have been >> generated are lost. >> >> So, what to do about this, especially in the light of vfio-ccw handling? >> >> One idea I had is to call css_sch_is_valid() from >> css_register_subchannel(); this would exclude the largest class of >> non-operational subchannels already (those that don't have a valid >> device; I'm not quite sure if there's also something needed for EADM >> subchannels?) If we got rid of the uevent delaying, we would still get >> ADD/REMOVE events for subchannels where the device turns out to be >> non-accessible, but I believe (hope) that those are not too many in a >> sane system at least. As a bonus, we could also add additional values >> from the pmcw to the uevent; the device number, for example, could be >> helpful for vfio-ccw matching rules. >> >> A drawback is that we change the timing (not the sequence, AFAICS) of >> the uevents, which might break brittle setups. >> >> Thoughts? >> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFD] uevent handling for subchannels 2020-04-20 15:29 ` Vineeth Vijayan @ 2020-04-23 14:52 ` Vineeth Vijayan 2020-04-23 16:20 ` Cornelia Huck 0 siblings, 1 reply; 14+ messages in thread From: Vineeth Vijayan @ 2020-04-23 14:52 UTC (permalink / raw) To: Cornelia Huck, Peter Oberparleiter, Vineeth Vijayan Cc: linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski Hi Corenelia, few cents on this, 1. css_register_subchannel() is called only for valid subchannels, which is taken care in the css_alloc_subchannel(). So Adding a second css_sch_is_valid() in css_register_subchannel() might not help us here. We still need to find a mechanism to avoid the performance impact because of the uevent-storm from IO-subchannels without any valid device on them. 2. We will have to find a way to get the AVAILABLE-VALID-CCW-device information from css layer, which would help vfio-ccw drivers to work with the uevents when it is not suppressed. Then we could also change the way ccw_device_call_sch_unregister() works, where the subchannel-unregister is happening from an upper layer. Vineeth On 4/20/20 5:29 PM, Vineeth Vijayan wrote: > Thank you. > > Looks like a reasonable idea. But a bit worried about the consequences of > this change. So i am trying to do some tests on LPARs and play around a > bit before i reply to you. For example, testing it on an LPAR which > has got a large > number of ccw-devices which are DEV_STATE_NOT_OPER. > > Let me get back to you after couple of tests. > > On 4/17/20 2:38 PM, Cornelia Huck wrote: >> Friendly ping. >> >> On Fri, 3 Apr 2020 12:40:32 +0200 >> Cornelia Huck <cohuck@redhat.com> wrote: >> >>> Hi, >>> >>> this is kind-of-a-followup to the uevent patches I sent in >>> <20200327124503.9794-1-cohuck@redhat.com> last Friday. >>> >>> Currently, the common I/O layer will suppress uevents for subchannels >>> that are being registered, delegating generating a delayed ADD uevent >>> to the driver that actually binds to it and only generating the uevent >>> itself if no driver gets bound. The initial version of that delaying >>> was introduced in fa1a8c23eb7d ("s390: cio: Delay uevents for >>> subchannels"); from what I remember, we were seeing quite bad storms of >>> uevents on LPARs that had a lot of I/O subchannels with no device >>> accessible through them. >>> >>> So while there's definitely a good reason for wanting to delay uevents, >>> it is also introducing problems. One is udev rules for subchannels that >>> are supposed to do something before a driver binds (e.g. setting >>> driver_override to bind an I/O subchannel to vfio_ccw instead of >>> io_subchannel) are not effective, as the ADD uevent will only be >>> generated when the io_subchannel driver is already done with doing all >>> setup. Another one is that only the ADD uevent is generated after >>> uevent suppression is lifted; any other uevents that might have been >>> generated are lost. >>> >>> So, what to do about this, especially in the light of vfio-ccw >>> handling? >>> >>> One idea I had is to call css_sch_is_valid() from >>> css_register_subchannel(); this would exclude the largest class of >>> non-operational subchannels already (those that don't have a valid >>> device; I'm not quite sure if there's also something needed for EADM >>> subchannels?) If we got rid of the uevent delaying, we would still get >>> ADD/REMOVE events for subchannels where the device turns out to be >>> non-accessible, but I believe (hope) that those are not too many in a >>> sane system at least. As a bonus, we could also add additional values >>> from the pmcw to the uevent; the device number, for example, could be >>> helpful for vfio-ccw matching rules. >>> >>> A drawback is that we change the timing (not the sequence, AFAICS) of >>> the uevents, which might break brittle setups. >>> >>> Thoughts? >>> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFD] uevent handling for subchannels 2020-04-23 14:52 ` Vineeth Vijayan @ 2020-04-23 16:20 ` Cornelia Huck 2020-04-27 10:10 ` Peter Oberparleiter 0 siblings, 1 reply; 14+ messages in thread From: Cornelia Huck @ 2020-04-23 16:20 UTC (permalink / raw) To: Vineeth Vijayan Cc: Peter Oberparleiter, Vineeth Vijayan, linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski On Thu, 23 Apr 2020 16:52:24 +0200 Vineeth Vijayan <vneethv@linux.vnet.ibm.com> wrote: > Hi Corenelia, > > few cents on this, > > 1. css_register_subchannel() is called only for valid subchannels, which > is taken care in the > css_alloc_subchannel(). So Adding a second css_sch_is_valid() in > css_register_subchannel() > might not help us here. We still need to find a mechanism to avoid the > performance impact > because of the uevent-storm from IO-subchannels without any valid device > on them. Ah, I missed that. But I'm wondering whether the number of non-operational devices that will end up not being registered is actually that high in a normal setup. The really bad case, as I understand it, is 0 ... n ... m ... 0xffff <nothing> <dev> <nothing> <dev> <nothing> where we end up with large numbers of subchannels with !dnv prior to n and between n and m. (On LPAR; z/VM and QEMU will usually have mostly consecutive devices-on-subchannels, unless there has been a huge amount of hotplug been going on.) In this case, the !dnv check already prevents us from even registering the device, so the only problematic devices left are those where we fail to successfully drive I/O to -- are these very common on sane setups? (The code has seen some revisions since I introduced that suppression stuff, maybe I'm missing something.) > > 2. We will have to find a way to get the AVAILABLE-VALID-CCW-device > information from css layer, > which would help vfio-ccw drivers to work with the uevents when it is > not suppressed. But if we bind the subchannel to vfio-ccw, we do not have any ccw device, right? Or am I misunderstanding? > Then we could also change the way ccw_device_call_sch_unregister() > works, where > the subchannel-unregister is happening from an upper layer. Hm, what's the problem here? This seems to be mostly a case of "we did I/O to the device and it appeared not operational; so we go ahead and unregister the subchannel"? Childless I/O subchannels are a bit useless. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFD] uevent handling for subchannels 2020-04-23 16:20 ` Cornelia Huck @ 2020-04-27 10:10 ` Peter Oberparleiter 2020-04-30 10:43 ` Cornelia Huck 0 siblings, 1 reply; 14+ messages in thread From: Peter Oberparleiter @ 2020-04-27 10:10 UTC (permalink / raw) To: Cornelia Huck, Vineeth Vijayan Cc: Vineeth Vijayan, linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski On 23.04.2020 18:20, Cornelia Huck wrote: > On Thu, 23 Apr 2020 16:52:24 +0200 > Vineeth Vijayan <vneethv@linux.vnet.ibm.com> wrote: >> Then we could also change the way ccw_device_call_sch_unregister() >> works, where >> the subchannel-unregister is happening from an upper layer. > > Hm, what's the problem here? This seems to be mostly a case of "we did > I/O to the device and it appeared not operational; so we go ahead and > unregister the subchannel"? Childless I/O subchannels are a bit useless. Hey Conny, sparked by your proposal, Vineeth and myself looked at the corresponding CIO code and wondered if things couldn't be done in a generally better/cleaner way. So here we'd like to get your opinion. In particular, as it is currently, a child-driver (IO subchannel driver, vfio-ccw, etc.) unregisters a device owned by a parent-device-driver (CSS), which feels from a high-level-view like a layering violation: only the parent driver should register and unregister the parent device. Also in case no subchannel driver is available (e.g. due to driver_override=none), there would be no subchannel ADD event at all. So, tapping into you historical expertise about CIO, is there any reason for doing it this way beyond being nice to userspace tooling that subchannels with non-working CCW devices are automatically hidden by unregistering them? Removing the child-unregisters-parent logic this would also enable manual rebind of subchannels for which only a different driver than the default one can successfully talk to the child device, though I'm unaware of any current application for that. Regards, Peter -- Peter Oberparleiter Linux on Z Development - IBM Germany ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFD] uevent handling for subchannels 2020-04-27 10:10 ` Peter Oberparleiter @ 2020-04-30 10:43 ` Cornelia Huck 2020-06-29 11:56 ` Cornelia Huck 0 siblings, 1 reply; 14+ messages in thread From: Cornelia Huck @ 2020-04-30 10:43 UTC (permalink / raw) To: Peter Oberparleiter Cc: Vineeth Vijayan, Vineeth Vijayan, linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski On Mon, 27 Apr 2020 12:10:17 +0200 Peter Oberparleiter <oberpar@linux.ibm.com> wrote: > On 23.04.2020 18:20, Cornelia Huck wrote: > > On Thu, 23 Apr 2020 16:52:24 +0200 > > Vineeth Vijayan <vneethv@linux.vnet.ibm.com> wrote: > >> Then we could also change the way ccw_device_call_sch_unregister() > >> works, where > >> the subchannel-unregister is happening from an upper layer. > > > > Hm, what's the problem here? This seems to be mostly a case of "we did > > I/O to the device and it appeared not operational; so we go ahead and > > unregister the subchannel"? Childless I/O subchannels are a bit useless. > > Hey Conny, > > sparked by your proposal, Vineeth and myself looked at the corresponding > CIO code and wondered if things couldn't be done in a generally > better/cleaner way. So here we'd like to get your opinion. > > In particular, as it is currently, a child-driver (IO subchannel driver, > vfio-ccw, etc.) unregisters a device owned by a parent-device-driver > (CSS), which feels from a high-level-view like a layering violation: > only the parent driver should register and unregister the parent device. > Also in case no subchannel driver is available (e.g. due to > driver_override=none), there would be no subchannel ADD event at all. Doesn't the base css code generate the uevent in that case? > > So, tapping into you historical expertise about CIO, is there any reason > for doing it this way beyond being nice to userspace tooling that > subchannels with non-working CCW devices are automatically hidden by > unregistering them? We always had ccw devices behind I/O subchannels, but that has not been the case since we introduced vfio-ccw, so hopefully everybody can deal with that. The rationale behind this was that device-less I/O subchannels were deemed to be useless; I currently can't remember another reason. What about EADM, btw? CHSC does not have a device, and message does not have a driver. > > Removing the child-unregisters-parent logic this would also enable > manual rebind of subchannels for which only a different driver than the > default one can successfully talk to the child device, though I'm > unaware of any current application for that. Yes. Let me think about that some more (no clear head currently, sorry.) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFD] uevent handling for subchannels 2020-04-30 10:43 ` Cornelia Huck @ 2020-06-29 11:56 ` Cornelia Huck 2020-07-01 9:23 ` Cornelia Huck 0 siblings, 1 reply; 14+ messages in thread From: Cornelia Huck @ 2020-06-29 11:56 UTC (permalink / raw) To: Peter Oberparleiter Cc: Vineeth Vijayan, Vineeth Vijayan, linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski On Thu, 30 Apr 2020 12:43:16 +0200 Cornelia Huck <cohuck@redhat.com> wrote: <It's been some time, but this topic has recently popped up again.> > On Mon, 27 Apr 2020 12:10:17 +0200 > Peter Oberparleiter <oberpar@linux.ibm.com> wrote: > > > On 23.04.2020 18:20, Cornelia Huck wrote: > > > On Thu, 23 Apr 2020 16:52:24 +0200 > > > Vineeth Vijayan <vneethv@linux.vnet.ibm.com> wrote: > > >> Then we could also change the way ccw_device_call_sch_unregister() > > >> works, where > > >> the subchannel-unregister is happening from an upper layer. > > > > > > Hm, what's the problem here? This seems to be mostly a case of "we did > > > I/O to the device and it appeared not operational; so we go ahead and > > > unregister the subchannel"? Childless I/O subchannels are a bit useless. > > > > Hey Conny, > > > > sparked by your proposal, Vineeth and myself looked at the corresponding > > CIO code and wondered if things couldn't be done in a generally > > better/cleaner way. So here we'd like to get your opinion. > > > > In particular, as it is currently, a child-driver (IO subchannel driver, > > vfio-ccw, etc.) unregisters a device owned by a parent-device-driver > > (CSS), which feels from a high-level-view like a layering violation: > > only the parent driver should register and unregister the parent device. > > Also in case no subchannel driver is available (e.g. due to > > driver_override=none), there would be no subchannel ADD event at all. > > Doesn't the base css code generate the uevent in that case? Just checked again, the code in css_register_subchannel() should indeed take care of the !driver case. But still, even better if we can get rid of it :) > > > > > So, tapping into you historical expertise about CIO, is there any reason > > for doing it this way beyond being nice to userspace tooling that > > subchannels with non-working CCW devices are automatically hidden by > > unregistering them? > > We always had ccw devices behind I/O subchannels, but that has not been > the case since we introduced vfio-ccw, so hopefully everybody can deal > with that. The rationale behind this was that device-less I/O > subchannels were deemed to be useless; I currently can't remember > another reason. > > What about EADM, btw? CHSC does not have a device, and message does not > have a driver. Just checked EADM; it does not have a child device. > > > > > Removing the child-unregisters-parent logic this would also enable > > manual rebind of subchannels for which only a different driver than the > > default one can successfully talk to the child device, though I'm > > unaware of any current application for that. > > Yes. > > Let me think about that some more (no clear head currently, sorry.) Ok, so I've resumed the thinking process, and I think getting rid of the "no I/O subchannel without functional device" approach is a good idea, and allows us to make handling driver matches more similar to everyone else. What changes would be needed? * The whole logic to suppress uevents for subchannels and generate one later will go. (Touches the various subchannel driver, including vfio-ccw.) * ccw_device_todo() can just unregister the ccw device, and there's no longer a need for ccw_device_call_sch_unregister(). (IIUC, this also covers setting disconnected devices offline.) * As the I/O subchannel driver now needs to deal with cases where no ccw device is available, the code for that needs to be checked. (That's probably the most time-consuming task.) Userspace should be fine with I/O subchannels without ccw device, that's nothing new. Does that sound reasonable? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFD] uevent handling for subchannels 2020-06-29 11:56 ` Cornelia Huck @ 2020-07-01 9:23 ` Cornelia Huck 2020-09-14 11:46 ` Cornelia Huck 0 siblings, 1 reply; 14+ messages in thread From: Cornelia Huck @ 2020-07-01 9:23 UTC (permalink / raw) To: Peter Oberparleiter Cc: Vineeth Vijayan, Vineeth Vijayan, linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski On Mon, 29 Jun 2020 13:56:31 +0200 Cornelia Huck <cohuck@redhat.com> wrote: > Ok, so I've resumed the thinking process, and I think getting rid of > the "no I/O subchannel without functional device" approach is a good > idea, and allows us to make handling driver matches more similar to > everyone else. As an aside, there's another odd construct: the I/O subchannel driver *always* binds to the subchannel device, even if there is a problem, and schedules an unregistration of the subchannel device on error. This was introduced because events from machine check handling are not processed if there isn't a driver (at least I thought back then that it was a good idea.) I think a more correct way to handle this would be to do the following: * If something doesn't work, clean up and return an error in the probe function. The subchannel device stays around, it's just not bound. * Have the css bus do some basic processing for subchannels not bound to any driver (e.g., check dnv/w). This would also make it possible to unregister dead message subchannels if a machine check is received for them (don't know if that's an actual problem in pratice.) > > What changes would be needed? > * The whole logic to suppress uevents for subchannels and generate one > later will go. (Touches the various subchannel driver, including > vfio-ccw.) > * ccw_device_todo() can just unregister the ccw device, and there's no > longer a need for ccw_device_call_sch_unregister(). (IIUC, this also > covers setting disconnected devices offline.) I'm actually not sure if unregistration-by-driver is the right thing for most cases (except for something like disconnected device removal), that should be done by the bus. Maybe something for later (don't fear, I don't plan to work on the common I/O layer again :) > * As the I/O subchannel driver now needs to deal with cases where no > ccw device is available, the code for that needs to be checked. > (That's probably the most time-consuming task.) Had a quick look, doesn't actually look too bad (most places already check for !cdev.) > > Userspace should be fine with I/O subchannels without ccw device, > that's nothing new. > > Does that sound reasonable? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFD] uevent handling for subchannels 2020-07-01 9:23 ` Cornelia Huck @ 2020-09-14 11:46 ` Cornelia Huck 2020-09-15 10:25 ` Vineeth Vijayan 0 siblings, 1 reply; 14+ messages in thread From: Cornelia Huck @ 2020-09-14 11:46 UTC (permalink / raw) To: Peter Oberparleiter Cc: Vineeth Vijayan, Vineeth Vijayan, linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski <casts "reanimate" on dead thread> On Wed, 1 Jul 2020 11:23:13 +0200 Cornelia Huck <cohuck@redhat.com> wrote: > On Mon, 29 Jun 2020 13:56:31 +0200 > Cornelia Huck <cohuck@redhat.com> wrote: > > > Ok, so I've resumed the thinking process, and I think getting rid of > > the "no I/O subchannel without functional device" approach is a good > > idea, and allows us to make handling driver matches more similar to > > everyone else. > > As an aside, there's another odd construct: the I/O subchannel driver > *always* binds to the subchannel device, even if there is a problem, > and schedules an unregistration of the subchannel device on error. This > was introduced because events from machine check handling are not > processed if there isn't a driver (at least I thought back then that it > was a good idea.) I think a more correct way to handle this would be to > do the following: > > * If something doesn't work, clean up and return an error in the probe > function. The subchannel device stays around, it's just not bound. > * Have the css bus do some basic processing for subchannels not bound > to any driver (e.g., check dnv/w). This would also make it possible > to unregister dead message subchannels if a machine check is received > for them (don't know if that's an actual problem in pratice.) > > > > > What changes would be needed? > > * The whole logic to suppress uevents for subchannels and generate one > > later will go. (Touches the various subchannel driver, including > > vfio-ccw.) > > * ccw_device_todo() can just unregister the ccw device, and there's no > > longer a need for ccw_device_call_sch_unregister(). (IIUC, this also > > covers setting disconnected devices offline.) > > I'm actually not sure if unregistration-by-driver is the right thing > for most cases (except for something like disconnected device removal), > that should be done by the bus. Maybe something for later (don't fear, > I don't plan to work on the common I/O layer again :) > > > * As the I/O subchannel driver now needs to deal with cases where no > > ccw device is available, the code for that needs to be checked. > > (That's probably the most time-consuming task.) > > Had a quick look, doesn't actually look too bad (most places already > check for !cdev.) > > > > > Userspace should be fine with I/O subchannels without ccw device, > > that's nothing new. > > > > Does that sound reasonable? Is anybody looking at this? The delayed uevent handling is a bit of a mess for management of vfio-ccw devices... ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFD] uevent handling for subchannels 2020-09-14 11:46 ` Cornelia Huck @ 2020-09-15 10:25 ` Vineeth Vijayan 2020-09-15 10:31 ` Cornelia Huck 0 siblings, 1 reply; 14+ messages in thread From: Vineeth Vijayan @ 2020-09-15 10:25 UTC (permalink / raw) To: Cornelia Huck Cc: Peter Oberparleiter, Vineeth Vijayan, linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski Hi Conny, Thank you for the ping. This is pending for a long time and we just resumed looking at this approach. Me and Peter had a short discussion on this and we are trying to do some cleaning as well on the CIO layer while working with the new approach. There are few uncertainties with the new approach, which we would like to clarify and i am trying to test it.For example, the impact on the way cio-console driver works. So, as discussed with Peter, let me make a smaller draft which will consolidate your approach and the "cleaning" part that we would like to do. As mentioned, i will resume working on this and get back to you with myfindings. Thank you, Vineeth On 9/14/20 1:46 PM, Cornelia Huck wrote: > <casts "reanimate" on dead thread> > > On Wed, 1 Jul 2020 11:23:13 +0200 > Cornelia Huck <cohuck@redhat.com> wrote: > >> On Mon, 29 Jun 2020 13:56:31 +0200 >> Cornelia Huck <cohuck@redhat.com> wrote: >> >>> Ok, so I've resumed the thinking process, and I think getting rid of >>> the "no I/O subchannel without functional device" approach is a good >>> idea, and allows us to make handling driver matches more similar to >>> everyone else. >> As an aside, there's another odd construct: the I/O subchannel driver >> *always* binds to the subchannel device, even if there is a problem, >> and schedules an unregistration of the subchannel device on error. This >> was introduced because events from machine check handling are not >> processed if there isn't a driver (at least I thought back then that it >> was a good idea.) I think a more correct way to handle this would be to >> do the following: >> >> * If something doesn't work, clean up and return an error in the probe >> function. The subchannel device stays around, it's just not bound. >> * Have the css bus do some basic processing for subchannels not bound >> to any driver (e.g., check dnv/w). This would also make it possible >> to unregister dead message subchannels if a machine check is received >> for them (don't know if that's an actual problem in pratice.) >> >>> What changes would be needed? >>> * The whole logic to suppress uevents for subchannels and generate one >>> later will go. (Touches the various subchannel driver, including >>> vfio-ccw.) >>> * ccw_device_todo() can just unregister the ccw device, and there's no >>> longer a need for ccw_device_call_sch_unregister(). (IIUC, this also >>> covers setting disconnected devices offline.) >> I'm actually not sure if unregistration-by-driver is the right thing >> for most cases (except for something like disconnected device removal), >> that should be done by the bus. Maybe something for later (don't fear, >> I don't plan to work on the common I/O layer again :) >> >>> * As the I/O subchannel driver now needs to deal with cases where no >>> ccw device is available, the code for that needs to be checked. >>> (That's probably the most time-consuming task.) >> Had a quick look, doesn't actually look too bad (most places already >> check for !cdev.) >> >>> Userspace should be fine with I/O subchannels without ccw device, >>> that's nothing new. >>> >>> Does that sound reasonable? > Is anybody looking at this? The delayed uevent handling is a bit of a > mess for management of vfio-ccw devices... > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFD] uevent handling for subchannels 2020-09-15 10:25 ` Vineeth Vijayan @ 2020-09-15 10:31 ` Cornelia Huck 0 siblings, 0 replies; 14+ messages in thread From: Cornelia Huck @ 2020-09-15 10:31 UTC (permalink / raw) To: Vineeth Vijayan Cc: Peter Oberparleiter, Vineeth Vijayan, linux-s390, Eric Farman, Halil Pasic, Boris Fiuczynski On Tue, 15 Sep 2020 12:25:32 +0200 Vineeth Vijayan <vneethv@linux.vnet.ibm.com> wrote: > Hi Conny, > Thank you for the ping. This is pending for a long time and we just resumed > looking at this approach. Me and Peter had a short discussion on this and we > are trying to do some cleaning as well on the CIO layer while working > with the > new approach. > > There are few uncertainties with the new approach, which we would like > to clarify > and i am trying to test it.For example, the impact on the way > cio-console driver works. Yes, the console might be fun. > So, as discussed with Peter, let me make a smaller draft which will > consolidate > your approach and the "cleaning" part that we would like to do. > > As mentioned, i will resume working on this and get back to you with > myfindings. Thanks for the update; I'll do my best to find some time to look at any patches you send. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFD] uevent handling for subchannels 2020-04-17 12:38 ` Cornelia Huck 2020-04-20 15:29 ` Vineeth Vijayan @ 2020-04-23 15:55 ` Halil Pasic 2020-04-23 16:27 ` Cornelia Huck 1 sibling, 1 reply; 14+ messages in thread From: Halil Pasic @ 2020-04-23 15:55 UTC (permalink / raw) To: Cornelia Huck Cc: Peter Oberparleiter, Vineeth Vijayan, linux-s390, Eric Farman, Boris Fiuczynski On Fri, 17 Apr 2020 14:38:11 +0200 Cornelia Huck <cohuck@redhat.com> wrote: > Friendly ping. > Sorry for the late answer. I prefer to let Vineeth give us his opinion first. I did invest some 30 minutes in understanding the problem, but I'm not sure I understood it properly. According to my current understanding, the current state of affairs is a mess, and the proposed change wouldn't make the situation substantially cleaner, but it would help with the problem at hand. Conny, do you have more background information on uevent suppression (is there some sort of a generic contract between kernel and userspace for uevent suppression)? From a quick grep it seems to me that most of the uses are about being nice to userspace in a sense, that we want to make sure that when event is received by userspace it can do it's thing, and does not have to wait until the kernel has finished with the stuff that needs to be done to reach a state of affairs that can be considered normal. Regards, Halil > On Fri, 3 Apr 2020 12:40:32 +0200 > Cornelia Huck <cohuck@redhat.com> wrote: > > > Hi, > > > > this is kind-of-a-followup to the uevent patches I sent in > > <20200327124503.9794-1-cohuck@redhat.com> last Friday. > > > > Currently, the common I/O layer will suppress uevents for subchannels > > that are being registered, delegating generating a delayed ADD uevent > > to the driver that actually binds to it and only generating the uevent > > itself if no driver gets bound. The initial version of that delaying > > was introduced in fa1a8c23eb7d ("s390: cio: Delay uevents for > > subchannels"); from what I remember, we were seeing quite bad storms of > > uevents on LPARs that had a lot of I/O subchannels with no device > > accessible through them. [..] > > Thoughts? > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFD] uevent handling for subchannels 2020-04-23 15:55 ` Halil Pasic @ 2020-04-23 16:27 ` Cornelia Huck 0 siblings, 0 replies; 14+ messages in thread From: Cornelia Huck @ 2020-04-23 16:27 UTC (permalink / raw) To: Halil Pasic Cc: Peter Oberparleiter, Vineeth Vijayan, linux-s390, Eric Farman, Boris Fiuczynski On Thu, 23 Apr 2020 17:55:59 +0200 Halil Pasic <pasic@linux.ibm.com> wrote: > On Fri, 17 Apr 2020 14:38:11 +0200 > Cornelia Huck <cohuck@redhat.com> wrote: > > > Friendly ping. > > > > Sorry for the late answer. I prefer to let Vineeth give us his opinion > first. I did invest some 30 minutes in understanding the problem, but > I'm not sure I understood it properly. According to my current > understanding, the current state of affairs is a mess, and the proposed > change wouldn't make the situation substantially cleaner, but it would > help with the problem at hand. > > Conny, do you have more background information on uevent suppression > (is there some sort of a generic contract between kernel and userspace > for uevent suppression)? I'm not aware of a 'generic contract'; it's probably a lot of 'we thought it would be a good idea to do it like that'. > > From a quick grep it seems to me that most of the uses are about being > nice to userspace in a sense, that we want to make sure that when > event is received by userspace it can do it's thing, and does not have > to wait until the kernel has finished with the stuff that needs to be > done to reach a state of affairs that can be considered normal. The general idea is that once userspace is notified of the existence of a device, it can go ahead and try to make use of it. This is not always true; sometimes, there's a logic problem in how devices are initialized, sometimes, we want indeed to avoid userspace do work that will turn out to be useless. The latter was the intention with uevent suppression for I/O subchannels: do not notify userspace of the existence of an I/O subchannel, unless there's a valid ccw device behind it. Of course, that leads to trouble for vfio-ccw. Another idea: Have the add uevent for subchannels in all cases, but adapt any udev rules that do anything other than set driver_override to do nothing unless a (to be added) change uevent comes along. (And yes, I agree that this is a mess.) > > Regards, > Halil > > > > > On Fri, 3 Apr 2020 12:40:32 +0200 > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > Hi, > > > > > > this is kind-of-a-followup to the uevent patches I sent in > > > <20200327124503.9794-1-cohuck@redhat.com> last Friday. > > > > > > Currently, the common I/O layer will suppress uevents for subchannels > > > that are being registered, delegating generating a delayed ADD uevent > > > to the driver that actually binds to it and only generating the uevent > > > itself if no driver gets bound. The initial version of that delaying > > > was introduced in fa1a8c23eb7d ("s390: cio: Delay uevents for > > > subchannels"); from what I remember, we were seeing quite bad storms of > > > uevents on LPARs that had a lot of I/O subchannels with no device > > > accessible through them. > [..] > > > Thoughts? > > > > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2020-09-15 10:31 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-04-03 10:40 [RFD] uevent handling for subchannels Cornelia Huck 2020-04-17 12:38 ` Cornelia Huck 2020-04-20 15:29 ` Vineeth Vijayan 2020-04-23 14:52 ` Vineeth Vijayan 2020-04-23 16:20 ` Cornelia Huck 2020-04-27 10:10 ` Peter Oberparleiter 2020-04-30 10:43 ` Cornelia Huck 2020-06-29 11:56 ` Cornelia Huck 2020-07-01 9:23 ` Cornelia Huck 2020-09-14 11:46 ` Cornelia Huck 2020-09-15 10:25 ` Vineeth Vijayan 2020-09-15 10:31 ` Cornelia Huck 2020-04-23 15:55 ` Halil Pasic 2020-04-23 16:27 ` Cornelia Huck
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.