* [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") @ 2017-04-07 18:17 Ville Syrjälä 2017-04-07 18:38 ` Eric Dumazet 2017-04-21 11:40 ` Ville Syrjälä 0 siblings, 2 replies; 17+ messages in thread From: Ville Syrjälä @ 2017-04-07 18:17 UTC (permalink / raw) To: Eric Dumazet; +Cc: David S. Miller, netdev, linux-kernel Hi, My old P3 laptop started to die on me in the middle of larger compile jobs (using distcc) after v4.11-rc<something>. I bisected the problem to 617f01211baf ("8139too: use napi_complete_done()"). Unfortunately I wasn't able to capture a full oops as the machine doesn't have serial and ramoops failed me. I did get one partial oops on vgacon which showed rtl8139_poll() being involved (EIP was around _raw_spin_unlock_irqrestore() supposedly), so seems to agree with my bisect result. So maybe some kind of nasty thing going between the hard irq and softirq? Perhaps UP related? I tried to stare at the locking around rtl8139_poll() for a while but it looked mostly sane to me. -- Ville Syrjälä Intel OTC ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-04-07 18:17 [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") Ville Syrjälä @ 2017-04-07 18:38 ` Eric Dumazet 2017-04-07 18:42 ` David Miller ` (2 more replies) 2017-04-21 11:40 ` Ville Syrjälä 1 sibling, 3 replies; 17+ messages in thread From: Eric Dumazet @ 2017-04-07 18:38 UTC (permalink / raw) To: Ville Syrjälä Cc: Eric Dumazet, David S. Miller, netdev, linux-kernel On Fri, 2017-04-07 at 21:17 +0300, Ville Syrjälä wrote: > Hi, > > My old P3 laptop started to die on me in the middle of larger compile > jobs (using distcc) after v4.11-rc<something>. I bisected the problem > to 617f01211baf ("8139too: use napi_complete_done()"). > > Unfortunately I wasn't able to capture a full oops as the machine doesn't > have serial and ramoops failed me. I did get one partial oops on vgacon > which showed rtl8139_poll() being involved (EIP was around > _raw_spin_unlock_irqrestore() supposedly), so seems to agree with my > bisect result. > > So maybe some kind of nasty thing going between the hard irq and > softirq? Perhaps UP related? I tried to stare at the locking around > rtl8139_poll() for a while but it looked mostly sane to me. > Thanks a lot for the detective work, I am so sorry for this ! Could you try the following patch ? I do not really see what could be wrong, the code should run just fine on UP. Thanks.
diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c index 89631753e79962d91456d93b71929af768917da1..cd2dbec331dd796f5296cd378561b3443f231673 100644 --- a/drivers/net/ethernet/realtek/8139too.c +++ b/drivers/net/ethernet/realtek/8139too.c @@ -2135,11 +2135,12 @@ static int rtl8139_poll(struct napi_struct *napi, int budget) if (likely(RTL_R16(IntrStatus) & RxAckBits)) work_done += rtl8139_rx(dev, tp, budget); - if (work_done < budget && napi_complete_done(napi, work_done)) { + if (work_done < budget) { unsigned long flags; spin_lock_irqsave(&tp->lock, flags); - RTL_W16_F(IntrMask, rtl8139_intr_mask); + if (napi_complete_done(napi, work_done)) + RTL_W16_F(IntrMask, rtl8139_intr_mask); spin_unlock_irqrestore(&tp->lock, flags); } spin_unlock(&tp->rx_lock); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-04-07 18:38 ` Eric Dumazet @ 2017-04-07 18:42 ` David Miller 2017-04-08 10:23 ` Francois Romieu 2017-04-10 12:11 ` Ville Syrjälä 2017-06-19 14:44 ` Michal Kubecek 2 siblings, 1 reply; 17+ messages in thread From: David Miller @ 2017-04-07 18:42 UTC (permalink / raw) To: eric.dumazet; +Cc: ville.syrjala, edumazet, netdev, linux-kernel From: Eric Dumazet <eric.dumazet@gmail.com> Date: Fri, 07 Apr 2017 11:38:49 -0700 > I do not really see what could be wrong, the code should run just fine > on UP. One theory is that the interrupt masking isn't working properly and interrupts are still arriving and hitting the NAPI state even when we are actively polling NAPI. And this problem was masked by the locking done here. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-04-07 18:42 ` David Miller @ 2017-04-08 10:23 ` Francois Romieu 2017-04-10 14:22 ` tedheadster 0 siblings, 1 reply; 17+ messages in thread From: Francois Romieu @ 2017-04-08 10:23 UTC (permalink / raw) To: ville.syrjala; +Cc: David Miller, eric.dumazet, edumazet, netdev, linux-kernel David Miller <davem@davemloft.net> : [...] > One theory is that the interrupt masking isn't working properly > and interrupts are still arriving and hitting the NAPI state even > when we are actively polling NAPI. > > And this problem was masked by the locking done here. Yes. Ville, can you rule out irq sharing between the 8139 and some other device? It's a candidate for unexpected interrupt handler invocation on older PCs, even with properly working hardware. -- Ueimor ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-04-08 10:23 ` Francois Romieu @ 2017-04-10 14:22 ` tedheadster 2017-04-10 14:55 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: tedheadster @ 2017-04-10 14:22 UTC (permalink / raw) To: Francois Romieu Cc: ville.syrjala, David Miller, eric.dumazet, edumazet, netdev, linux-kernel On Sat, Apr 8, 2017 at 6:23 AM, Francois Romieu <romieu@fr.zoreil.com> wrote: > David Miller <davem@davemloft.net> : > [...] >> One theory is that the interrupt masking isn't working properly >> and interrupts are still arriving and hitting the NAPI state even >> when we are actively polling NAPI. >> >> And this problem was masked by the locking done here. > > Yes. > > Ville, can you rule out irq sharing between the 8139 and some other > device ? It's a candidate for unexpected interrupt handler invocation > with older pc, even with properly working hardware. > Eric, If napi_complete_done() calls could affect drivers on older hardware, I can test the following: drivers/net/ethernet/3com/typhoon.c drivers/net/ethernet/amd/pcnet32.c drivers/net/ethernet/broadcom/tg3.c drivers/net/ethernet/dec/tulip/interrupt.c drivers/net/ethernet/intel/e100.c drivers/net/ethernet/intel/e1000/e1000_main.c drivers/net/ethernet/smsc/epic100.c drivers/net/ethernet/via/via-rhine.c - Matthew ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-04-10 14:22 ` tedheadster @ 2017-04-10 14:55 ` Eric Dumazet 0 siblings, 0 replies; 17+ messages in thread From: Eric Dumazet @ 2017-04-10 14:55 UTC (permalink / raw) To: whiteheadm Cc: Francois Romieu, ville.syrjala, David Miller, Eric Dumazet, netdev, linux-kernel On Mon, Apr 10, 2017 at 7:22 AM, tedheadster <tedheadster@gmail.com> wrote: > On Sat, Apr 8, 2017 at 6:23 AM, Francois Romieu <romieu@fr.zoreil.com> wrote: >> David Miller <davem@davemloft.net> : >> [...] >>> One theory is that the interrupt masking isn't working properly >>> and interrupts are still arriving and hitting the NAPI state even >>> when we are actively polling NAPI. >>> >>> And this problem was masked by the locking done here. >> >> Yes. >> >> Ville, can you rule out irq sharing between the 8139 and some other >> device ? It's a candidate for unexpected interrupt handler invocation >> with older pc, even with properly working hardware. >> > > Eric, > If napi_complete_done() calls could affect drivers on older > hardware, I can test the following: > > drivers/net/ethernet/3com/typhoon.c > drivers/net/ethernet/amd/pcnet32.c > drivers/net/ethernet/broadcom/tg3.c > drivers/net/ethernet/dec/tulip/interrupt.c > drivers/net/ethernet/intel/e100.c > drivers/net/ethernet/intel/e1000/e1000_main.c > drivers/net/ethernet/smsc/epic100.c > drivers/net/ethernet/via/via-rhine.c That would be great, thanks Matthew. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-04-07 18:38 ` Eric Dumazet 2017-04-07 18:42 ` David Miller @ 2017-04-10 12:11 ` Ville Syrjälä 2017-09-18 19:46 ` Ville Syrjälä 2017-06-19 14:44 ` Michal Kubecek 2 siblings, 1 reply; 17+ messages in thread From: Ville Syrjälä @ 2017-04-10 12:11 UTC (permalink / raw) To: Eric Dumazet; +Cc: Eric Dumazet, David S. Miller, netdev, linux-kernel On Fri, Apr 07, 2017 at 11:38:49AM -0700, Eric Dumazet wrote: > On Fri, 2017-04-07 at 21:17 +0300, Ville Syrjälä wrote: > > Hi, > > > > My old P3 laptop started to die on me in the middle of larger compile > > jobs (using distcc) after v4.11-rc<something>. I bisected the problem > > to 617f01211baf ("8139too: use napi_complete_done()"). > > > > Unfortunately I wasn't able to capture a full oops as the machine doesn't > > have serial and ramoops failed me. I did get one partial oops on vgacon > > which showed rtl8139_poll() being involved (EIP was around > > _raw_spin_unlock_irqrestore() supposedly), so seems to agree with my > > bisect result. > > > > So maybe some kind of nasty thing going between the hard irq and > > softirq? Perhaps UP related? I tried to stare at the locking around > > rtl8139_poll() for a while but it looked mostly sane to me. > > > > Thanks a lot for the detective work, I am so sorry for this ! > > Could you try the following patch ? > > I do not really see what could be wrong, the code should run just fine > on UP. > > Thanks. 
> > diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c > index 89631753e79962d91456d93b71929af768917da1..cd2dbec331dd796f5296cd378561b3443f231673 100644 > --- a/drivers/net/ethernet/realtek/8139too.c > +++ b/drivers/net/ethernet/realtek/8139too.c > @@ -2135,11 +2135,12 @@ static int rtl8139_poll(struct napi_struct *napi, int budget) > if (likely(RTL_R16(IntrStatus) & RxAckBits)) > work_done += rtl8139_rx(dev, tp, budget); > > - if (work_done < budget && napi_complete_done(napi, work_done)) { > + if (work_done < budget) { > unsigned long flags; > > spin_lock_irqsave(&tp->lock, flags); > - RTL_W16_F(IntrMask, rtl8139_intr_mask); > + if (napi_complete_done(napi, work_done)) > + RTL_W16_F(IntrMask, rtl8139_intr_mask); > spin_unlock_irqrestore(&tp->lock, flags); > } > spin_unlock(&tp->rx_lock); > > Yep, that patch does appear to make it stable again. Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> -- Ville Syrjälä Intel OTC ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-04-10 12:11 ` Ville Syrjälä @ 2017-09-18 19:46 ` Ville Syrjälä 2017-09-18 19:52 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Ville Syrjälä @ 2017-09-18 19:46 UTC (permalink / raw) To: Eric Dumazet; +Cc: Eric Dumazet, David S. Miller, netdev, linux-kernel On Mon, Apr 10, 2017 at 03:11:02PM +0300, Ville Syrjälä wrote: > On Fri, Apr 07, 2017 at 11:38:49AM -0700, Eric Dumazet wrote: > > On Fri, 2017-04-07 at 21:17 +0300, Ville Syrjälä wrote: > > > Hi, > > > > > > My old P3 laptop started to die on me in the middle of larger compile > > > jobs (using distcc) after v4.11-rc<something>. I bisected the problem > > > to 617f01211baf ("8139too: use napi_complete_done()"). > > > > > > Unfortunately I wasn't able to capture a full oops as the machine doesn't > > > have serial and ramoops failed me. I did get one partial oops on vgacon > > > which showed rtl8139_poll() being involved (EIP was around > > > _raw_spin_unlock_irqrestore() supposedly), so seems to agree with my > > > bisect result. > > > > > > So maybe some kind of nasty thing going between the hard irq and > > > softirq? Perhaps UP related? I tried to stare at the locking around > > > rtl8139_poll() for a while but it looked mostly sane to me. > > > > > > > Thanks a lot for the detective work, I am so sorry for this ! > > > > Could you try the following patch ? > > > > I do not really see what could be wrong, the code should run just fine > > on UP. > > > > Thanks. 
> > > > diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c > > index 89631753e79962d91456d93b71929af768917da1..cd2dbec331dd796f5296cd378561b3443f231673 100644 > > --- a/drivers/net/ethernet/realtek/8139too.c > > +++ b/drivers/net/ethernet/realtek/8139too.c > > @@ -2135,11 +2135,12 @@ static int rtl8139_poll(struct napi_struct *napi, int budget) > > if (likely(RTL_R16(IntrStatus) & RxAckBits)) > > work_done += rtl8139_rx(dev, tp, budget); > > > > - if (work_done < budget && napi_complete_done(napi, work_done)) { > > + if (work_done < budget) { > > unsigned long flags; > > > > spin_lock_irqsave(&tp->lock, flags); > > - RTL_W16_F(IntrMask, rtl8139_intr_mask); > > + if (napi_complete_done(napi, work_done)) > > + RTL_W16_F(IntrMask, rtl8139_intr_mask); > > spin_unlock_irqrestore(&tp->lock, flags); > > } > > spin_unlock(&tp->rx_lock); > > > > > > Yep, that patch does appear to make it stable again. > > Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> And five months later I'm still waiting for this patch to land... -- Ville Syrjälä Intel OTC ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-09-18 19:46 ` Ville Syrjälä @ 2017-09-18 19:52 ` Eric Dumazet 2017-09-19 12:45 ` Ville Syrjälä 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2017-09-18 19:52 UTC (permalink / raw) To: Ville Syrjälä; +Cc: Eric Dumazet, David S. Miller, netdev, LKML On Mon, Sep 18, 2017 at 12:46 PM, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote: > And five months later I'm still waiting for this patch to land... Sure, while I was on vacation, I received thousands of emails that I simply have not read. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-09-18 19:52 ` Eric Dumazet @ 2017-09-19 12:45 ` Ville Syrjälä 2017-09-19 13:51 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Ville Syrjälä @ 2017-09-19 12:45 UTC (permalink / raw) To: Eric Dumazet; +Cc: Eric Dumazet, David S. Miller, netdev, LKML On Mon, Sep 18, 2017 at 12:52:15PM -0700, Eric Dumazet wrote: > On Mon, Sep 18, 2017 at 12:46 PM, Ville Syrjälä > <ville.syrjala@linux.intel.com> wrote: > > > And five months later I'm still waiting for this patch to land... > > Sure, while I was on vacation, I received thousands of emails that I > simply have not read. Ah that explains where my previous ping went. Pardon the snarky comment, and thanks for taking care of this. -- Ville Syrjälä Intel OTC ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-09-19 12:45 ` Ville Syrjälä @ 2017-09-19 13:51 ` Eric Dumazet 0 siblings, 0 replies; 17+ messages in thread From: Eric Dumazet @ 2017-09-19 13:51 UTC (permalink / raw) To: Ville Syrjälä; +Cc: Eric Dumazet, David S. Miller, netdev, LKML On Tue, 2017-09-19 at 15:45 +0300, Ville Syrjälä wrote: > On Mon, Sep 18, 2017 at 12:52:15PM -0700, Eric Dumazet wrote: > > On Mon, Sep 18, 2017 at 12:46 PM, Ville Syrjälä > > <ville.syrjala@linux.intel.com> wrote: > > > > > And five months later I'm still waiting for this patch to land... > > > > Sure, while I was on vacation, I received thousands of emails that I > > simply have not read. > > Ah that explains where my previous ping went. Pardon the snarky comment, > and thanks for taking care of this. > No problem, this was my fault not following this bug. Thanks a lot ! ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-04-07 18:38 ` Eric Dumazet 2017-04-07 18:42 ` David Miller 2017-04-10 12:11 ` Ville Syrjälä @ 2017-06-19 14:44 ` Michal Kubecek 2017-07-05 19:53 ` Ville Syrjälä 2 siblings, 1 reply; 17+ messages in thread From: Michal Kubecek @ 2017-06-19 14:44 UTC (permalink / raw) To: Eric Dumazet Cc: Ville Syrjälä, Eric Dumazet, David S. Miller, netdev, linux-kernel, Francois Romieu, whiteheadm On Fri, Apr 07, 2017 at 11:38:49AM -0700, Eric Dumazet wrote: > On Fri, 2017-04-07 at 21:17 +0300, Ville Syrjälä wrote: > > Hi, > > > > My old P3 laptop started to die on me in the middle of larger compile > > jobs (using distcc) after v4.11-rc<something>. I bisected the problem > > to 617f01211baf ("8139too: use napi_complete_done()"). > > > > Unfortunately I wasn't able to capture a full oops as the machine doesn't > > have serial and ramoops failed me. I did get one partial oops on vgacon > > which showed rtl8139_poll() being involved (EIP was around > > _raw_spin_unlock_irqrestore() supposedly), so seems to agree with my > > bisect result. > > > > So maybe some kind of nasty thing going between the hard irq and > > softirq? Perhaps UP related? I tried to stare at the locking around > > rtl8139_poll() for a while but it looked mostly sane to me. > > > > Thanks a lot for the detective work, I am so sorry for this ! > > Could you try the following patch ? > > I do not really see what could be wrong, the code should run just fine > on UP. > > Thanks. 
> > diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c > index 89631753e79962d91456d93b71929af768917da1..cd2dbec331dd796f5296cd378561b3443f231673 100644 > --- a/drivers/net/ethernet/realtek/8139too.c > +++ b/drivers/net/ethernet/realtek/8139too.c > @@ -2135,11 +2135,12 @@ static int rtl8139_poll(struct napi_struct *napi, int budget) > if (likely(RTL_R16(IntrStatus) & RxAckBits)) > work_done += rtl8139_rx(dev, tp, budget); > > - if (work_done < budget && napi_complete_done(napi, work_done)) { > + if (work_done < budget) { > unsigned long flags; > > spin_lock_irqsave(&tp->lock, flags); > - RTL_W16_F(IntrMask, rtl8139_intr_mask); > + if (napi_complete_done(napi, work_done)) > + RTL_W16_F(IntrMask, rtl8139_intr_mask); > spin_unlock_irqrestore(&tp->lock, flags); > } > spin_unlock(&tp->rx_lock); Eric, we have a bugreport of what seems to be the same problem: https://bugzilla.suse.com/show_bug.cgi?id=1042208 Do you plan to submit the patch above or is the conclusion that this is rather a hardware problem? Michal Kubecek ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-06-19 14:44 ` Michal Kubecek @ 2017-07-05 19:53 ` Ville Syrjälä 0 siblings, 0 replies; 17+ messages in thread From: Ville Syrjälä @ 2017-07-05 19:53 UTC (permalink / raw) To: Michal Kubecek Cc: Eric Dumazet, Eric Dumazet, David S. Miller, netdev, linux-kernel, Francois Romieu, whiteheadm On Mon, Jun 19, 2017 at 04:44:45PM +0200, Michal Kubecek wrote: > On Fri, Apr 07, 2017 at 11:38:49AM -0700, Eric Dumazet wrote: > > On Fri, 2017-04-07 at 21:17 +0300, Ville Syrjälä wrote: > > > Hi, > > > > > > My old P3 laptop started to die on me in the middle of larger compile > > > jobs (using distcc) after v4.11-rc<something>. I bisected the problem > > > to 617f01211baf ("8139too: use napi_complete_done()"). > > > > > > Unfortunately I wasn't able to capture a full oops as the machine doesn't > > > have serial and ramoops failed me. I did get one partial oops on vgacon > > > which showed rtl8139_poll() being involved (EIP was around > > > _raw_spin_unlock_irqrestore() supposedly), so seems to agree with my > > > bisect result. > > > > > > So maybe some kind of nasty thing going between the hard irq and > > > softirq? Perhaps UP related? I tried to stare at the locking around > > > rtl8139_poll() for a while but it looked mostly sane to me. > > > > > > > Thanks a lot for the detective work, I am so sorry for this ! > > > > Could you try the following patch ? > > > > I do not really see what could be wrong, the code should run just fine > > on UP. > > > > Thanks. 
> > > > diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c > > index 89631753e79962d91456d93b71929af768917da1..cd2dbec331dd796f5296cd378561b3443f231673 100644 > > --- a/drivers/net/ethernet/realtek/8139too.c > > +++ b/drivers/net/ethernet/realtek/8139too.c > > @@ -2135,11 +2135,12 @@ static int rtl8139_poll(struct napi_struct *napi, int budget) > > if (likely(RTL_R16(IntrStatus) & RxAckBits)) > > work_done += rtl8139_rx(dev, tp, budget); > > > > - if (work_done < budget && napi_complete_done(napi, work_done)) { > > + if (work_done < budget) { > > unsigned long flags; > > > > spin_lock_irqsave(&tp->lock, flags); > > - RTL_W16_F(IntrMask, rtl8139_intr_mask); > > + if (napi_complete_done(napi, work_done)) > > + RTL_W16_F(IntrMask, rtl8139_intr_mask); > > spin_unlock_irqrestore(&tp->lock, flags); > > } > > spin_unlock(&tp->rx_lock); > > Eric, > > we have a bugreport of what seems to be the same problem: > > https://bugzilla.suse.com/show_bug.cgi?id=1042208 > > Do you plan to submit the patch above or is the conclusion that this is > rather a hardware problem? Could someone please push this patch forward? My machine just died again with a fresh kernel because I forgot that I still need this patch. -- Ville Syrjälä Intel OTC ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-04-07 18:17 [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") Ville Syrjälä 2017-04-07 18:38 ` Eric Dumazet @ 2017-04-21 11:40 ` Ville Syrjälä 2017-04-21 13:29 ` Eric Dumazet 1 sibling, 1 reply; 17+ messages in thread From: Ville Syrjälä @ 2017-04-21 11:40 UTC (permalink / raw) To: Eric Dumazet; +Cc: David S. Miller, netdev, linux-kernel BTW I've also been getting some lockdep grief from r8169 netpoll stuff recently. Not sure if it might be related to these changes or not, but I don't remember seeing this sort of stuff until quite recently. [ 251.911044] ====================================================== [ 251.911044] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ] [ 251.911045] 4.11.0-rc6-elk+ #101 Not tainted [ 251.911045] ------------------------------------------------------ [ 251.911046] kms_plane_blink/2132 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 251.911046] (&syncp->seq#2){+.-...}, at: [<c14398a3>] netpoll_poll_dev+0xb3/0x2a0 [ 251.911048] [ 251.911048] and this task is already holding: [ 251.911049] (target_list_lock){-.....}, at: [<f858693f>] write_msg+0x3f/0xd0 [netconsole] [ 251.911050] which would create a new lock dependency: [ 251.911050] (target_list_lock){-.....} -> (&syncp->seq#2){+.-...} [ 251.911053] [ 251.911053] but this new dependency connects a HARDIRQ-irq-safe lock: [ 251.911054] (target_list_lock){-.....} [ 251.911055] ... 
which became HARDIRQ-irq-safe at: [ 251.911055] __lock_acquire+0x6ec/0x1440 [ 251.911056] lock_acquire+0x7e/0x1a0 [ 251.911056] _raw_spin_lock_irqsave+0x2b/0x40 [ 251.911056] write_msg+0x3f/0xd0 [netconsole] [ 251.911057] console_unlock+0x41c/0x570 [ 251.911057] vprintk_emit+0x26d/0x350 [ 251.911058] vprintk_default+0x12/0x20 [ 251.911058] vprintk_func+0x1c/0x50 [ 251.911058] printk+0xe/0x10 [ 251.911059] drm_printk+0x5a/0x60 [drm] [ 251.911059] intel_get_hpd_pins+0x87/0xa0 [i915] [ 251.911060] i9xx_hpd_irq_handler+0x96/0xe0 [i915] [ 251.911060] i965_irq_handler+0x2c8/0x2f0 [i915] [ 251.911060] __handle_irq_event_percpu+0x38/0x360 [ 251.911061] handle_irq_event_percpu+0x19/0x50 [ 251.911061] handle_irq_event+0x29/0x50 [ 251.911062] handle_edge_irq+0x65/0x110 [ 251.911062] handle_irq+0x9f/0xc0 [ 251.911062] do_IRQ+0x4f/0x110 [ 251.911063] common_interrupt+0x36/0x3c [ 251.911063] cpuidle_enter_state+0xcc/0x390 [ 251.911064] cpuidle_enter+0xf/0x20 [ 251.911064] call_cpuidle+0x1c/0x40 [ 251.911064] do_idle+0x164/0x1d0 [ 251.911065] cpu_startup_entry+0x1d/0x20 [ 251.911065] start_secondary+0x104/0x170 [ 251.911066] startup_32_smp+0x16b/0x16d [ 251.911066] [ 251.911066] to a HARDIRQ-irq-unsafe lock: [ 251.911067] (&syncp->seq#2){+.-...} [ 251.911068] ... which became HARDIRQ-irq-unsafe at: [ 251.911068] ... 
__lock_acquire+0x5d7/0x1440 [ 251.911069] lock_acquire+0x7e/0x1a0 [ 251.911069] rtl8169_poll+0x474/0x620 [r8169] [ 251.911070] net_rx_action+0x1d0/0x3f0 [ 251.911070] __do_softirq+0x17d/0x460 [ 251.911070] do_softirq_own_stack+0x1d/0x30 [ 251.911071] irq_exit+0xa5/0xb0 [ 251.911071] do_IRQ+0x58/0x110 [ 251.911072] common_interrupt+0x36/0x3c [ 251.911072] cpuidle_enter_state+0xcc/0x390 [ 251.911072] cpuidle_enter+0xf/0x20 [ 251.911073] call_cpuidle+0x1c/0x40 [ 251.911073] do_idle+0x164/0x1d0 [ 251.911073] cpu_startup_entry+0x1d/0x20 [ 251.911074] rest_init+0x10c/0x120 [ 251.911074] start_kernel+0x30d/0x312 [ 251.911075] i386_start_kernel+0x85/0x89 [ 251.911075] startup_32_smp+0x16b/0x16d [ 251.911075] [ 251.911076] other info that might help us debug this: [ 251.911076] [ 251.911077] Possible interrupt unsafe locking scenario: [ 251.911077] [ 251.911077] CPU0 CPU1 [ 251.911078] ---- ---- [ 251.911078] lock(&syncp->seq#2); [ 251.911079] local_irq_disable(); [ 251.911080] lock(target_list_lock); [ 251.911081] lock(&syncp->seq#2); [ 251.911082] <Interrupt> [ 251.911082] lock(target_list_lock); [ 251.911083] [ 251.911084] *** DEADLOCK *** [ 251.911084] [ 251.911084] 2 locks held by kms_plane_blink/2132: [ 251.911085] #0: (console_lock){+.+.+.}, at: [<c10ad3b4>] vprintk_emit+0x264/0x350 [ 251.911086] #1: (target_list_lock){-.....}, at: [<f858693f>] write_msg+0x3f/0xd0 [netconsole] [ 251.911088] [ 251.911088] the dependencies between HARDIRQ-irq-safe lock and the holding lock: [ 251.911089] -> (target_list_lock){-.....} ops: 482 { [ 251.911090] IN-HARDIRQ-W at: [ 251.911091] __lock_acquire+0x6ec/0x1440 [ 251.911092] lock_acquire+0x7e/0x1a0 [ 251.911092] _raw_spin_lock_irqsave+0x2b/0x40 [ 251.911092] write_msg+0x3f/0xd0 [netconsole] [ 251.911093] console_unlock+0x41c/0x570 [ 251.911093] vprintk_emit+0x26d/0x350 [ 251.911094] vprintk_default+0x12/0x20 [ 251.911094] vprintk_func+0x1c/0x50 [ 251.911095] printk+0xe/0x10 [ 251.911095] drm_printk+0x5a/0x60 [drm] [ 251.911096] 
intel_get_hpd_pins+0x87/0xa0 [i915] [ 251.911096] i9xx_hpd_irq_handler+0x96/0xe0 [i915] [ 251.911097] i965_irq_handler+0x2c8/0x2f0 [i915] [ 251.911097] __handle_irq_event_percpu+0x38/0x360 [ 251.911098] handle_irq_event_percpu+0x19/0x50 [ 251.911098] handle_irq_event+0x29/0x50 [ 251.911098] handle_edge_irq+0x65/0x110 [ 251.911099] handle_irq+0x9f/0xc0 [ 251.911099] do_IRQ+0x4f/0x110 [ 251.911100] common_interrupt+0x36/0x3c [ 251.911100] cpuidle_enter_state+0xcc/0x390 [ 251.911101] cpuidle_enter+0xf/0x20 [ 251.911101] call_cpuidle+0x1c/0x40 [ 251.911102] do_idle+0x164/0x1d0 [ 251.911102] cpu_startup_entry+0x1d/0x20 [ 251.911102] start_secondary+0x104/0x170 [ 251.911103] startup_32_smp+0x16b/0x16d [ 251.911103] INITIAL USE at: [ 251.911104] __lock_acquire+0x221/0x1440 [ 251.911104] lock_acquire+0x7e/0x1a0 [ 251.911105] _raw_spin_lock_irqsave+0x2b/0x40 [ 251.911105] make_netconsole_target+0x80/0xab [netconsole] [ 251.911106] configfs_mkdir+0x312/0x3b0 [configfs] [ 251.911106] vfs_mkdir+0xd2/0x1b0 [ 251.911107] SyS_mkdir+0x57/0xc0 [ 251.911107] do_fast_syscall_32+0x80/0x430 [ 251.911108] entry_SYSENTER_32+0x4c/0x7b [ 251.911108] } [ 251.911108] ... key at: [<f85882f0>] target_list_lock+0x10/0xffffed20 [netconsole] [ 251.911109] ... 
acquired at: [ 251.911109] check_irq_usage+0x42/0xb0 [ 251.911110] __lock_acquire+0xfbf/0x1440 [ 251.911110] lock_acquire+0x7e/0x1a0 [ 251.911110] rtl8169_poll+0x474/0x620 [r8169] [ 251.911111] netpoll_poll_dev+0xb3/0x2a0 [ 251.911111] netpoll_send_skb_on_dev+0x180/0x240 [ 251.911112] netpoll_send_udp+0x292/0x440 [ 251.911112] write_msg+0x9c/0xd0 [netconsole] [ 251.911112] console_unlock+0x41c/0x570 [ 251.911113] vprintk_emit+0x26d/0x350 [ 251.911113] printk_emit+0x1b/0x1d [ 251.911114] devkmsg_write+0xeb/0x140 [ 251.911114] __vfs_write+0xb0/0x110 [ 251.911114] vfs_write+0xa1/0x1e0 [ 251.911115] SyS_write+0x3d/0x90 [ 251.911115] do_fast_syscall_32+0x80/0x430 [ 251.911116] entry_SYSENTER_32+0x4c/0x7b [ 251.911116] [ 251.911116] [ 251.911117] the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock: [ 251.911117] -> (&syncp->seq#2){+.-...} ops: 7439 { [ 251.911119] HARDIRQ-ON-W at: [ 251.911120] __lock_acquire+0x5d7/0x1440 [ 251.911120] lock_acquire+0x7e/0x1a0 [ 251.911121] rtl8169_poll+0x474/0x620 [r8169] [ 251.911121] net_rx_action+0x1d0/0x3f0 [ 251.911122] __do_softirq+0x17d/0x460 [ 251.911122] do_softirq_own_stack+0x1d/0x30 [ 251.911122] irq_exit+0xa5/0xb0 [ 251.911123] do_IRQ+0x58/0x110 [ 251.911123] common_interrupt+0x36/0x3c [ 251.911124] cpuidle_enter_state+0xcc/0x390 [ 251.911124] cpuidle_enter+0xf/0x20 [ 251.911125] call_cpuidle+0x1c/0x40 [ 251.911125] do_idle+0x164/0x1d0 [ 251.911126] cpu_startup_entry+0x1d/0x20 [ 251.911126] rest_init+0x10c/0x120 [ 251.911127] start_kernel+0x30d/0x312 [ 251.911127] i386_start_kernel+0x85/0x89 [ 251.911127] startup_32_smp+0x16b/0x16d [ 251.911128] IN-SOFTIRQ-W at: [ 251.911129] __lock_acquire+0x5b5/0x1440 [ 251.911129] lock_acquire+0x7e/0x1a0 [ 251.911129] rtl8169_poll+0x474/0x620 [r8169] [ 251.911130] net_rx_action+0x1d0/0x3f0 [ 251.911130] __do_softirq+0x17d/0x460 [ 251.911131] do_softirq_own_stack+0x1d/0x30 [ 251.911131] irq_exit+0xa5/0xb0 [ 251.911132] do_IRQ+0x58/0x110 [ 251.911132] 
common_interrupt+0x36/0x3c [ 251.911133] cpuidle_enter_state+0xcc/0x390 [ 251.911133] cpuidle_enter+0xf/0x20 [ 251.911133] call_cpuidle+0x1c/0x40 [ 251.911134] do_idle+0x164/0x1d0 [ 251.911134] cpu_startup_entry+0x1d/0x20 [ 251.911135] rest_init+0x10c/0x120 [ 251.911135] start_kernel+0x30d/0x312 [ 251.911136] i386_start_kernel+0x85/0x89 [ 251.911136] startup_32_smp+0x16b/0x16d [ 251.911136] INITIAL USE at: [ 251.911137] __lock_acquire+0x221/0x1440 [ 251.911138] lock_acquire+0x7e/0x1a0 [ 251.911138] rtl8169_get_stats64+0x81/0x320 [r8169] [ 251.911139] dev_get_stats+0x30/0xa0 [ 251.911139] rtnl_fill_stats+0x30/0x110 [ 251.911139] rtnl_fill_ifinfo+0x4be/0xd80 [ 251.911140] rtmsg_ifinfo_build_skb+0x45/0xc0 [ 251.911140] rtmsg_ifinfo.part.33+0xd/0x30 [ 251.911141] rtmsg_ifinfo+0x1b/0x20 [ 251.911141] register_netdevice+0x495/0x5e0 [ 251.911142] register_netdev+0x12/0x20 [ 251.911142] rtl_init_one+0x86d/0x1060 [r8169] [ 251.911143] pci_device_probe+0x8b/0x100 [ 251.911143] driver_probe_device+0x1ad/0x2a0 [ 251.911144] __driver_attach+0x89/0x90 [ 251.911144] bus_for_each_dev+0x4f/0x80 [ 251.911144] driver_attach+0x14/0x20 [ 251.911145] [ 251.911146] Lost 63 message(s)! -- Ville Syrjälä Intel OTC ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-04-21 11:40 ` Ville Syrjälä @ 2017-04-21 13:29 ` Eric Dumazet 2017-04-21 15:09 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2017-04-21 13:29 UTC (permalink / raw) To: Ville Syrjälä Cc: Eric Dumazet, David S. Miller, netdev, linux-kernel On Fri, 2017-04-21 at 14:40 +0300, Ville Syrjälä wrote: > BTW I've also been getting some lockdep grief from r8169 netpoll stuff > recently. Not sure if it might be related to these changes or not, but > I don't remember seeing this sort of stuff until quite recently. > > [ 251.911044] ====================================================== > [ 251.911044] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ] > [ 251.911045] 4.11.0-rc6-elk+ #101 Not tainted > [ 251.911045] ------------------------------------------------------ > [ 251.911046] kms_plane_blink/2132 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: > [ 251.911046] (&syncp->seq#2){+.-...}, at: [<c14398a3>] netpoll_poll_dev+0xb3/0x2a0 > [ 251.911048] > [ 251.911048] and this task is already holding: > [ 251.911049] (target_list_lock){-.....}, at: [<f858693f>] write_msg+0x3f/0xd0 [netconsole] > [ 251.911050] which would create a new lock dependency: > [ 251.911050] (target_list_lock){-.....} -> (&syncp->seq#2){+.-...} > [ 251.911053] > [ 251.911053] but this new dependency connects a HARDIRQ-irq-safe lock: > [ 251.911054] (target_list_lock){-.....} > [ 251.911055] ... 
which became HARDIRQ-irq-safe at: > [ 251.911055] __lock_acquire+0x6ec/0x1440 > [ 251.911056] lock_acquire+0x7e/0x1a0 > [ 251.911056] _raw_spin_lock_irqsave+0x2b/0x40 > [ 251.911056] write_msg+0x3f/0xd0 [netconsole] > [ 251.911057] console_unlock+0x41c/0x570 > [ 251.911057] vprintk_emit+0x26d/0x350 > [ 251.911058] vprintk_default+0x12/0x20 > [ 251.911058] vprintk_func+0x1c/0x50 > [ 251.911058] printk+0xe/0x10 > [ 251.911059] drm_printk+0x5a/0x60 [drm] > [ 251.911059] intel_get_hpd_pins+0x87/0xa0 [i915] > [ 251.911060] i9xx_hpd_irq_handler+0x96/0xe0 [i915] > [ 251.911060] i965_irq_handler+0x2c8/0x2f0 [i915] > [ 251.911060] __handle_irq_event_percpu+0x38/0x360 > [ 251.911061] handle_irq_event_percpu+0x19/0x50 > [ 251.911061] handle_irq_event+0x29/0x50 > [ 251.911062] handle_edge_irq+0x65/0x110 > [ 251.911062] handle_irq+0x9f/0xc0 > [ 251.911062] do_IRQ+0x4f/0x110 > [ 251.911063] common_interrupt+0x36/0x3c > [ 251.911063] cpuidle_enter_state+0xcc/0x390 > [ 251.911064] cpuidle_enter+0xf/0x20 > [ 251.911064] call_cpuidle+0x1c/0x40 > [ 251.911064] do_idle+0x164/0x1d0 > [ 251.911065] cpu_startup_entry+0x1d/0x20 > [ 251.911065] start_secondary+0x104/0x170 > [ 251.911066] startup_32_smp+0x16b/0x16d > [ 251.911066] > [ 251.911066] to a HARDIRQ-irq-unsafe lock: > [ 251.911067] (&syncp->seq#2){+.-...} > [ 251.911068] ... which became HARDIRQ-irq-unsafe at: > [ 251.911068] ... 
__lock_acquire+0x5d7/0x1440 > [ 251.911069] lock_acquire+0x7e/0x1a0 > [ 251.911069] rtl8169_poll+0x474/0x620 [r8169] > [ 251.911070] net_rx_action+0x1d0/0x3f0 > [ 251.911070] __do_softirq+0x17d/0x460 > [ 251.911070] do_softirq_own_stack+0x1d/0x30 > [ 251.911071] irq_exit+0xa5/0xb0 > [ 251.911071] do_IRQ+0x58/0x110 > [ 251.911072] common_interrupt+0x36/0x3c > [ 251.911072] cpuidle_enter_state+0xcc/0x390 > [ 251.911072] cpuidle_enter+0xf/0x20 > [ 251.911073] call_cpuidle+0x1c/0x40 > [ 251.911073] do_idle+0x164/0x1d0 > [ 251.911073] cpu_startup_entry+0x1d/0x20 > [ 251.911074] rest_init+0x10c/0x120 > [ 251.911074] start_kernel+0x30d/0x312 > [ 251.911075] i386_start_kernel+0x85/0x89 > [ 251.911075] startup_32_smp+0x16b/0x16d > [ 251.911075] > [ 251.911076] other info that might help us debug this: > [ 251.911076] > [ 251.911077] Possible interrupt unsafe locking scenario: > [ 251.911077] > [ 251.911077] CPU0 CPU1 > [ 251.911078] ---- ---- > [ 251.911078] lock(&syncp->seq#2); > [ 251.911079] local_irq_disable(); > [ 251.911080] lock(target_list_lock); > [ 251.911081] lock(&syncp->seq#2); > [ 251.911082] <Interrupt> > [ 251.911082] lock(target_list_lock); > [ 251.911083] > [ 251.911084] *** DEADLOCK *** > [ 251.911084] > [ 251.911084] 2 locks held by kms_plane_blink/2132: > [ 251.911085] #0: (console_lock){+.+.+.}, at: [<c10ad3b4>] vprintk_emit+0x264/0x350 > [ 251.911086] #1: (target_list_lock){-.....}, at: [<f858693f>] write_msg+0x3f/0xd0 [netconsole] > [ 251.911088] > [ 251.911088] the dependencies between HARDIRQ-irq-safe lock and the holding lock: > [ 251.911089] -> (target_list_lock){-.....} ops: 482 { > [ 251.911090] IN-HARDIRQ-W at: > [ 251.911091] __lock_acquire+0x6ec/0x1440 > [ 251.911092] lock_acquire+0x7e/0x1a0 > [ 251.911092] _raw_spin_lock_irqsave+0x2b/0x40 > [ 251.911092] write_msg+0x3f/0xd0 [netconsole] > [ 251.911093] console_unlock+0x41c/0x570 > [ 251.911093] vprintk_emit+0x26d/0x350 > [ 251.911094] vprintk_default+0x12/0x20 > [ 251.911094] 
vprintk_func+0x1c/0x50 > [ 251.911095] printk+0xe/0x10 > [ 251.911095] drm_printk+0x5a/0x60 [drm] > [ 251.911096] intel_get_hpd_pins+0x87/0xa0 [i915] > [ 251.911096] i9xx_hpd_irq_handler+0x96/0xe0 [i915] > [ 251.911097] i965_irq_handler+0x2c8/0x2f0 [i915] > [ 251.911097] __handle_irq_event_percpu+0x38/0x360 > [ 251.911098] handle_irq_event_percpu+0x19/0x50 > [ 251.911098] handle_irq_event+0x29/0x50 > [ 251.911098] handle_edge_irq+0x65/0x110 > [ 251.911099] handle_irq+0x9f/0xc0 > [ 251.911099] do_IRQ+0x4f/0x110 > [ 251.911100] common_interrupt+0x36/0x3c > [ 251.911100] cpuidle_enter_state+0xcc/0x390 > [ 251.911101] cpuidle_enter+0xf/0x20 > [ 251.911101] call_cpuidle+0x1c/0x40 > [ 251.911102] do_idle+0x164/0x1d0 > [ 251.911102] cpu_startup_entry+0x1d/0x20 > [ 251.911102] start_secondary+0x104/0x170 > [ 251.911103] startup_32_smp+0x16b/0x16d > [ 251.911103] INITIAL USE at: > [ 251.911104] __lock_acquire+0x221/0x1440 > [ 251.911104] lock_acquire+0x7e/0x1a0 > [ 251.911105] _raw_spin_lock_irqsave+0x2b/0x40 > [ 251.911105] make_netconsole_target+0x80/0xab [netconsole] > [ 251.911106] configfs_mkdir+0x312/0x3b0 [configfs] > [ 251.911106] vfs_mkdir+0xd2/0x1b0 > [ 251.911107] SyS_mkdir+0x57/0xc0 > [ 251.911107] do_fast_syscall_32+0x80/0x430 > [ 251.911108] entry_SYSENTER_32+0x4c/0x7b > [ 251.911108] } > [ 251.911108] ... key at: [<f85882f0>] target_list_lock+0x10/0xffffed20 [netconsole] > [ 251.911109] ... 
acquired at: > [ 251.911109] check_irq_usage+0x42/0xb0 > [ 251.911110] __lock_acquire+0xfbf/0x1440 > [ 251.911110] lock_acquire+0x7e/0x1a0 > [ 251.911110] rtl8169_poll+0x474/0x620 [r8169] > [ 251.911111] netpoll_poll_dev+0xb3/0x2a0 > [ 251.911111] netpoll_send_skb_on_dev+0x180/0x240 > [ 251.911112] netpoll_send_udp+0x292/0x440 > [ 251.911112] write_msg+0x9c/0xd0 [netconsole] > [ 251.911112] console_unlock+0x41c/0x570 > [ 251.911113] vprintk_emit+0x26d/0x350 > [ 251.911113] printk_emit+0x1b/0x1d > [ 251.911114] devkmsg_write+0xeb/0x140 > [ 251.911114] __vfs_write+0xb0/0x110 > [ 251.911114] vfs_write+0xa1/0x1e0 > [ 251.911115] SyS_write+0x3d/0x90 > [ 251.911115] do_fast_syscall_32+0x80/0x430 > [ 251.911116] entry_SYSENTER_32+0x4c/0x7b > [ 251.911116] > [ 251.911116] > [ 251.911117] the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock: > [ 251.911117] -> (&syncp->seq#2){+.-...} ops: 7439 { > [ 251.911119] HARDIRQ-ON-W at: > [ 251.911120] __lock_acquire+0x5d7/0x1440 > [ 251.911120] lock_acquire+0x7e/0x1a0 > [ 251.911121] rtl8169_poll+0x474/0x620 [r8169] > [ 251.911121] net_rx_action+0x1d0/0x3f0 > [ 251.911122] __do_softirq+0x17d/0x460 > [ 251.911122] do_softirq_own_stack+0x1d/0x30 > [ 251.911122] irq_exit+0xa5/0xb0 > [ 251.911123] do_IRQ+0x58/0x110 > [ 251.911123] common_interrupt+0x36/0x3c > [ 251.911124] cpuidle_enter_state+0xcc/0x390 > [ 251.911124] cpuidle_enter+0xf/0x20 > [ 251.911125] call_cpuidle+0x1c/0x40 > [ 251.911125] do_idle+0x164/0x1d0 > [ 251.911126] cpu_startup_entry+0x1d/0x20 > [ 251.911126] rest_init+0x10c/0x120 > [ 251.911127] start_kernel+0x30d/0x312 > [ 251.911127] i386_start_kernel+0x85/0x89 > [ 251.911127] startup_32_smp+0x16b/0x16d > [ 251.911128] IN-SOFTIRQ-W at: > [ 251.911129] __lock_acquire+0x5b5/0x1440 > [ 251.911129] lock_acquire+0x7e/0x1a0 > [ 251.911129] rtl8169_poll+0x474/0x620 [r8169] > [ 251.911130] net_rx_action+0x1d0/0x3f0 > [ 251.911130] __do_softirq+0x17d/0x460 > [ 251.911131] do_softirq_own_stack+0x1d/0x30 
> [ 251.911131] irq_exit+0xa5/0xb0 > [ 251.911132] do_IRQ+0x58/0x110 > [ 251.911132] common_interrupt+0x36/0x3c > [ 251.911133] cpuidle_enter_state+0xcc/0x390 > [ 251.911133] cpuidle_enter+0xf/0x20 > [ 251.911133] call_cpuidle+0x1c/0x40 > [ 251.911134] do_idle+0x164/0x1d0 > [ 251.911134] cpu_startup_entry+0x1d/0x20 > [ 251.911135] rest_init+0x10c/0x120 > [ 251.911135] start_kernel+0x30d/0x312 > [ 251.911136] i386_start_kernel+0x85/0x89 > [ 251.911136] startup_32_smp+0x16b/0x16d > [ 251.911136] INITIAL USE at: > [ 251.911137] __lock_acquire+0x221/0x1440 > [ 251.911138] lock_acquire+0x7e/0x1a0 > [ 251.911138] rtl8169_get_stats64+0x81/0x320 [r8169] > [ 251.911139] dev_get_stats+0x30/0xa0 > [ 251.911139] rtnl_fill_stats+0x30/0x110 > [ 251.911139] rtnl_fill_ifinfo+0x4be/0xd80 > [ 251.911140] rtmsg_ifinfo_build_skb+0x45/0xc0 > [ 251.911140] rtmsg_ifinfo.part.33+0xd/0x30 > [ 251.911141] rtmsg_ifinfo+0x1b/0x20 > [ 251.911141] register_netdevice+0x495/0x5e0 > [ 251.911142] register_netdev+0x12/0x20 > [ 251.911142] rtl_init_one+0x86d/0x1060 [r8169] > [ 251.911143] pci_device_probe+0x8b/0x100 > [ 251.911143] driver_probe_device+0x1ad/0x2a0 > [ 251.911144] __driver_attach+0x89/0x90 > [ 251.911144] bus_for_each_dev+0x4f/0x80 > [ 251.911144] driver_attach+0x14/0x20 > [ 251.911145] > [ 251.911146] Lost 63 message(s)! > Thanks for this report. Interesting to see how many drivers got the netpoll stuff wrong :/ Can you try : diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 81f18a8335276495a59fa93219c4607c2b8a47aa..74e4c72c331d5a6cc5b653970ef4133c8ddf9999 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -7668,7 +7668,7 @@ static void rtl8169_netpoll(struct net_device *dev) { struct rtl8169_private *tp = netdev_priv(dev); - rtl8169_interrupt(tp->pci_dev->irq, dev); + napi_schedule(&tp->napi); } #endif ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-04-21 13:29 ` Eric Dumazet @ 2017-04-21 15:09 ` Eric Dumazet 2017-04-21 15:34 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2017-04-21 15:09 UTC (permalink / raw) To: Ville Syrjälä Cc: Eric Dumazet, David S. Miller, netdev, linux-kernel On Fri, 2017-04-21 at 06:29 -0700, Eric Dumazet wrote: > Thanks for this report. > > Interesting to see how many drivers got the netpoll stuff wrong :/ > > Can you try : > > diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c > index 81f18a8335276495a59fa93219c4607c2b8a47aa..74e4c72c331d5a6cc5b653970ef4133c8ddf9999 100644 > --- a/drivers/net/ethernet/realtek/r8169.c > +++ b/drivers/net/ethernet/realtek/r8169.c > @@ -7668,7 +7668,7 @@ static void rtl8169_netpoll(struct net_device *dev) > { > struct rtl8169_private *tp = netdev_priv(dev); > > - rtl8169_interrupt(tp->pci_dev->irq, dev); > + napi_schedule(&tp->napi); The problem is more likely that netconsole handling can call rtl_tx() from hard irq context, while standard NAPI poll calls it from BH. Meaning that the following sequence triggers a lockdep warning: u64_stats_update_begin(&tp->tx_stats.syncp); tp->tx_stats.packets++; tp->tx_stats.bytes += tx_skb->skb->len; u64_stats_update_end(&tp->tx_stats.syncp); Lockdep does not know that poll_napi() (called from netpoll_poll_dev()) uses a cmpxchg() to make sure that there is no race. I am not sure how we can teach lockdep not to splat in this case. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") 2017-04-21 15:09 ` Eric Dumazet @ 2017-04-21 15:34 ` Eric Dumazet 0 siblings, 0 replies; 17+ messages in thread From: Eric Dumazet @ 2017-04-21 15:34 UTC (permalink / raw) To: Ville Syrjälä Cc: Eric Dumazet, David S. Miller, netdev, linux-kernel On Fri, 2017-04-21 at 08:09 -0700, Eric Dumazet wrote: > On Fri, 2017-04-21 at 06:29 -0700, Eric Dumazet wrote: > > > Thanks for this report. > > > > Interesting to see how many drivers got the netpoll stuff wrong :/ > > > > Can you try : > > > > diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c > > index 81f18a8335276495a59fa93219c4607c2b8a47aa..74e4c72c331d5a6cc5b653970ef4133c8ddf9999 100644 > > --- a/drivers/net/ethernet/realtek/r8169.c > > +++ b/drivers/net/ethernet/realtek/r8169.c > > @@ -7668,7 +7668,7 @@ static void rtl8169_netpoll(struct net_device *dev) > > { > > struct rtl8169_private *tp = netdev_priv(dev); > > > > - rtl8169_interrupt(tp->pci_dev->irq, dev); > > + napi_schedule(&tp->napi); > > The problem is more likely that netconsole handling can call rtl_tx() > from hard irq context, while standard NAPI poll calls it from BH > > Meaning that the following sequence triggers a lockdep warning. > > u64_stats_update_begin(&tp->tx_stats.syncp); > tp->tx_stats.packets++; > tp->tx_stats.bytes += tx_skb->skb->len; > u64_stats_update_end(&tp->tx_stats.syncp); > > Lockdep does not know that poll_napi() ( called from netpoll_poll_dev()) > uses an cmpxchg() to make sure that there is no race. > > I am not sure how we can teach lockdep to not splat in this case. Well, could you try the following patch? Thanks!
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 0a8f2817ea60f2172eb28177473a4879f85bd18a..f64f812b86029b772bb245e51cdc2263adc4e6ea 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7313,10 +7313,10 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 			rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 					     tp->TxDescArray + entry);
 			if (status & LastFrag) {
-				u64_stats_update_begin(&tp->tx_stats.syncp);
+				u64_stats_update_begin_raw(&tp->tx_stats.syncp);
 				tp->tx_stats.packets++;
 				tp->tx_stats.bytes += tx_skb->skb->len;
-				u64_stats_update_end(&tp->tx_stats.syncp);
+				u64_stats_update_end_raw(&tp->tx_stats.syncp);
 				dev_kfree_skb_any(tx_skb->skb);
 				tx_skb->skb = NULL;
 			}

^ permalink raw reply related [flat|nested] 17+ messages in thread
end of thread, other threads:[~2017-09-19 13:51 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-04-07 18:17 [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()") Ville Syrjälä 2017-04-07 18:38 ` Eric Dumazet 2017-04-07 18:42 ` David Miller 2017-04-08 10:23 ` Francois Romieu 2017-04-10 14:22 ` tedheadster 2017-04-10 14:55 ` Eric Dumazet 2017-04-10 12:11 ` Ville Syrjälä 2017-09-18 19:46 ` Ville Syrjälä 2017-09-18 19:52 ` Eric Dumazet 2017-09-19 12:45 ` Ville Syrjälä 2017-09-19 13:51 ` Eric Dumazet 2017-06-19 14:44 ` Michal Kubecek 2017-07-05 19:53 ` Ville Syrjälä 2017-04-21 11:40 ` Ville Syrjälä 2017-04-21 13:29 ` Eric Dumazet 2017-04-21 15:09 ` Eric Dumazet 2017-04-21 15:34 ` Eric Dumazet