All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: linux-kernel@vger.kernel.org,
	Jesse Brandeburg <jesse.brandeburg@intel.com>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	Alexander Duyck <alexander.duyck@gmail.com>
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev
Date: Tue, 6 Apr 2021 11:48:02 -0700	[thread overview]
Message-ID: <20210406114734.0e00cb2f@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com> (raw)
In-Reply-To: <20210406123619.rhvtr73xwwlbu2ll@spock.localdomain>

On Tue, 6 Apr 2021 14:36:19 +0200 Oleksandr Natalenko wrote:
> Hello.
> 
> I've raised this here [1] first, but was suggested to engage igb devs,
> so here we are.
> 
> I'm experiencing the following woes while using netconsole regularly:
> 
> ```
> [22038.710800] ------------[ cut here ]------------
> [22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
> [22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0
> [22038.710802] Modules linked in: ...
> [22038.710835] CPU: 12 PID: 40362 Comm: systemd-sleep Not tainted 5.11.0-pf7 #1
> [22038.710836] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 3302 03/05/2021
> [22038.710836] RIP: 0010:netpoll_poll_dev+0x18a/0x1a0
> [22038.710837] Code: 6e ff 80 3d d2 9d f8 00 00 0f 85 5c ff ff ff 48 8b 73 28 48 c7 c7 0c b8 21 84 89 44 24 04 c6 05 b6 9d f8 00 01 e8 84 21 1c 00 <0f> 0b 8b 54 24 04 e9 36 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00
> [22038.710838] RSP: 0018:ffffb24106e37ba0 EFLAGS: 00010086
> [22038.710838] RAX: 0000000000000000 RBX: ffff9599d2929c50 RCX: ffff959f8ed1ac30
> [22038.710839] RDX: 0000000000000000 RSI: 0000000000000023 RDI: ffff959f8ed1ac28
> [22038.710839] RBP: ffff9598981d4058 R08: 0000000000000019 R09: ffffb24206e3796d
> [22038.710839] R10: ffffffffffffffff R11: ffffb24106e37968 R12: ffff959887e51ec8
> [22038.710840] R13: 000000000000000c R14: 00000000ffffffff R15: ffff9599d2929c60
> [22038.710840] FS:  00007f3ade370a40(0000) GS:ffff959f8ed00000(0000) knlGS:0000000000000000
> [22038.710841] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [22038.710841] CR2: 0000000000000000 CR3: 00000003017b0000 CR4: 0000000000350ee0
> [22038.710841] Call Trace:
> [22038.710842]  netpoll_send_skb+0x185/0x240
> [22038.710842]  write_msg+0xe5/0x100 [netconsole]
> [22038.710842]  console_unlock+0x37d/0x640
> [22038.710842]  ? __schedule+0x2e5/0xc90
> [22038.710843]  suspend_devices_and_enter+0x2ac/0x7f0
> [22038.710843]  pm_suspend.cold+0x321/0x36c
> [22038.710843]  state_store+0xa6/0x140
> [22038.710844]  kernfs_fop_write_iter+0x124/0x1b0
> [22038.710844]  new_sync_write+0x16a/0x200
> [22038.710844]  vfs_write+0x21c/0x2e0
> [22038.710844]  __x64_sys_write+0x6d/0xf0
> [22038.710845]  do_syscall_64+0x33/0x40
> [22038.710845]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [22038.710845] RIP: 0033:0x7f3adece10f7
> [22038.710846] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> [22038.710847] RSP: 002b:00007ffc51c555b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [22038.710847] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f3adece10f7
> [22038.710848] RDX: 0000000000000004 RSI: 00007ffc51c556a0 RDI: 0000000000000004
> [22038.710848] RBP: 00007ffc51c556a0 R08: 000055ea374302a0 R09: 00007f3aded770c0
> [22038.710849] R10: 00007f3aded76fc0 R11: 0000000000000246 R12: 0000000000000004
> [22038.710849] R13: 000055ea3742c430 R14: 0000000000000004 R15: 00007f3adedb3700
> [22038.710849] ---[ end trace 6eae54fbf23807f8 ]---
> ```
> 
> This one happened during suspend/resume cycle (on resume), followed by:
> 
> ```
> [22038.868669] igb 0000:05:00.0 enp5s0: Reset adapter
> [22040.998673] igb 0000:05:00.0 enp5s0: Reset adapter
> [22043.819198] igb 0000:05:00.0 enp5s0: igb: enp5s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> ```
> 
> I've bumped into a similar issue in BZ 211911 [2] (see c#16),
> and in c#17 it was suggested it was a separate unrelated issue,
> hence I'm raising a new concern.
> 
> Please help in finding out why this woe happens and in fixing it.
> 
> Thanks.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=212573
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=211911

Looks like igb_clean_tx_irq() should not return true if budget is 0,
ever, otherwise we risk hitting the min(work, budget - 1) which may
go negative.

So something like this? 

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index c9e8c65a3cfe..7a237b5311ca 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8028,7 +8028,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
        unsigned int i = tx_ring->next_to_clean;
 
        if (test_bit(__IGB_DOWN, &adapter->state))
-               return true;
+               goto out;
 
        tx_buffer = &tx_ring->tx_buffer_info[i];
        tx_desc = IGB_TX_DESC(tx_ring, i);
@@ -8157,7 +8157,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
                                            tx_ring->queue_index);
 
                        /* we are about to reset, no point in enabling stuff */
-                       return true;
+                       goto out;
                }
        }
 
@@ -8180,7 +8180,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
                        u64_stats_update_end(&tx_ring->tx_syncp);
                }
        }
-
+out:
        return !!budget;
 }

WARNING: multiple messages have this Message-ID (diff)
From: Jakub Kicinski <kuba@kernel.org>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [igb] netconsole triggers warning in netpoll_poll_dev
Date: Tue, 6 Apr 2021 11:48:02 -0700	[thread overview]
Message-ID: <20210406114734.0e00cb2f@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com> (raw)
In-Reply-To: <20210406123619.rhvtr73xwwlbu2ll@spock.localdomain>

On Tue, 6 Apr 2021 14:36:19 +0200 Oleksandr Natalenko wrote:
> Hello.
> 
> I've raised this here [1] first, but was suggested to engage igb devs,
> so here we are.
> 
> I'm experiencing the following woes while using netconsole regularly:
> 
> ```
> [22038.710800] ------------[ cut here ]------------
> [22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
> [22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0
> [22038.710802] Modules linked in: ...
> [22038.710835] CPU: 12 PID: 40362 Comm: systemd-sleep Not tainted 5.11.0-pf7 #1
> [22038.710836] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 3302 03/05/2021
> [22038.710836] RIP: 0010:netpoll_poll_dev+0x18a/0x1a0
> [22038.710837] Code: 6e ff 80 3d d2 9d f8 00 00 0f 85 5c ff ff ff 48 8b 73 28 48 c7 c7 0c b8 21 84 89 44 24 04 c6 05 b6 9d f8 00 01 e8 84 21 1c 00 <0f> 0b 8b 54 24 04 e9 36 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00
> [22038.710838] RSP: 0018:ffffb24106e37ba0 EFLAGS: 00010086
> [22038.710838] RAX: 0000000000000000 RBX: ffff9599d2929c50 RCX: ffff959f8ed1ac30
> [22038.710839] RDX: 0000000000000000 RSI: 0000000000000023 RDI: ffff959f8ed1ac28
> [22038.710839] RBP: ffff9598981d4058 R08: 0000000000000019 R09: ffffb24206e3796d
> [22038.710839] R10: ffffffffffffffff R11: ffffb24106e37968 R12: ffff959887e51ec8
> [22038.710840] R13: 000000000000000c R14: 00000000ffffffff R15: ffff9599d2929c60
> [22038.710840] FS:  00007f3ade370a40(0000) GS:ffff959f8ed00000(0000) knlGS:0000000000000000
> [22038.710841] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [22038.710841] CR2: 0000000000000000 CR3: 00000003017b0000 CR4: 0000000000350ee0
> [22038.710841] Call Trace:
> [22038.710842]  netpoll_send_skb+0x185/0x240
> [22038.710842]  write_msg+0xe5/0x100 [netconsole]
> [22038.710842]  console_unlock+0x37d/0x640
> [22038.710842]  ? __schedule+0x2e5/0xc90
> [22038.710843]  suspend_devices_and_enter+0x2ac/0x7f0
> [22038.710843]  pm_suspend.cold+0x321/0x36c
> [22038.710843]  state_store+0xa6/0x140
> [22038.710844]  kernfs_fop_write_iter+0x124/0x1b0
> [22038.710844]  new_sync_write+0x16a/0x200
> [22038.710844]  vfs_write+0x21c/0x2e0
> [22038.710844]  __x64_sys_write+0x6d/0xf0
> [22038.710845]  do_syscall_64+0x33/0x40
> [22038.710845]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [22038.710845] RIP: 0033:0x7f3adece10f7
> [22038.710846] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> [22038.710847] RSP: 002b:00007ffc51c555b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [22038.710847] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f3adece10f7
> [22038.710848] RDX: 0000000000000004 RSI: 00007ffc51c556a0 RDI: 0000000000000004
> [22038.710848] RBP: 00007ffc51c556a0 R08: 000055ea374302a0 R09: 00007f3aded770c0
> [22038.710849] R10: 00007f3aded76fc0 R11: 0000000000000246 R12: 0000000000000004
> [22038.710849] R13: 000055ea3742c430 R14: 0000000000000004 R15: 00007f3adedb3700
> [22038.710849] ---[ end trace 6eae54fbf23807f8 ]---
> ```
> 
> This one happened during suspend/resume cycle (on resume), followed by:
> 
> ```
> [22038.868669] igb 0000:05:00.0 enp5s0: Reset adapter
> [22040.998673] igb 0000:05:00.0 enp5s0: Reset adapter
> [22043.819198] igb 0000:05:00.0 enp5s0: igb: enp5s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> ```
> 
> I've bumped into a similar issue in BZ 211911 [2] (see c#16),
> and in c#17 it was suggested it was a separate unrelated issue,
> hence I'm raising a new concern.
> 
> Please help in finding out why this woe happens and in fixing it.
> 
> Thanks.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=212573
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=211911

Looks like igb_clean_tx_irq() should not return true if budget is 0,
ever, otherwise we risk hitting the min(work, budget - 1) which may
go negative.

So something like this? 

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index c9e8c65a3cfe..7a237b5311ca 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8028,7 +8028,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
        unsigned int i = tx_ring->next_to_clean;
 
        if (test_bit(__IGB_DOWN, &adapter->state))
-               return true;
+               goto out;
 
        tx_buffer = &tx_ring->tx_buffer_info[i];
        tx_desc = IGB_TX_DESC(tx_ring, i);
@@ -8157,7 +8157,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
                                            tx_ring->queue_index);
 
                        /* we are about to reset, no point in enabling stuff */
-                       return true;
+                       goto out;
                }
        }
 
@@ -8180,7 +8180,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
                        u64_stats_update_end(&tx_ring->tx_syncp);
                }
        }
-
+out:
        return !!budget;
 }

  reply	other threads:[~2021-04-06 18:48 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-06 12:36 [igb] netconsole triggers warning in netpoll_poll_dev Oleksandr Natalenko
2021-04-06 12:36 ` [Intel-wired-lan] " Oleksandr Natalenko
2021-04-06 18:48 ` Jakub Kicinski [this message]
2021-04-06 18:48   ` Jakub Kicinski
2021-04-07  6:00   ` Oleksandr Natalenko
2021-04-07  6:00     ` [Intel-wired-lan] " Oleksandr Natalenko
2021-04-07 15:37     ` Jakub Kicinski
2021-04-07 15:37       ` [Intel-wired-lan] " Jakub Kicinski
2021-04-07 16:25       ` Alexander Duyck
2021-04-07 16:25         ` [Intel-wired-lan] " Alexander Duyck
2021-04-07 18:07         ` Jakub Kicinski
2021-04-07 18:07           ` [Intel-wired-lan] " Jakub Kicinski
2021-04-07 23:06           ` Alexander Duyck
2021-04-07 23:06             ` [Intel-wired-lan] " Alexander Duyck
2021-04-23  8:19             ` Oleksandr Natalenko
2021-04-23  8:19               ` [Intel-wired-lan] " Oleksandr Natalenko
2021-04-23 22:58               ` Jakub Kicinski
2021-04-23 22:58                 ` [Intel-wired-lan] " Jakub Kicinski
2021-04-26  6:47                 ` Oleksandr Natalenko
2021-04-26  6:47                   ` [Intel-wired-lan] " Oleksandr Natalenko
2021-04-26 15:28                   ` Alexander Duyck
2021-04-26 15:28                     ` [Intel-wired-lan] " Alexander Duyck
2021-05-06 23:32                     ` Jesse Brandeburg
2021-05-06 23:32                       ` [Intel-wired-lan] " Jesse Brandeburg
2021-05-07  0:38                       ` Alexander Duyck
2021-05-07  0:38                         ` [Intel-wired-lan] " Alexander Duyck
2021-11-18 15:30 Danielle Ratson
2021-11-18 16:14 ` Alexander Duyck
2021-11-21 11:44   ` Danielle Ratson
2021-11-21 21:16     ` Alexander Duyck
2021-11-23 20:36       ` Jesse Brandeburg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210406114734.0e00cb2f@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com \
    --to=kuba@kernel.org \
    --cc=alexander.duyck@gmail.com \
    --cc=anthony.l.nguyen@intel.com \
    --cc=davem@davemloft.net \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=jesse.brandeburg@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=oleksandr@natalenko.name \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.