Hello, We have started to observe kernel crashes on 6.7.y kernels (atm we have hit the issue 5 times on 6.7.5 and 6.7.10). On 6.6.9 where we have nodes of cluster it looks stable. Please see stacktrace below. If you need more information please let me know. We do not have a consistent reproducer but when we put some bigger network load on a VM, the hypervisor's kernel crashes. Help is much appreciated! We are happy to test any patches. [62254.167584] stack segment: 0000 [#1] PREEMPT SMP NOPTI [62254.173450] CPU: 63 PID: 11939 Comm: vhost-11890 Tainted: G E 6.7.10-1.gdc.el9.x86_64 #1 [62254.183743] Hardware name: Dell Inc. PowerEdge R7525/0H3K7P, BIOS 2.14.1 12/17/2023 [62254.192083] RIP: 0010:skb_release_data+0xb8/0x1e0 [62254.197357] Code: 48 83 c3 01 39 d8 7e 54 48 89 d8 48 c1 e0 04 41 80 7d 7e 00 49 8b 6c 04 30 79 0f 44 89 f6 48 89 ef e8 4c e4 ff ff 84 c0 75 d0 <48> 8b 45 08 a8 01 0f 85 09 01 00 00 e9 d9 00 00 00 0f 1f 44 00 00 [62254.217013] RSP: 0018:ffffa975a0247ba8 EFLAGS: 00010206 [62254.222692] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000785 [62254.230263] RDX: 0000000000000016 RSI: 0000000000000002 RDI: ffff989862b32b00 [62254.237878] RBP: 4f2b318c69a8b0f9 R08: 000000000001fe4d R09: 000000000000003a [62254.245417] R10: 0000000000000000 R11: 0000000000001736 R12: ffff9880b819aec0 [62254.252963] R13: ffff989862b32b00 R14: 0000000000000000 R15: 0000000000000002 [62254.260591] FS: 00007f6cf388bf80(0000) GS:ffff98b85fbc0000(0000) knlGS:0000000000000000 [62254.269061] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [62254.275170] CR2: 000000c002236020 CR3: 000000387d37a002 CR4: 0000000000770ef0 [62254.282733] PKRU: 55555554 [62254.285911] Call Trace: [62254.288884] <TASK> [62254.291549] ? die+0x33/0x90 [62254.294769] ? do_trap+0xe0/0x110 [62254.298405] ? do_error_trap+0x65/0x80 [62254.302471] ? exc_stack_segment+0x35/0x50 [62254.306884] ? asm_exc_stack_segment+0x22/0x30 [62254.311637] ? skb_release_data+0xb8/0x1e0 [62254.316047] kfree_skb_list_reason+0x6d/0x210 [62254.320697] ? free_unref_page_commit+0x80/0x2f0 [62254.325700] ? free_unref_page+0xe9/0x130 [62254.330013] skb_release_data+0xfc/0x1e0 [62254.334261] consume_skb+0x45/0xd0 [62254.338077] tun_do_read+0x68/0x1f0 [tun] [62254.342414] tun_recvmsg+0x7e/0x160 [tun] [62254.346696] handle_rx+0x3ab/0x750 [vhost_net] [62254.351488] vhost_worker+0x42/0x70 [vhost] [62254.355934] vhost_task_fn+0x4b/0xb0 [62254.359758] ? finish_task_switch.isra.0+0x8f/0x2a0 [62254.364882] ? __pfx_vhost_task_fn+0x10/0x10 [62254.369390] ? __pfx_vhost_task_fn+0x10/0x10 [62254.373888] ret_from_fork+0x2d/0x50 [62254.377687] ? __pfx_vhost_task_fn+0x10/0x10 [62254.382169] ret_from_fork_asm+0x1b/0x30 [62254.386310] </TASK> [62254.388705] Modules linked in: nf_tables(E) nf_conntrack_netlink(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E) tun(E) mptcp_diag(E) xsk_diag(E) udp_diag(E) raw_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E) tcp_diag(E) inet_diag(E) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) fscache(E) netfs(E) netconsole(E) ib_core(E) scsi_transport_iscsi(E) sch_ingress(E) target_core_user(E) uio(E) target_core_pscsi(E) target_core_file(E) target_core_iblock(E) iscsi_target_mod(E) target_core_mod(E) 8021q(E) garp(E) mrp(E) bonding(E) tls(E) nfnetlink_cttimeout(E) nfnetlink(E) openvswitch(E) nf_conncount(E) nf_nat(E) binfmt_misc(E) dell_rbu(E) sunrpc(E) vfat(E) fat(E) dm_service_time(E) dm_multipath(E) btrfs(E) xor(E) zstd_compress(E) raid6_pq(E) ipmi_ssif(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) edac_mce_amd(E) kvm_amd(E) kvm(E) irqbypass(E) dell_smbios(E) acpi_ipmi(E) dcdbas(E) dell_wmi_descriptor(E) wmi_bmof(E) rapl(E) ipmi_si(E) acp Mar 19 09:40:16 10.12.17.70 i_cpufreq(E) ipmi_devintf(E) [62254.388751] mgag200(E) i2c_algo_bit(E) ptdma(E) wmi(E) i2c_piix4(E) k10temp(E) ipmi_msghandler(E) acpi_power_meter(E) fuse(E) zram(E) ext4(E) mbcache(E) jbd2(E) dm_crypt(E) sd_mod(E) t10_pi(E) sg(E) ice(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) polyval_generic(E) ahci(E) libahci(E) ghash_clmulni_intel(E) sha512_ssse3(E) libata(E) megaraid_sas(E) ccp(E) gnss(E) sp5100_tco(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) nf_conntrack(E) libcrc32c(E) crc32c_intel(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E) [62254.480070] Unloaded tainted modules: fjes(E):2 padlock_aes(E):3 [62254.537711] ---[ end trace 0000000000000000 ]--- Thanks in advance!
On 2024-03-19 09:29, Michael S. Tsirkin wrote:
> On Tue, Mar 19, 2024 at 09:21:06AM +0100, Tobias Huschle wrote:
>> On 2024-03-15 11:31, Michael S. Tsirkin wrote:
>> > On Fri, Mar 15, 2024 at 09:33:49AM +0100, Tobias Huschle wrote:
>> > > On Thu, Mar 14, 2024 at 11:09:25AM -0400, Michael S. Tsirkin wrote:
>> > > >
>> >
>> > Could you remind me pls, what is the kworker doing specifically that
>> > vhost is relying on?
>>
>> The kworker is handling the actual data moving in memory if I'm not
>> mistaking.
>
> I think that is the vhost process itself. Maybe you mean the
> guest thread versus the vhost thread then?
My understanding was that vhost writes data into a file descriptor which
then triggers eventfd.
That's at least how I read the vhost code if I remember correctly.
The handler beneath (the kworker) then runs the actual instructions that
move the data to the receiving vhost on the other end of the connection.
Again, I might be wrong here.
Hi Maxime, Apologies for the delay in my response. On 3/6/24 19:05, Maxime Chevallier wrote: > Hello John, > > I'm adding Andrew and Russell to the thread as PHY maintainers and > reviewers. > > On Wed, 6 Mar 2024 13:37:45 +0000 > John Ernberg <john.ernberg@actia.se> wrote: > >> Since the power management is now performed by the FEC instead of generic >> pm the PHY will not suspend until the link has been up. >> >> Therefor suspend it on probe. It will be resumed by {of_,}phy_connect() >> when the link is brought up. >> >> Since {of_,}phy_connect() and phy_disconnect() will resume and suspend the >> PHY when the link is brought up and down respectively, and phy_stop() and >> phy_start() will resume and suspend the PHY in the suspend-resume paths >> there is no need for any additional calls anywhere. >> >> Signed-off-by: John Ernberg <john.ernberg@actia.se> > > [...] > >> @@ -2539,8 +2539,10 @@ static int fec_enet_mii_init(struct platform_device *pdev) >> /* find all the PHY devices on the bus and set mac_managed_pm to true */ >> for (addr = 0; addr < PHY_MAX_ADDR; addr++) { >> phydev = mdiobus_get_phy(fep->mii_bus, addr); >> - if (phydev) >> + if (phydev) { >> phydev->mac_managed_pm = true; >> + phy_suspend(phydev); >> + } > > I don't think that's correct. here phy_suspend() is being called before > the PHY got attached, so the PHY wasn't initialized at all at that > point (which I guess is your issue as the PHY is still in the state it > was configured into by the bootloader) > > Following the code paths, it looks like this works for you because the > PHY you're using has a .suspend callback populated, but for any PHY > that uses the genphy driver, this will do nothing at all (the PHY isn't > yet attached to the genphy ops, therefore genphy_suspend won't be > called). Thanks for highlighting this. Yes, it's a problem for genphy, although due to when genphy is probed, it's always been a problem for genphy, even before this patch. Whereas PHYs with specific drivers worked before due to MDIO bus PM. There is also a case where the phy driver module is not automatically loaded, in cases where request_module() fails, either due to the userspace helper feature being compiled out or other reasons, and the module is loaded manually later. I suspect for reasons like these the genphy probe happens so late. My solution here doesn't cover non-loaded modules either, but this could maybe be covered by moving phy_suspend() to phy_probe(). Unless there is an even more clever way to go about it which I can't see from inexperience. If a PHY driver doesn't populate .suspend there's probably a good reason for it and it makes sense to not suspend such a PHY, so I'm not concerned about an unpopulated .suspend. Best regards // John Ernberg > > Best regards, > > Maxime
On Tue, Mar 19, 2024 at 08:37:44AM +0000, John Ernberg wrote: > There is also a case where the phy driver module is not automatically > loaded, in cases where request_module() fails, either due to the > userspace helper feature being compiled out or other reasons, and the > module is loaded manually later. I suspect for reasons like these the > genphy probe happens so late. My solution here doesn't cover non-loaded > modules either, but this could maybe be covered by moving phy_suspend() > to phy_probe(). Unless there is an even more clever way to go about it > which I can't see from inexperience. Note that in the case where the PHY driver module is loaded late, phy_probe() won't be called for the PHY until that happens. I would say if one wants a platform to behave with minimal power consumption, that is something that has to be done across the software stack, and that includes the boot firmware. So, if one wants the PHY to be in a low power state at boot time, then firmware needs to ensure that happens. Trying to shoe-horn that into the kernel isn't going to work because we get to decide what to do with the PHY way too late (due to PHY drivers being modular and on the rootfs.) -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
On Mon, Mar 18, 2024 at 04:13:28PM -0700, Feng Wang wrote: > From: wangfe <wangfe@google.com> > > When there are multiple ipsec sessions, packet offload driver > can use the index to distinguish the packets from the different > sessions even though xfrm_selector are same. Do we have such "packet offload driver" in the kernel tree? Thanks > Thus each packet is handled corresponding to its session parameter. > > Signed-off-by: wangfe <wangfe@google.com> > --- > net/xfrm/xfrm_interface_core.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/net/xfrm/xfrm_interface_core.c b/net/xfrm/xfrm_interface_core.c > index 21d50d75c260..996571af53e5 100644 > --- a/net/xfrm/xfrm_interface_core.c > +++ b/net/xfrm/xfrm_interface_core.c > @@ -506,7 +506,9 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl) > xfrmi_scrub_packet(skb, !net_eq(xi->net, dev_net(dev))); > skb_dst_set(skb, dst); > skb->dev = tdev; > - > +#ifdef CONFIG_XFRM_OFFLOAD > + skb->skb_iif = if_id; > +#endif > err = dst_output(xi->net, skb->sk, skb); > if (net_xmit_eval(err) == 0) { > dev_sw_netstats_tx_add(dev, 1, length); > -- > 2.44.0.291.gc1ea87d7ee-goog > >
On Tue, Mar 19, 2024 at 09:21:06AM +0100, Tobias Huschle wrote:
> On 2024-03-15 11:31, Michael S. Tsirkin wrote:
> > On Fri, Mar 15, 2024 at 09:33:49AM +0100, Tobias Huschle wrote:
> > > On Thu, Mar 14, 2024 at 11:09:25AM -0400, Michael S. Tsirkin wrote:
> > > >
> >
> > Could you remind me pls, what is the kworker doing specifically that
> > vhost is relying on?
>
> The kworker is handling the actual data moving in memory if I'm not
> mistaking.
I think that is the vhost process itself. Maybe you mean the
guest thread versus the vhost thread then?
On 2024-03-15 11:31, Michael S. Tsirkin wrote:
> On Fri, Mar 15, 2024 at 09:33:49AM +0100, Tobias Huschle wrote:
>> On Thu, Mar 14, 2024 at 11:09:25AM -0400, Michael S. Tsirkin wrote:
>> >
>
> Could you remind me pls, what is the kworker doing specifically that
> vhost is relying on?
The kworker is handling the actual data moving in memory if I'm not
mistaking.
The following changes since commit e8f897f4afef0031fe618a8e94127a0934896aba: Linux 6.8 (2024-03-10 13:38:09 -0700) are available in the Git repository at: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus for you to fetch changes up to 5da7137de79ca6ffae3ace77050588cdf5263d33: virtio_net: rename free_old_xmit_skbs to free_old_xmit (2024-03-19 03:19:22 -0400) ---------------------------------------------------------------- virtio: features, fixes Per vq sizes in vdpa. Info query for block devices support in vdpa. DMA sync callbacks in vduse. Fixes, cleanups. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> ---------------------------------------------------------------- Andrew Melnychenko (1): vhost: Added pad cleanup if vnet_hdr is not present. David Hildenbrand (1): virtio: reenable config if freezing device failed Jason Wang (2): virtio-net: convert rx mode setting to use workqueue virtio-net: add cond_resched() to the command waiting loop Jonah Palmer (1): vdpa/mlx5: Allow CVQ size changes Maxime Coquelin (1): vduse: implement DMA sync callbacks Ricardo B. Marliere (2): vdpa: make vdpa_bus const virtio: make virtio_bus const Shannon Nelson (1): vdpa/pds: fixes for VF vdpa flr-aer handling Steve Sistare (2): vdpa_sim: reset must not run vdpa: skip suspend/resume ops if not DRIVER_OK Suzuki K Poulose (1): virtio: uapi: Drop __packed attribute in linux/virtio_pci.h Xuan Zhuo (3): virtio: packed: fix unmap leak for indirect desc table virtio_net: unify the code for recycling the xmit ptr virtio_net: rename free_old_xmit_skbs to free_old_xmit Zhu Lingshan (20): vhost-vdpa: uapi to support reporting per vq size vDPA: introduce get_vq_size to vdpa_config_ops vDPA/ifcvf: implement vdpa_config_ops.get_vq_size vp_vdpa: implement vdpa_config_ops.get_vq_size eni_vdpa: implement vdpa_config_ops.get_vq_size vdpa_sim: implement vdpa_config_ops.get_vq_size for vDPA simulator vduse: implement vdpa_config_ops.get_vq_size for vduse virtio_vdpa: create vqs with the actual size vDPA/ifcvf: get_max_vq_size to return max size vDPA/ifcvf: implement vdpa_config_ops.get_vq_num_min vDPA: report virtio-block capacity to user space vDPA: report virtio-block max segment size to user space vDPA: report virtio-block block-size to user space vDPA: report virtio-block max segments in a request to user space vDPA: report virtio-block MQ info to user space vDPA: report virtio-block topology info to user space vDPA: report virtio-block discarding configuration to user space vDPA: report virtio-block write zeroes configuration to user space vDPA: report virtio-block read-only info to user space vDPA: report virtio-blk flush info to user space drivers/net/virtio_net.c | 151 +++++++++++++++--------- drivers/vdpa/alibaba/eni_vdpa.c | 8 ++ drivers/vdpa/ifcvf/ifcvf_base.c | 11 +- drivers/vdpa/ifcvf/ifcvf_base.h | 2 + drivers/vdpa/ifcvf/ifcvf_main.c | 15 +++ drivers/vdpa/mlx5/net/mlx5_vnet.c | 13 ++- drivers/vdpa/pds/aux_drv.c | 2 +- drivers/vdpa/pds/vdpa_dev.c | 20 +++- drivers/vdpa/pds/vdpa_dev.h | 1 + drivers/vdpa/vdpa.c | 214 ++++++++++++++++++++++++++++++++++- drivers/vdpa/vdpa_sim/vdpa_sim.c | 15 ++- drivers/vdpa/vdpa_user/iova_domain.c | 27 ++++- drivers/vdpa/vdpa_user/iova_domain.h | 8 ++ drivers/vdpa/vdpa_user/vduse_dev.c | 34 ++++++ drivers/vdpa/virtio_pci/vp_vdpa.c | 8 ++ drivers/vhost/net.c | 3 + drivers/vhost/vdpa.c | 14 +++ drivers/virtio/virtio.c | 6 +- drivers/virtio/virtio_ring.c | 6 +- drivers/virtio/virtio_vdpa.c | 5 +- include/linux/vdpa.h | 6 + include/uapi/linux/vdpa.h | 17 +++ include/uapi/linux/vhost.h | 7 ++ include/uapi/linux/virtio_pci.h | 10 +- 24 files changed, 521 insertions(+), 82 deletions(-)
On Tue, Mar 12, 2024 at 01:55:22PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> The missing check of x->encap caused to the situation where GSO packets
> were created with UDP encapsulation.
>
> As a solution return the encap check for non-offloaded SA.
>
> Fixes: 9f2b55961a80 ("xfrm: Pass UDP encapsulation in TX packet offload")
> Closes: https://lore.kernel.org/all/a650221ae500f0c7cf496c61c96c1b103dcb6f67.camel@redhat.com
> Reported-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Applied, thanks Leon!
On Fri, Mar 08, 2024 at 05:26:00PM +0200, Dragos Tatulea wrote: > When the skb is reorganized during esp_output (!esp->inline), the pages > coming from the original skb fragments are supposed to be released back > to the system through put_page. But if the skb fragment pages are > originating from a page_pool, calling put_page on them will trigger a > page_pool leak which will eventually result in a crash. > > This leak can be easily observed when using CONFIG_DEBUG_VM and doing > ipsec + gre (non offloaded) forwarding: ... > The suggested fix is to introduce a new wrapper (skb_page_unref) that > covers page refcounting for page_pool pages as well. > > Cc: stable@vger.kernel.org > Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") > Reported-and-tested-by: Anatoli N.Chechelnickiy <Anatoli.Chechelnickiy@m.interpipe.biz> > Reported-by: Ian Kumlien <ian.kumlien@gmail.com> > Link: https://lore.kernel.org/netdev/CAA85sZvvHtrpTQRqdaOx6gd55zPAVsqMYk_Lwh4Md5knTq7AyA@mail.gmail.com > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> > Reviewed-by: Mina Almasry <almasrymina@google.com> > Reviewed-by: Jakub Kicinski <kuba@kernel.org> Applied to the ipsec tree, thanks a lot!
On Tue, Mar 12, 2024 at 11:35:47AM +0800, Xuan Zhuo wrote: > As discussed: > > http://lore.kernel.org/all/CACGkMEvq0No8QGC46U4mGsMtuD44fD_cfLcPaVmJ3rHYqRZxYg@mail.gmail.com > > If the virtio is premapped mode, the driver should manage the dma info by self. > So the virtio core should not store the dma info. We can release the memory used > to store the dma info. > > For virtio-net xmit queue, if the virtio-net maintains the dma info, > the virtio-net must allocate too much memory(19 * queue_size for per-queue), so > we do not plan to make the virtio-net to maintain the dma info by default. The > virtio-net xmit queue only maintain the dma info when premapped mode is enable > (such as AF_XDP is enable). This landed when merge window was open already so I'm deferring this to the next merge window, just to be safe. Jason can you review please? > So this patch set try to do: > > 1. make the virtio core to do not store the dma info > - But if the desc_extra has not dma info, we face a new question, > it is hard to get the dma info of the desc with indirect flag. > For split mode, that is easy from desc, but for the packed mode, > it is hard to get the dma info from the desc. And hardening > the dma unmap is safe, we should store the dma info of indirect > descs when the virtio core does not store the bufer dma info. > > So I introduce the "structure the indirect desc table" to > allocate space to store dma info of the desc table. > > +struct vring_split_desc_indir { > + dma_addr_t addr; /* Descriptor Array DMA addr. */ > + u32 len; /* Descriptor Array length. */ > + u32 num; > + struct vring_desc desc[]; > +}; > > The follow patches to this: > * virtio_ring: packed: structure the indirect desc table > * virtio_ring: split: structure the indirect desc table > > - On the other side, in the umap handle, we mix the indirect descs with > other descs. That make things too complex. I found if we we distinguish > the descs with VRING_DESC_F_INDIRECT before unmap, thing will be clearer. > > The follow patches do this. > * virtio_ring: packed: remove double check of the unmap ops > * virtio_ring: split: structure the indirect desc table > > 2. make the virtio core to enable premapped mode by find_vqs() params > - Because the find_vqs() will try to allocate memory for the dma info. > If we set the premapped mode after find_vqs() and release the > dma info, that is odd. > > > Please review. > > Thanks > > v4: > 1. virtio-net xmit queue does not enable premapped mode by default > > v3: > 1. fix the conflict with the vp_modern_create_avq(). > > v2: > 1. change the dma item of virtio-net, every item have MAX_SKB_FRAGS + 2 addr + len pairs. > 2. introduce virtnet_sq_free_stats for __free_old_xmit > > v1: > 1. rename transport_vq_config to vq_transport_config > 2. virtio-net set dma meta number to (ring-size + 1)(MAX_SKB_FRGAS +2) > 3. introduce virtqueue_dma_map_sg_attrs > 4. separate vring_create_virtqueue to an independent commit > > Xuan Zhuo (10): > virtio_ring: introduce vring_need_unmap_buffer > virtio_ring: packed: remove double check of the unmap ops > virtio_ring: packed: structure the indirect desc table > virtio_ring: split: remove double check of the unmap ops > virtio_ring: split: structure the indirect desc table > virtio_ring: no store dma info when unmap is not needed > virtio: find_vqs: add new parameter premapped > virtio_ring: export premapped to driver by struct virtqueue > virtio_net: set premapped mode by find_vqs() > virtio_ring: virtqueue_set_dma_premapped support disable > > drivers/net/virtio_net.c | 57 +++-- > drivers/virtio/virtio_ring.c | 436 +++++++++++++++++++++------------- > include/linux/virtio.h | 3 +- > include/linux/virtio_config.h | 17 +- > 4 files changed, 307 insertions(+), 206 deletions(-) > > -- > 2.32.0.3.g01195cf9f
> -----Original Message-----
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Sent: Friday, March 15, 2024 9:18 PM
> To: Vyavahare, Tushar <tushar.vyavahare@intel.com>
> Cc: bpf <bpf@vger.kernel.org>; Network Development
> <netdev@vger.kernel.org>; Björn Töpel <bjorn@kernel.org>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; Fijalkowski, Maciej
> <maciej.fijalkowski@intel.com>; Jonathan Lemon
> <jonathan.lemon@gmail.com>; David S. Miller <davem@davemloft.net>; Jakub
> Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Alexei
> Starovoitov <ast@kernel.org>; Daniel Borkmann <daniel@iogearbox.net>;
> Sarkar, Tirthendu <tirthendu.sarkar@intel.com>
> Subject: Re: [PATCH bpf-next 4/6] selftests/xsk: implement set_hw_ring_size
> function to configure interface ring size
>
> On Fri, Mar 15, 2024 at 7:23 AM Tushar Vyavahare
> <tushar.vyavahare@intel.com> wrote:
> >
> > Introduce a new function called set_hw_ring_size that allows for the
> > dynamic configuration of the ring size within the interface.
> >
> > Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
> > ---
> > tools/testing/selftests/bpf/xskxceiver.c | 35
> > ++++++++++++++++++++++++
> > 1 file changed, 35 insertions(+)
> >
> > diff --git a/tools/testing/selftests/bpf/xskxceiver.c
> > b/tools/testing/selftests/bpf/xskxceiver.c
> > index 32005bfb9c9f..aafa78307586 100644
> > --- a/tools/testing/selftests/bpf/xskxceiver.c
> > +++ b/tools/testing/selftests/bpf/xskxceiver.c
> > @@ -441,6 +441,41 @@ static int get_hw_ring_size(struct ifobject *ifobj)
> > return 0;
> > }
> >
> > +static int set_hw_ring_size(struct ifobject *ifobj, u32 tx, u32 rx) {
> > + struct ethtool_ringparam ring_param = {0};
> > + struct ifreq ifr = {0};
> > + int sockfd, ret;
> > + u32 ctr = 0;
> > +
> > + sockfd = socket(AF_INET, SOCK_DGRAM, 0);
> > + if (sockfd < 0)
> > + return errno;
> > +
> > + memcpy(ifr.ifr_name, ifobj->ifname, sizeof(ifr.ifr_name));
> > +
> > + ring_param.tx_pending = tx;
> > + ring_param.rx_pending = rx;
> > +
> > + ring_param.cmd = ETHTOOL_SRINGPARAM;
> > + ifr.ifr_data = (char *)&ring_param;
> > +
> > + while (ctr++ < SOCK_RECONF_CTR) {
> > + ret = ioctl(sockfd, SIOCETHTOOL, &ifr);
> > + if (!ret)
> > + break;
> > + /* Retry if it fails */
> > + if (ctr >= SOCK_RECONF_CTR) {
> > + close(sockfd);
> > + return errno;
> > + }
> > + usleep(USLEEP_MAX);
>
> Does it really have to sleep or copy paste from other places?
> This ioctl() is supposed to do synchronous config, no?
Response in the previous mail to Stanislav Fomichev <sdf@google.com>
> -----Original Message----- > From: Stanislav Fomichev <sdf@google.com> > Sent: Friday, March 15, 2024 11:14 PM > To: Vyavahare, Tushar <tushar.vyavahare@intel.com> > Cc: bpf@vger.kernel.org; netdev@vger.kernel.org; bjorn@kernel.org; Karlsson, > Magnus <magnus.karlsson@intel.com>; Fijalkowski, Maciej > <maciej.fijalkowski@intel.com>; jonathan.lemon@gmail.com; > davem@davemloft.net; kuba@kernel.org; pabeni@redhat.com; > ast@kernel.org; daniel@iogearbox.net; Sarkar, Tirthendu > <tirthendu.sarkar@intel.com> > Subject: Re: [PATCH bpf-next 4/6] selftests/xsk: implement set_hw_ring_size > function to configure interface ring size > > On 03/15, Tushar Vyavahare wrote: > > Introduce a new function called set_hw_ring_size that allows for the > > dynamic configuration of the ring size within the interface. > > > > Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> > > --- > > tools/testing/selftests/bpf/xskxceiver.c | 35 > > ++++++++++++++++++++++++ > > 1 file changed, 35 insertions(+) > > > > diff --git a/tools/testing/selftests/bpf/xskxceiver.c > > b/tools/testing/selftests/bpf/xskxceiver.c > > index 32005bfb9c9f..aafa78307586 100644 > > --- a/tools/testing/selftests/bpf/xskxceiver.c > > +++ b/tools/testing/selftests/bpf/xskxceiver.c > > @@ -441,6 +441,41 @@ static int get_hw_ring_size(struct ifobject *ifobj) > > return 0; > > } > > > > +static int set_hw_ring_size(struct ifobject *ifobj, u32 tx, u32 rx) > > Hmm, now that we have set/get, should we put them into network_helpers.c? > These seem pretty generic if you accept iface_name + ethtool_ringparam in > the api. > Clean version of get_hw_ring_size() and set_hw_ring_size() both are moved to network_helpers.c > > +{ > > + struct ethtool_ringparam ring_param = {0}; > > + struct ifreq ifr = {0}; > > + int sockfd, ret; > > + u32 ctr = 0; > > + > > + sockfd = socket(AF_INET, SOCK_DGRAM, 0); > > + if (sockfd < 0) > > + return errno; > > + > > + memcpy(ifr.ifr_name, ifobj->ifname, sizeof(ifr.ifr_name)); > > + > > + ring_param.tx_pending = tx; > > + ring_param.rx_pending = rx; > > + > > + ring_param.cmd = ETHTOOL_SRINGPARAM; > > + ifr.ifr_data = (char *)&ring_param; > > + > > [..] > > > + while (ctr++ < SOCK_RECONF_CTR) { > > Is it to retry EINTR? Retrying something else doesn't make sense probably? So > maybe do only errno==EINTR cases? Will make it more generic and not > dependent on SOCK_RECONF_CTR. > > The close of an AF_XDP socket is an asynchronous operation. When an AF_XDP socket is active, changing the ring size is forbidden. Therefore, when we call set_hw_ring_size(), a previous AF_XDP socket might still be in the process of being closed and the ioct() will then fail, as it is forbidden to change the ring size when there is an active AF_XDP socket. When the AF_XDP socket is active, we are getting an EBUSY error. I will handle the retry logic for this in a separate patch for xskxceiver.c. > > + ret = ioctl(sockfd, SIOCETHTOOL, &ifr); > > + if (!ret) > > + break; > > + /* Retry if it fails */ > > + if (ctr >= SOCK_RECONF_CTR) { > > + close(sockfd); > > + return errno; > > + } > > [..] > > > + usleep(USLEEP_MAX); > > Same here. Not sure what's the purpose of sleep? Alternatively, maybe clarify > in the commit description what particular error case we want to retry. Removed this retry logic from set_hw_ring_size() , which moved to networks_helpers.c. I will handle the retry logic with sleep for this in a separate patch for xskxceiver.c and I will put this in the commit message.
> -----Original Message----- > From: Stanislav Fomichev <sdf@google.com> > Sent: Friday, March 15, 2024 11:11 PM > To: Vyavahare, Tushar <tushar.vyavahare@intel.com> > Cc: bpf@vger.kernel.org; netdev@vger.kernel.org; bjorn@kernel.org; Karlsson, > Magnus <magnus.karlsson@intel.com>; Fijalkowski, Maciej > <maciej.fijalkowski@intel.com>; jonathan.lemon@gmail.com; > davem@davemloft.net; kuba@kernel.org; pabeni@redhat.com; > ast@kernel.org; daniel@iogearbox.net; Sarkar, Tirthendu > <tirthendu.sarkar@intel.com> > Subject: Re: [PATCH bpf-next 3/6] selftests/xsk: implement get_hw_ring_size > function to retrieve current and max interface size > > On 03/15, Tushar Vyavahare wrote: > > Introduce a new function called get_hw_size that retrieves both the > > current and maximum size of the interface and stores this information > > in the 'hw_ring' structure. > > > > Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> > > --- > > tools/testing/selftests/bpf/xskxceiver.c | 32 > > ++++++++++++++++++++++++ tools/testing/selftests/bpf/xskxceiver.h | > > 8 ++++++ > > 2 files changed, 40 insertions(+) > > > > diff --git a/tools/testing/selftests/bpf/xskxceiver.c > > b/tools/testing/selftests/bpf/xskxceiver.c > > index eaa102c8098b..32005bfb9c9f 100644 > > --- a/tools/testing/selftests/bpf/xskxceiver.c > > +++ b/tools/testing/selftests/bpf/xskxceiver.c > > @@ -81,6 +81,8 @@ > > #include <linux/mman.h> > > #include <linux/netdev.h> > > #include <linux/bitmap.h> > > +#include <linux/sockios.h> > > +#include <linux/ethtool.h> > > #include <arpa/inet.h> > > #include <net/if.h> > > #include <locale.h> > > @@ -95,6 +97,7 @@ > > #include <sys/socket.h> > > #include <sys/time.h> > > #include <sys/types.h> > > +#include <sys/ioctl.h> > > #include <unistd.h> > > > > #include "xsk_xdp_progs.skel.h" > > @@ -409,6 +412,35 @@ static void parse_command_line(struct ifobject > *ifobj_tx, struct ifobject *ifobj > > } > > } > > > > +static int get_hw_ring_size(struct ifobject *ifobj) { > > + struct ethtool_ringparam ring_param = {0}; > > + struct ifreq ifr = {0}; > > + int sockfd; > > + > > + sockfd = socket(AF_INET, SOCK_DGRAM, 0); > > + if (sockfd < 0) > > + return errno; > > + > > + memcpy(ifr.ifr_name, ifobj->ifname, sizeof(ifr.ifr_name)); > > + > > + ring_param.cmd = ETHTOOL_GRINGPARAM; > > + ifr.ifr_data = (char *)&ring_param; > > + > > + if (ioctl(sockfd, SIOCETHTOOL, &ifr) < 0) { > > + close(sockfd); > > + return errno; > > close(sockfd) can potentially override the errno. Also, return -errno to match > the other cases where errors are < 0. > I will do it. > > + } > > + > > + ifobj->ring.default_tx = ring_param.tx_pending; > > + ifobj->ring.default_rx = ring_param.rx_pending; > > + ifobj->ring.max_tx = ring_param.tx_max_pending; > > + ifobj->ring.max_rx = ring_param.rx_max_pending; > > + > > + close(sockfd); > > + return 0; > > +} > > + > > static void __test_spec_init(struct test_spec *test, struct ifobject *ifobj_tx, > > struct ifobject *ifobj_rx) > > { > > diff --git a/tools/testing/selftests/bpf/xskxceiver.h > > b/tools/testing/selftests/bpf/xskxceiver.h > > index 425304e52f35..4f58b70fa781 100644 > > --- a/tools/testing/selftests/bpf/xskxceiver.h > > +++ b/tools/testing/selftests/bpf/xskxceiver.h > > @@ -114,6 +114,13 @@ struct pkt_stream { > > bool verbatim; > > }; > > > > +struct hw_ring { > > + u32 default_tx; > > + u32 default_rx; > > + u32 max_tx; > > + u32 max_rx; > > +}; > > + > > struct ifobject; > > struct test_spec; > > typedef int (*validation_func_t)(struct ifobject *ifobj); @@ -130,6 > > +137,7 @@ struct ifobject { > > struct xsk_xdp_progs *xdp_progs; > > struct bpf_map *xskmap; > > struct bpf_program *xdp_prog; > > + struct hw_ring ring; > > Any reason not to store ethtool_ringparam directly here? No need to > introduce new hw_ring. I will use ethtool_ringparam directly for get_hw_ring_size().
> -----Original Message-----
> From: Stanislav Fomichev <sdf@google.com>
> Sent: Friday, March 15, 2024 11:04 PM
> To: Vyavahare, Tushar <tushar.vyavahare@intel.com>
> Cc: bpf@vger.kernel.org; netdev@vger.kernel.org; bjorn@kernel.org;
> Karlsson, Magnus <magnus.karlsson@intel.com>; Fijalkowski, Maciej
> <maciej.fijalkowski@intel.com>; jonathan.lemon@gmail.com;
> davem@davemloft.net; kuba@kernel.org; pabeni@redhat.com;
> ast@kernel.org; daniel@iogearbox.net; Sarkar, Tirthendu
> <tirthendu.sarkar@intel.com>
> Subject: Re: [PATCH bpf-next 1/6] tools/include: add ethtool_ringparam
> definition to UAPI header
>
> On 03/15, Tushar Vyavahare wrote:
> > Introduce the definition for ethtool_ringparam in the UAPI header
> > located in the include directory. This is needed by the next patches
> > as they run tests with various ring sizes.
>
> Any reason not to 'cp {,tools/}include/uapi/linux/ethtool.h' instead?
> Less divergence should be easier to support/understand.
Sure, I will do it.
On 12/03/2024 0:30, Michael Liang wrote: > The mlx5 comp irq name scheme is changed a little bit between > commit 3663ad34bc70 ("net/mlx5: Shift control IRQ to the last index") > and commit 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation"). > The index in the comp irq name used to start from 0 but now it starts > from 1. There is nothing critical here, but it's harmless to change > back to the old behavior, a.k.a starting from 0. > > Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation") > Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com> > Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com> > Signed-off-by: Michael Liang <mliang@purestorage.com> Reviewed-by: Shay Drory <shayd@nvidia.com> > --- > drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c > index 4dcf995cb1a2..6bac8ad70ba6 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c > @@ -19,6 +19,7 @@ > #define MLX5_IRQ_CTRL_SF_MAX 8 > /* min num of vectors for SFs to be enabled */ > #define MLX5_IRQ_VEC_COMP_BASE_SF 2 > +#define MLX5_IRQ_VEC_COMP_BASE 1 > > #define MLX5_EQ_SHARE_IRQ_MAX_COMP (8) > #define MLX5_EQ_SHARE_IRQ_MAX_CTRL (UINT_MAX) > @@ -246,6 +247,7 @@ static void irq_set_name(struct mlx5_irq_pool *pool, char *name, int vecidx) > return; > } > > + vecidx -= MLX5_IRQ_VEC_COMP_BASE; > snprintf(name, MLX5_MAX_IRQ_NAME, "mlx5_comp%d", vecidx); > } > > @@ -585,7 +587,7 @@ struct mlx5_irq *mlx5_irq_request_vector(struct mlx5_core_dev *dev, u16 cpu, > struct mlx5_irq_table *table = mlx5_irq_table_get(dev); > struct mlx5_irq_pool *pool = table->pcif_pool; > struct irq_affinity_desc af_desc; > - int offset = 1; > + int offset = MLX5_IRQ_VEC_COMP_BASE; > > if (!pool->xa_num_irqs.max) > offset = 0;
Hello,
Sorry for inconvenience. Please ignore this mail only.
By mistake it was not out.
Thanks,
Raju
> -----Original Message-----
> From: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
> Sent: Tuesday, March 19, 2024 11:21 AM
> To: netdev@vger.kernel.org
> Cc: davem@davemloft.net; kuba@kernel.org; pabeni@redhat.com;
> edumazet@google.com; linux-kernel@vger.kernel.org; Bryan Whitehead -
> C21958 <Bryan.Whitehead@microchip.com>; UNGLinuxDriver
> <UNGLinuxDriver@microchip.com>
> Subject: [PATCH net V1 3/3] net: lan743x: Address problems with wake option
> flags configuration sequences
>
> WOL secure-on and magic packet configuration table:
> --------------------------------------------------------------------------------
> | Ethtool Ops | Send magic packet | Send magic packet | Send magic packet
> |
> | | | with password | with wrong password|
> --------------------------------------------------------------------------------
> |WAKE_MAGIC (g) | wake | wake | wake |
> --------------------------------------------------------------------------------
> |WAKE_SECURE_MAGIC| no wake | wake | no wake |
> | (s) | | | |
> --------------------------------------------------------------------------------
> | WAKE_MAGIC & | | | |
> |WAKE_SECURE_MAGIC| wake | wake | wake |
> | (gs) | | | |
> --------------------------------------------------------------------------------
>
> Fixes: 6b3768ac8e2b3 ("net: lan743x: Add support to Secure-ON WOL")
> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
> ---
> Change List:
> ------------
> V0 -> V1:
> - Fix the wake option flags configuration sequences
>
> drivers/net/ethernet/microchip/lan743x_ethtool.c | 3 +--
> drivers/net/ethernet/microchip/lan743x_main.c | 8 +++++++-
> 2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c
> b/drivers/net/ethernet/microchip/lan743x_ethtool.c
> index 4899582b3d1d..442c52aa0b0e 100644
> --- a/drivers/net/ethernet/microchip/lan743x_ethtool.c
> +++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c
> @@ -1188,8 +1188,7 @@ static int lan743x_ethtool_set_wol(struct
> net_device *netdev,
> adapter->wolopts |= WAKE_PHY;
> if (wol->wolopts & WAKE_ARP)
> adapter->wolopts |= WAKE_ARP;
> - if (wol->wolopts & WAKE_MAGICSECURE &&
> - wol->wolopts & WAKE_MAGIC) {
> + if (wol->wolopts & WAKE_MAGICSECURE) {
> memcpy(adapter->sopass, wol->sopass, sizeof(wol->sopass));
> adapter->wolopts |= WAKE_MAGICSECURE;
> } else {
> diff --git a/drivers/net/ethernet/microchip/lan743x_main.c
> b/drivers/net/ethernet/microchip/lan743x_main.c
> index 5641b466d70d..43e8e35fe9d0 100644
> --- a/drivers/net/ethernet/microchip/lan743x_main.c
> +++ b/drivers/net/ethernet/microchip/lan743x_main.c
> @@ -3639,9 +3639,15 @@ static void lan743x_pm_set_wol(struct
> lan743x_adapter *adapter)
> lan743x_csr_write(adapter, MAC_MP_SO_LO, sopass);
> sopass = *(u16 *)&adapter->sopass[4];
> lan743x_csr_write(adapter, MAC_MP_SO_HI, sopass);
> - wucsr |= MAC_MP_SO_EN_;
> + wucsr |= MAC_MP_SO_EN_ | MAC_WUCSR_MPEN_;
> + macrx |= MAC_RX_RXEN_;
> + pmtctl |= PMT_CTL_WOL_EN_ |
> PMT_CTL_MAC_D3_RX_CLK_OVR_;
> }
>
> + if (adapter->wolopts & WAKE_MAGICSECURE &&
> + adapter->wolopts & WAKE_MAGIC)
> + wucsr &= ~MAC_MP_SO_EN_;
> +
> lan743x_csr_write(adapter, MAC_WUCSR, wucsr);
> lan743x_csr_write(adapter, PMT_CTL, pmtctl);
> lan743x_csr_write(adapter, MAC_RX, macrx);
> --
> 2.34.1
WOL secure-on and magic packet configuration table: -------------------------------------------------------------------------------- | Ethtool Ops | Send magic packet | Send magic packet | Send magic packet | | | | with password | with wrong password| -------------------------------------------------------------------------------- |WAKE_MAGIC (g) | wake | wake | wake | -------------------------------------------------------------------------------- |WAKE_SECURE_MAGIC| no wake | wake | no wake | | (s) | | | | -------------------------------------------------------------------------------- | WAKE_MAGIC & | | | | |WAKE_SECURE_MAGIC| wake | wake | wake | | (gs) | | | | -------------------------------------------------------------------------------- Fixes: 6b3768ac8e2b3 ("net: lan743x: Add support to Secure-ON WOL") Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> --- Change List: ------------ V0 -> V1: - Fix the wake option flags configuration sequences drivers/net/ethernet/microchip/lan743x_ethtool.c | 3 +-- drivers/net/ethernet/microchip/lan743x_main.c | 8 +++++++- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c b/drivers/net/ethernet/microchip/lan743x_ethtool.c index 4899582b3d1d..442c52aa0b0e 100644 --- a/drivers/net/ethernet/microchip/lan743x_ethtool.c +++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c @@ -1188,8 +1188,7 @@ static int lan743x_ethtool_set_wol(struct net_device *netdev, adapter->wolopts |= WAKE_PHY; if (wol->wolopts & WAKE_ARP) adapter->wolopts |= WAKE_ARP; - if (wol->wolopts & WAKE_MAGICSECURE && - wol->wolopts & WAKE_MAGIC) { + if (wol->wolopts & WAKE_MAGICSECURE) { memcpy(adapter->sopass, wol->sopass, sizeof(wol->sopass)); adapter->wolopts |= WAKE_MAGICSECURE; } else { diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c index 5641b466d70d..43e8e35fe9d0 100644 --- a/drivers/net/ethernet/microchip/lan743x_main.c +++ b/drivers/net/ethernet/microchip/lan743x_main.c @@ -3639,9 +3639,15 @@ static void lan743x_pm_set_wol(struct lan743x_adapter *adapter) lan743x_csr_write(adapter, MAC_MP_SO_LO, sopass); sopass = *(u16 *)&adapter->sopass[4]; lan743x_csr_write(adapter, MAC_MP_SO_HI, sopass); - wucsr |= MAC_MP_SO_EN_; + wucsr |= MAC_MP_SO_EN_ | MAC_WUCSR_MPEN_; + macrx |= MAC_RX_RXEN_; + pmtctl |= PMT_CTL_WOL_EN_ | PMT_CTL_MAC_D3_RX_CLK_OVR_; } + if (adapter->wolopts & WAKE_MAGICSECURE && + adapter->wolopts & WAKE_MAGIC) + wucsr &= ~MAC_MP_SO_EN_; + lan743x_csr_write(adapter, MAC_WUCSR, wucsr); lan743x_csr_write(adapter, PMT_CTL, pmtctl); lan743x_csr_write(adapter, MAC_RX, macrx); -- 2.34.1
Allow WOL support if MAC supports it, even if the PHY does not support it Fixes: e9e13b6adc338 ("lan743x: fix for potential NULL pointer dereference with bare card") Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> --- Change List: ------------ - Change the "phy does not support WOL" print from netif_info() to netif_dbg() drivers/net/ethernet/microchip/lan743x_ethtool.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c b/drivers/net/ethernet/microchip/lan743x_ethtool.c index 8a6ae171e375..7509a19269c3 100644 --- a/drivers/net/ethernet/microchip/lan743x_ethtool.c +++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c @@ -1163,6 +1163,17 @@ static int lan743x_ethtool_set_wol(struct net_device *netdev, struct ethtool_wolinfo *wol) { struct lan743x_adapter *adapter = netdev_priv(netdev); + int ret; + + if (netdev->phydev) { + ret = phy_ethtool_set_wol(netdev->phydev, wol); + if (ret != -EOPNOTSUPP && ret != 0) + return ret; + + if (ret == -EOPNOTSUPP) + netif_dbg(adapter, drv, adapter->netdev, + "phy does not support WOL\n"); + } adapter->wolopts = 0; if (wol->wolopts & WAKE_UCAST) @@ -1187,8 +1198,7 @@ static int lan743x_ethtool_set_wol(struct net_device *netdev, device_set_wakeup_enable(&adapter->pdev->dev, (bool)wol->wolopts); - return netdev->phydev ? phy_ethtool_set_wol(netdev->phydev, wol) - : -ENETDOWN; + return 0; } #endif /* CONFIG_PM */ -- 2.34.1
This patch series implement the following fixes: 1. Disable WOL upon resume in order to restore full data path operation 2. Support WOL in MAC even when PHY does not Change List: ------------ V0 -> V1: - Include the missing maintainer's email ids in 'CC list - Remove the patch "Address problems with wake option flags configuration sequences" from this patch series. Will fix this in next patch series. - Variable "data" change from "int" to "unsigned int" - Change the "phy does not support WOL" print from netif_info() to netif_dbg() Raju Lakkaraju (2): net: lan743x: disable WOL upon resume to restore full data path operation net: lan743x: support WOL in MAC even when PHY does not .../net/ethernet/microchip/lan743x_ethtool.c | 14 +++++++++-- drivers/net/ethernet/microchip/lan743x_main.c | 24 ++++++++++++++++++- drivers/net/ethernet/microchip/lan743x_main.h | 24 +++++++++++++++++++ 3 files changed, 59 insertions(+), 3 deletions(-) -- 2.34.1
In order for datapath to be restored to normal functionality after resume we disable all wakeup events. Additionally we clear all W1C status bits by writing 1's to them. Fixes: 4d94282afd95 ("lan743x: Add power management support") Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> --- Change List: ------------ V0 -> V1: - Variable "data" change from "int" to "unsigned int" drivers/net/ethernet/microchip/lan743x_main.c | 24 ++++++++++++++++++- drivers/net/ethernet/microchip/lan743x_main.h | 24 +++++++++++++++++++ 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c index bd8aa83b47e5..385e9dcd8cd9 100644 --- a/drivers/net/ethernet/microchip/lan743x_main.c +++ b/drivers/net/ethernet/microchip/lan743x_main.c @@ -3550,7 +3550,7 @@ static void lan743x_pm_set_wol(struct lan743x_adapter *adapter) /* clear wake settings */ pmtctl = lan743x_csr_read(adapter, PMT_CTL); - pmtctl |= PMT_CTL_WUPS_MASK_; + pmtctl |= PMT_CTL_WUPS_MASK_ | PMT_CTL_RES_CLR_WKP_MASK_; pmtctl &= ~(PMT_CTL_GPIO_WAKEUP_EN_ | PMT_CTL_EEE_WAKEUP_EN_ | PMT_CTL_WOL_EN_ | PMT_CTL_MAC_D3_RX_CLK_OVR_ | PMT_CTL_RX_FCT_RFE_D3_CLK_OVR_ | PMT_CTL_ETH_PHY_WAKE_EN_); @@ -3685,6 +3685,7 @@ static int lan743x_pm_resume(struct device *dev) struct pci_dev *pdev = to_pci_dev(dev); struct net_device *netdev = pci_get_drvdata(pdev); struct lan743x_adapter *adapter = netdev_priv(netdev); + u32 data; int ret; pci_set_power_state(pdev, PCI_D0); @@ -3715,6 +3716,27 @@ static int lan743x_pm_resume(struct device *dev) netif_info(adapter, drv, adapter->netdev, "Wakeup source : 0x%08X\n", ret); + /* Clear the wol configuration and status bits when system + * events occurs. + * The status bits are "Write One to Clear (W1C)" + */ + data = MAC_WUCSR_EEE_TX_WAKE_ | MAC_WUCSR_EEE_RX_WAKE_ | + MAC_WUCSR_RFE_WAKE_FR_ | MAC_WUCSR_PFDA_FR_ | MAC_WUCSR_WUFR_ | + MAC_WUCSR_MPR_ | MAC_WUCSR_BCAST_FR_; + lan743x_csr_write(adapter, MAC_WUCSR, data); + + data = MAC_WUCSR2_NS_RCD_ | MAC_WUCSR2_ARP_RCD_ | + MAC_WUCSR2_IPV6_TCPSYN_RCD_ | MAC_WUCSR2_IPV4_TCPSYN_RCD_; + lan743x_csr_write(adapter, MAC_WUCSR2, data); + + data = MAC_WK_SRC_ETH_PHY_WK_ | MAC_WK_SRC_IPV6_TCPSYN_RCD_WK_ | + MAC_WK_SRC_IPV4_TCPSYN_RCD_WK_ | MAC_WK_SRC_EEE_TX_WK_ | + MAC_WK_SRC_EEE_RX_WK_ | MAC_WK_SRC_RFE_FR_WK_ | + MAC_WK_SRC_PFDA_FR_WK_ | MAC_WK_SRC_MP_FR_WK_ | + MAC_WK_SRC_BCAST_FR_WK_ | MAC_WK_SRC_WU_FR_WK_ | + MAC_WK_SRC_WK_FR_SAVED_; + lan743x_csr_write(adapter, MAC_WK_SRC, data); + return 0; } diff --git a/drivers/net/ethernet/microchip/lan743x_main.h b/drivers/net/ethernet/microchip/lan743x_main.h index be79cb0ae5af..77fc3abc1428 100644 --- a/drivers/net/ethernet/microchip/lan743x_main.h +++ b/drivers/net/ethernet/microchip/lan743x_main.h @@ -60,6 +60,7 @@ #define PMT_CTL_RX_FCT_RFE_D3_CLK_OVR_ BIT(18) #define PMT_CTL_GPIO_WAKEUP_EN_ BIT(15) #define PMT_CTL_EEE_WAKEUP_EN_ BIT(13) +#define PMT_CTL_RES_CLR_WKP_MASK_ GENMASK(9, 8) #define PMT_CTL_READY_ BIT(7) #define PMT_CTL_ETH_PHY_RST_ BIT(4) #define PMT_CTL_WOL_EN_ BIT(3) @@ -226,12 +227,31 @@ #define MAC_WUCSR (0x140) #define MAC_MP_SO_EN_ BIT(21) #define MAC_WUCSR_RFE_WAKE_EN_ BIT(14) +#define MAC_WUCSR_EEE_TX_WAKE_ BIT(13) +#define MAC_WUCSR_EEE_RX_WAKE_ BIT(11) +#define MAC_WUCSR_RFE_WAKE_FR_ BIT(9) +#define MAC_WUCSR_PFDA_FR_ BIT(7) +#define MAC_WUCSR_WUFR_ BIT(6) +#define MAC_WUCSR_MPR_ BIT(5) +#define MAC_WUCSR_BCAST_FR_ BIT(4) #define MAC_WUCSR_PFDA_EN_ BIT(3) #define MAC_WUCSR_WAKE_EN_ BIT(2) #define MAC_WUCSR_MPEN_ BIT(1) #define MAC_WUCSR_BCST_EN_ BIT(0) #define MAC_WK_SRC (0x144) +#define MAC_WK_SRC_ETH_PHY_WK_ BIT(17) +#define MAC_WK_SRC_IPV6_TCPSYN_RCD_WK_ BIT(16) +#define MAC_WK_SRC_IPV4_TCPSYN_RCD_WK_ BIT(15) +#define MAC_WK_SRC_EEE_TX_WK_ BIT(14) +#define MAC_WK_SRC_EEE_RX_WK_ BIT(13) +#define MAC_WK_SRC_RFE_FR_WK_ BIT(12) +#define MAC_WK_SRC_PFDA_FR_WK_ BIT(11) +#define MAC_WK_SRC_MP_FR_WK_ BIT(10) +#define MAC_WK_SRC_BCAST_FR_WK_ BIT(9) +#define MAC_WK_SRC_WU_FR_WK_ BIT(8) +#define MAC_WK_SRC_WK_FR_SAVED_ BIT(7) + #define MAC_MP_SO_HI (0x148) #define MAC_MP_SO_LO (0x14C) @@ -294,6 +314,10 @@ #define RFE_INDX(index) (0x580 + (index << 2)) #define MAC_WUCSR2 (0x600) +#define MAC_WUCSR2_NS_RCD_ BIT(7) +#define MAC_WUCSR2_ARP_RCD_ BIT(6) +#define MAC_WUCSR2_IPV6_TCPSYN_RCD_ BIT(5) +#define MAC_WUCSR2_IPV4_TCPSYN_RCD_ BIT(4) #define SGMII_ACC (0x720) #define SGMII_ACC_SGMII_BZY_ BIT(31) -- 2.34.1
On 19/03/2024 05:20, Marek Vasut wrote:
> CYW43439 is a Wi-Fi + Bluetooth combo device from Infineon.
> The Bluetooth part is capable of Bluetooth 5.2 BR/EDR/LE .
> This chip is present e.g. on muRata 1YN module.
>
> Extend the binding with its DT compatible using fallback
> compatible string to "brcm,bcm4329-bt" which seems to be
> the oldest compatible device. This should also prevent the
> growth of compatible string tables in drivers. The existing
> block of compatible strings is retained.
>
> Signed-off-by: Marek Vasut <marex@denx.de>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Best regards,
Krzysztof
On 3/18/24 04:35, Hangbin Liu wrote: > On Sun, Mar 17, 2024 at 12:19:12AM +0100, Mirsad Todorovac wrote: >> Hi, >> >> While running kselftest on vanilla torvalds tree kernel commit v6.8-11167-g4438a810f396, >> the test suite reported a number of errors. >> >> I was using the latest iproute2-next suite on an Ubuntu 22.04 LTS box. >> >> # Tests passed: 558 >> # Tests failed: 84 >> not ok 90 selftests: net: test_vxlan_mdb.sh # exit=1 > > FYI, I tested with 6.8 kernel with net tree. All passed. > > Data path: MDB torture test - IPv6 overlay / IPv6 underlay > ---------------------------------------------------------- > TEST: Torture test [ OK ] > > Tests passed: 642 > Tests failed: 0 > > # uname -r > 6.8.0-virtme > > Thanks > Hangbin Hi, Hangbin, I am running an Ubuntu 22.04 LTS configuration. I tried it again with 6.8.0-net-next-05204-g237bb5f7f7f5 and iproute2-next from this repo: https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git RESULTS: # Data path: MDB torture test - IPv6 overlay / IPv6 underlay # ---------------------------------------------------------- # TEST: Torture test [ OK ] # # Tests passed: 558 # Tests failed: 84 not ok 90 selftests: net: test_vxlan_mdb.sh # exit=1 David said there may be something in sysctl.conf: # # /etc/sysctl.conf - Configuration file for setting system variables # See /etc/sysctl.d/ for additional system variables. # See sysctl.conf (5) for information. # #kernel.domainname = example.com # Uncomment the following to stop low-level messages on console #kernel.printk = 3 4 1 3 ################################################################### # Functions previously found in netbase # # Uncomment the next two lines to enable Spoof protection (reverse-path filter) # Turn on Source Address Verification in all interfaces to # prevent some spoofing attacks #net.ipv4.conf.default.rp_filter=1 #net.ipv4.conf.all.rp_filter=1 # Uncomment the next line to enable TCP/IP SYN cookies # See http://lwn.net/Articles/277146/ # Note: This may impact IPv6 TCP sessions too #net.ipv4.tcp_syncookies=1 # Uncomment the next line to enable packet forwarding for IPv4 net.ipv4.ip_forward=1 # Uncomment the next line to enable packet forwarding for IPv6 # Enabling this option disables Stateless Address Autoconfiguration # based on Router Advertisements for this host #net.ipv6.conf.all.forwarding=1 net.ipv4.conf.default.rp_filter = 0 net.ipv4.conf.all.rp_filter = 0 ################################################################### # Additional settings - these settings can improve the network # security of the host and prevent against some network attacks # including spoofing attacks and man in the middle attacks through # redirection. Some network environments, however, require that these # settings are disabled so review and enable them as needed. # # Do not accept ICMP redirects (prevent MITM attacks) net.ipv4.conf.all.accept_redirects = 0 net.ipv6.conf.all.accept_redirects = 0 net.ipv4.conf.default.accept_redirects = 0 net.ipv6.conf.default.accept_redirects = 0 # _or_ # Accept ICMP redirects only for gateways listed in our default # gateway list (enabled by default) # net.ipv4.conf.all.secure_redirects = 1 # # Do not send ICMP redirects (we are not a router) net.ipv4.conf.all.send_redirects = 0 net.ipv4.conf.default.send_redirects = 0 net.ipv4.icmp_ignore_bogus_error_responses = 1 # # Do not accept IP source route packets (we are not a router) net.ipv4.conf.all.accept_source_route = 0 net.ipv6.conf.all.accept_source_route = 0 # # Log Martian Packets net.ipv4.conf.all.log_martians = 0 net.ipv4.conf.default.log_martians = 0 # ################################################################### # Magic system request Key # 0=disable, 1=enable all, >1 bitmask of sysrq functions # See https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html # for what other values do #kernel.sysrq=438 Best regards, Mirsad
[-- Attachment #1: Type: text/plain, Size: 7608 bytes --] On 3/18/24 15:42, David Ahern wrote: > On 3/17/24 8:38 PM, Hangbin Liu wrote: >> Wild guess, the last change of icmp_redirect is my netns update. Maybe >> there are something default sysctl settings in netns cause the error? > > It is most likely sysctl settings. It would be good to chase those down > and make sure we have the script setting them. > > Mirsad: What OS are you testing with? That script has a verbose option > (-v) to get more output like the commands run and pause-on-fail (-p) to > manually debug at that point. Hi, David, I am running an Ubuntu 22.04 LTS box, with the iptools upgraded to iproute2-next tree of March 15th, on the torvalds tree. Right now I have tried it against the net-next tree, and I get the same result: # timeout set to 3600 # selftests: net: icmp_redirect.sh # # ########################################################################### # Legacy routing # ########################################################################### # # TEST: IPv4: redirect exception [FAIL] # TEST: IPv6: redirect exception [ OK ] # TEST: IPv4: redirect exception plus mtu [FAIL] # TEST: IPv6: redirect exception plus mtu [ OK ] # TEST: IPv4: routing reset [ OK ] # TEST: IPv6: routing reset [ OK ] # TEST: IPv4: mtu exception [ OK ] # TEST: IPv6: mtu exception [ OK ] # TEST: IPv4: mtu exception plus redirect [FAIL] # TEST: IPv6: mtu exception plus redirect [ OK ] # # ########################################################################### # Legacy routing with VRF # ########################################################################### # # TEST: IPv4: redirect exception [FAIL] # TEST: IPv6: redirect exception [ OK ] # TEST: IPv4: redirect exception plus mtu [FAIL] # TEST: IPv6: redirect exception plus mtu [ OK ] # TEST: IPv4: routing reset [ OK ] # TEST: IPv6: routing reset [ OK ] # TEST: IPv4: mtu exception [ OK ] # TEST: IPv6: mtu exception [ OK ] # TEST: IPv4: mtu exception plus redirect [FAIL] # TEST: IPv6: mtu exception plus redirect [ OK ] # # ########################################################################### # Routing with nexthop objects # ########################################################################### # # TEST: IPv4: redirect exception [FAIL] # TEST: IPv6: redirect exception [ OK ] # TEST: IPv4: redirect exception plus mtu [FAIL] # TEST: IPv6: redirect exception plus mtu [ OK ] # TEST: IPv4: routing reset [ OK ] # TEST: IPv6: routing reset [ OK ] # TEST: IPv4: mtu exception [ OK ] # TEST: IPv6: mtu exception [ OK ] # TEST: IPv4: mtu exception plus redirect [FAIL] # TEST: IPv6: mtu exception plus redirect [ OK ] # # ########################################################################### # Routing with nexthop objects and VRF # ########################################################################### # # TEST: IPv4: redirect exception [FAIL] # TEST: IPv6: redirect exception [ OK ] # TEST: IPv4: redirect exception plus mtu [FAIL] # TEST: IPv6: redirect exception plus mtu [ OK ] # TEST: IPv4: routing reset [ OK ] # TEST: IPv6: routing reset [ OK ] # TEST: IPv4: mtu exception [ OK ] # TEST: IPv6: mtu exception [ OK ] # TEST: IPv4: mtu exception plus redirect [FAIL] # TEST: IPv6: mtu exception plus redirect [ OK ] # # Tests passed: 28 # Tests failed: 12 # Tests xfailed: 0 not ok 45 selftests: net: icmp_redirect.sh # exit=1 So, it is probably the sysctl you said, but I cannot tell which one. My /etc/sysctl.conf looks like this (I think something like Libreswan VPN required these to be redirects turned off): # # /etc/sysctl.conf - Configuration file for setting system variables # See /etc/sysctl.d/ for additional system variables. # See sysctl.conf (5) for information. # #kernel.domainname = example.com # Uncomment the following to stop low-level messages on console #kernel.printk = 3 4 1 3 ################################################################### # Functions previously found in netbase # # Uncomment the next two lines to enable Spoof protection (reverse-path filter) # Turn on Source Address Verification in all interfaces to # prevent some spoofing attacks #net.ipv4.conf.default.rp_filter=1 #net.ipv4.conf.all.rp_filter=1 # Uncomment the next line to enable TCP/IP SYN cookies # See http://lwn.net/Articles/277146/ # Note: This may impact IPv6 TCP sessions too #net.ipv4.tcp_syncookies=1 # Uncomment the next line to enable packet forwarding for IPv4 net.ipv4.ip_forward=1 # Uncomment the next line to enable packet forwarding for IPv6 # Enabling this option disables Stateless Address Autoconfiguration # based on Router Advertisements for this host #net.ipv6.conf.all.forwarding=1 net.ipv4.conf.default.rp_filter = 0 net.ipv4.conf.all.rp_filter = 0 ################################################################### # Additional settings - these settings can improve the network # security of the host and prevent against some network attacks # including spoofing attacks and man in the middle attacks through # redirection. Some network environments, however, require that these # settings are disabled so review and enable them as needed. # # Do not accept ICMP redirects (prevent MITM attacks) net.ipv4.conf.all.accept_redirects = 0 net.ipv6.conf.all.accept_redirects = 0 net.ipv4.conf.default.accept_redirects = 0 net.ipv6.conf.default.accept_redirects = 0 # _or_ # Accept ICMP redirects only for gateways listed in our default # gateway list (enabled by default) # net.ipv4.conf.all.secure_redirects = 1 # # Do not send ICMP redirects (we are not a router) net.ipv4.conf.all.send_redirects = 0 net.ipv4.conf.default.send_redirects = 0 net.ipv4.icmp_ignore_bogus_error_responses = 1 # # Do not accept IP source route packets (we are not a router) net.ipv4.conf.all.accept_source_route = 0 net.ipv6.conf.all.accept_source_route = 0 # # Log Martian Packets net.ipv4.conf.all.log_martians = 0 net.ipv4.conf.default.log_martians = 0 # ################################################################### # Magic system request Key # 0=disable, 1=enable all, >1 bitmask of sysrq functions # See https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html # for what other values do #kernel.sysrq=438 Thank you. Best regards, Mirsad [-- Attachment #2: sysctl-all.conf.xz --] [-- Type: application/x-xz, Size: 12432 bytes --]
CYW43439 is a Wi-Fi + Bluetooth combo device from Infineon. The Bluetooth part is capable of Bluetooth 5.2 BR/EDR/LE . This chip is present e.g. on muRata 1YN module. Extend the binding with its DT compatible using fallback compatible string to "brcm,bcm4329-bt" which seems to be the oldest compatible device. This should also prevent the growth of compatible string tables in drivers. The existing block of compatible strings is retained. Signed-off-by: Marek Vasut <marex@denx.de> --- Cc: "David S. Miller" <davem@davemloft.net> Cc: Conor Dooley <conor+dt@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Rob Herring <robh@kernel.org> Cc: devicetree@vger.kernel.org Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org --- V2: - Introduce fallback compatible string - Reword the second half of commit message to reflect that --- .../bindings/net/broadcom-bluetooth.yaml | 33 +++++++++++-------- 1 file changed, 19 insertions(+), 14 deletions(-) diff --git a/Documentation/devicetree/bindings/net/broadcom-bluetooth.yaml b/Documentation/devicetree/bindings/net/broadcom-bluetooth.yaml index cc70b00c6ce57..4a1bfc2b35849 100644 --- a/Documentation/devicetree/bindings/net/broadcom-bluetooth.yaml +++ b/Documentation/devicetree/bindings/net/broadcom-bluetooth.yaml @@ -14,20 +14,25 @@ description: properties: compatible: - enum: - - brcm,bcm20702a1 - - brcm,bcm4329-bt - - brcm,bcm4330-bt - - brcm,bcm4334-bt - - brcm,bcm43430a0-bt - - brcm,bcm43430a1-bt - - brcm,bcm43438-bt - - brcm,bcm4345c5 - - brcm,bcm43540-bt - - brcm,bcm4335a0 - - brcm,bcm4349-bt - - cypress,cyw4373a0-bt - - infineon,cyw55572-bt + oneOf: + - items: + - enum: + - infineon,cyw43439-bt + - const: brcm,bcm4329-bt + - enum: + - brcm,bcm20702a1 + - brcm,bcm4329-bt + - brcm,bcm4330-bt + - brcm,bcm4334-bt + - brcm,bcm43430a0-bt + - brcm,bcm43430a1-bt + - brcm,bcm43438-bt + - brcm,bcm4345c5 + - brcm,bcm43540-bt + - brcm,bcm4335a0 + - brcm,bcm4349-bt + - cypress,cyw4373a0-bt + - infineon,cyw55572-bt shutdown-gpios: maxItems: 1 -- 2.43.0