netdev.vger.kernel.org archive mirror
* Re: PROBLEM: mlx5_core driver crashes when a VRF device with a route is added with mlx5 devices in switchdev mode
       [not found] <CACJMemXjp6F0KzzAfR8yR4s5BU8zJBpsXmF0LWu3ubmF8Kke3Q@mail.gmail.com>
@ 2021-05-02  6:21 ` Leon Romanovsky
  2021-05-02  7:33   ` Roi Dayan
  0 siblings, 1 reply; 2+ messages in thread
From: Leon Romanovsky @ 2021-05-02  6:21 UTC
  To: Dennis Afanasev, Vlad Buslov, Dmytro Linkin, Roi Dayan
  Cc: saeedm, netdev, linux-rdma

Thanks for the report.

+ more people.

On Fri, Apr 30, 2021 at 04:56:17PM -0400, Dennis Afanasev wrote:
> Dear Saeed and Leo,
> I am reporting a bug in the mlx5_core driver discovered by our team at
> Stateless while setting up SRIOV devices in eswitch mode. Below are the
> details and relevant files that relate to the bug. Please reach out to me
> if I can provide any further information.
> 
> 1. Description of problem: When creating SRIOV devices off physical mlx5
>    PCIe devices and then putting the physical devices into switchdev mode,
>    adding a new VRF device with a default route will cause the mlx5_core
>    driver to segfault (replicate_bug1.sh). In addition, attempting to set the
>    physical devices to switchdev mode after adding a VRF with a default route
>    will cause the mlx5_core driver to segfault (replicate_bug2.sh). The
>    segfault occurs in the function mlx5e_tc_tun_fib_event in both cases.
> 2. Keywords: mlx5, mlx5_core, mlx5e_tc_tun_fib_event, tc, netdev, 5.12-rc7
> 3. Kernel information: Linux version 5.12.0-rc7 (root@data) (gcc (Debian
>    10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP
> 4. Kernel config file: File attached - config-5.12.0-rc7
> 5. Oops message: Files attached - dmesg_output_bug1 and dmesg_output_bug2
> 6. Shell script to replicate: Files attached - replicate_bug1.sh and
>    replicate_bug2.sh
> 7. ver_linux output: File attached - ver_linux_output
> 8. Processor information: File attached - cpuinfo
> 9. Module information: File attached - modules
> 10. Loaded driver and hardware: Files attached - ioport and iomem
> 11. PCI information: File attached - pci_info
> 12. Other information: I hardcoded the addresses of the physical PCIe
>     devices and of the created SRIOV devices. These will have to be adjusted
>     depending on your machine.




> #!/bin/bash
> 
> set -euxETo pipefail
> 
> mst start
> 
> # (Hardcoded) These need to be modified based on the host machine
> nic1_port0="0000:5e:00.0"
> nic1_port1="0000:5e:00.1"
> 
> # Create 1 SRIOV device per NIC port
> echo 1 > /sys/bus/pci/drivers/mlx5_core/$nic1_port0/sriov_numvfs
> echo 1 > /sys/bus/pci/drivers/mlx5_core/$nic1_port1/sriov_numvfs
> 
> # The SRIOV devices are given these addresses
> nic1_port0_vf="0000:5e:00.2"
> nic1_port1_vf="0000:5e:00.4"
> 
> declare -ar PCIE_PHYSICAL_ADDRESSES=($nic1_port0 $nic1_port1)
> declare -ar PCIE_SRIOV_ADDRESSES=($nic1_port0_vf $nic1_port1_vf)
> 
> # Unbind the driver from the SRIOV, required to activate the eswitch
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>   echo "${pcie_address}" > /sys/bus/pci/drivers/mlx5_core/unbind
> done
> 
> # Wait for the binds to disappear
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>   declare sys_symlink_file="/sys/bus/pci/drivers/mlx5_core/${pcie_address}"
>   until [[ ! -h "${sys_symlink_file}" ]]; do
>     inotifywait --event delete_self --timeout 1 "${sys_symlink_file}" || true
>   done
> done
> sync --file-system /sys
> udevadm settle --timeout=30
> sleep 5
> 
> # Set the cards to 'switchdev'
> for pcie_address in "${PCIE_PHYSICAL_ADDRESSES[@]}"; do
>   devlink dev eswitch set "pci/${pcie_address}" mode switchdev encap-mode basic
> done
> 
> # Wait for the cards to be in switchdev mode
> for pcie_address in "${PCIE_PHYSICAL_ADDRESSES[@]}"; do
>   until [[ "$(devlink -j dev eswitch show "pci/${pcie_address}" |
>     jq --arg dev "pci/${pcie_address}" -r '.dev[$dev].mode' 2> /dev/null)" == "switchdev" ]]; do
>     sleep 1
>   done
> done
> sync --file-system /sys
> udevadm settle --timeout=30
> sleep 5
> 
> # Rebind the driver to the SRIOV devices
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>   echo "${pcie_address}" > /sys/bus/pci/drivers/mlx5_core/bind
> done
> 
> ip link set group default up
> ip link add vrf0 type vrf table 100
> 
> # This will crash the kernel
> ip route add table 100 unreachable default

> #!/bin/bash
> 
> set -euxETo pipefail
> 
> mst start
> 
> # Add the VRF device and a route
> ip link add vrf0 type vrf table 100
> ip route add table 100 unreachable default
> 
> # (Hardcoded) These need to be modified based on the host machine
> nic1_port0="0000:5e:00.0"
> nic1_port1="0000:5e:00.1"
> 
> # Create 1 SRIOV device per NIC port
> echo 1 > /sys/bus/pci/drivers/mlx5_core/$nic1_port0/sriov_numvfs
> echo 1 > /sys/bus/pci/drivers/mlx5_core/$nic1_port1/sriov_numvfs
> 
> # The SRIOV devices are given these addresses
> nic1_port0_vf="0000:5e:00.2"
> nic1_port1_vf="0000:5e:00.4"
> 
> declare -ar PCIE_PHYSICAL_ADDRESSES=($nic1_port0 $nic1_port1)
> declare -ar PCIE_SRIOV_ADDRESSES=($nic1_port0_vf $nic1_port1_vf)
> 
> # Unbind the driver from the SRIOV, required to activate the eswitch
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>   echo "${pcie_address}" > /sys/bus/pci/drivers/mlx5_core/unbind
> done
> 
> # Wait for the binds to disappear
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>   declare sys_symlink_file="/sys/bus/pci/drivers/mlx5_core/${pcie_address}"
>   until [[ ! -h "${sys_symlink_file}" ]]; do
>     inotifywait --event delete_self --timeout 1 "${sys_symlink_file}" || true
>   done
> done
> sync --file-system /sys
> udevadm settle --timeout=30
> 
> # set the cards to 'switchdev'
> for pcie_address in "${PCIE_PHYSICAL_ADDRESSES[@]}"; do
>   # This will crash the kernel
>   devlink dev eswitch set "pci/${pcie_address}" mode switchdev encap-mode basic
> done
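
Both scripts hardcode the VF PCI addresses. As a sketch only, and not
something taken from the report, the addresses could instead be read from
the virtfnN symlinks that sysfs creates under each physical function once
sriov_numvfs is set. The function name list_vfs and its optional sysfs-root
argument are hypothetical, added so the logic can be exercised against a
fake directory tree:

```shell
# Hypothetical helper (not from the original scripts): print the PCI address
# of every VF that belongs to a physical function.
#   $1 = PF PCI address, e.g. 0000:5e:00.0
#   $2 = sysfs root, defaults to /sys (an assumption, overridable for testing)
list_vfs() {
  local pf="$1" sysfs_root="${2:-/sys}" link
  for link in "${sysfs_root}/bus/pci/devices/${pf}"/virtfn*; do
    # When the PF has no VFs the glob stays unexpanded; skip it
    [ -h "${link}" ] || continue
    # Each virtfnN symlink points at ../<VF PCI address>
    basename "$(readlink "${link}")"
  done
}

# Possible use in place of the hardcoded nic1_port0_vf / nic1_port1_vf values:
#   declare -ar PCIE_SRIOV_ADDRESSES=(
#     $(list_vfs "${nic1_port0}") $(list_vfs "${nic1_port1}")
#   )
```

This only removes the hardcoded VF addresses; the physical function
addresses still have to be supplied for the host machine.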









* Re: PROBLEM: mlx5_core driver crashes when a VRF device with a route is added with mlx5 devices in switchdev mode
  2021-05-02  6:21 ` PROBLEM: mlx5_core driver crashes when a VRF device with a route is added with mlx5 devices in switchdev mode Leon Romanovsky
@ 2021-05-02  7:33   ` Roi Dayan
  0 siblings, 0 replies; 2+ messages in thread
From: Roi Dayan @ 2021-05-02  7:33 UTC
  To: Leon Romanovsky, Dennis Afanasev, Vlad Buslov, Dmytro Linkin
  Cc: saeedm, netdev, linux-rdma



On 2021-05-02 9:21 AM, Leon Romanovsky wrote:
> Thanks for the report.
> 
> + more people.
> 
> On Fri, Apr 30, 2021 at 04:56:17PM -0400, Dennis Afanasev wrote:
>> Dear Saeed and Leo,
>> I am reporting a bug in the mlx5_core driver discovered by our team at
>> Stateless while setting up SRIOV devices in eswitch mode. Below are the
>> details and relevant files that relate to the bug. Please reach out to me
>> if I can provide any further information.


Thanks. I reproduced the crash and submitted a fix for review.
I also added an internal test with an unreachable route to catch this if
it happens again.

Thanks,
Roi
