* Re: PROBLEM: mlx5_core driver crashes when a VRF device with a route is added with mlx5 devices in switchdev mode
[not found] <CACJMemXjp6F0KzzAfR8yR4s5BU8zJBpsXmF0LWu3ubmF8Kke3Q@mail.gmail.com>
@ 2021-05-02 6:21 ` Leon Romanovsky
2021-05-02 7:33 ` Roi Dayan
0 siblings, 1 reply; 2+ messages in thread
From: Leon Romanovsky @ 2021-05-02 6:21 UTC
To: Dennis Afanasev, Vlad Buslov, Dmytro Linkin, Roi Dayan
Cc: saeedm, netdev, linux-rdma
Thanks for the report.
+ more people.
On Fri, Apr 30, 2021 at 04:56:17PM -0400, Dennis Afanasev wrote:
> Dear Saeed and Leo,
> I am reporting a bug in the mlx5_core driver discovered by our team at
> Stateless while setting up SRIOV devices in eswitch mode. Below are the
> details and relevant files that relate to the bug. Please reach out to me
> if I can provide any further information.
>
> 1. Description of problem: When SR-IOV devices are created on physical
>    mlx5 PCIe devices and the physical devices are then put into switchdev
>    mode, adding a new VRF device with a default route crashes the
>    mlx5_core driver (replicate_bug1.sh). Likewise, setting the physical
>    devices to switchdev mode after a VRF with a default route has been
>    added crashes the mlx5_core driver (replicate_bug2.sh). In both cases
>    the kernel oops occurs in the function mlx5e_tc_tun_fib_event.
>
> 2. Keywords: mlx5, mlx5_core, mlx5e_tc_tun_fib_event, tc, netdev, 5.12-rc7
>
> 3. Kernel information: Linux version 5.12.0-rc7 (root@data) (gcc (Debian
>    10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP
>
> 4. Kernel config file: File attached - config-5.12.0-rc7
>
> 5. Oops message: Files attached - dmesg_output_bug1 and dmesg_output_bug2
>
> 6. Shell scripts to replicate: Files attached - replicate_bug1.sh and
>    replicate_bug2.sh
>
> 7. ver_linux output: File attached - ver_linux_output
>
> 8. Processor information: File attached - cpuinfo
>
> 9. Module information: File attached - modules
>
> 10. Loaded driver and hardware: Files attached - ioport and iomem
>
> 11. PCI information: File attached - pci_info
>
> 12. Other information: The PCIe addresses of the physical devices and of
>     the created SR-IOV devices are hardcoded in the scripts and will have
>     to be adjusted for your machine.
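Item 12 notes that the VF addresses are hardcoded. The sketch below is one way to avoid that. It assumes the standard sysfs layout, in which each physical function's device directory carries `virtfnN` symlinks to its VFs once `sriov_numvfs` has been written. `vf_address` is a hypothetical helper name and is not part of the original scripts:

```shell
#!/bin/bash
# Hypothetical helper: print the PCI address of VF number INDEX belonging to
# the physical function whose sysfs directory is PF_DIR, by resolving the
# "virtfn<INDEX>" symlink that the kernel creates for each enabled VF.
vf_address() {
    local pf_dir="$1" index="$2"
    local link="${pf_dir}/virtfn${index}"
    if [ ! -L "${link}" ]; then
        echo "no VF ${index} under ${pf_dir}" >&2
        return 1
    fi
    # The symlink target is the VF's device directory, named after its address.
    basename "$(readlink -f "${link}")"
}

# Example, after "echo 1 > .../sriov_numvfs" has run for both ports:
#   nic1_port0_vf="$(vf_address "/sys/bus/pci/devices/${nic1_port0}" 0)"
#   nic1_port1_vf="$(vf_address "/sys/bus/pci/devices/${nic1_port1}" 0)"
```

Looking the addresses up this way keeps the scripts correct if the VFs are enumerated at different addresses on another host. It does not change anything about how the crash is triggered.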
> ---- replicate_bug1.sh ----
> #!/bin/bash
>
> set -euxETo pipefail
>
> mst start
>
> # (Hardcoded) These need to be modified based on the host machine
> nic1_port0="0000:5e:00.0"
> nic1_port1="0000:5e:00.1"
>
> # Create 1 SR-IOV device per NIC port
> echo 1 > "/sys/bus/pci/drivers/mlx5_core/${nic1_port0}/sriov_numvfs"
> echo 1 > "/sys/bus/pci/drivers/mlx5_core/${nic1_port1}/sriov_numvfs"
>
> # The SR-IOV devices are given these addresses
> nic1_port0_vf="0000:5e:00.2"
> nic1_port1_vf="0000:5e:00.4"
>
> declare -ar PCIE_PHYSICAL_ADDRESSES=("${nic1_port0}" "${nic1_port1}")
> declare -ar PCIE_SRIOV_ADDRESSES=("${nic1_port0_vf}" "${nic1_port1_vf}")
>
> # Unbind the driver from the SR-IOV devices; this is required to activate the eswitch
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>     echo "${pcie_address}" > /sys/bus/pci/drivers/mlx5_core/unbind
> done
>
> # Wait for the driver bind symlinks to disappear
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>     declare sys_symlink_file="/sys/bus/pci/drivers/mlx5_core/${pcie_address}"
>     until [[ ! -h "${sys_symlink_file}" ]]; do
>         inotifywait --event delete_self --timeout 1 "${sys_symlink_file}" || true
>     done
> done
> sync --file-system /sys
> udevadm settle --timeout=30
> sleep 5
>
> # Set the cards to 'switchdev'
> for pcie_address in "${PCIE_PHYSICAL_ADDRESSES[@]}"; do
>     devlink dev eswitch set "pci/${pcie_address}" mode switchdev encap-mode basic
> done
>
> # Wait for the cards to be in switchdev mode
> for pcie_address in "${PCIE_PHYSICAL_ADDRESSES[@]}"; do
>     until [[ "$(devlink -j dev eswitch show "pci/${pcie_address}" |
>                 jq --arg dev "pci/${pcie_address}" -r '.dev[$dev].mode' 2> /dev/null)" == "switchdev" ]]; do
>         sleep 1
>     done
> done
> sync --file-system /sys
> udevadm settle --timeout=30
> sleep 5
>
> # Rebind the driver to the SR-IOV devices
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>     echo "${pcie_address}" > /sys/bus/pci/drivers/mlx5_core/bind
> done
>
> ip link set group default up
> ip link add vrf0 type vrf table 100
>
> # This will crash the kernel
> ip route add table 100 unreachable default
> ---- replicate_bug2.sh ----
> #!/bin/bash
>
> set -euxETo pipefail
>
> mst start
>
> # Add the VRF device and a route
> ip link add vrf0 type vrf table 100
> ip route add table 100 unreachable default
>
> # (Hardcoded) These need to be modified based on the host machine
> nic1_port0="0000:5e:00.0"
> nic1_port1="0000:5e:00.1"
>
> # Create 1 SR-IOV device per NIC port
> echo 1 > "/sys/bus/pci/drivers/mlx5_core/${nic1_port0}/sriov_numvfs"
> echo 1 > "/sys/bus/pci/drivers/mlx5_core/${nic1_port1}/sriov_numvfs"
>
> # The SR-IOV devices are given these addresses
> nic1_port0_vf="0000:5e:00.2"
> nic1_port1_vf="0000:5e:00.4"
>
> declare -ar PCIE_PHYSICAL_ADDRESSES=("${nic1_port0}" "${nic1_port1}")
> declare -ar PCIE_SRIOV_ADDRESSES=("${nic1_port0_vf}" "${nic1_port1_vf}")
>
> # Unbind the driver from the SR-IOV devices; this is required to activate the eswitch
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>     echo "${pcie_address}" > /sys/bus/pci/drivers/mlx5_core/unbind
> done
>
> # Wait for the driver bind symlinks to disappear
> for pcie_address in "${PCIE_SRIOV_ADDRESSES[@]}"; do
>     declare sys_symlink_file="/sys/bus/pci/drivers/mlx5_core/${pcie_address}"
>     until [[ ! -h "${sys_symlink_file}" ]]; do
>         inotifywait --event delete_self --timeout 1 "${sys_symlink_file}" || true
>     done
> done
> sync --file-system /sys
> udevadm settle --timeout=30
>
> # Set the cards to 'switchdev'
> for pcie_address in "${PCIE_PHYSICAL_ADDRESSES[@]}"; do
>     # This will crash the kernel
>     devlink dev eswitch set "pci/${pcie_address}" mode switchdev encap-mode basic
> done
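The `until` loops in the scripts above poll with no upper bound. If a device never reaches the expected state, the script hangs instead of failing. The sketch below is a minimal bounded alternative and assumes a 30-second limit is acceptable. `wait_until` and `eswitch_mode_is_switchdev` are hypothetical helper names. The second one reuses the same `devlink -j dev eswitch show` / `jq` query that replicate_bug1.sh already relies on, with `jq -e` so that the result becomes an exit status:

```shell
#!/bin/bash
# Hypothetical helper: run "$@" once per second until it succeeds or until
# TIMEOUT seconds have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout="$1"
    shift
    local waited=0
    until "$@"; do
        if [ "${waited}" -ge "${timeout}" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Hypothetical predicate: succeeds when the eswitch of PCI device $1 reports
# switchdev mode (jq -e exits non-zero when the expression yields false/null).
eswitch_mode_is_switchdev() {
    devlink -j dev eswitch show "pci/$1" |
        jq -e --arg dev "pci/$1" '.dev[$dev].mode == "switchdev"' > /dev/null
}

# Possible use in place of the unbounded switchdev wait loop:
#   for pcie_address in "${PCIE_PHYSICAL_ADDRESSES[@]}"; do
#       wait_until 30 eswitch_mode_is_switchdev "${pcie_address}" ||
#           { echo "pci/${pcie_address}: not in switchdev mode after 30s" >&2; exit 1; }
#   done
```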
* Re: PROBLEM: mlx5_core driver crashes when a VRF device with a route is added with mlx5 devices in switchdev mode
From: Roi Dayan @ 2021-05-02 7:33 UTC
To: Leon Romanovsky, Dennis Afanasev, Vlad Buslov, Dmytro Linkin
Cc: saeedm, netdev, linux-rdma
On 2021-05-02 9:21 AM, Leon Romanovsky wrote:
> Thanks for the report.
>
> + more people.
>
> On Fri, Apr 30, 2021 at 04:56:17PM -0400, Dennis Afanasev wrote:
>> Dear Saeed and Leo,
>> I am reporting a bug in the mlx5_core driver discovered by our team at
>> Stateless while setting up SRIOV devices in eswitch mode. Below are the
>> details and relevant files that relate to the bug. Please reach out to me
>> if I can provide any further information.
Thanks. I reproduced the issue and added a fix for review.
I also added an internal test with an unreachable route to catch this
if it happens again.
Thanks,
Roi
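The internal test itself is not shown in this thread. As an illustration only, a reproduction script could finish with a check of the kernel log. This assumes the crash is logged as an oops and does not panic the machine outright (for example, with panic_on_oops disabled). `has_fib_event_oops` is a hypothetical name. Matching on `BUG:`/`Oops:` plus the function name is a heuristic, not a full oops parser:

```shell
#!/bin/bash
# Hypothetical regression check: read kernel log text on stdin and exit 0 if
# it contains an oops marker ("BUG:" or "Oops:") and also mentions
# mlx5e_tc_tun_fib_event (e.g. in the RIP line or the call trace).
has_fib_event_oops() {
    awk '/BUG:|Oops:/ { oops = 1 }
         /mlx5e_tc_tun_fib_event/ { hit = 1 }
         END { exit !(oops && hit) }'
}

# Example (run as root):
#   dmesg --clear
#   ip route add table 100 unreachable default
#   if dmesg | has_fib_event_oops; then
#       echo "regression: oops in mlx5e_tc_tun_fib_event" >&2
#       exit 1
#   fi
```

Clearing the log first ensures that an oops from an earlier run is not counted again.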