From: Oza Oza <oza.oza at broadcom.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] PCI hotplug and SPDK
Date: Wed, 30 Aug 2017 20:40:34 +0530
Message-ID: <e259775a033fe1eefe6b39e74f87c7c8@mail.gmail.com>
In-Reply-To: 47464713-CCA7-4AD5-B6A6-5D556A590FC9@intel.com
[-- Attachment #1: Type: text/plain, Size: 9816 bytes --]
root(a)bcm958742k:~# lspci
0001:00:00.0 PCI bridge: Broadcom Corporation Device 0000
0001:01:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data
Center SSD (rev 01)
0008:00:00.0 PCI bridge: Broadcom Corporation Device 16f0 (rev 01)
0008:01:00.0 Ethernet controller: Broadcom Corporation Device 16f0 (rev 01)
0008:01:00.1 Ethernet controller: Broadcom Corporation Device 16f0 (rev 01)
0008:01:00.2 Ethernet controller: Broadcom Corporation Device 16f0 (rev 01)
0008:01:00.3 Ethernet controller: Broadcom Corporation Device 16f0 (rev 01)
root(a)bcm958742k:~# /usr/share/spdk/scripts/setup.sh
0001:01:00.0 (8086 0953): nvme -> vfio-pci
grep: /usr/share/spdk/scripts/../include/spdk/pci_ids.h: No such file or directory
[ 1520.258498] pci 0008:00:00.0: PCI bridge to [bus 01]
[ 1520.267436] pci 0008:00:00.0: bridge window [mem 0x10000000-0x104fffff
64bit pref]
[ 1520.277225] pci 0000:00:00.0: ignoring class 0x020000 (doesn't match
header type 01)
>> It is looking to unbind on the empty slot as well.
[ 1520.285324] pci 0000:00:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[ 1535.911738] Bad mode in Error handler detected on CPU7, code 0xbf000002
-- SError
[ 1535.919460] Internal error: Oops - bad mode: 0 [#1] SMP
[ 1535.924850] Modules linked in:
[ 1535.928001] CPU: 7 PID: 2108 Comm: lighttpd Not tainted
4.12.0-01624-gbbd4086-dirty #97
[ 1535.936257] Hardware name: Stingray Combo SVK w/PCIe IOMMU (BCM958742K)
(DT)
[ 1535.943527] task: ffff80a1642ce000 task.stack: ffff80a162790000
[ 1535.949634] PC is at 0xffff8baadca0
[ 1535.953230] LR is at 0x40aca8
[ 1535.956290] pc : [<0000ffff8baadca0>] lr : [<000000000040aca8>] pstate:
80000000
[ 1535.963919] sp : 0000ffffca61a970
[ 1535.967337] x29: 0000ffffca61a970 x28: 0000000000000000
[ 1535.972816] x27: 00000000283a9a00 x26: 0000000000000000
[ 1535.978296] x25: 000000000042a3a8 x24: 000000000042a3a0
[ 1535.983775] x23: 0000000000429000 x22: 000000000042a2b8
[ 1535.989254] x21: 000000000042a000 x20: 0000000058edd444
[ 1535.994734] x19: 00000000283a9010 x18: 0000000000000014
[ 1536.000213] x17: 0000ffff8baada88 x16: 0000000000442c08
[ 1536.005693] x15: 00002162cc000000 x14: 000bcd3d80000000
[ 1536.011173] x13: 00000001f4000000 x12: 0000000000000017
[ 1536.016653] x11: 0000000000061bf7 x10: 0000000058edd444
[ 1536.022131] x9 : 001dcd6500000000 x8 : 0000000000000016
[ 1536.027611] x7 : 000000000000dfda x6 : 0000ffff8bdc5000
[ 1536.033090] x5 : 0000000000000008 x4 : 0000000000000000
[ 1536.038570] x3 : 00000000000003e8 x2 : 0000000000000401
[ 1536.044049] x1 : 00000000283bd050 x0 : 0000000000000000
[ 1536.049529] Process lighttpd (pid: 2108, stack limit =
0xffff80a162790000)
Please find the attached status script.
Regards,
Oza.
*From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Harris,
James R
*Sent:* Tuesday, August 29, 2017 11:30 PM
*To:* Storage Performance Development Kit
*Subject:* Re: [SPDK] PCI hotplug and SPDK
So to clarify, you have your system booted with no NVMe endpoint connected,
and then when you run the SPDK setup.sh script, you see all of these kernel
messages from trying to bind vfio to PCIe devices, and the system eventually
crashes?
If so, we need to determine which PCIe devices setup.sh is trying to bind to
vfio. It should only be trying to bind NVMe devices, so if there is no
NVMe device connected, it shouldn't be trying to bind anything.
Can you send lspci -vvvx output from your system before running setup.sh?
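To see which devices the script would pick without binding anything, the class filter can be run on its own. This is only a minimal sketch: pci_addrs_by_class is a hypothetical helper, not part of SPDK, and it assumes that `lspci -mm -n` prints the device address in the first field and the 4-digit class code in the second. Unlike a plain grep, it compares the class field exactly, so a device whose vendor or device ID happens to contain 0108 is not selected.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SPDK): print the addresses of devices whose
# PCI class field is exactly the given 4-digit code.
# Reads `lspci -mm -n` style output on stdin.
pci_addrs_by_class() {
    local class="$1"
    # Strip the quotes lspci puts around each field, then compare field 2 only.
    tr -d '"' | awk -v class="$class" '$2 == class { print $1 }'
}

# Example on a live system (NVMe controllers are class 0108):
#   lspci -D -mm -n | pci_addrs_by_class 0108
```

If this prints nothing on a system with an empty slot, the NVMe bind loop has nothing to do, and any stall would have to come from another step of the script.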
Thanks,
-Jim
*From: *SPDK <spdk-bounces(a)lists.01.org> on behalf of Oza Oza <
oza.oza(a)broadcom.com>
*Reply-To: *Storage Performance Development Kit <spdk(a)lists.01.org>
*Date: *Tuesday, August 29, 2017 at 9:45 AM
*To: *Storage Performance Development Kit <spdk(a)lists.01.org>
*Subject: *Re: [SPDK] PCI hotplug and SPDK
In my opinion, this has nothing to do with the platform, although our platform is
ARMv8.
(However, I cannot test on any other platform, because we don't know how the kernel
driver is written there.)
If the kernel driver supports hotplug, it allows
pci_create_root_bus regardless of whether an EP is plugged in.
In other words, the following APIs are never called:
pci_stop_root_bus(bus);
pci_remove_root_bus(bus);
In that case, if the PCIe slot is empty, running SPDK results in stalls
(10-15 seconds) followed by a crash.
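One way to check this situation from user space is to look for bridges that exist but have nothing behind them. The following is a rough sketch, not anything SPDK does: list_empty_bridges is a hypothetical helper, and it assumes the usual Linux sysfs layout, where each device directory has a `class` file (0x0604xx for a PCI-to-PCI bridge) and child devices appear as subdirectories named domain:bus:device.function.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list PCI bridges that have no device below them.
# $1 is the sysfs devices directory (default /sys/bus/pci/devices).
list_empty_bridges() {
    local root="${1:-/sys/bus/pci/devices}"
    local dev class child has_child
    for dev in "$root"/*; do
        [ -r "$dev/class" ] || continue
        class=$(cat "$dev/class")
        # Only PCI-to-PCI bridges (class 0x0604xx) are of interest here.
        case "$class" in
            0x0604*) ;;
            *) continue ;;
        esac
        has_child=no
        # Child devices show up as subdirectories named domain:bus:dev.fn.
        for child in "$dev"/????:??:??.?; do
            [ -e "$child" ] && has_child=yes && break
        done
        [ "$has_child" = no ] && basename "$dev"
    done
    return 0
}
```

Running `list_empty_bridges` with no argument inspects the live system. The directory argument exists only so the function can be tried against a copied or mock tree.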
Regards,
Oza.
*From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Harris,
James R
*Sent:* Tuesday, August 29, 2017 6:20 PM
*To:* Storage Performance Development Kit
*Subject:* Re: [SPDK] PCI hotplug and SPDK
Hi Oza,
Do you see this issue only on your armv8 platform or do you also see it on
amd64?
-Jim
*From: *SPDK <spdk-bounces(a)lists.01.org> on behalf of Oza Oza <
oza.oza(a)broadcom.com>
*Reply-To: *Storage Performance Development Kit <spdk(a)lists.01.org>
*Date: *Tuesday, August 29, 2017 at 1:51 AM
*To: *Storage Performance Development Kit <spdk(a)lists.01.org>
*Subject: *Re: [SPDK] PCI hotplug and SPDK
Sorry if I was unclear.
I am not talking about the hotplug feature of SPDK.
- The PCI hotplug feature is implemented in the kernel driver and works fine.
- But the moment I run SPDK and try to bind the vfio driver, the system stalls
completely.
The reason is that the kernel driver does not remove the root bus when the PCIe
endpoint is not connected at boot time,
so SPDK tries to bind the driver, thinking the host bridge is there.
Without PCI hotplug, the host bridge would not be there because of the following API
calls in the kernel driver:
pci_stop_root_bus(bus);
pci_remove_root_bus(bus);
- We now allow host bridge creation (API: pci_create_root_bus)
regardless of whether an EP is connected.
If I then run SPDK with no endpoint connected (empty slot), I get stalls.
Regards,
Oza.
*From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Chang, Cunyin
*Sent:* Tuesday, August 29, 2017 2:14 PM
*To:* Storage Performance Development Kit
*Subject:* Re: [SPDK] PCI hotplug and SPDK
Hi Oza,
Could you please provide detailed steps to reproduce the issue?
SPDK takes charge of hotplug only after you bind the device to the uio or
vfio driver,
so a newly inserted device will be handled by the kernel driver first.
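Which driver currently owns a device can be read from sysfs, in the same way the attached setup.sh reads the `driver` symlink before rebinding. A small sketch under that assumption follows; pci_current_driver is a hypothetical name, and the second argument is only there so the function can be pointed at a test directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the driver bound to a PCI device, or "no driver".
# $1 is the device address; $2 is the sysfs devices directory
# (default /sys/bus/pci/devices).
pci_current_driver() {
    local bdf="$1"
    local root="${2:-/sys/bus/pci/devices}"
    if [ -e "$root/$bdf/driver" ]; then
        # The driver entry is a symlink whose last component is the driver name.
        basename "$(readlink "$root/$bdf/driver")"
    else
        echo "no driver"
    fi
}
```

For example, `pci_current_driver 0001:01:00.0` would be expected to print nvme before setup.sh runs and vfio-pci (or uio_pci_generic) afterwards.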
-Cunyin
*From:* SPDK [mailto:spdk-bounces(a)lists.01.org <spdk-bounces(a)lists.01.org>] *On
Behalf Of *Oza Oza
*Sent:* Tuesday, August 29, 2017 4:22 PM
*To:* Storage Performance Development Kit <spdk(a)lists.01.org>
*Subject:* [SPDK] PCI hotplug and SPDK
Hi All,
PCI hotplug support requires the root bus to be created and the probe to go ahead
with all PCIe configuration,
which means the following APIs are not called:
pci_stop_root_bus(bus);
pci_remove_root_bus(bus);
If I then run SPDK, the system crashes with the following output.
Note: if the disk is connected, SPDK is fine.
Otherwise it stalls the system and ends in the following crash.
root(a)bcm958742k:~# echo 2048 > /proc/sys/vm/nr_hugepages;
/usr/share/spdk/scripts/setup.sh
grep: /usr/share/spdk/scripts/../include/spdk/pci_ids.h: No such file or directory
[ 34.621325] pci 0008:00:00.0: PCI bridge to [bus 01]
[ 34.640586] pci 0000:00:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[ 50.267056] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 50.272337] pci 0001:00:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[ 65.898762] pci 0001:00:00.0: PCI bridge to [bus 01]
[ 65.904015] pci 0006:00:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[ 81.530437] pci 0006:00:00.0: PCI bridge to [bus 01]
[ 81.535680] pci 0007:00:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[ 97.162103] pci 0007:00:00.0: PCI bridge to [bus 01]
[ 97.167255] Bad mode in Error handler detected on CPU6, code 0xbf000002
-- SError
[ 97.174974] Internal error: Oops - bad mode: 0 [#1] SMP
[ 97.180364] Modules linked in:
[ 97.183515] CPU: 6 PID: 2104 Comm: bash Not tainted
4.12.0-01560-gc83093d-dirty #89
[ 97.191413] Hardware name: Stingray Combo SVK w/PCIe IOMMU (BCM958742K)
(DT)
[ 97.198683] task: ffff80a163a40000 task.stack: ffff80a1612b4000
[ 97.204790] PC is at 0xffff7cbdfba8
[ 97.208387] LR is at 0xffff7cb8f288
[ 97.211983] pc : [<0000ffff7cbdfba8>] lr : [<0000ffff7cb8f288>] pstate:
20000000
[ 97.219612] sp : 0000fffffe564040
[ 97.223029] x29: 0000fffffe564040 x28: 000000001054ce60
[ 97.228509] x27: 0000000000000000 x26: 00000000004e2000
[ 97.233989] x25: 00000000004e5000 x24: 0000000000000002
[ 97.239468] x23: 0000ffff7cc63638 x22: 0000000000000002
[ 97.244947] x21: 0000ffff7cc67480 x20: 000000001054db10
[ 97.250427] x19: 0000000000000002 x18: 0000000000000000
[ 97.255906] x17: 00000000004daac8 x16: 0000000000000000
[ 97.261386] x15: 0000000000000096 x14: 0000000000000000
[ 97.266865] x13: 0000000000000000 x12: 0000000000000000
[ 97.272344] x11: 0000000000000020 x10: 0101010101010101
[ 97.277824] x9 : ffffff80ffffffc8 x8 : 0000000000000040
[ 97.283303] x7 : 0000000000000001 x6 : 0000ffff7cc669f0
[ 97.288782] x5 : 0000000000015551 x4 : 0000000000000888
[ 97.294261] x3 : 0000000000000000 x2 : 0000000000000002
[ 97.299741] x1 : 000000001054db10 x0 : 0000000000000002
[ 97.305220] Process bash (pid: 2104, stack limit = 0xffff80a1612b4000)
[ 97.311960] ---[ end trace a1f48abe30820241 ]---
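The timestamps in this log show where the time goes: each "bridge configuration invalid ... reconfiguring" line is followed about 15.6 seconds later by the matching "PCI bridge to [bus 01]" line (34.64 to 50.27, 50.27 to 65.90, 65.90 to 81.53, 81.54 to 97.16). A small filter makes such gaps easy to spot in a longer log. This is just a sketch: dmesg_gaps is a hypothetical helper, and it assumes the default kernel log prefix of the form "[ seconds.microseconds]".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the gap in seconds between consecutive kernel log
# lines, for gaps of at least $1 seconds (default 1). Reads log text on stdin.
# Lines without a leading "[ seconds.micros]" timestamp (wrapped text) are skipped.
dmesg_gaps() {
    awk -v min="${1:-1}" '
        /^\[ *[0-9]+\.[0-9]+\]/ {
            ts = $0
            sub(/^\[ */, "", ts)      # drop the opening bracket and padding
            sub(/\].*$/, "", ts)      # drop everything after the timestamp
            if (seen && ts - prev >= min)
                printf "%.6f s before: %s\n", ts - prev, $0
            prev = ts
            seen = 1
        }'
}
```

On a live system this could be used as `dmesg | dmesg_gaps 5` to list only the steps that took five seconds or longer.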
Regards,
Oza.
[-- Attachment #3: setup.sh --]
[-- Type: application/octet-stream, Size: 6259 bytes --]
#!/usr/bin/env bash
set -e
rootdir=$(readlink -f $(dirname $0))/..
function linux_iter_pci {
    # Argument is the class code
    # TODO: More specifically match against only class codes in the grep
    # step.
    #lspci -D -mm -n | grep $1 | tr -d '"' | awk -F " " '{print $1}'
    lspci -mm -n | grep $1 | tr -d '"' | awk -F " " '{print $1}'
}
function linux_bind_driver() {
    bdf="$1"
    driver_name="$2"
    old_driver_name="no driver"
    ven_dev_id=$(lspci -n -s $bdf | cut -d' ' -f3 | sed 's/:/ /')
    if [ -e "/sys/bus/pci/devices/$bdf/driver" ]; then
        old_driver_name=$(basename $(readlink /sys/bus/pci/devices/$bdf/driver))
        if [ "$driver_name" = "$old_driver_name" ]; then
            return 0
        fi
        echo "$ven_dev_id" > "/sys/bus/pci/devices/$bdf/driver/remove_id" 2> /dev/null || true
        echo "$bdf" > "/sys/bus/pci/devices/$bdf/driver/unbind"
    fi
    echo "$bdf ($ven_dev_id): $old_driver_name -> $driver_name"
    echo "$ven_dev_id" > "/sys/bus/pci/drivers/$driver_name/new_id" 2> /dev/null || true
    echo "$bdf" > "/sys/bus/pci/drivers/$driver_name/bind" 2> /dev/null || true
    iommu_group=$(basename $(readlink -f /sys/bus/pci/devices/$bdf/iommu_group))
    if [ -e "/dev/vfio/$iommu_group" ]; then
        if [ "$username" != "" ]; then
            chown "$username" "/dev/vfio/$iommu_group"
        fi
    fi
}
function linux_hugetlbfs_mount() {
    mount | grep '^hugetlbfs ' | awk '{ print $3 }'
}
function configure_linux {
    driver_name=vfio-pci
    if [ -z "$(ls /sys/kernel/iommu_groups)" ]; then
        # No IOMMU. Use uio.
        driver_name=uio_pci_generic
    fi
    # NVMe
    if [ ! -d /sys/module/vfio_pci ]; then
        modprobe $driver_name || true
    fi
    for bdf in $(linux_iter_pci 0108); do
        linux_bind_driver "$bdf" "$driver_name"
    done
    # IOAT
    TMP=`mktemp`
    #collect all the device_id info of ioat devices.
    grep "PCI_DEVICE_ID_INTEL_IOAT" $rootdir/include/spdk/pci_ids.h \
    | awk -F"x" '{print $2}' > $TMP
    for dev_id in `cat $TMP`; do
        # Abuse linux_iter_pci by giving it a device ID instead of a class code
        for bdf in $(linux_iter_pci $dev_id); do
            linux_bind_driver "$bdf" "$driver_name"
        done
    done
    rm $TMP
    echo "1" > "/sys/bus/pci/rescan"
    hugetlbfs_mount=$(linux_hugetlbfs_mount)
    if [ -z "$hugetlbfs_mount" ]; then
        hugetlbfs_mount=/mnt/huge
        echo "Mounting hugetlbfs at $hugetlbfs_mount"
        mkdir -p "$hugetlbfs_mount"
        mount -t hugetlbfs nodev "$hugetlbfs_mount"
    fi
    echo "$NRHUGE" > /proc/sys/vm/nr_hugepages
    # increase the current memlock limit
    ulimit -l unlimited
    if [ "$driver_name" = "vfio-pci" ]; then
        if [ "$username" != "" ]; then
            chown "$username" "$hugetlbfs_mount"
        fi
        MEMLOCK_AMNT=`ulimit -l`
        if [ "$MEMLOCK_AMNT" != "unlimited" ] ; then
            MEMLOCK_MB=$(( $MEMLOCK_AMNT / 1024 ))
            echo ""
            echo "Current user memlock limit: ${MEMLOCK_MB} MB"
            echo ""
            echo "This is the maximum amount of memory you will be"
            echo "able to use with DPDK and VFIO if run as current user."
            echo -n "To change this, please adjust limits.conf memlock "
            echo "limit for current user."
            if [ $MEMLOCK_AMNT -lt 65536 ] ; then
                echo ""
                echo "## WARNING: memlock limit is less than 64MB"
                echo -n "## DPDK with VFIO may not be able to initialize "
                echo "if run as current user."
            fi
        fi
    fi
}
function reset_linux {
    # NVMe
    modprobe nvme || true
    for bdf in $(linux_iter_pci 0108); do
        linux_bind_driver "$bdf" nvme
    done
    # IOAT
    TMP=`mktemp`
    #collect all the device_id info of ioat devices.
    grep "PCI_DEVICE_ID_INTEL_IOAT" $rootdir/include/spdk/pci_ids.h \
    | awk -F"x" '{print $2}' > $TMP
    modprobe ioatdma || true
    for dev_id in `cat $TMP`; do
        # Abuse linux_iter_pci by giving it a device ID instead of a class code
        for bdf in $(linux_iter_pci $dev_id); do
            linux_bind_driver "$bdf" ioatdma
        done
    done
    rm $TMP
    echo "1" > "/sys/bus/pci/rescan"
    hugetlbfs_mount=$(linux_hugetlbfs_mount)
    rm -f "$hugetlbfs_mount"/spdk*map_*
}
function status_linux {
    echo "NVMe devices"
    echo -e "BDF\t\tNuma Node\tDriver name\t\tDevice name"
    for bdf in $(linux_iter_pci 0108); do
        driver=`grep DRIVER /sys/bus/pci/devices/$bdf/uevent |awk -F"=" '{print $2}'`
        node=`cat /sys/bus/pci/devices/$bdf/numa_node`;
        if [ "$driver" = "nvme" ]; then
            name="\t"`ls /sys/bus/pci/devices/$bdf/nvme`;
        else
            name="-";
        fi
        echo -e "$bdf\t$node\t\t$driver\t\t$name";
    done
    echo "I/OAT DMA"
    #collect all the device_id info of ioat devices.
    TMP=`grep "PCI_DEVICE_ID_INTEL_IOAT" $rootdir/include/spdk/pci_ids.h \
    | awk -F"x" '{print $2}'`
    echo -e "BDF\t\tNuma Node\tDriver Name"
    for dev_id in $TMP; do
        # Abuse linux_iter_pci by giving it a device ID instead of a class code
        for bdf in $(linux_iter_pci $dev_id); do
            driver=`grep DRIVER /sys/bus/pci/devices/$bdf/uevent |awk -F"=" '{print $2}'`
            node=`cat /sys/bus/pci/devices/$bdf/numa_node`;
            echo -e "$bdf\t$node\t\t$driver"
        done
    done
}
function configure_freebsd {
    TMP=`mktemp`
    # NVMe
    GREP_STR="class=0x010802"
    # IOAT
    grep "PCI_DEVICE_ID_INTEL_IOAT" $rootdir/include/spdk/pci_ids.h \
    | awk -F"x" '{print $2}' > $TMP
    for dev_id in `cat $TMP`; do
        GREP_STR="${GREP_STR}\|chip=0x${dev_id}8086"
    done
    AWK_PROG="{if (count > 0) printf \",\"; printf \"%s:%s:%s\",\$2,\$3,\$4; count++}"
    echo $AWK_PROG > $TMP
    BDFS=`pciconf -l | grep "${GREP_STR}" | awk -F: -f $TMP`
    kldunload nic_uio.ko || true
    kenv hw.nic_uio.bdfs=$BDFS
    kldload nic_uio.ko
    rm $TMP
    kldunload contigmem.ko || true
    kenv hw.contigmem.num_buffers=$((NRHUGE * 2 / 256))
    kenv hw.contigmem.buffer_size=$((256 * 1024 * 1024))
    kldload contigmem.ko
}
function reset_freebsd {
    kldunload contigmem.ko || true
    kldunload nic_uio.ko || true
}
: ${NRHUGE:=1024}
username=$1
mode=$2
if [ "$username" = "reset" -o "$username" = "config" -o "$username" = "status" ]; then
    mode="$username"
    username=""
fi
if [ "$mode" == "" ]; then
    mode="config"
fi
if [ "$username" = "" ]; then
    username="$SUDO_USER"
    if [ "$username" = "" ]; then
        username=`logname 2>/dev/null` || true
    fi
fi
if [ `uname` = Linux ]; then
    if [ "$mode" == "config" ]; then
        configure_linux
    elif [ "$mode" == "reset" ]; then
        reset_linux
    elif [ "$mode" == "status" ]; then
        status_linux
    fi
else
    if [ "$mode" == "config" ]; then
        configure_freebsd
    elif [ "$mode" == "reset" ]; then
        reset_freebsd
    fi
fi