From: Oza Oza <oza.oza at broadcom.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] PCI hotplug and SPDK
Date: Wed, 30 Aug 2017 20:40:34 +0530
Message-ID: <e259775a033fe1eefe6b39e74f87c7c8@mail.gmail.com>
In-Reply-To: 47464713-CCA7-4AD5-B6A6-5D556A590FC9@intel.com

[-- Attachment #1: Type: text/plain, Size: 9816 bytes --]

root@bcm958742k:~# lspci

0001:00:00.0 PCI bridge: Broadcom Corporation Device 0000

0001:01:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data
Center SSD (rev 01)

0008:00:00.0 PCI bridge: Broadcom Corporation Device 16f0 (rev 01)

0008:01:00.0 Ethernet controller: Broadcom Corporation Device 16f0 (rev 01)

0008:01:00.1 Ethernet controller: Broadcom Corporation Device 16f0 (rev 01)

0008:01:00.2 Ethernet controller: Broadcom Corporation Device 16f0 (rev 01)

0008:01:00.3 Ethernet controller: Broadcom Corporation Device 16f0 (rev 01)



root@bcm958742k:~# /usr/share/spdk/scripts/setup.sh

0001:01:00.0 (8086 0953): nvme -> vfio-pci

grep: /usr/share/spdk/scripts/../include/spdk/pci_ids.h: No such file or directory

[ 1520.258498] pci 0008:00:00.0: PCI bridge to [bus 01]

[ 1520.267436] pci 0008:00:00.0:   bridge window [mem 0x10000000-0x104fffff
64bit pref]

[ 1520.277225] pci 0000:00:00.0: ignoring class 0x020000 (doesn't match
header type 01)

>> It is looking to unbind on the empty slot as well.

[ 1520.285324] pci 0000:00:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring

[ 1535.911738] Bad mode in Error handler detected on CPU7, code 0xbf000002
-- SError

[ 1535.919460] Internal error: Oops - bad mode: 0 [#1] SMP

[ 1535.924850] Modules linked in:

[ 1535.928001] CPU: 7 PID: 2108 Comm: lighttpd Not tainted
4.12.0-01624-gbbd4086-dirty #97

[ 1535.936257] Hardware name: Stingray Combo SVK w/PCIe IOMMU (BCM958742K)
(DT)

[ 1535.943527] task: ffff80a1642ce000 task.stack: ffff80a162790000

[ 1535.949634] PC is at 0xffff8baadca0

[ 1535.953230] LR is at 0x40aca8

[ 1535.956290] pc : [<0000ffff8baadca0>] lr : [<000000000040aca8>] pstate:
80000000

[ 1535.963919] sp : 0000ffffca61a970

[ 1535.967337] x29: 0000ffffca61a970 x28: 0000000000000000

[ 1535.972816] x27: 00000000283a9a00 x26: 0000000000000000

[ 1535.978296] x25: 000000000042a3a8 x24: 000000000042a3a0

[ 1535.983775] x23: 0000000000429000 x22: 000000000042a2b8

[ 1535.989254] x21: 000000000042a000 x20: 0000000058edd444

[ 1535.994734] x19: 00000000283a9010 x18: 0000000000000014

[ 1536.000213] x17: 0000ffff8baada88 x16: 0000000000442c08

[ 1536.005693] x15: 00002162cc000000 x14: 000bcd3d80000000

[ 1536.011173] x13: 00000001f4000000 x12: 0000000000000017

[ 1536.016653] x11: 0000000000061bf7 x10: 0000000058edd444

[ 1536.022131] x9 : 001dcd6500000000 x8 : 0000000000000016

[ 1536.027611] x7 : 000000000000dfda x6 : 0000ffff8bdc5000

[ 1536.033090] x5 : 0000000000000008 x4 : 0000000000000000

[ 1536.038570] x3 : 00000000000003e8 x2 : 0000000000000401

[ 1536.044049] x1 : 00000000283bd050 x0 : 0000000000000000

[ 1536.049529] Process lighttpd (pid: 2108, stack limit =
0xffff80a162790000)



Please find attached status script.
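
The attached setup.sh finishes its bind step in configure_linux with a write to /sys/bus/pci/rescan. One way to check whether that rescan is the step that touches the empty slots is to make it optional, as in the minimal sketch below. The function name maybe_rescan and the SPDK_SKIP_RESCAN variable are hypothetical and are not part of SPDK; the rescan path is a parameter only so that the function can be tried against a scratch file.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of SPDK's setup.sh.
# Writes "1" to the PCI rescan file unless SPDK_SKIP_RESCAN=1 is set.
maybe_rescan() {
	local rescan_file="${1:-/sys/bus/pci/rescan}"

	if [ "${SPDK_SKIP_RESCAN:-0}" = "1" ]; then
		echo "Skipping PCI rescan (SPDK_SKIP_RESCAN=1)"
		return 0
	fi

	# Same write that configure_linux performs unconditionally.
	echo "1" > "$rescan_file"
}
```

Replacing the rescan line in configure_linux with a call to maybe_rescan and running once with SPDK_SKIP_RESCAN=1 would show whether the "bridge configuration invalid" messages still appear without the rescan.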



Regards,

Oza.

*From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Harris,
James R
*Sent:* Tuesday, August 29, 2017 11:30 PM
*To:* Storage Performance Development Kit
*Subject:* Re: [SPDK] PCI hotplug and SPDK



So to clarify, you have your system booted with no NVMe endpoint connected,
and then when you run the SPDK setup.sh script, you see all of these kernel
messages from trying to bind vfio to PCIe devices and the system eventually
crashes?



If so, we need to determine what PCIe devices setup.sh is trying to bind to
vfio.  It should only be trying to bind NVMe devices but if there is no
NVMe device connected then it shouldn’t be trying to bind anything.
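
A quick way to see which devices setup.sh would select is to filter the lspci output on the class field alone. The attached script greps for the class code anywhere on the line, so a device or subsystem ID that happens to contain the same digits would also match. The sketch below is an assumption about how a stricter filter could look; filter_by_class is a hypothetical name, and it expects `lspci -mm -n` style input on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the address of every device whose class field
# equals the given 4-digit class code. Input is `lspci -mm -n` style text
# on stdin, for example:  0001:01:00.0 "0108" "8086" "0953" -r01 ...
filter_by_class() {
	local class_code="$1"

	# Strip the quotes, then compare only field 2 (the class), so a matching
	# vendor or device ID elsewhere on the line is not picked up.
	tr -d '"' | awk -v cls="$class_code" '$2 == cls { print $1 }'
}
```

Running `lspci -D -mm -n | filter_by_class 0108` before setup.sh would list the candidates; 0108 is the class code the script passes for NVMe. With no NVMe device connected, the list should be empty.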



Can you send lspci -vvvx output from your system before running setup.sh?



Thanks,



-Jim



*From: *SPDK <spdk-bounces(a)lists.01.org> on behalf of Oza Oza <
oza.oza(a)broadcom.com>
*Reply-To: *Storage Performance Development Kit <spdk(a)lists.01.org>
*Date: *Tuesday, August 29, 2017 at 9:45 AM
*To: *Storage Performance Development Kit <spdk(a)lists.01.org>
*Subject: *Re: [SPDK] PCI hotplug and SPDK



In my opinion, this has nothing to do with the platform, though our platform
is ARMv8.

(However, I cannot test on any other platform, because we don’t know how the
kernel driver is written there.)



If the kernel driver supports hotplug, it calls pci_create_root_bus
irrespective of whether an EP is plugged in or not.

In other words, the following APIs are never called:

pci_stop_root_bus(bus);

pci_remove_root_bus(bus);



In that case, if the PCIe slot is empty, running SPDK results in stalls
(10-15 seconds) followed by a crash.



Regards,

Oza.

*From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Harris,
James R
*Sent:* Tuesday, August 29, 2017 6:20 PM
*To:* Storage Performance Development Kit
*Subject:* Re: [SPDK] PCI hotplug and SPDK



Hi Oza,



Do you see this issue only on your armv8 platform or do you also see it on
amd64?



-Jim





*From: *SPDK <spdk-bounces(a)lists.01.org> on behalf of Oza Oza <
oza.oza(a)broadcom.com>
*Reply-To: *Storage Performance Development Kit <spdk(a)lists.01.org>
*Date: *Tuesday, August 29, 2017 at 1:51 AM
*To: *Storage Performance Development Kit <spdk(a)lists.01.org>
*Subject: *Re: [SPDK] PCI hotplug and SPDK



Sorry if I was unclear.

I am not talking about the hotplug feature of SPDK.



- The PCI hotplug feature is implemented in the kernel driver and is working
fine.

- But the moment I run SPDK and try to bind the vfio driver, the system stalls
completely.

The reason is that the kernel driver does not remove the root bus when no PCIe
endpoint is connected at boot time.

So SPDK tries to bind the driver, thinking the host bridge is there.

Without PCI hotplug, the host bridge would not be there because of the
following API calls in the kernel driver:

pci_stop_root_bus(bus);

pci_remove_root_bus(bus);

- Now we allow host bridge creation (API: pci_create_root_bus)
irrespective of whether an EP is connected or not.

And then, if I run SPDK with no endpoint connected (empty slot), I get stalls.



Regards,

Oza.

*From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Chang, Cunyin
*Sent:* Tuesday, August 29, 2017 2:14 PM
*To:* Storage Performance Development Kit
*Subject:* Re: [SPDK] PCI hotplug and SPDK



Hi Oza,



Could you please provide detailed steps to reproduce the issue?

SPDK takes charge of hotplug only after you bind the device to the uio or
vfio driver,

so a newly inserted device will be handled by the kernel driver first.
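
Which driver currently owns a device can be read from sysfs, in the same way linux_bind_driver in the attached setup.sh does it. The sketch below wraps that lookup; current_driver is a hypothetical name, and the sysfs directory is a parameter only so the function can be tried against a scratch tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of the driver bound to a PCI device,
# or "no driver" if nothing is bound.
#   $1: device address, e.g. 0001:01:00.0
#   $2: sysfs device directory (defaults to /sys/bus/pci/devices)
current_driver() {
	local bdf="$1"
	local sysfs_pci="${2:-/sys/bus/pci/devices}"

	# The "driver" entry is a symlink to the driver's sysfs directory.
	if [ -e "$sysfs_pci/$bdf/driver" ]; then
		basename "$(readlink "$sysfs_pci/$bdf/driver")"
	else
		echo "no driver"
	fi
}
```

For the NVMe device in this thread, `current_driver 0001:01:00.0` would be expected to print nvme before setup.sh runs and vfio-pci (or uio_pci_generic) afterwards.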



-Cunyin



*From:* SPDK [mailto:spdk-bounces(a)lists.01.org <spdk-bounces(a)lists.01.org>] *On
Behalf Of *Oza Oza
*Sent:* Tuesday, August 29, 2017 4:22 PM
*To:* Storage Performance Development Kit <spdk(a)lists.01.org>
*Subject:* [SPDK] PCI hotplug and SPDK



Hi All,



PCI hotplug support requires the root bus to be created and the probe to go
ahead with all PCIe configuration,

which means the following APIs are not called:

   pci_stop_root_bus(bus);

   pci_remove_root_bus(bus);



If I then run SPDK, it crashes the system with the following info.

Note: if the disk is connected, SPDK is fine.

Otherwise it stalls the system and ends with the following crash.



root@bcm958742k:~# echo 2048 > /proc/sys/vm/nr_hugepages;
/usr/share/spdk/scripts/setup.sh

grep: /usr/share/spdk/scripts/../include/spdk/pci_ids.h: No such file or directory

[   34.621325] pci 0008:00:00.0: PCI bridge to [bus 01]

[   34.640586] pci 0000:00:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring

[   50.267056] pci 0000:00:00.0: PCI bridge to [bus 01]

[   50.272337] pci 0001:00:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring

[   65.898762] pci 0001:00:00.0: PCI bridge to [bus 01]

[   65.904015] pci 0006:00:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring

[   81.530437] pci 0006:00:00.0: PCI bridge to [bus 01]

[   81.535680] pci 0007:00:00.0: bridge configuration invalid ([bus
00-00]), reconfiguring

[   97.162103] pci 0007:00:00.0: PCI bridge to [bus 01]

[   97.167255] Bad mode in Error handler detected on CPU6, code 0xbf000002
-- SError

[   97.174974] Internal error: Oops - bad mode: 0 [#1] SMP

[   97.180364] Modules linked in:

[   97.183515] CPU: 6 PID: 2104 Comm: bash Not tainted
4.12.0-01560-gc83093d-dirty #89

[   97.191413] Hardware name: Stingray Combo SVK w/PCIe IOMMU (BCM958742K)
(DT)

[   97.198683] task: ffff80a163a40000 task.stack: ffff80a1612b4000

[   97.204790] PC is at 0xffff7cbdfba8

[   97.208387] LR is at 0xffff7cb8f288

[   97.211983] pc : [<0000ffff7cbdfba8>] lr : [<0000ffff7cb8f288>] pstate:
20000000

[   97.219612] sp : 0000fffffe564040

[   97.223029] x29: 0000fffffe564040 x28: 000000001054ce60

[   97.228509] x27: 0000000000000000 x26: 00000000004e2000

[   97.233989] x25: 00000000004e5000 x24: 0000000000000002

[   97.239468] x23: 0000ffff7cc63638 x22: 0000000000000002

[   97.244947] x21: 0000ffff7cc67480 x20: 000000001054db10

[   97.250427] x19: 0000000000000002 x18: 0000000000000000

[   97.255906] x17: 00000000004daac8 x16: 0000000000000000

[   97.261386] x15: 0000000000000096 x14: 0000000000000000

[   97.266865] x13: 0000000000000000 x12: 0000000000000000

[   97.272344] x11: 0000000000000020 x10: 0101010101010101

[   97.277824] x9 : ffffff80ffffffc8 x8 : 0000000000000040

[   97.283303] x7 : 0000000000000001 x6 : 0000ffff7cc669f0

[   97.288782] x5 : 0000000000015551 x4 : 0000000000000888

[   97.294261] x3 : 0000000000000000 x2 : 0000000000000002

[   97.299741] x1 : 000000001054db10 x0 : 0000000000000002

[   97.305220] Process bash (pid: 2104, stack limit = 0xffff80a1612b4000)

[   97.311960] ---[ end trace a1f48abe30820241 ]---
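
The bridges that the kernel reconfigured before the crash can be pulled out of a log like the one above with a small filter. This is only a sketch for reading the log; invalid_bridges is a hypothetical name, and it assumes each kernel message starts on its own line, as in raw dmesg output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kernel log text on stdin and print each PCI
# address that the kernel reported as "bridge configuration invalid".
invalid_bridges() {
	sed -n 's/^.*pci \([0-9a-fA-F:.]*\): bridge configuration invalid.*$/\1/p' | sort -u
}
```

Used as `dmesg | invalid_bridges`. In the log above, the addresses reported this way are 0000:00:00.0, 0001:00:00.0, 0006:00:00.0 and 0007:00:00.0, each followed by a gap of roughly 15 seconds before the next message.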



Regards,

Oza.

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 24252 bytes --]

[-- Attachment #3: setup.sh --]
[-- Type: application/octet-stream, Size: 6259 bytes --]

#!/usr/bin/env bash

set -e

rootdir=$(readlink -f $(dirname $0))/..

function linux_iter_pci {
	# Argument is the class code
	# TODO: More specifically match against only class codes in the grep
	# step.
	#lspci -D -mm -n | grep $1 | tr -d '"' | awk -F " " '{print $1}'
	lspci -mm -n | grep $1 | tr -d '"' | awk -F " " '{print $1}'
}

function linux_bind_driver() {
	bdf="$1"
	driver_name="$2"
	old_driver_name="no driver"
	ven_dev_id=$(lspci -n -s $bdf | cut -d' ' -f3 | sed 's/:/ /')

	if [ -e "/sys/bus/pci/devices/$bdf/driver" ]; then
		old_driver_name=$(basename $(readlink /sys/bus/pci/devices/$bdf/driver))

		if [ "$driver_name" = "$old_driver_name" ]; then
			return 0
		fi

		echo "$ven_dev_id" > "/sys/bus/pci/devices/$bdf/driver/remove_id" 2> /dev/null || true
		echo "$bdf" > "/sys/bus/pci/devices/$bdf/driver/unbind"
	fi

	echo "$bdf ($ven_dev_id): $old_driver_name -> $driver_name"

	echo "$ven_dev_id" > "/sys/bus/pci/drivers/$driver_name/new_id" 2> /dev/null || true
	echo "$bdf" > "/sys/bus/pci/drivers/$driver_name/bind" 2> /dev/null || true

	iommu_group=$(basename $(readlink -f /sys/bus/pci/devices/$bdf/iommu_group))
	if [ -e "/dev/vfio/$iommu_group" ]; then
		if [ "$username" != "" ]; then
			chown "$username" "/dev/vfio/$iommu_group"
		fi
	fi
}

function linux_hugetlbfs_mount() {
	mount | grep '^hugetlbfs ' | awk '{ print $3 }'
}

function configure_linux {
	driver_name=vfio-pci
	if [ -z "$(ls /sys/kernel/iommu_groups)" ]; then
		# No IOMMU. Use uio.
		driver_name=uio_pci_generic
	fi

	# NVMe
	if [ ! -d /sys/module/vfio_pci ]; then
		modprobe $driver_name || true
	fi

	for bdf in $(linux_iter_pci 0108); do
		linux_bind_driver "$bdf" "$driver_name"
	done


	# IOAT
	TMP=`mktemp`
	#collect all the device_id info of ioat devices.
	grep "PCI_DEVICE_ID_INTEL_IOAT" $rootdir/include/spdk/pci_ids.h \
	| awk -F"x" '{print $2}' > $TMP

	for dev_id in `cat $TMP`; do
		# Abuse linux_iter_pci by giving it a device ID instead of a class code
		for bdf in $(linux_iter_pci $dev_id); do
			linux_bind_driver "$bdf" "$driver_name"
		done
	done
	rm $TMP

	echo "1" > "/sys/bus/pci/rescan"

	hugetlbfs_mount=$(linux_hugetlbfs_mount)

	if [ -z "$hugetlbfs_mount" ]; then
		hugetlbfs_mount=/mnt/huge
		echo "Mounting hugetlbfs at $hugetlbfs_mount"
		mkdir -p "$hugetlbfs_mount"
		mount -t hugetlbfs nodev "$hugetlbfs_mount"
	fi
	echo "$NRHUGE" > /proc/sys/vm/nr_hugepages

	# increase the current memlock limit	
	ulimit -l unlimited

	if [ "$driver_name" = "vfio-pci" ]; then
		if [ "$username" != "" ]; then
			chown "$username" "$hugetlbfs_mount"
		fi

		MEMLOCK_AMNT=`ulimit -l`
		if [ "$MEMLOCK_AMNT" != "unlimited" ] ; then
			MEMLOCK_MB=$(( $MEMLOCK_AMNT / 1024 ))
			echo ""
			echo "Current user memlock limit: ${MEMLOCK_MB} MB"
			echo ""
			echo "This is the maximum amount of memory you will be"
			echo "able to use with DPDK and VFIO if run as current user."
			echo -n "To change this, please adjust limits.conf memlock "
			echo "limit for current user."

			if [ $MEMLOCK_AMNT -lt 65536 ] ; then
				echo ""
				echo "## WARNING: memlock limit is less than 64MB"
				echo -n "## DPDK with VFIO may not be able to initialize "
				echo "if run as current user."
			fi
		fi
	fi
}

function reset_linux {
	# NVMe
	modprobe nvme || true
	for bdf in $(linux_iter_pci 0108); do
		linux_bind_driver "$bdf" nvme
	done


	# IOAT
	TMP=`mktemp`
	#collect all the device_id info of ioat devices.
	grep "PCI_DEVICE_ID_INTEL_IOAT" $rootdir/include/spdk/pci_ids.h \
	| awk -F"x" '{print $2}' > $TMP

	modprobe ioatdma || true
	for dev_id in `cat $TMP`; do
		# Abuse linux_iter_pci by giving it a device ID instead of a class code
		for bdf in $(linux_iter_pci $dev_id); do
			linux_bind_driver "$bdf" ioatdma
		done
	done
	rm $TMP

	echo "1" > "/sys/bus/pci/rescan"

	hugetlbfs_mount=$(linux_hugetlbfs_mount)
	rm -f "$hugetlbfs_mount"/spdk*map_*
}

function status_linux {
	echo "NVMe devices"

	echo -e "BDF\t\tNuma Node\tDriver name\t\tDevice name"
	for bdf in $(linux_iter_pci 0108); do
		driver=`grep DRIVER /sys/bus/pci/devices/$bdf/uevent |awk -F"=" '{print $2}'`
		node=`cat /sys/bus/pci/devices/$bdf/numa_node`;
		if [ "$driver" = "nvme" ]; then
			name="\t"`ls /sys/bus/pci/devices/$bdf/nvme`;
		else
			name="-";
		fi
		echo -e "$bdf\t$node\t\t$driver\t\t$name";
	done

	echo "I/OAT DMA"

	#collect all the device_id info of ioat devices.
	TMP=`grep "PCI_DEVICE_ID_INTEL_IOAT" $rootdir/include/spdk/pci_ids.h \
	| awk -F"x" '{print $2}'`
	echo -e "BDF\t\tNuma Node\tDriver Name"
	for dev_id in $TMP; do
		# Abuse linux_iter_pci by giving it a device ID instead of a class code
		for bdf in $(linux_iter_pci $dev_id); do
			driver=`grep DRIVER /sys/bus/pci/devices/$bdf/uevent |awk -F"=" '{print $2}'`
			node=`cat /sys/bus/pci/devices/$bdf/numa_node`;
			echo -e "$bdf\t$node\t\t$driver"
		done
	done
}

function configure_freebsd {
	TMP=`mktemp`

	# NVMe
	GREP_STR="class=0x010802"

	# IOAT
	grep "PCI_DEVICE_ID_INTEL_IOAT" $rootdir/include/spdk/pci_ids.h \
	| awk -F"x" '{print $2}' > $TMP
	for dev_id in `cat $TMP`; do
		GREP_STR="${GREP_STR}\|chip=0x${dev_id}8086"
	done

	AWK_PROG="{if (count > 0) printf \",\"; printf \"%s:%s:%s\",\$2,\$3,\$4; count++}"
	echo $AWK_PROG > $TMP

	BDFS=`pciconf -l | grep "${GREP_STR}" | awk -F: -f $TMP`

	kldunload nic_uio.ko || true
	kenv hw.nic_uio.bdfs=$BDFS
	kldload nic_uio.ko
	rm $TMP

	kldunload contigmem.ko || true
	kenv hw.contigmem.num_buffers=$((NRHUGE * 2 / 256))
	kenv hw.contigmem.buffer_size=$((256 * 1024 * 1024))
	kldload contigmem.ko
}

function reset_freebsd {
	kldunload contigmem.ko || true
	kldunload nic_uio.ko || true
}

: ${NRHUGE:=1024}

username=$1
mode=$2

if [ "$username" = "reset" -o "$username" = "config" -o "$username" = "status" ]; then
	mode="$username"
	username=""
fi

if [ "$mode" == "" ]; then
	mode="config"
fi

if [ "$username" = "" ]; then
	username="$SUDO_USER"
	if [ "$username" = "" ]; then
		username=`logname 2>/dev/null` || true
	fi
fi

if [ `uname` = Linux ]; then
	if [ "$mode" == "config" ]; then
		configure_linux
	elif [ "$mode" == "reset" ]; then
		reset_linux
	elif [ "$mode" == "status" ]; then
		status_linux
	fi
else
	if [ "$mode" == "config" ]; then
		configure_freebsd
	elif [ "$mode" == "reset" ]; then
		reset_freebsd
	fi
fi
