* [RFC v2 0/6] driver-core: add asynch probe support
@ 2014-09-05  6:37 Luis R. Rodriguez
  2014-09-05  6:37 ` [RFC v2 1/6] driver-core: generalize freeing driver private member Luis R. Rodriguez
                   ` (6 more replies)
  0 siblings, 7 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-05  6:37 UTC (permalink / raw)
  To: gregkh, dmitry.torokhov, falcon, tiwai, tj, arjan
  Cc: linux-kernel, oleg, hare, akpm, penguin-kernel, joseph.salisbury,
	bpoirier, santosh, Luis R. Rodriguez

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Here's a complete reimplementation of asynch loading support. It
completely discards the hippie / pipe dream idea that we need asynch
loading of modules / subsystems in general and just addresses running
probe asynchronously. This respin is based on Tejun's recommendation
on how to treat the probe asynchronously: we avoid async_schedule()
completely and just peg a struct work_struct on the driver private
structure. This obviously also means we have to flush_work() before
the driver's own remove() is called; we do that too.
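
For readers less familiar with the pattern described above, here is a
minimal userspace sketch of it. This is not the kernel code: every
sketch_* name is made up for illustration, the work item runs on the
caller rather than on a workqueue thread, and sketch_flush_work() simply
runs a still-pending item where the real flush_work() would wait for it
to finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins only; none of these are the kernel's definitions. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_work {
	void (*fn)(struct sketch_work *work);
	bool pending;			/* queued but not yet run */
};

struct sketch_driver;

struct sketch_attach_work {
	struct sketch_work work;	/* embedded so container_of() finds us */
	struct sketch_driver *driver;
};

struct sketch_driver {
	const char *name;
	bool async_probe;		/* opt-in flag, as in this series */
	bool probed;			/* set once the "probe" has run */
	struct sketch_attach_work attach_work;
};

/* Work function: recover the driver from the embedded work item. */
static void sketch_attach_workfn(struct sketch_work *work)
{
	struct sketch_attach_work *aw =
		sketch_container_of(work, struct sketch_attach_work, work);

	aw->driver->probed = true;
}

/* Registration: probe inline, or only queue it when async_probe is set. */
void sketch_add_driver(struct sketch_driver *drv)
{
	drv->probed = false;
	drv->attach_work.driver = drv;
	drv->attach_work.work.fn = sketch_attach_workfn;
	drv->attach_work.work.pending = false;

	if (drv->async_probe)
		drv->attach_work.work.pending = true;	/* queue_work() stand-in */
	else
		sketch_attach_workfn(&drv->attach_work.work);
}

/* flush_work() stand-in: run the item now if it has not run yet. */
void sketch_flush_work(struct sketch_work *work)
{
	if (work->pending) {
		work->pending = false;
		work->fn(work);
	}
}

/* Removal must never overlap with a probe that is still queued. */
void sketch_remove_driver(struct sketch_driver *drv)
{
	if (drv->async_probe)
		sketch_flush_work(&drv->attach_work.work);
	assert(drv->probed);		/* probe has finished by this point */
	drv->probed = false;
}
```

With async_probe set, sketch_add_driver() returns before the probe has
run. That is the window which scripts expecting the device right after
loading would fall into.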

Tejun's concern that this regresses scripts which expect the device
to be available right after loading a driver remains valid, and the
only thing we can do to help there is to annotate the expectations on
the use of this "feature" for driver users. Scripts should not be
relying on the driver init anyway, so that type of usage should be
phased out and they should be hunting in udev for things popping up.

I'm a bit concerned though about this actually regressing load time
on drivers that use this, compared to just having the module probe
run off of finit_module(). Even with a kthread alternative, at least
Santosh (Cc'd) has noted a regression in the time it takes to
complete probe on cxgb4. I'll eventually get your exact numbers, but
for now it's an obvious regression *with* kthreads, and this solution
goes with:

queue_work(system_unbound_wq, async_probe_work)

This is surely going to make things even worse... We could
use system_highpri_wq, or change the scheduling priority, but
for that I'd prefer to get feedback and someone to decide what
the right choice (TM) should be.

It is very important to highlight that async probe was added here
in light of issues found in *two* domains that have now crept up in
parallel:

0) some built-in drivers delaying init
1) systemd 30 second timeout

I have been exchanging some e-mails with Tetsuo about his
originally proposed workaround, which started this work when the
systemd 30 second timeout issue first crept up. This series
includes a slightly modified version of that workaround, which
should address the SIGKILL even without 786235ee merged.
There may be others -- and those need to be hunted down.
It also now safely allows us to find drivers that run
over the limit without killing systems / modules. I think that's
probably the best thing to do for now -- as we sweep through and
find these, we could eventually nuke the WARN_ONCE() and completely
listen to the kill. For now the kill is just causing more problems
than it solves, but it's a good reflection of the balance of desires
and design between userspace / kernelspace.

Luis R. Rodriguez (6):
  driver-core: generalize freeing driver private member
  driver-core: add driver async_probe support
  kthread: warn on kill signal if not OOM
  cxgb4: use async probe
  mptsas: use async probe
  pata_marvell: use async probe

 drivers/ata/pata_marvell.c                      |  1 +
 drivers/base/base.h                             |  6 +++
 drivers/base/bus.c                              | 72 ++++++++++++++++++++++---
 drivers/base/dd.c                               |  4 ++
 drivers/message/fusion/mptsas.c                 |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  1 +
 include/linux/device.h                          |  5 ++
 kernel/kmod.c                                   | 21 +++++++-
 kernel/kthread.c                                | 19 +++++++
 9 files changed, 122 insertions(+), 8 deletions(-)

-- 
2.0.3



* [RFC v2 1/6] driver-core: generalize freeing driver private member
  2014-09-05  6:37 [RFC v2 0/6] driver-core: add asynch probe support Luis R. Rodriguez
@ 2014-09-05  6:37 ` Luis R. Rodriguez
  2014-09-05  6:37 ` [RFC v2 2/6] driver-core: add driver async_probe support Luis R. Rodriguez
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-05  6:37 UTC (permalink / raw)
  To: gregkh, dmitry.torokhov, falcon, tiwai, tj, arjan
  Cc: linux-kernel, oleg, hare, akpm, penguin-kernel, joseph.salisbury,
	bpoirier, santosh, Luis R. Rodriguez

From: "Luis R. Rodriguez" <mcgrof@suse.com>

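Move the freeing of the driver private member into a helper,
remove_driver_private(). The async probe patch later in this series
reuses it.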

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/base/bus.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 83e910a..a5f41e4 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -657,6 +657,15 @@ static ssize_t uevent_store(struct device_driver *drv, const char *buf,
 }
 static DRIVER_ATTR_WO(uevent);
 
+static void remove_driver_private(struct device_driver *drv)
+{
+	struct driver_private *priv = drv->p;
+
+	kobject_put(&priv->kobj);
+	kfree(priv);
+	drv->p = NULL;
+}
+
 /**
  * bus_add_driver - Add a driver to the bus.
  * @drv: driver.
@@ -719,9 +728,7 @@ int bus_add_driver(struct device_driver *drv)
 	return 0;
 
 out_unregister:
-	kobject_put(&priv->kobj);
-	kfree(drv->p);
-	drv->p = NULL;
+	remove_driver_private(drv);
 out_put_bus:
 	bus_put(bus);
 	return error;
-- 
2.0.3



* [RFC v2 2/6] driver-core: add driver async_probe support
  2014-09-05  6:37 [RFC v2 0/6] driver-core: add asynch probe support Luis R. Rodriguez
  2014-09-05  6:37 ` [RFC v2 1/6] driver-core: generalize freeing driver private member Luis R. Rodriguez
@ 2014-09-05  6:37 ` Luis R. Rodriguez
  2014-09-05 11:24     ` Oleg Nesterov
  2014-09-05 22:10   ` Dmitry Torokhov
  2014-09-05  6:37   ` Luis R. Rodriguez
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-05  6:37 UTC (permalink / raw)
  To: gregkh, dmitry.torokhov, falcon, tiwai, tj, arjan
  Cc: linux-kernel, oleg, hare, akpm, penguin-kernel, joseph.salisbury,
	bpoirier, santosh, Luis R. Rodriguez, Tetsuo Handa, Kay Sievers,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	linux-scsi, netdev

From: "Luis R. Rodriguez" <mcgrof@suse.com>

We now have two documented use cases for probing asynchronously:

0) since we bundle together driver init() and probe(), systemd's
   new 30 second timeout has put a limit on the amount of time
   a driver probe routine can take; we need to enable drivers
   to complete probe gracefully

1) when a built-in driver takes a few seconds to initialize, its
   delays can stall the overall boot process

The built-in driver issue is pretty straightforward, and for
that we just need to let the probing happen behind the scenes.

The systemd issue is a bit more complex given the history of how
it was identified and how workarounds were proposed and evaluated.
The systemd issue was first identified by Joseph when he
bisected and found that Tetsuo Handa's commit 786235ee
"kthread: make kthread_create() killable" modified kthread_create()
to bail as soon as SIGKILL is received [0] [1]. This was found
to cause issues with some drivers and, at times, with boot. There
are other patches which could also enable the SIGKILL trigger on
driver loading though:

70834d30 "usermodehelper: use UMH_WAIT_PROC consistently"
b3449922 "usermodehelper: introduce umh_complete(sub_info)"
d0bd587a "usermodehelper: implement UMH_KILLABLE"
9d944ef3 "usermodehelper: kill umh_wait, renumber UMH_* constants"
5b9bd473 "usermodehelper: ____call_usermodehelper() doesn't need do_exit()"
3e63a93b "kmod: introduce call_modprobe() helper"
1cc684ab "kmod: make __request_module() killable"

All of these went upstream in 3.4, and were part of the fixes
for CVE-2012-4398 [2], which is documented more thoroughly on Red
Hat's bugzilla [3]. Any of these patches may contribute to having a
module be properly killed now, but 786235ee is the latest in the
series. For instance on SLE12 cxgb4 has been found to get the
SIGKILL even though SLE12 does not yet have 786235ee merged [4].

Joseph found that the systemd-udevd process sends SIGKILL to
systemd's usage of kmod for module loading if probe on a driver
takes over 30 seconds [5] [6]. When this happens probe will fail
on any driver, but *iff* its probe path ends up using kthreads.
That's why booting on some systems will fail if the driver happens
to be a storage related driver.  When helping debug the issue
Tetsuo suggested fixing it by modifying kthread_create() to not
leave immediately upon SIGKILL, and to instead wait 10 more
seconds before completing the kill [7], *unless* the source of
the SIGKILL was the OOM killer.
This is not the only source of a kill: as noted above, this same
issue is present on kernels without commit 786235ee. Additionally
upon review of this patch Oleg rejected this change [8] and the
discussion was punted out to systemd-devel to see if the default
timeout could be increased from 30 seconds to 120 [9]. The opinion
of the systemd maintainers was that the driver's behavior should
be fixed [10].  Linus seems to agree [11], however more recently
even networking drivers have been reported to fail on probe
since just writing the firmware to a device and kicking it can
easily take over 60 seconds [4]. Benjamin was able to trace the
issues reported on cxgb4 down to the same systemd-udevd 30
second timeout [6].  Even if asynch firmware loading was used
on the cxgb4 driver the driver would *still* hit other delays
due to the way the driver is currently designed. Fixing that
will require a bit of time and until then some users are left
with a completely dysfunctional device.

Folks were a bit confused here though -- it's not only module
initialization which is being penalized. Because the driver core
will immediately trigger the driver's own bus probe routine if
autoprobe is enabled, each driver's probe routine must also
complete within the same 30 second timeout.  This means not only
should the driver's init complete within the default systemd
timeout of 30 seconds but so should the probe routine, and
probe obviously has even less time given that the timeout
covers both the module's init() and its own bus' probe().
A few drivers fail to complete the bus' probe within 30 seconds;
it's not the init routine that takes long. The timeout seems
to currently hit *iff* kthreads are used somehow on the driver's
probe path. For example purposely breaking the e1000e driver
by adding a 30 second delay on the probe path does not let
systemd kill it, however doing the same for iwlwifi triggers
the kill. This is because iwlwifi uses request_threaded_irq()
and behind the scenes the kernel uses kthread_create() in
__setup_irq() to create the handler thread *iff* the IRQ is not
nested; nested IRQs are those set up with
irq_set_nested_thread(irq, 1).
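
The two observations above (probe shares the timeout with init, and the
kill only lands when the probe path creates a kthread) can be written
down as a toy model. This is only a restatement of the behavior
described here, not kernel or systemd code; the sketch_* names and the
whole-second granularity are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Seconds left for probe() once init() has used part of the budget. */
int sketch_probe_budget(int timeout_s, int init_s)
{
	assert(timeout_s >= 0 && init_s >= 0);
	return init_s >= timeout_s ? 0 : timeout_s - init_s;
}

/*
 * Model of the observation: the module load is only killed if probe
 * overruns what is left of the budget AND the probe path ends up
 * creating a kthread (e.g. through request_threaded_irq()).
 */
bool sketch_probe_killed(int timeout_s, int init_s, int probe_s,
			 bool probe_creates_kthread)
{
	return probe_creates_kthread &&
	       probe_s > sketch_probe_budget(timeout_s, init_s);
}
```

In this model a 33 second probe with no kthread on its path (the e1000e
case) survives, while the same delay on a path that creates a kthread
(the iwlwifi case) is killed.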

Hannes Reinecke has now implemented a timeout modifier for
systemd, however *systemd* still needs a way to gracefully
annotate drivers with long probes instead of failing these
drivers and, at worst, boot. On the kernel side of things we
can circumvent the timeout by probing asynchronously on only
the drivers that need it. If a driver is changed to use this new
asynch probing, folks should be aware that any userspace
that assumed that completing driver loading would enable
device functionality will need to be changed to wait until the
device appears.
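
As a rough illustration of what "wait until the device appears" can
look like from userspace, here is a small polling helper. It is only a
sketch under stated assumptions: sketch_wait_for_path() and the 10 ms
poll interval are made up for this example, and listening for udev
events would be the better approach in a real program.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

#define SKETCH_POLL_MS 10	/* hypothetical poll interval */

/*
 * Returns 0 as soon as `path` exists, or -1 if it has not appeared
 * after roughly timeout_ms milliseconds. A sleep cut short by a signal
 * is still counted as a full interval, so the timeout is approximate.
 */
int sketch_wait_for_path(const char *path, long timeout_ms)
{
	const struct timespec step = { 0, SKETCH_POLL_MS * 1000000L };
	struct stat st;
	long waited_ms = 0;

	assert(path != NULL);

	for (;;) {
		if (stat(path, &st) == 0)
			return 0;
		if (waited_ms >= timeout_ms)
			return -1;
		nanosleep(&step, NULL);
		waited_ms += SKETCH_POLL_MS;
	}
}
```

A script or tool would call this on the device node or sysfs entry it
needs after loading the module, instead of assuming the device is
there once the load returns.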

[0]  http://thread.gmane.org/gmane.linux.ubuntu.devel.kernel.general/39123
[1]  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705
[2]  http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2012-4398
[3]  https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2012-4398
[4]  https://bugzilla.novell.com/show_bug.cgi?id=877622
[5]  http://article.gmane.org/gmane.linux.kernel/1669550
[6]  https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248
[7]  https://launchpadlibrarian.net/169657493/kthread-defer-leaving.patch
[8]  http://article.gmane.org/gmane.linux.kernel/1669604
[9]  http://lists.freedesktop.org/archives/systemd-devel/2014-March/018006.html
[10] http://article.gmane.org/gmane.comp.sysutils.systemd.devel/17860
[11] http://article.gmane.org/gmane.linux.kernel/1671333

Cc: Tejun Heo <tj@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Joseph Salisbury <joseph.salisbury@canonical.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: Pierre Fersing <pierre-fersing@pierref.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Poirier <bpoirier@suse.de>
Cc: Nagalakshmi Nandigama <nagalakshmi.nandigama@avagotech.com>
Cc: Praveen Krishnamoorthy <praveen.krishnamoorthy@avagotech.com>
Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
Cc: Abhijit Mahajan <abhijit.mahajan@avagotech.com>
Cc: Casey Leedom <leedom@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Santosh Rastapur <santosh@chelsio.com>
Cc: MPT-FusionLinux.pdl@avagotech.com
Cc: linux-scsi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/base/base.h    |  6 +++++
 drivers/base/bus.c     | 59 +++++++++++++++++++++++++++++++++++++++++++++++---
 drivers/base/dd.c      |  4 ++++
 include/linux/device.h |  5 +++++
 4 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 251c5d3..24836f1 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -43,11 +43,17 @@ struct subsys_private {
 };
 #define to_subsys_private(obj) container_of(obj, struct subsys_private, subsys.kobj)
 
+struct driver_attach_work {
+	struct work_struct work;
+	struct device_driver *driver;
+};
+
 struct driver_private {
 	struct kobject kobj;
 	struct klist klist_devices;
 	struct klist_node knode_bus;
 	struct module_kobject *mkobj;
+	struct driver_attach_work *attach_work;
 	struct device_driver *driver;
 };
 #define to_driver(obj) container_of(obj, struct driver_private, kobj)
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index a5f41e4..70d51b2 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -85,6 +85,7 @@ static void driver_release(struct kobject *kobj)
 	struct driver_private *drv_priv = to_driver(kobj);
 
 	pr_debug("driver: '%s': %s\n", kobject_name(kobj), __func__);
+	kfree(drv_priv->attach_work);
 	kfree(drv_priv);
 }
 
@@ -662,10 +663,56 @@ static void remove_driver_private(struct device_driver *drv)
 	struct driver_private *priv = drv->p;
 
 	kobject_put(&priv->kobj);
+	kfree(priv->attach_work);
 	kfree(priv);
 	drv->p = NULL;
 }
 
+static void driver_attach_workfn(struct work_struct *work)
+{
+	int ret;
+	struct driver_attach_work *attach_work =
+		container_of(work, struct driver_attach_work, work);
+	struct device_driver *drv = attach_work->driver;
+	ktime_t calltime, delta, rettime;
+	unsigned long long duration;
+
+	calltime = ktime_get();
+
+	ret = driver_attach(drv);
+	if (ret != 0) {
+		remove_driver_private(drv);
+		bus_put(drv->bus);
+	}
+
+	rettime = ktime_get();
+	delta = ktime_sub(rettime, calltime);
+	duration = (unsigned long long) ktime_to_ns(delta) >> 10;
+
+	pr_debug("bus: '%s': add driver %s attach completed after %lld usecs\n",
+		 drv->bus->name, drv->name, duration);
+}
+
+int bus_driver_async_probe(struct device_driver *drv)
+{
+	struct driver_private *priv = drv->p;
+
+	priv->attach_work = kzalloc(sizeof(struct driver_attach_work),
+				    GFP_KERNEL);
+	if (!priv->attach_work)
+		return -ENOMEM;
+
+	priv->attach_work->driver = drv;
+	INIT_WORK(&priv->attach_work->work, driver_attach_workfn);
+
+	pr_debug("bus: '%s': probe for driver %s is run asynchronously\n",
+		 drv->bus->name, drv->name);
+
+	queue_work(system_unbound_wq, &priv->attach_work->work);
+
+	return 0;
+}
+
 /**
  * bus_add_driver - Add a driver to the bus.
  * @drv: driver.
@@ -698,9 +745,15 @@ int bus_add_driver(struct device_driver *drv)
 
 	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
 	if (drv->bus->p->drivers_autoprobe) {
-		error = driver_attach(drv);
-		if (error)
-			goto out_unregister;
+		if (drv->owner && drv->async_probe) {
+			error = bus_driver_async_probe(drv);
+			if (error)
+				goto out_unregister;
+		} else {
+			error = driver_attach(drv);
+			if (error)
+				goto out_unregister;
+		}
 	}
 	module_add_driver(drv->owner, drv);
 
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index e4ffbcf..f1565f3 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -507,6 +507,10 @@ static void __device_release_driver(struct device *dev)
 
 	drv = dev->driver;
 	if (drv) {
+		if (drv->owner && drv->async_probe) {
+			struct driver_private *priv = drv->p;
+			flush_work(&priv->attach_work->work);
+		}
 		pm_runtime_get_sync(dev);
 
 		driver_sysfs_remove(dev);
diff --git a/include/linux/device.h b/include/linux/device.h
index 43d183a..7de1386b 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -200,6 +200,10 @@ extern struct klist *bus_get_device_klist(struct bus_type *bus);
  * @owner:	The module owner.
  * @mod_name:	Used for built-in modules.
  * @suppress_bind_attrs: Disables bind/unbind via sysfs.
+ * @async_probe: requests probe to be run asynchronously. Drivers that
+ * 	have this enabled must take care that userspace will return
+ * 	immediately upon driver loading as probing will happen behind the
+ * 	scenes asynchronously.
  * @of_match_table: The open firmware table.
  * @acpi_match_table: The ACPI match table.
  * @probe:	Called to query the existence of a specific device,
@@ -233,6 +237,7 @@ struct device_driver {
 	const char		*mod_name;	/* used for built-in modules */
 
 	bool suppress_bind_attrs;	/* disables bind/unbind via sysfs */
+	bool async_probe;
 
 	const struct of_device_id	*of_match_table;
 	const struct acpi_device_id	*acpi_match_table;
-- 
2.0.3



* [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05  6:37 [RFC v2 0/6] driver-core: add asynch probe support Luis R. Rodriguez
@ 2014-09-05  6:37   ` Luis R. Rodriguez
  2014-09-05  6:37 ` [RFC v2 2/6] driver-core: add driver async_probe support Luis R. Rodriguez
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-05  6:37 UTC (permalink / raw)
  To: gregkh, dmitry.torokhov, falcon, tiwai, tj, arjan
  Cc: linux-kernel, oleg, hare, akpm, penguin-kernel, joseph.salisbury,
	bpoirier, santosh, Luis R. Rodriguez, Tetsuo Handa, Kay Sievers,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	linux-scsi, netdev

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The new umh kill option has allowed kthreads to receive
kill signals, but they now accept kill signals from any
source, while the original motivation was to let only the
OOM killer send the kill. One particular user
which has been found to send kill signals to kthreads is
systemd; it does this upon a 30 second default timeout on
loading modules. That timeout was put in place under the
assumption that some drivers' init sequences were taking
long. Since the kernel batches both init and probe together,
though, it's actually been the probe routines which take
long. These should not be penalized. The kill only
happens if the driver's probe routine ends up
using kthreads somehow. To help with this we now have the
async_probe flag for drivers, but before we can amend
drivers with this functionality we need to find them. This
patch addresses that by warning on, and deferring, a kill
from any source other than the OOM killer -- for now.
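
The grace period this patch adds can be modeled in a few lines of
userspace C. This is a sketch of the retry loop only, under the
assumption that time advances in whole seconds; sketch_grace_wait() and
its done_at / oom_at inputs are hypothetical stand-ins for
wait_for_completion_timeout() and the TIF_MEMDIE check. In the patch
itself an OOM kill and an expired grace period both fall through to the
same existing SIGKILL handling; the model keeps them apart only to make
the two exits visible.

```c
#include <assert.h>

enum sketch_grace { SKETCH_COMPLETED, SKETCH_OOM, SKETCH_EXPIRED };

/*
 * done_at / oom_at: the second (0-based) in which the completion fires
 * or the OOM flag becomes set; -1 means "never".
 */
enum sketch_grace sketch_grace_wait(int grace_secs, int done_at, int oom_at)
{
	int i;

	assert(done_at >= -1 && oom_at >= -1);

	for (i = 0; i < grace_secs; i++) {
		/* A kill from the OOM killer is honored right away. */
		if (oom_at != -1 && oom_at <= i)
			return SKETCH_OOM;
		/* Otherwise wait one more second for the completion. */
		if (done_at != -1 && done_at <= i)
			return SKETCH_COMPLETED;
	}
	/* Grace period used up: the kill is honored after all. */
	return SKETCH_EXPIRED;
}
```

With the 10 second grace period used in kthread.c, a completion that
arrives in second 3 is accepted, while one that would only arrive in
second 12 is not.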

Users can provide a log output and it should be clear on
the trace what probe / driver got the kill signal.

This patch is based on Tetsuo's patch [0] to try to address
the timeout issue, which in itself is based on Tetsuo's
original patch to also address this months ago [1]. These
patches just lacked addressing all other callers which
would load modules for us. Although Oleg had rejected a
similar change a while ago [2], it's now clear what the
source of the problem is. A few solutions have been proposed.
One of them was to allow the default systemd timeout to be
modified; that change by Hannes Reinecke is now merged
upstream in systemd. We still however need a non-fatal
way to deal with modules that take long and an easy way
for us to find these modules. At least one proposal has
been made for systemd but discussions on that approach
haven't gotten much traction [3], so we need to address
this in the kernel. This will also be important for users
of new kernels on old versions of systemd.

[0] https://launchpadlibrarian.net/169657493/kthread-defer-leaving.patch
[1] https://lkml.org/lkml/2014/7/29/284
[2] http://article.gmane.org/gmane.linux.kernel/1669604
[3] http://lists.freedesktop.org/archives/systemd-devel/2014-August/021852.html

An example log output captured by purposely breaking the iwlwifi
driver by using ssleep(33) on probe:

[   43.853997] iwlwifi going to sleep for 33 seconds
[   76.862975] iwlwifi done sleeping for 33 seconds
[   76.863880] iwlwifi 0000:03:00.0: irq 34 for MSI/MSI-X
[   76.863961] ------------[ cut here ]------------
[   76.864648] WARNING: CPU: 0 PID: 479 at kernel/kthread.c:308 kthread_create_on_node+0x1ea/0x200()
[   76.865309] Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe
[   76.865974] Modules linked in: xfs libcrc32c x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep aes_x86_64 uvcvideo glue_helper videobuf2_vmalloc lrw gf128mul snd_pcm ablk_helper iTCO_wdt rtsx_pci_ms videobuf2_memops videobuf2_core rtsx_pci_sdmmc v4l2_common mmc_core videodev snd_timer thinkpad_acpi memstick iTCO_vendor_support snd mei_me rtsx_pci cryptd iwlwifi(+) mei shpchp tpm_tis soundcore pcspkr joydev lpc_ich mfd_core serio_raw tpm btusb wmi i2c_i801 thermal intel_smartconnect ac battery processor dm_mod btrfs xor raid6_pq i915 i2c_algo_bit e1000e drm_kms_helper sr_mod crc32c_intel cdrom xhci_hcd drm video
[   76.869197]  button sg
[   76.870035] CPU: 0 PID: 479 Comm: systemd-udevd Not tainted 3.17.0-rc3-25.g1474ea5-desktop+ #12
[   76.870915] Hardware name: LENOVO 20AW000LUS/20AW000LUS, BIOS GLET43WW (1.18 ) 12/04/2013
[   76.871801]  0000000000000009 ffff8802133a3908 ffffffff8173960f ffff8802133a3950
[   76.872771]  ffff8802133a3940 ffffffff81072eed ffff8800c9004480 ffffffff810c8fd0
[   76.873693]  ffffffff81a77845 00000000ffffffff ffff8800c9d2abc0 ffff8802133a39a0
[   76.874620] Call Trace:
[   76.875522]  [<ffffffff8173960f>] dump_stack+0x4d/0x6f
[   76.876379]  [<ffffffff81072eed>] warn_slowpath_common+0x7d/0xa0
[   76.877286]  [<ffffffff810c8fd0>] ? irq_thread_check_affinity+0xb0/0xb0
[   76.878177]  [<ffffffff81072f5c>] warn_slowpath_fmt+0x4c/0x50
[   76.879048]  [<ffffffff810c8fd0>] ? irq_thread_check_affinity+0xb0/0xb0
[   76.879898]  [<ffffffff8108fdea>] kthread_create_on_node+0x1ea/0x200
[   76.880765]  [<ffffffff811bf50e>] ? enable_cpucache+0x4e/0xe0
[   76.881617]  [<ffffffff810c9c55>] __setup_irq+0x165/0x580
[   76.882459]  [<ffffffff8101bca6>] ? dma_generic_alloc_coherent+0x146/0x160
[   76.883314]  [<ffffffffa03cf780>] ? iwl_pcie_disable_ict+0x40/0x40 [iwlwifi]
[   76.884159]  [<ffffffff810ca1cf>] request_threaded_irq+0xcf/0x180
[   76.885010]  [<ffffffffa03d6efa>] iwl_trans_pcie_alloc+0x35a/0x4b1 [iwlwifi]
[   76.885861]  [<ffffffffa03cd3c0>] iwl_pci_probe+0x50/0x260 [iwlwifi]
[   76.886646]  [<ffffffff8146a59d>] ? __pm_runtime_resume+0x4d/0x60
[   76.887404]  [<ffffffff81383595>] local_pci_probe+0x45/0xa0
[   76.888155]  [<ffffffff81384795>] ? pci_match_device+0xe5/0x110
[   76.888899]  [<ffffffff813848d9>] pci_device_probe+0xd9/0x130
[   76.889646]  [<ffffffff8146090d>] driver_probe_device+0x12d/0x3e0
[   76.890391]  [<ffffffff81460c93>] __driver_attach+0x93/0xa0
[   76.891132]  [<ffffffff81460c00>] ? __device_attach+0x40/0x40
[   76.891870]  [<ffffffff8145e713>] bus_for_each_dev+0x63/0xa0
[   76.892763]  [<ffffffff814602de>] driver_attach+0x1e/0x20
[   76.893528]  [<ffffffff8145fe4e>] bus_add_driver+0xfe/0x270
[   76.894292]  [<ffffffffa036d000>] ? 0xffffffffa036d000
[   76.895118]  [<ffffffff814614e4>] driver_register+0x64/0xf0
[   76.895847]  [<ffffffff81382f1c>] __pci_register_driver+0x4c/0x50
[   76.896615]  [<ffffffffa03cd5f4>] iwl_pci_register_driver+0x24/0x40 [iwlwifi]
[   76.896619]  [<ffffffffa036d085>] iwl_drv_init+0x85/0x1000 [iwlwifi]
[   76.896621]  [<ffffffff81002144>] do_one_initcall+0xd4/0x210
[   76.896624]  [<ffffffff811a49e4>] ? __vunmap+0x94/0x100
[   76.896626]  [<ffffffff810f34d5>] load_module+0x1f25/0x2670
[   76.896627]  [<ffffffff810ef170>] ? store_uevent+0x40/0x40
[   76.896630]  [<ffffffff810f3d96>] SyS_finit_module+0x86/0xb0
[   76.896632]  [<ffffffff817413ed>] system_call_fastpath+0x1a/0x1f
[   76.896632] ---[ end trace 9a32581b585745d8 ]---
[   76.982019] iwlwifi 0000:03:00.0: loaded firmware version 23.214.9.0 op_mode iwlmvm
[   77.174150] iwlwifi 0000:03:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144
[   77.174952] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S
[   77.175955] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S

Cc: Tejun Heo <tj@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Joseph Salisbury <joseph.salisbury@canonical.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: Pierre Fersing <pierre-fersing@pierref.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Poirier <bpoirier@suse.de>
Cc: Nagalakshmi Nandigama <nagalakshmi.nandigama@avagotech.com>
Cc: Praveen Krishnamoorthy <praveen.krishnamoorthy@avagotech.com>
Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
Cc: Abhijit Mahajan <abhijit.mahajan@avagotech.com>
Cc: Casey Leedom <leedom@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Santosh Rastapur <santosh@chelsio.com>
Cc: MPT-FusionLinux.pdl@avagotech.com
Cc: linux-scsi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 kernel/kmod.c    | 21 +++++++++++++++++++--
 kernel/kthread.c | 19 +++++++++++++++++++
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/kernel/kmod.c b/kernel/kmod.c
index 8637e04..b22228c 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -596,16 +596,33 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
 		goto unlock;
 
 	if (wait & UMH_KILLABLE) {
+		unsigned int i;
+
 		retval = wait_for_completion_killable(&done);
-		if (!retval)
+		if (likely(!retval))
 			goto wait_done;
 
+		/*
+		 * I got SIGKILL, but wait for 60 more seconds for completion
+		 * unless chosen by the OOM killer. This delay is there as a
+		 * workaround for boot failure caused by SIGKILL upon device
+		 * driver initialization timeout.
+		 *
+		 * N.B. this will actually let the thread complete regularly,
+		 * wait_for_completion() will be used eventually, the 60 second
+		 * try here is just to check for the OOM over that time.
+		 */
+		WARN_ONCE(!test_thread_flag(TIF_MEMDIE),
+			  "Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe\n");
+		for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++)
+			if (wait_for_completion_timeout(&done, HZ))
+				goto wait_done;
+
 		/* umh_complete() will see NULL and free sub_info */
 		if (xchg(&sub_info->complete, NULL))
 			goto unlock;
 		/* fallthrough, umh_complete() was already called */
 	}
-
 	wait_for_completion(&done);
 wait_done:
 	retval = sub_info->retval;
diff --git a/kernel/kthread.c b/kernel/kthread.c
index ef48322..bfb6dbe 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -292,6 +292,24 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 	 * new kernel thread.
 	 */
 	if (unlikely(wait_for_completion_killable(&done))) {
+		unsigned int i;
+
+		/*
+		 * I got SIGKILL, but wait for 10 more seconds for completion
+		 * unless chosen by the OOM killer. This delay is there as a
+		 * workaround for boot failure caused by SIGKILL upon device
+		 * driver initialization timeout.
+		 *
+		 * N.B. this will actually let the thread complete regularly,
+		 * wait_for_completion() will be used eventually, the 10 second
+		 * try here is just to check for the OOM over that time.
+		 */
+		WARN_ONCE(!test_thread_flag(TIF_MEMDIE),
+			  "Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe\n");
+		for (i = 0; i < 10 && !test_thread_flag(TIF_MEMDIE); i++)
+			if (wait_for_completion_timeout(&done, HZ))
+				goto ready;
+
 		/*
 		 * If I was SIGKILLed before kthreadd (or new kernel thread)
 		 * calls complete(), leave the cleanup of this structure to
@@ -305,6 +323,7 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 		 */
 		wait_for_completion(&done);
 	}
+ready:
 	task = create->result;
 	if (!IS_ERR(task)) {
 		static const struct sched_param param = { .sched_priority = 0 };
-- 
2.0.3



* [RFC v2 3/6] kthread: warn on kill signal if not OOM
@ 2014-09-05  6:37   ` Luis R. Rodriguez
  0 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-05  6:37 UTC (permalink / raw)
  To: gregkh, dmitry.torokhov, falcon, tiwai, tj, arjan
  Cc: linux-kernel, oleg, hare, akpm, penguin-kernel, joseph.salisbury,
	bpoirier, santosh, Luis R. Rodriguez, Tetsuo Handa, Kay Sievers,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	linux-scsi, netdev

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The new umh kill option has allowed kthreads to receive
kill signals but they are generally accepting all sources
of kill signals while the original motivation was to enable
through the OOM from sending the kill. One particular user
which has been found to send kill signals on kthreads is
systemd, it does this upon a 30 second default timeout on
loading modules. That timeout was in place under the
assumption that some driver's init sequences were taking
long. Since the kernel batches both init and probe together
though its actually been the probe routines which take
long. These should not be penalized, the kill would only
happen if and only if the driver's probe routine ends up
using kthreads somehow. To help with this we now have the
async_probe flag for drivers but before we can amend
drivers with this functionality we need to find them. This
patch addresses that by avoiding the kill from any other
source than the OOM killer -- for now.

Users can provide a log output and it should be clear on
the trace what probe / driver got the kill signal.

This patch is based on Tetsuo's patch [0] to try to address
the timeout issue, which is itself based on Tetsuo's
original patch to address this months ago [1]. These
patches just did not cover all the other callers which
would load modules for us. Although Oleg had rejected a
similar change a while ago [2], it's now clear what the
source of the problem is. A few solutions have been
proposed. One of them was to allow the default systemd
timeout to be modified; that change by Hannes Reinecke is
now merged upstream in systemd. We still, however, need a
non-fatal way to deal with modules that take long and an
easy way for us to find these modules. At least one
proposal has been made for systemd, but discussions on that
approach haven't gotten much traction [3], so we need to
address this in the kernel. This will also be important for
users of new kernels on old versions of systemd.

[0] https://launchpadlibrarian.net/169657493/kthread-defer-leaving.patch
[1] https://lkml.org/lkml/2014/7/29/284
[2] http://article.gmane.org/gmane.linux.kernel/1669604
[3] http://lists.freedesktop.org/archives/systemd-devel/2014-August/021852.html

An example log output captured by purposely breaking the iwlwifi
driver by using ssleep(33) on probe:

[   43.853997] iwlwifi going to sleep for 33 seconds
[   76.862975] iwlwifi done sleeping for 33 seconds
[   76.863880] iwlwifi 0000:03:00.0: irq 34 for MSI/MSI-X
[   76.863961] ------------[ cut here ]------------
[   76.864648] WARNING: CPU: 0 PID: 479 at kernel/kthread.c:308 kthread_create_on_node+0x1ea/0x200()
[   76.865309] Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe
[   76.865974] Modules linked in: xfs libcrc32c x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep aes_x86_64 uvcvideo glue_helper videobuf2_vmalloc lrw gf128mul snd_pcm ablk_helper iTCO_wdt rtsx_pci_ms videobuf2_memops videobuf2_core rtsx_pci_sdmmc v4l2_common mmc_core videodev snd_timer thinkpad_acpi memstick iTCO_vendor_support snd mei_me rtsx_pci cryptd iwlwifi(+) mei shpchp tpm_tis soundcore pcspkr joydev lpc_ich mfd_core serio_raw tpm btusb wmi i2c_i801 thermal intel_smartconnect ac battery processor dm_mod btrfs xor raid6_pq i915 i2c_algo_bit e1000e drm_kms_helper sr_mod crc32c_intel cdrom xhci_hcd drm video
[   76.869197]  button sg
[   76.870035] CPU: 0 PID: 479 Comm: systemd-udevd Not tainted 3.17.0-rc3-25.g1474ea5-desktop+ #12
[   76.870915] Hardware name: LENOVO 20AW000LUS/20AW000LUS, BIOS GLET43WW (1.18 ) 12/04/2013
[   76.871801]  0000000000000009 ffff8802133a3908 ffffffff8173960f ffff8802133a3950
[   76.872771]  ffff8802133a3940 ffffffff81072eed ffff8800c9004480 ffffffff810c8fd0
[   76.873693]  ffffffff81a77845 00000000ffffffff ffff8800c9d2abc0 ffff8802133a39a0
[   76.874620] Call Trace:
[   76.875522]  [<ffffffff8173960f>] dump_stack+0x4d/0x6f
[   76.876379]  [<ffffffff81072eed>] warn_slowpath_common+0x7d/0xa0
[   76.877286]  [<ffffffff810c8fd0>] ? irq_thread_check_affinity+0xb0/0xb0
[   76.878177]  [<ffffffff81072f5c>] warn_slowpath_fmt+0x4c/0x50
[   76.879048]  [<ffffffff810c8fd0>] ? irq_thread_check_affinity+0xb0/0xb0
[   76.879898]  [<ffffffff8108fdea>] kthread_create_on_node+0x1ea/0x200
[   76.880765]  [<ffffffff811bf50e>] ? enable_cpucache+0x4e/0xe0
[   76.881617]  [<ffffffff810c9c55>] __setup_irq+0x165/0x580
[   76.882459]  [<ffffffff8101bca6>] ? dma_generic_alloc_coherent+0x146/0x160
[   76.883314]  [<ffffffffa03cf780>] ? iwl_pcie_disable_ict+0x40/0x40 [iwlwifi]
[   76.884159]  [<ffffffff810ca1cf>] request_threaded_irq+0xcf/0x180
[   76.885010]  [<ffffffffa03d6efa>] iwl_trans_pcie_alloc+0x35a/0x4b1 [iwlwifi]
[   76.885861]  [<ffffffffa03cd3c0>] iwl_pci_probe+0x50/0x260 [iwlwifi]
[   76.886646]  [<ffffffff8146a59d>] ? __pm_runtime_resume+0x4d/0x60
[   76.887404]  [<ffffffff81383595>] local_pci_probe+0x45/0xa0
[   76.888155]  [<ffffffff81384795>] ? pci_match_device+0xe5/0x110
[   76.888899]  [<ffffffff813848d9>] pci_device_probe+0xd9/0x130
[   76.889646]  [<ffffffff8146090d>] driver_probe_device+0x12d/0x3e0
[   76.890391]  [<ffffffff81460c93>] __driver_attach+0x93/0xa0
[   76.891132]  [<ffffffff81460c00>] ? __device_attach+0x40/0x40
[   76.891870]  [<ffffffff8145e713>] bus_for_each_dev+0x63/0xa0
[   76.892763]  [<ffffffff814602de>] driver_attach+0x1e/0x20
[   76.893528]  [<ffffffff8145fe4e>] bus_add_driver+0xfe/0x270
[   76.894292]  [<ffffffffa036d000>] ? 0xffffffffa036d000
[   76.895118]  [<ffffffff814614e4>] driver_register+0x64/0xf0
[   76.895847]  [<ffffffff81382f1c>] __pci_register_driver+0x4c/0x50
[   76.896615]  [<ffffffffa03cd5f4>] iwl_pci_register_driver+0x24/0x40 [iwlwifi]
[   76.896619]  [<ffffffffa036d085>] iwl_drv_init+0x85/0x1000 [iwlwifi]
[   76.896621]  [<ffffffff81002144>] do_one_initcall+0xd4/0x210
[   76.896624]  [<ffffffff811a49e4>] ? __vunmap+0x94/0x100
[   76.896626]  [<ffffffff810f34d5>] load_module+0x1f25/0x2670
[   76.896627]  [<ffffffff810ef170>] ? store_uevent+0x40/0x40
[   76.896630]  [<ffffffff810f3d96>] SyS_finit_module+0x86/0xb0
[   76.896632]  [<ffffffff817413ed>] system_call_fastpath+0x1a/0x1f
[   76.896632] ---[ end trace 9a32581b585745d8 ]---
[   76.982019] iwlwifi 0000:03:00.0: loaded firmware version 23.214.9.0 op_mode iwlmvm
[   77.174150] iwlwifi 0000:03:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144
[   77.174952] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S
[   77.175955] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S

Cc: Tejun Heo <tj@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Joseph Salisbury <joseph.salisbury@canonical.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: Pierre Fersing <pierre-fersing@pierref.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Poirier <bpoirier@suse.de>
Cc: Nagalakshmi Nandigama <nagalakshmi.nandigama@avagotech.com>
Cc: Praveen Krishnamoorthy <praveen.krishnamoorthy@avagotech.com>
Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
Cc: Abhijit Mahajan <abhijit.mahajan@avagotech.com>
Cc: Casey Leedom <leedom@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Santosh Rastapur <santosh@chelsio.com>
Cc: MPT-FusionLinux.pdl@avagotech.com
Cc: linux-scsi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 kernel/kmod.c    | 21 +++++++++++++++++++--
 kernel/kthread.c | 19 +++++++++++++++++++
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/kernel/kmod.c b/kernel/kmod.c
index 8637e04..b22228c 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -596,16 +596,33 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
 		goto unlock;
 
 	if (wait & UMH_KILLABLE) {
+		unsigned int i;
+
 		retval = wait_for_completion_killable(&done);
-		if (!retval)
+		if (likely(!retval))
 			goto wait_done;
 
+		/*
+		 * I got SIGKILL, but wait for 60 more seconds for completion
+		 * unless chosen by the OOM killer. This delay is there as a
+		 * workaround for boot failure caused by SIGKILL upon device
+		 * driver initialization timeout.
+		 *
+		 * N.B. this will actually let the thread complete regularly,
+		 * wait_for_completion() will be used eventually, the 60 second
+		 * try here is just to check for the OOM over that time.
+		 */
+		WARN_ONCE(!test_thread_flag(TIF_MEMDIE),
+			  "Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe\n");
+		for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++)
+			if (wait_for_completion_timeout(&done, HZ))
+				goto wait_done;
+
 		/* umh_complete() will see NULL and free sub_info */
 		if (xchg(&sub_info->complete, NULL))
 			goto unlock;
 		/* fallthrough, umh_complete() was already called */
 	}
-
 	wait_for_completion(&done);
 wait_done:
 	retval = sub_info->retval;
diff --git a/kernel/kthread.c b/kernel/kthread.c
index ef48322..bfb6dbe 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -292,6 +292,24 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 	 * new kernel thread.
 	 */
 	if (unlikely(wait_for_completion_killable(&done))) {
+		unsigned int i;
+
+		/*
+		 * I got SIGKILL, but wait for 10 more seconds for completion
+		 * unless chosen by the OOM killer. This delay is there as a
+		 * workaround for boot failure caused by SIGKILL upon device
+		 * driver initialization timeout.
+		 *
+		 * N.B. this will actually let the thread complete regularly,
+		 * wait_for_completion() will be used eventually, the 10 second
+		 * try here is just to check for the OOM over that time.
+		 */
+		WARN_ONCE(!test_thread_flag(TIF_MEMDIE),
+			  "Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe\n");
+		for (i = 0; i < 10 && !test_thread_flag(TIF_MEMDIE); i++)
+			if (wait_for_completion_timeout(&done, HZ))
+				goto ready;
+
 		/*
 		 * If I was SIGKILLed before kthreadd (or new kernel thread)
 		 * calls complete(), leave the cleanup of this structure to
@@ -305,6 +323,7 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 		 */
 		wait_for_completion(&done);
 	}
+ready:
 	task = create->result;
 	if (!IS_ERR(task)) {
 		static const struct sched_param param = { .sched_priority = 0 };
-- 
2.0.3

^ permalink raw reply related	[flat|nested] 227+ messages in thread
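
As a reading aid for the bounded wait that patch 3/6 adds, here is a
minimal userspace sketch in C. It is not kernel code: grace_wait(),
enum grace_result and NEVER are names made up for illustration, the
completion and the TIF_MEMDIE check are replaced by plain integers
counting seconds since the SIGKILL, and the 60 and 10 second limits
are the ones from the kmod.c and kthread.c hunks.

```c
/*
 * Userspace model only, NOT kernel code. All names are hypothetical.
 * It mirrors the shape of the loop added by the patch:
 *
 *	for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++)
 *		if (wait_for_completion_timeout(&done, HZ))
 *			goto wait_done;
 */
#include <assert.h>
#include <stddef.h>

#define NEVER 0xffffffffu	/* "this event does not happen" */

enum grace_result {
	GRACE_COMPLETED,	/* work finished inside the grace period */
	GRACE_OOM,		/* OOM killer picked us, stop waiting at once */
	GRACE_EXPIRED,		/* grace period ran out, caller gives up */
};

/*
 * done_after: seconds after the SIGKILL at which the work completes
 * oom_after:  seconds after the SIGKILL at which the OOM flag is set
 * grace:      extra seconds to wait (60 in kmod.c, 10 in kthread.c)
 * waited:     out parameter, whole seconds spent in the loop
 */
static enum grace_result grace_wait(unsigned int done_after,
				    unsigned int oom_after,
				    unsigned int grace,
				    unsigned int *waited)
{
	unsigned int i;

	assert(waited != NULL);

	for (i = 0; i < grace; i++) {
		/* stand-in for test_thread_flag(TIF_MEMDIE) */
		if (oom_after != NEVER && i >= oom_after) {
			*waited = i;
			return GRACE_OOM;
		}
		/* stand-in for wait_for_completion_timeout(&done, HZ) */
		if (done_after <= i + 1) {
			*waited = i + 1;
			return GRACE_COMPLETED;
		}
	}

	*waited = grace;
	return GRACE_EXPIRED;
}
```

With the 30 second systemd default described in the commit message, a
caller that runs out the 60 second grace period has waited roughly
30 + 60 = 90 seconds in total, which is the figure Tejun questions in
his reply.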

* [RFC v2 4/6] cxgb4: use async probe
  2014-09-05  6:37 [RFC v2 0/6] driver-core: add asynch probe support Luis R. Rodriguez
                   ` (2 preceding siblings ...)
  2014-09-05  6:37   ` Luis R. Rodriguez
@ 2014-09-05  6:37 ` Luis R. Rodriguez
  2014-09-05  6:37 ` [RFC v2 5/6] mptsas: " Luis R. Rodriguez
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-05  6:37 UTC (permalink / raw)
  To: gregkh, dmitry.torokhov, falcon, tiwai, tj, arjan
  Cc: linux-kernel, oleg, hare, akpm, penguin-kernel, joseph.salisbury,
	bpoirier, santosh, Luis R. Rodriguez, Tetsuo Handa,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Hariprasad S, Casey Leedom, MPT-FusionLinux.pdl,
	linux-scsi, netdev

From: "Luis R. Rodriguez" <mcgrof@suse.com>

cxgb4 probe can take over 1 minute when the firmware is
written and installed on the device, and even after this the
device driver still does some device probing which can take
quite a bit. systemd will kill this driver when probe takes
over 30 seconds, so use the asynch probe mechanism to
circumvent this.

Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Joseph Salisbury <joseph.salisbury@canonical.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: Pierre Fersing <pierre-fersing@pierref.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Poirier <bpoirier@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Nagalakshmi Nandigama <nagalakshmi.nandigama@avagotech.com>
Cc: Praveen Krishnamoorthy <praveen.krishnamoorthy@avagotech.com>
Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
Cc: Abhijit Mahajan <abhijit.mahajan@avagotech.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Santosh Rastapur <santosh@chelsio.com>
Cc: Casey Leedom <leedom@chelsio.com>
Cc: MPT-FusionLinux.pdl@avagotech.com
Cc: linux-scsi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 18fb9c6..5f7d24a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -6794,6 +6794,7 @@ static struct pci_driver cxgb4_driver = {
 	.remove   = remove_one,
 	.shutdown = remove_one,
 	.err_handler = &cxgb4_eeh,
+	.driver.async_probe = true,
 };
 
 static int __init cxgb4_init_module(void)
-- 
2.0.3


^ permalink raw reply related	[flat|nested] 227+ messages in thread
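
The opt-in used by this patch and the following ones can be pictured
with a small userspace model. This is a sketch under assumptions, not
the driver core: struct model_driver and the model_*() helpers are
hypothetical, the queued work_struct from the cover letter is reduced
to a boolean, and "flushing" simply runs the deferred probe. It only
shows the ordering the cover letter describes: registration returns
before an async probe has run, and the pending probe is flushed before
remove() is called.

```c
/*
 * Userspace model only, NOT the driver core. Every name is hypothetical.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_driver {
	const char *name;
	int (*probe)(struct model_driver *drv);
	void (*remove)(struct model_driver *drv);
	bool async_probe;	/* mirrors .driver.async_probe in the patches */
	bool probe_pending;	/* stand-in for a queued work_struct */
	int probe_ret;		/* result of the last probe run */
	int probe_calls;
	int remove_calls;
};

/* Stand-in for flush_work(): run the deferred probe if still queued. */
static void model_flush_probe(struct model_driver *drv)
{
	if (drv->probe_pending) {
		drv->probe_pending = false;
		drv->probe_ret = drv->probe(drv);
	}
}

/* Returns without probing when the driver opted in to async probe. */
static int model_driver_register(struct model_driver *drv)
{
	assert(drv != NULL && drv->probe != NULL);

	if (drv->async_probe)
		drv->probe_pending = true;	/* queue_work() in the series */
	else
		drv->probe_ret = drv->probe(drv);
	return 0;
}

/* remove() must not run while a probe is still queued. */
static void model_driver_unregister(struct model_driver *drv)
{
	model_flush_probe(drv);
	if (drv->remove)
		drv->remove(drv);
}

/* Trivial callbacks that only count how often they were invoked. */
static int demo_probe(struct model_driver *drv)
{
	drv->probe_calls++;
	return 0;
}

static void demo_remove(struct model_driver *drv)
{
	drv->remove_calls++;
}
```

Note that for a driver with async_probe set, model_driver_register()
returns while probe_calls is still 0. That is the behavior change
raised in the cover letter for scripts which expect the device to be
available once module loading returns.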

* [RFC v2 5/6] mptsas: use async probe
  2014-09-05  6:37 [RFC v2 0/6] driver-core: add asynch probe support Luis R. Rodriguez
                   ` (3 preceding siblings ...)
  2014-09-05  6:37 ` [RFC v2 4/6] cxgb4: use async probe Luis R. Rodriguez
@ 2014-09-05  6:37 ` Luis R. Rodriguez
  2014-09-05  7:16   ` Tejun Heo
  2014-09-05  7:23   ` Hannes Reinecke
  2014-09-05  6:37 ` [RFC v2 6/6] pata_marvell: " Luis R. Rodriguez
  2014-09-05  7:11 ` [RFC v2 0/6] driver-core: add asynch probe support Tejun Heo
  6 siblings, 2 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-05  6:37 UTC (permalink / raw)
  To: gregkh, dmitry.torokhov, falcon, tiwai, tj, arjan
  Cc: linux-kernel, oleg, hare, akpm, penguin-kernel, joseph.salisbury,
	bpoirier, santosh, Luis R. Rodriguez, Tetsuo Handa,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Hariprasad S, Casey Leedom, MPT-FusionLinux.pdl,
	linux-scsi, netdev

From: "Luis R. Rodriguez" <mcgrof@suse.com>

It's reported that mptsas can at times take over 30 seconds
to recognize SCSI storage devices [0]; this is done on the
driver's probe path. Use the new asynch probe to keep
systemd from killing this driver.

[0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705

Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Joseph Salisbury <joseph.salisbury@canonical.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: Pierre Fersing <pierre-fersing@pierref.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Poirier <bpoirier@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Nagalakshmi Nandigama <nagalakshmi.nandigama@avagotech.com>
Cc: Praveen Krishnamoorthy <praveen.krishnamoorthy@avagotech.com>
Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
Cc: Abhijit Mahajan <abhijit.mahajan@avagotech.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Santosh Rastapur <santosh@chelsio.com>
Cc: Casey Leedom <leedom@chelsio.com>
Cc: MPT-FusionLinux.pdl@avagotech.com
Cc: linux-scsi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/message/fusion/mptsas.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index 0707fa2..6dfee95 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -5385,6 +5385,7 @@ static struct pci_driver mptsas_driver = {
 	.suspend	= mptscsih_suspend,
 	.resume		= mptscsih_resume,
 #endif
+	.driver.async_probe = true,
 };
 
 static int __init
-- 
2.0.3


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [RFC v2 6/6] pata_marvell: use async probe
  2014-09-05  6:37 [RFC v2 0/6] driver-core: add asynch probe support Luis R. Rodriguez
                   ` (4 preceding siblings ...)
  2014-09-05  6:37 ` [RFC v2 5/6] mptsas: " Luis R. Rodriguez
@ 2014-09-05  6:37 ` Luis R. Rodriguez
  2014-09-05  6:59   ` Alexander E. Patrakov
  2014-09-05  7:15   ` Tejun Heo
  2014-09-05  7:11 ` [RFC v2 0/6] driver-core: add asynch probe support Tejun Heo
  6 siblings, 2 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-05  6:37 UTC (permalink / raw)
  To: gregkh, dmitry.torokhov, falcon, tiwai, tj, arjan
  Cc: linux-kernel, oleg, hare, akpm, penguin-kernel, joseph.salisbury,
	bpoirier, santosh, Luis R. Rodriguez, linux-ide,
	One Thousand Gnomes, patrakov

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Alexander reported [0] that his Sony VAIO VPCZ23A4R laptop
experiences long delays on boot when connected to its
docking station on pre-3.9 kernels, but anything after 3.9
will cause the device to not be detected at all, ending with:

[   38.065673] pata_marvell 0000:1a:00.0: no available native port
[   38.065769] pata_acpi 0000:1a:00.0: no available native port

This laptop has a Marvell 88SE6121 SATA II Controller [11ab:6121]
and a BluRay writer attached. The delays are caused by SRST
errors and the link being slow to respond. The pata_marvell
driver is a simple libata wrapper, so the real required
changes need to be made in libata; however, not many folks
with intimate knowledge of and experience with these devices
are around and available anymore. Alexander notes that it
may be that *any* ATA BMDMA controller that fails to respond
to an identify command until a reset or other device poking
might suffer a similar fate; this needs to be investigated
further. Using async probe avoids the issue caused by
systemd killing the driver after it takes over 30 seconds
on probe.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=59581

Cc: Tejun Heo <tj@kernel.org>
Cc: linux-ide@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Poirier <bpoirier@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: patrakov@gmail.com
Reported-by: "Alexander E. Patrakov" <patrakov@gmail.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/ata/pata_marvell.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ata/pata_marvell.c b/drivers/ata/pata_marvell.c
index ae9feb1..6a543b9 100644
--- a/drivers/ata/pata_marvell.c
+++ b/drivers/ata/pata_marvell.c
@@ -175,6 +175,7 @@ static struct pci_driver marvell_pci_driver = {
 	.suspend		= ata_pci_device_suspend,
 	.resume			= ata_pci_device_resume,
 #endif
+	.driver.async_probe = true,
 };
 
 module_pci_driver(marvell_pci_driver);
-- 
2.0.3


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* Re: [RFC v2 6/6] pata_marvell: use async probe
  2014-09-05  6:37 ` [RFC v2 6/6] pata_marvell: " Luis R. Rodriguez
@ 2014-09-05  6:59   ` Alexander E. Patrakov
  2014-09-05  7:15   ` Tejun Heo
  1 sibling, 0 replies; 227+ messages in thread
From: Alexander E. Patrakov @ 2014-09-05  6:59 UTC (permalink / raw)
  To: Luis R. Rodriguez, gregkh, dmitry.torokhov, falcon, tiwai, tj, arjan
  Cc: linux-kernel, oleg, hare, akpm, penguin-kernel, joseph.salisbury,
	bpoirier, santosh, Luis R. Rodriguez, linux-ide,
	One Thousand Gnomes

05.09.2014 12:37, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> Alexander reported that on his Sony VAIO VPCZ23A4R laptop
> experiences long delays on boot when connected to its dock
> station on pre 3.9 kernels but anything after 3.9 will cause
> the device to not be detected at all ending with:
>
> [   38.065673] pata_marvell 0000:1a:00.0: no available native port
> [   38.065769] pata_acpi 0000:1a:00.0: no available native port

I object to this commit message: it is based on outdated information,
and the failure it describes is due to a different bug that was fixed
in 3.10 as a last-minute fix. Modern kernels just experience long
delays during boot.

> This laptop has a Marvell 88SE6121 SATA II Controller [11ab:6121]
> and a BluRay writer attached. The reason for the delays are
> caused by SRST errors and the link being slow to respond.
> The pata_marvell driver is a simple libata wrapper so the
> real required changes need to be made on libata however not
> many folks are around and available anymore with intimate
> knowledge and experience with these devices. Alexander notes
> that it may be that *any* ATA BMDMA controller that fails to
> respond to an identify command until a reset or other device
> poking might suffer from similar fate, this needs to be
> investigated further. Using async probe the issue caused
> by systemd killing the driver after taking over 30 seconds
> on probe.
>
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=59581
>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: linux-ide@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Benjamin Poirier <bpoirier@suse.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: patrakov@gmail.com
> Reported-by: "Alexander E. Patrakov" <patrakov@gmail.com>
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>   drivers/ata/pata_marvell.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/ata/pata_marvell.c b/drivers/ata/pata_marvell.c
> index ae9feb1..6a543b9 100644
> --- a/drivers/ata/pata_marvell.c
> +++ b/drivers/ata/pata_marvell.c
> @@ -175,6 +175,7 @@ static struct pci_driver marvell_pci_driver = {
>   	.suspend		= ata_pci_device_suspend,
>   	.resume			= ata_pci_device_resume,
>   #endif
> +	.driver.async_probe = true,
>   };
>
>   module_pci_driver(marvell_pci_driver);
>


-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 0/6] driver-core: add asynch probe support
  2014-09-05  6:37 [RFC v2 0/6] driver-core: add asynch probe support Luis R. Rodriguez
                   ` (5 preceding siblings ...)
  2014-09-05  6:37 ` [RFC v2 6/6] pata_marvell: " Luis R. Rodriguez
@ 2014-09-05  7:11 ` Tejun Heo
  6 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-05  7:11 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Greg Kroah-Hartman, Dmitry Torokhov, falcon, Takashi Iwai,
	Arjan van de Ven, lkml, Oleg Nesterov, hare, Andrew Morton,
	Tetsuo Handa, Joseph Salisbury, bpoirier, santosh,
	Luis R. Rodriguez

Hello,

On Thu, Sep 04, 2014 at 11:37:21PM -0700, Luis R. Rodriguez wrote:
> Tejun's concerns on this regressing some driver's scripts which expect
> the device to be available after loading remains valid, and the only
> thing we can do to help there is to annotate the expecations on the
> use of this "feature" to driver users. Scripts should be not be relying
> on the driver init anyway so that type of usage should be phased out
> and they should be hunting in udev for things popping up.

Ummm... I really don't think we can say that.  This was one of the
supported ways to wait for the probing of pre-existing devices on
driver load.  We can't simply go and declare that "scripts should not
be relying on the driver init anyway".  We just can't do that.

> I'm a bit concerned about this actually regressing load time on
> drivers that use this though instead of just having the module
> probe run off of finit_module() though. Even with a kthread alternative
> at least Santosh (Cc'd) has noted a regression in terms of time it
> takes to complete probe on cxgb4. I'll eventually get your exact
> numbers, but for now its an obvious regression *with* kthreads,
> this solution goes with:
>
> queue_work(system_unbound_wq, async_probe_work)
>
> This is surely going to make things even worse... We could
> use system_highpri_wq, or change the scheduling priority, but
> for that I'd prefer to get feedback and someone to decide what
> the right choice (TM) should be.

It shouldn't add any noticeable delays in probing.  If it does, we
should track down why that's happening and fix it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 6/6] pata_marvell: use async probe
  2014-09-05  6:37 ` [RFC v2 6/6] pata_marvell: " Luis R. Rodriguez
  2014-09-05  6:59   ` Alexander E. Patrakov
@ 2014-09-05  7:15   ` Tejun Heo
  1 sibling, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-05  7:15 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: gregkh, dmitry.torokhov, falcon, tiwai, arjan, linux-kernel,
	oleg, hare, akpm, penguin-kernel, joseph.salisbury, bpoirier,
	santosh, Luis R. Rodriguez, linux-ide, One Thousand Gnomes,
	patrakov

On Thu, Sep 04, 2014 at 11:37:27PM -0700, Luis R. Rodriguez wrote:
> diff --git a/drivers/ata/pata_marvell.c b/drivers/ata/pata_marvell.c
> index ae9feb1..6a543b9 100644
> --- a/drivers/ata/pata_marvell.c
> +++ b/drivers/ata/pata_marvell.c
> @@ -175,6 +175,7 @@ static struct pci_driver marvell_pci_driver = {
>  	.suspend		= ata_pci_device_suspend,
>  	.resume			= ata_pci_device_resume,
>  #endif
> +	.driver.async_probe = true,

You can't do this.  There's nothing special about pata_marvell.  Sure
there was a bug report which made long probe durations more common on
this driver on certain configurations but those long durations can
happen on *any* libata driver and singling out pata_marvell for async
probing is adding a different probing behavior basically arbitrarily.
I really can't see how this marking random drivers with async probing
would work, so one driver does synchronous probing while the
equivalent next one doesn't?  That's crazy.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 5/6] mptsas: use async probe
  2014-09-05  6:37 ` [RFC v2 5/6] mptsas: " Luis R. Rodriguez
@ 2014-09-05  7:16   ` Tejun Heo
  2014-09-05  7:23   ` Hannes Reinecke
  1 sibling, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-05  7:16 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: gregkh, dmitry.torokhov, falcon, tiwai, arjan, linux-kernel,
	oleg, hare, akpm, penguin-kernel, joseph.salisbury, bpoirier,
	santosh, Luis R. Rodriguez, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Hariprasad S, Casey Leedom,
	MPT-FusionLinux.pdl, linux-scsi, netdev

On Thu, Sep 04, 2014 at 11:37:26PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 
> Its reported that mptsas can at times take over 30 seconds
> to recognize SCSI storage devices [0], this is done on the
> driver's probe path. Use the the new asynch probe to
> circumvent systemd from killing this driver.

Again, *ANY* SCSI storage controller may.  The fact that a specific
bug report was filed on mptsas doesn't really mean anything.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05  6:37   ` Luis R. Rodriguez
@ 2014-09-05  7:19     ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-05  7:19 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: gregkh, dmitry.torokhov, falcon, tiwai, arjan, linux-kernel,
	oleg, hare, akpm, penguin-kernel, joseph.salisbury, bpoirier,
	santosh, Luis R. Rodriguez, Kay Sievers, One Thousand Gnomes,
	Tim Gardner, Pierre Fersing, Nagalakshmi Nandigama,
	Praveen Krishnamoorthy, Sreekanth Reddy, Abhijit Mahajan,
	Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl, linux-scsi,
	netdev

On Thu, Sep 04, 2014 at 11:37:24PM -0700, Luis R. Rodriguez wrote:
...
> +		/*
> +		 * I got SIGKILL, but wait for 60 more seconds for completion
> +		 * unless chosen by the OOM killer. This delay is there as a
> +		 * workaround for boot failure caused by SIGKILL upon device
> +		 * driver initialization timeout.
> +		 *
> +		 * N.B. this will actually let the thread complete regularly,
> +		 * wait_for_completion() will be used eventually, the 60 second
> +		 * try here is just to check for the OOM over that time.
> +		 */
> +		WARN_ONCE(!test_thread_flag(TIF_MEMDIE),
> +			  "Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe\n");
> +		for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++)
> +			if (wait_for_completion_timeout(&done, HZ))
> +				goto wait_done;
> +

Ugh... Jesus, this is way too hacky, so now we fail on 90s timeout
instead of 30?  Why do we even need this with the proposed async
probing changes?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 5/6] mptsas: use async probe
  2014-09-05  6:37 ` [RFC v2 5/6] mptsas: " Luis R. Rodriguez
  2014-09-05  7:16   ` Tejun Heo
@ 2014-09-05  7:23   ` Hannes Reinecke
  1 sibling, 0 replies; 227+ messages in thread
From: Hannes Reinecke @ 2014-09-05  7:23 UTC (permalink / raw)
  To: Luis R. Rodriguez, gregkh, dmitry.torokhov, falcon, tiwai, tj, arjan
  Cc: linux-kernel, oleg, akpm, penguin-kernel, joseph.salisbury,
	bpoirier, santosh, Luis R. Rodriguez, One Thousand Gnomes,
	Tim Gardner, Pierre Fersing, Nagalakshmi Nandigama,
	Praveen Krishnamoorthy, Sreekanth Reddy, Abhijit Mahajan,
	Hariprasad S, Casey Leedom, MPT-FusionLinux.pdl, linux-scsi,
	netdev

On 09/05/2014 08:37 AM, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 
> Its reported that mptsas can at times take over 30 seconds
> to recognize SCSI storage devices [0], this is done on the
> driver's probe path. Use the the new asynch probe to
> circumvent systemd from killing this driver.
> 
> [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705
> 
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Joseph Salisbury <joseph.salisbury@canonical.com>
> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
> Cc: Tim Gardner <tim.gardner@canonical.com>
> Cc: Pierre Fersing <pierre-fersing@pierref.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Benjamin Poirier <bpoirier@suse.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Nagalakshmi Nandigama <nagalakshmi.nandigama@avagotech.com>
> Cc: Praveen Krishnamoorthy <praveen.krishnamoorthy@avagotech.com>
> Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
> Cc: Abhijit Mahajan <abhijit.mahajan@avagotech.com>
> Cc: Hariprasad S <hariprasad@chelsio.com>
> Cc: Santosh Rastapur <santosh@chelsio.com>
> Cc: Casey Leedom <leedom@chelsio.com>
> Cc: MPT-FusionLinux.pdl@avagotech.com
> Cc: linux-scsi@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>  drivers/message/fusion/mptsas.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
> index 0707fa2..6dfee95 100644
> --- a/drivers/message/fusion/mptsas.c
> +++ b/drivers/message/fusion/mptsas.c
> @@ -5385,6 +5385,7 @@ static struct pci_driver mptsas_driver = {
>  	.suspend	= mptscsih_suspend,
>  	.resume		= mptscsih_resume,
>  #endif
> +	.driver.async_probe = true,
>  };
>  
>  static int __init
> 
This is the wrong approach.
First of all, mptsas, mpt2sas, and mpt3sas all share the same
driver layout, so any issue happening with this driver will most
likely affect the others, too.
Secondly, the driver is event-based anyway, so we should be moving
the initialisation to the already existing event handler:

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index 0707fa2..6f41e2c 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -5305,7 +5305,7 @@ mptsas_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        /* older firmware doesn't support expander events */
        if ((ioc->facts.HeaderVersion >> 8) < 0xE)
                ioc->old_sas_discovery_protocal = 1;
-       mptsas_scan_sas_topology(ioc);
+       mptsas_queue_rescan(ioc);
        mptsas_fw_event_on(ioc);
        return 0;
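
The idea behind this change can be sketched outside the kernel: probe
only queues a rescan request, and the expensive topology scan runs
later from the event handler. This is a toy userspace model under
stated assumptions, not the mptsas code; struct ioc_model,
queue_rescan(), probe_model() and handle_events() are hypothetical
names chosen for illustration.

```c
#include <stddef.h>

#define MAX_EVENTS 8

/* Hypothetical event type; the real driver has its own event codes. */
enum model_event { EV_RESCAN = 1 };

/* Hypothetical, trimmed-down stand-in for the adapter state. */
struct ioc_model {
	int pending[MAX_EVENTS];	/* queued events, oldest first */
	size_t npending;		/* number of queued events */
	int topology_scanned;		/* set once the scan has run */
};

/* Queue a rescan request instead of scanning inline. */
static int queue_rescan(struct ioc_model *ioc)
{
	if (ioc->npending >= MAX_EVENTS)
		return -1;		/* queue full */
	ioc->pending[ioc->npending++] = EV_RESCAN;
	return 0;
}

/* Probe returns quickly: it never performs the scan itself. */
static int probe_model(struct ioc_model *ioc)
{
	return queue_rescan(ioc);
}

/* Event handler: runs later and does the slow work. */
static void handle_events(struct ioc_model *ioc)
{
	for (size_t i = 0; i < ioc->npending; i++)
		if (ioc->pending[i] == EV_RESCAN)
			ioc->topology_scanned = 1;	/* placeholder for the scan */
	ioc->npending = 0;
}
```

In this model the time spent in probe no longer depends on how long
the scan takes, which is the property the one-line change is after.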



^ permalink raw reply related	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05  7:19     ` Tejun Heo
  (?)
@ 2014-09-05  7:47       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-05  7:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg Kroah-Hartman, Dmitry Torokhov, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Fri, Sep 5, 2014 at 12:19 AM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Sep 04, 2014 at 11:37:24PM -0700, Luis R. Rodriguez wrote:
> ...
>> +             /*
>> +              * I got SIGKILL, but wait for 60 more seconds for completion
>> +              * unless chosen by the OOM killer. This delay is there as a
>> +              * workaround for boot failure caused by SIGKILL upon device
>> +              * driver initialization timeout.
>> +              *
>> +              * N.B. this will actually let the thread complete regularly,
>> +              * wait_for_completion() will be used eventually, the 60 second
>> +              * try here is just to check for the OOM over that time.
>> +              */
>> +             WARN_ONCE(!test_thread_flag(TIF_MEMDIE),
>> +                       "Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe\n");
>> +             for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++)
>> +                     if (wait_for_completion_timeout(&done, HZ))
>> +                             goto wait_done;
>> +
>
> Ugh... Jesus, this is way too hacky, so now we fail on 90s timeout
> instead of 30?

Nope! I fell into the same trap, and only thanks to tons of patience
from Tetsuo was I able to grok that the 60 seconds here are not for
increasing the timeout; this is just time spent checking that the OOM
killer wasn't the one that triggered the SIGKILL. Even if the driver
took eons it should be fine now, I tried it :D
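
To make that distinction concrete, the bounded loop in the quoted hunk
can be modeled in userspace. This is only a sketch under stated
assumptions: struct waiter, oom_victim(), wait_one_tick() and
bounded_wait() are hypothetical names, not kernel APIs; one "tick"
stands in for a single wait_for_completion_timeout(&done, HZ) call;
and whatever the real patch does after the loop is not modeled,
because the quoted hunk does not show it.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the state the quoted loop inspects. */
struct waiter {
	int ticks_until_done;	/* tick at which the "completion" fires */
	int oom_at_tick;	/* tick at which the OOM flag is set, -1 = never */
	int now;		/* ticks elapsed so far */
};

/* Stand-in for test_thread_flag(TIF_MEMDIE). */
static bool oom_victim(const struct waiter *w)
{
	return w->oom_at_tick >= 0 && w->now >= w->oom_at_tick;
}

/* Stand-in for wait_for_completion_timeout(&done, HZ): one tick passes. */
static bool wait_one_tick(struct waiter *w)
{
	w->now++;
	return w->now >= w->ticks_until_done;
}

/*
 * Mirrors the shape of the quoted loop:
 *   for (i = 0; i < 60 && !oom; i++) if (wait...) goto wait_done;
 * Returns true if the completion was observed inside the window.
 */
static bool bounded_wait(struct waiter *w, int max_ticks)
{
	for (int i = 0; i < max_ticks && !oom_victim(w); i++)
		if (wait_one_tick(w))
			return true;
	return false;
}
```

In this model a completion that arrives within the window ends the
loop early, and the loop also stops as soon as the OOM flag is seen;
the window only bounds how long the OOM check is repeated.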

>  Why do we even need this with the proposed async
> probing changes?

Ah -- well without it the way we "find" drivers that need this new
"async feature" is by a bug report and folks saying their system can't
boot, or they say their device doesn't come up. That's all. Tracing
this to systemd and a timeout was one of the ugliest things ever.
There are two insane bug reports you can go check:

mptsas was the first:

http://article.gmane.org/gmane.linux.kernel/1669550
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248

Then cxgb4:

https://bugzilla.novell.com/show_bug.cgi?id=877622

I only had Cc'd you on the newest gem pata_marvell :

https://bugzilla.kernel.org/show_bug.cgi?id=59581

We can't seriously expect to be doing all this work for every driver.
A WARN_ONCE() would enable us to find the drivers that need this new
async probe "feature".

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05  7:47       ` Luis R. Rodriguez
  (?)
@ 2014-09-05  9:14         ` Mike Galbraith
  -1 siblings, 0 replies; 227+ messages in thread
From: Mike Galbraith @ 2014-09-05  9:14 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Tejun Heo, Greg Kroah-Hartman, Dmitry Torokhov, Wu Zhangjin,
	Takashi Iwai, Arjan van de Ven, linux-kernel, Oleg Nesterov,
	hare, Andrew Morton, Tetsuo Handa, Joseph Salisbury,
	Benjamin Poirier, Santosh Rastapur, Kay Sievers,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Fri, 2014-09-05 at 00:47 -0700, Luis R. Rodriguez wrote: 
> On Fri, Sep 5, 2014 at 12:19 AM, Tejun Heo <tj@kernel.org> wrote:
> > On Thu, Sep 04, 2014 at 11:37:24PM -0700, Luis R. Rodriguez wrote:
> > ...
> >> +             /*
> >> +              * I got SIGKILL, but wait for 60 more seconds for completion
> >> +              * unless chosen by the OOM killer. This delay is there as a
> >> +              * workaround for boot failure caused by SIGKILL upon device
> >> +              * driver initialization timeout.
> >> +              *
> >> +              * N.B. this will actually let the thread complete regularly,
> >> +              * wait_for_completion() will be used eventually, the 60 second
> >> +              * try here is just to check for the OOM over that time.
> >> +              */
> >> +             WARN_ONCE(!test_thread_flag(TIF_MEMDIE),
> >> +                       "Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe\n");
> >> +             for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++)
> >> +                     if (wait_for_completion_timeout(&done, HZ))
> >> +                             goto wait_done;
> >> +
> >
> > Ugh... Jesus, this is way too hacky, so now we fail on 90s timeout
> > instead of 30?
> 
> Nope! I fell into the same trap, and only thanks to tons of patience
> from Tetsuo was I able to grok that the 60 seconds here are not for
> increasing the timeout; this is just time spent checking that the OOM
> killer wasn't the one that triggered the SIGKILL. Even if the driver
> took eons it should be fine now, I tried it :D
> 
> >  Why do we even need this with the proposed async
> > probing changes?
> 
> Ah -- well without it the way we "find" drivers that need this new
> "async feature" is by a bug report and folks saying their system can't
> boot, or they say their device doesn't come up. That's all. Tracing
> this to systemd and a timeout was one of the ugliest things ever.
> There are two insane bug reports you can go check:
> 
> mptsas was the first:
> 
> http://article.gmane.org/gmane.linux.kernel/1669550
> https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248

<quote>
(2) Currently systemd-udevd unconditionally sends SIGKILL upon hardcoded
    30 seconds timeout. As a result, finit_module() of mptsas kernel
    module receives SIGKILL when waiting for error handler thread to be
    started.
</quote>

Hm.  Why is this not a systemd-udevd bug for running around killing
stuff when it has no idea whether progress is being made or not?

-Mike


^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05  6:37   ` Luis R. Rodriguez
@ 2014-09-05 10:59     ` Oleg Nesterov
  -1 siblings, 0 replies; 227+ messages in thread
From: Oleg Nesterov @ 2014-09-05 10:59 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: gregkh, dmitry.torokhov, falcon, tiwai, tj, arjan, linux-kernel,
	hare, akpm, penguin-kernel, joseph.salisbury, bpoirier, santosh,
	Luis R. Rodriguez, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, linux-scsi, netdev

On 09/04, Luis R. Rodriguez wrote:
>
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> The new umh kill option has allowed kthreads to receive
> kill signals but they are generally accepting all sources
> of kill signals

And I think this is right,

> while the original motivation was to enable
> through the OOM from sending the kill.

even if the main concern was OOM.

> Users can provide a log output and it should be clear on
> the trace what probe / driver got the kill signal.

Well, if you need a WARN output, perhaps you could just add
WARN_ON(fatal_signal_pending(current)) at the end of load_module()?

kthread_create() is not the only thing that can fail if systemd
sends SIGKILL.

> Although Oleg had rejected a
> similar change a while ago

And honestly, I still dislike this change.

Oleg.


^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 2/6] driver-core: add driver async_probe support
  2014-09-05  6:37 ` [RFC v2 2/6] driver-core: add driver async_probe support Luis R. Rodriguez
@ 2014-09-05 11:24     ` Oleg Nesterov
  2014-09-05 22:10   ` Dmitry Torokhov
  1 sibling, 0 replies; 227+ messages in thread
From: Oleg Nesterov @ 2014-09-05 11:24 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: gregkh, dmitry.torokhov, falcon, tiwai, tj, arjan, linux-kernel,
	hare, akpm, penguin-kernel, joseph.salisbury, bpoirier, santosh,
	Luis R. Rodriguez, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, linux-scsi, netdev

On 09/04, Luis R. Rodriguez wrote:
>
>  struct driver_private {
>  	struct kobject kobj;
>  	struct klist klist_devices;
>  	struct klist_node knode_bus;
>  	struct module_kobject *mkobj;
> +	struct driver_attach_work *attach_work;
>  	struct device_driver *driver;

I am not arguing, just curious...

Are you trying to shrink sizeof(driver_private)? The code can be simpler
if you just embed "struct work_struct attach_work" into driver_private;
you then need neither "struct driver_attach_work" nor another ->driver
pointer.
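
A minimal userspace sketch of the embedding approach, assuming
simplified stand-ins for the kernel types: the structs below are
trimmed to the fields needed here, container_of() is re-created with
offsetof(), and driver_attach_workfn() and run_work() are
illustrative names rather than the code from the patch.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

struct device_driver {
	const char *name;
	int attached;
};

struct driver_private {
	struct work_struct attach_work;	/* embedded, not a pointer */
	struct device_driver *driver;
};

/* Work callback: recover the owning driver_private from the work item. */
static void driver_attach_workfn(struct work_struct *work)
{
	struct driver_private *priv =
		container_of(work, struct driver_private, attach_work);

	priv->driver->attached = 1;	/* placeholder for the attach step */
}

/* Stand-in for queueing the work and having a worker run it. */
static void run_work(struct work_struct *work)
{
	work->func(work);
}
```

Because the work item lives inside driver_private, the callback can
reach ->driver through the existing pointer, with no separate
allocation and no second back-pointer.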

Oleg.


^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05  7:47       ` Luis R. Rodriguez
  (?)
@ 2014-09-05 14:12         ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-05 14:12 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Greg Kroah-Hartman, Dmitry Torokhov, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Fri, Sep 05, 2014 at 12:47:16AM -0700, Luis R. Rodriguez wrote:
> Ah -- well without it the way we "find" drivers that need this new
> "async feature" is by a bug report and folks saying their system can't
> boot, or they say their device doesn't come up. That's all. Tracing
> this to systemd and a timeout was one of the ugliest things ever.
> There are two insane bug reports you can go check:
> 
> mptsas was the first:
> 
> http://article.gmane.org/gmane.linux.kernel/1669550
> https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248
> 
> Then cxgb4:
> 
> https://bugzilla.novell.com/show_bug.cgi?id=877622
> 
> I only had Cc'd you on the newest gem pata_marvell :
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=59581
> 
> We can't seriously expect to be doing all this work for every driver.
> A WARN_ONCE() would enable us to find the drivers that need this new
> async probe "feature".

This whole approach of trying to mark specific drivers as needing
"async probing" is completely broken for the problem at hand.  It
can't address the problem adequately while breaking backward
compatibility.  I don't think this makes much sense.

Nacked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 14:12         ` Tejun Heo
@ 2014-09-05 16:44           ` Dmitry Torokhov
  -1 siblings, 0 replies; 227+ messages in thread
From: Dmitry Torokhov @ 2014-09-05 16:44 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Luis R. Rodriguez, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Friday, September 05, 2014 11:12:41 PM Tejun Heo wrote:
> On Fri, Sep 05, 2014 at 12:47:16AM -0700, Luis R. Rodriguez wrote:
> > Ah -- well without it the way we "find" drivers that need this new
> > "async feature" is by a bug report and folks saying their system can't
> > boot, or they say their device doesn't come up. That's all. Tracing
> > this to systemd and a timeout was one of the most ugliest things ever.
> > There two insane bug reports you can go check:
> > 
> > mptsas was the first:
> > 
> > http://article.gmane.org/gmane.linux.kernel/1669550
> > https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248
> > 
> > Then cxgb4:
> > 
> > https://bugzilla.novell.com/show_bug.cgi?id=877622
> > 
> > I only had Cc'd you on the newest gem pata_marvell :
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=59581
> > 
> > We can't seriously expect to be doing all this work for every driver.
> > a WARN_ONCE() would enable us to find the drivers that need this new
> > async probe "feature".
> 
> This whole approach of trying to mark specific drivers as needing
> "async probing" is completely broken for the problem at hand.  It
> can't address the problem adequately while breaking backward
> compatibility.  I don't think this makes much sense.
> 

Which problem are we talking about here though? It does solve the slow device
stalling the rest of the kernel booting (non-module case) for me.

I also reject the notion that anyone should be relying on drivers to be fully
bound on module loading. It is not the nineties anymore. We have hot-pluggable
buses, deferred probing, and even for non-hot-pluggable ones the module
providing the device itself might not be loaded yet. Any scripts that expect to
find the device 100% ready after module loading are simply broken.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 2/6] driver-core: add driver async_probe support
  2014-09-05 11:24     ` Oleg Nesterov
@ 2014-09-05 17:25       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-05 17:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Luis R. Rodriguez, gregkh, dmitry.torokhov, falcon, tiwai, tj,
	arjan, linux-kernel, hare, akpm, penguin-kernel,
	joseph.salisbury, bpoirier, santosh, Kay Sievers,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	linux-scsi, netdev

On Fri, Sep 05, 2014 at 01:24:17PM +0200, Oleg Nesterov wrote:
> On 09/04, Luis R. Rodriguez wrote:
> >
> >  struct driver_private {
> >  	struct kobject kobj;
> >  	struct klist klist_devices;
> >  	struct klist_node knode_bus;
> >  	struct module_kobject *mkobj;
> > +	struct driver_attach_work *attach_work;
> >  	struct device_driver *driver;
> 
> I am not arguing, just curious...
> 
> Are you trying to shrink sizeof(driver_private) ? 

Yeap.

> The code can be simpler
> if you just embedd "struct work_struct attach_work" into driver_private,
> and you do not need "struct driver_attach_work" or another ->driver pointer
> this way.

Agreed, I considered it and figured it wouldn't make much sense
to push more bytes onto folks if this feature was optional and
likely only used by a few drivers, so a pointer / kzalloc seemed
better to deal with. This saves us 24 bytes. I even tried to
implement a container_of_p() for pointers but that obviously
didn't work well, as a pointer can have any address and is
not relative to the parent, and if it's on the stack the address
can vary depending on implementation. For example the first member
should always have the same address as the struct but if the
first member is a pointer it would be off for me by 12 bytes.
I am not sure if this is standardized or not.
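
A minimal userspace sketch of the two layouts discussed above, for
illustration only. The struct and function names are hypothetical stand-ins,
not the kernel's definitions, and the container_of() here is a simplified
version without type checking. It shows why an embedded work item can be
mapped back to its parent with container_of(), while a separately allocated
one needs its own back-pointer: container_of() only subtracts the member's
offset from the member's own address, and a kzalloc()'d object has no fixed
offset from the parent.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(); no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the kernel's struct work_struct. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

/* Layout 1: work item embedded in the parent (Oleg's suggestion). */
struct priv_embedded {
	int id;                          /* placeholder for the other members */
	struct work_struct attach_work;
};

/* Layout 2: work item allocated separately, reached through a pointer. */
struct priv_pointer;
struct attach_work {
	struct work_struct work;
	struct priv_pointer *priv;       /* explicit back-pointer is required */
};
struct priv_pointer {
	int id;
	struct attach_work *attach_work; /* costs one pointer when unused */
};

/* Embedded case: the parent is at a fixed offset from the member. */
static struct priv_embedded *priv_from_embedded(struct work_struct *work)
{
	assert(work != NULL);
	return container_of(work, struct priv_embedded, attach_work);
}

/* Pointer case: recover the wrapper first, then follow the back-pointer. */
static struct priv_pointer *priv_from_allocated(struct work_struct *work)
{
	struct attach_work *aw;

	assert(work != NULL);
	aw = container_of(work, struct attach_work, work);
	return aw->priv;
}
```

The trade-off is the one described in the message: the embedded layout is
simpler but grows every parent structure, while the pointer layout costs one
pointer per parent plus an allocation only for the drivers that opt in.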

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 10:59     ` Oleg Nesterov
@ 2014-09-05 17:35       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-05 17:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Luis R. Rodriguez, gregkh, dmitry.torokhov, falcon, tiwai, tj,
	arjan, linux-kernel, hare, akpm, penguin-kernel,
	joseph.salisbury, bpoirier, santosh, Kay Sievers,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	linux-scsi, netdev

On Fri, Sep 05, 2014 at 12:59:49PM +0200, Oleg Nesterov wrote:
> On 09/04, Luis R. Rodriguez wrote:
> >
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > The new umh kill option has allowed kthreads to receive
> > kill signals but they are generally accepting all sources
> > of kill signals
> 
> And I think this is right,
> 
> > while the original motivation was to enable
> > through the OOM from sending the kill.
> 
> even if the main concern was OOM.
> 
> > Users can provide a log output and it should be clear on
> > the trace what probe / driver got the kill signal.
> 
> Well, if you need a WARN output, perhaps you could just add
> WARN_ON(fatal_signal_pending()) at the end of load_module() ?

We could and that's a good idea, thanks! This change, however, would
at least allow the device to be functional in case the
kill was received during kthread usage, but it would certainly
also set a precedent for doing similar things in the kernel,
which I agree is hacky. If we at least had
WARN_ON(fatal_signal_pending()) upstream, as you note, then
I think it would be a reasonable compromise.

> Not only kthread_create() can fail if systemd sends SIGKILL.

Sure, although it's currently the only source found and debugged.

> > Although Oleg had rejected a
> > similar change a while ago
> 
> And honestly, I still dislike this change.

Don't blame you. The code is sensitive and hacky.

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 16:44           ` Dmitry Torokhov
@ 2014-09-05 17:49             ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-05 17:49 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Luis R. Rodriguez, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

Hello,

On Fri, Sep 05, 2014 at 09:44:05AM -0700, Dmitry Torokhov wrote:
> Which problem are we talking about here though? It does solve the slow device
> stalling the rest of the kernel booting (non-module case) for me.

The other one.  The one with the timeout.  Neither cxgb4 nor pata_marvell
has the problem of slow probing stalling boot.

> I also reject the notion that anyone should be relying on drivers to be fully
> bound on module loading. It is not the nineties anymore. We have hot-pluggable
> buses, deferred probing, and even for non-hot-pluggable ones the module
> providing the device itself might not be loaded yet. Any scripts that expect to
> find the device 100% ready after module loading are simply broken.

We've been treating loading + probing as a single operation when
loading drivers and the assumption has always been that the existing
devices at the time of loading have finished probing by the time insmod
finishes.  We now need to split loading and probing and wait for each
of them differently.  The *only* thing we can do is somehow make the
issuer specify that it's gonna wait for probing separately.  I'm not
sure this can even be up for discussion.  We're talking about a major
userland-visible behavior change.  We simply can't change it
underneath the existing users.
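
To make "wait for probing separately" concrete, here is a minimal userspace
sketch, under stated assumptions. It assumes the caller knows a path that only
appears once the device is bound, for example the `driver` symlink under a
device's sysfs directory. The helper name wait_for_path() and the 10 ms
polling step are illustrative choices, not an existing API; a real tool would
more likely listen for udev events than poll.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>

/*
 * Poll until `path` exists or roughly `timeout_ms` milliseconds have passed.
 * Returns 0 once the path exists, or -1 with errno set to ETIMEDOUT.
 */
static int wait_for_path(const char *path, long timeout_ms)
{
	const long step_ms = 10;
	const struct timespec step = { .tv_sec = 0, .tv_nsec = step_ms * 1000000L };
	long waited_ms = 0;

	assert(path != NULL);

	for (;;) {
		if (access(path, F_OK) == 0)
			return 0;            /* the path showed up */
		if (waited_ms >= timeout_ms)
			break;               /* give up, report a timeout */
		nanosleep(&step, NULL);
		waited_ms += step_ms;
	}

	errno = ETIMEDOUT;
	return -1;
}
```

A caller would first load the module and then, as a separate step, call
wait_for_path() on the path it cares about with a timeout of its own
choosing, so the wait for probing no longer hides inside the module load.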

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 17:49             ` Tejun Heo
@ 2014-09-05 18:10               ` Dmitry Torokhov
  -1 siblings, 0 replies; 227+ messages in thread
From: Dmitry Torokhov @ 2014-09-05 18:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Luis R. Rodriguez, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Sat, Sep 06, 2014 at 02:49:25AM +0900, Tejun Heo wrote:
> Hello,
> 
> On Fri, Sep 05, 2014 at 09:44:05AM -0700, Dmitry Torokhov wrote:
> > Which problem are we talking about here though? It does solve the slow device
> > stalling the rest of the kernel booting (non-module case) for me.
> 
> The other one.  The one with the timeout.  Neither cxgb4 nor pata_marvell
> has the problem of slow probing stalling boot.
> 
> > I also reject the notion that anyone should be relying on drivers to be fully
> > bound on module loading. It is not the nineties anymore. We have hot-pluggable
> > buses, deferred probing, and even for non-hot-pluggable ones the module
> > providing the device itself might not be loaded yet. Any scripts that expect to
> > find the device 100% ready after module loading are simply broken.
> 
> We've been treating loading + probing as a single operation when
> loading drivers and the assumption has always been that the existing
> devices at the time of loading have finished probing by the time insmod
> finishes.  We now need to split loading and probing and wait for each
> of them differently.  The *only* thing we can do is somehow make the
> issuer specify that it's gonna wait for probing separately.  I'm not
> sure this can even be up for discussion.  We're talking about a major
> userland-visible behavior change.

I do not agree that it is actually a user-visible change: generally speaking you
do not really know if the device is there or not. They come and go. Like I said,
consider all permutations, with hot-pluggable buses, deferred probing, etc.,
etc.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 17:49             ` Tejun Heo
  (?)
@ 2014-09-05 18:12               ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-05 18:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Dmitry Torokhov, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Fri, Sep 5, 2014 at 10:49 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Fri, Sep 05, 2014 at 09:44:05AM -0700, Dmitry Torokhov wrote:
>> Which problem are we talking about here though? It does solve the slow device
>> stalling the rest of the kernel booting (non-module case) for me.
>
> The other one.  The one with the timeout.  Neither cxgb4 nor pata_marvell
> has the problem of slow probing stalling boot.
>
>> I also reject the notion that anyone should be relying on drivers to be fully
>> bound on module loading. It is not the nineties anymore. We have hot-pluggable
>> buses, deferred probing, and even for non-hot-pluggable ones the module
>> providing the device itself might not be loaded yet. Any scripts that expect to
>> find the device 100% ready after module loading are simply broken.
>
> We've been treating loading + probing as a single operation when
> loading drivers and the assumption has always been that the existing
> devices at the time of loading have finished probing by the time insmod
> finishes.  We now need to split loading and probing and wait for each
> of them differently.  The *only* thing we can do is somehow make the
> issuer specify that it's gonna wait for probing separately.  I'm not
> sure this can even be up for discussion.  We're talking about a major
> userland-visible behavior change.  We simply can't change it
> underneath the existing users.

Meanwhile we are allowing a major design consideration such as a 30
second timeout for both init + probe to all of a sudden become a hard
requirement for device drivers. I see your point but we also can't be
introducing major design changes willy-nilly. We *need* a
solution for the affected drivers.

Also what stops drivers from going ahead and just implementing their
own async probe? Would that now be frowned upon as it strays away
from the original design? The bool would let those drivers do this
easily, and we would still need to identify these drivers. Although
this particular change can be NAK'd, Oleg's suggestion of a
WARN_ON(fatal_signal_pending()) at the end of load_module() seems to me
at least needed. And if it's not async probe... what do those with
failed drivers do?

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 18:12               ` Luis R. Rodriguez
  (?)
@ 2014-09-05 18:29                 ` Dmitry Torokhov
  -1 siblings, 0 replies; 227+ messages in thread
From: Dmitry Torokhov @ 2014-09-05 18:29 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Tejun Heo, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Fri, Sep 05, 2014 at 11:12:17AM -0700, Luis R. Rodriguez wrote:
> On Fri, Sep 5, 2014 at 10:49 AM, Tejun Heo <tj@kernel.org> wrote:
> > Hello,
> >
> > On Fri, Sep 05, 2014 at 09:44:05AM -0700, Dmitry Torokhov wrote:
> >> Which problem are we talking about here though? It does solve the slow device
> >> stalling the rest of the kernel booting (non-module case) for me.
> >
> > The other one.  The one with the timeout.  Neither cxgb4 nor pata_marvell
> > has the problem of slow probing stalling boot.
> >
> >> I also reject the notion that anyone should be relying on drivers to be fully
> >> bound on module loading. It is not the nineties anymore. We have hot-pluggable
> >> buses, deferred probing, and even for non-hot-pluggable ones the module
> >> providing the device itself might not be loaded yet. Any scripts that expect to
> >> find the device 100% ready after module loading are simply broken.
> >
> > We've been treating loading + probing as a single operation when
> > loading drivers and the assumption has always been that the existing
> > devices at the time of loading have finished probing by the time insmod
> > finishes.  We now need to split loading and probing and wait for each
> > of them differently.  The *only* thing we can do is somehow make the
> > issuer specify that it's gonna wait for probing separately.  I'm not
> > sure this can even be up for discussion.  We're talking about a major
> > userland-visible behavior change.  We simply can't change it
> > underneath the existing users.
> 
> Meanwhile we are allowing a major design consideration such as a 30
> second timeout for both init + probe to all of a sudden become a hard
> requirement for device drivers. I see your point but we also can't be
> introducing major design changes willy-nilly. We *need* a
> solution for the affected drivers.
> 
> Also what stops drivers from going ahead and just implementing their
> own async probe?

They already do, and the problem is that they do it poorly. One of the issues
is that the device is considered bound, so the core may attempt to
suspend/resume it, or unbind it, while the driver is not yet ready for such
operations to take place.
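
To make the hazard concrete, here is a minimal userspace sketch (not kernel
code). All names in it (model_dev, homegrown_probe, model_suspend, flush_probe,
safe_suspend) are hypothetical stand-ins: the driver's probe returns right
away, the core then treats the device as bound, and a suspend that arrives
before the deferred half has run finds the device not ready. Flushing the
pending work first, much as the RFC does with flush_work() before remove(),
closes that window.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one device as seen by the core and by the driver. */
struct model_dev {
	bool bound;        /* what the driver core believes */
	bool ready;        /* whether driver initialization really finished */
	bool work_pending; /* deferred half of probe has not run yet */
};

/* Deferred half of a home-grown asynchronous probe. */
static void probe_deferred(struct model_dev *d)
{
	d->ready = true;
	d->work_pending = false;
}

/* Home-grown async probe: queues the real work and returns immediately. */
static int homegrown_probe(struct model_dev *d)
{
	d->work_pending = true;
	d->bound = true; /* the core now considers the device bound */
	return 0;
}

/* Suspend as the core might issue it; -1 means it hit an unready device. */
static int model_suspend(const struct model_dev *d)
{
	if (!d->bound)
		return 0;
	return d->ready ? 0 : -1;
}

/* Run any pending deferred work, loosely analogous to flush_work(). */
static void flush_probe(struct model_dev *d)
{
	if (d->work_pending)
		probe_deferred(d);
}

/* Suspend that waits for the deferred probe to finish first. */
static int safe_suspend(struct model_dev *d)
{
	flush_probe(d);
	return model_suspend(d);
}
```

In this model, calling model_suspend() straight after homegrown_probe() fails,
while safe_suspend() succeeds because it drains the pending work first.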

And even though the driver is bound "synchronously", that does not help the
user in the slightest: the object that results from driver initialization is
still created asynchronously and is not ready (well, it might be if drivers use
async_schedule(), as we are doing async_synchronize_full() in module load/unload).

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 2/6] driver-core: add driver async_probe support
  2014-09-05  6:37 ` [RFC v2 2/6] driver-core: add driver async_probe support Luis R. Rodriguez
  2014-09-05 11:24     ` Oleg Nesterov
@ 2014-09-05 22:10   ` Dmitry Torokhov
  2014-10-20 23:43       ` Luis R. Rodriguez
  1 sibling, 1 reply; 227+ messages in thread
From: Dmitry Torokhov @ 2014-09-05 22:10 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: gregkh, falcon, tiwai, tj, arjan, linux-kernel, oleg, hare, akpm,
	penguin-kernel, joseph.salisbury, bpoirier, santosh,
	Luis R. Rodriguez, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, linux-scsi, netdev

Hi Luis,

On Thu, Sep 04, 2014 at 11:37:23PM -0700, Luis R. Rodriguez wrote:
> 2) when a built-in driver takes a few seconds to initialize its
>    delays can stall the overall boot process

This patch does not solve the 2nd issue fully as it only calls probe
asynchronously during driver registration (and also only for modules???
It checks drv->owner in a few places). The device may get created
after the driver is initialized; in this case we still want probe to be
called asynchronously.

I think something like the patch below should work. Note that it uses
async_schedule(), so for the moment that will satisfy Tejun's objections
to the behavior with regard to module loading and initialization, but it
does not solve your issue with modules being killed after 30 seconds.
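
As a rough illustration of the attach order that patch aims for, here is a
userspace model (not kernel code). The types and function names
(model_driver, attach_data, model_device_attach, and so on) are hypothetical
stand-ins; only the three flags check_async, want_async and have_async are
taken from the patch. A first pass tries synchronous drivers only, and an
asynchronous pass is scheduled only if nothing bound and some matching driver
asked for async probing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_driver. */
struct model_driver {
	const char *name;
	int match_id;     /* device id this driver matches */
	bool async_probe; /* mirrors the drv->async_probe flag */
};

/* Mirrors the fields of struct device_attach_data in the patch. */
struct attach_data {
	int dev_id;
	bool check_async;
	bool want_async;
	bool have_async;
};

/* Returns 1 when the device would be bound by this driver, else 0. */
static int attach_driver(const struct model_driver *drv,
			 struct attach_data *data)
{
	if (drv->match_id != data->dev_id)
		return 0;
	if (drv->async_probe)
		data->have_async = true;
	if (data->check_async && drv->async_probe != data->want_async)
		return 0;
	return 1; /* stands in for a successful driver_probe_device() */
}

/* Walk the driver list until one binds, like bus_for_each_drv(). */
static bool walk_drivers(const struct model_driver *drvs, size_t n,
			 struct attach_data *data)
{
	for (size_t i = 0; i < n; i++)
		if (attach_driver(&drvs[i], data))
			return true;
	return false;
}

enum attach_result { ATTACH_NONE, ATTACH_SYNC, ATTACH_ASYNC_SCHEDULED };

/* Models the decision made for one device, not the probing itself. */
static enum attach_result model_device_attach(const struct model_driver *drvs,
					      size_t n, int dev_id,
					      bool allow_async)
{
	struct attach_data data = {
		.dev_id = dev_id,
		.check_async = allow_async,
		.want_async = false,
	};

	if (walk_drivers(drvs, n, &data))
		return ATTACH_SYNC;
	if (allow_async && data.have_async)
		return ATTACH_ASYNC_SCHEDULED;
	return ATTACH_NONE;
}
```

With allow_async false (the manual-binding path) an async-capable driver is
still probed synchronously; with allow_async true it is skipped in the first
pass and left to the scheduled asynchronous pass.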

To tell the truth I think systemd should not be doing it; it is not its
place to dictate how long a module should take to load. It may print
warnings and we'll work on fixing the drivers, but aborting boot just
because it feels a module took too long is not a good idea.

Thanks.

-- 
Dmitry


driver-core: add driver async_probe support

From: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Some devices take a long time to initialize, and not all drivers are
suited to initializing their devices when they are opened. For example, input
drivers need to interrogate the device in order to publish its capabilities
before userspace will open it. When such drivers are compiled into the kernel
they may stall the entire kernel initialization.

This change allows drivers to request that their probe functions be called
asynchronously during driver and device registration (manual binding is
still synchronous). Because async_schedule() is used to perform the
asynchronous calls, module loading will still wait for probing to complete.

This is based on earlier patch by "Luis R. Rodriguez" <mcgrof@suse.com>

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 drivers/base/bus.c     |   31 ++++++++++----
 drivers/base/dd.c      |  106 +++++++++++++++++++++++++++++++++++++++---------
 include/linux/device.h |    2 +
 3 files changed, 112 insertions(+), 27 deletions(-)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 83e910a..49fe573 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -10,6 +10,7 @@
  *
  */
 
+#include <linux/async.h>
 #include <linux/device.h>
 #include <linux/module.h>
 #include <linux/errno.h>
@@ -547,15 +548,12 @@ void bus_probe_device(struct device *dev)
 {
 	struct bus_type *bus = dev->bus;
 	struct subsys_interface *sif;
-	int ret;
 
 	if (!bus)
 		return;
 
-	if (bus->p->drivers_autoprobe) {
-		ret = device_attach(dev);
-		WARN_ON(ret < 0);
-	}
+	if (bus->p->drivers_autoprobe)
+		device_initial_probe(dev);
 
 	mutex_lock(&bus->p->mutex);
 	list_for_each_entry(sif, &bus->p->interfaces, node)
@@ -657,6 +655,17 @@ static ssize_t uevent_store(struct device_driver *drv, const char *buf,
 }
 static DRIVER_ATTR_WO(uevent);
 
+static void driver_attach_async(void *_drv, async_cookie_t cookie)
+{
+	struct device_driver *drv = _drv;
+	int ret;
+
+	ret = driver_attach(drv);
+
+	pr_debug("bus: '%s': driver %s async attach completed: %d\n",
+		 drv->bus->name, drv->name, ret);
+}
+
 /**
  * bus_add_driver - Add a driver to the bus.
  * @drv: driver.
@@ -689,9 +698,15 @@ int bus_add_driver(struct device_driver *drv)
 
 	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
 	if (drv->bus->p->drivers_autoprobe) {
-		error = driver_attach(drv);
-		if (error)
-			goto out_unregister;
+		if (drv->async_probe) {
+			pr_debug("bus: '%s': probing driver %s asynchronously\n",
+				drv->bus->name, drv->name);
+			async_schedule(driver_attach_async, drv);
+		} else {
+			error = driver_attach(drv);
+			if (error)
+				goto out_unregister;
+		}
 	}
 	module_add_driver(drv->owner, drv);
 
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index e4ffbcf..67a2f85 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -402,31 +402,52 @@ int driver_probe_device(struct device_driver *drv, struct device *dev)
 	return ret;
 }
 
-static int __device_attach(struct device_driver *drv, void *data)
+struct device_attach_data {
+	struct device *dev;
+	bool check_async;
+	bool want_async;
+	bool have_async;
+};
+
+static int __device_attach_driver(struct device_driver *drv, void *_data)
 {
-	struct device *dev = data;
+	struct device_attach_data *data = _data;
+	struct device *dev = data->dev;
 
 	if (!driver_match_device(drv, dev))
 		return 0;
 
+	if (drv->async_probe)
+		data->have_async = true;
+
+	if (data->check_async && drv->async_probe != data->want_async)
+		return 0;
+
 	return driver_probe_device(drv, dev);
 }
 
-/**
- * device_attach - try to attach device to a driver.
- * @dev: device.
- *
- * Walk the list of drivers that the bus has and call
- * driver_probe_device() for each pair. If a compatible
- * pair is found, break out and return.
- *
- * Returns 1 if the device was bound to a driver;
- * 0 if no matching driver was found;
- * -ENODEV if the device is not registered.
- *
- * When called for a USB interface, @dev->parent lock must be held.
- */
-int device_attach(struct device *dev)
+static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
+{
+	struct device *dev = _dev;
+	struct device_attach_data data = {
+		.dev		= dev,
+		.check_async	= true,
+		.want_async	= true,
+	};
+
+	device_lock(dev);
+
+	bus_for_each_drv(dev->bus, NULL, &data, __device_attach_driver);
+	dev_dbg(dev, "async probe completed\n");
+
+	pm_request_idle(dev);
+
+	device_unlock(dev);
+
+	put_device(dev);
+}
+
+int __device_attach(struct device *dev, bool allow_async)
 {
 	int ret = 0;
 
@@ -444,15 +465,59 @@ int device_attach(struct device *dev)
 			ret = 0;
 		}
 	} else {
-		ret = bus_for_each_drv(dev->bus, NULL, dev, __device_attach);
-		pm_request_idle(dev);
+		struct device_attach_data data = {
+			.dev = dev,
+			.check_async = allow_async,
+			.want_async = false,
+		};
+
+		ret = bus_for_each_drv(dev->bus, NULL, &data,
+					__device_attach_driver);
+		if (!ret && allow_async && data.have_async) {
+			/*
+			 * If we could not find appropriate driver
+			 * synchronously and we are allowed to do
+			 * async probes and there are drivers that
+			 * want to probe asynchronously, we'll
+			 * try them.
+			 */
+			dev_dbg(dev, "scheduling asynchronous probe\n");
+			get_device(dev);
+			async_schedule(__device_attach_async_helper, dev);
+		} else {
+			pm_request_idle(dev);
+		}
 	}
 out_unlock:
 	device_unlock(dev);
 	return ret;
 }
+
+/**
+ * device_attach - try to attach device to a driver.
+ * @dev: device.
+ *
+ * Walk the list of drivers that the bus has and call
+ * driver_probe_device() for each pair. If a compatible
+ * pair is found, break out and return.
+ *
+ * Returns 1 if the device was bound to a driver;
+ * 0 if no matching driver was found;
+ * -ENODEV if the device is not registered.
+ *
+ * When called for a USB interface, @dev->parent lock must be held.
+ */
+int device_attach(struct device *dev)
+{
+	return __device_attach(dev, false);
+}
 EXPORT_SYMBOL_GPL(device_attach);
 
+void device_initial_probe(struct device *dev)
+{
+	__device_attach(dev, true);
+}
+
 static int __driver_attach(struct device *dev, void *data)
 {
 	struct device_driver *drv = data;
@@ -507,6 +572,9 @@ static void __device_release_driver(struct device *dev)
 
 	drv = dev->driver;
 	if (drv) {
+		if (drv->async_probe)
+			async_synchronize_full();
+
 		pm_runtime_get_sync(dev);
 
 		driver_sysfs_remove(dev);
diff --git a/include/linux/device.h b/include/linux/device.h
index 43d183a..c6fa2e7 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -233,6 +233,7 @@ struct device_driver {
 	const char		*mod_name;	/* used for built-in modules */
 
 	bool suppress_bind_attrs;	/* disables bind/unbind via sysfs */
+	bool async_probe;
 
 	const struct of_device_id	*of_match_table;
 	const struct acpi_device_id	*acpi_match_table;
@@ -966,6 +967,7 @@ extern int __must_check device_bind_driver(struct device *dev);
 extern void device_release_driver(struct device *dev);
 extern int  __must_check device_attach(struct device *dev);
 extern int __must_check driver_attach(struct device_driver *drv);
+extern void device_initial_probe(struct device *dev);
 extern int __must_check device_reprobe(struct device *dev);
 
 /*

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 18:10               ` Dmitry Torokhov
@ 2014-09-05 22:29                 ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-05 22:29 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Luis R. Rodriguez, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

Hello, Dmitry.

On Fri, Sep 05, 2014 at 11:10:03AM -0700, Dmitry Torokhov wrote:
> I do not agree that it is actually user-visible change: generally speaking you
> do not really know if device is there or not. They come and go. Like I said,
> consider all permutations, with hot-pluggable buses, deferred probing, etc,

It is for storage devices which always have guaranteed synchronous
probing on module load and well-defined probing order.  Sure, modern
setups are a lot more dynamic but I'm quite certain that there are
setups in the wild which depend on storage driver loading being
synchronous.  We can't simply declare one day that such behavior is
broken and break, most likely, their boots.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 22:29                 ` Tejun Heo
@ 2014-09-05 22:31                   ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-05 22:31 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Luis R. Rodriguez, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Sat, Sep 06, 2014 at 07:29:56AM +0900, Tejun Heo wrote:
> It is for storage devices which always have guaranteed synchronous
> probing on module load and well-defined probing order.  Sure, modern
> setups are a lot more dynamic but I'm quite certain that there are
> setups in the wild which depend on storage driver loading being
> synchronous.  We can't simply declare one day that such behavior is
> broken and break, most likely, their boots.

To add a bit, if the argument here is that dependency on such behavior
shouldn't exist and module loading and device probing should always be
asynchronous, the right approach is implementing "synchronous_probing"
flag not the other way around.  I actually wouldn't hate to see that
change happening but whoever submits and routes such a change should
be ready for a major shitstorm, I'm afraid.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 18:12               ` Luis R. Rodriguez
  (?)
@ 2014-09-05 22:40                 ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-05 22:40 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Dmitry Torokhov, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

Hello, Luis.

On Fri, Sep 05, 2014 at 11:12:17AM -0700, Luis R. Rodriguez wrote:
> Meanwhile we are allowing a major design consideration such as a 30
> second timeout for both init + probe all of a sudden become a hard
> requirement for device drivers. I see your point but can't also be
> introducing major design changes willy nilly either. We *need* a
> solution for the affected drivers.

Yes, make the behavior specifically specified from userland.  When did
I ever say that there should be no solution for the problem?  I've
been saying that the behavior should be selected from userland from
the get-go, haven't I?

I have no idea how the selection should be made.  It could be per-insmod or
maybe just a system-wide flag with explicit exceptions marked on
drivers is good enough.  I don't know.
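
Purely as a sketch of what such a selection could look like (every name here
is hypothetical and nothing below is an existing kernel interface): a driver
marked as an explicit exception always probes synchronously, otherwise a
per-insmod request wins, and otherwise the system-wide default applies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-driver marking; FORCE_SYNC is the explicit exception. */
enum drv_probe_pref { DRV_PROBE_DEFAULT, DRV_PROBE_FORCE_SYNC };

/* Hypothetical inputs to the decision, all chosen from userland or driver. */
struct probe_request {
	bool system_async_default;    /* system-wide flag */
	int insmod_async;             /* per-insmod: -1 unset, 0 sync, 1 async */
	enum drv_probe_pref drv_pref; /* exception marked on the driver */
};

/* Decide whether probing runs asynchronously w.r.t. module loading. */
static bool probe_runs_async(const struct probe_request *req)
{
	/* Drivers that cannot tolerate async probing (e.g. storage) opt out. */
	if (req->drv_pref == DRV_PROBE_FORCE_SYNC)
		return false;
	/* An explicit per-insmod choice overrides the system default. */
	if (req->insmod_async >= 0)
		return req->insmod_async == 1;
	return req->system_async_default;
}
```

The point of the ordering is that existing users see no change unless they ask
for one: with the system-wide flag off and no per-insmod request, everything
stays synchronous.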

> Also what stops drivers from going ahead and just implementing their
> own async probe? Would that now be frowned upon as it strives away

The drivers can't.  How many times should I explain the same thing
over and over again.  libata can't simply make probing asynchronous
w.r.t. module loading no matter how it does it.  Yeah, sure, there can
be other drivers which can do that without most people noticing it but
a storage driver isn't one of them and the storage drivers are the
problematic ones already, right?

> from the original design? The bool would let those drivers do this
> easily, and we would still need to identify these drivers, although
> this particular change can be NAK'd Oleg's suggestion on
> WARN_ON(fatal_signal_pending() at the end of load_module() seems to me
> at least needed. And if its not async probe... what do those with
> failed drivers do?

I'm getting tired of explaining the same thing over and over again.
The said change was nacked because the whole approach of "let's see
which drivers get reported on the issue which exists basically for all
drivers and just change the behavior of them" is braindead.  It makes
no sense whatsoever.  It doesn't address the root cause of the problem
while making the same class of drivers behave significantly
differently for no good reason.  Please stop chasing your own tail and
try to understand the larger picture.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 22:29                 ` Tejun Heo
@ 2014-09-05 22:45                   ` Arjan van de Ven
  -1 siblings, 0 replies; 227+ messages in thread
From: Arjan van de Ven @ 2014-09-05 22:45 UTC (permalink / raw)
  To: Tejun Heo, Dmitry Torokhov
  Cc: Luis R. Rodriguez, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	Kay Sievers, One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On 9/5/2014 3:29 PM, Tejun Heo wrote:
> Hello, Dmitry.
>
> On Fri, Sep 05, 2014 at 11:10:03AM -0700, Dmitry Torokhov wrote:
>> I do not agree that it is actually user-visible change: generally speaking you
>> do not really know if device is there or not. They come and go. Like I said,
>> consider all permutations, with hot-pluggable buses, deferred probing, etc,
>
> It is for storage devices which always have guaranteed synchronous
> probing on module load and well-defined probing order.  Sure, modern
> setups are a lot more dynamic but I'm quite certain that there are
> setups in the wild which depend on storage driver loading being
> synchronous.  We can't simply declare one day that such behavior is
> broken and break, most likely, their boots.

we even depend on this in the mount-by-label cases

many setups assume that the internal storage prevails over the USB stick in the case of conflicts.
it's a security issue; you don't want the built in secure bootloader that has a kernel root argument
by label/uuid.
the security there tends to assume that built-in wins over USB
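
A small sketch of why ordering matters for by-label lookups, under stated
assumptions: the resolver is assumed to return the first device, in
registration order, whose label matches. struct blkdev and find_by_label()
are hypothetical names for illustration only; the thread does not say how any
particular initramfs or udev setup breaks ties.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one registered block device and its filesystem label. */
struct blkdev {
	const char *name;
	const char *label;
};

/*
 * Return the first device, in registration order, whose label matches.
 * First-match-wins is an assumption made for this sketch.
 */
static const struct blkdev *find_by_label(const struct blkdev *devs, size_t n,
					  const char *label)
{
	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(devs[i].label, label) == 0)
			return &devs[i];
	}
	return NULL;
}
```

With two devices carrying the same label, the winner is whichever one was
registered first, so a lookup like this is only as trustworthy as the
registration order is deterministic.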


^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 22:31                   ` Tejun Heo
@ 2014-09-05 22:49                     ` Dmitry Torokhov
  -1 siblings, 0 replies; 227+ messages in thread
From: Dmitry Torokhov @ 2014-09-05 22:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Luis R. Rodriguez, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Sat, Sep 06, 2014 at 07:31:39AM +0900, Tejun Heo wrote:
> On Sat, Sep 06, 2014 at 07:29:56AM +0900, Tejun Heo wrote:
> > It is for storage devices which always have guaranteed synchronous
> > probing on module load and well-defined probing order.

Agree about probing order (IIRC that is why we had to revert the
wholesale asynchronous probing a few years back) but totally disagree
about synchronous module loading.

Anyway, I just posted a patch that I think preserves module loading
behavior and solves my issue with built-in modules. It does not help
Luis' issue though (but then I think the main problem is with systemd
being stupid there).

> > Sure, modern
> > setups are a lot more dynamic but I'm quite certain that there are
> > setups in the wild which depend on storage driver loading being
> > synchronous.  We can't simply declare one day that such behavior is
> > broken and break, most likely, their boots.
> 
> To add a bit, if the argument here is that dependency on such behavior
> shouldn't exist and module loading and device probing should always be
> asynchronous, the right approach is implementing "synchronous_probing"
> flag not the other way around.  I actually wouldn't hate to see that
> change happening but whoever submits and routes such a change should
> be ready for a major shitstorm, I'm afraid.

I think we already had this storm and that is why here we have opt-in
behavior for the drivers.
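
The difference between the two directions can be sketched as follows. This is
only an illustration under assumptions: struct drv_flags, enum probe_default
and probe_is_async() are hypothetical names, not the RFC's code or any
existing kernel API. The field name synchronous_probing is taken from the
quoted suggestion; async_probe stands in for the opt-in bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which behavior a driver gets when it sets no flag at all (hypothetical). */
enum probe_default {
	DEFAULT_SYNC,	/* drivers opt in to async probing */
	DEFAULT_ASYNC,	/* drivers opt out via "synchronous_probing" */
};

/* Hypothetical per-driver annotations. */
struct drv_flags {
	bool async_probe;		/* honored only under DEFAULT_SYNC */
	bool synchronous_probing;	/* honored only under DEFAULT_ASYNC */
};

/* Decide whether probe runs asynchronously w.r.t. module loading. */
static bool probe_is_async(enum probe_default policy,
			   const struct drv_flags *flags)
{
	assert(flags != NULL);

	if (policy == DEFAULT_SYNC)
		return flags->async_probe;
	return !flags->synchronous_probing;
}
```

An unannotated driver keeps today's behavior under the opt-in scheme and
silently changes behavior under the opt-out scheme, which is where the
breakage risk for existing setups comes from.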

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 22:45                   ` Arjan van de Ven
  (?)
@ 2014-09-05 22:52                     ` Dmitry Torokhov
  -1 siblings, 0 replies; 227+ messages in thread
From: Dmitry Torokhov @ 2014-09-05 22:52 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Tejun Heo, Luis R. Rodriguez, Greg Kroah-Hartman, Wu Zhangjin,
	Takashi Iwai, linux-kernel, Oleg Nesterov, hare, Andrew Morton,
	Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Fri, Sep 05, 2014 at 03:45:08PM -0700, Arjan van de Ven wrote:
> On 9/5/2014 3:29 PM, Tejun Heo wrote:
> >Hello, Dmitry.
> >
> >On Fri, Sep 05, 2014 at 11:10:03AM -0700, Dmitry Torokhov wrote:
> >>I do not agree that it is actually user-visible change: generally speaking you
> >>do not really know if device is there or not. They come and go. Like I said,
> >>consider all permutations, with hot-pluggable buses, deferred probing, etc,
> >
> >It is for storage devices which always have guaranteed synchronous
> >probing on module load and well-defined probing order.  Sure, modern
> >setups are a lot more dynamic but I'm quite certain that there are
> >setups in the wild which depend on storage driver loading being
> >synchronous.  We can't simply declare one day that such behavior is
> >broken and break, most likely, their boots.
> 
> we even depend on this in the mount-by-label cases
> 
> many setups assume that the internal storage prevails over the USB stick in the case of conflicts.
> it's a security issue; you don't want the built in secure bootloader that has a kernel root argument
> by label/uuid.
> the security there tends to assume that built-in wins over USB

Ahem... and they sure it works reliably with large storage arrays? With
SCSI doing probing asynchronously already?

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 22:49                     ` Dmitry Torokhov
@ 2014-09-05 22:55                       ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-05 22:55 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Luis R. Rodriguez, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

Hello, Dmitry.

On Fri, Sep 05, 2014 at 03:49:17PM -0700, Dmitry Torokhov wrote:
> On Sat, Sep 06, 2014 at 07:31:39AM +0900, Tejun Heo wrote:
> > On Sat, Sep 06, 2014 at 07:29:56AM +0900, Tejun Heo wrote:
> > > It is for storage devices which always have guaranteed synchronous
> > > probing on module load and well-defined probing order.
> 
> Agree about probing order (IIRC that is why we had to revert the
> wholesale asynchronous probing a few years back) but totally disagree
> about synchronous module loading.

I don't get it.  This is a behavior userland already depends on for
boots.  What's there to agree or disagree?  This is just a fact that
we can't do this w/o disturbing some userlands in a major way.

> Anyway, I just posted a patch that I think preserves module loading
> behavior and solves my issue with built-in modules. It does not help
> Luis' issue though (but then I think the main problem is with systemd
> being stupid there).

This sure can be worked around from the userland side too, by not
imposing any timeout on module loading.  That said, for the same
reasons you've been arguing until now, I actually do think that it's
kinda silly to make device probing synchronous to module loading in
this day and age.  What we disagree on is not whether we want to
separate those waits.  It is about how to achieve it.

> > To add a bit, if the argument here is that dependency on such behavior
> > shouldn't exist and module loading and device probing should always be
> > asynchronous, the right approach is implementing "synchronous_probing"
> > flag not the other way around.  I actually wouldn't hate to see that
> > change happening but whoever submits and routes such a change should
> > be ready for a major shitstorm, I'm afraid.
> 
> I think we already had this storm and that is why here we have opt-in
> behavior for the drivers.

It's a different shitstorm, one where we actively break booting on
some userlands.  Trust me.  That's gonna be a lot worse.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 22:52                     ` Dmitry Torokhov
@ 2014-09-05 22:57                       ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-05 22:57 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Arjan van de Ven, Luis R. Rodriguez, Greg Kroah-Hartman,
	Wu Zhangjin, Takashi Iwai, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

Hello,

On Fri, Sep 05, 2014 at 03:52:48PM -0700, Dmitry Torokhov wrote:
> Ahem... and they sure it works reliably with large storage arrays? With
> SCSI doing probing asynchronously already?

I believe this has been mentioned before too but, yes, SCSI device
probing is asynchronous and parallelized, but the registration of the
discovered devices is fully serialized according to driver attach
order.  Storage devices are probed in parallel and attached in a fully
deterministic order.  That part has never changed.
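
The "probe in parallel, attach in order" idea can be sketched in plain
userspace C. This is an illustration under assumptions, not the SCSI code:
struct ordered_attach, probe_done() and the ticket numbering are hypothetical
names. Each device gets a ticket in driver attach order, probes may finish in
any order, and a device is only attached once every lower ticket has been
attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 16

/* Hypothetical bookkeeping for ordered attachment. */
struct ordered_attach {
	bool probed[MAX_DEVS];		/* probe finished, maybe not attached yet */
	size_t next_ticket;		/* lowest ticket not yet attached */
	size_t attached[MAX_DEVS];	/* tickets in the order they were attached */
	size_t nr_attached;
};

static void ordered_attach_init(struct ordered_attach *oa)
{
	*oa = (struct ordered_attach){ 0 };
}

/*
 * Record that the probe for `ticket` has finished. Completions may arrive
 * in any order; attachment still happens strictly in ticket order.
 */
static void probe_done(struct ordered_attach *oa, size_t ticket)
{
	assert(ticket < MAX_DEVS);
	oa->probed[ticket] = true;

	/* Attach every device whose predecessors are all attached. */
	while (oa->next_ticket < MAX_DEVS && oa->probed[oa->next_ticket]) {
		oa->attached[oa->nr_attached++] = oa->next_ticket;
		oa->next_ticket++;
	}
}
```

If ticket 2 finishes probing first, nothing is attached until tickets 0 and 1
have finished too, so the resulting device order does not depend on probe
timing.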

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 22:52                     ` Dmitry Torokhov
  (?)
@ 2014-09-05 23:05                       ` Arjan van de Ven
  -1 siblings, 0 replies; 227+ messages in thread
From: Arjan van de Ven @ 2014-09-05 23:05 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Tejun Heo, Luis R. Rodriguez, Greg Kroah-Hartman, Wu Zhangjin,
	Takashi Iwai, linux-kernel, Oleg Nesterov, hare, Andrew Morton,
	Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On 9/5/2014 3:52 PM, Dmitry Torokhov wrote:
> On Fri, Sep 05, 2014 at 03:45:08PM -0700, Arjan van de Ven wrote:
>> On 9/5/2014 3:29 PM, Tejun Heo wrote:
>>> Hello, Dmitry.
>>>
>>> On Fri, Sep 05, 2014 at 11:10:03AM -0700, Dmitry Torokhov wrote:
>>>> I do not agree that it is actually user-visible change: generally speaking you
>>>> do not really know if device is there or not. They come and go. Like I said,
>>>> consider all permutations, with hot-pluggable buses, deferred probing, etc,
>>>
>>> It is for storage devices which always have guaranteed synchronous
>>> probing on module load and well-defined probing order.  Sure, modern
>>> setups are a lot more dynamic but I'm quite certain that there are
>>> setups in the wild which depend on storage driver loading being
>>> synchronous.  We can't simply declare one day that such behavior is
>>> broken and break, most likely, their boots.
>>
>> we even depend on this in the mount-by-label cases
>>
>> many setups assume that the internal storage prevails over the USB stick in the case of conflicts.
>> it's a security issue; you don't want the built in secure bootloader that has a kernel root argument
>> by label/uuid.
>> the security there tends to assume that built-in wins over USB
>
> Ahem... and they sure it works reliably with large storage arrays? With
> SCSI doing probing asynchronously already?

you tend to trust your large storage array
you tend to not trust the walk up USB stick.


^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 23:05                       ` Arjan van de Ven
  (?)
@ 2014-09-05 23:18                         ` Dmitry Torokhov
  -1 siblings, 0 replies; 227+ messages in thread
From: Dmitry Torokhov @ 2014-09-05 23:18 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Tejun Heo, Luis R. Rodriguez, Greg Kroah-Hartman, Wu Zhangjin,
	Takashi Iwai, linux-kernel, Oleg Nesterov, hare, Andrew Morton,
	Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Fri, Sep 05, 2014 at 04:05:30PM -0700, Arjan van de Ven wrote:
> On 9/5/2014 3:52 PM, Dmitry Torokhov wrote:
> >On Fri, Sep 05, 2014 at 03:45:08PM -0700, Arjan van de Ven wrote:
> >>On 9/5/2014 3:29 PM, Tejun Heo wrote:
> >>>Hello, Dmitry.
> >>>
> >>>On Fri, Sep 05, 2014 at 11:10:03AM -0700, Dmitry Torokhov wrote:
> >>>>I do not agree that it is actually user-visible change: generally speaking you
> >>>>do not really know if device is there or not. They come and go. Like I said,
> >>>>consider all permutations, with hot-pluggable buses, deferred probing, etc,
> >>>
> >>>It is for storage devices which always have guaranteed synchronous
> >>>probing on module load and well-defined probing order.  Sure, modern
> >>>setups are a lot more dynamic but I'm quite certain that there are
> >>>setups in the wild which depend on storage driver loading being
> >>>synchronous.  We can't simply declare one day that such behavior is
> >>>broken and break, most likely, their boots.
> >>
> >>we even depend on this in the mount-by-label cases
> >>
> >>many setups assume that the internal storage prevails over the USB stick in the case of conflicts.
> >>it's a security issue; you don't want the built in secure bootloader that has a kernel root argument
> >>by label/uuid.
> >>the security there tends to assume that built-in wins over USB
> >
> >Ahem... and they sure it works reliably with large storage arrays? With
> >SCSI doing probing asynchronously already?
> 
> you tend to trust your large storage array
> you tend to not trust the walk up USB stick.

If you allow physical access it does not matter really.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
@ 2014-09-05 23:18                         ` Dmitry Torokhov
  0 siblings, 0 replies; 227+ messages in thread
From: Dmitry Torokhov @ 2014-09-05 23:18 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Tejun Heo, Luis R. Rodriguez, Greg Kroah-Hartman, Wu Zhangjin,
	Takashi Iwai, linux-kernel, Oleg Nesterov, hare, Andrew Morton,
	Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan

On Fri, Sep 05, 2014 at 04:05:30PM -0700, Arjan van de Ven wrote:
> On 9/5/2014 3:52 PM, Dmitry Torokhov wrote:
> >On Fri, Sep 05, 2014 at 03:45:08PM -0700, Arjan van de Ven wrote:
> >>On 9/5/2014 3:29 PM, Tejun Heo wrote:
> >>>Hello, Dmitry.
> >>>
> >>>On Fri, Sep 05, 2014 at 11:10:03AM -0700, Dmitry Torokhov wrote:
> >>>>I do not agree that it is actually user-visible change: generally speaking you
> >>>>do not really know if device is there or not. They come and go. Like I said,
> >>>>consider all permutations, with hot-pluggable buses, deferred probing, etc,
> >>>
> >>>It is for storage devices which always have guaranteed synchronous
> >>>probing on module load and well-defined probing order.  Sure, modern
> >>>setups are a lot more dynamic but I'm quite certain that there are
> >>>setups in the wild which depend on storage driver loading being
> >>>synchronous.  We can't simply declare one day that such behavior is
> >>>broken and break, most likely, their boots.
> >>
> >>we even depend on this in the mount-by-label cases
> >>
> >>many setups assume that the internal storage prevails over the USB stick in the case of conflicts.
> >>it's a security issue; you don't want the built in secure bootloader that has a kernel root argument
> >>by label/uuid.
> >>the security there tends to assume that built-in wins over USB
> >
> >Ahem... and they sure it works reliably with large storage arrays? With
> >SCSI doing probing asynchronously already?
> 
> you tend to trust your large storage array
> you tend to not trust the walk up USB stick.

If you allow physical access it does not matter really.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 22:55                       ` Tejun Heo
@ 2014-09-05 23:22                         ` Dmitry Torokhov
  -1 siblings, 0 replies; 227+ messages in thread
From: Dmitry Torokhov @ 2014-09-05 23:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Luis R. Rodriguez, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

Hi Tejun,

On Sat, Sep 06, 2014 at 07:55:33AM +0900, Tejun Heo wrote:
> Hello, Dmitry.
> 
> On Fri, Sep 05, 2014 at 03:49:17PM -0700, Dmitry Torokhov wrote:
> > On Sat, Sep 06, 2014 at 07:31:39AM +0900, Tejun Heo wrote:
> > > On Sat, Sep 06, 2014 at 07:29:56AM +0900, Tejun Heo wrote:
> > > > It is for storage devices which always have guaranteed synchronous
> > > > probing on module load and well-defined probing order.
> > 
> > Agree about probing order (IIRC that is why we had to revert the
> > wholesale asynchronous probing a few years back) but totally disagree
> > about synchronous module loading.
> 
> I don't get it.  This is a behavior userland already depends on for
> boots.  What's there to agree or disagree?  This is just a fact that
> we can't do this w/o disturbing some userlands in a major way.

I am just expressing my disbelief that somebody relies on module loading
being synchronous with probing. Out of curiosity, do you have any
pointers?

> 
> > Anyway, I just posted a patch that I think preserves module loading
> > behavior and solves my issue with built-in modules. It does not help
> > Luis' issue though (but then I think the main problem is with systemd
> > being stupid there).
> 
> This sure can be worked around from userland side too by not imposing
> any timeout on module loading but that said for the same reasons that
> you've been arguing until now, I actually do think that it's kinda
> silly to make device probing synchronous to module loading at this
> time and age.  What we disagree on is not that we want to separate
> those waits.  It is about how to achieve it.

Well, there are separate things we want to solve. My main issue is not
with modules, but rather compiled-in drivers that stall kernel boot,
and these particular drivers are just fine if they are probed out of
band.

> 
> > > To add a bit, if the argument here is that dependency on such behavior
> > > shouldn't exist and module loading and device probing should always be
> > > asynchronous, the right approach is implementing "synchronous_probing"
> > > flag not the other way around.  I actually wouldn't hate to see that
> > > change happening but whoever submits and routes such a change should
> > > be ready for a major shitstorm, I'm afraid.
> > 
> > I think we already had this storm and that is why here we have opt-in
> > behavior for the drivers.
> 
> It's a different shitstorm where we actively break bootings on some
> userlands.  Trust me.  That's gonna be a lot worse.

That did break bootings and that's why we reverted the wholesale async
probing.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 23:22                         ` Dmitry Torokhov
@ 2014-09-05 23:32                           ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-05 23:32 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Luis R. Rodriguez, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

Hey,

On Fri, Sep 05, 2014 at 04:22:42PM -0700, Dmitry Torokhov wrote:
> > I don't get it.  This is a behavior userland already depends on for
> > boots.  What's there to agree or disagree?  This is just a fact that
> > we can't do this w/o disturbing some userlands in a major way.
> 
> I am just expressing my disbelief that somebody relies on module loading
> being synchronous with probing. Out of curiosity, do you have any
> pointers?

I've seen initrd scripts which depended on the behavior to wait for
storage devices over the years.  AFAIK, none of the modern distros
does it but this has been such a basic feature all along and it seems
highly unlikely to me that there's no userland remaining out there
depending on such behavior.  We do have a lot of different userlands,
many of them quite ad-hoc.
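
A minimal userspace sketch of the alternative, for illustration only:
instead of assuming the disk exists once the module load returns, a
script or helper can wait for the device node itself. The helper name
demo_wait_for_path() and the 10 ms polling step are assumptions made
for this example; a real setup would more likely react to udev events
than poll.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Poll until `path` exists or roughly `timeout_ms` has been slept.
 * Returns 0 if the path is present, -1 on timeout.
 * demo_wait_for_path() is a hypothetical helper, not an existing API.
 */
static int demo_wait_for_path(const char *path, long timeout_ms)
{
	const struct timespec step = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	long waited_ms = 0;
	struct stat st;

	assert(path != NULL);

	for (;;) {
		if (stat(path, &st) == 0)
			return 0;		/* node is present */
		if (waited_ms >= timeout_ms)
			return -1;		/* gave up waiting */
		nanosleep(&step, NULL);		/* sleep 10 ms between checks */
		waited_ms += 10;
	}
}
```

A caller would use something like demo_wait_for_path("/dev/sda1", 5000)
before mounting, which works whether or not probing completed during
module load.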

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-05 22:40                 ` Tejun Heo
  (?)
@ 2014-09-09  1:04                   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-09  1:04 UTC (permalink / raw)
  To: Tejun Heo, Lennart Poettering, Kay Sievers
  Cc: Dmitry Torokhov, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Fri, Sep 5, 2014 at 3:40 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Luis.
>
> On Fri, Sep 05, 2014 at 11:12:17AM -0700, Luis R. Rodriguez wrote:
>> Meanwhile we are allowing a major design consideration such as a 30
>> second timeout for both init + probe all of a sudden become a hard
>> requirement for device drivers. I see your point but can't also be
>> introducing major design changes willy nilly either. We *need* a
>> solution for the affected drivers.
>
> Yes, make the behavior specifically specified from userland.  When did
> I ever say that there should be no solution for the problem?  I've
> been saying that the behavior should be selected from userland from
> the get-go, haven't I?
>
> I have no idea how the selection should be.  It could be per-insmod or
> maybe just a system-wide flag with explicit exceptions marked on
> drivers is good enough.  I don't know.

It's perfectly understandable if we don't know what path to take yet,
and it's also understandable for it to take time to figure out.
Meanwhile, though, systemd has already merged a policy of a 30 second
timeout for *all drivers*, so we need:

0) a solution for the affected combinations of systemd / drivers
1) an agreed path forward

If we want tight integration between the kernel and the init system we
need to be able to communicate effectively, folks, and I'm afraid this
isn't happening. I last noted on systemd-devel that the 30 second
timeout was merged under incorrect assumptions: it is not just init
that at times causes delays, and since we currently batch both init
and probe in the driver core we need a non-fatal userspace solution
[0] while we work on a kernel-side design for async probing of the
drivers where it makes sense. A proper kernel solution may take longer
than expected. We can't just assume a probe_async flag on drivers will
suffice; in fact, as Tejun notes, that is wrong, since historically
some random userland has depended on the synchronous behaviour of
module loading for some drivers, and that loading *could* have taken
a while.

Kay, Lennart, any recommendations ?

[0] http://lists.freedesktop.org/archives/systemd-devel/2014-August/022696.html

>> Also what stops drivers from going ahead and just implementing their
>> own async probe? Would that now be frowned upon as it strives away
>
> The drivers can't.  How many times should I explain the same thing
> over and over again.  libata can't simply make probing asynchronous
> w.r.t. module loading no matter how it does it.  Yeah, sure, there can
> be other drivers which can do that without most people noticing it but
> a storage driver isn't one of them and the storage drivers are the
> problematic ones already, right?

It's one of the subsystems that has suffered from this, but not the only one.

>> from the original design? The bool would let those drivers do this
>> easily, and we would still need to identify these drivers, although
>> this particular change can be NAK'd Oleg's suggestion on
>> WARN_ON(fatal_signal_pending() at the end of load_module() seems to me
>> at least needed. And if its not async probe... what do those with
>> failed drivers do?
>
> I'm getting tired of explaining the same thing over and over again.
> The said change was nacked because the whole approach of "let's see
> which drivers get reported on the issue which exists basically for all
> drivers and just change the behavior of them" is braindead.  It makes
> no sense whatsoever.  It doesn't address the root cause of the problem
> while making the same class of drivers behave significantly
> differently for no good reason.  Please stop chasing your own tail and
> try to understand the larger picture.

Understood.

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  1:04                   ` Luis R. Rodriguez
  (?)
@ 2014-09-09  1:10                     ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-09  1:10 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

Hello, Luis.

On Mon, Sep 08, 2014 at 06:04:23PM -0700, Luis R. Rodriguez wrote:
> > I have no idea how the selection should be.  It could be per-insmod or
> > maybe just a system-wide flag with explicit exceptions marked on
> > drivers is good enough.  I don't know.
> 
> Its perfectly understandable if we don't know what path to take yet
> and its also understandable for it to take time to figure out --
> meanwhile though systemd already has merged a policy of a 30 second
> timeout for *all drivers* though so we therefore need:

I'm not too convinced this is such a difficult problem to figure out.
We already have most of the logic in place and the only thing missing is
how to switch it.  Wouldn't something like the following work?

* Add a sysctl knob to enable asynchronous device probing on module
  load and enable asynchronous probing globally if the knob is set.

* Identify cases which can't be asynchronous and make them
  synchronous, e.g. keep track of who's calling request_module() and
  avoid asynchronous probing if current is probing one of those.
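
The two points above could be modelled roughly as follows. This is a
plain C sketch, not kernel code: demo_async_probe_knob stands in for
the proposed sysctl, and the needs_sync_probe mark on struct
demo_driver is a hypothetical way to record the cases that must stay
synchronous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-driver description; not a real kernel structure. */
struct demo_driver {
	const char *name;
	bool needs_sync_probe;	/* explicit exception: must probe synchronously */
};

/* Hypothetical stand-in for the proposed sysctl knob (off by default). */
static bool demo_async_probe_knob;

/* Returns true when probing may be decoupled from module load. */
static bool demo_probe_async_allowed(const struct demo_driver *drv)
{
	assert(drv != NULL);

	if (!demo_async_probe_knob)
		return false;	/* legacy behaviour: probe during module load */

	return !drv->needs_sync_probe;
}
```

With the knob off every driver keeps today's synchronous behaviour, so
only a userland that opts in has to cope with devices appearing after
module load has returned.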

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  1:10                     ` Tejun Heo
  (?)
@ 2014-09-09  1:13                       ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-09  1:13 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Tue, Sep 09, 2014 at 10:10:59AM +0900, Tejun Heo wrote:
> * Identify cases which can't be asynchronous and make them
>   synchronous.  e.g. keep track of who's doing request_module() and
>   avoid asynchronous probing if current is probing one of those.

That wouldn't work as we don't know what's gonna happen in userland
but we can start with just disallowing async probing for char devices
for now.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  1:10                     ` Tejun Heo
  (?)
@ 2014-09-09  1:22                       ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-09  1:22 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Tue, Sep 09, 2014 at 10:10:59AM +0900, Tejun Heo wrote:
> I'm not too convinced this is such a difficult problem to figure out.
> We already have most of the logic in place and the only thing missing is
> how to switch it.  Wouldn't something like the following work?
> 
> * Add a sysctl knob to enable asynchronous device probing on module
>   load and enable asynchronous probing globally if the knob is set.

Alternatively, add a module-generic param "async_probe" or whatever
and use that to switch the behavior should work too.  I don't know
which way is better but either should work fine.
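
A rough sketch of how the two switches could combine, written as plain
userspace C rather than kernel code.  Everything here is hypothetical:
struct probe_policy, has_async_probe_param() and want_async_probe() are
names made up for illustration, and only the "async_probe" token comes
from the suggestion above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the global switch; not a kernel structure. */
struct probe_policy {
	bool sysctl_async_probe;	/* what a sysctl knob would set */
};

/*
 * Return true if the load-time option string contains "async_probe"
 * as a whole space-separated word.
 */
static bool has_async_probe_param(const char *args)
{
	static const char key[] = "async_probe";
	const size_t klen = sizeof(key) - 1;
	const char *p = args;

	if (!args)
		return false;

	while ((p = strstr(p, key)) != NULL) {
		bool starts = (p == args) || (p[-1] == ' ');
		bool ends = (p[klen] == '\0') || (p[klen] == ' ');

		if (starts && ends)
			return true;
		p += klen;
	}
	return false;
}

/* Either switch is enough to request asynchronous probing. */
static bool want_async_probe(const struct probe_policy *pol,
			     const char *module_args)
{
	return pol->sysctl_async_probe || has_async_probe_param(module_args);
}
```

With the global knob off, a load carrying "async_probe" in its options
would still probe asynchronously, while a load without it would stay
synchronous.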

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  1:22                       ` Tejun Heo
  (?)
@ 2014-09-09  1:26                         ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-09  1:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Mon, Sep 8, 2014 at 6:22 PM, Tejun Heo <tj@kernel.org> wrote:
> On Tue, Sep 09, 2014 at 10:10:59AM +0900, Tejun Heo wrote:
>> I'm not too convinced this is such a difficult problem to figure out.
>> We already have most of the logic in place and the only thing missing is
>> how to switch it.  Wouldn't something like the following work?
>>
>> * Add a sysctl knob to enable asynchronous device probing on module
>>   load and enable asynchronous probing globally if the knob is set.
>
> Alternatively, add a module-generic param "async_probe" or whatever
> and use that to switch the behavior should work too.  I don't know
> which way is better but either should work fine.

I take it by this you meant a generic system-wide sysctl or kernel cmd
line option to enable this for all drivers?

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  1:26                         ` Luis R. Rodriguez
  (?)
@ 2014-09-09  1:29                           ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-09  1:29 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Mon, Sep 08, 2014 at 06:26:04PM -0700, Luis R. Rodriguez wrote:
> > Alternatively, add a module-generic param "async_probe" or whatever
> > and use that to switch the behavior should work too.  I don't know
> > which way is better but either should work fine.
> 
> I take it by this you meant a generic system-wide sysctl or kernel cmd
> line option to enable this for all drivers?

Well, either global or per-insmod switch should work.  There probably
are details that I haven't mentioned - e.g. probably global switch is
easier to backport and deploy to existing systems - but as long as it
works I don't have fundamental objections either way.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  1:29                           ` Tejun Heo
  (?)
@ 2014-09-09  1:38                             ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-09  1:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Mon, Sep 8, 2014 at 6:29 PM, Tejun Heo <tj@kernel.org> wrote:
> On Mon, Sep 08, 2014 at 06:26:04PM -0700, Luis R. Rodriguez wrote:
>> > Alternatively, add a module-generic param "async_probe" or whatever
>> > and use that to switch the behavior should work too.  I don't know
>> > which way is better but either should work fine.
>>
>> I take it by this you meant a generic system-wide sysctl or kernel cmd
>> line option to enable this for all drivers?
>
> Well, either global or per-insmod switch should work.  There probably
> are details that I haven't mentioned - e.g. probably global switch is
> easier to backport and deploy to existing systems

Yes a global sysctl solution might make it easier to backport.

> - but as long as it
> works I don't have fundamental objections either way.

OK then the only concern I would have with this is that the presence
of such a flag doesn't necessarily mean that all drivers on a system
have been tested for asynch probe yet. I'd feel much more comfortable
if this global flag allowed say specific drivers that *did* have such
a bool enabled, for example. Then that would enable synchronous
behaviour for the kernel by default, require the flag for enabling the
new async feature but only for drivers that have been tested.

That also still would not technically solve the issue of the current
existence of the timeout, unless of course we wish to ask systemd to
only make the timeout take effect *iff* the global sysctl flag /
whatever was enabled.

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  1:38                             ` Luis R. Rodriguez
  (?)
@ 2014-09-09  1:47                               ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-09  1:47 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

Hello,

On Mon, Sep 08, 2014 at 06:38:34PM -0700, Luis R. Rodriguez wrote:
> OK then the only concern I would have with this is that the presence
> of such a flag doesn't necessarily mean that all drivers on a system
> have been tested for asynch probe yet. I'd feel much more comfortable

Given that the behavior change is from driver core and that device
probing can happen post-loading anyway, I don't think we need to worry
about drivers breaking from probing made asynchronous to loading.  The
problem is the expectation of the entity which initiated loading of
the module.  If it's depending on device being probed synchronously
but insmod returns before that, it can break things.  We probably
should audit request_module() users and see which ones expect such
behavior.

> if this global flag allowed say specific drivers that *did* have such
> a bool enabled, for example. Then that would enable synchronous
> behaviour for the kernel by default, require the flag for enabling the
> new async feature but only for drivers that have been tested.

If we're gonna do the global switch, I personally think the right
approach is blacklisting instead of the other way around because each
specific driver doesn't really have much to do with it and the
exceptions are about specific use cases that we don't have a good way
to identify from the module loading path.
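
As a minimal sketch of the blacklist idea, in userspace C rather than
kernel code: the global switch turns asynchronous probing on for
everything except the modules listed as needing synchronous behavior.
The module names and the helpers is_blacklisted() and probe_async() are
hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical modules whose loaders expect synchronous probing. */
static const char *const sync_only_modules[] = {
	"example_chardev",
	"example_legacy",
};

/* Return true if the module is on the synchronous-only list. */
static bool is_blacklisted(const char *module)
{
	size_t n = sizeof(sync_only_modules) / sizeof(sync_only_modules[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(module, sync_only_modules[i]) == 0)
			return true;
	return false;
}

/*
 * Blacklist policy: with the global switch on, every module probes
 * asynchronously unless it is explicitly listed as an exception.
 */
static bool probe_async(bool global_switch, const char *module)
{
	return global_switch && !is_blacklisted(module);
}
```

A whitelist would invert the last test, so that only listed modules
probe asynchronously and everything else stays synchronous.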

> That also still would not technically solve the issue of the current
> existence of the timeout, unless of course we wish to ask systemd to
> only make the timeout take effect *iff* the global sysctl flag /
> whatever was enabled.

Userland could backport a fix to set the sysctl.  Given that we need
both synchronous and asynchronous behaviors, it's unlikely that we can
come up with a solution which doesn't need cooperation from userland.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  1:47                               ` Tejun Heo
  (?)
@ 2014-09-09  2:28                                 ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-09  2:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Mon, Sep 8, 2014 at 6:47 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Mon, Sep 08, 2014 at 06:38:34PM -0700, Luis R. Rodriguez wrote:
>> OK then the only concern I would have with this is that the presence
>> of such a flag doesn't necessarily mean that all drivers on a system
>> have been tested for asynch probe yet. I'd feel much more comfortable
>
> Given that the behavior change is from driver core and that device
> probing can happen post-loading anyway,

Ah but let's not forget Dmitry's requirement which is for in-kernel
drivers. We'd need to deal with both built-in and modules. Dmitry's
case is completely orthogonal to the systemd issue and is just needed
to help not stall boot but I see no reason to blend these two issues
into one requirement together.

> I don't think we need to worry
> about drivers breaking from probing made asynchronous to loading.  The
> problem is the expectation of the entity which initiated loading of
> the module.  If it's depending on device being probed synchronously
> but insmod returns before that, it can break things.  We probably
> should audit request_module() users and see which ones expect such
> behavior.

Sure. Based on a quick glance I see sloppy uses of this, this should
probably be fixed anyway.

>> if this global flag allowed say specific drivers that *did* have such
>> a bool enabled, for example. Then that would enable synchronous
>> behaviour for the kernel by default, require the flag for enabling the
>> new async feature but only for drivers that have been tested.
>
> If we're gonna do the global switch, I personally think the right
> approach is blacklisting instead of the other way around because each
> specific driver doesn't really have much to do with it and the
> exceptions are about specific use cases that we don't have a good way
> to identify from the module loading path.

OK sure... even if we did whitelist, I'm afraid such a whitelist might
end up being specific to particular systems anyway... I suppose the
only real way to do it right is to push and strive towards a full
system whitelist and address the blacklist as you mention.

In terms of approach we would still need to decide how to do async
probing for both in-kernel drivers and modules: do we want
async_schedule() or queue_work()? If async_schedule(), do we want
to use a new domain or a new one shared for all drivers? Scheduler
priority was one of my other concerns; we'd need to get it right to
match how drivers load today through finit_module() and synchronous
probe.
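
For reference, the ordering that the queue_work() option implies can be
sketched roughly as below. This is a single-threaded userspace toy, not
kernel code: every toy_* name is made up for illustration, and the toy
flush simply runs the pending work inline, whereas the kernel's
flush_work() waits for a worker to finish it. The only point shown is
that load returns before probe has run and that remove must flush first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a work item; NOT the kernel's struct work_struct. */
struct toy_work {
	void (*fn)(struct toy_work *w);
	bool pending;
};

/* Hypothetical driver private data with the probe work pegged onto it. */
struct toy_driver {
	struct toy_work probe_work;
	bool probed;
	bool removed;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The deferred probe body: runs whenever the work item is executed. */
static void toy_probe_fn(struct toy_work *w)
{
	struct toy_driver *drv = toy_container_of(w, struct toy_driver, probe_work);

	drv->probed = true;
}

/* Load path: queue the probe and return without waiting for it. */
static void toy_driver_load(struct toy_driver *drv)
{
	drv->probed = false;
	drv->removed = false;
	drv->probe_work.fn = toy_probe_fn;
	drv->probe_work.pending = true;
}

/* Toy flush: if the work has not run yet, run it now, inline. */
static void toy_flush_work(struct toy_work *w)
{
	assert(w->fn != NULL);
	if (w->pending) {
		w->pending = false;
		w->fn(w);
	}
}

/*
 * Remove path: flush the probe work before tearing down, so remove never
 * runs against a half-probed device. Returns whether probe had completed.
 */
static bool toy_driver_remove(struct toy_driver *drv)
{
	bool was_probed;

	toy_flush_work(&drv->probe_work);
	was_probed = drv->probed;
	drv->probed = false;
	drv->removed = true;
	return was_probed;
}
```

Calling toy_driver_load() and then toy_driver_remove() right away still
sees a completed probe, because the flush comes first.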

>> That also still would not technically solve the issue of the current
>> existence of the timeout, unless of course we wish to ask systemd to
>> only make the timeout take effect *iff* the global sysctl flag /
>> whatever was enabled.
>
> Userland could backport a fix to set the sysctl.  Given that we need
> both synchronous and asynchronous behaviors, it's unlikely that we can
> come up with a solution which doesn't need cooperation from userland.

True, and then the timeout would also have to be skipped for device
drivers that have the sync_probe flag set, so I guess we'd need to
expose that too. Based on previous discussions [0] I'm not too sure
systemd would be happy with no timeout on module loading, so we'd need
to ensure we all agree that such drivers exist and that we may need
*something* for them, even if it is just a really long fucking
timeout (TM).

[0] http://lists.freedesktop.org/archives/systemd-devel/2014-August/021852.html

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  2:28                                 ` Luis R. Rodriguez
  (?)
@ 2014-09-09  2:39                                   ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-09  2:39 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

Hello,

On Mon, Sep 08, 2014 at 07:28:58PM -0700, Luis R. Rodriguez wrote:
> > Given that the behavior change is from driver core and that device
> > probing can happen post-loading anyway,
> 
> Ah, but let's not forget Dmitry's requirement which is for in-kernel
> drivers. We'd need to deal with both built-in and modules. Dmitry's
> case is completely orthogonal to the systemd issue and is just needed
> to help not stall boot but I see no reason to blend these two issues
> into one requirement together.

Maybe we can piggyback the two on the same mechanism, but as you said
the two issues are orthogonal.  Let's keep it that way for now.  We
need them separate anyway for backports.

> In terms of approach we would still need to decide on a path for how
> to do asynch probing for both in-kernel drivers and modules, do we
> want async_schedule(), or queue_work()? If async_schedule() do we want
> to use a new domain or a new one shared for all drivers? Priority on

I don't think async_schedule() is the right mechanism for this use
case as the mechanism is inherently opportunistic.  It also gets
tangled up with async synchronization at the end of module loading.

> the scheduler was one of my other concerns which we'd need to make
> right to match existing load on drivers through finit_module() and
> synchronous probe.

Why do we care about the priority of probing tasks?  Does that
actually make any meaningful difference?  If so, how?

> > Userland could backport a fix to set the sysctl.  Given that we need
> > both synchronous and asynchronous behaviors, it's unlikely that we can
> > come up with a solution which doesn't need cooperation from userland.
> 
> True and then the timeout would also have to be skipped for device
> drivers that have the sync_probe flag set, so I guess we'd need to

I'm not sure about skipping for the sync_probe flag.  That seems like an
implementation detail to me.  Sure, we do that now because we don't
have a better way of figuring out whether request_module() is waiting
for it or not, but hopefully we'd be able to in the future.  I think we
should just make exceptions sensible so that it works fine in practice
for now (and I don't think that'd be too hard).  So, the only
cooperation necessary from userland would be just saying "I don't
wanna wait for device probing on module load."
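
One way to picture the resulting decision is the tiny sketch below. It
is only an assumption about how the pieces discussed here might combine:
the global userland opt-in and the per-driver sync_probe exception are
taken from this thread, but the structure and function names are
hypothetical and not an existing kernel interface.

```c
#include <stdbool.h>

/* Hypothetical global switch: userland said it does not want to wait. */
struct toy_probe_policy {
	bool userland_async_ok;
};

/* Hypothetical per-driver flag: driver is on the exception (black) list. */
struct toy_drv_flags {
	bool sync_probe;
};

/*
 * Probe asynchronously only when userland opted in and the driver is not
 * listed as an exception. Everything else keeps synchronous probing.
 */
static bool toy_probe_async(const struct toy_probe_policy *policy,
			    const struct toy_drv_flags *drv)
{
	if (!policy->userland_async_ok)
		return false;
	return !drv->sync_probe;
}
```

With this shape the default stays synchronous, and a driver only needs
attention if it has to be added to the exception list.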

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  2:39                                   ` Tejun Heo
  (?)
@ 2014-09-09  2:57                                     ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-09  2:57 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Mon, Sep 8, 2014 at 7:39 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Mon, Sep 08, 2014 at 07:28:58PM -0700, Luis R. Rodriguez wrote:
>> > Given that the behavior change is from driver core and that device
>> > probing can happen post-loading anyway,
>>
>> Ah, but let's not forget Dmitry's requirement which is for in-kernel
>> drivers. We'd need to deal with both built-in and modules. Dmitry's
>> case is completely orthogonal to the systemd issue and is just needed
>> to help not stall boot but I see no reason to blend these two issues
>> into one requirement together.
>
> Maybe we can piggyback the two on the same mechanism but as you said
> the two issues are orthogonal.  Let's keep it that way for now.  We
> need them separate anyway for backports.

OK.

>> In terms of approach we would still need to decide on a path for how
>> to do asynch probing for both in-kernel drivers and modules, do we
>> want async_schedule(), or queue_work()? If async_schedule() do we want
>> to use a new domain or a new one shared for all drivers? Priority on
>
> I don't think async_schedule() is the right mechanism for this use
> case as the mechanism is inherently opportunistic.  It also gets
> tangled up with async synchronization at the end of module loading.
>
>> the scheduler was one of my other concerns which we'd need to make
>> right to match existing load on drivers through finit_module() and
>> synchronous probe.
>
> Why do we care about the priority of probing tasks?  Does that
> actually make any meaningful difference?  If so, how?

As I noted before -- I have yet to provide clear metrics, but at least
changing both the init paths and probe from finit_module() to a kthread
certainly caused a measurable time increase. I suspect using
queue_work(system_unbound_wq, async_probe_work) will make probe
slower. I'll get to these metrics this week.

>> > Userland could backport a fix to set the sysctl.  Given that we need
>> > both synchronous and asynchronous behaviors, it's unlikely that we can
>> > come up with a solution which doesn't need cooperation from userland.
>>
>> True and then the timeout would also have to be skipped for device
>> drivers that have the sync_probe flag set, so I guess we'd need to
>
> I'm not sure about skipping for the sync_probe flag.  That seems like an
> implementation detail to me.  Sure, we do that now because we don't
> have a better way of figuring out whether request_module() is waiting
> for it or not but hopefully we'd be able to in the future.

Oh, I was not thinking about just request_module() users but also any
of those stragglers which we might end up finding through runtime
analysis. The alternative right now is that these drivers won't load.
No bueno.

> I think we
> should just make exceptions sensible so that it works fine in practice
> for now (and I don't think that'd be too hard).  So, the only
> cooperation necessary from userland would be just saying "I don't
> wanna wait for device probing on module load."

But we're talking about drivers that have a flag that says 'you gotta
wait sucker'. What do we want systemd to do then? I'd be happy if it
would not send the SIGKILL for these drivers, for example.

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  2:57                                     ` Luis R. Rodriguez
  (?)
@ 2014-09-09  3:03                                       ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-09  3:03 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Mon, Sep 08, 2014 at 07:57:28PM -0700, Luis R. Rodriguez wrote:
> > I think we
> > just should make exceptions sensible so that it works fine in practice
> > for now (and I don't think that'd be too hard).  So, the only
> > cooperation necessary from userland would be just saying "I don't
> > wanna wait for device probing on module load."
> 
> But we're talking about drivers that have a flag that says 'you gotta
> wait sucker', what do we want systemd to do then? I'd be happy if it
> would not send the sigkill for these drivers, for example.

Hah?  Can you give me an example?  I'm having a hard time imagining a
driver with such a requirement given our current driver core
implementation.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  3:03                                       ` Tejun Heo
  (?)
@ 2014-09-09  3:19                                         ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-09  3:19 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Mon, Sep 8, 2014 at 8:03 PM, Tejun Heo <tj@kernel.org> wrote:
> On Mon, Sep 08, 2014 at 07:57:28PM -0700, Luis R. Rodriguez wrote:
>> > I think we
>> > just should make exceptions sensible so that it works fine in practice
>> > for now (and I don't think that'd be too hard).  So, the only
>> > cooperation necessary from userland would be just saying "I don't
>> > wanna wait for device probing on module load."
>>
>> But we're talking about drivers that have a flag that says 'you gotta
>> wait sucker', what do we want systemd to do then? I'd be happy if it
>> would not send the sigkill for these drivers, for example.
>
> Hah?  Can you give me an example?  I'm having a hard time imagining a
> driver with such a requirement given our current driver core
> implementation.

I didn't say I had one in mind, but if you're certain these *shouldn't
exist* that's sufficient by me as well.

OK so I'll respin this series to add a sysctl that enables
async probe for *all drivers* using queue_work(system_unbound_wq), and
for now only use sync probe for request_module() users. We'll address
scheduling issues as they come up. I'll be ignoring built-in drivers.
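
The selection rule described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct probe_request, its fields, and should_probe_async() are hypothetical names, not taken from the patch series, and the real decision would live in the driver core.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the series. */
struct probe_request {
	bool async_sysctl_enabled;  /* the proposed system-wide knob */
	bool from_request_module;   /* load was triggered by request_module() */
	bool is_builtin;            /* built-in drivers are ignored for now */
};

/* Returns true when probe would be deferred to a workqueue. */
static bool should_probe_async(const struct probe_request *req)
{
	if (!req->async_sysctl_enabled)
		return false;   /* knob not set: keep today's behavior */
	if (req->is_builtin)
		return false;   /* built-in drivers are out of scope */
	if (req->from_request_module)
		return false;   /* the caller may be waiting for the device */
	return true;
}
```

In the kernel, a true result would correspond to the queue_work(system_unbound_wq, ...) path, and a false result to probing in the caller's context as happens today.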

On the systemd side of things it should enable this sysctl and for
older kernels what should it do?

 Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  3:19                                         ` Luis R. Rodriguez
  (?)
@ 2014-09-09  3:25                                           ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-09  3:25 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

Hello,

On Mon, Sep 08, 2014 at 08:19:12PM -0700, Luis R. Rodriguez wrote:
> On the systemd side of things it should enable this sysctl and for
> older kernels what should it do?

Supposing the change is backported via -stable, it can try to set the
sysctl on all kernels.  If the knob doesn't exist, the fix is not
there and nothing can be done about it.
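
A minimal userspace sketch of "try to set it, tolerate its absence" might look like the following. The function name try_enable_knob() is hypothetical, and the thread does not name the sysctl, so any path passed in (for instance something under /proc/sys) is an assumption.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to write "1" to a sysctl file.
 * Returns  1 if the knob was set,
 *          0 if the knob does not exist (kernel without the fix),
 *         -1 on any other error (errno describes open/write failures).
 */
static int try_enable_knob(const char *path)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return errno == ENOENT ? 0 : -1;

	ssize_t n = write(fd, "1\n", 2);
	int saved = errno;

	close(fd);
	if (n != 2) {
		errno = saved;
		return -1;
	}
	return 1;
}
```

A caller would treat a return of 0 as "older kernel, nothing to do" rather than as a failure.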

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  1:10                     ` Tejun Heo
  (?)
@ 2014-09-09  5:38                       ` James Bottomley
  -1 siblings, 0 replies; 227+ messages in thread
From: James Bottomley @ 2014-09-09  5:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Luis R. Rodriguez, Lennart Poettering, Kay Sievers,
	Dmitry Torokhov, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Tue, 2014-09-09 at 10:10 +0900, Tejun Heo wrote:
> Hello, Luis.
> 
> On Mon, Sep 08, 2014 at 06:04:23PM -0700, Luis R. Rodriguez wrote:
> > > I have no idea how the selection should be.  It could be per-insmod or
> > > maybe just a system-wide flag with explicit exceptions marked on
> > > drivers is good enough.  I don't know.
> > 
> > It's perfectly understandable if we don't know what path to take yet
> > and it's also understandable for it to take time to figure out --
> > meanwhile though systemd has already merged a policy of a 30 second
> > timeout for *all drivers*, so we therefore need:
> 
> I'm not too convinced this is such a difficult problem to figure out.
> We already have most of the logic in place and the only thing missing is
> how to switch it.  Wouldn't something like the following work?
> 
> * Add a sysctl knob to enable asynchronous device probing on module
>   load and enable asynchronous probing globally if the knob is set.
> 
> * Identify cases which can't be asynchronous and make them
>   synchronous.  e.g. keep who's doing request_module() and avoid
>   asynchronous probing if current is probing one of those.

What's wrong with just fixing systemd?  Arbitrary timeouts in init
scripts for system bring up are plain wrong ... I thought we had this
sorted out ten years ago when we were first having the arguments about
how long to wait for root; I'm surprised it's coming back again.

If we want to sort out some sync/async mechanism for probing devices, as
an agreement between the init systems and the kernel, that's fine, but
it's a to-be-negotiated enhancement.  For the current bug fix, just fix
the component that broke ... which would be systemd.

James



^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  5:38                       ` James Bottomley
  (?)
@ 2014-09-09 19:16                         ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-09 19:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Mon, Sep 8, 2014 at 10:38 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Tue, 2014-09-09 at 10:10 +0900, Tejun Heo wrote:
>> Hello, Luis.
>>
>> On Mon, Sep 08, 2014 at 06:04:23PM -0700, Luis R. Rodriguez wrote:
>> > > I have no idea how the selection should be.  It could be per-insmod or
>> > > maybe just a system-wide flag with explicit exceptions marked on
>> > > drivers is good enough.  I don't know.
>> >
>> > It's perfectly understandable if we don't know what path to take yet
>> > and it's also understandable for it to take time to figure out --
>> > meanwhile though systemd has already merged a policy of a 30 second
>> > timeout for *all drivers*, so we therefore need:
>>
>> I'm not too convinced this is such a difficult problem to figure out.
>> We already have most of the logic in place and the only thing missing is
>> how to switch it.  Wouldn't something like the following work?
>>
>> * Add a sysctl knob to enable asynchronous device probing on module
>>   load and enable asynchronous probing globally if the knob is set.
>>
>> * Identify cases which can't be asynchronous and make them
>>   synchronous.  e.g. keep who's doing request_module() and avoid
>>   asynchronous probing if current is probing one of those.
>
> What's wrong with just fixing systemd?  Arbitrary timeouts in init
> scripts for system bring up are plain wrong ... I thought we had this
> sorted out ten years ago when we were first having the arguments about
> how long to wait for root; I'm surprised it's coming back again.

By design it seems systemd should not allow worker processes to block
indefinitely, and in fact it currently uses the same timeout for all
types of worker processes. I last recommended a multiplier so that
systemd could at least distinguish between types of worker process and
adjust the timeout accordingly, using an enum to classify them; kmod,
for example, would be one type of command:

http://lists.freedesktop.org/archives/systemd-devel/2014-August/021852.html
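
As a rough illustration of the multiplier idea (not the code from the linked proposal), a per-type timeout could be derived as below. The enum values, the multiplier of 4 for kmod, and the function name are all hypothetical; only the 30 second base comes from this thread.

```c
/* Hypothetical classification of worker commands. */
enum worker_type {
	WORKER_GENERIC,
	WORKER_KMOD,
	WORKER_TYPE_MAX
};

/* Illustrative multipliers only; real values would need measurement. */
static const unsigned int timeout_multiplier[WORKER_TYPE_MAX] = {
	[WORKER_GENERIC] = 1,
	[WORKER_KMOD]    = 4,
};

/* Scale the base timeout by the multiplier for this worker type. */
static unsigned int worker_timeout_sec(enum worker_type type,
				       unsigned int base_sec)
{
	if ((unsigned int)type >= WORKER_TYPE_MAX)
		return base_sec;  /* unknown type: fall back to the base */
	return base_sec * timeout_multiplier[type];
}
```

With a 30 second base, a generic worker keeps the 30 second limit while a kmod worker in this sketch would get 120 seconds.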

This was deemed to introduce unnecessary complexity, but I believe
that was before we realized the timeout was penalizing kmod usage
unfairly: the original assumption was that only init would be
penalized, which is incorrect given that we batch both init +
probe together. I have been relaying updates back on that thread as
this discussion on the issues found with the timeout moves along,
but haven't gotten feedback yet as to which path folks on systemd
would like to take in light of recent discussions / clarifications.
Perhaps your arguments might help folks here reconsider things a bit
as well.

If we want *tight* integration between init system / kernel these
discussions are necessary not only when we find issues but also should
be part of the design phase for major changes.

> If we want to sort out some sync/async mechanism for probing devices, as
> an agreement between the init systems and the kernel, that's fine, but
> it's a to-be-negotiated enhancement.

Unfortunately, as Tejun notes, the train has already left, with
assumptions on this already made. I'm afraid distributions that want to
avoid this sigkill, at least on the kernel front, will have to work around
the issue either in systemd, by increasing the default timeout (now
possible thanks to Hannes' changes), or by some other means, such as the
combination of a modified non-chatty version of this patch + a check
at the end of load_module() as mentioned earlier on these threads.

> For the current bug fix, just fix the component that broke ... which
> would be systemd.

For new systems it seems the proposed fix is to have systemd tell the
kernel, through a sysctl, what it thought it should be seeing, which is
all pure async probes. We'd then do async probe on all modules
unless a driver is specifically flagged as needing to run synchronously
(we'll enable this for request_firmware() users for example to start
off with).

 Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09 19:16                         ` Luis R. Rodriguez
  (?)
@ 2014-09-09 19:35                           ` James Bottomley
  -1 siblings, 0 replies; 227+ messages in thread
From: James Bottomley @ 2014-09-09 19:35 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Tejun Heo, Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Tue, 2014-09-09 at 12:16 -0700, Luis R. Rodriguez wrote:
> On Mon, Sep 8, 2014 at 10:38 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > If we want to sort out some sync/async mechanism for probing devices, as
> > an agreement between the init systems and the kernel, that's fine, but
> > its a to-be negotiated enhancement.
> 
> Unfortunately as Tejun notes the train has left which already made
> assumptions on this.

Well, that's why it's a bug.  It's a material regression impacting
users.

>  I'm afraid distributions that want to avoid this
> sigkill at least on the kernel front will have to work around this
> issue either on systemd by increasing the default timeout which is now
> possible thanks to Hannes' changes or by some other means such as the
> combination of a modified non-chatty version of this patch + a check
> at the end of load_module() as mentioned earlier on these threads.

Increasing the default timeout in systemd seems like the obvious bug fix
to me.  If the patch exists already, having distros that want it use it
looks to be correct ... not every bug is a kernel bug, after all.

Negotiating a probe vs init split for drivers is fine too, but it's a
longer term thing rather than a bug fix.

> > For the current bug fix, just fix  the component that broke ... which would be systemd.
> 
> For new systems it seems the proposed fix is to have systemd tell the
> kernel what it thought it should be seeing and that is all pure async
> probes through a sysctl, and then we'd do async probe on all modules
> unless a driver is specifically flagged with a need to run synchronous
> (we'll enable this for request_firmware() users for example to start
> off with).

I don't have very strong views on this one.  However, I've got to say
from a systems point of view that if the desire is to flag when the
module is having problems, probing and initializing synchronously in a
thread spawned by init, which the init process can watchdog and thus
flash up warning messages for, seems more straightforward than an
elaborate asynchronous mechanism with completion signalling which
achieves the same thing in a more complicated (and thus bug-prone)
fashion.
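
For illustration only, the watchdog idea above can be sketched in user
space roughly as follows. This is a minimal sketch under stated
assumptions, not how any init system implements it: the function name
run_with_watchdog(), the 10 ms polling interval and the warning text are
all hypothetical. The point it shows is that the supervisor keeps
waiting and only warns; it never sends a kill signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run argv[0] in a child process and wait for it
 * without ever killing it.  For every warn_every_ms milliseconds that
 * the child is still running, print a warning and bump *warnings.
 * Returns the child's exit status, or -1 on error or abnormal exit.
 */
int run_with_watchdog(char *const argv[], unsigned int warn_every_ms,
		      unsigned int *warnings)
{
	const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms poll */
	unsigned int elapsed_ms = 0;
	unsigned int next_warn_ms = warn_every_ms;
	int status;
	pid_t pid, ret;

	assert(argv != NULL && argv[0] != NULL && warnings != NULL);

	*warnings = 0;
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127); /* only reached if exec failed */
	}

	for (;;) {
		ret = waitpid(pid, &status, WNOHANG);
		if (ret == pid)
			break;          /* child finished */
		if (ret < 0)
			return -1;
		nanosleep(&tick, NULL);
		elapsed_ms += 10;       /* approximate elapsed time */
		if (warn_every_ms && elapsed_ms >= next_warn_ms) {
			fprintf(stderr,
				"watchdog: %s still running after ~%u ms\n",
				argv[0], elapsed_ms);
			(*warnings)++;
			next_warn_ms += warn_every_ms;
		}
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass the module-loading command as argv and choose the
warning interval; a slow probe then produces log noise instead of a
half-initialized driver.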

James



^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09 19:35                           ` James Bottomley
  (?)
@ 2014-09-09 20:45                             ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-09 20:45 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev, systemd-devel

On Tue, Sep 9, 2014 at 12:35 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Tue, 2014-09-09 at 12:16 -0700, Luis R. Rodriguez wrote:
>> On Mon, Sep 8, 2014 at 10:38 PM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> > If we want to sort out some sync/async mechanism for probing devices, as
>> > an agreement between the init systems and the kernel, that's fine, but
>> > its a to-be negotiated enhancement.
>>
>> Unfortunately as Tejun notes the train has left which already made
>> assumptions on this.
>
> Well, that's why it's a bug.  It's a material regression impacting
> users.

Indeed. I believe the issue with this regression, however, is that the
original commit e64fae55 (January 2012) was accepted as a real
regression only by *kernel folks*, and only recently. More than two
years of design and assumptions have grown on top of that original
commit. I'm not sure whether *systemd folks* yet believe it was a
design regression.

>>  I'm afraid distributions that want to avoid this
>> sigkill at least on the kernel front will have to work around this
>> issue either on systemd by increasing the default timeout which is now
>> possible thanks to Hannes' changes or by some other means such as the
>> combination of a modified non-chatty version of this patch + a check
>> at the end of load_module() as mentioned earlier on these threads.
>
> Increasing the default timeout in systemd seems like the obvious bug fix
> to me.  If the patch exists already, having distros that want it use it
> looks to be correct ... not every bug is a kernel bug, after all.

It's merged upstream in systemd now, along with a few fixes on top of
it. I also see Kay merged a change of the default timeout to 60 seconds
on August 30. It's unclear whether these discussions had any impact on
that decision or whether it was just because udev firmware loading has
now been ripped out. I'll note that the new 60-second timeout wouldn't
suffice for cxgb4 even if it didn't do firmware loading; its probe
takes over one full minute.

> Negotiating a probe vs init split for drivers is fine too, but it's a
> longer term thing rather than a bug fix.

Indeed. The multiplier I proposed for the timeout of the different
types of built-in commands was deemed too complex, but I saw no
alternatives proposed, despite my interest in working on one and the
clarifications noting that this was a design regression. Not quite sure
what else I could have done here. I'm interested in learning what the
better approach is for the future: if we want to marry init + kernel,
we need a smooth way to discuss design without getting worked up about
it or taking it personally. I really want this to work as I personally
like systemd so far.
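
To make the multiplier idea concrete, here is a minimal sketch. It is
not taken from the systemd patch referenced earlier in the thread; the
enum names, the multiplier values and the function name
effective_timeout_sec() are assumptions chosen purely for illustration.
It only shows the shape of the proposal: classify the worker command,
then scale one base timeout per class.

```c
#include <assert.h>

/* Hypothetical classification of worker commands (names are
 * illustrative, not from systemd). */
enum worker_type {
	WORKER_GENERIC,    /* ordinary helper command */
	WORKER_KMOD,       /* built-in module loading: init + probe */
	WORKER_TYPE_COUNT
};

/* Assumed multipliers: module loading gets more headroom because the
 * kernel runs init and probe back to back. */
static const unsigned int timeout_multiplier[WORKER_TYPE_COUNT] = {
	[WORKER_GENERIC] = 1,
	[WORKER_KMOD]    = 10,
};

/* Scale the base timeout (in seconds) by the worker's class. */
unsigned int effective_timeout_sec(enum worker_type type,
				   unsigned int base_sec)
{
	assert((unsigned int)type < WORKER_TYPE_COUNT);
	return base_sec * timeout_multiplier[type];
}
```

With these assumed values, a 30 second base timeout would stay at 30
seconds for a generic worker and become 300 seconds for a kmod worker.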

>> > For the current bug fix, just fix  the component that broke ... which would be systemd.
>>
>> For new systems it seems the proposed fix is to have systemd tell the
>> kernel what it thought it should be seeing and that is all pure async
>> probes through a sysctl, and then we'd do async probe on all modules
>> unless a driver is specifically flagged with a need to run synchronous
>> (we'll enable this for request_firmware() users for example to start
>> off with).
>
> I don't have very strong views on this one.  However, I've got to say
> from a systems point of view that if the desire is to flag when the
> module is having problems, probing and initializing synchronously in a
> thread spawned by init which the init process can watchdog and thus can
> flash up warning messages seems to be more straightforwards

Indeed; however, it was not understood that module loading did init +
probe synchronously, and what you recommend is also what I was hoping
systemd *should do* instead of a hard sigkill at the default timeout.

> than an
> elaborate asynchronous mechanism with completion signalling which
> achieves the same thing in a more complicated (and thus bug prone)
> fashion.

I couldn't agree with you more. It takes two to tango though.

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09 19:35                           ` James Bottomley
  (?)
@ 2014-09-09 21:42                             ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-09 21:42 UTC (permalink / raw)
  To: James Bottomley
  Cc: Luis R. Rodriguez, Lennart Poettering, Kay Sievers,
	Dmitry Torokhov, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

Hey, James.

On Tue, Sep 09, 2014 at 12:35:46PM -0700, James Bottomley wrote:
> I don't have very strong views on this one.  However, I've got to say
> from a systems point of view that if the desire is to flag when the
> module is having problems, probing and initializing synchronously in a
> thread spawned by init which the init process can watchdog and thus can
> flash up warning messages seems to be more straightforwards than an
> elaborate asynchronous mechanism with completion signalling which
> achieves the same thing in a more complicated (and thus bug prone)
> fashion.

We no longer report an error back on probe failure at module load.  It
used to make sense to fail the module load on probe failure when the
hardware was a lot simpler and drivers did their own device
enumeration.  With the current bus / device setup, it doesn't make any
sense and driver core silently suppresses all probe failures.  There's
nothing the probing thread can monitor anymore.
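
As a rough illustration of that behavior, here is a toy user-space
model. It is not kernel code and does not mirror the driver core's real
data structures; every name in it (toy_device, toy_driver,
toy_driver_register(), toy_failing_probe()) is made up for the sketch.
It only shows the consequence described above: registration returns
success even when every probe fails, so the caller learns nothing from
the return value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: all names here are illustrative. */
struct toy_device {
	const char *name;
	int bound;              /* set to 1 once a probe succeeded */
};

struct toy_driver {
	const char *name;
	int (*probe)(struct toy_device *dev);
};

/* Try the driver against every device on the "bus".  Probe errors are
 * dropped here, so the caller never sees them. */
int toy_driver_register(const struct toy_driver *drv,
			struct toy_device *devs, size_t ndevs)
{
	size_t i;

	assert(drv != NULL && drv->probe != NULL);
	for (i = 0; i < ndevs; i++) {
		if (drv->probe(&devs[i]) == 0)
			devs[i].bound = 1;
		/* a non-zero return is deliberately ignored */
	}
	return 0; /* registration, and thus "module load", succeeds */
}

/* Example probe that always fails, as if the hardware were absent. */
int toy_failing_probe(struct toy_device *dev)
{
	(void)dev;
	return -ENODEV;
}
```

In this model the only way to find out whether a device came up is to
look at the device itself (the bound flag), which is the analogue of
listening for device events rather than waiting on the loader.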

In that sense, we already separated out device probing from module
loading simply because the hardware reality mandated it and we have
dynamic mechanisms to listen for device probes exactly for the same
reason, so I think it makes sense to separate out the waiting too, at
least in the long term.  In a modern dynamic setup, the waits are
essentially arbitrary and don't buy us anything.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09 19:16                         ` Luis R. Rodriguez
  (?)
@ 2014-09-09 22:00                           ` Jiri Kosina
  -1 siblings, 0 replies; 227+ messages in thread
From: Jiri Kosina @ 2014-09-09 22:00 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: James Bottomley, Tejun Heo, Lennart Poettering, Kay Sievers,
	Dmitry Torokhov, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Tue, 9 Sep 2014, Luis R. Rodriguez wrote:

> By design it seems systemd should not allow worker processes to block
> indefinitely and in fact it currently uses the same timeout for all
> types of worker processes. 

And I whole-heartedly believe this is something that fundamentally needs 
to be addressed in systemd, not in the kernel.

This approach is actually introducing user-visible regressions. Look, for 
example, exec() never times out. Therefore if your system is on its knees, 
heavily overloaded (or completely broken), you are likely to be able to 
`reboot' it, because exec("/sbin/reboot") ultimately succeeds.

But with all the timeouts, dbus, "Failed to issue method call: Did 
not receive a reply" messages, this is getting close to impossible.

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09 21:42                             ` Tejun Heo
  (?)
@ 2014-09-09 22:26                               ` James Bottomley
  -1 siblings, 0 replies; 227+ messages in thread
From: James Bottomley @ 2014-09-09 22:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Luis R. Rodriguez, Lennart Poettering, Kay Sievers,
	Dmitry Torokhov, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Wed, 2014-09-10 at 06:42 +0900, Tejun Heo wrote:
> Hey, James.
> 
> On Tue, Sep 09, 2014 at 12:35:46PM -0700, James Bottomley wrote:
> > I don't have very strong views on this one.  However, I've got to say
> > from a systems point of view that if the desire is to flag when the
> > module is having problems, probing and initializing synchronously in a
> > thread spawned by init which the init process can watchdog and thus can
> > flash up warning messages seems to be more straightforwards than an
> > elaborate asynchronous mechanism with completion signalling which
> > achieves the same thing in a more complicated (and thus bug prone)
> > fashion.
> 
> We no longer report back error on probe failure on module load.

Yes, we do; for every probe failure of a device on a driver we'll print
a warning (see drivers/base/dd.c).  Now if someone is proposing we
should report this in a better fashion, that's probably a good idea, but
I must have missed that patch.

>   It
> used to make sense to indicate error for module load on probe failure
> when the hardware was a lot simpler and drivers did their own device
> enumeration.  With the current bus / device setup, it doesn't make any
> sense and driver core silently suppresses all probe failures.  There's
> nothing the probing thread can monitor anymore.

Except the length of time taken to probe.  That seems to be what systemd
is interested in, hence this whole thread, right?

> In that sense, we already separated out device probing from module
> loading simply because the hardware reality mandated it and we have
> dynamic mechanisms to listen for device probes exactly for the same
> reason, so I think it makes sense to separate out the waiting too, at
> least in the long term.  In a modern dynamic setup, the waits are
> essentially arbitrary and doesn't buy us anything.

But that's nothing to do with sync or async.  Nowadays we register a
driver, the driver may bind to multiple devices.  If one of those
devices encounters an error during probe, we just report the fact in
dmesg and move on.  The module_init thread currently returns when all
the probe routines for all enumerated devices have been called, so
module_init has no indication of any failures (because they might be
mixed with successes); successes are indicated as the device appears but
we have nothing other than the kernel log to indicate the failures.  How
does moving to async probing alter this?  It doesn't as far as I can
see, except that module_init returns earlier but now we no longer have
an indication of when the probe completes, so we have to add yet another
mechanism to tell us if we're interested in that.  I really don't see
what this buys us.

James



^ permalink raw reply	[flat|nested] 227+ messages in thread
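
The behaviour described in the message above can be sketched as a small userspace toy model. This is not kernel code: `register_driver`, `scrape_probe_failures`, the `log` and `appeared` lists, and the device names are hypothetical, and the log line format is an assumption modelled on the warning printed in drivers/base/dd.c. The sketch shows registration returning 0 even when one probe fails, so the only record of the failure is the log line a scraper can pick up.

```python
import re

# Assumed line format, modelled on the probe-failure warning in
# drivers/base/dd.c; treat the exact wording as an assumption.
PROBE_FAIL_RE = re.compile(
    r"^(?P<driver>\S+): probe of (?P<device>\S+) "
    r"failed with error (?P<errno>-?\d+)$"
)


def register_driver(name, probe, devices, log, appeared):
    """Toy model of synchronous driver registration (hypothetical).

    Calls probe() once per device. A failure is only written to `log`;
    a success makes the device show up in `appeared`. The return value
    is 0 either way, so the caller cannot tell the two cases apart.
    """
    for dev in devices:
        err = probe(dev)
        if err:
            log.append(f"{name}: probe of {dev} failed with error {err}")
        else:
            appeared.append(dev)
    return 0


def scrape_probe_failures(log):
    """Return (driver, device, errno) for each probe-failure line."""
    failures = []
    for line in log:
        m = PROBE_FAIL_RE.match(line)
        if m:
            failures.append((m["driver"], m["device"], int(m["errno"])))
    return failures
```

In this model the caller of `register_driver` learns about successes from `appeared` and about failures only by scanning `log`, which mirrors the argument that module_init itself carries no failure indication.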

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09 22:26                               ` James Bottomley
  (?)
@ 2014-09-09 22:41                                 ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-09 22:41 UTC (permalink / raw)
  To: James Bottomley
  Cc: Luis R. Rodriguez, Lennart Poettering, Kay Sievers,
	Dmitry Torokhov, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

Hello,

On Tue, Sep 09, 2014 at 03:26:02PM -0700, James Bottomley wrote:
> > We no longer report back error on probe failure on module load.
> 
> Yes, we do; for every probe failure of a device on a driver we'll print
> a warning (see drivers/base/dd.c).  Now if someone is proposing we
> should report this in a better fashion, that's probably a good idea, but
> I must have missed that patch.

We can do printks all the same from anywhere.  There's nothing special
about printing from the module loading thread.  The only way to
actually take advantage of the synchronicity would be propagating
error return to the waiting issuer, which we used to do but no longer
can.

> >   It
> > used to make sense to indicate error for module load on probe failure
> > when the hardware was a lot simpler and drivers did their own device
> > enumeration.  With the current bus / device setup, it doesn't make any
> > sense and driver core silently suppresses all probe failures.  There's
> > nothing the probing thread can monitor anymore.
> 
> Except the length of time taken to probe.  That seems to be what systemd
> is interested in, hence this whole thread, right?

No, systemd in this case isn't interested in the time taken to probe
at all.  It is expecting module load to just do that - load the
module.  Modern userlands, systemd or not, no longer depend on or make
use of the wait.

> But that's nothing to do with sync or async.  Nowadays we register a
> driver, the driver may bind to multiple devices.  If one of those
> devices encounters an error during probe, we just report the fact in
> dmesg and move on.  The module_init thread currently returns when all
> the probe routines for all enumerated devices have been called, so
> module_init has no indication of any failures (because they might be
> mixed with successes); successes are indicated as the device appears but
> we have nothing other than the kernel log to indicate the failures.  How
> does moving to async probing alter this?  It doesn't as far as I can
> see, except that module_init returns earlier but now we no longer have
> an indication of when the probe completes, so we have to add yet another
> mechanism to tell us if we're interested in that.  I really don't see
> what this buys us.

The thing is that we have to have a dynamic mechanism to listen for
device attachments no matter what and such a mechanism has been in place
for a long time at this point.  The synchronous wait simply doesn't
serve any purpose anymore and kinda gets in the way in that it makes
it a possibly extremely slow process to tell whether loading of a
module succeeded or not because the wait for the initial round of
probe is piggybacked.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread
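
The separation argued for in the message above (load returns at once, device attachments are observed through a separate dynamic channel) can be sketched as a userspace toy model. Everything here is hypothetical: `load_module_async`, `listen`, and the `events` queue stand in for module loading and a uevent-style notification channel and are not kernel or udev APIs. The `"done"` sentinel exists only so the demo can terminate; the real mechanisms discussed in this thread have no such end-of-probe signal.

```python
import queue
import threading


def load_module_async(devices, probe, events):
    """Toy model: return immediately, probe devices on a worker thread.

    Each successful probe posts an ("add", device) event. The return
    value only means "module loaded" and says nothing about probes.
    """
    def worker():
        for dev in devices:
            if probe(dev) == 0:
                events.put(("add", dev))
        events.put(("done", None))  # demo-only sentinel, see lead-in

    threading.Thread(target=worker, daemon=True).start()
    return 0


def listen(events, timeout=5.0):
    """Collect attached devices until the demo sentinel arrives."""
    attached = []
    while True:
        kind, dev = events.get(timeout=timeout)
        if kind == "done":
            return attached
        attached.append(dev)
```

A caller would create `events = queue.Queue()`, call `load_module_async(...)`, and then consume attachments with `listen(events)` whenever it cares about them, rather than blocking inside the load call.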

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09 22:41                                 ` Tejun Heo
  (?)
@ 2014-09-09 22:46                                   ` James Bottomley
  -1 siblings, 0 replies; 227+ messages in thread
From: James Bottomley @ 2014-09-09 22:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Luis R. Rodriguez, Lennart Poettering, Kay Sievers,
	Dmitry Torokhov, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Wed, 2014-09-10 at 07:41 +0900, Tejun Heo wrote:
> Hello,
> 
> On Tue, Sep 09, 2014 at 03:26:02PM -0700, James Bottomley wrote:
> > > We no longer report back error on probe failure on module load.
> > 
> > Yes, we do; for every probe failure of a device on a driver we'll print
> > a warning (see drivers/base/dd.c).  Now if someone is proposing we
> > should report this in a better fashion, that's probably a good idea, but
> > I must have missed that patch.
> 
> We can do printks all the same from anywhere.  There's nothing special
> about printing from the module loading thread.  The only way to
> actually take advantage of the synchronisity would be propagating
> error return to the waiting issuer, which we used to do but no longer
> can.

If you want the return of an individual device probe a log scraper gives
it to you ... and nothing else does currently.  The advantage of the
printk in dd.c is that it's standard for everything and can be scanned
for ... if you take that out, you'll get complaints about the lack of
standard messages (you'd be surprised at the number of enterprise
monitoring systems that actually do log scraping).

> > >   It
> > > used to make sense to indicate error for module load on probe failure
> > > when the hardware was a lot simpler and drivers did their own device
> > > enumeration.  With the current bus / device setup, it doesn't make any
> > > sense and driver core silently suppresses all probe failures.  There's
> > > nothing the probing thread can monitor anymore.
> > 
> > Except the length of time taken to probe.  That seems to be what systemd
> > is interested in, hence this whole thread, right?
> 
> No, systemd in this case isn't interested in the time taken to probe
> at all.  It is expecting module load to just do that - load the
> module.  Modern userlands, systemd or not, no longer depend on or make
> use of the wait.

So what's the problem?  It can just fire and forget; that's what fork()
is for.

> > But that's nothing to do with sync or async.  Nowadays we register a
> > driver, the driver may bind to multiple devices.  If one of those
> > devices encounters an error during probe, we just report the fact in
> > dmesg and move on.  The module_init thread currently returns when all
> > the probe routines for all enumerated devices have been called, so
> > module_init has no indication of any failures (because they might be
> > mixed with successes); successes are indicated as the device appears but
> > we have nothing other than the kernel log to indicate the failures.  How
> > does moving to async probing alter this?  It doesn't as far as I can
> > see, except that module_init returns earlier but now we no longer have
> > an indication of when the probe completes, so we have to add yet another
> > mechanism to tell us if we're interested in that.  I really don't see
> > what this buys us.
> 
> The thing is that we have to have dynamic mechanism to listen for
> device attachments no matter what and such mechanism has been in place
> for a long time at this point.  The synchronous wait simply doesn't
> serve any purpose anymore and kinda gets in the way in that it makes
> it a possibly extremely slow process to tell whether loading of a
> module succeeded or not because the wait for the initial round of
> probe is piggybacked.

OK, so we just fire and forget in userland ... why bother inventing an
elaborate new infrastructure in the kernel to do exactly what

modprobe <mod> &

would do?

James



^ permalink raw reply	[flat|nested] 227+ messages in thread
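
The userland alternative raised in the message above, starting the load in the background and optionally watching how long it takes, can be sketched roughly as follows. This is a minimal sketch under stated assumptions: `fire_and_forget` and `watchdog` are hypothetical helper names, the 30-second timeout is arbitrary, and the demo runs a harmless stand-in command instead of modprobe.

```python
import subprocess
import sys


def fire_and_forget(argv):
    """Start a command without waiting for it, like `cmd &` in a shell."""
    return subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )


def watchdog(proc, timeout):
    """Return the exit status, or None if still running after `timeout` s.

    On timeout the process is left alone: the caller may warn, but
    nothing is killed.
    """
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Stand-in command; a real caller would pass ["modprobe", "<mod>"].
    proc = fire_and_forget([sys.executable, "-c", "pass"])
    status = watchdog(proc, timeout=30)
    print("still running" if status is None else f"exited with {status}")
```

The design choice mirrored here is that a slow command produces only a warning condition (`None`) for the supervisor to report, rather than a forced kill.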

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
@ 2014-09-09 22:46                                   ` James Bottomley
  0 siblings, 0 replies; 227+ messages in thread
From: James Bottomley @ 2014-09-09 22:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Luis R. Rodriguez, Lennart Poettering, Kay Sievers,
	Dmitry Torokhov, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy

On Wed, 2014-09-10 at 07:41 +0900, Tejun Heo wrote:
> Hello,
> 
> On Tue, Sep 09, 2014 at 03:26:02PM -0700, James Bottomley wrote:
> > > We no longer report back error on probe failure on module load.
> > 
> > Yes, we do; for every probe failure of a device on a driver we'll print
> > a warning (see drivers/base/dd.c).  Now if someone is proposing we
> > should report this in a better fashion, that's probably a good idea, but
> > I must have missed that patch.
> 
> We can do printks all the same from anywhere.  There's nothing special
> about printing from the module loading thread.  The only way to
> actually take advantage of the synchronisity would be propagating
> error return to the waiting issuer, which we used to do but no longer
> can.

If you want the return of an individual device probe a log scraper gives
it to you ... and nothing else does currently.  The advantage of the
printk in dd.c is that it's standard for everything and can be scanned
for ... if you take that out, you'll get complaints about the lack of
standard messages (you'd be surprised at the number of enterprise
monitoring systems that actually do log scraping).

> > >   It
> > > used to make sense to indicate error for module load on probe failure
> > > when the hardware was a lot simpler and drivers did their own device
> > > enumeration.  With the current bus / device setup, it doesn't make any
> > > sense and driver core silently suppresses all probe failures.  There's
> > > nothing the probing thread can monitor anymore.
> > 
> > Except the length of time taken to probe.  That seems to be what systemd
> > is interested in, hence this whole thread, right?
> 
> No, systemd in this case isn't interested in the time taken to probe
> at all.  It is expecting module load to just do that - load the
> module.  Modern userlands, systemd or not, no longer depend on or make
> use of the wait.

So what's the problem?  It can just fire and forget; that's what fork()
is for.

> > But that's nothing to do with sync or async.  Nowadays we register a
> > driver, the driver may bind to multiple devices.  If one of those
> > devices encounters an error during probe, we just report the fact in
> > dmesg and move on.  The module_init thread currently returns when all
> > the probe routines for all enumerated devices have been called, so
> > module_init has no indication of any failures (because they might be
> > mixed with successes); successes are indicated as the device appears but
> > we have nothing other than the kernel log to indicate the failures.  How
> > does moving to async probing alter this?  It doesn't as far as I can
> > see, except that module_init returns earlier but now we no longer have
> > an indication of when the probe completes, so we have to add yet another
> > mechanism to tell us if we're interested in that.  I really don't see
> > what this buys us.
> 
> The thing is that we have to have dynamic mechanism to listen for
> device attachments no matter what and such mechanism has been in place
> for a long time at this point.  The synchronous wait simply doesn't
> serve any purpose anymore and kinda gets in the way in that it makes
> it a possibly extremely slow process to tell whether loading of a
> module succeeded or not because the wait for the initial round of
> probe is piggybacked.

OK, so we just fire and forget in userland ... why bother inventing an
elaborate new infrastructure in the kernel to do exactly what

modprobe <mod> &

would do?

James

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09 22:46                                   ` James Bottomley
  (?)
@ 2014-09-09 22:52                                     ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-09 22:52 UTC (permalink / raw)
  To: James Bottomley
  Cc: Luis R. Rodriguez, Lennart Poettering, Kay Sievers,
	Dmitry Torokhov, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

Hello, James.

On Tue, Sep 09, 2014 at 03:46:23PM -0700, James Bottomley wrote:
> If you want the return of an individual device probe a log scraper gives
> it to you ... and nothing else does currently.  The advantage of the
> printk in dd.c is that it's standard for everything and can be scanned
> for ... if you take that out, you'll get complaints about the lack of
> standard messages (you'd be surprised at the number of enterprise
> monitoring systems that actually do log scraping).

Why would a log scraper care about which task is printing the messages?
The printk can stay there.  There's nothing wrong with it.  Log
scrapers tend to be asynchronous in nature but if a log scraper wants
to operate synchronously for whatever reason, it can simply not turn
on async probing.

> OK, so we just fire and forget in userland ... why bother inventing an
> elaborate new infrastructure in the kernel to do exactly what
> 
> modprobe <mod> &
> 
> would do?

I think the argument there is that the issuer wants to know whether
such operations succeeded or not and wants to report and record the
result and possibly take other actions in response.  We're currently
mixing wait and error reporting for one type of operation with wait
for another.  I'm not saying it's a fatal flaw or anything but it can
get in the way.
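
The two userland styles contrasted above can be put side by side. This is a
minimal userspace sketch, not anything taken from the patch series; the helper
names spawn_detached() and run_and_wait() are made up for illustration. The
first is the "modprobe <mod> &" style, where the issuer returns at once and
learns nothing; the second blocks and gets the exit status back, so the wait
and the error report arrive together.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fire and forget: start the command and return immediately.
 * Returns the child's pid, or -1 if fork() failed.  The caller learns
 * nothing about success unless it reaps the pid later (an unreaped
 * child stays around as a zombie until the parent exits).
 */
pid_t spawn_detached(char *const argv[])
{
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);
	pid = fork();
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* only reached if exec failed */
	}
	return pid;
}

/*
 * Synchronous: start the command and block until it exits.
 * Returns the exit status (0..255), or -1 if fork()/waitpid() failed
 * or the child was terminated by a signal.
 */
int run_and_wait(char *const argv[])
{
	int status;
	pid_t pid = spawn_detached(argv);

	if (pid < 0)
		return -1;
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With argv set to { "modprobe", "<mod>", NULL }, run_and_wait() reports whether
the load itself failed, but it also sits through whatever the module's init
path waits for; spawn_detached() avoids the wait and drops the result.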

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09 22:46                                   ` James Bottomley
  (?)
@ 2014-09-09 23:01                                     ` Dmitry Torokhov
  -1 siblings, 0 replies; 227+ messages in thread
From: Dmitry Torokhov @ 2014-09-09 23:01 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Luis R. Rodriguez, Lennart Poettering, Kay Sievers,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Tuesday, September 09, 2014 03:46:23 PM James Bottomley wrote:
> On Wed, 2014-09-10 at 07:41 +0900, Tejun Heo wrote:
> > 
> > The thing is that we have to have dynamic mechanism to listen for
> > device attachments no matter what and such mechanism has been in place
> > for a long time at this point.  The synchronous wait simply doesn't
> > serve any purpose anymore and kinda gets in the way in that it makes
> > it a possibly extremely slow process to tell whether loading of a
> > module succeeded or not because the wait for the initial round of
> > probe is piggybacked.
> 
> OK, so we just fire and forget in userland ... why bother inventing an
> elaborate new infrastructure in the kernel to do exactly what
> 
> modprobe <mod> &
> 
> would do?

Just so we do not forget: we also want the no-modules case to be able
to probe asynchronously so that a slow device does not stall kernel booting.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  3:25                                           ` Tejun Heo
  (?)
@ 2014-09-09 23:03                                             ` Tejun Heo
  -1 siblings, 0 replies; 227+ messages in thread
From: Tejun Heo @ 2014-09-09 23:03 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Tue, Sep 09, 2014 at 12:25:29PM +0900, Tejun Heo wrote:
> Hello,
> 
> On Mon, Sep 08, 2014 at 08:19:12PM -0700, Luis R. Rodriguez wrote:
> > On the systemd side of things it should enable this sysctl and for
> > older kernels what should it do?
> 
> Supposing the change is backported via -stable, it can try to set the
> sysctl on all kernels.  If the knob doesn't exist, the fix is not
> there and nothing can be done about it.

The more I think about it, the more I think this should be a
per-insmod instance thing rather than a system-wide switch.  Currently
the kernel param code doesn't allow a generic param outside the ones
specified by the module itself but adding support for something like
driver.async_load=1 shouldn't be too difficult, applying that to
existing systems shouldn't be much more difficult than a system-wide
switch, and it'd be significantly cleaner than fiddling with driver
blacklist.
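
To make the per-insmod idea concrete, here is a rough userspace sketch of
picking a generic option out of a module's argument string. It assumes the
option is spelled exactly "driver.async_load=1", which is only the example
name floated above, and the helper wants_async_load() is hypothetical. It is
not the kernel's parameter parser and ignores quoting.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical generic option name, taken from the example in the mail. */
#define ASYNC_OPT "driver.async_load"

/*
 * Return true if the whitespace-separated argument string contains the
 * exact token "driver.async_load=1".  Simplified on purpose: quoted
 * values containing spaces are not handled.
 */
bool wants_async_load(const char *args)
{
	const size_t klen = strlen(ASYNC_OPT);

	assert(args != NULL);
	while (*args) {
		size_t len;

		args += strspn(args, " \t\n");	/* skip separators */
		len = strcspn(args, " \t\n");	/* length of this token */
		if (len == klen + 2 &&
		    strncmp(args, ASYNC_OPT, klen) == 0 &&
		    args[klen] == '=' && args[klen + 1] == '1')
			return true;
		args += len;
	}
	return false;
}
```

The point of the per-load form is that the decision travels with each
individual load request, so nothing system-wide has to be flipped and no
blacklist is needed.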

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  1:26                         ` Luis R. Rodriguez
  (?)
@ 2014-09-10  5:13                           ` Tom Gundersen
  -1 siblings, 0 replies; 227+ messages in thread
From: Tom Gundersen @ 2014-09-10  5:13 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Tejun Heo, Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Tue, Sep 9, 2014 at 3:26 AM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> On Mon, Sep 8, 2014 at 6:22 PM, Tejun Heo <tj@kernel.org> wrote:
>> On Tue, Sep 09, 2014 at 10:10:59AM +0900, Tejun Heo wrote:
>>> I'm not too convinced this is such a difficult problem to figure out.
>>> We already have most of the logic in place and the only thing missing is
>>> how to switch it.  Wouldn't something like the following work?
>>>
>>> * Add a sysctl knob to enable asynchronous device probing on module
>>>   load and enable asynchronous probing globally if the knob is set.
>>
>> Alternatively, add a module-generic param "async_probe" or whatever
>> and use that to switch the behavior should work too.  I don't know
>> which way is better but either should work fine.
>
> I take it by this you meant a generic system-wide sysctl or kernel cmd
> line option to enable this for all drivers?

If the expectation is that this feature should be enabled
unconditionally for all systemd systems, wouldn't it make more sense
to make it a Kconfig option (possibly overridable from the kernel
commandline in case that makes testing simpler)?

Cheers,

Tom

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09 20:45                             ` Luis R. Rodriguez
  (?)
@ 2014-09-10  6:46                               ` Tom Gundersen
  -1 siblings, 0 replies; 227+ messages in thread
From: Tom Gundersen @ 2014-09-10  6:46 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: James Bottomley, One Thousand Gnomes, Takashi Iwai, Kay Sievers,
	Oleg Nesterov, Praveen Krishnamoorthy, hare,
	Nagalakshmi Nandigama, Wu Zhangjin, Tetsuo Handa,
	mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Sreekanth Reddy, Arjan van de Ven, Abhijit Mahajan,
	systemd Mailing List, Linux SCSI List, Dmitry Torokhov,
	linux-kernel, netdev, Tejun Heo, Andrew Morton, Joseph Salisbury

On Tue, Sep 9, 2014 at 10:45 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> On Tue, Sep 9, 2014 at 12:35 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
>> On Tue, 2014-09-09 at 12:16 -0700, Luis R. Rodriguez wrote:
>>> On Mon, Sep 8, 2014 at 10:38 PM, James Bottomley
>>> <James.Bottomley@hansenpartnership.com> wrote:
>>> > If we want to sort out some sync/async mechanism for probing devices, as
>>> > an agreement between the init systems and the kernel, that's fine, but
>>> > it's a to-be negotiated enhancement.
>>>
>>> Unfortunately as Tejun notes the train has left which already made
>>> assumptions on this.
>>
>> Well, that's why it's a bug.  It's a material regression impacting
>> users.
>
> Indeed. I believe the issue with this regression however was that the
> original commit e64fae55 (January 2012) was only recently accepted by
> *kernel folks* to be a real regression.

Just for the record, this only caused user-visible problems after
kernel commit 786235ee (November 2013), right?

> More than two years
> have gone by on growing design and assumptions on top of that original
> commit. I'm not sure if *systemd folks* yet believe it was a design
> regression?

I don't think so. udev should not allow its workers to run for an
unbounded length of time. Whether the upper bound should be 30, 60,
180 seconds or something else is up for debate (currently it is 60,
but if that is too short for some drivers we could certainly revisit
that). Moreover, it seems from this discussion that the aim is (still)
that insmod should be near-instantaneous (i.e., not wait for probe),
so it seems to me that the basic design is correct and all we need is
some temporary work-around and a way to better detect misbehaving
drivers?

>>>  I'm afraid distributions that want to avoid this
>>> sigkill at least on the kernel front will have to work around this
>>> issue either on systemd by increasing the default timeout which is now
>>> possible thanks to Hannes' changes or by some other means such as the
>>> combination of a modified non-chatty version of this patch + a check
>>> at the end of load_module() as mentioned earlier on these threads.
>>
>> Increasing the default timeout in systemd seems like the obvious bug fix
>> to me.  If the patch exists already, having distros that want it use it
>> looks to be correct ... not every bug is a kernel bug, after all.
>
> It's merged upstream in systemd now, along with a few fixes on top of
> it. I also see Kay merged a change to the default timeout to 60 seconds
> on August 30. It's unclear if these discussions had any impact on that
> decision or if that was just because udev firmware loading has now been
> ripped out. I'll note that the new 60 second timeout wouldn't suffice
> for cxgb4 even if it didn't do firmware loading, its probe takes over
> one full minute.
>
>> Negotiating a probe vs init split for drivers is fine too, but it's a
>> longer term thing rather than a bug fix.
>
> Indeed. What I proposed with a multiplier for the timeout for the
> different types of built in commands was deemed complex but saw no
> alternatives proposed despite my interest to work on one and
> clarifications noted that this was a design regression. Not quite sure
> what else I could have done here. I'm interested in learning what the
> better approach is for the future as if we want to marry init + kernel
> we need a smooth way for us to discuss design without getting worked
> up about it, or taking it personal. I really want this to work as I
> personally like systemd so far.

How about this: keep the timeout global, but also introduce a
(relatively short, say 10 or 15 seconds) timeout after which a warning
is printed. Even if nothing is actually killed, having workers (be it
insmod or something else) take longer than a couple of seconds is
likely a sign that something is seriously off somewhere.
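
The two-threshold idea above can be sketched in C roughly as follows.
This is only an illustration of the proposal, not udev code: the enum,
the function name check_worker() and the threshold parameters are all
hypothetical, and a real implementation would hook into udev's worker
bookkeeping instead of taking elapsed time as an argument.

```c
#include <assert.h>

/* Hypothetical outcomes for a udev-style worker; names are illustrative. */
enum worker_action {
	WORKER_OK,	/* still below the short warning threshold */
	WORKER_WARN,	/* past the warning threshold: log it, keep running */
	WORKER_KILL	/* past the global timeout */
};

/*
 * Decide what to do with a worker that has run for elapsed_sec seconds.
 * warn_sec is the short threshold (10 or 15 seconds in the proposal),
 * kill_sec is the existing global timeout.
 */
enum worker_action check_worker(unsigned int elapsed_sec,
				unsigned int warn_sec,
				unsigned int kill_sec)
{
	/* The warning must never fire later than the kill. */
	assert(warn_sec <= kill_sec);

	if (elapsed_sec >= kill_sec)
		return WORKER_KILL;
	if (elapsed_sec >= warn_sec)
		return WORKER_WARN;
	return WORKER_OK;
}
```

With a 15 second warning threshold and the 60 second global timeout, a
worker that has run for 20 seconds would only be reported, not killed.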

Cheers,

Tom

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-10  6:46                               ` Tom Gundersen
  (?)
@ 2014-09-10 10:07                                 ` Ceriel Jacobs
  -1 siblings, 0 replies; 227+ messages in thread
From: Ceriel Jacobs @ 2014-09-10 10:07 UTC (permalink / raw)
  To: Tom Gundersen, Luis R. Rodriguez
  Cc: James Bottomley, One Thousand Gnomes, Takashi Iwai, Kay Sievers,
	Oleg Nesterov, Praveen Krishnamoorthy, hare,
	Nagalakshmi Nandigama, Wu Zhangjin, Tetsuo Handa,
	mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Sreekanth Reddy, Arjan van de Ven, Abhijit Mahajan,
	systemd Mailing List, Linux SCSI List, Dmitry Torokhov,
	linux-kernel, netdev, Tejun Heo, Andrew Morton, Joseph Salisbury

Tom Gundersen schreef op 10-09-14 om 08:46:
>> >Indeed. What I proposed with a multiplier for the timeout for the
>> >different types of built in commands was deemed complex but saw no
>> >alternatives proposed despite my interest to work on one and
>> >clarifications noted that this was a design regression. Not quite sure
>> >what else I could have done here. I'm interested in learning what the
>> >better approach is for the future as if we want to marry init + kernel
>> >we need a smooth way for us to discuss design without getting worked
>> >up about it, or taking it personal. I really want this to work as I
>> >personally like systemd so far.
> How about this: keep the timeout global, but also introduce a
> (relatively short, say 10 or 15 seconds) timeout after which a warning
> is printed. Even if nothing is actually killed, having workers (be it
> insmod or something else) take longer than a couple of seconds is
> likely a sign that something is seriously off somewhere.

I don't agree with the statement that something is seriously off when it
takes more than 10 to 15 seconds.

When probing only one hard disk drive, then I do agree that something is
seriously off after 10 to 15 seconds.

When probing a SAS bus with one hundred hard disk drives in standby
mode, then I do expect that to take longer than 10 to 15 seconds.

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-10 10:07                                 ` Ceriel Jacobs
  (?)
@ 2014-09-10 13:31                                   ` James Bottomley
  -1 siblings, 0 replies; 227+ messages in thread
From: James Bottomley @ 2014-09-10 13:31 UTC (permalink / raw)
  To: Ceriel Jacobs
  Cc: Tom Gundersen, Luis R. Rodriguez, One Thousand Gnomes,
	Takashi Iwai, Kay Sievers, Oleg Nesterov, Praveen Krishnamoorthy,
	hare, Nagalakshmi Nandigama, Wu Zhangjin, Tetsuo Handa,
	mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Sreekanth Reddy, Arjan van de Ven, Abhijit Mahajan,
	systemd Mailing List, Linux SCSI List, Dmitry Torokhov,
	linux-kernel, netdev, Tejun Heo, Andrew Morton, Joseph Salisbury

On Wed, 2014-09-10 at 12:07 +0200, Ceriel Jacobs wrote:
> Tom Gundersen schreef op 10-09-14 om 08:46:
> >> >Indeed. What I proposed with a multiplier for the timeout for the
> >> >different types of built in commands was deemed complex but saw no
> >> >alternatives proposed despite my interest to work on one and
> >> >clarifications noted that this was a design regression. Not quite sure
> >> >what else I could have done here. I'm interested in learning what the
> >> >better approach is for the future as if we want to marry init + kernel
> >> >we need a smooth way for us to discuss design without getting worked
> >> >up about it, or taking it personal. I really want this to work as I
> >> >personally like systemd so far.
> > How about this: keep the timeout global, but also introduce a
> > (relatively short, say 10 or 15 seconds) timeout after which a warning
> > is printed. Even if nothing is actually killed, having workers (be it
> > insmod or something else) take longer than a couple of seconds is
> > likely a sign that something is seriously off somewhere.

> I don't agree with the statement that something is seriously off when it 
> takes more then 10 to 15 seconds.
> 
> When probing only one hard disk drive, then I do agree that something is 
> seriously off after 10 to 15 seconds.

Really?  We keep explaining that arbitrary times are wrong.  A while ago
the Adaptec driver used to use 15s as its bus settle time after the
initial reset (it's now a Kconfig variable set at 5s) and a Parallel bus
takes a minimum of 4s to scan and has to be done sequentially.  If any
probed device is having difficulty, that can escalate way beyond this
into the tens to hundreds of seconds.   If your root disk is on it,
you're waiting or not booting.

> When probing a SAS bus with one hundred hard disk drives in standby 
> mode, then I do expect that to take longer then 10 to 15 seconds.

Good luck with that even on SAS if you have a lot of expanders.

For an installed system, you know what you need (usually root and
possibly one other disc like /home), so you spawn all the insertions
asynchronously and then wait for just the devices you need. Since the
alternative is a panic when init isn't found, this wait had better be
quite long (if not forever, given the consequence is guaranteed
failure).  Everything else can be async, but, as I've pointed out
before, it can be async in user space (fire and forget) instead of the
kernel.
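
A minimal user-space sketch of that "fire and forget, then wait only
for what you need" pattern might look like the following. It is an
illustration under stated assumptions, not code from any init system:
load_module_async() and wait_for_path() are hypothetical helper names,
the 100 ms poll interval is arbitrary, and polling with stat() stands
in for a proper udev event listener.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Fire and forget: start "modprobe <module>" in a child and return at
 * once. The caller is expected to reap children (or ignore SIGCHLD).
 * Returns 0 if the child was started, -1 if fork() failed.
 */
int load_module_async(const char *module)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execlp("modprobe", "modprobe", module, (char *)NULL);
		_exit(127);	/* only reached if exec failed */
	}
	return 0;
}

/*
 * Poll until path exists. A negative timeout_sec means wait forever,
 * which is the sensible choice for the root device.
 * Returns 0 once the path exists, -1 on timeout.
 */
int wait_for_path(const char *path, int timeout_sec)
{
	const struct timespec step = { .tv_sec = 0, .tv_nsec = 100000000L };
	struct timespec start, now;
	struct stat st;
	long elapsed_ms;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (;;) {
		if (stat(path, &st) == 0)
			return 0;
		clock_gettime(CLOCK_MONOTONIC, &now);
		elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L +
			     (now.tv_nsec - start.tv_nsec) / 1000000L;
		if (timeout_sec >= 0 && elapsed_ms >= timeout_sec * 1000L)
			return -1;
		nanosleep(&step, NULL);
	}
}
```

An initramfs-style caller would start every needed module with
load_module_async() and then block only on the root device node, for
example with wait_for_path() and a negative timeout; everything else
finishes in the background.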

James



^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-10  6:46                               ` Tom Gundersen
  (?)
@ 2014-09-10 21:10                                 ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-10 21:10 UTC (permalink / raw)
  To: Tom Gundersen
  Cc: James Bottomley, One Thousand Gnomes, Takashi Iwai, Kay Sievers,
	Oleg Nesterov, Praveen Krishnamoorthy, hare,
	Nagalakshmi Nandigama, Wu Zhangjin, Tetsuo Handa,
	mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Sreekanth Reddy, Arjan van de Ven, Abhijit Mahajan,
	systemd Mailing List, Linux SCSI List, Dmitry Torokhov,
	linux-kernel, netdev, Tejun Heo, Andrew Morton, Joseph Salisbury,
	Luis R. Rodriguez

Tom, thanks for reviewing this! My reply below!

On Tue, Sep 9, 2014 at 11:46 PM, Tom Gundersen <teg@jklm.no> wrote:
> On Tue, Sep 9, 2014 at 10:45 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
>> On Tue, Sep 9, 2014 at 12:35 PM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>>> On Tue, 2014-09-09 at 12:16 -0700, Luis R. Rodriguez wrote:
>>>> On Mon, Sep 8, 2014 at 10:38 PM, James Bottomley
>>>> <James.Bottomley@hansenpartnership.com> wrote:
>>>> > If we want to sort out some sync/async mechanism for probing devices, as
>>>> > an agreement between the init systems and the kernel, that's fine, but
>>>> > its a to-be negotiated enhancement.
>>>>
>>>> Unfortunately as Tejun notes the train has left which already made
>>>> assumptions on this.
>>>
>>> Well, that's why it's a bug.  It's a material regression impacting
>>> users.
>>
>> Indeed. I believe the issue with this regression however was that the
>> original commit e64fae55 (January 2012) was only accepted by *kernel
>> folks* to be a real regression until recently.
>
> Just for the record, this only caused user-visible problems after
> kernel commit 786235ee (November 2013), right?

Another one was cxgb4:

https://bugzilla.novell.com/show_bug.cgi?id=877622

SLE12 does not yet have commit 786235ee merged. Benjamin did some hard
work to debug this and trace the kill down to systemd-udev. A debug
kernel build has now been provided to try to pick up exactly the
place where the kill was received, but it's at least clear this came
from systemd.

>> More than two years
>> have gone by on growing design and assumptions on top of that original
>> commit. I'm not sure if *systemd folks* yet believe its was a design
>> regression?
>
> I don't think so. udev should not allow its workers to run for an
> unbounded length of time. Whether the upper bound should be 30, 60,
> 180 seconds or something else is up for debate (currently it is 60,
> but if that is too short for some drivers we could certainly revisit
> that).

That's the thing -- the timeout was put in place under the assumption
that probe was asynchronous, and it's not: the driver core issues both
module init *and* probe together, so the loader has to wait. That alone
makes the timeout a design flaw, and systemd then built on top of that
design for more than two years. It's not systemd's fault; it's just
that we never spoke about this broadly as a design matter and we should
have. I will mention that even when the first issues crept up, the
issue was still tossed back as a driver problem. Only recently, once we
realized that init and probe run together, did we start thinking about
this problem differently. Systemd was trying to ensure that driver init
doesn't take long, but it's not init that is taking long, it's probe,
and probe then gets penalized because the driver core batches init and
probe synchronously before module loading finishes.

Furthermore, as clarified by Tejun, userland is known to exist that
will wait indefinitely for module loading under the simple assumption
that things *are done synchronously*, and that is precisely why we
can't just blindly enable async probe upstream through a new driver
boolean, as it can be unfair to this old userland. What is being
evaluated is to enable async probe for *all* drivers through a new
general system-wide option. We cannot regress old userspace and its
assumptions, but we can create a new shiny universe.

> Moreover, it seems from this discussion that the aim is (still)
> that insmod should be near-instantaneous (i.e., not wait for probe),

The only reason that is being discussed is that systemd has not
accepted the timeout as a system design despite me pointing out the
original design flaw recently, and at this point, even if it was
accepted as a design flaw, it seems it's too late. The approach taken
to help make all drivers async is simply an afterthought to give
systemd what it *thought* was in place, and it by no measure should be
considered the proper fix to the regression introduced by systemd. It
may perhaps be the right step long term for systemd systems, given it
goes with what systemd assumed was there, but the timeout was flawed.
It's not clear if systemd can help with old kernels; it seems the ship
has sailed and there seem to be no options but for folks to work around
that -- unless of course some reasonable solution is found which
doesn't break the current systemd design?

> so it seems to me that the basic design is correct and all we need is
> some temporary work-around and a way to better detect misbehaving
> drivers?

As part of this series I addressed hunting for the "misbehaving
drivers" in-kernel, as I saw no progress on the systemd side of things
to non-fatally detect "misbehaving drivers" despite my original RFCs
and request for advice. I quote "misbehaving drivers" because it's a
flawed view to consider them misbehaving now, in light of the
clarification of how the driver core works: it always batches init and
probe together, and we can't be penalizing long probes, because long
probes are simply fine. My patch to pick up "misbehaving drivers" on
the kernel front by picking up systemd's signal was reactive, but it
was also simply braindead for the exact same reasons systemd's original
timeout was flawed. We want a general solution and we don't want to
work around the root cause, which in this case was systemd's assumption
about how drivers work.

Keep in mind that the above just addresses the kmod built-in cmd on
systemd. That's where the timeout was introduced, but as has been
clarified here, assuming the same timeout on *all* other built-in
commands is likely pretty flawed as well, and this does concern me.
It's why I mentioned that more than two years have gone by now of
growing design and assumptions on top of that original commit, and
it's why it's hard for systemd to consider an alternative.

>>>>  I'm afraid distributions that want to avoid this
>>>> sigkill at least on the kernel front will have to work around this
>>>> issue either on systemd by increasing the default timeout which is now
>>>> possible thanks to Hannes' changes or by some other means such as the
>>>> combination of a modified non-chatty version of this patch + a check
>>>> at the end of load_module() as mentioned earlier on these threads.
>>>
>>> Increasing the default timeout in systemd seems like the obvious bug fix
>>> to me.  If the patch exists already, having distros that want it use it
>>> looks to be correct ... not every bug is a kernel bug, after all.
>>
>> Its merged upstream on systemd now, along with a few fixes on top of
>> it. I also see Kay merged a change to the default timeout to 60 second
>> on August 30. Its unclear if these discussions had any impact on that
>> decision or if that was just because udev firmware loading got now
>> ripped out. I'll note that the new 60 second timeout wouldn't suffice
>> for cxgb4 even if it didn't do firmware loading, its probe takes over
>> one full minute.
>>
>>> Negotiating a probe vs init split for drivers is fine too, but it's a
>>> longer term thing rather than a bug fix.
>>
>> Indeed. What I proposed with a multiplier for the timeout for the
>> different types of built in commands was deemed complex but saw no
>> alternatives proposed despite my interest to work on one and
>> clarifications noted that this was a design regression. Not quite sure
>> what else I could have done here. I'm interested in learning what the
>> better approach is for the future as if we want to marry init + kernel
>> we need a smooth way for us to discuss design without getting worked
>> up about it, or taking it personal. I really want this to work as I
>> personally like systemd so far.
>
> How about this: keep the timeout global, but also introduce a
> (relatively short, say 10 or 15 seconds) timeout after which a warning
> is printed.

That is something that I originally was looking forward to on systemd,
but here's the thing: once that warning comes up -- what would we do
with it? This patch addresses this warning in-kernel, and the idea was
that we'd then peg an async_probe bool as true on the driver as a fix,
but that was decided to be silly given all the above. These drivers are
actually not misbehaving and it would be even more incorrect to try to
"fix" them by making them run asynchronously. In fact for some old
storage drivers it may even be the worst thing to do given the
possible slew of userland deployments and scripts which assume things
*are* synchronous.

> Even if nothing is actually killed, having workers (be it
> insmod or something else) take longer than a couple of seconds is
> likely a sign that something is seriously off somewhere.

Probe can take a long time and that's fine, so for kmod the current
assumption is flawed. If we had an option to async probe all drivers
then perhaps this kmod timeout *might be reasonable*, and even then I
do recommend a clear warning that can be collected in logs on its
first iteration rather than sigkilling; only after a while should
sigkilling really be done. If systemd can deal with module loading in
the background for drivers that take a long time, warning on that
instead of sigkilling may be a good start prior to enabling a default
sigkill on drivers. This is perhaps also true for other workers, but
it's not clear if this is a reasonable strategy for systemd.
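
As a rough sketch of the warn-first policy discussed above, assuming a
short warning threshold and a separate, longer kill threshold: a worker
that runs past the first is only logged, and a kill is considered only
past the second. The names (worker_action, worker_timeout_action()) and
the idea of using 0 to mean "never kill" are illustrative assumptions;
this is not systemd or udev code.

```c
/* Hypothetical actions a supervisor could take for a long-running worker. */
enum worker_action {
	WORKER_OK,	/* still within the expected time */
	WORKER_WARN,	/* log a warning, keep waiting */
	WORKER_KILL,	/* hard limit exceeded */
};

/*
 * Decide what to do with a worker that has been running for @elapsed_s
 * seconds.  @warn_s and @kill_s are assumed thresholds; a @kill_s of 0
 * means "never kill, only warn".
 */
static enum worker_action worker_timeout_action(unsigned int elapsed_s,
						unsigned int warn_s,
						unsigned int kill_s)
{
	if (kill_s && elapsed_s >= kill_s)
		return WORKER_KILL;
	if (elapsed_s >= warn_s)
		return WORKER_WARN;
	return WORKER_OK;
}
```

With thresholds of 15 and 60 seconds, a probe that takes over a minute
(as noted for cxgb4) would first be warned about and then still be
killed; passing 0 as the kill threshold gives the warn-only first
iteration.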

 Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-10 21:10                                 ` Luis R. Rodriguez
  (?)
@ 2014-09-11  5:42                                   ` Alexander E. Patrakov
  -1 siblings, 0 replies; 227+ messages in thread
From: Alexander E. Patrakov @ 2014-09-11  5:42 UTC (permalink / raw)
  To: Luis R. Rodriguez, Tom Gundersen
  Cc: One Thousand Gnomes, Takashi Iwai, Kay Sievers, Sreekanth Reddy,
	James Bottomley, Praveen Krishnamoorthy, hare,
	Nagalakshmi Nandigama, Wu Zhangjin, Tetsuo Handa,
	mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Arjan van de Ven, Abhijit Mahajan, systemd Mailing List,
	Linux SCSI List, netdev, Dmitry Torokhov, Oleg Nesterov,
	linux-kernel, Tejun Heo, Andrew Morton, Joseph Salisbury

11.09.2014 03:10, Luis R. Rodriguez wrote:
> Tom, thanks for reviewing this! My reply below!
>
> On Tue, Sep 9, 2014 at 11:46 PM, Tom Gundersen <teg@jklm.no> wrote:
>> On Tue, Sep 9, 2014 at 10:45 PM, Luis R. Rodriguez
>> <mcgrof@do-not-panic.com> wrote:
>>> On Tue, Sep 9, 2014 at 12:35 PM, James Bottomley
>>> <James.Bottomley@hansenpartnership.com> wrote:
>>>> On Tue, 2014-09-09 at 12:16 -0700, Luis R. Rodriguez wrote:
>>>>> On Mon, Sep 8, 2014 at 10:38 PM, James Bottomley
>>>>> <James.Bottomley@hansenpartnership.com> wrote:
>>>>>> If we want to sort out some sync/async mechanism for probing devices, as
>>>>>> an agreement between the init systems and the kernel, that's fine, but
>>>>>> its a to-be negotiated enhancement.
>>>>>
>>>>> Unfortunately as Tejun notes the train has left which already made
>>>>> assumptions on this.
>>>>
>>>> Well, that's why it's a bug.  It's a material regression impacting
>>>> users.
>>>
>>> Indeed. I believe the issue with this regression however was that the
>>> original commit e64fae55 (January 2012) was only accepted by *kernel
>>> folks* to be a real regression until recently.
>>
>> Just for the record, this only caused user-visible problems after
>> kernel commit 786235ee (November 2013), right?
>
> Another one was cxgb4:
>
> https://bugzilla.novell.com/show_bug.cgi?id=877622
>
> SLE12 does not yet have commit 786235ee merged. Benjamim did some hard
> work to debug this and trace the kill down to systemd-udev. A debug
> kernel build has been provided now to try to pick up exactly on the
> place where the kill was received, but its at least clear this came
> from systemd.
>
>>> More than two years
>>> have gone by on growing design and assumptions on top of that original
>>> commit. I'm not sure if *systemd folks* yet believe its was a design
>>> regression?
>>
>> I don't think so. udev should not allow its workers to run for an
>> unbounded length of time. Whether the upper bound should be 30, 60,
>> 180 seconds or something else is up for debate (currently it is 60,
>> but if that is too short for some drivers we could certainly revisit
>> that).
>
> That's the thing -- the timeout was put in place under the assumption
> probe was asynchronous and it's not: the driver core issues both module
> init *and* probe together, so the loader has to wait. That alone makes
> the timeout a design flaw, and then systemd carried on top of that
> design for over two years after that. It's not systemd's fault, it's just
> that we never spoke about this as a design thing broadly and we should
> have, and I will mention that even when the first issues crept up,
> the issue was still tossed back as a driver problem. It was only
> recently, once we realized that both init and probe run together, that
> we've been thinking about this problem differently. Systemd was trying
> to ensure init on drivers doesn't take long, but it's not init that is
> taking long, it's probe, and probe then gets penalized as the driver
> core batches both init and probe synchronously before finishing module
> loading. Furthermore, as clarified by Tejun, random userland is known to
> exist that will wait indefinitely for module loading under the simple
> assumption things *are done synchronously*, and it's precisely why we
> can't just blindly enable async probe upstream through a new driver
> boolean, as it can be unfair to this old userland. What is being
> evaluated is to enable async probe for *all* drivers through a new
> general system-wide option. We cannot regress old userspace and
> its assumptions, but we can create a new shiny universe.
>
>> Moreover, it seems from this discussion that the aim is (still)
>> that insmod should be near-instantaneous (i.e., not wait for probe),
>
> The only reason that is being discussed is that systemd has not
> accepted the timeout as a system design flaw despite me pointing out
> the original flaw recently, and at this point, even if it was accepted
> as a design flaw, it seems it's too late. The approach taken to help
> make all drivers async is simply an afterthought to give systemd what
> it *thought* was in place, and it should by no measure be considered
> the proper fix to the regression introduced by systemd. It may perhaps
> be the right step long term for systemd systems, given it goes with
> what systemd assumed was there, but the timeout was flawed. It's not
> clear if systemd can help with old kernels; it seems the ship has
> sailed and there seem to be no options but for folks to work around
> that -- unless of course some reasonable solution is found which
> doesn't break the current systemd design?
>
>> so it seems to me that the basic design is correct and all we need is
>> some temporary work-around and a way to better detect misbehaving
>> drivers?
>
> As part of this series I addressed hunting for the "misbehaving
> drivers" in-kernel, as I saw no progress on the systemd side of things
> to non-fatally detect "misbehaving drivers" despite my original RFCs
> and requests for advice. I quote "misbehaving drivers" as it's a flawed
> view to consider them misbehaving now, in light of clarifications of
> how the driver core works: it always batches both init and probe
> together, and we can't be penalizing long probes because long probes
> are simply fine. My patch to pick up "misbehaving drivers" on the
> kernel front by picking up systemd's signal was reactive, but it was
> also simply braindead for the same exact reasons systemd's original
> timeout was flawed. We want a general solution and we don't want to
> work around the root cause, which in this case was systemd's
> assumption on how drivers work.
>
> Keep in mind that the above just addresses the kmod built-in cmd on
> systemd. That's where the timeout was introduced, but as has been
> clarified here, assuming the same timeout on *all* other built-in
> commands is likely pretty flawed as well, and this does concern me.
> It's why I mentioned that more than two years have gone by now of
> growing design and assumptions on top of that original commit, and
> it's why it's hard for systemd to consider an alternative.
>
>>>>>   I'm afraid distributions that want to avoid this
>>>>> sigkill at least on the kernel front will have to work around this
>>>>> issue either on systemd by increasing the default timeout which is now
>>>>> possible thanks to Hannes' changes or by some other means such as the
>>>>> combination of a modified non-chatty version of this patch + a check
>>>>> at the end of load_module() as mentioned earlier on these threads.
>>>>
>>>> Increasing the default timeout in systemd seems like the obvious bug fix
>>>> to me.  If the patch exists already, having distros that want it use it
>>>> looks to be correct ... not every bug is a kernel bug, after all.
>>>
>>> It's merged upstream on systemd now, along with a few fixes on top of
>>> it. I also see Kay merged a change to the default timeout to 60 seconds
>>> on August 30. It's unclear if these discussions had any impact on that
>>> decision or if that was just because udev firmware loading has now been
>>> ripped out. I'll note that the new 60 second timeout wouldn't suffice
>>> for cxgb4 even if it didn't do firmware loading; its probe takes over
>>> one full minute.
>>>
>>>> Negotiating a probe vs init split for drivers is fine too, but it's a
>>>> longer term thing rather than a bug fix.
>>>
>>> Indeed. What I proposed with a multiplier for the timeout for the
>>> different types of built-in commands was deemed complex but I saw no
>>> alternatives proposed despite my interest to work on one and
>>> clarifications noted that this was a design regression. Not quite sure
>>> what else I could have done here. I'm interested in learning what the
>>> better approach is for the future as if we want to marry init + kernel
>>> we need a smooth way for us to discuss design without getting worked
>>> up about it, or taking it personally. I really want this to work as I
>>> personally like systemd so far.
>>
>> How about this: keep the timeout global, but also introduce a
>> (relatively short, say 10 or 15 seconds) timeout after which a warning
>> is printed.
>
> That is something that I originally was looking forward to on systemd,
> but here's the thing: once that warning comes up -- what would we do
> with it? This patch addresses this warning in-kernel and the idea was
> that we'd then peg an async_probe bool as true on the driver as a fix;
> that was decided to be silly given all the above. These drivers are
> actually not misbehaving and it would be even more incorrect to try to
> "fix" them by making them run asynchronously. In fact for some old
> storage drivers it may even be the worst thing to do given the
> possible slew of userland deployment and scripts which assume things
> *are* synchronous.
>
>> Even if nothing is actually killed, having workers (be it
>> insmod or something else) take longer than a couple of seconds is
>> likely a sign that something is seriously off somewhere.
>
> Probe can take a long time and that's fine, so for kmod the current
> assumption is flawed. If we had an option to async probe all drivers
> then perhaps this kmod timeout *might be reasonable*, and even then I
> do recommend a clear warning that can be collected in logs on its
> first iteration rather than sigkilling; only after a while should
> sigkilling be done really. If systemd can deal with module loading in
> the background for drivers that take a long time and warn on that
> instead of sigkilling, it may be a good start prior to enabling a default
> sigkill on drivers. This is perhaps also true for other workers but
> it's not clear if this is a reasonable strategy for systemd.
>
>   Luis

Just two small remarks to the whole thread.

First, I am quite surprised that nobody brought up the argument that 
module loading is serialized by the kernel. So, while pata-marvell on my 
laptop does its dirty "wait-reset-wait-reset-work" thing, no other 
module can be loaded. This prevention of loading other drivers is the 
thing that slows down the boot.

Second, I am going to XDC2014, LinuxCon Europe and Plumbers. I will take 
my laptop with me, feel free to see the situation firsthand or try 
debugging patches.

-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 227+ messages in thread
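
A minimal userspace sketch of the warn-first, kill-later policy discussed
in the message above: log a warning once a worker has run past a short
threshold, and only send SIGKILL after a much longer one. This is an
illustration in Python, not how systemd-udevd is implemented. The names
supervise, sleeper, warn_after and kill_after are hypothetical, and the
"worker" is just a sleeping child process standing in for a slow probe.

```python
import subprocess
import sys
import time


def supervise(cmd, warn_after, kill_after, poll_interval=0.05):
    """Run cmd; warn once after warn_after seconds, kill after kill_after.

    Returns a tuple (returncode, warned, killed).
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    warned = False
    while True:
        # The worker finished on its own: report its exit status.
        if proc.poll() is not None:
            return proc.returncode, warned, False
        elapsed = time.monotonic() - start
        # First stage: non-fatal warning that can be collected from logs.
        if not warned and elapsed >= warn_after:
            print(f"warning: worker still running after {warn_after}s",
                  file=sys.stderr)
            warned = True
        # Second stage: hard limit, the worker is killed (SIGKILL on POSIX).
        if elapsed >= kill_after:
            proc.kill()
            proc.wait()
            return proc.returncode, warned, True
        time.sleep(poll_interval)


def sleeper(seconds):
    """Hypothetical stand-in for a worker whose probe takes `seconds`."""
    return [sys.executable, "-c", f"import time; time.sleep({seconds})"]


if __name__ == "__main__":
    # A worker that is slow enough to warn about but is never killed.
    print(supervise(sleeper(0.5), warn_after=0.2, kill_after=5.0))
```

With the two thresholds kept separate, a long but legitimate probe only
shows up in the logs, and the hard limit can be set generously.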

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09 23:01                                     ` Dmitry Torokhov
  (?)
@ 2014-09-11 19:59                                       ` James Bottomley
  -1 siblings, 0 replies; 227+ messages in thread
From: James Bottomley @ 2014-09-11 19:59 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Tejun Heo, Luis R. Rodriguez, Lennart Poettering, Kay Sievers,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev


On Tue, 2014-09-09 at 16:01 -0700, Dmitry Torokhov wrote:
> On Tuesday, September 09, 2014 03:46:23 PM James Bottomley wrote:
> > On Wed, 2014-09-10 at 07:41 +0900, Tejun Heo wrote:
> > > 
> > > The thing is that we have to have dynamic mechanism to listen for
> > > device attachments no matter what and such mechanism has been in place
> > > for a long time at this point.  The synchronous wait simply doesn't
> > > serve any purpose anymore and kinda gets in the way in that it makes
> > > it a possibly extremely slow process to tell whether loading of a
> > > module succeeded or not because the wait for the initial round of
> > > probe is piggybacked.
> > 
> > OK, so we just fire and forget in userland ... why bother inventing an
> > elaborate new infrastructure in the kernel to do exactly what
> > 
> > modprobe <mod> &
> > 
> > would do?
> 
> Just so we do not forget: we also want the no-modules case to also be able
> to probe asynchronously so that a slow device does not stall kernel booting.

Yes, but we mostly do this anyway.  SCSI for instance does asynchronous
scanning of attached devices (once the cards are probed) but has a sync
point for ordering.

The problem of speeding up boot is different from the one of init
processes killing modprobes.  There are elements in common, but by and
large the biggest headaches at least in large device number boots have
already been tackled by the enterprise crowd (they don't like their
S390's or 1024 core NUMA systems taking half an hour to come up).

James



^ permalink raw reply	[flat|nested] 227+ messages in thread
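
A minimal sketch of the distinction drawn in the message above between
waiting synchronously for a module load and the "modprobe <mod> &" style
of starting it in the background. It is written in Python purely as an
illustration. slow_loader, load_sync and load_async are hypothetical
names, and a sleeping child process stands in for a modprobe whose probe
is slow; no real module is loaded.

```python
import subprocess
import sys
import time


def slow_loader(seconds):
    """Hypothetical stand-in for a modprobe whose probe takes `seconds`."""
    return [sys.executable, "-c", f"import time; time.sleep({seconds})"]


def load_sync(seconds):
    """Block until the loader exits; return (exit status, seconds waited)."""
    start = time.monotonic()
    status = subprocess.run(slow_loader(seconds)).returncode
    return status, time.monotonic() - start


def load_async(seconds):
    """Start the loader and return at once, like `modprobe <mod> &`.

    Returns (process handle, seconds spent launching). The exit status is
    unknown until the caller reaps the process with wait().
    """
    start = time.monotonic()
    proc = subprocess.Popen(slow_loader(seconds))
    return proc, time.monotonic() - start


if __name__ == "__main__":
    status, waited = load_sync(0.5)
    print(f"sync: status={status}, caller blocked for {waited:.2f}s")

    proc, launched = load_async(0.5)
    print(f"async: caller resumed after {launched:.2f}s")
    print(f"async: status={proc.wait()} (known only once reaped)")
```

In the background variant the caller is never blocked, but it also learns
nothing about success or failure until it reaps the child, so anything
that depends on the device has to wait for it to be announced some other
way.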

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-11 19:59                                       ` James Bottomley
  (?)
@ 2014-09-11 20:23                                         ` Dmitry Torokhov
  -1 siblings, 0 replies; 227+ messages in thread
From: Dmitry Torokhov @ 2014-09-11 20:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Luis R. Rodriguez, Lennart Poettering, Kay Sievers,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Thu, Sep 11, 2014 at 12:59:25PM -0700, James Bottomley wrote:
> 
> On Tue, 2014-09-09 at 16:01 -0700, Dmitry Torokhov wrote:
> > On Tuesday, September 09, 2014 03:46:23 PM James Bottomley wrote:
> > > On Wed, 2014-09-10 at 07:41 +0900, Tejun Heo wrote:
> > > > 
> > > > The thing is that we have to have dynamic mechanism to listen for
> > > > device attachments no matter what and such mechanism has been in place
> > > > for a long time at this point.  The synchronous wait simply doesn't
> > > > serve any purpose anymore and kinda gets in the way in that it makes
> > > > it a possibly extremely slow process to tell whether loading of a
> > > > module succeeded or not because the wait for the initial round of
> > > > probe is piggybacked.
> > > 
> > > OK, so we just fire and forget in userland ... why bother inventing an
> > > elaborate new infrastructure in the kernel to do exactly what
> > > 
> > > modprobe <mod> &
> > > 
> > > would do?
> > 
> > Just so we do not forget: we also want the no-modules case to also be able
> > to probe asynchronously so that a slow device does not stall kernel booting.
> 
> Yes, but we mostly do this anyway.  SCSI for instance does asynchronous
> scanning of attached devices (once the cards are probed)

What would it do if the card was a bit slow to probe?

> but has a sync
> point for ordering.

Quite often we do not really care about ordering of devices. I mean,
does it matter if your mouse is discovered before your keyboard or
after?

>
> The problem of speeding up boot is different from the one of init
> processes killing modprobes.

Right. One is systemd doing stupid things; the other is that the kernel
could be smarter.

>  There are elements in common, but by and
> large the biggest headaches at least in large device number boots have
> already been tackled by the enterprise crowd (they don't like their
> S390's or 1024 core NUMA systems taking half an hour to come up).

Please do not position this as a mostly solved large-systems problem.
For us it is touchpad detection stalling the kernel for 0.5-1 sec, which
is a lot given that we boot in seconds.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-11 20:23                                         ` Dmitry Torokhov
  (?)
@ 2014-09-11 20:42                                           ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-11 20:42 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: James Bottomley, Tejun Heo, Lennart Poettering, Kay Sievers,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Thu, Sep 11, 2014 at 1:23 PM, Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:
>
>>  There are elements in common, but by and
>> large the biggest headaches at least in large device number boots have
>> already been tackled by the enterprise crowd (they don't like their
>> S390's or 1024 core NUMA systems taking half an hour to come up).
>
> Please do not position this as a mostly solved large systems problem,
> For us it is touchpad detection stalling kernel for 0.5-1 sec. Which is
> a lot given that we boot in seconds.

Dmitry, would working on top of the async series be reasonable? Then
we could address these as separate things which we'd build on top of.
The one aspect I see us needing to share is the "async probe universe
is OK" flag.

 Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-11 20:42                                           ` Luis R. Rodriguez
  (?)
@ 2014-09-11 20:53                                             ` Dmitry Torokhov
  -1 siblings, 0 replies; 227+ messages in thread
From: Dmitry Torokhov @ 2014-09-11 20:53 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: James Bottomley, Tejun Heo, Lennart Poettering, Kay Sievers,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S,
	MPT-FusionLinux.pdl, Linux SCSI List, netdev

On Thu, Sep 11, 2014 at 01:42:20PM -0700, Luis R. Rodriguez wrote:
> On Thu, Sep 11, 2014 at 1:23 PM, Dmitry Torokhov
> <dmitry.torokhov@gmail.com> wrote:
> >
> >>  There are elements in common, but by and
> >> large the biggest headaches at least in large device number boots have
> >> already been tackled by the enterprise crowd (they don't like their
> >> S390's or 1024 core NUMA systems taking half an hour to come up).
> >
> > Please do not position this as a mostly solved large systems problem,
> > For us it is touchpad detection stalling kernel for 0.5-1 sec. Which is
> > a lot given that we boot in seconds.
> 
> Dmitry, would working on top of the async series be reasonable? Then
> we could address these as separate things which we'd build on top of.
> The one aspect I see us needing to share is the "async probe universe
> is OK" flag.

Sure. Are you planning on refreshing your series? I think the
code-related discussion kind of stalled...

-- 
Dmitry

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-11 20:53                                             ` Dmitry Torokhov
  (?)
@ 2014-09-11 21:08                                               ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-11 21:08 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: James Bottomley, Tejun Heo, Lennart Poettering, Kay Sievers,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S,
	mpt-fusionlinux.pdl, Linux SCSI List, netdev, Tom Gundersen

On Thu, Sep 11, 2014 at 1:53 PM, Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:
> On Thu, Sep 11, 2014 at 01:42:20PM -0700, Luis R. Rodriguez wrote:
>> On Thu, Sep 11, 2014 at 1:23 PM, Dmitry Torokhov
>> <dmitry.torokhov@gmail.com> wrote:
>> >
>> >>  There are elements in common, but by and
>> >> large the biggest headaches at least in large device number boots have
>> >> already been tackled by the enterprise crowd (they don't like their
>> >> S390's or 1024 core NUMA systems taking half an hour to come up).
>> >
>> > Please do not position this as a mostly solved large systems problem,
>> > For us it is touchpad detection stalling kernel for 0.5-1 sec. Which is
>> > a lot given that we boot in seconds.
>>
>> Dmitry, would working on top of the async series be reasonable? Then
>> we could address these as separate things which we'd build on top of.
>> The one aspect I see us needing to share is the "async probe universe
>> is OK" flag.
>
> Sure. Are you planning on refreshing your series?

Yes.

> I think the code-related discussion kind of stalled...

I was just waiting for any possible brain farts to flush out before a
new respin. I'll tackle this now.

 Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-10 21:10                                 ` Luis R. Rodriguez
  (?)
@ 2014-09-11 21:43                                   ` Tom Gundersen
  -1 siblings, 0 replies; 227+ messages in thread
From: Tom Gundersen @ 2014-09-11 21:43 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: James Bottomley, One Thousand Gnomes, Takashi Iwai, Kay Sievers,
	Oleg Nesterov, Praveen Krishnamoorthy, hare,
	Nagalakshmi Nandigama, Wu Zhangjin, Tetsuo Handa,
	mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Sreekanth Reddy, Arjan van de Ven, Abhijit Mahajan,
	systemd Mailing List, Linux SCSI List, Dmitry Torokhov,
	linux-kernel, netdev, Tejun Heo, Andrew Morton, Joseph Salisbury,
	Luis R. Rodriguez

On Wed, Sep 10, 2014 at 11:10 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
>>> More than two years
>>> have gone by on growing design and assumptions on top of that original
>>> commit. I'm not sure if *systemd folks* yet believe it was a design
>>> regression?
>>
>> I don't think so. udev should not allow its workers to run for an
>> unbounded length of time. Whether the upper bound should be 30, 60,
>> 180 seconds or something else is up for debate (currently it is 60,
>> but if that is too short for some drivers we could certainly revisit
>> that).
>
> That's the thing -- the timeout was put in place under the assumption
> that probe was asynchronous, and it's not: the driver core issues both
> module init *and* probe together, so the loader has to wait. That alone
> makes the timeout a design flaw, and then systemd carried on top of that
> design for over two years after that. It's not systemd's fault, it's just
> that we never spoke about this as a design thing broadly and we should
> have, and I will mention that even when the first issues crept up,
> the issue was still tossed back as a driver problem. It was only
> recently, when we realized that both init and probe run together, that
> we started thinking about this problem differently. Systemd was trying
> to ensure init on drivers doesn't take long, but it's not init that is
> taking long, it's probe, and probe then gets penalized as the driver
> core batches both init and probe synchronously before finishing module
> loading.

Just to clarify: udev/systemd is not trying to get into the business
of what the kernel does on finit_module(), we just need to make sure
that none of our workers stay around forever, which is why we have a
global timeout. If necessary we can bump this higher (as mentioned in
another thread I just bumped it to 180 secs), but we cannot abolish it
entirely.

> Furthermore, as clarified by Tejun, random userland is known to
> exist that will wait indefinitely for module loading under the simple
> assumption that things *are done synchronously*, and that is precisely
> why we can't just blindly enable async probe upstream through a new
> driver boolean, as it can be unfair to this old userland. What is being
> evaluated is enabling async probe for *all* drivers through a new
> general system-wide option. We cannot regress old userspace and
> assumptions but we can create a new shiny universe.

How about simply introducing a new flag to finit_module() to indicate
that the caller does not mind asynchronous probing? We could then pass
this from udev, but existing scripts calling modprobe/insmod will not
be affected.
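
A rough Python model of what such a flag would mean for the caller. This
is only a sketch under stated assumptions: MODULE_INIT_ASYNC_PROBE and
its value are hypothetical (it is the flag proposed above, not something
that exists), and load_module() is a toy stand-in for the kernel side,
not the syscall itself. The two IGNORE flags are the existing
finit_module() flags and are listed only to show the flag space.

```python
MODULE_INIT_IGNORE_MODVERSIONS = 1  # existing finit_module() flag
MODULE_INIT_IGNORE_VERMAGIC = 2     # existing finit_module() flag
MODULE_INIT_ASYNC_PROBE = 1 << 8    # HYPOTHETICAL flag and value


def load_module(init, probe, flags=0):
    """Toy model of module loading.

    Returns (waited, deferred): the results of the steps the caller had to
    wait for, and the callables left to run after the caller has returned.
    """
    waited = [init()]          # module init always runs before returning
    deferred = []
    if flags & MODULE_INIT_ASYNC_PROBE:
        deferred.append(probe)  # caller opted in: probe runs later
    else:
        waited.append(probe())  # legacy callers keep synchronous probe
    return waited, deferred
```

With this shape, udev would pass the new flag, while an unmodified
modprobe/insmod passes flags it already uses and keeps today's behaviour.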

>> Moreover, it seems from this discussion that the aim is (still)
>> that insmod should be near-instantaneous (i.e., not wait for probe),
>
> The only reason that is being discussed is that systemd has not
> accepted the timeout as a system design despite me pointing out the
> original design flaw recently, and at this point even if it was accepted
> as a design flaw it seems it's too late. The approach taken to help
> make all drivers async is simply an afterthought to give systemd what
> it *thought* was in place, and it by no measure should be considered
> the proper fix to the regression introduced by systemd. It may perhaps
> be the right step long term for systemd systems given it goes with
> what it assumed was there, but the timeout was flawed. It's not clear
> if systemd can help with old kernels; it seems the ship has sailed and
> there seem to be no options but for folks to work around that -- unless
> of course some reasonable solution is found which doesn't break current
> systemd design?

If I read the git logs correctly the hard timeout was introduced in
April 2011, so reverting it now seems indeed not to help much with all
the running systems out there.

> As part of this series I addressed hunting for the "misbehaving
> drivers" in-kernel as I saw no progress on the systemd side of things
> to non-fatally detect "misbehaving drivers" despite my original RFCs
> and request for advice. I quote "misbehaving drivers" as it's a flawed
> view to consider them misbehaving now, in light of clarifications of
> how the driver core works: it always batches both init and probe
> together, and we can't be penalizing long probes given that long
> probes are simply fine. My patch to pick up "misbehaving drivers"
> on the kernel front by picking up systemd's signal was
> reactive but it was also simply braindead given the same exact reasons
> why systemd's original timeout was flawed. We want a general solution
> and we don't want to work around the root cause, in this case it was
> systemd's assumption on how drivers work.

Would your ongoing work to make probing asynchronous solve this
problem in the long-term? In the short-term I guess bumping the udev
timeout should be sufficient.

> Keep in mind that the above just addresses the kmod built-in cmd on
> systemd; it's where the timeout was introduced, but as has been
> clarified here, assuming the same timeout on *all* other built-ins
> is likely pretty flawed as well, and this does concern me. It's
> why I mentioned that more than two years have gone by now of growing
> design and assumptions on top of that original commit, and it's why
> it's hard for systemd to consider an alternative.

All built-ins should be near-instantaneous. If they are not, that
needs to be fixed, or they should not be udev built-ins at all. I have
now added a warning to udev if any built-in takes more than a third
of the timeout, so hopefully any problems should be spotted early.
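
The policy described above can be modelled as follows. This is a sketch
and not udev code: classify_worker() is a made-up name, the 180 second
default is the bumped value mentioned earlier in this mail, and treating
elapsed >= timeout as the kill point is an assumption.

```python
def classify_worker(elapsed, timeout=180.0):
    """Classify a worker by how long it has been running, in seconds.

    Returns "kill" once the global timeout is reached, "warn" once more
    than a third of the timeout has passed, and "ok" otherwise.
    """
    if elapsed >= timeout:
        return "kill"   # assumed boundary: the hard timeout has expired
    if elapsed > timeout / 3:
        return "warn"   # early warning so slow built-ins get noticed
    return "ok"
```

With the 180 second timeout the warning threshold is 60 seconds, so a
worker that has run for 61 seconds is reported but not killed.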

>>>>>  I'm afraid distributions that want to avoid this
>>>>> sigkill at least on the kernel front will have to work around this
>>>>> issue either on systemd by increasing the default timeout which is now
>>>>> possible thanks to Hannes' changes or by some other means such as the
>>>>> combination of a modified non-chatty version of this patch + a check
>>>>> at the end of load_module() as mentioned earlier on these threads.
>>>>
>>>> Increasing the default timeout in systemd seems like the obvious bug fix
>>>> to me.  If the patch exists already, having distros that want it use it
>>>> looks to be correct ... not every bug is a kernel bug, after all.
>>>
>>> It's merged upstream in systemd now, along with a few fixes on top of
>>> it. I also see Kay merged a change to the default timeout to 60 seconds
>>> on August 30. It's unclear if these discussions had any impact on that
>>> decision or if that was just because udev firmware loading got now
>>> ripped out. I'll note that the new 60 second timeout wouldn't suffice
>>> for cxgb4 even if it didn't do firmware loading, its probe takes over
>>> one full minute.
>>>
>>>> Negotiating a probe vs init split for drivers is fine too, but it's a
>>>> longer term thing rather than a bug fix.
>>>
>>> Indeed. What I proposed with a multiplier for the timeout for the
>>> different types of built-in commands was deemed complex, but I saw no
>>> alternatives proposed despite my interest in working on one and
>>> clarifications noting that this was a design regression. Not quite sure
>>> what else I could have done here. I'm interested in learning what the
>>> better approach is for the future, as if we want to marry init + kernel
>>> we need a smooth way for us to discuss design without getting worked
>>> up about it, or taking it personally. I really want this to work as I
>>> personally like systemd so far.
>>
>> How about this: keep the timeout global, but also introduce a
>> (relatively short, say 10 or 15 seconds) timeout after which a warning
>> is printed.
>
> That is something that I originally was looking forward to on systemd,
> but here's the thing once that warning comes up  -- what would we do
> with it?

Short term: bump the timeout further. Long-term, hopefully the driver
(core) can be changed to avoid the problem.

> This patch addresses this warning in-kernel and the idea was
> that we'd then peg an async_probe bool as true on the driver as a fix,
> that was decided to be silly given all the above. These drivers are
> actually not misbehaving and it would be even more incorrect to try to
> "fix" them by making them run asynchronously. In fact for some old
> storage drivers it may even be the worst thing to do given the
> possible slew of userland deployment and scripts which assume things
> *are* synchronous.

As mentioned above, it probably makes sense to switch on the
asynchronous behaviour only for a given call to finit_module(), and
not globally to avoid problems with userland assumptions.

>> Even if nothing is actually killed, having workers (be it
>> insmod or something else) take longer than a couple of seconds is
>> likely a sign that something is seriously off somewhere.
>
> Probe can take a long time and that's fine,

But isn't finit_module() taking a long time a serious problem given
that it means no other module can be loaded in parallel? Even if you
have some storage device which legitimately needs to take a couple of
minutes to probe, you probably still want your computer to boot and
get on with its other tasks whilst you wait... Or worse still, some
insignificant driver is broken and simply hangs in probe, but surely
you still want the rest of the system to boot?

> so for kmod the current
> assumption is flawed. If we had an option to async probe all drivers
> then perhaps this kmod timeout *might be reasonable*, and even then I
> do recommend a clear warning that can be collected in logs on its
> first iteration rather than sigkilling; only after a while should
> sigkilling be done, really. If systemd can deal with module loading in
> the background for drivers that take a long time and warn on that
> instead of sigkilling, it may be a good start prior to enabling a
> default sigkill on drivers. This is perhaps also true for other
> workers but it's not clear if this is a reasonable strategy for systemd.

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-11 21:43                                   ` Tom Gundersen
  (?)
@ 2014-09-11 22:26                                     ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-11 22:26 UTC (permalink / raw)
  To: Tom Gundersen, Tejun Heo
  Cc: James Bottomley, One Thousand Gnomes, Takashi Iwai, Kay Sievers,
	Oleg Nesterov, Praveen Krishnamoorthy, hare,
	Nagalakshmi Nandigama, Wu Zhangjin, Tetsuo Handa,
	mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Sreekanth Reddy, Arjan van de Ven, Abhijit Mahajan,
	systemd Mailing List, Linux SCSI List, Dmitry Torokhov,
	linux-kernel, netdev, Andrew Morton, Joseph Salisbury

On Thu, Sep 11, 2014 at 2:43 PM, Tom Gundersen <teg@jklm.no> wrote:
> On Wed, Sep 10, 2014 at 11:10 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
>>>> More than two years
>>>> have gone by on growing design and assumptions on top of that original
>>>> commit. I'm not sure if *systemd folks* yet believe it was a design
>>>> regression?
>>>
>>> I don't think so. udev should not allow its workers to run for an
>>> unbounded length of time. Whether the upper bound should be 30, 60,
>>> 180 seconds or something else is up for debate (currently it is 60,
>>> but if that is too short for some drivers we could certainly revisit
>>> that).
>>
>> That's the thing -- the timeout was put in place under the assumption
>> probe was asynchronous and it's not; the driver core issues both module
>> init *and* probe together, so the loader has to wait. That alone makes
>> the timeout a design flaw, and then systemd carried on building on top
>> of that design for over two years after that. It's not systemd's fault,
>> it's just that we never spoke about this as a design thing broadly and
>> we should have, and I will mention that even when the first issues
>> crept up, the issue was still tossed back as driver problems. It was
>> not until recently, when we realized that both init and probe run
>> together, that we started thinking about this problem differently.
>> Systemd was trying to ensure init on drivers doesn't take long, but
>> it's not init that is taking long, it's probe, and probe then gets
>> penalized as the driver core batches both init and probe synchronously
>> before finishing module loading.
>
> Just to clarify: udev/systemd is not trying to get into the business
> of what the kernel does on finit_module(), we just need to make sure
> that none of our workers stay around forever, which is why we have a
> global timeout. If necessary we can bump this higher (as mentioned in
> another thread I just bumped it to 180 secs), but we cannot abolish it
> entirely.

180 seconds is certainly better than 30, but let me be clear here on
the extent to which the timeout, at least for the kmod built-in
command, can be an issue. The driver core not only batches init and
probe together synchronously, it also runs probe for *all* devices
that the device driver can claim, and that whole series of probes runs
synchronously as well; that is, bus_for_each_dev() runs synchronously
on each device. So, if an init takes 1 second, probe for each device
takes 120 seconds, and the system has 2 devices, then with the new
timeout the second device would not be successfully probed (and in
fact I'm not sure if this would kill the first).

>> Furthermore, as clarified by Tejun, random userland is known to
>> exist that will wait indefinitely for module loading under the simple
>> assumption things *are done synchronously*, and it's precisely why we
>> can't just blindly enable async probe upstream through a new driver
>> boolean as it can be unfair to this old userland. What is being
>> evaluated is to enable async probe for *all* drivers through a new
>> general system-wide option. We cannot regress old userspace and
>> assumptions but we can create a new shiny universe.
>
> How about simply introducing a new flag to finit_module() to indicate
> that the caller does not care about asynchronicity. We could then pass
> this from udev, but existing scripts calling modprobe/insmod will not
> be affected.

Do you mean that you *do want asynchronicity*?

>>> Moreover, it seems from this discussion that the aim is (still)
>>> that insmod should be near-instantaneous (i.e., not wait for probe),
>>
>> The only reason that is being discussed is that systemd has not
>> accepted the timeout as a system design despite me pointing out the
>> original design flaw recently and at this point even if it was accepted
>> as a design flaw it seems it's too late. The approach taken to help
>> make all drivers async is simply an afterthought to give systemd what
>> it *thought* was in place, and it by no measure should be considered
>> the proper fix to the regression introduced by systemd; it may perhaps
>> be the right step long term for systemd systems given it goes with
>> what it assumed was there, but the timeout was flawed. It's not clear
>> if systemd can help with old kernels; it seems the ship has sailed and
>> there seem to be no options but for folks to work around that -- unless
>> of course some reasonable solution is found which doesn't break current
>> systemd design?
>
> If I read the git logs correctly the hard timeout was introduced in
> April 2011, so reverting it now seems indeed not to help much with all
> the running systems out there.

yeah figured :(

>> As part of this series I addressed hunting for the "misbehaving
>> drivers" in-kernel as I saw no progress on the systemd side of things
>> to non-fatally detect "misbehaving drivers" despite my original RFCs
>> and request for advice. I quote "misbehaving drivers" as it's a flawed
>> view to consider them misbehaving now in light of clarifications of
>> how the driver core works, in that it always batches both init and
>> probe together, and we can't be penalizing long probes given that
>> long probes are simply fine. My patch to pick up "misbehaving
>> drivers" on the kernel front by picking up systemd's signal was
>> reactive but it was also simply braindead given the same exact reasons
>> why systemd's original timeout was flawed. We want a general solution
>> and we don't want to work around the root cause, in this case it was
>> systemd's assumption on how drivers work.
>
> Would your ongoing work to make probing asynchronous solve this
> problem in the long-term? In the short-term I guess bumping the udev
> timeout should be sufficient.

That and the global flag / module param to specify the async desire,
which would not regress old userspace. Probe after all is the main
source of the issue.

>> Keep in mind that the above just addresses the kmod built-in command
>> on systemd; that is where the timeout was introduced, but as has been
>> clarified here, assuming the same timeout on *all* other built-ins is
>> likely pretty flawed as well, and this does concern me. It's why I
>> mentioned that more than two years have gone by now of growing design
>> and assumptions on top of that original commit, and it's why it's
>> hard for systemd to consider an alternative.
>
> All built-ins should be near-instantaneous. If they are not, that
> needs to be fixed, or they should not be udev built-ins at all. I have
> now added a warning to udev if any built-in takes more than a third
> of the timeout, so hopefully any problems should be spotted early.

Great, thanks. Collecting these should be valuable and help us be proactive.

>>>>>>  I'm afraid distributions that want to avoid this
>>>>>> sigkill at least on the kernel front will have to work around this
>>>>>> issue either on systemd by increasing the default timeout which is now
>>>>>> possible thanks to Hannes' changes or by some other means such as the
>>>>>> combination of a modified non-chatty version of this patch + a check
>>>>>> at the end of load_module() as mentioned earlier on these threads.
>>>>>
>>>>> Increasing the default timeout in systemd seems like the obvious bug fix
>>>>> to me.  If the patch exists already, having distros that want it use it
>>>>> looks to be correct ... not every bug is a kernel bug, after all.
>>>>
>>>> It's merged upstream on systemd now, along with a few fixes on top of
>>>> it. I also see Kay merged a change to the default timeout to 60 seconds
>>>> on August 30. It's unclear if these discussions had any impact on that
>>>> decision or if that was just because udev firmware loading has now been
>>>> ripped out. I'll note that the new 60 second timeout wouldn't suffice
>>>> for cxgb4 even if it didn't do firmware loading; its probe takes over
>>>> one full minute.
>>>>
>>>>> Negotiating a probe vs init split for drivers is fine too, but it's a
>>>>> longer term thing rather than a bug fix.
>>>>
>>>> Indeed. What I proposed with a multiplier for the timeout for the
>>>> different types of built in commands was deemed complex but saw no
>>>> alternatives proposed despite my interest to work on one and
>>>> clarifications noted that this was a design regression. Not quite sure
>>>> what else I could have done here. I'm interested in learning what the
>>>> better approach is for the future as if we want to marry init + kernel
>>>> we need a smooth way for us to discuss design without getting worked
>>>> up about it, or taking it personally. I really want this to work as I
>>>> personally like systemd so far.
>>>
>>> How about this: keep the timeout global, but also introduce a
>>> (relatively short, say 10 or 15 seconds) timeout after which a warning
>>> is printed.
>>
>> That is something that I originally was looking forward to on systemd,
>> but here's the thing once that warning comes up  -- what would we do
>> with it?
>
> Short term: bump the timeout further. Long-term, hopefully the driver
> (core) can be changed to avoid the problem.

Fine by me, although I think some folks still have concerns about the
sigkill altogether. I'm not sure we can escape it now, though.

>> This patch addresses this warning in-kernel and the idea was
>> that we'd then peg an async_probe bool as true on the driver as a fix,
>> that was decided to be silly given all the above. These drivers are
>> actually not misbehaving and it would be even more incorrect to try to
>> "fix" them by making them run asynchronously. In fact for some old
>> storage drivers it may even be the worst thing to do given the
>> possible slew of userland deployment and scripts which assume things
>> *are* synchronous.
>
> As mentioned above, it probably makes sense to switch on the
> asynchronous behaviour only for a given call to finit_module(), and
> not globally to avoid problems with userland assumptions.

Sure that's one way.

>>> Even if nothing is actually killed, having workers (be it
>>> insmod or something else) take longer than a couple of seconds is
>>> likely a sign that something is seriously off somewhere.
>>
>> Probe can take a long time and that's fine,
>
> But isn't finit_module() taking a long time a serious problem given
> that it means no other module can be loaded in parallel?

Indeed, but wanting init() to complete fast is different from wanting
the combination of init and probe to complete fast synchronously. If
userspace wants init to be fast and lets probe be async, then userspace
has no option but to deal with the fact that async probe will be async,
and it should then use other methods to match any dependencies if it's
doing that itself. For example, networking should not kick off after a
network driver is loaded but rather once the device shows up on udev.
We should be good with networking dealing with this correctly today,
but I'm not sure about other subsystems. depmod should be able to load
the required modules in order, and if bus drivers work right then probe
of the remaining devices should happen asynchronously. The one case I
can think of that is a bit different is modules-load.d things, but
those *do not rely on the timeout*; they are loaded prior to a service
requirement. Note though that if those modules have a probe and it then
runs async, the systemd service would probably need to consider that
the requirements may not be there until later. If this is not carefully
considered it could introduce regressions for users of modules-load.d
when async probe is fully deployed. The same applies to systemd
assuming that once kmod has loaded a module a dependency is complete,
since until now probe would have already run by then.

> Even if you
> have some storage device which legitimately needs to take a couple of
> minutes to probe, you probably still want your computer to boot and
> get on with its other tasks whilst you wait... Or worse still, some
> insignificant driver is broken and simply hangs in probe, but surely
> you still want the rest of the system to boot?

Agreed. I believe one concern here is whether userspace is properly
equipped to deal with module loading doing async probing, and with
that probing possibly failing. Perhaps systemd thinks all userspace is
ready for that, but are we sure that's the case? Another obvious issue
is a storage driver that your boot depends upon: if it takes a while
we kill it and you can't boot, no bueno. If systemd can avoid those
situations that'd be nice. That was the source of the first major
issue reported by Joseph.

Chattiness about issues before the timeout should help a lot. We
should start collecting these reports somehow and address them. If we
really want to be good at this we should put a bit of effort into
monitoring them rather than being reactive.

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
@ 2014-09-11 22:26                                     ` Luis R. Rodriguez
  0 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-11 22:26 UTC (permalink / raw)
  To: Tom Gundersen, Tejun Heo
  Cc: James Bottomley, One Thousand Gnomes, Takashi Iwai, Kay Sievers,
	Oleg Nesterov, Praveen Krishnamoorthy, hare,
	Nagalakshmi Nandigama, Wu Zhangjin, Tetsuo Handa,
	mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Sreekanth Reddy, Arjan van de Ven, Abhijit Mahajan, systemd

On Thu, Sep 11, 2014 at 2:43 PM, Tom Gundersen <teg@jklm.no> wrote:
> On Wed, Sep 10, 2014 at 11:10 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
>>>> More than two years
>>>> have gone by on growing design and assumptions on top of that original
>>>> commit. I'm not sure if *systemd folks* yet believe it was a design
>>>> regression?
>>>
>>> I don't think so. udev should not allow its workers to run for an
>>> unbounded length of time. Whether the upper bound should be 30, 60,
>>> 180 seconds or something else is up for debate (currently it is 60,
>>> but if that is too short for some drivers we could certainly revisit
>>> that).
>>
>> That's the thing -- the timeout was put in place under the assumption
>> probe was asynchronous and it's not; the driver core issues both module
>> init *and* probe together, so the loader has to wait. That alone makes
>> the timeout a design flaw, and then systemd built on top of that
>> design for over two years. It's not systemd's fault, it's just
>> that we never spoke about this broadly as a design thing and we should
>> have, and I will mention that even when the first issues crept up,
>> the issue was still tossed back as a driver problem. It was only
>> recently, once we realized that init and probe run together, that
>> we started thinking about this problem differently. Systemd was trying
>> to ensure init on drivers doesn't take long, but it's not init that is
>> taking long, it's probe, and probe then gets penalized as the driver
>> core batches both init and probe synchronously before finishing module
>> loading.
>
> Just to clarify: udev/systemd is not trying to get into the business
> of what the kernel does on finit_module(), we just need to make sure
> that none of our workers stay around forever, which is why we have a
> global timeout. If necessary we can bump this higher (as mentioned in
> another thread I just bumped it to 180 secs), but we cannot abolish it
> entirely.

180 seconds is certainly better than 30, but let me be clear on the
extent to which the timeout can be an issue, at least for the kmod
built-in command. The driver core not only batches init and probe
together synchronously, it also runs probe for *all* devices that the
driver can claim, and that series of probes runs serially:
bus_for_each_dev() probes each device in turn. So if init takes
1 second, probe takes 120 seconds per device, and the system has
2 devices, the load needs 241 seconds, and with the new timeout the
second device would not be successfully probed (and in fact I'm not
sure whether this would kill the first).
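
The arithmetic above can be sketched as a toy model. This is only an
illustration under the numbers quoted in this mail; the function names
are made up here and nothing below is driver core code.

```c
#include <stdbool.h>

/*
 * Toy model of a synchronous module load: init runs first, then probe
 * runs once per matching device, one after another, all inside the same
 * finit_module() call that the udev worker is blocked on.
 * (Hypothetical helper names, for illustration only.)
 */
unsigned int sync_load_seconds(unsigned int init_s, unsigned int probe_s,
                               unsigned int ndevices)
{
    return init_s + probe_s * ndevices;
}

/* True if the worker would still be blocked when the timeout fires. */
bool load_exceeds_timeout(unsigned int init_s, unsigned int probe_s,
                          unsigned int ndevices, unsigned int timeout_s)
{
    return sync_load_seconds(init_s, probe_s, ndevices) > timeout_s;
}
```

With init = 1, probe = 120 and 2 devices the model gives 241 seconds,
which is past a 180 second timeout; with a single device it gives 121
seconds and fits.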

>> Furthermore, as clarified by Tejun, random userland is known to
>> exist that will wait indefinitely for module loading under the simple
>> assumption things *are done synchronously*, and it's precisely why we
>> can't just blindly enable async probe upstream through a new driver
>> boolean as it can be unfair to this old userland. What is being
>> evaluated is to enable async probe for *all* drivers through a new
>> general system-wide option. We cannot regress old userspace and
>> assumptions but we can create a new shiny universe.
>
> How about simply introducing a new flag to finit_module() to indicate
> that the caller does not care about asynchronicity. We could then pass
> this from udev, but existing scripts calling modprobe/insmod will not
> be affected.

Do you mean that you *do want asynchronicity*?

>>> Moreover, it seems from this discussion that the aim is (still)
>>> that insmod should be near-instantaneous (i.e., not wait for probe),
>>
>> The only reason that is being discussed is that systemd has not
>> accepted the timeout as a system design despite me pointing out the
>> original design flaw recently and at this point even if was accepted
>> as a design flaw it seems its too late. The approach taken to help
>> make all drivers async is simply an afterthought to give systemd what
>> it *thought* was in place, and it by no measure should be considered
>> the proper fix to the regression introduced by systemd, it may perhaps
>> be the right step long term for systemd systems given it goes with
>> what it assumed was there, but the timeout was flawed. Its not clear
>> if systemd can help with old kernels, it seems the ship has sailed and
>> there seems no options but for folks to work around that -- unless of
>> course some reasonable solution is found which doesn't break current
>> systemd design?
>
> If I read the git logs correctly the hard timeout was introduced in
> April 2011, so reverting it now seems indeed not to help much with all
> the running systems out there.

yeah figured :(

>> As part of this series I addressed hunting for the "misbehaving
>> drivers" in-kernel as I saw no progress on the systemd side of things
>> to non-fatally detect "misbehaving drivers" despite my original RFCs
>> and request for advice. I quote "misbehaving drivers" as it's a flawed
>> view to consider them misbehaving now in light of clarifications of
>> how the driver core works, in that it always batches both init and
>> probe together, and we can't be penalizing long probes given that
>> long probes are simply fine. My patch to pick up "misbehaving
>> drivers" on the kernel front by picking up systemd's signal was
>> reactive but it was also simply braindead given the same exact reasons
>> why systemd's original timeout was flawed. We want a general solution
>> and we don't want to work around the root cause, in this case it was
>> systemd's assumption on how drivers work.
>
> Would your ongoing work to make probing asynchronous solve this
> problem in the long-term? In the short-term I guess bumping the udev
> timeout should be sufficient.

That, plus the global flag / module param to request async probe,
which would not regress old userspace. Probe after all is the main
source of the issue.

>> Keep in mind that the above just addresses the kmod built-in cmd on
>> systemd; it's where the timeout was introduced, but as has been
>> clarified here, assuming the same timeout on *all* other built-ins
>> is likely pretty flawed as well and this does concern me. It's
>> why I mentioned that more than two years have gone by now of growing
>> design and assumptions on top of that original commit, and it's why
>> it's hard for systemd to consider an alternative.
>
> All built-ins should be near-instantaneous. If they are not, that
> needs to be fixed, or they should not be udev built-ins at all. I have
> now added a warning to udev if any built-in takes more than a third
> of the timeout, so hopefully any problems should be spotted early.

Great, thanks. Collecting these should be valuable and help us be proactive.
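
For reference, the "warn at a third of the timeout" policy Tom
describes could look roughly like the sketch below. This is not udev's
code; the enum, the function name and the exact boundary handling
(strictly more than a third warns, reaching the timeout kills) are
assumptions made for illustration.

```c
/* Hypothetical classification of a worker by how long it has run. */
enum worker_state { WORKER_OK, WORKER_WARN, WORKER_KILL };

enum worker_state classify_worker(unsigned int elapsed_s,
                                  unsigned int timeout_s)
{
    /* Hard limit reached: the worker would be killed. */
    if (elapsed_s >= timeout_s)
        return WORKER_KILL;

    /* More than a third of the timeout used: log a warning only. */
    if (elapsed_s > timeout_s / 3)
        return WORKER_WARN;

    return WORKER_OK;
}
```

With a 180 second timeout, a worker that has run for 61 seconds would
be reported but left alone, which is the kind of early signal worth
collecting.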

>>>>>>  I'm afraid distributions that want to avoid this
>>>>>> sigkill at least on the kernel front will have to work around this
>>>>>> issue either on systemd by increasing the default timeout which is now
>>>>>> possible thanks to Hannes' changes or by some other means such as the
>>>>>> combination of a modified non-chatty version of this patch + a check
>>>>>> at the end of load_module() as mentioned earlier on these threads.
>>>>>
>>>>> Increasing the default timeout in systemd seems like the obvious bug fix
>>>>> to me.  If the patch exists already, having distros that want it use it
>>>>> looks to be correct ... not every bug is a kernel bug, after all.
>>>>
>>>> It's merged upstream on systemd now, along with a few fixes on top of
>>>> it. I also see Kay merged a change to the default timeout to 60 seconds
>>>> on August 30. It's unclear if these discussions had any impact on that
>>>> decision or if that was just because udev firmware loading has now been
>>>> ripped out. I'll note that the new 60 second timeout wouldn't suffice
>>>> for cxgb4 even if it didn't do firmware loading; its probe takes over
>>>> one full minute.
>>>>
>>>>> Negotiating a probe vs init split for drivers is fine too, but it's a
>>>>> longer term thing rather than a bug fix.
>>>>
>>>> Indeed. What I proposed with a multiplier for the timeout for the
>>>> different types of built in commands was deemed complex but saw no
>>>> alternatives proposed despite my interest to work on one and
>>>> clarifications noted that this was a design regression. Not quite sure
>>>> what else I could have done here. I'm interested in learning what the
>>>> better approach is for the future as if we want to marry init + kernel
>>>> we need a smooth way for us to discuss design without getting worked
>>>> up about it, or taking it personally. I really want this to work as I
>>>> personally like systemd so far.
>>>
>>> How about this: keep the timeout global, but also introduce a
>>> (relatively short, say 10 or 15 seconds) timeout after which a warning
>>> is printed.
>>
>> That is something that I originally was looking forward to on systemd,
>> but here's the thing once that warning comes up  -- what would we do
>> with it?
>
> Short term: bump the timeout further. Long-term, hopefully the driver
> (core) can be changed to avoid the problem.

Fine by me, although I think some folks still have concerns about the
sigkill altogether. I'm not sure we can escape it now, though.

>> This patch addresses this warning in-kernel and the idea was
>> that we'd then peg an async_probe bool as true on the driver as a fix,
>> that was decided to be silly given all the above. These drivers are
>> actually not misbehaving and it would be even more incorrect to try to
>> "fix" them by making them run asynchronously. In fact for some old
>> storage drivers it may even be the worst thing to do given the
>> possible slew of userland deployment and scripts which assume things
>> *are* synchronous.
>
> As mentioned above, it probably makes sense to switch on the
> asynchronous behaviour only for a given call to finit_module(), and
> not globally to avoid problems with userland assumptions.

Sure that's one way.
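
To make the per-call idea concrete, here is a rough userspace sketch.
The flag name and value are entirely hypothetical: no such
finit_module() flag exists, and kernels reject flag bits they do not
know with EINVAL, so passing it today would simply fail. Only the
syscall itself and its (fd, params, flags) arguments are real.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* HYPOTHETICAL: made-up flag bit, only here to illustrate the proposal. */
#define HYPOTHETICAL_MODULE_INIT_ASYNC_PROBE 0x100

/* udev would ask for async probe; legacy scripts would pass 0. */
int module_load_flags(int want_async_probe)
{
    return want_async_probe ? HYPOTHETICAL_MODULE_INIT_ASYNC_PROBE : 0;
}

/* Load a module from a file; returns 0 on success, -1 with errno set. */
int load_module_file(const char *path, const char *params,
                     int want_async_probe)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    long ret;
    int saved_errno;

    if (fd < 0)
        return -1;

    ret = syscall(SYS_finit_module, fd, params,
                  module_load_flags(want_async_probe));
    saved_errno = errno;   /* close() may clobber errno */
    close(fd);
    errno = saved_errno;

    return ret == 0 ? 0 : -1;
}
```

The point of the design is that the opt-in travels with the caller:
existing modprobe/insmod users keep passing 0 and keep the synchronous
behaviour they assume.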

>>> Even if nothing is actually killed, having workers (be it
>>> insmod or something else) take longer than a couple of seconds is
>>> likely a sign that something is seriously off somewhere.
>>
>> Probe can take a long time and that's fine,
>
> But isn't finit_module() taking a long time a serious problem given
> that it means no other module can be loaded in parallel?

Indeed, but wanting init() to complete fast is different from wanting
the combination of init and probe to complete fast synchronously. If
userspace wants init to be fast and lets probe be async, then userspace
has no option but to deal with the fact that async probe will be async,
and it should then use other methods to match any dependencies if it's
doing that itself. For example, networking should not kick off after a
network driver is loaded but rather once the device shows up on udev.
We should be good with networking dealing with this correctly today,
but I'm not sure about other subsystems. depmod should be able to load
the required modules in order, and if bus drivers work right then probe
of the remaining devices should happen asynchronously. The one case I
can think of that is a bit different is modules-load.d things, but
those *do not rely on the timeout*; they are loaded prior to a service
requirement. Note though that if those modules have a probe and it then
runs async, the systemd service would probably need to consider that
the requirements may not be there until later. If this is not carefully
considered it could introduce regressions for users of modules-load.d
when async probe is fully deployed. The same applies to systemd
assuming that once kmod has loaded a module a dependency is complete,
since until now probe would have already run by then.
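
As a minimal sketch of "wait for the device, not for the module load":
the helper below polls for a path to appear, with a timeout. The
function name is made up, and polling is a simplification; a real
consumer would more likely react to udev events than poll sysfs.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <time.h>

/*
 * Poll until `path` exists or roughly `timeout_ms` milliseconds have
 * passed. Returns 0 if the path showed up, -1 on timeout.
 * (Hypothetical helper, for illustration only.)
 */
int wait_for_path(const char *path, unsigned int timeout_ms)
{
    const struct timespec step = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };
    unsigned int waited_ms = 0;
    struct stat st;

    for (;;) {
        if (stat(path, &st) == 0)
            return 0;           /* device node / sysfs entry is there */
        if (waited_ms >= timeout_ms)
            return -1;          /* gave up waiting */
        nanosleep(&step, NULL);
        waited_ms += 50;
    }
}
```

A script that today does "modprobe foo && use-device" would instead
load the module and then call something like
wait_for_path("/sys/class/net/eth0", 30000) before touching the
interface; the path here is only an example.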

> Even if you
> have some storage device which legitimately needs to take a couple of
> minutes to probe, you probably still want your computer to boot and
> get on with its other tasks whilst you wait... Or worse still, some
> insignificant driver is broken and simply hangs in probe, but surely
> you still want the rest of the system to boot?

Agreed. I believe one concern here is whether userspace is properly
equipped to deal with module loading doing async probing, and with
that probing possibly failing. Perhaps systemd thinks all userspace is
ready for that, but are we sure that's the case? Another obvious issue
is a storage driver that your boot depends upon: if it takes a while
we kill it and you can't boot, no bueno. If systemd can avoid those
situations that'd be nice. That was the source of the first major
issue reported by Joseph.

Chattiness about issues before the timeout should help a lot. We
should start collecting these reports somehow and address them. If we
really want to be good at this we should put a bit of effort into
monitoring them rather than being reactive.

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
@ 2014-09-11 22:26                                     ` Luis R. Rodriguez
  0 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-11 22:26 UTC (permalink / raw)
  To: Tom Gundersen, Tejun Heo
  Cc: James Bottomley, One Thousand Gnomes, Takashi Iwai, Kay Sievers,
	Oleg Nesterov, Praveen Krishnamoorthy, hare,
	Nagalakshmi Nandigama, Wu Zhangjin, Tetsuo Handa,
	mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Sreekanth Reddy, Arjan van de Ven, Abhijit Mahajan, systemd

On Thu, Sep 11, 2014 at 2:43 PM, Tom Gundersen <teg@jklm.no> wrote:
> On Wed, Sep 10, 2014 at 11:10 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
>>>> More than two years
>>>> have gone by on growing design and assumptions on top of that original
>>>> commit. I'm not sure if *systemd folks* yet believe its was a design
>>>> regression?
>>>
>>> I don't think so. udev should not allow its workers to run for an
>>> unbounded length of time. Whether the upper bound should be 30, 60,
>>> 180 seconds or something else is up for debate (currently it is 60,
>>> but if that is too short for some drivers we could certainly revisit
>>> that).
>>
>> That's the thing -- the timeout was put in place under the assumption
>> probe was asyncronous and its not, the driver core issues both module
>> init *and* probe together, the loader has to wait. That alone makes
>> the timeout a design flaw, and then systemd carried on top of that
>> design over two years after that. Its not systemd's fault, its just
>> that we never spoke about this as a design thing broadly and we should
>> have, and I will mention that even when the first issues creeped up,
>> the issue was still tossed back a driver problems. It was only until
>> recently that we realized that both init and probe run together that
>> we've been thinking about this problem differently. Systemd was trying
>> to ensure init on driver don't take long but its not init that is
>> taking long, its probe, and probe gets then penalized as the driver
>> core batches both init and probe synchronously before finishing module
>> loading.
>
> Just to clarify: udev/systemd is not trying to get into the business
> of what the kernel does on finit_module(), we just need to make sure
> that none of our workers stay around forever, which is why we have a
> global timeout. If necessary we can bump this higher (as mentioned in
> another thread I just bumped it to 180 secs), but we cannot abolish it
> entirely.

180 seconds is certainly better than 30, but let me be clear here on
the extent to which the timeout at least for kmod built-in command can
be an issue. The driver core not only batches init and probe together
synchronously, it also runs probe for *all* devices that the device
driver can claim and all those series of probes run synchronously
within itself, that is bus_for_each_dev() runs synchronously on each
device. So, if a init takes 1 second and probe for each device takes
120 seconds and the system has 2 devices with the new timeout the
second device would not be successfully probed (and in fact I'm not
sure if this would kill the first).

>> Furthermore as clarified by Tejun random userland is known to
>> exist that will wait indefinitely for module loading under the simple
>> assumption things *are done synchronously*, and its precisely why we
>> can't just blindly enable async probe upstream through a new driver
>> boolean as it can be unfair to this old userland. What is being
>> evaluated is to enable aync probe for *all* drivers through a new
>> general system-wide option. We cannot regress old userspace and
>> assumptions but we can create a new shiny universe.
>
> How about simply introducing a new flag to finit_module() to indicate
> that the caller does not care about asynchronicity. We could then pass
> this from udev, but existing scripts calling modprobe/insmod will not
> be affected.

Do you mean that you *do want asynchronicity*?

>>> Moreover, it seems from this discussion that the aim is (still)
>>> that insmod should be near-instantaneous (i.e., not wait for probe),
>>
>> The only reason that is being discussed is that systemd has not
>> accepted the timeout as a system design despite me pointing out the
>> original design flaw recently and at this point even if was accepted
>> as a design flaw it seems its too late. The approach taken to help
>> make all drivers async is simply an afterthought to give systemd what
>> it *thought* was in place, and it by no measure should be considered
>> the proper fix to the regression introduced by systemd, it may perhaps
>> be the right step long term for systemd systems given it goes with
>> what it assumed was there, but the timeout was flawed. Its not clear
>> if systemd can help with old kernels, it seems the ship has sailed and
>> there seems no options but for folks to work around that -- unless of
>> course some reasonable solution is found which doesn't break current
>> systemd design?
>
> If I read the git logs correctly the hard timeout was introduced in
> April 2011, so reverting it now seems indeed not to help much with all
> the running systems out there.

yeah figured :(

>> As part of this series I addressed hunting for the  "misbehaving
>> drivers" in-kernel as I saw no progress on the systemd side of things
>> to non-fatally detect "misbehaving drivers" despite my original RFCs
>> and request for advice. I quote  "misbehaving drivers" as its a flawed
>> view to consider them misbehaving now in light of clarifications of
>> how the driver core works in that it batches both init and probe
>> together always and we can't be penalizing long probes due to the fact
>> long probes are simply fine. My patch to pick up "misbehaving drivers"
>> drivers on the kernel front by picking up systemd's signal was
>> reactive but it was also simply braindead given the same exact reasons
>> why systemd's original timeout was flawed. We want a general solution
>> and we don't want to work around the root cause, in this case it was
>> systemd's assumption on how drivers work.
>
> Would your ongoing work to make probing asynchronous solve this
> problem in the long-term? In the short-term I guess bumping the udev
> timeout should be sufficient.

That and the global flag / module param to specify the async desire
which would not regress old userspace. Probe afterall is the main
source of the issue.

>> Keep in mind that the above just addresses kmod built-in cmd on
>> systemd, its where the timeout was introduced but as has been
>> clarified here assuming the same timeout on *all* other built-in
>> likely is likely pretty flawed as well and this does concern me. Its
>> why I mentioned that more than two years have gone by now on growing
>> design and assumptions on top of that original commit and its why its
>> hard for systemd to consider an alternative.
>
> All built-ins should be near-instantaneous. If they are not, that
> needs to be fixed, or they should not be udev built-ins at all. I have
> now added a warning to udev if any builtin-in takes more than a third
> of the timeout, so hopefully any problems should be spotted early.

Great thanks. Collecting these should be valuable and help being proactive.

>>>>>>  I'm afraid distributions that want to avoid this
>>>>>> sigkill at least on the kernel front will have to work around this
>>>>>> issue either on systemd by increasing the default timeout which is now
>>>>>> possible thanks to Hannes' changes or by some other means such as the
>>>>>> combination of a modified non-chatty version of this patch + a check
>>>>>> at the end of load_module() as mentioned earlier on these threads.
>>>>>
>>>>> Increasing the default timeout in systemd seems like the obvious bug fix
>>>>> to me.  If the patch exists already, having distros that want it use it
>>>>> looks to be correct ... not every bug is a kernel bug, after all.
>>>>
>>>> Its merged upstream on systemd now, along with a few fixes on top of
>>>> it. I also see Kay merged a change to the default timeout to 60 second
>>>> on August 30. Its unclear if these discussions had any impact on that
>>>> decision or if that was just because udev firmware loading got now
>>>> ripped out. I'll note that the new 60 second timeout wouldn't suffice
>>>> for cxgb4 even if it didn't do firmware loading, its probe takes over
>>>> one full minute.
>>>>
>>>>> Negotiating a probe vs init split for drivers is fine too, but it's a
>>>>> longer term thing rather than a bug fix.
>>>>
>>>> Indeed. What I proposed with a multiplier for the timeout for the
>>>> different types of built in commands was deemed complex but saw no
>>>> alternatives proposed despite my interest to work on one and
>>>> clarifications noted that this was a design regression. Not quite sure
>>>> what else I could have done here. I'm interested in learning what the
>>>> better approach is for the future as if we want to marry init + kernel
>>>> we need a smooth way for us to discuss design without getting worked
>>>> up about it, or taking it personal. I really want this to work as I
>>>> personally like systemd so far.
>>>
>>> How about this: keep the timeout global, but also introduce a
>>> (relatively short, say 10 or 15 seconds) timeout after which a warning
>>> is printed.
>>
>> That is something that I originally was looking forward to on systemd,
>> but here's the thing once that warning comes up  -- what would we do
>> with it?
>
> Short term: bump the timeout further. Long-term, hopefully the driver
> (core) can be changed to avoid the problem.

Fine by me, although I think some folks still have concerns with the
sigkill completely. But not sure if we escape it now.

>> This patch addresses this warning in-kernel and the idea was
>> that we'd then peg an async_probe bool as true on the driver as a fix,
>> that was decided to be silly given all the above. These drivers are
>> actually not misbehaving and it would be even more incorrect to try to
>> "fix" them by making them run asynchronously. In fact for some old
>> storage drivers it may even be the worst thing to do given the
>> possible slew of userland deployment and scripts which assume things
>> *are* synchronous.
>
> As mentioned above, it probably makes sense to switch on the
> asynchronous behaviour only for a given call to finit_module(), and
> not globally to avoid problems with userland assumptions.

Sure that's one way.

>>> Even if nothing is actually killed, having workers (be it
>>> insmod or something else) take longer than a couple of seconds is
>>> likely a sign that something is seriously off somewhere.
>>
>> Probe can take a long time and that's fine,
>
> But isn't finit_module() taking a long time a serious problem given
> that it means no other module can be loaded in parallel?

Indeed, but wanting init() to complete fast is different from wanting
the combination of init and probe to be fast synchronously. If
userspace wants init to be fast and lets probe be async, then userspace
has no option but to deal with the fact that async probe will be async,
and it should then use other methods to match any dependencies if it's
doing that itself. For example, networking should not kick off after a
network driver is loaded but rather once the device shows up in udev.
We should be good with networking dealing with this correctly today,
but I'm not sure about other subsystems. depmod should be able to load
the required modules in order, and if bus drivers work right then probe
of the remaining devices should happen asynchronously. The one case I
can think of that is a bit different is modules-load.d things, but
those *do not rely on the timeout*; they are loaded prior to a service
requirement. Note though that if those modules had a probe that then
ran async, the systemd service would probably need to consider that
the requirements may not be there until later. If this is not carefully
considered, it could introduce regressions for users of modules-load.d
when async probe is fully deployed. The same applies to systemd making
assumptions about kmod loading a module and a dependency being complete
because probe would have run before.

> Even if you
> have some storage device which legitimately needs to take a couple of
> minutes to probe, you probably still want your computer to boot and
> get on with its other tasks whilst you wait... Or worse still, some
> insignificant driver is broken and simply hangs in probe, but surely
> you still want the rest of the system to boot?

Agreed. I believe one concern here is whether or not userspace is
properly equipped to deal with module loading doing async probing and
that possibly failing. Perhaps systemd might think all userspace is
ready for that, but are we sure that's the case? Another obvious issue
is a storage driver that your boot depends upon: if it takes a while,
we kill it and you can't boot, no bueno. If systemd can avoid those
situations that'd be nice. That was the source of the first major
issue reported by Joseph.

Chattiness on issues before the timeout should help a lot. We should
start collecting these reports somehow and addressing them. If we
really want to be good at this we should put a bit of effort into
monitoring them rather than being reactive.

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-11 22:26                                     ` Luis R. Rodriguez
  (?)
@ 2014-09-12  5:48                                       ` Tom Gundersen
  -1 siblings, 0 replies; 227+ messages in thread
From: Tom Gundersen @ 2014-09-12  5:48 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Tejun Heo, James Bottomley, One Thousand Gnomes, Takashi Iwai,
	Kay Sievers, Oleg Nesterov, Praveen Krishnamoorthy, hare,
	Nagalakshmi Nandigama, Wu Zhangjin, Tetsuo Handa,
	mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Sreekanth Reddy, Arjan van de Ven, Abhijit Mahajan,
	systemd Mailing List, Linux SCSI List, Dmitry Torokhov,
	linux-kernel, netdev, Andrew Morton, Joseph Salisbury

On Fri, Sep 12, 2014 at 12:26 AM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> On Thu, Sep 11, 2014 at 2:43 PM, Tom Gundersen <teg@jklm.no> wrote:
>> How about simply introducing a new flag to finit_module() to indicate
>> that the caller does not care about asynchronicity. We could then pass
>> this from udev, but existing scripts calling modprobe/insmod will not
>> be affected.
>
> Do you mean that you *do want asynchronicity*?

Precisely, udev would opt-in, but existing scripts etc would not.

>> But isn't finit_module() taking a long time a serious problem given
>> that it means no other module can be loaded in parallel?
>
> Indeed but having a desire to make the init() complete fast is
> different than the desire to have the combination of both init and
> probe fast synchronously.

I guess no one is arguing that probe should somehow be required to be
fast, but rather:

> If userspace wants init to be fast and let
> probe be async then userspace has no option but to deal with the fact
> that async probe will be async, and it should then use other methods
> to match any dependencies if its doing that itself.

Correct. And this therefore likely needs to be opt-in behaviour per
finit_module() invocation to avoid breaking old assumptions.
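
To make the opt-in idea concrete, here is a minimal userspace sketch,
assuming a new flag bit were added to finit_module(). The flag name
MODULE_INIT_ASYNC_PROBE_HYPOTHETICAL and its value are invented purely
for illustration and do not exist in any kernel; load_flags() and
load_module() are likewise hypothetical helper names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* HYPOTHETICAL: no such flag exists; name and value are placeholders. */
#define MODULE_INIT_ASYNC_PROBE_HYPOTHETICAL (1 << 30)

/* Callers that do not opt in keep today's synchronous behaviour (flags 0). */
static int load_flags(int caller_handles_async)
{
	return caller_handles_async ? MODULE_INIT_ASYNC_PROBE_HYPOTHETICAL : 0;
}

/* Load the module at path; returns 0 on success, -1 with errno set. */
static int load_module(const char *path, int caller_handles_async)
{
	int fd = open(path, O_RDONLY | O_CLOEXEC);
	int saved_errno;
	long ret;

	if (fd < 0)
		return -1;

	/* finit_module(fd, param_values, flags) via the raw syscall. */
	ret = syscall(SYS_finit_module, fd, "", load_flags(caller_handles_async));
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	return ret == 0 ? 0 : -1;
}
```

Under this assumption udev would call load_module(path, 1), while
existing modprobe/insmod users would keep passing 0 and see no change.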

> For example
> networking should not kick off after a network driver is loaded but
> rather one the device creeps up on udev. We should be good with
> networking dealing with this correctly today but not sure about other
> subsystems. depmod should be able to load the required modules in
> order and if bus drivers work right then probe of the remnant devices
> should happen asynchronously. The one case I can think of that is a
> bit different is modules-load.d things but those *do not rely on the
> timeout*, but are loaded prior to a service requirement. Note though
> that if those modules had probe and they then run async'd then systemd
> service would probably need to consider that the requirements may not
> be there until later. If this is not carefully considered that could
> introduce regression to users of modules-load.d when async probe is
> fully deployed. The same applies to systemd making assumptions of kmod
> loading a module and a dependency being complete as probe would have
> run it before.

Yeah, these all need to be considered when deciding whether or not to
enable async in each specific case.

> I believe one concern here lies in on whether or not userspace
> is properly equipped to deal with the requirements on module loading
> doing async probing and that possibly failing. Perhaps systemd might
> think all userspace is ready for that but are we sure that's the case?

There almost certainly are custom things out there relying on the
synchronous behaviour, but if we make it opt-in we should not have a
problem.

Cheers,

Tom

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-12  5:48                                       ` Tom Gundersen
  (?)
@ 2014-09-12 20:09                                         ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-12 20:09 UTC (permalink / raw)
  To: Tom Gundersen, Tejun Heo
  Cc: James Bottomley, One Thousand Gnomes, Takashi Iwai, Kay Sievers,
	Oleg Nesterov, Praveen Krishnamoorthy, hare,
	Nagalakshmi Nandigama, Wu Zhangjin, Tetsuo Handa,
	mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Sreekanth Reddy, Arjan van de Ven, Abhijit Mahajan,
	systemd Mailing List, Linux SCSI List, Dmitry Torokhov,
	linux-kernel, netdev, Andrew Morton, Joseph Salisbury

On Thu, Sep 11, 2014 at 10:48 PM, Tom Gundersen <teg@jklm.no> wrote:
> On Fri, Sep 12, 2014 at 12:26 AM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
>> On Thu, Sep 11, 2014 at 2:43 PM, Tom Gundersen <teg@jklm.no> wrote:
>>> How about simply introducing a new flag to finit_module() to indicate
>>> that the caller does not care about asynchronicity. We could then pass
>>> this from udev, but existing scripts calling modprobe/insmod will not
>>> be affected.
>>
>> Do you mean that you *do want asynchronicity*?
>
> Precisely, udev would opt-in, but existing scripts etc would not.

Sure, that's the other alternative that Tejun was mentioning.

>>> But isn't finit_module() taking a long time a serious problem given
>>> that it means no other module can be loaded in parallel?
>>
>> Indeed but having a desire to make the init() complete fast is
>> different than the desire to have the combination of both init and
>> probe fast synchronously.
>
> I guess no one is arguing that probe should somehow be required to be
> fast, but rather:
>
>> If userspace wants init to be fast and let
>> probe be async then userspace has no option but to deal with the fact
>> that async probe will be async, and it should then use other methods
>> to match any dependencies if its doing that itself.
>
> Correct. And this therefore likely needs to be opt-in behaviour per
> finit_module() invocation to avoid breaking old assumptions.

Sure.

>> For example
>> networking should not kick off after a network driver is loaded but
>> rather one the device creeps up on udev. We should be good with
>> networking dealing with this correctly today but not sure about other
>> subsystems. depmod should be able to load the required modules in
>> order and if bus drivers work right then probe of the remnant devices
>> should happen asynchronously. The one case I can think of that is a
>> bit different is modules-load.d things but those *do not rely on the
>> timeout*, but are loaded prior to a service requirement. Note though
>> that if those modules had probe and they then run async'd then systemd
>> service would probably need to consider that the requirements may not
>> be there until later. If this is not carefully considered that could
>> introduce regression to users of modules-load.d when async probe is
>> fully deployed. The same applies to systemd making assumptions of kmod
>> loading a module and a dependency being complete as probe would have
>> run it before.
>
> Yeah, these all needs to be considered when deciding whether or not to
> enable async in each specific case.

Yes, and come to think of it I'd recommend opting out of async
functionality for modules-load.d, given that it is *not* hooked up to
the timeout and there is a good chance its users do want to wait for
probe to run at this point.

Given this I am now also inclined to have the per-module request be
async or not (the default) from userspace. The above would be a good
starting use case.

>> I believe one concern here lies in on whether or not userspace
>> is properly equipped to deal with the requirements on module loading
>> doing async probing and that possibly failing. Perhaps systemd might
>> think all userspace is ready for that but are we sure that's the case?
>
> There almost certainly are custom things out there relying on the
> synchronous behaviour, but if we make it opt-in we should not have a
> problem.

Indeed.

BTW, as for the cxgb4 device driver, it fails to load because it relies
on get_vpd_params() on probe, which ends up calling
pci_vpd_pci22_wait(), which will fail if
fatal_signal_pending(current). This is an example now completely
unrelated to the OOM series, and any other uses of
fatal_signal_pending(current) should trigger similar failures on
device drivers.
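
The failure mode can be sketched with a small userspace model. This is
not the kernel's pci_vpd_pci22_wait(); it only assumes a polling loop
that gives up once a fatal signal is pending. struct model_dev and
model_vpd_wait() are hypothetical names, and the fatal_signal_pending
field stands in for fatal_signal_pending(current).

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace model only; none of this is kernel code. */
struct model_dev {
	unsigned int ready_after;	/* poll count at which the hardware is done */
	bool fatal_signal_pending;	/* stand-in for fatal_signal_pending(current) */
};

/*
 * Poll until the device is ready. A pending fatal signal aborts the wait
 * even though the hardware would have completed a few polls later.
 */
static int model_vpd_wait(const struct model_dev *dev, unsigned int max_polls)
{
	for (unsigned int polls = 0; polls < max_polls; polls++) {
		if (polls >= dev->ready_after)
			return 0;
		if (dev->fatal_signal_pending)
			return -EINTR;
	}
	return -ETIMEDOUT;
}
```

In this model a device that would be ready after three polls still
returns -EINTR when the loading process has been sent a kill signal, so
probe fails even though the hardware is fine.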

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09 23:03                                             ` Tejun Heo
@ 2014-09-12 20:14                                               ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-12 20:14 UTC (permalink / raw)
  To: Tejun Heo, Tom Gundersen
  Cc: Lennart Poettering, Kay Sievers, Dmitry Torokhov,
	Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Arjan van de Ven,
	linux-kernel, Oleg Nesterov, hare, Andrew Morton, Tetsuo Handa,
	Joseph Salisbury, Benjamin Poirier, Santosh Rastapur,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, mpt-fusionlinux.pdl,
	Linux SCSI List, netdev

On Tue, Sep 9, 2014 at 4:03 PM, Tejun Heo <tj@kernel.org> wrote:
> On Tue, Sep 09, 2014 at 12:25:29PM +0900, Tejun Heo wrote:
>> Hello,
>>
>> On Mon, Sep 08, 2014 at 08:19:12PM -0700, Luis R. Rodriguez wrote:
>> > On the systemd side of things it should enable this sysctl and for
>> > older kernels what should it do?
>>
>> Supposing the change is backported via -stable, it can try to set the
>> sysctl on all kernels.  If the knob doesn't exist, the fix is not
>> there and nothing can be done about it.
>
> The more I think about it, the more I think this should be a
> per-insmod instance thing rather than a system-wide switch.

Agreed. A good use case that comes to mind is systemd's modules-load.d
lists, used by systemd services to require modules. The hooks there
likely expect probe to complete as part of the service, and since the
timeout is not applicable to these, synchronous probe would be good
for them while systemd would use async probe for regular modules.

> Currently
> the kernel param code doesn't allow a generic param outside the ones
> specified by the module itself but adding support for something like
> driver.async_load=1 shouldn't be too difficult, applying that to
> existing systems shouldn't be much more difficult than a system-wide
> switch, and it'd be siginificantly cleaner than fiddling with driver
> blacklist.

Agreed.

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-09  2:57                                     ` Luis R. Rodriguez
@ 2014-09-22 16:36                                       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-09-22 16:36 UTC (permalink / raw)
  To: Tejun Heo, Santosh Rastapur, Jiri Kosina, Petr Mladek
  Cc: Dmitry Torokhov, Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, Andrew Morton,
	Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, Casey Leedom, Hariprasad S, mpt-fusionlinux.pdl,
	Linux SCSI List, netdev, Hannes Reinecke, Luis R. Rodriguez,
	Tom Gundersen

On Mon, Sep 8, 2014 at 7:57 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
>> Why do we care about the priority of probing tasks?  Does that
>> actually make any meaningful difference?  If so, how?
>
> As I noted before -- I have yet to provide clear metrics but at least
> changing both init paths + probe from finit_module() to kthread
> certainly had a measurable time increase, I suspect using
> queue_work(system_unbound_wq, async_probe_work) will make probe
> slower. I'll get to these metrics this week.

The results are in and I'm glad to report my suspicions about
queue_work(system_unbound_wq) being slower than kthread were
incorrect; it actually works faster. Results will likely vary depending
on the subsystem, but in this particular case the cxgb4 driver was
tested both with and without requiring firmware loading, and for these
two types of driver loading all mechanisms make probe take just about
the same amount of time. What was surprising was that when firmware
loading is required the amount of time it takes to run probe does vary,
and quite considerably in terms of microseconds. The discrepancies are
by no means terrible, but they should be considered if one is thinking
of large systems and if we do wish to optimize things further and offer
equivalent behavior, especially when probing multiple devices with the
same driver. The method used to collect the amount of time for probe
was to use:

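ktime_t calltime, delta, rettime;
unsigned long long duration;

calltime = ktime_get();
driver_attach(drv);	/* drv: the struct device_driver being registered */
rettime = ktime_get();
delta = ktime_sub(rettime, calltime);
/* ns >> 10 divides by 1024, so this only approximates microseconds */
duration = (unsigned long long) ktime_to_ns(delta) >> 10;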

And then print that time in microseconds right after it finishes,
whether that be through the default kernel synchronous run or the
async runs.
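
A note on the units: shifting nanoseconds right by 10 divides by 1024
rather than 1000, so the printed values come out slightly smaller than
true microseconds. The same conversion is applied to every strategy, so
the comparison between strategies is not affected. A minimal userspace
sketch of the difference follows; the helper names are made up for
illustration and are not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: same conversion as the snippet above, ns >> 10. */
static inline uint64_t ns_to_usec_shift(uint64_t ns)
{
	return ns >> 10;	/* divide by 1024 */
}

/* Hypothetical helper: exact conversion, truncating toward zero. */
static inline uint64_t ns_to_usec_exact(uint64_t ns)
{
	return ns / 1000;
}

/* How many microseconds the shift under-reports for a given duration. */
static inline uint64_t shift_undercount_usec(uint64_t ns)
{
	uint64_t exact = ns_to_usec_exact(ns);
	uint64_t approx = ns_to_usec_shift(ns);

	/* 1024 > 1000, so the shift never over-reports */
	assert(approx <= exact);
	return exact - approx;
}
```

For example, ns_to_usec_shift(1024000) gives 1000 while
ns_to_usec_exact(1024000) gives 1024, a difference of about 2.3%.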

The collection and testing was then done by Santosh. Details of the
collections are at:

https://bugzilla.novell.com/show_bug.cgi?id=877622

The summary:

The driver actually probed 2 cards in the tests so we don't have
results for 1 card. The kernel serially calls probe for each device, so
to get the amount of time for one run let's just divide the results by
2. For each strategy there is a run where firmware loading is required
and a run where no firmware loading is required. The results for both
cards are:

=====================================================================|
strategy                                fw (usec)       no-fw (usec) |
---------------------------------------------------------------------|
synchronous                             48945138        2615126      |
kthread                                 50132831        2619737      |
queue_work(system_unbound_wq)           49827323        2615262      |
---------------------------------------------------------------------|

For one device then that comes out to:

=====================================================================|
strategy                                fw (usec)       no-fw (usec) |
---------------------------------------------------------------------|
synchronous                             24472569        1307563      |
kthread                                 25066415.5      1309868.5    |
queue_work(system_unbound_wq)           24913661.5      1307631      |
---------------------------------------------------------------------|

Converting that to seconds:

=====================================================================|
strategy                                fw (s)          no-fw (s)    |
---------------------------------------------------------------------|
synchronous                             24.47           1.31         |
kthread                                 25.07           1.31         |
queue_work(system_unbound_wq)           24.91           1.31         |
---------------------------------------------------------------------|
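
The arithmetic behind the tables can be sketched as below. This is only
an illustration in plain userspace C; the function names are made up,
and the figures in the note after it are the ones from the first table:

```c
#include <assert.h>

/* Hypothetical helper: average probe time per device, given the total. */
static inline double per_device_usec(double total_usec, unsigned int ndev)
{
	assert(ndev > 0);
	return total_usec / ndev;
}

/* Hypothetical helper: slowdown relative to the synchronous run, in percent. */
static inline double overhead_percent(double sync_usec, double other_usec)
{
	assert(sync_usec > 0.0);
	return (other_usec - sync_usec) / sync_usec * 100.0;
}
```

With the firmware numbers, overhead_percent(48945138, 50132831) comes to
roughly 2.4% for kthread and overhead_percent(48945138, 49827323) to
roughly 1.8% for queue_work(system_unbound_wq). Without firmware both
stay below 0.2%.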

Graph friendly versions of the results for probe of 1 device:

Probe with firmware:

http://drvbp1.linux-foundation.org/~mcgrof/images/probe-measurements/probe-cgxb4-firmware.png

Probe without firmware:

http://drvbp1.linux-foundation.org/~mcgrof/images/probe-measurements/probe-cgxb4-no-firmware.png

  Luis

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-11 20:23                                         ` Dmitry Torokhov
  (?)
@ 2014-09-22 19:49                                           ` Pavel Machek
  -1 siblings, 0 replies; 227+ messages in thread
From: Pavel Machek @ 2014-09-22 19:49 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: James Bottomley, Tejun Heo, Luis R. Rodriguez,
	Lennart Poettering, Kay Sievers, Greg Kroah-Hartman, Wu Zhangjin,
	Takashi Iwai, Arjan van de Ven, linux-kernel, Oleg Nesterov,
	hare, Andrew Morton, Tetsuo Handa, Joseph Salisbury,
	Benjamin Poirier, Santosh Rastapur, One Thousand Gnomes,
	Tim Gardner, Pierre Fersing, Nagalakshmi Nandigama,
	Praveen Krishnamoorthy, Sreekanth Reddy, Abhijit Mahajan,
	Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Thu 2014-09-11 13:23:54, Dmitry Torokhov wrote:
> On Thu, Sep 11, 2014 at 12:59:25PM -0700, James Bottomley wrote:
> > 
> > On Tue, 2014-09-09 at 16:01 -0700, Dmitry Torokhov wrote:
> > > On Tuesday, September 09, 2014 03:46:23 PM James Bottomley wrote:
> > > > On Wed, 2014-09-10 at 07:41 +0900, Tejun Heo wrote:
> > > > > 
> > > > > The thing is that we have to have dynamic mechanism to listen for
> > > > > device attachments no matter what and such mechanism has been in place
> > > > > for a long time at this point.  The synchronous wait simply doesn't
> > > > > serve any purpose anymore and kinda gets in the way in that it makes
> > > > > it a possibly extremely slow process to tell whether loading of a
> > > > > module succeeded or not because the wait for the initial round of
> > > > > probe is piggybacked.
> > > > 
> > > > OK, so we just fire and forget in userland ... why bother inventing an
> > > > elaborate new infrastructure in the kernel to do exactly what
> > > > 
> > > > modprobe <mod> &
> > > > 
> > > > would do?
> > > 
> > > Just so we do not forget: we also want the no-modules case to also be able
> > > to probe asynchronously so that a slow device does not stall kernel booting.
> > 
> > Yes, but we mostly do this anyway.  SCSI for instance does asynchronous
> > scanning of attached devices (once the cards are probed)
> 
> What would it do it card was a bit slow to probe?
> 
> > but has a sync
> > point for ordering.
> 
> Quite often we do not really care about ordering of devices. I mean,
> does it matter if your mouse is discovered before your keyboard or
> after?

Actually yes, I suspect it does.

I do evtest /dev/input/eventX by hand, occasionally. It would be
annoying if they moved between reboots.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-22 19:49                                           ` Pavel Machek
  (?)
@ 2014-09-22 20:23                                             ` Dmitry Torokhov
  -1 siblings, 0 replies; 227+ messages in thread
From: Dmitry Torokhov @ 2014-09-22 20:23 UTC (permalink / raw)
  To: Pavel Machek
  Cc: James Bottomley, Tejun Heo, Luis R. Rodriguez,
	Lennart Poettering, Kay Sievers, Greg Kroah-Hartman, Wu Zhangjin,
	Takashi Iwai, Arjan van de Ven, linux-kernel, Oleg Nesterov,
	hare, Andrew Morton, Tetsuo Handa, Joseph Salisbury,
	Benjamin Poirier, Santosh Rastapur, One Thousand Gnomes,
	Tim Gardner, Pierre Fersing, Nagalakshmi Nandigama,
	Praveen Krishnamoorthy, Sreekanth Reddy, Abhijit Mahajan,
	Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Monday, September 22, 2014 09:49:06 PM Pavel Machek wrote:
> On Thu 2014-09-11 13:23:54, Dmitry Torokhov wrote:
> > On Thu, Sep 11, 2014 at 12:59:25PM -0700, James Bottomley wrote:
> > > On Tue, 2014-09-09 at 16:01 -0700, Dmitry Torokhov wrote:
> > > > On Tuesday, September 09, 2014 03:46:23 PM James Bottomley wrote:
> > > > > On Wed, 2014-09-10 at 07:41 +0900, Tejun Heo wrote:
> > > > > > The thing is that we have to have dynamic mechanism to listen for
> > > > > > device attachments no matter what and such mechanism has been in
> > > > > > place for a long time at this point.  The synchronous wait simply
> > > > > > doesn't serve any purpose anymore and kinda gets in the way in that
> > > > > > it makes it a possibly extremely slow process to tell whether
> > > > > > loading of a module succeeded or not because the wait for the
> > > > > > initial round of probe is piggybacked.
> > > > > 
> > > > > OK, so we just fire and forget in userland ... why bother inventing
> > > > > an elaborate new infrastructure in the kernel to do exactly what
> > > > > 
> > > > > modprobe <mod> &
> > > > > 
> > > > > would do?
> > > > 
> > > > Just so we do not forget: we also want the no-modules case to also be
> > > > able to probe asynchronously so that a slow device does not stall
> > > > kernel booting.
> > > 
> > > Yes, but we mostly do this anyway.  SCSI for instance does asynchronous
> > > scanning of attached devices (once the cards are probed)
> > 
> > What would it do it card was a bit slow to probe?
> > 
> > > but has a sync
> > > point for ordering.
> > 
> > Quite often we do not really care about ordering of devices. I mean,
> > does it matter if your mouse is discovered before your keyboard or
> > after?
> 
> Actually yes, I suspect it does.
> 
> I do evtest /dev/input/eventX by hand, occassionaly. It would be
> annoying if they moved between reboots.

I am sorry but you will have to cope with such annoyances. It's not like we
fail to boot the box here.

The systems are now mostly hot-pluggable and userland is supposed to
handle it, and it does, at least for input devices. If you want stable naming,
use udev facilities to rename devices as needed or add needed symlinks (by-id,
etc.).
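
For the evtest case, a script or tool can resolve one of those symlinks
to whatever eventX node the device got on this boot. A rough sketch in
C, assuming udev has created a link under /dev/input/by-id/ (the helper
name and the link name in the note below are hypothetical; look in that
directory for the real one):

```c
#define _XOPEN_SOURCE 700	/* for realpath() under strict -std= modes */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Resolve a stable symlink to the device node it currently points at.
 * Returns 0 and fills 'out' on success, -1 if the link cannot be
 * resolved or the result does not fit in 'out'.
 */
static inline int resolve_input_link(const char *link, char *out, size_t outlen)
{
	char resolved[PATH_MAX];
	int n;

	if (!realpath(link, resolved))
		return -1;

	n = snprintf(out, outlen, "%s", resolved);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```

Calling resolve_input_link("/dev/input/by-id/usb-Example_Keyboard-event-kbd",
buf, sizeof(buf)) would then hand back the current /dev/input/eventX
path, which can be passed to evtest.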

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-22 20:23                                             ` Dmitry Torokhov
  (?)
@ 2014-09-30 21:06                                               ` Pavel Machek
  -1 siblings, 0 replies; 227+ messages in thread
From: Pavel Machek @ 2014-09-30 21:06 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: James Bottomley, Tejun Heo, Luis R. Rodriguez,
	Lennart Poettering, Kay Sievers, Greg Kroah-Hartman, Wu Zhangjin,
	Takashi Iwai, Arjan van de Ven, linux-kernel, Oleg Nesterov,
	hare, Andrew Morton, Tetsuo Handa, Joseph Salisbury,
	Benjamin Poirier, Santosh Rastapur, One Thousand Gnomes,
	Tim Gardner, Pierre Fersing, Nagalakshmi Nandigama,
	Praveen Krishnamoorthy, Sreekanth Reddy, Abhijit Mahajan,
	Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev


On Mon 2014-09-22 13:23:54, Dmitry Torokhov wrote:
> On Monday, September 22, 2014 09:49:06 PM Pavel Machek wrote:
> > On Thu 2014-09-11 13:23:54, Dmitry Torokhov wrote:
> > > On Thu, Sep 11, 2014 at 12:59:25PM -0700, James Bottomley wrote:

> > > > Yes, but we mostly do this anyway.  SCSI for instance does asynchronous
> > > > scanning of attached devices (once the cards are probed)
> > > 
> > > What would it do it card was a bit slow to probe?
> > > 
> > > > but has a sync
> > > > point for ordering.
> > > 
> > > Quite often we do not really care about ordering of devices. I mean,
> > > does it matter if your mouse is discovered before your keyboard or
> > > after?
> > 
> > Actually yes, I suspect it does.
> > 
> > I do evtest /dev/input/eventX by hand, occassionaly. It would be
> > annoying if they moved between reboots.
> 
> I am sorry but you will have to cope with such annoyances. It' snot like we 
> fail to boot the box here.
> 
> The systems are now mostly hot-pluggable and userland is supposed to
> handle it, and it does, at least for input devices. If you want stable naming
> use udev facilities to rename devices as needed or add needed symlinks (by-id, 
> etc.).

Well, it would be nice if udev was not mandatory. Do the sync points
for ordering actually cost us something?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-30 21:06                                               ` Pavel Machek
  (?)
@ 2014-09-30 21:34                                                 ` Dmitry Torokhov
  -1 siblings, 0 replies; 227+ messages in thread
From: Dmitry Torokhov @ 2014-09-30 21:34 UTC (permalink / raw)
  To: Pavel Machek
  Cc: James Bottomley, Tejun Heo, Luis R. Rodriguez,
	Lennart Poettering, Kay Sievers, Greg Kroah-Hartman, Wu Zhangjin,
	Takashi Iwai, Arjan van de Ven, linux-kernel, Oleg Nesterov,
	hare, Andrew Morton, Tetsuo Handa, Joseph Salisbury,
	Benjamin Poirier, Santosh Rastapur, One Thousand Gnomes,
	Tim Gardner, Pierre Fersing, Nagalakshmi Nandigama,
	Praveen Krishnamoorthy, Sreekanth Reddy, Abhijit Mahajan,
	Casey Leedom, Hariprasad S, MPT-FusionLinux.pdl,
	Linux SCSI List, netdev

On Tue, Sep 30, 2014 at 11:06:34PM +0200, Pavel Machek wrote:
> 
> On Mon 2014-09-22 13:23:54, Dmitry Torokhov wrote:
> > On Monday, September 22, 2014 09:49:06 PM Pavel Machek wrote:
> > > On Thu 2014-09-11 13:23:54, Dmitry Torokhov wrote:
> > > > On Thu, Sep 11, 2014 at 12:59:25PM -0700, James Bottomley wrote:
> 
> > > > > Yes, but we mostly do this anyway.  SCSI for instance does asynchronous
> > > > > scanning of attached devices (once the cards are probed)
> > > > 
> > > > What would it do if the card was a bit slow to probe?
> > > > 
> > > > > but has a sync
> > > > > point for ordering.
> > > > 
> > > > Quite often we do not really care about ordering of devices. I mean,
> > > > does it matter if your mouse is discovered before your keyboard or
> > > > after?
> > > 
> > > Actually yes, I suspect it does.
> > > 
> > > I do evtest /dev/input/eventX by hand, occasionally. It would be
> > > annoying if they moved between reboots.
> > 
> > I am sorry but you will have to cope with such annoyances. It's not like we
> > fail to boot the box here.
> > 
> > The systems are now mostly hot-pluggable and userland is supposed to
> > handle it, and it does, at least for input devices. If you want stable naming,
> > use udev facilities to rename devices as needed or add needed symlinks (by-id,
> > etc.).
> 
> Well, it would be nice if udev was not mandatory. Do the sync points
> for ordering actually cost us something?

Yes, boot time. We can save a second or two off the boot time if we probe
several devices/drivers simultaneously.
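
A rough sketch of the arithmetic behind that estimate, using made-up
numbers: when probes run back to back the wall time is the sum of the
individual probe times, while in the best case concurrent probes cost only
as much as the slowest one. The function names and the durations mentioned
below are illustrative assumptions, not measurements from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Wall time when probes run one after another: the sum of all durations. */
unsigned int serial_probe_ms(const unsigned int *ms, size_t n)
{
	unsigned int total = 0;

	assert(ms != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += ms[i];
	return total;
}

/* Best-case wall time when all probes run concurrently: the slowest probe. */
unsigned int parallel_probe_ms(const unsigned int *ms, size_t n)
{
	unsigned int worst = 0;

	assert(ms != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (ms[i] > worst)
			worst = ms[i];
	return worst;
}
```

With hypothetical probe times of 900, 700 and 400 ms, serial_probe_ms()
gives 2000 and parallel_probe_ms() gives 900, a best-case saving of about
1.1 s. Real savings depend on how much the probes contend for buses and CPUs.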

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-09-12 20:09                                         ` Luis R. Rodriguez
  (?)
@ 2014-10-10 21:54                                           ` Anatol Pomozov
  -1 siblings, 0 replies; 227+ messages in thread
From: Anatol Pomozov @ 2014-10-10 21:54 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Tom Gundersen, Tejun Heo, One Thousand Gnomes, Takashi Iwai,
	Kay Sievers, Sreekanth Reddy, James Bottomley,
	Praveen Krishnamoorthy, hare, Nagalakshmi Nandigama, Wu Zhangjin,
	Tetsuo Handa, mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Arjan van de Ven, Abhijit Mahajan, systemd Mailing List,
	Linux SCSI List, netdev, Dmitry Torokhov, Oleg Nesterov,
	linux-kernel, Andrew Morton, Joseph Salisbury

Hi

On Fri, Sep 12, 2014 at 1:09 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> On Thu, Sep 11, 2014 at 10:48 PM, Tom Gundersen <teg@jklm.no> wrote:
>> On Fri, Sep 12, 2014 at 12:26 AM, Luis R. Rodriguez
>> <mcgrof@do-not-panic.com> wrote:
>>> On Thu, Sep 11, 2014 at 2:43 PM, Tom Gundersen <teg@jklm.no> wrote:
>>>> How about simply introducing a new flag to finit_module() to indicate
>>>> that the caller does not care about asynchronicity. We could then pass
>>>> this from udev, but existing scripts calling modprobe/insmod will not
>>>> be affected.
>>>
>>> Do you mean that you *do want asynchronicity*?
>>
>> Precisely, udev would opt-in, but existing scripts etc would not.
>
> Sure that's the other alternative that Tejun was mentioning.
>
>>>> But isn't finit_module() taking a long time a serious problem given
>>>> that it means no other module can be loaded in parallel?
>>>
>>> Indeed but having a desire to make the init() complete fast is
>>> different than the desire to have the combination of both init and
>>> probe fast synchronously.
>>
>> I guess no one is arguing that probe should somehow be required to be
>> fast, but rather:
>>
>>> If userspace wants init to be fast and let
>>> probe be async then userspace has no option but to deal with the fact
>>> that async probe will be async, and it should then use other methods
>>> to match any dependencies if it's doing that itself.
>>
>> Correct. And this therefore likely needs to be opt-in behaviour per
>> finit_module() invocation to avoid breaking old assumptions.
>
> Sure.
>
>>> For example
>>> networking should not kick off after a network driver is loaded but
>>> rather once the device creeps up on udev. We should be good with
>>> networking dealing with this correctly today but not sure about other
>>> subsystems. depmod should be able to load the required modules in
>>> order and if bus drivers work right then probe of the remnant devices
>>> should happen asynchronously. The one case I can think of that is a
>>> bit different is modules-load.d things but those *do not rely on the
>>> timeout*, but are loaded prior to a service requirement. Note though
>>> that if those modules had probe and they then run async'd then systemd
>>> service would probably need to consider that the requirements may not
>>> be there until later. If this is not carefully considered that could
>>> introduce regression to users of modules-load.d when async probe is
>>> fully deployed. The same applies to systemd making assumptions of kmod
>>> loading a module and a dependency being complete as probe would have
>>> run it before.
>>
>> Yeah, these all need to be considered when deciding whether or not to
>> enable async in each specific case.
>
> Yes and come to think of it I'd recommend opting out of async
> functionality for modules-load.d given that it is *not* hooked up to
> the timeout and there is a good chance its users do want to
> wait for probe to run at this point.
>
> Given this I also am inclined now for the per module request to be
> async or not (default) from userspace. The above would be a good
> example starting use case.
>
>>> I believe one concern here lies in whether or not userspace
>>> is properly equipped to deal with the requirements on module loading
>>> doing async probing and that possibly failing. Perhaps systemd might
>>> think all userspace is ready for that but are we sure that's the case?
>>
>> There almost certainly are custom things out there relying on the
>> synchronous behaviour, but if we make it opt-in we should not have a
>> problem.


We recently discussed this "timeout module loading" issue in Arch IRC,
and here are a few more ideas:

1) Why not make the timeout configurable through a config file? There
is already udev.conf; you can put a config option there. Thus people with
modprobe issues can easily "fix" the problem. And then decrease the
default timeout back to 30 seconds. I agree that long module loading
(more than 30 secs) is abnormal and should be investigated by driver
authors.

2) Could you add 'echo w > /proc/sysrq-trigger' to the udev code right
before killing the "modprobe" thread? sysrq will print information
about stuck threads (including modprobe itself); this will make
debugging easier. E.g. the dmesg here
https://bugs.archlinux.org/task/40454 says nothing about where the threads
were stuck.
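
As a minimal sketch of what idea 2) amounts to in C, the helper below
writes the sysrq command 'w' (dump blocked tasks) to a trigger file. The
function name is hypothetical and this is not taken from the udev sources;
the path is a parameter so the helper can be exercised against an ordinary
file. On a real system the caller would pass /proc/sysrq-trigger and would
need the privileges to write to it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write the sysrq command 'w' to the given trigger file, which asks the
 * kernel to log tasks in uninterruptible (blocked) state.
 * Returns 0 on success or a negative errno value on failure.
 */
int sysrq_dump_blocked(const char *trigger_path)
{
	int fd, err;
	ssize_t n;

	assert(trigger_path != NULL);

	fd = open(trigger_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, "w", 1);
	/* Capture the error before close() can overwrite errno. */
	err = (n == 1) ? 0 : (n < 0 ? -errno : -EIO);

	close(fd);
	return err;
}
```

A caller would invoke sysrq_dump_blocked("/proc/sysrq-trigger") just
before sending the kill signal, and then look for the task dump in dmesg.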

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-10-10 21:54                                           ` Anatol Pomozov
  (?)
@ 2014-10-10 22:45                                             ` Tom Gundersen
  -1 siblings, 0 replies; 227+ messages in thread
From: Tom Gundersen @ 2014-10-10 22:45 UTC (permalink / raw)
  To: Anatol Pomozov
  Cc: Luis R. Rodriguez, Tejun Heo, One Thousand Gnomes, Takashi Iwai,
	Kay Sievers, Sreekanth Reddy, James Bottomley,
	Praveen Krishnamoorthy, hare, Nagalakshmi Nandigama, Wu Zhangjin,
	Tetsuo Handa, mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Arjan van de Ven, Abhijit Mahajan, systemd Mailing List,
	Linux SCSI List, netdev, Dmitry Torokhov, Oleg Nesterov,
	linux-kernel, Andrew Morton, Joseph Salisbury

On Fri, Oct 10, 2014 at 11:54 PM, Anatol Pomozov
<anatol.pomozov@gmail.com> wrote:
> 1) Why not make the timeout configurable through a config file? There
> is already udev.conf; you can put a config option there. Thus people with
> modprobe issues can easily "fix" the problem. And then decrease the
> default timeout back to 30 seconds. I agree that long module loading
> (more than 30 secs) is abnormal and should be investigated by driver
> authors.

We can already configure this either on the udev or kernel
commandline, is that not sufficient (I don't object to also adding it
to the config file, just asking)?
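
For illustration only, here is a small sketch of how a daemon could pick
such a timeout out of a kernel command line string. The helper name is
hypothetical and this is not the udev implementation. The key name
"udev.event-timeout" used in the note below is my recollection of the
systemd-udevd option of that era and should be checked against the
installed man page.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look for "<key>=<seconds>" as a whitespace-delimited word in a kernel
 * command line string. Returns the parsed value, or fallback if the key
 * is absent or its value is not a positive decimal integer.
 */
long cmdline_timeout(const char *cmdline, const char *key, long fallback)
{
	assert(cmdline != NULL && key != NULL);

	size_t klen = strlen(key);
	const char *p = cmdline;

	while (*p) {
		p += strspn(p, " \t\n");           /* skip separators */
		size_t wlen = strcspn(p, " \t\n"); /* length of this word */

		/* Word must be key, '=', and at least one value character. */
		if (wlen > klen + 1 && strncmp(p, key, klen) == 0 &&
		    p[klen] == '=') {
			char *end;
			long v = strtol(p + klen + 1, &end, 10);

			/* Accept only if the whole word was consumed. */
			if (end == p + wlen && v > 0)
				return v;
		}
		p += wlen;
	}
	return fallback;
}
```

Given the contents of /proc/cmdline as the first argument,
cmdline_timeout(cmdline, "udev.event-timeout", 30) would return the
configured number of seconds, or 30 when the option is not present.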

> 2) Could you add 'echo w > /proc/sysrq-trigger' to the udev code right
> before killing the "modprobe" thread? sysrq will print information
> about stuck threads (including modprobe itself); this will make
> debugging easier. E.g. the dmesg here
> https://bugs.archlinux.org/task/40454 says nothing about where the threads
> were stuck.

Are the current warnings (in udev git) sufficient (should tell you
which module is taking long, but still won't tell you which kernel
thread of course)?

Cheers,

Tom

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-10-10 22:45                                             ` Tom Gundersen
  (?)
@ 2014-10-15 19:41                                               ` Anatol Pomozov
  -1 siblings, 0 replies; 227+ messages in thread
From: Anatol Pomozov @ 2014-10-15 19:41 UTC (permalink / raw)
  To: Tom Gundersen
  Cc: Luis R. Rodriguez, Tejun Heo, One Thousand Gnomes, Takashi Iwai,
	Kay Sievers, Sreekanth Reddy, James Bottomley,
	Praveen Krishnamoorthy, hare, Nagalakshmi Nandigama, Wu Zhangjin,
	Tetsuo Handa, mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Arjan van de Ven, Abhijit Mahajan, systemd Mailing List,
	Linux SCSI List, netdev, Dmitry Torokhov, Oleg Nesterov,
	linux-kernel, Andrew Morton, Joseph Salisbury

Hi

On Fri, Oct 10, 2014 at 3:45 PM, Tom Gundersen <teg@jklm.no> wrote:
> On Fri, Oct 10, 2014 at 11:54 PM, Anatol Pomozov
> <anatol.pomozov@gmail.com> wrote:
>> 1) Why not make the timeout configurable through a config file? There
>> is already udev.conf; you can put a config option there. Thus people with
>> modprobe issues can easily "fix" the problem. And then decrease the
>> default timeout back to 30 seconds. I agree that long module loading
>> (more than 30 secs) is abnormal and should be investigated by driver
>> authors.
>
> We can already configure this either on the udev or kernel
> commandline, is that not sufficient (I don't object to also adding it
> to the config file, just asking)?

I did not know that the udev timeout can be configured via the kernel
command line. And since other people ask about changing the timeout, they
most likely did not know about it either. Actually, looking at
http://www.freedesktop.org/software/systemd/man/kernel-command-line.html
I do not see where it mentions the udev timeout.

I think adding configuration in the right place (the udev config file) and
adding documentation to make the option more discoverable will solve
the issue that started this thread. Then anyone can easily set the timeout
they want. The default timeout can go back to 30 sec in this case.

>> 2) Could you add 'echo w > /proc/sysrq-trigger' to the udev code right
>> before killing the "modprobe" thread? sysrq will print information
>> about stuck threads (including modprobe itself); this will make
>> debugging easier. E.g. the dmesg here
>> https://bugs.archlinux.org/task/40454 says nothing about where the threads
>> were stuck.
>
> Are the current warnings (in udev git) sufficient (should tell you
> which module is taking long, but still won't tell you which kernel
> thread of course)?

True, the module name should be enough. In this case, to debug the issue the
user needs to:
 - disable the failing udev rule (or blacklist the module?)
 - reboot, which will let the user get into a shell
 - modprobe the failing module
 - use sysrq-trigger to get more information about the stuck process

So it is more a matter of easier debugging. Not critical, but it
would be useful imho. This feature could be configured via udev.conf.

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [systemd-devel] [RFC v2 3/6] kthread: warn on kill signal if not OOM
  2014-10-15 19:41                                               ` Anatol Pomozov
  (?)
@ 2014-10-15 19:46                                                 ` Alexander E. Patrakov
  -1 siblings, 0 replies; 227+ messages in thread
From: Alexander E. Patrakov @ 2014-10-15 19:46 UTC (permalink / raw)
  To: Anatol Pomozov, Tom Gundersen
  Cc: One Thousand Gnomes, Takashi Iwai, Kay Sievers, Sreekanth Reddy,
	James Bottomley, Praveen Krishnamoorthy, hare,
	Nagalakshmi Nandigama, Wu Zhangjin, Tetsuo Handa,
	mpt-fusionlinux.pdl, Tim Gardner, Benjamin Poirier,
	Santosh Rastapur, Casey Leedom, Hariprasad S, Pierre Fersing,
	Arjan van de Ven, Abhijit Mahajan, systemd Mailing List,
	Linux SCSI List, netdev, Dmitry Torokhov, Oleg Nesterov,
	linux-kernel, Tejun Heo, Andrew Morton, Joseph Salisbury

16.10.2014 01:41, Anatol Pomozov wrote:
> True. module name should be enough. In this case to debug the issue user needs:
>   - disable failing udev rule (or blacklist module?)
>   - reboot, it will let the user get into shell
>   - modprobe the failing module
>   - use sysrq-trigger to get more information about stuck process

Nitpick: this works only if the "stuck modprobe" bug is 100%
reproducible, which is not a given. So it is better to collect as much
information about the bug as possible when it is noticed by systemd.
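
A minimal sketch of that idea, assuming a parent that supervises a worker
process with a timeout: when the timeout fires, a collection hook runs
first and only then is the worker killed. This is not how udev is
implemented. wait_or_collect(), timeout_hook_t and count_hook() are
hypothetical names, and the hook here only counts calls, standing in for
real collection such as a sysrq 'w' dump.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical hook: called once after the timeout, before the kill. */
typedef void (*timeout_hook_t)(pid_t pid, void *ctx);

/* Example hook that only counts how often it ran (ctx points to an int). */
static void count_hook(pid_t pid, void *ctx)
{
	(void)pid;
	++*(int *)ctx;
}

/*
 * Wait up to timeout_ms for child `pid` to exit, polling every 10 ms.
 *
 * Returns 0 if the child exited in time, 1 if it timed out (the hook was
 * called, then the child was killed and reaped), or -1 on error.
 */
static int wait_or_collect(pid_t pid, int timeout_ms,
			   timeout_hook_t hook, void *ctx)
{
	const struct timespec tick = { 0, 10 * 1000 * 1000 };
	int waited_ms = 0;
	int status;

	for (;;) {
		pid_t r = waitpid(pid, &status, WNOHANG);

		if (r == pid)
			return 0;
		if (r < 0)
			return -1;
		if (waited_ms >= timeout_ms)
			break;
		nanosleep(&tick, NULL);
		waited_ms += 10;
	}

	/* Collect information while the stuck process still exists. */
	if (hook)
		hook(pid, ctx);

	kill(pid, SIGKILL);
	if (waitpid(pid, &status, 0) < 0)
		return -1;

	return 1;
}
```

Running the hook before the kill is the point of the ordering: once the
worker is gone, there is nothing left to inspect for a bug that may not
reproduce on the next boot.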

-- 
Alexander E. Patrakov

* Re: [RFC v2 2/6] driver-core: add driver async_probe support
  2014-09-05 22:10   ` Dmitry Torokhov
  2014-10-20 23:43       ` Luis R. Rodriguez
@ 2014-10-20 23:43       ` Luis R. Rodriguez
  0 siblings, 0 replies; 227+ messages in thread
From: Luis R. Rodriguez @ 2014-10-20 23:43 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Tejun Heo,
	Arjan van de Ven, linux-kernel, Oleg Nesterov, hare,
	Andrew Morton, Tetsuo Handa, Joseph Salisbury, Benjamin Poirier,
	Santosh Rastapur, Kay Sievers, One Thousand Gnomes, Tim Gardner,
	Pierre Fersing, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Casey Leedom, Hariprasad S,
	mpt-fusionlinux.pdl, Linux SCSI List, netdev

> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> index 83e910a..49fe573 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -10,6 +10,7 @@
>   *
>   */
>
> +#include <linux/async.h>
>  #include <linux/device.h>
>  #include <linux/module.h>
>  #include <linux/errno.h>
> @@ -547,15 +548,12 @@ void bus_probe_device(struct device *dev)
>  {
>         struct bus_type *bus = dev->bus;
>         struct subsys_interface *sif;
> -       int ret;
>
>         if (!bus)
>                 return;
>
> -       if (bus->p->drivers_autoprobe) {
> -               ret = device_attach(dev);
> -               WARN_ON(ret < 0);
> -       }
> +       if (bus->p->drivers_autoprobe)
> +               device_initial_probe(dev);
>
>         mutex_lock(&bus->p->mutex);
>         list_for_each_entry(sif, &bus->p->interfaces, node)
> @@ -657,6 +655,17 @@ static ssize_t uevent_store(struct device_driver *drv, const char *buf,
>  }
>  static DRIVER_ATTR_WO(uevent);

Based on a review against my latest changes, this is what I was missing.
I'll be sure to address it.

 Luis

end of thread, other threads:[~2014-10-20 23:43 UTC | newest]

Thread overview: 227+ messages (download: mbox.gz / follow: Atom feed)
2014-09-05  6:37 [RFC v2 0/6] driver-core: add asynch probe support Luis R. Rodriguez
2014-09-05  6:37 ` [RFC v2 1/6] driver-core: generalize freeing driver private member Luis R. Rodriguez
2014-09-05  6:37 ` [RFC v2 2/6] driver-core: add driver async_probe support Luis R. Rodriguez
2014-09-05 11:24   ` Oleg Nesterov
2014-09-05 11:24     ` Oleg Nesterov
2014-09-05 17:25     ` Luis R. Rodriguez
2014-09-05 17:25       ` Luis R. Rodriguez
2014-09-05 22:10   ` Dmitry Torokhov
2014-10-20 23:43     ` Luis R. Rodriguez
2014-10-20 23:43       ` Luis R. Rodriguez
2014-10-20 23:43       ` Luis R. Rodriguez
2014-09-05  6:37 ` [RFC v2 3/6] kthread: warn on kill signal if not OOM Luis R. Rodriguez
2014-09-05  6:37   ` Luis R. Rodriguez
2014-09-05  7:19   ` Tejun Heo
2014-09-05  7:19     ` Tejun Heo
2014-09-05  7:47     ` Luis R. Rodriguez
2014-09-05  7:47       ` Luis R. Rodriguez
2014-09-05  7:47       ` Luis R. Rodriguez
2014-09-05  9:14       ` Mike Galbraith
2014-09-05  9:14         ` Mike Galbraith
2014-09-05  9:14         ` Mike Galbraith
2014-09-05 14:12       ` Tejun Heo
2014-09-05 14:12         ` Tejun Heo
2014-09-05 14:12         ` Tejun Heo
2014-09-05 16:44         ` Dmitry Torokhov
2014-09-05 16:44           ` Dmitry Torokhov
2014-09-05 17:49           ` Tejun Heo
2014-09-05 17:49             ` Tejun Heo
2014-09-05 18:10             ` Dmitry Torokhov
2014-09-05 18:10               ` Dmitry Torokhov
2014-09-05 22:29               ` Tejun Heo
2014-09-05 22:29                 ` Tejun Heo
2014-09-05 22:31                 ` Tejun Heo
2014-09-05 22:31                   ` Tejun Heo
2014-09-05 22:49                   ` Dmitry Torokhov
2014-09-05 22:49                     ` Dmitry Torokhov
2014-09-05 22:55                     ` Tejun Heo
2014-09-05 22:55                       ` Tejun Heo
2014-09-05 23:22                       ` Dmitry Torokhov
2014-09-05 23:22                         ` Dmitry Torokhov
2014-09-05 23:32                         ` Tejun Heo
2014-09-05 23:32                           ` Tejun Heo
2014-09-05 22:45                 ` Arjan van de Ven
2014-09-05 22:45                   ` Arjan van de Ven
2014-09-05 22:52                   ` Dmitry Torokhov
2014-09-05 22:52                     ` Dmitry Torokhov
2014-09-05 22:52                     ` Dmitry Torokhov
2014-09-05 22:57                     ` Tejun Heo
2014-09-05 22:57                       ` Tejun Heo
2014-09-05 23:05                     ` Arjan van de Ven
2014-09-05 23:05                       ` Arjan van de Ven
2014-09-05 23:05                       ` Arjan van de Ven
2014-09-05 23:18                       ` Dmitry Torokhov
2014-09-05 23:18                         ` Dmitry Torokhov
2014-09-05 23:18                         ` Dmitry Torokhov
2014-09-05 18:12             ` Luis R. Rodriguez
2014-09-05 18:12               ` Luis R. Rodriguez
2014-09-05 18:12               ` Luis R. Rodriguez
2014-09-05 18:29               ` Dmitry Torokhov
2014-09-05 18:29                 ` Dmitry Torokhov
2014-09-05 18:29                 ` Dmitry Torokhov
2014-09-05 22:40               ` Tejun Heo
2014-09-05 22:40                 ` Tejun Heo
2014-09-05 22:40                 ` Tejun Heo
2014-09-09  1:04                 ` Luis R. Rodriguez
2014-09-09  1:04                   ` Luis R. Rodriguez
2014-09-09  1:04                   ` Luis R. Rodriguez
2014-09-09  1:10                   ` Tejun Heo
2014-09-09  1:10                     ` Tejun Heo
2014-09-09  1:10                     ` Tejun Heo
2014-09-09  1:13                     ` Tejun Heo
2014-09-09  1:13                       ` Tejun Heo
2014-09-09  1:13                       ` Tejun Heo
2014-09-09  1:22                     ` Tejun Heo
2014-09-09  1:22                       ` Tejun Heo
2014-09-09  1:22                       ` Tejun Heo
2014-09-09  1:26                       ` Luis R. Rodriguez
2014-09-09  1:26                         ` Luis R. Rodriguez
2014-09-09  1:26                         ` Luis R. Rodriguez
2014-09-09  1:29                         ` Tejun Heo
2014-09-09  1:29                           ` Tejun Heo
2014-09-09  1:29                           ` Tejun Heo
2014-09-09  1:38                           ` Luis R. Rodriguez
2014-09-09  1:38                             ` Luis R. Rodriguez
2014-09-09  1:38                             ` Luis R. Rodriguez
2014-09-09  1:47                             ` Tejun Heo
2014-09-09  1:47                               ` Tejun Heo
2014-09-09  1:47                               ` Tejun Heo
2014-09-09  2:28                               ` Luis R. Rodriguez
2014-09-09  2:28                                 ` Luis R. Rodriguez
2014-09-09  2:28                                 ` Luis R. Rodriguez
2014-09-09  2:39                                 ` Tejun Heo
2014-09-09  2:39                                   ` Tejun Heo
2014-09-09  2:39                                   ` Tejun Heo
2014-09-09  2:57                                   ` Luis R. Rodriguez
2014-09-09  2:57                                     ` Luis R. Rodriguez
2014-09-09  2:57                                     ` Luis R. Rodriguez
2014-09-09  3:03                                     ` Tejun Heo
2014-09-09  3:03                                       ` Tejun Heo
2014-09-09  3:03                                       ` Tejun Heo
2014-09-09  3:19                                       ` Luis R. Rodriguez
2014-09-09  3:19                                         ` Luis R. Rodriguez
2014-09-09  3:19                                         ` Luis R. Rodriguez
2014-09-09  3:25                                         ` Tejun Heo
2014-09-09  3:25                                           ` Tejun Heo
2014-09-09  3:25                                           ` Tejun Heo
2014-09-09 23:03                                           ` Tejun Heo
2014-09-09 23:03                                             ` Tejun Heo
2014-09-09 23:03                                             ` Tejun Heo
2014-09-12 20:14                                             ` Luis R. Rodriguez
2014-09-12 20:14                                               ` Luis R. Rodriguez
2014-09-22 16:36                                     ` Luis R. Rodriguez
2014-09-22 16:36                                       ` Luis R. Rodriguez
2014-09-10  5:13                         ` Tom Gundersen
2014-09-10  5:13                           ` Tom Gundersen
2014-09-10  5:13                           ` Tom Gundersen
2014-09-09  5:38                     ` James Bottomley
2014-09-09  5:38                       ` James Bottomley
2014-09-09  5:38                       ` James Bottomley
2014-09-09 19:16                       ` Luis R. Rodriguez
2014-09-09 19:16                         ` Luis R. Rodriguez
2014-09-09 19:16                         ` Luis R. Rodriguez
2014-09-09 19:35                         ` James Bottomley
2014-09-09 19:35                           ` James Bottomley
2014-09-09 19:35                           ` James Bottomley
2014-09-09 20:45                           ` Luis R. Rodriguez
2014-09-09 20:45                             ` Luis R. Rodriguez
2014-09-09 20:45                             ` Luis R. Rodriguez
2014-09-10  6:46                             ` [systemd-devel] " Tom Gundersen
2014-09-10  6:46                               ` Tom Gundersen
2014-09-10  6:46                               ` Tom Gundersen
2014-09-10 10:07                               ` [systemd-devel] " Ceriel Jacobs
2014-09-10 10:07                                 ` Ceriel Jacobs
2014-09-10 10:07                                 ` Ceriel Jacobs
2014-09-10 13:31                                 ` James Bottomley
2014-09-10 13:31                                   ` James Bottomley
2014-09-10 13:31                                   ` James Bottomley
2014-09-10 21:10                               ` Luis R. Rodriguez
2014-09-10 21:10                                 ` Luis R. Rodriguez
2014-09-10 21:10                                 ` Luis R. Rodriguez
2014-09-11  5:42                                 ` [systemd-devel] " Alexander E. Patrakov
2014-09-11  5:42                                   ` Alexander E. Patrakov
2014-09-11  5:42                                   ` Alexander E. Patrakov
2014-09-11 21:43                                 ` [systemd-devel] " Tom Gundersen
2014-09-11 21:43                                   ` Tom Gundersen
2014-09-11 21:43                                   ` Tom Gundersen
2014-09-11 22:26                                   ` [systemd-devel] " Luis R. Rodriguez
2014-09-11 22:26                                     ` Luis R. Rodriguez
2014-09-11 22:26                                     ` Luis R. Rodriguez
2014-09-12  5:48                                     ` Tom Gundersen
2014-09-12  5:48                                       ` Tom Gundersen
2014-09-12  5:48                                       ` Tom Gundersen
2014-09-12 20:09                                       ` [systemd-devel] " Luis R. Rodriguez
2014-09-12 20:09                                         ` Luis R. Rodriguez
2014-09-12 20:09                                         ` Luis R. Rodriguez
2014-10-10 21:54                                         ` [systemd-devel] " Anatol Pomozov
2014-10-10 21:54                                           ` Anatol Pomozov
2014-10-10 21:54                                           ` Anatol Pomozov
2014-10-10 22:45                                           ` [systemd-devel] " Tom Gundersen
2014-10-10 22:45                                             ` Tom Gundersen
2014-10-10 22:45                                             ` Tom Gundersen
2014-10-15 19:41                                             ` [systemd-devel] " Anatol Pomozov
2014-10-15 19:41                                               ` Anatol Pomozov
2014-10-15 19:41                                               ` Anatol Pomozov
2014-10-15 19:46                                               ` [systemd-devel] " Alexander E. Patrakov
2014-10-15 19:46                                                 ` Alexander E. Patrakov
2014-10-15 19:46                                                 ` Alexander E. Patrakov
2014-09-09 21:42                           ` Tejun Heo
2014-09-09 21:42                             ` Tejun Heo
2014-09-09 21:42                             ` Tejun Heo
2014-09-09 22:26                             ` James Bottomley
2014-09-09 22:26                               ` James Bottomley
2014-09-09 22:26                               ` James Bottomley
2014-09-09 22:41                               ` Tejun Heo
2014-09-09 22:41                                 ` Tejun Heo
2014-09-09 22:41                                 ` Tejun Heo
2014-09-09 22:46                                 ` James Bottomley
2014-09-09 22:46                                   ` James Bottomley
2014-09-09 22:46                                   ` James Bottomley
2014-09-09 22:52                                   ` Tejun Heo
2014-09-09 22:52                                     ` Tejun Heo
2014-09-09 22:52                                     ` Tejun Heo
2014-09-09 23:01                                   ` Dmitry Torokhov
2014-09-09 23:01                                     ` Dmitry Torokhov
2014-09-09 23:01                                     ` Dmitry Torokhov
2014-09-11 19:59                                     ` James Bottomley
2014-09-11 19:59                                       ` James Bottomley
2014-09-11 19:59                                       ` James Bottomley
2014-09-11 20:23                                       ` Dmitry Torokhov
2014-09-11 20:23                                         ` Dmitry Torokhov
2014-09-11 20:23                                         ` Dmitry Torokhov
2014-09-11 20:42                                         ` Luis R. Rodriguez
2014-09-11 20:42                                           ` Luis R. Rodriguez
2014-09-11 20:42                                           ` Luis R. Rodriguez
2014-09-11 20:53                                           ` Dmitry Torokhov
2014-09-11 20:53                                             ` Dmitry Torokhov
2014-09-11 20:53                                             ` Dmitry Torokhov
2014-09-11 21:08                                             ` Luis R. Rodriguez
2014-09-11 21:08                                               ` Luis R. Rodriguez
2014-09-11 21:08                                               ` Luis R. Rodriguez
2014-09-22 19:49                                         ` Pavel Machek
2014-09-22 19:49                                           ` Pavel Machek
2014-09-22 19:49                                           ` Pavel Machek
2014-09-22 20:23                                           ` Dmitry Torokhov
2014-09-22 20:23                                             ` Dmitry Torokhov
2014-09-22 20:23                                             ` Dmitry Torokhov
2014-09-30 21:06                                             ` Pavel Machek
2014-09-30 21:06                                               ` Pavel Machek
2014-09-30 21:06                                               ` Pavel Machek
2014-09-30 21:34                                               ` Dmitry Torokhov
2014-09-30 21:34                                                 ` Dmitry Torokhov
2014-09-30 21:34                                                 ` Dmitry Torokhov
2014-09-09 22:00                         ` Jiri Kosina
2014-09-09 22:00                           ` Jiri Kosina
2014-09-09 22:00                           ` Jiri Kosina
2014-09-05 10:59   ` Oleg Nesterov
2014-09-05 10:59     ` Oleg Nesterov
2014-09-05 17:35     ` Luis R. Rodriguez
2014-09-05 17:35       ` Luis R. Rodriguez
2014-09-05  6:37 ` [RFC v2 4/6] cxgb4: use async probe Luis R. Rodriguez
2014-09-05  6:37 ` [RFC v2 5/6] mptsas: " Luis R. Rodriguez
2014-09-05  7:16   ` Tejun Heo
2014-09-05  7:23   ` Hannes Reinecke
2014-09-05  6:37 ` [RFC v2 6/6] pata_marvell: " Luis R. Rodriguez
2014-09-05  6:59   ` Alexander E. Patrakov
2014-09-05  7:15   ` Tejun Heo
2014-09-05  7:11 ` [RFC v2 0/6] driver-core: add asynch probe support Tejun Heo
