All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hans de Goede <hans@hansg.org>
To: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>,
	Linus Walleij <linus.walleij@linaro.org>,
	Bartosz Golaszewski <brgl@bgdev.pl>,
	"regressions@lists.linux.dev" <regressions@lists.linux.dev>,
	Andy Shevchenko <andy@kernel.org>,
	"open list:GPIO SUBSYSTEM" <linux-gpio@vger.kernel.org>
Subject: [6.9 gpiolib regression] gpiolib: triggers: kobject: 'gpiochipX' is not, initialized, yet kobject_get() errors
Date: Fri, 29 Mar 2024 15:11:21 +0100	[thread overview]
Message-ID: <bdea97a5-93e5-471f-88fc-a3c6ae74970a@hansg.org> (raw)

Hi All,

I've already tried to fix this, so let me just copy and paste my half finished patch
to explain the problem.

I was planning on submitting this as a RFC patch at least, but there are also some
other new issues with 6.9 on this tablet and I'm not sure how this interacts
with those issues and I don't have time to work on this any further this weekend.

Anyways below is the patch / bug report.

I'm wondering if a better fix would be to add a "ready" flag to gdev
and may gpiochip_find ignore not yet ready chips (we need them on
the list before they are ready to reserve the GPIO numbers) ?

Regards,

Hans



From 78e1bdf439e42a7346c01e0817b05b854ac8e16f Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@redhat.com>
Date: Fri, 29 Mar 2024 14:44:27 +0100
Subject: [PATCH] gpiolib: Fix triggering kobject: 'gpiochipX' is not
 initialized, yet kobject_get() errors

When a gpiochip gets added by loading a module, then another driver may
be waiting for that gpiochip to load on the deferred-probe list.

If the deferred-probe for the consumer of gpiochip then triggers between
the gpiodev_add_to_list_unlocked() calls which makes gpio_device_find()
see the chip and the gpiochip_setup_dev() later then gpio_device_find()
does a kobject_get() on an uninitialzed kobject since the kobject is
initialized by gpiochip_setup_dev() calling device_initialize().

To fix this move the device_initialize() from gpiochip_setup_dev()
to gpiochip_add_data_with_key(). This also allows removing the weird
if (gdev->dev.release) in the error-exit cleanup handling.

As noted in the added "HACK" comment this patch is not complete
yet, the scru_init needs to be moved to before the device_initialize()
to avoid the need for the cleanup_scru_on_release bool flag hack.

This is why I've marked this as a RFC for now, I don't have time
to finish this atm but I wanted to get this out there to let others
know about the issue and think about the proposed fix.

Here is the backtrace of the problem triggering:

[   30.408904] arizona spi-10WM5102:00: cannot find GPIO chip arizona, deferring
[   30.422987] arizona spi-10WM5102:00: cannot find GPIO chip arizona, deferring
[   30.456477] arizona spi-10WM5102:00: cannot find GPIO chip arizona, deferring
[   30.619517] ------------[ cut here ]------------
[   30.619580] kobject: 'gpiochip5' (00000000241466f2): is not initialized, yet kobject_get() is being called.
[   30.619664] WARNING: CPU: 3 PID: 42 at lib/kobject.c:640 kobject_get+0x43/0x70
[   30.619685] Modules linked in: rmi_core(+) lenovo_yoga_tab2_pro_1380_fastcharger(E) cs_dsp industrialio gpio_arizona(+) extcon_lc824206xa(+) arizona_micsupp(+) lp855x_bl bq27xxx_battery_i2c pn544_mei bq27xxx_battery phy_tusb1210 mei_phy dwc3 pn544 hci nfc snd_soc_sst_bytcr_wm5102 udc_core mei_hdcp ulpi mei_pxp gpio_keys intel_rapl_msr intel_soc_dts_thermal intel_soc_dts_iosf brcmfmac_wcc intel_powerclamp coretemp kvm_intel bq24190_charger x86_android_tablets(E) brcmfmac kvm snd_sof_acpi_intel_byt snd_sof_acpi snd_sof_intel_atom brcmutil punit_atom_debug snd_sof_xtensa_dsp cfg80211 intel_cstate snd_sof atomisp(C) snd_sof_utils atomisp_gmin_platform(C) ipu_bridge pcspkr v4l2_fwnode v4l2_async snd_intel_sst_acpi videobuf2_vmalloc snd_intel_sst_core videobuf2_memops snd_soc_sst_atom_hifi2_platform videobuf2_v4l2 intel_bytcrc_pwrsrc(E) videodev snd_soc_acpi_intel_match videobuf2_common snd_soc_acpi snd_intel_dspcfg mei_txe snd_intel_sdw_acpi snd_hdmi_lpe_audio mei mc snd_soc_core lpc_ich dwc3_pci hci_uart btqca btrtl
[   30.620209]  btintel snd_compress ac97_bus btbcm snd_pcm_dmaengine int3401_thermal processor_thermal_device processor_thermal_wt_hint bluetooth processor_thermal_rfim snd_seq processor_thermal_rapl binfmt_misc intel_rapl_common soc_button_array snd_seq_device int3406_thermal snd_pcm processor_thermal_wt_req processor_thermal_power_floor int3403_thermal processor_thermal_mbox dptf_power int340x_thermal_zone int3400_thermal acpi_thermal_rel ecdh_generic rfkill_gpio intel_int0002_vgpio(E) snd_timer rfkill arizona_spi arizona_ldo1 snd arizona acpi_pad regmap_spi soundcore vfat fat loop nfnetlink zram i915 crct10dif_pclmul crc32_pclmul mmc_block crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 wdat_wdt sha1_ssse3 i2c_algo_bit drm_buddy ttm drm_display_helper cec video sdhci_acpi wmi sdhci spi_pxa2xx_platform mmc_core i2c_hid_acpi i2c_hid dw_dmac pwm_lpss_platform pwm_lpss ip6_tables ip_tables i2c_dev fuse
[   30.620662] CPU: 3 PID: 42 Comm: kworker/u18:0 Tainted: G         C  E      6.9.0-rc1+ #9
[   30.620675] Hardware name: Intel Corp. VALLEYVIEW C0 PLATFORM/BYT-T FFD8, BIOS BLADE_21.X64.0005.R00.1504101516 FFD8_X64_R_2015_04_10_1516 04/10/2015
[   30.620685] Workqueue: events_unbound deferred_probe_work_func
[   30.620708] RIP: 0010:kobject_get+0x43/0x70
[   30.620722] Code: 0f c1 43 38 85 c0 74 39 8d 50 01 09 c2 78 1f 48 89 d8 5b c3 cc cc cc cc 48 8b 37 48 89 fa 48 c7 c7 00 b5 b1 a9 e8 ed 1a 0b ff <0f> 0b eb c8 be 01 00 00 00 e8 ef 47 82 ff 48 89 d8 5b c3 cc cc cc
[   30.620733] RSP: 0000:ffffb743401b7b88 EFLAGS: 00010296
[   30.620749] RAX: 000000000000005f RBX: ffff9ea156b13800 RCX: 0000000000000000
[   30.620758] RDX: 0000000000000002 RSI: 0000000000000027 RDI: 00000000ffffffff
[   30.620767] RBP: ffff9ea156b13800 R08: 0000000000000000 R09: ffffb743401b7a30
[   30.620776] R10: ffff9ea1adbe2fa8 R11: 0000000000000003 R12: 0000000000000000
[   30.620785] R13: ffff9ea148d8f830 R14: ffff9ea156b13e80 R15: ffffffffa8947926
[   30.620794] FS:  0000000000000000(0000) GS:ffff9ea1b9580000(0000) knlGS:0000000000000000
[   30.620805] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   30.620814] CR2: 00007f67ea538478 CR3: 0000000004850000 CR4: 00000000001006f0
[   30.620824] Call Trace:
[   30.620834]  <TASK>
[   30.620849]  ? __warn.cold+0xb1/0x13e
[   30.620867]  ? kobject_get+0x43/0x70
[   30.620885]  ? report_bug+0xe6/0x170
[   30.620910]  ? handle_bug+0x3c/0x80
[   30.620927]  ? exc_invalid_op+0x13/0x60
[   30.620943]  ? asm_exc_invalid_op+0x16/0x20
[   30.620957]  ? gpio_device_find+0x16/0x260
[   30.620997]  ? kobject_get+0x43/0x70
[   30.621016]  ? kobject_get+0x43/0x70
[   30.621029]  gpio_device_find+0x216/0x260
[   30.621047]  ? __pfx_gpio_chip_match_by_label+0x10/0x10
[   30.621120]  gpiod_find_and_request+0x33a/0x480
[   30.621140]  ? __pfx_device_match_name+0x10/0x10
[   30.621170]  gpiod_get+0x41/0x60
[   30.621191]  snd_byt_wm5102_mc_probe+0xfd/0x500 [snd_soc_sst_bytcr_wm5102]
[   30.621231]  ? __pfx___device_attach_driver+0x10/0x10
[   30.621247]  platform_probe+0x40/0xa0
[   30.621269]  really_probe+0xde/0x340
[   30.621282]  ? pm_runtime_barrier+0x50/0x90
[   30.621304]  __driver_probe_device+0x78/0x110
[   30.621324]  driver_probe_device+0x1f/0xa0
[   30.621343]  __device_attach_driver+0x85/0x110
[   30.621364]  bus_for_each_drv+0x78/0xc0
[   30.621389]  __device_attach+0xb0/0x1b0
[   30.621413]  bus_probe_device+0x94/0xb0
[   30.621435]  deferred_probe_work_func+0x99/0xf0
[   30.621452]  process_one_work+0x222/0x5a0
[   30.621470]  ? move_linked_works+0x70/0xa0
[   30.621502]  worker_thread+0x1d1/0x3e0
[   30.621526]  ? __pfx_worker_thread+0x10/0x10
[   30.621539]  kthread+0xee/0x120
[   30.621554]  ? __pfx_kthread+0x10/0x10
[   30.621572]  ret_from_fork+0x30/0x50
[   30.621587]  ? __pfx_kthread+0x10/0x10
[   30.621602]  ret_from_fork_asm+0x1a/0x30
[   30.621652]  </TASK>
[   30.621661] irq event stamp: 11481
[   30.621669] hardirqs last  enabled at (11487): [<ffffffffa81b9ccd>] console_unlock+0x10d/0x140
[   30.621683] hardirqs last disabled at (11492): [<ffffffffa81b9cb2>] console_unlock+0xf2/0x140
[   30.621695] softirqs last  enabled at (11330): [<ffffffffa81143eb>] __irq_exit_rcu+0x9b/0x100
[   30.621708] softirqs last disabled at (11325): [<ffffffffa81143eb>] __irq_exit_rcu+0x9b/0x100
[   30.621720] ---[ end trace 0000000000000000 ]---

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/gpio/gpiolib.c | 34 ++++++++++++++++++++--------------
 drivers/gpio/gpiolib.h |  1 +
 2 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index ce94e37bcbee..2dbd1183d35c 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -697,13 +697,15 @@ static void gpiodev_release(struct device *dev)
 	struct gpio_device *gdev = to_gpio_device(dev);
 	unsigned int i;
 
-	for (i = 0; i < gdev->ngpio; i++)
-		cleanup_srcu_struct(&gdev->descs[i].srcu);
+	if (gdev->cleanup_scru_on_release) {
+		cleanup_srcu_struct(&gdev->srcu);
+		for (i = 0; i < gdev->ngpio; i++)
+			cleanup_srcu_struct(&gdev->descs[i].srcu);
+	}
 
 	ida_free(&gpio_ida, gdev->id);
 	kfree_const(gdev->label);
 	kfree(gdev->descs);
-	cleanup_srcu_struct(&gdev->srcu);
 	kfree(gdev);
 }
 
@@ -729,8 +731,6 @@ static int gpiochip_setup_dev(struct gpio_device *gdev)
 	struct fwnode_handle *fwnode = dev_fwnode(&gdev->dev);
 	int ret;
 
-	device_initialize(&gdev->dev);
-
 	/*
 	 * If fwnode doesn't belong to another device, it's safe to clear its
 	 * initialized flag.
@@ -927,6 +927,8 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
 	gdev->ngpio = gc->ngpio;
 	gdev->can_sleep = gc->can_sleep;
 
+	device_initialize(&gdev->dev);
+
 	scoped_guard(mutex, &gpio_devices_lock) {
 		/*
 		 * TODO: this allocates a Linux GPIO number base in the global
@@ -941,7 +943,7 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
 			if (base < 0) {
 				ret = base;
 				base = 0;
-				goto err_free_label;
+				goto err_gpio_device_put;
 			}
 
 			/*
@@ -961,7 +963,7 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
 		ret = gpiodev_add_to_list_unlocked(gdev);
 		if (ret) {
 			chip_err(gc, "GPIO integer space overlap, cannot add chip\n");
-			goto err_free_label;
+			goto err_gpio_device_put;
 		}
 	}
 
@@ -1045,6 +1047,14 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
 		if (ret)
 			goto err_remove_irqchip;
 	}
+
+	/*
+	 * HACK instead initializing all the scru structs should be moved
+	 * to above the device_initialize(&gdev->dev) call, and their cleanup
+	 * on error should be moved accordingly and skipped with the
+	 * goto err_print_message; when error-exiting after device_initialize().
+	 */
+	gdev->cleanup_scru_on_release = true;
 	return 0;
 
 err_remove_irqchip:
@@ -1067,13 +1077,9 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
 	scoped_guard(mutex, &gpio_devices_lock)
 		list_del_rcu(&gdev->list);
 	synchronize_srcu(&gpio_devices_srcu);
-	if (gdev->dev.release) {
-		/* release() has been registered by gpiochip_setup_dev() */
-		gpio_device_put(gdev);
-		goto err_print_message;
-	}
-err_free_label:
-	kfree_const(gdev->label);
+err_gpio_device_put:
+	gpio_device_put(gdev);
+	goto err_print_message;
 err_free_descs:
 	kfree(gdev->descs);
 err_free_dev_name:
diff --git a/drivers/gpio/gpiolib.h b/drivers/gpio/gpiolib.h
index f67d5991ab1c..199b9c49974e 100644
--- a/drivers/gpio/gpiolib.h
+++ b/drivers/gpio/gpiolib.h
@@ -64,6 +64,7 @@ struct gpio_device {
 	int			base;
 	u16			ngpio;
 	bool			can_sleep;
+	bool			cleanup_scru_on_release;
 	const char		*label;
 	void			*data;
 	struct list_head        list;
-- 
2.44.0


             reply	other threads:[~2024-03-29 14:18 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-29 14:11 Hans de Goede [this message]
2024-03-29 15:16 ` [6.9 gpiolib regression] gpiolib: triggers: kobject: 'gpiochipX' is not, initialized, yet kobject_get() errors Bartosz Golaszewski
2024-03-29 19:40   ` Andy Shevchenko
2024-04-02 13:41   ` Hans de Goede
2024-04-02 13:54     ` Andy Shevchenko
2024-04-02 14:11       ` Bartosz Golaszewski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bdea97a5-93e5-471f-88fc-a3c6ae74970a@hansg.org \
    --to=hans@hansg.org \
    --cc=andy@kernel.org \
    --cc=bartosz.golaszewski@linaro.org \
    --cc=brgl@bgdev.pl \
    --cc=linus.walleij@linaro.org \
    --cc=linux-gpio@vger.kernel.org \
    --cc=regressions@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.