linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ALSA: snd-usb: fix IRQ triggered NULL pointer dereference
@ 2016-12-18  1:08 Ioan-Adrian Ratiu
  2016-12-20  1:25 ` Takashi Sakamoto
  0 siblings, 1 reply; 8+ messages in thread
From: Ioan-Adrian Ratiu @ 2016-12-18  1:08 UTC (permalink / raw)
  To: tiwai, perex; +Cc: alsa-devel, linux-kernel

Commit 16200948d835 ("ALSA: usb-audio: Fix race at stopping the stream")
fixes a race-codition but it also introduces another really nasty data race
regression which makes my usb sound card [1] completely useless, throwing
the kernel into a panic if anything from userspace tries to start playback.

The problem is this: ep->data_subs is now set to NULL every time inside
wait_clear_urbs(). ep->data_subs is initalized only in one place in
start_endpoints(), then it is immediately wiped by the pre-existing call to
wait_clear_urbs() inside snd_usb_endpoint_start().

To ilustrate, this is what happens in the non-irq part of the code:

Step 1 (inside start_endpoints): ep->data_subs = subs;
Step 2 (inside start_endpoints): snd_usb_endpoint_start(ep, can_sleep);
Step 3 (inside snd_usb_endpoint_start): wait_clear_urbs(ep);
Step 4 (inside wait_clear_urbs): ep->data_subs = NULL;

Here's a simplified call stack for the IRQ part (full one at the end):

(NULL dereference, param "subs" is passed NULL value of ep->data_subs)
retire_playback_urb
retire_outbound_urb
snd_complete_urb
(...)
xhci_irq

Looking at the git log there seems to be quite a lot of history in this
part of the codebase, dating back to 2012 or earlier. My fix is based on
015618b90 ("ALSA: snd-usb: Fix URB cancellation at stream start") and
e9ba389c5 ("ALSA: usb-audio: Fix scheduling-while-atomic bug in PCM capture
stream") with a few modifications of my own. My idea is to do the
ep->data_subs wiping before endpoint initialization in a manner similar
to these older commits by using stop_endpoints() in snd_usb_pcm_prepare()
and at the same time keep the ep->data_subs = NULL in wait_clear_urbs() to
not trigger the recently fixed stream stopping race again.

Full call stack:

[  131.093240] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[  131.094313] IP: retire_playback_urb+0x1b/0x160 [snd_usb_audio]
[  131.095046] PGD 0
[  131.095047]
[  131.096509] Oops: 0000 [#1] PREEMPT SMP
[  131.097255] Modules linked in: fuse snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device ctr ccm arc4 ath9k intel_rapl ath9k_common x86_pkg_temp_thermal ath9k_hw intel_powerclamp coretemp joydev mousedev ath kvm_intel mac80211 kvm
 input_leds hid_generic psmouse irqbypass usbhid hid crct10dif_pclmul crc32_pclmul serio_raw crc32c_intel atkbd ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel pcbc aesni_intel libps2 cfg8
0211 aes_x86_64 crypto_simd dell_laptop glue_helper r8169 dell_smbios snd_hda_codec sr_mod cryptd led_class mii rfkill snd_hwdep cdrom snd_hda_core ac dcdbas i8042 xhci_pci serio xhci_hcd snd_pcm tpm_tis battery tpm_tis_core lpc_ich tpm
snd_timer evdev shpchp i2c_i801 sch_fq_codel
[  131.105551] CPU: 2 PID: 165 Comm: irq/29-xhci_hcd Not tainted 4.9.0-gd824cdc58ba0 #10
[  131.107516] Hardware name: Dell Inc. Inspiron 3521/018DYG, BIOS A14 07/31/2015
[  131.109592] task: ffff880154a70000 task.stack: ffffc90000f48000
[  131.111746] RIP: 0010:retire_playback_urb+0x1b/0x160 [snd_usb_audio]
[  131.113899] RSP: 0018:ffffc90000f4bc10 EFLAGS: 00010082
[  131.116080] RAX: ffffffffa04cabe0 RBX: 0000000000000000 RCX: 0000000000000000
[  131.118284] RDX: 0000000000000000 RSI: ffff8801435a4c00 RDI: 0000000000000000
[  131.120505] RBP: ffffc90000f4bc40 R08: 0000000000000001 R09: 0000000000000001
[  131.122807] R10: 0000000000000001 R11: ffffffff82f60d6d R12: ffff8801432a0238
[  131.125265] R13: ffff8801435a4c00 R14: 0000000000000000 R15: ffff8801535c83b8
[  131.127723] FS:  0000000000000000(0000) GS:ffff88015a000000(0000) knlGS:0000000000000000
[  131.130217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  131.132568] CR2: 0000000000000010 CR3: 0000000001e0b000 CR4: 00000000001406e0
[  131.135007] Call Trace:
[  131.137329]  snd_complete_urb+0x190/0x2b0 [snd_usb_audio]
[  131.139526]  __usb_hcd_giveback_urb+0x8e/0x160
[  131.141948]  usb_hcd_giveback_urb+0x43/0x110
[  131.144104]  xhci_giveback_urb_in_irq.isra.22+0x7d/0xb0 [xhci_hcd]
[  131.146521]  finish_td.constprop.38+0x1de/0x2f0 [xhci_hcd]
[  131.148736]  xhci_irq+0x13a2/0x1ca0 [xhci_hcd]
[  131.150972]  ? trace_hardirqs_on+0xd/0x10
[  131.153209]  ? _raw_spin_unlock_irq+0x2c/0x50
[  131.155390]  ? irq_thread+0xb5/0x1d0
[  131.157771]  xhci_msi_irq+0x11/0x20 [xhci_hcd]
[  131.159938]  irq_forced_thread_fn+0x2f/0x70
[  131.162072]  ? irq_thread+0xb5/0x1d0
[  131.164141]  irq_thread+0x149/0x1d0
[  131.166132]  ? irq_finalize_oneshot.part.2+0xe0/0xe0
[  131.168142]  ? wake_threads_waitq+0x30/0x30
[  131.170149]  kthread+0x10f/0x150
[  131.172144]  ? irq_thread_dtor+0xc0/0xc0
[  131.174139]  ? kthread_create_on_node+0x40/0x40
[  131.176116]  ret_from_fork+0x2a/0x40
[  131.178074] Code: 8b 77 64 4c 89 e7 e8 e5 fe ff ff eb c3 0f 1f 00 0f 1f 44 00 00 55 31 d2 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 ec 08 <48> 8b 47 10 48 8b 4f 70 48 c7 c7 88 7b 4d a0 4c 8b a0 78 01 00
[  131.182382] RIP: retire_playback_urb+0x1b/0x160 [snd_usb_audio] RSP: ffffc90000f4bc10
[  131.184562] CR2: 0000000000000010

[1] 041e:3232 Creative Technology SoundBlaster X-FI HD
[2] http://mailman.alsa-project.org/pipermail/alsa-devel/2016-December/115425.html

Signed-off-by: Ioan-Adrian Ratiu <adi@adirat.com>
---
 sound/usb/endpoint.c | 11 ++---------
 sound/usb/endpoint.h |  2 +-
 sound/usb/pcm.c      | 13 ++++++-------
 3 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
index a2cdf3370afe..4465f324c2c2 100644
--- a/sound/usb/endpoint.c
+++ b/sound/usb/endpoint.c
@@ -920,9 +920,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
 /**
  * snd_usb_endpoint_start: start an snd_usb_endpoint
  *
- * @ep:		the endpoint to start
- * @can_sleep:	flag indicating whether the operation is executed in
- * 		non-atomic context
+ * @ep: the endpoint to start
  *
  * A call to this function will increment the use count of the endpoint.
  * In case it is not already running, the URBs for this endpoint will be
@@ -932,7 +930,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
  *
  * Returns an error if the URB submission failed, 0 in all other cases.
  */
-int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep)
+int snd_usb_endpoint_start(struct snd_usb_endpoint *ep)
 {
 	int err;
 	unsigned int i;
@@ -944,11 +942,6 @@ int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep)
 	if (++ep->use_count != 1)
 		return 0;
 
-	/* just to be sure */
-	deactivate_urbs(ep, false);
-	if (can_sleep)
-		wait_clear_urbs(ep);
-
 	ep->active_mask = 0;
 	ep->unlink_mask = 0;
 	ep->phase = 0;
diff --git a/sound/usb/endpoint.h b/sound/usb/endpoint.h
index 6428392d8f62..584f295d7c77 100644
--- a/sound/usb/endpoint.h
+++ b/sound/usb/endpoint.h
@@ -18,7 +18,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
 				struct audioformat *fmt,
 				struct snd_usb_endpoint *sync_ep);
 
-int  snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep);
+int  snd_usb_endpoint_start(struct snd_usb_endpoint *ep);
 void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep);
 void snd_usb_endpoint_sync_pending_stop(struct snd_usb_endpoint *ep);
 int  snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
index 34c6d4f2c0b6..db26f767f851 100644
--- a/sound/usb/pcm.c
+++ b/sound/usb/pcm.c
@@ -218,7 +218,7 @@ int snd_usb_init_pitch(struct snd_usb_audio *chip, int iface,
 	}
 }
 
-static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
+static int start_endpoints(struct snd_usb_substream *subs)
 {
 	int err;
 
@@ -231,7 +231,7 @@ static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
 		dev_dbg(&subs->dev->dev, "Starting data EP @%p\n", ep);
 
 		ep->data_subs = subs;
-		err = snd_usb_endpoint_start(ep, can_sleep);
+		err = snd_usb_endpoint_start(ep);
 		if (err < 0) {
 			clear_bit(SUBSTREAM_FLAG_DATA_EP_STARTED, &subs->flags);
 			return err;
@@ -260,7 +260,7 @@ static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
 		dev_dbg(&subs->dev->dev, "Starting sync EP @%p\n", ep);
 
 		ep->sync_slave = subs->data_endpoint;
-		err = snd_usb_endpoint_start(ep, can_sleep);
+		err = snd_usb_endpoint_start(ep);
 		if (err < 0) {
 			clear_bit(SUBSTREAM_FLAG_SYNC_EP_STARTED, &subs->flags);
 			return err;
@@ -809,8 +809,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
 		goto unlock;
 	}
 
-	snd_usb_endpoint_sync_pending_stop(subs->sync_endpoint);
-	snd_usb_endpoint_sync_pending_stop(subs->data_endpoint);
+	stop_endpoints(subs, true);
 
 	ret = set_format(subs, subs->cur_audiofmt);
 	if (ret < 0)
@@ -850,7 +849,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
 	/* for playback, submit the URBs now; otherwise, the first hwptr_done
 	 * updates for all URBs would happen at the same time when starting */
 	if (subs->direction == SNDRV_PCM_STREAM_PLAYBACK)
-		ret = start_endpoints(subs, true);
+		return start_endpoints(subs);
 
  unlock:
 	snd_usb_unlock_shutdown(subs->stream->chip);
@@ -1666,7 +1665,7 @@ static int snd_usb_substream_capture_trigger(struct snd_pcm_substream *substream
 
 	switch (cmd) {
 	case SNDRV_PCM_TRIGGER_START:
-		err = start_endpoints(subs, false);
+		err = start_endpoints(subs);
 		if (err < 0)
 			return err;
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] ALSA: snd-usb: fix IRQ triggered NULL pointer dereference
  2016-12-18  1:08 [PATCH] ALSA: snd-usb: fix IRQ triggered NULL pointer dereference Ioan-Adrian Ratiu
@ 2016-12-20  1:25 ` Takashi Sakamoto
  2016-12-20 11:47   ` Ioan-Adrian Ratiu
  0 siblings, 1 reply; 8+ messages in thread
From: Takashi Sakamoto @ 2016-12-20  1:25 UTC (permalink / raw)
  To: clemens, tiwai, adi; +Cc: alsa-devel, linux-kernel

---

Hi,

> Commit 16200948d835 ("ALSA: usb-audio: Fix race at stopping the stream")
> fixes a race-codition but it also introduces another really nasty data race
> regression which makes my usb sound card [1] completely useless, throwing
> the kernel into a panic if anything from userspace tries to start playback.
> 
> The problem is this: ep->data_subs is now set to NULL every time inside
> wait_clear_urbs(). ep->data_subs is initalized only in one place in
> start_endpoints(), then it is immediately wiped by the pre-existing call to
> wait_clear_urbs() inside snd_usb_endpoint_start().
> 
> To ilustrate, this is what happens in the non-irq part of the code:
> 
> Step 1 (inside start_endpoints): ep->data_subs = subs;
> Step 2 (inside start_endpoints): snd_usb_endpoint_start(ep, can_sleep);
> Step 3 (inside snd_usb_endpoint_start): wait_clear_urbs(ep);
> Step 4 (inside wait_clear_urbs): ep->data_subs = NULL;
> 
> Here's a simplified call stack for the IRQ part (full one at the end):
> 
> (NULL dereference, param "subs" is passed NULL value of ep->data_subs)
> retire_playback_urb
> retire_outbound_urb
> snd_complete_urb
> (...)
> xhci_irq
> 
> Looking at the git log there seems to be quite a lot of history in this
> part of the codebase, dating back to 2012 or earlier. My fix is based on
> 015618b90 ("ALSA: snd-usb: Fix URB cancellation at stream start") and
> e9ba389c5 ("ALSA: usb-audio: Fix scheduling-while-atomic bug in PCM capture
> stream") with a few modifications of my own. My idea is to do the
> ep->data_subs wiping before endpoint initialization in a manner similar
> to these older commits by using stop_endpoints() in snd_usb_pcm_prepare()
> and at the same time keep the ep->data_subs = NULL in wait_clear_urbs() to
> not trigger the recently fixed stream stopping race again.
> 
> Full call stack:
> 
> [  131.093240] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> [  131.094313] IP: retire_playback_urb+0x1b/0x160 [snd_usb_audio]
> [  131.095046] PGD 0
> [  131.095047]
> [  131.096509] Oops: 0000 [#1] PREEMPT SMP
> [  131.097255] Modules linked in: fuse snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device ctr ccm arc4 ath9k intel_rapl ath9k_common x86_pkg_temp_thermal ath9k_hw intel_powerclamp coretemp joydev mousedev ath kvm_intel mac80211 kvm
>  input_leds hid_generic psmouse irqbypass usbhid hid crct10dif_pclmul crc32_pclmul serio_raw crc32c_intel atkbd ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel pcbc aesni_intel libps2 cfg8
> 0211 aes_x86_64 crypto_simd dell_laptop glue_helper r8169 dell_smbios snd_hda_codec sr_mod cryptd led_class mii rfkill snd_hwdep cdrom snd_hda_core ac dcdbas i8042 xhci_pci serio xhci_hcd snd_pcm tpm_tis battery tpm_tis_core lpc_ich tpm
> snd_timer evdev shpchp i2c_i801 sch_fq_codel
> [  131.105551] CPU: 2 PID: 165 Comm: irq/29-xhci_hcd Not tainted 4.9.0-gd824cdc58ba0 #10
> [  131.107516] Hardware name: Dell Inc. Inspiron 3521/018DYG, BIOS A14 07/31/2015
> [  131.109592] task: ffff880154a70000 task.stack: ffffc90000f48000
> [  131.111746] RIP: 0010:retire_playback_urb+0x1b/0x160 [snd_usb_audio]
> [  131.113899] RSP: 0018:ffffc90000f4bc10 EFLAGS: 00010082
> [  131.116080] RAX: ffffffffa04cabe0 RBX: 0000000000000000 RCX: 0000000000000000
> [  131.118284] RDX: 0000000000000000 RSI: ffff8801435a4c00 RDI: 0000000000000000
> [  131.120505] RBP: ffffc90000f4bc40 R08: 0000000000000001 R09: 0000000000000001
> [  131.122807] R10: 0000000000000001 R11: ffffffff82f60d6d R12: ffff8801432a0238
> [  131.125265] R13: ffff8801435a4c00 R14: 0000000000000000 R15: ffff8801535c83b8
> [  131.127723] FS:  0000000000000000(0000) GS:ffff88015a000000(0000) knlGS:0000000000000000
> [  131.130217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  131.132568] CR2: 0000000000000010 CR3: 0000000001e0b000 CR4: 00000000001406e0
> [  131.135007] Call Trace:
> [  131.137329]  snd_complete_urb+0x190/0x2b0 [snd_usb_audio]
> [  131.139526]  __usb_hcd_giveback_urb+0x8e/0x160
> [  131.141948]  usb_hcd_giveback_urb+0x43/0x110
> [  131.144104]  xhci_giveback_urb_in_irq.isra.22+0x7d/0xb0 [xhci_hcd]
> [  131.146521]  finish_td.constprop.38+0x1de/0x2f0 [xhci_hcd]
> [  131.148736]  xhci_irq+0x13a2/0x1ca0 [xhci_hcd]
> [  131.150972]  ? trace_hardirqs_on+0xd/0x10
> [  131.153209]  ? _raw_spin_unlock_irq+0x2c/0x50
> [  131.155390]  ? irq_thread+0xb5/0x1d0
> [  131.157771]  xhci_msi_irq+0x11/0x20 [xhci_hcd]
> [  131.159938]  irq_forced_thread_fn+0x2f/0x70
> [  131.162072]  ? irq_thread+0xb5/0x1d0
> [  131.164141]  irq_thread+0x149/0x1d0
> [  131.166132]  ? irq_finalize_oneshot.part.2+0xe0/0xe0
> [  131.168142]  ? wake_threads_waitq+0x30/0x30
> [  131.170149]  kthread+0x10f/0x150
> [  131.172144]  ? irq_thread_dtor+0xc0/0xc0
> [  131.174139]  ? kthread_create_on_node+0x40/0x40
> [  131.176116]  ret_from_fork+0x2a/0x40
> [  131.178074] Code: 8b 77 64 4c 89 e7 e8 e5 fe ff ff eb c3 0f 1f 00 0f 1f 44 00 00 55 31 d2 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 ec 08 <48> 8b 47 10 48 8b 4f 70 48 c7 c7 88 7b 4d a0 4c 8b a0 78 01 00
> [  131.182382] RIP: retire_playback_urb+0x1b/0x160 [snd_usb_audio] RSP: ffffc90000f4bc10
> [  131.184562] CR2: 0000000000000010
> 
> [1] 041e:3232 Creative Technology SoundBlaster X-FI HD
> [2] http://mailman.alsa-project.org/pipermail/alsa-devel/2016-December/115425.html

I can regenerate this bug with my EMU 0404 USB. My understanding of this bug is
quite similar to your perspective. This bug is quite critical. In my
understanding, we encounter this bug in all of cases in which snd-usb-audio is
applied for PCM playback substream.

Below workaround can also suppress this bug.

diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
index a2cdf33..9b34f76 100644
--- a/sound/usb/endpoint.c
+++ b/sound/usb/endpoint.c
@@ -537,11 +537,14 @@ static int wait_clear_urbs(struct snd_usb_endpoint *ep)
                        alive, ep->ep_num);
        clear_bit(EP_FLAG_STOPPING, &ep->flags);
 
-       ep->data_subs = NULL;
        ep->sync_slave = NULL;
        ep->retire_data_urb = NULL;
        ep->prepare_data_urb = NULL;
 
+       msleep(200);
+
+       ep->data_subs = NULL;
+
        return 0;
 }

If initialization of 'struct snd_usb_endpoint.data_subs' can be done after all
of queued URBs are processed and corresponding complete interrupts are cought,
we can solve this critical bug.

> Signed-off-by: Ioan-Adrian Ratiu <adi@adirat.com>
> ---
>  sound/usb/endpoint.c | 11 ++---------
>  sound/usb/endpoint.h |  2 +-
>  sound/usb/pcm.c      | 13 ++++++-------
>  3 files changed, 9 insertions(+), 17 deletions(-)
> 
> diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
> index a2cdf3370afe..4465f324c2c2 100644
> --- a/sound/usb/endpoint.c
> +++ b/sound/usb/endpoint.c
> @@ -920,9 +920,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
>  /**
>   * snd_usb_endpoint_start: start an snd_usb_endpoint
>   *
> - * @ep:		the endpoint to start
> - * @can_sleep:	flag indicating whether the operation is executed in
> - * 		non-atomic context
> + * @ep: the endpoint to start
>   *
>   * A call to this function will increment the use count of the endpoint.
>   * In case it is not already running, the URBs for this endpoint will be
> @@ -932,7 +930,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
>   *
>   * Returns an error if the URB submission failed, 0 in all other cases.
>   */
> -int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep)
> +int snd_usb_endpoint_start(struct snd_usb_endpoint *ep)
>  {
>  	int err;
>  	unsigned int i;
> @@ -944,11 +942,6 @@ int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep)
>  	if (++ep->use_count != 1)
>  		return 0;
>  
> -	/* just to be sure */
> -	deactivate_urbs(ep, false);
> -	if (can_sleep)
> -		wait_clear_urbs(ep);
> -
>  	ep->active_mask = 0;
>  	ep->unlink_mask = 0;
>  	ep->phase = 0;
> diff --git a/sound/usb/endpoint.h b/sound/usb/endpoint.h
> index 6428392d8f62..584f295d7c77 100644
> --- a/sound/usb/endpoint.h
> +++ b/sound/usb/endpoint.h
> @@ -18,7 +18,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
>  				struct audioformat *fmt,
>  				struct snd_usb_endpoint *sync_ep);
>  
> -int  snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep);
> +int  snd_usb_endpoint_start(struct snd_usb_endpoint *ep);
>  void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep);
>  void snd_usb_endpoint_sync_pending_stop(struct snd_usb_endpoint *ep);
>  int  snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
> diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
> index 34c6d4f2c0b6..db26f767f851 100644
> --- a/sound/usb/pcm.c
> +++ b/sound/usb/pcm.c
> @@ -218,7 +218,7 @@ int snd_usb_init_pitch(struct snd_usb_audio *chip, int iface,
>  	}
>  }
>  
> -static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
> +static int start_endpoints(struct snd_usb_substream *subs)
>  {
>  	int err;
>  
> @@ -231,7 +231,7 @@ static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
>  		dev_dbg(&subs->dev->dev, "Starting data EP @%p\n", ep);
>  
>  		ep->data_subs = subs;
> -		err = snd_usb_endpoint_start(ep, can_sleep);
> +		err = snd_usb_endpoint_start(ep);
>  		if (err < 0) {
>  			clear_bit(SUBSTREAM_FLAG_DATA_EP_STARTED, &subs->flags);
>  			return err;
> @@ -260,7 +260,7 @@ static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
>  		dev_dbg(&subs->dev->dev, "Starting sync EP @%p\n", ep);
>  
>  		ep->sync_slave = subs->data_endpoint;
> -		err = snd_usb_endpoint_start(ep, can_sleep);
> +		err = snd_usb_endpoint_start(ep);
>  		if (err < 0) {
>  			clear_bit(SUBSTREAM_FLAG_SYNC_EP_STARTED, &subs->flags);
>  			return err;
> @@ -809,8 +809,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
>  		goto unlock;
>  	}
>  
> -	snd_usb_endpoint_sync_pending_stop(subs->sync_endpoint);
> -	snd_usb_endpoint_sync_pending_stop(subs->data_endpoint);
> +	stop_endpoints(subs, true);
>  
>  	ret = set_format(subs, subs->cur_audiofmt);
>  	if (ret < 0)
> @@ -850,7 +849,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
>  	/* for playback, submit the URBs now; otherwise, the first hwptr_done
>  	 * updates for all URBs would happen at the same time when starting */
>  	if (subs->direction == SNDRV_PCM_STREAM_PLAYBACK)
> -		ret = start_endpoints(subs, true);
> +		return start_endpoints(subs);
>  
>   unlock:
>  	snd_usb_unlock_shutdown(subs->stream->chip);
> @@ -1666,7 +1665,7 @@ static int snd_usb_substream_capture_trigger(struct snd_pcm_substream *substream
>  
>  	switch (cmd) {
>  	case SNDRV_PCM_TRIGGER_START:
> -		err = start_endpoints(subs, false);
> +		err = start_endpoints(subs);
>  		if (err < 0)
>  			return err;

This patch works better, but not the best. It's a bit intrusive.

What we need to synchronize process contexts to interrupt context till all of
queued URBs are handled. This seems to be a main purpose of wait_clear_bits,
while current implementation of snd_complete_urb()/wait_clear_bits() is not
enough.

Thanks for your reporting.

Takashi Sakamoto
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] ALSA: snd-usb: fix IRQ triggered NULL pointer dereference
  2016-12-20  1:25 ` Takashi Sakamoto
@ 2016-12-20 11:47   ` Ioan-Adrian Ratiu
  2016-12-20 13:04     ` Takashi Sakamoto
  0 siblings, 1 reply; 8+ messages in thread
From: Ioan-Adrian Ratiu @ 2016-12-20 11:47 UTC (permalink / raw)
  To: Takashi Sakamoto, clemens, tiwai; +Cc: alsa-devel, linux-kernel

On Tue, 20 Dec 2016, Takashi Sakamoto <o-takashi@sakamocchi.jp> wrote:
> ---
>
> Hi,
>
>> Commit 16200948d835 ("ALSA: usb-audio: Fix race at stopping the stream")
>> fixes a race-codition but it also introduces another really nasty data race
>> regression which makes my usb sound card [1] completely useless, throwing
>> the kernel into a panic if anything from userspace tries to start playback.
>> 
>> The problem is this: ep->data_subs is now set to NULL every time inside
>> wait_clear_urbs(). ep->data_subs is initalized only in one place in
>> start_endpoints(), then it is immediately wiped by the pre-existing call to
>> wait_clear_urbs() inside snd_usb_endpoint_start().
>> 
>> To ilustrate, this is what happens in the non-irq part of the code:
>> 
>> Step 1 (inside start_endpoints): ep->data_subs = subs;
>> Step 2 (inside start_endpoints): snd_usb_endpoint_start(ep, can_sleep);
>> Step 3 (inside snd_usb_endpoint_start): wait_clear_urbs(ep);
>> Step 4 (inside wait_clear_urbs): ep->data_subs = NULL;
>> 
>> Here's a simplified call stack for the IRQ part (full one at the end):
>> 
>> (NULL dereference, param "subs" is passed NULL value of ep->data_subs)
>> retire_playback_urb
>> retire_outbound_urb
>> snd_complete_urb
>> (...)
>> xhci_irq
>> 
>> Looking at the git log there seems to be quite a lot of history in this
>> part of the codebase, dating back to 2012 or earlier. My fix is based on
>> 015618b90 ("ALSA: snd-usb: Fix URB cancellation at stream start") and
>> e9ba389c5 ("ALSA: usb-audio: Fix scheduling-while-atomic bug in PCM capture
>> stream") with a few modifications of my own. My idea is to do the
>> ep->data_subs wiping before endpoint initialization in a manner similar
>> to these older commits by using stop_endpoints() in snd_usb_pcm_prepare()
>> and at the same time keep the ep->data_subs = NULL in wait_clear_urbs() to
>> not trigger the recently fixed stream stopping race again.
>> 
>> Full call stack:
>> 
>> [  131.093240] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
>> [  131.094313] IP: retire_playback_urb+0x1b/0x160 [snd_usb_audio]
>> [  131.095046] PGD 0
>> [  131.095047]
>> [  131.096509] Oops: 0000 [#1] PREEMPT SMP
>> [  131.097255] Modules linked in: fuse snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device ctr ccm arc4 ath9k intel_rapl ath9k_common x86_pkg_temp_thermal ath9k_hw intel_powerclamp coretemp joydev mousedev ath kvm_intel mac80211 kvm
>>  input_leds hid_generic psmouse irqbypass usbhid hid crct10dif_pclmul crc32_pclmul serio_raw crc32c_intel atkbd ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel pcbc aesni_intel libps2 cfg8
>> 0211 aes_x86_64 crypto_simd dell_laptop glue_helper r8169 dell_smbios snd_hda_codec sr_mod cryptd led_class mii rfkill snd_hwdep cdrom snd_hda_core ac dcdbas i8042 xhci_pci serio xhci_hcd snd_pcm tpm_tis battery tpm_tis_core lpc_ich tpm
>> snd_timer evdev shpchp i2c_i801 sch_fq_codel
>> [  131.105551] CPU: 2 PID: 165 Comm: irq/29-xhci_hcd Not tainted 4.9.0-gd824cdc58ba0 #10
>> [  131.107516] Hardware name: Dell Inc. Inspiron 3521/018DYG, BIOS A14 07/31/2015
>> [  131.109592] task: ffff880154a70000 task.stack: ffffc90000f48000
>> [  131.111746] RIP: 0010:retire_playback_urb+0x1b/0x160 [snd_usb_audio]
>> [  131.113899] RSP: 0018:ffffc90000f4bc10 EFLAGS: 00010082
>> [  131.116080] RAX: ffffffffa04cabe0 RBX: 0000000000000000 RCX: 0000000000000000
>> [  131.118284] RDX: 0000000000000000 RSI: ffff8801435a4c00 RDI: 0000000000000000
>> [  131.120505] RBP: ffffc90000f4bc40 R08: 0000000000000001 R09: 0000000000000001
>> [  131.122807] R10: 0000000000000001 R11: ffffffff82f60d6d R12: ffff8801432a0238
>> [  131.125265] R13: ffff8801435a4c00 R14: 0000000000000000 R15: ffff8801535c83b8
>> [  131.127723] FS:  0000000000000000(0000) GS:ffff88015a000000(0000) knlGS:0000000000000000
>> [  131.130217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  131.132568] CR2: 0000000000000010 CR3: 0000000001e0b000 CR4: 00000000001406e0
>> [  131.135007] Call Trace:
>> [  131.137329]  snd_complete_urb+0x190/0x2b0 [snd_usb_audio]
>> [  131.139526]  __usb_hcd_giveback_urb+0x8e/0x160
>> [  131.141948]  usb_hcd_giveback_urb+0x43/0x110
>> [  131.144104]  xhci_giveback_urb_in_irq.isra.22+0x7d/0xb0 [xhci_hcd]
>> [  131.146521]  finish_td.constprop.38+0x1de/0x2f0 [xhci_hcd]
>> [  131.148736]  xhci_irq+0x13a2/0x1ca0 [xhci_hcd]
>> [  131.150972]  ? trace_hardirqs_on+0xd/0x10
>> [  131.153209]  ? _raw_spin_unlock_irq+0x2c/0x50
>> [  131.155390]  ? irq_thread+0xb5/0x1d0
>> [  131.157771]  xhci_msi_irq+0x11/0x20 [xhci_hcd]
>> [  131.159938]  irq_forced_thread_fn+0x2f/0x70
>> [  131.162072]  ? irq_thread+0xb5/0x1d0
>> [  131.164141]  irq_thread+0x149/0x1d0
>> [  131.166132]  ? irq_finalize_oneshot.part.2+0xe0/0xe0
>> [  131.168142]  ? wake_threads_waitq+0x30/0x30
>> [  131.170149]  kthread+0x10f/0x150
>> [  131.172144]  ? irq_thread_dtor+0xc0/0xc0
>> [  131.174139]  ? kthread_create_on_node+0x40/0x40
>> [  131.176116]  ret_from_fork+0x2a/0x40
>> [  131.178074] Code: 8b 77 64 4c 89 e7 e8 e5 fe ff ff eb c3 0f 1f 00 0f 1f 44 00 00 55 31 d2 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 ec 08 <48> 8b 47 10 48 8b 4f 70 48 c7 c7 88 7b 4d a0 4c 8b a0 78 01 00
>> [  131.182382] RIP: retire_playback_urb+0x1b/0x160 [snd_usb_audio] RSP: ffffc90000f4bc10
>> [  131.184562] CR2: 0000000000000010
>> 
>> [1] 041e:3232 Creative Technology SoundBlaster X-FI HD
>> [2] http://mailman.alsa-project.org/pipermail/alsa-devel/2016-December/115425.html
>
> I can regenerate this bug with my EMU 0404 USB. My understanding of this bug is
> quite similar to your perspective. This bug is quite critical. In my
> understanding, we encounter this bug in all of cases in which snd-usb-audio is
> applied for PCM playback substream.
>
> Below workaround can also suppress this bug.
>
> diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
> index a2cdf33..9b34f76 100644
> --- a/sound/usb/endpoint.c
> +++ b/sound/usb/endpoint.c
> @@ -537,11 +537,14 @@ static int wait_clear_urbs(struct snd_usb_endpoint *ep)
>                         alive, ep->ep_num);
>         clear_bit(EP_FLAG_STOPPING, &ep->flags);
>  
> -       ep->data_subs = NULL;
>         ep->sync_slave = NULL;
>         ep->retire_data_urb = NULL;
>         ep->prepare_data_urb = NULL;
>  
> +       msleep(200);
> +
> +       ep->data_subs = NULL;
> +
>         return 0;
>  }
>
> If initialization of 'struct snd_usb_endpoint.data_subs' can be done after all
> of queued URBs are processed and corresponding complete interrupts are cought,
> we can solve this critical bug.
>
>> Signed-off-by: Ioan-Adrian Ratiu <adi@adirat.com>
>> ---
>>  sound/usb/endpoint.c | 11 ++---------
>>  sound/usb/endpoint.h |  2 +-
>>  sound/usb/pcm.c      | 13 ++++++-------
>>  3 files changed, 9 insertions(+), 17 deletions(-)
>> 
>> diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
>> index a2cdf3370afe..4465f324c2c2 100644
>> --- a/sound/usb/endpoint.c
>> +++ b/sound/usb/endpoint.c
>> @@ -920,9 +920,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
>>  /**
>>   * snd_usb_endpoint_start: start an snd_usb_endpoint
>>   *
>> - * @ep:		the endpoint to start
>> - * @can_sleep:	flag indicating whether the operation is executed in
>> - * 		non-atomic context
>> + * @ep: the endpoint to start
>>   *
>>   * A call to this function will increment the use count of the endpoint.
>>   * In case it is not already running, the URBs for this endpoint will be
>> @@ -932,7 +930,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
>>   *
>>   * Returns an error if the URB submission failed, 0 in all other cases.
>>   */
>> -int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep)
>> +int snd_usb_endpoint_start(struct snd_usb_endpoint *ep)
>>  {
>>  	int err;
>>  	unsigned int i;
>> @@ -944,11 +942,6 @@ int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep)
>>  	if (++ep->use_count != 1)
>>  		return 0;
>>  
>> -	/* just to be sure */
>> -	deactivate_urbs(ep, false);
>> -	if (can_sleep)
>> -		wait_clear_urbs(ep);
>> -
>>  	ep->active_mask = 0;
>>  	ep->unlink_mask = 0;
>>  	ep->phase = 0;
>> diff --git a/sound/usb/endpoint.h b/sound/usb/endpoint.h
>> index 6428392d8f62..584f295d7c77 100644
>> --- a/sound/usb/endpoint.h
>> +++ b/sound/usb/endpoint.h
>> @@ -18,7 +18,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
>>  				struct audioformat *fmt,
>>  				struct snd_usb_endpoint *sync_ep);
>>  
>> -int  snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep);
>> +int  snd_usb_endpoint_start(struct snd_usb_endpoint *ep);
>>  void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep);
>>  void snd_usb_endpoint_sync_pending_stop(struct snd_usb_endpoint *ep);
>>  int  snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
>> diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
>> index 34c6d4f2c0b6..db26f767f851 100644
>> --- a/sound/usb/pcm.c
>> +++ b/sound/usb/pcm.c
>> @@ -218,7 +218,7 @@ int snd_usb_init_pitch(struct snd_usb_audio *chip, int iface,
>>  	}
>>  }
>>  
>> -static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
>> +static int start_endpoints(struct snd_usb_substream *subs)
>>  {
>>  	int err;
>>  
>> @@ -231,7 +231,7 @@ static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
>>  		dev_dbg(&subs->dev->dev, "Starting data EP @%p\n", ep);
>>  
>>  		ep->data_subs = subs;
>> -		err = snd_usb_endpoint_start(ep, can_sleep);
>> +		err = snd_usb_endpoint_start(ep);
>>  		if (err < 0) {
>>  			clear_bit(SUBSTREAM_FLAG_DATA_EP_STARTED, &subs->flags);
>>  			return err;
>> @@ -260,7 +260,7 @@ static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
>>  		dev_dbg(&subs->dev->dev, "Starting sync EP @%p\n", ep);
>>  
>>  		ep->sync_slave = subs->data_endpoint;
>> -		err = snd_usb_endpoint_start(ep, can_sleep);
>> +		err = snd_usb_endpoint_start(ep);
>>  		if (err < 0) {
>>  			clear_bit(SUBSTREAM_FLAG_SYNC_EP_STARTED, &subs->flags);
>>  			return err;
>> @@ -809,8 +809,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
>>  		goto unlock;
>>  	}
>>  
>> -	snd_usb_endpoint_sync_pending_stop(subs->sync_endpoint);
>> -	snd_usb_endpoint_sync_pending_stop(subs->data_endpoint);
>> +	stop_endpoints(subs, true);
>>  
>>  	ret = set_format(subs, subs->cur_audiofmt);
>>  	if (ret < 0)
>> @@ -850,7 +849,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
>>  	/* for playback, submit the URBs now; otherwise, the first hwptr_done
>>  	 * updates for all URBs would happen at the same time when starting */
>>  	if (subs->direction == SNDRV_PCM_STREAM_PLAYBACK)
>> -		ret = start_endpoints(subs, true);
>> +		return start_endpoints(subs);
>>  
>>   unlock:
>>  	snd_usb_unlock_shutdown(subs->stream->chip);
>> @@ -1666,7 +1665,7 @@ static int snd_usb_substream_capture_trigger(struct snd_pcm_substream *substream
>>  
>>  	switch (cmd) {
>>  	case SNDRV_PCM_TRIGGER_START:
>> -		err = start_endpoints(subs, false);
>> +		err = start_endpoints(subs);
>>  		if (err < 0)
>>  			return err;
>
> This patch works better, but not the best. It's a bit intrusive.

I disagree. Being intrusive is not a good reason to reject a patch,
especially if the alternative is to insert random magic number delays
in hopes of synchronizing process and irq contexts to avoid critical
errors.

Please take the time to fully analyze my patch and let's have a
discussion based on it, not reject it outright and replace it with
a quick and dirty delay hack.

Ionel

>
> What we need to synchronize process contexts to interrupt context till all of
> queued URBs are handled. This seems to be a main purpose of wait_clear_bits,
> while current implementation of snd_complete_urb()/wait_clear_bits() is not
> enough.
>
> Thanks for your reporting.
>
> Takashi Sakamoto
> -- 
> 2.9.3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ALSA: snd-usb: fix IRQ triggered NULL pointer dereference
  2016-12-20 11:47   ` Ioan-Adrian Ratiu
@ 2016-12-20 13:04     ` Takashi Sakamoto
  2016-12-21 10:17       ` Takashi Iwai
  0 siblings, 1 reply; 8+ messages in thread
From: Takashi Sakamoto @ 2016-12-20 13:04 UTC (permalink / raw)
  To: Ioan-Adrian Ratiu, clemens, tiwai; +Cc: alsa-devel, linux-kernel

Hi,

On Dec 20 2016 20:47, Ioan-Adrian Ratiu wrote:
> On Tue, 20 Dec 2016, Takashi Sakamoto <o-takashi@sakamocchi.jp> wrote:
>> ---
>>
>> Hi,
>>
>>> Commit 16200948d835 ("ALSA: usb-audio: Fix race at stopping the stream")
>>> fixes a race-codition but it also introduces another really nasty data race
>>> regression which makes my usb sound card [1] completely useless, throwing
>>> the kernel into a panic if anything from userspace tries to start playback.
>>>
>>> The problem is this: ep->data_subs is now set to NULL every time inside
>>> wait_clear_urbs(). ep->data_subs is initalized only in one place in
>>> start_endpoints(), then it is immediately wiped by the pre-existing call to
>>> wait_clear_urbs() inside snd_usb_endpoint_start().
>>>
>>> To ilustrate, this is what happens in the non-irq part of the code:
>>>
>>> Step 1 (inside start_endpoints): ep->data_subs = subs;
>>> Step 2 (inside start_endpoints): snd_usb_endpoint_start(ep, can_sleep);
>>> Step 3 (inside snd_usb_endpoint_start): wait_clear_urbs(ep);
>>> Step 4 (inside wait_clear_urbs): ep->data_subs = NULL;
>>>
>>> Here's a simplified call stack for the IRQ part (full one at the end):
>>>
>>> (NULL dereference, param "subs" is passed NULL value of ep->data_subs)
>>> retire_playback_urb
>>> retire_outbound_urb
>>> snd_complete_urb
>>> (...)
>>> xhci_irq
>>>
>>> Looking at the git log there seems to be quite a lot of history in this
>>> part of the codebase, dating back to 2012 or earlier. My fix is based on
>>> 015618b90 ("ALSA: snd-usb: Fix URB cancellation at stream start") and
>>> e9ba389c5 ("ALSA: usb-audio: Fix scheduling-while-atomic bug in PCM capture
>>> stream") with a few modifications of my own. My idea is to do the
>>> ep->data_subs wiping before endpoint initialization in a manner similar
>>> to these older commits by using stop_endpoints() in snd_usb_pcm_prepare()
>>> and at the same time keep the ep->data_subs = NULL in wait_clear_urbs() to
>>> not trigger the recently fixed stream stopping race again.
>>>
>>> Full call stack:
>>>
>>> [  131.093240] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
>>> [  131.094313] IP: retire_playback_urb+0x1b/0x160 [snd_usb_audio]
>>> [  131.095046] PGD 0
>>> [  131.095047]
>>> [  131.096509] Oops: 0000 [#1] PREEMPT SMP
>>> [  131.097255] Modules linked in: fuse snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device ctr ccm arc4 ath9k intel_rapl ath9k_common x86_pkg_temp_thermal ath9k_hw intel_powerclamp coretemp joydev mousedev ath kvm_intel mac80211 kvm
>>>  input_leds hid_generic psmouse irqbypass usbhid hid crct10dif_pclmul crc32_pclmul serio_raw crc32c_intel atkbd ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel pcbc aesni_intel libps2 cfg8
>>> 0211 aes_x86_64 crypto_simd dell_laptop glue_helper r8169 dell_smbios snd_hda_codec sr_mod cryptd led_class mii rfkill snd_hwdep cdrom snd_hda_core ac dcdbas i8042 xhci_pci serio xhci_hcd snd_pcm tpm_tis battery tpm_tis_core lpc_ich tpm
>>> snd_timer evdev shpchp i2c_i801 sch_fq_codel
>>> [  131.105551] CPU: 2 PID: 165 Comm: irq/29-xhci_hcd Not tainted 4.9.0-gd824cdc58ba0 #10
>>> [  131.107516] Hardware name: Dell Inc. Inspiron 3521/018DYG, BIOS A14 07/31/2015
>>> [  131.109592] task: ffff880154a70000 task.stack: ffffc90000f48000
>>> [  131.111746] RIP: 0010:retire_playback_urb+0x1b/0x160 [snd_usb_audio]
>>> [  131.113899] RSP: 0018:ffffc90000f4bc10 EFLAGS: 00010082
>>> [  131.116080] RAX: ffffffffa04cabe0 RBX: 0000000000000000 RCX: 0000000000000000
>>> [  131.118284] RDX: 0000000000000000 RSI: ffff8801435a4c00 RDI: 0000000000000000
>>> [  131.120505] RBP: ffffc90000f4bc40 R08: 0000000000000001 R09: 0000000000000001
>>> [  131.122807] R10: 0000000000000001 R11: ffffffff82f60d6d R12: ffff8801432a0238
>>> [  131.125265] R13: ffff8801435a4c00 R14: 0000000000000000 R15: ffff8801535c83b8
>>> [  131.127723] FS:  0000000000000000(0000) GS:ffff88015a000000(0000) knlGS:0000000000000000
>>> [  131.130217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  131.132568] CR2: 0000000000000010 CR3: 0000000001e0b000 CR4: 00000000001406e0
>>> [  131.135007] Call Trace:
>>> [  131.137329]  snd_complete_urb+0x190/0x2b0 [snd_usb_audio]
>>> [  131.139526]  __usb_hcd_giveback_urb+0x8e/0x160
>>> [  131.141948]  usb_hcd_giveback_urb+0x43/0x110
>>> [  131.144104]  xhci_giveback_urb_in_irq.isra.22+0x7d/0xb0 [xhci_hcd]
>>> [  131.146521]  finish_td.constprop.38+0x1de/0x2f0 [xhci_hcd]
>>> [  131.148736]  xhci_irq+0x13a2/0x1ca0 [xhci_hcd]
>>> [  131.150972]  ? trace_hardirqs_on+0xd/0x10
>>> [  131.153209]  ? _raw_spin_unlock_irq+0x2c/0x50
>>> [  131.155390]  ? irq_thread+0xb5/0x1d0
>>> [  131.157771]  xhci_msi_irq+0x11/0x20 [xhci_hcd]
>>> [  131.159938]  irq_forced_thread_fn+0x2f/0x70
>>> [  131.162072]  ? irq_thread+0xb5/0x1d0
>>> [  131.164141]  irq_thread+0x149/0x1d0
>>> [  131.166132]  ? irq_finalize_oneshot.part.2+0xe0/0xe0
>>> [  131.168142]  ? wake_threads_waitq+0x30/0x30
>>> [  131.170149]  kthread+0x10f/0x150
>>> [  131.172144]  ? irq_thread_dtor+0xc0/0xc0
>>> [  131.174139]  ? kthread_create_on_node+0x40/0x40
>>> [  131.176116]  ret_from_fork+0x2a/0x40
>>> [  131.178074] Code: 8b 77 64 4c 89 e7 e8 e5 fe ff ff eb c3 0f 1f 00 0f 1f 44 00 00 55 31 d2 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 ec 08 <48> 8b 47 10 48 8b 4f 70 48 c7 c7 88 7b 4d a0 4c 8b a0 78 01 00
>>> [  131.182382] RIP: retire_playback_urb+0x1b/0x160 [snd_usb_audio] RSP: ffffc90000f4bc10
>>> [  131.184562] CR2: 0000000000000010
>>>
>>> [1] 041e:3232 Creative Technology SoundBlaster X-FI HD
>>> [2] http://mailman.alsa-project.org/pipermail/alsa-devel/2016-December/115425.html
>>
>> I can regenerate this bug with my EMU 0404 USB. My understanding of this bug is
>> quite similar to your perspective. This bug is quite critical. In my
>> understanding, we encounter this bug in all of cases in which snd-usb-audio is
>> applied for PCM playback substream.
>>
>> Below workaround can also suppress this bug.
>>
>> diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
>> index a2cdf33..9b34f76 100644
>> --- a/sound/usb/endpoint.c
>> +++ b/sound/usb/endpoint.c
>> @@ -537,11 +537,14 @@ static int wait_clear_urbs(struct snd_usb_endpoint *ep)
>>                         alive, ep->ep_num);
>>         clear_bit(EP_FLAG_STOPPING, &ep->flags);
>>
>> -       ep->data_subs = NULL;
>>         ep->sync_slave = NULL;
>>         ep->retire_data_urb = NULL;
>>         ep->prepare_data_urb = NULL;
>>
>> +       msleep(200);
>> +
>> +       ep->data_subs = NULL;
>> +
>>         return 0;
>>  }
>>
>> If initialization of 'struct snd_usb_endpoint.data_subs' can be done after all
>> of queued URBs are processed and corresponding complete interrupts are cought,
>> we can solve this critical bug.
>>
>>> Signed-off-by: Ioan-Adrian Ratiu <adi@adirat.com>
>>> ---
>>>  sound/usb/endpoint.c | 11 ++---------
>>>  sound/usb/endpoint.h |  2 +-
>>>  sound/usb/pcm.c      | 13 ++++++-------
>>>  3 files changed, 9 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
>>> index a2cdf3370afe..4465f324c2c2 100644
>>> --- a/sound/usb/endpoint.c
>>> +++ b/sound/usb/endpoint.c
>>> @@ -920,9 +920,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
>>>  /**
>>>   * snd_usb_endpoint_start: start an snd_usb_endpoint
>>>   *
>>> - * @ep:		the endpoint to start
>>> - * @can_sleep:	flag indicating whether the operation is executed in
>>> - * 		non-atomic context
>>> + * @ep: the endpoint to start
>>>   *
>>>   * A call to this function will increment the use count of the endpoint.
>>>   * In case it is not already running, the URBs for this endpoint will be
>>> @@ -932,7 +930,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
>>>   *
>>>   * Returns an error if the URB submission failed, 0 in all other cases.
>>>   */
>>> -int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep)
>>> +int snd_usb_endpoint_start(struct snd_usb_endpoint *ep)
>>>  {
>>>  	int err;
>>>  	unsigned int i;
>>> @@ -944,11 +942,6 @@ int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep)
>>>  	if (++ep->use_count != 1)
>>>  		return 0;
>>>
>>> -	/* just to be sure */
>>> -	deactivate_urbs(ep, false);
>>> -	if (can_sleep)
>>> -		wait_clear_urbs(ep);
>>> -
>>>  	ep->active_mask = 0;
>>>  	ep->unlink_mask = 0;
>>>  	ep->phase = 0;
>>> diff --git a/sound/usb/endpoint.h b/sound/usb/endpoint.h
>>> index 6428392d8f62..584f295d7c77 100644
>>> --- a/sound/usb/endpoint.h
>>> +++ b/sound/usb/endpoint.h
>>> @@ -18,7 +18,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
>>>  				struct audioformat *fmt,
>>>  				struct snd_usb_endpoint *sync_ep);
>>>
>>> -int  snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep);
>>> +int  snd_usb_endpoint_start(struct snd_usb_endpoint *ep);
>>>  void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep);
>>>  void snd_usb_endpoint_sync_pending_stop(struct snd_usb_endpoint *ep);
>>>  int  snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
>>> diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
>>> index 34c6d4f2c0b6..db26f767f851 100644
>>> --- a/sound/usb/pcm.c
>>> +++ b/sound/usb/pcm.c
>>> @@ -218,7 +218,7 @@ int snd_usb_init_pitch(struct snd_usb_audio *chip, int iface,
>>>  	}
>>>  }
>>>
>>> -static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
>>> +static int start_endpoints(struct snd_usb_substream *subs)
>>>  {
>>>  	int err;
>>>
>>> @@ -231,7 +231,7 @@ static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
>>>  		dev_dbg(&subs->dev->dev, "Starting data EP @%p\n", ep);
>>>
>>>  		ep->data_subs = subs;
>>> -		err = snd_usb_endpoint_start(ep, can_sleep);
>>> +		err = snd_usb_endpoint_start(ep);
>>>  		if (err < 0) {
>>>  			clear_bit(SUBSTREAM_FLAG_DATA_EP_STARTED, &subs->flags);
>>>  			return err;
>>> @@ -260,7 +260,7 @@ static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
>>>  		dev_dbg(&subs->dev->dev, "Starting sync EP @%p\n", ep);
>>>
>>>  		ep->sync_slave = subs->data_endpoint;
>>> -		err = snd_usb_endpoint_start(ep, can_sleep);
>>> +		err = snd_usb_endpoint_start(ep);
>>>  		if (err < 0) {
>>>  			clear_bit(SUBSTREAM_FLAG_SYNC_EP_STARTED, &subs->flags);
>>>  			return err;
>>> @@ -809,8 +809,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
>>>  		goto unlock;
>>>  	}
>>>
>>> -	snd_usb_endpoint_sync_pending_stop(subs->sync_endpoint);
>>> -	snd_usb_endpoint_sync_pending_stop(subs->data_endpoint);
>>> +	stop_endpoints(subs, true);
>>>
>>>  	ret = set_format(subs, subs->cur_audiofmt);
>>>  	if (ret < 0)
>>> @@ -850,7 +849,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
>>>  	/* for playback, submit the URBs now; otherwise, the first hwptr_done
>>>  	 * updates for all URBs would happen at the same time when starting */
>>>  	if (subs->direction == SNDRV_PCM_STREAM_PLAYBACK)
>>> -		ret = start_endpoints(subs, true);
>>> +		return start_endpoints(subs);
>>>
>>>   unlock:
>>>  	snd_usb_unlock_shutdown(subs->stream->chip);
>>> @@ -1666,7 +1665,7 @@ static int snd_usb_substream_capture_trigger(struct snd_pcm_substream *substream
>>>
>>>  	switch (cmd) {
>>>  	case SNDRV_PCM_TRIGGER_START:
>>> -		err = start_endpoints(subs, false);
>>> +		err = start_endpoints(subs);
>>>  		if (err < 0)
>>>  			return err;
>>
>> This patch works better, but not the best. It's a bit intrusive.
>
> I disagree. Being intrusive is not a good reason to reject a patch,
> especially if the alternative is to insert random magic number delays
> in hopes of synchronizing process and irq contexts to avoid critical
> errors.

I apologize if you got some negative opinions in my previous message.

My intention to show the ugly workaround is not to merge it instead of 
yours, but clear what causes this bug.

For example, we can also apply below patch for this bug, to solve it
roughly.

diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
index a2cdf33..6ec4eb1 100644
--- a/sound/usb/endpoint.c
+++ b/sound/usb/endpoint.c
@@ -162,7 +162,7 @@ int snd_usb_endpoint_next_packet_size(struct 
snd_usb_endpoint *ep)
  static void retire_outbound_urb(struct snd_usb_endpoint *ep,
                                 struct snd_urb_ctx *urb_ctx)
  {
-       if (ep->retire_data_urb)
+       if (ep->retire_data_urb && ep->data_subs)
                 ep->retire_data_urb(ep->data_subs, urb_ctx->urb);
  }

But this is just a workaround and can't fix it thoroughly. The main 
issue is synchronization between interrupt/process contexts.

> Please take the time to fully analyze my patch and let's have a
> discussion based on it, not reject it outright and replace it with
> a quick and dirty delay hack.

OK. I'll deliberately check it again so that I have no overlooks. I with 
this thread also catch the other developers enough helpful to you. (I 
just eventually caught your patch in LKML and not developer for this 
category of devices.)

> Ionel
>
>>
>> What we need to synchronize process contexts to interrupt context till all of
>> queued URBs are handled. This seems to be a main purpose of wait_clear_bits,
>> while current implementation of snd_complete_urb()/wait_clear_bits() is not
>> enough.
>>
>> Thanks for your reporting.
>>
>> Takashi Sakamoto
>> --
>> 2.9.3


Thanks

Takashi Sakamoto

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] ALSA: snd-usb: fix IRQ triggered NULL pointer dereference
  2016-12-20 13:04     ` Takashi Sakamoto
@ 2016-12-21 10:17       ` Takashi Iwai
  2016-12-21 10:38         ` Ioan-Adrian Ratiu
  0 siblings, 1 reply; 8+ messages in thread
From: Takashi Iwai @ 2016-12-21 10:17 UTC (permalink / raw)
  To: Takashi Sakamoto; +Cc: Ioan-Adrian Ratiu, clemens, alsa-devel, linux-kernel

On Tue, 20 Dec 2016 14:04:56 +0100,
Takashi Sakamoto wrote:
> 
> Hi,
> 
> On Dec 20 2016 20:47, Ioan-Adrian Ratiu wrote:
> > On Tue, 20 Dec 2016, Takashi Sakamoto <o-takashi@sakamocchi.jp> wrote:
> >> ---
> >>
> >> Hi,
> >>
> >>> Commit 16200948d835 ("ALSA: usb-audio: Fix race at stopping the stream")
> >>> fixes a race-codition but it also introduces another really nasty data race
> >>> regression which makes my usb sound card [1] completely useless, throwing
> >>> the kernel into a panic if anything from userspace tries to start playback.
> >>>
> >>> The problem is this: ep->data_subs is now set to NULL every time inside
> >>> wait_clear_urbs(). ep->data_subs is initalized only in one place in
> >>> start_endpoints(), then it is immediately wiped by the pre-existing call to
> >>> wait_clear_urbs() inside snd_usb_endpoint_start().
> >>>
> >>> To ilustrate, this is what happens in the non-irq part of the code:
> >>>
> >>> Step 1 (inside start_endpoints): ep->data_subs = subs;
> >>> Step 2 (inside start_endpoints): snd_usb_endpoint_start(ep, can_sleep);
> >>> Step 3 (inside snd_usb_endpoint_start): wait_clear_urbs(ep);
> >>> Step 4 (inside wait_clear_urbs): ep->data_subs = NULL;
> >>>
> >>> Here's a simplified call stack for the IRQ part (full one at the end):
> >>>
> >>> (NULL dereference, param "subs" is passed NULL value of ep->data_subs)
> >>> retire_playback_urb
> >>> retire_outbound_urb
> >>> snd_complete_urb
> >>> (...)
> >>> xhci_irq
> >>>
> >>> Looking at the git log there seems to be quite a lot of history in this
> >>> part of the codebase, dating back to 2012 or earlier. My fix is based on
> >>> 015618b90 ("ALSA: snd-usb: Fix URB cancellation at stream start") and
> >>> e9ba389c5 ("ALSA: usb-audio: Fix scheduling-while-atomic bug in PCM capture
> >>> stream") with a few modifications of my own. My idea is to do the
> >>> ep->data_subs wiping before endpoint initialization in a manner similar
> >>> to these older commits by using stop_endpoints() in snd_usb_pcm_prepare()
> >>> and at the same time keep the ep->data_subs = NULL in wait_clear_urbs() to
> >>> not trigger the recently fixed stream stopping race again.
> >>>
> >>> Full call stack:
> >>>
> >>> [  131.093240] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> >>> [  131.094313] IP: retire_playback_urb+0x1b/0x160 [snd_usb_audio]
> >>> [  131.095046] PGD 0
> >>> [  131.095047]
> >>> [  131.096509] Oops: 0000 [#1] PREEMPT SMP
> >>> [  131.097255] Modules linked in: fuse snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device ctr ccm arc4 ath9k intel_rapl ath9k_common x86_pkg_temp_thermal ath9k_hw intel_powerclamp coretemp joydev mousedev ath kvm_intel mac80211 kvm
> >>>  input_leds hid_generic psmouse irqbypass usbhid hid crct10dif_pclmul crc32_pclmul serio_raw crc32c_intel atkbd ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel pcbc aesni_intel libps2 cfg8
> >>> 0211 aes_x86_64 crypto_simd dell_laptop glue_helper r8169 dell_smbios snd_hda_codec sr_mod cryptd led_class mii rfkill snd_hwdep cdrom snd_hda_core ac dcdbas i8042 xhci_pci serio xhci_hcd snd_pcm tpm_tis battery tpm_tis_core lpc_ich tpm
> >>> snd_timer evdev shpchp i2c_i801 sch_fq_codel
> >>> [  131.105551] CPU: 2 PID: 165 Comm: irq/29-xhci_hcd Not tainted 4.9.0-gd824cdc58ba0 #10
> >>> [  131.107516] Hardware name: Dell Inc. Inspiron 3521/018DYG, BIOS A14 07/31/2015
> >>> [  131.109592] task: ffff880154a70000 task.stack: ffffc90000f48000
> >>> [  131.111746] RIP: 0010:retire_playback_urb+0x1b/0x160 [snd_usb_audio]
> >>> [  131.113899] RSP: 0018:ffffc90000f4bc10 EFLAGS: 00010082
> >>> [  131.116080] RAX: ffffffffa04cabe0 RBX: 0000000000000000 RCX: 0000000000000000
> >>> [  131.118284] RDX: 0000000000000000 RSI: ffff8801435a4c00 RDI: 0000000000000000
> >>> [  131.120505] RBP: ffffc90000f4bc40 R08: 0000000000000001 R09: 0000000000000001
> >>> [  131.122807] R10: 0000000000000001 R11: ffffffff82f60d6d R12: ffff8801432a0238
> >>> [  131.125265] R13: ffff8801435a4c00 R14: 0000000000000000 R15: ffff8801535c83b8
> >>> [  131.127723] FS:  0000000000000000(0000) GS:ffff88015a000000(0000) knlGS:0000000000000000
> >>> [  131.130217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [  131.132568] CR2: 0000000000000010 CR3: 0000000001e0b000 CR4: 00000000001406e0
> >>> [  131.135007] Call Trace:
> >>> [  131.137329]  snd_complete_urb+0x190/0x2b0 [snd_usb_audio]
> >>> [  131.139526]  __usb_hcd_giveback_urb+0x8e/0x160
> >>> [  131.141948]  usb_hcd_giveback_urb+0x43/0x110
> >>> [  131.144104]  xhci_giveback_urb_in_irq.isra.22+0x7d/0xb0 [xhci_hcd]
> >>> [  131.146521]  finish_td.constprop.38+0x1de/0x2f0 [xhci_hcd]
> >>> [  131.148736]  xhci_irq+0x13a2/0x1ca0 [xhci_hcd]
> >>> [  131.150972]  ? trace_hardirqs_on+0xd/0x10
> >>> [  131.153209]  ? _raw_spin_unlock_irq+0x2c/0x50
> >>> [  131.155390]  ? irq_thread+0xb5/0x1d0
> >>> [  131.157771]  xhci_msi_irq+0x11/0x20 [xhci_hcd]
> >>> [  131.159938]  irq_forced_thread_fn+0x2f/0x70
> >>> [  131.162072]  ? irq_thread+0xb5/0x1d0
> >>> [  131.164141]  irq_thread+0x149/0x1d0
> >>> [  131.166132]  ? irq_finalize_oneshot.part.2+0xe0/0xe0
> >>> [  131.168142]  ? wake_threads_waitq+0x30/0x30
> >>> [  131.170149]  kthread+0x10f/0x150
> >>> [  131.172144]  ? irq_thread_dtor+0xc0/0xc0
> >>> [  131.174139]  ? kthread_create_on_node+0x40/0x40
> >>> [  131.176116]  ret_from_fork+0x2a/0x40
> >>> [  131.178074] Code: 8b 77 64 4c 89 e7 e8 e5 fe ff ff eb c3 0f 1f 00 0f 1f 44 00 00 55 31 d2 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 ec 08 <48> 8b 47 10 48 8b 4f 70 48 c7 c7 88 7b 4d a0 4c 8b a0 78 01 00
> >>> [  131.182382] RIP: retire_playback_urb+0x1b/0x160 [snd_usb_audio] RSP: ffffc90000f4bc10
> >>> [  131.184562] CR2: 0000000000000010
> >>>
> >>> [1] 041e:3232 Creative Technology SoundBlaster X-FI HD
> >>> [2] http://mailman.alsa-project.org/pipermail/alsa-devel/2016-December/115425.html
> >>
> >> I can regenerate this bug with my EMU 0404 USB. My understanding of this bug is
> >> quite similar to your perspective. This bug is quite critical. In my
> >> understanding, we encounter this bug in all of cases in which snd-usb-audio is
> >> applied for PCM playback substream.
> >>
> >> Below workaround can also suppress this bug.
> >>
> >> diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
> >> index a2cdf33..9b34f76 100644
> >> --- a/sound/usb/endpoint.c
> >> +++ b/sound/usb/endpoint.c
> >> @@ -537,11 +537,14 @@ static int wait_clear_urbs(struct snd_usb_endpoint *ep)
> >>                         alive, ep->ep_num);
> >>         clear_bit(EP_FLAG_STOPPING, &ep->flags);
> >>
> >> -       ep->data_subs = NULL;
> >>         ep->sync_slave = NULL;
> >>         ep->retire_data_urb = NULL;
> >>         ep->prepare_data_urb = NULL;
> >>
> >> +       msleep(200);
> >> +
> >> +       ep->data_subs = NULL;
> >> +
> >>         return 0;
> >>  }
> >>
> >> If initialization of 'struct snd_usb_endpoint.data_subs' can be done after all
> >> of queued URBs are processed and corresponding complete interrupts are cought,
> >> we can solve this critical bug.
> >>
> >>> Signed-off-by: Ioan-Adrian Ratiu <adi@adirat.com>
> >>> ---
> >>>  sound/usb/endpoint.c | 11 ++---------
> >>>  sound/usb/endpoint.h |  2 +-
> >>>  sound/usb/pcm.c      | 13 ++++++-------
> >>>  3 files changed, 9 insertions(+), 17 deletions(-)
> >>>
> >>> diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
> >>> index a2cdf3370afe..4465f324c2c2 100644
> >>> --- a/sound/usb/endpoint.c
> >>> +++ b/sound/usb/endpoint.c
> >>> @@ -920,9 +920,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
> >>>  /**
> >>>   * snd_usb_endpoint_start: start an snd_usb_endpoint
> >>>   *
> >>> - * @ep:		the endpoint to start
> >>> - * @can_sleep:	flag indicating whether the operation is executed in
> >>> - * 		non-atomic context
> >>> + * @ep: the endpoint to start
> >>>   *
> >>>   * A call to this function will increment the use count of the endpoint.
> >>>   * In case it is not already running, the URBs for this endpoint will be
> >>> @@ -932,7 +930,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
> >>>   *
> >>>   * Returns an error if the URB submission failed, 0 in all other cases.
> >>>   */
> >>> -int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep)
> >>> +int snd_usb_endpoint_start(struct snd_usb_endpoint *ep)
> >>>  {
> >>>  	int err;
> >>>  	unsigned int i;
> >>> @@ -944,11 +942,6 @@ int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep)
> >>>  	if (++ep->use_count != 1)
> >>>  		return 0;
> >>>
> >>> -	/* just to be sure */
> >>> -	deactivate_urbs(ep, false);
> >>> -	if (can_sleep)
> >>> -		wait_clear_urbs(ep);
> >>> -
> >>>  	ep->active_mask = 0;
> >>>  	ep->unlink_mask = 0;
> >>>  	ep->phase = 0;
> >>> diff --git a/sound/usb/endpoint.h b/sound/usb/endpoint.h
> >>> index 6428392d8f62..584f295d7c77 100644
> >>> --- a/sound/usb/endpoint.h
> >>> +++ b/sound/usb/endpoint.h
> >>> @@ -18,7 +18,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
> >>>  				struct audioformat *fmt,
> >>>  				struct snd_usb_endpoint *sync_ep);
> >>>
> >>> -int  snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep);
> >>> +int  snd_usb_endpoint_start(struct snd_usb_endpoint *ep);
> >>>  void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep);
> >>>  void snd_usb_endpoint_sync_pending_stop(struct snd_usb_endpoint *ep);
> >>>  int  snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
> >>> diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
> >>> index 34c6d4f2c0b6..db26f767f851 100644
> >>> --- a/sound/usb/pcm.c
> >>> +++ b/sound/usb/pcm.c
> >>> @@ -218,7 +218,7 @@ int snd_usb_init_pitch(struct snd_usb_audio *chip, int iface,
> >>>  	}
> >>>  }
> >>>
> >>> -static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
> >>> +static int start_endpoints(struct snd_usb_substream *subs)
> >>>  {
> >>>  	int err;
> >>>
> >>> @@ -231,7 +231,7 @@ static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
> >>>  		dev_dbg(&subs->dev->dev, "Starting data EP @%p\n", ep);
> >>>
> >>>  		ep->data_subs = subs;
> >>> -		err = snd_usb_endpoint_start(ep, can_sleep);
> >>> +		err = snd_usb_endpoint_start(ep);
> >>>  		if (err < 0) {
> >>>  			clear_bit(SUBSTREAM_FLAG_DATA_EP_STARTED, &subs->flags);
> >>>  			return err;
> >>> @@ -260,7 +260,7 @@ static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
> >>>  		dev_dbg(&subs->dev->dev, "Starting sync EP @%p\n", ep);
> >>>
> >>>  		ep->sync_slave = subs->data_endpoint;
> >>> -		err = snd_usb_endpoint_start(ep, can_sleep);
> >>> +		err = snd_usb_endpoint_start(ep);
> >>>  		if (err < 0) {
> >>>  			clear_bit(SUBSTREAM_FLAG_SYNC_EP_STARTED, &subs->flags);
> >>>  			return err;
> >>> @@ -809,8 +809,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
> >>>  		goto unlock;
> >>>  	}
> >>>
> >>> -	snd_usb_endpoint_sync_pending_stop(subs->sync_endpoint);
> >>> -	snd_usb_endpoint_sync_pending_stop(subs->data_endpoint);
> >>> +	stop_endpoints(subs, true);
> >>>
> >>>  	ret = set_format(subs, subs->cur_audiofmt);
> >>>  	if (ret < 0)
> >>> @@ -850,7 +849,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
> >>>  	/* for playback, submit the URBs now; otherwise, the first hwptr_done
> >>>  	 * updates for all URBs would happen at the same time when starting */
> >>>  	if (subs->direction == SNDRV_PCM_STREAM_PLAYBACK)
> >>> -		ret = start_endpoints(subs, true);
> >>> +		return start_endpoints(subs);
> >>>
> >>>   unlock:
> >>>  	snd_usb_unlock_shutdown(subs->stream->chip);
> >>> @@ -1666,7 +1665,7 @@ static int snd_usb_substream_capture_trigger(struct snd_pcm_substream *substream
> >>>
> >>>  	switch (cmd) {
> >>>  	case SNDRV_PCM_TRIGGER_START:
> >>> -		err = start_endpoints(subs, false);
> >>> +		err = start_endpoints(subs);
> >>>  		if (err < 0)
> >>>  			return err;
> >>
> >> This patch works better, but not the best. It's a bit intrusive.
> >
> > I disagree. Being intrusive is not a good reason to reject a patch,
> > especially if the alternative is to insert random magic number delays
> > in hopes of synchronizing process and irq contexts to avoid critical
> > errors.
> 
> I apologize if you got some negative opinions in my previous message.
> 
> My intention to show the ugly workaround is not to merge it instead of
> yours, but clear what causes this bug.
> 
> For example, we can also apply below patch for this bug, to solve it
> roughly.
> 
> diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
> index a2cdf33..6ec4eb1 100644
> --- a/sound/usb/endpoint.c
> +++ b/sound/usb/endpoint.c
> @@ -162,7 +162,7 @@ int snd_usb_endpoint_next_packet_size(struct
> snd_usb_endpoint *ep)
>  static void retire_outbound_urb(struct snd_usb_endpoint *ep,
>                                 struct snd_urb_ctx *urb_ctx)
>  {
> -       if (ep->retire_data_urb)
> +       if (ep->retire_data_urb && ep->data_subs)
>                 ep->retire_data_urb(ep->data_subs, urb_ctx->urb);
>  }
> 
> But this is just a workaround and can't fix it thoroughly. The main
> issue is synchronization between interrupt/process contexts.
> 
> > Please take the time to fully analyze my patch and let's have a
> > discussion based on it, not reject it outright and replace it with
> > a quick and dirty delay hack.
> 
> OK. I'll deliberately check it again so that I have no overlooks. I
> with this thread also catch the other developers enough helpful to
> you. (I just eventually caught your patch in LKML and not developer
> for this category of devices.)

Sorry for the late reply, as I've been (still) off and had bad net
connections.

About your fix: Sakamoto-san is right, it's no good way to fix this
kind or problem.  The easiest option right now is just to revert my
previous fix, as it obviously introduces another regression.  The
correct fix will be taken after that.

I'm going to prepare a revert patch and ask Linus to take it before
rc1.


thanks,

Takashi



> 
> > Ionel
> >
> >>
> >> What we need to synchronize process contexts to interrupt context till all of
> >> queued URBs are handled. This seems to be a main purpose of wait_clear_bits,
> >> while current implementation of snd_complete_urb()/wait_clear_bits() is not
> >> enough.
> >>
> >> Thanks for your reporting.
> >>
> >> Takashi Sakamoto
> >> --
> >> 2.9.3
> 
> 
> Thanks
> 
> Takashi Sakamoto
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ALSA: snd-usb: fix IRQ triggered NULL pointer dereference
  2016-12-21 10:17       ` Takashi Iwai
@ 2016-12-21 10:38         ` Ioan-Adrian Ratiu
  2016-12-30 14:39           ` Takashi Iwai
  0 siblings, 1 reply; 8+ messages in thread
From: Ioan-Adrian Ratiu @ 2016-12-21 10:38 UTC (permalink / raw)
  To: Takashi Iwai, Takashi Sakamoto; +Cc: clemens, alsa-devel, linux-kernel

Hi

On Wed, 21 Dec 2016, Takashi Iwai <tiwai@suse.de> wrote:
> On Tue, 20 Dec 2016 14:04:56 +0100,
> Takashi Sakamoto wrote:
>> 
>> Hi,
>> 
>> On Dec 20 2016 20:47, Ioan-Adrian Ratiu wrote:
>> > On Tue, 20 Dec 2016, Takashi Sakamoto <o-takashi@sakamocchi.jp> wrote:
>> >> ---
>> >>
>> >> Hi,
>> >>
>> >>> Commit 16200948d835 ("ALSA: usb-audio: Fix race at stopping the stream")
>> >>> fixes a race-codition but it also introduces another really nasty data race
>> >>> regression which makes my usb sound card [1] completely useless, throwing
>> >>> the kernel into a panic if anything from userspace tries to start playback.
>> >>>
>> >>> The problem is this: ep->data_subs is now set to NULL every time inside
>> >>> wait_clear_urbs(). ep->data_subs is initalized only in one place in
>> >>> start_endpoints(), then it is immediately wiped by the pre-existing call to
>> >>> wait_clear_urbs() inside snd_usb_endpoint_start().
>> >>>
>> >>> To ilustrate, this is what happens in the non-irq part of the code:
>> >>>
>> >>> Step 1 (inside start_endpoints): ep->data_subs = subs;
>> >>> Step 2 (inside start_endpoints): snd_usb_endpoint_start(ep, can_sleep);
>> >>> Step 3 (inside snd_usb_endpoint_start): wait_clear_urbs(ep);
>> >>> Step 4 (inside wait_clear_urbs): ep->data_subs = NULL;
>> >>>
>> >>> Here's a simplified call stack for the IRQ part (full one at the end):
>> >>>
>> >>> (NULL dereference, param "subs" is passed NULL value of ep->data_subs)
>> >>> retire_playback_urb
>> >>> retire_outbound_urb
>> >>> snd_complete_urb
>> >>> (...)
>> >>> xhci_irq
>> >>>
>> >>> Looking at the git log there seems to be quite a lot of history in this
>> >>> part of the codebase, dating back to 2012 or earlier. My fix is based on
>> >>> 015618b90 ("ALSA: snd-usb: Fix URB cancellation at stream start") and
>> >>> e9ba389c5 ("ALSA: usb-audio: Fix scheduling-while-atomic bug in PCM capture
>> >>> stream") with a few modifications of my own. My idea is to do the
>> >>> ep->data_subs wiping before endpoint initialization in a manner similar
>> >>> to these older commits by using stop_endpoints() in snd_usb_pcm_prepare()
>> >>> and at the same time keep the ep->data_subs = NULL in wait_clear_urbs() to
>> >>> not trigger the recently fixed stream stopping race again.
>> >>>
>> >>> Full call stack:
>> >>>
>> >>> [  131.093240] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
>> >>> [  131.094313] IP: retire_playback_urb+0x1b/0x160 [snd_usb_audio]
>> >>> [  131.095046] PGD 0
>> >>> [  131.095047]
>> >>> [  131.096509] Oops: 0000 [#1] PREEMPT SMP
>> >>> [  131.097255] Modules linked in: fuse snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device ctr ccm arc4 ath9k intel_rapl ath9k_common x86_pkg_temp_thermal ath9k_hw intel_powerclamp coretemp joydev mousedev ath kvm_intel mac80211 kvm
>> >>>  input_leds hid_generic psmouse irqbypass usbhid hid crct10dif_pclmul crc32_pclmul serio_raw crc32c_intel atkbd ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel pcbc aesni_intel libps2 cfg8
>> >>> 0211 aes_x86_64 crypto_simd dell_laptop glue_helper r8169 dell_smbios snd_hda_codec sr_mod cryptd led_class mii rfkill snd_hwdep cdrom snd_hda_core ac dcdbas i8042 xhci_pci serio xhci_hcd snd_pcm tpm_tis battery tpm_tis_core lpc_ich tpm
>> >>> snd_timer evdev shpchp i2c_i801 sch_fq_codel
>> >>> [  131.105551] CPU: 2 PID: 165 Comm: irq/29-xhci_hcd Not tainted 4.9.0-gd824cdc58ba0 #10
>> >>> [  131.107516] Hardware name: Dell Inc. Inspiron 3521/018DYG, BIOS A14 07/31/2015
>> >>> [  131.109592] task: ffff880154a70000 task.stack: ffffc90000f48000
>> >>> [  131.111746] RIP: 0010:retire_playback_urb+0x1b/0x160 [snd_usb_audio]
>> >>> [  131.113899] RSP: 0018:ffffc90000f4bc10 EFLAGS: 00010082
>> >>> [  131.116080] RAX: ffffffffa04cabe0 RBX: 0000000000000000 RCX: 0000000000000000
>> >>> [  131.118284] RDX: 0000000000000000 RSI: ffff8801435a4c00 RDI: 0000000000000000
>> >>> [  131.120505] RBP: ffffc90000f4bc40 R08: 0000000000000001 R09: 0000000000000001
>> >>> [  131.122807] R10: 0000000000000001 R11: ffffffff82f60d6d R12: ffff8801432a0238
>> >>> [  131.125265] R13: ffff8801435a4c00 R14: 0000000000000000 R15: ffff8801535c83b8
>> >>> [  131.127723] FS:  0000000000000000(0000) GS:ffff88015a000000(0000) knlGS:0000000000000000
>> >>> [  131.130217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >>> [  131.132568] CR2: 0000000000000010 CR3: 0000000001e0b000 CR4: 00000000001406e0
>> >>> [  131.135007] Call Trace:
>> >>> [  131.137329]  snd_complete_urb+0x190/0x2b0 [snd_usb_audio]
>> >>> [  131.139526]  __usb_hcd_giveback_urb+0x8e/0x160
>> >>> [  131.141948]  usb_hcd_giveback_urb+0x43/0x110
>> >>> [  131.144104]  xhci_giveback_urb_in_irq.isra.22+0x7d/0xb0 [xhci_hcd]
>> >>> [  131.146521]  finish_td.constprop.38+0x1de/0x2f0 [xhci_hcd]
>> >>> [  131.148736]  xhci_irq+0x13a2/0x1ca0 [xhci_hcd]
>> >>> [  131.150972]  ? trace_hardirqs_on+0xd/0x10
>> >>> [  131.153209]  ? _raw_spin_unlock_irq+0x2c/0x50
>> >>> [  131.155390]  ? irq_thread+0xb5/0x1d0
>> >>> [  131.157771]  xhci_msi_irq+0x11/0x20 [xhci_hcd]
>> >>> [  131.159938]  irq_forced_thread_fn+0x2f/0x70
>> >>> [  131.162072]  ? irq_thread+0xb5/0x1d0
>> >>> [  131.164141]  irq_thread+0x149/0x1d0
>> >>> [  131.166132]  ? irq_finalize_oneshot.part.2+0xe0/0xe0
>> >>> [  131.168142]  ? wake_threads_waitq+0x30/0x30
>> >>> [  131.170149]  kthread+0x10f/0x150
>> >>> [  131.172144]  ? irq_thread_dtor+0xc0/0xc0
>> >>> [  131.174139]  ? kthread_create_on_node+0x40/0x40
>> >>> [  131.176116]  ret_from_fork+0x2a/0x40
>> >>> [  131.178074] Code: 8b 77 64 4c 89 e7 e8 e5 fe ff ff eb c3 0f 1f 00 0f 1f 44 00 00 55 31 d2 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 ec 08 <48> 8b 47 10 48 8b 4f 70 48 c7 c7 88 7b 4d a0 4c 8b a0 78 01 00
>> >>> [  131.182382] RIP: retire_playback_urb+0x1b/0x160 [snd_usb_audio] RSP: ffffc90000f4bc10
>> >>> [  131.184562] CR2: 0000000000000010
>> >>>
>> >>> [1] 041e:3232 Creative Technology SoundBlaster X-FI HD
>> >>> [2] http://mailman.alsa-project.org/pipermail/alsa-devel/2016-December/115425.html
>> >>
>> >> I can regenerate this bug with my EMU 0404 USB. My understanding of this bug is
>> >> quite similar to your perspective. This bug is quite critical. In my
>> >> understanding, we encounter this bug in all of cases in which snd-usb-audio is
>> >> applied for PCM playback substream.
>> >>
>> >> Below workaround can also suppress this bug.
>> >>
>> >> diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
>> >> index a2cdf33..9b34f76 100644
>> >> --- a/sound/usb/endpoint.c
>> >> +++ b/sound/usb/endpoint.c
>> >> @@ -537,11 +537,14 @@ static int wait_clear_urbs(struct snd_usb_endpoint *ep)
>> >>                         alive, ep->ep_num);
>> >>         clear_bit(EP_FLAG_STOPPING, &ep->flags);
>> >>
>> >> -       ep->data_subs = NULL;
>> >>         ep->sync_slave = NULL;
>> >>         ep->retire_data_urb = NULL;
>> >>         ep->prepare_data_urb = NULL;
>> >>
>> >> +       msleep(200);
>> >> +
>> >> +       ep->data_subs = NULL;
>> >> +
>> >>         return 0;
>> >>  }
>> >>
>> >> If initialization of 'struct snd_usb_endpoint.data_subs' can be done after all
>> >> of queued URBs are processed and corresponding complete interrupts are cought,
>> >> we can solve this critical bug.
>> >>
>> >>> Signed-off-by: Ioan-Adrian Ratiu <adi@adirat.com>
>> >>> ---
>> >>>  sound/usb/endpoint.c | 11 ++---------
>> >>>  sound/usb/endpoint.h |  2 +-
>> >>>  sound/usb/pcm.c      | 13 ++++++-------
>> >>>  3 files changed, 9 insertions(+), 17 deletions(-)
>> >>>
>> >>> diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
>> >>> index a2cdf3370afe..4465f324c2c2 100644
>> >>> --- a/sound/usb/endpoint.c
>> >>> +++ b/sound/usb/endpoint.c
>> >>> @@ -920,9 +920,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
>> >>>  /**
>> >>>   * snd_usb_endpoint_start: start an snd_usb_endpoint
>> >>>   *
>> >>> - * @ep:		the endpoint to start
>> >>> - * @can_sleep:	flag indicating whether the operation is executed in
>> >>> - * 		non-atomic context
>> >>> + * @ep: the endpoint to start
>> >>>   *
>> >>>   * A call to this function will increment the use count of the endpoint.
>> >>>   * In case it is not already running, the URBs for this endpoint will be
>> >>> @@ -932,7 +930,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
>> >>>   *
>> >>>   * Returns an error if the URB submission failed, 0 in all other cases.
>> >>>   */
>> >>> -int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep)
>> >>> +int snd_usb_endpoint_start(struct snd_usb_endpoint *ep)
>> >>>  {
>> >>>  	int err;
>> >>>  	unsigned int i;
>> >>> @@ -944,11 +942,6 @@ int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep)
>> >>>  	if (++ep->use_count != 1)
>> >>>  		return 0;
>> >>>
>> >>> -	/* just to be sure */
>> >>> -	deactivate_urbs(ep, false);
>> >>> -	if (can_sleep)
>> >>> -		wait_clear_urbs(ep);
>> >>> -
>> >>>  	ep->active_mask = 0;
>> >>>  	ep->unlink_mask = 0;
>> >>>  	ep->phase = 0;
>> >>> diff --git a/sound/usb/endpoint.h b/sound/usb/endpoint.h
>> >>> index 6428392d8f62..584f295d7c77 100644
>> >>> --- a/sound/usb/endpoint.h
>> >>> +++ b/sound/usb/endpoint.h
>> >>> @@ -18,7 +18,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
>> >>>  				struct audioformat *fmt,
>> >>>  				struct snd_usb_endpoint *sync_ep);
>> >>>
>> >>> -int  snd_usb_endpoint_start(struct snd_usb_endpoint *ep, bool can_sleep);
>> >>> +int  snd_usb_endpoint_start(struct snd_usb_endpoint *ep);
>> >>>  void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep);
>> >>>  void snd_usb_endpoint_sync_pending_stop(struct snd_usb_endpoint *ep);
>> >>>  int  snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
>> >>> diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
>> >>> index 34c6d4f2c0b6..db26f767f851 100644
>> >>> --- a/sound/usb/pcm.c
>> >>> +++ b/sound/usb/pcm.c
>> >>> @@ -218,7 +218,7 @@ int snd_usb_init_pitch(struct snd_usb_audio *chip, int iface,
>> >>>  	}
>> >>>  }
>> >>>
>> >>> -static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
>> >>> +static int start_endpoints(struct snd_usb_substream *subs)
>> >>>  {
>> >>>  	int err;
>> >>>
>> >>> @@ -231,7 +231,7 @@ static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
>> >>>  		dev_dbg(&subs->dev->dev, "Starting data EP @%p\n", ep);
>> >>>
>> >>>  		ep->data_subs = subs;
>> >>> -		err = snd_usb_endpoint_start(ep, can_sleep);
>> >>> +		err = snd_usb_endpoint_start(ep);
>> >>>  		if (err < 0) {
>> >>>  			clear_bit(SUBSTREAM_FLAG_DATA_EP_STARTED, &subs->flags);
>> >>>  			return err;
>> >>> @@ -260,7 +260,7 @@ static int start_endpoints(struct snd_usb_substream *subs, bool can_sleep)
>> >>>  		dev_dbg(&subs->dev->dev, "Starting sync EP @%p\n", ep);
>> >>>
>> >>>  		ep->sync_slave = subs->data_endpoint;
>> >>> -		err = snd_usb_endpoint_start(ep, can_sleep);
>> >>> +		err = snd_usb_endpoint_start(ep);
>> >>>  		if (err < 0) {
>> >>>  			clear_bit(SUBSTREAM_FLAG_SYNC_EP_STARTED, &subs->flags);
>> >>>  			return err;
>> >>> @@ -809,8 +809,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
>> >>>  		goto unlock;
>> >>>  	}
>> >>>
>> >>> -	snd_usb_endpoint_sync_pending_stop(subs->sync_endpoint);
>> >>> -	snd_usb_endpoint_sync_pending_stop(subs->data_endpoint);
>> >>> +	stop_endpoints(subs, true);
>> >>>
>> >>>  	ret = set_format(subs, subs->cur_audiofmt);
>> >>>  	if (ret < 0)
>> >>> @@ -850,7 +849,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
>> >>>  	/* for playback, submit the URBs now; otherwise, the first hwptr_done
>> >>>  	 * updates for all URBs would happen at the same time when starting */
>> >>>  	if (subs->direction == SNDRV_PCM_STREAM_PLAYBACK)
>> >>> -		ret = start_endpoints(subs, true);
>> >>> +		return start_endpoints(subs);
>> >>>
>> >>>   unlock:
>> >>>  	snd_usb_unlock_shutdown(subs->stream->chip);
>> >>> @@ -1666,7 +1665,7 @@ static int snd_usb_substream_capture_trigger(struct snd_pcm_substream *substream
>> >>>
>> >>>  	switch (cmd) {
>> >>>  	case SNDRV_PCM_TRIGGER_START:
>> >>> -		err = start_endpoints(subs, false);
>> >>> +		err = start_endpoints(subs);
>> >>>  		if (err < 0)
>> >>>  			return err;
>> >>
>> >> This patch works better, but not the best. It's a bit intrusive.
>> >
>> > I disagree. Being intrusive is not a good reason to reject a patch,
>> > especially if the alternative is to insert random magic number delays
>> > in hopes of synchronizing process and irq contexts to avoid critical
>> > errors.
>> 
>> I apologize if you got some negative opinions in my previous message.
>> 
>> My intention to show the ugly workaround is not to merge it instead of
>> yours, but clear what causes this bug.
>> 
>> For example, we can also apply below patch for this bug, to solve it
>> roughly.
>> 
>> diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
>> index a2cdf33..6ec4eb1 100644
>> --- a/sound/usb/endpoint.c
>> +++ b/sound/usb/endpoint.c
>> @@ -162,7 +162,7 @@ int snd_usb_endpoint_next_packet_size(struct
>> snd_usb_endpoint *ep)
>>  static void retire_outbound_urb(struct snd_usb_endpoint *ep,
>>                                 struct snd_urb_ctx *urb_ctx)
>>  {
>> -       if (ep->retire_data_urb)
>> +       if (ep->retire_data_urb && ep->data_subs)
>>                 ep->retire_data_urb(ep->data_subs, urb_ctx->urb);
>>  }
>> 
>> But this is just a workaround and can't fix it thoroughly. The main
>> issue is synchronization between interrupt/process contexts.
>> 
>> > Please take the time to fully analyze my patch and let's have a
>> > discussion based on it, not reject it outright and replace it with
>> > a quick and dirty delay hack.
>> 
>> OK. I'll deliberately check it again so that I have no overlooks. I
>> with this thread also catch the other developers enough helpful to
>> you. (I just eventually caught your patch in LKML and not developer
>> for this category of devices.)
>
> Sorry for the late reply, as I've been (still) off and had bad net
> connections.
>
> About your fix: Sakamoto-san is right, it's no good way to fix this
> kind or problem.  The easiest option right now is just to revert my
> previous fix, as it obviously introduces another regression.  The
> correct fix will be taken after that.
>
> I'm going to prepare a revert patch and ask Linus to take it before
> rc1.

I agree with reverting the initial commit decision because my problem
disappears with it.

But can you please state a reason for why my patch is "no good way to
fix"? Being too intrusive is not a good reason.

Ionel

>
>
> thanks,
>
> Takashi
>
>
>
>> 
>> > Ionel
>> >
>> >>
>> >> What we need to synchronize process contexts to interrupt context till all of
>> >> queued URBs are handled. This seems to be a main purpose of wait_clear_bits,
>> >> while current implementation of snd_complete_urb()/wait_clear_bits() is not
>> >> enough.
>> >>
>> >> Thanks for your reporting.
>> >>
>> >> Takashi Sakamoto
>> >> --
>> >> 2.9.3
>> 
>> 
>> Thanks
>> 
>> Takashi Sakamoto
>> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ALSA: snd-usb: fix IRQ triggered NULL pointer dereference
  2016-12-21 10:38         ` Ioan-Adrian Ratiu
@ 2016-12-30 14:39           ` Takashi Iwai
  2016-12-30 21:00             ` Ioan-Adrian Ratiu
  0 siblings, 1 reply; 8+ messages in thread
From: Takashi Iwai @ 2016-12-30 14:39 UTC (permalink / raw)
  To: Ioan-Adrian Ratiu; +Cc: Takashi Sakamoto, clemens, alsa-devel, linux-kernel

On Wed, 21 Dec 2016 11:38:54 +0100,
Ioan-Adrian Ratiu wrote:
> 
> >> > Please take the time to fully analyze my patch and let's have a
> >> > discussion based on it, not reject it outright and replace it with
> >> > a quick and dirty delay hack.
> >> 
> >> OK. I'll deliberately check it again so that I have no overlooks. I
> >> with this thread also catch the other developers enough helpful to
> >> you. (I just eventually caught your patch in LKML and not developer
> >> for this category of devices.)
> >
> > Sorry for the late reply, as I've been (still) off and had bad net
> > connections.
> >
> > About your fix: Sakamoto-san is right, it's no good way to fix this
> > kind or problem.  The easiest option right now is just to revert my
> > previous fix, as it obviously introduces another regression.  The
> > correct fix will be taken after that.
> >
> > I'm going to prepare a revert patch and ask Linus to take it before
> > rc1.
> 
> I agree with reverting the initial commit decision because my problem
> disappears with it.
> 
> But can you please state a reason for why my patch is "no good way to
> fix"? Being too intrusive is not a good reason.

"Being too intrusive" is the exact reason why it's not good as a
"regression fix" like this case.  The logic you've implemented in the
patch itself looks good (although the code introduces a bug, the
unbalance of snd_usb_*lock_shutdown()).  The only point I couldn't
take it is that it's rather a fundamental change, not a quick fix for
a regression.

What's the worst case scenario in a regression fix?  It's when a fix
introduces yet another regression.  It'd be much worse if the new
regression is deeper.  The complicated logical change has a potential
risk of such.  Thus an intrusive change is not always good as a
"regression fix".

In general, if the change were trivial, it's obviously OK to take as a
regression fix -- where the logic is also trivial in most cases, too.
But when the fix becomes complex, we need to rethink.  Especially when
the original buggy commit is small, reverting it is often better as a
clear cut.

Think in that way: you're addressing a deeper problem that was
revealed accidentally by my previous bad fix.  Writing the change as
if it were merely a regression fix gives the wrong understanding to
readers :)

That said, I'd appreciate if you respin the fix again with a
combination of my previous fix and brush up the code a bit as a whole,
and document it not as a regression fix but as a complete fix of the
existing races.  I can write it in my side quickly, but it'd be almost
in the same form as you posted, I suppose.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ALSA: snd-usb: fix IRQ triggered NULL pointer dereference
  2016-12-30 14:39           ` Takashi Iwai
@ 2016-12-30 21:00             ` Ioan-Adrian Ratiu
  0 siblings, 0 replies; 8+ messages in thread
From: Ioan-Adrian Ratiu @ 2016-12-30 21:00 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Takashi Sakamoto, clemens, alsa-devel, linux-kernel

On Fri, 30 Dec 2016, Takashi Iwai <tiwai@suse.de> wrote:
> On Wed, 21 Dec 2016 11:38:54 +0100,
> Ioan-Adrian Ratiu wrote:
>> 
>> >> > Please take the time to fully analyze my patch and let's have a
>> >> > discussion based on it, not reject it outright and replace it with
>> >> > a quick and dirty delay hack.
>> >> 
>> >> OK. I'll deliberately check it again so that I have no overlooks. I
>> >> with this thread also catch the other developers enough helpful to
>> >> you. (I just eventually caught your patch in LKML and not developer
>> >> for this category of devices.)
>> >
>> > Sorry for the late reply, as I've been (still) off and had bad net
>> > connections.
>> >
>> > About your fix: Sakamoto-san is right, it's no good way to fix this
>> > kind or problem.  The easiest option right now is just to revert my
>> > previous fix, as it obviously introduces another regression.  The
>> > correct fix will be taken after that.
>> >
>> > I'm going to prepare a revert patch and ask Linus to take it before
>> > rc1.
>> 
>> I agree with reverting the initial commit decision because my problem
>> disappears with it.
>> 
>> But can you please state a reason for why my patch is "no good way to
>> fix"? Being too intrusive is not a good reason.
>
> "Being too intrusive" is the exact reason why it's not good as a
> "regression fix" like this case.  The logic you've implemented in the
> patch itself looks good (although the code introduces a bug, the
> unbalance of snd_usb_*lock_shutdown()).  The only point I couldn't
> take it is that it's rather a fundamental change, not a quick fix for
> a regression.
>
> What's the worst case scenario in a regression fix?  It's when a fix
> introduces yet another regression.  It'd be much worse if the new
> regression is deeper.  The complicated logical change has a potential
> risk of such.  Thus an intrusive change is not always good as a
> "regression fix".
>
> In general, if the change were trivial, it's obviously OK to take as a
> regression fix -- where the logic is also trivial in most cases, too.
> But when the fix becomes complex, we need to rethink.  Especially when
> the original buggy commit is small, reverting it is often better as a
> clear cut.
>
> Think in that way: you're addressing a deeper problem that was
> revealed accidentally by my previous bad fix.  Writing the change as
> if it were merely a regression fix gives the wrong understanding to
> readers :)

Yes, this makes sense. Thank you.

>
> That said, I'd appreciate if you respin the fix again with a
> combination of my previous fix and brush up the code a bit as a whole,
> and document it not as a regression fix but as a complete fix of the
> existing races.  I can write it in my side quickly, but it'd be almost
> in the same form as you posted, I suppose.

I'll find some time to do a rebase either tomorrow (31 Dec) or until
2 Jan at the latest.

Ionel

>
>
> thanks,
>
> Takashi

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-12-30 20:59 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-18  1:08 [PATCH] ALSA: snd-usb: fix IRQ triggered NULL pointer dereference Ioan-Adrian Ratiu
2016-12-20  1:25 ` Takashi Sakamoto
2016-12-20 11:47   ` Ioan-Adrian Ratiu
2016-12-20 13:04     ` Takashi Sakamoto
2016-12-21 10:17       ` Takashi Iwai
2016-12-21 10:38         ` Ioan-Adrian Ratiu
2016-12-30 14:39           ` Takashi Iwai
2016-12-30 21:00             ` Ioan-Adrian Ratiu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).