All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer
@ 2019-04-30  6:10 ` Song liwei
  0 siblings, 0 replies; 12+ messages in thread
From: Song liwei @ 2019-04-30  6:10 UTC (permalink / raw)
  To: alsa-devel
  Cc: Jaroslav Kysela, Takashi Iwai, Yu Zhao, Mark Brown, Keyon Jie,
	linux-kernel, LiweiSong

From: Liwei Song <liwei.song@windriver.com>

Fix the following BUG:

BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
Workqueue: events azx_probe_work [snd_hda_intel]
RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
Call Trace:
 <IRQ>
 azx_interrupt+0x78/0x140 [snd_hda_codec]
 __handle_irq_event_percpu+0x49/0x300
 handle_irq_event_percpu+0x23/0x60
 handle_irq_event+0x3c/0x60
 handle_edge_irq+0xdb/0x180
 handle_irq+0x23/0x30
 do_IRQ+0x6a/0x140
 common_interrupt+0xf/0xf

The Call Trace happened when run kdump on a NFS rootfs system.
Exist the following calling sequence when boot the second kernel:

azx_first_init()
   --> azx_acquire_irq()
                      <-- interrupt come in, azx_interrupt() was called
   --> hda_intel_init_chip()
      --> azx_init_chip()
         --> snd_hdac_bus_init_chip()
              --> snd_hdac_bus_init_cmd_io();
                    --> init rirb.buf and corb.buf

Interrupt happened after azx_acquire_irq() while RIRB still didn't got
initialized, then NULL pointer will be used when process the interrupt.

Check the value of RIRB to ensure it is not NULL, to aviod some special
case may hang the system.

Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
Signed-off-by: Liwei Song <liwei.song@windriver.com>
---
 sound/hda/hdac_controller.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sound/hda/hdac_controller.c b/sound/hda/hdac_controller.c
index 74244d8e2909..2f0fa5353361 100644
--- a/sound/hda/hdac_controller.c
+++ b/sound/hda/hdac_controller.c
@@ -195,6 +195,9 @@ void snd_hdac_bus_update_rirb(struct hdac_bus *bus)
 		return;
 	bus->rirb.wp = wp;
 
+	if (!bus->rirb.buf)
+		return;
+
 	while (bus->rirb.rp != wp) {
 		bus->rirb.rp++;
 		bus->rirb.rp %= AZX_MAX_RIRB_ENTRIES;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer
@ 2019-04-30  6:10 ` Song liwei
  0 siblings, 0 replies; 12+ messages in thread
From: Song liwei @ 2019-04-30  6:10 UTC (permalink / raw)
  To: alsa-devel
  Cc: Jaroslav Kysela, Takashi Iwai, Yu Zhao, Mark Brown, Keyon Jie,
	linux-kernel, LiweiSong

From: Liwei Song <liwei.song@windriver.com>

Fix the following BUG:

BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
Workqueue: events azx_probe_work [snd_hda_intel]
RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
Call Trace:
 <IRQ>
 azx_interrupt+0x78/0x140 [snd_hda_codec]
 __handle_irq_event_percpu+0x49/0x300
 handle_irq_event_percpu+0x23/0x60
 handle_irq_event+0x3c/0x60
 handle_edge_irq+0xdb/0x180
 handle_irq+0x23/0x30
 do_IRQ+0x6a/0x140
 common_interrupt+0xf/0xf

The Call Trace happened when run kdump on a NFS rootfs system.
Exist the following calling sequence when boot the second kernel:

azx_first_init()
   --> azx_acquire_irq()
                      <-- interrupt come in, azx_interrupt() was called
   --> hda_intel_init_chip()
      --> azx_init_chip()
         --> snd_hdac_bus_init_chip()
              --> snd_hdac_bus_init_cmd_io();
                    --> init rirb.buf and corb.buf

Interrupt happened after azx_acquire_irq() while RIRB still didn't got
initialized, then NULL pointer will be used when process the interrupt.

Check the value of RIRB to ensure it is not NULL, to aviod some special
case may hang the system.

Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
Signed-off-by: Liwei Song <liwei.song@windriver.com>
---
 sound/hda/hdac_controller.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sound/hda/hdac_controller.c b/sound/hda/hdac_controller.c
index 74244d8e2909..2f0fa5353361 100644
--- a/sound/hda/hdac_controller.c
+++ b/sound/hda/hdac_controller.c
@@ -195,6 +195,9 @@ void snd_hdac_bus_update_rirb(struct hdac_bus *bus)
 		return;
 	bus->rirb.wp = wp;
 
+	if (!bus->rirb.buf)
+		return;
+
 	while (bus->rirb.rp != wp) {
 		bus->rirb.rp++;
 		bus->rirb.rp %= AZX_MAX_RIRB_ENTRIES;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer
  2019-04-30  6:10 ` Song liwei
@ 2019-04-30  7:31   ` Takashi Iwai
  -1 siblings, 0 replies; 12+ messages in thread
From: Takashi Iwai @ 2019-04-30  7:31 UTC (permalink / raw)
  To: Song liwei
  Cc: alsa-devel, Yu Zhao, Mark Brown, Keyon Jie, Jaroslav Kysela,
	linux-kernel

On Tue, 30 Apr 2019 08:10:53 +0200,
Song liwei wrote:
> 
> From: Liwei Song <liwei.song@windriver.com>
> 
> Fix the following BUG:
> 
> BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
> Workqueue: events azx_probe_work [snd_hda_intel]
> RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
> Call Trace:
>  <IRQ>
>  azx_interrupt+0x78/0x140 [snd_hda_codec]
>  __handle_irq_event_percpu+0x49/0x300
>  handle_irq_event_percpu+0x23/0x60
>  handle_irq_event+0x3c/0x60
>  handle_edge_irq+0xdb/0x180
>  handle_irq+0x23/0x30
>  do_IRQ+0x6a/0x140
>  common_interrupt+0xf/0xf
> 
> The Call Trace happened when run kdump on a NFS rootfs system.
> Exist the following calling sequence when boot the second kernel:
> 
> azx_first_init()
>    --> azx_acquire_irq()
>                       <-- interrupt come in, azx_interrupt() was called
>    --> hda_intel_init_chip()
>       --> azx_init_chip()
>          --> snd_hdac_bus_init_chip()
>               --> snd_hdac_bus_init_cmd_io();
>                     --> init rirb.buf and corb.buf
> 
> Interrupt happened after azx_acquire_irq() while RIRB still didn't got
> initialized, then NULL pointer will be used when process the interrupt.
> 
> Check the value of RIRB to ensure it is not NULL, to aviod some special
> case may hang the system.
> 
> Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
> Signed-off-by: Liwei Song <liwei.song@windriver.com>

Oh, that's indeed a race there.

But I guess the check introduced by the patch is still error-prone.
Basically the interrupt handling should be moved after the chip
initialization.  I suppose that your platform uses the shared
interrupt, not the MSI?

In anyway, alternative (and likely more certain) fix would be to move
the azx_acquir_irq() call like the patch below (note: totally
untested).  Could you check whether it works?


thanks,

Takashi


--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1788,9 +1788,6 @@ static int azx_first_init(struct azx *chip)
 			chip->msi = 0;
 	}
 
-	if (azx_acquire_irq(chip, 0) < 0)
-		return -EBUSY;
-
 	pci_set_master(pci);
 	synchronize_irq(bus->irq);
 
@@ -1904,6 +1901,9 @@ static int azx_first_init(struct azx *chip)
 		return -ENODEV;
 	}
 
+	if (azx_acquire_irq(chip, 0) < 0)
+		return -EBUSY;
+
 	strcpy(card->driver, "HDA-Intel");
 	strlcpy(card->shortname, driver_short_names[chip->driver_type],
 		sizeof(card->shortname));

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer
@ 2019-04-30  7:31   ` Takashi Iwai
  0 siblings, 0 replies; 12+ messages in thread
From: Takashi Iwai @ 2019-04-30  7:31 UTC (permalink / raw)
  To: Song liwei
  Cc: alsa-devel, Yu Zhao, Mark Brown, Keyon Jie, Jaroslav Kysela,
	linux-kernel

On Tue, 30 Apr 2019 08:10:53 +0200,
Song liwei wrote:
> 
> From: Liwei Song <liwei.song@windriver.com>
> 
> Fix the following BUG:
> 
> BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
> Workqueue: events azx_probe_work [snd_hda_intel]
> RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
> Call Trace:
>  <IRQ>
>  azx_interrupt+0x78/0x140 [snd_hda_codec]
>  __handle_irq_event_percpu+0x49/0x300
>  handle_irq_event_percpu+0x23/0x60
>  handle_irq_event+0x3c/0x60
>  handle_edge_irq+0xdb/0x180
>  handle_irq+0x23/0x30
>  do_IRQ+0x6a/0x140
>  common_interrupt+0xf/0xf
> 
> The Call Trace happened when run kdump on a NFS rootfs system.
> Exist the following calling sequence when boot the second kernel:
> 
> azx_first_init()
>    --> azx_acquire_irq()
>                       <-- interrupt come in, azx_interrupt() was called
>    --> hda_intel_init_chip()
>       --> azx_init_chip()
>          --> snd_hdac_bus_init_chip()
>               --> snd_hdac_bus_init_cmd_io();
>                     --> init rirb.buf and corb.buf
> 
> Interrupt happened after azx_acquire_irq() while RIRB still didn't got
> initialized, then NULL pointer will be used when process the interrupt.
> 
> Check the value of RIRB to ensure it is not NULL, to aviod some special
> case may hang the system.
> 
> Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
> Signed-off-by: Liwei Song <liwei.song@windriver.com>

Oh, that's indeed a race there.

But I guess the check introduced by the patch is still error-prone.
Basically the interrupt handling should be moved after the chip
initialization.  I suppose that your platform uses the shared
interrupt, not the MSI?

In anyway, alternative (and likely more certain) fix would be to move
the azx_acquir_irq() call like the patch below (note: totally
untested).  Could you check whether it works?


thanks,

Takashi


--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1788,9 +1788,6 @@ static int azx_first_init(struct azx *chip)
 			chip->msi = 0;
 	}
 
-	if (azx_acquire_irq(chip, 0) < 0)
-		return -EBUSY;
-
 	pci_set_master(pci);
 	synchronize_irq(bus->irq);
 
@@ -1904,6 +1901,9 @@ static int azx_first_init(struct azx *chip)
 		return -ENODEV;
 	}
 
+	if (azx_acquire_irq(chip, 0) < 0)
+		return -EBUSY;
+
 	strcpy(card->driver, "HDA-Intel");
 	strlcpy(card->shortname, driver_short_names[chip->driver_type],
 		sizeof(card->shortname));

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer
  2019-04-30  7:31   ` Takashi Iwai
@ 2019-04-30  8:32     ` Liwei Song
  -1 siblings, 0 replies; 12+ messages in thread
From: Liwei Song @ 2019-04-30  8:32 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: alsa-devel, Yu Zhao, Mark Brown, Keyon Jie, Jaroslav Kysela,
	linux-kernel



On 04/30/2019 03:31 PM, Takashi Iwai wrote:
> On Tue, 30 Apr 2019 08:10:53 +0200,
> Song liwei wrote:
>>
>> From: Liwei Song <liwei.song@windriver.com>
>>
>> Fix the following BUG:
>>
>> BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
>> Workqueue: events azx_probe_work [snd_hda_intel]
>> RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
>> Call Trace:
>>  <IRQ>
>>  azx_interrupt+0x78/0x140 [snd_hda_codec]
>>  __handle_irq_event_percpu+0x49/0x300
>>  handle_irq_event_percpu+0x23/0x60
>>  handle_irq_event+0x3c/0x60
>>  handle_edge_irq+0xdb/0x180
>>  handle_irq+0x23/0x30
>>  do_IRQ+0x6a/0x140
>>  common_interrupt+0xf/0xf
>>
>> The Call Trace happened when run kdump on a NFS rootfs system.
>> Exist the following calling sequence when boot the second kernel:
>>
>> azx_first_init()
>>    --> azx_acquire_irq()
>>                       <-- interrupt come in, azx_interrupt() was called
>>    --> hda_intel_init_chip()
>>       --> azx_init_chip()
>>          --> snd_hdac_bus_init_chip()
>>               --> snd_hdac_bus_init_cmd_io();
>>                     --> init rirb.buf and corb.buf
>>
>> Interrupt happened after azx_acquire_irq() while RIRB still didn't got
>> initialized, then NULL pointer will be used when process the interrupt.
>>
>> Check the value of RIRB to ensure it is not NULL, to aviod some special
>> case may hang the system.
>>
>> Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
>> Signed-off-by: Liwei Song <liwei.song@windriver.com>
> 
> Oh, that's indeed a race there.
> 
> But I guess the check introduced by the patch is still error-prone.
> Basically the interrupt handling should be moved after the chip
> initialization.  I suppose that your platform uses the shared
> interrupt, not the MSI?

This is the information from /proc/interrupt
134:          0        102          0          0  IR-PCI-MSI 514048-edge      snd_hda_intel:card0


> 
> In anyway, alternative (and likely more certain) fix would be to move
> the azx_acquir_irq() call like the patch below (note: totally
> untested).  Could you check whether it works?

Yes, It works.

Considering a previous patch like the one you provide will import some issue, 
so I choose check the invalid value to low the risk, but just as you mentioned,
It is not a good solution.

commit 542cedec53c9e8b73f3f05bf8468823598c50489
Author: Yu Zhao <yuzhao@google.com>
Date:   Tue Sep 11 15:12:46 2018 -0600

    Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
    
    This reverts commit 12eeeb4f4733bbc4481d01df35933fc15beb8b19.
    
    The patch doesn't fix accessing memory with null pointer in
    skl_interrupt().
    
    There are two problems: 1) skl_init_chip() is called twice, before
    and after dma buffer is allocate. The first call sets bus->chip_init
    which prevents the second from initializing bus->corb.buf and
    rirb.buf from bus->rb.area. 2) snd_hdac_bus_init_chip() enables
    interrupt before snd_hdac_bus_init_cmd_io() initializing dma buffers.
    There is a small window which skl_interrupt() can be called if irq
    has been acquired. If so, it crashes when using null dma buffer
    pointers.

Thanks,
Liwei.


> 
> 
> thanks,
> 
> Takashi
> 
> 
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -1788,9 +1788,6 @@ static int azx_first_init(struct azx *chip)
>  			chip->msi = 0;
>  	}
>  
> -	if (azx_acquire_irq(chip, 0) < 0)
> -		return -EBUSY;
> -
>  	pci_set_master(pci);
>  	synchronize_irq(bus->irq);
>  
> @@ -1904,6 +1901,9 @@ static int azx_first_init(struct azx *chip)
>  		return -ENODEV;
>  	}
>  
> +	if (azx_acquire_irq(chip, 0) < 0)
> +		return -EBUSY;
> +
>  	strcpy(card->driver, "HDA-Intel");
>  	strlcpy(card->shortname, driver_short_names[chip->driver_type],
>  		sizeof(card->shortname));
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer
@ 2019-04-30  8:32     ` Liwei Song
  0 siblings, 0 replies; 12+ messages in thread
From: Liwei Song @ 2019-04-30  8:32 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: alsa-devel, Yu Zhao, Keyon Jie, linux-kernel, Mark Brown



On 04/30/2019 03:31 PM, Takashi Iwai wrote:
> On Tue, 30 Apr 2019 08:10:53 +0200,
> Song liwei wrote:
>>
>> From: Liwei Song <liwei.song@windriver.com>
>>
>> Fix the following BUG:
>>
>> BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
>> Workqueue: events azx_probe_work [snd_hda_intel]
>> RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
>> Call Trace:
>>  <IRQ>
>>  azx_interrupt+0x78/0x140 [snd_hda_codec]
>>  __handle_irq_event_percpu+0x49/0x300
>>  handle_irq_event_percpu+0x23/0x60
>>  handle_irq_event+0x3c/0x60
>>  handle_edge_irq+0xdb/0x180
>>  handle_irq+0x23/0x30
>>  do_IRQ+0x6a/0x140
>>  common_interrupt+0xf/0xf
>>
>> The Call Trace happened when run kdump on a NFS rootfs system.
>> Exist the following calling sequence when boot the second kernel:
>>
>> azx_first_init()
>>    --> azx_acquire_irq()
>>                       <-- interrupt come in, azx_interrupt() was called
>>    --> hda_intel_init_chip()
>>       --> azx_init_chip()
>>          --> snd_hdac_bus_init_chip()
>>               --> snd_hdac_bus_init_cmd_io();
>>                     --> init rirb.buf and corb.buf
>>
>> Interrupt happened after azx_acquire_irq() while RIRB still didn't got
>> initialized, then NULL pointer will be used when process the interrupt.
>>
>> Check the value of RIRB to ensure it is not NULL, to aviod some special
>> case may hang the system.
>>
>> Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
>> Signed-off-by: Liwei Song <liwei.song@windriver.com>
> 
> Oh, that's indeed a race there.
> 
> But I guess the check introduced by the patch is still error-prone.
> Basically the interrupt handling should be moved after the chip
> initialization.  I suppose that your platform uses the shared
> interrupt, not the MSI?

This is the information from /proc/interrupt
134:          0        102          0          0  IR-PCI-MSI 514048-edge      snd_hda_intel:card0


> 
> In anyway, alternative (and likely more certain) fix would be to move
> the azx_acquir_irq() call like the patch below (note: totally
> untested).  Could you check whether it works?

Yes, It works.

Considering a previous patch like the one you provide will import some issue, 
so I choose check the invalid value to low the risk, but just as you mentioned,
It is not a good solution.

commit 542cedec53c9e8b73f3f05bf8468823598c50489
Author: Yu Zhao <yuzhao@google.com>
Date:   Tue Sep 11 15:12:46 2018 -0600

    Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
    
    This reverts commit 12eeeb4f4733bbc4481d01df35933fc15beb8b19.
    
    The patch doesn't fix accessing memory with null pointer in
    skl_interrupt().
    
    There are two problems: 1) skl_init_chip() is called twice, before
    and after dma buffer is allocate. The first call sets bus->chip_init
    which prevents the second from initializing bus->corb.buf and
    rirb.buf from bus->rb.area. 2) snd_hdac_bus_init_chip() enables
    interrupt before snd_hdac_bus_init_cmd_io() initializing dma buffers.
    There is a small window which skl_interrupt() can be called if irq
    has been acquired. If so, it crashes when using null dma buffer
    pointers.

Thanks,
Liwei.


> 
> 
> thanks,
> 
> Takashi
> 
> 
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -1788,9 +1788,6 @@ static int azx_first_init(struct azx *chip)
>  			chip->msi = 0;
>  	}
>  
> -	if (azx_acquire_irq(chip, 0) < 0)
> -		return -EBUSY;
> -
>  	pci_set_master(pci);
>  	synchronize_irq(bus->irq);
>  
> @@ -1904,6 +1901,9 @@ static int azx_first_init(struct azx *chip)
>  		return -ENODEV;
>  	}
>  
> +	if (azx_acquire_irq(chip, 0) < 0)
> +		return -EBUSY;
> +
>  	strcpy(card->driver, "HDA-Intel");
>  	strlcpy(card->shortname, driver_short_names[chip->driver_type],
>  		sizeof(card->shortname));
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer
  2019-04-30  8:32     ` Liwei Song
@ 2019-04-30  8:53       ` Takashi Iwai
  -1 siblings, 0 replies; 12+ messages in thread
From: Takashi Iwai @ 2019-04-30  8:53 UTC (permalink / raw)
  To: Liwei Song
  Cc: alsa-devel, Yu Zhao, Mark Brown, Keyon Jie, Jaroslav Kysela,
	linux-kernel

On Tue, 30 Apr 2019 10:32:47 +0200,
Liwei Song wrote:
> 
> 
> 
> On 04/30/2019 03:31 PM, Takashi Iwai wrote:
> > On Tue, 30 Apr 2019 08:10:53 +0200,
> > Song liwei wrote:
> >>
> >> From: Liwei Song <liwei.song@windriver.com>
> >>
> >> Fix the following BUG:
> >>
> >> BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
> >> Workqueue: events azx_probe_work [snd_hda_intel]
> >> RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
> >> Call Trace:
> >>  <IRQ>
> >>  azx_interrupt+0x78/0x140 [snd_hda_codec]
> >>  __handle_irq_event_percpu+0x49/0x300
> >>  handle_irq_event_percpu+0x23/0x60
> >>  handle_irq_event+0x3c/0x60
> >>  handle_edge_irq+0xdb/0x180
> >>  handle_irq+0x23/0x30
> >>  do_IRQ+0x6a/0x140
> >>  common_interrupt+0xf/0xf
> >>
> >> The Call Trace happened when run kdump on a NFS rootfs system.
> >> Exist the following calling sequence when boot the second kernel:
> >>
> >> azx_first_init()
> >>    --> azx_acquire_irq()
> >>                       <-- interrupt come in, azx_interrupt() was called
> >>    --> hda_intel_init_chip()
> >>       --> azx_init_chip()
> >>          --> snd_hdac_bus_init_chip()
> >>               --> snd_hdac_bus_init_cmd_io();
> >>                     --> init rirb.buf and corb.buf
> >>
> >> Interrupt happened after azx_acquire_irq() while RIRB still didn't got
> >> initialized, then NULL pointer will be used when process the interrupt.
> >>
> >> Check the value of RIRB to ensure it is not NULL, to aviod some special
> >> case may hang the system.
> >>
> >> Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
> >> Signed-off-by: Liwei Song <liwei.song@windriver.com>
> > 
> > Oh, that's indeed a race there.
> > 
> > But I guess the check introduced by the patch is still error-prone.
> > Basically the interrupt handling should be moved after the chip
> > initialization.  I suppose that your platform uses the shared
> > interrupt, not the MSI?
> 
> This is the information from /proc/interrupt
> 134:          0        102          0          0  IR-PCI-MSI 514048-edge      snd_hda_intel:card0

Hm, then it's interesting...


> > In anyway, alternative (and likely more certain) fix would be to move
> > the azx_acquir_irq() call like the patch below (note: totally
> > untested).  Could you check whether it works?
> 
> Yes, It works.
> 
> Considering a previous patch like the one you provide will import some issue, 
> so I choose check the invalid value to low the risk, but just as you mentioned,
> It is not a good solution.
> 
> commit 542cedec53c9e8b73f3f05bf8468823598c50489
> Author: Yu Zhao <yuzhao@google.com>
> Date:   Tue Sep 11 15:12:46 2018 -0600
> 
>     Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
>     
>     This reverts commit 12eeeb4f4733bbc4481d01df35933fc15beb8b19.
>     
>     The patch doesn't fix accessing memory with null pointer in
>     skl_interrupt().
>     
>     There are two problems: 1) skl_init_chip() is called twice, before
>     and after dma buffer is allocate. The first call sets bus->chip_init
>     which prevents the second from initializing bus->corb.buf and
>     rirb.buf from bus->rb.area. 2) snd_hdac_bus_init_chip() enables
>     interrupt before snd_hdac_bus_init_cmd_io() initializing dma buffers.
>     There is a small window which skl_interrupt() can be called if irq
>     has been acquired. If so, it crashes when using null dma buffer
>     pointers.

Actually this followed by another fix b61749a89f82,
    sound: enable interrupt after dma buffer initialization
    
and this moved the IRQ enablement after snd_hdac_bus_init_cmd_io().

So I wonder how the irq gets triggered in your case.
If it were a shared irq, it's understandable.  But for MSI, it should
have been the isolated source.

In anyway, for the latest tree, the change I suggested would cover
better although it's more radical as you pointed.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer
@ 2019-04-30  8:53       ` Takashi Iwai
  0 siblings, 0 replies; 12+ messages in thread
From: Takashi Iwai @ 2019-04-30  8:53 UTC (permalink / raw)
  To: Liwei Song
  Cc: alsa-devel, Yu Zhao, Mark Brown, Keyon Jie, Jaroslav Kysela,
	linux-kernel

On Tue, 30 Apr 2019 10:32:47 +0200,
Liwei Song wrote:
> 
> 
> 
> On 04/30/2019 03:31 PM, Takashi Iwai wrote:
> > On Tue, 30 Apr 2019 08:10:53 +0200,
> > Song liwei wrote:
> >>
> >> From: Liwei Song <liwei.song@windriver.com>
> >>
> >> Fix the following BUG:
> >>
> >> BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
> >> Workqueue: events azx_probe_work [snd_hda_intel]
> >> RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
> >> Call Trace:
> >>  <IRQ>
> >>  azx_interrupt+0x78/0x140 [snd_hda_codec]
> >>  __handle_irq_event_percpu+0x49/0x300
> >>  handle_irq_event_percpu+0x23/0x60
> >>  handle_irq_event+0x3c/0x60
> >>  handle_edge_irq+0xdb/0x180
> >>  handle_irq+0x23/0x30
> >>  do_IRQ+0x6a/0x140
> >>  common_interrupt+0xf/0xf
> >>
> >> The Call Trace happened when run kdump on a NFS rootfs system.
> >> Exist the following calling sequence when boot the second kernel:
> >>
> >> azx_first_init()
> >>    --> azx_acquire_irq()
> >>                       <-- interrupt come in, azx_interrupt() was called
> >>    --> hda_intel_init_chip()
> >>       --> azx_init_chip()
> >>          --> snd_hdac_bus_init_chip()
> >>               --> snd_hdac_bus_init_cmd_io();
> >>                     --> init rirb.buf and corb.buf
> >>
> >> Interrupt happened after azx_acquire_irq() while RIRB still didn't got
> >> initialized, then NULL pointer will be used when process the interrupt.
> >>
> >> Check the value of RIRB to ensure it is not NULL, to aviod some special
> >> case may hang the system.
> >>
> >> Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
> >> Signed-off-by: Liwei Song <liwei.song@windriver.com>
> > 
> > Oh, that's indeed a race there.
> > 
> > But I guess the check introduced by the patch is still error-prone.
> > Basically the interrupt handling should be moved after the chip
> > initialization.  I suppose that your platform uses the shared
> > interrupt, not the MSI?
> 
> This is the information from /proc/interrupt
> 134:          0        102          0          0  IR-PCI-MSI 514048-edge      snd_hda_intel:card0

Hm, then it's interesting...


> > In anyway, alternative (and likely more certain) fix would be to move
> > the azx_acquir_irq() call like the patch below (note: totally
> > untested).  Could you check whether it works?
> 
> Yes, It works.
> 
> Considering a previous patch like the one you provide will import some issue, 
> so I choose check the invalid value to low the risk, but just as you mentioned,
> It is not a good solution.
> 
> commit 542cedec53c9e8b73f3f05bf8468823598c50489
> Author: Yu Zhao <yuzhao@google.com>
> Date:   Tue Sep 11 15:12:46 2018 -0600
> 
>     Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
>     
>     This reverts commit 12eeeb4f4733bbc4481d01df35933fc15beb8b19.
>     
>     The patch doesn't fix accessing memory with null pointer in
>     skl_interrupt().
>     
>     There are two problems: 1) skl_init_chip() is called twice, before
>     and after dma buffer is allocate. The first call sets bus->chip_init
>     which prevents the second from initializing bus->corb.buf and
>     rirb.buf from bus->rb.area. 2) snd_hdac_bus_init_chip() enables
>     interrupt before snd_hdac_bus_init_cmd_io() initializing dma buffers.
>     There is a small window which skl_interrupt() can be called if irq
>     has been acquired. If so, it crashes when using null dma buffer
>     pointers.

Actually this followed by another fix b61749a89f82,
    sound: enable interrupt after dma buffer initialization
    
and this moved the IRQ enablement after snd_hdac_bus_init_cmd_io().

So I wonder how the irq gets triggered in your case.
If it were a shared irq, it's understandable.  But for MSI, it should
have been the isolated source.

In anyway, for the latest tree, the change I suggested would cover
better although it's more radical as you pointed.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer
  2019-04-30  8:53       ` Takashi Iwai
@ 2019-04-30  9:29         ` Liwei Song
  -1 siblings, 0 replies; 12+ messages in thread
From: Liwei Song @ 2019-04-30  9:29 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: alsa-devel, Yu Zhao, Mark Brown, Keyon Jie, Jaroslav Kysela,
	linux-kernel



On 04/30/2019 04:53 PM, Takashi Iwai wrote:
> On Tue, 30 Apr 2019 10:32:47 +0200,
> Liwei Song wrote:
>>
>>
>>
>> On 04/30/2019 03:31 PM, Takashi Iwai wrote:
>>> On Tue, 30 Apr 2019 08:10:53 +0200,
>>> Song liwei wrote:
>>>>
>>>> From: Liwei Song <liwei.song@windriver.com>
>>>>
>>>> Fix the following BUG:
>>>>
>>>> BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
>>>> Workqueue: events azx_probe_work [snd_hda_intel]
>>>> RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
>>>> Call Trace:
>>>>  <IRQ>
>>>>  azx_interrupt+0x78/0x140 [snd_hda_codec]
>>>>  __handle_irq_event_percpu+0x49/0x300
>>>>  handle_irq_event_percpu+0x23/0x60
>>>>  handle_irq_event+0x3c/0x60
>>>>  handle_edge_irq+0xdb/0x180
>>>>  handle_irq+0x23/0x30
>>>>  do_IRQ+0x6a/0x140
>>>>  common_interrupt+0xf/0xf
>>>>
>>>> The Call Trace happened when run kdump on a NFS rootfs system.
>>>> Exist the following calling sequence when boot the second kernel:
>>>>
>>>> azx_first_init()
>>>>    --> azx_acquire_irq()
>>>>                       <-- interrupt come in, azx_interrupt() was called
>>>>    --> hda_intel_init_chip()
>>>>       --> azx_init_chip()
>>>>          --> snd_hdac_bus_init_chip()
>>>>               --> snd_hdac_bus_init_cmd_io();
>>>>                     --> init rirb.buf and corb.buf
>>>>
>>>> Interrupt happened after azx_acquire_irq() while RIRB still didn't got
>>>> initialized, then NULL pointer will be used when process the interrupt.
>>>>
>>>> Check the value of RIRB to ensure it is not NULL, to aviod some special
>>>> case may hang the system.
>>>>
>>>> Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
>>>> Signed-off-by: Liwei Song <liwei.song@windriver.com>
>>>
>>> Oh, that's indeed a race there.
>>>
>>> But I guess the check introduced by the patch is still error-prone.
>>> Basically the interrupt handling should be moved after the chip
>>> initialization.  I suppose that your platform uses the shared
>>> interrupt, not the MSI?
>>
>> This is the information from /proc/interrupt
>> 134:          0        102          0          0  IR-PCI-MSI 514048-edge      snd_hda_intel:card0
> 
> Hm, then it's interesting...
> 
> 
>>> In anyway, alternative (and likely more certain) fix would be to move
>>> the azx_acquir_irq() call like the patch below (note: totally
>>> untested).  Could you check whether it works?
>>
>> Yes, It works.
>>
>> Considering a previous patch like the one you provide will import some issue, 
>> so I choose check the invalid value to low the risk, but just as you mentioned,
>> It is not a good solution.
>>
>> commit 542cedec53c9e8b73f3f05bf8468823598c50489
>> Author: Yu Zhao <yuzhao@google.com>
>> Date:   Tue Sep 11 15:12:46 2018 -0600
>>
>>     Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
>>     
>>     This reverts commit 12eeeb4f4733bbc4481d01df35933fc15beb8b19.
>>     
>>     The patch doesn't fix accessing memory with null pointer in
>>     skl_interrupt().
>>     
>>     There are two problems: 1) skl_init_chip() is called twice, before
>>     and after dma buffer is allocate. The first call sets bus->chip_init
>>     which prevents the second from initializing bus->corb.buf and
>>     rirb.buf from bus->rb.area. 2) snd_hdac_bus_init_chip() enables
>>     interrupt before snd_hdac_bus_init_cmd_io() initializing dma buffers.
>>     There is a small window which skl_interrupt() can be called if irq
>>     has been acquired. If so, it crashes when using null dma buffer
>>     pointers.
> 
> Actually this followed by another fix b61749a89f82,
>     sound: enable interrupt after dma buffer initialization
>     
> and this moved the IRQ enablement after snd_hdac_bus_init_cmd_io().
> 
> So I wonder how the irq gets triggered in your case.
> If it were a shared irq, it's understandable.  But for MSI, it should
> have been the isolated source.

I'm still working on how the irq was triggered,
it is a little complex to reproduce it, first it must run with NFS rootfs,
without NFS rootfs it can not reproduced.
Then with kdump enabled, after "echo c > /proc/sysrq-trigger" crash the kernel,
the kernel specified by kdump will boot, then interrupt will trigger
soon after azx interrupt was register.


> 
> In anyway, for the latest tree, the change I suggested would cover
> better although it's more radical as you pointed.

Got it, Thanks.

Liwei.


> 
> 
> thanks,
> 
> Takashi
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer
@ 2019-04-30  9:29         ` Liwei Song
  0 siblings, 0 replies; 12+ messages in thread
From: Liwei Song @ 2019-04-30  9:29 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: alsa-devel, Yu Zhao, Keyon Jie, linux-kernel, Mark Brown



On 04/30/2019 04:53 PM, Takashi Iwai wrote:
> On Tue, 30 Apr 2019 10:32:47 +0200,
> Liwei Song wrote:
>>
>>
>>
>> On 04/30/2019 03:31 PM, Takashi Iwai wrote:
>>> On Tue, 30 Apr 2019 08:10:53 +0200,
>>> Song liwei wrote:
>>>>
>>>> From: Liwei Song <liwei.song@windriver.com>
>>>>
>>>> Fix the following BUG:
>>>>
>>>> BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
>>>> Workqueue: events azx_probe_work [snd_hda_intel]
>>>> RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
>>>> Call Trace:
>>>>  <IRQ>
>>>>  azx_interrupt+0x78/0x140 [snd_hda_codec]
>>>>  __handle_irq_event_percpu+0x49/0x300
>>>>  handle_irq_event_percpu+0x23/0x60
>>>>  handle_irq_event+0x3c/0x60
>>>>  handle_edge_irq+0xdb/0x180
>>>>  handle_irq+0x23/0x30
>>>>  do_IRQ+0x6a/0x140
>>>>  common_interrupt+0xf/0xf
>>>>
>>>> The Call Trace happened when run kdump on a NFS rootfs system.
>>>> Exist the following calling sequence when boot the second kernel:
>>>>
>>>> azx_first_init()
>>>>    --> azx_acquire_irq()
>>>>                       <-- interrupt come in, azx_interrupt() was called
>>>>    --> hda_intel_init_chip()
>>>>       --> azx_init_chip()
>>>>          --> snd_hdac_bus_init_chip()
>>>>               --> snd_hdac_bus_init_cmd_io();
>>>>                     --> init rirb.buf and corb.buf
>>>>
>>>> Interrupt happened after azx_acquire_irq() while RIRB still didn't got
>>>> initialized, then NULL pointer will be used when process the interrupt.
>>>>
>>>> Check the value of RIRB to ensure it is not NULL, to aviod some special
>>>> case may hang the system.
>>>>
>>>> Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
>>>> Signed-off-by: Liwei Song <liwei.song@windriver.com>
>>>
>>> Oh, that's indeed a race there.
>>>
>>> But I guess the check introduced by the patch is still error-prone.
>>> Basically the interrupt handling should be moved after the chip
>>> initialization.  I suppose that your platform uses the shared
>>> interrupt, not the MSI?
>>
>> This is the information from /proc/interrupt
>> 134:          0        102          0          0  IR-PCI-MSI 514048-edge      snd_hda_intel:card0
> 
> Hm, then it's interesting...
> 
> 
>>> In anyway, alternative (and likely more certain) fix would be to move
>>> the azx_acquir_irq() call like the patch below (note: totally
>>> untested).  Could you check whether it works?
>>
>> Yes, It works.
>>
>> Considering a previous patch like the one you provide will import some issue, 
>> so I choose check the invalid value to low the risk, but just as you mentioned,
>> It is not a good solution.
>>
>> commit 542cedec53c9e8b73f3f05bf8468823598c50489
>> Author: Yu Zhao <yuzhao@google.com>
>> Date:   Tue Sep 11 15:12:46 2018 -0600
>>
>>     Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
>>     
>>     This reverts commit 12eeeb4f4733bbc4481d01df35933fc15beb8b19.
>>     
>>     The patch doesn't fix accessing memory with null pointer in
>>     skl_interrupt().
>>     
>>     There are two problems: 1) skl_init_chip() is called twice, before
>>     and after dma buffer is allocate. The first call sets bus->chip_init
>>     which prevents the second from initializing bus->corb.buf and
>>     rirb.buf from bus->rb.area. 2) snd_hdac_bus_init_chip() enables
>>     interrupt before snd_hdac_bus_init_cmd_io() initializing dma buffers.
>>     There is a small window which skl_interrupt() can be called if irq
>>     has been acquired. If so, it crashes when using null dma buffer
>>     pointers.
> 
> Actually this followed by another fix b61749a89f82,
>     sound: enable interrupt after dma buffer initialization
>     
> and this moved the IRQ enablement after snd_hdac_bus_init_cmd_io().
> 
> So I wonder how the irq gets triggered in your case.
> If it were a shared irq, it's understandable.  But for MSI, it should
> have been the isolated source.

I'm still working on how the irq was triggered,
it is a little complex to reproduce it, first it must run with NFS rootfs,
without NFS rootfs it can not reproduced.
Then with kdump enabled, after "echo c > /proc/sysrq-trigger" crash the kernel,
the kernel specified by kdump will boot, then interrupt will trigger
soon after azx interrupt was register.


> 
> In anyway, for the latest tree, the change I suggested would cover
> better although it's more radical as you pointed.

Got it, Thanks.

Liwei.


> 
> 
> thanks,
> 
> Takashi
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer
  2019-04-30  9:29         ` Liwei Song
@ 2019-04-30 10:17           ` Takashi Iwai
  -1 siblings, 0 replies; 12+ messages in thread
From: Takashi Iwai @ 2019-04-30 10:17 UTC (permalink / raw)
  To: Liwei Song
  Cc: alsa-devel, Yu Zhao, Mark Brown, Keyon Jie, Jaroslav Kysela,
	linux-kernel

On Tue, 30 Apr 2019 11:29:19 +0200,
Liwei Song wrote:
> 
> 
> 
> On 04/30/2019 04:53 PM, Takashi Iwai wrote:
> > On Tue, 30 Apr 2019 10:32:47 +0200,
> > Liwei Song wrote:
> >>
> >>
> >>
> >> On 04/30/2019 03:31 PM, Takashi Iwai wrote:
> >>> On Tue, 30 Apr 2019 08:10:53 +0200,
> >>> Song liwei wrote:
> >>>>
> >>>> From: Liwei Song <liwei.song@windriver.com>
> >>>>
> >>>> Fix the following BUG:
> >>>>
> >>>> BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
> >>>> Workqueue: events azx_probe_work [snd_hda_intel]
> >>>> RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
> >>>> Call Trace:
> >>>>  <IRQ>
> >>>>  azx_interrupt+0x78/0x140 [snd_hda_codec]
> >>>>  __handle_irq_event_percpu+0x49/0x300
> >>>>  handle_irq_event_percpu+0x23/0x60
> >>>>  handle_irq_event+0x3c/0x60
> >>>>  handle_edge_irq+0xdb/0x180
> >>>>  handle_irq+0x23/0x30
> >>>>  do_IRQ+0x6a/0x140
> >>>>  common_interrupt+0xf/0xf
> >>>>
> >>>> The Call Trace happened when run kdump on a NFS rootfs system.
> >>>> Exist the following calling sequence when boot the second kernel:
> >>>>
> >>>> azx_first_init()
> >>>>    --> azx_acquire_irq()
> >>>>                       <-- interrupt come in, azx_interrupt() was called
> >>>>    --> hda_intel_init_chip()
> >>>>       --> azx_init_chip()
> >>>>          --> snd_hdac_bus_init_chip()
> >>>>               --> snd_hdac_bus_init_cmd_io();
> >>>>                     --> init rirb.buf and corb.buf
> >>>>
> >>>> Interrupt happened after azx_acquire_irq() while RIRB still didn't got
> >>>> initialized, then NULL pointer will be used when process the interrupt.
> >>>>
> >>>> Check the value of RIRB to ensure it is not NULL, to aviod some special
> >>>> case may hang the system.
> >>>>
> >>>> Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
> >>>> Signed-off-by: Liwei Song <liwei.song@windriver.com>
> >>>
> >>> Oh, that's indeed a race there.
> >>>
> >>> But I guess the check introduced by the patch is still error-prone.
> >>> Basically the interrupt handling should be moved after the chip
> >>> initialization.  I suppose that your platform uses the shared
> >>> interrupt, not the MSI?
> >>
> >> This is the information from /proc/interrupt
> >> 134:          0        102          0          0  IR-PCI-MSI 514048-edge      snd_hda_intel:card0
> > 
> > Hm, then it's interesting...
> > 
> > 
> >>> In anyway, alternative (and likely more certain) fix would be to move
> >>> the azx_acquir_irq() call like the patch below (note: totally
> >>> untested).  Could you check whether it works?
> >>
> >> Yes, It works.
> >>
> >> Considering a previous patch like the one you provide will import some issue, 
> >> so I choose check the invalid value to low the risk, but just as you mentioned,
> >> It is not a good solution.
> >>
> >> commit 542cedec53c9e8b73f3f05bf8468823598c50489
> >> Author: Yu Zhao <yuzhao@google.com>
> >> Date:   Tue Sep 11 15:12:46 2018 -0600
> >>
> >>     Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
> >>     
> >>     This reverts commit 12eeeb4f4733bbc4481d01df35933fc15beb8b19.
> >>     
> >>     The patch doesn't fix accessing memory with null pointer in
> >>     skl_interrupt().
> >>     
> >>     There are two problems: 1) skl_init_chip() is called twice, before
> >>     and after dma buffer is allocate. The first call sets bus->chip_init
> >>     which prevents the second from initializing bus->corb.buf and
> >>     rirb.buf from bus->rb.area. 2) snd_hdac_bus_init_chip() enables
> >>     interrupt before snd_hdac_bus_init_cmd_io() initializing dma buffers.
> >>     There is a small window which skl_interrupt() can be called if irq
> >>     has been acquired. If so, it crashes when using null dma buffer
> >>     pointers.
> > 
> > Actually this followed by another fix b61749a89f82,
> >     sound: enable interrupt after dma buffer initialization
> >     
> > and this moved the IRQ enablement after snd_hdac_bus_init_cmd_io().
> > 
> > So I wonder how the irq gets triggered in your case.
> > If it were a shared irq, it's understandable.  But for MSI, it should
> > have been the isolated source.
> 
> I'm still working on how the irq was triggered,
> it is a little complex to reproduce it, first it must run with NFS rootfs,
> without NFS rootfs it can not reproduced.
> Then with kdump enabled, after "echo c > /proc/sysrq-trigger" crash the kernel,
> the kernel specified by kdump will boot, then interrupt will trigger
> soon after azx interrupt was register.

Ah, so it happens in a kdump kernel?  It implies that the interrupt
line may be still active (or confused).  Then it's no wonder a stale
interrupt comes up.

> > In anyway, for the latest tree, the change I suggested would cover
> > better although it's more radical as you pointed.
> 
> Got it, Thanks.

OK, I'm going to submit and apply the proper patch.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer
@ 2019-04-30 10:17           ` Takashi Iwai
  0 siblings, 0 replies; 12+ messages in thread
From: Takashi Iwai @ 2019-04-30 10:17 UTC (permalink / raw)
  To: Liwei Song
  Cc: alsa-devel, Yu Zhao, Mark Brown, Keyon Jie, Jaroslav Kysela,
	linux-kernel

On Tue, 30 Apr 2019 11:29:19 +0200,
Liwei Song wrote:
> 
> 
> 
> On 04/30/2019 04:53 PM, Takashi Iwai wrote:
> > On Tue, 30 Apr 2019 10:32:47 +0200,
> > Liwei Song wrote:
> >>
> >>
> >>
> >> On 04/30/2019 03:31 PM, Takashi Iwai wrote:
> >>> On Tue, 30 Apr 2019 08:10:53 +0200,
> >>> Song liwei wrote:
> >>>>
> >>>> From: Liwei Song <liwei.song@windriver.com>
> >>>>
> >>>> Fix the following BUG:
> >>>>
> >>>> BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
> >>>> Workqueue: events azx_probe_work [snd_hda_intel]
> >>>> RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
> >>>> Call Trace:
> >>>>  <IRQ>
> >>>>  azx_interrupt+0x78/0x140 [snd_hda_codec]
> >>>>  __handle_irq_event_percpu+0x49/0x300
> >>>>  handle_irq_event_percpu+0x23/0x60
> >>>>  handle_irq_event+0x3c/0x60
> >>>>  handle_edge_irq+0xdb/0x180
> >>>>  handle_irq+0x23/0x30
> >>>>  do_IRQ+0x6a/0x140
> >>>>  common_interrupt+0xf/0xf
> >>>>
> >>>> The Call Trace happened when run kdump on a NFS rootfs system.
> >>>> Exist the following calling sequence when boot the second kernel:
> >>>>
> >>>> azx_first_init()
> >>>>    --> azx_acquire_irq()
> >>>>                       <-- interrupt come in, azx_interrupt() was called
> >>>>    --> hda_intel_init_chip()
> >>>>       --> azx_init_chip()
> >>>>          --> snd_hdac_bus_init_chip()
> >>>>               --> snd_hdac_bus_init_cmd_io();
> >>>>                     --> init rirb.buf and corb.buf
> >>>>
> >>>> Interrupt happened after azx_acquire_irq() while RIRB still didn't got
> >>>> initialized, then NULL pointer will be used when process the interrupt.
> >>>>
> >>>> Check the value of RIRB to ensure it is not NULL, to aviod some special
> >>>> case may hang the system.
> >>>>
> >>>> Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
> >>>> Signed-off-by: Liwei Song <liwei.song@windriver.com>
> >>>
> >>> Oh, that's indeed a race there.
> >>>
> >>> But I guess the check introduced by the patch is still error-prone.
> >>> Basically the interrupt handling should be moved after the chip
> >>> initialization.  I suppose that your platform uses the shared
> >>> interrupt, not the MSI?
> >>
> >> This is the information from /proc/interrupt
> >> 134:          0        102          0          0  IR-PCI-MSI 514048-edge      snd_hda_intel:card0
> > 
> > Hm, then it's interesting...
> > 
> > 
> >>> In anyway, alternative (and likely more certain) fix would be to move
> >>> the azx_acquir_irq() call like the patch below (note: totally
> >>> untested).  Could you check whether it works?
> >>
> >> Yes, It works.
> >>
> >> Considering a previous patch like the one you provide will import some issue, 
> >> so I choose check the invalid value to low the risk, but just as you mentioned,
> >> It is not a good solution.
> >>
> >> commit 542cedec53c9e8b73f3f05bf8468823598c50489
> >> Author: Yu Zhao <yuzhao@google.com>
> >> Date:   Tue Sep 11 15:12:46 2018 -0600
> >>
> >>     Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
> >>     
> >>     This reverts commit 12eeeb4f4733bbc4481d01df35933fc15beb8b19.
> >>     
> >>     The patch doesn't fix accessing memory with null pointer in
> >>     skl_interrupt().
> >>     
> >>     There are two problems: 1) skl_init_chip() is called twice, before
> >>     and after dma buffer is allocate. The first call sets bus->chip_init
> >>     which prevents the second from initializing bus->corb.buf and
> >>     rirb.buf from bus->rb.area. 2) snd_hdac_bus_init_chip() enables
> >>     interrupt before snd_hdac_bus_init_cmd_io() initializing dma buffers.
> >>     There is a small window which skl_interrupt() can be called if irq
> >>     has been acquired. If so, it crashes when using null dma buffer
> >>     pointers.
> > 
> > Actually this followed by another fix b61749a89f82,
> >     sound: enable interrupt after dma buffer initialization
> >     
> > and this moved the IRQ enablement after snd_hdac_bus_init_cmd_io().
> > 
> > So I wonder how the irq gets triggered in your case.
> > If it were a shared irq, it's understandable.  But for MSI, it should
> > have been the isolated source.
> 
> I'm still working on how the irq was triggered,
> it is a little complex to reproduce it, first it must run with NFS rootfs,
> without NFS rootfs it can not reproduced.
> Then with kdump enabled, after "echo c > /proc/sysrq-trigger" crash the kernel,
> the kernel specified by kdump will boot, then interrupt will trigger
> soon after azx interrupt was register.

Ah, so it happens in a kdump kernel?  It implies that the interrupt
line may be still active (or confused).  Then it's no wonder a stale
interrupt comes up.

> > In anyway, for the latest tree, the change I suggested would cover
> > better although it's more radical as you pointed.
> 
> Got it, Thanks.

OK, I'm going to submit and apply the proper patch.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-04-30 10:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-30  6:10 [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer Song liwei
2019-04-30  6:10 ` Song liwei
2019-04-30  7:31 ` Takashi Iwai
2019-04-30  7:31   ` Takashi Iwai
2019-04-30  8:32   ` Liwei Song
2019-04-30  8:32     ` Liwei Song
2019-04-30  8:53     ` Takashi Iwai
2019-04-30  8:53       ` Takashi Iwai
2019-04-30  9:29       ` Liwei Song
2019-04-30  9:29         ` Liwei Song
2019-04-30 10:17         ` Takashi Iwai
2019-04-30 10:17           ` Takashi Iwai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.