All of lore.kernel.org
 help / color / mirror / Atom feed
* [3.6-rc7] switcheroo race with Intel HDA...
@ 2012-09-25  5:20 ` Daniel J Blueman
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel J Blueman @ 2012-09-25  5:20 UTC (permalink / raw)
  To: Dave Airlie, Takashi Iwai; +Cc: Linux Kernel, alsa-devel

On my Macbook with a discrete Nvidia GPU, there is a race between
selecting the integrated GPU and putting the discrete GPU into D3 [1],
reliably causing a kernel oops [2].

Introducing a delay of ~1s between the calls prevents this. When the
second 'OFF' write path executes, it looks like struct azx at
card->private_data hasn't yet been allocated yet [3], so there is
likely some locking missing.

I'm happy to perform further testing and debug of course...

Thanks,
  Daniel

--- [1]

echo IGD > /sys/kernel/debug/vgaswitcheroo/switch
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

--- [2]

BUG: unable to handle kernel NULL pointer dereference at 0000000000000170
IP: [<ffffffffa01ba936>] azx_vs_set_state+0x26/0x178 [snd_hda_intel]
PGD 259c26067 PUD 25a0fd067 PMD 0
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: snd_hda_codec_hdmi bnep rfcomm b43 joydev nfsd ssb
nfs_acl auth_rpcgss binfmt_misc nfs lockd sunrpc uvcvideo bcm5974
videobuf2_core videobuf2_vmalloc videobuf2_memops coretemp kvm_intel
snd_hda_codec_cirrus kvm applesmc input_polldev microcode bcma lpc_ich
mfd_core mei snd_hda_intel(+) snd_hda_codec snd_hwdep snd_pcm
snd_timer snd snd_page_alloc nls_iso8859_1 apple_gmux mac_hid apple_bl
btrfs hid_apple sdhci_pci ghash_clmulni_intel tg3 sdhci i915 nouveau
ttm drm_kms_helper hwmon mxm_wmi video
CPU 2
Pid: 961, comm: sh Not tainted 3.6.0-rc7 #2 Apple Inc.
MacBookPro10,1/Mac-C3EC7CD22292981F
RIP: 0010:[<ffffffffa01ba936>] [<ffffffffa01ba936>]
azx_vs_set_state+0x26/0x178 [snd_hda_intel]
RSP: 0018:ffff880264271e48 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88025a2f5280 RCX: 0000000000000000
RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff880265479098
RBP: ffff880264271e68 R08: 2222222222222222 R09: 2222222222222222
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880265479098
R13: 0000000000000000 R14: ffff880264271f50 R15: 0000000000000000
FS: 00007fa4fe183700(0000) GS:ffff88026f280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000170 CR3: 00000002641a7000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sh (pid: 961, threadinfo ffff880264270000, task ffff880264503a00)
Stack:
 2222222222222222 ffff88025a2f5280 0000000000000000 ffff880264271e98
 ffff880264271e88 ffffffff812e83a7 ffff8802622835c0 0000000000000004
 ffff880264271ef8 ffffffff812e89ac ffff88020a46464f ffff880264503a00
Call Trace:
 [<ffffffff812e83a7>] set_audio_state+0x67/0x70
 [<ffffffff812e89ac>] vga_switcheroo_debugfs_write+0xbc/0x380
 [<ffffffff81108773>] vfs_write+0xa3/0x160
 [<ffffffff81108a75>] sys_write+0x45/0xa0
 [<ffffffff815231a6>] system_call_fastpath+0x1a/0x1f
Code: 00 00 00 00 00 55 48 89 e5 48 83 ec 20 4c 89 65 f0 4c 8d a7 98
00 00 00 4c 89 e7 48 89 5d e8 4c 89 6d f8 41 89 f5 e8 2a 35 13 e1 <48>
8b 98 70 01 00 00 0f b6 83 55 02 00 00 a8 08 75 34 45 85 ed
RIP [<ffffffffa01ba936>] azx_vs_set_state+0x26/0x178 [snd_hda_intel]
 RSP <ffff880264271e48>
CR2: 0000000000000170

--- [3]

(gdb) list *(azx_vs_set_state+0x26)
0x2936 is in azx_vs_set_state (sound/pci/hda/hda_intel.c:2505).
2500	
2501	static void azx_vs_set_state(struct pci_dev *pci,
2502				   enum vga_switcheroo_state state)
2503	{
2504		struct snd_card *card = pci_get_drvdata(pci);
2505		struct azx *chip = card->private_data;
2506		bool disabled;
2507	
2508		if (chip->init_failed)
2509			return;
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [3.6-rc7] switcheroo race with Intel HDA...
@ 2012-09-25  5:20 ` Daniel J Blueman
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel J Blueman @ 2012-09-25  5:20 UTC (permalink / raw)
  To: Dave Airlie, Takashi Iwai; +Cc: alsa-devel, Linux Kernel

On my Macbook with a discrete Nvidia GPU, there is a race between
selecting the integrated GPU and putting the discrete GPU into D3 [1],
reliably causing a kernel oops [2].

Introducing a delay of ~1s between the calls prevents this. When the
second 'OFF' write path executes, it looks like struct azx at
card->private_data hasn't yet been allocated yet [3], so there is
likely some locking missing.

I'm happy to perform further testing and debug of course...

Thanks,
  Daniel

--- [1]

echo IGD > /sys/kernel/debug/vgaswitcheroo/switch
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

--- [2]

BUG: unable to handle kernel NULL pointer dereference at 0000000000000170
IP: [<ffffffffa01ba936>] azx_vs_set_state+0x26/0x178 [snd_hda_intel]
PGD 259c26067 PUD 25a0fd067 PMD 0
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: snd_hda_codec_hdmi bnep rfcomm b43 joydev nfsd ssb
nfs_acl auth_rpcgss binfmt_misc nfs lockd sunrpc uvcvideo bcm5974
videobuf2_core videobuf2_vmalloc videobuf2_memops coretemp kvm_intel
snd_hda_codec_cirrus kvm applesmc input_polldev microcode bcma lpc_ich
mfd_core mei snd_hda_intel(+) snd_hda_codec snd_hwdep snd_pcm
snd_timer snd snd_page_alloc nls_iso8859_1 apple_gmux mac_hid apple_bl
btrfs hid_apple sdhci_pci ghash_clmulni_intel tg3 sdhci i915 nouveau
ttm drm_kms_helper hwmon mxm_wmi video
CPU 2
Pid: 961, comm: sh Not tainted 3.6.0-rc7 #2 Apple Inc.
MacBookPro10,1/Mac-C3EC7CD22292981F
RIP: 0010:[<ffffffffa01ba936>] [<ffffffffa01ba936>]
azx_vs_set_state+0x26/0x178 [snd_hda_intel]
RSP: 0018:ffff880264271e48 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88025a2f5280 RCX: 0000000000000000
RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff880265479098
RBP: ffff880264271e68 R08: 2222222222222222 R09: 2222222222222222
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880265479098
R13: 0000000000000000 R14: ffff880264271f50 R15: 0000000000000000
FS: 00007fa4fe183700(0000) GS:ffff88026f280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000170 CR3: 00000002641a7000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sh (pid: 961, threadinfo ffff880264270000, task ffff880264503a00)
Stack:
 2222222222222222 ffff88025a2f5280 0000000000000000 ffff880264271e98
 ffff880264271e88 ffffffff812e83a7 ffff8802622835c0 0000000000000004
 ffff880264271ef8 ffffffff812e89ac ffff88020a46464f ffff880264503a00
Call Trace:
 [<ffffffff812e83a7>] set_audio_state+0x67/0x70
 [<ffffffff812e89ac>] vga_switcheroo_debugfs_write+0xbc/0x380
 [<ffffffff81108773>] vfs_write+0xa3/0x160
 [<ffffffff81108a75>] sys_write+0x45/0xa0
 [<ffffffff815231a6>] system_call_fastpath+0x1a/0x1f
Code: 00 00 00 00 00 55 48 89 e5 48 83 ec 20 4c 89 65 f0 4c 8d a7 98
00 00 00 4c 89 e7 48 89 5d e8 4c 89 6d f8 41 89 f5 e8 2a 35 13 e1 <48>
8b 98 70 01 00 00 0f b6 83 55 02 00 00 a8 08 75 34 45 85 ed
RIP [<ffffffffa01ba936>] azx_vs_set_state+0x26/0x178 [snd_hda_intel]
 RSP <ffff880264271e48>
CR2: 0000000000000170

--- [3]

(gdb) list *(azx_vs_set_state+0x26)
0x2936 is in azx_vs_set_state (sound/pci/hda/hda_intel.c:2505).
2500	
2501	static void azx_vs_set_state(struct pci_dev *pci,
2502				   enum vga_switcheroo_state state)
2503	{
2504		struct snd_card *card = pci_get_drvdata(pci);
2505		struct azx *chip = card->private_data;
2506		bool disabled;
2507	
2508		if (chip->init_failed)
2509			return;
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
  2012-09-25  5:20 ` Daniel J Blueman
@ 2012-10-08 12:58   ` Takashi Iwai
  -1 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2012-10-08 12:58 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Dave Airlie, Linux Kernel, alsa-devel

Hi Daniel,

sorry for the late reply.  I'm just back from vacation.

At Tue, 25 Sep 2012 13:20:05 +0800,
Daniel J Blueman wrote:
> 
> On my Macbook with a discrete Nvidia GPU, there is a race between
> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> reliably causing a kernel oops [2].
> 
> Introducing a delay of ~1s between the calls prevents this. When the
> second 'OFF' write path executes, it looks like struct azx at
> card->private_data hasn't yet been allocated yet [3], so there is
> likely some locking missing.

It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
card->private_data causes Oops).  Could you check the patch like below
and see whether you get a kernel warning (but no Oops) or the problem
gets fixed by shifting the assignment of pci drvdata?


thanks,

Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f09ff6c..152f9e1 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2609,9 +2609,15 @@ static void azx_vs_set_state(struct pci_dev *pci,
 			     enum vga_switcheroo_state state)
 {
 	struct snd_card *card = pci_get_drvdata(pci);
-	struct azx *chip = card->private_data;
+	struct azx *chip;
 	bool disabled;
 
+	if (WARN_ON(!card))
+		return;
+
+	chip = card->private_data;
+	if (WARN_ON(!chip))
+		return;
 	if (chip->init_failed)
 		return;
 
@@ -3314,6 +3320,7 @@ static int __devinit azx_probe(struct pci_dev *pci,
 	}
 
 	snd_card_set_dev(card, &pci->dev);
+	pci_set_drvdata(pci, card);
 
 	err = azx_create(card, pci, dev, pci_id->driver_data, &chip);
 	if (err < 0)
@@ -3340,8 +3347,6 @@ static int __devinit azx_probe(struct pci_dev *pci,
 			goto out_free;
 	}
 
-	pci_set_drvdata(pci, card);
-
 	if (pci_dev_run_wake(pci))
 		pm_runtime_put_noidle(&pci->dev);
 
@@ -3350,6 +3355,7 @@ static int __devinit azx_probe(struct pci_dev *pci,
 
 out_free:
 	snd_card_free(card);
+	pci_set_drvdata(pci, NULL);
 	return err;
 }
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
@ 2012-10-08 12:58   ` Takashi Iwai
  0 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2012-10-08 12:58 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Dave Airlie, alsa-devel, Linux Kernel

Hi Daniel,

sorry for the late reply.  I'm just back from vacation.

At Tue, 25 Sep 2012 13:20:05 +0800,
Daniel J Blueman wrote:
> 
> On my Macbook with a discrete Nvidia GPU, there is a race between
> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> reliably causing a kernel oops [2].
> 
> Introducing a delay of ~1s between the calls prevents this. When the
> second 'OFF' write path executes, it looks like struct azx at
> card->private_data hasn't yet been allocated yet [3], so there is
> likely some locking missing.

It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
card->private_data causes Oops).  Could you check the patch like below
and see whether you get a kernel warning (but no Oops) or the problem
gets fixed by shifting the assignment of pci drvdata?


thanks,

Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f09ff6c..152f9e1 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2609,9 +2609,15 @@ static void azx_vs_set_state(struct pci_dev *pci,
 			     enum vga_switcheroo_state state)
 {
 	struct snd_card *card = pci_get_drvdata(pci);
-	struct azx *chip = card->private_data;
+	struct azx *chip;
 	bool disabled;
 
+	if (WARN_ON(!card))
+		return;
+
+	chip = card->private_data;
+	if (WARN_ON(!chip))
+		return;
 	if (chip->init_failed)
 		return;
 
@@ -3314,6 +3320,7 @@ static int __devinit azx_probe(struct pci_dev *pci,
 	}
 
 	snd_card_set_dev(card, &pci->dev);
+	pci_set_drvdata(pci, card);
 
 	err = azx_create(card, pci, dev, pci_id->driver_data, &chip);
 	if (err < 0)
@@ -3340,8 +3347,6 @@ static int __devinit azx_probe(struct pci_dev *pci,
 			goto out_free;
 	}
 
-	pci_set_drvdata(pci, card);
-
 	if (pci_dev_run_wake(pci))
 		pm_runtime_put_noidle(&pci->dev);
 
@@ -3350,6 +3355,7 @@ static int __devinit azx_probe(struct pci_dev *pci,
 
 out_free:
 	snd_card_free(card);
+	pci_set_drvdata(pci, NULL);
 	return err;
 }
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
  2012-10-08 12:58   ` Takashi Iwai
  (?)
@ 2012-10-08 16:34   ` Daniel J Blueman
  2012-10-09 10:04       ` Takashi Iwai
  -1 siblings, 1 reply; 17+ messages in thread
From: Daniel J Blueman @ 2012-10-08 16:34 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Dave Airlie, Linux Kernel, alsa-devel

On 8 October 2012 20:58, Takashi Iwai <tiwai@suse.de> wrote:
> At Tue, 25 Sep 2012 13:20:05 +0800,
> Daniel J Blueman wrote:
>> On my Macbook with a discrete Nvidia GPU, there is a race between
>> selecting the integrated GPU and putting the discrete GPU into D3 [1],
>> reliably causing a kernel oops [2].
>>
>> Introducing a delay of ~1s between the calls prevents this. When the
>> second 'OFF' write path executes, it looks like struct azx at
>> card->private_data hasn't yet been allocated yet [3], so there is
>> likely some locking missing.
>
> It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> card->private_data causes Oops).  Could you check the patch like below
> and see whether you get a kernel warning (but no Oops) or the problem
> gets fixed by shifting the assignment of pci drvdata?
[...]

Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
though we see unexpected 0x0 responses in the response ring buffer
[1], which we don't see when there's a >~1.5s delay between IGD and
OFF.

Thanks,
  Daniel

--- [1]

snd_hda_intel 0000:00:1b.0: enabling device (0000 -> 0002)
snd_hda_intel 0000:00:1b.0: irq 55 for MSI/MSI-X
vga_switcheroo: enabled
input: HDA Intel PCH Headphone as
/devices/pci0000:00/0000:00:1b.0/sound/card0/input11
snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
{echo IGD >/sys/kernel/debug/vgaswitcheroo/switch}
{echo OFF >/sys/kernel/debug/vgaswitcheroo/switch}
hda_intel: Disabling MSI
hda-intel: 0000:01:00.1: Handle VGA-switcheroo audio client
hda-intel: Disabling 0000:01:00.1 via VGA-switcheroo
VGA switcheroo: switched nouveau off
[drm] nouveau 0000:01:00.0: Disabling display...
[drm] nouveau 0000:01:00.0: Disabling fbcon...
[drm] nouveau 0000:01:00.0: Unpinning framebuffer(s)...
[drm] nouveau 0000:01:00.0: Evicting buffers...
[drm] nouveau 0000:01:00.0: Idling channels...
[drm] nouveau 0000:01:00.0: Suspending GPU objects...
[drm] nouveau 0000:01:00.0: And we're gone!
hda-intel: spurious response 0x0:0x0, last cmd=0x1f0004
{repeats 220 times}
hda-intel: spurious response 0x0:0x0, last cmd=0x1f0004
HDMI: failed to get afg sub nodes
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
  2012-10-08 16:34   ` Daniel J Blueman
@ 2012-10-09 10:04       ` Takashi Iwai
  0 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2012-10-09 10:04 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Dave Airlie, Linux Kernel, alsa-devel

At Tue, 9 Oct 2012 00:34:09 +0800,
Daniel J Blueman wrote:
> 
> On 8 October 2012 20:58, Takashi Iwai <tiwai@suse.de> wrote:
> > At Tue, 25 Sep 2012 13:20:05 +0800,
> > Daniel J Blueman wrote:
> >> On my Macbook with a discrete Nvidia GPU, there is a race between
> >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> >> reliably causing a kernel oops [2].
> >>
> >> Introducing a delay of ~1s between the calls prevents this. When the
> >> second 'OFF' write path executes, it looks like struct azx at
> >> card->private_data hasn't yet been allocated yet [3], so there is
> >> likely some locking missing.
> >
> > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> > card->private_data causes Oops).  Could you check the patch like below
> > and see whether you get a kernel warning (but no Oops) or the problem
> > gets fixed by shifting the assignment of pci drvdata?
> [...]
> 
> Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
> though we see unexpected 0x0 responses in the response ring buffer
> [1], which we don't see when there's a >~1.5s delay between IGD and
> OFF.

If the previous patch fixed, it means that the switching occurred
during the device was being probed.  Maybe a better approach to
register the VGA switcheroo after the proper initialization.

The patch below is a revised one.  Please give it a try.


Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f09ff6c..dcbfd29 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -501,6 +501,7 @@ struct azx {
 
 	/* VGA-switcheroo setup */
 	unsigned int use_vga_switcheroo:1;
+	unsigned int vga_switcheroo_registered:1;
 	unsigned int init_failed:1; /* delayed init failed */
 	unsigned int disabled:1; /* disabled by VGA-switcher */
 
@@ -2684,14 +2685,20 @@ static const struct vga_switcheroo_client_ops azx_vs_ops = {
 
 static int __devinit register_vga_switcheroo(struct azx *chip)
 {
+	int err;
+
 	if (!chip->use_vga_switcheroo)
 		return 0;
 	/* FIXME: currently only handling DIS controller
 	 * is there any machine with two switchable HDMI audio controllers?
 	 */
-	return vga_switcheroo_register_audio_client(chip->pci, &azx_vs_ops,
+	err = vga_switcheroo_register_audio_client(chip->pci, &azx_vs_ops,
 						    VGA_SWITCHEROO_DIS,
 						    chip->bus != NULL);
+	if (err < 0)
+		return err;
+	chip->vga_switcheroo_registered = 1;
+	return 0;
 }
 #else
 #define init_vga_switcheroo(chip)		/* NOP */
@@ -2713,7 +2720,8 @@ static int azx_free(struct azx *chip)
 	if (use_vga_switcheroo(chip)) {
 		if (chip->disabled && chip->bus)
 			snd_hda_unlock_devices(chip->bus);
-		vga_switcheroo_unregister_client(chip->pci);
+		if (chip->vga_switcheroo_registered)
+			vga_switcheroo_unregister_client(chip->pci);
 	}
 
 	if (chip->initialized) {
@@ -3067,14 +3075,6 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci,
 	}
 
  ok:
-	err = register_vga_switcheroo(chip);
-	if (err < 0) {
-		snd_printk(KERN_ERR SFX
-			   "Error registering VGA-switcheroo client\n");
-		azx_free(chip);
-		return err;
-	}
-
 	err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops);
 	if (err < 0) {
 		snd_printk(KERN_ERR SFX "Error creating device [card]!\n");
@@ -3345,6 +3345,13 @@ static int __devinit azx_probe(struct pci_dev *pci,
 	if (pci_dev_run_wake(pci))
 		pm_runtime_put_noidle(&pci->dev);
 
+	err = register_vga_switcheroo(chip);
+	if (err < 0) {
+		snd_printk(KERN_ERR SFX
+			   "Error registering VGA-switcheroo client\n");
+		goto out_free;
+	}
+
 	dev++;
 	return 0;
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
@ 2012-10-09 10:04       ` Takashi Iwai
  0 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2012-10-09 10:04 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Dave Airlie, alsa-devel, Linux Kernel

At Tue, 9 Oct 2012 00:34:09 +0800,
Daniel J Blueman wrote:
> 
> On 8 October 2012 20:58, Takashi Iwai <tiwai@suse.de> wrote:
> > At Tue, 25 Sep 2012 13:20:05 +0800,
> > Daniel J Blueman wrote:
> >> On my Macbook with a discrete Nvidia GPU, there is a race between
> >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> >> reliably causing a kernel oops [2].
> >>
> >> Introducing a delay of ~1s between the calls prevents this. When the
> >> second 'OFF' write path executes, it looks like struct azx at
> >> card->private_data hasn't yet been allocated yet [3], so there is
> >> likely some locking missing.
> >
> > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> > card->private_data causes Oops).  Could you check the patch like below
> > and see whether you get a kernel warning (but no Oops) or the problem
> > gets fixed by shifting the assignment of pci drvdata?
> [...]
> 
> Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
> though we see unexpected 0x0 responses in the response ring buffer
> [1], which we don't see when there's a >~1.5s delay between IGD and
> OFF.

If the previous patch fixed, it means that the switching occurred
during the device was being probed.  Maybe a better approach to
register the VGA switcheroo after the proper initialization.

The patch below is a revised one.  Please give it a try.


Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f09ff6c..dcbfd29 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -501,6 +501,7 @@ struct azx {
 
 	/* VGA-switcheroo setup */
 	unsigned int use_vga_switcheroo:1;
+	unsigned int vga_switcheroo_registered:1;
 	unsigned int init_failed:1; /* delayed init failed */
 	unsigned int disabled:1; /* disabled by VGA-switcher */
 
@@ -2684,14 +2685,20 @@ static const struct vga_switcheroo_client_ops azx_vs_ops = {
 
 static int __devinit register_vga_switcheroo(struct azx *chip)
 {
+	int err;
+
 	if (!chip->use_vga_switcheroo)
 		return 0;
 	/* FIXME: currently only handling DIS controller
 	 * is there any machine with two switchable HDMI audio controllers?
 	 */
-	return vga_switcheroo_register_audio_client(chip->pci, &azx_vs_ops,
+	err = vga_switcheroo_register_audio_client(chip->pci, &azx_vs_ops,
 						    VGA_SWITCHEROO_DIS,
 						    chip->bus != NULL);
+	if (err < 0)
+		return err;
+	chip->vga_switcheroo_registered = 1;
+	return 0;
 }
 #else
 #define init_vga_switcheroo(chip)		/* NOP */
@@ -2713,7 +2720,8 @@ static int azx_free(struct azx *chip)
 	if (use_vga_switcheroo(chip)) {
 		if (chip->disabled && chip->bus)
 			snd_hda_unlock_devices(chip->bus);
-		vga_switcheroo_unregister_client(chip->pci);
+		if (chip->vga_switcheroo_registered)
+			vga_switcheroo_unregister_client(chip->pci);
 	}
 
 	if (chip->initialized) {
@@ -3067,14 +3075,6 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci,
 	}
 
  ok:
-	err = register_vga_switcheroo(chip);
-	if (err < 0) {
-		snd_printk(KERN_ERR SFX
-			   "Error registering VGA-switcheroo client\n");
-		azx_free(chip);
-		return err;
-	}
-
 	err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops);
 	if (err < 0) {
 		snd_printk(KERN_ERR SFX "Error creating device [card]!\n");
@@ -3345,6 +3345,13 @@ static int __devinit azx_probe(struct pci_dev *pci,
 	if (pci_dev_run_wake(pci))
 		pm_runtime_put_noidle(&pci->dev);
 
+	err = register_vga_switcheroo(chip);
+	if (err < 0) {
+		snd_printk(KERN_ERR SFX
+			   "Error registering VGA-switcheroo client\n");
+		goto out_free;
+	}
+
 	dev++;
 	return 0;
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
  2012-10-09 10:04       ` Takashi Iwai
@ 2012-10-09 10:07         ` Takashi Iwai
  -1 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2012-10-09 10:07 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Dave Airlie, Linux Kernel, alsa-devel

At Tue, 09 Oct 2012 12:04:08 +0200,
Takashi Iwai wrote:
> 
> At Tue, 9 Oct 2012 00:34:09 +0800,
> Daniel J Blueman wrote:
> > 
> > On 8 October 2012 20:58, Takashi Iwai <tiwai@suse.de> wrote:
> > > At Tue, 25 Sep 2012 13:20:05 +0800,
> > > Daniel J Blueman wrote:
> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> > >> reliably causing a kernel oops [2].
> > >>
> > >> Introducing a delay of ~1s between the calls prevents this. When the
> > >> second 'OFF' write path executes, it looks like struct azx at
> > >> card->private_data hasn't yet been allocated yet [3], so there is
> > >> likely some locking missing.
> > >
> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> > > card->private_data causes Oops).  Could you check the patch like below
> > > and see whether you get a kernel warning (but no Oops) or the problem
> > > gets fixed by shifting the assignment of pci drvdata?
> > [...]
> > 
> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
> > though we see unexpected 0x0 responses in the response ring buffer
> > [1], which we don't see when there's a >~1.5s delay between IGD and
> > OFF.
> 
> If the previous patch fixed, it means that the switching occurred
> during the device was being probed.  Maybe a better approach to
> register the VGA switcheroo after the proper initialization.
> 
> The patch below is a revised one.  Please give it a try.

Also, it's not clear which card spews the spurious response.
Apply the patch below in addition.


thanks,

Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f09ff6c..9a0a29d 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -829,8 +829,9 @@ static void azx_update_rirb(struct azx *chip)
 			smp_wmb();
 			chip->rirb.cmds[addr]--;
 		} else
-			snd_printk(KERN_ERR SFX "spurious response %#x:%#x, "
+			snd_printk(KERN_ERR SFX "%s: spurious response %#x:%#x, "
 				   "last cmd=%#08x\n",
+				   pci_name(chip->pci),
 				   res, res_ex,
 				   chip->last_cmd[addr]);
 	}

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
@ 2012-10-09 10:07         ` Takashi Iwai
  0 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2012-10-09 10:07 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Dave Airlie, alsa-devel, Linux Kernel

At Tue, 09 Oct 2012 12:04:08 +0200,
Takashi Iwai wrote:
> 
> At Tue, 9 Oct 2012 00:34:09 +0800,
> Daniel J Blueman wrote:
> > 
> > On 8 October 2012 20:58, Takashi Iwai <tiwai@suse.de> wrote:
> > > At Tue, 25 Sep 2012 13:20:05 +0800,
> > > Daniel J Blueman wrote:
> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> > >> reliably causing a kernel oops [2].
> > >>
> > >> Introducing a delay of ~1s between the calls prevents this. When the
> > >> second 'OFF' write path executes, it looks like struct azx at
> > >> card->private_data hasn't yet been allocated yet [3], so there is
> > >> likely some locking missing.
> > >
> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> > > card->private_data causes Oops).  Could you check the patch like below
> > > and see whether you get a kernel warning (but no Oops) or the problem
> > > gets fixed by shifting the assignment of pci drvdata?
> > [...]
> > 
> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
> > though we see unexpected 0x0 responses in the response ring buffer
> > [1], which we don't see when there's a >~1.5s delay between IGD and
> > OFF.
> 
> If the previous patch fixed, it means that the switching occurred
> during the device was being probed.  Maybe a better approach to
> register the VGA switcheroo after the proper initialization.
> 
> The patch below is a revised one.  Please give it a try.

Also, it's not clear which card spews the spurious response.
Apply the patch below in addition.


thanks,

Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f09ff6c..9a0a29d 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -829,8 +829,9 @@ static void azx_update_rirb(struct azx *chip)
 			smp_wmb();
 			chip->rirb.cmds[addr]--;
 		} else
-			snd_printk(KERN_ERR SFX "spurious response %#x:%#x, "
+			snd_printk(KERN_ERR SFX "%s: spurious response %#x:%#x, "
 				   "last cmd=%#08x\n",
+				   pci_name(chip->pci),
 				   res, res_ex,
 				   chip->last_cmd[addr]);
 	}

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
  2012-10-09 10:07         ` Takashi Iwai
  (?)
@ 2012-10-09 11:23         ` Daniel J Blueman
  2012-10-09 13:04             ` Takashi Iwai
  -1 siblings, 1 reply; 17+ messages in thread
From: Daniel J Blueman @ 2012-10-09 11:23 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Dave Airlie, Linux Kernel, alsa-devel

On 9 October 2012 18:07, Takashi Iwai <tiwai@suse.de> wrote:
> At Tue, 09 Oct 2012 12:04:08 +0200,
> Takashi Iwai wrote:
>>
>> At Tue, 9 Oct 2012 00:34:09 +0800,
>> Daniel J Blueman wrote:
>> >
>> > On 8 October 2012 20:58, Takashi Iwai <tiwai@suse.de> wrote:
>> > > At Tue, 25 Sep 2012 13:20:05 +0800,
>> > > Daniel J Blueman wrote:
>> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
>> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
>> > >> reliably causing a kernel oops [2].
>> > >>
>> > >> Introducing a delay of ~1s between the calls prevents this. When the
>> > >> second 'OFF' write path executes, it looks like struct azx at
>> > >> card->private_data hasn't yet been allocated yet [3], so there is
>> > >> likely some locking missing.
>> > >
>> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
>> > > card->private_data causes Oops).  Could you check the patch like below
>> > > and see whether you get a kernel warning (but no Oops) or the problem
>> > > gets fixed by shifting the assignment of pci drvdata?
>> > [...]
>> >
>> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
>> > though we see unexpected 0x0 responses in the response ring buffer
>> > [1], which we don't see when there's a >~1.5s delay between IGD and
>> > OFF.
>>
>> If the previous patch fixed, it means that the switching occurred
>> during the device was being probed.  Maybe a better approach to
>> register the VGA switcheroo after the proper initialization.
>>
>> The patch below is a revised one.  Please give it a try.
>
> Also, it's not clear which card spews the spurious response.
> Apply the patch below in addition.
[...]

hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
$ lspci -s :1:0.1
01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)

It's the NVIDIA device which presumably hasn't completed it's
transition to D3 at the time the OFF is executed.

Thanks,
  Daniel
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
  2012-10-09 11:23         ` Daniel J Blueman
@ 2012-10-09 13:04             ` Takashi Iwai
  0 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2012-10-09 13:04 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Dave Airlie, Linux Kernel, alsa-devel

At Tue, 9 Oct 2012 19:23:56 +0800,
Daniel J Blueman wrote:
> 
> On 9 October 2012 18:07, Takashi Iwai <tiwai@suse.de> wrote:
> > At Tue, 09 Oct 2012 12:04:08 +0200,
> > Takashi Iwai wrote:
> >>
> >> At Tue, 9 Oct 2012 00:34:09 +0800,
> >> Daniel J Blueman wrote:
> >> >
> >> > On 8 October 2012 20:58, Takashi Iwai <tiwai@suse.de> wrote:
> >> > > At Tue, 25 Sep 2012 13:20:05 +0800,
> >> > > Daniel J Blueman wrote:
> >> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
> >> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> >> > >> reliably causing a kernel oops [2].
> >> > >>
> >> > >> Introducing a delay of ~1s between the calls prevents this. When the
> >> > >> second 'OFF' write path executes, it looks like struct azx at
> >> > >> card->private_data hasn't yet been allocated yet [3], so there is
> >> > >> likely some locking missing.
> >> > >
> >> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> >> > > card->private_data causes Oops).  Could you check the patch like below
> >> > > and see whether you get a kernel warning (but no Oops) or the problem
> >> > > gets fixed by shifting the assignment of pci drvdata?
> >> > [...]
> >> >
> >> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
> >> > though we see unexpected 0x0 responses in the response ring buffer
> >> > [1], which we don't see when there's a >~1.5s delay between IGD and
> >> > OFF.
> >>
> >> If the previous patch fixed, it means that the switching occurred
> >> during the device was being probed.  Maybe a better approach to
> >> register the VGA switcheroo after the proper initialization.
> >>
> >> The patch below is a revised one.  Please give it a try.
> >
> > Also, it's not clear which card spews the spurious response.
> > Apply the patch below in addition.
> [...]
> 
> hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
> $ lspci -s :1:0.1
> 01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)
> 
> It's the NVIDIA device which presumably hasn't completed it's
> transition to D3 at the time the OFF is executed.

OK, then could you try the patch below on the top of previous two
patches?


Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f09ff6c..676c64d 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2639,13 +2640,13 @@ static void azx_vs_set_state(struct pci_dev *pci,
 			   disabled ? "Disabling" : "Enabling",
 			   pci_name(chip->pci));
 		if (disabled) {
+			snd_hda_lock_devices(chip->bus);
 			azx_suspend(&pci->dev);
 			chip->disabled = true;
-			snd_hda_lock_devices(chip->bus);
 		} else {
-			snd_hda_unlock_devices(chip->bus);
 			chip->disabled = false;
 			azx_resume(&pci->dev);
+			snd_hda_unlock_devices(chip->bus);
 		}
 	}
 }

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
@ 2012-10-09 13:04             ` Takashi Iwai
  0 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2012-10-09 13:04 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Dave Airlie, alsa-devel, Linux Kernel

At Tue, 9 Oct 2012 19:23:56 +0800,
Daniel J Blueman wrote:
> 
> On 9 October 2012 18:07, Takashi Iwai <tiwai@suse.de> wrote:
> > At Tue, 09 Oct 2012 12:04:08 +0200,
> > Takashi Iwai wrote:
> >>
> >> At Tue, 9 Oct 2012 00:34:09 +0800,
> >> Daniel J Blueman wrote:
> >> >
> >> > On 8 October 2012 20:58, Takashi Iwai <tiwai@suse.de> wrote:
> >> > > At Tue, 25 Sep 2012 13:20:05 +0800,
> >> > > Daniel J Blueman wrote:
> >> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
> >> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> >> > >> reliably causing a kernel oops [2].
> >> > >>
> >> > >> Introducing a delay of ~1s between the calls prevents this. When the
> >> > >> second 'OFF' write path executes, it looks like struct azx at
> >> > >> card->private_data hasn't yet been allocated yet [3], so there is
> >> > >> likely some locking missing.
> >> > >
> >> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> >> > > card->private_data causes Oops).  Could you check the patch like below
> >> > > and see whether you get a kernel warning (but no Oops) or the problem
> >> > > gets fixed by shifting the assignment of pci drvdata?
> >> > [...]
> >> >
> >> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
> >> > though we see unexpected 0x0 responses in the response ring buffer
> >> > [1], which we don't see when there's a >~1.5s delay between IGD and
> >> > OFF.
> >>
> >> If the previous patch fixed, it means that the switching occurred
> >> during the device was being probed.  Maybe a better approach to
> >> register the VGA switcheroo after the proper initialization.
> >>
> >> The patch below is a revised one.  Please give it a try.
> >
> > Also, it's not clear which card spews the spurious response.
> > Apply the patch below in addition.
> [...]
> 
> hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
> $ lspci -s :1:0.1
> 01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)
> 
> It's the NVIDIA device which presumably hasn't completed it's
> transition to D3 at the time the OFF is executed.

OK, then could you try the patch below on the top of previous two
patches?


Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f09ff6c..676c64d 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2639,13 +2640,13 @@ static void azx_vs_set_state(struct pci_dev *pci,
 			   disabled ? "Disabling" : "Enabling",
 			   pci_name(chip->pci));
 		if (disabled) {
+			snd_hda_lock_devices(chip->bus);
 			azx_suspend(&pci->dev);
 			chip->disabled = true;
-			snd_hda_lock_devices(chip->bus);
 		} else {
-			snd_hda_unlock_devices(chip->bus);
 			chip->disabled = false;
 			azx_resume(&pci->dev);
+			snd_hda_unlock_devices(chip->bus);
 		}
 	}
 }

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
  2012-10-09 13:04             ` Takashi Iwai
  (?)
@ 2012-10-09 14:26             ` Daniel J Blueman
  2012-10-10 12:34               ` Takashi Iwai
  -1 siblings, 1 reply; 17+ messages in thread
From: Daniel J Blueman @ 2012-10-09 14:26 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Dave Airlie, Linux Kernel, alsa-devel

On 9 October 2012 21:04, Takashi Iwai <tiwai@suse.de> wrote:
> At Tue, 9 Oct 2012 19:23:56 +0800,
> Daniel J Blueman wrote:
>> On 9 October 2012 18:07, Takashi Iwai <tiwai@suse.de> wrote:
>> > At Tue, 09 Oct 2012 12:04:08 +0200,
>> > Takashi Iwai wrote:
>> >> At Tue, 9 Oct 2012 00:34:09 +0800,
>> >> Daniel J Blueman wrote:
>> >> > On 8 October 2012 20:58, Takashi Iwai <tiwai@suse.de> wrote:
>> >> > > At Tue, 25 Sep 2012 13:20:05 +0800,
>> >> > > Daniel J Blueman wrote:
>> >> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
>> >> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
>> >> > >> reliably causing a kernel oops [2].
>> >> > >>
>> >> > >> Introducing a delay of ~1s between the calls prevents this. When the
>> >> > >> second 'OFF' write path executes, it looks like struct azx at
>> >> > >> card->private_data hasn't yet been allocated yet [3], so there is
>> >> > >> likely some locking missing.
>> >> > >
>> >> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
>> >> > > card->private_data causes Oops).  Could you check the patch like below
>> >> > > and see whether you get a kernel warning (but no Oops) or the problem
>> >> > > gets fixed by shifting the assignment of pci drvdata?
>> >> > [...]
>> >> >
>> >> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
>> >> > though we see unexpected 0x0 responses in the response ring buffer
>> >> > [1], which we don't see when there's a >~1.5s delay between IGD and
>> >> > OFF.
>> >>
>> >> If the previous patch fixed, it means that the switching occurred
>> >> during the device was being probed.  Maybe a better approach to
>> >> register the VGA switcheroo after the proper initialization.
>> >>
>> >> The patch below is a revised one.  Please give it a try.
>> >
>> > Also, it's not clear which card spews the spurious response.
>> > Apply the patch below in addition.
>> [...]
>>
>> hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
>> $ lspci -s :1:0.1
>> 01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)
>>
>> It's the NVIDIA device which presumably hasn't completed it's
>> transition to D3 at the time the OFF is executed.
>
> OK, then could you try the patch below on the top of previous two
> patches?

The first IGD switcheroo command fails to switch to the integrated GPU:

# cat /sys/kernel/debug/vgaswitcheroo/switch
0:DIS:+:Pwr:0000:01:00.0
1:IGD: :Pwr:0000:00:02.0
2:DIS-Audio: :Pwr:0000:01:00.1
# echo IGD >/sys/kernel/debug/vgaswitcheroo/switch
vga_switcheroo: client 1 refused switch

I also instrumented snd_hda_lock_devices, but none of the failure
paths are being taken, which would leave inconsistent state, as the
return value isn't checked.

Thanks,
  Daniel
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
  2012-10-09 14:26             ` Daniel J Blueman
@ 2012-10-10 12:34               ` Takashi Iwai
  2012-10-12 15:08                 ` Daniel J Blueman
  0 siblings, 1 reply; 17+ messages in thread
From: Takashi Iwai @ 2012-10-10 12:34 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Dave Airlie, Linux Kernel, alsa-devel

At Tue, 9 Oct 2012 22:26:40 +0800,
Daniel J Blueman wrote:
> 
> On 9 October 2012 21:04, Takashi Iwai <tiwai@suse.de> wrote:
> > At Tue, 9 Oct 2012 19:23:56 +0800,
> > Daniel J Blueman wrote:
> >> On 9 October 2012 18:07, Takashi Iwai <tiwai@suse.de> wrote:
> >> > At Tue, 09 Oct 2012 12:04:08 +0200,
> >> > Takashi Iwai wrote:
> >> >> At Tue, 9 Oct 2012 00:34:09 +0800,
> >> >> Daniel J Blueman wrote:
> >> >> > On 8 October 2012 20:58, Takashi Iwai <tiwai@suse.de> wrote:
> >> >> > > At Tue, 25 Sep 2012 13:20:05 +0800,
> >> >> > > Daniel J Blueman wrote:
> >> >> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
> >> >> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> >> >> > >> reliably causing a kernel oops [2].
> >> >> > >>
> >> >> > >> Introducing a delay of ~1s between the calls prevents this. When the
> >> >> > >> second 'OFF' write path executes, it looks like struct azx at
> >> >> > >> card->private_data hasn't yet been allocated yet [3], so there is
> >> >> > >> likely some locking missing.
> >> >> > >
> >> >> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> >> >> > > card->private_data causes Oops).  Could you check the patch like below
> >> >> > > and see whether you get a kernel warning (but no Oops) or the problem
> >> >> > > gets fixed by shifting the assignment of pci drvdata?
> >> >> > [...]
> >> >> >
> >> >> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
> >> >> > though we see unexpected 0x0 responses in the response ring buffer
> >> >> > [1], which we don't see when there's a >~1.5s delay between IGD and
> >> >> > OFF.
> >> >>
> >> >> If the previous patch fixed, it means that the switching occurred
> >> >> during the device was being probed.  Maybe a better approach to
> >> >> register the VGA switcheroo after the proper initialization.
> >> >>
> >> >> The patch below is a revised one.  Please give it a try.
> >> >
> >> > Also, it's not clear which card spews the spurious response.
> >> > Apply the patch below in addition.
> >> [...]
> >>
> >> hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
> >> $ lspci -s :1:0.1
> >> 01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)
> >>
> >> It's the NVIDIA device which presumably hasn't completed it's
> >> transition to D3 at the time the OFF is executed.
> >
> > OK, then could you try the patch below on the top of previous two
> > patches?
> 
> The first IGD switcheroo command fails to switch to the integrated GPU:
> 
> # cat /sys/kernel/debug/vgaswitcheroo/switch
> 0:DIS:+:Pwr:0000:01:00.0
> 1:IGD: :Pwr:0000:00:02.0
> 2:DIS-Audio: :Pwr:0000:01:00.1
> # echo IGD >/sys/kernel/debug/vgaswitcheroo/switch
> vga_switcheroo: client 1 refused switch
> 
> I also instrumented snd_hda_lock_devices, but none of the failure
> paths are being taken, which would leave inconsistent state, as the
> return value isn't checked.

Hm, right, the return value of snd_hda_lock_devices() isn't checked,
but I don't understand how this results like above.
Basically switching is protected by mutex in vga_switcheroo.c, so the
whole operation in the client side should be serialized.

In anyway, try the patch below cleanly, and see the spurious message
error coming up at which timing.


thanks,

Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 6833835..4518b05 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -501,6 +501,7 @@ struct azx {
 
 	/* VGA-switcheroo setup */
 	unsigned int use_vga_switcheroo:1;
+	unsigned int vga_switcheroo_registered:1;
 	unsigned int init_failed:1; /* delayed init failed */
 	unsigned int disabled:1; /* disabled by VGA-switcher */
 
@@ -1597,6 +1598,8 @@ static int DELAYED_INIT_MARK azx_codec_create(struct azx *chip, const char *mode
 	int c, codecs, err;
 	int max_slots;
 
+	printk(KERN_DEBUG "XXX %s: azx_codec_create entered\n",
+	       pci_name(chip->pci));
 	memset(&bus_temp, 0, sizeof(bus_temp));
 	bus_temp.private_data = chip;
 	bus_temp.modelname = model;
@@ -1673,6 +1676,8 @@ static int DELAYED_INIT_MARK azx_codec_create(struct azx *chip, const char *mode
 		snd_printk(KERN_ERR SFX "no codecs initialized\n");
 		return -ENXIO;
 	}
+	printk(KERN_DEBUG "XXX %s: azx_codec_create done\n",
+	       pci_name(chip->pci));
 	return 0;
 }
 
@@ -2615,6 +2620,8 @@ static void azx_vs_set_state(struct pci_dev *pci,
 		return;
 
 	disabled = (state == VGA_SWITCHEROO_OFF);
+	printk(KERN_DEBUG "XXX %s: chip->disabled=%d, disabled=%d\n",
+	       pci_name(chip->pci), chip->disabled, disabled);
 	if (chip->disabled == disabled)
 		return;
 
@@ -2638,14 +2645,26 @@ static void azx_vs_set_state(struct pci_dev *pci,
 			   disabled ? "Disabling" : "Enabling",
 			   pci_name(chip->pci));
 		if (disabled) {
+			printk(KERN_DEBUG "XXX %s: azx_suspend...\n",
+			       pci_name(chip->pci));
 			azx_suspend(&pci->dev);
+			printk(KERN_DEBUG "XXX %s: azx_suspend done...\n",
+			       pci_name(chip->pci));
 			chip->disabled = true;
-			snd_hda_lock_devices(chip->bus);
+			if (snd_hda_lock_devices(chip->bus))
+				snd_printk(KERN_WARNING SFX
+					   "Cannot lock devices!\n");
 		} else {
 			snd_hda_unlock_devices(chip->bus);
 			chip->disabled = false;
+			printk(KERN_DEBUG "XXX %s: azx_resume...\n",
+			       pci_name(chip->pci));
 			azx_resume(&pci->dev);
+			printk(KERN_DEBUG "XXX %s: azx_resume done...\n",
+			       pci_name(chip->pci));
 		}
+		printk(KERN_DEBUG "XXX %s: switching finished: disabled=%d\n",
+		       pci_name(chip->pci), chip->disabled);
 	}
 }
 
@@ -2683,14 +2702,20 @@ static const struct vga_switcheroo_client_ops azx_vs_ops = {
 
 static int __devinit register_vga_switcheroo(struct azx *chip)
 {
+	int err;
+
 	if (!chip->use_vga_switcheroo)
 		return 0;
 	/* FIXME: currently only handling DIS controller
 	 * is there any machine with two switchable HDMI audio controllers?
 	 */
-	return vga_switcheroo_register_audio_client(chip->pci, &azx_vs_ops,
+	err = vga_switcheroo_register_audio_client(chip->pci, &azx_vs_ops,
 						    VGA_SWITCHEROO_DIS,
 						    chip->bus != NULL);
+	if (err < 0)
+		return err;
+	chip->vga_switcheroo_registered = 1;
+	return 0;
 }
 #else
 #define init_vga_switcheroo(chip)		/* NOP */
@@ -2712,7 +2737,8 @@ static int azx_free(struct azx *chip)
 	if (use_vga_switcheroo(chip)) {
 		if (chip->disabled && chip->bus)
 			snd_hda_unlock_devices(chip->bus);
-		vga_switcheroo_unregister_client(chip->pci);
+		if (chip->vga_switcheroo_registered)
+			vga_switcheroo_unregister_client(chip->pci);
 	}
 
 	if (chip->initialized) {
@@ -3062,14 +3088,6 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci,
 	}
 
  ok:
-	err = register_vga_switcheroo(chip);
-	if (err < 0) {
-		snd_printk(KERN_ERR SFX
-			   "Error registering VGA-switcheroo client\n");
-		azx_free(chip);
-		return err;
-	}
-
 	err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops);
 	if (err < 0) {
 		snd_printk(KERN_ERR SFX "Error creating device [card]!\n");
@@ -3340,6 +3358,13 @@ static int __devinit azx_probe(struct pci_dev *pci,
 	if (pci_dev_run_wake(pci))
 		pm_runtime_put_noidle(&pci->dev);
 
+	err = register_vga_switcheroo(chip);
+	if (err < 0) {
+		snd_printk(KERN_ERR SFX
+			   "Error registering VGA-switcheroo client\n");
+		goto out_free;
+	}
+
 	dev++;
 	return 0;
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
  2012-10-10 12:34               ` Takashi Iwai
@ 2012-10-12 15:08                 ` Daniel J Blueman
  2012-10-12 15:26                     ` Takashi Iwai
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel J Blueman @ 2012-10-12 15:08 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Dave Airlie, Linux Kernel, alsa-devel

On 10 October 2012 20:34, Takashi Iwai <tiwai@suse.de> wrote:
> At Tue, 9 Oct 2012 22:26:40 +0800,
> Daniel J Blueman wrote:
>> On 9 October 2012 21:04, Takashi Iwai <tiwai@suse.de> wrote:
>> > At Tue, 9 Oct 2012 19:23:56 +0800,
>> > Daniel J Blueman wrote:
>> >> On 9 October 2012 18:07, Takashi Iwai <tiwai@suse.de> wrote:
>> >> > At Tue, 09 Oct 2012 12:04:08 +0200,
>> >> > Takashi Iwai wrote:
>> >> >> At Tue, 9 Oct 2012 00:34:09 +0800,
>> >> >> Daniel J Blueman wrote:
>> >> >> > On 8 October 2012 20:58, Takashi Iwai <tiwai@suse.de> wrote:
>> >> >> > > At Tue, 25 Sep 2012 13:20:05 +0800,
>> >> >> > > Daniel J Blueman wrote:
>> >> >> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
>> >> >> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
>> >> >> > >> reliably causing a kernel oops [2].
>> >> >> > >>
>> >> >> > >> Introducing a delay of ~1s between the calls prevents this. When the
>> >> >> > >> second 'OFF' write path executes, it looks like struct azx at
>> >> >> > >> card->private_data hasn't yet been allocated yet [3], so there is
>> >> >> > >> likely some locking missing.
>> >> >> > >
>> >> >> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
>> >> >> > > card->private_data causes Oops).  Could you check the patch like below
>> >> >> > > and see whether you get a kernel warning (but no Oops) or the problem
>> >> >> > > gets fixed by shifting the assignment of pci drvdata?
>> >> >> > [...]
>> >> >> >
>> >> >> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
>> >> >> > though we see unexpected 0x0 responses in the response ring buffer
>> >> >> > [1], which we don't see when there's a >~1.5s delay between IGD and
>> >> >> > OFF.
>> >> >>
>> >> >> If the previous patch fixed, it means that the switching occurred
>> >> >> during the device was being probed.  Maybe a better approach to
>> >> >> register the VGA switcheroo after the proper initialization.
>> >> >>
>> >> >> The patch below is a revised one.  Please give it a try.
>> >> >
>> >> > Also, it's not clear which card spews the spurious response.
>> >> > Apply the patch below in addition.
>> >> [...]
>> >>
>> >> hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
>> >> $ lspci -s :1:0.1
>> >> 01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)
>> >>
>> >> It's the NVIDIA device which presumably hasn't completed it's
>> >> transition to D3 at the time the OFF is executed.
>> >
>> > OK, then could you try the patch below on the top of previous two
>> > patches?
>>
>> The first IGD switcheroo command fails to switch to the integrated GPU:
>>
>> # cat /sys/kernel/debug/vgaswitcheroo/switch
>> 0:DIS:+:Pwr:0000:01:00.0
>> 1:IGD: :Pwr:0000:00:02.0
>> 2:DIS-Audio: :Pwr:0000:01:00.1
>> # echo IGD >/sys/kernel/debug/vgaswitcheroo/switch
>> vga_switcheroo: client 1 refused switch
>>
>> I also instrumented snd_hda_lock_devices, but none of the failure
>> paths are being taken, which would leave inconsistent state, as the
>> return value isn't checked.
>
> Hm, right, the return value of snd_hda_lock_devices() isn't checked,
> but I don't understand how this results like above.
> Basically switching is protected by mutex in vga_switcheroo.c, so the
> whole operation in the client side should be serialized.
>
> In anyway, try the patch below cleanly, and see the spurious message
> error coming up at which timing.
[...]

The patch _does_ address the issue. A recent update to my Macbook
firmware misleadingly broke i915 switching, but since I can reproduce
the oops without the IGD switching completing with the stock kernel,
and consistently can't without [1], the patch is good.

Tested-by: Daniel J Blueman <daniel@quora.org>

Thanks Takashi!
  Daniel

--- [1]

snd_hda_intel 0000:00:1b.0: enabling device (0000 -> 0002)
snd_hda_intel 0000:00:1b.0: irq 54 for MSI/MSI-X
XXX 0000:00:1b.0: azx_codec_create entered
vga_switcheroo: enabled
XXX 0000:00:1b.0: azx_codec_create done
input: HDA Intel PCH Headphone as
/devices/pci0000:00/0000:00:1b.0/sound/card0/input9
snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
hda_intel: Disabling MSI
hda-intel: 0000:01:00.1: Handle VGA-switcheroo audio client
XXX 0000:01:00.1: azx_codec_create entered
XXX 0000:01:00.1: azx_codec_create done
input: HDA NVidia HDMI/DP,pcm=8 as
/devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input12
input: HDA NVidia HDMI/DP,pcm=7 as
/devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input13
input: HDA NVidia HDMI/DP,pcm=3 as
/devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input14
vga_switcheroo: client 1 refused switch
i915: switched off
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
  2012-10-12 15:08                 ` Daniel J Blueman
@ 2012-10-12 15:26                     ` Takashi Iwai
  0 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2012-10-12 15:26 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Dave Airlie, Linux Kernel, alsa-devel

At Fri, 12 Oct 2012 23:08:14 +0800,
Daniel J Blueman wrote:
> 
> On 10 October 2012 20:34, Takashi Iwai <tiwai@suse.de> wrote:
> > At Tue, 9 Oct 2012 22:26:40 +0800,
> > Daniel J Blueman wrote:
> >> On 9 October 2012 21:04, Takashi Iwai <tiwai@suse.de> wrote:
> >> > At Tue, 9 Oct 2012 19:23:56 +0800,
> >> > Daniel J Blueman wrote:
> >> >> On 9 October 2012 18:07, Takashi Iwai <tiwai@suse.de> wrote:
> >> >> > At Tue, 09 Oct 2012 12:04:08 +0200,
> >> >> > Takashi Iwai wrote:
> >> >> >> At Tue, 9 Oct 2012 00:34:09 +0800,
> >> >> >> Daniel J Blueman wrote:
> >> >> >> > On 8 October 2012 20:58, Takashi Iwai <tiwai@suse.de> wrote:
> >> >> >> > > At Tue, 25 Sep 2012 13:20:05 +0800,
> >> >> >> > > Daniel J Blueman wrote:
> >> >> >> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
> >> >> >> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> >> >> >> > >> reliably causing a kernel oops [2].
> >> >> >> > >>
> >> >> >> > >> Introducing a delay of ~1s between the calls prevents this. When the
> >> >> >> > >> second 'OFF' write path executes, it looks like struct azx at
> >> >> >> > >> card->private_data hasn't yet been allocated yet [3], so there is
> >> >> >> > >> likely some locking missing.
> >> >> >> > >
> >> >> >> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> >> >> >> > > card->private_data causes Oops).  Could you check the patch like below
> >> >> >> > > and see whether you get a kernel warning (but no Oops) or the problem
> >> >> >> > > gets fixed by shifting the assignment of pci drvdata?
> >> >> >> > [...]
> >> >> >> >
> >> >> >> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
> >> >> >> > though we see unexpected 0x0 responses in the response ring buffer
> >> >> >> > [1], which we don't see when there's a >~1.5s delay between IGD and
> >> >> >> > OFF.
> >> >> >>
> >> >> >> If the previous patch fixed, it means that the switching occurred
> >> >> >> during the device was being probed.  Maybe a better approach to
> >> >> >> register the VGA switcheroo after the proper initialization.
> >> >> >>
> >> >> >> The patch below is a revised one.  Please give it a try.
> >> >> >
> >> >> > Also, it's not clear which card spews the spurious response.
> >> >> > Apply the patch below in addition.
> >> >> [...]
> >> >>
> >> >> hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
> >> >> $ lspci -s :1:0.1
> >> >> 01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)
> >> >>
> >> >> It's the NVIDIA device which presumably hasn't completed it's
> >> >> transition to D3 at the time the OFF is executed.
> >> >
> >> > OK, then could you try the patch below on the top of previous two
> >> > patches?
> >>
> >> The first IGD switcheroo command fails to switch to the integrated GPU:
> >>
> >> # cat /sys/kernel/debug/vgaswitcheroo/switch
> >> 0:DIS:+:Pwr:0000:01:00.0
> >> 1:IGD: :Pwr:0000:00:02.0
> >> 2:DIS-Audio: :Pwr:0000:01:00.1
> >> # echo IGD >/sys/kernel/debug/vgaswitcheroo/switch
> >> vga_switcheroo: client 1 refused switch
> >>
> >> I also instrumented snd_hda_lock_devices, but none of the failure
> >> paths are being taken, which would leave inconsistent state, as the
> >> return value isn't checked.
> >
> > Hm, right, the return value of snd_hda_lock_devices() isn't checked,
> > but I don't understand how this results like above.
> > Basically switching is protected by mutex in vga_switcheroo.c, so the
> > whole operation in the client side should be serialized.
> >
> > In anyway, try the patch below cleanly, and see the spurious message
> > error coming up at which timing.
> [...]
> 
> The patch _does_ address the issue. A recent update to my Macbook
> firmware misleadingly broke i915 switching, but since I can reproduce
> the oops without the IGD switching completing with the stock kernel,
> and consistently can't without [1], the patch is good.
> 
> Tested-by: Daniel J Blueman <daniel@quora.org>

OK, then I'm going to apply the patch to 3.7 tree (with Cc to
stable).


thanks,

Takashi


> 
> Thanks Takashi!
>   Daniel
> 
> --- [1]
> 
> snd_hda_intel 0000:00:1b.0: enabling device (0000 -> 0002)
> snd_hda_intel 0000:00:1b.0: irq 54 for MSI/MSI-X
> XXX 0000:00:1b.0: azx_codec_create entered
> vga_switcheroo: enabled
> XXX 0000:00:1b.0: azx_codec_create done
> input: HDA Intel PCH Headphone as
> /devices/pci0000:00/0000:00:1b.0/sound/card0/input9
> snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
> hda_intel: Disabling MSI
> hda-intel: 0000:01:00.1: Handle VGA-switcheroo audio client
> XXX 0000:01:00.1: azx_codec_create entered
> XXX 0000:01:00.1: azx_codec_create done
> input: HDA NVidia HDMI/DP,pcm=8 as
> /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input12
> input: HDA NVidia HDMI/DP,pcm=7 as
> /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input13
> input: HDA NVidia HDMI/DP,pcm=3 as
> /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input14
> vga_switcheroo: client 1 refused switch
> i915: switched off
> -- 
> Daniel J Blueman
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [3.6-rc7] switcheroo race with Intel HDA...
@ 2012-10-12 15:26                     ` Takashi Iwai
  0 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2012-10-12 15:26 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Dave Airlie, alsa-devel, Linux Kernel

At Fri, 12 Oct 2012 23:08:14 +0800,
Daniel J Blueman wrote:
> 
> On 10 October 2012 20:34, Takashi Iwai <tiwai@suse.de> wrote:
> > At Tue, 9 Oct 2012 22:26:40 +0800,
> > Daniel J Blueman wrote:
> >> On 9 October 2012 21:04, Takashi Iwai <tiwai@suse.de> wrote:
> >> > At Tue, 9 Oct 2012 19:23:56 +0800,
> >> > Daniel J Blueman wrote:
> >> >> On 9 October 2012 18:07, Takashi Iwai <tiwai@suse.de> wrote:
> >> >> > At Tue, 09 Oct 2012 12:04:08 +0200,
> >> >> > Takashi Iwai wrote:
> >> >> >> At Tue, 9 Oct 2012 00:34:09 +0800,
> >> >> >> Daniel J Blueman wrote:
> >> >> >> > On 8 October 2012 20:58, Takashi Iwai <tiwai@suse.de> wrote:
> >> >> >> > > At Tue, 25 Sep 2012 13:20:05 +0800,
> >> >> >> > > Daniel J Blueman wrote:
> >> >> >> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
> >> >> >> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
> >> >> >> > >> reliably causing a kernel oops [2].
> >> >> >> > >>
> >> >> >> > >> Introducing a delay of ~1s between the calls prevents this. When the
> >> >> >> > >> second 'OFF' write path executes, it looks like struct azx at
> >> >> >> > >> card->private_data hasn't yet been allocated yet [3], so there is
> >> >> >> > >> likely some locking missing.
> >> >> >> > >
> >> >> >> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
> >> >> >> > > card->private_data causes Oops).  Could you check the patch like below
> >> >> >> > > and see whether you get a kernel warning (but no Oops) or the problem
> >> >> >> > > gets fixed by shifting the assignment of pci drvdata?
> >> >> >> > [...]
> >> >> >> >
> >> >> >> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
> >> >> >> > though we see unexpected 0x0 responses in the response ring buffer
> >> >> >> > [1], which we don't see when there's a >~1.5s delay between IGD and
> >> >> >> > OFF.
> >> >> >>
> >> >> >> If the previous patch fixed, it means that the switching occurred
> >> >> >> during the device was being probed.  Maybe a better approach to
> >> >> >> register the VGA switcheroo after the proper initialization.
> >> >> >>
> >> >> >> The patch below is a revised one.  Please give it a try.
> >> >> >
> >> >> > Also, it's not clear which card spews the spurious response.
> >> >> > Apply the patch below in addition.
> >> >> [...]
> >> >>
> >> >> hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
> >> >> $ lspci -s :1:0.1
> >> >> 01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)
> >> >>
> >> >> It's the NVIDIA device which presumably hasn't completed it's
> >> >> transition to D3 at the time the OFF is executed.
> >> >
> >> > OK, then could you try the patch below on the top of previous two
> >> > patches?
> >>
> >> The first IGD switcheroo command fails to switch to the integrated GPU:
> >>
> >> # cat /sys/kernel/debug/vgaswitcheroo/switch
> >> 0:DIS:+:Pwr:0000:01:00.0
> >> 1:IGD: :Pwr:0000:00:02.0
> >> 2:DIS-Audio: :Pwr:0000:01:00.1
> >> # echo IGD >/sys/kernel/debug/vgaswitcheroo/switch
> >> vga_switcheroo: client 1 refused switch
> >>
> >> I also instrumented snd_hda_lock_devices, but none of the failure
> >> paths are being taken, which would leave inconsistent state, as the
> >> return value isn't checked.
> >
> > Hm, right, the return value of snd_hda_lock_devices() isn't checked,
> > but I don't understand how this results like above.
> > Basically switching is protected by mutex in vga_switcheroo.c, so the
> > whole operation in the client side should be serialized.
> >
> > In anyway, try the patch below cleanly, and see the spurious message
> > error coming up at which timing.
> [...]
> 
> The patch _does_ address the issue. A recent update to my Macbook
> firmware misleadingly broke i915 switching, but since I can reproduce
> the oops without the IGD switching completing with the stock kernel,
> and consistently can't without [1], the patch is good.
> 
> Tested-by: Daniel J Blueman <daniel@quora.org>

OK, then I'm going to apply the patch to 3.7 tree (with Cc to
stable).


thanks,

Takashi


> 
> Thanks Takashi!
>   Daniel
> 
> --- [1]
> 
> snd_hda_intel 0000:00:1b.0: enabling device (0000 -> 0002)
> snd_hda_intel 0000:00:1b.0: irq 54 for MSI/MSI-X
> XXX 0000:00:1b.0: azx_codec_create entered
> vga_switcheroo: enabled
> XXX 0000:00:1b.0: azx_codec_create done
> input: HDA Intel PCH Headphone as
> /devices/pci0000:00/0000:00:1b.0/sound/card0/input9
> snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
> hda_intel: Disabling MSI
> hda-intel: 0000:01:00.1: Handle VGA-switcheroo audio client
> XXX 0000:01:00.1: azx_codec_create entered
> XXX 0000:01:00.1: azx_codec_create done
> input: HDA NVidia HDMI/DP,pcm=8 as
> /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input12
> input: HDA NVidia HDMI/DP,pcm=7 as
> /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input13
> input: HDA NVidia HDMI/DP,pcm=3 as
> /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input14
> vga_switcheroo: client 1 refused switch
> i915: switched off
> -- 
> Daniel J Blueman
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2012-10-12 15:26 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-25  5:20 [3.6-rc7] switcheroo race with Intel HDA Daniel J Blueman
2012-09-25  5:20 ` Daniel J Blueman
2012-10-08 12:58 ` Takashi Iwai
2012-10-08 12:58   ` Takashi Iwai
2012-10-08 16:34   ` Daniel J Blueman
2012-10-09 10:04     ` Takashi Iwai
2012-10-09 10:04       ` Takashi Iwai
2012-10-09 10:07       ` Takashi Iwai
2012-10-09 10:07         ` Takashi Iwai
2012-10-09 11:23         ` Daniel J Blueman
2012-10-09 13:04           ` Takashi Iwai
2012-10-09 13:04             ` Takashi Iwai
2012-10-09 14:26             ` Daniel J Blueman
2012-10-10 12:34               ` Takashi Iwai
2012-10-12 15:08                 ` Daniel J Blueman
2012-10-12 15:26                   ` Takashi Iwai
2012-10-12 15:26                     ` Takashi Iwai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.