From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753803Ab2LDNzz (ORCPT ); Tue, 4 Dec 2012 08:55:55 -0500 Received: from cantor2.suse.de ([195.135.220.15]:52735 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753664Ab2LDNzx (ORCPT ); Tue, 4 Dec 2012 08:55:53 -0500 Date: Tue, 04 Dec 2012 14:55:52 +0100 Message-ID: From: Takashi Iwai To: Daniel J Blueman Cc: Seth Forshee , Dave Airlie , Linux Kernel Subject: Re: switcheroo registration vs switching race... In-Reply-To: References: User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 Emacs/24.2 (x86_64-suse-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org At Tue, 04 Dec 2012 14:23:05 +0100, Takashi Iwai wrote: > > At Tue, 4 Dec 2012 20:58:55 +0800, > Daniel J Blueman wrote: > > > > On 4 December 2012 01:10, Takashi Iwai wrote: > > > At Tue, 4 Dec 2012 00:50:56 +0800, > > > Daniel J Blueman wrote: > > >> > > >> On 4 December 2012 00:23, Takashi Iwai wrote: > > >> > At Mon, 3 Dec 2012 23:08:28 +0800, > > >> > Daniel J Blueman wrote: > > >> >> > > >> >> On 3 December 2012 22:40, Takashi Iwai wrote: > > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800, > > >> >> > Daniel J Blueman wrote: > > >> >> >> > > >> >> >> On 3 December 2012 19:17, Takashi Iwai wrote: > > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100, > > >> >> >> > Takashi Iwai wrote: > > >> >> >> >> > > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800, > > >> >> >> >> Daniel J Blueman wrote: > > >> >> >> >> > > > >> >> >> >> > Hi Seth, Dave, Takashi, > > >> >> >> >> > > > >> >> >> >> > If I power down the unused discrete GPU before lightdm starts by > > >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I see a race > > >> >> >> >> > manifesting as the discrete GPU's HDA controller timing out to > > >> >> >> >> > commands [2]. > > >> >> >> >> > > > >> >> >> >> > Adding some debug, I see that the registered audio devices are put > > >> >> >> >> > into D3 before the GPU is, but it turns out that the discrete (and > > >> >> >> >> > internal) GPU's HDA controller gets registered a bit later, so the > > >> >> >> >> > list is empty. The symptom is since the HDA driver it's talking to > > >> >> >> >> > hardware which is now in D3. > > >> >> >> >> > > > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA > > >> >> >> >> > controller, but perhaps this should be solved at a higher level in the > > >> >> >> >> > vgaswitcheroo code; what do you think? > > >> >> >> >> > > >> >> >> >> Maybe it's a side effect for the recent effort to fix another race in > > >> >> >> >> the probe. A part of them problem is that the registration is done at > > >> >> >> >> the very last of probing. > > >> >> >> >> > > >> >> >> >> Instead of delaying the registration, how about the patch below? > > >> >> >> > > > >> >> >> > Ping. If this really works, I'd like to queue it for 3.8 merge, at > > >> >> >> > least... > > >> >> >> > > >> >> >> Ping ack; I was trying to find time to understand another race that > > >> >> >> occurs with GPU probing after switching, but is separate from the > > >> >> >> situation before switching, here. > > >> >> >> > > >> >> >> In the context of writing the switch, it looks like struct azx isn't > > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with > > >> >> >> azx_codec_create? > > >> >> > > > >> >> > It was allocated, but it wasn't assigned properly in pci drvdata. > > >> >> > > > >> >> > Below is the revised patch. Just moved pci_set_drvdata() before > > >> >> > register_vga_switcheroo(). Could you retest with it? > > >> >> > > >> >> Superb; this addresses the oops. > > >> > > > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. > > >> > > > >> >> ~1 second after the DGPU is put into D3, I still often see "hda-intel: > > >> >> spurious response 0x0:0x0, last cmd=0x170500": > > >> >> http://quora.org/2012/hda-switch-spurious.txt > > >> > > > >> > Hm, it's not clear who triggers these messages. I'll try to check the > > >> > code paths. > > >> > > > >> >> Presumably this implies the read of the ring-buffer pointer returned > > >> >> 0xffffffff, so the HDA driver understands the pointer to have wrapped > > >> >> and processes the 191 unwritten entries? > > >> > > > >> > Good point. Actually there is one bug that looks obviously wrong > > >> > (writing 32bit value to CORBWP). Maybe it has been working just > > >> > because writing CORBRP doesn't influence except for the reset bit. > > >> > > > >> > Reading CORBWP as a byte is OK, but this could be better in a word so > > >> > that we can check 0xffff as invalid. > > >> > > > >> > A test patch is below. Hopefully this improves the situation... > > >> > > >> I'll check this out tomorrow and also instrument the code to get a > > >> backtrace, since there may still be an underlying race with the > > >> previous patches: > > > > [...] > > > > > That's odd. The Cirrus one (0000:00:1b.0) must be independent from > > > the vga switcheroo things for Nvidia... > > > > > > The patch below is the revised patch of the first one. Now the vga > > > switcheroo registration is done before the video controller D3 check, > > > so the race should be smaller. > > > > This patch improves things further of course; a major improvement over > > without. ~15% of the time, I still get the 'spurious response' > > messages in this callpath: > > A possible scenario is that the graphics went in D3 before probing > hd-audio, and the hd-audio continues to initialize the hardware > without knowing the graphics counterpart is disabled. > > There is a code check_hdmi_disabled() in hda_intel.c that checks the > availability of the video driver, and it might be that this doesn't > work as expected... I think I understand this path. You are setting "OFF", right? This will set the audio client OFF before can_switch() is called. Thus it can be called even before the probing process finished. In short, wait_for_completion() must be put at the beginning of azx_vs_set_state() in addition to azx_vs_can_switch(). The revised patch is attached below. Hopefully this sorts out all races. Takashi --- diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 4bb52da..22ecadc 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -49,6 +49,7 @@ #include #include #include +#include #ifdef CONFIG_X86 /* for snoop control */ @@ -469,6 +470,7 @@ struct azx { /* locks */ spinlock_t reg_lock; struct mutex open_mutex; + struct completion probe_wait; /* streams (x num_streams) */ struct azx_dev *azx_dev; @@ -2745,6 +2747,7 @@ static void azx_vs_set_state(struct pci_dev *pci, struct azx *chip = card->private_data; bool disabled; + wait_for_completion(&chip->probe_wait); if (chip->init_failed) return; @@ -2790,6 +2793,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci) struct snd_card *card = pci_get_drvdata(pci); struct azx *chip = card->private_data; + wait_for_completion(&chip->probe_wait); if (chip->init_failed) return false; if (chip->disabled || !chip->bus) @@ -2851,6 +2855,9 @@ static int azx_free(struct azx *chip) azx_notifier_unregister(chip); + chip->init_failed = 1; /* to be sure */ + complete(&chip->probe_wait); + if (use_vga_switcheroo(chip)) { if (chip->disabled && chip->bus) snd_hda_unlock_devices(chip->bus); @@ -3156,6 +3163,7 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci, INIT_LIST_HEAD(&chip->pcm_list); INIT_LIST_HEAD(&chip->list); init_vga_switcheroo(chip); + init_completion(&chip->probe_wait); chip->position_fix[0] = chip->position_fix[1] = check_position_fix(chip, position_fix[dev]); @@ -3183,26 +3191,6 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci, } } - if (check_hdmi_disabled(pci)) { - snd_printk(KERN_INFO SFX "VGA controller for %s is disabled\n", - pci_name(pci)); - if (use_vga_switcheroo(chip)) { - snd_printk(KERN_INFO SFX "Delaying initialization\n"); - chip->disabled = true; - goto ok; - } - kfree(chip); - pci_disable_device(pci); - return -ENXIO; - } - - err = azx_first_init(chip); - if (err < 0) { - azx_free(chip); - return err; - } - - ok: err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops); if (err < 0) { snd_printk(KERN_ERR SFX "Error creating device [card]!\n"); @@ -3447,7 +3435,29 @@ static int __devinit azx_probe(struct pci_dev *pci, if (err < 0) goto out_free; card->private_data = chip; + + pci_set_drvdata(pci, card); + + err = register_vga_switcheroo(chip); + if (err < 0) { + snd_printk(KERN_ERR SFX + "Error registering VGA-switcheroo client\n"); + goto out_free; + } + + if (check_hdmi_disabled(pci)) { + snd_printk(KERN_INFO SFX "VGA controller for %s is disabled\n", + pci_name(pci)); + snd_printk(KERN_INFO SFX "Delaying initialization\n"); + chip->disabled = true; + } + probe_now = !chip->disabled; + if (probe_now) { + err = azx_first_init(chip); + if (err < 0) + goto out_free; + } #ifdef CONFIG_SND_HDA_PATCH_LOADER if (patch[dev] && *patch[dev]) { @@ -3468,23 +3478,16 @@ static int __devinit azx_probe(struct pci_dev *pci, goto out_free; } - pci_set_drvdata(pci, card); - if (pci_dev_run_wake(pci)) pm_runtime_put_noidle(&pci->dev); - err = register_vga_switcheroo(chip); - if (err < 0) { - snd_printk(KERN_ERR SFX - "Error registering VGA-switcheroo client\n"); - goto out_free; - } - dev++; + complete(&chip->probe_wait); return 0; out_free: snd_card_free(card); + pci_set_drvdata(pci, NULL); return err; }