* Panda ES board hang when using GPIO as interrupt
@ 2012-06-25 20:52 ` Franky Lin
  0 siblings, 0 replies; 57+ messages in thread
From: Franky Lin @ 2012-06-25 20:52 UTC (permalink / raw)
  To: khilman, tarun.kanti
  Cc: tony, santosh.shilimkar, b-cousson, grant.likely, linux-omap,
	linux-arm-kernel, linux-wireless

Hi Kevin, Tarun,

We are using expansion connector A on the Panda board to mount an SDIO
WiFi dongle on MMC2, with a level-triggered interrupt signal connected to
GPIO 138. It had been working fine until 3.5-rc1. Now the board hangs
randomly within 5 minutes during a network traffic test. After bisecting,
we found the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
*_runtime_suspend()" [1].

I noticed Kevin raised some similar cases on other platforms and also
provided two patches in the patch mail thread. Unfortunately, those two
patches don't help in our case. I tested the driver with the 3.5-rc3
mainline kernel and the issue is still there. I can only "fix" the hang
by either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the
hang only happens on the Panda ES board. The old Panda with the 4430
works fine.

Any thoughts and suggestions?

Thanks,
Franky

[1] http://article.gmane.org/gmane.linux.ports.arm.omap/75708/


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
  2012-06-25 20:52 ` Franky Lin
@ 2012-06-26  7:21   ` DebBarma, Tarun Kanti
  -1 siblings, 0 replies; 57+ messages in thread
From: DebBarma, Tarun Kanti @ 2012-06-26  7:21 UTC (permalink / raw)
  To: Franky Lin
  Cc: khilman, tony, santosh.shilimkar, b-cousson, grant.likely,
	linux-omap, linux-arm-kernel, linux-wireless

[-- Attachment #1: Type: text/plain, Size: 2912 bytes --]

On Tue, Jun 26, 2012 at 2:22 AM, Franky Lin <frankyl@broadcom.com> wrote:
> Hi Kevin, Tarun,
>
> We are using expansion connector A on the Panda board to mount an SDIO WiFi
> dongle on MMC2, with a level-triggered interrupt signal connected to GPIO
> 138. It had been working fine until 3.5-rc1. Now the board hangs randomly
> within 5 minutes during a network traffic test. After bisecting, we found
> the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
> *_runtime_suspend()" [1].
>
> I noticed Kevin raised some similar cases on other platforms and also
> provided two patches in the patch mail thread. Unfortunately, those two
> patches don't help in our case. I tested the driver with the 3.5-rc3
> mainline kernel and the issue is still there. I can only "fix" the hang by
> either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the hang
> only happens on the Panda ES board. The old Panda with the 4430 works fine.
>
> Any thoughts and suggestions?
I just had a quick look at the code. Can you please check if the attached
patch solves the issue? I have only boot tested it on Panda and Blaze.
--
Tarun

From 0e1b322451b7a49487d2d17a147db1aa1d1119fa Mon Sep 17 00:00:00 2001
From: Tarun Kanti DebBarma <tarun.kanti@ti.com>
Date: Tue, 26 Jun 2012 12:13:47 +0530
Subject: [PATCH] gpio/omap: enabled_non_wakeup_gpios check skips bank->saved_datain

Commit b3c64bc30af67ed328a8d919e41160942b870451
(gpio/omap: (re)fix wakeups on level-triggered GPIOs)
still skips update of bank->saved_datain in *_runtime_suspend()
which must be done irrespective of edge/level trigger types.
Therefore, move the enabled_non_wakeup_gpios check after the
bank->saved_datain is updated.

Signed-off-by: Tarun Kanti DebBarma <tarun.kanti@ti.com>
---
 drivers/gpio/gpio-omap.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..94ecdcf 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1177,9 +1177,6 @@ static int omap_gpio_runtime_suspend(struct device *dev)
                __raw_writel(wake_hi | bank->context.risingdetect,
                             bank->base + bank->regs->risingdetect);

-       if (!bank->enabled_non_wakeup_gpios)
-               goto update_gpio_context_count;
-
        if (bank->power_mode != OFF_MODE) {
                bank->power_mode = 0;
                goto update_gpio_context_count;
@@ -1191,6 +1188,10 @@ static int omap_gpio_runtime_suspend(struct device *dev)
         */
        bank->saved_datain = __raw_readl(bank->base +
                                                bank->regs->datain);
+
+       if (!bank->enabled_non_wakeup_gpios)
+               goto update_gpio_context_count;
+
        l1 = bank->context.fallingdetect;
        l2 = bank->context.risingdetect;

-- 
1.7.0.4



>
> Thanks,
> Franky
>
> [1] http://article.gmane.org/gmane.linux.ports.arm.omap/75708/
>
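
For readers following the patch above, the reordering can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions, not the driver code: `struct bank_model`, `run_model()` and the two `suspend_*` helpers are hypothetical names, the `goto update_gpio_context_count` exits are modelled as plain returns, the value of `OFF_MODE` is assumed, and GPIO 138 is placed on bit 10 on the assumption of 32 lines per bank.

```c
#include <stdint.h>

/* Assumed value for this model; only the comparison against it matters. */
#define OFF_MODE 1

/* Hypothetical, trimmed-down stand-in for the driver's per-bank state. */
struct bank_model {
    uint32_t datain_reg;               /* models the DATAIN hardware register */
    uint32_t saved_datain;             /* snapshot taken on runtime suspend */
    uint32_t enabled_non_wakeup_gpios; /* zero means no such GPIOs are in use */
    int power_mode;
};

/* Ordering before the patch: the early exit comes before the snapshot. */
static void suspend_before_patch(struct bank_model *b)
{
    if (!b->enabled_non_wakeup_gpios)
        return;                        /* saved_datain is left stale */
    if (b->power_mode != OFF_MODE) {
        b->power_mode = 0;
        return;
    }
    b->saved_datain = b->datain_reg;
    /* edge-detect register handling would follow here */
}

/* Ordering after the patch: snapshot first, then the early exit. */
static void suspend_after_patch(struct bank_model *b)
{
    if (b->power_mode != OFF_MODE) {
        b->power_mode = 0;
        return;
    }
    b->saved_datain = b->datain_reg;
    if (!b->enabled_non_wakeup_gpios)
        return;
    /* edge-detect register handling would follow here */
}

/* Run one modelled suspend with the GPIO line high and return the snapshot. */
static uint32_t run_model(int patched)
{
    struct bank_model b = {
        .datain_reg = 1u << (138 % 32), /* GPIO 138 as bit 10 (assumption) */
        .saved_datain = 0,
        .enabled_non_wakeup_gpios = 0,
        .power_mode = OFF_MODE,
    };

    if (patched)
        suspend_after_patch(&b);
    else
        suspend_before_patch(&b);
    return b.saved_datain;
}
```

In this model `run_model(0)` leaves the snapshot at zero while `run_model(1)` records bit 10. Note that in both orderings the snapshot is still skipped whenever `power_mode` is not `OFF_MODE`, exactly as in the diff.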

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
@ 2012-06-26 18:20     ` Franky Lin
  0 siblings, 0 replies; 57+ messages in thread
From: Franky Lin @ 2012-06-26 18:20 UTC (permalink / raw)
  To: DebBarma, Tarun Kanti
  Cc: khilman, tony, santosh.shilimkar, b-cousson, grant.likely,
	linux-omap, linux-arm-kernel, linux-wireless

[-- Attachment #1: Type: text/plain, Size: 1666 bytes --]

On 06/26/2012 12:21 AM, DebBarma, Tarun Kanti wrote:
> On Tue, Jun 26, 2012 at 2:22 AM, Franky Lin <frankyl@broadcom.com> wrote:
>> Hi Kevin, Tarun,
>>
>> We are using expansion connector A on the Panda board to mount an SDIO WiFi
>> dongle on MMC2, with a level-triggered interrupt signal connected to GPIO
>> 138. It had been working fine until 3.5-rc1. Now the board hangs randomly
>> within 5 minutes during a network traffic test. After bisecting, we found
>> the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
>> *_runtime_suspend()" [1].
>>
>> I noticed Kevin raised some similar cases on other platforms and also
>> provided two patches in the patch mail thread. Unfortunately, those two
>> patches don't help in our case. I tested the driver with the 3.5-rc3
>> mainline kernel and the issue is still there. I can only "fix" the hang by
>> either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the hang
>> only happens on the Panda ES board. The old Panda with the 4430 works fine.
>>
>> Any thoughts and suggestions?
> I just had a quick look at the code. Can you please check if the attached
> patch solves the issue? I have only boot tested it on Panda and Blaze.
> --
> Tarun
>

Thanks for the prompt reply.

Booting is fine even without the patch or the revert. The WiFi dongle
generates an interrupt whenever a data packet is available for the host
to read, so a significant number of interrupts are triggered through the
GPIO during a traffic test. I therefore assume the hang has something to
do with the interrupt GPIO.

With the patch, the kernel still crashes, but the symptom is slightly
different: now there is a panic log every time. See the attachment.

Regards,
Franky

[-- Attachment #2: panic.log --]
[-- Type: text/plain, Size: 10398 bytes --]

[  636.143585] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM                                                                                                         
[  636.150634] Modules linked in: brcmfmac brcmutil cfg80211                                                                                                                        
[  636.156311] CPU: 0    Not tainted  (3.5.0-rc4+ #3)                                                                                                                               
[  636.161346] PC is at __lock_acquire+0x65c/0x1d88                                                                                                                                 
[  636.166198] LR is at 0x60000093                                                                                                                                                  
[  636.169494] pc : [<c008e670>]    lr : [<60000093>]    psr: 20000093                                                                                                              
[  636.169494] sp : c06b1e18  ip : 9e370001  fp : c0724f70                                                                                                                          
[  636.181549] r10: c06b0000  r9 : 0000001e  r8 : c0b92998                                                                                                                          
[  636.187042] r7 : c06d2cc8  r6 : 00000000  r5 : c0746d64  r4 : c06d2868                                                                                                           
[  636.193908] r3 : 00003b0e  r2 : ec3b001d  r1 : 0001d870  r0 : 0000001d                                                                                                           
[  636.200744] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel                                                                                                 
[  636.208526] Control: 10c53c7d  Table: ae39c04a  DAC: 00000017                                                                                                                    
[  636.214569] Process swapper/0 (pid: 0, stack limit = 0xc06b02f8)                                                                                                                 
[  636.220855] Stack: (0xc06b1e18 to 0xc06b2000)                                                                                                                                    
[  636.225433] 1e00:                                                       c06d00f8 00000002                                                                                        
[  636.234039] 1e20: c0807968 00000001 00000000 00000002 0000001d 00000000 00000001 0001d870                                                                                        
[  636.242614] 1e40: c08070e8 00000001 00000000 00000002 00000002 00000000 00000000 c00903e4                                                                                        
[  636.251220] 1e60: 00000002 00000080 00000000 c0066838 00000000 00000000 60000093 00000000                                                                                        
[  636.259796] 1e80: 60000093 00000000 c06b4324 c06b0000 00000000 00000000 00000002 00000000                                                                                        
[  636.268402] 1ea0: 00000000 c00903e4 00000002 00000080 00000000 c00a3588 00000000 c14b0aa0                                                                                        
[  636.276977] 1ec0: 60000093 c06adaa0 00000094 c06b4314 00000002 c06b0000 0000002c 00000000                                                                                        
[  636.285583] 1ee0: 412fc09a c06d3f80 00000000 c04a2914 00000002 00000000 c00a3588 c0048328                                                                                        
[  636.294189] 1f00: 00000033 c06b42c0 c06b4314 c00a3588 c06d00f8 c06af318 c06b0000 c009ff98                                                                                        
[  636.302764] 1f20: 000001da c0014c78 fa24010c c06ced30 c06b1f58 fa240100 00000000 c000848c                                                                                        
[  636.311370] 1f40: c06d2868 c0014f70 20000013 ffffffff c06b1f8c c04a31e4 057b6e56 00000001                                                                                        
[  636.319946] 1f60: 00000000 c06d2868 c06b0000 c0744308 c04ae350 c06d3d50 00000000 412fc09a                                                                                        
[  636.328552] 1f80: c06d3f80 00000000 00000001 c06b1fa0 057b6e57 c0014f70 20000013 ffffffff                                                                                        
[  636.337127] 1fa0: c06d2868 c001519c c071b85c c06cfdf8 c0744240 c0691fdc c14ad080 8000406a                                                                                        
[  636.345733] 1fc0: 00000000 c06617ac ffffffff ffffffff c0661230 00000000 00000000 c0691fdc                                                                                        
[  636.354309] 1fe0: 00000000 10c53c7d c06ced08 c0691fac c06d3d44 80008044 00000000 00000000                                                                                        
[  636.362915] [<c008e670>] (__lock_acquire+0x65c/0x1d88) from [<c00903e4>] (lock_acquire+0x98/0x100)                                                                               
[  636.372344] [<c00903e4>] (lock_acquire+0x98/0x100) from [<c04a2914>] (_raw_spin_lock+0x2c/0x3c)                                                                                  
[  636.381500] [<c04a2914>] (_raw_spin_lock+0x2c/0x3c) from [<c00a3588>] (handle_fasteoi_irq+0x14/0x194)                                                                            
[  636.391174] [<c00a3588>] (handle_fasteoi_irq+0x14/0x194) from [<c009ff98>] (generic_handle_irq+0x30/0x48)                                                                        
[  636.401245] [<c009ff98>] (generic_handle_irq+0x30/0x48) from [<c0014c78>] (handle_IRQ+0x4c/0xac)                                                                                 
[  636.410491] [<c0014c78>] (handle_IRQ+0x4c/0xac) from [<c000848c>] (gic_handle_irq+0x28/0x5c)                                                                                     
[  636.419342] [<c000848c>] (gic_handle_irq+0x28/0x5c) from [<c04a31e4>] (__irq_svc+0x44/0x60)                                                                                      
[  636.428131] Exception stack(0xc06b1f58 to 0xc06b1fa0)                                                                                                                            
[  636.433441] 1f40:                                                       057b6e56 00000001                                                                                        
[  636.442016] 1f60: 00000000 c06d2868 c06b0000 c0744308 c04ae350 c06d3d50 00000000 412fc09a                                                                                        
[  636.450622] 1f80: c06d3f80 00000000 00000001 c06b1fa0 057b6e57 c0014f70 20000013 ffffffff                                                                                        
[  636.459228] [<c04a31e4>] (__irq_svc+0x44/0x60) from [<c0014f70>] (default_idle+0x20/0x44)                                                                                        
[  636.467803] [<c0014f70>] (default_idle+0x20/0x44) from [<c001519c>] (cpu_idle+0x9c/0x114)                                                                                        
[  636.476409] [<c001519c>] (cpu_idle+0x9c/0x114) from [<c06617ac>] (start_kernel+0x2b0/0x300)                                                                                      
[  636.485198] Code: e1a02928 e1a0c182 e089800c e58dc024 (e59f98b4)                                                                                                                 
[  636.491607] ---[ end trace 5e6c69cac2b687b2 ]---                                                                                                                                 
[  636.496459] Kernel panic - not syncing: Fatal exception in interrupt                                                                                                             
[  636.503112] CPU1: stopping                                                                                                                                                       
[  636.505981] [<c001b61c>] (unwind_backtrace+0x0/0xf0) from [<c0019728>] (handle_IPI+0x130/0x15c)                                                                                  
[  636.515136] [<c0019728>] (handle_IPI+0x130/0x15c) from [<c00084b8>] (gic_handle_irq+0x54/0x5c)                                                                                   
[  636.524169] [<c00084b8>] (gic_handle_irq+0x54/0x5c) from [<c04a31e4>] (__irq_svc+0x44/0x60)                                                                                      
[  636.532958] Exception stack(0xee069f88 to 0xee069fd0)                                                                                                                            
[  636.538269] 9f80:                   c0724f70 c0014f50 00000000 00000000 ee068000 c0744308                                                                                        
[  636.546844] 9fa0: c04ae350 c06d3d50 00000000 412fc09a c06d3f80 00000000 00000000 ee069fd0                                                                                        
[  636.555450] 9fc0: c0014f6c c0014f70 60000113 ffffffff                                                                                                                            
[  636.560760] [<c04a31e4>] (__irq_svc+0x44/0x60) from [<c0014f70>] (default_idle+0x20/0x44)                                                                                        
[  636.569366] [<c0014f70>] (default_idle+0x20/0x44) from [<c001519c>] (cpu_idle+0x9c/0x114)                                                                                        
[  636.577941] [<c001519c>] (cpu_idle+0x9c/0x114) from [<8049bdd4>] (0x8049bdd4)
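
The "Flags: nzCv  IRQs off  FIQs on  Mode SVC_32" line in the log above is derived from the `psr` value (0x20000093). As a reading aid, here is a minimal sketch of that decoding in standalone C. The helper names are hypothetical and this is not the kernel's own code; it relies only on the standard ARM status register layout (N, Z, C, V in bits 31 to 28, the IRQ and FIQ mask bits in bits 7 and 6, and the mode in the low five bits).

```c
#include <stdint.h>

/* Fill out[] with the condition flags, upper case when set (e.g. "nzCv"). */
static void psr_flags(uint32_t psr, char out[5])
{
    out[0] = (psr & (1u << 31)) ? 'N' : 'n';
    out[1] = (psr & (1u << 30)) ? 'Z' : 'z';
    out[2] = (psr & (1u << 29)) ? 'C' : 'c';
    out[3] = (psr & (1u << 28)) ? 'V' : 'v';
    out[4] = '\0';
}

/* Bit 7 (I) set means IRQs are masked, i.e. "IRQs off". */
static int psr_irqs_off(uint32_t psr)
{
    return (psr >> 7) & 1;
}

/* Bit 6 (F) set means FIQs are masked, i.e. "FIQs off". */
static int psr_fiqs_off(uint32_t psr)
{
    return (psr >> 6) & 1;
}

/* Map the low five mode bits to a name in the style used by the log. */
static const char *psr_mode(uint32_t psr)
{
    switch (psr & 0x1f) {
    case 0x10: return "USER_32";
    case 0x11: return "FIQ_32";
    case 0x12: return "IRQ_32";
    case 0x13: return "SVC_32";
    case 0x17: return "ABT_32";
    case 0x1b: return "UND_32";
    case 0x1f: return "SYS_32";
    default:   return "unknown";
    }
}
```

Applied to 0x20000093, these helpers give "nzCv", IRQs off, FIQs on and SVC_32, which matches the log: the oops was taken in supervisor mode with interrupts masked, consistent with the backtrace running through the interrupt handling path.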

^ permalink raw reply	[flat|nested] 57+ messages in thread

[  636.410491] [<c0014c78>] (handle_IRQ+0x4c/0xac) from [<c000848c>] (gic_handle_irq+0x28/0x5c)                                                                                     
[  636.419342] [<c000848c>] (gic_handle_irq+0x28/0x5c) from [<c04a31e4>] (__irq_svc+0x44/0x60)                                                                                      
[  636.428131] Exception stack(0xc06b1f58 to 0xc06b1fa0)                                                                                                                            
[  636.433441] 1f40:                                                       057b6e56 00000001                                                                                        
[  636.442016] 1f60: 00000000 c06d2868 c06b0000 c0744308 c04ae350 c06d3d50 00000000 412fc09a                                                                                        
[  636.450622] 1f80: c06d3f80 00000000 00000001 c06b1fa0 057b6e57 c0014f70 20000013 ffffffff                                                                                        
[  636.459228] [<c04a31e4>] (__irq_svc+0x44/0x60) from [<c0014f70>] (default_idle+0x20/0x44)                                                                                        
[  636.467803] [<c0014f70>] (default_idle+0x20/0x44) from [<c001519c>] (cpu_idle+0x9c/0x114)                                                                                        
[  636.476409] [<c001519c>] (cpu_idle+0x9c/0x114) from [<c06617ac>] (start_kernel+0x2b0/0x300)                                                                                      
[  636.485198] Code: e1a02928 e1a0c182 e089800c e58dc024 (e59f98b4)                                                                                                                 
[  636.491607] ---[ end trace 5e6c69cac2b687b2 ]---                                                                                                                                 
[  636.496459] Kernel panic - not syncing: Fatal exception in interrupt                                                                                                             
[  636.503112] CPU1: stopping                                                                                                                                                       
[  636.505981] [<c001b61c>] (unwind_backtrace+0x0/0xf0) from [<c0019728>] (handle_IPI+0x130/0x15c)                                                                                  
[  636.515136] [<c0019728>] (handle_IPI+0x130/0x15c) from [<c00084b8>] (gic_handle_irq+0x54/0x5c)                                                                                   
[  636.524169] [<c00084b8>] (gic_handle_irq+0x54/0x5c) from [<c04a31e4>] (__irq_svc+0x44/0x60)                                                                                      
[  636.532958] Exception stack(0xee069f88 to 0xee069fd0)                                                                                                                            
[  636.538269] 9f80:                   c0724f70 c0014f50 00000000 00000000 ee068000 c0744308                                                                                        
[  636.546844] 9fa0: c04ae350 c06d3d50 00000000 412fc09a c06d3f80 00000000 00000000 ee069fd0                                                                                        
[  636.555450] 9fc0: c0014f6c c0014f70 60000113 ffffffff                                                                                                                            
[  636.560760] [<c04a31e4>] (__irq_svc+0x44/0x60) from [<c0014f70>] (default_idle+0x20/0x44)                                                                                        
[  636.569366] [<c0014f70>] (default_idle+0x20/0x44) from [<c001519c>] (cpu_idle+0x9c/0x114)                                                                                        
[  636.577941] [<c001519c>] (cpu_idle+0x9c/0x114) from [<8049bdd4>] (0x8049bdd4)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Panda ES board hang when using GPIO as interrupt
@ 2012-06-26 18:20     ` Franky Lin
  0 siblings, 0 replies; 57+ messages in thread
From: Franky Lin @ 2012-06-26 18:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/26/2012 12:21 AM, DebBarma, Tarun Kanti wrote:
> On Tue, Jun 26, 2012 at 2:22 AM, Franky Lin <frankyl@broadcom.com> wrote:
>> Hi Kevin, Tarun,
>>
>> We are using the expansion connector A on Panda board to mount a SDIO WiFi
>> dongle on MMC2 with a level triggered interrupt signal connected to GPIO
>> 138. It's been working fine until 3.5 rc1. The board hang randomly within 5
>> mins during a network traffic test. After bisecting we found the culprit is
>> "[PATCH 8/8] gpio/omap: fix missing check in *_runtime_suspend()" [1].
>>
>> I noticed Kevin raised some similar cases on other platforms and also
>> provided two patches in the patch mail thread. But unfortunately those two
>> patches doesn't help in our case. I tested the driver with 3.5-rc3 mainline
>> kernel and the issue is still there. I can only "fix" the hang by either
>> reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the hang only
>> happens on Panda ES board. Old Panda with 4430 works good.
>>
>> Any thoughts and suggestions?
> I just had a quick look at the code. Can you please check if the
> attached patch solves
> the issue? I just boot tested on Panda and Blaze.
> --
> Tarun
>

Thanks for the prompt reply.

Booting is fine even without the patch and without the revert. The WiFi dongle 
generates an interrupt whenever a data packet is available for the host to 
read, so during a traffic test a significant number of interrupts are 
triggered through the GPIO. I therefore assume the problem has something to 
do with the interrupt GPIO.

With the patch, the kernel still crashes, but the symptom is slightly 
different: it now produces a panic log every time. See attachment.

Regards,
Franky
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: panic.log
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20120626/aa313b28/attachment.ksh>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
@ 2012-06-27  3:37   ` Kevin Hilman
  0 siblings, 0 replies; 57+ messages in thread
From: Kevin Hilman @ 2012-06-27  3:37 UTC (permalink / raw)
  To: Franky Lin
  Cc: tarun.kanti, tony, santosh.shilimkar, b-cousson, grant.likely,
	linux-omap, linux-arm-kernel, linux-wireless

Hello,

"Franky Lin" <frankyl@broadcom.com> writes:

> Hi Kevin, Tarun,
>
> We are using the expansion connector A on Panda board to mount a SDIO
> WiFi dongle on MMC2 with a level triggered interrupt signal connected
> to GPIO 138. It's been working fine until 3.5 rc1. The board hang
> randomly within 5 mins during a network traffic test. After bisecting
> we found the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
> *_runtime_suspend()" [1].

<grumble>

As you might guess, that patch has caused me enough headaches that
reverting it sounds like a good idea now.  But I'd still like to better
understand exactly what's going on.

> I noticed Kevin raised some similar cases on other platforms and also
> provided two patches in the patch mail thread. But unfortunately those
> two patches doesn't help in our case. I tested the driver with 3.5-rc3
> mainline kernel and the issue is still there. I can only "fix" the
> hang by either reverting the commit or disabling
> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
> Panda with 4430 works good.
>
> Any thoughts and suggestions?

If reverting the patch fixes your problem, can you narrow it down to which
part of that patch causes the problem?  IOW, can you fix your problem if
you undo just the hunk added in runtime_suspend, or undo just the hunk
moved in runtime_resume?  Or is reverting both required?

I suspect the added runtime_suspend hunk is causing the problems, so can
you see if just undoing that part works [1]?  If that works, I will give
it a bit more thought tomorrow.

Thanks for reporting the problem!   Bug reports like this that have
clearly been thoroughly researched and bisected are greatly appreciated!

Kevin

[1] patch against v3.5-rc4

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..2a6067f 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1177,9 +1177,6 @@ static int omap_gpio_runtime_suspend(struct device *dev)
                 __raw_writel(wake_hi | bank->context.risingdetect,
                              bank->base + bank->regs->risingdetect);

-       if (!bank->enabled_non_wakeup_gpios)
-               goto update_gpio_context_count;
-
         if (bank->power_mode != OFF_MODE) {
                 bank->power_mode = 0;
                 goto update_gpio_context_count;

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
  2012-06-26 18:20     ` Franky Lin
@ 2012-06-27 13:29       ` DebBarma, Tarun Kanti
  -1 siblings, 0 replies; 57+ messages in thread
From: DebBarma, Tarun Kanti @ 2012-06-27 13:29 UTC (permalink / raw)
  To: Franky Lin
  Cc: khilman, tony, santosh.shilimkar, b-cousson, grant.likely,
	linux-omap, linux-arm-kernel, linux-wireless

On Tue, Jun 26, 2012 at 11:50 PM, Franky Lin <frankyl@broadcom.com> wrote:
> On 06/26/2012 12:21 AM, DebBarma, Tarun Kanti wrote:
>>
>> On Tue, Jun 26, 2012 at 2:22 AM, Franky Lin <frankyl@broadcom.com> wrote:
>>>
>>> Hi Kevin, Tarun,
>>>
>>> We are using the expansion connector A on Panda board to mount a SDIO
>>> WiFi
>>> dongle on MMC2 with a level triggered interrupt signal connected to GPIO
>>> 138. It's been working fine until 3.5 rc1. The board hang randomly within
>>> 5
>>> mins during a network traffic test. After bisecting we found the culprit
>>> is
>>> "[PATCH 8/8] gpio/omap: fix missing check in *_runtime_suspend()" [1].
>>>
>>> I noticed Kevin raised some similar cases on other platforms and also
>>> provided two patches in the patch mail thread. But unfortunately those
>>> two
>>> patches doesn't help in our case. I tested the driver with 3.5-rc3
>>> mainline
>>> kernel and the issue is still there. I can only "fix" the hang by either
>>> reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the hang only
>>> happens on Panda ES board. Old Panda with 4430 works good.
>>>
>>> Any thoughts and suggestions?
>>
>> I just had a quick look at the code. Can you please check if the
>> attached patch solves
>> the issue? I just boot tested on Panda and Blaze.
>> --
>> Tarun
>>
>
> Thanks for the prompt reply.
>
> Booting is fine even without the patch and revert. The wifi dongle generates
> interrupt whenever there is data packet available for host to read. So
> during a traffic test a significant numbers of interrupt will be triggered
> through the GPIO. So I assume it has something to do with the interrupt
> GPIO.
>
> With the patch, the kernel still crashes. But the symptom is slightly
> different. Now it has a panic log every time. See attachment.
I tried comparing the present code with the older version with regard
to the enabled_non_wakeup_gpios check. The obvious difference I
observed is that in the older version this check was performed after the
off-mode check, whereas in the present code it is done just before the
off-mode check. But then, as Kevin pointed out, we need to understand
the exact problem. I am trying to put together a setup to reproduce the
problem. BTW, you can ignore my patch because I realized that
saved_datain is part of the workaround.
---
Tarun

>
> Regards,
> Franky
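
The ordering difference described above can be sketched as a small
userspace model. This is only an illustration under stated assumptions,
not the driver code: struct bank_model, the two function names, the value
chosen for OFF_MODE and the workaround_ran flag are all hypothetical
stand-ins, and only the two checks visible in the quoted hunk are modeled.

```c
#include <stdbool.h>

/* Assumed value for illustration; only "equal to OFF_MODE or not" matters. */
#define OFF_MODE 1

/* Hypothetical model of the two bank fields that appear in the quoted hunk. */
struct bank_model {
    unsigned int enabled_non_wakeup_gpios; /* bitmask, 0 means none enabled */
    int power_mode;                        /* OFF_MODE or any other value */
    bool workaround_ran;                   /* reached the code after both checks? */
};

/* Present ordering: non-wakeup check first, then the off-mode check. */
void suspend_check_first(struct bank_model *b)
{
    b->workaround_ran = false;
    if (!b->enabled_non_wakeup_gpios)
        return;                  /* early exit: power_mode is left untouched */
    if (b->power_mode != OFF_MODE) {
        b->power_mode = 0;
        return;
    }
    b->workaround_ran = true;
}

/* Older ordering as described in the thread: off-mode check first. */
void suspend_check_last(struct bank_model *b)
{
    b->workaround_ran = false;
    if (b->power_mode != OFF_MODE) {
        b->power_mode = 0;       /* always reset when not going to off mode */
        return;
    }
    if (!b->enabled_non_wakeup_gpios)
        return;
    b->workaround_ran = true;
}

/* True if the two orderings leave the model in different states. */
bool ordering_differs(unsigned int mask, int mode)
{
    struct bank_model a = { mask, mode, false };
    struct bank_model b = { mask, mode, false };

    suspend_check_first(&a);
    suspend_check_last(&b);
    return a.power_mode != b.power_mode ||
           a.workaround_ran != b.workaround_ran;
}
```

In this model the two orderings diverge only when no non-wakeup GPIOs are
enabled and power_mode holds a nonzero value other than OFF_MODE: the
present ordering then skips the reset of power_mode. Whether that state is
actually reachable on the Panda ES, and whether it explains the hang, is
not established by this sketch.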

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
  2012-06-25 20:52 ` Franky Lin
  (?)
@ 2012-06-27 23:43   ` Jon Hunter
  -1 siblings, 0 replies; 57+ messages in thread
From: Jon Hunter @ 2012-06-27 23:43 UTC (permalink / raw)
  To: Franky Lin
  Cc: khilman, tarun.kanti, b-cousson, tony, linux-wireless,
	grant.likely, santosh.shilimkar, linux-omap, linux-arm-kernel

Hi Franky,

On 06/25/2012 03:52 PM, Franky Lin wrote:
> Hi Kevin, Tarun,
> 
> We are using the expansion connector A on Panda board to mount a SDIO
> WiFi dongle on MMC2 with a level triggered interrupt signal connected to
> GPIO 138. It's been working fine until 3.5 rc1. The board hang randomly
> within 5 mins during a network traffic test. After bisecting we found
> the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
> *_runtime_suspend()" [1].

I have been looking into this today to see if I can replicate the
problem that you have reported. However, so far I have not had any luck.
Please note that my test setup is not exactly the same as yours as I
don't have your wlan module. However, I have been using a 2nd board to
generate gpio events to a panda-es to see if I can make it lock up. I have
tried mainline kernel 3.5-rc1 and 3.5-rc3 but I have not seen any
problems after sending 100k gpio events (over many minutes). My setup is
as follows ...

- OMAP4460 panda-es with gpio-138 connected to OMAP3430 beagle gpio-11.
- Mainline kernel 3.5-rc1/3 using omap2plus_defconfig (no changes)
- Created a simple kernel module that acquires gpio-138 and sets up an
  IRQ with flag IRQF_TRIGGER_HIGH (for active-high level interrupt).
- GPIO events are triggered roughly every 1ms

Can you confirm ...
1. You are just using omap2plus_defconfig with no changes?
2. Rough frequency of gpio events?
3. Is the gpio configured for active low or high?
4. When the hang occurs, what is the state of the gpio? Active or
   inactive? Can you probe it with a scope? If it was always active I
   could see that this would lock the device up, but I am not sure how
   that would relate to the results from your bisect???
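
Why a permanently active level-triggered line would lock the device up
can be sketched with a small userspace model. This is a hedged
illustration, not kernel code: struct level_line, service_device() and
dispatch_level_irq() are hypothetical names, and max_calls exists only so
that the model terminates where real hardware would keep re-entering the
handler.

```c
#include <stdbool.h>

/* Hypothetical model of a level-triggered interrupt line. */
struct level_line {
    bool asserted;        /* level currently seen by the interrupt controller */
    bool device_releases; /* does servicing the device drop the line? */
};

/* Stand-in for the driver's handler: services the device once. */
void service_device(struct level_line *line)
{
    if (line->device_releases)
        line->asserted = false;
}

/*
 * A level-triggered interrupt is delivered again for as long as the line
 * stays asserted. Returns the number of handler calls needed until the
 * line went inactive, or -1 if it was still asserted after max_calls.
 */
int dispatch_level_irq(struct level_line *line, int max_calls)
{
    int calls = 0;

    while (line->asserted) {
        if (calls == max_calls)
            return -1;    /* line never went inactive: interrupt storm */
        service_device(line);
        calls++;
    }
    return calls;
}
```

A line that the device releases is handled in one call; a line stuck
active never leaves the loop, which is the situation question 4 asks to
rule out with a scope.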

> I noticed Kevin raised some similar cases on other platforms and also
> provided two patches in the patch mail thread. But unfortunately those
> two patches doesn't help in our case. I tested the driver with 3.5-rc3
> mainline kernel and the issue is still there. I can only "fix" the hang
> by either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the
> hang only happens on Panda ES board. Old Panda with 4430 works good.

It does not make sense to me yet why this would only impact 4460, but I
will keep this in mind.

In your wlan driver are you acquiring and freeing the gpio often? Or are
you only acquiring the gpio on boot?

The reason I ask is because for omap4, it seems that we are not
currently calling omap2_gpio_prepare_for_idle() during idle and so the
only time I see us call the runtime_suspend/resume handlers for omap4 is
during probe and when we acquire and free the gpio.

So if you were not acquiring and freeing the gpio and are using the
stock kernel, then as far as I can tell, the runtime pm code is not
being exercised much. My test is not acquiring and releasing the gpio
and so I am wondering if that is the secret to reproducing this problem :-)

Cheers
Jon


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
@ 2012-06-28  0:41     ` Franky Lin
  0 siblings, 0 replies; 57+ messages in thread
From: Franky Lin @ 2012-06-28  0:41 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: tarun.kanti, tony, santosh.shilimkar, b-cousson, grant.likely,
	linux-omap, linux-arm-kernel, linux-wireless

On 06/26/2012 08:37 PM, Kevin Hilman wrote:
> "Franky Lin" <frankyl@broadcom.com> writes:
>> I noticed Kevin raised some similar cases on other platforms and also
>> provided two patches in the patch mail thread. But unfortunately those
>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>> mainline kernel and the issue is still there. I can only "fix" the
>> hang by either reverting the commit or disabling
>> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
>> Panda with 4430 works good.
>>
>> Any thoughts and suggestions?
>
> If reverting the patch fixes your problem, can you isolate down to which
> part of that patch causes the problem?  IOW, can you fix your problem if
> you undo just the hunk added in runtime_suspend or undo just the moved
> hunk runtime_resume?  Or is reverting both required?
>
> I suspect the added runtime_suspend hunk is causing the problems, so can
> you see if just undoing that part works[1].  If that works, I will give
> a bit more of a thinking on it tomorrow.

The runtime_suspend hunk is fine: the hang still exists after reverting
it. The culprit is the moved hunk in runtime_resume; reverting that hunk
makes the hang disappear.

>
> Thanks for reporting the problem!   Bug reports like this that have
> clearly been thoroughly researched and bisected are greatly appreciated!
>
> Kevin
>

You are welcome.

Regards,
Franky


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
  2012-06-27 23:43   ` Jon Hunter
@ 2012-06-28  1:03     ` Franky Lin
  -1 siblings, 0 replies; 57+ messages in thread
From: Franky Lin @ 2012-06-28  1:03 UTC (permalink / raw)
  To: Jon Hunter
  Cc: khilman, tarun.kanti, b-cousson, tony, linux-wireless,
	grant.likely, santosh.shilimkar, linux-omap, linux-arm-kernel

On 06/27/2012 04:43 PM, Jon Hunter wrote:
> Hi Franky,
>
> On 06/25/2012 03:52 PM, Franky Lin wrote:
>> Hi Kevin, Tarun,
>>
>> We are using the expansion connector A on Panda board to mount a SDIO
>> WiFi dongle on MMC2 with a level triggered interrupt signal connected to
>> GPIO 138. It's been working fine until 3.5 rc1. The board hang randomly
>> within 5 mins during a network traffic test. After bisecting we found
>> the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
>> *_runtime_suspend()" [1].
>
> I have been looking into this today to see if I can replicate the
> problem that you have reported. However, so far I have not had any luck.
> Please note that my test setup is not exactly the same as yours as I
> don't have your wlan module. However, I have been using a 2nd board to
> generate gpio events to a panda-es to see I can make it lock up. I have
> tried mainline kernel 3.5-rc1 and 3.5-rc3 but I have not seen any
> problems after sending 100k gpio events (over many minutes). My setup is
> as follows ...
>
> - OMAP4460 panda-es with gpio-138 connected to OMAP3430 beagle gpio-11.
> - Mainline kernel 3.5-rc1/3 using omap2plus_defconfig (no changes)
> - Created a simple kernel module that acquires gpio-138 and sets up a
>    IRQ with flag IRQF_TRIGGER_HIGH (for active high level interrupt).
> - GPIO events are triggered roughly every 1ms

Don't know if it's related, but we also mux several other pins on 
connector A:
         /* MMC2 Mux for extension board */
         /* MMC2 CMD */
         OMAP4_MUX(GPMC_NWE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
         /* MMC2 CLK */
         OMAP4_MUX(GPMC_NOE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
         /* MMC2 DAT 0-3 */
         OMAP4_MUX(GPMC_AD0, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
         OMAP4_MUX(GPMC_AD1, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
         OMAP4_MUX(GPMC_AD2, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
         OMAP4_MUX(GPMC_AD3, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
         /* GPIO MUX for OOB interrupt of dongle */
         OMAP4_MUX(MCSPI1_CS1, OMAP_MUX_MODE3 | OMAP_PIN_INPUT_PULLDOWN),
         /* GPIO MUX for WLAN_ENABLE for dongle */
         OMAP4_MUX(MCSPI1_CLK, OMAP_MUX_MODE3 | OMAP_PIN_OUTPUT),

> Can you confirm ...
> 1. You are just using omap2plus_defconfig with no changes?
No, we enable the following options:
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_USB_OHCI_HCD=y

> 2. Rough frequency of gpio events?
3367 interrupts were triggered during a 10-second throughput test
(about 337 per second, or roughly one every 3 ms).

> 3. Is the gpio configured for active low or high?
active high

> 4. When the hang occurs, what is the state of the gpio? Active or
>     inactive? Can you probe it with a scope? If it was always active I
>     could see that this would lock the device up, but I am not sure how
>     that would relate to the results from your bisect???

I don't have a scope nearby. Let me see if I can find one tomorrow.

>> I noticed Kevin raised some similar cases on other platforms and also
>> provided two patches in the patch mail thread. But unfortunately those
>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>> mainline kernel and the issue is still there. I can only "fix" the hang
>> by either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the
>> hang only happens on Panda ES board. Old Panda with 4430 works good.
>
> It does not make sense to me yet why this would only impact 4460, but I
> will keep this in mind.
>
> In your wlan driver are you acquiring and freeing the gpio often? Or are
> you only acquiring the gpio on boot?
>
> The reason I ask is because for omap4, it seems that we are not
> currently calling omap2_gpio_prepare_for_idle() during idle and so the
> only time I see us call the runtime_suspend/resume handlers for omap4 is
> during probe and when we acquire and free the gpio.
>
> So if you were not acquiring and freeing the gpio and are using the
> stock kernel, then as far as I can tell, the runtime pm code is not
> being exercised much. My test is not acquiring and releasing the gpio
> and so I am wondering if that is the secret to reproducing this problem :-)

We only request the irq once, during initialization. But we do
frequently disable and re-enable it, since we need to access the module
through SDIO to clear the interrupt, and apparently we can't finish all
of that in the irq handler.

Hope this helps.

Regards,
Franky
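
The disable/re-enable pattern described above can be sketched as a toy
userspace model. It assumes the usual reason for deferring the work,
namely that the SDIO access cannot be done from hard-irq context; the
thread itself does not spell that out. All names (level_irq_model,
model_worker, and so on) are hypothetical and do not come from the wlan
driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a level-triggered interrupt line; names are hypothetical. */
struct level_irq_model {
    bool line_high;     /* device keeps the line asserted until cleared */
    bool masked;        /* set by the handler, cleared by the worker */
    bool work_pending;  /* deferred bus access has been scheduled */
    int handler_runs;   /* how many times the hard-irq handler ran */
};

/* Hard-irq context: only mask the line and defer the slow work. */
void model_irq_handler(struct level_irq_model *m)
{
    m->handler_runs++;
    m->masked = true;        /* stands in for disabling the irq */
    m->work_pending = true;  /* stands in for scheduling a worker */
}

/* Process context: the slow bus access clears the source, then unmask. */
void model_worker(struct level_irq_model *m)
{
    m->line_high = false;    /* stands in for the SDIO access */
    m->work_pending = false;
    m->masked = false;       /* stands in for re-enabling the irq */
}

/* One step of the interrupt controller: deliver if high and unmasked. */
bool model_deliver(struct level_irq_model *m)
{
    if (m->line_high && !m->masked) {
        model_irq_handler(m);
        return true;
    }
    return false;
}
```

Because the line is level triggered, it stays asserted until the device
is cleared. Masking it in the handler is what keeps the model from
re-entering the handler over and over while the deferred work is still
pending.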


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
@ 2012-06-28 15:37       ` Jon Hunter
  0 siblings, 0 replies; 57+ messages in thread
From: Jon Hunter @ 2012-06-28 15:37 UTC (permalink / raw)
  To: Franky Lin
  Cc: khilman, tarun.kanti, b-cousson, tony, linux-wireless,
	grant.likely, santosh.shilimkar, linux-omap, linux-arm-kernel

Hi Franky,

On 06/27/2012 08:03 PM, Franky Lin wrote:
> On 06/27/2012 04:43 PM, Jon Hunter wrote:
>> Hi Franky,
>>
>> On 06/25/2012 03:52 PM, Franky Lin wrote:
>>> Hi Kevin, Tarun,
>>>
>>> We are using the expansion connector A on Panda board to mount a SDIO
>>> WiFi dongle on MMC2 with a level triggered interrupt signal connected to
>>> GPIO 138. It's been working fine until 3.5 rc1. The board hang randomly
>>> within 5 mins during a network traffic test. After bisecting we found
>>> the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
>>> *_runtime_suspend()" [1].
>>
>> I have been looking into this today to see if I can replicate the
>> problem that you have reported. However, so far I have not had any luck.
>> Please note that my test setup is not exactly the same as yours as I
>> don't have your wlan module. However, I have been using a 2nd board to
>> generate gpio events to a panda-es to see I can make it lock up. I have
>> tried mainline kernel 3.5-rc1 and 3.5-rc3 but I have not seen any
>> problems after sending 100k gpio events (over many minutes). My setup is
>> as follows ...
>>
>> - OMAP4460 panda-es with gpio-138 connected to OMAP3430 beagle gpio-11.
>> - Mainline kernel 3.5-rc1/3 using omap2plus_defconfig (no changes)
>> - Created a simple kernel module that acquires gpio-138 and sets up a
>>    IRQ with flag IRQF_TRIGGER_HIGH (for active high level interrupt).
>> - GPIO events are triggered roughly every 1ms
> 
> Don't know if it's related, but we also mux several other pins on
> connector A:
>         /* MMC2 Mux for extension board */
>         /* MMC2 CMD */
>         OMAP4_MUX(GPMC_NWE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         /* MMC2 CLK */
>         OMAP4_MUX(GPMC_NOE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         /* MMC2 DAT 0-3 */
>         OMAP4_MUX(GPMC_AD0, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         OMAP4_MUX(GPMC_AD1, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         OMAP4_MUX(GPMC_AD2, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         OMAP4_MUX(GPMC_AD3, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         /* GPIO MUX for OOB interupt of dongle */
>         OMAP4_MUX(MCSPI1_CS1, OMAP_MUX_MODE3 | OMAP_PIN_INPUT_PULLDOWN),
>         /* GPIO MUX for WLAN_ENABLE for dongle */
>         OMAP4_MUX(MCSPI1_CLK, OMAP_MUX_MODE3 | OMAP_PIN_OUTPUT),

I would not have thought so. However, I will think about that, thanks.

>> Can you confirm ...
>> 1. You are just using omap2plus_defconfig with no changes?
> No, we enable following options
> CONFIG_DEVTMPFS=y
> CONFIG_DEVTMPFS_MOUNT=y
> CONFIG_USB_OHCI_HCD=y

Ok, thanks.

>> 2. Rough frequency of gpio events?
> 3367 interrupts were triggered during a 10 secs throughput test.
> 
>> 3. Is the gpio configured for active low or high?
> active high
> 
>> 4. When the hang occurs, what is the state of the gpio? Active or
>>     inactive? Can you probe it with a scope? If it was always active I
>>     could see that this would lock the device up, but I am not sure how
>>     that would relate to the results from your bisect???
> 
> I dont have a scope nearby. Let me see if I can find one tomorrow.

Great, that would be good.

>>> I noticed Kevin raised some similar cases on other platforms and also
>>> provided two patches in the patch mail thread. But unfortunately those
>>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>>> mainline kernel and the issue is still there. I can only "fix" the hang
>>> by either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the
>>> hang only happens on Panda ES board. Old Panda with 4430 works good.
>>
>> It does not make sense to me yet why this would only impact 4460, but I
>> will keep this in mind.
>>
>> In your wlan driver are you acquiring and freeing the gpio often? Or are
>> you only acquiring the gpio on boot?
>>
>> The reason I ask is because for omap4, it seems that we are not
>> currently calling omap2_gpio_prepare_for_idle() during idle and so the
>> only time I see us call the runtime_suspend/resume handlers for omap4 is
>> during probe and when we acquire and free the gpio.
>>
>> So if you were not acquiring and freeing the gpio and are using the
>> stock kernel, then as far as I can tell, the runtime pm code is not
>> being exercised much. My test is not acquiring and releasing the gpio
>> and so I am wondering if that is the secret to reproducing this
>> problem :-)
> 
> We only request the irq once during initialization. But we do frequently
> disable and re-enable it since we need to access to the module through
> SDIO to clear the interrupt. Apparently we can't finish all this in irq
> handler.

Ok, thanks. I don't see why that would cause a problem, but I can try
that too.

> Hope these could help.

Yes, good info to have.

Thanks
Jon

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
@ 2012-06-28 15:37       ` Jon Hunter
  0 siblings, 0 replies; 57+ messages in thread
From: Jon Hunter @ 2012-06-28 15:37 UTC (permalink / raw)
  To: Franky Lin
  Cc: khilman-l0cyMroinI0, tarun.kanti-l0cyMroinI0,
	b-cousson-l0cyMroinI0, tony-4v6yS6AI5VpBDgjK7y7TUQ,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	grant.likely-s3s/WqlpOiPyB63q8FvJNQ,
	santosh.shilimkar-l0cyMroinI0, linux-omap-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Franky,

On 06/27/2012 08:03 PM, Franky Lin wrote:
> On 06/27/2012 04:43 PM, Jon Hunter wrote:
>> Hi Franky,
>>
>> On 06/25/2012 03:52 PM, Franky Lin wrote:
>>> Hi Kevin, Tarun,
>>>
>>> We are using the expansion connector A on Panda board to mount a SDIO
>>> WiFi dongle on MMC2 with a level triggered interrupt signal connected to
>>> GPIO 138. It's been working fine until 3.5 rc1. The board hang randomly
>>> within 5 mins during a network traffic test. After bisecting we found
>>> the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
>>> *_runtime_suspend()" [1].
>>
>> I have been looking into this today to see if I can replicate the
>> problem that you have reported. However, so far I have not had any luck.
>> Please note that my test setup is not exactly the same as yours as I
>> don't have your wlan module. However, I have been using a 2nd board to
>> generate gpio events to a panda-es to see I can make it lock up. I have
>> tried mainline kernel 3.5-rc1 and 3.5-rc3 but I have not seen any
>> problems after sending 100k gpio events (over many minutes). My setup is
>> as follows ...
>>
>> - OMAP4460 panda-es with gpio-138 connected to OMAP3430 beagle gpio-11.
>> - Mainline kernel 3.5-rc1/3 using omap2plus_defconfig (no changes)
>> - Created a simple kernel module that acquires gpio-138 and sets up a
>>    IRQ with flag IRQF_TRIGGER_HIGH (for active high level interrupt).
>> - GPIO events are triggered roughly every 1ms
> 
> Don't know if it's related, but we also mux several other pins on
> connector A:
>         /* MMC2 Mux for extension board */
>         /* MMC2 CMD */
>         OMAP4_MUX(GPMC_NWE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         /* MMC2 CLK */
>         OMAP4_MUX(GPMC_NOE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         /* MMC2 DAT 0-3 */
>         OMAP4_MUX(GPMC_AD0, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         OMAP4_MUX(GPMC_AD1, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         OMAP4_MUX(GPMC_AD2, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         OMAP4_MUX(GPMC_AD3, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         /* GPIO MUX for OOB interupt of dongle */
>         OMAP4_MUX(MCSPI1_CS1, OMAP_MUX_MODE3 | OMAP_PIN_INPUT_PULLDOWN),
>         /* GPIO MUX for WLAN_ENABLE for dongle */
>         OMAP4_MUX(MCSPI1_CLK, OMAP_MUX_MODE3 | OMAP_PIN_OUTPUT),

I would not have thought so. However, I will think about that thanks.

>> Can you confirm ...
>> 1. You are just using omap2plus_defconfig with no changes?
> No, we enable following options
> CONFIG_DEVTMPFS=y
> CONFIG_DEVTMPFS_MOUNT=y
> CONFIG_USB_OHCI_HCD=y

Ok, thanks.

>> 2. Rough frequency of gpio events?
> 3367 interrupts were triggered during a 10 secs throughput test.
> 
>> 3. Is the gpio configured for active low or high?
> active high
> 
>> 4. When the hang occurs, what is the state of the gpio? Active or
>>     inactive? Can you probe it with a scope? If it was always active I
>>     could see that this would lock the device up, but I am not sure how
>>     that would relate to the results from your bisect???
> 
> I dont have a scope nearby. Let me see if I can find one tomorrow.

Great, that would be good.

>>> I noticed Kevin raised some similar cases on other platforms and also
>>> provided two patches in the patch mail thread. But unfortunately those
>>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>>> mainline kernel and the issue is still there. I can only "fix" the hang
>>> by either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the
>>> hang only happens on Panda ES board. Old Panda with 4430 works good.
>>
>> It does not make sense to me yet why this would only impact 4460, but I
>> will keep this in mind.
>>
>> In your wlan driver are you acquiring and freeing the gpio often? Or are
>> you only acquiring the gpio on boot?
>>
>> The reason I ask is because for omap4, it seems that we are not
>> currently calling omap2_gpio_prepare_for_idle() during idle and so the
>> only time I see us call the runtime_suspend/resume handlers for omap4 is
>> during probe and when we acquire and free the gpio.
>>
>> So if you were not acquiring and freeing the gpio and are using the
>> stock kernel, then as far as I can tell, the runtime pm code is not
>> being exercised much. My test is not acquiring and releasing the gpio
>> and so I am wondering if that is the secret to reproducing this
>> problem :-)
> 
> We only request the irq once during initialization. But we do frequently
> disable and re-enable it since we need to access to the module through
> SDIO to clear the interrupt. Apparently we can't finish all this in irq
> handler.

Ok, thanks. I don't see why that would cause a problem, but I can try
that too.

> Hope these could help.

Yes, good info to have.

Thanks
Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Panda ES board hang when using GPIO as interrupt
@ 2012-06-28 15:37       ` Jon Hunter
  0 siblings, 0 replies; 57+ messages in thread
From: Jon Hunter @ 2012-06-28 15:37 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Franky,

On 06/27/2012 08:03 PM, Franky Lin wrote:
> On 06/27/2012 04:43 PM, Jon Hunter wrote:
>> Hi Franky,
>>
>> On 06/25/2012 03:52 PM, Franky Lin wrote:
>>> Hi Kevin, Tarun,
>>>
>>> We are using the expansion connector A on Panda board to mount a SDIO
>>> WiFi dongle on MMC2 with a level triggered interrupt signal connected to
>>> GPIO 138. It's been working fine until 3.5 rc1. The board hang randomly
>>> within 5 mins during a network traffic test. After bisecting we found
>>> the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
>>> *_runtime_suspend()" [1].
>>
>> I have been looking into this today to see if I can replicate the
>> problem that you have reported. However, so far I have not had any luck.
>> Please note that my test setup is not exactly the same as yours as I
>> don't have your wlan module. However, I have been using a 2nd board to
>> generate gpio events to a panda-es to see I can make it lock up. I have
>> tried mainline kernel 3.5-rc1 and 3.5-rc3 but I have not seen any
>> problems after sending 100k gpio events (over many minutes). My setup is
>> as follows ...
>>
>> - OMAP4460 panda-es with gpio-138 connected to OMAP3430 beagle gpio-11.
>> - Mainline kernel 3.5-rc1/3 using omap2plus_defconfig (no changes)
>> - Created a simple kernel module that acquires gpio-138 and sets up a
>>    IRQ with flag IRQF_TRIGGER_HIGH (for active high level interrupt).
>> - GPIO events are triggered roughly every 1ms
> 
> Don't know if it's related, but we also mux several other pins on
> connector A:
>         /* MMC2 Mux for extension board */
>         /* MMC2 CMD */
>         OMAP4_MUX(GPMC_NWE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         /* MMC2 CLK */
>         OMAP4_MUX(GPMC_NOE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         /* MMC2 DAT 0-3 */
>         OMAP4_MUX(GPMC_AD0, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         OMAP4_MUX(GPMC_AD1, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         OMAP4_MUX(GPMC_AD2, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         OMAP4_MUX(GPMC_AD3, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
>         /* GPIO MUX for OOB interupt of dongle */
>         OMAP4_MUX(MCSPI1_CS1, OMAP_MUX_MODE3 | OMAP_PIN_INPUT_PULLDOWN),
>         /* GPIO MUX for WLAN_ENABLE for dongle */
>         OMAP4_MUX(MCSPI1_CLK, OMAP_MUX_MODE3 | OMAP_PIN_OUTPUT),

I would not have thought so. However, I will think about that thanks.

>> Can you confirm ...
>> 1. You are just using omap2plus_defconfig with no changes?
> No, we enable following options
> CONFIG_DEVTMPFS=y
> CONFIG_DEVTMPFS_MOUNT=y
> CONFIG_USB_OHCI_HCD=y

Ok, thanks.

>> 2. Rough frequency of gpio events?
> 3367 interrupts were triggered during a 10 secs throughput test.
> 
>> 3. Is the gpio configured for active low or high?
> active high
> 
>> 4. When the hang occurs, what is the state of the gpio? Active or
>>     inactive? Can you probe it with a scope? If it was always active I
>>     could see that this would lock the device up, but I am not sure how
>>     that would relate to the results from your bisect???
> 
> I dont have a scope nearby. Let me see if I can find one tomorrow.

Great, that would be good.

>>> I noticed Kevin raised some similar cases on other platforms and also
>>> provided two patches in the patch mail thread. But unfortunately those
>>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>>> mainline kernel and the issue is still there. I can only "fix" the hang
>>> by either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the
>>> hang only happens on Panda ES board. Old Panda with 4430 works good.
>>
>> It does not make sense to me yet why this would only impact 4460, but I
>> will keep this in mind.
>>
>> In your wlan driver are you acquiring and freeing the gpio often? Or are
>> you only acquiring the gpio on boot?
>>
>> The reason I ask is because for omap4, it seems that we are not
>> currently calling omap2_gpio_prepare_for_idle() during idle and so the
>> only time I see us call the runtime_suspend/resume handlers for omap4 is
>> during probe and when we acquire and free the gpio.
>>
>> So if you were not acquiring and freeing the gpio and are using the
>> stock kernel, then as far as I can tell, the runtime pm code is not
>> being exercised much. My test is not acquiring and releasing the gpio
>> and so I am wondering if that is the secret to reproducing this
>> problem :-)
> 
> We only request the irq once during initialization. But we do frequently
> disable and re-enable it since we need to access to the module through
> SDIO to clear the interrupt. Apparently we can't finish all this in irq
> handler.

Ok, thanks. I don't see why that would cause a problem, but I can try
that too.

> Hope these could help.

Yes, good info to have.

Thanks
Jon
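
[Illustrative sketch, not part of the original message.] The disable/re-enable pattern Franky describes above (mask the level-triggered interrupt in the handler, clear the source later over SDIO, then unmask) can be modelled in plain user-space C. This is only a sketch under stated assumptions: struct irq_model and the model_* functions are hypothetical names made up for illustration, not code from the wlan driver or gpio-omap, and the SDIO access is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one level-triggered interrupt line. */
struct irq_model {
	bool line_active;	/* device holds the line asserted */
	bool masked;		/* interrupt is disabled at the controller */
	bool work_pending;	/* deferred work queued by the handler */
	unsigned int fired;	/* number of times the handler ran */
};

/* Hard-irq part: it cannot reach the device over the slow bus, so it
 * only masks the line and defers the real work. */
static void model_irq_handler(struct irq_model *m)
{
	m->fired++;
	m->masked = true;
	m->work_pending = true;
}

/* The handler runs whenever the line is active and not masked.
 * Returns true if the handler ran. */
static bool model_deliver(struct irq_model *m)
{
	if (!m->line_active || m->masked)
		return false;
	model_irq_handler(m);
	return true;
}

/* Deferred part: clear the interrupt source (stands in for the SDIO
 * access), then unmask the line again. */
static void model_deferred_work(struct irq_model *m)
{
	if (!m->work_pending)
		return;
	assert(m->masked);	/* the handler must have masked the line */
	m->line_active = false;
	m->work_pending = false;
	m->masked = false;
}
```

In this model a line that stays active while unmasked makes model_deliver() run the handler on every call, which is why a level-triggered source has to be masked until the deferred work has cleared it.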

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
  2012-06-28  0:41     ` Franky Lin
  (?)
@ 2012-06-28 15:42       ` Jon Hunter
  -1 siblings, 0 replies; 57+ messages in thread
From: Jon Hunter @ 2012-06-28 15:42 UTC (permalink / raw)
  To: Franky Lin
  Cc: Kevin Hilman, b-cousson, tony, linux-wireless, grant.likely,
	santosh.shilimkar, linux-omap, tarun.kanti, linux-arm-kernel


On 06/27/2012 07:41 PM, Franky Lin wrote:
> On 06/26/2012 08:37 PM, Kevin Hilman wrote:
>> "Franky Lin" <frankyl@broadcom.com> writes:
>>> I noticed Kevin raised some similar cases on other platforms and also
>>> provided two patches in the patch mail thread. But unfortunately those
>>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>>> mainline kernel and the issue is still there. I can only "fix" the
>>> hang by either reverting the commit or disabling
>>> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
>>> Panda with 4430 works good.
>>>
>>> Any thoughts and suggestions?
>>
>> If reverting the patch fixes your problem, can you isolate down to which
>> part of that patch causes the problem?  IOW, can you fix your problem if
>> you undo just the hunk added in runtime_suspend or undo just the moved
>> hunk runtime_resume?  Or is reverting both required?
>>
>> I suspect the added runtime_suspend hunk is causing the problems, so can
>> you see if just undoing that part works[1].  If that works, I will give
>> a bit more of a thinking on it tomorrow.
> 
> runtime_suspend hunk is fine. The hang still exist after reverting it.
> The culprit is the moved hunk in runtime_resume. Reverting it makes the
> hang disappear.

Thanks. From reviewing the code the only thing that appears suspect based
upon your findings is the return if we find the context has not been lost.
We are not checking if "workaround_enabled" is set before we return. 

Could you try the following change on top of v3.5-rc3?

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..3b89e85 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1238,12 +1238,8 @@ static int omap_gpio_runtime_resume(struct device *dev)
        if (bank->get_context_loss_count) {
                context_lost_cnt_after =
                        bank->get_context_loss_count(bank->dev);
-               if (context_lost_cnt_after != bank->context_loss_count) {
+               if (context_lost_cnt_after != bank->context_loss_count)
                        omap_gpio_restore_context(bank);
-               } else {
-                       spin_unlock_irqrestore(&bank->lock, flags);
-                       return 0;
-               }
        }

Also, could you add a print in the runtime_suspend/resume() functions so
we can see how often these are being called. In my case, I really don't see
these being exercised and I am wondering how often you see suspend/resume
being called in your setup.

Cheers
Jon
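
[Illustrative sketch, not part of the original message.] The control flow being questioned above can be modelled roughly in user-space C. The early_return flag selects between returning as soon as the context loss count is unchanged (the branch the diff removes) and falling through to the workaround check. Everything here is an assumption made for illustration: struct bank_model, model_runtime_resume(), and in particular the placement of the workaround step after the context check are not taken from gpio-omap.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-bank state discussed in the thread. */
struct bank_model {
	unsigned int saved_loss_count;		/* count recorded earlier */
	unsigned int current_loss_count;	/* count read at resume time */
	bool workaround_enabled;		/* set by an earlier suspend */
	unsigned int restores;			/* context restores performed */
	unsigned int workaround_runs;		/* workaround undo steps performed */
};

/*
 * early_return == true models the "else { unlock; return 0; }" branch
 * that the diff removes; false models the code with the diff applied.
 */
static void model_runtime_resume(struct bank_model *b, bool early_return)
{
	assert(b != NULL);

	if (b->current_loss_count != b->saved_loss_count)
		b->restores++;		/* context was lost: restore it */
	else if (early_return)
		return;			/* skips the workaround check below */

	if (!b->workaround_enabled)
		return;

	/* Assumed follow-up step: undo whatever suspend set up. */
	b->workaround_runs++;
	b->workaround_enabled = false;
}
```

With equal loss counts and workaround_enabled set, the early-return variant leaves the workaround untouched while the patched variant runs it once. That is the difference the proposed change is meant to test.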

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
  2012-06-28 15:42       ` Jon Hunter
@ 2012-06-28 21:24         ` Franky Lin
  -1 siblings, 0 replies; 57+ messages in thread
From: Franky Lin @ 2012-06-28 21:24 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Kevin Hilman, b-cousson, tony, linux-wireless, grant.likely,
	santosh.shilimkar, linux-omap, tarun.kanti, linux-arm-kernel

On 06/28/2012 08:42 AM, Jon Hunter wrote:
>
> On 06/27/2012 07:41 PM, Franky Lin wrote:
>> On 06/26/2012 08:37 PM, Kevin Hilman wrote:
>>> "Franky Lin" <frankyl@broadcom.com> writes:
>>>> I noticed Kevin raised some similar cases on other platforms and also
>>>> provided two patches in the patch mail thread. But unfortunately those
>>>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>>>> mainline kernel and the issue is still there. I can only "fix" the
>>>> hang by either reverting the commit or disabling
>>>> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
>>>> Panda with 4430 works good.
>>>>
>>>> Any thoughts and suggestions?
>>>
>>> If reverting the patch fixes your problem, can you isolate down to which
>>> part of that patch causes the problem?  IOW, can you fix your problem if
>>> you undo just the hunk added in runtime_suspend or undo just the moved
>>> hunk runtime_resume?  Or is reverting both required?
>>>
>>> I suspect the added runtime_suspend hunk is causing the problems, so can
>>> you see if just undoing that part works[1].  If that works, I will give
>>> a bit more of a thinking on it tomorrow.
>>
>> runtime_suspend hunk is fine. The hang still exist after reverting it.
>> The culprit is the moved hunk in runtime_resume. Reverting it makes the
>> hang disappear.
>
> Thanks. From reviewing the code the only thing that appears suspect based
> upon your findings is the return if we find the context has not been lost.
> We are not checking if "workaround_enabled" is set before we return.
>
> Could you try the following change on top of v3.5-rc3?
>

The patch doesn't help. I also managed to probe the signal. It was
active when the board hung.

> Also, could you add a print in the runtime_suspend/resume() functions so
> we can see how often these are being called. In my case, I really don't see
> these being exercised and I am wondering how often you see suspend/resume
> being called in your setup.

Well, the runtime_suspend/resume never get called during the test.

Thanks,
Franky


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
@ 2012-06-28 21:55           ` Jon Hunter
  0 siblings, 0 replies; 57+ messages in thread
From: Jon Hunter @ 2012-06-28 21:55 UTC (permalink / raw)
  To: Franky Lin
  Cc: Kevin Hilman, b-cousson, tony, linux-wireless, grant.likely,
	santosh.shilimkar, linux-omap, tarun.kanti, linux-arm-kernel


On 06/28/2012 04:24 PM, Franky Lin wrote:
> On 06/28/2012 08:42 AM, Jon Hunter wrote:
>>
>> On 06/27/2012 07:41 PM, Franky Lin wrote:
>>> On 06/26/2012 08:37 PM, Kevin Hilman wrote:
>>>> "Franky Lin" <frankyl@broadcom.com> writes:
>>>>> I noticed Kevin raised some similar cases on other platforms and also
>>>>> provided two patches in the patch mail thread. But unfortunately those
>>>>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>>>>> mainline kernel and the issue is still there. I can only "fix" the
>>>>> hang by either reverting the commit or disabling
>>>>> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
>>>>> Panda with 4430 works good.
>>>>>
>>>>> Any thoughts and suggestions?
>>>>
>>>> If reverting the patch fixes your problem, can you isolate down to
>>>> which
>>>> part of that patch causes the problem?  IOW, can you fix your
>>>> problem if
>>>> you undo just the hunk added in runtime_suspend or undo just the moved
>>>> hunk runtime_resume?  Or is reverting both required?
>>>>
>>>> I suspect the added runtime_suspend hunk is causing the problems, so
>>>> can
>>>> you see if just undoing that part works[1].  If that works, I will give
>>>> a bit more of a thinking on it tomorrow.
>>>
>>> runtime_suspend hunk is fine. The hang still exist after reverting it.
>>> The culprit is the moved hunk in runtime_resume. Reverting it makes the
>>> hang disappear.
>>
>> Thanks. From reviewing the code the only thing that appears suspect based
>> upon your findings is the return if we find the context has not been
>> lost.
>> We are not checking if "workaround_enabled" is set before we return.
>>
>> Could you try the following change on top of v3.5-rc3?
>>
> 
> The patch doesn't help. I also managed to probe the signal. It was
> active when the board hung.

Ok. Is there any way to manually reset the wlan module to deactivate the
gpio when it is hung? I am wondering whether the board comes back to life
once the gpio is deactivated, which would indicate it is stuck in the
interrupt somewhere.

>> Also, could you add a print in the runtime_suspend/resume() functions so
>> we can see how often these are being called. In my case, I really
>> don't see
>> these being exercised and I am wondering how often you see suspend/resume
>> being called in your setup.
> 
> Well, the runtime_suspend/resume never get called during the test.

Well, at least that is consistent with what I see, but it is also
perplexing that it takes some time to fail. Can you try the following as a
debug patch to see if the context restore is the problem? From your
testing and bisect, the only possible difference in the current kernel is
that it could perform the context restore when acquiring the gpio.

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..a2401bd 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1341,6 +1341,8 @@ void omap2_gpio_resume_after_idle(void)
 #if defined(CONFIG_PM_RUNTIME)
 static void omap_gpio_restore_context(struct gpio_bank *bank)
 {
+       return;
+
        __raw_writel(bank->context.wake_en,
                                bank->base + bank->regs->wkup_en);
        __raw_writel(bank->context.ctrl, bank->base + bank->regs->ctrl);

Cheers
Jon

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
@ 2012-06-28 22:53             ` Franky Lin
  0 siblings, 0 replies; 57+ messages in thread
From: Franky Lin @ 2012-06-28 22:53 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Kevin Hilman, b-cousson, tony, linux-wireless, grant.likely,
	santosh.shilimkar, linux-omap, tarun.kanti, linux-arm-kernel

On 06/28/2012 02:55 PM, Jon Hunter wrote:
> Ok. Is there any way to manually reset the wlan module to deactivate the
> gpio when it is hung? I am wondering whether the board comes back to life
> once the gpio is deactivated, which would indicate it is stuck in the
> interrupt somewhere.

The only way I can think of is removing the module manually. But it
didn't bring the board back to life.

> Well, at least that is consistent with what I see, but it is also
> perplexing that it takes some time to fail. Can you try the following as a
> debug patch to see if the context restore is the problem? From your
> testing and bisect, the only possible difference in the current kernel is
> that it could perform the context restore when acquiring the gpio.
>
> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
> index c4ed172..a2401bd 100644
> --- a/drivers/gpio/gpio-omap.c
> +++ b/drivers/gpio/gpio-omap.c
> @@ -1341,6 +1341,8 @@ void omap2_gpio_resume_after_idle(void)
>   #if defined(CONFIG_PM_RUNTIME)
>   static void omap_gpio_restore_context(struct gpio_bank *bank)
>   {
> +       return;
> +
>          __raw_writel(bank->context.wake_en,
>                                  bank->base + bank->regs->wkup_en);
>          __raw_writel(bank->context.ctrl, bank->base + bank->regs->ctrl);
>

This one works! It can run more than 20 mins.

I found one interesting thing. When I added the print to see when
runtime_suspend/resume get called, the suspend/resume calls turned out to
be unbalanced during boot: resume got called more often than suspend. So I
hacked the code to make sure suspend and resume are called in pairs, so
that a resume without a preceding suspend does nothing and returns
immediately. This also makes the hang vanish.

Regards,
Franky
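
[Illustrative sketch, not part of the original message.] The pairing hack described above, together with the call counting requested earlier in the thread, can be modelled in user-space C as follows. The actual change was not posted, so this is an assumption about its shape: struct pm_model, the counters, and the model_* function names are hypothetical, and the context save/restore is reduced to bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one GPIO bank. */
struct pm_model {
	bool suspended;			/* a suspend is waiting for its resume */
	unsigned int suspend_calls;	/* how often suspend was entered */
	unsigned int resume_calls;	/* how often resume was entered */
	unsigned int restores;		/* resumes that actually restored context */
	unsigned int skipped_resumes;	/* resumes with no matching suspend */
};

static void model_runtime_suspend(struct pm_model *p)
{
	assert(p != NULL);
	p->suspend_calls++;
	p->suspended = true;	/* the context would be saved here */
}

static void model_runtime_resume(struct pm_model *p)
{
	assert(p != NULL);
	p->resume_calls++;

	/* A resume without a preceding suspend does nothing. */
	if (!p->suspended) {
		p->skipped_resumes++;
		return;
	}

	p->suspended = false;
	p->restores++;		/* the context would be restored here */
}
```

Calling resume twice, then suspend, then resume leaves resume_calls at 3, suspend_calls at 1, skipped_resumes at 2 and restores at 1, so a context restore only happens after a context save has taken place.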



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
  2012-06-28 22:53             ` Franky Lin
  (?)
@ 2012-06-28 22:59               ` Jon Hunter
  -1 siblings, 0 replies; 57+ messages in thread
From: Jon Hunter @ 2012-06-28 22:59 UTC (permalink / raw)
  To: Franky Lin
  Cc: Kevin Hilman, b-cousson, tony, linux-wireless, grant.likely,
	santosh.shilimkar, linux-omap, tarun.kanti, linux-arm-kernel


On 06/28/2012 05:53 PM, Franky Lin wrote:
> On 06/28/2012 02:55 PM, Jon Hunter wrote:
>> Ok. Any way to manually reset the wlan module to deactivate the gpio
>> when it is hung? I am wondering if the gpio is deactivated if the board
>> comes back to life, indicating it is stuck in the interrupt somewhere.
> 
> The only way I can think of is removing the module manually. But it
> didn't bring the board back to live.
> 
>> Well, at least that is consistent with what I see, but also perplexing
>> that it takes sometime to fail. Can you try the following as a debug
>> patch to see if it is in the context restore that is the problem. From
>> your testing and bisect, the only possible difference in the current
>> kernel is that it could perform the context restore when acquiring the
>> gpio.
>>
>> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
>> index c4ed172..a2401bd 100644
>> --- a/drivers/gpio/gpio-omap.c
>> +++ b/drivers/gpio/gpio-omap.c
>> @@ -1341,6 +1341,8 @@ void omap2_gpio_resume_after_idle(void)
>>   #if defined(CONFIG_PM_RUNTIME)
>>   static void omap_gpio_restore_context(struct gpio_bank *bank)
>>   {
>> +       return;
>> +
>>          __raw_writel(bank->context.wake_en,
>>                                  bank->base + bank->regs->wkup_en);
>>          __raw_writel(bank->context.ctrl, bank->base + bank->regs->ctrl);
>>
> 
> This one works! It can run more than 20 mins.

Great! I need to dig into the context restore some more.

> I found one interesting thing. When I added the print info to see when
> runtime_suspend/resume get called, it seems like the suspend/resume is
> unbalance during boot. Resume got called more than suspend. So I hack
> the code to make sure suspend and resume are called in pair. A resume
> without suspend will do nothing and return immediately. This also makes
> the hang vanish.

I am not 100% sure I follow. On boot I would expect to see a
resume/suspend pair from the probe of the irq bank, and then another
resume from the acquisition of the gpio. However, I would not expect a
suspend until the gpio is freed, which I don't believe you are doing.

Can you share your hack? Just paste the diff? This may help me
understand more.
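
The sequence described above can be modelled in user space. This is only
a sketch under stated assumptions, not the kernel's runtime PM code:
struct bank_model and the model_*() helpers are hypothetical names, and
the model assumes the resume callback runs on a 0 -> 1 usage-count
transition and the suspend callback on a 1 -> 0 transition.

```c
#include <assert.h>

/* Hypothetical user-space model of one GPIO bank's runtime PM state. */
struct bank_model {
	int usage;     /* outstanding "get" references */
	int resumes;   /* times the resume callback would have run */
	int suspends;  /* times the suspend callback would have run */
};

/* Take a reference; the bank resumes on the 0 -> 1 transition. */
static void model_get(struct bank_model *b)
{
	if (b->usage++ == 0)
		b->resumes++;
}

/* Drop a reference; the bank suspends on the 1 -> 0 transition. */
static void model_put(struct bank_model *b)
{
	assert(b->usage > 0);
	if (--b->usage == 0)
		b->suspends++;
}

/* Probe touches the hardware briefly: one get/put pair. */
static void model_probe(struct bank_model *b)
{
	model_get(b);
	model_put(b);
}

/* Requesting a GPIO holds a reference until the GPIO is freed. */
static void model_gpio_request(struct bank_model *b)
{
	model_get(b);
}

static void model_gpio_free(struct bank_model *b)
{
	model_put(b);
}
```

In this model, a probe followed by one GPIO request leaves the bank with
two resumes and one suspend. The extra resume is matched only when the
GPIO is freed, so more resumes than suspends is not by itself an
imbalance.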

Thanks
Jon

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
  2012-06-28 22:59               ` Jon Hunter
@ 2012-06-28 23:10                 ` Franky Lin
  -1 siblings, 0 replies; 57+ messages in thread
From: Franky Lin @ 2012-06-28 23:10 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Kevin Hilman, b-cousson, tony, linux-wireless, grant.likely,
	santosh.shilimkar, linux-omap, tarun.kanti, linux-arm-kernel

On 06/28/2012 03:59 PM, Jon Hunter wrote:
>
> On 06/28/2012 05:53 PM, Franky Lin wrote:
>> I found one interesting thing. When I added the print info to see when
>> runtime_suspend/resume get called, it seems like the suspend/resume is
>> unbalance during boot. Resume got called more than suspend. So I hack
>> the code to make sure suspend and resume are called in pair. A resume
>> without suspend will do nothing and return immediately. This also makes
>> the hang vanish.
>
> I am not 100% sure I follow. On boot I would expect to see a
> resume/suspend due to the probe on the irq bank and then I would expect
> to see another resume from the acquisition of the gpio, however, I would
> not expect a suspend until the gpio is freed, which I don't believe you
> are doing.
>
> Can you share your hack? Just paste the diff? This may help me
> understand more.
>

OK.
This is what I saw in the log:
[    0.171844] dummy:
[    0.172912] NET: Registered protocol family 16
[    0.173431] GPMC revision 6.0
[    0.173492] gpmc: irq-52 could not claim: err -22
[    0.177551] ??????omap_gpio_runtime_resume
[    0.178619] OMAP GPIO hardware version 0.1
[    0.178649] !!!!!omap_gpio_runtime_suspend
[    0.178771] ??????omap_gpio_runtime_resume
[    0.179351] !!!!!omap_gpio_runtime_suspend
[    0.179504] ??????omap_gpio_runtime_resume
[    0.180023] !!!!!omap_gpio_runtime_suspend
[    0.180145] ??????omap_gpio_runtime_resume
[    0.180694] !!!!!omap_gpio_runtime_suspend
[    0.180847] ??????omap_gpio_runtime_resume
[    0.181365] !!!!!omap_gpio_runtime_suspend
[    0.181518] ??????omap_gpio_runtime_resume
[    0.182037] !!!!!omap_gpio_runtime_suspend
[    0.185089] omap_mux_init: Add partition: #1: core, flags: 2
[    0.186462] omap_mux_init: Add partition: #2: wkup, flags: 2
[    0.186584] error setting wl12xx data: -38
[    0.189788] _omap_mux_get_by_name: Could not find signal uart1_rx.uart1_rx
[    0.189788] _omap_mux_get_by_name: Could not find signal uart1_rx.uart1_rx
[    0.239501] ??????omap_gpio_runtime_resume
[    0.239532] ??????omap_gpio_runtime_resume
[    0.241058]  usbhs_omap: alias fck already exists
[    0.244781] ??????omap_gpio_runtime_resume

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..bca3985 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1146,7 +1146,7 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)

  #if defined(CONFIG_PM_RUNTIME)
  static void omap_gpio_restore_context(struct gpio_bank *bank);
-
+static int flag = 0;
  static int omap_gpio_runtime_suspend(struct device *dev)
  {
         struct platform_device *pdev = to_platform_device(dev);
@@ -1155,6 +1155,8 @@ static int omap_gpio_runtime_suspend(struct device *dev)
         unsigned long flags;
         u32 wake_low, wake_hi;

+       flag ++;
+
         spin_lock_irqsave(&bank->lock, flags);

         /*
@@ -1221,6 +1223,11 @@ static int omap_gpio_runtime_resume(struct device *dev)
         u32 l = 0, gen, gen0, gen1;
         unsigned long flags;

+       if (flag)
+               flag--;
+       else
+               return 0;
+
         spin_lock_irqsave(&bank->lock, flags);
         _gpio_dbck_enable(bank);
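
The effect of the hack above on the boot sequence can be replayed in user
space. This is a rough sketch, not driver code: hack_suspend(),
hack_resume() and replay_skipped_resumes() are hypothetical names, and
the trace assumes the callbacks arrive in the order shown in the log.
Note that flag is a single file-scope counter in the diff, so the model
shares it across all banks as well.

```c
#include <assert.h>

/* Single file-scope counter, shared by every bank, as in the hack. */
static int flag;

/* The suspend path only bumps the counter before doing its work. */
static void hack_suspend(void)
{
	flag++;
}

/* Returns 1 if the resume body would run, 0 if the hack returns early. */
static int hack_resume(void)
{
	if (!flag)
		return 0;
	flag--;
	return 1;
}

/*
 * Replay a callback trace: 'r' = resume, 's' = suspend.
 * Returns the number of resume calls that returned early.
 */
static int replay_skipped_resumes(const char *trace)
{
	int skipped = 0;

	assert(trace);
	flag = 0;
	for (; *trace; trace++) {
		if (*trace == 's')
			hack_suspend();
		else if (*trace == 'r' && !hack_resume())
			skipped++;
	}
	return skipped;
}
```

Replaying the logged order (six resume/suspend pairs followed by three
resumes) as "rsrsrsrsrsrsrrr" skips three resume calls in this model: the
very first one and the last two.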

Regards,
Franky


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
@ 2012-06-28 23:28                   ` Jon Hunter
  0 siblings, 0 replies; 57+ messages in thread
From: Jon Hunter @ 2012-06-28 23:28 UTC (permalink / raw)
  To: Franky Lin
  Cc: Kevin Hilman, b-cousson, tony, linux-wireless, grant.likely,
	santosh.shilimkar, linux-omap, tarun.kanti, linux-arm-kernel


On 06/28/2012 06:10 PM, Franky Lin wrote:
> On 06/28/2012 03:59 PM, Jon Hunter wrote:
>>
>> On 06/28/2012 05:53 PM, Franky Lin wrote:
>>> I found one interesting thing. When I added the print info to see when
>>> runtime_suspend/resume get called, it seems like the suspend/resume is
>>> unbalance during boot. Resume got called more than suspend. So I hack
>>> the code to make sure suspend and resume are called in pair. A resume
>>> without suspend will do nothing and return immediately. This also makes
>>> the hang vanish.
>>
>> I am not 100% sure I follow. On boot I would expect to see a
>> resume/suspend due to the probe on the irq bank and then I would expect
>> to see another resume from the acquisition of the gpio, however, I would
>> not expect a suspend until the gpio is freed, which I don't believe you
>> are doing.
>>
>> Can you share your hack? Just paste the diff? This may help me
>> understand more.
>>
> 
> OK.
> This is what I saw in the log:
> [    0.171844] dummy:
> [    0.172912] NET: Registered protocol family 16
> [    0.173431] GPMC revision 6.0
> [    0.173492] gpmc: irq-52 could not claim: err -22
> [    0.177551] ??????omap_gpio_runtime_resume
> [    0.178619] OMAP GPIO hardware version 0.1
> [    0.178649] !!!!!omap_gpio_runtime_suspend
> [    0.178771] ??????omap_gpio_runtime_resume
> [    0.179351] !!!!!omap_gpio_runtime_suspend
> [    0.179504] ??????omap_gpio_runtime_resume
> [    0.180023] !!!!!omap_gpio_runtime_suspend
> [    0.180145] ??????omap_gpio_runtime_resume
> [    0.180694] !!!!!omap_gpio_runtime_suspend
> [    0.180847] ??????omap_gpio_runtime_resume
> [    0.181365] !!!!!omap_gpio_runtime_suspend
> [    0.181518] ??????omap_gpio_runtime_resume
> [    0.182037] !!!!!omap_gpio_runtime_suspend

There are 6 resume/suspend pairs here, one for probing each of the 6 gpio
banks. So this makes sense.

> [    0.185089] omap_mux_init: Add partition: #1: core, flags: 2
> [    0.186462] omap_mux_init: Add partition: #2: wkup, flags: 2
> [    0.186584] error setting wl12xx data: -38
> [    0.189788] _omap_mux_get_by_name: Could not find signal
> uart1_rx.uart1_rx
> [    0.189788] _omap_mux_get_by_name: Could not find signal
> uart1_rx.uart1_rx
> [    0.239501] ??????omap_gpio_runtime_resume
> [    0.239532] ??????omap_gpio_runtime_resume
> [    0.241058]  usbhs_omap: alias fck already exists
> [    0.244781] ??????omap_gpio_runtime_resume

Yes, these 3 resumes at the end are most likely caused by calls to
omap_gpio_request(). In other words, 3 gpios are acquired. So that is
expected and looks fine to me.
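
The counting above can be done mechanically. A minimal sketch, assuming
the log lines are already available as an array of strings; struct
pm_tally and tally_log() are hypothetical names, and the matched strings
are the function names that appear in the debug prints quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Totals of runtime PM callback markers seen in a log. */
struct pm_tally {
	int resumes;
	int suspends;
};

/* Count runtime PM callback markers in an array of log lines. */
static struct pm_tally tally_log(const char *const *lines, size_t n)
{
	struct pm_tally t = {0, 0};
	size_t i;

	assert(lines != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (strstr(lines[i], "omap_gpio_runtime_resume"))
			t.resumes++;
		else if (strstr(lines[i], "omap_gpio_runtime_suspend"))
			t.suspends++;
	}
	return t;
}
```

For the quoted log this gives 9 resumes and 6 suspends. The difference
of 3 matches the 3 trailing resumes discussed above.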

> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
> index c4ed172..bca3985 100644
> --- a/drivers/gpio/gpio-omap.c
> +++ b/drivers/gpio/gpio-omap.c
> @@ -1146,7 +1146,7 @@ static int __devinit omap_gpio_probe(struct
> platform_device *pdev)
> 
>  #if defined(CONFIG_PM_RUNTIME)
>  static void omap_gpio_restore_context(struct gpio_bank *bank);
> -
> +static int flag = 0;
>  static int omap_gpio_runtime_suspend(struct device *dev)
>  {
>         struct platform_device *pdev = to_platform_device(dev);
> @@ -1155,6 +1155,8 @@ static int omap_gpio_runtime_suspend(struct device
> *dev)
>         unsigned long flags;
>         u32 wake_low, wake_hi;
> 
> +       flag ++;
> +
>         spin_lock_irqsave(&bank->lock, flags);
> 
>         /*
> @@ -1221,6 +1223,11 @@ static int omap_gpio_runtime_resume(struct device
> *dev)
>         u32 l = 0, gen, gen0, gen1;
>         unsigned long flags;
> 
> +       if (flag)
> +               flag--;
> +       else
> +               return 0;
> +
>         spin_lock_irqsave(&bank->lock, flags);
>         _gpio_dbck_enable(bank);

I guess this would also skip the context restore, so I can see why it
works, but it is definitely not the right fix. Ok, let me look into the
restore.

Thanks
Jon

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
  2012-06-28 23:10                 ` Franky Lin
  (?)
@ 2012-06-28 23:35                   ` Jon Hunter
  -1 siblings, 0 replies; 57+ messages in thread
From: Jon Hunter @ 2012-06-28 23:35 UTC (permalink / raw)
  To: Franky Lin
  Cc: Kevin Hilman, b-cousson, tony, linux-wireless, grant.likely,
	santosh.shilimkar, linux-omap, tarun.kanti, linux-arm-kernel


On 06/28/2012 06:10 PM, Franky Lin wrote:
> On 06/28/2012 03:59 PM, Jon Hunter wrote:
>>
>> On 06/28/2012 05:53 PM, Franky Lin wrote:
>>> I found one interesting thing. When I added the print info to see when
>>> runtime_suspend/resume get called, it seems like the suspend/resume is
>>> unbalance during boot. Resume got called more than suspend. So I hack
>>> the code to make sure suspend and resume are called in pair. A resume
>>> without suspend will do nothing and return immediately. This also makes
>>> the hang vanish.
>>
>> I am not 100% sure I follow. On boot I would expect to see a
>> resume/suspend due to the probe on the irq bank and then I would expect
>> to see another resume from the acquisition of the gpio, however, I would
>> not expect a suspend until the gpio is freed, which I don't believe you
>> are doing.
>>
>> Can you share your hack? Just paste the diff? This may help me
>> understand more.
>>
> 
> OK.
> This is what I saw in the log:
> [    0.171844] dummy:
> [    0.172912] NET: Registered protocol family 16
> [    0.173431] GPMC revision 6.0
> [    0.173492] gpmc: irq-52 could not claim: err -22
> [    0.177551] ??????omap_gpio_runtime_resume
> [    0.178619] OMAP GPIO hardware version 0.1
> [    0.178649] !!!!!omap_gpio_runtime_suspend
> [    0.178771] ??????omap_gpio_runtime_resume
> [    0.179351] !!!!!omap_gpio_runtime_suspend
> [    0.179504] ??????omap_gpio_runtime_resume
> [    0.180023] !!!!!omap_gpio_runtime_suspend
> [    0.180145] ??????omap_gpio_runtime_resume
> [    0.180694] !!!!!omap_gpio_runtime_suspend
> [    0.180847] ??????omap_gpio_runtime_resume
> [    0.181365] !!!!!omap_gpio_runtime_suspend
> [    0.181518] ??????omap_gpio_runtime_resume
> [    0.182037] !!!!!omap_gpio_runtime_suspend
> [    0.185089] omap_mux_init: Add partition: #1: core, flags: 2
> [    0.186462] omap_mux_init: Add partition: #2: wkup, flags: 2
> [    0.186584] error setting wl12xx data: -38
> [    0.189788] _omap_mux_get_by_name: Could not find signal
> uart1_rx.uart1_rx
> [    0.189788] _omap_mux_get_by_name: Could not find signal
> uart1_rx.uart1_rx
> [    0.239501] ??????omap_gpio_runtime_resume
> [    0.239532] ??????omap_gpio_runtime_resume
> [    0.241058]  usbhs_omap: alias fck already exists
> [    0.244781] ??????omap_gpio_runtime_resume

Sorry, can you do one more test? :-)

Add the following and send me the output?

Thanks!
Jon

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..3aa0f96 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1155,6 +1155,7 @@ static int omap_gpio_runtime_suspend(struct device *dev)
        unsigned long flags;
        u32 wake_low, wake_hi;

+       pr_info("%s: bank @ 0x%x\n", __func__, (u32)bank->base);
        spin_lock_irqsave(&bank->lock, flags);

        /*
@@ -1221,6 +1222,7 @@ static int omap_gpio_runtime_resume(struct device *dev)
        u32 l = 0, gen, gen0, gen1;
        unsigned long flags;

+       pr_info("%s: bank @ 0x%x\n", __func__, (u32)bank->base);
        spin_lock_irqsave(&bank->lock, flags);
        _gpio_dbck_enable(bank);

@@ -1239,6 +1241,7 @@ static int omap_gpio_runtime_resume(struct device *dev)
                context_lost_cnt_after =
                        bank->get_context_loss_count(bank->dev);
                if (context_lost_cnt_after != bank->context_loss_count) {
+                       pr_info("%s: count %d, now %d", __func__, bank->context_loss_count, context_lost_cnt_after);
                        omap_gpio_restore_context(bank);
                } else {
                        spin_unlock_irqrestore(&bank->lock, flags);
@@ -1341,6 +1344,7 @@ void omap2_gpio_resume_after_idle(void)
 #if defined(CONFIG_PM_RUNTIME)
 static void omap_gpio_restore_context(struct gpio_bank *bank)
 {
+       pr_info("%s: bank @ 0x%x\n", __func__, (u32)bank->base);
        __raw_writel(bank->context.wake_en,
                                bank->base + bank->regs->wkup_en);
        __raw_writel(bank->context.ctrl, bank->base + bank->regs->ctrl);


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
  2012-06-28 23:10                 ` Franky Lin
  (?)
@ 2012-06-28 23:54                   ` Jon Hunter
  -1 siblings, 0 replies; 57+ messages in thread
From: Jon Hunter @ 2012-06-28 23:54 UTC (permalink / raw)
  To: Franky Lin
  Cc: Kevin Hilman, b-cousson, tony, linux-wireless, grant.likely,
	santosh.shilimkar, linux-omap, tarun.kanti, linux-arm-kernel


On 06/28/2012 06:10 PM, Franky Lin wrote:
> On 06/28/2012 03:59 PM, Jon Hunter wrote:
>>
>> On 06/28/2012 05:53 PM, Franky Lin wrote:
>>> I found one interesting thing. When I added the print info to see when
>>> runtime_suspend/resume get called, it seems like the suspend/resume is
>>> unbalanced during boot. Resume got called more than suspend. So I hacked
>>> the code to make sure suspend and resume are called in pairs. A resume
>>> without suspend will do nothing and return immediately. This also makes
>>> the hang vanish.
>>
>> I am not 100% sure I follow. On boot I would expect to see a
>> resume/suspend due to the probe on the irq bank and then I would expect
>> to see another resume from the acquisition of the gpio, however, I would
>> not expect a suspend until the gpio is freed, which I don't believe you
>> are doing.
>>
>> Can you share your hack? Just paste the diff? This may help me
>> understand more.
>>
> 
> OK.
> This is what I saw in the log:
> [    0.171844] dummy:
> [    0.172912] NET: Registered protocol family 16
> [    0.173431] GPMC revision 6.0
> [    0.173492] gpmc: irq-52 could not claim: err -22
> [    0.177551] ??????omap_gpio_runtime_resume
> [    0.178619] OMAP GPIO hardware version 0.1
> [    0.178649] !!!!!omap_gpio_runtime_suspend
> [    0.178771] ??????omap_gpio_runtime_resume
> [    0.179351] !!!!!omap_gpio_runtime_suspend
> [    0.179504] ??????omap_gpio_runtime_resume
> [    0.180023] !!!!!omap_gpio_runtime_suspend
> [    0.180145] ??????omap_gpio_runtime_resume
> [    0.180694] !!!!!omap_gpio_runtime_suspend
> [    0.180847] ??????omap_gpio_runtime_resume
> [    0.181365] !!!!!omap_gpio_runtime_suspend
> [    0.181518] ??????omap_gpio_runtime_resume
> [    0.182037] !!!!!omap_gpio_runtime_suspend
> [    0.185089] omap_mux_init: Add partition: #1: core, flags: 2
> [    0.186462] omap_mux_init: Add partition: #2: wkup, flags: 2
> [    0.186584] error setting wl12xx data: -38
> [    0.189788] _omap_mux_get_by_name: Could not find signal uart1_rx.uart1_rx
> [    0.189788] _omap_mux_get_by_name: Could not find signal uart1_rx.uart1_rx
> [    0.239501] ??????omap_gpio_runtime_resume
> [    0.239532] ??????omap_gpio_runtime_resume
> [    0.241058]  usbhs_omap: alias fck already exists
> [    0.244781] ??????omap_gpio_runtime_resume

I am wondering if this could be the bug ... on start-up I see that we do
a context restore on bank1 during the probe which is before we have done
the first suspend! In other words, we could restore a bad/uninitialised
context for bank1. In the case of bank1, the loss count starts at 1 and
not 0 and so we falsely think we need to perform a restore :-(

[    0.176269] omap_gpio_runtime_resume: bank @ 0xfc310000
[    0.177276] omap_gpio_runtime_resume: count 0, now 1
[    0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
[    0.177642] omap_gpio_runtime_suspend: bank @ 0xfc310000

Can you try ...

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..9623408 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)
 #ifdef CONFIG_OF_GPIO
        bank->chip.of_node = of_node_get(node);
 #endif
+       if (bank->get_context_loss_count)
+               bank->context_loss_count =
+                               bank->get_context_loss_count(bank->dev);

        bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
        if (bank->irq_base < 0) {

Thanks
Jon


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
  2012-06-28 23:54                   ` Jon Hunter
@ 2012-06-29  0:59                     ` Franky Lin
  -1 siblings, 0 replies; 57+ messages in thread
From: Franky Lin @ 2012-06-29  0:59 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Kevin Hilman, b-cousson, tony, linux-wireless, grant.likely,
	santosh.shilimkar, linux-omap, tarun.kanti, linux-arm-kernel

On 06/28/2012 04:54 PM, Jon Hunter wrote:
> I am wondering if this could be the bug ... on start-up I see that we do
> a context restore on bank1 during the probe which is before we have done
> the first suspend! In other words, we could restore a bad/uninitialised
> context for bank1. In the case of bank1, the loss count starts at 1 and
> not 0 and so we falsely think we need to perform a restore :-(
>
> [    0.176269] omap_gpio_runtime_resume: bank @ 0xfc310000
> [    0.177276] omap_gpio_runtime_resume: count 0, now 1
> [    0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
> [    0.177642] omap_gpio_runtime_suspend: bank @ 0xfc310000
>
> Can you try ...
>
> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
> index c4ed172..9623408 100644
> --- a/drivers/gpio/gpio-omap.c
> +++ b/drivers/gpio/gpio-omap.c
> @@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)
>   #ifdef CONFIG_OF_GPIO
>          bank->chip.of_node = of_node_get(node);
>   #endif
> +       if (bank->get_context_loss_count)
> +               bank->context_loss_count =
> +                               bank->get_context_loss_count(bank->dev);
>
>          bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
>          if (bank->irq_base < 0) {
>

Looks like you found the culprit. :) It does fix the problem.

Franky


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
  2012-06-29  0:59                     ` Franky Lin
  (?)
@ 2012-06-29  4:07                       ` DebBarma, Tarun Kanti
  -1 siblings, 0 replies; 57+ messages in thread
From: DebBarma, Tarun Kanti @ 2012-06-29  4:07 UTC (permalink / raw)
  To: Franky Lin
  Cc: Jon Hunter, Kevin Hilman, b-cousson, tony, linux-wireless,
	grant.likely, santosh.shilimkar, linux-omap, linux-arm-kernel

On Fri, Jun 29, 2012 at 6:29 AM, Franky Lin <frankyl@broadcom.com> wrote:
> On 06/28/2012 04:54 PM, Jon Hunter wrote:
>>
>> I am wondering if this could be the bug ... on start-up I see that we do
>> a context restore on bank1 during the probe which is before we have done
>> the first suspend! In other words, we could restore a bad/uninitialised
>> context for bank1. In the case of bank1, the loss count starts at 1 and
>> not 0 and so we falsely think we need to perform a restore :-(
>>
>> [    0.176269] omap_gpio_runtime_resume: bank @ 0xfc310000
>> [    0.177276] omap_gpio_runtime_resume: count 0, now 1
>> [    0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
>> [    0.177642] omap_gpio_runtime_suspend: bank @ 0xfc310000
>>
>> Can you try ...
>>
>> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
>> index c4ed172..9623408 100644
>> --- a/drivers/gpio/gpio-omap.c
>> +++ b/drivers/gpio/gpio-omap.c
>> @@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)
>>  #ifdef CONFIG_OF_GPIO
>>         bank->chip.of_node = of_node_get(node);
>>  #endif
>> +       if (bank->get_context_loss_count)
>> +               bank->context_loss_count =
>> +                               bank->get_context_loss_count(bank->dev);
>>
>>         bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
>>         if (bank->irq_base < 0) {
>>
>
> Looks like you found the culprit. :) It does fix the problem.

So this looks similar to what NeilBrown <neilb@suse.de> reported in
another thread. The cause there was context_loss_count = 1 for GPIO
BANK#0, which of course is in the WKUP domain. In fact, he tried the
same fix. Anyway, we should now hear from Kevin whether it is feasible
to fix the context_loss_count for the WKUP GPIO bank, or whether to put
the workaround here in the gpio driver.
--
Tarun
>
> Franky
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Panda ES board hang when using GPIO as interrupt
@ 2012-06-29  4:07                       ` DebBarma, Tarun Kanti
  0 siblings, 0 replies; 57+ messages in thread
From: DebBarma, Tarun Kanti @ 2012-06-29  4:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 29, 2012 at 6:29 AM, Franky Lin <frankyl@broadcom.com> wrote:
> On 06/28/2012 04:54 PM, Jon Hunter wrote:
>>
>> I am wondering if this could be the bug ... on start-up I see that we do
>> a context restore on bank1 during the probe which is before we have done
>> the first suspend! In other words, we could restore a bad/uninitialised
>> context for bank1. In the case of bank1, the loss count starts at 1 and
>> not 0 and so we falsely think we need to perform a restore :-(
>>
>> [ ? ?0.176269] omap_gpio_runtime_resume: bank @ 0xfc310000
>> [ ? ?0.177276] omap_gpio_runtime_resume: count 0, now 1
>> [ ? ?0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
>> [ ? ?0.177642] omap_gpio_runtime_suspend: bank @ 0xfc310000
>>
>> Can you try ...
>>
>> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
>> index c4ed172..9623408 100644
>> --- a/drivers/gpio/gpio-omap.c
>> +++ b/drivers/gpio/gpio-omap.c
>> @@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct
>> platform_device *pdev)
>> ?#ifdef CONFIG_OF_GPIO
>> ? ? ? ? bank->chip.of_node = of_node_get(node);
>> ?#endif
>> + ? ? ? if (bank->get_context_loss_count)
>> + ? ? ? ? ? ? ? bank->context_loss_count =
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? bank->get_context_loss_count(bank->dev);
>>
>> ? ? ? ? bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
>> ? ? ? ? if (bank->irq_base < 0) {
>>
>
> Looks like you found the culprit. :) It does fix the problem.
So this looks similar to what NeilBrown <neilb@suse.de> reported in
another thread. The reason was context_loss_count = 1 for GPIO BANK#0,
which of course is in the WKUP domain. In fact, he tried the same fix.
Anyway, we should hear from Kevin now whether it is feasible to fix the
context_loss_count for the WKUP GPIO bank or to put the workaround here
in the gpio driver.
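
[Editor's sketch] The failure mode discussed above can be modelled in a
few lines of C. This is only a simplified illustration, not the
gpio-omap code: struct gpio_bank_model, wkup_loss_count(),
model_runtime_resume() and spurious_restores() are hypothetical names,
and the assumption that the WKUP bank's loss count already reads 1 at
boot is taken from the "count 0, now 1" log line quoted above.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the driver's per-bank state. */
struct gpio_bank_model {
	int context_loss_count;            /* last loss count the driver saw */
	int restores;                      /* number of context restores done */
	int (*get_context_loss_count)(void);
};

/* Assumption: the platform count for this bank already reads 1 at boot. */
static int wkup_loss_count(void)
{
	return 1;
}

/* Models the resume-time check: restore only when the count has moved. */
static void model_runtime_resume(struct gpio_bank_model *bank)
{
	int now;

	if (!bank->get_context_loss_count)
		return;

	now = bank->get_context_loss_count();
	if (now != bank->context_loss_count) {
		bank->restores++;              /* stands in for a context restore */
		bank->context_loss_count = now;
	}
}

/* Returns how many restores the first resume triggers. */
static int spurious_restores(bool seed_in_probe)
{
	struct gpio_bank_model bank = { 0, 0, wkup_loss_count };

	/* Mirrors the idea of the proposed probe-time initialisation. */
	if (seed_in_probe)
		bank.context_loss_count = bank.get_context_loss_count();

	model_runtime_resume(&bank);
	return bank.restores;
}
```

In this model, the first resume restores a context that was never saved
unless the stored count is seeded during probe, which is the effect the
quoted diff is aiming for.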
--
Tarun
>
> Franky
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Panda ES board hang when using GPIO as interrupt
@ 2012-06-29 15:53                         ` Jon Hunter
  0 siblings, 0 replies; 57+ messages in thread
From: Jon Hunter @ 2012-06-29 15:53 UTC (permalink / raw)
  To: DebBarma, Tarun Kanti
  Cc: Franky Lin, Kevin Hilman, b-cousson, tony, linux-wireless,
	grant.likely, santosh.shilimkar, linux-omap, linux-arm-kernel


On 06/28/2012 11:07 PM, DebBarma, Tarun Kanti wrote:
> On Fri, Jun 29, 2012 at 6:29 AM, Franky Lin <frankyl@broadcom.com> wrote:
>> On 06/28/2012 04:54 PM, Jon Hunter wrote:
>>>
>>> I am wondering if this could be the bug ... on start-up I see that we do
>>> a context restore on bank1 during the probe which is before we have done
>>> the first suspend! In other words, we could restore a bad/uninitialised
>>> context for bank1. In the case of bank1, the loss count starts at 1 and
>>> not 0 and so we falsely think we need to perform a restore :-(
>>>
>>> [    0.176269] omap_gpio_runtime_resume: bank @ 0xfc310000
>>> [    0.177276] omap_gpio_runtime_resume: count 0, now 1
>>> [    0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
>>> [    0.177642] omap_gpio_runtime_suspend: bank @ 0xfc310000
>>>
>>> Can you try ...
>>>
>>> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
>>> index c4ed172..9623408 100644
>>> --- a/drivers/gpio/gpio-omap.c
>>> +++ b/drivers/gpio/gpio-omap.c
>>> @@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)
>>>  #ifdef CONFIG_OF_GPIO
>>>         bank->chip.of_node = of_node_get(node);
>>>  #endif
>>> +       if (bank->get_context_loss_count)
>>> +               bank->context_loss_count =
>>> +                               bank->get_context_loss_count(bank->dev);
>>>
>>>         bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
>>>         if (bank->irq_base < 0) {
>>>
>>
>> Looks like you found the culprit. :) It does fix the problem.
> So this looks similar to what NeilBrown <neilb@suse.de> reported in
> another thread.
> The reason was context_loss_count = 1 for GPIO BANK#0 which of course is
> in the WKUP domain. In fact he tried out with the same fix. Anyways, we
> should hear from Kevin now whether it is feasible to fix the
> context_loss_count for the WKUP GPIO bank or to put the workaround here
> in the gpio driver.

Ok, so I have been looking at this some more today. I believe that the
actual bug is that we are not checking to see if "loses_context" is true
before populating "get_context_loss_count" (see omap dmtimer driver).
For bank0 loses_context is false and so we should never be calling
"get_context_loss_count" in the first place.
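
[Editor's sketch] The check described above might look roughly like the
following. This is a minimal model under stated assumptions, not the
patch that was later posted: struct bank_pdata_model, struct bank_model,
model_probe() and would_query_loss_count() are hypothetical names, and
only the "loses_context" and "get_context_loss_count" field names come
from the message itself.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*loss_count_fn)(void);

/* Hypothetical platform data handed to the driver for each bank. */
struct bank_pdata_model {
	bool loses_context;
	loss_count_fn get_context_loss_count;
};

/* Hypothetical driver-side state for each bank. */
struct bank_model {
	bool loses_context;
	loss_count_fn get_context_loss_count;
};

/* Stand-in for a platform helper that reports a non-zero count. */
static int platform_loss_count(void)
{
	return 1;
}

/*
 * Only keep the loss-count callback when the bank can actually lose
 * context; otherwise leave it NULL so later code never queries it.
 */
static void model_probe(struct bank_model *bank,
			const struct bank_pdata_model *pdata)
{
	bank->loses_context = pdata->loses_context;
	bank->get_context_loss_count = NULL;

	if (bank->loses_context)
		bank->get_context_loss_count = pdata->get_context_loss_count;
}

/* True if resume-time code would consult the loss count for this bank. */
static bool would_query_loss_count(const struct bank_model *bank)
{
	return bank->get_context_loss_count != NULL;
}
```

With the callback left unset for a bank that never loses context, any
existing "if (bank->get_context_loss_count)" guard skips the
loss-count comparison for that bank entirely.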

I will send out a patch to fix this and will copy Kevin and Franky.

Franky, if you can test and confirm it works that would be great.

Kevin, if you can review that would be great too.

Cheers
Jon

^ permalink raw reply	[flat|nested] 57+ messages in thread

Thread overview: 57+ messages (cross-posted copies listed once)
2012-06-25 20:52 Panda ES board hang when using GPIO as interrupt Franky Lin
2012-06-26  7:21 ` DebBarma, Tarun Kanti
2012-06-26 18:20   ` Franky Lin
2012-06-27 13:29     ` DebBarma, Tarun Kanti
2012-06-27  3:37 ` Kevin Hilman
2012-06-28  0:41   ` Franky Lin
2012-06-28 15:42     ` Jon Hunter
2012-06-28 21:24       ` Franky Lin
2012-06-28 21:55         ` Jon Hunter
2012-06-28 22:53           ` Franky Lin
2012-06-28 22:59             ` Jon Hunter
2012-06-28 23:10               ` Franky Lin
2012-06-28 23:28                 ` Jon Hunter
2012-06-28 23:35                 ` Jon Hunter
2012-06-28 23:54                 ` Jon Hunter
2012-06-29  0:59                   ` Franky Lin
2012-06-29  4:07                     ` DebBarma, Tarun Kanti
2012-06-29 15:53                       ` Jon Hunter
2012-06-27 23:43 ` Jon Hunter
2012-06-28  1:03   ` Franky Lin
2012-06-28 15:37     ` Jon Hunter