All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kenzo Iwami <k-iwami@cj.jp.nec.com>
To: Kenzo Iwami <k-iwami@cj.jp.nec.com>
Cc: Shaw Vrana <shaw@vranix.com>, Auke Kok <auke-jan.h.kok@intel.com>,
	netdev@vger.kernel.org,
	Jesse Brandeburg <jesse.brandeburg@intel.com>,
	"Ronciak, John" <john.ronciak@intel.com>
Subject: Re: watchdog timeout panic in e1000 driver
Date: Wed, 15 Nov 2006 19:33:41 +0900	[thread overview]
Message-ID: <455AED05.9010607@cj.jp.nec.com> (raw)
In-Reply-To: <45489F59.6030102@cj.jp.nec.com>

Hi,

>>>> Even if the total lock time can be reduced, it's possible that interrupt
>>>> handler is executed while the interrupted code is still holding the 
>>>> semaphore.
>>>> I think your method only decrease the frequency of this problem.
>>>> Why does reducing the lock time solve this problem?
>>> there are several problems here that need addressing. It's not acceptable 
>>> for our driver to wait up to 15 seconds, and we can (presumably) reduce it 
>>> to milliseconds, so that would help a lot. We should in no case at all hold 
>>> it for any period longer than (give or take) half a second, so working 
>>> towards that is a very good step in the right direction.
>>>
>>> Adding the timer task back may also help, as we are no longer trying to 
>>> aqcuire the sw_fw_semaphore in interrupt context, but we removed it for a 
>>> reason, and I need to dig up what reason this exactly was before we can 
>>> revert it. Jesse might know, so I'll talk to him. But this will not fix the 
>>> fact that the semaphore is held for a long time :)
[...]
> I think this problem occurs because interrupt handler is executed in same
> CPU as process that acquires semaphore.
> How about disabling interrupt while the process is holding the semaphore?
> I think this is possible, if the total lock time has been reduced.

I created the attached patch based on the method described above.
This patch disables interrupt while the process is holding the semaphore.

I measured how long interrupts are being disabled 10,000 times using the
following method. TSC was read by rdtscll when interrupt was disabled
and interrupt was enabled again, then I subtract these two value.

The longest period interrupt was disabled is under 10usec, which seems
acceptable. The evaluation environment is;
  CPU    : Intel Xeon CPU 3.73GHz
  kernel : 2.6.19-rc5

I also ran the reproduction TP I sent previously and confirmed that the
system didn't panic.
  http://marc.theaimsgroup.com/?l=linux-netdev&m=116125332319850&w=2

How about this method?

I welcome any comments.
-- 
  Kenzo Iwami (k-iwami@cj.jp.nec.com)

Signed-off-by: Kenzo Iwami <k-iwami@cj.jp.nec.com>

diff -urpN linux-2.6.19-rc5_org/drivers/net/e1000/e1000_hw.c
linux-2.6.19-rc5/drivers/net/e1000/e1000_hw.c
--- linux-2.6.19-rc5_org/drivers/net/e1000/e1000_hw.c	2006-11-14
17:47:28.000000000 +0900
+++ linux-2.6.19-rc5/drivers/net/e1000/e1000_hw.c	2006-11-15 17:49:39.000000000
+0900
@@ -3379,6 +3379,7 @@ e1000_swfw_sync_acquire(struct e1000_hw
     uint32_t swmask = mask;
     uint32_t fwmask = mask << 16;
     int32_t timeout = 200;
+    unsigned long flags;

     DEBUGFUNC("e1000_swfw_sync_acquire");

@@ -3389,8 +3390,11 @@ e1000_swfw_sync_acquire(struct e1000_hw
         return e1000_get_hw_eeprom_semaphore(hw);

     while (timeout) {
-            if (e1000_get_hw_eeprom_semaphore(hw))
+            local_irq_save(flags);
+            if (e1000_get_hw_eeprom_semaphore(hw)) {
+                local_irq_restore(flags);
                 return -E1000_ERR_SWFW_SYNC;
+            }

             swfw_sync = E1000_READ_REG(hw, SW_FW_SYNC);
             if (!(swfw_sync & (fwmask | swmask))) {
@@ -3400,6 +3404,7 @@ e1000_swfw_sync_acquire(struct e1000_hw
             /* firmware currently using resource (fwmask) */
             /* or other software thread currently using resource (swmask) */
             e1000_put_hw_eeprom_semaphore(hw);
+            local_irq_restore(flags);
             mdelay(5);
             timeout--;
     }
@@ -3413,6 +3418,7 @@ e1000_swfw_sync_acquire(struct e1000_hw
     E1000_WRITE_REG(hw, SW_FW_SYNC, swfw_sync);

     e1000_put_hw_eeprom_semaphore(hw);
+    local_irq_restore(flags);
     return E1000_SUCCESS;
 }

@@ -3421,6 +3427,7 @@ e1000_swfw_sync_release(struct e1000_hw
 {
     uint32_t swfw_sync;
     uint32_t swmask = mask;
+    unsigned long flags;

     DEBUGFUNC("e1000_swfw_sync_release");

@@ -3434,6 +3441,7 @@ e1000_swfw_sync_release(struct e1000_hw
         return;
     }

+    local_irq_save(flags);
     /* if (e1000_get_hw_eeprom_semaphore(hw))
      *    return -E1000_ERR_SWFW_SYNC; */
     while (e1000_get_hw_eeprom_semaphore(hw) != E1000_SUCCESS);
@@ -3444,6 +3452,7 @@ e1000_swfw_sync_release(struct e1000_hw
     E1000_WRITE_REG(hw, SW_FW_SYNC, swfw_sync);

     e1000_put_hw_eeprom_semaphore(hw);
+    local_irq_restore(flags);
 }

 /*****************************************************************************

  reply	other threads:[~2006-11-15 10:33 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-19 10:19 watchdog timeout panic in e1000 driver Kenzo Iwami
2006-10-19 15:39 ` Auke Kok
     [not found]   ` <4538BFF2.2040207@cj.jp.nec.com>
2006-10-20 15:51     ` Auke Kok
2006-10-24  9:01       ` Kenzo Iwami
2006-10-24 16:15         ` Auke Kok
2006-10-25 13:41           ` Kenzo Iwami
2006-10-25 15:09             ` Auke Kok
2006-10-26 10:35               ` Kenzo Iwami
2006-10-26 14:34                 ` Auke Kok
2006-10-30 11:36                   ` Kenzo Iwami
2006-10-30 17:30                     ` Auke Kok
2006-10-31  3:22                       ` Shaw Vrana
2006-11-01 13:21                         ` Kenzo Iwami
2006-11-15 10:33                           ` Kenzo Iwami [this message]
2006-11-15 16:11                             ` Auke Kok
2006-11-16  9:23                               ` Kenzo Iwami
2007-02-20  9:26 ` Kenzo Iwami
2007-02-20 16:10   ` Auke Kok
2007-02-21  5:17     ` Kenzo Iwami
2006-11-16 17:20 Brandeburg, Jesse
2006-11-21 10:16 ` Kenzo Iwami
2006-12-04  9:14   ` Kenzo Iwami
2006-12-05  0:46     ` Auke Kok
2006-12-12  7:58       ` Kenzo Iwami
2006-12-19  0:13         ` Kenzo Iwami
2007-01-15  9:12           ` Kenzo Iwami
2007-01-15 16:14             ` Auke Kok
2007-01-16  8:42               ` Kenzo Iwami
2007-01-18  9:22                 ` Kenzo Iwami

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=455AED05.9010607@cj.jp.nec.com \
    --to=k-iwami@cj.jp.nec.com \
    --cc=auke-jan.h.kok@intel.com \
    --cc=jesse.brandeburg@intel.com \
    --cc=john.ronciak@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=shaw@vranix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.