All of lore.kernel.org
 help / color / mirror / Atom feed
* The pm80xx driver hangs in 3.10 with the Adaptec 71605H HBA
@ 2013-07-12 11:02 Hans Verkuil
       [not found] ` <CAD+HZHUc7wULDUpKq24rmcVC9MCcgaoO4i1LxGcW-q-y-c3tcQ@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Hans Verkuil @ 2013-07-12 11:02 UTC (permalink / raw)
  To: xjtuwjp, lindar_liu; +Cc: linux-scsi

Hi!

I am trying to get my Adaptec 71605H HBA to work with kernel 3.10.
Unfortunately it hangs while waiting for a completion.

After registering all the interrupts the pm8001_pci_probe() function calls
pm8001_init_sas_add(). That function calls PM8001_CHIP_DISP->get_nvmd_req()
and waits for a completion that never happens.

I tried to use the drivers available from the Adaptec website, and after
hacking them so they compiled again with 3.10 I just copied those sources
over to the kernel, recompiled and tested those. And lo and behold, that
actually worked! So the Adaptec drivers do something that the current kernel
drivers don't.

The patch I applied to the 3.10 kernel to effectively update the current
drivers with those from Adaptec is here:

http://hverkuil.home.xs4all.nl/0002-pm80xx-copy-over-adaptec-drivers.patch

The original drivers from Adaptec are here:

http://www.adaptec.com/en-us/downloads/rh/rhel_6/productid=asa-71605h&dn=adaptec+71605h.html

I tried to find relevant differences that might explain why the adaptec
version works while the current 3.10 doesn't, but nothing jumped out to
me.

It's no problem to test any patches you have, or to do some debugging if
someone can point me in the right direction.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: The pm80xx driver hangs in 3.10 with the Adaptec 71605H HBA
       [not found]     ` <E5DEE6F468524847995699AD11E8088501B28CAA@BBY1EXM11.pmc_nt.nt.pmc-sierra.bc.ca>
@ 2013-07-14  8:45       ` Hans Verkuil
  2013-07-15  8:53         ` Jack Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Hans Verkuil @ 2013-07-14  8:45 UTC (permalink / raw)
  To: Anand Kumar Santhanam; +Cc: Jack Wang, lindar_liu, linux-scsi

Hi Anand,

On 07/12/2013 03:14 PM, Anand Kumar Santhanam wrote:
> Hans,
> 
> I reviewed the code changes and I did not see major differences except
> for the fact that in adaptec driver we have 64 interrupt handlers to
> handle 64 MSI-X.
> This was optimized in open src driver to use only 1 interrupt handler.
> Can you pls make this change to the open src driver (i.e have multiple
> interrupt handlers for multiple MSI-X) and check?

I've looked at this more closely, and I wonder whether there isn't a race condition
here. When an interrupt arrives you put the interrupt vector in pm8001_ha->int_vector,
then schedule the tasklet. But what if two interrupts with different vectors arrive
in quick succession before the tasklet got a chance to run? In that case the tasklet
will only see the second vector, not the first. Rather scary.

I have not actually seen any issues with this, but by definition race conditions are
hard to reproduce and I haven't done any serious testing with this card. For now I
will run with the quick and dirty msi.diff (http://hverkuil.home.xs4all.nl/msi.diff).

I see two solutions: either use the 64 interrupt handlers as done in the adaptec
driver, or you can change int_vector into a u64 and use it as a bitmask to record
all interrupt vectors that have arrived.

BTW, another difference between the linux kernel driver and the adaptec version are
several of the defines in pm8001_defs.h: e.g. MPI_QUEUE is 256 in the adaptec driver,
while it is 1024 in the kernel driver. There are other differences as well.

Are all the changes in the kernel correct? I would like to have a confirmation of
that before I am going to trust my data to this driver.

It clearly hasn't been tested with actual hardware :-(

Regards,

	Hans

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: The pm80xx driver hangs in 3.10 with the Adaptec 71605H HBA
  2013-07-14  8:45       ` Hans Verkuil
@ 2013-07-15  8:53         ` Jack Wang
  2013-07-15 12:37           ` Anand Kumar Santhanam
  0 siblings, 1 reply; 6+ messages in thread
From: Jack Wang @ 2013-07-15  8:53 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: Anand Kumar Santhanam, lindar_liu, linux-scsi, jinpu.wang

Hi Hans,
On 07/14/2013 10:45 AM, Hans Verkuil wrote:
> Hi Anand,
> 
> On 07/12/2013 03:14 PM, Anand Kumar Santhanam wrote:
>> Hans,
>>
>> I reviewed the code changes and I did not see major differences except
>> for the fact that in adaptec driver we have 64 interrupt handlers to
>> handle 64 MSI-X.
>> This was optimized in open src driver to use only 1 interrupt handler.
>> Can you pls make this change to the open src driver (i.e have multiple
>> interrupt handlers for multiple MSI-X) and check?
> 
> I've looked at this more closely, and I wonder whether there isn't a race condition
> here. When an interrupt arrives you put the interrupt vector in pm8001_ha->int_vector,
> then schedule the tasklet. But what if two interrupts with different vectors arrive
> in quick succession before the tasklet got a chance to run? In that case the tasklet
> will only see the second vector, not the first. Rather scary.
> 
> I have not actually seen any issues with this, but by definition race conditions are
> hard to reproduce and I haven't done any serious testing with this card. For now I
> will run with the quick and dirty msi.diff (http://hverkuil.home.xs4all.nl/msi.diff).
> 
> I see two solutions: either use the 64 interrupt handlers as done in the adaptec
> driver, or you can change int_vector into a u64 and use it as a bitmask to record
> all interrupt vectors that have arrived.
Thanks for looking into this, I think second one is what we want, set
the bitmask when interrupt arrived and clear it when it's processed.

> 
> BTW, another difference between the linux kernel driver and the adaptec version are
> several of the defines in pm8001_defs.h: e.g. MPI_QUEUE is 256 in the adaptec driver,
> while it is 1024 in the kernel driver. There are other differences as well.
> 
Different value may reflect different performance character, but both
should works, there is no one for all setting.

> Are all the changes in the kernel correct? I would like to have a confirmation of
> that before I am going to trust my data to this driver.
> 
> It clearly hasn't been tested with actual hardware :-(
> 
:_(

Thanks,
Jack

> Regards,
> 
> 	Hans
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: The pm80xx driver hangs in 3.10 with the Adaptec 71605H HBA
  2013-07-15  8:53         ` Jack Wang
@ 2013-07-15 12:37           ` Anand Kumar Santhanam
  2013-07-15 13:02             ` Hans Verkuil
  2013-07-15 13:08             ` Jack Wang
  0 siblings, 2 replies; 6+ messages in thread
From: Anand Kumar Santhanam @ 2013-07-15 12:37 UTC (permalink / raw)
  To: Jack Wang, Hans Verkuil; +Cc: lindar_liu, linux-scsi, jinpu.wang

Hi Hans,

Pls find responses inline.

Regards
Anand

-----Original Message-----
From: Jack Wang [mailto:xjtuwjp@gmail.com] 
Sent: Monday, July 15, 2013 2:24 PM
To: Hans Verkuil
Cc: Anand Kumar Santhanam; lindar_liu; linux-scsi@vger.kernel.org;
jinpu.wang@profitbricks.com
Subject: Re: The pm80xx driver hangs in 3.10 with the Adaptec 71605H HBA

Hi Hans,
On 07/14/2013 10:45 AM, Hans Verkuil wrote:
> Hi Anand,
> 
> On 07/12/2013 03:14 PM, Anand Kumar Santhanam wrote:
>> Hans,
>>
>> I reviewed the code changes and I did not see major differences 
>> except for the fact that in adaptec driver we have 64 interrupt 
>> handlers to handle 64 MSI-X.
>> This was optimized in open src driver to use only 1 interrupt
handler.
>> Can you pls make this change to the open src driver (i.e have 
>> multiple interrupt handlers for multiple MSI-X) and check?
> 
> I've looked at this more closely, and I wonder whether there isn't a 
> race condition here. When an interrupt arrives you put the interrupt 
> vector in pm8001_ha->int_vector, then schedule the tasklet. But what 
> if two interrupts with different vectors arrive in quick succession 
> before the tasklet got a chance to run? In that case the tasklet will
only see the second vector, not the first. Rather scary.
> 
> I have not actually seen any issues with this, but by definition race 
> conditions are hard to reproduce and I haven't done any serious 
> testing with this card. For now I will run with the quick and dirty
msi.diff (http://hverkuil.home.xs4all.nl/msi.diff).
> 
> I see two solutions: either use the 64 interrupt handlers as done in 
> the adaptec driver, or you can change int_vector into a u64 and use it

> as a bitmask to record all interrupt vectors that have arrived.
Thanks for looking into this, I think second one is what we want, set
the bitmask when interrupt arrived and clear it when it's processed.

Anand>> Yes. We will go for the second solution. The multiple
interrupt/tasklet handlers for MSI-X got rejected by the community and
hence
We went for the existing approach. I checked with a single controller
and I did not observe any issues. Also pls note that the open source
driver
Supports only one MSI-X for now and this problem will not occur.

> 
> BTW, another difference between the linux kernel driver and the 
> adaptec version are several of the defines in pm8001_defs.h: e.g. 
> MPI_QUEUE is 256 in the adaptec driver, while it is 1024 in the kernel
driver. There are other differences as well.
> 
Different value may reflect different performance character, but both
should works, there is no one for all setting.

Anand >> I agree with Jack. I will check on the #defines and get back.

> Are all the changes in the kernel correct? I would like to have a 
> confirmation of that before I am going to trust my data to this
driver.
> 
> It clearly hasn't been tested with actual hardware :-(
> 
:_(

>> My sincere apologies. I tested the same with single controller and it
worked fine. However I messed up when submitting 
The patches. This was my first open source submission and request you to
bear the inconvenience.

Thanks,
Jack

> Regards,
> 
> 	Hans
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: The pm80xx driver hangs in 3.10 with the Adaptec 71605H HBA
  2013-07-15 12:37           ` Anand Kumar Santhanam
@ 2013-07-15 13:02             ` Hans Verkuil
  2013-07-15 13:08             ` Jack Wang
  1 sibling, 0 replies; 6+ messages in thread
From: Hans Verkuil @ 2013-07-15 13:02 UTC (permalink / raw)
  To: Anand Kumar Santhanam; +Cc: Jack Wang, lindar_liu, linux-scsi, jinpu.wang

Hi Anand!

On 07/15/2013 02:37 PM, Anand Kumar Santhanam wrote:
> Hi Hans,
> 
> Pls find responses inline.
> 
> Regards
> Anand
> 
> -----Original Message-----
> From: Jack Wang [mailto:xjtuwjp@gmail.com] 
> Sent: Monday, July 15, 2013 2:24 PM
> To: Hans Verkuil
> Cc: Anand Kumar Santhanam; lindar_liu; linux-scsi@vger.kernel.org;
> jinpu.wang@profitbricks.com
> Subject: Re: The pm80xx driver hangs in 3.10 with the Adaptec 71605H HBA
> 
> Hi Hans,
> On 07/14/2013 10:45 AM, Hans Verkuil wrote:
>> Hi Anand,
>>
>> On 07/12/2013 03:14 PM, Anand Kumar Santhanam wrote:
>>> Hans,
>>>
>>> I reviewed the code changes and I did not see major differences 
>>> except for the fact that in adaptec driver we have 64 interrupt 
>>> handlers to handle 64 MSI-X.
>>> This was optimized in open src driver to use only 1 interrupt
> handler.
>>> Can you pls make this change to the open src driver (i.e have 
>>> multiple interrupt handlers for multiple MSI-X) and check?
>>
>> I've looked at this more closely, and I wonder whether there isn't a 
>> race condition here. When an interrupt arrives you put the interrupt 
>> vector in pm8001_ha->int_vector, then schedule the tasklet. But what 
>> if two interrupts with different vectors arrive in quick succession 
>> before the tasklet got a chance to run? In that case the tasklet will
> only see the second vector, not the first. Rather scary.
>>
>> I have not actually seen any issues with this, but by definition race 
>> conditions are hard to reproduce and I haven't done any serious 
>> testing with this card. For now I will run with the quick and dirty
> msi.diff (http://hverkuil.home.xs4all.nl/msi.diff).
>>
>> I see two solutions: either use the 64 interrupt handlers as done in 
>> the adaptec driver, or you can change int_vector into a u64 and use it
> 
>> as a bitmask to record all interrupt vectors that have arrived.
> Thanks for looking into this, I think second one is what we want, set
> the bitmask when interrupt arrived and clear it when it's processed.
> 
> Anand>> Yes. We will go for the second solution. The multiple
> interrupt/tasklet handlers for MSI-X got rejected by the community and
> hence
> We went for the existing approach. I checked with a single controller
> and I did not observe any issues.

Well, you typically won't see any issues if there is a race condition :-)

> Also pls note that the open source
> driver
> Supports only one MSI-X for now and this problem will not occur.

Are you referring to the kernel driver or the adaptec driver? Both are
open source...

Anyway, I'm pretty sure I saw interrupts with different vectors arriving
with the kernel driver.

> 
>>
>> BTW, another difference between the linux kernel driver and the 
>> adaptec version are several of the defines in pm8001_defs.h: e.g. 
>> MPI_QUEUE is 256 in the adaptec driver, while it is 1024 in the kernel
> driver. There are other differences as well.
>>
> Different value may reflect different performance character, but both
> should works, there is no one for all setting.
> 
> Anand >> I agree with Jack. I will check on the #defines and get back.

Thanks!

>> Are all the changes in the kernel correct? I would like to have a 
>> confirmation of that before I am going to trust my data to this
> driver.
>>
>> It clearly hasn't been tested with actual hardware :-(
>>
> :_(
> 
>>> My sincere apologies. I tested the same with single controller and it
> worked fine. However I messed up when submitting 
> The patches. This was my first open source submission and request you to
> bear the inconvenience.

No problem, such things happen. Luckily I'm a kernel developer so I know my
way around a driver, even though scsi is not my area of expertise.

Note that I saw one other difference between the kernel and adaptec driver:
the adaptec driver has support for SGPIO commands. I'm not sure what it does
or whether it affects the pm80xx devices.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: The pm80xx driver hangs in 3.10 with the Adaptec 71605H HBA
  2013-07-15 12:37           ` Anand Kumar Santhanam
  2013-07-15 13:02             ` Hans Verkuil
@ 2013-07-15 13:08             ` Jack Wang
  1 sibling, 0 replies; 6+ messages in thread
From: Jack Wang @ 2013-07-15 13:08 UTC (permalink / raw)
  To: Anand Kumar Santhanam; +Cc: Jack Wang, Hans Verkuil, lindar_liu, linux-scsi

Hi Anand,
On 07/15/2013 02:37 PM, Anand Kumar Santhanam wrote:
> Hi Hans,
> 
> Pls find responses inline.
> 
> Regards
> Anand
> 
> -----Original Message-----
> From: Jack Wang [mailto:xjtuwjp@gmail.com] 
> Sent: Monday, July 15, 2013 2:24 PM
> To: Hans Verkuil
> Cc: Anand Kumar Santhanam; lindar_liu; linux-scsi@vger.kernel.org;
> jinpu.wang@profitbricks.com
> Subject: Re: The pm80xx driver hangs in 3.10 with the Adaptec 71605H HBA
> 
> Hi Hans,
> On 07/14/2013 10:45 AM, Hans Verkuil wrote:
>> Hi Anand,
>>
>> On 07/12/2013 03:14 PM, Anand Kumar Santhanam wrote:
>>> Hans,
>>>
>>> I reviewed the code changes and I did not see major differences 
>>> except for the fact that in adaptec driver we have 64 interrupt 
>>> handlers to handle 64 MSI-X.
>>> This was optimized in open src driver to use only 1 interrupt
> handler.
>>> Can you pls make this change to the open src driver (i.e have 
>>> multiple interrupt handlers for multiple MSI-X) and check?
>>
>> I've looked at this more closely, and I wonder whether there isn't a 
>> race condition here. When an interrupt arrives you put the interrupt 
>> vector in pm8001_ha->int_vector, then schedule the tasklet. But what 
>> if two interrupts with different vectors arrive in quick succession 
>> before the tasklet got a chance to run? In that case the tasklet will
> only see the second vector, not the first. Rather scary.
>>
>> I have not actually seen any issues with this, but by definition race 
>> conditions are hard to reproduce and I haven't done any serious 
>> testing with this card. For now I will run with the quick and dirty
> msi.diff (http://hverkuil.home.xs4all.nl/msi.diff).
>>
>> I see two solutions: either use the 64 interrupt handlers as done in 
>> the adaptec driver, or you can change int_vector into a u64 and use it
> 
>> as a bitmask to record all interrupt vectors that have arrived.
> Thanks for looking into this, I think second one is what we want, set
> the bitmask when interrupt arrived and clear it when it's processed.
> 
> Anand>> Yes. We will go for the second solution. The multiple
> interrupt/tasklet handlers for MSI-X got rejected by the community and
> hence
> We went for the existing approach. I checked with a single controller
> and I did not observe any issues. Also pls note that the open source
> driver
> Supports only one MSI-X for now and this problem will not occur.
> 
>>
>> BTW, another difference between the linux kernel driver and the 
>> adaptec version are several of the defines in pm8001_defs.h: e.g. 
>> MPI_QUEUE is 256 in the adaptec driver, while it is 1024 in the kernel
> driver. There are other differences as well.
>>
> Different value may reflect different performance character, but both
> should works, there is no one for all setting.
> 
> Anand >> I agree with Jack. I will check on the #defines and get back.
> 
>> Are all the changes in the kernel correct? I would like to have a 
>> confirmation of that before I am going to trust my data to this
> driver.
>>
>> It clearly hasn't been tested with actual hardware :-(
>>
> :_(
> 
>>> My sincere apologies. I tested the same with single controller and it
> worked fine. However I messed up when submitting 
> The patches. This was my first open source submission and request you to
> bear the inconvenience.

Never mind, people make mistake, quickly fix that should be fine.
I suggest keep minimal difference between your in house driver and in
tree driver, it will make us easy to maintain the driver and make it
easy to use for user.

Best Regards,
Jack



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-07-15 13:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-12 11:02 The pm80xx driver hangs in 3.10 with the Adaptec 71605H HBA Hans Verkuil
     [not found] ` <CAD+HZHUc7wULDUpKq24rmcVC9MCcgaoO4i1LxGcW-q-y-c3tcQ@mail.gmail.com>
     [not found]   ` <201307121419.50786.hansverk@cisco.com>
     [not found]     ` <E5DEE6F468524847995699AD11E8088501B28CAA@BBY1EXM11.pmc_nt.nt.pmc-sierra.bc.ca>
2013-07-14  8:45       ` Hans Verkuil
2013-07-15  8:53         ` Jack Wang
2013-07-15 12:37           ` Anand Kumar Santhanam
2013-07-15 13:02             ` Hans Verkuil
2013-07-15 13:08             ` Jack Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.