All of lore.kernel.org
 help / color / mirror / Atom feed
* Broken pci_block_user_cfg_access interface
@ 2011-08-24 10:43 Jan Kiszka
  2011-08-24 15:02 ` Brian King
  2011-08-29 15:05 ` Michael S. Tsirkin
  0 siblings, 2 replies; 35+ messages in thread
From: Jan Kiszka @ 2011-08-24 10:43 UTC (permalink / raw)
  To: Linux Kernel Mailing List, linux-pci
  Cc: Alex Williamson, Michael S. Tsirkin, Jesse Barnes,
	Matthew Wilcox, Brian King

Hi,

trying to port the generic device interrupt masking pattern of
uio_pci_generic to KVM's device assignment code, I stumbled over some
fundamental problem with the current pci_block/unblock_user_cfg_access
interface: it does not provide any synchronization between blocking
sides. This allows user space to trigger a kernel BUG, just run two

while true; do echo 1 > /sys/bus/pci/devices/<some-device>/reset; done

loops in parallel and watch the kernel oops.

Instead of some funky open-coded locking mechanism, we would rather need
a plain mutex across both the user space access (via sysfs) and the
sections guarded by pci_block/unblock_user_cfg_access so far. But I'm
not sure which of them already allow sleeping, specifically if the IPR
driver would be fine with such a change. Can someone in the CC list
comment on this?

uio_pci_generic would definitely not be able to sleep as it takes the
lock from (potentially hard) IRQ context. This particular use case, RMW
of command/status word, requires a separate mechanism. I'm considering
to introduce a dedicated raw spinlock with IRQ protection for that
words, maybe also a PCI core service to abstract INTx testing and masking.

Any further thoughts on how to resolve this issue?

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-24 10:43 Broken pci_block_user_cfg_access interface Jan Kiszka
@ 2011-08-24 15:02 ` Brian King
  2011-08-25  9:19   ` Jan Kiszka
  2011-08-29 15:05 ` Michael S. Tsirkin
  1 sibling, 1 reply; 35+ messages in thread
From: Brian King @ 2011-08-24 15:02 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Linux Kernel Mailing List, linux-pci, Alex Williamson,
	Michael S. Tsirkin, Jesse Barnes, Matthew Wilcox

On 08/24/2011 05:43 AM, Jan Kiszka wrote:
> Hi,
> 
> trying to port the generic device interrupt masking pattern of
> uio_pci_generic to KVM's device assignment code, I stumbled over some
> fundamental problem with the current pci_block/unblock_user_cfg_access
> interface: it does not provide any synchronization between blocking
> sides. This allows user space to trigger a kernel BUG, just run two
> 
> while true; do echo 1 > /sys/bus/pci/devices/<some-device>/reset; done
> 
> loops in parallel and watch the kernel oops.
> 
> Instead of some funky open-coded locking mechanism, we would rather need
> a plain mutex across both the user space access (via sysfs) and the
> sections guarded by pci_block/unblock_user_cfg_access so far. But I'm
> not sure which of them already allow sleeping, specifically if the IPR
> driver would be fine with such a change. Can someone in the CC list
> comment on this?

The ipr driver calls pci_block/unblock_user_cfg_access from interrupt
context, so a mutex won't work. When the pci_block/unblock API was
originally added, it did not have the checking it has today to detect
if it is being called nested. This was added some time later. The
API that works best for the ipr driver is to allow for many block calls,
but a single unblock call unblocks access. It seems like what might
work well in the case above is a block count. Each call to pci_block
increments a count. Each pci_unblock decrements the count and only
actually do the unblock if the count drops to zero. It should be reasonably
simple for ipr to use that sort of an API as well.

-Brian

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-24 15:02 ` Brian King
@ 2011-08-25  9:19   ` Jan Kiszka
  2011-08-25  9:40     ` Michael S. Tsirkin
  2011-08-25 13:02     ` Brian King
  0 siblings, 2 replies; 35+ messages in thread
From: Jan Kiszka @ 2011-08-25  9:19 UTC (permalink / raw)
  To: Brian King
  Cc: Linux Kernel Mailing List, linux-pci, Alex Williamson,
	Michael S. Tsirkin, Jesse Barnes, Matthew Wilcox

On 2011-08-24 17:02, Brian King wrote:
> On 08/24/2011 05:43 AM, Jan Kiszka wrote:
>> Hi,
>>
>> trying to port the generic device interrupt masking pattern of
>> uio_pci_generic to KVM's device assignment code, I stumbled over some
>> fundamental problem with the current pci_block/unblock_user_cfg_access
>> interface: it does not provide any synchronization between blocking
>> sides. This allows user space to trigger a kernel BUG, just run two
>>
>> while true; do echo 1 > /sys/bus/pci/devices/<some-device>/reset; done
>>
>> loops in parallel and watch the kernel oops.
>>
>> Instead of some funky open-coded locking mechanism, we would rather need
>> a plain mutex across both the user space access (via sysfs) and the
>> sections guarded by pci_block/unblock_user_cfg_access so far. But I'm
>> not sure which of them already allow sleeping, specifically if the IPR
>> driver would be fine with such a change. Can someone in the CC list
>> comment on this?
> 
> The ipr driver calls pci_block/unblock_user_cfg_access from interrupt
> context, so a mutex won't work.

Ugh. What precisely does it have to do with the config space while
running inside an IRQ handler (or holding a lock that synchronizes it
with such a handler)?

> When the pci_block/unblock API was
> originally added, it did not have the checking it has today to detect
> if it is being called nested. This was added some time later. The

For a reason...

> API that works best for the ipr driver is to allow for many block calls,
> but a single unblock call unblocks access. It seems like what might
> work well in the case above is a block count. Each call to pci_block
> increments a count. Each pci_unblock decrements the count and only
> actually do the unblock if the count drops to zero. It should be reasonably
> simple for ipr to use that sort of an API as well.

That will just paper over the underlying bug: multiple kernel users (!=
sysfs access) fiddle with the config space in an unsynchronized fashion.
Think of sysfs-triggered pci_reset_function while your ipr driver does
its accesses.

So it's pointless to tweak the current pci_block semantics, we rather
need to establish a new mechanism that synchronizes *all* users of the
config space.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-25  9:19   ` Jan Kiszka
@ 2011-08-25  9:40     ` Michael S. Tsirkin
  2011-08-25 10:34       ` Jan Kiszka
  2011-08-25 13:06       ` Brian King
  2011-08-25 13:02     ` Brian King
  1 sibling, 2 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2011-08-25  9:40 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Brian King, Linux Kernel Mailing List, linux-pci,
	Alex Williamson, Jesse Barnes, Matthew Wilcox

On Thu, Aug 25, 2011 at 11:19:54AM +0200, Jan Kiszka wrote:
> On 2011-08-24 17:02, Brian King wrote:
> > On 08/24/2011 05:43 AM, Jan Kiszka wrote:
> >> Hi,
> >>
> >> trying to port the generic device interrupt masking pattern of
> >> uio_pci_generic to KVM's device assignment code, I stumbled over some
> >> fundamental problem with the current pci_block/unblock_user_cfg_access
> >> interface: it does not provide any synchronization between blocking
> >> sides. This allows user space to trigger a kernel BUG, just run two
> >>
> >> while true; do echo 1 > /sys/bus/pci/devices/<some-device>/reset; done
> >>
> >> loops in parallel and watch the kernel oops.
> >>
> >> Instead of some funky open-coded locking mechanism, we would rather need
> >> a plain mutex across both the user space access (via sysfs) and the
> >> sections guarded by pci_block/unblock_user_cfg_access so far. But I'm
> >> not sure which of them already allow sleeping, specifically if the IPR
> >> driver would be fine with such a change. Can someone in the CC list
> >> comment on this?
> > 
> > The ipr driver calls pci_block/unblock_user_cfg_access from interrupt
> > context, so a mutex won't work.
> 
> Ugh. What precisely does it have to do with the config space while
> running inside an IRQ handler (or holding a lock that synchronizes it
> with such a handler)?
> 
> > When the pci_block/unblock API was
> > originally added, it did not have the checking it has today to detect
> > if it is being called nested. This was added some time later. The
> 
> For a reason...
> 
> > API that works best for the ipr driver is to allow for many block calls,
> > but a single unblock call unblocks access. It seems like what might
> > work well in the case above is a block count. Each call to pci_block
> > increments a count. Each pci_unblock decrements the count and only
> > actually do the unblock if the count drops to zero. It should be reasonably
> > simple for ipr to use that sort of an API as well.
> 
> That will just paper over the underlying bug: multiple kernel users (!=
> sysfs access) fiddle with the config space in an unsynchronized fashion.
> Think of sysfs-triggered pci_reset_function while your ipr driver does
> its accesses.
>
> So it's pointless to tweak the current pci_block semantics, we rather
> need to establish a new mechanism that synchronizes *all* users of the
> config space.
> 
> Jan

It does look like all of the problems are actually around reset.
So maybe all we need to do is synchronize the sysfs-triggered
pci_reset_function with pci_block/unblock_user_cfg_access?

In other words, when reset is triggered from sysfs, it
should obey pci_block/unblock_user_cfg_access
restrictions?

It does not look like reset needs to sleep, so fixing
that should not be hard, right?

> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-25  9:40     ` Michael S. Tsirkin
@ 2011-08-25 10:34       ` Jan Kiszka
  2011-08-25 13:06       ` Brian King
  1 sibling, 0 replies; 35+ messages in thread
From: Jan Kiszka @ 2011-08-25 10:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Brian King, Linux Kernel Mailing List, linux-pci,
	Alex Williamson, Jesse Barnes, Matthew Wilcox

On 2011-08-25 11:40, Michael S. Tsirkin wrote:
> On Thu, Aug 25, 2011 at 11:19:54AM +0200, Jan Kiszka wrote:
>> On 2011-08-24 17:02, Brian King wrote:
>>> On 08/24/2011 05:43 AM, Jan Kiszka wrote:
>>>> Hi,
>>>>
>>>> trying to port the generic device interrupt masking pattern of
>>>> uio_pci_generic to KVM's device assignment code, I stumbled over some
>>>> fundamental problem with the current pci_block/unblock_user_cfg_access
>>>> interface: it does not provide any synchronization between blocking
>>>> sides. This allows user space to trigger a kernel BUG, just run two
>>>>
>>>> while true; do echo 1 > /sys/bus/pci/devices/<some-device>/reset; done
>>>>
>>>> loops in parallel and watch the kernel oops.
>>>>
>>>> Instead of some funky open-coded locking mechanism, we would rather need
>>>> a plain mutex across both the user space access (via sysfs) and the
>>>> sections guarded by pci_block/unblock_user_cfg_access so far. But I'm
>>>> not sure which of them already allow sleeping, specifically if the IPR
>>>> driver would be fine with such a change. Can someone in the CC list
>>>> comment on this?
>>>
>>> The ipr driver calls pci_block/unblock_user_cfg_access from interrupt
>>> context, so a mutex won't work.
>>
>> Ugh. What precisely does it have to do with the config space while
>> running inside an IRQ handler (or holding a lock that synchronizes it
>> with such a handler)?
>>
>>> When the pci_block/unblock API was
>>> originally added, it did not have the checking it has today to detect
>>> if it is being called nested. This was added some time later. The
>>
>> For a reason...
>>
>>> API that works best for the ipr driver is to allow for many block calls,
>>> but a single unblock call unblocks access. It seems like what might
>>> work well in the case above is a block count. Each call to pci_block
>>> increments a count. Each pci_unblock decrements the count and only
>>> actually do the unblock if the count drops to zero. It should be reasonably
>>> simple for ipr to use that sort of an API as well.
>>
>> That will just paper over the underlying bug: multiple kernel users (!=
>> sysfs access) fiddle with the config space in an unsynchronized fashion.
>> Think of sysfs-triggered pci_reset_function while your ipr driver does
>> its accesses.
>>
>> So it's pointless to tweak the current pci_block semantics, we rather
>> need to establish a new mechanism that synchronizes *all* users of the
>> config space.
>>
>> Jan
> 
> It does look like all of the problems are actually around reset.
> So maybe all we need to do is synchronize the sysfs-triggered
> pci_reset_function with pci_block/unblock_user_cfg_access?
> 
> In other words, when reset is triggered from sysfs, it
> should obey pci_block/unblock_user_cfg_access
> restrictions?
> 
> It does not look like reset needs to sleep, so fixing
> that should not be hard, right?

It obviously does need to sleep (wait). I just don't know if blocking
config space access is required during that phase, but I wouldn't be
surprised if it was.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-25  9:19   ` Jan Kiszka
  2011-08-25  9:40     ` Michael S. Tsirkin
@ 2011-08-25 13:02     ` Brian King
  2011-08-25 13:06       ` Jan Kiszka
  1 sibling, 1 reply; 35+ messages in thread
From: Brian King @ 2011-08-25 13:02 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Linux Kernel Mailing List, linux-pci, Alex Williamson,
	Michael S. Tsirkin, Jesse Barnes, Matthew Wilcox

On 08/25/2011 04:19 AM, Jan Kiszka wrote:
> On 2011-08-24 17:02, Brian King wrote:
>> On 08/24/2011 05:43 AM, Jan Kiszka wrote:
>>> Hi,
>>>
>>> trying to port the generic device interrupt masking pattern of
>>> uio_pci_generic to KVM's device assignment code, I stumbled over some
>>> fundamental problem with the current pci_block/unblock_user_cfg_access
>>> interface: it does not provide any synchronization between blocking
>>> sides. This allows user space to trigger a kernel BUG, just run two
>>>
>>> while true; do echo 1 > /sys/bus/pci/devices/<some-device>/reset; done
>>>
>>> loops in parallel and watch the kernel oops.
>>>
>>> Instead of some funky open-coded locking mechanism, we would rather need
>>> a plain mutex across both the user space access (via sysfs) and the
>>> sections guarded by pci_block/unblock_user_cfg_access so far. But I'm
>>> not sure which of them already allow sleeping, specifically if the IPR
>>> driver would be fine with such a change. Can someone in the CC list
>>> comment on this?
>>
>> The ipr driver calls pci_block/unblock_user_cfg_access from interrupt
>> context, so a mutex won't work.
> 
> Ugh. What precisely does it have to do with the config space while
> running inside an IRQ handler (or holding a lock that synchronizes it
> with such a handler)?

The ipr driver can get an error interrupt which will trigger the driver
to reset the adapter. While the adapter is going through reset we need
to ensure user config accesses are blocked, since many ipr adapters
won't respond on the PCI bus during this time.


>> API that works best for the ipr driver is to allow for many block calls,
>> but a single unblock call unblocks access. It seems like what might
>> work well in the case above is a block count. Each call to pci_block
>> increments a count. Each pci_unblock decrements the count and only
>> actually do the unblock if the count drops to zero. It should be reasonably
>> simple for ipr to use that sort of an API as well.
> 
> That will just paper over the underlying bug: multiple kernel users (!=
> sysfs access) fiddle with the config space in an unsynchronized fashion.
> Think of sysfs-triggered pci_reset_function while your ipr driver does
> its accesses.

I took a look at the sysfs triggered pci reset function and don't see any way
that the controlling device driver ever gets to be involved in this reset.
If code outside the ipr driver were to reset the adapter, the adapter firmware
would be left in an uninitialized state and until scsi core starts timing
out ops and driving EH, the card would be unusable. I can't imagine the
ipr driver is unique in this.

-Brian


-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-25 13:02     ` Brian King
@ 2011-08-25 13:06       ` Jan Kiszka
  2011-08-25 18:19         ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2011-08-25 13:06 UTC (permalink / raw)
  To: Brian King
  Cc: Linux Kernel Mailing List, linux-pci, Alex Williamson,
	Michael S. Tsirkin, Jesse Barnes, Matthew Wilcox

On 2011-08-25 15:02, Brian King wrote:
> On 08/25/2011 04:19 AM, Jan Kiszka wrote:
>> On 2011-08-24 17:02, Brian King wrote:
>>> On 08/24/2011 05:43 AM, Jan Kiszka wrote:
>>>> Hi,
>>>>
>>>> trying to port the generic device interrupt masking pattern of
>>>> uio_pci_generic to KVM's device assignment code, I stumbled over some
>>>> fundamental problem with the current pci_block/unblock_user_cfg_access
>>>> interface: it does not provide any synchronization between blocking
>>>> sides. This allows user space to trigger a kernel BUG, just run two
>>>>
>>>> while true; do echo 1 > /sys/bus/pci/devices/<some-device>/reset; done
>>>>
>>>> loops in parallel and watch the kernel oops.
>>>>
>>>> Instead of some funky open-coded locking mechanism, we would rather need
>>>> a plain mutex across both the user space access (via sysfs) and the
>>>> sections guarded by pci_block/unblock_user_cfg_access so far. But I'm
>>>> not sure which of them already allow sleeping, specifically if the IPR
>>>> driver would be fine with such a change. Can someone in the CC list
>>>> comment on this?
>>>
>>> The ipr driver calls pci_block/unblock_user_cfg_access from interrupt
>>> context, so a mutex won't work.
>>
>> Ugh. What precisely does it have to do with the config space while
>> running inside an IRQ handler (or holding a lock that synchronizes it
>> with such a handler)?
> 
> The ipr driver can get an error interrupt which will trigger the driver
> to reset the adapter. While the adapter is going through reset we need
> to ensure user config accesses are blocked, since many ipr adapters
> won't respond on the PCI bus during this time.

What about offloading the reset to thread context (workqueue etc.)?

> 
> 
>>> API that works best for the ipr driver is to allow for many block calls,
>>> but a single unblock call unblocks access. It seems like what might
>>> work well in the case above is a block count. Each call to pci_block
>>> increments a count. Each pci_unblock decrements the count and only
>>> actually do the unblock if the count drops to zero. It should be reasonably
>>> simple for ipr to use that sort of an API as well.
>>
>> That will just paper over the underlying bug: multiple kernel users (!=
>> sysfs access) fiddle with the config space in an unsynchronized fashion.
>> Think of sysfs-triggered pci_reset_function while your ipr driver does
>> its accesses.
> 
> I took a look at the sysfs triggered pci reset function and don't see any way
> that the controlling device driver ever gets to be involved in this reset.
> If code outside the ipr driver were to reset the adapter, the adapter firmware
> would be left in an uninitialized state and until scsi core starts timing
> out ops and driving EH, the card would be unusable. I can't imagine the
> ipr driver is unique in this.

Right, that's why a PCI core service is needed for coordination.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-25  9:40     ` Michael S. Tsirkin
  2011-08-25 10:34       ` Jan Kiszka
@ 2011-08-25 13:06       ` Brian King
  2011-08-25 13:12         ` Brian King
  1 sibling, 1 reply; 35+ messages in thread
From: Brian King @ 2011-08-25 13:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jan Kiszka, Linux Kernel Mailing List, linux-pci,
	Alex Williamson, Jesse Barnes, Matthew Wilcox

On 08/25/2011 04:40 AM, Michael S. Tsirkin wrote:
> On Thu, Aug 25, 2011 at 11:19:54AM +0200, Jan Kiszka wrote:
>> On 2011-08-24 17:02, Brian King wrote:
>>> On 08/24/2011 05:43 AM, Jan Kiszka wrote:
>>>> Hi,
>>>>
>>>> trying to port the generic device interrupt masking pattern of
>>>> uio_pci_generic to KVM's device assignment code, I stumbled over some
>>>> fundamental problem with the current pci_block/unblock_user_cfg_access
>>>> interface: it does not provide any synchronization between blocking
>>>> sides. This allows user space to trigger a kernel BUG, just run two
>>>>
>>>> while true; do echo 1 > /sys/bus/pci/devices/<some-device>/reset; done
>>>>
>>>> loops in parallel and watch the kernel oops.
>>>>
>>>> Instead of some funky open-coded locking mechanism, we would rather need
>>>> a plain mutex across both the user space access (via sysfs) and the
>>>> sections guarded by pci_block/unblock_user_cfg_access so far. But I'm
>>>> not sure which of them already allow sleeping, specifically if the IPR
>>>> driver would be fine with such a change. Can someone in the CC list
>>>> comment on this?
>>>
>>> The ipr driver calls pci_block/unblock_user_cfg_access from interrupt
>>> context, so a mutex won't work.
>>
>> Ugh. What precisely does it have to do with the config space while
>> running inside an IRQ handler (or holding a lock that synchronizes it
>> with such a handler)?
>>
>>> When the pci_block/unblock API was
>>> originally added, it did not have the checking it has today to detect
>>> if it is being called nested. This was added some time later. The
>>
>> For a reason...
>>
>>> API that works best for the ipr driver is to allow for many block calls,
>>> but a single unblock call unblocks access. It seems like what might
>>> work well in the case above is a block count. Each call to pci_block
>>> increments a count. Each pci_unblock decrements the count and only
>>> actually do the unblock if the count drops to zero. It should be reasonably
>>> simple for ipr to use that sort of an API as well.
>>
>> That will just paper over the underlying bug: multiple kernel users (!=
>> sysfs access) fiddle with the config space in an unsynchronized fashion.
>> Think of sysfs-triggered pci_reset_function while your ipr driver does
>> its accesses.
>>
>> So it's pointless to tweak the current pci_block semantics, we rather
>> need to establish a new mechanism that synchronizes *all* users of the
>> config space.
>>
>> Jan
> 
> It does look like all of the problems are actually around reset.
> So maybe all we need to do is synchronize the sysfs-triggered
> pci_reset_function with pci_block/unblock_user_cfg_access?
> 
> In other words, when reset is triggered from sysfs, it
> should obey pci_block/unblock_user_cfg_access
> restrictions?
> 
> It does not look like reset needs to sleep, so fixing
> that should not be hard, right?

This sounds reasonable to me. Although I think we still have the driver issue
I described in my previous mail. Perhaps the best way to resolve that would
be to allow the adapter driver to register a reset function so that the
driver could be the one driving the reset, allowing the driver to synchronize
the reset with whatever else might be going on and also then reinitialize
the adapter firmware, etc. If no driver was loaded or no driver specific
reset function registered, the current reset mechanism would be invoked.

Thanks,

Brian

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-25 13:06       ` Brian King
@ 2011-08-25 13:12         ` Brian King
  2011-08-25 13:16           ` Jan Kiszka
  0 siblings, 1 reply; 35+ messages in thread
From: Brian King @ 2011-08-25 13:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jan Kiszka, Linux Kernel Mailing List, linux-pci,
	Alex Williamson, Jesse Barnes, Matthew Wilcox

On 08/25/2011 08:06 AM, Brian King wrote:
> On 08/25/2011 04:40 AM, Michael S. Tsirkin wrote:
>> On Thu, Aug 25, 2011 at 11:19:54AM +0200, Jan Kiszka wrote:
>>> On 2011-08-24 17:02, Brian King wrote:
>>>> On 08/24/2011 05:43 AM, Jan Kiszka wrote:
>>>>> Hi,
>>>>>
>>>>> trying to port the generic device interrupt masking pattern of
>>>>> uio_pci_generic to KVM's device assignment code, I stumbled over some
>>>>> fundamental problem with the current pci_block/unblock_user_cfg_access
>>>>> interface: it does not provide any synchronization between blocking
>>>>> sides. This allows user space to trigger a kernel BUG, just run two
>>>>>
>>>>> while true; do echo 1 > /sys/bus/pci/devices/<some-device>/reset; done
>>>>>
>>>>> loops in parallel and watch the kernel oops.
>>>>>
>>>>> Instead of some funky open-coded locking mechanism, we would rather need
>>>>> a plain mutex across both the user space access (via sysfs) and the
>>>>> sections guarded by pci_block/unblock_user_cfg_access so far. But I'm
>>>>> not sure which of them already allow sleeping, specifically if the IPR
>>>>> driver would be fine with such a change. Can someone in the CC list
>>>>> comment on this?
>>>>
>>>> The ipr driver calls pci_block/unblock_user_cfg_access from interrupt
>>>> context, so a mutex won't work.
>>>
>>> Ugh. What precisely does it have to do with the config space while
>>> running inside an IRQ handler (or holding a lock that synchronizes it
>>> with such a handler)?
>>>
>>>> When the pci_block/unblock API was
>>>> originally added, it did not have the checking it has today to detect
>>>> if it is being called nested. This was added some time later. The
>>>
>>> For a reason...
>>>
>>>> API that works best for the ipr driver is to allow for many block calls,
>>>> but a single unblock call unblocks access. It seems like what might
>>>> work well in the case above is a block count. Each call to pci_block
>>>> increments a count. Each pci_unblock decrements the count and only
>>>> actually do the unblock if the count drops to zero. It should be reasonably
>>>> simple for ipr to use that sort of an API as well.
>>>
>>> That will just paper over the underlying bug: multiple kernel users (!=
>>> sysfs access) fiddle with the config space in an unsynchronized fashion.
>>> Think of sysfs-triggered pci_reset_function while your ipr driver does
>>> its accesses.
>>>
>>> So it's pointless to tweak the current pci_block semantics, we rather
>>> need to establish a new mechanism that synchronizes *all* users of the
>>> config space.
>>>
>>> Jan
>>
>> It does look like all of the problems are actually around reset.
>> So maybe all we need to do is synchronize the sysfs-triggered
>> pci_reset_function with pci_block/unblock_user_cfg_access?
>>
>> In other words, when reset is triggered from sysfs, it
>> should obey pci_block/unblock_user_cfg_access
>> restrictions?
>>
>> It does not look like reset needs to sleep, so fixing
>> that should not be hard, right?
> 
> This sounds reasonable to me. Although I think we still have the driver issue
> I described in my previous mail. Perhaps the best way to resolve that would
> be to allow the adapter driver to register a reset function so that the
> driver could be the one driving the reset, allowing the driver to synchronize
> the reset with whatever else might be going on and also then reinitialize
> the adapter firmware, etc. If no driver was loaded or no driver specific
> reset function registered, the current reset mechanism would be invoked.

This would also allow the driver to do unique types of resets for different
adapter types. Some of the adapters the ipr driver supports need to get
reset via BIST, others via PCIe warm reset, etc. 


-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-25 13:12         ` Brian King
@ 2011-08-25 13:16           ` Jan Kiszka
  2011-08-25 13:24             ` Brian King
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2011-08-25 13:16 UTC (permalink / raw)
  To: Brian King
  Cc: Michael S. Tsirkin, Linux Kernel Mailing List, linux-pci,
	Alex Williamson, Jesse Barnes, Matthew Wilcox

On 2011-08-25 15:12, Brian King wrote:
> On 08/25/2011 08:06 AM, Brian King wrote:
>> On 08/25/2011 04:40 AM, Michael S. Tsirkin wrote:
>>> On Thu, Aug 25, 2011 at 11:19:54AM +0200, Jan Kiszka wrote:
>>>> On 2011-08-24 17:02, Brian King wrote:
>>>>> On 08/24/2011 05:43 AM, Jan Kiszka wrote:
>>>>>> Hi,
>>>>>>
>>>>>> trying to port the generic device interrupt masking pattern of
>>>>>> uio_pci_generic to KVM's device assignment code, I stumbled over some
>>>>>> fundamental problem with the current pci_block/unblock_user_cfg_access
>>>>>> interface: it does not provide any synchronization between blocking
>>>>>> sides. This allows user space to trigger a kernel BUG, just run two
>>>>>>
>>>>>> while true; do echo 1 > /sys/bus/pci/devices/<some-device>/reset; done
>>>>>>
>>>>>> loops in parallel and watch the kernel oops.
>>>>>>
>>>>>> Instead of some funky open-coded locking mechanism, we would rather need
>>>>>> a plain mutex across both the user space access (via sysfs) and the
>>>>>> sections guarded by pci_block/unblock_user_cfg_access so far. But I'm
>>>>>> not sure which of them already allow sleeping, specifically if the IPR
>>>>>> driver would be fine with such a change. Can someone in the CC list
>>>>>> comment on this?
>>>>>
>>>>> The ipr driver calls pci_block/unblock_user_cfg_access from interrupt
>>>>> context, so a mutex won't work.
>>>>
>>>> Ugh. What precisely does it have to do with the config space while
>>>> running inside an IRQ handler (or holding a lock that synchronizes it
>>>> with such a handler)?
>>>>
>>>>> When the pci_block/unblock API was
>>>>> originally added, it did not have the checking it has today to detect
>>>>> if it is being called nested. This was added some time later. The
>>>>
>>>> For a reason...
>>>>
>>>>> API that works best for the ipr driver is to allow for many block calls,
>>>>> but a single unblock call unblocks access. It seems like what might
>>>>> work well in the case above is a block count. Each call to pci_block
>>>>> increments a count. Each pci_unblock decrements the count and only
>>>>> actually do the unblock if the count drops to zero. It should be reasonably
>>>>> simple for ipr to use that sort of an API as well.
>>>>
>>>> That will just paper over the underlying bug: multiple kernel users (!=
>>>> sysfs access) fiddle with the config space in an unsynchronized fashion.
>>>> Think of sysfs-triggered pci_reset_function while your ipr driver does
>>>> its accesses.
>>>>
>>>> So it's pointless to tweak the current pci_block semantics, we rather
>>>> need to establish a new mechanism that synchronizes *all* users of the
>>>> config space.
>>>>
>>>> Jan
>>>
>>> It does look like all of the problems are actually around reset.
>>> So maybe all we need to do is synchronize the sysfs-triggered
>>> pci_reset_function with pci_block/unblock_user_cfg_access?
>>>
>>> In other words, when reset is triggered from sysfs, it
>>> should obey pci_block/unblock_user_cfg_access
>>> restrictions?
>>>
>>> It does not look like reset needs to sleep, so fixing
>>> that should not be hard, right?
>>
>> This sounds reasonable to me. Although I think we still have the driver issue
>> I described in my previous mail. Perhaps the best way to resolve that would
>> be to allow the adapter driver to register a reset function so that the
>> driver could be the one driving the reset, allowing the driver to synchronize
>> the reset with whatever else might be going on and also then reinitialize
>> the adapter firmware, etc. If no driver was loaded or no driver specific
>> reset function registered, the current reset mechanism would be invoked.
> 
> This would also allow the driver to do unique types of resets for different
> adapter types. Some of the adapters the ipr driver supports need to get
> reset via BIST, others via PCIe warm reset, etc. 

Is this broken ATM? I thought the PCI core would simply try all methods
+ has a quirks section for completely funky devices.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-25 13:16           ` Jan Kiszka
@ 2011-08-25 13:24             ` Brian King
  2011-08-25 18:16               ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Brian King @ 2011-08-25 13:24 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Michael S. Tsirkin, Linux Kernel Mailing List, linux-pci,
	Alex Williamson, Jesse Barnes, Matthew Wilcox

On 08/25/2011 08:16 AM, Jan Kiszka wrote:
> On 2011-08-25 15:12, Brian King wrote:
>> On 08/25/2011 08:06 AM, Brian King wrote:
>>> On 08/25/2011 04:40 AM, Michael S. Tsirkin wrote:
>>>> On Thu, Aug 25, 2011 at 11:19:54AM +0200, Jan Kiszka wrote:
>>>>> On 2011-08-24 17:02, Brian King wrote:
>>>>>> On 08/24/2011 05:43 AM, Jan Kiszka wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> trying to port the generic device interrupt masking pattern of
>>>>>>> uio_pci_generic to KVM's device assignment code, I stumbled over some
>>>>>>> fundamental problem with the current pci_block/unblock_user_cfg_access
>>>>>>> interface: it does not provide any synchronization between blocking
>>>>>>> sides. This allows user space to trigger a kernel BUG, just run two
>>>>>>>
>>>>>>> while true; do echo 1 > /sys/bus/pci/devices/<some-device>/reset; done
>>>>>>>
>>>>>>> loops in parallel and watch the kernel oops.
>>>>>>>
>>>>>>> Instead of some funky open-coded locking mechanism, we would rather need
>>>>>>> a plain mutex across both the user space access (via sysfs) and the
>>>>>>> sections guarded by pci_block/unblock_user_cfg_access so far. But I'm
>>>>>>> not sure which of them already allow sleeping, specifically if the IPR
>>>>>>> driver would be fine with such a change. Can someone in the CC list
>>>>>>> comment on this?
>>>>>>
>>>>>> The ipr driver calls pci_block/unblock_user_cfg_access from interrupt
>>>>>> context, so a mutex won't work.
>>>>>
>>>>> Ugh. What precisely does it have to do with the config space while
>>>>> running inside an IRQ handler (or holding a lock that synchronizes it
>>>>> with such a handler)?
>>>>>
>>>>>> When the pci_block/unblock API was
>>>>>> originally added, it did not have the checking it has today to detect
>>>>>> if it is being called nested. This was added some time later. The
>>>>>
>>>>> For a reason...
>>>>>
>>>>>> API that works best for the ipr driver is to allow for many block calls,
>>>>>> but a single unblock call unblocks access. It seems like what might
>>>>>> work well in the case above is a block count. Each call to pci_block
>>>>>> increments a count. Each pci_unblock decrements the count and only
>>>>>> actually do the unblock if the count drops to zero. It should be reasonably
>>>>>> simple for ipr to use that sort of an API as well.
>>>>>
>>>>> That will just paper over the underlying bug: multiple kernel users (!=
>>>>> sysfs access) fiddle with the config space in an unsynchronized fashion.
>>>>> Think of sysfs-triggered pci_reset_function while your ipr driver does
>>>>> its accesses.
>>>>>
>>>>> So it's pointless to tweak the current pci_block semantics, we rather
>>>>> need to establish a new mechanism that synchronizes *all* users of the
>>>>> config space.
>>>>>
>>>>> Jan
>>>>
>>>> It does look like all of the problems are actually around reset.
>>>> So maybe all we need to do is synchronize the sysfs-triggered
>>>> pci_reset_function with pci_block/unblock_user_cfg_access?
>>>>
>>>> In other words, when reset is triggered from sysfs, it
>>>> should obey pci_block/unblock_user_cfg_access
>>>> restrictions?
>>>>
>>>> It does not look like reset needs to sleep, so fixing
>>>> that should not be hard, right?
>>>
>>> This sounds reasonable to me. Although I think we still have the driver issue
>>> I described in my previous mail. Perhaps the best way to resolve that would
>>> be to allow the adapter driver to register a reset function so that the
>>> driver could be the one driving the reset, allowing the driver to synchronize
>>> the reset with whatever else might be going on and also then reinitialize
>>> the adapter firmware, etc. If no driver was loaded or no driver specific
>>> reset function registered, the current reset mechanism would be invoked.
>>
>> This would also allow the driver to do unique types of resets for different
>> adapter types. Some of the adapters the ipr driver supports need to get
>> reset via BIST, others via PCIe warm reset, etc. 
> 
> Is this broken ATM? I thought the PCI core would simply try all methods
> + has a quirks section for completely funky devices.

Yes. Its certainly broken for ipr. If the ipr driver is loaded, it really needs
to be the one doing the reset. If its not loaded, I may need to add a few
quirks for these adapters to properly handle this function.

-Brian

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-25 13:24             ` Brian King
@ 2011-08-25 18:16               ` Michael S. Tsirkin
  0 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2011-08-25 18:16 UTC (permalink / raw)
  To: Brian King
  Cc: Jan Kiszka, Linux Kernel Mailing List, linux-pci,
	Alex Williamson, Jesse Barnes, Matthew Wilcox

On Thu, Aug 25, 2011 at 08:24:09AM -0500, Brian King wrote:
> > Is this broken ATM? I thought the PCI core would simply try all methods
> > + has a quirks section for completely funky devices.
> 
> Yes. Its certainly broken for ipr. If the ipr driver is loaded, it really needs
> to be the one doing the reset. If its not loaded, I may need to add a few
> quirks for these adapters to properly handle this function.
> 
> -Brian

Driving reset through sysfs while another driver is bound
is likely broken for many types of devices.
But so it writing config space through sysfs,
or accessing BARs by mmap while a kernel driver is bound
with the exception of simple stub drivers such as uio and pci-stub.
And no amount of locking can fix that I think.

Let's try to see whether there's an issue with
existing applications, all of which IIUC
have a single userspace driver accessing the device.

Besides that, all we need to address, as I see it,
is not making things worse: that is if device potentially
gets broken by a sysfs access, we can't help this,
but let's at least not trigger BUG errors.

-- 
MST

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-25 13:06       ` Jan Kiszka
@ 2011-08-25 18:19         ` Michael S. Tsirkin
  2011-08-25 18:52           ` Jan Kiszka
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2011-08-25 18:19 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Brian King, Linux Kernel Mailing List, linux-pci,
	Alex Williamson, Jesse Barnes, Matthew Wilcox

On Thu, Aug 25, 2011 at 03:06:01PM +0200, Jan Kiszka wrote:
> > I took a look at the sysfs triggered pci reset function and don't see any way
> > that the controlling device driver ever gets to be involved in this reset.
> > If code outside the ipr driver were to reset the adapter, the adapter firmware
> > would be left in an uninitialized state and until scsi core starts timing
> > out ops and driving EH, the card would be unusable. I can't imagine the
> > ipr driver is unique in this.
> 
> Right, that's why a PCI core service is needed for coordination.
> 
> Jan

But why do we want to trigger reset through sysfs while the
driver runs?

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-25 18:19         ` Michael S. Tsirkin
@ 2011-08-25 18:52           ` Jan Kiszka
  2011-08-25 19:07             ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2011-08-25 18:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Brian King, Linux Kernel Mailing List, linux-pci,
	Alex Williamson, Jesse Barnes, Matthew Wilcox

[-- Attachment #1: Type: text/plain, Size: 1070 bytes --]

On 2011-08-25 20:19, Michael S. Tsirkin wrote:
> On Thu, Aug 25, 2011 at 03:06:01PM +0200, Jan Kiszka wrote:
>>> I took a look at the sysfs triggered pci reset function and don't see any way
>>> that the controlling device driver ever gets to be involved in this reset.
>>> If code outside the ipr driver were to reset the adapter, the adapter firmware
>>> would be left in an uninitialized state and until scsi core starts timing
>>> out ops and driving EH, the card would be unusable. I can't imagine the
>>> ipr driver is unique in this.
>>
>> Right, that's why a PCI core service is needed for coordination.
>>
>> Jan
> 
> But why do we want to trigger reset through sysfs while the
> driver runs?

A perfectly valid race conditions are created by KVM and VFIO: shared
IRQ handler is triggered while the user space part wants to reset the
assigned device. I'm quite sure that this how I first caused this bug to
show up (it triggered for an assigned device with a shared busy IRQ line
during QEMU startup, ie. the initial guest reset).

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-25 18:52           ` Jan Kiszka
@ 2011-08-25 19:07             ` Michael S. Tsirkin
  2011-08-25 19:26               ` Jan Kiszka
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2011-08-25 19:07 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Brian King, Linux Kernel Mailing List, linux-pci,
	Alex Williamson, Jesse Barnes, Matthew Wilcox

On Thu, Aug 25, 2011 at 08:52:08PM +0200, Jan Kiszka wrote:
> On 2011-08-25 20:19, Michael S. Tsirkin wrote:
> > On Thu, Aug 25, 2011 at 03:06:01PM +0200, Jan Kiszka wrote:
> >>> I took a look at the sysfs triggered pci reset function and don't see any way
> >>> that the controlling device driver ever gets to be involved in this reset.
> >>> If code outside the ipr driver were to reset the adapter, the adapter firmware
> >>> would be left in an uninitialized state and until scsi core starts timing
> >>> out ops and driving EH, the card would be unusable. I can't imagine the
> >>> ipr driver is unique in this.
> >>
> >> Right, that's why a PCI core service is needed for coordination.
> >>
> >> Jan
> > 
> > But why do we want to trigger reset through sysfs while the
> > driver runs?
> 
> A perfectly valid race conditions are created by KVM and VFIO: shared
> IRQ handler is triggered while the user space part wants to reset the
> assigned device.

OK, that would be solved by blocking sysfs reset while
config access is blocked, right?

> I'm quite sure that this how I first caused this bug to
> show up (it triggered for an assigned device with a shared busy IRQ line
> during QEMU startup, ie. the initial guest reset).
> 
> Jan
> 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-25 19:07             ` Michael S. Tsirkin
@ 2011-08-25 19:26               ` Jan Kiszka
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Kiszka @ 2011-08-25 19:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Brian King, Linux Kernel Mailing List, linux-pci,
	Alex Williamson, Jesse Barnes, Matthew Wilcox

[-- Attachment #1: Type: text/plain, Size: 1423 bytes --]

On 2011-08-25 21:07, Michael S. Tsirkin wrote:
> On Thu, Aug 25, 2011 at 08:52:08PM +0200, Jan Kiszka wrote:
>> On 2011-08-25 20:19, Michael S. Tsirkin wrote:
>>> On Thu, Aug 25, 2011 at 03:06:01PM +0200, Jan Kiszka wrote:
>>>>> I took a look at the sysfs triggered pci reset function and don't see any way
>>>>> that the controlling device driver ever gets to be involved in this reset.
>>>>> If code outside the ipr driver were to reset the adapter, the adapter firmware
>>>>> would be left in an uninitialized state and until scsi core starts timing
>>>>> out ops and driving EH, the card would be unusable. I can't imagine the
>>>>> ipr driver is unique in this.
>>>>
>>>> Right, that's why a PCI core service is needed for coordination.
>>>>
>>>> Jan
>>>
>>> But why do we want to trigger reset through sysfs while the
>>> driver runs?
>>
>> A perfectly valid race conditions are created by KVM and VFIO: shared
>> IRQ handler is triggered while the user space part wants to reset the
>> assigned device.
> 
> OK, that would be solved by blocking sysfs reset while
> config access is blocked, right?

...and VFIO's reset IOCTL. And any other potential pci_reset_function
caller. And you still need to synchronize those config space accesses
that don't wait for the block_ucfg_access while some reset is being
performed. This really screams for a redesign of config space locking.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-24 10:43 Broken pci_block_user_cfg_access interface Jan Kiszka
  2011-08-24 15:02 ` Brian King
@ 2011-08-29 15:05 ` Michael S. Tsirkin
  2011-08-29 15:42   ` Jan Kiszka
  1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2011-08-29 15:05 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley,
	Michael S. Tsirkin, Hans J. Koch, Greg Kroah-Hartman, linux-pci,
	linux-kernel, linux-scsi, kvm

So how about something like the following?
Compile tested only, and I'm not sure I got the
ipr and iov error handling right.
Comments?

---->
Subject: [PATCH] pci: fail block usercfg access on reset

Anyone who wants to block usercfg access will
also want to block reset from userspace.
On the other hand, reset from userspace
should block any other access from userspace.

Finally, make it possible to detect reset in
pregress by returning an error status from
pci_block_user_cfg_access.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/pci/access.c          |   45 ++++++++++++++++++++++++++++++++++++----
 drivers/pci/iov.c             |   19 ++++++++++++----
 drivers/pci/pci.c             |    4 +-
 drivers/scsi/ipr.c            |   22 ++++++++++++++++++-
 drivers/uio/uio_pci_generic.c |    7 +++++-
 include/linux/pci.h           |    6 ++++-
 6 files changed, 87 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index fdaa42a..2492365 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -139,7 +139,7 @@ static noinline void pci_wait_ucfg(struct pci_dev *dev)
 		raw_spin_unlock_irq(&pci_lock);
 		schedule();
 		raw_spin_lock_irq(&pci_lock);
-	} while (dev->block_ucfg_access);
+	} while (dev->block_ucfg_access || dev->reset_in_progress);
 	__remove_wait_queue(&pci_ucfg_wait, &wait);
 }
 
@@ -153,7 +153,8 @@ int pci_user_read_config_##size						\
 	if (PCI_##size##_BAD)						\
 		return -EINVAL;						\
 	raw_spin_lock_irq(&pci_lock);				\
-	if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev);	\
+	if (unlikely(dev->block_ucfg_access || dev->reset_in_progress)) \
+		pci_wait_ucfg(dev);					\
 	ret = dev->bus->ops->read(dev->bus, dev->devfn,			\
 					pos, sizeof(type), &data);	\
 	raw_spin_unlock_irq(&pci_lock);				\
@@ -171,8 +172,9 @@ int pci_user_write_config_##size					\
 	int ret = -EIO;							\
 	if (PCI_##size##_BAD)						\
 		return -EINVAL;						\
-	raw_spin_lock_irq(&pci_lock);				\
-	if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev);	\
+	raw_spin_lock_irq(&pci_lock);					\
+	if (unlikely(dev->block_ucfg_access || dev->reset_in_progress)) \
+		pci_wait_ucfg(dev);					\
 	ret = dev->bus->ops->write(dev->bus, dev->devfn,		\
 					pos, sizeof(type), val);	\
 	raw_spin_unlock_irq(&pci_lock);				\
@@ -408,19 +410,24 @@ EXPORT_SYMBOL(pci_vpd_truncate);
  * sleep until access is unblocked again.  We don't allow nesting of
  * block/unblock calls.
  */
-void pci_block_user_cfg_access(struct pci_dev *dev)
+int pci_block_user_cfg_access(struct pci_dev *dev)
 {
 	unsigned long flags;
 	int was_blocked;
+	int in_progress;
 
 	raw_spin_lock_irqsave(&pci_lock, flags);
 	was_blocked = dev->block_ucfg_access;
 	dev->block_ucfg_access = 1;
+	in_progress = dev->reset_in_progress;
 	raw_spin_unlock_irqrestore(&pci_lock, flags);
 
+	if (in_progress)
+		return -EIO;
 	/* If we BUG() inside the pci_lock, we're guaranteed to hose
 	 * the machine */
 	BUG_ON(was_blocked);
+	return 0;
 }
 EXPORT_SYMBOL_GPL(pci_block_user_cfg_access);
 
@@ -445,3 +452,31 @@ void pci_unblock_user_cfg_access(struct pci_dev *dev)
 	raw_spin_unlock_irqrestore(&pci_lock, flags);
 }
 EXPORT_SYMBOL_GPL(pci_unblock_user_cfg_access);
+
+void pci_reset_start(struct pci_dev *dev)
+{
+	int was_started;
+
+	raw_spin_lock_irq(&pci_lock);
+	if (unlikely(dev->block_ucfg_access || dev->reset_in_progress))
+		pci_wait_ucfg(dev);
+	was_started = dev->reset_in_progress;
+	dev->reset_in_progress = 1;
+	raw_spin_unlock_irq(&pci_lock);
+
+	/* If we BUG() inside the pci_lock, we're guaranteed to hose
+	 * the machine */
+	BUG_ON(was_started);
+}
+void pci_reset_end(struct pci_dev *dev)
+{
+	raw_spin_lock_irq(&pci_lock);
+
+	/* This indicates a problem in the caller, but we don't need
+	 * to kill them, unlike a double-reset above. */
+	WARN_ON(!dev->reset_in_progress);
+
+	dev->reset_in_progress = 0;
+	wake_up_all(&pci_ucfg_wait);
+	raw_spin_unlock_irq(&pci_lock);
+}
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 42fae47..464d9b5 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -275,7 +275,7 @@ static void sriov_disable_migration(struct pci_dev *dev)
 
 static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 {
-	int rc;
+	int rc, rc1;
 	int i, j;
 	int nres;
 	u16 offset, stride, initial;
@@ -340,7 +340,10 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	}
 
 	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
-	pci_block_user_cfg_access(dev);
+	rc = pci_block_user_cfg_access(dev);
+	if (rc)
+		goto err;
+
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
 	msleep(100);
 	pci_unblock_user_cfg_access(dev);
@@ -371,11 +374,14 @@ failed:
 		virtfn_remove(dev, j, 0);
 
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
-	pci_block_user_cfg_access(dev);
+	rc1 = pci_block_user_cfg_access(dev);
+	if (rc1)
+		goto err;
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
 	ssleep(1);
 	pci_unblock_user_cfg_access(dev);
 
+err:
 	if (iov->link != dev->devfn)
 		sysfs_remove_link(&dev->dev.kobj, "dep_link");
 
@@ -384,7 +390,7 @@ failed:
 
 static void sriov_disable(struct pci_dev *dev)
 {
-	int i;
+	int i, rc;
 	struct pci_sriov *iov = dev->sriov;
 
 	if (!iov->nr_virtfn)
@@ -397,11 +403,14 @@ static void sriov_disable(struct pci_dev *dev)
 		virtfn_remove(dev, i, 0);
 
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
-	pci_block_user_cfg_access(dev);
+	rc = pci_block_user_cfg_access(dev);
+	if (rc)
+		goto err;
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
 	ssleep(1);
 	pci_unblock_user_cfg_access(dev);
 
+err:
 	if (iov->link != dev->devfn)
 		sysfs_remove_link(&dev->dev.kobj, "dep_link");
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0ce6742..815efc2 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2959,7 +2959,7 @@ static int pci_dev_reset(struct pci_dev *dev, int probe)
 	might_sleep();
 
 	if (!probe) {
-		pci_block_user_cfg_access(dev);
+		pci_reset_start(dev);
 		/* block PM suspend, driver probe, etc. */
 		device_lock(&dev->dev);
 	}
@@ -2984,7 +2984,7 @@ static int pci_dev_reset(struct pci_dev *dev, int probe)
 done:
 	if (!probe) {
 		device_unlock(&dev->dev);
-		pci_unblock_user_cfg_access(dev);
+		pci_reset_end(dev);
 	}
 
 	return rc;
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 8d63630..d322da3 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -7661,7 +7661,9 @@ static int ipr_reset_start_bist(struct ipr_cmnd *ipr_cmd)
 	int rc = PCIBIOS_SUCCESSFUL;
 
 	ENTER;
-	pci_block_user_cfg_access(ioa_cfg->pdev);
+	rc = pci_block_user_cfg_access(ioa_cfg->pdev);
+	if (rc)
+		goto err;
 
 	if (ioa_cfg->ipr_chip->bist_method == IPR_MMIO)
 		writel(IPR_UPROCI_SIS64_START_BIST,
@@ -7681,6 +7683,13 @@ static int ipr_reset_start_bist(struct ipr_cmnd *ipr_cmd)
 
 	LEAVE;
 	return rc;
+
+err:
+
+	ipr_cmd->s.ioasa.hdr.ioasc = cpu_to_be32(IPR_IOASC_PCI_ACCESS_ERROR);
+	rc = IPR_RC_JOB_CONTINUE;
+	LEAVE;
+	return rc;
 }
 
 /**
@@ -7715,14 +7724,23 @@ static int ipr_reset_slot_reset(struct ipr_cmnd *ipr_cmd)
 {
 	struct ipr_ioa_cfg *ioa_cfg = ipr_cmd->ioa_cfg;
 	struct pci_dev *pdev = ioa_cfg->pdev;
+	int rc;
 
 	ENTER;
-	pci_block_user_cfg_access(pdev);
+	rc = pci_block_user_cfg_access(pdev);
+	if (rc)
+		goto err;
+
 	pci_set_pcie_reset_state(pdev, pcie_warm_reset);
 	ipr_cmd->job_step = ipr_reset_slot_reset_done;
 	ipr_reset_start_timer(ipr_cmd, IPR_PCI_RESET_TIMEOUT);
 	LEAVE;
 	return IPR_RC_JOB_RETURN;
+
+	ipr_cmd->s.ioasa.hdr.ioasc = cpu_to_be32(IPR_IOASC_PCI_ACCESS_ERROR);
+	rc = IPR_RC_JOB_CONTINUE;
+	LEAVE;
+	return rc;
 }
 
 /**
diff --git a/drivers/uio/uio_pci_generic.c b/drivers/uio/uio_pci_generic.c
index fc22e1e..a26d35f 100644
--- a/drivers/uio/uio_pci_generic.c
+++ b/drivers/uio/uio_pci_generic.c
@@ -51,6 +51,7 @@ static irqreturn_t irqhandler(int irq, struct uio_info *info)
 	irqreturn_t ret = IRQ_NONE;
 	u32 cmd_status_dword;
 	u16 origcmd, newcmd, status;
+	int r;
 
 	/* We do a single dword read to retrieve both command and status.
 	 * Document assumptions that make this possible. */
@@ -58,7 +59,9 @@ static irqreturn_t irqhandler(int irq, struct uio_info *info)
 	BUILD_BUG_ON(PCI_COMMAND + 2 != PCI_STATUS);
 
 	spin_lock_irq(&gdev->lock);
-	pci_block_user_cfg_access(pdev);
+	r = pci_block_user_cfg_access(pdev);
+	if (r < 0)
+		goto err;
 
 	/* Read both command and status registers in a single 32-bit operation.
 	 * Note: we could cache the value for command and move the status read
@@ -83,6 +86,8 @@ static irqreturn_t irqhandler(int irq, struct uio_info *info)
 done:
 
 	pci_unblock_user_cfg_access(pdev);
+err:
+
 	spin_unlock_irq(&gdev->lock);
 	return ret;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8c230cb..ec3b8fe 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -322,6 +322,7 @@ struct pci_dev {
 	unsigned int    is_hotplug_bridge:1;
 	unsigned int    __aer_firmware_first_valid:1;
 	unsigned int	__aer_firmware_first:1;
+	unsigned int	reset_in_progress:1;
 	pci_dev_flags_t dev_flags;
 	atomic_t	enable_cnt;	/* pci_enable_device has been called */
 
@@ -1079,9 +1080,12 @@ int  ht_create_irq(struct pci_dev *dev, int idx);
 void ht_destroy_irq(unsigned int irq);
 #endif /* CONFIG_HT_IRQ */
 
-extern void pci_block_user_cfg_access(struct pci_dev *dev);
+extern int pci_block_user_cfg_access(struct pci_dev *dev);
 extern void pci_unblock_user_cfg_access(struct pci_dev *dev);
 
+extern void pci_reset_start(struct pci_dev *dev);
+extern void pci_reset_end(struct pci_dev *dev);
+
 /*
  * PCI domain support.  Sometimes called PCI segment (eg by ACPI),
  * a PCI domain is defined to be a set of PCI busses which share
-- 
1.7.5.53.gc233e

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-29 15:05 ` Michael S. Tsirkin
@ 2011-08-29 15:42   ` Jan Kiszka
  2011-08-29 15:58     ` Michael S. Tsirkin
  2011-08-29 18:47     ` Jan Kiszka
  0 siblings, 2 replies; 35+ messages in thread
From: Jan Kiszka @ 2011-08-29 15:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley, Hans J. Koch,
	Greg Kroah-Hartman, linux-pci, linux-kernel, linux-scsi, kvm

On 2011-08-29 17:05, Michael S. Tsirkin wrote:
> So how about something like the following?
> Compile tested only, and I'm not sure I got the
> ipr and iov error handling right.
> Comments?

This still doesn't synchronize two callers of pci_block_user_cfg_access
unless one was reset. We may not have such a scenario yet, but that's
how the old code started as well...

And it makes the interface more convoluted (from 10000 meter, why should
pci_block_user_cfg_access return an error if reset is running?).

> 
> ---->
> Subject: [PATCH] pci: fail block usercfg access on reset
> 
> Anyone who wants to block usercfg access will
> also want to block reset from userspace.
> On the other hand, reset from userspace
> should block any other access from userspace.
> 
> Finally, make it possible to detect reset in
> pregress by returning an error status from
> pci_block_user_cfg_access.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/pci/access.c          |   45 ++++++++++++++++++++++++++++++++++++----
>  drivers/pci/iov.c             |   19 ++++++++++++----
>  drivers/pci/pci.c             |    4 +-
>  drivers/scsi/ipr.c            |   22 ++++++++++++++++++-
>  drivers/uio/uio_pci_generic.c |    7 +++++-
>  include/linux/pci.h           |    6 ++++-
>  6 files changed, 87 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
> index fdaa42a..2492365 100644
> --- a/drivers/pci/access.c
> +++ b/drivers/pci/access.c
> @@ -139,7 +139,7 @@ static noinline void pci_wait_ucfg(struct pci_dev *dev)
>  		raw_spin_unlock_irq(&pci_lock);
>  		schedule();
>  		raw_spin_lock_irq(&pci_lock);
> -	} while (dev->block_ucfg_access);
> +	} while (dev->block_ucfg_access || dev->reset_in_progress);
>  	__remove_wait_queue(&pci_ucfg_wait, &wait);
>  }
>  
> @@ -153,7 +153,8 @@ int pci_user_read_config_##size						\
>  	if (PCI_##size##_BAD)						\
>  		return -EINVAL;						\
>  	raw_spin_lock_irq(&pci_lock);				\
> -	if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev);	\
> +	if (unlikely(dev->block_ucfg_access || dev->reset_in_progress)) \
> +		pci_wait_ucfg(dev);					\
>  	ret = dev->bus->ops->read(dev->bus, dev->devfn,			\
>  					pos, sizeof(type), &data);	\
>  	raw_spin_unlock_irq(&pci_lock);				\
> @@ -171,8 +172,9 @@ int pci_user_write_config_##size					\
>  	int ret = -EIO;							\
>  	if (PCI_##size##_BAD)						\
>  		return -EINVAL;						\
> -	raw_spin_lock_irq(&pci_lock);				\
> -	if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev);	\
> +	raw_spin_lock_irq(&pci_lock);					\
> +	if (unlikely(dev->block_ucfg_access || dev->reset_in_progress)) \
> +		pci_wait_ucfg(dev);					\
>  	ret = dev->bus->ops->write(dev->bus, dev->devfn,		\
>  					pos, sizeof(type), val);	\
>  	raw_spin_unlock_irq(&pci_lock);				\
> @@ -408,19 +410,24 @@ EXPORT_SYMBOL(pci_vpd_truncate);
>   * sleep until access is unblocked again.  We don't allow nesting of
>   * block/unblock calls.
>   */
> -void pci_block_user_cfg_access(struct pci_dev *dev)
> +int pci_block_user_cfg_access(struct pci_dev *dev)
>  {
>  	unsigned long flags;
>  	int was_blocked;
> +	int in_progress;
>  
>  	raw_spin_lock_irqsave(&pci_lock, flags);
>  	was_blocked = dev->block_ucfg_access;
>  	dev->block_ucfg_access = 1;
> +	in_progress = dev->reset_in_progress;
>  	raw_spin_unlock_irqrestore(&pci_lock, flags);
>  
> +	if (in_progress)
> +		return -EIO;
>  	/* If we BUG() inside the pci_lock, we're guaranteed to hose
>  	 * the machine */
>  	BUG_ON(was_blocked);
> +	return 0;
>  }
>  EXPORT_SYMBOL_GPL(pci_block_user_cfg_access);
>  
> @@ -445,3 +452,31 @@ void pci_unblock_user_cfg_access(struct pci_dev *dev)
>  	raw_spin_unlock_irqrestore(&pci_lock, flags);
>  }
>  EXPORT_SYMBOL_GPL(pci_unblock_user_cfg_access);
> +
> +void pci_reset_start(struct pci_dev *dev)
> +{
> +	int was_started;
> +
> +	raw_spin_lock_irq(&pci_lock);
> +	if (unlikely(dev->block_ucfg_access || dev->reset_in_progress))
> +		pci_wait_ucfg(dev);
> +	was_started = dev->reset_in_progress;
> +	dev->reset_in_progress = 1;
> +	raw_spin_unlock_irq(&pci_lock);
> +
> +	/* If we BUG() inside the pci_lock, we're guaranteed to hose
> +	 * the machine */
> +	BUG_ON(was_started);
> +}
> +void pci_reset_end(struct pci_dev *dev)
> +{
> +	raw_spin_lock_irq(&pci_lock);
> +
> +	/* This indicates a problem in the caller, but we don't need
> +	 * to kill them, unlike a double-reset above. */
> +	WARN_ON(!dev->reset_in_progress);
> +
> +	dev->reset_in_progress = 0;
> +	wake_up_all(&pci_ucfg_wait);
> +	raw_spin_unlock_irq(&pci_lock);
> +}
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 42fae47..464d9b5 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -275,7 +275,7 @@ static void sriov_disable_migration(struct pci_dev *dev)
>  
>  static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>  {
> -	int rc;
> +	int rc, rc1;
>  	int i, j;
>  	int nres;
>  	u16 offset, stride, initial;
> @@ -340,7 +340,10 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>  	}
>  
>  	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
> -	pci_block_user_cfg_access(dev);
> +	rc = pci_block_user_cfg_access(dev);
> +	if (rc)
> +		goto err;
> +
>  	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
>  	msleep(100);
>  	pci_unblock_user_cfg_access(dev);
> @@ -371,11 +374,14 @@ failed:
>  		virtfn_remove(dev, j, 0);
>  
>  	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
> -	pci_block_user_cfg_access(dev);
> +	rc1 = pci_block_user_cfg_access(dev);
> +	if (rc1)
> +		goto err;
>  	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
>  	ssleep(1);
>  	pci_unblock_user_cfg_access(dev);
>  
> +err:
>  	if (iov->link != dev->devfn)
>  		sysfs_remove_link(&dev->dev.kobj, "dep_link");
>  
> @@ -384,7 +390,7 @@ failed:
>  
>  static void sriov_disable(struct pci_dev *dev)
>  {
> -	int i;
> +	int i, rc;
>  	struct pci_sriov *iov = dev->sriov;
>  
>  	if (!iov->nr_virtfn)
> @@ -397,11 +403,14 @@ static void sriov_disable(struct pci_dev *dev)
>  		virtfn_remove(dev, i, 0);
>  
>  	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
> -	pci_block_user_cfg_access(dev);
> +	rc = pci_block_user_cfg_access(dev);
> +	if (rc)
> +		goto err;
>  	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
>  	ssleep(1);
>  	pci_unblock_user_cfg_access(dev);
>  
> +err:
>  	if (iov->link != dev->devfn)
>  		sysfs_remove_link(&dev->dev.kobj, "dep_link");
>  
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 0ce6742..815efc2 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2959,7 +2959,7 @@ static int pci_dev_reset(struct pci_dev *dev, int probe)
>  	might_sleep();
>  
>  	if (!probe) {
> -		pci_block_user_cfg_access(dev);
> +		pci_reset_start(dev);
>  		/* block PM suspend, driver probe, etc. */
>  		device_lock(&dev->dev);
>  	}
> @@ -2984,7 +2984,7 @@ static int pci_dev_reset(struct pci_dev *dev, int probe)
>  done:
>  	if (!probe) {
>  		device_unlock(&dev->dev);
> -		pci_unblock_user_cfg_access(dev);
> +		pci_reset_end(dev);
>  	}
>  
>  	return rc;
> diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
> index 8d63630..d322da3 100644
> --- a/drivers/scsi/ipr.c
> +++ b/drivers/scsi/ipr.c
> @@ -7661,7 +7661,9 @@ static int ipr_reset_start_bist(struct ipr_cmnd *ipr_cmd)
>  	int rc = PCIBIOS_SUCCESSFUL;
>  
>  	ENTER;
> -	pci_block_user_cfg_access(ioa_cfg->pdev);
> +	rc = pci_block_user_cfg_access(ioa_cfg->pdev);
> +	if (rc)
> +		goto err;
>  
>  	if (ioa_cfg->ipr_chip->bist_method == IPR_MMIO)
>  		writel(IPR_UPROCI_SIS64_START_BIST,
> @@ -7681,6 +7683,13 @@ static int ipr_reset_start_bist(struct ipr_cmnd *ipr_cmd)
>  
>  	LEAVE;
>  	return rc;
> +
> +err:
> +
> +	ipr_cmd->s.ioasa.hdr.ioasc = cpu_to_be32(IPR_IOASC_PCI_ACCESS_ERROR);
> +	rc = IPR_RC_JOB_CONTINUE;
> +	LEAVE;
> +	return rc;
>  }
>  
>  /**
> @@ -7715,14 +7724,23 @@ static int ipr_reset_slot_reset(struct ipr_cmnd *ipr_cmd)
>  {
>  	struct ipr_ioa_cfg *ioa_cfg = ipr_cmd->ioa_cfg;
>  	struct pci_dev *pdev = ioa_cfg->pdev;
> +	int rc;
>  
>  	ENTER;
> -	pci_block_user_cfg_access(pdev);
> +	rc = pci_block_user_cfg_access(pdev);
> +	if (rc)
> +		goto err;

Looks like the target for this jump is missing.

> +
>  	pci_set_pcie_reset_state(pdev, pcie_warm_reset);
>  	ipr_cmd->job_step = ipr_reset_slot_reset_done;
>  	ipr_reset_start_timer(ipr_cmd, IPR_PCI_RESET_TIMEOUT);
>  	LEAVE;
>  	return IPR_RC_JOB_RETURN;
> +
> +	ipr_cmd->s.ioasa.hdr.ioasc = cpu_to_be32(IPR_IOASC_PCI_ACCESS_ERROR);
> +	rc = IPR_RC_JOB_CONTINUE;
> +	LEAVE;
> +	return rc;
>  }
>  
>  /**
> diff --git a/drivers/uio/uio_pci_generic.c b/drivers/uio/uio_pci_generic.c
> index fc22e1e..a26d35f 100644
> --- a/drivers/uio/uio_pci_generic.c
> +++ b/drivers/uio/uio_pci_generic.c
> @@ -51,6 +51,7 @@ static irqreturn_t irqhandler(int irq, struct uio_info *info)
>  	irqreturn_t ret = IRQ_NONE;
>  	u32 cmd_status_dword;
>  	u16 origcmd, newcmd, status;
> +	int r;
>  
>  	/* We do a single dword read to retrieve both command and status.
>  	 * Document assumptions that make this possible. */
> @@ -58,7 +59,9 @@ static irqreturn_t irqhandler(int irq, struct uio_info *info)
>  	BUILD_BUG_ON(PCI_COMMAND + 2 != PCI_STATUS);
>  
>  	spin_lock_irq(&gdev->lock);
> -	pci_block_user_cfg_access(pdev);
> +	r = pci_block_user_cfg_access(pdev);
> +	if (r < 0)
> +		goto err;
>  
>  	/* Read both command and status registers in a single 32-bit operation.
>  	 * Note: we could cache the value for command and move the status read
> @@ -83,6 +86,8 @@ static irqreturn_t irqhandler(int irq, struct uio_info *info)
>  done:
>  
>  	pci_unblock_user_cfg_access(pdev);
> +err:
> +
>  	spin_unlock_irq(&gdev->lock);
>  	return ret;
>  }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 8c230cb..ec3b8fe 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -322,6 +322,7 @@ struct pci_dev {
>  	unsigned int    is_hotplug_bridge:1;
>  	unsigned int    __aer_firmware_first_valid:1;
>  	unsigned int	__aer_firmware_first:1;
> +	unsigned int	reset_in_progress:1;
>  	pci_dev_flags_t dev_flags;
>  	atomic_t	enable_cnt;	/* pci_enable_device has been called */
>  
> @@ -1079,9 +1080,12 @@ int  ht_create_irq(struct pci_dev *dev, int idx);
>  void ht_destroy_irq(unsigned int irq);
>  #endif /* CONFIG_HT_IRQ */
>  
> -extern void pci_block_user_cfg_access(struct pci_dev *dev);
> +extern int pci_block_user_cfg_access(struct pci_dev *dev);
>  extern void pci_unblock_user_cfg_access(struct pci_dev *dev);
>  
> +extern void pci_reset_start(struct pci_dev *dev);
> +extern void pci_reset_end(struct pci_dev *dev);
> +
>  /*
>   * PCI domain support.  Sometimes called PCI segment (eg by ACPI),
>   * a PCI domain is defined to be a set of PCI busses which share

I still don't get what prevents converting ipr to allow plain mutex
synchronization. My vision is:
 - push reset-on-error of ipr into workqueue (or threaded IRQ?)
 - require mutex synchronization for common config space access and the
   full reset cycle
 - only exception: INTx status/masking access
    => use pci_lock + test for reset_in_progress, skip operation if
       that is the case

That would allow to drop the whole block_user_cfg infrastructure.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-29 15:42   ` Jan Kiszka
@ 2011-08-29 15:58     ` Michael S. Tsirkin
  2011-08-29 16:14       ` Jan Kiszka
  2011-08-29 18:47     ` Jan Kiszka
  1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2011-08-29 15:58 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley, Hans J. Koch,
	Greg Kroah-Hartman, linux-pci, linux-kernel, linux-scsi, kvm

On Mon, Aug 29, 2011 at 05:42:16PM +0200, Jan Kiszka wrote:
> On 2011-08-29 17:05, Michael S. Tsirkin wrote:
> > So how about something like the following?
> > Compile tested only, and I'm not sure I got the
> > ipr and iov error handling right.
> > Comments?
> 
> This still doesn't synchronize two callers of pci_block_user_cfg_access
> unless one was reset. We may not have such a scenario yet, but that's
> how the old code started as well...
> 
> And it makes the interface more convoluted (from 10000 meter, why should
> pci_block_user_cfg_access return an error if reset is running?).

Well I made the error EIO but it really is something like
EAGAIN which means 'I would block if I could, but I was
asked not to'.

> > 
> > ---->
> > Subject: [PATCH] pci: fail block usercfg access on reset
> > 
> > Anyone who wants to block usercfg access will
> > also want to block reset from userspace.
> > On the other hand, reset from userspace
> > should block any other access from userspace.
> > 
> > Finally, make it possible to detect reset in
> > pregress by returning an error status from
> > pci_block_user_cfg_access.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  drivers/pci/access.c          |   45 ++++++++++++++++++++++++++++++++++++----
> >  drivers/pci/iov.c             |   19 ++++++++++++----
> >  drivers/pci/pci.c             |    4 +-
> >  drivers/scsi/ipr.c            |   22 ++++++++++++++++++-
> >  drivers/uio/uio_pci_generic.c |    7 +++++-
> >  include/linux/pci.h           |    6 ++++-
> >  6 files changed, 87 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/pci/access.c b/drivers/pci/access.c
> > index fdaa42a..2492365 100644
> > --- a/drivers/pci/access.c
> > +++ b/drivers/pci/access.c
> > @@ -139,7 +139,7 @@ static noinline void pci_wait_ucfg(struct pci_dev *dev)
> >  		raw_spin_unlock_irq(&pci_lock);
> >  		schedule();
> >  		raw_spin_lock_irq(&pci_lock);
> > -	} while (dev->block_ucfg_access);
> > +	} while (dev->block_ucfg_access || dev->reset_in_progress);
> >  	__remove_wait_queue(&pci_ucfg_wait, &wait);
> >  }
> >  
> > @@ -153,7 +153,8 @@ int pci_user_read_config_##size						\
> >  	if (PCI_##size##_BAD)						\
> >  		return -EINVAL;						\
> >  	raw_spin_lock_irq(&pci_lock);				\
> > -	if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev);	\
> > +	if (unlikely(dev->block_ucfg_access || dev->reset_in_progress)) \
> > +		pci_wait_ucfg(dev);					\
> >  	ret = dev->bus->ops->read(dev->bus, dev->devfn,			\
> >  					pos, sizeof(type), &data);	\
> >  	raw_spin_unlock_irq(&pci_lock);				\
> > @@ -171,8 +172,9 @@ int pci_user_write_config_##size					\
> >  	int ret = -EIO;							\
> >  	if (PCI_##size##_BAD)						\
> >  		return -EINVAL;						\
> > -	raw_spin_lock_irq(&pci_lock);				\
> > -	if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev);	\
> > +	raw_spin_lock_irq(&pci_lock);					\
> > +	if (unlikely(dev->block_ucfg_access || dev->reset_in_progress)) \
> > +		pci_wait_ucfg(dev);					\
> >  	ret = dev->bus->ops->write(dev->bus, dev->devfn,		\
> >  					pos, sizeof(type), val);	\
> >  	raw_spin_unlock_irq(&pci_lock);				\
> > @@ -408,19 +410,24 @@ EXPORT_SYMBOL(pci_vpd_truncate);
> >   * sleep until access is unblocked again.  We don't allow nesting of
> >   * block/unblock calls.
> >   */
> > -void pci_block_user_cfg_access(struct pci_dev *dev)
> > +int pci_block_user_cfg_access(struct pci_dev *dev)
> >  {
> >  	unsigned long flags;
> >  	int was_blocked;
> > +	int in_progress;
> >  
> >  	raw_spin_lock_irqsave(&pci_lock, flags);
> >  	was_blocked = dev->block_ucfg_access;
> >  	dev->block_ucfg_access = 1;
> > +	in_progress = dev->reset_in_progress;
> >  	raw_spin_unlock_irqrestore(&pci_lock, flags);
> >  
> > +	if (in_progress)
> > +		return -EIO;
> >  	/* If we BUG() inside the pci_lock, we're guaranteed to hose
> >  	 * the machine */
> >  	BUG_ON(was_blocked);
> > +	return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(pci_block_user_cfg_access);
> >  
> > @@ -445,3 +452,31 @@ void pci_unblock_user_cfg_access(struct pci_dev *dev)
> >  	raw_spin_unlock_irqrestore(&pci_lock, flags);
> >  }
> >  EXPORT_SYMBOL_GPL(pci_unblock_user_cfg_access);
> > +
> > +void pci_reset_start(struct pci_dev *dev)
> > +{
> > +	int was_started;
> > +
> > +	raw_spin_lock_irq(&pci_lock);
> > +	if (unlikely(dev->block_ucfg_access || dev->reset_in_progress))
> > +		pci_wait_ucfg(dev);
> > +	was_started = dev->reset_in_progress;
> > +	dev->reset_in_progress = 1;
> > +	raw_spin_unlock_irq(&pci_lock);
> > +
> > +	/* If we BUG() inside the pci_lock, we're guaranteed to hose
> > +	 * the machine */
> > +	BUG_ON(was_started);
> > +}
> > +void pci_reset_end(struct pci_dev *dev)
> > +{
> > +	raw_spin_lock_irq(&pci_lock);
> > +
> > +	/* This indicates a problem in the caller, but we don't need
> > +	 * to kill them, unlike a double-reset above. */
> > +	WARN_ON(!dev->reset_in_progress);
> > +
> > +	dev->reset_in_progress = 0;
> > +	wake_up_all(&pci_ucfg_wait);
> > +	raw_spin_unlock_irq(&pci_lock);
> > +}
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > index 42fae47..464d9b5 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -275,7 +275,7 @@ static void sriov_disable_migration(struct pci_dev *dev)
> >  
> >  static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
> >  {
> > -	int rc;
> > +	int rc, rc1;
> >  	int i, j;
> >  	int nres;
> >  	u16 offset, stride, initial;
> > @@ -340,7 +340,10 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
> >  	}
> >  
> >  	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
> > -	pci_block_user_cfg_access(dev);
> > +	rc = pci_block_user_cfg_access(dev);
> > +	if (rc)
> > +		goto err;
> > +
> >  	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
> >  	msleep(100);
> >  	pci_unblock_user_cfg_access(dev);
> > @@ -371,11 +374,14 @@ failed:
> >  		virtfn_remove(dev, j, 0);
> >  
> >  	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
> > -	pci_block_user_cfg_access(dev);
> > +	rc1 = pci_block_user_cfg_access(dev);
> > +	if (rc1)
> > +		goto err;
> >  	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
> >  	ssleep(1);
> >  	pci_unblock_user_cfg_access(dev);
> >  
> > +err:
> >  	if (iov->link != dev->devfn)
> >  		sysfs_remove_link(&dev->dev.kobj, "dep_link");
> >  
> > @@ -384,7 +390,7 @@ failed:
> >  
> >  static void sriov_disable(struct pci_dev *dev)
> >  {
> > -	int i;
> > +	int i, rc;
> >  	struct pci_sriov *iov = dev->sriov;
> >  
> >  	if (!iov->nr_virtfn)
> > @@ -397,11 +403,14 @@ static void sriov_disable(struct pci_dev *dev)
> >  		virtfn_remove(dev, i, 0);
> >  
> >  	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
> > -	pci_block_user_cfg_access(dev);
> > +	rc = pci_block_user_cfg_access(dev);
> > +	if (rc)
> > +		goto err;
> >  	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
> >  	ssleep(1);
> >  	pci_unblock_user_cfg_access(dev);
> >  
> > +err:
> >  	if (iov->link != dev->devfn)
> >  		sysfs_remove_link(&dev->dev.kobj, "dep_link");
> >  
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 0ce6742..815efc2 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -2959,7 +2959,7 @@ static int pci_dev_reset(struct pci_dev *dev, int probe)
> >  	might_sleep();
> >  
> >  	if (!probe) {
> > -		pci_block_user_cfg_access(dev);
> > +		pci_reset_start(dev);
> >  		/* block PM suspend, driver probe, etc. */
> >  		device_lock(&dev->dev);
> >  	}
> > @@ -2984,7 +2984,7 @@ static int pci_dev_reset(struct pci_dev *dev, int probe)
> >  done:
> >  	if (!probe) {
> >  		device_unlock(&dev->dev);
> > -		pci_unblock_user_cfg_access(dev);
> > +		pci_reset_end(dev);
> >  	}
> >  
> >  	return rc;
> > diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
> > index 8d63630..d322da3 100644
> > --- a/drivers/scsi/ipr.c
> > +++ b/drivers/scsi/ipr.c
> > @@ -7661,7 +7661,9 @@ static int ipr_reset_start_bist(struct ipr_cmnd *ipr_cmd)
> >  	int rc = PCIBIOS_SUCCESSFUL;
> >  
> >  	ENTER;
> > -	pci_block_user_cfg_access(ioa_cfg->pdev);
> > +	rc = pci_block_user_cfg_access(ioa_cfg->pdev);
> > +	if (rc)
> > +		goto err;
> >  
> >  	if (ioa_cfg->ipr_chip->bist_method == IPR_MMIO)
> >  		writel(IPR_UPROCI_SIS64_START_BIST,
> > @@ -7681,6 +7683,13 @@ static int ipr_reset_start_bist(struct ipr_cmnd *ipr_cmd)
> >  
> >  	LEAVE;
> >  	return rc;
> > +
> > +err:
> > +
> > +	ipr_cmd->s.ioasa.hdr.ioasc = cpu_to_be32(IPR_IOASC_PCI_ACCESS_ERROR);
> > +	rc = IPR_RC_JOB_CONTINUE;
> > +	LEAVE;
> > +	return rc;
> >  }
> >  
> >  /**
> > @@ -7715,14 +7724,23 @@ static int ipr_reset_slot_reset(struct ipr_cmnd *ipr_cmd)
> >  {
> >  	struct ipr_ioa_cfg *ioa_cfg = ipr_cmd->ioa_cfg;
> >  	struct pci_dev *pdev = ioa_cfg->pdev;
> > +	int rc;
> >  
> >  	ENTER;
> > -	pci_block_user_cfg_access(pdev);
> > +	rc = pci_block_user_cfg_access(pdev);
> > +	if (rc)
> > +		goto err;
> 
> Looks like the target for this jump is missing.
> 
> > +
> >  	pci_set_pcie_reset_state(pdev, pcie_warm_reset);
> >  	ipr_cmd->job_step = ipr_reset_slot_reset_done;
> >  	ipr_reset_start_timer(ipr_cmd, IPR_PCI_RESET_TIMEOUT);
> >  	LEAVE;
> >  	return IPR_RC_JOB_RETURN;
> > +
> > +	ipr_cmd->s.ioasa.hdr.ioasc = cpu_to_be32(IPR_IOASC_PCI_ACCESS_ERROR);
> > +	rc = IPR_RC_JOB_CONTINUE;
> > +	LEAVE;
> > +	return rc;
> >  }
> >  
> >  /**
> > diff --git a/drivers/uio/uio_pci_generic.c b/drivers/uio/uio_pci_generic.c
> > index fc22e1e..a26d35f 100644
> > --- a/drivers/uio/uio_pci_generic.c
> > +++ b/drivers/uio/uio_pci_generic.c
> > @@ -51,6 +51,7 @@ static irqreturn_t irqhandler(int irq, struct uio_info *info)
> >  	irqreturn_t ret = IRQ_NONE;
> >  	u32 cmd_status_dword;
> >  	u16 origcmd, newcmd, status;
> > +	int r;
> >  
> >  	/* We do a single dword read to retrieve both command and status.
> >  	 * Document assumptions that make this possible. */
> > @@ -58,7 +59,9 @@ static irqreturn_t irqhandler(int irq, struct uio_info *info)
> >  	BUILD_BUG_ON(PCI_COMMAND + 2 != PCI_STATUS);
> >  
> >  	spin_lock_irq(&gdev->lock);
> > -	pci_block_user_cfg_access(pdev);
> > +	r = pci_block_user_cfg_access(pdev);
> > +	if (r < 0)
> > +		goto err;
> >  
> >  	/* Read both command and status registers in a single 32-bit operation.
> >  	 * Note: we could cache the value for command and move the status read
> > @@ -83,6 +86,8 @@ static irqreturn_t irqhandler(int irq, struct uio_info *info)
> >  done:
> >  
> >  	pci_unblock_user_cfg_access(pdev);
> > +err:
> > +
> >  	spin_unlock_irq(&gdev->lock);
> >  	return ret;
> >  }
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 8c230cb..ec3b8fe 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -322,6 +322,7 @@ struct pci_dev {
> >  	unsigned int    is_hotplug_bridge:1;
> >  	unsigned int    __aer_firmware_first_valid:1;
> >  	unsigned int	__aer_firmware_first:1;
> > +	unsigned int	reset_in_progress:1;
> >  	pci_dev_flags_t dev_flags;
> >  	atomic_t	enable_cnt;	/* pci_enable_device has been called */
> >  
> > @@ -1079,9 +1080,12 @@ int  ht_create_irq(struct pci_dev *dev, int idx);
> >  void ht_destroy_irq(unsigned int irq);
> >  #endif /* CONFIG_HT_IRQ */
> >  
> > -extern void pci_block_user_cfg_access(struct pci_dev *dev);
> > +extern int pci_block_user_cfg_access(struct pci_dev *dev);
> >  extern void pci_unblock_user_cfg_access(struct pci_dev *dev);
> >  
> > +extern void pci_reset_start(struct pci_dev *dev);
> > +extern void pci_reset_end(struct pci_dev *dev);
> > +
> >  /*
> >   * PCI domain support.  Sometimes called PCI segment (eg by ACPI),
> >   * a PCI domain is defined to be a set of PCI busses which share
> 
> I still don't get what prevents converting ipr to allow plain mutex
> synchronization. My vision is:
>  - push reset-on-error of ipr into workqueue (or threaded IRQ?)
>  - require mutex synchronization for common config space access

Meaning pci_user_ read/write config?

>     and the
>    full reset cycle
>  - only exception: INTx status/masking access
>     => use pci_lock + test for reset_in_progress, skip operation if
>        that is the case
> 
> That would allow to drop the whole block_user_cfg infrastructure.
> 
> Jan

We still need to block userspace access while INTx does
the status/masking access, right?




> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-29 15:58     ` Michael S. Tsirkin
@ 2011-08-29 16:14       ` Jan Kiszka
  2011-08-29 16:23         ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2011-08-29 16:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley, Hans J. Koch,
	Greg Kroah-Hartman, linux-pci, linux-kernel, linux-scsi, kvm

On 2011-08-29 17:58, Michael S. Tsirkin wrote:
> On Mon, Aug 29, 2011 at 05:42:16PM +0200, Jan Kiszka wrote:
>> I still don't get what prevents converting ipr to allow plain mutex
>> synchronization. My vision is:
>>  - push reset-on-error of ipr into workqueue (or threaded IRQ?)
>>  - require mutex synchronization for common config space access
> 
> Meaning pci_user_ read/write config?

And pci_dev_reset, yes.

> 
>>     and the
>>    full reset cycle
>>  - only exception: INTx status/masking access
>>     => use pci_lock + test for reset_in_progress, skip operation if
>>        that is the case
>>
>> That would allow to drop the whole block_user_cfg infrastructure.
>>
>> Jan
> 
> We still need to block userspace access while INTx does
> the status/masking access, right?

Yes, pci_lock would do that for us.

We should consider making the related bits for INTx test & mask/unmask
generic PCI services so that no user (uio_pci_generic, kvm, vfio) needs
to worry about the locking details.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-29 16:14       ` Jan Kiszka
@ 2011-08-29 16:23         ` Michael S. Tsirkin
  2011-08-29 16:26           ` Jan Kiszka
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2011-08-29 16:23 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley, Hans J. Koch,
	Greg Kroah-Hartman, linux-pci, linux-kernel, linux-scsi, kvm

On Mon, Aug 29, 2011 at 06:14:39PM +0200, Jan Kiszka wrote:
> On 2011-08-29 17:58, Michael S. Tsirkin wrote:
> > On Mon, Aug 29, 2011 at 05:42:16PM +0200, Jan Kiszka wrote:
> >> I still don't get what prevents converting ipr to allow plain mutex
> >> synchronization. My vision is:
> >>  - push reset-on-error of ipr into workqueue (or threaded IRQ?)
> >>  - require mutex synchronization for common config space access
> > 
> > Meaning pci_user_ read/write config?
> 
> And pci_dev_reset, yes.
> 
> > 
> >>     and the
> >>    full reset cycle
> >>  - only exception: INTx status/masking access
> >>     => use pci_lock + test for reset_in_progress, skip operation if
> >>        that is the case
> >>
> >> That would allow to drop the whole block_user_cfg infrastructure.
> >>
> >> Jan
> > 
> > We still need to block userspace access while INTx does
> > the status/masking access, right?
> 
> Yes, pci_lock would do that for us.

Well this means block_user_cfg is not going away,
this is what it really is: pci_lock + a bit to lock out userspace.

> We should consider making the related bits for INTx test & mask/unmask
> generic PCI services so that no user (uio_pci_generic, kvm, vfio) needs
> to worry about the locking details.
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-29 16:23         ` Michael S. Tsirkin
@ 2011-08-29 16:26           ` Jan Kiszka
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Kiszka @ 2011-08-29 16:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley, Hans J. Koch,
	Greg Kroah-Hartman, linux-pci, linux-kernel, linux-scsi, kvm

On 2011-08-29 18:23, Michael S. Tsirkin wrote:
> On Mon, Aug 29, 2011 at 06:14:39PM +0200, Jan Kiszka wrote:
>> On 2011-08-29 17:58, Michael S. Tsirkin wrote:
>>> On Mon, Aug 29, 2011 at 05:42:16PM +0200, Jan Kiszka wrote:
>>>> I still don't get what prevents converting ipr to allow plain mutex
>>>> synchronization. My vision is:
>>>>  - push reset-on-error of ipr into workqueue (or threaded IRQ?)
>>>>  - require mutex synchronization for common config space access
>>>
>>> Meaning pci_user_ read/write config?
>>
>> And pci_dev_reset, yes.
>>
>>>
>>>>     and the
>>>>    full reset cycle
>>>>  - only exception: INTx status/masking access
>>>>     => use pci_lock + test for reset_in_progress, skip operation if
>>>>        that is the case
>>>>
>>>> That would allow to drop the whole block_user_cfg infrastructure.
>>>>
>>>> Jan
>>>
>>> We still need to block userspace access while INTx does
>>> the status/masking access, right?
>>
>> Yes, pci_lock would do that for us.
> 
> Well this means block_user_cfg is not going away,
> this is what it really is: pci_lock + a bit to lock out userspace.

I does as we only end up with a mutex and pci_lock. No more hand-crafted
queuing/blocking/waking.

INTx masking is a bit special as it's the only thing that truly requires
atomic context. But that's something we should address generically anyway.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-29 15:42   ` Jan Kiszka
  2011-08-29 15:58     ` Michael S. Tsirkin
@ 2011-08-29 18:47     ` Jan Kiszka
  2011-08-29 19:18       ` Michael S. Tsirkin
  1 sibling, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2011-08-29 18:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley, Hans J. Koch,
	Greg Kroah-Hartman, linux-pci, linux-kernel, linux-scsi, kvm

On 2011-08-29 17:42, Jan Kiszka wrote:
> I still don't get what prevents converting ipr to allow plain mutex
> synchronization. My vision is:
>  - push reset-on-error of ipr into workqueue (or threaded IRQ?)

I'm starting to like your proposal: I had a look at ipr, but it turned
out to be anything but trivial to convert that driver. It runs its
complete state machine under spin_lock_irq, and the functions calling
pci_block/unblock_user_cfg_access are deep inside this thing. I have no
hardware to test whatever change, and I feel a bit uncomfortable asking
Brian to redesign his driver that massively.

So back to your idea: I would generalize pci_block_user_cfg_access to
pci_block_cfg_access. It should fail when some other site already holds
the access lock, but it should remain non-blocking - for the sake of ipr.

We should still provide generic pci-2.3 IRQ masking services, but that
could be done in a second step. I could have a look at this.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-29 18:47     ` Jan Kiszka
@ 2011-08-29 19:18       ` Michael S. Tsirkin
  2011-08-30 16:30         ` Brian King
  2011-09-02  7:48         ` [RFC] pci: Rework config space blocking services Jan Kiszka
  0 siblings, 2 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2011-08-29 19:18 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley, Hans J. Koch,
	Greg Kroah-Hartman, linux-pci, linux-kernel, linux-scsi, kvm

On Mon, Aug 29, 2011 at 08:47:07PM +0200, Jan Kiszka wrote:
> On 2011-08-29 17:42, Jan Kiszka wrote:
> > I still don't get what prevents converting ipr to allow plain mutex
> > synchronization. My vision is:
> >  - push reset-on-error of ipr into workqueue (or threaded IRQ?)
> 
> I'm starting to like your proposal: I had a look at ipr, but it turned
> out to be anything but trivial to convert that driver. It runs its
> complete state machine under spin_lock_irq, and the functions calling
> pci_block/unblock_user_cfg_access are deep inside this thing. I have no
> hardware to test whatever change, and I feel a bit uncomfortable asking
> Brian to redesign his driver that massively.
> 
> So back to your idea: I would generalize pci_block_user_cfg_access to
> pci_block_cfg_access. It should fail when some other site already holds
> the access lock, but it should remain non-blocking - for the sake of ipr.

It would be easy to have blocking and non-blocking variants.

But
- I have no idea whether supporting sysfs config/reset access
  while ipr is active makes any sense - I know we need it for uio.
- reset while uio handles interrupt needs to block, not fail I think


> We should still provide generic pci-2.3 IRQ masking services, but that
> could be done in a second step. I could have a look at this.
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-29 19:18       ` Michael S. Tsirkin
@ 2011-08-30 16:30         ` Brian King
  2011-08-30 18:01           ` Michael S. Tsirkin
  2011-09-02  7:48         ` [RFC] pci: Rework config space blocking services Jan Kiszka
  1 sibling, 1 reply; 35+ messages in thread
From: Brian King @ 2011-08-30 16:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jan Kiszka, Jesse Barnes, Brian King, James E.J. Bottomley,
	Hans J. Koch, Greg Kroah-Hartman, linux-pci, linux-kernel,
	linux-scsi, kvm

On 08/29/2011 02:18 PM, Michael S. Tsirkin wrote:
> On Mon, Aug 29, 2011 at 08:47:07PM +0200, Jan Kiszka wrote:
>> On 2011-08-29 17:42, Jan Kiszka wrote:
>>> I still don't get what prevents converting ipr to allow plain mutex
>>> synchronization. My vision is:
>>>  - push reset-on-error of ipr into workqueue (or threaded IRQ?)
>>
>> I'm starting to like your proposal: I had a look at ipr, but it turned
>> out to be anything but trivial to convert that driver. It runs its
>> complete state machine under spin_lock_irq, and the functions calling
>> pci_block/unblock_user_cfg_access are deep inside this thing. I have no
>> hardware to test whatever change, and I feel a bit uncomfortable asking
>> Brian to redesign his driver that massively.
>>
>> So back to your idea: I would generalize pci_block_user_cfg_access to
>> pci_block_cfg_access. It should fail when some other site already holds
>> the access lock, but it should remain non-blocking - for the sake of ipr.
> 
> It would be easy to have blocking and non-blocking variants.
> 
> But
> - I have no idea whether supporting sysfs config/reset access
>   while ipr is active makes any sense - I know we need it for uio.

I really don't think it makes sense. Ideally, I really think the driver
should be able to override the PCI layer reset interface in sysfs. If a driver
is loaded, the driver owns all the state of that device and resetting it without
informing the driver is just nasty. Additionally, many devices may have
much more complex logic to performing a reset than what PCI defines. With
ipr, for example, it needs to get a shutdown command issued to it prior to the
reset if at all possible so that the firmware quiesces any I/O it is performing.
It also needs additional communication prior to resetting the chip to ensure
the firmware is not modifying its persistent error log on the adapter's flash,
since resetting the card while the flash segment is being updated will cause
the adapter to lose the persistent error log. Post reset, ipr has a bunch
of work to do to get the firmware back up and running to a state where it
can handle I/O again.

Different ipr chips also have different requirements as to what
reset mechanisms defined by PCI actually work. Some chips require BIST to be
run via PCI config space, while others require a PCI warm reset, otherwise
the card ends up in an unusable state.

So, here is my proposal to resolve this particular issue. Add a reset function
to the pci_driver struct which would allow drivers to override the default reset
action. Drivers that can tolerate the existing reset mechanism can simply point
to a generic PCI function to perform the reset. Drivers which can't handle their
device getting reset, will simply not have a reset function defined. In this case,
anyone attempting to issue a reset via sysfs will get an error. If a driver
is not loaded, then we can perform the default reset method and fix any device
specific oddities with quirks.

I like keeping pci_block_user_cfg_access a non blocking interface. If it can
return a failure due to some other caller, it should be easy enough to modify
the ipr driver to wait for access to get unblocked before resetting the adapter.

Thanks,

Brian

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-30 16:30         ` Brian King
@ 2011-08-30 18:01           ` Michael S. Tsirkin
  2011-08-30 19:41             ` Brian King
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2011-08-30 18:01 UTC (permalink / raw)
  To: Brian King
  Cc: Jan Kiszka, Jesse Barnes, Brian King, James E.J. Bottomley,
	Hans J. Koch, Greg Kroah-Hartman, linux-pci, linux-kernel,
	linux-scsi, kvm

On Tue, Aug 30, 2011 at 11:30:35AM -0500, Brian King wrote:
> On 08/29/2011 02:18 PM, Michael S. Tsirkin wrote:
> > On Mon, Aug 29, 2011 at 08:47:07PM +0200, Jan Kiszka wrote:
> >> On 2011-08-29 17:42, Jan Kiszka wrote:
> >>> I still don't get what prevents converting ipr to allow plain mutex
> >>> synchronization. My vision is:
> >>>  - push reset-on-error of ipr into workqueue (or threaded IRQ?)
> >>
> >> I'm starting to like your proposal: I had a look at ipr, but it turned
> >> out to be anything but trivial to convert that driver. It runs its
> >> complete state machine under spin_lock_irq, and the functions calling
> >> pci_block/unblock_user_cfg_access are deep inside this thing. I have no
> >> hardware to test whatever change, and I feel a bit uncomfortable asking
> >> Brian to redesign his driver that massively.
> >>
> >> So back to your idea: I would generalize pci_block_user_cfg_access to
> >> pci_block_cfg_access. It should fail when some other site already holds
> >> the access lock, but it should remain non-blocking - for the sake of ipr.
> > 
> > It would be easy to have blocking and non-blocking variants.
> > 
> > But
> > - I have no idea whether supporting sysfs config/reset access
> >   while ipr is active makes any sense - I know we need it for uio.
> 
> I really don't think it makes sense. Ideally, I really think the driver
> should be able to override the PCI layer reset interface in sysfs.

Well, it's always possible for root to do silly things
like reset the device from userspace using sysfs config
access. So I don't really see this blocking as necessary:
broken application with access to a physical device will lead
to problems.


> If a driver
> is loaded, the driver owns all the state of that device and resetting it without
> informing the driver is just nasty. Additionally, many devices may have
> much more complex logic to performing a reset than what PCI defines. With
> ipr, for example, it needs to get a shutdown command issued to it prior to the
> reset if at all possible so that the firmware quiesces any I/O it is performing.
> It also needs additional communication prior to resetting the chip to ensure
> the firmware is not modifying its persistent error log on the adapter's flash,
> since resetting the card while the flash segment is being updated will cause
> the adapter to lose the persistent error log. Post reset, ipr has a bunch
> of work to do to get the firmware back up and running to a state where it
> can handle I/O again.
> 
> Different ipr chips also have different requirements as to what
> reset mechanisms defined by PCI actually work. Some chips require BIST to be
> run via PCI config space, while others require a PCI warm reset, otherwise
> the card ends up in an unusable state.
> 
> So, here is my proposal to resolve this particular issue.

As userspace has no place touching the device while
ipr is active, there's no issue with ipr at all, is there?


> Add a reset function
> to the pci_driver struct which would allow drivers to override the default reset
> action. Drivers that can tolerate the existing reset mechanism can simply point
> to a generic PCI function to perform the reset. Drivers which can't handle their
> device getting reset, will simply not have a reset function defined. In this case,
> anyone attempting to issue a reset via sysfs will get an error. If a driver
> is not loaded, then we can perform the default reset method and fix any device
> specific oddities with quirks.
> 
> I like keeping pci_block_user_cfg_access a non blocking interface. If it can
> return a failure due to some other caller, it should be easy enough to modify
> the ipr driver to wait for access to get unblocked before resetting the adapter.
> 
> Thanks,
> 
> Brian

There shouldn't be other callers though, should there?
So it might be enough to fail gracefully (to make debugging
easier) rather than retry.

> -- 
> Brian King
> Linux on Power Virtualization
> IBM Linux Technology Center
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Broken pci_block_user_cfg_access interface
  2011-08-30 18:01           ` Michael S. Tsirkin
@ 2011-08-30 19:41             ` Brian King
  0 siblings, 0 replies; 35+ messages in thread
From: Brian King @ 2011-08-30 19:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jan Kiszka, Jesse Barnes, Brian King, James E.J. Bottomley,
	Hans J. Koch, Greg Kroah-Hartman, linux-pci, linux-kernel,
	linux-scsi, kvm

On 08/30/2011 01:01 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 30, 2011 at 11:30:35AM -0500, Brian King wrote:
>> On 08/29/2011 02:18 PM, Michael S. Tsirkin wrote:
>>> On Mon, Aug 29, 2011 at 08:47:07PM +0200, Jan Kiszka wrote:
>>>> On 2011-08-29 17:42, Jan Kiszka wrote:
>>>>> I still don't get what prevents converting ipr to allow plain mutex
>>>>> synchronization. My vision is:
>>>>>  - push reset-on-error of ipr into workqueue (or threaded IRQ?)
>>>>
>>>> I'm starting to like your proposal: I had a look at ipr, but it turned
>>>> out to be anything but trivial to convert that driver. It runs its
>>>> complete state machine under spin_lock_irq, and the functions calling
>>>> pci_block/unblock_user_cfg_access are deep inside this thing. I have no
>>>> hardware to test whatever change, and I feel a bit uncomfortable asking
>>>> Brian to redesign his driver that massively.
>>>>
>>>> So back to your idea: I would generalize pci_block_user_cfg_access to
>>>> pci_block_cfg_access. It should fail when some other site already holds
>>>> the access lock, but it should remain non-blocking - for the sake of ipr.
>>>
>>> It would be easy to have blocking and non-blocking variants.
>>>
>>> But
>>> - I have no idea whether supporting sysfs config/reset access
>>>   while ipr is active makes any sense - I know we need it for uio.
>>
>> I really don't think it makes sense. Ideally, I really think the driver
>> should be able to override the PCI layer reset interface in sysfs.
> 
> Well, it's always possible for root to do silly things
> like reset the device from userspace using sysfs config
> access. So I don't really see this blocking as necessary:
> broken application with access to a physical device will lead
> to problems.

I agree that it is probably not necessary for the current use of the sysfs API,
but if there was interest in expanding it to be usable for other purposes,
we might need to do something like what I am proposing. However, it doesn't
appear that anyone has any immediate need for doing so.

> 
> 
>> If a driver
>> is loaded, the driver owns all the state of that device and resetting it without
>> informing the driver is just nasty. Additionally, many devices may have
>> much more complex logic to performing a reset than what PCI defines. With
>> ipr, for example, it needs to get a shutdown command issued to it prior to the
>> reset if at all possible so that the firmware quiesces any I/O it is performing.
>> It also needs additional communication prior to resetting the chip to ensure
>> the firmware is not modifying its persistent error log on the adapter's flash,
>> since resetting the card while the flash segment is being updated will cause
>> the adapter to lose the persistent error log. Post reset, ipr has a bunch
>> of work to do to get the firmware back up and running to a state where it
>> can handle I/O again.
>>
>> Different ipr chips also have different requirements as to what
>> reset mechanisms defined by PCI actually work. Some chips require BIST to be
>> run via PCI config space, while others require a PCI warm reset, otherwise
>> the card ends up in an unusable state.
>>
>> So, here is my proposal to resolve this particular issue.
> 
> As userspace has no place touching the device while
> ipr is active, there's no issue with ipr at all, is there?

Correct.


>> Add a reset function
>> to the pci_driver struct which would allow drivers to override the default reset
>> action. Drivers that can tolerate the existing reset mechanism can simply point
>> to a generic PCI function to perform the reset. Drivers which can't handle their
>> device getting reset, will simply not have a reset function defined. In this case,
>> anyone attempting to issue a reset via sysfs will get an error. If a driver
>> is not loaded, then we can perform the default reset method and fix any device
>> specific oddities with quirks.
>>
>> I like keeping pci_block_user_cfg_access a non blocking interface. If it can
>> return a failure due to some other caller, it should be easy enough to modify
>> the ipr driver to wait for access to get unblocked before resetting the adapter.
>>
>> Thanks,
>>
>> Brian
> 
> There shouldn't be other callers though, should there?
> So it might be enough to fail gracefully (to make debugging
> easier) rather than retry.

Actually, its probably not worth the complexity to change much of anything in the
ipr driver. With the current users, ipr should be the only caller. If, for some
reason, we were double blocked due to ipr, the ipr driver already handles that today
just fine and having the second block essentially be a noop is fine, since we will
only get a single unblock due to ipr's reset state machine. If another caller is
added in the future, then we will need to make potential changes, but without knowing
the use case, its not clear to me what the proper action would be anyway.

Thanks,

Brian

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC] pci: Rework config space blocking services
  2011-08-29 19:18       ` Michael S. Tsirkin
  2011-08-30 16:30         ` Brian King
@ 2011-09-02  7:48         ` Jan Kiszka
  2011-09-06  7:00           ` Michael S. Tsirkin
  2011-09-07 13:46           ` Brian King
  1 sibling, 2 replies; 35+ messages in thread
From: Jan Kiszka @ 2011-09-02  7:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley, Hans J. Koch,
	Greg Kroah-Hartman, linux-pci, linux-kernel, linux-scsi, kvm

On 2011-08-29 21:18, Michael S. Tsirkin wrote:
> On Mon, Aug 29, 2011 at 08:47:07PM +0200, Jan Kiszka wrote:
>> On 2011-08-29 17:42, Jan Kiszka wrote:
>>> I still don't get what prevents converting ipr to allow plain mutex
>>> synchronization. My vision is:
>>>  - push reset-on-error of ipr into workqueue (or threaded IRQ?)
>>
>> I'm starting to like your proposal: I had a look at ipr, but it turned
>> out to be anything but trivial to convert that driver. It runs its
>> complete state machine under spin_lock_irq, and the functions calling
>> pci_block/unblock_user_cfg_access are deep inside this thing. I have no
>> hardware to test whatever change, and I feel a bit uncomfortable asking
>> Brian to redesign his driver that massively.
>>
>> So back to your idea: I would generalize pci_block_user_cfg_access to
>> pci_block_cfg_access. It should fail when some other site already holds
>> the access lock, but it should remain non-blocking - for the sake of ipr.
> 
> It would be easy to have blocking and non-blocking variants.
> 
> But
> - I have no idea whether supporting sysfs config/reset access
>   while ipr is active makes any sense - I know we need it for uio.
> - reset while uio handles interrupt needs to block, not fail I think
> 

Here is a preview following those ideas. I'll look into generic INTx
masking services now and, if that works out and no concerns are raised,
I'll post it all.

Jan

-----8<-----

pci_block_user_cfg_access was designed for the use case that a single
context, the IPR driver, temporarily delays user space accesses to the
config space via sysfs. This assumption became invalid by the time
pci_dev_reset was added as locking instance. Today, if you run two loops
in parallel that reset the same device via sysfs, you end up with a
kernel BUG as pci_block_user_cfg_access detect the broken assumption.

This reworks the pci_block_user_cfg_access to a sleeping service
pci_block_cfg_access and an atomic variant called
pci_block_cfg_access_in_atomic. The former not only blocks user space
access as before but also waits if access was already blocked. The
latter service just returns an error code in this case, allowing the
caller to resolve the conflict instead of raising a BUG.

---
 drivers/pci/access.c          |   76 +++++++++++++++++++++++++++--------------
 drivers/pci/iov.c             |   12 +++---
 drivers/pci/pci.c             |    4 +-
 drivers/scsi/ipr.c            |   24 +++++++++----
 drivers/uio/uio_pci_generic.c |   10 +++--
 include/linux/pci.h           |   14 +++++---
 6 files changed, 89 insertions(+), 51 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index fdaa42a..640522a 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -127,20 +127,20 @@ EXPORT_SYMBOL(pci_write_vpd);
  * We have a bit per device to indicate it's blocked and a global wait queue
  * for callers to sleep on until devices are unblocked.
  */
-static DECLARE_WAIT_QUEUE_HEAD(pci_ucfg_wait);
+static DECLARE_WAIT_QUEUE_HEAD(pci_cfg_wait);
 
-static noinline void pci_wait_ucfg(struct pci_dev *dev)
+static noinline void pci_wait_cfg(struct pci_dev *dev)
 {
 	DECLARE_WAITQUEUE(wait, current);
 
-	__add_wait_queue(&pci_ucfg_wait, &wait);
+	__add_wait_queue(&pci_cfg_wait, &wait);
 	do {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		raw_spin_unlock_irq(&pci_lock);
 		schedule();
 		raw_spin_lock_irq(&pci_lock);
-	} while (dev->block_ucfg_access);
-	__remove_wait_queue(&pci_ucfg_wait, &wait);
+	} while (dev->block_cfg_access);
+	__remove_wait_queue(&pci_cfg_wait, &wait);
 }
 
 /* Returns 0 on success, negative values indicate error. */
@@ -153,7 +153,8 @@ int pci_user_read_config_##size						\
 	if (PCI_##size##_BAD)						\
 		return -EINVAL;						\
 	raw_spin_lock_irq(&pci_lock);				\
-	if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev);	\
+	if (unlikely(dev->block_cfg_access))				\
+		pci_wait_cfg(dev);					\
 	ret = dev->bus->ops->read(dev->bus, dev->devfn,			\
 					pos, sizeof(type), &data);	\
 	raw_spin_unlock_irq(&pci_lock);				\
@@ -172,7 +173,8 @@ int pci_user_write_config_##size					\
 	if (PCI_##size##_BAD)						\
 		return -EINVAL;						\
 	raw_spin_lock_irq(&pci_lock);				\
-	if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev);	\
+	if (unlikely(dev->block_cfg_access))				\
+		pci_wait_cfg(dev);					\
 	ret = dev->bus->ops->write(dev->bus, dev->devfn,		\
 					pos, sizeof(type), val);	\
 	raw_spin_unlock_irq(&pci_lock);				\
@@ -401,36 +403,58 @@ int pci_vpd_truncate(struct pci_dev *dev, size_t size)
 EXPORT_SYMBOL(pci_vpd_truncate);
 
 /**
- * pci_block_user_cfg_access - Block userspace PCI config reads/writes
+ * pci_block_cfg_access - Block PCI config reads/writes
  * @dev:	pci device struct
  *
- * When user access is blocked, any reads or writes to config space will
- * sleep until access is unblocked again.  We don't allow nesting of
- * block/unblock calls.
+ * When access is blocked, any userspace reads or writes to config space
+ * and concurrent block requests will sleep until
+ * access is unblocked again.
  */
-void pci_block_user_cfg_access(struct pci_dev *dev)
+void pci_block_cfg_access(struct pci_dev *dev)
 {
 	unsigned long flags;
-	int was_blocked;
+
+	might_sleep();
+
+	raw_spin_lock_irqsave(&pci_lock, flags);
+	if (dev->block_cfg_access)
+		pci_wait_cfg(dev);
+	dev->block_cfg_access = 1;
+	raw_spin_unlock_irqrestore(&pci_lock, flags);
+}
+EXPORT_SYMBOL_GPL(pci_block_cfg_access);
+
+/**
+ * pci_block_cfg_access_in_atomic - Block PCI config reads/writes from atomic
+ *                                  context
+ * @dev:	pci device struct
+ *
+ * Same as pci_block_cfg_access, but will fail with -EBUSY if access is
+ * already blocked.
+ */
+int pci_block_cfg_access_in_atomic(struct pci_dev *dev)
+{
+	unsigned long flags;
+	int err = 0;
 
 	raw_spin_lock_irqsave(&pci_lock, flags);
-	was_blocked = dev->block_ucfg_access;
-	dev->block_ucfg_access = 1;
+	if (dev->block_cfg_access)
+		err = -EBUSY;
+	else
+		dev->block_cfg_access = 1;
 	raw_spin_unlock_irqrestore(&pci_lock, flags);
 
-	/* If we BUG() inside the pci_lock, we're guaranteed to hose
-	 * the machine */
-	BUG_ON(was_blocked);
+	return err;
 }
-EXPORT_SYMBOL_GPL(pci_block_user_cfg_access);
+EXPORT_SYMBOL_GPL(pci_block_cfg_access_in_atomic);
 
 /**
- * pci_unblock_user_cfg_access - Unblock userspace PCI config reads/writes
+ * pci_unblock_cfg_access - Unblock PCI config reads/writes
  * @dev:	pci device struct
  *
- * This function allows userspace PCI config accesses to resume.
+ * This function allows PCI config accesses to resume.
  */
-void pci_unblock_user_cfg_access(struct pci_dev *dev)
+void pci_unblock_cfg_access(struct pci_dev *dev)
 {
 	unsigned long flags;
 
@@ -438,10 +462,10 @@ void pci_unblock_user_cfg_access(struct pci_dev *dev)
 
 	/* This indicates a problem in the caller, but we don't need
 	 * to kill them, unlike a double-block above. */
-	WARN_ON(!dev->block_ucfg_access);
+	WARN_ON(!dev->block_cfg_access);
 
-	dev->block_ucfg_access = 0;
-	wake_up_all(&pci_ucfg_wait);
+	dev->block_cfg_access = 0;
+	wake_up_all(&pci_cfg_wait);
 	raw_spin_unlock_irqrestore(&pci_lock, flags);
 }
-EXPORT_SYMBOL_GPL(pci_unblock_user_cfg_access);
+EXPORT_SYMBOL_GPL(pci_unblock_cfg_access);
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 42fae47..a061847 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -340,10 +340,10 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 	}
 
 	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
-	pci_block_user_cfg_access(dev);
+	pci_block_cfg_access(dev);
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
 	msleep(100);
-	pci_unblock_user_cfg_access(dev);
+	pci_unblock_cfg_access(dev);
 
 	iov->initial = initial;
 	if (nr_virtfn < initial)
@@ -371,10 +371,10 @@ failed:
 		virtfn_remove(dev, j, 0);
 
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
-	pci_block_user_cfg_access(dev);
+	pci_block_cfg_access(dev);
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
 	ssleep(1);
-	pci_unblock_user_cfg_access(dev);
+	pci_unblock_cfg_access(dev);
 
 	if (iov->link != dev->devfn)
 		sysfs_remove_link(&dev->dev.kobj, "dep_link");
@@ -397,10 +397,10 @@ static void sriov_disable(struct pci_dev *dev)
 		virtfn_remove(dev, i, 0);
 
 	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
-	pci_block_user_cfg_access(dev);
+	pci_block_cfg_access(dev);
 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
 	ssleep(1);
-	pci_unblock_user_cfg_access(dev);
+	pci_unblock_cfg_access(dev);
 
 	if (iov->link != dev->devfn)
 		sysfs_remove_link(&dev->dev.kobj, "dep_link");
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0ce6742..3c6ce05 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2959,7 +2959,7 @@ static int pci_dev_reset(struct pci_dev *dev, int probe)
 	might_sleep();
 
 	if (!probe) {
-		pci_block_user_cfg_access(dev);
+		pci_block_cfg_access(dev);
 		/* block PM suspend, driver probe, etc. */
 		device_lock(&dev->dev);
 	}
@@ -2984,7 +2984,7 @@ static int pci_dev_reset(struct pci_dev *dev, int probe)
 done:
 	if (!probe) {
 		device_unlock(&dev->dev);
-		pci_unblock_user_cfg_access(dev);
+		pci_unblock_cfg_access(dev);
 	}
 
 	return rc;
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 8d63630..7fc3861 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -7640,7 +7640,7 @@ static int ipr_reset_restore_cfg_space(struct ipr_cmnd *ipr_cmd)
 static int ipr_reset_bist_done(struct ipr_cmnd *ipr_cmd)
 {
 	ENTER;
-	pci_unblock_user_cfg_access(ipr_cmd->ioa_cfg->pdev);
+	pci_unblock_cfg_access(ipr_cmd->ioa_cfg->pdev);
 	ipr_cmd->job_step = ipr_reset_restore_cfg_space;
 	LEAVE;
 	return IPR_RC_JOB_CONTINUE;
@@ -7661,7 +7661,8 @@ static int ipr_reset_start_bist(struct ipr_cmnd *ipr_cmd)
 	int rc = PCIBIOS_SUCCESSFUL;
 
 	ENTER;
-	pci_block_user_cfg_access(ioa_cfg->pdev);
+	if (pci_block_cfg_access_in_atomic(ioa_cfg->pdev) != 0)
+		goto error;
 
 	if (ioa_cfg->ipr_chip->bist_method == IPR_MMIO)
 		writel(IPR_UPROCI_SIS64_START_BIST,
@@ -7674,7 +7675,8 @@ static int ipr_reset_start_bist(struct ipr_cmnd *ipr_cmd)
 		ipr_reset_start_timer(ipr_cmd, IPR_WAIT_FOR_BIST_TIMEOUT);
 		rc = IPR_RC_JOB_RETURN;
 	} else {
-		pci_unblock_user_cfg_access(ipr_cmd->ioa_cfg->pdev);
+		pci_unblock_cfg_access(ipr_cmd->ioa_cfg->pdev);
+error:
 		ipr_cmd->s.ioasa.hdr.ioasc = cpu_to_be32(IPR_IOASC_PCI_ACCESS_ERROR);
 		rc = IPR_RC_JOB_CONTINUE;
 	}
@@ -7715,14 +7717,20 @@ static int ipr_reset_slot_reset(struct ipr_cmnd *ipr_cmd)
 {
 	struct ipr_ioa_cfg *ioa_cfg = ipr_cmd->ioa_cfg;
 	struct pci_dev *pdev = ioa_cfg->pdev;
+	int rc = IPR_RC_JOB_RETURN;
 
 	ENTER;
-	pci_block_user_cfg_access(pdev);
-	pci_set_pcie_reset_state(pdev, pcie_warm_reset);
-	ipr_cmd->job_step = ipr_reset_slot_reset_done;
-	ipr_reset_start_timer(ipr_cmd, IPR_PCI_RESET_TIMEOUT);
+	if (pci_block_cfg_access_in_atomic(pdev) == 0) {
+		pci_set_pcie_reset_state(pdev, pcie_warm_reset);
+		ipr_cmd->job_step = ipr_reset_slot_reset_done;
+		ipr_reset_start_timer(ipr_cmd, IPR_PCI_RESET_TIMEOUT);
+	} else {
+		ipr_cmd->s.ioasa.hdr.ioasc =
+			cpu_to_be32(IPR_IOASC_PCI_ACCESS_ERROR);
+		rc = IPR_RC_JOB_CONTINUE;
+	}
 	LEAVE;
-	return IPR_RC_JOB_RETURN;
+	return rc;
 }
 
 /**
diff --git a/drivers/uio/uio_pci_generic.c b/drivers/uio/uio_pci_generic.c
index fc22e1e..642f7fa 100644
--- a/drivers/uio/uio_pci_generic.c
+++ b/drivers/uio/uio_pci_generic.c
@@ -58,7 +58,8 @@ static irqreturn_t irqhandler(int irq, struct uio_info *info)
 	BUILD_BUG_ON(PCI_COMMAND + 2 != PCI_STATUS);
 
 	spin_lock_irq(&gdev->lock);
-	pci_block_user_cfg_access(pdev);
+	if (pci_block_cfg_access_in_atomic(pdev) != 0)
+		goto error;
 
 	/* Read both command and status registers in a single 32-bit operation.
 	 * Note: we could cache the value for command and move the status read
@@ -82,7 +83,8 @@ static irqreturn_t irqhandler(int irq, struct uio_info *info)
 	ret = IRQ_HANDLED;
 done:
 
-	pci_unblock_user_cfg_access(pdev);
+	pci_unblock_cfg_access(pdev);
+error:
 	spin_unlock_irq(&gdev->lock);
 	return ret;
 }
@@ -95,7 +97,7 @@ static int __devinit verify_pci_2_3(struct pci_dev *pdev)
 	u16 orig, new;
 	int err = 0;
 
-	pci_block_user_cfg_access(pdev);
+	pci_block_cfg_access(pdev);
 	pci_read_config_word(pdev, PCI_COMMAND, &orig);
 	pci_write_config_word(pdev, PCI_COMMAND,
 			      orig ^ PCI_COMMAND_INTX_DISABLE);
@@ -118,7 +120,7 @@ static int __devinit verify_pci_2_3(struct pci_dev *pdev)
 	/* Now restore the original value. */
 	pci_write_config_word(pdev, PCI_COMMAND, orig);
 err:
-	pci_unblock_user_cfg_access(pdev);
+	pci_unblock_cfg_access(pdev);
 	return err;
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8c230cb..3414b74 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -305,7 +305,7 @@ struct pci_dev {
 	unsigned int	is_added:1;
 	unsigned int	is_busmaster:1; /* device is busmaster */
 	unsigned int	no_msi:1;	/* device may not use msi */
-	unsigned int	block_ucfg_access:1;	/* userspace config space access is blocked */
+	unsigned int	block_cfg_access:1;	/* config space access is blocked */
 	unsigned int	broken_parity_status:1;	/* Device generates false positive parity */
 	unsigned int	irq_reroute_variant:2;	/* device needs IRQ rerouting variant */
 	unsigned int 	msi_enabled:1;
@@ -1079,8 +1079,9 @@ int  ht_create_irq(struct pci_dev *dev, int idx);
 void ht_destroy_irq(unsigned int irq);
 #endif /* CONFIG_HT_IRQ */
 
-extern void pci_block_user_cfg_access(struct pci_dev *dev);
-extern void pci_unblock_user_cfg_access(struct pci_dev *dev);
+extern void pci_block_cfg_access(struct pci_dev *dev);
+extern int pci_block_cfg_access_in_atomic(struct pci_dev *dev);
+extern void pci_unblock_cfg_access(struct pci_dev *dev);
 
 /*
  * PCI domain support.  Sometimes called PCI segment (eg by ACPI),
@@ -1277,10 +1278,13 @@ static inline void pci_release_regions(struct pci_dev *dev)
 
 #define pci_dma_burst_advice(pdev, strat, strategy_parameter) do { } while (0)
 
-static inline void pci_block_user_cfg_access(struct pci_dev *dev)
+static inline void pci_block_cfg_access(struct pci_dev *dev)
 { }
 
-static inline void pci_unblock_user_cfg_access(struct pci_dev *dev)
+static inline int pci_block_cfg_access_in_atomic(struct pci_dev *dev)
+{ return 0; }
+
+static inline void pci_unblock_cfg_access(struct pci_dev *dev)
 { }
 
 static inline struct pci_bus *pci_find_next_bus(const struct pci_bus *from)
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC] pci: Rework config space blocking services
  2011-09-02  7:48         ` [RFC] pci: Rework config space blocking services Jan Kiszka
@ 2011-09-06  7:00           ` Michael S. Tsirkin
  2011-09-06  7:18             ` Jan Kiszka
  2011-09-07 13:46           ` Brian King
  1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2011-09-06  7:00 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley, Hans J. Koch,
	Greg Kroah-Hartman, linux-pci, linux-kernel, linux-scsi, kvm

On Fri, Sep 02, 2011 at 09:48:33AM +0200, Jan Kiszka wrote:
> On 2011-08-29 21:18, Michael S. Tsirkin wrote:
> > On Mon, Aug 29, 2011 at 08:47:07PM +0200, Jan Kiszka wrote:
> >> On 2011-08-29 17:42, Jan Kiszka wrote:
> >>> I still don't get what prevents converting ipr to allow plain mutex
> >>> synchronization. My vision is:
> >>>  - push reset-on-error of ipr into workqueue (or threaded IRQ?)
> >>
> >> I'm starting to like your proposal: I had a look at ipr, but it turned
> >> out to be anything but trivial to convert that driver. It runs its
> >> complete state machine under spin_lock_irq, and the functions calling
> >> pci_block/unblock_user_cfg_access are deep inside this thing. I have no
> >> hardware to test whatever change, and I feel a bit uncomfortable asking
> >> Brian to redesign his driver that massively.
> >>
> >> So back to your idea: I would generalize pci_block_user_cfg_access to
> >> pci_block_cfg_access. It should fail when some other site already holds
> >> the access lock, but it should remain non-blocking - for the sake of ipr.
> > 
> > It would be easy to have blocking and non-blocking variants.
> > 
> > But
> > - I have no idea whether supporting sysfs config/reset access
> >   while ipr is active makes any sense - I know we need it for uio.
> > - reset while uio handles interrupt needs to block, not fail I think
> > 
> 
> Here is a preview following those ideas. I'll look into generic INTx
> masking services now and, if that works out and no concerns are raised,
> I'll post it all.
> 
> Jan

Hopefully as separate patches :)

No real concerns, some nitpicking comments below.

> -----8<-----
> 
> pci_block_user_cfg_access was designed for the use case that a single
> context, the IPR driver, temporarily delays user space accesses to the
> config space via sysfs. This assumption became invalid by the time
> pci_dev_reset was added as locking instance. Today, if you run two loops
> in parallel that reset the same device via sysfs, you end up with a
> kernel BUG as pci_block_user_cfg_access detect the broken assumption.
> 
> This reworks the pci_block_user_cfg_access to a sleeping service
> pci_block_cfg_access and an atomic variant called
> pci_block_cfg_access_in_atomic. The former not only blocks user space
> access as before but also waits if access was already blocked. The
> latter service just returns an error code in this case, allowing the
> caller to resolve the conflict instead of raising a BUG.

I'm not sure I understand the point of the API renaming -
the new names seem less clear than the original, to me.
Regular config access isn't blocked by this API - it still only
blocks user config accesses, we simply allow
multiple block calls in parallel now.

If we keep the old name, simply allow blocking
and add an atomic variant, the patch will be much smaller.


> 
> ---
>  drivers/pci/access.c          |   76 +++++++++++++++++++++++++++--------------
>  drivers/pci/iov.c             |   12 +++---
>  drivers/pci/pci.c             |    4 +-
>  drivers/scsi/ipr.c            |   24 +++++++++----
>  drivers/uio/uio_pci_generic.c |   10 +++--
>  include/linux/pci.h           |   14 +++++---
>  6 files changed, 89 insertions(+), 51 deletions(-)

Below might be easier to review if it is split in two:
1. rename ucfg to cfg all over, tweak whitespace
2. allow multiple block calls, add in_atomic and update
   in_atomic callers

> 
> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
> index fdaa42a..640522a 100644
> --- a/drivers/pci/access.c
> +++ b/drivers/pci/access.c
> @@ -127,20 +127,20 @@ EXPORT_SYMBOL(pci_write_vpd);
>   * We have a bit per device to indicate it's blocked and a global wait queue
>   * for callers to sleep on until devices are unblocked.
>   */
> -static DECLARE_WAIT_QUEUE_HEAD(pci_ucfg_wait);
> +static DECLARE_WAIT_QUEUE_HEAD(pci_cfg_wait);
>  
> -static noinline void pci_wait_ucfg(struct pci_dev *dev)
> +static noinline void pci_wait_cfg(struct pci_dev *dev)
>  {
>  	DECLARE_WAITQUEUE(wait, current);
>  
> -	__add_wait_queue(&pci_ucfg_wait, &wait);
> +	__add_wait_queue(&pci_cfg_wait, &wait);
>  	do {
>  		set_current_state(TASK_UNINTERRUPTIBLE);
>  		raw_spin_unlock_irq(&pci_lock);
>  		schedule();
>  		raw_spin_lock_irq(&pci_lock);
> -	} while (dev->block_ucfg_access);
> -	__remove_wait_queue(&pci_ucfg_wait, &wait);
> +	} while (dev->block_cfg_access);
> +	__remove_wait_queue(&pci_cfg_wait, &wait);
>  }
>  
>  /* Returns 0 on success, negative values indicate error. */
> @@ -153,7 +153,8 @@ int pci_user_read_config_##size						\
>  	if (PCI_##size##_BAD)						\
>  		return -EINVAL;						\
>  	raw_spin_lock_irq(&pci_lock);				\
> -	if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev);	\
> +	if (unlikely(dev->block_cfg_access))				\
> +		pci_wait_cfg(dev);					\
>  	ret = dev->bus->ops->read(dev->bus, dev->devfn,			\
>  					pos, sizeof(type), &data);	\
>  	raw_spin_unlock_irq(&pci_lock);				\
> @@ -172,7 +173,8 @@ int pci_user_write_config_##size					\
>  	if (PCI_##size##_BAD)						\
>  		return -EINVAL;						\
>  	raw_spin_lock_irq(&pci_lock);				\
> -	if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev);	\
> +	if (unlikely(dev->block_cfg_access))				\
> +		pci_wait_cfg(dev);					\
>  	ret = dev->bus->ops->write(dev->bus, dev->devfn,		\
>  					pos, sizeof(type), val);	\
>  	raw_spin_unlock_irq(&pci_lock);				\
> @@ -401,36 +403,58 @@ int pci_vpd_truncate(struct pci_dev *dev, size_t size)
>  EXPORT_SYMBOL(pci_vpd_truncate);
>  
>  /**
> - * pci_block_user_cfg_access - Block userspace PCI config reads/writes
> + * pci_block_cfg_access - Block PCI config reads/writes

This comment seems confusing. We don't in fact block all config
reads writes. Instead we block userspace accesses and
concurrent block requests.

>   * @dev:	pci device struct
>   *
> - * When user access is blocked, any reads or writes to config space will
> - * sleep until access is unblocked again.  We don't allow nesting of
> - * block/unblock calls.
> + * When access is blocked, any userspace reads or writes to config space
> + * and concurrent block requests will sleep until
> + * access is unblocked again.
>   */
> -void pci_block_user_cfg_access(struct pci_dev *dev)
> +void pci_block_cfg_access(struct pci_dev *dev)
>  {
>  	unsigned long flags;
> -	int was_blocked;
> +
> +	might_sleep();
> +
> +	raw_spin_lock_irqsave(&pci_lock, flags);
> +	if (dev->block_cfg_access)
> +		pci_wait_cfg(dev);
> +	dev->block_cfg_access = 1;
> +	raw_spin_unlock_irqrestore(&pci_lock, flags);

Above can sleep so irq must be enabled, thus
it can be raw_spin_lock_irq, right?

> +}
> +EXPORT_SYMBOL_GPL(pci_block_cfg_access);
> +
> +/**
> + * pci_block_cfg_access_in_atomic - Block PCI config reads/writes from atomic
> + *                                  context
> + * @dev:	pci device struct
> + *
> + * Same as pci_block_cfg_access, but will fail with -EBUSY if access is
> + * already blocked.

Mention return value on success? Callers seem to rely on it being 0.

> + */
> +int pci_block_cfg_access_in_atomic(struct pci_dev *dev)


I would rename this pci_block_cfg_access_atomic: tell
the user what it does, not where it should be called
from. Compare with kmap_atomic, which fails if it needs to block.

> +{
> +	unsigned long flags;
> +	int err = 0;
>  
>  	raw_spin_lock_irqsave(&pci_lock, flags);
> -	was_blocked = dev->block_ucfg_access;
> -	dev->block_ucfg_access = 1;
> +	if (dev->block_cfg_access)
> +		err = -EBUSY;
> +	else
> +		dev->block_cfg_access = 1;
>  	raw_spin_unlock_irqrestore(&pci_lock, flags);
>  
> -	/* If we BUG() inside the pci_lock, we're guaranteed to hose
> -	 * the machine */
> -	BUG_ON(was_blocked);
> +	return err;
>  }
> -EXPORT_SYMBOL_GPL(pci_block_user_cfg_access);
> +EXPORT_SYMBOL_GPL(pci_block_cfg_access_in_atomic);
>  
>  /**
> - * pci_unblock_user_cfg_access - Unblock userspace PCI config reads/writes
> + * pci_unblock_cfg_access - Unblock PCI config reads/writes
>   * @dev:	pci device struct
>   *
> - * This function allows userspace PCI config accesses to resume.
> + * This function allows PCI config accesses to resume.
>   */
> -void pci_unblock_user_cfg_access(struct pci_dev *dev)
> +void pci_unblock_cfg_access(struct pci_dev *dev)
>  {
>  	unsigned long flags;
>  
> @@ -438,10 +462,10 @@ void pci_unblock_user_cfg_access(struct pci_dev *dev)
>  
>  	/* This indicates a problem in the caller, but we don't need
>  	 * to kill them, unlike a double-block above. */
> -	WARN_ON(!dev->block_ucfg_access);
> +	WARN_ON(!dev->block_cfg_access);
>  
> -	dev->block_ucfg_access = 0;
> -	wake_up_all(&pci_ucfg_wait);
> +	dev->block_cfg_access = 0;
> +	wake_up_all(&pci_cfg_wait);
>  	raw_spin_unlock_irqrestore(&pci_lock, flags);
>  }
> -EXPORT_SYMBOL_GPL(pci_unblock_user_cfg_access);
> +EXPORT_SYMBOL_GPL(pci_unblock_cfg_access);
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 42fae47..a061847 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -340,10 +340,10 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>  	}
>  
>  	iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
> -	pci_block_user_cfg_access(dev);
> +	pci_block_cfg_access(dev);
>  	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
>  	msleep(100);
> -	pci_unblock_user_cfg_access(dev);
> +	pci_unblock_cfg_access(dev);
>  
>  	iov->initial = initial;
>  	if (nr_virtfn < initial)
> @@ -371,10 +371,10 @@ failed:
>  		virtfn_remove(dev, j, 0);
>  
>  	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
> -	pci_block_user_cfg_access(dev);
> +	pci_block_cfg_access(dev);
>  	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
>  	ssleep(1);
> -	pci_unblock_user_cfg_access(dev);
> +	pci_unblock_cfg_access(dev);
>  
>  	if (iov->link != dev->devfn)
>  		sysfs_remove_link(&dev->dev.kobj, "dep_link");
> @@ -397,10 +397,10 @@ static void sriov_disable(struct pci_dev *dev)
>  		virtfn_remove(dev, i, 0);
>  
>  	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
> -	pci_block_user_cfg_access(dev);
> +	pci_block_cfg_access(dev);
>  	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
>  	ssleep(1);
> -	pci_unblock_user_cfg_access(dev);
> +	pci_unblock_cfg_access(dev);
>  
>  	if (iov->link != dev->devfn)
>  		sysfs_remove_link(&dev->dev.kobj, "dep_link");
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 0ce6742..3c6ce05 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2959,7 +2959,7 @@ static int pci_dev_reset(struct pci_dev *dev, int probe)
>  	might_sleep();
>  
>  	if (!probe) {
> -		pci_block_user_cfg_access(dev);
> +		pci_block_cfg_access(dev);
>  		/* block PM suspend, driver probe, etc. */
>  		device_lock(&dev->dev);
>  	}
> @@ -2984,7 +2984,7 @@ static int pci_dev_reset(struct pci_dev *dev, int probe)
>  done:
>  	if (!probe) {
>  		device_unlock(&dev->dev);
> -		pci_unblock_user_cfg_access(dev);
> +		pci_unblock_cfg_access(dev);
>  	}
>  
>  	return rc;
> diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
> index 8d63630..7fc3861 100644
> --- a/drivers/scsi/ipr.c
> +++ b/drivers/scsi/ipr.c
> @@ -7640,7 +7640,7 @@ static int ipr_reset_restore_cfg_space(struct ipr_cmnd *ipr_cmd)
>  static int ipr_reset_bist_done(struct ipr_cmnd *ipr_cmd)
>  {
>  	ENTER;
> -	pci_unblock_user_cfg_access(ipr_cmd->ioa_cfg->pdev);
> +	pci_unblock_cfg_access(ipr_cmd->ioa_cfg->pdev);
>  	ipr_cmd->job_step = ipr_reset_restore_cfg_space;
>  	LEAVE;
>  	return IPR_RC_JOB_CONTINUE;
> @@ -7661,7 +7661,8 @@ static int ipr_reset_start_bist(struct ipr_cmnd *ipr_cmd)
>  	int rc = PCIBIOS_SUCCESSFUL;
>  
>  	ENTER;
> -	pci_block_user_cfg_access(ioa_cfg->pdev);
> +	if (pci_block_cfg_access_in_atomic(ioa_cfg->pdev) != 0)
> +		goto error;

if (pci_block_cfg_access_in_atomic(ioa_cfg->pdev))
to be consistent in style with the rest of ipr.

>  
>  	if (ioa_cfg->ipr_chip->bist_method == IPR_MMIO)
>  		writel(IPR_UPROCI_SIS64_START_BIST,
> @@ -7674,7 +7675,8 @@ static int ipr_reset_start_bist(struct ipr_cmnd *ipr_cmd)
>  		ipr_reset_start_timer(ipr_cmd, IPR_WAIT_FOR_BIST_TIMEOUT);
>  		rc = IPR_RC_JOB_RETURN;
>  	} else {
> -		pci_unblock_user_cfg_access(ipr_cmd->ioa_cfg->pdev);
> +		pci_unblock_cfg_access(ipr_cmd->ioa_cfg->pdev);
> +error:
>  		ipr_cmd->s.ioasa.hdr.ioasc = cpu_to_be32(IPR_IOASC_PCI_ACCESS_ERROR);
>  		rc = IPR_RC_JOB_CONTINUE;
>  	}
> @@ -7715,14 +7717,20 @@ static int ipr_reset_slot_reset(struct ipr_cmnd *ipr_cmd)
>  {
>  	struct ipr_ioa_cfg *ioa_cfg = ipr_cmd->ioa_cfg;
>  	struct pci_dev *pdev = ioa_cfg->pdev;
> +	int rc = IPR_RC_JOB_RETURN;
>  
>  	ENTER;
> -	pci_block_user_cfg_access(pdev);
> -	pci_set_pcie_reset_state(pdev, pcie_warm_reset);
> -	ipr_cmd->job_step = ipr_reset_slot_reset_done;
> -	ipr_reset_start_timer(ipr_cmd, IPR_PCI_RESET_TIMEOUT);
> +	if (pci_block_cfg_access_in_atomic(pdev) == 0) {
> +		pci_set_pcie_reset_state(pdev, pcie_warm_reset);
> +		ipr_cmd->job_step = ipr_reset_slot_reset_done;
> +		ipr_reset_start_timer(ipr_cmd, IPR_PCI_RESET_TIMEOUT);
> +	} else {
> +		ipr_cmd->s.ioasa.hdr.ioasc =
> +			cpu_to_be32(IPR_IOASC_PCI_ACCESS_ERROR);
> +		rc = IPR_RC_JOB_CONTINUE;
> +	}

Above is unusual.
if (pci_block_cfg_access_in_atomic(pdev)) {
	...
	rc = IPR_RC_JOB_RETURN;
} else {
	...
	rc = IPR_RC_JOB_CONTINUE;
}

would be clearer.


>  	LEAVE;
> -	return IPR_RC_JOB_RETURN;
> +	return rc;
>  }
>  
>  /**
> diff --git a/drivers/uio/uio_pci_generic.c b/drivers/uio/uio_pci_generic.c
> index fc22e1e..642f7fa 100644
> --- a/drivers/uio/uio_pci_generic.c
> +++ b/drivers/uio/uio_pci_generic.c
> @@ -58,7 +58,8 @@ static irqreturn_t irqhandler(int irq, struct uio_info *info)
>  	BUILD_BUG_ON(PCI_COMMAND + 2 != PCI_STATUS);
>  
>  	spin_lock_irq(&gdev->lock);
> -	pci_block_user_cfg_access(pdev);
> +	if (pci_block_cfg_access_in_atomic(pdev) != 0)

if (pci_block_cfg_access_in_atomic(pdev))
would be more in style with the rest of uio.

> +		goto error;
>  
>  	/* Read both command and status registers in a single 32-bit operation.
>  	 * Note: we could cache the value for command and move the status read
> @@ -82,7 +83,8 @@ static irqreturn_t irqhandler(int irq, struct uio_info *info)
>  	ret = IRQ_HANDLED;
>  done:
>  
> -	pci_unblock_user_cfg_access(pdev);
> +	pci_unblock_cfg_access(pdev);
> +error:
>  	spin_unlock_irq(&gdev->lock);
>  	return ret;
>  }
> @@ -95,7 +97,7 @@ static int __devinit verify_pci_2_3(struct pci_dev *pdev)
>  	u16 orig, new;
>  	int err = 0;
>  
> -	pci_block_user_cfg_access(pdev);
> +	pci_block_cfg_access(pdev);
>  	pci_read_config_word(pdev, PCI_COMMAND, &orig);
>  	pci_write_config_word(pdev, PCI_COMMAND,
>  			      orig ^ PCI_COMMAND_INTX_DISABLE);
> @@ -118,7 +120,7 @@ static int __devinit verify_pci_2_3(struct pci_dev *pdev)
>  	/* Now restore the original value. */
>  	pci_write_config_word(pdev, PCI_COMMAND, orig);
>  err:
> -	pci_unblock_user_cfg_access(pdev);
> +	pci_unblock_cfg_access(pdev);
>  	return err;
>  }
>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 8c230cb..3414b74 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -305,7 +305,7 @@ struct pci_dev {
>  	unsigned int	is_added:1;
>  	unsigned int	is_busmaster:1; /* device is busmaster */
>  	unsigned int	no_msi:1;	/* device may not use msi */
> -	unsigned int	block_ucfg_access:1;	/* userspace config space access is blocked */
> +	unsigned int	block_cfg_access:1;	/* config space access is blocked */
>  	unsigned int	broken_parity_status:1;	/* Device generates false positive parity */
>  	unsigned int	irq_reroute_variant:2;	/* device needs IRQ rerouting variant */
>  	unsigned int 	msi_enabled:1;
> @@ -1079,8 +1079,9 @@ int  ht_create_irq(struct pci_dev *dev, int idx);
>  void ht_destroy_irq(unsigned int irq);
>  #endif /* CONFIG_HT_IRQ */
>  
> -extern void pci_block_user_cfg_access(struct pci_dev *dev);
> -extern void pci_unblock_user_cfg_access(struct pci_dev *dev);
> +extern void pci_block_cfg_access(struct pci_dev *dev);
> +extern int pci_block_cfg_access_in_atomic(struct pci_dev *dev);
> +extern void pci_unblock_cfg_access(struct pci_dev *dev);
>  
>  /*
>   * PCI domain support.  Sometimes called PCI segment (eg by ACPI),
> @@ -1277,10 +1278,13 @@ static inline void pci_release_regions(struct pci_dev *dev)
>  
>  #define pci_dma_burst_advice(pdev, strat, strategy_parameter) do { } while (0)
>  
> -static inline void pci_block_user_cfg_access(struct pci_dev *dev)
> +static inline void pci_block_cfg_access(struct pci_dev *dev)
>  { }
>  
> -static inline void pci_unblock_user_cfg_access(struct pci_dev *dev)
> +static inline int pci_block_cfg_access_in_atomic(struct pci_dev *dev)
> +{ return 0; }
> +
> +static inline void pci_unblock_cfg_access(struct pci_dev *dev)
>  { }
>  
>  static inline struct pci_bus *pci_find_next_bus(const struct pci_bus *from)
> -- 
> 1.7.3.4

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] pci: Rework config space blocking services
  2011-09-06  7:00           ` Michael S. Tsirkin
@ 2011-09-06  7:18             ` Jan Kiszka
  2011-09-06  8:04               ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2011-09-06  7:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley, Hans J. Koch,
	Greg Kroah-Hartman, linux-pci, linux-kernel, linux-scsi, kvm

On 2011-09-06 09:00, Michael S. Tsirkin wrote:
> On Fri, Sep 02, 2011 at 09:48:33AM +0200, Jan Kiszka wrote:
>> On 2011-08-29 21:18, Michael S. Tsirkin wrote:
>>> On Mon, Aug 29, 2011 at 08:47:07PM +0200, Jan Kiszka wrote:
>>>> On 2011-08-29 17:42, Jan Kiszka wrote:
>>>>> I still don't get what prevents converting ipr to allow plain mutex
>>>>> synchronization. My vision is:
>>>>>  - push reset-on-error of ipr into workqueue (or threaded IRQ?)
>>>>
>>>> I'm starting to like your proposal: I had a look at ipr, but it turned
>>>> out to be anything but trivial to convert that driver. It runs its
>>>> complete state machine under spin_lock_irq, and the functions calling
>>>> pci_block/unblock_user_cfg_access are deep inside this thing. I have no
>>>> hardware to test whatever change, and I feel a bit uncomfortable asking
>>>> Brian to redesign his driver that massively.
>>>>
>>>> So back to your idea: I would generalize pci_block_user_cfg_access to
>>>> pci_block_cfg_access. It should fail when some other site already holds
>>>> the access lock, but it should remain non-blocking - for the sake of ipr.
>>>
>>> It would be easy to have blocking and non-blocking variants.
>>>
>>> But
>>> - I have no idea whether supporting sysfs config/reset access
>>>   while ipr is active makes any sense - I know we need it for uio.
>>> - reset while uio handles interrupt needs to block, not fail I think
>>>
>>
>> Here is a preview following those ideas. I'll look into generic INTx
>> masking services now and, if that works out and no concerns are raised,
>> I'll post it all.
>>
>> Jan
> 
> Hopefully as separate patches :)

For sure. :)

> 
> No real concerns, some nitpicking comments below.
> 
>> -----8<-----
>>
>> pci_block_user_cfg_access was designed for the use case that a single
>> context, the IPR driver, temporarily delays user space accesses to the
>> config space via sysfs. This assumption became invalid by the time
>> pci_dev_reset was added as locking instance. Today, if you run two loops
>> in parallel that reset the same device via sysfs, you end up with a
>> kernel BUG as pci_block_user_cfg_access detect the broken assumption.
>>
>> This reworks the pci_block_user_cfg_access to a sleeping service
>> pci_block_cfg_access and an atomic variant called
>> pci_block_cfg_access_in_atomic. The former not only blocks user space
>> access as before but also waits if access was already blocked. The
>> latter service just returns an error code in this case, allowing the
>> caller to resolve the conflict instead of raising a BUG.
> 
> I'm not sure I understand the point of the API renaming -
> the new names seem less clear than the original, to me.
> Regular config access isn't blocked by this API - it still only
> blocks user config accesses, we simply allow
> multiple block calls in parallel now.

It synchronizes everyone calling pci_block_cfg_access + sysfs access. So
this is no cosmetic renaming but something that reflects the key change
in the semantics IMO.

> 
> If we keep the old name, simply allow blocking
> and add an atomic variant, the patch will be much smaller.
> 
> 
>>
>> ---
>>  drivers/pci/access.c          |   76 +++++++++++++++++++++++++++--------------
>>  drivers/pci/iov.c             |   12 +++---
>>  drivers/pci/pci.c             |    4 +-
>>  drivers/scsi/ipr.c            |   24 +++++++++----
>>  drivers/uio/uio_pci_generic.c |   10 +++--
>>  include/linux/pci.h           |   14 +++++---
>>  6 files changed, 89 insertions(+), 51 deletions(-)
> 
> Below might be easier to review if it is split in two:
> 1. rename ucfg to cfg all over, tweak whitespace
> 2. allow multiple block calls, add in_atomic and update
>    in_atomic callers

As explained above, there is a strong relation between behavioral change
and API renaming in my eyes.

> 
>>
>> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
>> index fdaa42a..640522a 100644
>> --- a/drivers/pci/access.c
>> +++ b/drivers/pci/access.c
>> @@ -127,20 +127,20 @@ EXPORT_SYMBOL(pci_write_vpd);
>>   * We have a bit per device to indicate it's blocked and a global wait queue
>>   * for callers to sleep on until devices are unblocked.
>>   */
>> -static DECLARE_WAIT_QUEUE_HEAD(pci_ucfg_wait);
>> +static DECLARE_WAIT_QUEUE_HEAD(pci_cfg_wait);
>>
>> -static noinline void pci_wait_ucfg(struct pci_dev *dev)
>> +static noinline void pci_wait_cfg(struct pci_dev *dev)
>>  {
>>       DECLARE_WAITQUEUE(wait, current);
>>
>> -     __add_wait_queue(&pci_ucfg_wait, &wait);
>> +     __add_wait_queue(&pci_cfg_wait, &wait);
>>       do {
>>               set_current_state(TASK_UNINTERRUPTIBLE);
>>               raw_spin_unlock_irq(&pci_lock);
>>               schedule();
>>               raw_spin_lock_irq(&pci_lock);
>> -     } while (dev->block_ucfg_access);
>> -     __remove_wait_queue(&pci_ucfg_wait, &wait);
>> +     } while (dev->block_cfg_access);
>> +     __remove_wait_queue(&pci_cfg_wait, &wait);
>>  }
>>
>>  /* Returns 0 on success, negative values indicate error. */
>> @@ -153,7 +153,8 @@ int pci_user_read_config_##size                                           \
>>       if (PCI_##size##_BAD)                                           \
>>               return -EINVAL;                                         \
>>       raw_spin_lock_irq(&pci_lock);                           \
>> -     if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev);       \
>> +     if (unlikely(dev->block_cfg_access))                            \
>> +             pci_wait_cfg(dev);                                      \
>>       ret = dev->bus->ops->read(dev->bus, dev->devfn,                 \
>>                                       pos, sizeof(type), &data);      \
>>       raw_spin_unlock_irq(&pci_lock);                         \
>> @@ -172,7 +173,8 @@ int pci_user_write_config_##size                                  \
>>       if (PCI_##size##_BAD)                                           \
>>               return -EINVAL;                                         \
>>       raw_spin_lock_irq(&pci_lock);                           \
>> -     if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev);       \
>> +     if (unlikely(dev->block_cfg_access))                            \
>> +             pci_wait_cfg(dev);                                      \
>>       ret = dev->bus->ops->write(dev->bus, dev->devfn,                \
>>                                       pos, sizeof(type), val);        \
>>       raw_spin_unlock_irq(&pci_lock);                         \
>> @@ -401,36 +403,58 @@ int pci_vpd_truncate(struct pci_dev *dev, size_t size)
>>  EXPORT_SYMBOL(pci_vpd_truncate);
>>
>>  /**
>> - * pci_block_user_cfg_access - Block userspace PCI config reads/writes
>> + * pci_block_cfg_access - Block PCI config reads/writes
> 
> This comment seems confusing. We don't in fact block all config
> reads writes. Instead we block userspace accesses and
> concurrent block requests.

I'm open for a better suggestion that summarize the more verbose (and
hopefully clearer) explanation below.

> 
>>   * @dev:     pci device struct
>>   *
>> - * When user access is blocked, any reads or writes to config space will
>> - * sleep until access is unblocked again.  We don't allow nesting of
>> - * block/unblock calls.
>> + * When access is blocked, any userspace reads or writes to config space
>> + * and concurrent block requests will sleep until
>> + * access is unblocked again.
>>   */
>> -void pci_block_user_cfg_access(struct pci_dev *dev)
>> +void pci_block_cfg_access(struct pci_dev *dev)
>>  {
>>       unsigned long flags;
>> -     int was_blocked;
>> +
>> +     might_sleep();
>> +
>> +     raw_spin_lock_irqsave(&pci_lock, flags);
>> +     if (dev->block_cfg_access)
>> +             pci_wait_cfg(dev);
>> +     dev->block_cfg_access = 1;
>> +     raw_spin_unlock_irqrestore(&pci_lock, flags);
> 
> Above can sleep so irq must be enabled, thus
> it can be raw_spin_lock_irq, right?

Yes, will clean up.

> 
>> +}
>> +EXPORT_SYMBOL_GPL(pci_block_cfg_access);
>> +
>> +/**
>> + * pci_block_cfg_access_in_atomic - Block PCI config reads/writes from atomic
>> + *                                  context
>> + * @dev:     pci device struct
>> + *
>> + * Same as pci_block_cfg_access, but will fail with -EBUSY if access is
>> + * already blocked.
> 
> Mention return value on success? Callers seem to rely on it being 0.

OK, also for all futher remarks.

Thanks for the review,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] pci: Rework config space blocking services
  2011-09-06  7:18             ` Jan Kiszka
@ 2011-09-06  8:04               ` Michael S. Tsirkin
  2011-09-06  8:27                 ` Jan Kiszka
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2011-09-06  8:04 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley, Hans J. Koch,
	Greg Kroah-Hartman, linux-pci, linux-kernel, linux-scsi, kvm

On Tue, Sep 06, 2011 at 09:18:13AM +0200, Jan Kiszka wrote:
> >> @@ -401,36 +403,58 @@ int pci_vpd_truncate(struct pci_dev *dev, size_t size)
> >>  EXPORT_SYMBOL(pci_vpd_truncate);
> >>
> >>  /**
> >> - * pci_block_user_cfg_access - Block userspace PCI config reads/writes
> >> + * pci_block_cfg_access - Block PCI config reads/writes
> > 
> > This comment seems confusing. We don't in fact block all config
> > reads writes. Instead we block userspace accesses and
> > concurrent block requests.
> 
> I'm open for a better suggestion that summarize the more verbose (and
> hopefully clearer) explanation below.

I think the problem is, it doesn't block config access
and we call it pci_block_cfg_access.

Thinking about it, doesn't this behave somewhat like a lock?
How about

pci_user_cfg_access_trylock
pci_user_cfg_access_lock
pci_user_cfg_access_unlock

And then:
 * pci_user_cfg_access_lock - Lock userspace PCI config access
 *
 * When locked, any userspace reads or writes to config space
 * and concurrent lock requests will sleep, and trylock requests
 * will fail, until pci_user_cfg_access_unlock is called.

I had a brief thought of using an rwsem internally, but
this would make trylock fail if userspace does config read,
changing semantics.

-- 
MST

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] pci: Rework config space blocking services
  2011-09-06  8:04               ` Michael S. Tsirkin
@ 2011-09-06  8:27                 ` Jan Kiszka
  2011-09-06  8:47                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kiszka @ 2011-09-06  8:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley, Hans J. Koch,
	Greg Kroah-Hartman, linux-pci, linux-kernel, linux-scsi, kvm

On 2011-09-06 10:04, Michael S. Tsirkin wrote:
> On Tue, Sep 06, 2011 at 09:18:13AM +0200, Jan Kiszka wrote:
>>>> @@ -401,36 +403,58 @@ int pci_vpd_truncate(struct pci_dev *dev, size_t size)
>>>>  EXPORT_SYMBOL(pci_vpd_truncate);
>>>>
>>>>  /**
>>>> - * pci_block_user_cfg_access - Block userspace PCI config reads/writes
>>>> + * pci_block_cfg_access - Block PCI config reads/writes
>>>
>>> This comment seems confusing. We don't in fact block all config
>>> reads writes. Instead we block userspace accesses and
>>> concurrent block requests.
>>
>> I'm open for a better suggestion that summarize the more verbose (and
>> hopefully clearer) explanation below.
> 
> I think the problem is, it doesn't block config access
> and we call it pci_block_cfg_access.
> 
> Thinking about it, doesn't this behave somewhat like a lock?
> How about
> 
> pci_user_cfg_access_trylock
> pci_user_cfg_access_lock
> pci_user_cfg_access_unlock
> 
> And then:
>  * pci_user_cfg_access_lock - Lock userspace PCI config access

Except that the "userspace" here is still only half of the truth and I
prefer to drop it, the naming locks good to me.

>  *
>  * When locked, any userspace reads or writes to config space
>  * and concurrent lock requests will sleep, and trylock requests
>  * will fail, until pci_user_cfg_access_unlock is called.
> 
> I had a brief thought of using an rwsem internally, but
> this would make trylock fail if userspace does config read,
> changing semantics.

Also, I bet we would make lockdep unhappy when calling
down_write_trylock from IRQ context.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] pci: Rework config space blocking services
  2011-09-06  8:27                 ` Jan Kiszka
@ 2011-09-06  8:47                   ` Michael S. Tsirkin
  2011-09-06  8:48                     ` Jan Kiszka
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2011-09-06  8:47 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley, Hans J. Koch,
	Greg Kroah-Hartman, linux-pci, linux-kernel, linux-scsi, kvm

On Tue, Sep 06, 2011 at 10:27:45AM +0200, Jan Kiszka wrote:
> On 2011-09-06 10:04, Michael S. Tsirkin wrote:
> > On Tue, Sep 06, 2011 at 09:18:13AM +0200, Jan Kiszka wrote:
> >>>> @@ -401,36 +403,58 @@ int pci_vpd_truncate(struct pci_dev *dev, size_t size)
> >>>>  EXPORT_SYMBOL(pci_vpd_truncate);
> >>>>
> >>>>  /**
> >>>> - * pci_block_user_cfg_access - Block userspace PCI config reads/writes
> >>>> + * pci_block_cfg_access - Block PCI config reads/writes
> >>>
> >>> This comment seems confusing. We don't in fact block all config
> >>> reads writes. Instead we block userspace accesses and
> >>> concurrent block requests.
> >>
> >> I'm open for a better suggestion that summarize the more verbose (and
> >> hopefully clearer) explanation below.
> > 
> > I think the problem is, it doesn't block config access
> > and we call it pci_block_cfg_access.
> > 
> > Thinking about it, doesn't this behave somewhat like a lock?
> > How about
> > 
> > pci_user_cfg_access_trylock
> > pci_user_cfg_access_lock
> > pci_user_cfg_access_unlock
> > 
> > And then:
> >  * pci_user_cfg_access_lock - Lock userspace PCI config access
> 
> Except that the "userspace" here is still only half of the truth

It's the name of the lock :)

> and I
> prefer to drop it, the naming locks good to me.

OK, I think it's acceptable.

> >  *
> >  * When locked, any userspace reads or writes to config space
> >  * and concurrent lock requests will sleep, and trylock requests
> >  * will fail, until pci_user_cfg_access_unlock is called.
> > 
> > I had a brief thought of using an rwsem internally, but
> > this would make trylock fail if userspace does config read,
> > changing semantics.
> 
> Also, I bet we would make lockdep unhappy when calling
> down_write_trylock from IRQ context.
> 
> Jan

I think I did this at some point, should be fine.
But whatever.

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] pci: Rework config space blocking services
  2011-09-06  8:47                   ` Michael S. Tsirkin
@ 2011-09-06  8:48                     ` Jan Kiszka
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Kiszka @ 2011-09-06  8:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesse Barnes, Brian King, James E.J. Bottomley, Hans J. Koch,
	Greg Kroah-Hartman, linux-pci, linux-kernel, linux-scsi, kvm

On 2011-09-06 10:47, Michael S. Tsirkin wrote:
> On Tue, Sep 06, 2011 at 10:27:45AM +0200, Jan Kiszka wrote:
>> On 2011-09-06 10:04, Michael S. Tsirkin wrote:
>>> On Tue, Sep 06, 2011 at 09:18:13AM +0200, Jan Kiszka wrote:
>>>>>> @@ -401,36 +403,58 @@ int pci_vpd_truncate(struct pci_dev *dev, size_t size)
>>>>>>  EXPORT_SYMBOL(pci_vpd_truncate);
>>>>>>
>>>>>>  /**
>>>>>> - * pci_block_user_cfg_access - Block userspace PCI config reads/writes
>>>>>> + * pci_block_cfg_access - Block PCI config reads/writes
>>>>>
>>>>> This comment seems confusing. We don't in fact block all config
>>>>> reads writes. Instead we block userspace accesses and
>>>>> concurrent block requests.
>>>>
>>>> I'm open for a better suggestion that summarize the more verbose (and
>>>> hopefully clearer) explanation below.
>>>
>>> I think the problem is, it doesn't block config access
>>> and we call it pci_block_cfg_access.
>>>
>>> Thinking about it, doesn't this behave somewhat like a lock?
>>> How about
>>>
>>> pci_user_cfg_access_trylock
>>> pci_user_cfg_access_lock
>>> pci_user_cfg_access_unlock
>>>
>>> And then:
>>>  * pci_user_cfg_access_lock - Lock userspace PCI config access
>>
>> Except that the "userspace" here is still only half of the truth
> 
> It's the name of the lock :)

Ah, blind. That has to be change as well of course (pci_cfg_access_*lock).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] pci: Rework config space blocking services
  2011-09-02  7:48         ` [RFC] pci: Rework config space blocking services Jan Kiszka
  2011-09-06  7:00           ` Michael S. Tsirkin
@ 2011-09-07 13:46           ` Brian King
  1 sibling, 0 replies; 35+ messages in thread
From: Brian King @ 2011-09-07 13:46 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Michael S. Tsirkin, Jesse Barnes, Brian King,
	James E.J. Bottomley, Hans J. Koch, Greg Kroah-Hartman,
	linux-pci, linux-kernel, linux-scsi, kvm

Here is how I would prefer to rework ipr. 

Thanks,

Brian

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center



The PCI config space blocking API has changed to better
allow for multiple users. Update ipr to use the new API.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---

 drivers/scsi/ipr.c |   66 +++++++++++++++++++++++++++++++++++++++++++++++------
 drivers/scsi/ipr.h |    1 
 2 files changed, 60 insertions(+), 7 deletions(-)

diff -puN drivers/scsi/ipr.c~ipr_new_pci_block drivers/scsi/ipr.c
--- linux-2.6/drivers/scsi/ipr.c~ipr_new_pci_block	2011-09-06 16:52:47.000000000 -0500
+++ linux-2.6-bjking1/drivers/scsi/ipr.c	2011-09-07 08:34:54.000000000 -0500
@@ -7639,8 +7639,12 @@ static int ipr_reset_restore_cfg_space(s
  **/
 static int ipr_reset_bist_done(struct ipr_cmnd *ipr_cmd)
 {
+	struct ipr_ioa_cfg *ioa_cfg = ipr_cmd->ioa_cfg;
+
 	ENTER;
-	pci_unblock_user_cfg_access(ipr_cmd->ioa_cfg->pdev);
+	if (ioa_cfg->ucfg_blocked)
+		pci_unblock_cfg_access(ioa_cfg->pdev);
+	ioa_cfg->ucfg_blocked = 0;
 	ipr_cmd->job_step = ipr_reset_restore_cfg_space;
 	LEAVE;
 	return IPR_RC_JOB_CONTINUE;
@@ -7661,8 +7665,6 @@ static int ipr_reset_start_bist(struct i
 	int rc = PCIBIOS_SUCCESSFUL;
 
 	ENTER;
-	pci_block_user_cfg_access(ioa_cfg->pdev);
-
 	if (ioa_cfg->ipr_chip->bist_method == IPR_MMIO)
 		writel(IPR_UPROCI_SIS64_START_BIST,
 		       ioa_cfg->regs.set_uproc_interrupt_reg32);
@@ -7674,7 +7676,9 @@ static int ipr_reset_start_bist(struct i
 		ipr_reset_start_timer(ipr_cmd, IPR_WAIT_FOR_BIST_TIMEOUT);
 		rc = IPR_RC_JOB_RETURN;
 	} else {
-		pci_unblock_user_cfg_access(ipr_cmd->ioa_cfg->pdev);
+		if (ioa_cfg->ucfg_blocked)
+			pci_unblock_cfg_access(ipr_cmd->ioa_cfg->pdev);
+		ioa_cfg->ucfg_blocked = 0;
 		ipr_cmd->s.ioasa.hdr.ioasc = cpu_to_be32(IPR_IOASC_PCI_ACCESS_ERROR);
 		rc = IPR_RC_JOB_CONTINUE;
 	}
@@ -7717,7 +7721,6 @@ static int ipr_reset_slot_reset(struct i
 	struct pci_dev *pdev = ioa_cfg->pdev;
 
 	ENTER;
-	pci_block_user_cfg_access(pdev);
 	pci_set_pcie_reset_state(pdev, pcie_warm_reset);
 	ipr_cmd->job_step = ipr_reset_slot_reset_done;
 	ipr_reset_start_timer(ipr_cmd, IPR_PCI_RESET_TIMEOUT);
@@ -7726,6 +7729,55 @@ static int ipr_reset_slot_reset(struct i
 }
 
 /**
+ * ipr_reset_block_config_access_wait - Wait for permission to block config access
+ * @ipr_cmd:	ipr command struct
+ *
+ * Description: This attempts to block config access to the IOA.
+ *
+ * Return value:
+ * 	IPR_RC_JOB_CONTINUE / IPR_RC_JOB_RETURN
+ **/
+static int ipr_reset_block_config_access_wait(struct ipr_cmnd *ipr_cmd)
+{
+	struct ipr_ioa_cfg *ioa_cfg = ipr_cmd->ioa_cfg;
+	int rc = IPR_RC_JOB_CONTINUE;
+
+	if (pci_block_cfg_access_in_atomic(ioa_cfg->pdev)) {
+		if (ipr_cmd->u.time_left) {
+			rc = IPR_RC_JOB_RETURN;
+			ipr_cmd->u.time_left -= IPR_CHECK_FOR_RESET_TIMEOUT;
+			ipr_reset_start_timer(ipr_cmd, IPR_CHECK_FOR_RESET_TIMEOUT);
+		} else {
+			ipr_cmd->job_step = ioa_cfg->reset;
+			dev_err(&ioa_cfg->pdev->dev,
+				"Timed out waiting to block config access. Resetting anyway.\n");
+		}
+	} else {
+		ioa_cfg->ucfg_blocked = 1;
+		ipr_cmd->job_step = ioa_cfg->reset;
+	}
+
+	return rc;
+}
+
+/**
+ * ipr_reset_block_config_access - Block config access to the IOA
+ * @ipr_cmd:	ipr command struct
+ *
+ * Description: This attempts to block config access to the IOA
+ *
+ * Return value:
+ * 	IPR_RC_JOB_CONTINUE
+ **/
+static int ipr_reset_block_config_access(struct ipr_cmnd *ipr_cmd)
+{
+	ipr_cmd->ioa_cfg->ucfg_blocked = 0;
+	ipr_cmd->job_step = ipr_reset_block_config_access_wait;
+	ipr_cmd->u.time_left = IPR_WAIT_FOR_RESET_TIMEOUT;
+	return IPR_RC_JOB_CONTINUE;
+}
+
+/**
  * ipr_reset_allowed - Query whether or not IOA can be reset
  * @ioa_cfg:	ioa config struct
  *
@@ -7764,7 +7816,7 @@ static int ipr_reset_wait_to_start_bist(
 		ipr_cmd->u.time_left -= IPR_CHECK_FOR_RESET_TIMEOUT;
 		ipr_reset_start_timer(ipr_cmd, IPR_CHECK_FOR_RESET_TIMEOUT);
 	} else {
-		ipr_cmd->job_step = ioa_cfg->reset;
+		ipr_cmd->job_step = ipr_reset_block_config_access;
 		rc = IPR_RC_JOB_CONTINUE;
 	}
 
@@ -7797,7 +7849,7 @@ static int ipr_reset_alert(struct ipr_cm
 		writel(IPR_UPROCI_RESET_ALERT, ioa_cfg->regs.set_uproc_interrupt_reg32);
 		ipr_cmd->job_step = ipr_reset_wait_to_start_bist;
 	} else {
-		ipr_cmd->job_step = ioa_cfg->reset;
+		ipr_cmd->job_step = ipr_reset_block_config_access;
 	}
 
 	ipr_cmd->u.time_left = IPR_WAIT_FOR_RESET_TIMEOUT;
diff -puN drivers/scsi/ipr.h~ipr_new_pci_block drivers/scsi/ipr.h
--- linux-2.6/drivers/scsi/ipr.h~ipr_new_pci_block	2011-09-07 07:29:20.000000000 -0500
+++ linux-2.6-bjking1/drivers/scsi/ipr.h	2011-09-07 08:10:29.000000000 -0500
@@ -1384,6 +1384,7 @@ struct ipr_ioa_cfg {
 	u8 needs_warm_reset:1;
 	u8 msi_received:1;
 	u8 sis64:1;
+	u8 ucfg_blocked:1;
 
 	u8 revid;
 
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2011-09-07 16:20 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-24 10:43 Broken pci_block_user_cfg_access interface Jan Kiszka
2011-08-24 15:02 ` Brian King
2011-08-25  9:19   ` Jan Kiszka
2011-08-25  9:40     ` Michael S. Tsirkin
2011-08-25 10:34       ` Jan Kiszka
2011-08-25 13:06       ` Brian King
2011-08-25 13:12         ` Brian King
2011-08-25 13:16           ` Jan Kiszka
2011-08-25 13:24             ` Brian King
2011-08-25 18:16               ` Michael S. Tsirkin
2011-08-25 13:02     ` Brian King
2011-08-25 13:06       ` Jan Kiszka
2011-08-25 18:19         ` Michael S. Tsirkin
2011-08-25 18:52           ` Jan Kiszka
2011-08-25 19:07             ` Michael S. Tsirkin
2011-08-25 19:26               ` Jan Kiszka
2011-08-29 15:05 ` Michael S. Tsirkin
2011-08-29 15:42   ` Jan Kiszka
2011-08-29 15:58     ` Michael S. Tsirkin
2011-08-29 16:14       ` Jan Kiszka
2011-08-29 16:23         ` Michael S. Tsirkin
2011-08-29 16:26           ` Jan Kiszka
2011-08-29 18:47     ` Jan Kiszka
2011-08-29 19:18       ` Michael S. Tsirkin
2011-08-30 16:30         ` Brian King
2011-08-30 18:01           ` Michael S. Tsirkin
2011-08-30 19:41             ` Brian King
2011-09-02  7:48         ` [RFC] pci: Rework config space blocking services Jan Kiszka
2011-09-06  7:00           ` Michael S. Tsirkin
2011-09-06  7:18             ` Jan Kiszka
2011-09-06  8:04               ` Michael S. Tsirkin
2011-09-06  8:27                 ` Jan Kiszka
2011-09-06  8:47                   ` Michael S. Tsirkin
2011-09-06  8:48                     ` Jan Kiszka
2011-09-07 13:46           ` Brian King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.