linux-kernel.vger.kernel.org archive mirror
* RE: [RFC PATCH 35/35] Add Xen virtual block device driver.
@ 2006-03-22 16:52 Ian Pratt
  2006-03-22 17:09 ` Anthony Liguori
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Ian Pratt @ 2006-03-22 16:52 UTC
  To: Anthony Liguori, Chris Wright
  Cc: virtualization, xen-devel, linux-kernel, Ian Pratt, ian.pratt

> This is another thing that has always put me off.  The 
> virtual block device driver has the ability to masquerade as 
> other types of block devices.  It actually claims to be an 
> IDE or SCSI device allocating the appropriate major/minor numbers.
> 
> This seems to be pretty evil and creates interesting failure 
> conditions for users who load IDE or SCSI modules.  I've seen 
> it trip up a number of people in the past.  I think we should 
> only ever use the major number that was actually allocated to us.

We certainly should be pushing everyone toward using the 'xdX' etc
devices that are allocated to us. However, the installers of certain
older distros and other user space tools won't accept anything other
than hdX/sdX, so it's useful from a compatibility POV even if it never
goes into mainline, which I agree it probably shouldn't. 

Ian


* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-22 16:52 [RFC PATCH 35/35] Add Xen virtual block device driver Ian Pratt
@ 2006-03-22 17:09 ` Anthony Liguori
  2006-03-22 23:09 ` Jeff Garzik
  2006-03-23  8:19 ` Arjan van de Ven
  2 siblings, 0 replies; 26+ messages in thread
From: Anthony Liguori @ 2006-03-22 17:09 UTC
  To: Ian Pratt
  Cc: Chris Wright, virtualization, xen-devel, linux-kernel, Ian Pratt,
	ian.pratt

Ian Pratt wrote:
>> This seems to be pretty evil and creates interesting failure 
>> conditions for users who load IDE or SCSI modules.  I've seen 
>> it trip up a number of people in the past.  I think we should 
>> only ever use the major number that was actually allocated to us.
>>     
>
> We certainly should be pushing everyone toward using the 'xdX' etc
> devices that are allocated to us. However, the installers of certain
> older distros and other user space tools won't accept anything other
> than hdX/sdX, so it's useful from a compatibility POV even if it never
> goes into mainline, which I agree it probably shouldn't. 
>   

Then perhaps we should deprecate non-xd block devices starting in the 
near future (3.0.3?).  We probably need to have it deprecated for a few 
releases since I think most people are not using xd at this point...

Regards,

Anthony Liguori

> Ian
>   



* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-22 16:52 [RFC PATCH 35/35] Add Xen virtual block device driver Ian Pratt
  2006-03-22 17:09 ` Anthony Liguori
@ 2006-03-22 23:09 ` Jeff Garzik
  2006-03-24 12:17   ` Alan Cox
  2006-03-27 10:14   ` Peter Chubb
  2006-03-23  8:19 ` Arjan van de Ven
  2 siblings, 2 replies; 26+ messages in thread
From: Jeff Garzik @ 2006-03-22 23:09 UTC
  To: Ian Pratt
  Cc: Anthony Liguori, Chris Wright, virtualization, xen-devel,
	linux-kernel, Ian Pratt, ian.pratt, SCSI Mailing List

Ian Pratt wrote:
>>This is another thing that has always put me off.  The 
>>virtual block device driver has the ability to masquerade as 
>>other types of block devices.  It actually claims to be an 
>>IDE or SCSI device allocating the appropriate major/minor numbers.
>>
>>This seems to be pretty evil and creates interesting failure 
>>conditions for users who load IDE or SCSI modules.  I've seen 
>>it trip up a number of people in the past.  I think we should 
>>only ever use the major number that was actually allocated to us.
> 
> 
> We certainly should be pushing everyone toward using the 'xdX' etc
> devices that are allocated to us. However, the installers of certain
> older distros and other user space tools won't accept anything other
> than hdX/sdX, so it's useful from a compatibility POV even if it never
> goes into mainline, which I agree it probably shouldn't. 

Yes, this is true.  Red Hat installer guys grumbled at me when I wrote 
the 'sx8' block driver:  since it wasn't hda/sda, they had to write 
special code for it, as they apparently must do for any new block driver 
"class".  SuSE and other distros are probably similar, since each block 
driver provides its own special behaviors and feature exports.

I should have spoken up a long time ago about this, but anyway:

An IBM hypervisor on ppc64 communicates using SCSI RPC messages.  I think 
this would be quite nice for Xen, because SCSI (a) is a message-based 
model, and (b) implementing block using SCSI has a very high Just 
Works(tm) value which cannot be ignored.  And perhaps (c) SCSI target 
code already exists, so implementing the server side doesn't require 
starting from scratch, but rather simply connecting the Legos.

	Jeff




* RE: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-22 16:52 [RFC PATCH 35/35] Add Xen virtual block device driver Ian Pratt
  2006-03-22 17:09 ` Anthony Liguori
  2006-03-22 23:09 ` Jeff Garzik
@ 2006-03-23  8:19 ` Arjan van de Ven
  2006-03-23  9:34   ` Keir Fraser
  2 siblings, 1 reply; 26+ messages in thread
From: Arjan van de Ven @ 2006-03-23  8:19 UTC
  To: Ian Pratt
  Cc: Anthony Liguori, Chris Wright, virtualization, xen-devel,
	linux-kernel, Ian Pratt, ian.pratt

On Wed, 2006-03-22 at 16:52 +0000, Ian Pratt wrote:
> > This is another thing that has always put me off.  The 
> > virtual block device driver has the ability to masquerade as 
> > other types of block devices.  It actually claims to be an 
> > IDE or SCSI device allocating the appropriate major/minor numbers.
> > 
> > This seems to be pretty evil and creates interesting failure 
> > conditions for users who load IDE or SCSI modules.  I've seen 
> > it trip up a number of people in the past.  I think we should 
> > only ever use the major number that was actually allocated to us.
> 
> We certainly should be pushing everyone toward using the 'xdX' etc
> devices that are allocated to us.

yes but you are faking something stupid ;)
You aren't ide, you don't take the IDE ioctls. So please just nuke this
bit..




* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-23  8:19 ` Arjan van de Ven
@ 2006-03-23  9:34   ` Keir Fraser
  2006-03-23  9:41     ` Arjan van de Ven
  2006-03-23  9:42     ` Arjan van de Ven
  0 siblings, 2 replies; 26+ messages in thread
From: Keir Fraser @ 2006-03-23  9:34 UTC
  To: Arjan van de Ven
  Cc: virtualization, Ian Pratt, xen-devel, ian.pratt, linux-kernel,
	Ian Pratt, Chris Wright


On 23 Mar 2006, at 08:19, Arjan van de Ven wrote:

>> We certainly should be pushing everyone toward using the 'xdX' etc
>> devices that are allocated to us.
>
> yes but you are faking something stupid ;)
> You aren't ide, you don't take the IDE ioctls. So please just nuke this
> bit..

Well, that's plausible. We probably don't need IDE *and* SCSI faking. 
We'd like to at least keep SCSI faking, perhaps making it more 
attractive by going to some effort to take at least the essential SCSI 
ioctls. We've talked about revving our block protocol to encapsulate 
SCSI anyway -- this would be another step on that path.

If we stick to just our own major then we break distro init scripts and 
surprise users.

  -- Keir



* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-23  9:34   ` Keir Fraser
@ 2006-03-23  9:41     ` Arjan van de Ven
  2006-03-23  9:42     ` Arjan van de Ven
  1 sibling, 0 replies; 26+ messages in thread
From: Arjan van de Ven @ 2006-03-23  9:41 UTC
  To: Keir Fraser
  Cc: virtualization, Ian Pratt, xen-devel, ian.pratt, linux-kernel,
	Ian Pratt, Chris Wright


> Well, that's plausible. We probably don't need IDE *and* SCSI faking. 
> We'd like to at least keep SCSI faking,

that's still unacceptable. Unless you start using the scsi layer and
really ARE scsi. 
but faking to be something you're not is not how you do things in linux.
Putting junk in the kernel because otherwise an open source installer
needs 3 extra lines... No Thanks(tm)

I would also recommend against going the full scsi-over-the-virtual-wire
mode. Xen is Xen *because* you don't need to go to a hardware level and
back on the other side. That's one of the reasons it's faster than full
virtualization. Don't throw away your advantages because you think it's
hard to add 3 lines to an open source project.

And the other consideration is this: SCSI is a complex spec. Doing a
half-emulation of that is actually worse than doing something fully on
your own. But if you want to go all the way.. that's imo way too much
overhead. You are not scsi. 

(And if someone really wants scsi in Xen, they already can use iSCSI as
protocol, no need to reinvent that wheel)




* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-23  9:34   ` Keir Fraser
  2006-03-23  9:41     ` Arjan van de Ven
@ 2006-03-23  9:42     ` Arjan van de Ven
  1 sibling, 0 replies; 26+ messages in thread
From: Arjan van de Ven @ 2006-03-23  9:42 UTC
  To: Keir Fraser
  Cc: virtualization, Ian Pratt, xen-devel, ian.pratt, linux-kernel,
	Ian Pratt, Chris Wright


> If we stick to just our own major then we break distro init scripts and 
> surprise users.

btw init scripts don't really break because of this, at least sane ones
don't. It's installers that may need a few tweaks, but those are minor
at worst.




* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-22 23:09 ` Jeff Garzik
@ 2006-03-24 12:17   ` Alan Cox
  2006-03-24 12:38     ` Jeff Garzik
  2006-03-27 10:14   ` Peter Chubb
  1 sibling, 1 reply; 26+ messages in thread
From: Alan Cox @ 2006-03-24 12:17 UTC
  To: Jeff Garzik
  Cc: Ian Pratt, Anthony Liguori, Chris Wright, virtualization,
	xen-devel, linux-kernel, Ian Pratt, ian.pratt, SCSI Mailing List

On Wed, 2006-03-22 at 18:09 -0500, Jeff Garzik wrote:
> An IBM hypervisor on ppc64 communicates using SCSI RPC messages.  I think 
> this would be quite nice for Xen, because SCSI (a) is a message-based 
> model, and (b) implementing block using SCSI has a very high Just 
> Works(tm) value which cannot be ignored.  And perhaps (c) SCSI target 
> code already exists, so implementing the server side doesn't require 
> starting from scratch, but rather simply connecting the Legos.

A pure SCSI abstraction doesn't allow for shared head scheduling which
you will need to scale Xen sanely on typical PC boxes. SCSI emulations
are also always full of bits people got wrong, often critical bits like
tagged queues and error sequences - things that break your journalled
file system.




* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-24 12:17   ` Alan Cox
@ 2006-03-24 12:38     ` Jeff Garzik
  2006-03-24 13:37       ` Jeff Garzik
  2006-03-24 15:55       ` Alan Cox
  0 siblings, 2 replies; 26+ messages in thread
From: Jeff Garzik @ 2006-03-24 12:38 UTC
  To: Alan Cox
  Cc: Ian Pratt, Anthony Liguori, Chris Wright, virtualization,
	xen-devel, linux-kernel, Ian Pratt, ian.pratt, SCSI Mailing List

Alan Cox wrote:
> On Wed, 2006-03-22 at 18:09 -0500, Jeff Garzik wrote:
>> An IBM hypervisor on ppc64 communicates using SCSI RPC messages.  I think 
>> this would be quite nice for Xen, because SCSI (a) is a message-based 
>> model, and (b) implementing block using SCSI has a very high Just 
>> Works(tm) value which cannot be ignored.  And perhaps (c) SCSI target 
>> code already exists, so implementing the server side doesn't require 
>> starting from scratch, but rather simply connecting the Legos.
> 
> A pure SCSI abstraction doesn't allow for shared head scheduling which
> you will need to scale Xen sanely on typical PC boxes.

Not true at all.  If you can do it with a block device, you can do it 
with a SCSI block device.

In fact, SCSI should make a few things easier, because the notion of 
host+bus topology is already present, and notion of messaging is already 
present, so you don't have to recreate that in a Xen block device 
infrastructure.


> SCSI emulations
> are also always full of bits people got wrong, often critical bits like
> tagged queues and error sequences - things that break your journalled
> file system.

This I'll grant you.

	Jeff





* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-24 12:38     ` Jeff Garzik
@ 2006-03-24 13:37       ` Jeff Garzik
  2006-03-24 13:40         ` Arjan van de Ven
  2006-03-24 15:55       ` Alan Cox
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff Garzik @ 2006-03-24 13:37 UTC
  To: Alan Cox
  Cc: Ian Pratt, Anthony Liguori, Chris Wright, virtualization,
	xen-devel, linux-kernel, Ian Pratt, ian.pratt, SCSI Mailing List

Jeff Garzik wrote:
> In fact, SCSI should make a few things easier, because the notion of 
> host+bus topology is already present, and notion of messaging is already 
> present, so you don't have to recreate that in a Xen block device 
> infrastructure.

Another benefit of SCSI:  when an IBM hypervisor in the Linux kernel 
switched to SCSI, that allowed them to replace several drivers (virt 
disk, virt cdrom, virt floppy?) with a single virt-SCSI driver.

	Jeff




* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-24 13:37       ` Jeff Garzik
@ 2006-03-24 13:40         ` Arjan van de Ven
  2006-03-24 13:50           ` Jeff Garzik
  0 siblings, 1 reply; 26+ messages in thread
From: Arjan van de Ven @ 2006-03-24 13:40 UTC
  To: Jeff Garzik
  Cc: Alan Cox, Ian Pratt, Anthony Liguori, Chris Wright,
	virtualization, xen-devel, linux-kernel, Ian Pratt, ian.pratt,
	SCSI Mailing List

On Fri, 2006-03-24 at 08:37 -0500, Jeff Garzik wrote:
> Jeff Garzik wrote:
> > In fact, SCSI should make a few things easier, because the notion of 
> > host+bus topology is already present, and notion of messaging is already 
> > present, so you don't have to recreate that in a Xen block device 
> > infrastructure.
> 
> Another benefit of SCSI:  when an IBM hypervisor in the Linux kernel 
> switched to SCSI, that allowed them to replace several drivers (virt 
> disk, virt cdrom, virt floppy?) with a single virt-SCSI driver.

but there's a generic one for that: iSCSI
so in theory you only need to provide a network driver then ;)





* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-24 13:40         ` Arjan van de Ven
@ 2006-03-24 13:50           ` Jeff Garzik
  2006-03-24 15:33             ` Dave C Boutcher
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff Garzik @ 2006-03-24 13:50 UTC
  To: Arjan van de Ven
  Cc: Alan Cox, Ian Pratt, Anthony Liguori, Chris Wright,
	virtualization, xen-devel, linux-kernel, Ian Pratt, ian.pratt,
	SCSI Mailing List

Arjan van de Ven wrote:
> On Fri, 2006-03-24 at 08:37 -0500, Jeff Garzik wrote:
>> Jeff Garzik wrote:
>>> In fact, SCSI should make a few things easier, because the notion of 
>>> host+bus topology is already present, and notion of messaging is already 
>>> present, so you don't have to recreate that in a Xen block device 
>>> infrastructure.
>> Another benefit of SCSI:  when an IBM hypervisor in the Linux kernel 
>> switched to SCSI, that allowed them to replace several drivers (virt 
>> disk, virt cdrom, virt floppy?) with a single virt-SCSI driver.

> but there's a generic one for that: iSCSI
> so in theory you only need to provide a network driver then ;)

Talk about lots of overhead :)

OTOH, I bet that T10 is acting at high speed, right this second, to form 
a committee, and multiple sub-committees, to standardize SCSI 
transported over XenBus.  SXP anyone?  :)

	Jeff





* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-24 13:50           ` Jeff Garzik
@ 2006-03-24 15:33             ` Dave C Boutcher
  2006-03-24 19:04               ` Mike Christie
  0 siblings, 1 reply; 26+ messages in thread
From: Dave C Boutcher @ 2006-03-24 15:33 UTC
  To: Jeff Garzik
  Cc: Arjan van de Ven, Alan Cox, Ian Pratt, Anthony Liguori,
	Chris Wright, virtualization, xen-devel, linux-kernel, Ian Pratt,
	ian.pratt, SCSI Mailing List


Jeff Garzik wrote:
>Arjan van de Ven wrote:
>> On Fri, 2006-03-24 at 08:37 -0500, Jeff Garzik wrote:
>>> Jeff Garzik wrote:
>>>> In fact, SCSI should make a few things easier, because the notion of 
>>>> host+bus topology is already present, and notion of messaging is already 
>>>> present, so you don't have to recreate that in a Xen block device 
>>>> infrastructure.
>>> Another benefit of SCSI:  when an IBM hypervisor in the Linux kernel 
>>> switched to SCSI, that allowed them to replace several drivers (virt 
>>> disk, virt cdrom, virt floppy?) with a single virt-SCSI driver.
>
>> but there's a generic one for that: iSCSI
>> so in theory you only need to provide a network driver then ;)
>
>Talk about lots of overhead :)
>
>OTOH, I bet that T10 is acting at high speed, right this second, to form 
>a committee, and multiple sub-committees, to standardize SCSI 
>transported over XenBus.  SXP anyone?  :)

Actually SRP (which T10 has now stopped working on) fits the bill very
nicely.

I have to say that moving the IBM virtual drivers from a random
collection of unique drivers (viodisk, viotape, viocd) to a single
virtual SCSI HBA made life much easier.

There is a group (actually, at least two groups) working on SCSI
target infrastructures...once that is in place, I would expect we
could start hacking a Xen virtual HBA.

We looked at iSCSI as a transport (instead of SRP), but we felt that
the added complexity made it unlikely that the average human could
successfully boot their virtual machine.

Dave B


* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-24 12:38     ` Jeff Garzik
  2006-03-24 13:37       ` Jeff Garzik
@ 2006-03-24 15:55       ` Alan Cox
  2006-03-25 10:03         ` Rusty Russell
  1 sibling, 1 reply; 26+ messages in thread
From: Alan Cox @ 2006-03-24 15:55 UTC
  To: Jeff Garzik
  Cc: Ian Pratt, Anthony Liguori, Chris Wright, virtualization,
	xen-devel, linux-kernel, Ian Pratt, ian.pratt, SCSI Mailing List

On Fri, 2006-03-24 at 07:38 -0500, Jeff Garzik wrote:
> > A pure SCSI abstraction doesn't allow for shared head scheduling which
> > you will need to scale Xen sanely on typical PC boxes.
> 
> Not true at all.  If you can do it with a block device, you can do it 
> with a SCSI block device.

I don't believe this is true. The complexity of expressing sequences of
command ordering between virtual machines acting in a co-operative but
secure manner isn't, as far as I can see, sanely expressible in SCSI TCQ.
> 
> In fact, SCSI should make a few things easier, because the notion of 
> host+bus topology is already present, and notion of messaging is already 
> present, so you don't have to recreate that in a Xen block device 
> infrastructure.

Those are the easy bits. 

> > are also always full of bits people got wrong, often critical bits like
> > tagged queues and error sequences - things that break your journalled
> > file system.
> 
> This I'll grant you.

And every one you get wrong is a corruptor....

Alan



* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-24 15:33             ` Dave C Boutcher
@ 2006-03-24 19:04               ` Mike Christie
  2006-03-24 19:19                 ` Dave C Boutcher
  0 siblings, 1 reply; 26+ messages in thread
From: Mike Christie @ 2006-03-24 19:04 UTC
  To: Dave C Boutcher
  Cc: Jeff Garzik, Arjan van de Ven, Alan Cox, Ian Pratt,
	Anthony Liguori, Chris Wright, virtualization, xen-devel,
	linux-kernel, Ian Pratt, ian.pratt, SCSI Mailing List

Dave C Boutcher wrote:
> Jeff Garzik wrote:
>> Arjan van de Ven wrote:
>>> On Fri, 2006-03-24 at 08:37 -0500, Jeff Garzik wrote:
>>>> Jeff Garzik wrote:
>>>>> In fact, SCSI should make a few things easier, because the notion of 
>>>>> host+bus topology is already present, and notion of messaging is already 
>>>>> present, so you don't have to recreate that in a Xen block device 
>>>>> infrastructure.
>>>> Another benefit of SCSI:  when an IBM hypervisor in the Linux kernel 
>>>> switched to SCSI, that allowed them to replace several drivers (virt 
>>>> disk, virt cdrom, virt floppy?) with a single virt-SCSI driver.
>>> but there's a generic one for that: iSCSI
>>> so in theory you only need to provide a network driver then ;)
>> Talk about lots of overhead :)
>>
>> OTOH, I bet that T10 is acting at high speed, right this second, to form 
>> a committee, and multiple sub-committees, to standardize SCSI 
>> transported over XenBus.  SXP anyone?  :)
> 
> Actually SRP (which T10 has now stopped working on) fits the bill very
> nicely.
> 

Does the IBM vscsi code/SPEC follow the SRP SPEC or is it slightly 
modified? We also have an SRP initiator in the kernel now. It is just not 
in the drivers/scsi dir.


* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-24 19:04               ` Mike Christie
@ 2006-03-24 19:19                 ` Dave C Boutcher
  2006-03-25  0:32                   ` FUJITA Tomonori
  2006-03-25  0:47                   ` Roland Dreier
  0 siblings, 2 replies; 26+ messages in thread
From: Dave C Boutcher @ 2006-03-24 19:19 UTC
  To: Mike Christie
  Cc: Dave C Boutcher, Jeff Garzik, Arjan van de Ven, Alan Cox,
	Ian Pratt, Anthony Liguori, Chris Wright, virtualization,
	xen-devel, linux-kernel, Ian Pratt, ian.pratt, SCSI Mailing List


Mike Christie wrote:
> Does the IBM vscsi code/SPEC follow the SRP SPEC or is it slightly 
> modified? We also have an SRP initiator in the kernel now. It is just not 
> in the drivers/scsi dir.

The goal was to follow the SRP spec 100%.  We added one other optional
command set (different protocol identifier than SRP) to exchange some
information like "who is at the other end", but the intent was that
the SRP part was right from the spec.

I think that, since we implemented this in three operating systems (Linux,
AIX, and OS/400) using the T10 spec as the reference, we are probably
pretty close.

And yeah, I'm aware that there is another SRP implementation in the
kernel...Merging would be good...

Dave B


* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-24 19:19                 ` Dave C Boutcher
@ 2006-03-25  0:32                   ` FUJITA Tomonori
  2006-03-25  0:47                   ` Roland Dreier
  1 sibling, 0 replies; 26+ messages in thread
From: FUJITA Tomonori @ 2006-03-25  0:32 UTC
  To: boutcher
  Cc: michaelc, jeff, arjan, alan, m+Ian.Pratt, aliguori, chrisw,
	virtualization, xen-devel, linux-kernel, ian.pratt, ian.pratt,
	linux-scsi

From: boutcher@cs.umn.edu (Dave C Boutcher)
Subject: Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
Date: Fri, 24 Mar 2006 13:19:56 -0600

> 
> Mike Christie wrote:
> > Does the IBM vscsi code/SPEC follow the SRP SPEC or is it slightly 
> > modified? We also have an SRP initiator in the kernel now. It is just not 
> > in the drivers/scsi dir.
> 
> The goal was to follow the SRP spec 100%.  We added one other optional
> command set (different protocol identifier than SRP) to exchange some
> information like "who is at the other end", but the intent was that
> the SRP part was right from the spec.
> 
> I think that, since we implemented this in three operating systems (Linux,
> AIX, and OS/400) using the T10 spec as the reference, we are probably
> pretty close.

On the target side, the LUN structure is very different from the spec
(tgt implements this as a user-space library).


> And yeah, I'm aware that there is another SRP implementation in the
> kernel...Merging would be good...

Do you have any plans for this?

I've been thinking about writing something like scsi_transport_srp,
which could help both the initiator and target drivers. I'd like to enable tgt
to support RDMA-capable adapters.


* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-24 19:19                 ` Dave C Boutcher
  2006-03-25  0:32                   ` FUJITA Tomonori
@ 2006-03-25  0:47                   ` Roland Dreier
  1 sibling, 0 replies; 26+ messages in thread
From: Roland Dreier @ 2006-03-25  0:47 UTC
  To: Dave C Boutcher
  Cc: Mike Christie, Jeff Garzik, Arjan van de Ven, Alan Cox,
	Ian Pratt, Anthony Liguori, Chris Wright, virtualization,
	xen-devel, linux-kernel, Ian Pratt, ian.pratt, SCSI Mailing List

    Dave> And yeah, I'm aware that there is another SRP implementation
    Dave> in the kernel...Merging would be good...

Changing the ibmvscsi driver to use the include/scsi/srp.h header file
at least is on my list of things to do.  Probably a 2.6.18 type of thing.

 - R.


* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-24 15:55       ` Alan Cox
@ 2006-03-25 10:03         ` Rusty Russell
  0 siblings, 0 replies; 26+ messages in thread
From: Rusty Russell @ 2006-03-25 10:03 UTC
  To: Alan Cox
  Cc: Jeff Garzik, Chris Wright, xen-devel, SCSI Mailing List,
	Ian Pratt, virtualization, ian.pratt, linux-kernel

On Fri, 2006-03-24 at 15:55 +0000, Alan Cox wrote:
> On Fri, 2006-03-24 at 07:38 -0500, Jeff Garzik wrote:
> > > A pure SCSI abstraction doesn't allow for shared head scheduling which
> > > you will need to scale Xen sanely on typical PC boxes.
> > 
> > Not true at all.  If you can do it with a block device, you can do it 
> > with a SCSI block device.
> 
> I don't believe this is true. The complexity of expressing sequences of
> command ordering between virtual machines acting in a co-operative but
> secure manner isn't, as far as I can see, sanely expressible in SCSI TCQ.

I thought usb_scsi taught us that SCSI was overkill for a block
abstraction?  I have a much simpler Xen block-device implementation
which seems to perform OK, and is a lot less LOC than the in-tree one,
so I don't think the argument that "SCSI would be better than what's there" (while
possibly true) is valid.

Cheers!
Rusty.
-- 
 ccontrol: http://ozlabs.org/~rusty/ccontrol



* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-22 23:09 ` Jeff Garzik
  2006-03-24 12:17   ` Alan Cox
@ 2006-03-27 10:14   ` Peter Chubb
  1 sibling, 0 replies; 26+ messages in thread
From: Peter Chubb @ 2006-03-27 10:14 UTC
  To: Jeff Garzik
  Cc: Ian Pratt, Anthony Liguori, Chris Wright, virtualization,
	xen-devel, linux-kernel, Ian Pratt, ian.pratt, SCSI Mailing List

>>>>> "Jeff" == Jeff Garzik <jeff@garzik.org> writes:

Jeff> Ian Pratt wrote:
>> 
>> We certainly should be pushing everyone toward using the 'xdX' etc
>> devices that are allocated to us. However, the installers of
>> certain older distros and other user space tools won't accept
>> anything other than hdX/sdX, so it's useful from a compatibility POV
>> even if it never goes into mainline, which I agree it probably
>> shouldn't.

Jeff> Yes, this is true.  Red Hat installer guys grumbled at me when I
Jeff> wrote the 'sx8' block driver: since it wasn't hda/sda, they had
Jeff> to write special code for it, as they apparently must do for any
Jeff> new block driver "class".  SuSE and other distros are probably
Jeff> similar, since each block driver provides its own special
Jeff> behaviors and feature exports.

Jeff> I should have spoken up a long time ago about this, but anyway:

Jeff> An IBM hypervisor on ppc64 communicates using SCSI RPC messages.
Jeff> I think this would be quite nice for Xen, because SCSI (a) is a
Jeff> message-based model, and (b) implementing block using SCSI has a
Jeff> very high Just Works(tm) value which cannot be ignored.  And
Jeff> perhaps (c) SCSI target code already exists, so implementing the
Jeff> server side doesn't require starting from scratch, but rather
Jeff> simply connecting the Legos.

The IA64 virtualisation work (Xen and Linux-on-Linux) uses the SKI
simulator virtual scsi device --- which looks just like any other scsi
disk, but uses hypervisor calls to do read/write/open/close calls like
a user-mode process.  For performance, it needs to be extended a bit
to do asynchronous I/O and interrupt on completion.  As a halfway
house, the ski simscsi driver would be fairly easy to port, I think.

-- 
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au           ERTOS within National ICT Australia


* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-05-09  7:00 ` [RFC PATCH 35/35] Add Xen virtual block device driver Chris Wright
@ 2006-05-09 12:01   ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2006-05-09 12:01 UTC
  To: Chris Wright
  Cc: linux-kernel, virtualization, xen-devel, Ian Pratt, Christian Limpach

On Tue, May 09, 2006 at 12:00:35AM -0700, Chris Wright wrote:
> The block device frontend driver allows the kernel to access block
> devices exported by a virtual machine containing a physical
> block device driver.

Any reason you're using the old crappy xen I/O code instead of Rusty's
alternative version?

Also please stop this stupid front/back naming.  In Linux terminology the
frontend is the client if there's a need for a postfix at all, and the
backend the server.  Compare that to e.g. ibm vio.


* [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-05-09  8:49 [RFC PATCH 00/35] Xen i386 paravirtualization support Chris Wright
@ 2006-05-09  7:00 ` Chris Wright
  2006-05-09 12:01   ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Wright @ 2006-05-09  7:00 UTC
  To: linux-kernel; +Cc: virtualization, xen-devel, Ian Pratt, Christian Limpach

[-- Attachment #1: blkfront --]
[-- Type: text/plain, Size: 34125 bytes --]

The block device frontend driver allows the kernel to access block
devices exported by a virtual machine containing a physical
block device driver.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 drivers/block/Kconfig           |    2 
 drivers/xen/Kconfig.blk         |   14 
 drivers/xen/Makefile            |    1 
 drivers/xen/blkfront/Makefile   |    5 
 drivers/xen/blkfront/blkfront.c |  813 ++++++++++++++++++++++++++++++++++++++++
 drivers/xen/blkfront/block.h    |  156 +++++++
 drivers/xen/blkfront/vbd.c      |  216 ++++++++++
 7 files changed, 1207 insertions(+)

--- linus-2.6.orig/drivers/block/Kconfig
+++ linus-2.6/drivers/block/Kconfig
@@ -449,6 +449,8 @@ config CDROM_PKTCDVD_WCACHE
 
 source "drivers/s390/block/Kconfig"
 
+source "drivers/xen/Kconfig.blk"
+
 config ATA_OVER_ETH
 	tristate "ATA over Ethernet support"
 	depends on NET
--- linus-2.6.orig/drivers/xen/Makefile
+++ linus-2.6/drivers/xen/Makefile
@@ -6,5 +6,6 @@ obj-y	+= core/
 obj-y	+= console/
 obj-y	+= xenbus/
 
+obj-$(CONFIG_XEN_BLKDEV_FRONTEND)	+= blkfront/
 obj-$(CONFIG_XEN_NETDEV_FRONTEND)	+= netfront/
 
--- /dev/null
+++ linus-2.6/drivers/xen/Kconfig.blk
@@ -0,0 +1,14 @@
+menu "Xen block device drivers"
+        depends on XEN
+
+config XEN_BLKDEV_FRONTEND
+	tristate "Block device frontend driver"
+	depends on XEN
+	default y
+	help
+	  The block device frontend driver allows the kernel to access block
+	  devices exported from a device driver virtual machine. Unless you
+	  are building a dedicated device driver virtual machine, you almost
+	  certainly want to say Y here.
+
+endmenu
--- /dev/null
+++ linus-2.6/drivers/xen/blkfront/Makefile
@@ -0,0 +1,5 @@
+
+obj-$(CONFIG_XEN_BLKDEV_FRONTEND)	+= xenblk.o
+
+xenblk-objs := blkfront.o vbd.o
+
--- /dev/null
+++ linus-2.6/drivers/xen/blkfront/blkfront.c
@@ -0,0 +1,813 @@
+/******************************************************************************
+ * blkfront.c
+ * 
+ * XenLinux virtual block device driver.
+ * 
+ * Copyright (c) 2003-2004, Keir Fraser & Steve Hand
+ * Modifications by Mark A. Williamson are (c) Intel Research Cambridge
+ * Copyright (c) 2004, Christian Limpach
+ * Copyright (c) 2004, Andrew Warfield
+ * Copyright (c) 2005, Christopher Clark
+ * Copyright (c) 2005, XenSource Ltd
+ * 
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/version.h>
+#include "block.h"
+#include <linux/cdrom.h>
+#include <linux/sched.h>
+#include <linux/interrupt.h>
+#include <scsi/scsi.h>
+#include <xen/evtchn.h>
+#include <xen/xenbus.h>
+#include <xen/interface/grant_table.h>
+#include <xen/gnttab.h>
+#include <asm/hypervisor.h>
+
+#define BLKIF_STATE_DISCONNECTED 0
+#define BLKIF_STATE_CONNECTED    1
+#define BLKIF_STATE_SUSPENDED    2
+
+#define MAXIMUM_OUTSTANDING_BLOCK_REQS \
+    (BLKIF_MAX_SEGMENTS_PER_REQUEST * BLK_RING_SIZE)
+#define GRANT_INVALID_REF	0
+
+static void connect(struct blkfront_info *);
+static void blkfront_closing(struct xenbus_device *);
+static int blkfront_remove(struct xenbus_device *);
+static int talk_to_backend(struct xenbus_device *, struct blkfront_info *);
+static int setup_blkring(struct xenbus_device *, struct blkfront_info *);
+
+static void kick_pending_request_queues(struct blkfront_info *);
+
+static irqreturn_t blkif_int(int irq, void *dev_id, struct pt_regs *ptregs);
+static void blkif_restart_queue(void *arg);
+static void blkif_recover(struct blkfront_info *);
+static void blkif_completion(struct blk_shadow *);
+static void blkif_free(struct blkfront_info *, int);
+
+
+/**
+ * Entry point to this code when a new device is created.  Allocate the basic
+ * structures and the ring buffer for communication with the backend, and
+ * inform the backend of the appropriate details for those.  Switch to
+ * Initialised state.
+ */
+static int blkfront_probe(struct xenbus_device *dev,
+			  const struct xenbus_device_id *id)
+{
+	int err, vdevice, i;
+	struct blkfront_info *info;
+
+	/* FIXME: Use dynamic device id if this is not set. */
+	err = xenbus_scanf(XBT_NULL, dev->nodename,
+			   "virtual-device", "%i", &vdevice);
+	if (err != 1) {
+		xenbus_dev_fatal(dev, err, "reading virtual-device");
+		return err;
+	}
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info) {
+		xenbus_dev_fatal(dev, -ENOMEM, "allocating info structure");
+		return -ENOMEM;
+	}
+
+	info->xbdev = dev;
+	info->vdevice = vdevice;
+	info->connected = BLKIF_STATE_DISCONNECTED;
+	INIT_WORK(&info->work, blkif_restart_queue, (void *)info);
+
+	for (i = 0; i < BLK_RING_SIZE; i++)
+		info->shadow[i].req.id = i+1;
+	info->shadow[BLK_RING_SIZE-1].req.id = 0x0fffffff;
+
+	/* Front end dir is a number, which is used as the id. */
+	info->handle = simple_strtoul(strrchr(dev->nodename,'/')+1, NULL, 0);
+	dev->data = info;
+
+	err = talk_to_backend(dev, info);
+	if (err) {
+		kfree(info);
+		dev->data = NULL;
+		return err;
+	}
+
+	return 0;
+}
+
+
+/**
+ * We are reconnecting to the backend, due to a suspend/resume, or a backend
+ * driver restart.  We tear down our blkif structure and recreate it, but
+ * leave the device-layer structures intact so that this is transparent to the
+ * rest of the kernel.
+ */
+static int blkfront_resume(struct xenbus_device *dev)
+{
+	struct blkfront_info *info = dev->data;
+	int err;
+
+	DPRINTK("blkfront_resume: %s\n", dev->nodename);
+
+	blkif_free(info, 1);
+
+	err = talk_to_backend(dev, info);
+	if (!err)
+		blkif_recover(info);
+
+	return err;
+}
+
+
+/* Common code used when first setting up, and when resuming. */
+static int talk_to_backend(struct xenbus_device *dev,
+			   struct blkfront_info *info)
+{
+	const char *message = NULL;
+	xenbus_transaction_t xbt;
+	int err;
+
+	/* Create shared ring, alloc event channel. */
+	err = setup_blkring(dev, info);
+	if (err)
+		goto out;
+
+again:
+	err = xenbus_transaction_start(&xbt);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "starting transaction");
+		goto destroy_blkring;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename,
+			    "ring-ref","%u", info->ring_ref);
+	if (err) {
+		message = "writing ring-ref";
+		goto abort_transaction;
+	}
+	err = xenbus_printf(xbt, dev->nodename,
+			    "event-channel", "%u", info->evtchn);
+	if (err) {
+		message = "writing event-channel";
+		goto abort_transaction;
+	}
+
+	err = xenbus_switch_state(dev, xbt, XenbusStateInitialised);
+	if (err)
+		goto abort_transaction;
+
+	err = xenbus_transaction_end(xbt, 0);
+	if (err) {
+		if (err == -EAGAIN)
+			goto again;
+		xenbus_dev_fatal(dev, err, "completing transaction");
+		goto destroy_blkring;
+	}
+
+	return 0;
+
+ abort_transaction:
+	xenbus_transaction_end(xbt, 1);
+	if (message)
+		xenbus_dev_fatal(dev, err, "%s", message);
+ destroy_blkring:
+	blkif_free(info, 0);
+ out:
+	return err;
+}
+
+
+static int setup_blkring(struct xenbus_device *dev,
+			 struct blkfront_info *info)
+{
+	struct blkif_sring *sring;
+	int err;
+
+	info->ring_ref = GRANT_INVALID_REF;
+
+	sring = (struct blkif_sring *)__get_free_page(GFP_KERNEL);
+	if (!sring) {
+		xenbus_dev_fatal(dev, -ENOMEM, "allocating shared ring");
+		return -ENOMEM;
+	}
+	SHARED_RING_INIT(sring);
+	FRONT_RING_INIT(&info->ring, sring, PAGE_SIZE);
+
+	err = xenbus_grant_ring(dev, virt_to_mfn(info->ring.sring));
+	if (err < 0) {
+		free_page((unsigned long)sring);
+		info->ring.sring = NULL;
+		goto fail;
+	}
+	info->ring_ref = err;
+
+	err = xenbus_alloc_evtchn(dev, &info->evtchn);
+	if (err)
+		goto fail;
+
+	err = bind_evtchn_to_irqhandler(
+		info->evtchn, blkif_int, SA_SAMPLE_RANDOM, "blkif", info);
+	if (err <= 0) {
+		xenbus_dev_fatal(dev, err,
+				 "bind_evtchn_to_irqhandler failed");
+		goto fail;
+	}
+	info->irq = err;
+
+	return 0;
+fail:
+	blkif_free(info, 0);
+	return err;
+}
+
+
+/**
+ * Callback received when the backend's state changes.
+ */
+static void backend_changed(struct xenbus_device *dev,
+			    XenbusState backend_state)
+{
+	struct blkfront_info *info = dev->data;
+	struct block_device *bd;
+
+	DPRINTK("blkfront:backend_changed.\n");
+
+	switch (backend_state) {
+	case XenbusStateUnknown:
+	case XenbusStateInitialising:
+	case XenbusStateInitWait:
+	case XenbusStateInitialised:
+	case XenbusStateClosed:
+		break;
+
+	case XenbusStateConnected:
+		connect(info);
+		break;
+	case XenbusStateClosing:
+		bd = bdget(info->dev);
+		if (bd == NULL) {
+			xenbus_dev_fatal(dev, -ENODEV, "bdget failed");
+			break;
+		}
+		mutex_lock(&bd->bd_mutex);
+		if (info->users > 0)
+			xenbus_dev_error(dev, -EBUSY,
+					 "Device in use; refusing to close");
+		else
+			blkfront_closing(dev);
+		mutex_unlock(&bd->bd_mutex);
+		bdput(bd);
+		break;
+	}
+}
+
+
+/* ** Connection ** */
+
+
+/*
+ * Invoked when the backend is finally 'ready' (and has produced
+ * the details about the physical device - #sectors, size, etc).
+ */
+static void connect(struct blkfront_info *info)
+{
+	unsigned long sectors, sector_size;
+	unsigned int binfo;
+	int err;
+
+	if ((info->connected == BLKIF_STATE_CONNECTED) ||
+	    (info->connected == BLKIF_STATE_SUSPENDED) )
+		return;
+
+	DPRINTK("blkfront.c:connect:%s.\n", info->xbdev->otherend);
+
+	err = xenbus_gather(XBT_NULL, info->xbdev->otherend,
+			    "sectors", "%lu", &sectors,
+			    "info", "%u", &binfo,
+			    "sector-size", "%lu", &sector_size,
+			    NULL);
+	if (err) {
+		xenbus_dev_fatal(info->xbdev, err,
+				 "reading backend fields at %s",
+				 info->xbdev->otherend);
+		return;
+	}
+
+	err = xlvbd_add(sectors, info->vdevice, binfo, sector_size, info);
+	if (err) {
+		xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s",
+		                 info->xbdev->otherend);
+		return;
+	}
+
+	(void)xenbus_switch_state(info->xbdev, XBT_NULL, XenbusStateConnected);
+
+	/* Kick pending requests. */
+	spin_lock_irq(&blkif_io_lock);
+	info->connected = BLKIF_STATE_CONNECTED;
+	kick_pending_request_queues(info);
+	spin_unlock_irq(&blkif_io_lock);
+
+	add_disk(info->gd);
+}
+
+/**
+ * Handle the change of state of the backend to Closing.  We must delete our
+ * device-layer structures now, to ensure that writes are flushed through to
+ * the backend.  Once this is done, we can switch to Closed in
+ * acknowledgement.
+ */
+static void blkfront_closing(struct xenbus_device *dev)
+{
+	struct blkfront_info *info = dev->data;
+
+	DPRINTK("blkfront_closing: %s removed\n", dev->nodename);
+
+	xlvbd_del(info);
+
+	xenbus_switch_state(dev, XBT_NULL, XenbusStateClosed);
+}
+
+
+static int blkfront_remove(struct xenbus_device *dev)
+{
+	struct blkfront_info *info = dev->data;
+
+	DPRINTK("blkfront_remove: %s removed\n", dev->nodename);
+
+	blkif_free(info, 0);
+
+	kfree(info);
+
+	return 0;
+}
+
+
+static inline int GET_ID_FROM_FREELIST(
+	struct blkfront_info *info)
+{
+	unsigned long free = info->shadow_free;
+	BUG_ON(free >= BLK_RING_SIZE);
+	info->shadow_free = info->shadow[free].req.id;
+	info->shadow[free].req.id = 0x0fffffee; /* debug */
+	return free;
+}
+
+static inline void ADD_ID_TO_FREELIST(
+	struct blkfront_info *info, unsigned long id)
+{
+	info->shadow[id].req.id  = info->shadow_free;
+	info->shadow[id].request = 0;
+	info->shadow_free = id;
+}
+
+static inline void flush_requests(struct blkfront_info *info)
+{
+	int notify;
+
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&info->ring, notify);
+
+	if (notify)
+		notify_remote_via_irq(info->irq);
+}
+
+static void kick_pending_request_queues(struct blkfront_info *info)
+{
+	if (!RING_FULL(&info->ring)) {
+		/* Re-enable calldowns. */
+		blk_start_queue(info->rq);
+		/* Kick things off immediately. */
+		do_blkif_request(info->rq);
+	}
+}
+
+static void blkif_restart_queue(void *arg)
+{
+	struct blkfront_info *info = (struct blkfront_info *)arg;
+	spin_lock_irq(&blkif_io_lock);
+	kick_pending_request_queues(info);
+	spin_unlock_irq(&blkif_io_lock);
+}
+
+static void blkif_restart_queue_callback(void *arg)
+{
+	struct blkfront_info *info = (struct blkfront_info *)arg;
+	schedule_work(&info->work);
+}
+
+int blkif_open(struct inode *inode, struct file *filep)
+{
+	struct blkfront_info *info = inode->i_bdev->bd_disk->private_data;
+	info->users++;
+	return 0;
+}
+
+
+int blkif_release(struct inode *inode, struct file *filep)
+{
+	struct blkfront_info *info = inode->i_bdev->bd_disk->private_data;
+	info->users--;
+	if (info->users == 0) {
+		/* Check whether we have been instructed to close.  We will
+		   have ignored this request initially, as the device was
+		   still mounted. */
+		struct xenbus_device * dev = info->xbdev;
+		XenbusState state = xenbus_read_driver_state(dev->otherend);
+
+		if (state == XenbusStateClosing)
+			blkfront_closing(dev);
+	}
+	return 0;
+}
+
+
+int blkif_ioctl(struct inode *inode, struct file *filep,
+                unsigned command, unsigned long argument)
+{
+	int i;
+
+	DPRINTK_IOCTL("command: 0x%x, argument: 0x%lx, dev: 0x%04x\n",
+		      command, (long)argument, inode->i_rdev);
+
+	switch (command) {
+	case HDIO_GETGEO:
+		/* return ENOSYS to use defaults */
+		return -ENOSYS;
+
+	case CDROMMULTISESSION:
+		DPRINTK("FIXME: support multisession CDs later\n");
+		for (i = 0; i < sizeof(struct cdrom_multisession); i++)
+			if (put_user(0, (char __user *)(argument + i)))
+				return -EFAULT;
+		return 0;
+
+	default:
+		/*printk(KERN_ALERT "ioctl %08x not supported by Xen blkdev\n",
+		  command);*/
+		return -EINVAL; /* same return as native Linux */
+	}
+
+	return 0;
+}
+
+/*
+ * blkif_queue_request
+ *
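+ * Turn a block-layer request into a ring request, granting the backend
+ * access to each page segment and keeping a shadow copy for recovery.
+ *
+ * Returns 0 on success, or 1 if the request could not be queued (the
+ * device is not connected, or no grant references are free) and should
+ * be requeued by the caller.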
+ */
+static int blkif_queue_request(struct request *req)
+{
+	struct blkfront_info *info = req->rq_disk->private_data;
+	unsigned long buffer_mfn;
+	struct blkif_request *ring_req;
+	struct bio *bio;
+	struct bio_vec *bvec;
+	int idx;
+	unsigned long id;
+	unsigned int fsect, lsect;
+	int ref;
+	grant_ref_t gref_head;
+
+	if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
+		return 1;
+
+	if (gnttab_alloc_grant_references(
+		BLKIF_MAX_SEGMENTS_PER_REQUEST, &gref_head) < 0) {
+		gnttab_request_free_callback(
+			&info->callback,
+			blkif_restart_queue_callback,
+			info,
+			BLKIF_MAX_SEGMENTS_PER_REQUEST);
+		return 1;
+	}
+
+	/* Fill out a communications ring structure. */
+	ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
+	id = GET_ID_FROM_FREELIST(info);
+	info->shadow[id].request = (unsigned long)req;
+
+	ring_req->id = id;
+	ring_req->operation = rq_data_dir(req) ?
+		BLKIF_OP_WRITE : BLKIF_OP_READ;
+	ring_req->sector_number = (blkif_sector_t)req->sector;
+	ring_req->handle = info->handle;
+
+	ring_req->nr_segments = 0;
+	rq_for_each_bio (bio, req) {
+		bio_for_each_segment (bvec, bio, idx) {
+			BUG_ON(ring_req->nr_segments
+			       == BLKIF_MAX_SEGMENTS_PER_REQUEST);
+			buffer_mfn = page_to_phys(bvec->bv_page) >> PAGE_SHIFT;
+			fsect = bvec->bv_offset >> 9;
+			lsect = fsect + (bvec->bv_len >> 9) - 1;
+			/* install a grant reference. */
+			ref = gnttab_claim_grant_reference(&gref_head);
+			BUG_ON(ref == -ENOSPC);
+
+			gnttab_grant_foreign_access_ref(
+				ref,
+				info->xbdev->otherend_id,
+				buffer_mfn,
+				rq_data_dir(req) );
+
+			info->shadow[id].frame[ring_req->nr_segments] =
+				mfn_to_pfn(buffer_mfn);
+
+			ring_req->seg[ring_req->nr_segments] =
+				(struct blkif_request_segment) {
+					.gref       = ref,
+					.first_sect = fsect,
+					.last_sect  = lsect };
+
+			ring_req->nr_segments++;
+		}
+	}
+
+	info->ring.req_prod_pvt++;
+
+	/* Keep a private copy so we can reissue requests when recovering. */
+	info->shadow[id].req = *ring_req;
+
+	gnttab_free_grant_references(gref_head);
+
+	return 0;
+}
+
+/*
+ * do_blkif_request
+ *  read a block; request is in a request queue
+ */
+void do_blkif_request(request_queue_t *rq)
+{
+	struct blkfront_info *info = NULL;
+	struct request *req;
+	int queued;
+
+	DPRINTK("Entered do_blkif_request\n");
+
+	queued = 0;
+
+	while ((req = elv_next_request(rq)) != NULL) {
+		info = req->rq_disk->private_data;
+		if (!blk_fs_request(req)) {
+			end_request(req, 0);
+			continue;
+		}
+
+		if (RING_FULL(&info->ring))
+			goto wait;
+
+		DPRINTK("do_blk_req %p: cmd %p, sec %lx, "
+			"(%u/%li) buffer:%p [%s]\n",
+			req, req->cmd, req->sector, req->current_nr_sectors,
+			req->nr_sectors, req->buffer,
+			rq_data_dir(req) ? "write" : "read");
+
+
+		blkdev_dequeue_request(req);
+		if (blkif_queue_request(req)) {
+			blk_requeue_request(rq, req);
+		wait:
+			/* Avoid pointless unplugs. */
+			blk_stop_queue(rq);
+			break;
+		}
+
+		queued++;
+	}
+
+	if (queued != 0)
+		flush_requests(info);
+}
+
+
+static irqreturn_t blkif_int(int irq, void *dev_id, struct pt_regs *ptregs)
+{
+	struct request *req;
+	struct blkif_response *bret;
+	RING_IDX i, rp;
+	unsigned long flags;
+	struct blkfront_info *info = (struct blkfront_info *)dev_id;
+
+	spin_lock_irqsave(&blkif_io_lock, flags);
+
+	if (unlikely(info->connected != BLKIF_STATE_CONNECTED)) {
+		spin_unlock_irqrestore(&blkif_io_lock, flags);
+		return IRQ_HANDLED;
+	}
+
+ again:
+	rp = info->ring.sring->rsp_prod;
+	rmb(); /* Ensure we see queued responses up to 'rp'. */
+
+	for (i = info->ring.rsp_cons; i != rp; i++) {
+		unsigned long id;
+		int ret;
+
+		bret = RING_GET_RESPONSE(&info->ring, i);
+		id   = bret->id;
+		req  = (struct request *)info->shadow[id].request;
+
+		blkif_completion(&info->shadow[id]);
+
+		ADD_ID_TO_FREELIST(info, id);
+
+		switch (bret->operation) {
+		case BLKIF_OP_READ:
+		case BLKIF_OP_WRITE:
+			if (unlikely(bret->status != BLKIF_RSP_OKAY))
+				DPRINTK("Bad return from blkdev data "
+					"request: %x\n", bret->status);
+
+			ret = end_that_request_first(
+				req, (bret->status == BLKIF_RSP_OKAY),
+				req->hard_nr_sectors);
+			BUG_ON(ret);
+			end_that_request_last(
+				req, (bret->status == BLKIF_RSP_OKAY));
+			break;
+		default:
+			BUG();
+		}
+	}
+
+	info->ring.rsp_cons = i;
+
+	if (i != info->ring.req_prod_pvt) {
+		int more_to_do;
+		RING_FINAL_CHECK_FOR_RESPONSES(&info->ring, more_to_do);
+		if (more_to_do)
+			goto again;
+	} else
+		info->ring.sring->rsp_event = i + 1;
+
+	kick_pending_request_queues(info);
+
+	spin_unlock_irqrestore(&blkif_io_lock, flags);
+
+	return IRQ_HANDLED;
+}
+
+static void blkif_free(struct blkfront_info *info, int suspend)
+{
+	/* Prevent new requests being issued until we fix things up. */
+	spin_lock_irq(&blkif_io_lock);
+	info->connected = suspend ?
+		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
+	spin_unlock_irq(&blkif_io_lock);
+
+	/* Free resources associated with old device channel. */
+	if (info->ring_ref != GRANT_INVALID_REF) {
+		gnttab_end_foreign_access(info->ring_ref, 0,
+					  (unsigned long)info->ring.sring);
+		info->ring_ref = GRANT_INVALID_REF;
+		info->ring.sring = NULL;
+	}
+	if (info->irq)
+		unbind_from_irqhandler(info->irq, info);
+	info->evtchn = info->irq = 0;
+
+}
+
+static void blkif_completion(struct blk_shadow *s)
+{
+	int i;
+	for (i = 0; i < s->req.nr_segments; i++)
+		gnttab_end_foreign_access(s->req.seg[i].gref, 0, 0UL);
+}
+
+static void blkif_recover(struct blkfront_info *info)
+{
+	int i;
+	struct blkif_request *req;
+	struct blk_shadow *copy;
+	int j;
+
+	/* Stage 1: Make a safe copy of the shadow state. */
+	copy = kmalloc(sizeof(info->shadow), GFP_KERNEL | __GFP_NOFAIL);
+	memcpy(copy, info->shadow, sizeof(info->shadow));
+
+	/* Stage 2: Set up free list. */
+	memset(&info->shadow, 0, sizeof(info->shadow));
+	for (i = 0; i < BLK_RING_SIZE; i++)
+		info->shadow[i].req.id = i+1;
+	info->shadow_free = info->ring.req_prod_pvt;
+	info->shadow[BLK_RING_SIZE-1].req.id = 0x0fffffff;
+
+	/* Stage 3: Find pending requests and requeue them. */
+	for (i = 0; i < BLK_RING_SIZE; i++) {
+		/* Not in use? */
+		if (copy[i].request == 0)
+			continue;
+
+		/* Grab a request slot and copy shadow state into it. */
+		req = RING_GET_REQUEST(
+			&info->ring, info->ring.req_prod_pvt);
+		*req = copy[i].req;
+
+		/* We get a new request id, and must reset the shadow state. */
+		req->id = GET_ID_FROM_FREELIST(info);
+		memcpy(&info->shadow[req->id], &copy[i], sizeof(copy[i]));
+
+		/* Rewrite any grant references invalidated by susp/resume. */
+		for (j = 0; j < req->nr_segments; j++)
+			gnttab_grant_foreign_access_ref(
+				req->seg[j].gref,
+				info->xbdev->otherend_id,
+				pfn_to_mfn(info->shadow[req->id].frame[j]),
+				rq_data_dir(
+					(struct request *)
+					info->shadow[req->id].request));
+		info->shadow[req->id].req = *req;
+
+		info->ring.req_prod_pvt++;
+	}
+
+	kfree(copy);
+
+	(void)xenbus_switch_state(info->xbdev, XBT_NULL, XenbusStateConnected);
+
+	/* Now safe for us to use the shared ring */
+	spin_lock_irq(&blkif_io_lock);
+	info->connected = BLKIF_STATE_CONNECTED;
+	spin_unlock_irq(&blkif_io_lock);
+
+	/* Send off requeued requests */
+	flush_requests(info);
+
+	/* Kick any other new requests queued since we resumed */
+	spin_lock_irq(&blkif_io_lock);
+	kick_pending_request_queues(info);
+	spin_unlock_irq(&blkif_io_lock);
+}
+
+
+/* ** Driver Registration ** */
+
+
+static struct xenbus_device_id blkfront_ids[] = {
+	{ "vbd" },
+	{ "" }
+};
+
+
+static struct xenbus_driver blkfront = {
+	.name = "vbd",
+	.owner = THIS_MODULE,
+	.ids = blkfront_ids,
+	.probe = blkfront_probe,
+	.remove = blkfront_remove,
+	.resume = blkfront_resume,
+	.otherend_changed = backend_changed,
+};
+
+
+static int __init xlblk_init(void)
+{
+	if (xen_init() < 0)
+		return -ENODEV;
+
+	if (xlvbd_alloc_major() < 0)
+		return -ENODEV;
+
+	return xenbus_register_frontend(&blkfront);
+}
+module_init(xlblk_init);
+
+
+static void __exit xlblk_exit(void)
+{
+	xenbus_unregister_driver(&blkfront);
+}
+module_exit(xlblk_exit);
+
+MODULE_LICENSE("Dual BSD/GPL");
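
For readers unfamiliar with the Xen ring macros: flush_requests() above only kicks the event channel when RING_PUSH_REQUESTS_AND_CHECK_NOTIFY says the backend asked to be woken, and blkif_int() re-arms its own wakeup by setting rsp_event = i + 1. As far as I understand Xen's io/ring.h, both directions use the same wrap-safe comparison on free-running 32-bit indices. The standalone sketch below models only that comparison; need_notify() and ring_idx_t are hypothetical names, not part of the Xen headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for Xen's RING_IDX: a free-running counter that may wrap. */
typedef uint32_t ring_idx_t;

/*
 * Return true if the other end must be notified after the producer
 * index advanced from old_prod to new_prod. 'event' is the index the
 * consumer published (req_event or rsp_event), meaning "wake me once
 * the producer index reaches this value". Notification is needed
 * exactly when event falls in the window (old_prod, new_prod]; doing
 * the comparison on unsigned differences keeps it correct across wrap.
 */
static bool need_notify(ring_idx_t old_prod, ring_idx_t new_prod,
			ring_idx_t event)
{
	return (ring_idx_t)(new_prod - event) <
	       (ring_idx_t)(new_prod - old_prod);
}
```

For example, with old_prod = 5 and new_prod = 8, a consumer whose event index is 6, 7 or 8 is notified, while one whose event index is 5 is not, because the producer index had already reached 5 before this push.
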
--- /dev/null
+++ linus-2.6/drivers/xen/blkfront/block.h
@@ -0,0 +1,156 @@
+/******************************************************************************
+ * block.h
+ * 
+ * Shared definitions between all levels of XenLinux Virtual block devices.
+ * 
+ * Copyright (c) 2003-2004, Keir Fraser & Steve Hand
+ * Modifications by Mark A. Williamson are (c) Intel Research Cambridge
+ * Copyright (c) 2004-2005, Christian Limpach
+ * 
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __XEN_DRIVERS_BLOCK_H__
+#define __XEN_DRIVERS_BLOCK_H__
+
+#include <linux/config.h>
+#include <linux/version.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/hdreg.h>
+#include <linux/blkdev.h>
+#include <linux/major.h>
+#include <linux/devfs_fs_kernel.h>
+#include <asm/hypervisor.h>
+#include <xen/xenbus.h>
+#include <xen/gnttab.h>
+#include <xen/interface/xen.h>
+#include <xen/interface/io/blkif.h>
+#include <xen/interface/io/ring.h>
+#include <asm/io.h>
+#include <asm/atomic.h>
+#include <asm/uaccess.h>
+
+#if 1
+#define IPRINTK(fmt, args...) \
+    printk(KERN_INFO "xen_blk: " fmt, ##args)
+#else
+#define IPRINTK(fmt, args...) ((void)0)
+#endif
+
+#if 1
+#define WPRINTK(fmt, args...) \
+    printk(KERN_WARNING "xen_blk: " fmt, ##args)
+#else
+#define WPRINTK(fmt, args...) ((void)0)
+#endif
+
+#define DPRINTK(_f, _a...) pr_debug(_f, ## _a)
+
+#if 0
+#define DPRINTK_IOCTL(_f, _a...) printk(KERN_ALERT _f, ## _a)
+#else
+#define DPRINTK_IOCTL(_f, _a...) ((void)0)
+#endif
+
+struct xlbd_type_info
+{
+	int partn_shift;
+	int disks_per_major;
+	char *devname;
+	char *diskname;
+};
+
+struct xlbd_major_info
+{
+	int major;
+	int index;
+	int usage;
+	struct xlbd_type_info *type;
+};
+
+struct blk_shadow {
+	struct blkif_request req;
+	unsigned long request;
+	unsigned long frame[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+};
+
+#define BLK_RING_SIZE __RING_SIZE((struct blkif_sring *)0, PAGE_SIZE)
+
+/*
+ * We have one of these per vbd, whether ide, scsi or 'other'.  They
+ * hang in private_data off the gendisk structure. We may end up
+ * putting all kinds of interesting stuff here :-)
+ */
+struct blkfront_info
+{
+	struct xenbus_device *xbdev;
+	dev_t dev;
+	struct gendisk *gd;
+	int vdevice;
+	blkif_vdev_t handle;
+	int connected;
+	int ring_ref;
+	struct blkif_front_ring ring;
+	unsigned int evtchn, irq;
+	struct xlbd_major_info *mi;
+	request_queue_t *rq;
+	struct work_struct work;
+	struct gnttab_free_callback callback;
+	struct blk_shadow shadow[BLK_RING_SIZE];
+	unsigned long shadow_free;
+
+	/**
+	 * The number of people holding this device open.  We won't allow a
+	 * hot-unplug unless this is 0.
+	 */
+	int users;
+};
+
+extern spinlock_t blkif_io_lock;
+
+extern int blkif_open(struct inode *inode, struct file *filep);
+extern int blkif_release(struct inode *inode, struct file *filep);
+extern int blkif_ioctl(struct inode *inode, struct file *filep,
+                       unsigned command, unsigned long argument);
+extern int blkif_check(dev_t dev);
+extern int blkif_revalidate(dev_t dev);
+extern void do_blkif_request (request_queue_t *rq);
+
+/* Virtual block device subsystem. */
+int xlvbd_alloc_major(void);
+/* Note that xlvbd_add doesn't call add_disk for you: you're expected
+   to call add_disk on info->gd once the disk is properly connected
+   up. */
+int xlvbd_add(blkif_sector_t capacity, int device,
+	      u16 vdisk_info, u16 sector_size, struct blkfront_info *info);
+void xlvbd_del(struct blkfront_info *info);
+
+#endif /* __XEN_DRIVERS_BLOCK_H__ */
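
The shadow array declared above doubles as a free list of request ids: GET_ID_FROM_FREELIST() and ADD_ID_TO_FREELIST() in blkfront.c thread unused slots through their req.id field, with shadow_free as the head and 0x0fffffff as the end marker. A simplified userspace model of that LIFO list follows, assuming a hypothetical ring of four entries and keeping only the fields the list touches; struct shadow, shadow_init(), get_id() and put_id() are illustrative names, not driver code.

```c
#include <assert.h>

#define RING_SIZE    4            /* hypothetical; the driver uses BLK_RING_SIZE */
#define FREELIST_END 0x0fffffffUL /* end marker written by blkfront_probe() */

/* Only the parts of struct blk_shadow that the free list uses. */
struct shadow {
	unsigned long next;     /* req.id: next free slot while unused */
	unsigned long request;  /* non-zero while a request is in flight */
};

struct shadow_table {
	struct shadow slot[RING_SIZE];
	unsigned long free;     /* shadow_free: head of the free list */
};

/* Chain every slot: 0 -> 1 -> ... -> RING_SIZE-1 -> FREELIST_END. */
static void shadow_init(struct shadow_table *t)
{
	for (unsigned long i = 0; i < RING_SIZE; i++) {
		t->slot[i].next = i + 1;
		t->slot[i].request = 0;
	}
	t->slot[RING_SIZE - 1].next = FREELIST_END;
	t->free = 0;
}

/* Pop the head id; plays the role of GET_ID_FROM_FREELIST(). */
static unsigned long get_id(struct shadow_table *t)
{
	unsigned long id = t->free;

	assert(id < RING_SIZE);  /* an empty list would yield FREELIST_END */
	t->free = t->slot[id].next;
	return id;
}

/* Push id back as the new head; plays the role of ADD_ID_TO_FREELIST(). */
static void put_id(struct shadow_table *t, unsigned long id)
{
	t->slot[id].next = t->free;
	t->slot[id].request = 0;
	t->free = id;
}
```

Since do_blkif_request() stops queueing once RING_FULL() is true, and an id only goes back on the list after its response has been consumed, the number of ids in use should never exceed BLK_RING_SIZE, which is why the pop has no empty-list fallback.
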
--- /dev/null
+++ linus-2.6/drivers/xen/blkfront/vbd.c
@@ -0,0 +1,216 @@
+/******************************************************************************
+ * vbd.c
+ * 
+ * XenLinux virtual block device driver (xvd).
+ * 
+ * Copyright (c) 2003-2004, Keir Fraser & Steve Hand
+ * Modifications by Mark A. Williamson are (c) Intel Research Cambridge
+ * Copyright (c) 2004-2005, Christian Limpach
+ * 
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "block.h"
+#include <linux/blkdev.h>
+#include <linux/list.h>
+
+#define BLKIF_MAJOR(dev) ((dev)>>8)
+#define BLKIF_MINOR(dev) ((dev) & 0xff)
+
+static struct xlbd_type_info xvd_type_info = {
+	.partn_shift = 4,
+	.disks_per_major = 16,
+	.devname = "xvd",
+	.diskname = "xvd"
+};
+
+static struct xlbd_major_info xvd_major_info = {
+	.major = 201,
+	.type = &xvd_type_info
+};
+
+/* Information about our VBDs. */
+#define MAX_VBDS 64
+static LIST_HEAD(vbds_list);
+
+static struct block_device_operations xlvbd_block_fops =
+{
+	.owner = THIS_MODULE,
+	.open = blkif_open,
+	.release = blkif_release,
+	.ioctl  = blkif_ioctl,
+};
+
+DEFINE_SPINLOCK(blkif_io_lock);
+
+int
+xlvbd_alloc_major(void)
+{
+	IPRINTK("registering block device major %i\n", xvd_major_info.major);
+	if (register_blkdev(xvd_major_info.major,
+			    xvd_major_info.type->devname)) {
+		WPRINTK("can't get major %d with name %s\n",
+			xvd_major_info.major, xvd_major_info.type->devname);
+		return -1;
+	}
+
+	devfs_mk_dir(xvd_major_info.type->devname);
+	return 0;
+}
+
+static int
+xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size)
+{
+	request_queue_t *rq;
+
+	rq = blk_init_queue(do_blkif_request, &blkif_io_lock);
+	if (rq == NULL)
+		return -1;
+
+	elevator_init(rq, "noop");
+
+	/* Hard sector size and max sectors impersonate the equiv. hardware. */
+	blk_queue_hardsect_size(rq, sector_size);
+	blk_queue_max_sectors(rq, 512);
+
+	/* Each segment in a request is up to an aligned page in size. */
+	blk_queue_segment_boundary(rq, PAGE_SIZE - 1);
+	blk_queue_max_segment_size(rq, PAGE_SIZE);
+
+	/* Ensure a merged request will fit in a single I/O ring slot. */
+	blk_queue_max_phys_segments(rq, BLKIF_MAX_SEGMENTS_PER_REQUEST);
+	blk_queue_max_hw_segments(rq, BLKIF_MAX_SEGMENTS_PER_REQUEST);
+
+	/* Make sure buffer addresses are sector-aligned. */
+	blk_queue_dma_alignment(rq, 511);
+
+	gd->queue = rq;
+
+	return 0;
+}
+
+static int
+xlvbd_alloc_gendisk(int minor, blkif_sector_t capacity, int vdevice,
+		    u16 vdisk_info, u16 sector_size,
+		    struct blkfront_info *info)
+{
+	struct gendisk *gd;
+	struct xlbd_major_info *mi;
+	int nr_minors = 1;
+	int err = -ENODEV;
+
+	BUG_ON(info->gd != NULL);
+	BUG_ON(info->mi != NULL);
+	BUG_ON(info->rq != NULL);
+
+	mi = &xvd_major_info;
+	info->mi = mi;
+
+	if ((minor & ((1 << mi->type->partn_shift) - 1)) == 0)
+		nr_minors = 1 << mi->type->partn_shift;
+
+	gd = alloc_disk(nr_minors);
+	if (gd == NULL)
+		goto out;
+
+	if (nr_minors > 1)
+		sprintf(gd->disk_name, "%s%c", mi->type->diskname,
+			'a' + mi->index * mi->type->disks_per_major +
+			(minor >> mi->type->partn_shift));
+	else
+		sprintf(gd->disk_name, "%s%c%d", mi->type->diskname,
+			'a' + mi->index * mi->type->disks_per_major +
+			(minor >> mi->type->partn_shift),
+			minor & ((1 << mi->type->partn_shift) - 1));
+
+	gd->major = mi->major;
+	gd->first_minor = minor;
+	gd->fops = &xlvbd_block_fops;
+	gd->private_data = info;
+	gd->driverfs_dev = &(info->xbdev->dev);
+	set_capacity(gd, capacity);
+
+	if (xlvbd_init_blk_queue(gd, sector_size)) {
+		put_disk(gd);
+		goto out;
+	}
+
+	info->rq = gd->queue;
+
+	if (vdisk_info & VDISK_READONLY)
+		set_disk_ro(gd, 1);
+
+	if (vdisk_info & VDISK_REMOVABLE)
+		gd->flags |= GENHD_FL_REMOVABLE;
+
+	if (vdisk_info & VDISK_CDROM)
+		gd->flags |= GENHD_FL_CD;
+
+	info->gd = gd;
+
+	return 0;
+
+ out:
+	info->mi = NULL;
+	return err;
+}
+
+int
+xlvbd_add(blkif_sector_t capacity, int vdevice, u16 vdisk_info,
+	  u16 sector_size, struct blkfront_info *info)
+{
+	struct block_device *bd;
+	int err = 0;
+
+	info->dev = MKDEV(BLKIF_MAJOR(vdevice), BLKIF_MINOR(vdevice));
+
+	bd = bdget(info->dev);
+	if (bd == NULL)
+		return -ENODEV;
+
+	err = xlvbd_alloc_gendisk(BLKIF_MINOR(vdevice), capacity, vdevice,
+				  vdisk_info, sector_size, info);
+
+	bdput(bd);
+	return err;
+}
+
+void
+xlvbd_del(struct blkfront_info *info)
+{
+	if (info->mi == NULL)
+		return;
+
+	BUG_ON(info->gd == NULL);
+	del_gendisk(info->gd);
+	put_disk(info->gd);
+	info->gd = NULL;
+
+	info->mi = NULL;
+
+	BUG_ON(info->rq == NULL);
+	blk_cleanup_queue(info->rq);
+	info->rq = NULL;
+}

--


* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-22 16:39   ` Anthony Liguori
  2006-03-22 16:54     ` Christoph Hellwig
@ 2006-03-27  8:42     ` Gerd Hoffmann
  1 sibling, 0 replies; 26+ messages in thread
From: Gerd Hoffmann @ 2006-03-27  8:42 UTC
  To: Anthony Liguori
  Cc: Chris Wright, linux-kernel, virtualization, xen-devel, Ian Pratt

  Hi,

>> +static struct xlbd_type_info xlbd_ide_type = {
>> +static struct xlbd_type_info xlbd_scsi_type = {
>> +static struct xlbd_type_info xlbd_vbd_type = {

> This is another thing that has always put me off.  The virtual block
> device driver has the ability to masquerade as other types of block
> devices.  It actually claims to be an IDE or SCSI device allocating the
> appropriate major/minor numbers.

It's useful sometimes.  Debian/sarge for example doesn't work with xvd
block devices.  At least not out-of-the-box, it needs some manual
tweaks.  It is probably also handy when moving real machines into a
virtual environment.  I don't think it should be dropped.

Most modern udev-based distros work just fine with xvd though.

> This seems to be pretty evil and creating interesting failure conditions
> for users who load IDE or SCSI modules.  I've seen it trip up a number
> of people in the past.  I think we should only ever use the major number
> that was actually allocated to us.

Print a big fat warning?  And also change the example config files in
the xen source tree to use xvda rather than hda, to advertise them better
than we do right now.  I think lots of users don't even know about the
xvd devices ...

cheers,

  Gerd

-- 
Gerd 'just married' Hoffmann <kraxel@suse.de>
I'm the hacker formerly known as Gerd Knorr.
http://www.suse.de/~kraxel/just-married.jpeg


* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-22 16:39   ` Anthony Liguori
@ 2006-03-22 16:54     ` Christoph Hellwig
  2006-03-27  8:42     ` Gerd Hoffmann
  1 sibling, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2006-03-22 16:54 UTC
  To: Anthony Liguori
  Cc: Chris Wright, linux-kernel, virtualization, xen-devel, Ian Pratt

On Wed, Mar 22, 2006 at 10:39:25AM -0600, Anthony Liguori wrote:
> This is another thing that has always put me off.  The virtual block 
> device driver has the ability to masquerade as other types of block 
> devices.  It actually claims to be an IDE or SCSI device allocating the 
> appropriate major/minor numbers.
> 
> This seems to be pretty evil and creating interesting failure conditions 
> for users who load IDE or SCSI modules.  I've seen it trip up a number 
> of people in the past.  I think we should only ever use the major number 
> that was actually allocated to us.

Exactly.  We vetoed crap like that in the IBM VIO drivers already, so
it was removed before those drivers were merged.



* Re: [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-22  6:31 ` [RFC PATCH 35/35] Add Xen virtual block device driver Chris Wright
@ 2006-03-22 16:39   ` Anthony Liguori
  2006-03-22 16:54     ` Christoph Hellwig
  2006-03-27  8:42     ` Gerd Hoffmann
  0 siblings, 2 replies; 26+ messages in thread
From: Anthony Liguori @ 2006-03-22 16:39 UTC
  To: Chris Wright; +Cc: linux-kernel, virtualization, xen-devel, Ian Pratt

Chris Wright wrote:
> The block device frontend driver allows the kernel to access block
> devices exported by a virtual machine containing a physical
> block device driver.
>   

> +
> +static struct xlbd_type_info xlbd_ide_type = {
> +	.partn_shift = 6,
> +	.disks_per_major = 2,
> +	.devname = "ide",
> +	.diskname = "hd",
> +};
> +
> +static struct xlbd_type_info xlbd_scsi_type = {
> +	.partn_shift = 4,
> +	.disks_per_major = 16,
> +	.devname = "sd",
> +	.diskname = "sd",
> +};
> +
> +static struct xlbd_type_info xlbd_vbd_type = {
> +	.partn_shift = 4,
> +	.disks_per_major = 16,
> +	.devname = "xvd",
> +	.diskname = "xvd",
> +};
>   

This is another thing that has always put me off.  The virtual block 
device driver has the ability to masquerade as other types of block 
devices.  It actually claims to be an IDE or SCSI device allocating the 
appropriate major/minor numbers.

This seems to be pretty evil and creating interesting failure conditions 
for users who load IDE or SCSI modules.  I've seen it trip up a number 
of people in the past.  I think we should only ever use the major number 
that was actually allocated to us.

Regards,

Anthony Liguori



* [RFC PATCH 35/35] Add Xen virtual block device driver.
  2006-03-22  6:30 [RFC PATCH 00/35] Xen i386 paravirtualization support Chris Wright
@ 2006-03-22  6:31 ` Chris Wright
  2006-03-22 16:39   ` Anthony Liguori
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Wright @ 2006-03-22  6:31 UTC
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach

[-- Attachment #1: 34-blkfront --]
[-- Type: text/plain, Size: 36520 bytes --]

The block device frontend driver allows the kernel to access block
devices exported by a virtual machine containing a physical
block device driver.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 drivers/block/Kconfig           |    2 
 drivers/xen/Kconfig.blk         |   14 
 drivers/xen/Makefile            |    1 
 drivers/xen/blkfront/Makefile   |    5 
 drivers/xen/blkfront/blkfront.c |  812 ++++++++++++++++++++++++++++++++++++++++
 drivers/xen/blkfront/block.h    |  152 +++++++
 drivers/xen/blkfront/vbd.c      |  316 +++++++++++++++
 7 files changed, 1302 insertions(+)

--- xen-subarch-2.6.orig/drivers/block/Kconfig
+++ xen-subarch-2.6/drivers/block/Kconfig
@@ -450,6 +450,8 @@ config CDROM_PKTCDVD_WCACHE
 
 source "drivers/s390/block/Kconfig"
 
+source "drivers/xen/Kconfig.blk"
+
 config ATA_OVER_ETH
 	tristate "ATA over Ethernet support"
 	depends on NET
--- xen-subarch-2.6.orig/drivers/xen/Makefile
+++ xen-subarch-2.6/drivers/xen/Makefile
@@ -5,4 +5,5 @@ obj-y	+= util.o
 obj-y	+= console/
 obj-y	+= xenbus/
 
+obj-$(CONFIG_XEN_BLKDEV_FRONTEND)	+= blkfront/
 obj-$(CONFIG_XEN_NETDEV_FRONTEND)	+= netfront/
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/Kconfig.blk
@@ -0,0 +1,14 @@
+menu "Xen block device drivers"
+        depends on XEN
+
+config XEN_BLKDEV_FRONTEND
+	tristate "Block device frontend driver"
+	depends on XEN
+	default y
+	help
+	  The block device frontend driver allows the kernel to access block
+	  devices exported from a device driver virtual machine. Unless you
+	  are building a dedicated device driver virtual machine, you almost
+	  certainly want to say Y here.
+
+endmenu
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/blkfront/Makefile
@@ -0,0 +1,5 @@
+
+obj-$(CONFIG_XEN_BLKDEV_FRONTEND)	:= xenblk.o
+
+xenblk-objs := blkfront.o vbd.o
+
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/blkfront/blkfront.c
@@ -0,0 +1,812 @@
+/******************************************************************************
+ * blkfront.c
+ * 
+ * XenLinux virtual block device driver.
+ * 
+ * Copyright (c) 2003-2004, Keir Fraser & Steve Hand
+ * Modifications by Mark A. Williamson are (c) Intel Research Cambridge
+ * Copyright (c) 2004, Christian Limpach
+ * Copyright (c) 2004, Andrew Warfield
+ * Copyright (c) 2005, Christopher Clark
+ * Copyright (c) 2005, XenSource Ltd
+ * 
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/version.h>
+#include "block.h"
+#include <linux/cdrom.h>
+#include <linux/sched.h>
+#include <linux/interrupt.h>
+#include <scsi/scsi.h>
+#include <xen/evtchn.h>
+#include <xen/xenbus.h>
+#include <xen/interface/grant_table.h>
+#include <xen/gnttab.h>
+#include <asm/hypervisor.h>
+
+#define BLKIF_STATE_DISCONNECTED 0
+#define BLKIF_STATE_CONNECTED    1
+#define BLKIF_STATE_SUSPENDED    2
+
+#define MAXIMUM_OUTSTANDING_BLOCK_REQS \
+    (BLKIF_MAX_SEGMENTS_PER_REQUEST * BLK_RING_SIZE)
+#define GRANT_INVALID_REF	0
+
+static void connect(struct blkfront_info *);
+static void blkfront_closing(struct xenbus_device *);
+static int blkfront_remove(struct xenbus_device *);
+static int talk_to_backend(struct xenbus_device *, struct blkfront_info *);
+static int setup_blkring(struct xenbus_device *, struct blkfront_info *);
+
+static void kick_pending_request_queues(struct blkfront_info *);
+
+static irqreturn_t blkif_int(int irq, void *dev_id, struct pt_regs *ptregs);
+static void blkif_restart_queue(void *arg);
+static void blkif_recover(struct blkfront_info *);
+static void blkif_completion(struct blk_shadow *);
+static void blkif_free(struct blkfront_info *, int);
+
+
+/**
+ * Entry point to this code when a new device is created.  Allocate the basic
+ * structures and the ring buffer for communication with the backend, and
+ * inform the backend of the appropriate details for those.  Switch to
+ * Initialised state.
+ */
+static int blkfront_probe(struct xenbus_device *dev,
+			  const struct xenbus_device_id *id)
+{
+	int err, vdevice, i;
+	struct blkfront_info *info;
+
+	/* FIXME: Use dynamic device id if this is not set. */
+	err = xenbus_scanf(XBT_NULL, dev->nodename,
+			   "virtual-device", "%i", &vdevice);
+	if (err != 1) {
+		xenbus_dev_fatal(dev, err, "reading virtual-device");
+		return err;
+	}
+
+	info = kmalloc(sizeof(*info), GFP_KERNEL);
+	if (!info) {
+		xenbus_dev_fatal(dev, -ENOMEM, "allocating info structure");
+		return -ENOMEM;
+	}
+
+	memset(info, 0, sizeof(*info));
+	info->xbdev = dev;
+	info->vdevice = vdevice;
+	info->connected = BLKIF_STATE_DISCONNECTED;
+	INIT_WORK(&info->work, blkif_restart_queue, (void *)info);
+
+	for (i = 0; i < BLK_RING_SIZE; i++)
+		info->shadow[i].req.id = i+1;
+	info->shadow[BLK_RING_SIZE-1].req.id = 0x0fffffff;
+
+	/* Front end dir is a number, which is used as the id. */
+	info->handle = simple_strtoul(strrchr(dev->nodename,'/')+1, NULL, 0);
+	dev->data = info;
+
+	err = talk_to_backend(dev, info);
+	if (err) {
+		kfree(info);
+		dev->data = NULL;
+		return err;
+	}
+
+	return 0;
+}
+
+
+/**
+ * We are reconnecting to the backend, due to a suspend/resume, or a backend
+ * driver restart.  We tear down our blkif structure and recreate it, but
+ * leave the device-layer structures intact so that this is transparent to the
+ * rest of the kernel.
+ */
+static int blkfront_resume(struct xenbus_device *dev)
+{
+	struct blkfront_info *info = dev->data;
+	int err;
+
+	DPRINTK("blkfront_resume: %s\n", dev->nodename);
+
+	blkif_free(info, 1);
+
+	err = talk_to_backend(dev, info);
+	if (!err)
+		blkif_recover(info);
+
+	return err;
+}
+
+
+/* Common code used when first setting up, and when resuming. */
+static int talk_to_backend(struct xenbus_device *dev,
+			   struct blkfront_info *info)
+{
+	const char *message = NULL;
+	xenbus_transaction_t xbt;
+	int err;
+
+	/* Create shared ring, alloc event channel. */
+	err = setup_blkring(dev, info);
+	if (err)
+		goto out;
+
+again:
+	err = xenbus_transaction_start(&xbt);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "starting transaction");
+		goto destroy_blkring;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename,
+			    "ring-ref","%u", info->ring_ref);
+	if (err) {
+		message = "writing ring-ref";
+		goto abort_transaction;
+	}
+	err = xenbus_printf(xbt, dev->nodename,
+			    "event-channel", "%u", info->evtchn);
+	if (err) {
+		message = "writing event-channel";
+		goto abort_transaction;
+	}
+
+	err = xenbus_switch_state(dev, xbt, XenbusStateInitialised);
+	if (err)
+		goto abort_transaction;
+
+	err = xenbus_transaction_end(xbt, 0);
+	if (err) {
+		if (err == -EAGAIN)
+			goto again;
+		xenbus_dev_fatal(dev, err, "completing transaction");
+		goto destroy_blkring;
+	}
+
+	return 0;
+
+ abort_transaction:
+	xenbus_transaction_end(xbt, 1);
+	if (message)
+		xenbus_dev_fatal(dev, err, "%s", message);
+ destroy_blkring:
+	blkif_free(info, 0);
+ out:
+	return err;
+}
+
+
+static int setup_blkring(struct xenbus_device *dev,
+			 struct blkfront_info *info)
+{
+	struct blkif_sring *sring;
+	int err;
+
+	info->ring_ref = GRANT_INVALID_REF;
+
+	sring = (struct blkif_sring *)__get_free_page(GFP_KERNEL);
+	if (!sring) {
+		xenbus_dev_fatal(dev, -ENOMEM, "allocating shared ring");
+		return -ENOMEM;
+	}
+	SHARED_RING_INIT(sring);
+	FRONT_RING_INIT(&info->ring, sring, PAGE_SIZE);
+
+	err = xenbus_grant_ring(dev, virt_to_mfn(info->ring.sring));
+	if (err < 0) {
+		free_page((unsigned long)sring);
+		info->ring.sring = NULL;
+		goto fail;
+	}
+	info->ring_ref = err;
+
+	err = xenbus_alloc_evtchn(dev, &info->evtchn);
+	if (err)
+		goto fail;
+
+	err = bind_evtchn_to_irqhandler(
+		info->evtchn, blkif_int, SA_SAMPLE_RANDOM, "blkif", info);
+	if (err <= 0) {
+		xenbus_dev_fatal(dev, err,
+				 "bind_evtchn_to_irqhandler failed");
+		goto fail;
+	}
+	info->irq = err;
+
+	return 0;
+fail:
+	blkif_free(info, 0);
+	return err;
+}
+
+
+/**
+ * Callback received when the backend's state changes.
+ */
+static void backend_changed(struct xenbus_device *dev,
+			    XenbusState backend_state)
+{
+	struct blkfront_info *info = dev->data;
+	struct block_device *bd;
+
+	DPRINTK("blkfront:backend_changed.\n");
+
+	switch (backend_state) {
+	case XenbusStateUnknown:
+	case XenbusStateInitialising:
+	case XenbusStateInitWait:
+	case XenbusStateInitialised:
+	case XenbusStateClosed:
+		break;
+
+	case XenbusStateConnected:
+		connect(info);
+		break;
+
+	case XenbusStateClosing:
+		bd = bdget(info->dev);
+		if (bd == NULL) {
+			xenbus_dev_fatal(dev, -ENODEV, "bdget failed");
+			break;
+		}
+		down(&bd->bd_sem);
+		if (info->users > 0)
+			xenbus_dev_error(dev, -EBUSY,
+					 "Device in use; refusing to close");
+		else
+			blkfront_closing(dev);
+		up(&bd->bd_sem);
+		bdput(bd);
+		break;
+	}
+}
+
+
+/* ** Connection ** */
+
+
+/*
+ * Invoked when the backend is finally 'ready' (and has produced
+ * the details about the physical device - #sectors, size, etc).
+ */
+static void connect(struct blkfront_info *info)
+{
+	unsigned long sectors, sector_size;
+	unsigned int binfo;
+	int err;
+
+	if ((info->connected == BLKIF_STATE_CONNECTED) ||
+	    (info->connected == BLKIF_STATE_SUSPENDED) )
+		return;
+
+	DPRINTK("blkfront.c:connect:%s.\n", info->xbdev->otherend);
+
+	err = xenbus_gather(XBT_NULL, info->xbdev->otherend,
+			    "sectors", "%lu", &sectors,
+			    "info", "%u", &binfo,
+			    "sector-size", "%lu", &sector_size,
+			    NULL);
+	if (err) {
+		xenbus_dev_fatal(info->xbdev, err,
+				 "reading backend fields at %s",
+				 info->xbdev->otherend);
+		return;
+	}
+
+	err = xlvbd_add(sectors, info->vdevice, binfo, sector_size, info);
+	if (err) {
+		xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s",
+		                 info->xbdev->otherend);
+		return;
+	}
+
+	(void)xenbus_switch_state(info->xbdev, XBT_NULL, XenbusStateConnected);
+
+	/* Kick pending requests. */
+	spin_lock_irq(&blkif_io_lock);
+	info->connected = BLKIF_STATE_CONNECTED;
+	kick_pending_request_queues(info);
+	spin_unlock_irq(&blkif_io_lock);
+
+	add_disk(info->gd);
+}
+
+/**
+ * Handle the change of state of the backend to Closing.  We must delete our
+ * device-layer structures now, to ensure that writes are flushed through to
+ * the backend.  Once this is done, we can switch to Closed in
+ * acknowledgement.
+ */
+static void blkfront_closing(struct xenbus_device *dev)
+{
+	struct blkfront_info *info = dev->data;
+
+	DPRINTK("blkfront_closing: %s removed\n", dev->nodename);
+
+	xlvbd_del(info);
+
+	xenbus_switch_state(dev, XBT_NULL, XenbusStateClosed);
+}
+
+
+static int blkfront_remove(struct xenbus_device *dev)
+{
+	struct blkfront_info *info = dev->data;
+
+	DPRINTK("blkfront_remove: %s removed\n", dev->nodename);
+
+	blkif_free(info, 0);
+
+	kfree(info);
+
+	return 0;
+}
+
+
+static inline int GET_ID_FROM_FREELIST(
+	struct blkfront_info *info)
+{
+	unsigned long free = info->shadow_free;
+	BUG_ON(free >= BLK_RING_SIZE);
+	info->shadow_free = info->shadow[free].req.id;
+	info->shadow[free].req.id = 0x0fffffee; /* debug */
+	return free;
+}
+
+static inline void ADD_ID_TO_FREELIST(
+	struct blkfront_info *info, unsigned long id)
+{
+	info->shadow[id].req.id  = info->shadow_free;
+	info->shadow[id].request = 0;
+	info->shadow_free = id;
+}
+
+static inline void flush_requests(struct blkfront_info *info)
+{
+	int notify;
+
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&info->ring, notify);
+
+	if (notify)
+		notify_remote_via_irq(info->irq);
+}
+
+static void kick_pending_request_queues(struct blkfront_info *info)
+{
+	if (!RING_FULL(&info->ring)) {
+		/* Re-enable calldowns. */
+		blk_start_queue(info->rq);
+		/* Kick things off immediately. */
+		do_blkif_request(info->rq);
+	}
+}
+
+static void blkif_restart_queue(void *arg)
+{
+	struct blkfront_info *info = (struct blkfront_info *)arg;
+	spin_lock_irq(&blkif_io_lock);
+	kick_pending_request_queues(info);
+	spin_unlock_irq(&blkif_io_lock);
+}
+
+static void blkif_restart_queue_callback(void *arg)
+{
+	struct blkfront_info *info = (struct blkfront_info *)arg;
+	schedule_work(&info->work);
+}
+
+int blkif_open(struct inode *inode, struct file *filep)
+{
+	struct blkfront_info *info = inode->i_bdev->bd_disk->private_data;
+	info->users++;
+	return 0;
+}
+
+
+int blkif_release(struct inode *inode, struct file *filep)
+{
+	struct blkfront_info *info = inode->i_bdev->bd_disk->private_data;
+	info->users--;
+	if (info->users == 0) {
+		/* Check whether we have been instructed to close.  We will
+		   have ignored this request initially, as the device was
+		   still mounted. */
+		struct xenbus_device * dev = info->xbdev;
+		XenbusState state = xenbus_read_driver_state(dev->otherend);
+
+		if (state == XenbusStateClosing)
+			blkfront_closing(dev);
+	}
+	return 0;
+}
+
+
+int blkif_ioctl(struct inode *inode, struct file *filep,
+                unsigned command, unsigned long argument)
+{
+	int i;
+
+	DPRINTK_IOCTL("command: 0x%x, argument: 0x%lx, dev: 0x%04x\n",
+		      command, (long)argument, inode->i_rdev);
+
+	switch (command) {
+	case HDIO_GETGEO:
+		/* return ENOSYS to use defaults */
+		return -ENOSYS;
+
+	case CDROMMULTISESSION:
+		DPRINTK("FIXME: support multisession CDs later\n");
+		for (i = 0; i < sizeof(struct cdrom_multisession); i++)
+			if (put_user(0, (char __user *)(argument + i)))
+				return -EFAULT;
+		return 0;
+
+	default:
+		/*printk(KERN_ALERT "ioctl %08x not supported by Xen blkdev\n",
+		  command);*/
+		return -EINVAL; /* same return as native Linux */
+	}
+
+	return 0;
+}
+
+
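+/*
+ * blkif_queue_request
+ *
+ * Translate a block-layer request into one blkif ring request.
+ *
+ * Every bio segment is granted to the backend domain and described by a
+ * (gref, first_sect, last_sect) triple; a shadow copy is kept for
+ * recovery.  Returns 1 if the request cannot be queued yet (ring not
+ * connected or no free grant references), 0 on success.
+ */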
+static int blkif_queue_request(struct request *req)
+{
+	struct blkfront_info *info = req->rq_disk->private_data;
+	unsigned long buffer_mfn;
+	struct blkif_request *ring_req;
+	struct bio *bio;
+	struct bio_vec *bvec;
+	int idx;
+	unsigned long id;
+	unsigned int fsect, lsect;
+	int ref;
+	grant_ref_t gref_head;
+
+	if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
+		return 1;
+
+	if (gnttab_alloc_grant_references(
+		BLKIF_MAX_SEGMENTS_PER_REQUEST, &gref_head) < 0) {
+		gnttab_request_free_callback(
+			&info->callback,
+			blkif_restart_queue_callback,
+			info,
+			BLKIF_MAX_SEGMENTS_PER_REQUEST);
+		return 1;
+	}
+
+	/* Fill out a communications ring structure. */
+	ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
+	id = GET_ID_FROM_FREELIST(info);
+	info->shadow[id].request = (unsigned long)req;
+
+	ring_req->id = id;
+	ring_req->operation = rq_data_dir(req) ?
+		BLKIF_OP_WRITE : BLKIF_OP_READ;
+	ring_req->sector_number = (blkif_sector_t)req->sector;
+	ring_req->handle = info->handle;
+
+	ring_req->nr_segments = 0;
+	rq_for_each_bio (bio, req) {
+		bio_for_each_segment (bvec, bio, idx) {
+			BUG_ON(ring_req->nr_segments
+			       == BLKIF_MAX_SEGMENTS_PER_REQUEST);
+			buffer_mfn = page_to_phys(bvec->bv_page) >> PAGE_SHIFT;
+			fsect = bvec->bv_offset >> 9;
+			lsect = fsect + (bvec->bv_len >> 9) - 1;
+			/* install a grant reference. */
+			ref = gnttab_claim_grant_reference(&gref_head);
+			BUG_ON(ref == -ENOSPC);
+
+			gnttab_grant_foreign_access_ref(
+				ref,
+				info->xbdev->otherend_id,
+				buffer_mfn,
+				rq_data_dir(req) );
+
+			info->shadow[id].frame[ring_req->nr_segments] =
+				mfn_to_pfn(buffer_mfn);
+
+			ring_req->seg[ring_req->nr_segments] =
+				(struct blkif_request_segment) {
+					.gref       = ref,
+					.first_sect = fsect,
+					.last_sect  = lsect };
+
+			ring_req->nr_segments++;
+		}
+	}
+
+	info->ring.req_prod_pvt++;
+
+	/* Keep a private copy so we can reissue requests when recovering. */
+	info->shadow[id].req = *ring_req;
+
+	gnttab_free_grant_references(gref_head);
+
+	return 0;
+}
+
+/*
+ * do_blkif_request
+ *  Request-queue handler: move requests onto the shared ring until it fills.
+ */
+void do_blkif_request(request_queue_t *rq)
+{
+	struct blkfront_info *info = NULL;
+	struct request *req;
+	int queued;
+
+	DPRINTK("Entered do_blkif_request\n");
+
+	queued = 0;
+
+	while ((req = elv_next_request(rq)) != NULL) {
+		info = req->rq_disk->private_data;
+		if (!blk_fs_request(req)) {
+			end_request(req, 0);
+			continue;
+		}
+
+		if (RING_FULL(&info->ring))
+			goto wait;
+
+		DPRINTK("do_blk_req %p: cmd %p, sec %lx, "
+			"(%u/%li) buffer:%p [%s]\n",
+			req, req->cmd, req->sector, req->current_nr_sectors,
+			req->nr_sectors, req->buffer,
+			rq_data_dir(req) ? "write" : "read");
+
+
+		blkdev_dequeue_request(req);
+		if (blkif_queue_request(req)) {
+			blk_requeue_request(rq, req);
+		wait:
+			/* Avoid pointless unplugs. */
+			blk_stop_queue(rq);
+			break;
+		}
+
+		queued++;
+	}
+
+	if (queued != 0)
+		flush_requests(info);
+}
+
+
+static irqreturn_t blkif_int(int irq, void *dev_id, struct pt_regs *ptregs)
+{
+	struct request *req;
+	struct blkif_response *bret;
+	RING_IDX i, rp;
+	unsigned long flags;
+	struct blkfront_info *info = (struct blkfront_info *)dev_id;
+
+	spin_lock_irqsave(&blkif_io_lock, flags);
+
+	if (unlikely(info->connected != BLKIF_STATE_CONNECTED)) {
+		spin_unlock_irqrestore(&blkif_io_lock, flags);
+		return IRQ_HANDLED;
+	}
+
+ again:
+	rp = info->ring.sring->rsp_prod;
+	rmb(); /* Ensure we see queued responses up to 'rp'. */
+
+	for (i = info->ring.rsp_cons; i != rp; i++) {
+		unsigned long id;
+		int ret;
+
+		bret = RING_GET_RESPONSE(&info->ring, i);
+		id   = bret->id;
+		req  = (struct request *)info->shadow[id].request;
+
+		blkif_completion(&info->shadow[id]);
+
+		ADD_ID_TO_FREELIST(info, id);
+
+		switch (bret->operation) {
+		case BLKIF_OP_READ:
+		case BLKIF_OP_WRITE:
+			if (unlikely(bret->status != BLKIF_RSP_OKAY))
+				DPRINTK("Bad return from blkdev data "
+					"request: %x\n", bret->status);
+
+			ret = end_that_request_first(
+				req, (bret->status == BLKIF_RSP_OKAY),
+				req->hard_nr_sectors);
+			BUG_ON(ret);
+			end_that_request_last(
+				req, (bret->status == BLKIF_RSP_OKAY));
+			break;
+		default:
+			BUG();
+		}
+	}
+
+	info->ring.rsp_cons = i;
+
+	if (i != info->ring.req_prod_pvt) {
+		int more_to_do;
+		RING_FINAL_CHECK_FOR_RESPONSES(&info->ring, more_to_do);
+		if (more_to_do)
+			goto again;
+	} else
+		info->ring.sring->rsp_event = i + 1;
+
+	kick_pending_request_queues(info);
+
+	spin_unlock_irqrestore(&blkif_io_lock, flags);
+
+	return IRQ_HANDLED;
+}
+
+static void blkif_free(struct blkfront_info *info, int suspend)
+{
+	/* Prevent new requests being issued until we fix things up. */
+	spin_lock_irq(&blkif_io_lock);
+	info->connected = suspend ?
+		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
+	spin_unlock_irq(&blkif_io_lock);
+
+	/* Free resources associated with old device channel. */
+	if (info->ring_ref != GRANT_INVALID_REF) {
+		gnttab_end_foreign_access(info->ring_ref, 0,
+					  (unsigned long)info->ring.sring);
+		info->ring_ref = GRANT_INVALID_REF;
+		info->ring.sring = NULL;
+	}
+	if (info->irq)
+		unbind_from_irqhandler(info->irq, info);
+	info->evtchn = info->irq = 0;
+
+}
+
+static void blkif_completion(struct blk_shadow *s)
+{
+	int i;
+	for (i = 0; i < s->req.nr_segments; i++)
+		gnttab_end_foreign_access(s->req.seg[i].gref, 0, 0UL);
+}
+
+static void blkif_recover(struct blkfront_info *info)
+{
+	int i;
+	struct blkif_request *req;
+	struct blk_shadow *copy;
+	int j;
+
+	/* Stage 1: Make a safe copy of the shadow state. */
+	copy = kmalloc(sizeof(info->shadow), GFP_KERNEL | __GFP_NOFAIL);
+	memcpy(copy, info->shadow, sizeof(info->shadow));
+
+	/* Stage 2: Set up free list. */
+	memset(&info->shadow, 0, sizeof(info->shadow));
+	for (i = 0; i < BLK_RING_SIZE; i++)
+		info->shadow[i].req.id = i+1;
+	info->shadow_free = info->ring.req_prod_pvt;
+	info->shadow[BLK_RING_SIZE-1].req.id = 0x0fffffff;
+
+	/* Stage 3: Find pending requests and requeue them. */
+	for (i = 0; i < BLK_RING_SIZE; i++) {
+		/* Not in use? */
+		if (copy[i].request == 0)
+			continue;
+
+		/* Grab a request slot and copy shadow state into it. */
+		req = RING_GET_REQUEST(
+			&info->ring, info->ring.req_prod_pvt);
+		*req = copy[i].req;
+
+		/* We get a new request id, and must reset the shadow state. */
+		req->id = GET_ID_FROM_FREELIST(info);
+		memcpy(&info->shadow[req->id], &copy[i], sizeof(copy[i]));
+
+		/* Rewrite any grant references invalidated by susp/resume. */
+		for (j = 0; j < req->nr_segments; j++)
+			gnttab_grant_foreign_access_ref(
+				req->seg[j].gref,
+				info->xbdev->otherend_id,
+				pfn_to_mfn(info->shadow[req->id].frame[j]),
+				rq_data_dir(
+					(struct request *)
+					info->shadow[req->id].request));
+		info->shadow[req->id].req = *req;
+
+		info->ring.req_prod_pvt++;
+	}
+
+	kfree(copy);
+
+	(void)xenbus_switch_state(info->xbdev, XBT_NULL, XenbusStateConnected);
+
+	/* Now safe for us to use the shared ring */
+	spin_lock_irq(&blkif_io_lock);
+	info->connected = BLKIF_STATE_CONNECTED;
+	spin_unlock_irq(&blkif_io_lock);
+
+	/* Send off requeued requests */
+	flush_requests(info);
+
+	/* Kick any other new requests queued since we resumed */
+	spin_lock_irq(&blkif_io_lock);
+	kick_pending_request_queues(info);
+	spin_unlock_irq(&blkif_io_lock);
+}
+
+
+/* ** Driver Registration ** */
+
+
+static struct xenbus_device_id blkfront_ids[] = {
+	{ "vbd" },
+	{ "" }
+};
+
+
+static struct xenbus_driver blkfront = {
+	.name = "vbd",
+	.owner = THIS_MODULE,
+	.ids = blkfront_ids,
+	.probe = blkfront_probe,
+	.remove = blkfront_remove,
+	.resume = blkfront_resume,
+	.otherend_changed = backend_changed,
+};
+
+
+static int __init xlblk_init(void)
+{
+	if (xen_init() < 0)
+		return -ENODEV;
+
+	return xenbus_register_frontend(&blkfront);
+}
+module_init(xlblk_init);
+
+
+static void __exit xlblk_exit(void)
+{
+	xenbus_unregister_driver(&blkfront);
+}
+module_exit(xlblk_exit);
+
+MODULE_LICENSE("Dual BSD/GPL");
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/blkfront/block.h
@@ -0,0 +1,152 @@
+/******************************************************************************
+ * block.h
+ * 
+ * Shared definitions between all levels of XenLinux Virtual block devices.
+ * 
+ * Copyright (c) 2003-2004, Keir Fraser & Steve Hand
+ * Modifications by Mark A. Williamson are (c) Intel Research Cambridge
+ * Copyright (c) 2004-2005, Christian Limpach
+ * 
+ * This file may be distributed separately from the Linux kernel, or
+ * incorporated into other software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __XEN_DRIVERS_BLOCK_H__
+#define __XEN_DRIVERS_BLOCK_H__
+
+#include <linux/config.h>
+#include <linux/version.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/hdreg.h>
+#include <linux/blkdev.h>
+#include <linux/major.h>
+#include <linux/devfs_fs_kernel.h>
+#include <asm/hypervisor.h>
+#include <xen/xenbus.h>
+#include <xen/gnttab.h>
+#include <xen/interface/xen.h>
+#include <xen/interface/io/blkif.h>
+#include <xen/interface/io/ring.h>
+#include <asm/io.h>
+#include <asm/atomic.h>
+#include <asm/uaccess.h>
+
+#if 1
+#define IPRINTK(fmt, args...) \
+    printk(KERN_INFO "xen_blk: " fmt, ##args)
+#else
+#define IPRINTK(fmt, args...) ((void)0)
+#endif
+
+#if 1
+#define WPRINTK(fmt, args...) \
+    printk(KERN_WARNING "xen_blk: " fmt, ##args)
+#else
+#define WPRINTK(fmt, args...) ((void)0)
+#endif
+
+#define DPRINTK(_f, _a...) pr_debug(_f, ## _a)
+
+#if 0
+#define DPRINTK_IOCTL(_f, _a...) printk(KERN_ALERT _f, ## _a)
+#else
+#define DPRINTK_IOCTL(_f, _a...) ((void)0)
+#endif
+
+struct xlbd_type_info
+{
+	int partn_shift;
+	int disks_per_major;
+	char *devname;
+	char *diskname;
+};
+
+struct xlbd_major_info
+{
+	int major;
+	int index;
+	int usage;
+	struct xlbd_type_info *type;
+};
+
+struct blk_shadow {
+	struct blkif_request req;
+	unsigned long request;
+	unsigned long frame[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+};
+
+#define BLK_RING_SIZE __RING_SIZE((struct blkif_sring *)0, PAGE_SIZE)
+
+/*
+ * We have one of these per vbd, whether ide, scsi or 'other'.  They
+ * hang in private_data off the gendisk structure. We may end up
+ * putting all kinds of interesting stuff here :-)
+ */
+struct blkfront_info
+{
+	struct xenbus_device *xbdev;
+	dev_t dev;
+	struct gendisk *gd;
+	int vdevice;
+	blkif_vdev_t handle;
+	int connected;
+	int ring_ref;
+	struct blkif_front_ring ring;
+	unsigned int evtchn, irq;
+	struct xlbd_major_info *mi;
+	request_queue_t *rq;
+	struct work_struct work;
+	struct gnttab_free_callback callback;
+	struct blk_shadow shadow[BLK_RING_SIZE];
+	unsigned long shadow_free;
+
+	/**
+	 * The number of people holding this device open.  We won't allow a
+	 * hot-unplug unless this is 0.
+	 */
+	int users;
+};
+
+extern spinlock_t blkif_io_lock;
+
+extern int blkif_open(struct inode *inode, struct file *filep);
+extern int blkif_release(struct inode *inode, struct file *filep);
+extern int blkif_ioctl(struct inode *inode, struct file *filep,
+                       unsigned command, unsigned long argument);
+extern int blkif_check(dev_t dev);
+extern int blkif_revalidate(dev_t dev);
+extern void do_blkif_request (request_queue_t *rq);
+
+/* Virtual block device subsystem. */
+/* Note that xlvbd_add doesn't call add_disk for you: you're expected
+   to call add_disk on info->gd once the disk is properly connected
+   up. */
+int xlvbd_add(blkif_sector_t capacity, int device,
+	      u16 vdisk_info, u16 sector_size, struct blkfront_info *info);
+void xlvbd_del(struct blkfront_info *info);
+
+#endif /* __XEN_DRIVERS_BLOCK_H__ */
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/blkfront/vbd.c
@@ -0,0 +1,316 @@
+/******************************************************************************
+ * vbd.c
+ * 
+ * XenLinux virtual block device driver (xvd).
+ * 
+ * Copyright (c) 2003-2004, Keir Fraser & Steve Hand
+ * Modifications by Mark A. Williamson are (c) Intel Research Cambridge
+ * Copyright (c) 2004-2005, Christian Limpach
+ * 
+ * This file may be distributed separately from the Linux kernel, or
+ * incorporated into other software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "block.h"
+#include <linux/blkdev.h>
+#include <linux/list.h>
+
+#define BLKIF_MAJOR(dev) ((dev)>>8)
+#define BLKIF_MINOR(dev) ((dev) & 0xff)
+
+/*
+ * For convenience we distinguish between ide, scsi and 'other' (i.e.,
+ * potentially combinations of the two) in the naming scheme and in a few other
+ * places.
+ */
+
+#define NUM_IDE_MAJORS 10
+#define NUM_SCSI_MAJORS 9
+#define NUM_VBD_MAJORS 1
+
+static struct xlbd_type_info xlbd_ide_type = {
+	.partn_shift = 6,
+	.disks_per_major = 2,
+	.devname = "ide",
+	.diskname = "hd",
+};
+
+static struct xlbd_type_info xlbd_scsi_type = {
+	.partn_shift = 4,
+	.disks_per_major = 16,
+	.devname = "sd",
+	.diskname = "sd",
+};
+
+static struct xlbd_type_info xlbd_vbd_type = {
+	.partn_shift = 4,
+	.disks_per_major = 16,
+	.devname = "xvd",
+	.diskname = "xvd",
+};
+
+static struct xlbd_major_info *major_info[NUM_IDE_MAJORS + NUM_SCSI_MAJORS +
+					 NUM_VBD_MAJORS];
+
+#define XLBD_MAJOR_IDE_START	0
+#define XLBD_MAJOR_SCSI_START	(NUM_IDE_MAJORS)
+#define XLBD_MAJOR_VBD_START	(NUM_IDE_MAJORS + NUM_SCSI_MAJORS)
+
+#define XLBD_MAJOR_IDE_RANGE	XLBD_MAJOR_IDE_START ... XLBD_MAJOR_SCSI_START - 1
+#define XLBD_MAJOR_SCSI_RANGE	XLBD_MAJOR_SCSI_START ... XLBD_MAJOR_VBD_START - 1
+#define XLBD_MAJOR_VBD_RANGE	XLBD_MAJOR_VBD_START ... XLBD_MAJOR_VBD_START + NUM_VBD_MAJORS - 1
+
+/* Information about our VBDs. */
+#define MAX_VBDS 64
+static LIST_HEAD(vbds_list);
+
+static struct block_device_operations xlvbd_block_fops =
+{
+	.owner = THIS_MODULE,
+	.open = blkif_open,
+	.release = blkif_release,
+	.ioctl  = blkif_ioctl,
+};
+
+spinlock_t blkif_io_lock = SPIN_LOCK_UNLOCKED;
+
+static struct xlbd_major_info *
+xlbd_alloc_major_info(int major, int minor, int index)
+{
+	struct xlbd_major_info *ptr;
+
+	ptr = kmalloc(sizeof(struct xlbd_major_info), GFP_KERNEL);
+	if (ptr == NULL)
+		return NULL;
+
+	memset(ptr, 0, sizeof(struct xlbd_major_info));
+
+	ptr->major = major;
+
+	switch (index) {
+	case XLBD_MAJOR_IDE_RANGE:
+		ptr->type = &xlbd_ide_type;
+		ptr->index = index - XLBD_MAJOR_IDE_START;
+		break;
+	case XLBD_MAJOR_SCSI_RANGE:
+		ptr->type = &xlbd_scsi_type;
+		ptr->index = index - XLBD_MAJOR_SCSI_START;
+		break;
+	case XLBD_MAJOR_VBD_RANGE:
+		ptr->type = &xlbd_vbd_type;
+		ptr->index = index - XLBD_MAJOR_VBD_START;
+		break;
+	}
+
+	printk("Registering block device major %i\n", ptr->major);
+	if (register_blkdev(ptr->major, ptr->type->devname)) {
+		WPRINTK("can't get major %d with name %s\n",
+			ptr->major, ptr->type->devname);
+		kfree(ptr);
+		return NULL;
+	}
+
+	devfs_mk_dir(ptr->type->devname);
+	major_info[index] = ptr;
+	return ptr;
+}
+
+static struct xlbd_major_info *
+xlbd_get_major_info(int vdevice)
+{
+	struct xlbd_major_info *mi;
+	int major, minor, index;
+
+	major = BLKIF_MAJOR(vdevice);
+	minor = BLKIF_MINOR(vdevice);
+
+	switch (major) {
+	case IDE0_MAJOR: index = 0; break;
+	case IDE1_MAJOR: index = 1; break;
+	case IDE2_MAJOR: index = 2; break;
+	case IDE3_MAJOR: index = 3; break;
+	case IDE4_MAJOR: index = 4; break;
+	case IDE5_MAJOR: index = 5; break;
+	case IDE6_MAJOR: index = 6; break;
+	case IDE7_MAJOR: index = 7; break;
+	case IDE8_MAJOR: index = 8; break;
+	case IDE9_MAJOR: index = 9; break;
+	case SCSI_DISK0_MAJOR: index = 10; break;
+	case SCSI_DISK1_MAJOR ... SCSI_DISK7_MAJOR:
+		index = 11 + major - SCSI_DISK1_MAJOR;
+		break;
+	case SCSI_CDROM_MAJOR: index = 18; break;
+	default: index = 19; break;
+	}
+
+	mi = ((major_info[index] != NULL) ? major_info[index] :
+	      xlbd_alloc_major_info(major, minor, index));
+	if (mi)
+		mi->usage++;
+	return mi;
+}
+
+static void
+xlbd_put_major_info(struct xlbd_major_info *mi)
+{
+	mi->usage--;
+	/* XXX: release major if 0 */
+}
+
+static int
+xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size)
+{
+	request_queue_t *rq;
+
+	rq = blk_init_queue(do_blkif_request, &blkif_io_lock);
+	if (rq == NULL)
+		return -1;
+
+	elevator_init(rq, "noop");
+
+	/* Hard sector size and max sectors impersonate the equiv. hardware. */
+	blk_queue_hardsect_size(rq, sector_size);
+	blk_queue_max_sectors(rq, 512);
+
+	/* Each segment in a request is up to an aligned page in size. */
+	blk_queue_segment_boundary(rq, PAGE_SIZE - 1);
+	blk_queue_max_segment_size(rq, PAGE_SIZE);
+
+	/* Ensure a merged request will fit in a single I/O ring slot. */
+	blk_queue_max_phys_segments(rq, BLKIF_MAX_SEGMENTS_PER_REQUEST);
+	blk_queue_max_hw_segments(rq, BLKIF_MAX_SEGMENTS_PER_REQUEST);
+
+	/* Make sure buffer addresses are sector-aligned. */
+	blk_queue_dma_alignment(rq, 511);
+
+	gd->queue = rq;
+
+	return 0;
+}
+
+static int
+xlvbd_alloc_gendisk(int minor, blkif_sector_t capacity, int vdevice,
+		    u16 vdisk_info, u16 sector_size,
+		    struct blkfront_info *info)
+{
+	struct gendisk *gd;
+	struct xlbd_major_info *mi;
+	int nr_minors = 1;
+	int err = -ENODEV;
+
+	BUG_ON(info->gd != NULL);
+	BUG_ON(info->mi != NULL);
+	BUG_ON(info->rq != NULL);
+
+	mi = xlbd_get_major_info(vdevice);
+	if (mi == NULL)
+		goto out;
+	info->mi = mi;
+
+	if ((minor & ((1 << mi->type->partn_shift) - 1)) == 0)
+		nr_minors = 1 << mi->type->partn_shift;
+
+	gd = alloc_disk(nr_minors);
+	if (gd == NULL)
+		goto out;
+
+	if (nr_minors > 1)
+		sprintf(gd->disk_name, "%s%c", mi->type->diskname,
+			'a' + mi->index * mi->type->disks_per_major +
+			(minor >> mi->type->partn_shift));
+	else
+		sprintf(gd->disk_name, "%s%c%d", mi->type->diskname,
+			'a' + mi->index * mi->type->disks_per_major +
+			(minor >> mi->type->partn_shift),
+			minor & ((1 << mi->type->partn_shift) - 1));
+
+	gd->major = mi->major;
+	gd->first_minor = minor;
+	gd->fops = &xlvbd_block_fops;
+	gd->private_data = info;
+	gd->driverfs_dev = &(info->xbdev->dev);
+	set_capacity(gd, capacity);
+
+	if (xlvbd_init_blk_queue(gd, sector_size)) {
+		put_disk(gd);
+		goto out;
+	}
+
+	info->rq = gd->queue;
+
+	if (vdisk_info & VDISK_READONLY)
+		set_disk_ro(gd, 1);
+
+	if (vdisk_info & VDISK_REMOVABLE)
+		gd->flags |= GENHD_FL_REMOVABLE;
+
+	if (vdisk_info & VDISK_CDROM)
+		gd->flags |= GENHD_FL_CD;
+
+	info->gd = gd;
+
+	return 0;
+
+ out:
+	if (mi)
+		xlbd_put_major_info(mi);
+	info->mi = NULL;
+	return err;
+}
+
+int
+xlvbd_add(blkif_sector_t capacity, int vdevice, u16 vdisk_info,
+	  u16 sector_size, struct blkfront_info *info)
+{
+	struct block_device *bd;
+	int err = 0;
+
+	info->dev = MKDEV(BLKIF_MAJOR(vdevice), BLKIF_MINOR(vdevice));
+
+	bd = bdget(info->dev);
+	if (bd == NULL)
+		return -ENODEV;
+
+	err = xlvbd_alloc_gendisk(BLKIF_MINOR(vdevice), capacity, vdevice,
+				  vdisk_info, sector_size, info);
+
+	bdput(bd);
+	return err;
+}
+
+void
+xlvbd_del(struct blkfront_info *info)
+{
+	if (info->mi == NULL)
+		return;
+
+	BUG_ON(info->gd == NULL);
+	del_gendisk(info->gd);
+	put_disk(info->gd);
+	info->gd = NULL;
+
+	xlbd_put_major_info(info->mi);
+	info->mi = NULL;
+
+	BUG_ON(info->rq == NULL);
+	blk_cleanup_queue(info->rq);
+	info->rq = NULL;
+}
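
The device naming logic is spread across xlbd_get_major_info(), xlbd_alloc_major_info() and xlvbd_alloc_gendisk(), and is easier to follow in isolation. Below is a minimal user-space sketch that mirrors it. It does not register anything. It only reproduces the major-to-slot mapping, the partn_shift/disks_per_major arithmetic and the disk_name formatting. The names struct type_info, major_to_index() and vdevice_to_name() are hypothetical. The major numbers are the ones <linux/major.h> defines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Block majors, with the values <linux/major.h> assigns to them. */
#define IDE0_MAJOR        3
#define IDE1_MAJOR       22
#define IDE2_MAJOR       33
#define IDE3_MAJOR       34
#define IDE4_MAJOR       56
#define IDE5_MAJOR       57
#define IDE6_MAJOR       88
#define IDE7_MAJOR       89
#define IDE8_MAJOR       90
#define IDE9_MAJOR       91
#define SCSI_DISK0_MAJOR  8
#define SCSI_DISK1_MAJOR 65
#define SCSI_DISK7_MAJOR 71
#define SCSI_CDROM_MAJOR 11

/* Same encoding as vbd.c: an 8-bit major above an 8-bit minor. */
#define BLKIF_MAJOR(dev) ((dev) >> 8)
#define BLKIF_MINOR(dev) ((dev) & 0xff)

struct type_info {
	int partn_shift;      /* low minor bits that select a partition */
	int disks_per_major;  /* whole disks addressable under one major */
	const char *diskname; /* prefix of the generated disk name */
};

static const struct type_info ide_type  = { 6,  2, "hd"  };
static const struct type_info scsi_type = { 4, 16, "sd"  };
static const struct type_info vbd_type  = { 4, 16, "xvd" };

/* Mirrors the switch in xlbd_get_major_info(): major -> major_info[] slot. */
static int major_to_index(int major)
{
	static const int ide_majors[10] = {
		IDE0_MAJOR, IDE1_MAJOR, IDE2_MAJOR, IDE3_MAJOR, IDE4_MAJOR,
		IDE5_MAJOR, IDE6_MAJOR, IDE7_MAJOR, IDE8_MAJOR, IDE9_MAJOR,
	};

	for (int i = 0; i < 10; i++)
		if (major == ide_majors[i])
			return i;
	if (major == SCSI_DISK0_MAJOR)
		return 10;
	if (major >= SCSI_DISK1_MAJOR && major <= SCSI_DISK7_MAJOR)
		return 11 + major - SCSI_DISK1_MAJOR;
	if (major == SCSI_CDROM_MAJOR)
		return 18;
	return 19; /* any other major is handled as an "xvd" device */
}

/*
 * Mirrors xlbd_alloc_major_info() plus the naming in xlvbd_alloc_gendisk():
 * writes the disk name into buf and returns the number of minors the
 * gendisk would be allocated with.
 */
static int vdevice_to_name(int vdevice, char *buf, size_t len)
{
	int minor = BLKIF_MINOR(vdevice);
	int index = major_to_index(BLKIF_MAJOR(vdevice));
	const struct type_info *t;
	int rel; /* mi->index in the driver: position among this type's majors */

	assert(buf != NULL && len > 0);

	if (index < 10) {
		t = &ide_type;
		rel = index;
	} else if (index < 19) {
		t = &scsi_type;
		rel = index - 10;
	} else {
		t = &vbd_type;
		rel = index - 19;
	}

	int mask = (1 << t->partn_shift) - 1;
	/* Only a lowercase letter while the disk number stays below 26. */
	int letter = 'a' + rel * t->disks_per_major + (minor >> t->partn_shift);

	if ((minor & mask) == 0) {
		/* Whole disk: reserve minors for all of its partitions. */
		snprintf(buf, len, "%s%c", t->diskname, letter);
		return 1 << t->partn_shift;
	}
	/* Partition-sized vbd: a single minor, named after the partition. */
	snprintf(buf, len, "%s%c%d", t->diskname, letter, minor & mask);
	return 1;
}
```

Here are some examples of the mapping:

* A vdevice of (3 << 8) | 65 comes out as "hdb1" with one minor.
* (22 << 8) | 64 comes out as "hdd" with 64 minors.
* (202 << 8) | 3 comes out as "xvda3". The 202 here is only an example of an "other" major.

The letter is computed as 'a' + n, so it is only a valid letter while n < 26. With 16 disks per SCSI major, that covers the first SCSI major and the first ten disks of the second, as the code is written.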

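The comments in xlvbd_init_blk_queue() say that a merged request must fit in a single I/O ring slot. The sketch below works out the resulting size bounds. It rests on these assumptions:

* BLKIF_MAX_SEGMENTS_PER_REQUEST is 11, its value in xen/interface/io/blkif.h of this era.
* Pages are 4 KiB.
* The ring header and entry sizes fed to ring_slots() are illustrative only. The real __RING_SIZE() takes them from sizeof on the shared-ring structures.

Each segment is at most one page and may not cross a page boundary. So the segment cap, rather than blk_queue_max_sectors(rq, 512), is what limits a request. ring_slots() imitates the power-of-two rounding that __RING_SIZE() applies. All function names are hypothetical.

```c
#include <assert.h>

/* Largest power of two <= n, or 0 when n is 0. __RING_SIZE() rounds the
 * raw slot count down the same way so ring indices can be masked. */
static unsigned long round_down_pow2(unsigned long n)
{
	unsigned long p = 1;

	if (n == 0)
		return 0;
	while (p <= n / 2)
		p <<= 1;
	return p;
}

/* Request slots that fit in one shared page after the ring header. */
static unsigned long ring_slots(unsigned long page_size,
				unsigned long header_bytes,
				unsigned long entry_bytes)
{
	assert(entry_bytes > 0 && header_bytes < page_size);
	return round_down_pow2((page_size - header_bytes) / entry_bytes);
}

/* Upper bound on one request under the queue limits set in
 * xlvbd_init_blk_queue(): at most max_segments segments of at most one
 * page each, and at most max_sectors 512-byte sectors overall. */
static unsigned long max_request_bytes(unsigned long max_segments,
				       unsigned long page_size,
				       unsigned long max_sectors)
{
	unsigned long by_segments = max_segments * page_size;
	unsigned long by_sectors = max_sectors * 512UL;

	return by_segments < by_sectors ? by_segments : by_sectors;
}
```

With those inputs, max_request_bytes(11, 4096, 512) is 45056 bytes. That is 44 KiB, well under the 256 KiB that max_sectors alone would allow. ring_slots(4096, 64, 112) is 32. So, under these assumptions, roughly 32 x 44 KiB = 1408 KiB could be outstanding per device.
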
--


end of thread

Thread overview: 26+ messages
2006-03-22 16:52 [RFC PATCH 35/35] Add Xen virtual block device driver Ian Pratt
2006-03-22 17:09 ` Anthony Liguori
2006-03-22 23:09 ` Jeff Garzik
2006-03-24 12:17   ` Alan Cox
2006-03-24 12:38     ` Jeff Garzik
2006-03-24 13:37       ` Jeff Garzik
2006-03-24 13:40         ` Arjan van de Ven
2006-03-24 13:50           ` Jeff Garzik
2006-03-24 15:33             ` Dave C Boutcher
2006-03-24 19:04               ` Mike Christie
2006-03-24 19:19                 ` Dave C Boutcher
2006-03-25  0:32                   ` FUJITA Tomonori
2006-03-25  0:47                   ` Roland Dreier
2006-03-24 15:55       ` Alan Cox
2006-03-25 10:03         ` Rusty Russell
2006-03-27 10:14   ` Peter Chubb
2006-03-23  8:19 ` Arjan van de Ven
2006-03-23  9:34   ` Keir Fraser
2006-03-23  9:41     ` Arjan van de Ven
2006-03-23  9:42     ` Arjan van de Ven
  -- strict thread matches above, loose matches on Subject: below --
2006-05-09  8:49 [RFC PATCH 00/35] Xen i386 paravirtualization support Chris Wright
2006-05-09  7:00 ` [RFC PATCH 35/35] Add Xen virtual block device driver Chris Wright
2006-05-09 12:01   ` Christoph Hellwig
2006-03-22  6:30 [RFC PATCH 00/35] Xen i386 paravirtualization support Chris Wright
2006-03-22  6:31 ` [RFC PATCH 35/35] Add Xen virtual block device driver Chris Wright
2006-03-22 16:39   ` Anthony Liguori
2006-03-22 16:54     ` Christoph Hellwig
2006-03-27  8:42     ` Gerd Hoffmann
