* userspace block backend / gntdev problems
@ 2008-01-04 13:48 Gerd Hoffmann
  2008-01-04 14:50 ` Derek Murray
  0 siblings, 1 reply; 8+ messages in thread
From: Gerd Hoffmann @ 2008-01-04 13:48 UTC (permalink / raw)
  To: Derek Murray; +Cc: Xen Development Mailing List

  Hi,

I'm running into trouble over and over again with my userspace block
backend daemon (blkbackd) developed as part of the xenner project.

First problem is the fixed limit of 128 slots.  The frontend submits up
to 32 requests, with up to 11 grants each.  With the shared ring this
sums up to 353 grants per block device.  When blkbackd is running in
aio mode, many requests are in flight at the same time and thus many
grants are mapped at the same time, so the 128 limit is easily reached.
I don't even need to stress the disk with bonnie or something, just
booting the virtual machine is enough.  Any chance to replace the
fixed-size array with a list to remove that hard-coded limit?  Or at
least raise the limit to -- say -- 1024 grants?
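
For reference, the arithmetic as a compilable sketch (the constant
names only approximate the real blkif headers):

#define RING_REQUESTS        32   /* requests the frontend may post */
#define SEGMENTS_PER_REQUEST 11   /* data grants per request        */

int grants_per_device(void)
{
    /* the +1 is the grant for the shared ring page itself */
    return RING_REQUESTS * SEGMENTS_PER_REQUEST + 1;   /* = 353 */
}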

Second problem is that batched grant mappings (using
xc_gnttab_map_grant_refs) don't work reliably.  Symptoms I see are
random failures with ENOMEM for no obvious reason (the 128-grant limit
is *far* away).  I also see host kernel crashes (kernel
2.6.21-2952.fc8xen).

When using xc_gnttab_map_grant_ref only (no batching) and limiting the
number of requests in flight to 8 (so we stay below the 128-grant
limit), everything works nicely though.
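
In code, the two paths look roughly like this (a minimal sketch; the
xc_gnttab_* signatures are from memory of the xenctrl.h of this era,
so treat them as an assumption rather than a reference):

#include <stdint.h>
#include <sys/mman.h>
#include <xenctrl.h>

#define MAX_SEGS 11

/* map the data pages of one request, single-ref or batched */
static void *map_segments(int xcg, uint32_t domid,
                          uint32_t *refs, uint32_t nr_segs)
{
    uint32_t domids[MAX_SEGS];
    uint32_t i;

    if (nr_segs == 1)
        /* single-ref path: works reliably here */
        return xc_gnttab_map_grant_ref(xcg, domid, refs[0],
                                       PROT_READ | PROT_WRITE);

    /* batched path: one contiguous mapping of all segments; this is
     * what fails sporadically with ENOMEM for me */
    for (i = 0; i < nr_segs; i++)
        domids[i] = domid;
    return xc_gnttab_map_grant_refs(xcg, nr_segs, domids, refs,
                                    PROT_READ | PROT_WRITE);
}

With at most 8 requests in flight that is 8 * 11 = 88 data grants plus
the shared ring, comfortably below the 128 limit.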

cheers,
  Gerd


* Re: userspace block backend / gntdev problems
  2008-01-04 13:48 userspace block backend / gntdev problems Gerd Hoffmann
@ 2008-01-04 14:50 ` Derek Murray
  2008-01-04 15:24   ` Gerd Hoffmann
  2008-01-21 18:41   ` Markus Armbruster
  0 siblings, 2 replies; 8+ messages in thread
From: Derek Murray @ 2008-01-04 14:50 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Xen Development Mailing List

Hi Gerd,

On 4 Jan 2008, at 13:48, Gerd Hoffmann wrote:
> First problem is the fixed limit of 128 slots.  The frontend submits
> up to 32 requests, with up to 11 grants each.  With the shared ring
> this sums up to 353 grants per block device.  When blkbackd is
> running in aio mode, many requests are in flight at the same time and
> thus many grants are mapped at the same time, so the 128 limit is
> easily reached.  I don't even need to stress the disk with bonnie or
> something, just booting the virtual machine is enough.  Any chance to
> replace the fixed-size array with a list to remove that hard-coded
> limit?  Or at least raise the limit to -- say -- 1024 grants?

The 128-grant limit is fairly arbitrary, and I wanted to see how
people were using gntdev before changing this. The reason for using a
fixed-size array is that it gives us O(1)-time mapping and unmapping
of single grants, which I anticipated would be the most
frequently-used case. I'll prepare a patch that enables the
configuration of this limit when the device is opened.
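
To illustrate the trade-off, the slot table behaves roughly like the
toy model below (not the actual gntdev source, just the shape of the
problem).  A single grant can take any free slot, but an n-grant batch
needs n contiguous slots, which is where fragmentation bites:

#define MAX_GRANTS 128

static unsigned char slot_used[MAX_GRANTS];

/* first-fit search for 'count' contiguous free slots; -1 if none,
 * which userspace would see as ENOMEM */
static int alloc_slots(int count)
{
    int base, n;

    for (base = 0; base + count <= MAX_GRANTS; base++) {
        for (n = 0; n < count && !slot_used[base + n]; n++)
            ;
        if (n == count) {
            while (n--)
                slot_used[base + n] = 1;
            return base;
        }
    }
    return -1;
}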

> Second problem is that batched grant mappings (using
> xc_gnttab_map_grant_refs) don't work reliably.  Symptoms I see are
> random failures with ENOMEM for no obvious reason (the 128-grant
> limit is *far* away).

If it's failing with ENOMEM, a possible reason is that the address
space for mapping grants within gntdev (the array I mentioned above)
is becoming fragmented. Are you combining the mapping of single grants
and batches within the same gntdev instance? A possible workaround
would be to use separate gntdev instances for mapping the single
grants and for mapping the batches. That way, the fragmentation should
not occur if the batches are all the same size.
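
Concretely, the workaround would look something like this (a rough,
untested sketch; again, the libxc signatures are from memory):

#include <stdio.h>
#include <xenctrl.h>

int main(void)
{
    /* one gntdev instance per allocation pattern, so single-page
     * maps cannot punch holes between the equally-sized batches */
    int xcg_single = xc_gnttab_open();  /* ring page, stray singles */
    int xcg_batch  = xc_gnttab_open();  /* fixed-size batches only  */

    if (xcg_single < 0 || xcg_batch < 0) {
        perror("xc_gnttab_open");
        return 1;
    }

    /* ... map the shared ring through xcg_single and the request
     * segments through xcg_batch ... */

    xc_gnttab_close(xcg_batch);
    xc_gnttab_close(xcg_single);
    return 0;
}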

> I also see host kernel crashes (kernel 2.6.21-2952.fc8xen).

When does this happen? Could you post the kernel OOPS?

> When using xc_gnttab_map_grant_ref only (no batching) and limiting
> the number of requests in flight to 8 (so we stay below the
> 128-grant limit), everything works nicely though.

That's good to know, thanks!

Regards,

Derek Murray.


* Re: userspace block backend / gntdev problems
  2008-01-04 14:50 ` Derek Murray
@ 2008-01-04 15:24   ` Gerd Hoffmann
  2008-01-21 18:41   ` Markus Armbruster
  1 sibling, 0 replies; 8+ messages in thread
From: Gerd Hoffmann @ 2008-01-04 15:24 UTC (permalink / raw)
  To: Derek Murray; +Cc: Xen Development Mailing List

[-- Attachment #1: Type: text/plain, Size: 1301 bytes --]

Derek Murray wrote:
> The 128-grant limit is fairly arbitrary, and I wanted to see how people
> were using gntdev before changing this. The reason for using a
> fixed-size array is that it gives us O(1)-time mapping and unmapping of
> single grants, which I anticipated would be the most frequently-used
> case.

Ok, try a hash instead of a list then ;)

>> Second problem is that batched grant mappings (using
>> xc_gnttab_map_grant_refs) don't work reliably.  Symptoms I see are
>> random failures with ENOMEM for no obvious reason (the 128-grant
>> limit is *far* away).
> 
> If it's failing with ENOMEM, a possible reason is that the address space
> for mapping grants within gntdev (the array I mentioned above) is
> becoming fragmented. Are you combining the mapping of single grants and
> batches within the same gntdev instance?

Yes, I'm mixing single and batched maps (the latter can have different
sizes too, depending on the requests coming in, anywhere from 1 to 11).
But I've seen ENOMEM failures with *only* the shared ring being mapped,
i.e. just one of the 128 slots in use.  That can't be fragmentation ...

>> I also see host kernel crashes (kernel 2.6.21-2952.fc8xen).
> 
> When does this happen? Could you post the kernel OOPS?

Dunno what exactly triggers it.  Oops attached.

cheers,
  Gerd



[-- Attachment #2: gntdev-oops --]
[-- Type: text/plain, Size: 2983 bytes --]

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
0143e000 -> *pde = 00000000:5016e001
2c76e000 -> *pme = 00000000:00000000
Oops: 0000 [#1]
SMP 
last sysfs file: /devices/xen-backend/vbd-1-51712/statistics/wr_sect
Modules linked in: ipt_MASQUERADE(U) iptable_nat(U) nf_nat(U) nf_conntrack_ipv4(U) xt_state(U) nf_conntrack(U) nfnetlink(U) ipt_REJECT(U) xt_tcpudp(U) iptable_filter(U) ip_tables(U) x_tables(U) bridge(U) nfsd(U) exportfs(U) lockd(U) nfs_acl(U) autofs4(U) sunrpc(U) ipv6(U) ext2(U) loop(U) dm_multipath(U) netbk(U) blkbk(U) 8250_pnp(U) 8250_pci(U) snd_hda_intel(U) snd_hda_codec(U) snd_seq_dummy(U) snd_seq_oss(U) snd_seq_midi_event(U) snd_seq(U) snd_seq_device(U) snd_pcm_oss(U) snd_mixer_oss(U) snd_pcm(U) i2c_i801(U) parport_pc(U) snd_timer(U) i2c_core(U) snd(U) parport(U) 8250(U) e1000(U) pcspkr(U) soundcore(U) serio_raw(U) serial_core(U) ata_generic(U) snd_page_alloc(U) sr_mod(U) sg(U) cdrom(U) ata_piix(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) ahci(U) libata(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) mbcache(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
CPU:    0
EIP:    0061:[<c10e85ba>]    Not tainted VLI
EFLAGS: 00010282   (2.6.21-2952.fc8xen #1)
EIP is at __sync_single+0x1c/0x197
eax: 00000000   ebx: 0005a6ca   ecx: 00000002   edx: 00000000
esi: 00000000   edi: 00000000   ebp: 00000400   esp: c136ce80
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0069
Process swapper (pid: 0, ti=c136c000 task=c12d4260 task.ti=c1314000)
Stack: 00000002 ed6a1000 c1c5d100 c1c5d100 c136cee8 c1c5d5c0 00000000 c102b7a1 
       0005a6ca 00000000 00000000 ed6a1000 c10e87db 00000400 00000002 00000400 
       00000000 00000400 00000000 ec7fb480 c10e8a3e 00000002 00000001 c1d87848 
Call Trace:
 [<c102b7a1>] lock_timer_base+0x19/0x35
 [<c10e87db>] unmap_single+0x55/0xd2
 [<c10e8a3e>] swiotlb_unmap_sg+0x103/0x120
 [<ee107fec>] ata_sg_clean+0x103/0x1b9 [libata]
 [<ee1080f0>] __ata_qc_complete+0x4e/0x92 [libata]
 [<c1009859>] timer_interrupt+0x5a4/0x5b7
 [<ee10bc70>] ata_qc_complete_multiple+0x87/0x9d [libata]
 [<ee0e5f22>] ahci_interrupt+0x2ff/0x4bd [ahci]
 [<c104a53a>] handle_IRQ_event+0x36/0x6e
 [<c104b9f2>] handle_level_irq+0x81/0xc7
 [<c104b971>] handle_level_irq+0x0/0xc7
 [<c100719a>] do_IRQ+0xac/0xd2
 [<c1036cb6>] ktime_get+0xf/0x2b
 [<c114f076>] evtchn_do_upcall+0x82/0xdb
 [<c100585e>] hypervisor_callback+0x46/0x4e
 [<c1008840>] raw_safe_halt+0xb3/0xd5
 [<c100452e>] xen_idle+0x31/0x5c
 [<c1003435>] cpu_idle+0xa3/0xbc
 [<c1319be4>] start_kernel+0x481/0x489
 [<c131925a>] unknown_bootoption+0x0/0x202
 =======================
Code: c8 09 d0 5a 0f 94 c0 59 0f b6 c0 5b 5e 5f c3 55 57 56 89 c6 53 83 ec 20 89 4c 24 04 8b 4c 24 38 89 54 24 18 8b 6c 24 34 89 0c 24 <8b> 08 c1 e9 1e 69 c9 80 12 00 00 81 c1 00 9e 2d c1 8b 99 0c 12 
EIP: [<c10e85ba>] __sync_single+0x1c/0x197 SS:ESP 0069:c136ce80
Kernel panic - not syncing: Fatal exception in interrupt
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.




* Re: Re: userspace block backend / gntdev problems
  2008-01-04 14:50 ` Derek Murray
  2008-01-04 15:24   ` Gerd Hoffmann
@ 2008-01-21 18:41   ` Markus Armbruster
  2008-01-25 23:29     ` Pat Campbell
  1 sibling, 1 reply; 8+ messages in thread
From: Markus Armbruster @ 2008-01-21 18:41 UTC (permalink / raw)
  To: Derek Murray; +Cc: Xen Development Mailing List, Gerd Hoffmann

Derek Murray <Derek.Murray@cl.cam.ac.uk> writes:

> Hi Gerd,
>
> On 4 Jan 2008, at 13:48, Gerd Hoffmann wrote:
>> First problem is the fixed limit of 128 slots.  The frontend
>> submits up to 32 requests, with up to 11 grants each.  With the
>> shared ring this sums up to 353 grants per block device.  When
>> blkbackd is running in aio mode, many requests are in flight at the
>> same time and thus many grants are mapped at the same time, so the
>> 128 limit is easily reached.  I don't even need to stress the disk
>> with bonnie or something, just booting the virtual machine is
>> enough.  Any chance to replace the fixed-size array with a list to
>> remove that hard-coded limit?  Or at least raise the limit to --
>> say -- 1024 grants?
>
> The 128-grant limit is fairly arbitrary, and I wanted to see how
> people were using gntdev before changing this. The reason for using a
> fixed-size array is that it gives us O(1)-time mapping and unmapping
> of single grants, which I anticipated would be the most
> frequently-used case. I'll prepare a patch that enables the
> configuration of this limit when the device is opened.

Any news on this?  I'd like to try converting the PV framebuffer to
use grants.  I need to map ~2000-5000 pages, depending on the pvfb's
resolution.
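
(For scale, assuming 32 bpp: a 1600x1200 framebuffer is 1600 * 1200 *
4 bytes, about 7.3 MiB or 1875 pages; 2560x1600 is about 15.6 MiB,
i.e. 4000 pages.)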

[...]


* Re: Re: userspace block backend / gntdev problems
  2008-01-21 18:41   ` Markus Armbruster
@ 2008-01-25 23:29     ` Pat Campbell
  2008-01-26  8:41       ` Markus Armbruster
  0 siblings, 1 reply; 8+ messages in thread
From: Pat Campbell @ 2008-01-25 23:29 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Derek Murray, Xen Development Mailing List, Gerd Hoffmann

Markus Armbruster wrote:
> Derek Murray <Derek.Murray@cl.cam.ac.uk> writes:
>
>
>> Hi Gerd,
>>
>> On 4 Jan 2008, at 13:48, Gerd Hoffmann wrote:
>>
>>> First problem is the fixed limit of 128 slots.  The frontend
>>> submits up to 32 requests, with up to 11 grants each.  With the
>>> shared ring this sums up to 353 grants per block device.  When
>>> blkbackd is running in aio mode, many requests are in flight at
>>> the same time and thus many grants are mapped at the same time, so
>>> the 128 limit is easily reached.  I don't even need to stress the
>>> disk with bonnie or something, just booting the virtual machine is
>>> enough.  Any chance to replace the fixed-size array with a list to
>>> remove that hard-coded limit?  Or at least raise the limit to --
>>> say -- 1024 grants?
>>
>> The 128-grant limit is fairly arbitrary, and I wanted to see how
>> people were using gntdev before changing this. The reason for using
>> a fixed-size array is that it gives us O(1)-time mapping and
>> unmapping of single grants, which I anticipated would be the most
>> frequently-used case. I'll prepare a patch that enables the
>> configuration of this limit when the device is opened.
>>
>
> Any news on this?  I'd like to try converting the PV framebuffer to
> use grants.  I need to map ~2000-5000 pages, depending on the pvfb's
> resolution.
>
> [...]
>   
In my latest post on "Dynamic modes support for PV xenfb" I am using
grants to map an extended framebuffer.  I have a single grant ref that
points to 10 other refs, and the pages behind those refs contain MFNs.
It's the same technique as the current framebuffer's pd array, but it
avoids the 64-bit long issue.  Kind of a hybrid approach.  I am able to
map a 22MB framebuffer when running a 64-bit guest and 44MB when
running a 32-bit guest.  When the backend is done with the mapping it
sends a message to the frontend to free up the refs.
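
A plausible shape for this layout, with field names invented purely
for illustration (the real format is in the "Dynamic modes support for
PV xenfb" patches):

#include <stdint.h>

#define XENFB_PAGE_SIZE   4096
#define XENFB_NR_INDIRECT 10

/* the one page the frontend grants directly: a directory of refs */
struct xenfb_fb_directory {
    uint32_t nr_pages;                 /* total framebuffer pages    */
    uint32_t gref[XENFB_NR_INDIRECT];  /* refs of second-level pages */
};

/* each second-level page is an array of machine frame numbers */
typedef unsigned long
    xenfb_mfn_page[XENFB_PAGE_SIZE / sizeof(unsigned long)];

With unsigned long entries, a second-level page holds 512 MFNs on a
64-bit guest and 1024 on a 32-bit one, so ten of them cover ~20 MB or
~40 MB of framebuffer -- the same 1:2 ratio as the 22/44 MB figures
above.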

I did try to map the whole framebuffer via grants, but that failed.
Like you say, you need a whole bunch of them.


* Re: Re: userspace block backend / gntdev problems
  2008-01-25 23:29     ` Pat Campbell
@ 2008-01-26  8:41       ` Markus Armbruster
  2008-01-26  8:48         ` Keir Fraser
  0 siblings, 1 reply; 8+ messages in thread
From: Markus Armbruster @ 2008-01-26  8:41 UTC (permalink / raw)
  To: Pat Campbell; +Cc: Derek Murray, Xen Development Mailing List, Gerd Hoffmann

Pat Campbell <plc@novell.com> writes:

[...]
> In my latest post on "Dynamic modes support for PV xenfb" I am using
> grants to map an extended framebuffer.  I have a single grant ref
> that points to 10 other refs, and the pages behind those refs contain
> MFNs.  It's the same technique as the current framebuffer's pd array,
> but it avoids the 64-bit long issue.  Kind of a hybrid approach.  I
> am able to map a 22MB framebuffer when running a 64-bit guest and
> 44MB when running a 32-bit guest.  When the backend is done with the
> mapping it sends a message to the frontend to free up the refs.

Uhm, I fear I didn't get the advantage of your hybrid approach.  Could
you explain?

> I did try to map the whole framebuffer via grants, but that failed.
> Like you say, you need a whole bunch of them.


* Re: Re: userspace block backend / gntdev problems
  2008-01-26  8:41       ` Markus Armbruster
@ 2008-01-26  8:48         ` Keir Fraser
  2008-01-28  0:40           ` Pat Campbell
  0 siblings, 1 reply; 8+ messages in thread
From: Keir Fraser @ 2008-01-26  8:48 UTC (permalink / raw)
  To: Markus Armbruster, Pat Campbell
  Cc: Derek Murray, Xen Development Mailing List, Gerd Hoffmann

On 26/1/08 08:41, "Markus Armbruster" <armbru@redhat.com> wrote:

>> In my latest post on "Dynamic modes support for PV xenfb" I am using
>> grants to map an extended framebuffer.  I have a single grant ref
>> that points to 10 other refs, and the pages behind those refs
>> contain MFNs.  It's the same technique as the current framebuffer's
>> pd array, but it avoids the 64-bit long issue.  Kind of a hybrid
>> approach.  I am able to map a 22MB framebuffer when running a 64-bit
>> guest and 44MB when running a 32-bit guest.  When the backend is
>> done with the mapping it sends a message to the frontend to free up
>> the refs.
> 
> Uhm, I fear I didn't get the advantage of your hybrid approach.  Could
> you explain?

Presumably it allows creation of huge framebuffers without using up lots of
grants, or slots in accounting tables that Xen maintains. Given that those
tables can dynamically grow, I'm not sure how useful the two-level grant
table would be.

 -- Keir


* Re: Re: userspace block backend / gntdev problems
  2008-01-26  8:48         ` Keir Fraser
@ 2008-01-28  0:40           ` Pat Campbell
  0 siblings, 0 replies; 8+ messages in thread
From: Pat Campbell @ 2008-01-28  0:40 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Derek Murray, Xen Development Mailing List, Markus Armbruster,
	Gerd Hoffmann

Keir Fraser wrote:
> On 26/1/08 08:41, "Markus Armbruster" <armbru@redhat.com> wrote:
>
>
>>> In my latest post on "Dynamic modes support for PV xenfb" I am
>>> using grants to map an extended framebuffer.  I have a single
>>> grant ref that points to 10 other refs, and the pages behind those
>>> refs contain MFNs.  It's the same technique as the current
>>> framebuffer's pd array, but it avoids the 64-bit long issue.  Kind
>>> of a hybrid approach.  I am able to map a 22MB framebuffer when
>>> running a 64-bit guest and 44MB when running a 32-bit guest.  When
>>> the backend is done with the mapping it sends a message to the
>>> frontend to free up the refs.
>>
>> Uhm, I fear I didn't get the advantage of your hybrid approach.
>> Could you explain?
>>
>
> Presumably it allows creation of huge framebuffers without using up lots of
> grants, or slots in accounting tables that Xen maintains. Given that those
> tables can dynamically grow, I'm not sure how useful the two-level grant
> table would be.
>
>  -- Keir
>
>
>   
Well, it turns out my earlier email did not really get sent.  Keir is
right, it is a two-level grant table.  The solution might not be useful
in the general case, but for this device I think it fits the bill.  The
existing xenfb code already does a two-level table, which has to be
maintained for backward-compatibility reasons, so we might as well be
consistent.  Also, by using a two-level grant table we don't have to
extend the event structure, which might be a compatibility issue.

I will get my patches sent up for your review directly.

Pat


end of thread, other threads:[~2008-01-28  0:40 UTC | newest]

Thread overview: 8+ messages
-- links below jump to the message on this page --
2008-01-04 13:48 userspace block backend / gntdev problems Gerd Hoffmann
2008-01-04 14:50 ` Derek Murray
2008-01-04 15:24   ` Gerd Hoffmann
2008-01-21 18:41   ` Markus Armbruster
2008-01-25 23:29     ` Pat Campbell
2008-01-26  8:41       ` Markus Armbruster
2008-01-26  8:48         ` Keir Fraser
2008-01-28  0:40           ` Pat Campbell
