* Next steps with pv_ops for Xen
@ 2007-11-21 22:05 Stephen C. Tweedie
  2007-11-21 23:12 ` Jeremy Fitzhardinge
  2007-12-03 12:54 ` Gerd Hoffmann
  0 siblings, 2 replies; 57+ messages in thread
From: Stephen C. Tweedie @ 2007-11-21 22:05 UTC
  To: xen-devel, virtualization
  Cc: Jeremy Fitzhardinge, Eduardo Habkost, Juan Quintela,
	Stephen Tweedie, Jan Beulich, Glauber de Oliveira Costa,
	Chris Wright

Hi all,

I've been looking at the next steps to try to get Xen running fully on
top of pv_ops.  To that end, I've (just) started looking at one of the
next major jobs --- i686 dom0 on pv_ops.

There are still a number of things needing done to reach parity with
xen-unstable:

	x86_64 xen on pv_ops
	Paravirt framebuffer/keyboard
	CPU hotplug
	Balloon
	kexec
	driver domains

but it looks like these can largely proceed in parallel if desired.

My short-term goal with this is simply to come up with a first-pass
merge of the linux-2.6.18-xen.hg dom0 support into the current
kernel.org tree's pv_ops support.  No major refactoring in the first
pass, but absolutely no *-xen.c code copying.

I'm just starting this, but at least with the version magic check (see

  http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00601.html

) out of the way, an SMP dom0 running pv_ops gets all the way through
start_kernel() and into rest_init() before dying with an unsupported cr0
write.  (I'm using direct console hypercalls for printk for now, full
xencons is not working yet.)

Current goal is to get as far as I can with the normal domU boot process
in a dom0 environment (getting console set up correctly, etc), before
starting to piece in the additional extra bits needed for dom0 startup
(mostly, but by no means exclusively, setup-xen.c).

I'm happy to put up a git tree for this once it gets anywhere.  We'd
need to decide which tree to track for that purpose --- Linus's, or
perhaps the tglx or mingo x86 merge tree might make more sense.

--Stephen

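The version magic check mentioned above can be illustrated with a minimal
userspace sketch.  It assumes the fix is a prefix comparison against
"xen-3", as in the xen-signature-check.patch attached later in this
thread.  The helper name xen_magic_ok and the sample magic strings are
hypothetical and only for illustration; the kernel code uses memcmp()
directly inside BUG_ON().

```c
#include <string.h>

/*
 * Hypothetical helper, not a kernel function: accept any start_info
 * magic string that begins with "xen-3", so that 3.0, 3.1 and 3.2
 * hypervisors all pass.  Returns 1 on a match, 0 otherwise.
 */
static int xen_magic_ok(const char *magic)
{
	/* Compare only the first 5 bytes ("xen-3"), ignoring the minor version. */
	return strncmp(magic, "xen-3", 5) == 0;
}
```

With this shape, a magic such as "xen-3.1-x86_32p" is accepted, whereas a
7-byte comparison against "xen-3.0" would reject it.
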
* Re: Next steps with pv_ops for Xen
  2007-11-21 22:05 Next steps with pv_ops for Xen Stephen C. Tweedie
@ 2007-11-21 23:12 ` Jeremy Fitzhardinge
  2007-11-26 14:02   ` Juan Quintela
  2007-12-03 12:54 ` Gerd Hoffmann
  1 sibling, 1 reply; 57+ messages in thread
From: Jeremy Fitzhardinge @ 2007-11-21 23:12 UTC
  To: Stephen C. Tweedie
  Cc: xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich,
	Glauber de Oliveira Costa, Chris Wright, virtualization

[-- Attachment #1: Type: text/plain, Size: 2529 bytes --]

Stephen C. Tweedie wrote:
> I've been looking at the next steps to try to get Xen running fully on
> top of pv_ops.  To that end, I've (just) started looking at one of the
> next major jobs --- i686 dom0 on pv_ops.

Great!

> There are still a number of things needing done to reach parity with
> xen-unstable:
>
> 	x86_64 xen on pv_ops

I think once pvops has been unified, Xen support should be fairly
straightforward.  I wrote most of the existing code with 64-bit in mind,
so I'm hoping I got it right...

> 	Paravirt framebuffer/keyboard
> 	CPU hotplug
> 	Balloon

I've done some preliminary work on balloon and hotplug.  I think balloon
should make more use of memory hotplug, but a straight port would be a
good first step.

> 	kexec
> 	driver domains
>
> but it looks like these can largely proceed in parallel if desired.
>
> My short-term goal with this is simply to come up with a first-pass
> merge of the linux-2.6.18-xen.hg dom0 support into the current
> kernel.org tree's pv_ops support.  No major refactoring in the first
> pass, but absolutely no *-xen.c code copying.

Yes.  #ifdefs are the way to go here.

> I'm just starting this, but at least with the version magic check (see
>
>   http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00601.html

I was just about to post a fix for this.

> ) out of the way, an SMP dom0 running pv_ops gets all the way through
> start_kernel() and into rest_init() before dying with an unsupported cr0
> write.
> (I'm using direct console hypercalls for printk for now, full
> xencons is not working yet.)

I have some early dom0 patches already, though they're a few months old
now.  Not much there, but I did do an early console implementation.

> I'm happy to put up a git tree for this once it gets anywhere.  We'd
> need to decide which tree to track for that purpose --- Linus's, or
> perhaps the tglx or mingo x86 merge tree might make more sense.

Yes, I think the x86 tree is where we need to be, since there's a lot of
activity there.

I'll attach my dom0 patches for whatever use you can make of them.  They
definitely won't apply to anything, not least because of the arch merge
(though it looks like they did get converted by script), but also
because they're based on some defunct experimental booting-from-bzImage
patches.  But perhaps there's some useful stuff in there.

I've also attached my xen-balloon and hotplug patches as-is.  They don't
work completely, but they should be closer to applying.

    J

[-- Attachment #2: xen-dom0-boot.patch --]
[-- Type: text/x-patch, Size: 6090 bytes --]

---
 arch/x86/boot/compressed/notes-xen.c |   16 ---------
 arch/x86/xen/Makefile                |    2 -
 arch/x86/xen/early.c                 |    5 +-
 arch/x86/xen/enlighten.c             |    4 +-
 arch/x86/xen/legacy_boot.c           |   60 ++++++++++++++++++++++++++++++++++
 arch/x86/xen/notes.c                 |   19 ++++++++++
 arch/x86/xen/xen-ops.h               |    3 +
 7 files changed, 89 insertions(+), 20 deletions(-)

===================================================================
--- a/arch/x86/boot/compressed/notes-xen.c
+++ b/arch/x86/boot/compressed/notes-xen.c
@@ -1,17 +1,3 @@
 #ifdef CONFIG_XEN
-#include <linux/elfnote.h>
-#include <xen/interface/elfnote.h>
-
-ELFNOTE("Xen", XEN_ELFNOTE_GUEST_OS, "linux");
-ELFNOTE("Xen", XEN_ELFNOTE_GUEST_VERSION, "2.6");
-ELFNOTE("Xen", XEN_ELFNOTE_XEN_VERSION, "xen-3.0");
-ELFNOTE("Xen", XEN_ELFNOTE_FEATURES,
-	"!writable_page_tables|pae_pgdir_above_4gb");
-ELFNOTE("Xen", XEN_ELFNOTE_LOADER, "generic");
-
-#ifdef CONFIG_X86_PAE
-  ELFNOTE("Xen", XEN_ELFNOTE_PAE_MODE, "yes");
-#else
-  ELFNOTE("Xen", XEN_ELFNOTE_PAE_MODE, "no");
+#include "../../xen/notes.c"
 #endif
-#endif
===================================================================
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -1,4 +1,4 @@ obj-y := early.o enlighten.o setup.o fe
 obj-y		:= early.o enlighten.o setup.o features.o multicalls.o mmu.o \
-			events.o time.o manage.o xen-asm.o
+			events.o time.o manage.o xen-asm.o notes.o legacy_boot.o
 
 obj-$(CONFIG_SMP)	+= smp.o
===================================================================
--- a/arch/x86/xen/early.c
+++ b/arch/x86/xen/early.c
@@ -50,7 +50,7 @@ static __init unsigned long early_m2p(un
 	return ret;
 }
 
-static __init void setup_hypercall_page(struct start_info *info)
+__init void xen_setup_hypercall_page(struct start_info *info)
 {
 	unsigned long *mfn_list = (unsigned long *)info->mfn_list;
 	unsigned eax, ebx, ecx, edx;
@@ -183,7 +183,7 @@ void __init xen_entry(void)
 	BUG_ON(memcmp(info->magic, PA(&"xen-3.0"), 7) != 0);
 
 	/* establish a hypercall page */
-	setup_hypercall_page(info);
+	xen_setup_hypercall_page(info);
 
 	/* work out how far we need to remap */
 	limit = __pa(_end);
@@ -203,6 +203,7 @@ void __init xen_entry(void)
 	/* repoint things to their new virtual addresses */
 	info->pt_base = (unsigned long)__va(info->pt_base);
 	info->mfn_list = (unsigned long)__va(info->mfn_list);
+	boot_params.hdr.hardware_subarch_data = (unsigned long)__va(info);
 
 	init_pg_tables_end = limit;
 
===================================================================
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1106,8 +1106,8 @@ void __init xen_start_kernel(void)
 {
 	pgd_t *pgd;
 
-	xen_start_info = (struct start_info *)
-		__va(boot_params.hdr.hardware_subarch_data);
+	xen_start_info = (struct start_info *)(unsigned long)
+		boot_params.hdr.hardware_subarch_data;
 
 	/* Get mfn list */
 	phys_to_machine_mapping = (unsigned long *)xen_start_info->mfn_list;
===================================================================
--- /dev/null
+++ b/arch/x86/xen/legacy_boot.c
@@ -0,0 +1,60 @@
+/*
+ * Notes and setup needed for legacy booting.  This is used either
+ * when loading a domU with vmlinux directly, or for booting
+ * dom0.  Normally we'd expect to be booted via the normal boot
+ * protocol.
+ */
+#include <linux/sched.h>
+#include <linux/elfnote.h>
+#include <linux/linkage.h>
+#include <linux/init.h>
+
+#include <asm/setup.h>
+#include <asm/page.h>
+#include <asm/bootparam.h>
+
+#include <xen/interface/xen.h>
+#include <xen/interface/elfnote.h>
+
+#include "xen-ops.h"
+
+extern void xen_legacy_entry(void *);
+
+/* Extra notes needed to set the xen-specific
+   entrypoint and virtual offset */
+ELFNOTE("Xen", XEN_ELFNOTE_ENTRY, &xen_legacy_entry);
+ELFNOTE("Xen", XEN_ELFNOTE_VIRT_BASE, PAGE_OFFSET);
+
+static __init __used fastcall void xen_legacy_setup(struct start_info *info)
+{
+	memset(&boot_params, 0, sizeof(boot_params));
+
+	boot_params.hdr.type_of_loader = 0x90; /* xen */
+
+	boot_params.hdr.hardware_subarch = 2; /* xen */
+	boot_params.hdr.hardware_subarch_data = (unsigned long)info;
+
+	boot_params.hdr.ramdisk_image = info->mod_start;
+	boot_params.hdr.ramdisk_size = info->mod_len;
+
+	boot_params.hdr.cmd_line_ptr = (unsigned long)info->cmd_line;
+	boot_params.hdr.cmdline_size = sizeof(info->cmd_line);
+
+	xen_setup_hypercall_page(info);
+
+	/* jump to xen_start_kernel with appropriate stack */
+	asm volatile("mov %0,%%esp;"
+		     "push $0;"
+		     "jmp xen_start_kernel"
+		     :
+		     : "i" (&init_thread_union.stack[THREAD_SIZE/sizeof(long)])
+		     : "memory");
+}
+
+
+asm(".section \".init.text\",\"ax\",@progbits \n"
+    ".globl xen_legacy_entry \n"
+    "xen_legacy_entry: \n"
+    " mov %esi, %eax \n"
+    " jmp xen_legacy_setup \n"
+    ".previous");
===================================================================
--- /dev/null
+++ b/arch/x86/xen/notes.c
@@ -0,0 +1,19 @@
+/*
+ * Common ELF notes needed for all Xen kernel images
+ */
+#include <linux/elfnote.h>
+#include <xen/interface/elfnote.h>
+
+ELFNOTE("Xen", XEN_ELFNOTE_GUEST_OS, "linux");
+ELFNOTE("Xen", XEN_ELFNOTE_GUEST_VERSION, "2.6");
+ELFNOTE("Xen", XEN_ELFNOTE_XEN_VERSION, "xen-3.0");
+ELFNOTE("Xen", XEN_ELFNOTE_FEATURES,
+	"!writable_page_tables|pae_pgdir_above_4gb");
+ELFNOTE("Xen", XEN_ELFNOTE_LOADER, "generic");
+
+#ifdef CONFIG_X86_PAE
+  ELFNOTE("Xen", XEN_ELFNOTE_PAE_MODE, "yes");
+#else
+  ELFNOTE("Xen", XEN_ELFNOTE_PAE_MODE, "no");
+#endif
+
===================================================================
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -2,10 +2,13 @@
 #define XEN_OPS_H
 
 #include <linux/init.h>
+#include <linux/percpu.h>
 
 /* These are code, but not functions.  Defined in entry.S */
 extern const char xen_hypervisor_callback[];
 extern const char xen_failsafe_callback[];
+
+void xen_setup_hypercall_page(struct start_info *info);
 
 void xen_copy_trap_info(struct trap_info *traps);
 

[-- Attachment #3: xen-dom0-xenbus.patch --]
[-- Type: text/x-patch, Size: 1584 bytes --]

---
 drivers/xen/xenbus/xenbus_probe.c |   30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

===================================================================
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -786,6 +786,7 @@ static int __init xenbus_probe_init(void
 static int __init xenbus_probe_init(void)
 {
 	int err = 0;
+	unsigned long page = 0;
 
 	DPRINTK("");
 
@@ -806,7 +807,31 @@ static int __init xenbus_probe_init(void
 	 * Domain0 doesn't have a store_evtchn or store_mfn yet.
 	 */
 	if (is_initial_xendomain()) {
-		/* dom0 not yet supported */
+		struct evtchn_alloc_unbound alloc_unbound;
+
+		/* Allocate page. */
+		page = get_zeroed_page(GFP_KERNEL);
+		if (!page)
+			return -ENOMEM;
+
+		xen_store_mfn = xen_start_info->store_mfn =
+			pfn_to_mfn(virt_to_phys((void *)page) >>
+				   PAGE_SHIFT);
+
+		/* Next allocate a local port which xenstored can bind to */
+		alloc_unbound.dom = DOMID_SELF;
+		alloc_unbound.remote_dom = 0;
+
+		err = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound,
+						  &alloc_unbound);
+		if (err == -ENOSYS)
+			goto out_unreg_front;
+
+		BUG_ON(err);
+		xen_store_evtchn = xen_start_info->store_evtchn =
+			alloc_unbound.port;
+
+		xen_store_interface = mfn_to_virt(xen_store_mfn);
 	} else {
 		xenstored_ready = 1;
 		xen_store_evtchn = xen_start_info->store_evtchn;
@@ -834,6 +859,9 @@ static int __init xenbus_probe_init(void
 	bus_unregister(&xenbus_frontend.bus);
 
   out_error:
+	if (page != 0)
+		free_page(page);
+
 	return err;
 }
 

[-- Attachment #4: xen-dom0-ide.patch --]
[-- Type: text/x-patch, Size: 2507 bytes --]

---
 arch/x86/mm/ioremap_32.c |    3 ---
 arch/x86/xen/enlighten.c |   20 ++++++++++++++++++++
 arch/x86/xen/setup.c     |    3 ++-
 include/asm-x86/io_32.h  |    4 ++++
 4 files changed, 26 insertions(+), 4 deletions(-)

===================================================================
--- a/arch/x86/mm/ioremap_32.c
+++ b/arch/x86/mm/ioremap_32.c
@@ -18,9 +18,6 @@
 #include <asm/tlbflush.h>
 #include <asm/pgtable.h>
 
-#define ISA_START_ADDRESS	0xa0000
-#define ISA_END_ADDRESS		0x100000
-
 /*
  * Generic mapping function (not visible outside):
  */
===================================================================
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -45,6 +45,7 @@
 #include <asm/smp.h>
 #include <asm/tlbflush.h>
 #include <asm/reboot.h>
+#include <asm/io.h>
 
 #include "xen-ops.h"
 #include "mmu.h"
@@ -826,6 +827,19 @@ static __init void xen_pagetable_setup_d
 		if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
 			BUG();
 	}
+
+	/*
+	 * If we're dom0, then 1:1 map the ISA machine addresses into
+	 * the kernel's address space.
+	 */
+	if (is_initial_xendomain()) {
+		unsigned i;
+
+		for(i = ISA_START_ADDRESS; i < ISA_END_ADDRESS; i += PAGE_SIZE)
+			set_pte_mfn(PAGE_OFFSET + i, PFN_DOWN(i), PAGE_KERNEL);
+
+		reserve_bootmem(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS);
+	}
 }
 
 /* This is called once we have the cpu_possible_map */
@@ -1144,6 +1158,12 @@ void __init xen_start_kernel(void)
 	if (xen_feature(XENFEAT_supervisor_mode_kernel))
 		paravirt_ops.kernel_rpl = 0;
 
+	if (is_initial_xendomain()) {
+		struct physdev_set_iopl set_iopl;
+		set_iopl.iopl = 1;
+		HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
+	}
+
 	/* set the limit of our address space */
 	xen_reserve_top();
 
===================================================================
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -92,5 +92,6 @@ void __init xen_arch_setup(void)
 	xen_fill_possible_map();
 #endif
 
-	paravirt_disable_iospace();
+	if (!is_initial_xendomain())
+		paravirt_disable_iospace();
 }
===================================================================
--- a/include/asm-x86/io_32.h
+++ b/include/asm-x86/io_32.h
@@ -135,6 +135,10 @@ extern void __iomem *fix_ioremap(unsigne
 #define dmi_ioremap bt_ioremap
 #define dmi_iounmap bt_iounmap
 #define dmi_alloc alloc_bootmem
+
+
+#define ISA_START_ADDRESS	0xa0000
+#define ISA_END_ADDRESS		0x100000
 
 /*
  * ISA I/O bus memory addresses are 1:1 with the physical address.

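The 1:1 ISA mapping loop in the patch above can be checked with a small
userspace sketch of its arithmetic.  ISA_START_ADDRESS and
ISA_END_ADDRESS are taken from the patch.  The 4 KiB page size and the
PAGE_OFFSET value of 0xC0000000 are assumptions (the usual i386 defaults),
and the isa_* helper names are hypothetical, not kernel functions.

```c
/* Values from the patch above. */
#define ISA_START_ADDRESS	0xa0000UL
#define ISA_END_ADDRESS		0x100000UL

/* Assumptions: 4 KiB pages and the default i386 3G/1G split. */
#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)
#define PAGE_OFFSET		0xC0000000UL

/* Number of iterations the mapping loop performs, one per page. */
static unsigned long isa_page_count(void)
{
	return (ISA_END_ADDRESS - ISA_START_ADDRESS) / PAGE_SIZE;
}

/* Kernel virtual address that a given ISA machine address is mapped at. */
static unsigned long isa_vaddr(unsigned long maddr)
{
	return PAGE_OFFSET + maddr;
}

/* Machine frame number installed in the pte, i.e. PFN_DOWN(maddr). */
static unsigned long isa_mfn(unsigned long maddr)
{
	return maddr >> PAGE_SHIFT;
}
```

Under these assumptions the loop installs 96 ptes, the first mapping
virtual address 0xC00A0000 to machine frame 0xa0.
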
[-- Attachment #5: xen-dom0-console.patch --]
[-- Type: text/x-patch, Size: 3840 bytes --]

---
 arch/x86/xen/events.c  |    2 -
 drivers/char/hvc_xen.c |   61 +++++++++++++++++++++++++++++++++++++++++-------
 include/xen/events.h   |    2 +
 3 files changed, 56 insertions(+), 9 deletions(-)

===================================================================
--- a/arch/x86/xen/events.c
+++ b/arch/x86/xen/events.c
@@ -308,7 +308,7 @@ static int bind_ipi_to_irq(unsigned int
 }
 
 
-static int bind_virq_to_irq(unsigned int virq, unsigned int cpu)
+int bind_virq_to_irq(unsigned int virq, unsigned int cpu)
 {
 	struct evtchn_bind_virq bind_virq;
 	int evtchn, irq;
===================================================================
--- a/drivers/char/hvc_xen.c
+++ b/drivers/char/hvc_xen.c
@@ -50,7 +50,7 @@ static inline void notify_daemon(void)
 	notify_remote_via_evtchn(xen_start_info->console.domU.evtchn);
 }
 
-static int write_console(uint32_t vtermno, const char *data, int len)
+static int domU_write_console(uint32_t vtermno, const char *data, int len)
 {
 	struct xencons_interface *intf = xencons_interface();
 	XENCONS_RING_IDX cons, prod;
@@ -71,7 +71,28 @@ static int write_console(uint32_t vtermn
 	return sent;
 }
 
-static int read_console(uint32_t vtermno, char *buf, int len)
+static int dom0_write_console(uint32_t vtermno, const char *data, int len)
+{
+	int ret;
+
+	ret = HYPERVISOR_console_io(CONSOLEIO_write, len, (char *)data);
+
+	return ret < 0 ? 0 : len;
+}
+
+static int write_console(uint32_t vtermno, const char *data, int len)
+{
+	int ret;
+
+	if (is_initial_xendomain())
+		ret = dom0_write_console(vtermno, data, len);
+	else
+		ret = domU_write_console(vtermno, data, len);
+
+	return ret;
+}
+
+static int domU_read_console(uint32_t vtermno, char *buf, int len)
 {
 	struct xencons_interface *intf = xencons_interface();
 	XENCONS_RING_IDX cons, prod;
@@ -92,22 +113,40 @@ static int read_console(uint32_t vtermno
 	return recv;
 }
 
-static struct hv_ops hvc_ops = {
-	.get_chars = read_console,
-	.put_chars = write_console,
+static int dom0_read_console(uint32_t vtermno, char *buf, int len)
+{
+	return HYPERVISOR_console_io(CONSOLEIO_read, len, buf);
+}
+
+static struct hv_ops domU_hvc_ops = {
+	.get_chars = domU_read_console,
+	.put_chars = domU_write_console,
+};
+
+static struct hv_ops dom0_hvc_ops = {
+	.get_chars = dom0_read_console,
+	.put_chars = dom0_write_console,
 };
 
 static int __init xen_init(void)
 {
 	struct hvc_struct *hp;
+	struct hv_ops *ops;
 
 	if (!is_running_on_xen())
 		return 0;
 
-	xencons_irq = bind_evtchn_to_irq(xen_start_info->console.domU.evtchn);
+	if (is_initial_xendomain()) {
+		ops = &dom0_hvc_ops;
+		xencons_irq = bind_virq_to_irq(VIRQ_CONSOLE, smp_processor_id());
+	} else {
+		ops = &domU_hvc_ops;
+		xencons_irq = bind_evtchn_to_irq(xen_start_info->console.domU.evtchn);
+	}
+
 	if (xencons_irq < 0)
 		xencons_irq = 0 /* NO_IRQ */;
-	hp = hvc_alloc(HVC_COOKIE, xencons_irq, &hvc_ops, 256);
+	hp = hvc_alloc(HVC_COOKIE, xencons_irq, ops, 256);
 	if (IS_ERR(hp))
 		return PTR_ERR(hp);
 
@@ -123,10 +162,16 @@ static void __exit xen_fini(void)
 
 static int xen_cons_init(void)
 {
+	struct hv_ops *ops;
+
 	if (!is_running_on_xen())
 		return 0;
 
-	hvc_instantiate(HVC_COOKIE, 0, &hvc_ops);
+	ops = &domU_hvc_ops;
+	if (is_initial_xendomain())
+		ops = &dom0_hvc_ops;
+
+	hvc_instantiate(HVC_COOKIE, 0, ops);
 	return 0;
 }
 
===================================================================
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -18,6 +18,8 @@ int bind_evtchn_to_irqhandler(unsigned i
 			      irq_handler_t handler,
 			      unsigned long irqflags, const char *devname,
 			      void *dev_id);
+int bind_virq_to_irq(unsigned int virq, unsigned int cpu);
+
 int bind_virq_to_irqhandler(unsigned int virq, unsigned int cpu,
 			    irq_handler_t handler,
 			    unsigned long irqflags, const char *devname,

[-- Attachment #6: xen-dom0-set_fixmap.patch --]
[-- Type: text/x-patch, Size: 7189 bytes --]

---
 arch/x86/kernel/paravirt_32.c |    2 ++
 arch/x86/mm/pgtable_32.c      |   16 ++++++++++------
 arch/x86/xen/enlighten.c      |   41 +++++++++++++++++++++++++++++++++++++----
 arch/x86/xen/mmu.c            |   30 +-----------------------------
 include/asm-x86/fixmap_32.h   |   13 +++++++++++--
 include/asm-x86/paravirt.h    |   13 +++++++++++++
 include/asm-x86/pgtable_32.h  |    3 +++
 7 files changed, 77 insertions(+), 41 deletions(-)

===================================================================
--- a/arch/x86/kernel/paravirt_32.c
+++ b/arch/x86/kernel/paravirt_32.c
@@ -377,6 +377,8 @@ struct paravirt_ops paravirt_ops = {
 	.dup_mmap = paravirt_nop,
 	.exit_mmap = paravirt_nop,
 	.activate_mm = paravirt_nop,
+
+	.set_fixmap = native_set_fixmap,
 };
 
 EXPORT_SYMBOL(paravirt_ops);
===================================================================
--- a/arch/x86/mm/pgtable_32.c
+++ b/arch/x86/mm/pgtable_32.c
@@ -73,7 +73,7 @@ void show_mem(void)
  * Associate a virtual page frame with a given physical page frame
  * and protection flags for that frame.
  */
-static void set_pte_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags)
+void set_pte_vaddr(unsigned long vaddr, pte_t pteval)
 {
 	pgd_t *pgd;
 	pud_t *pud;
@@ -96,9 +96,8 @@ static void set_pte_pfn(unsigned long va
 		return;
 	}
 	pte = pte_offset_kernel(pmd, vaddr);
-	if (pgprot_val(flags))
-		/* <pfn,flags> stored as-is, to permit clearing entries */
-		set_pte(pte, pfn_pte(pfn, flags));
+	if (pte_val(pteval))
+		set_pte_at(&init_mm, vaddr, pte, pteval);
 	else
 		pte_clear(&init_mm, vaddr, pte);
 
@@ -148,7 +147,7 @@ unsigned long __FIXADDR_TOP = 0xfffff000
 unsigned long __FIXADDR_TOP = 0xfffff000;
 EXPORT_SYMBOL(__FIXADDR_TOP);
 
-void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t flags)
+void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
 {
 	unsigned long address = __fix_to_virt(idx);
 
@@ -156,8 +155,13 @@ void __set_fixmap (enum fixed_addresses
 		BUG();
 		return;
 	}
-	set_pte_pfn(address, phys >> PAGE_SHIFT, flags);
+	set_pte_vaddr(address, pte);
 	fixmaps++;
+}
+
+void native_set_fixmap(enum fixed_addresses idx, unsigned long phys, pgprot_t flags)
+{
+	__native_set_fixmap(idx, pfn_pte(phys >> PAGE_SHIFT, flags));
 }
 
 /**
===================================================================
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -131,10 +131,12 @@ static void xen_cpuid(unsigned int *eax,
 	 * Mask out inconvenient features, to try and disable as many
 	 * unsupported kernel subsystems as possible.
 	 */
-	if (*eax == 1)
-		maskedx = ~((1 << X86_FEATURE_APIC) |  /* disable APIC */
-			    (1 << X86_FEATURE_ACPI) |  /* disable ACPI */
-			    (1 << X86_FEATURE_ACC));   /* thermal monitoring */
+	if (*eax == 1) {
+		maskedx = ~(1 << X86_FEATURE_APIC); /* disable local APIC */
+		if (!is_initial_xendomain())
+			maskedx &= ~((1 << X86_FEATURE_ACPI) |  /* disable ACPI */
+				     (1 << X86_FEATURE_ACC));   /* thermal monitoring */
+	}
 
 	asm(XEN_EMULATE_PREFIX "cpuid"
 		: "=a" (*eax),
@@ -916,6 +918,35 @@ static unsigned xen_patch(u8 type, u16 c
 	return ret;
 }
 
+static void xen_set_fixmap(unsigned idx, unsigned long phys, pgprot_t prot)
+{
+	pte_t pte;
+
+	phys >>= PAGE_SHIFT;
+
+	switch (idx) {
+#ifdef CONFIG_X86_F00F_BUG
+	case FIX_F00F_IDT:
+#endif
+	case FIX_WP_TEST:
+	case FIX_VDSO:
+#ifdef CONFIG_X86_LOCAL_APIC
+	case FIX_APIC_BASE:	/* maps dummy local APIC */
+#endif
+		pte = pfn_pte(phys, prot);
+		break;
+
+	default:
+		pte = mfn_pte(phys, prot);
+		break;
+	}
+
+	printk("xen_set_fixmap: idx=%d phys=%lx prot=%lx\n",
+	       idx, phys, (unsigned long)pgprot_val(prot));
+
+	__native_set_fixmap(idx, pte);
+}
+
 static const struct paravirt_ops xen_paravirt_ops __initdata = {
 	.paravirt_enabled = 1,
 	.shared_kernel_pmd = 0,
@@ -1046,6 +1077,8 @@ static const struct paravirt_ops xen_par
 	.exit_mmap = xen_exit_mmap,
 
 	.set_lazy_mode = xen_set_lazy_mode,
+
+	.set_fixmap = xen_set_fixmap,
 };
 
 #ifdef CONFIG_SMP
===================================================================
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -117,35 +117,7 @@ void xen_set_pmd(pmd_t *ptr, pmd_t val)
  */
 void set_pte_mfn(unsigned long vaddr, unsigned long mfn, pgprot_t flags)
 {
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
-
-	pgd = swapper_pg_dir + pgd_index(vaddr);
-	if (pgd_none(*pgd)) {
-		BUG();
-		return;
-	}
-	pud = pud_offset(pgd, vaddr);
-	if (pud_none(*pud)) {
-		BUG();
-		return;
-	}
-	pmd = pmd_offset(pud, vaddr);
-	if (pmd_none(*pmd)) {
-		BUG();
-		return;
-	}
-	pte = pte_offset_kernel(pmd, vaddr);
-	/* <mfn,flags> stored as-is, to permit clearing entries */
-	xen_set_pte(pte, mfn_pte(mfn, flags));
-
-	/*
-	 * It's enough to flush this one mapping.
-	 * (PGE mappings get flushed as well)
-	 */
-	__flush_tlb_one(vaddr);
+	set_pte_vaddr(vaddr, mfn_pte(mfn, flags));
 }
 
 void xen_set_pte_at(struct mm_struct *mm, unsigned long addr,
===================================================================
--- a/include/asm-x86/fixmap_32.h
+++ b/include/asm-x86/fixmap_32.h
@@ -98,8 +98,17 @@ enum fixed_addresses {
 	__end_of_fixed_addresses
 };
 
-extern void __set_fixmap (enum fixed_addresses idx,
-					unsigned long phys, pgprot_t flags);
+void __native_set_fixmap(enum fixed_addresses idx, pte_t pte);
+void native_set_fixmap(enum fixed_addresses idx,
+		       unsigned long phys, pgprot_t flags);
+
+#ifndef CONFIG_PARAVIRT
+static inline void __set_fixmap(enum fixed_addresses idx,
+				unsigned long phys, pgprot_t flags)
+{
+	native_set_fixmap(idx, phys, flags);
+}
+#endif
 extern void reserve_top_address(unsigned long reserve);
 
 #define set_fixmap(idx, phys) \
===================================================================
--- a/include/asm-x86/paravirt.h
+++ b/include/asm-x86/paravirt.h
@@ -222,6 +222,13 @@ struct paravirt_ops
 	/* These two are jmp to, not actually called. */
 	void (*irq_enable_sysexit)(void);
 	void (*iret)(void);
+
+	/* dom0 ops */
+
+	/* Sometimes the physical address is a pfn, and sometimes it's
+	   an mfn.  We can tell which is which from the index. */
+	void (*set_fixmap)(unsigned /* enum fixed_addresses */ idx,
+			   unsigned long phys, pgprot_t flags);
 };
 
 extern struct paravirt_ops paravirt_ops;
@@ -931,6 +938,12 @@ static inline void arch_flush_lazy_mmu_m
 	PVOP_VCALL1(set_lazy_mode, PARAVIRT_LAZY_FLUSH);
 }
 
+static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
+				unsigned long phys, pgprot_t flags)
+{
+	paravirt_ops.set_fixmap(idx, phys, flags);
+}
+
 void _paravirt_nop(void);
 #define paravirt_nop	((void *)_paravirt_nop)
 
===================================================================
--- a/include/asm-x86/pgtable_32.h
+++ b/include/asm-x86/pgtable_32.h
@@ -522,6 +522,9 @@ void native_pagetable_setup_start(pgd_t
 void native_pagetable_setup_start(pgd_t *base);
 void native_pagetable_setup_done(pgd_t *base);
 
+/* Install a pte for a particular vaddr in kernel space. */
+void set_pte_vaddr(unsigned long vaddr, pte_t pte);
+
 #ifndef CONFIG_PARAVIRT
 static inline void paravirt_pagetable_setup_start(pgd_t *base)
 {

[-- Attachment #7: xen-signature-check.patch --]
[-- Type: text/x-patch, Size: 687 bytes --]

Subject: xen: relax signature check

Some versions of Xen 3.x set their magic number to "xen-3.[12]", so
relax the test to match them.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>

---
 arch/x86/xen/enlighten.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

===================================================================
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1131,7 +1131,7 @@ asmlinkage void __init xen_start_kernel(
 	if (!xen_start_info)
 		return;
 
-	BUG_ON(memcmp(xen_start_info->magic, "xen-3.0", 7) != 0);
+	BUG_ON(memcmp(xen_start_info->magic, "xen-3", 5) != 0);
 
 	/* Install Xen paravirt ops */
 	pv_info = xen_info;

[-- Attachment #8: xen-balloon.patch --]
[-- Type: text/x-patch, Size: 25580 bytes --]

---
 drivers/Kconfig                |    2
 drivers/xen/Kconfig            |   19 +
 drivers/xen/Makefile           |    2
 drivers/xen/balloon.c          |  712 ++++++++++++++++++++++++++++++++++++++++
 include/xen/balloon.h          |   61 +++
 include/xen/interface/memory.h |   12
 6 files changed, 800 insertions(+), 8 deletions(-)

===================================================================
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -95,4 +95,6 @@ source "drivers/uio/Kconfig"
 source "drivers/uio/Kconfig"
 
 source "drivers/virtio/Kconfig"
+
+source "drivers/xen/Kconfig"
 endmenu
===================================================================
--- /dev/null
+++ b/drivers/xen/Kconfig
@@ -0,0 +1,19 @@
+config XEN_BALLOON
+	bool "Xen memory balloon driver"
+	depends on XEN
+	default y
+	help
+	  The balloon driver allows the Xen domain to request more memory from
+	  the system to expand the domain's memory allocation, or alternatively
+	  return unneeded memory to the system.
+
+config XEN_SCRUB_PAGES
+	bool "Scrub pages before returning them to system"
+	depends on XEN_BALLOON
+	default y
+	help
+	  Scrub pages before returning them to the system for reuse by
+	  other domains.  This makes sure that any confidential data
+	  is not accidentally visible to other domains.  It is more
+	  secure, but slightly less efficient.
+	  If in doubt, say yes.
===================================================================
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -1,2 +1,4 @@ obj-y += grant-table.o
 obj-y	+= grant-table.o
 obj-y	+= xenbus/
+
+obj-$(CONFIG_XEN_BALLOON)	+= balloon.o
===================================================================
--- /dev/null
+++ b/drivers/xen/balloon.c
@@ -0,0 +1,712 @@
+/******************************************************************************
+ * balloon.c
+ *
+ * Xen balloon driver - enables returning/claiming memory to/from Xen.
+ *
+ * Copyright (c) 2003, B Dragovic
+ * Copyright (c) 2003-2004, M Williamson, K Fraser
+ * Copyright (c) 2005 Dan M. Smith, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/errno.h>
+#include <linux/mm.h>
+#include <linux/bootmem.h>
+#include <linux/pagemap.h>
+#include <linux/highmem.h>
+#include <linux/mutex.h>
+#include <linux/highmem.h>
+#include <linux/list.h>
+#include <linux/sysdev.h>
+
+#include <asm/xen/hypervisor.h>
+#include <asm/page.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+#include <asm/uaccess.h>
+#include <asm/tlb.h>
+
+#include <xen/interface/memory.h>
+#include <xen/balloon.h>
+#include <xen/xenbus.h>
+#include <xen/features.h>
+#include <xen/page.h>
+
+#define PAGES2KB(_p) ((_p)<<(PAGE_SHIFT-10))
+
+#define BALLOON_CLASS_NAME "memory"
+
+struct balloon_stats {
+	/* We aim for 'current allocation' == 'target allocation'. */
+	unsigned long current_pages;
+	unsigned long target_pages;
+	/* We may hit the hard limit in Xen. If we do then we remember it. */
+	unsigned long hard_limit;
+	/*
+	 * Drivers may alter the memory reservation independently, but they
+	 * must inform the balloon driver so we avoid hitting the hard limit.
+	 */
+	unsigned long driver_pages;
+	/* Number of pages in high- and low-memory balloons. */
+	unsigned long balloon_low;
+	unsigned long balloon_high;
+};
+
+static DEFINE_MUTEX(balloon_mutex);
+
+static struct sys_device balloon_sysdev;
+
+static int register_balloon(struct sys_device *sysdev);
+
+/*
+ * Protects atomic reservation decrease/increase against concurrent increases.
+ * Also protects non-atomic updates of current_pages and driver_pages, and
+ * balloon lists.
+ */
+static DEFINE_SPINLOCK(balloon_lock);
+
+static struct balloon_stats balloon_stats;
+
+/* We increase/decrease in batches which fit in a page */
+static unsigned long frame_list[PAGE_SIZE / sizeof(unsigned long)];
+
+/* VM /proc information for memory */
+extern unsigned long totalram_pages;
+
+#ifdef CONFIG_HIGHMEM
+extern unsigned long totalhigh_pages;
+#define inc_totalhigh_pages() (totalhigh_pages++)
+#define dec_totalhigh_pages() (totalhigh_pages--)
+#else
+#define inc_totalhigh_pages() do {} while(0)
+#define dec_totalhigh_pages() do {} while(0)
+#endif
+
+/* List of ballooned pages, threaded through the mem_map array. */
+static LIST_HEAD(ballooned_pages);
+
+/* Main work function, always executed in process context. */
+static void balloon_process(struct work_struct *work);
+static DECLARE_WORK(balloon_worker, balloon_process);
+static struct timer_list balloon_timer;
+
+/* When ballooning out (allocating memory to return to Xen) we don't really
+   want the kernel to try too hard since that can trigger the oom killer. */
+#define GFP_BALLOON \
+	(GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
+
+static void scrub_page(struct page *page)
+{
+#ifdef CONFIG_XEN_SCRUB_PAGES
+	if (PageHighMem(page)) {
+		void *v = kmap(page);
+		clear_page(v);
+		kunmap(page);
+	} else {
+		void *v = page_address(page);
+		clear_page(v);
+	}
+#endif
+}
+
+/* balloon_append: add the given page to the balloon. */
+static void balloon_append(struct page *page)
+{
+	/* Lowmem is re-populated first, so highmem pages go at list tail. */
+	if (PageHighMem(page)) {
+		list_add_tail(&page->lru, &ballooned_pages);
+		balloon_stats.balloon_high++;
+		dec_totalhigh_pages();
+	} else {
+		list_add(&page->lru, &ballooned_pages);
+		balloon_stats.balloon_low++;
+	}
+}
+
+/* balloon_retrieve: rescue a page from the balloon, if it is not empty. */
+static struct page *balloon_retrieve(void)
+{
+	struct page *page;
+
+	if (list_empty(&ballooned_pages))
+		return NULL;
+
+	page = list_entry(ballooned_pages.next, struct page, lru);
+	list_del(&page->lru);
+
+	if (PageHighMem(page)) {
+		balloon_stats.balloon_high--;
+		inc_totalhigh_pages();
+	}
+	else
+		balloon_stats.balloon_low--;
+
+	return page;
+}
+
+static struct page *balloon_first_page(void)
+{
+	if (list_empty(&ballooned_pages))
+		return NULL;
+	return list_entry(ballooned_pages.next, struct page, lru);
+}
+
+static struct page *balloon_next_page(struct page *page)
+{
+	struct list_head *next = page->lru.next;
+	if (next == &ballooned_pages)
+		return NULL;
+	return list_entry(next, struct page, lru);
+}
+
+static void balloon_alarm(unsigned long unused)
+{
+	schedule_work(&balloon_worker);
+}
+
+static unsigned long current_target(void)
+{
+	unsigned long target = min(balloon_stats.target_pages, balloon_stats.hard_limit);
+
+	target = min(target,
+		     balloon_stats.current_pages +
+		     balloon_stats.balloon_low +
+		     balloon_stats.balloon_high);
+
+	return target;
+}
+
+static int increase_reservation(unsigned long nr_pages)
+{
+	unsigned long pfn, i, flags;
+	struct page *page;
+	long rc;
+	struct xen_memory_reservation reservation = {
+		.address_bits = 0,
+		.extent_order = 0,
+		.domid = DOMID_SELF
+	};
+
+	if (nr_pages > ARRAY_SIZE(frame_list))
+		nr_pages = ARRAY_SIZE(frame_list);
+
+	spin_lock_irqsave(&balloon_lock, flags);
+
+	page = balloon_first_page();
+	for (i = 0; i < nr_pages; i++) {
+		BUG_ON(page == NULL);
+		frame_list[i] = page_to_pfn(page);
+		page = balloon_next_page(page);
+	}
+
+	reservation.extent_start = (unsigned long)frame_list;
+	reservation.nr_extents = nr_pages;
+	rc = HYPERVISOR_memory_op(
+		XENMEM_populate_physmap, &reservation);
+	if (rc < nr_pages) {
+		if (rc > 0) {
+			int ret;
+
+			/* We hit the Xen hard limit: reprobe. */
+			reservation.nr_extents = rc;
+			ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
+						   &reservation);
+			BUG_ON(ret != rc);
+		}
+		if (rc >= 0)
+			balloon_stats.hard_limit = (balloon_stats.current_pages + rc -
+						    balloon_stats.driver_pages);
+		goto out;
+	}
+
+	for (i = 0; i < nr_pages; i++) {
+		page = balloon_retrieve();
+		BUG_ON(page == NULL);
+
+		pfn = page_to_pfn(page);
+		BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
+		       phys_to_machine_mapping_valid(pfn));
+
+		set_phys_to_machine(pfn, frame_list[i]);
+
+		/* Link back into the page tables if not highmem. */
+		if (pfn < max_low_pfn) {
+			int ret;
+			ret = HYPERVISOR_update_va_mapping(
+				(unsigned long)__va(pfn << PAGE_SHIFT),
+				mfn_pte(frame_list[i], PAGE_KERNEL),
+				0);
+			BUG_ON(ret);
+		}
+
+		/* Relinquish the page back to the allocator. */
+		ClearPageReserved(page);
+		init_page_count(page);
+		__free_page(page);
+	}
+
+	balloon_stats.current_pages += nr_pages;
+	totalram_pages = balloon_stats.current_pages;
+
+ out:
+	spin_unlock_irqrestore(&balloon_lock, flags);
+
+	return 0;
+}
+
+static int decrease_reservation(unsigned long nr_pages)
+{
+	unsigned long pfn, i, flags;
+	struct page *page;
+	int need_sleep = 0;
+	int ret;
+	struct xen_memory_reservation reservation = {
+		.address_bits = 0,
+		.extent_order = 0,
+		.domid = DOMID_SELF
+	};
+
+	if (nr_pages > ARRAY_SIZE(frame_list))
+		nr_pages = ARRAY_SIZE(frame_list);
+
+	for (i = 0; i < nr_pages; i++) {
+		if ((page = alloc_page(GFP_BALLOON)) == NULL) {
+			nr_pages = i;
+			need_sleep = 1;
+			break;
+		}
+
+		pfn = page_to_pfn(page);
+		frame_list[i] = pfn_to_mfn(pfn);
+
+		scrub_page(page);
+	}
+
+	/* Ensure that ballooned highmem pages don't have kmaps. */
+	kmap_flush_unused();
+	flush_tlb_all();
+
+	spin_lock_irqsave(&balloon_lock, flags);
+
+	/* No more mappings: invalidate P2M and add to balloon.
*/ + for (i = 0; i < nr_pages; i++) { + pfn = mfn_to_pfn(frame_list[i]); + set_phys_to_machine(pfn, INVALID_P2M_ENTRY); + balloon_append(pfn_to_page(pfn)); + } + + reservation.extent_start = (unsigned long)frame_list; + reservation.nr_extents = nr_pages; + ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation); + BUG_ON(ret != nr_pages); + + balloon_stats.current_pages -= nr_pages; + totalram_pages = balloon_stats.current_pages; + + spin_unlock_irqrestore(&balloon_lock, flags); + + return need_sleep; +} + +/* + * We avoid multiple worker processes conflicting via the balloon mutex. + * We may of course race updates of the target counts (which are protected + * by the balloon lock), or with changes to the Xen hard limit, but we will + * recover from these in time. + */ +static void balloon_process(struct work_struct *work) +{ + int need_sleep = 0; + long credit; + + mutex_lock(&balloon_mutex); + + do { + credit = current_target() - balloon_stats.current_pages; + if (credit > 0) + need_sleep = (increase_reservation(credit) != 0); + if (credit < 0) + need_sleep = (decrease_reservation(-credit) != 0); + +#ifndef CONFIG_PREEMPT + if (need_resched()) + schedule(); +#endif + } while ((credit != 0) && !need_sleep); + + /* Schedule more work if there is some still to be done. */ + if (current_target() != balloon_stats.current_pages) + mod_timer(&balloon_timer, jiffies + HZ); + + mutex_unlock(&balloon_mutex); +} + +/* Resets the Xen limit, sets new target, and kicks off processing. */ +void balloon_set_new_target(unsigned long target) +{ + /* No need for lock. Not read-modify-write updates. 
*/ + balloon_stats.hard_limit = ~0UL; + balloon_stats.target_pages = target; + schedule_work(&balloon_worker); +} + +static struct xenbus_watch target_watch = +{ + .node = "memory/target" +}; + +/* React to a change in the target key */ +static void watch_target(struct xenbus_watch *watch, + const char **vec, unsigned int len) +{ + unsigned long long new_target; + int err; + + err = xenbus_scanf(XBT_NIL, "memory", "target", "%llu", &new_target); + if (err != 1) { + /* This is ok (for domain0 at least) - so just return */ + return; + } + + /* The given memory/target value is in KiB, so it needs converting to + * pages. PAGE_SHIFT converts bytes to pages, hence PAGE_SHIFT - 10. + */ + balloon_set_new_target(new_target >> (PAGE_SHIFT - 10)); +} + +static int balloon_init_watcher(struct notifier_block *notifier, + unsigned long event, + void *data) +{ + int err; + + err = register_xenbus_watch(&target_watch); + if (err) + printk(KERN_ERR "Failed to set balloon watcher\n"); + + return NOTIFY_DONE; +} + +static struct notifier_block xenstore_notifier; + +static int __init balloon_init(void) +{ + unsigned long pfn; + struct page *page; + + if (!is_running_on_xen()) + return -ENODEV; + + pr_info("xen_balloon: Initialising balloon driver.\n"); + + balloon_stats.current_pages = min(xen_start_info->nr_pages, max_pfn); + totalram_pages = balloon_stats.current_pages; + balloon_stats.target_pages = balloon_stats.current_pages; + balloon_stats.balloon_low = 0; + balloon_stats.balloon_high = 0; + balloon_stats.driver_pages = 0UL; + balloon_stats.hard_limit = ~0UL; + + init_timer(&balloon_timer); + balloon_timer.data = 0; + balloon_timer.function = balloon_alarm; + + register_balloon(&balloon_sysdev); + + /* Initialise the balloon with excess memory space. 
*/ + for (pfn = xen_start_info->nr_pages; pfn < max_pfn; pfn++) { + page = pfn_to_page(pfn); + if (!PageReserved(page)) + balloon_append(page); + } + + target_watch.callback = watch_target; + xenstore_notifier.notifier_call = balloon_init_watcher; + + register_xenstore_notifier(&xenstore_notifier); + + return 0; +} + +subsys_initcall(balloon_init); + +static void balloon_exit(void) +{ + /* XXX - release balloon here */ + return; +} + +module_exit(balloon_exit); + +static void balloon_update_driver_allowance(long delta) +{ + unsigned long flags; + + spin_lock_irqsave(&balloon_lock, flags); + balloon_stats.driver_pages += delta; + spin_unlock_irqrestore(&balloon_lock, flags); +} + +static int dealloc_pte_fn( + pte_t *pte, struct page *pmd_page, unsigned long addr, void *data) +{ + unsigned long mfn = pte_mfn(*pte); + int ret; + struct xen_memory_reservation reservation = { + .nr_extents = 1, + .extent_order = 0, + .domid = DOMID_SELF + }; + reservation.extent_start = (unsigned long)&mfn; + set_pte_at(&init_mm, addr, pte, __pte_ma(0ull)); + set_phys_to_machine(__pa(addr) >> PAGE_SHIFT, INVALID_P2M_ENTRY); + ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation); + BUG_ON(ret != 1); + return 0; +} + +static struct page **alloc_empty_pages_and_pagevec(int nr_pages) +{ + unsigned long vaddr, flags; + struct page *page, **pagevec; + int i, ret; + + pagevec = kmalloc(sizeof(page) * nr_pages, GFP_KERNEL); + if (pagevec == NULL) + return NULL; + + for (i = 0; i < nr_pages; i++) { + page = pagevec[i] = alloc_page(GFP_KERNEL); + if (page == NULL) + goto err; + + vaddr = (unsigned long)page_address(page); + + scrub_page(page); + + spin_lock_irqsave(&balloon_lock, flags); + + if (xen_feature(XENFEAT_auto_translated_physmap)) { + unsigned long gmfn = page_to_pfn(page); + struct xen_memory_reservation reservation = { + .nr_extents = 1, + .extent_order = 0, + .domid = DOMID_SELF + }; + reservation.extent_start = (unsigned long)&gmfn; + ret = 
HYPERVISOR_memory_op(XENMEM_decrease_reservation, + &reservation); + if (ret == 1) + ret = 0; /* success */ + } else { + ret = apply_to_page_range(&init_mm, vaddr, PAGE_SIZE, + dealloc_pte_fn, NULL); + } + + if (ret != 0) { + spin_unlock_irqrestore(&balloon_lock, flags); + __free_page(page); + goto err; + } + + totalram_pages = --balloon_stats.current_pages; + + spin_unlock_irqrestore(&balloon_lock, flags); + } + + out: + schedule_work(&balloon_worker); + flush_tlb_all(); + return pagevec; + + err: + spin_lock_irqsave(&balloon_lock, flags); + while (--i >= 0) + balloon_append(pagevec[i]); + spin_unlock_irqrestore(&balloon_lock, flags); + kfree(pagevec); + pagevec = NULL; + goto out; +} + +static void free_empty_pages_and_pagevec(struct page **pagevec, int nr_pages) +{ + unsigned long flags; + int i; + + if (pagevec == NULL) + return; + + spin_lock_irqsave(&balloon_lock, flags); + for (i = 0; i < nr_pages; i++) { + BUG_ON(page_count(pagevec[i]) != 1); + balloon_append(pagevec[i]); + } + spin_unlock_irqrestore(&balloon_lock, flags); + + kfree(pagevec); + + schedule_work(&balloon_worker); +} + +static void balloon_release_driver_page(struct page *page) +{ + unsigned long flags; + + spin_lock_irqsave(&balloon_lock, flags); + balloon_append(page); + balloon_stats.driver_pages--; + spin_unlock_irqrestore(&balloon_lock, flags); + + schedule_work(&balloon_worker); +} + + +#define BALLOON_SHOW(name, format, args...) \ + static ssize_t show_##name(struct sys_device *dev, \ + char *buf) \ + { \ + return sprintf(buf, format, ##args); \ + } \ + static SYSDEV_ATTR(name, S_IRUGO, show_##name, NULL) + +BALLOON_SHOW(current_kb, "%lu\n", PAGES2KB(balloon_stats.current_pages)); +BALLOON_SHOW(low_kb, "%lu\n", PAGES2KB(balloon_stats.balloon_low)); +BALLOON_SHOW(high_kb, "%lu\n", PAGES2KB(balloon_stats.balloon_high)); +BALLOON_SHOW(hard_limit_kb, + (balloon_stats.hard_limit!=~0UL) ? "%lu\n" : "???\n", + (balloon_stats.hard_limit!=~0UL) ? 
PAGES2KB(balloon_stats.hard_limit) : 0); +BALLOON_SHOW(driver_kb, "%lu\n", PAGES2KB(balloon_stats.driver_pages)); + +static ssize_t show_target_kb(struct sys_device *dev, char *buf) +{ + return sprintf(buf, "%lu\n", PAGES2KB(balloon_stats.target_pages)); +} + +static ssize_t store_target_kb(struct sys_device *dev, + const char *buf, + size_t count) +{ + char memstring[64], *endchar; + unsigned long long target_bytes; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (count <= 1) + return -EBADMSG; /* runt */ + if (count > sizeof(memstring)) + return -EFBIG; /* too long */ + strcpy(memstring, buf); + + target_bytes = memparse(memstring, &endchar); + balloon_set_new_target(target_bytes >> PAGE_SHIFT); + + return count; +} + +static SYSDEV_ATTR(target_kb, S_IRUGO | S_IWUSR, + show_target_kb, store_target_kb); + +static struct sysdev_attribute *balloon_attrs[] = { + &attr_target_kb, +}; + +static struct attribute *balloon_info_attrs[] = { + &attr_current_kb.attr, + &attr_low_kb.attr, + &attr_high_kb.attr, + &attr_hard_limit_kb.attr, + &attr_driver_kb.attr, + NULL +}; + +static struct attribute_group balloon_info_group = { + .name = "info", + .attrs = balloon_info_attrs, +}; + +static struct sysdev_class balloon_sysdev_class = { + set_kset_name(BALLOON_CLASS_NAME), +}; + +static int register_balloon(struct sys_device *sysdev) +{ + int i, error; + + error = sysdev_class_register(&balloon_sysdev_class); + if (error) + return error; + + sysdev->id = 0; + sysdev->cls = &balloon_sysdev_class; + + error = sysdev_register(sysdev); + if (error) { + sysdev_class_unregister(&balloon_sysdev_class); + return error; + } + + for (i = 0; i < ARRAY_SIZE(balloon_attrs); i++) { + error = sysdev_create_file(sysdev, balloon_attrs[i]); + if (error) + goto fail; + } + + error = sysfs_create_group(&sysdev->kobj, &balloon_info_group); + if (error) + goto fail; + + return 0; + + fail: + while (--i >= 0) + sysdev_remove_file(sysdev, balloon_attrs[i]); + sysdev_unregister(sysdev); + 
sysdev_class_unregister(&balloon_sysdev_class); + return error; +} + +static void unregister_balloon(struct sys_device *sysdev) +{ + int i; + + sysfs_remove_group(&sysdev->kobj, &balloon_info_group); + for (i = 0; i < ARRAY_SIZE(balloon_attrs); i++) + sysdev_remove_file(sysdev, balloon_attrs[i]); + sysdev_unregister(sysdev); + sysdev_class_unregister(&balloon_sysdev_class); +} + +static void balloon_sysfs_exit(void) +{ + unregister_balloon(&balloon_sysdev); +} + +MODULE_LICENSE("GPL"); =================================================================== --- /dev/null +++ b/include/xen/balloon.h @@ -0,0 +1,61 @@ +/****************************************************************************** + * balloon.h + * + * Xen balloon driver - enables returning/claiming memory to/from Xen. + * + * Copyright (c) 2003, B Dragovic + * Copyright (c) 2003-2004, M Williamson, K Fraser + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation; or, when distributed + * separately from the Linux kernel or incorporated into other + * software packages, subject to the following license: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this source file (the "Software"), to deal in the Software without + * restriction, including without limitation the rights to use, copy, modify, + * merge, publish, distribute, sublicense, and/or sell copies of the Software, + * and to permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#ifndef __XEN_BALLOON_H__ +#define __XEN_BALLOON_H__ + +#include <linux/spinlock.h> + +#if 0 +/* + * Inform the balloon driver that it should allow some slop for device-driver + * memory activities. + */ +void balloon_update_driver_allowance(long delta); + +/* Allocate/free a set of empty pages in low memory (i.e., no RAM mapped). */ +struct page **alloc_empty_pages_and_pagevec(int nr_pages); +void free_empty_pages_and_pagevec(struct page **pagevec, int nr_pages); + +void balloon_release_driver_page(struct page *page); + +/* + * Prevent the balloon driver from changing the memory reservation during + * a driver critical region. + */ +extern spinlock_t balloon_lock; +#define balloon_lock(__flags) spin_lock_irqsave(&balloon_lock, __flags) +#define balloon_unlock(__flags) spin_unlock_irqrestore(&balloon_lock, __flags) +#endif + +#endif /* __XEN_BALLOON_H__ */ =================================================================== --- a/include/xen/interface/memory.h +++ b/include/xen/interface/memory.h @@ -29,7 +29,7 @@ struct xen_memory_reservation { * OUT: GMFN bases of extents that were allocated * (NB. This command also updates the mach_to_phys translation table) */ - GUEST_HANDLE(ulong) extent_start; + ulong extent_start; /* Number of extents, and size/alignment of each (2^extent_order pages). 
*/ unsigned long nr_extents; @@ -50,7 +50,6 @@ struct xen_memory_reservation { domid_t domid; }; -DEFINE_GUEST_HANDLE_STRUCT(xen_memory_reservation); /* * Returns the maximum machine frame number of mapped RAM in this system. @@ -86,7 +85,7 @@ struct xen_machphys_mfn_list { * any large discontiguities in the machine address space, 2MB gaps in * the machphys table will be represented by an MFN base of zero. */ - GUEST_HANDLE(ulong) extent_start; + ulong extent_start; /* * Number of extents written to the above array. This will be smaller @@ -94,7 +93,6 @@ struct xen_machphys_mfn_list { */ unsigned int nr_extents; }; -DEFINE_GUEST_HANDLE_STRUCT(xen_machphys_mfn_list); /* * Sets the GPFN at which a particular page appears in the specified guest's @@ -117,7 +115,6 @@ struct xen_add_to_physmap { /* GPFN where the source mapping page should appear. */ unsigned long gpfn; }; -DEFINE_GUEST_HANDLE_STRUCT(xen_add_to_physmap); /* * Translates a list of domain-specific GPFNs into MFNs. Returns a -ve error @@ -132,14 +129,13 @@ struct xen_translate_gpfn_list { unsigned long nr_gpfns; /* List of GPFNs to translate. */ - GUEST_HANDLE(ulong) gpfn_list; + ulong gpfn_list; /* * Output list to contain MFN translations. May be the same as the input * list (in which case each input GPFN is overwritten with the output MFN). 
*/ - GUEST_HANDLE(ulong) mfn_list; + ulong mfn_list; }; -DEFINE_GUEST_HANDLE_STRUCT(xen_translate_gpfn_list); #endif /* __XEN_PUBLIC_MEMORY_H__ */ [-- Attachment #9: xen-cpu-hotplug.patch --] [-- Type: text/x-patch, Size: 7208 bytes --] --- arch/x86/kernel/smp_32.c | 2 + arch/x86/kernel/smpboot_32.c | 6 ++-- arch/x86/xen/enlighten.c | 15 ++++++++++- arch/x86/xen/smp.c | 54 ++++++++++++++++++++++++++++-------------- arch/x86/xen/xen-ops.h | 1 include/asm-x86/smp_32.h | 18 ++++++++++++-- 6 files changed, 72 insertions(+), 24 deletions(-) =================================================================== --- a/arch/x86/kernel/smp_32.c +++ b/arch/x86/kernel/smp_32.c @@ -704,4 +704,6 @@ struct smp_ops smp_ops = { .smp_send_stop = native_smp_send_stop, .smp_send_reschedule = native_smp_send_reschedule, .smp_call_function_mask = native_smp_call_function_mask, + + .cpu_disable = native_cpu_disable, }; =================================================================== --- a/arch/x86/kernel/smpboot_32.c +++ b/arch/x86/kernel/smpboot_32.c @@ -1166,7 +1166,7 @@ void remove_siblinginfo(int cpu) cpu_clear(cpu, cpu_sibling_setup_map); } -int __cpu_disable(void) +int native_cpu_disable(void) { cpumask_t map = cpu_online_map; int cpu = smp_processor_id(); @@ -1216,12 +1216,12 @@ void __cpu_die(unsigned int cpu) printk(KERN_ERR "CPU %u didn't die...\n", cpu); } #else /* ... !CONFIG_HOTPLUG_CPU */ -int __cpu_disable(void) +int native_cpu_disable(void) { return -ENOSYS; } -void __cpu_die(unsigned int cpu) +void native_cpu_die(unsigned int cpu) { /* We said "no" in __cpu_disable */ BUG(); =================================================================== --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -254,10 +254,21 @@ static void xen_safe_halt(void) BUG(); } +static void xen_shutdown_cpu(void) +{ + int cpu = smp_processor_id(); + + /* make sure we're not pinning something down */ + load_cr3(swapper_pg_dir); + /* GDT too? 
*/ + + HYPERVISOR_vcpu_op(VCPUOP_down, cpu, NULL); +} + static void xen_halt(void) { if (irqs_disabled()) - HYPERVISOR_vcpu_op(VCPUOP_down, smp_processor_id(), NULL); + xen_shutdown_cpu(); else xen_safe_halt(); } @@ -1069,6 +1080,8 @@ static const struct smp_ops xen_smp_ops .smp_send_stop = xen_smp_send_stop, .smp_send_reschedule = xen_smp_send_reschedule, .smp_call_function_mask = xen_smp_call_function_mask, + + .cpu_disable = xen_cpu_disable, }; #endif /* CONFIG_SMP */ =================================================================== --- a/arch/x86/xen/smp.c +++ b/arch/x86/xen/smp.c @@ -189,8 +189,14 @@ void __init xen_smp_prepare_cpus(unsigne panic("failed fork for CPU %d", cpu); cpu_set(cpu, cpu_present_map); - } - + + smp_store_cpu_info(cpu); + init_gdt(cpu); + irq_ctx_init(cpu); + xen_setup_timer(cpu); + xen_smp_intr_init(cpu); + } + //init_xenbus_allowed_cpumask(); } @@ -198,7 +204,7 @@ cpu_initialize_context(unsigned int cpu, cpu_initialize_context(unsigned int cpu, struct task_struct *idle) { struct vcpu_guest_context *ctxt; - struct gdt_page *gdt = &per_cpu(gdt_page, cpu); + struct desc_struct *gdt = get_cpu_gdt_table(cpu); if (cpu_test_and_set(cpu, cpu_initialized_map)) return 0; @@ -222,11 +228,11 @@ cpu_initialize_context(unsigned int cpu, ctxt->ldt_ents = 0; - BUG_ON((unsigned long)gdt->gdt & ~PAGE_MASK); - make_lowmem_page_readonly(gdt->gdt); - - ctxt->gdt_frames[0] = virt_to_mfn(gdt->gdt); - ctxt->gdt_ents = ARRAY_SIZE(gdt->gdt); + BUG_ON((unsigned long)gdt & ~PAGE_MASK); + make_lowmem_page_readonly(gdt); + + ctxt->gdt_frames[0] = virt_to_mfn(gdt); + ctxt->gdt_ents = GDT_ENTRIES; ctxt->user_regs.cs = __KERNEL_CS; ctxt->user_regs.esp = idle->thread.esp0 - sizeof(struct pt_regs); @@ -260,26 +266,20 @@ int __cpuinit xen_cpu_up(unsigned int cp return rc; #endif - init_gdt(cpu); per_cpu(current_task, cpu) = idle; - irq_ctx_init(cpu); - xen_setup_timer(cpu); /* make sure interrupts start blocked */ per_cpu(xen_vcpu, cpu)->evtchn_upcall_mask = 1; rc = 
cpu_initialize_context(cpu, idle); if (rc) - return rc; + goto out; if (num_online_cpus() == 1) alternatives_smp_switch(1); - rc = xen_smp_intr_init(cpu); - if (rc) - return rc; - - smp_store_cpu_info(cpu); + get_cpu(); /* set_cpu_sibling_map wants no preempt */ + set_cpu_sibling_map(cpu); /* This must be done before setting cpu_online_map */ wmb(); @@ -289,7 +289,10 @@ int __cpuinit xen_cpu_up(unsigned int cp rc = HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL); BUG_ON(rc); - return 0; + put_cpu(); + + out: + return rc; } void xen_smp_cpus_done(unsigned int max_cpus) @@ -408,3 +411,18 @@ int xen_smp_call_function_mask(cpumask_t return 0; } + +int xen_cpu_disable(void) +{ + cpumask_t map = cpu_online_map; + int cpu = smp_processor_id(); + + remove_siblinginfo(cpu); + + cpu_clear(cpu, map); + fixup_irqs(map); + /* It's now safe to remove this processor from the online map */ + cpu_clear(cpu, cpu_online_map); + + return 0; +} =================================================================== --- a/arch/x86/xen/xen-ops.h +++ b/arch/x86/xen/xen-ops.h @@ -39,6 +39,7 @@ void xen_smp_prepare_cpus(unsigned int m void xen_smp_prepare_cpus(unsigned int max_cpus); int xen_cpu_up(unsigned int cpu); void xen_smp_cpus_done(unsigned int max_cpus); +int xen_cpu_disable(void); void xen_smp_send_stop(void); void xen_smp_send_reschedule(int cpu); =================================================================== --- a/include/asm-x86/smp_32.h +++ b/include/asm-x86/smp_32.h @@ -63,6 +63,9 @@ struct smp_ops int (*smp_call_function_mask)(cpumask_t mask, void (*func)(void *info), void *info, int wait); + + int (*cpu_disable)(void); + void (*cpu_die)(unsigned int cpu); }; extern struct smp_ops smp_ops; @@ -71,14 +74,17 @@ static inline void smp_prepare_boot_cpu( { smp_ops.smp_prepare_boot_cpu(); } + static inline void smp_prepare_cpus(unsigned int max_cpus) { smp_ops.smp_prepare_cpus(max_cpus); } + static inline int __cpu_up(unsigned int cpu) { return smp_ops.cpu_up(cpu); } + static inline 
void smp_cpus_done(unsigned int max_cpus) { smp_ops.smp_cpus_done(max_cpus); @@ -88,10 +94,12 @@ static inline void smp_send_stop(void) { smp_ops.smp_send_stop(); } + static inline void smp_send_reschedule(int cpu) { smp_ops.smp_send_reschedule(cpu); } + static inline int smp_call_function_mask(cpumask_t mask, void (*func) (void *info), void *info, int wait) @@ -99,10 +107,18 @@ static inline int smp_call_function_mask return smp_ops.smp_call_function_mask(mask, func, info, wait); } +static inline int __cpu_disable(void) +{ + return smp_ops.cpu_disable(); +} + + void native_smp_prepare_boot_cpu(void); void native_smp_prepare_cpus(unsigned int max_cpus); int native_cpu_up(unsigned int cpunum); void native_smp_cpus_done(unsigned int max_cpus); +extern int native_cpu_disable(void); +extern void __cpu_die(unsigned int cpu); #ifndef CONFIG_PARAVIRT #define startup_ipi_hook(phys_apicid, start_eip, start_esp) \ @@ -128,8 +144,6 @@ static inline int num_booting_cpus(void) } extern int safe_smp_processor_id(void); -extern int __cpu_disable(void); -extern void __cpu_die(unsigned int cpu); extern unsigned int num_processors; void __cpuinit smp_store_cpu_info(int id); [-- Attachment #10: Type: text/plain, Size: 138 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-11-21 23:12 ` Jeremy Fitzhardinge @ 2007-11-26 14:02 ` Juan Quintela 2007-11-26 18:52 ` Jeremy Fitzhardinge 0 siblings, 1 reply; 57+ messages in thread From: Juan Quintela @ 2007-11-26 14:02 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization Hi, your console works great, but the rest of the patches assume at least: arch/x86/boot/compressed/notes-xen.c arch/x86/xen/early.c It looks as if another patch is missing; could you take a look, please? Otherwise, I will look into what is missing. It breaks with: Intel machine check architecture supported. (XEN) traps.c:1734:d0 Domain attempted WRMSR 00000404 from 00000000:00000001 to ffffffff:ffffffff. Intel machine check reporting enabled on CPU#0. general protection fault: 0000 [#1] SMP Modules linked in: Pid: 1, comm: swapper Not tainted (2.6.24-rc3-q2 #10) EIP: 0061:[<c0101790>] EFLAGS: 00010082 CPU: 0 EIP is at native_write_cr0+0x0/0x4 EAX: c005003b EBX: c03902a0 ECX: ed03f288 EDX: 00000005 ESI: c1c10c80 EDI: ed054200 EBP: 00000001 ESP: ed027eb8 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: e021 Process swapper (pid: 1, ti=ed027000 task=ed03ebb0 task.ti=ed027000) Stack: c01125e9 00000000 c03902a0 c1c10c80 ed054200 c01128c6 c03900a0 00000008 c010e0aa c037b48d 00000000 ed00efa0 ed027f24 0000000a c035215c c01e20a7 c1c10c80 80000008 000006f4 00020800 c0143563 ed03ebb0 017fe000 c03902a0 Call Trace: [<c01125e9>] prepare_set+0x20/0x86 [<c01128c6>] generic_set_all+0x28/0x34a [<c010e0aa>] identify_cpu+0x525/0x52d [<c01e20a7>] kvasprintf+0x3f/0x48 [<c0143563>] trace_hardirqs_off+0x28/0xa1 [<c0111ac6>] mtrr_ap_init+0x33/0x5d [<c0117547>] smp_store_cpu_info+0x32/0xb9 [<c0104e78>] xen_cpu_up+0x22c/0x3b4 [<c0148fdf>] _cpu_up+0xab/0x120 [<c014913e>] cpu_up+0x4e/0x61 [<c03d33f8>] kernel_init+0x9e/0x2c6 [<c0107017>] restore_nocheck+0x12/0x15 [<c03d335a>] 
kernel_init+0x0/0x2c6 [<c03d335a>] kernel_init+0x0/0x2c6 [<c0107c7f>] kernel_thread_helper+0x7/0x10 ======================= Code: 53 89 cb 83 ec 08 89 14 24 89 da 8b 04 24 89 4c 24 04 89 f9 0f 30 31 c0 5a 59 5b 5e 5f c3 0f 31 c3 0f 33 c3 0f 06 c3 0f 20 c0 c3 <0f> 22 c0 c3 0f 20 e0 c3 31 c0 0f 20 e0 c3 0f 09 c3 0f 01 00 c3 EIP: [<c0101790>] native_write_cr0+0x0/0x4 SS:ESP e021:ed027eb8 Kernel panic - not syncing: Attempted to kill init! Later, Juan. On Nov 22, 2007 12:12 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote: > Stephen C. Tweedie wrote: > > I've been looking at the next steps to try to get Xen running fully on > > top of pv_ops. To that end, I've (just) started looking at one of the > > next major jobs --- i686 dom0 on pv_ops. > > > > Great! > > > There are still a number of things needing done to reach parity with > > xen-unstable: > > > > x86_64 xen on pv_ops > > > > I think once pvops has been unified, Xen support should be fairly > straightforward. I wrote most of the existing code with 64-bit in mind, > so I'm hoping I got it right... > > > Paravirt framebuffer/keyboard > > CPU hotplug > > Balloon > > > > I've done some preliminary work on balloon and hotplug. I think balloon > should make more use of memory hotplug, but a straight port would be a > good first step. > > > kexec > > driver domains > > > > but it looks like these can largely proceed in parallel if desired. > > > > My short-term goal with this is simply to come up with a first-pass > > merge of the linux-2.6.18-xen.hg dom0 support into the current > > kernel.org tree's pv_ops support. No major refactoring in the first > > pass, but absolutely no *-xen.c code copying. > > > > Yes. #ifdefs are the way to go here. > > > I'm just starting this, but at least with the version magic check (see > > > > http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00601.html > > > > I was just about to post a fix for this. 
> > > ) out of the way, an SMP dom0 running pv_ops gets all the way through > > start_kernel() and into rest_init() before dying with an unsupported cr0 > > write. (I'm using direct console hypercalls for printk for now, full > > xencons is not working yet.) > > > > I have some early dom0 patches already, though they're a few months old > now. Not much there, but I did do an early console implementation. > > > I'm happy to put up a git tree for this once it gets anywhere. We'd > > need to decide which tree to track for that purpose --- Linus's, or > > perhaps the tglx or mingo x86 merge tree might make more sense. > > > > Yes, I think the x86 tree is where we need to be, since there's a lot of > activity there. > > I'll attach my dom0 patches for whatever use you can make of them. The > definitely won't apply to anything, not least because of the arch merge > (though it looks like they did get converted by script), but also > because they're based on some defunct experimental booting-from-bzImage > patches. But perhaps there's some useful stuff in there. > > I've also attached my xen-balloon and hotplug patches as-is. They don't > work completely, but they should be closer to applying. > > J > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel > > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-11-26 14:02 ` Juan Quintela @ 2007-11-26 18:52 ` Jeremy Fitzhardinge 2007-11-27 8:30 ` Jan Beulich 0 siblings, 1 reply; 57+ messages in thread From: Jeremy Fitzhardinge @ 2007-11-26 18:52 UTC (permalink / raw) To: Juan Quintela Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization Juan Quintela wrote: > Hi, > > your console works great, but rest of patches are assuming: > > arch/x86/boot/compressed/notes-xen.c > arch/x86/xen/early.c > Yes, those are leftovers from a somewhat unsuccessful attempt at getting ELF-in-bzImage booting working. I need to go back and make bzImage booting work properly. I posted those patches as a source of possibly useful code snippets and a summary of things I've looked at so far, rather than as something that can be used directly. > at least. It looks as if there is missing another patche, could you > take a look, please? > Otherwise, I will take a look at what is missing. > > It breaks with: > > Intel machine check architecture supported. > (XEN) traps.c:1734:d0 Domain attempted WRMSR 00000404 from 00000000:00000001 to > ffffffff:ffffffff. > Intel machine check reporting enabled on CPU#0. > general protection fault: 0000 [#1] SMP > Modules linked in: > Hm. Looks like Xen is getting upset about dom0 trying to disable caching. No, wait: 0xffffffff:ffffffff? That's strange; I wonder if it's just misreporting the value, because the code doesn't look like it's trying to write that. Either way, the fix is to implement xen_write_cr0, and mask off any bits that Xen won't want us to set/clear (or if it doesn't allow dom0 to change cr0, just ignore all updates). J ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-11-26 18:52 ` Jeremy Fitzhardinge @ 2007-11-27 8:30 ` Jan Beulich 2007-11-27 17:00 ` Jeremy Fitzhardinge 0 siblings, 1 reply; 57+ messages in thread From: Jan Beulich @ 2007-11-27 8:30 UTC (permalink / raw) To: Juan Quintela, Jeremy Fitzhardinge Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Glauber de Oliveira Costa, Chris Wright, virtualization >> It breaks with: >> >> Intel machine check architecture supported. >> (XEN) traps.c:1734:d0 Domain attempted WRMSR 00000404 from 00000000:00000001 to >> ffffffff:ffffffff. >> Intel machine check reporting enabled on CPU#0. >> general protection fault: 0000 [#1] SMP >> Modules linked in: >> > >Hm. Looks like Xen is getting upset about dom0 trying to disable >caching. No, wait: 0xffffffff:ffffffff? That's strange; I wonder if >its just misreporting the value, because the code doesn't look like its >trying to write that. > >Either way, the fix is to implement xen_write_cr0, and mask off any bits >that Xen won't want us to set/clear (or if it doesn't allow dom0 to >change cr0, just ignore all updates). Why do you think that's a CR0 write? The messages clearly indicate an MSR write, and these writes are clearly visible in intel_p{4,6}_mcheck_init() and amd_mcheck_init(). The question is why intel_p4_mcheck_init() doesn't check CPUID bits before trying to touch any registers... (And similarly amd_mcheck_init() is checking only the MCE bit, not the MCA one.) But then I just noticed that Xen itself doesn't clear the MCE/MCA bits either in emulate_forced_invalid_op(), apparently under the assumption that PV guests wouldn't try to make use of this feature. A simple workaround would be to force mce_disabled to 1 in early Xen initialization. Jan ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-11-27 8:30 ` Jan Beulich @ 2007-11-27 17:00 ` Jeremy Fitzhardinge 2007-11-27 17:14 ` Jan Beulich 2007-11-27 17:15 ` Stephen C. Tweedie 0 siblings, 2 replies; 57+ messages in thread From: Jeremy Fitzhardinge @ 2007-11-27 17:00 UTC (permalink / raw) To: Jan Beulich Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Glauber de Oliveira Costa, Chris Wright, virtualization, Juan Quintela

Jan Beulich wrote:
>>> It breaks with:
>>>
>>> Intel machine check architecture supported.
>>> (XEN) traps.c:1734:d0 Domain attempted WRMSR 00000404 from 00000000:00000001 to
>>> ffffffff:ffffffff.
>>> Intel machine check reporting enabled on CPU#0.
>>> general protection fault: 0000 [#1] SMP
>>> Modules linked in:
>>>
>>>
>> Hm. Looks like Xen is getting upset about dom0 trying to disable
>> caching. No, wait: 0xffffffff:ffffffff? That's strange; I wonder if
>> its just misreporting the value, because the code doesn't look like its
>> trying to write that.
>>
>> Either way, the fix is to implement xen_write_cr0, and mask off any bits
>> that Xen won't want us to set/clear (or if it doesn't allow dom0 to
>> change cr0, just ignore all updates).
>>
>
> Why do you think that's a CR0 write?

Well, the oops says "EIP is at native_write_cr0+0x0/0x4", and the caller
is prepare_set(), which does:

	/* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
	cr0 = read_cr0() | X86_CR0_CD;
	write_cr0(cr0);
	wbinvd();

This is in preparation for setting up the MTRRs, all of which needs to be
skipped anyway.

> The messages clearly indicate an
> MSR write, and these writes are clearly visible in intel_p{4,6}_mcheck_init()
> and amd_mcheck_init(). The question is why intel_p4_mcheck_init() doesn't
> check CPUID bits before trying to touch any registers... (And similarly
> amd_mcheck_init() is checking only the MCE bit, not the MCA one.)
>

The oops and backtrace don't suggest it's an MSR write. Does a crX write
take the same path through the emulator as an MSR write?

> A simple workaround would be to force mce_disabled to 1 in early Xen
> initialization.
>

That's probably necessary too.

    J

^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-11-27 17:00 ` Jeremy Fitzhardinge @ 2007-11-27 17:14 ` Jan Beulich 2007-11-27 17:15 ` Stephen C. Tweedie 1 sibling, 0 replies; 57+ messages in thread From: Jan Beulich @ 2007-11-27 17:14 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Glauber de Oliveira Costa, Chris Wright, virtualization, Juan Quintela

>The oops and backtrace doesn't suggest it's an MSR write. Does a crX

Oh, right, the MSR write is being ignored, not failed.

>write take the same path through the emulator as an MSR write?

No, the two operations take different paths.

Jan

^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-11-27 17:00 ` Jeremy Fitzhardinge 2007-11-27 17:14 ` Jan Beulich @ 2007-11-27 17:15 ` Stephen C. Tweedie 1 sibling, 0 replies; 57+ messages in thread From: Stephen C. Tweedie @ 2007-11-27 17:15 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Juan Quintela Hi, On Tue, 2007-11-27 at 09:00 -0800, Jeremy Fitzhardinge wrote: > > Why do you think that's a CR0 write? > > Well, the oops says "EIP is at native_write_cr0+0x0/0x4", and the caller > is prepare_set(), which does: > > /* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */ > cr0 = read_cr0() | X86_CR0_CD; > write_cr0(cr0); > wbinvd(); > > This is in preparation to setting up the MTRRs, Right: cpu 0 gets past the early mtrr init (on the boot CPU, all the kernel does is to probe the existing mtrr config), but it dies on cpu 1 trying to copy the mtrr config across. (As a consequence, we don't hit this problem on UP configs.) > which needs to be all skipped anyway. We _could_ just skip it, but we still want some mtrr support for dom0. Fortunately the kernel's mtrr interfaces are nicely modular already, so I'm currently starting to plug the 2.6.18 mtrr/main-xen.c into pv_ops as a modular mtrr provider. > > The messages clearly indicate an > > MSR write, and these writes are clearly visible in intel_p{4,6}_mcheck_init() > > and amd_mcheck_init(). The question is why intel_p4_mcheck_init() doesn't > > check CPUID bits before trying to touch any registers... (And similarly > > amd_mcheck_init() is checking only the MCE bit, not the MCA one.) > The oops and backtrace doesn't suggest it's an MSR write. Does a crX > write take the same path through the emulator as an MSR write? We get a slew of MCE-related MSR write warnings from the HV on both boot and auxiliary processor bring-up, but the kernel doesn't crash at those points. 
(Which is not necessarily a good thing, as it implies the kernel thinks it has registered its MCE handling, but the MSR writes have not actually been honoured.) So it's not a showstopper right now, but is something we'll still want to deal with at some stage. > > A simple workaround would be to force mce_disabled to 1 in early Xen > > initialization. > That's probably necessary too. It doesn't seem to be necessary, given that the kernel does get past this point; it's probably desirable, though, at least until such time as we can actually do the MCE support correctly. --Stephen ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Next steps with pv_ops for Xen 2007-11-21 22:05 Next steps with pv_ops for Xen Stephen C. Tweedie 2007-11-21 23:12 ` Jeremy Fitzhardinge @ 2007-12-03 12:54 ` Gerd Hoffmann 2007-12-03 13:19 ` Derek Murray 1 sibling, 1 reply; 57+ messages in thread From: Gerd Hoffmann @ 2007-12-03 12:54 UTC (permalink / raw) To: Stephen C. Tweedie Cc: xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization

Stephen C. Tweedie wrote:
> Hi all,
>
> driver domains

I looked at the gntdev (grant table mappings for user space) driver and
noticed that it is not self-contained. It needs a hook for page unmapping:

http://xenbits.xensource.com/xen-3.1-testing.hg?rev/7180d2e61f92

plus an s/ptep_get_and_clear_full/zap_pte/ fixup a few changesets
later.

Upstreaming that one could become *uhm* interesting. Nevertheless the
gntdev functionality is quite useful for writing pure userspace
backend drivers ...

cheers,
  Gerd

^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-03 12:54 ` Gerd Hoffmann @ 2007-12-03 13:19 ` Derek Murray 2007-12-03 14:16 ` Gerd Hoffmann 0 siblings, 1 reply; 57+ messages in thread From: Derek Murray @ 2007-12-03 13:19 UTC (permalink / raw) To: Gerd Hoffmann Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization I take the blame for that one. I added the hook because, if a process were to die whilst holding one or more grants, there were no hooks that would make it possible to carry out the grant-unmap. All existing hooks on either the device or the VMA were called *after* the PTEs were cleared. It gets better, though. The same hook is used in the version of blktap in linux-2.6.18-xen (not, as far as I can see, in the sparse tree for xen-3.1-testing): http://xenbits.xensource.com/linux-2.6.18-xen.hg?file/fd879c0688bf/drivers/xen/blktap/blktap.c Reverting back to the old (hookless) behaviour would be a retrograde step IMHO. Cheers, Derek Murray. Gerd Hoffmann wrote: > Stephen C. Tweedie wrote: >> Hi all, >> >> driver domains > > Looked at the gntdev (grant table mappings for user space) driver, > noticed that one is not self-contained. It needs a hook for page unmapping: > > http://xenbits.xensource.com/xen-3.1-testing.hg?rev/7180d2e61f92 > plus an s/ptep_get_and_clear_full/zap_pte/ fixup a few changesets > later. > > Upstreaming that one could become *uhm* intresting. Nevertheless the > gntdev functionality is quite useful for writing pure userspace > backend drivers ... > > cheers, > Gerd > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-03 13:19 ` Derek Murray @ 2007-12-03 14:16 ` Gerd Hoffmann 2007-12-03 14:51 ` Derek Murray 0 siblings, 1 reply; 57+ messages in thread From: Gerd Hoffmann @ 2007-12-03 14:16 UTC (permalink / raw) To: Derek Murray Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization

Derek Murray wrote:
> I take the blame for that one. I added the hook because, if a process
> were to die whilst holding one or more grants, there were no hooks that
> would make it possible to carry out the grant-unmap. All existing hooks
> on either the device or the VMA were called *after* the PTEs were cleared.

Hmm. What exactly is the issue here?

This is about *userspace* mappings, right? As far as I can see from a
quick scan of the code, there is an additional kernel space mapping for
the grants and the userspace mapping is optional. I don't see any
problems with the userspace mapping going away without *instant*
notification. Cleaning up a bit later, called from the
file_ops->release callback maybe, should work ok.

The problem I see with the additional vm_ops callback is that I suspect
you'll have to come up with some *very* good arguments to get it
accepted by the VM (as in "virtual memory") folks and merged mainline.

> It gets better, though. The same hook is used in the version of blktap
> in linux-2.6.18-xen (not, as far as I can see, in the sparse tree for
> xen-3.1-testing):

Oh, I'm thinking more in the direction of killing blktap altogether in
favor of a pure userspace implementation on top of gntdev.

cheers,
  Gerd

^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-03 14:16 ` Gerd Hoffmann @ 2007-12-03 14:51 ` Derek Murray 2007-12-03 17:18 ` Mark Williamson 2007-12-03 20:38 ` Gerd Hoffmann 0 siblings, 2 replies; 57+ messages in thread From: Derek Murray @ 2007-12-03 14:51 UTC (permalink / raw) To: Gerd Hoffmann Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization

Gerd Hoffmann wrote:
> Derek Murray wrote:
>> I take the blame for that one. I added the hook because, if a process
>> were to die whilst holding one or more grants, there were no hooks that
>> would make it possible to carry out the grant-unmap. All existing hooks
>> on either the device or the VMA were called *after* the PTEs were cleared.
>
> Hmm. What exactly is the issue here?
>
> This is about *userspace* mappings, right? As far as I can see from a
> quick scan there of the code is an additional kernel space mapping for
> the grants and the userspace mapping is optional. I don't see any
> problems with userspace mapping going away without *instant*
> notification. Cleaning up a bit later, called from the
> file_ops->release callback maybe, should work ok.

If we let Linux zap the page tables before we unmap the grant reference,
then it is not possible to unmap the grant reference. The
unmap_grant_ref hypercall ultimately calls destroy_grant_pte_mapping in
xen/arch/x86/mm.c, which ensures that the PTE does in fact point to the
granted frame.

Note also the comment further up in that file (in put_page_from_l1e):

/*
 * Check if this is a mapping that was established via a grant reference.
 * If it was then we should not be here: we require that such mappings are
 * explicitly destroyed via the grant-table interface.
 *
 * The upshot of this is that the guest can end up with active grants that
 * it cannot destroy (because it no longer has a PTE to present to the
 * grant-table interface). This can lead to subtle hard-to-catch bugs,
 * hence a special grant PTE flag can be enabled to catch the bug early.
 *
 * (Note that the undestroyable active grants are not a security hole in
 * Xen. All active grants can safely be cleaned up when the domain dies.)
 */

Effectively, there is a debug option that sets a bit in PTEs that map
granted pages, and this can be used to force a domain_crash in the event
that a VM tries to zap the entries normally. The normal behaviour is to
silently accept the zap operation, and leak granted pages until the
grantee domain is killed.

> The problem I see with the additional vm_ops callback is that I suspect
> you'll have to come up with some *very* good arguments to get it
> accepted by the VM (as in "virtual memory") folks and merged mainline.

On this point I completely agree with you! If anyone has any less
radical suggestions, then I'd be delighted to refactor the gntdev code
to use them. However, I'm not currently aware of any alternative that
maintains robustness to process crashes.

>> It gets better, though. The same hook is used in the version of blktap
>> in linux-2.6.18-xen (not, as far as I can see, in the sparse tree for
>> xen-3.1-testing):
>
> Oh, I'm thinking more in the direction of killing blktap altogether in
> favor of a pure userspace implementation on top of gntdev.

I think this would represent good progress, though I wonder if there
would be a performance penalty due to performing the mapping and
unmapping in user-space (multiple syscalls per mapping versus a single
hypercall).

Cheers,

Derek Murray.

^ permalink raw reply [flat|nested] 57+ messages in thread
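The ordering constraint described in this message (the grant can only be released while the PTE still points at the granted frame) can be modelled with a toy state machine. This is a sketch under stated assumptions: the struct and the model_* functions are hypothetical names invented for the illustration and do not correspond to Xen or Linux code; they only mimic the behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one user-space mapping of a granted page. */
struct grant_mapping_model {
    bool pte_present;   /* PTE still points at the granted frame */
    bool grant_active;  /* hypervisor still counts the grant as mapped */
};

/* Models the kernel clearing the PTE without telling the grant table. */
static void model_zap_pte(struct grant_mapping_model *m)
{
    assert(m != NULL);
    m->pte_present = false;
}

/*
 * Models the unmap operation as described in the message: it only
 * succeeds if there is still a PTE to present. Returns 0 on success,
 * -1 if the PTE is already gone (the grant then stays active).
 */
static int model_unmap_grant(struct grant_mapping_model *m)
{
    assert(m != NULL);
    if (!m->pte_present)
        return -1;
    m->pte_present = false;
    m->grant_active = false;
    return 0;
}
```

Calling model_unmap_grant() first releases the grant; calling model_zap_pte() first leaves grant_active set with no way to clear it, which is the leak the hook was added to avoid.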
* Re: Re: Next steps with pv_ops for Xen 2007-12-03 14:51 ` Derek Murray @ 2007-12-03 17:18 ` Mark Williamson 2007-12-03 18:36 ` D.G. Murray 2007-12-03 20:38 ` Gerd Hoffmann 1 sibling, 1 reply; 57+ messages in thread From: Mark Williamson @ 2007-12-03 17:18 UTC (permalink / raw) To: xen-devel Cc: Derek Murray, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann > >> It gets better, though. The same hook is used in the version of blktap > >> in linux-2.6.18-xen (not, as far as I can see, in the sparse tree for > >> xen-3.1-testing): > > > > Oh, I'm thinking more in the direction of killing blktap altogether in > > favor of a pure userspace implementation on top of gntdev. > > I think this would represent good progress, though I wonder if there > would be a performance penalty due to performing the mapping and > unmapping in user-space (multiple syscalls per mapping versus a single > hypercall). Maybe a change to the gntdev userspace API to allow batching of mapping requests? I'm not aware of a batched mmap interface, which would seem to be the ideal solution; but it should be possible to batch this stuff somehow. Although it seems like some kind of really weird ioctl might be needed :-S to do it *without* such a batched interface... blktap in userspace, if any performance problems can be addressed, would seem to be a far nicer way of doing things. And it's less code to merge upstream ;-) Cheers, Mark -- Dave: Just a question. What use is a unicyle with no seat? And no pedals! Mark: To answer a question with a question: What use is a skateboard? Dave: Skateboards have wheels. Mark: My wheel has a wheel! ^ permalink raw reply [flat|nested] 57+ messages in thread
* RE: Re: Next steps with pv_ops for Xen 2007-12-03 17:18 ` Mark Williamson @ 2007-12-03 18:36 ` D.G. Murray 2007-12-03 19:08 ` Mark Williamson ` (3 more replies) 0 siblings, 4 replies; 57+ messages in thread From: D.G. Murray @ 2007-12-03 18:36 UTC (permalink / raw) To: 'Mark Williamson', xen-devel Cc: 'Eduardo Habkost', 'Juan Quintela', 'Stephen C. Tweedie', 'Jan Beulich', 'Glauber de Oliveira Costa', 'Chris Wright', virtualization, 'Gerd Hoffmann'

Hi Mark,

> Maybe a change to the gntdev userspace API to allow batching
> of mapping requests?

Something along the lines of the following?

/**
 * Memory maps one or more grant references from one or more domains to a
 * contiguous local address range. Mappings should be unmapped with
 * xc_gnttab_munmap. Returns NULL on failure.
 *
 * @parm xcg_handle a handle on an open grant table interface
 * @parm count the number of grant references to be mapped
 * @parm domids an array of @count domain IDs by which the corresponding @refs
 *              were granted
 * @parm refs an array of @count grant references to be mapped
 * @parm prot same flag as in mmap()
 */
void *xc_gnttab_map_grant_refs(int xcg_handle,
                               uint32_t count,
                               uint32_t *domids,
                               uint32_t *refs,
                               int prot);

http://xenbits.xensource.com/xen-unstable.hg?file/3057f813da14/tools/libxc/xenctrl.h

> blktap in userspace, if any performance problems can be
> addressed, would seem to be a far nicer way of doing things.
> And it's less code to merge upstream ;-)

Agreed.

Cheers,

Derek.

^ permalink raw reply [flat|nested] 57+ messages in thread
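A caller of the batched interface quoted above has to present its grants as two parallel arrays. The following is a minimal sketch of that preparation step only; struct pending_grant and fill_batch are hypothetical names made up for this example, and the sketch deliberately does not call into libxc so that it stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one page a backend wants mapped. */
struct pending_grant {
    uint32_t domid;  /* domain that issued the grant */
    uint32_t ref;    /* grant reference within that domain */
};

/*
 * Hypothetical helper: split up to `max` pending grants into the
 * parallel domids[]/refs[] arrays that the batched call takes.
 * Returns the number of entries filled in, which is the `count`
 * argument for the subsequent mapping call.
 */
static uint32_t fill_batch(const struct pending_grant *pending, size_t n,
                           uint32_t *domids, uint32_t *refs, uint32_t max)
{
    uint32_t count = 0;

    assert(pending != NULL && domids != NULL && refs != NULL);
    while (count < n && count < max) {
        domids[count] = pending[count].domid;
        refs[count] = pending[count].ref;
        count++;
    }
    return count;
}
```

The returned count and the two arrays would then be passed to xc_gnttab_map_grant_refs() together with a handle and a prot value, and the result checked against NULL as the quoted comment describes.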
* Re: Re: Next steps with pv_ops for Xen 2007-12-03 18:36 ` D.G. Murray @ 2007-12-03 19:08 ` Mark Williamson 2007-12-04 9:35 ` tgh 2007-12-06 15:21 ` Gerd Hoffmann ` (2 subsequent siblings) 3 siblings, 1 reply; 57+ messages in thread From: Mark Williamson @ 2007-12-03 19:08 UTC (permalink / raw) To: dgm36 Cc: xen-devel, 'Eduardo Habkost', 'Juan Quintela', 'Stephen C. Tweedie', 'Jan Beulich', 'Glauber de Oliveira Costa', 'Chris Wright', virtualization, 'Gerd Hoffmann'

> Hi Mark,
>
> > Maybe a change to the gntdev userspace API to allow batching
> > of mapping requests?
>
> Something along the lines of the following?

Just like that :-D

When you said "multiple syscalls per mapping" I assumed you meant that we'd
lose the batching you get by doing a multicall. If it's just a couple of
syscalls (plus, presumably, a couple of hypercalls) per batch of mappings, my
gut says it's probably not going to hurt block performance. My guts have
been wrong in (many!) ways before of course...

I guess the overhead *could* be reduced even more by just having a magic ioctl
that did all the mmap-ing stuff in one operation, but that'd probably be
really gross if it wasn't necessary! And I doubt it'd make upstream very
happy...

We'll also be eliminating the overheads involved in having a blktap ring for
talking to userspace and having to move requests between that ring and the
real block ring, so there are some definite wins in overheads as well.

Cheers,
Mark

--
Dave: Just a question. What use is a unicycle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!

^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-03 19:08 ` Mark Williamson @ 2007-12-04 9:35 ` tgh 2007-12-05 3:42 ` Mark Williamson 0 siblings, 1 reply; 57+ messages in thread From: tgh @ 2007-12-04 9:35 UTC (permalink / raw) To: Mark Williamson Cc: xen-devel, 'Eduardo Habkost', 'Juan Quintela', 'Stephen C. Tweedie', 'Jan Beulich', 'Glauber de Oliveira Costa', 'Chris Wright', virtualization, dgm36, 'Gerd Hoffmann'

hi

I am not quite clear about the purpose of pv-ops. What problem are we
trying to solve by developing "pv-ops"? Is it used for HVM, for PV, for
KVM, or something else? I have seen it on the list for a few months, and
"pv-ops" is an active project, but I am not clear about what its aim is.
Could you give me an explanation?

Thanks in advance

Mark Williamson wrote:
>> Hi Mark,
>>
>>
>>> Maybe a change to the gntdev userspace API to allow batching
>>> of mapping requests?
>>>
>> Something along the lines of the following?
>>
>
> Just like that :-D
>
> When you said "multiple syscalls per mapping" I assumed you meant that we'd
> lose the batching you get by doing a multicall. If it's just a couple of
> syscalls (plus, presumably a couple of hypercalls) per batch of mappings, my
> gut says it's probably not going to hurt block performance. My guts have
> been wrong in (many!) ways before of course...
>
> I guess the overhead *could* be reduced even more by just having a magic ioctl
> that did all the mmap-ing stuff in one operation, but that'd probably be
> really gross if it wasn't necessary! And I doubt it'd make upstream very
> happy...
>
> We'll also be eliminating the overheads involved in having a blktap ring for
> talking to userspace and having to move requests between that ring and the
> real block ring, so there's some definite wins in overheads as well.
>
> Cheers,
> Mark
>
>

^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-04 9:35 ` tgh @ 2007-12-05 3:42 ` Mark Williamson 0 siblings, 0 replies; 57+ messages in thread From: Mark Williamson @ 2007-12-05 3:42 UTC (permalink / raw) To: tgh Cc: xen-devel, 'Eduardo Habkost', 'Juan Quintela', 'Stephen C. Tweedie', 'Jan Beulich', 'Glauber de Oliveira Costa', 'Chris Wright', virtualization, dgm36, 'Gerd Hoffmann'

> I am not quite clear about the purpose of pv-ops , what do we want to
> deal with by developping "pv-ops"? is it used for HVM or for PV or KVM
> or something ? I have seen it for a few months in the list ,and
> "pv-ops"is an active project ,but i am not clear about what is the aim
> of "pv-ops" ,could you give me an explanation about it

PV-ops is an API within Linux which is used to support paravirtualisation.

paravirt-ops makes it possible to compile a single Linux kernel which can
boot on bare hardware, on Xen, using VMI (VMware's paravirtualised
interface), on lguest, or on any other VMM that is supported. The resulting
kernel can boot on any of those and make proper use of paravirtualisation.

For instance, with 2.6.23 from kernel.org you should be able to compile a
kernel that will boot both on bare hardware and in a Xen domU in PV mode.
Various tricks are used to ensure that it will run with good performance on
both.

pv-ops mostly deals with the paravirtualisation of the CPU. IO devices such
as block and network are handled using Xen-aware drivers rather similar to
those in the XenSource Linux kernels; they are not part of pv-ops.

Cheers,
Mark

> Thanks in advance
>
> Mark Williamson wrote:
> >> Hi Mark,
> >>
> >>> Maybe a change to the gntdev userspace API to allow batching
> >>> of mapping requests?
> >>
> >> Something along the lines of the following?
> >
> > Just like that :-D
> >
> > When you said "multiple syscalls per mapping" I assumed you meant that
> > we'd lose the batching you get by doing a multicall. If it's just a
> > couple of syscalls (plus, presumably a couple of hypercalls) per batch of
> > mappings, my gut says it's probably not going to hurt block performance.
> > My guts have been wrong in (many!) ways before of course...
> >
> > I guess the overhead *could* be reduced even more by just having a magic
> > ioctl that did all the mmap-ing stuff in one operation, but that'd
> > probably be really gross if it wasn't necessary! And I doubt it'd make
> > upstream very happy...
> >
> > We'll also be eliminating the overheads involved in having a blktap ring
> > for talking to userspace and having to move requests between that ring
> > and the real block ring, so there's some definite wins in overheads as
> > well.
> >
> > Cheers,
> > Mark

--
Dave: Just a question. What use is a unicycle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!

^ permalink raw reply [flat|nested] 57+ messages in thread
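The idea behind pv-ops described in this message, one kernel image whose low-level CPU operations are routed either to the hardware or to a hypervisor, is commonly built on a table of function pointers chosen at boot. The following is a rough sketch of that pattern only; every name in it (cpu_ops_sketch, select_backend, and so on) is hypothetical, the "hardware" is a plain variable, and the "hypercall" is a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations table: one slot per privileged operation. */
struct cpu_ops_sketch {
    const char *backend;
    void (*write_cr0)(unsigned long value);
};

static unsigned long hw_cr0;      /* stands in for the real register */
static unsigned int hypercalls;   /* stands in for calls to the VMM */

static void native_write_cr0_sketch(unsigned long value)
{
    hw_cr0 = value;               /* bare hardware: touch the register */
}

static void hv_write_cr0_sketch(unsigned long value)
{
    (void)value;
    hypercalls++;                 /* paravirtualised: ask the hypervisor */
}

static const struct cpu_ops_sketch native_ops = {
    "bare hardware", native_write_cr0_sketch
};
static const struct cpu_ops_sketch hv_ops = {
    "hypervisor", hv_write_cr0_sketch
};

/* Defaults to native; replaced early in boot if a hypervisor is found. */
static const struct cpu_ops_sketch *active_ops = &native_ops;

static void select_backend(int on_hypervisor)
{
    active_ops = on_hypervisor ? &hv_ops : &native_ops;
}

/* The rest of the kernel calls this and never cares which backend runs. */
static void kernel_write_cr0(unsigned long value)
{
    assert(active_ops != NULL && active_ops->write_cr0 != NULL);
    active_ops->write_cr0(value);
}
```

The same call site ends up writing the register in one configuration and issuing a hypervisor request in the other, which is what lets a single binary boot in both environments.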
* Re: Re: Next steps with pv_ops for Xen 2007-12-03 18:36 ` D.G. Murray 2007-12-03 19:08 ` Mark Williamson @ 2007-12-06 15:21 ` Gerd Hoffmann 2007-12-06 15:32 ` Derek Murray 2007-12-21 12:58 ` Gerd Hoffmann 2007-12-21 12:58 ` [Xen-devel] " Gerd Hoffmann 3 siblings, 1 reply; 57+ messages in thread From: Gerd Hoffmann @ 2007-12-06 15:21 UTC (permalink / raw) To: dgm36 Cc: xen-devel, 'Eduardo Habkost', 'Juan Quintela', 'Stephen C. Tweedie', 'Jan Beulich', 'Glauber de Oliveira Costa', 'Chris Wright', virtualization, 'Mark Williamson' D.G. Murray wrote: > Hi Mark, > >> Maybe a change to the gntdev userspace API to allow batching >> of mapping requests? > > Something along the lines of the following? > > void *xc_gnttab_map_grant_refs(int xcg_handle, > uint32_t count, > uint32_t *domids, > uint32_t *refs, > int prot); Yes, except that it should actually work ;) It doesn't for me (Fedora 8 again). Grab xenner 0.9 (just uploaded), edit blkbackd.c and flip the BATCH_MAPS from 0 to 1, compile, run, see it not work. With BATCH_MAPS being 0 blkbackd works nicely as blktap/tapdisk drop-in replacement. cheers, Gerd ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-06 15:21 ` Gerd Hoffmann @ 2007-12-06 15:32 ` Derek Murray 2007-12-06 15:55 ` Gerd Hoffmann 0 siblings, 1 reply; 57+ messages in thread From: Derek Murray @ 2007-12-06 15:32 UTC (permalink / raw) To: Gerd Hoffmann Cc: xen-devel, 'Eduardo Habkost', 'Juan Quintela', 'Stephen C. Tweedie', 'Jan Beulich', 'Glauber de Oliveira Costa', 'Chris Wright', virtualization, 'Mark Williamson' Gerd Hoffmann wrote: > Yes, except that it should actually work ;) > > It doesn't for me (Fedora 8 again). Grab xenner 0.9 (just uploaded), > edit blkbackd.c and flip the BATCH_MAPS from 0 to 1, compile, run, see > it not work. Which version of the Xen tools are you using? There was a bug in the version released with Xen 3.1, which should have been cleaned up in the subsequent minor versions. Try grabbing the patch to libxc at: http://xenbits.xensource.com/xen-3.1-testing.hg?raw-rev/135d5088909f Otherwise, if this doesn't work/is some other issue, could you post the OOPS and relevant Xen console output? Thanks, Derek. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-06 15:32 ` Derek Murray @ 2007-12-06 15:55 ` Gerd Hoffmann 0 siblings, 0 replies; 57+ messages in thread From: Gerd Hoffmann @ 2007-12-06 15:55 UTC (permalink / raw) To: Derek Murray Cc: xen-devel, 'Eduardo Habkost', 'Juan Quintela', 'Stephen C. Tweedie', 'Jan Beulich', 'Glauber de Oliveira Costa', 'Chris Wright', virtualization, 'Mark Williamson'

> Which version of the Xen tools are you using? There was a bug in the
> version released with Xen 3.1, which should have been cleaned up in the
> subsequent minor versions. Try grabbing the patch to libxc at:
>
> http://xenbits.xensource.com/xen-3.1-testing.hg?raw-rev/135d5088909f

It is probably this one: according to rpm the version is 3.1.0, so most
likely the fix isn't there.

> Otherwise, if this doesn't work/is some other issue, could you post the
> OOPS and relevant Xen console output?

There isn't any; the mapping just doesn't work (libxc returns NULL).

thanks,
  Gerd

^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-03 18:36 ` D.G. Murray 2007-12-03 19:08 ` Mark Williamson 2007-12-06 15:21 ` Gerd Hoffmann @ 2007-12-21 12:58 ` Gerd Hoffmann 2007-12-21 12:58 ` [Xen-devel] " Gerd Hoffmann 3 siblings, 0 replies; 57+ messages in thread From: Gerd Hoffmann @ 2007-12-21 12:58 UTC (permalink / raw) To: dgm36 Cc: xen-devel, 'Eduardo Habkost', 'Juan Quintela', 'Stephen C. Tweedie', 'Jan Beulich', 'Glauber de Oliveira Costa', 'Chris Wright', virtualization, 'Mark Williamson'

D.G. Murray wrote:
> Hi Mark,

> void *xc_gnttab_map_grant_refs(int xcg_handle,
> uint32_t count,
> uint32_t *domids,
> uint32_t *refs,
> int prot);

Fedora 8 has 3.1.2 packages now, but it still doesn't work for me.

Bored at xmas? Want to try fixing it? Fetch xenner 0.15 from
http://dl.bytesex.org/releases/xenner/, build ("make blkbackd"), and run it
as a drop-in replacement for blktap. You have to pass the "-b" switch to
make it try batching grant maps. The code is in ioreq_map(), blkbackd.c.

Oh, and I think the limit had better be raised. 32 requests with up to 11
segments (one granted page each) sums up to 352 pages, which is way beyond
the current 128 grants limit, so it may fail under heavy I/O load.

cheers and happy xmas,
  Gerd

^ permalink raw reply [flat|nested] 57+ messages in thread
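The arithmetic behind the grant-limit concern in this message can be written out as a small check. This is a sketch using only the three figures given in the message (32 requests, 11 pages per request, a limit of 128 grants); the constant and function names are made up for the illustration.

```c
#include <assert.h>

/* Figures taken from the message; the names are hypothetical. */
enum {
    RING_REQUESTS = 32,      /* requests that can be in flight at once */
    PAGES_PER_REQUEST = 11,  /* worst-case granted pages per request */
    GRANT_LIMIT = 128        /* grants that may be mapped at one time */
};

/* Worst-case number of pages mapped if every request is full. */
static unsigned int worst_case_pages(unsigned int requests,
                                     unsigned int per_request)
{
    /* Guard against overflow before multiplying. */
    assert(per_request == 0 || requests <= (unsigned int)-1 / per_request);
    return requests * per_request;
}

/* How many full-sized requests fit under a given grant limit. */
static unsigned int requests_within_limit(unsigned int limit,
                                          unsigned int per_request)
{
    assert(per_request > 0);
    return limit / per_request;
}
```

With these numbers the worst case is 32 * 11 = 352 pages, and only 128 / 11 = 11 full-sized requests fit under the limit, so a ring that is kept full of large requests would run out of grants.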
* Re: Re: Next steps with pv_ops for Xen 2007-12-03 14:51 ` Derek Murray 2007-12-03 17:18 ` Mark Williamson @ 2007-12-03 20:38 ` Gerd Hoffmann 2007-12-04 9:40 ` Derek Murray 1 sibling, 1 reply; 57+ messages in thread From: Gerd Hoffmann @ 2007-12-03 20:38 UTC (permalink / raw) To: Derek Murray Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization

[-- Attachment #1: Type: text/plain, Size: 1431 bytes --]

Derek Murray wrote:
> If we let Linux zap the page tables before we unmap the grant reference,
> then it is not possible to unmap the grant reference. The
> unmap_grant_ref hypercall ultimately calls destroy_grant_pte_mapping in
> xen/arch/x86/mm.c, which ensures that the PTE does in fact point to the
> granted frame.

Hmm, I see. You have to do that for every mapping, not just the last
(kernel) one, to get the grant released. And just dropping that check is
probably out of the question because the guest could then fool Xen's
reference counting?

> On this point I completely agree with you! If anyone has any less
> radical suggestions, then I'd be delighted to refactor the gntdev code
> to use them. However, I'm not currently aware of any alternative that
> maintains robustness to process crashes.

Oh, for me it isn't robust at all; it crashes on the first munmap
syscall. This is with the Fedora 8 kernel. See attachment. I didn't try
xensource 2.6.18 yet.

Any ideas what is wrong? Who uses the gntdev device right now?

> I think this would represent good progress, though I wonder if there
> would be a performance penalty due to performing the mapping and
> unmapping in user-space (multiple syscalls per mapping versus a single
> hypercall).

I'd expect the hard disk (and how I/O is scheduled) to be the
bottleneck, not the syscall overhead. Nevertheless I plan to benchmark
it once I have it up and running.
cheers, Gerd [-- Attachment #2: oops --] [-- Type: text/plain, Size: 25856 bytes --] Linux version 2.6.21-2952.fc8xen (kojibuilder@hammer2.fedora.redhat.com) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1 SMP Mon Nov 19 07:06:55 EST 2007 BIOS-provided physical RAM map: sanitize start sanitize bail 0 copy_e820_map() start: 0000000000000000 size: 000000007491e000 end: 000000007491e000 type: 1 Xen: 0000000000000000 - 000000007491e000 (usable) 1137MB HIGHMEM available. 727MB LOWMEM available. NX (Execute Disable) protection: active Entering add_active_range(0, 0, 477470) 0 entries of 256 used Zone PFN ranges: DMA 0 -> 186366 Normal 186366 -> 186366 HighMem 186366 -> 477470 early_node_map[1] active PFN ranges 0: 0 -> 477470 On node 0 totalpages: 477470 DMA zone: 1455 pages used for memmap DMA zone: 0 pages reserved DMA zone: 184911 pages, LIFO batch:31 Normal zone: 0 pages used for memmap HighMem zone: 2274 pages used for memmap HighMem zone: 288830 pages, LIFO batch:31 found SMP MP-table at 000ff780 DMI present. ACPI: RSDP 000F9990, 0014 (r0 ACPIAM) ACPI: RSDT 7D6B0000, 0044 (r1 A M I OEMRSDT 5000708 MSFT 97) ACPI: FACP 7D6B0200, 0084 (r2 A M I OEMFACP 5000708 MSFT 97) ACPI Warning (tbfadt-0360): Ignoring BIOS FADT r2 C-state control [20070126] ACPI: DSDT 7D6B0490, 6643 (r1 SDBLI9 SDBLI944 44 INTL 20051117) ACPI: FACS 7D6BE000, 0040 ACPI: APIC 7D6B0390, 006C (r1 A M I OEMAPIC 5000708 MSFT 97) ACPI: MCFG 7D6B0450, 003C (r1 A M I OEMMCFG 5000708 MSFT 97) ACPI: OEMB 7D6BE040, 0079 (r1 A M I AMI_OEM 5000708 MSFT 97) ACPI: ASF! 
7D6B6AE0, 0099 (r32 LEGEND I865PASF 1 INTL 20051117) ACPI: GSCI 7D6BE0C0, 2024 (r1 A M I GMCHSCI 5000708 MSFT 97) ACPI: iEIT 7D6C00F0, 00B0 (r1 A M I EITTABLE 5000708 MSFT 97) ACPI: SSDT 7D6C0BC0, 0877 (r1 DpgPmm CpuPm 12 INTL 20051117) ACPI: Local APIC address 0xfee00000 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ9 used by override. Enabling APIC mode: Flat. Using 1 I/O APICs Using ACPI (MADT) for SMP configuration information Detected 2992.804 MHz processor. Built 1 zonelists. Total pages: 473741 Kernel command line: ro root=/dev/xeni/fedora32 console=tty1 xencons=xvc0 console=xvc0 panic=30 Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Initializing CPU#0 CPU 0 irqstacks, hard=c136c000 soft=c134c000 PID hash table entries: 4096 (order: 12, 16384 bytes) Xen reported: 2992.594 MHz processor. 
Console: colour VGA+ 80x50 Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) Software IO TLB enabled: Aperture: 2 megabytes Kernel range: c033c000 - c053c000 Address size: 24 bits vmalloc area: ee000000-f4ffe000, maxmem 2d7fe000 Memory: 1866468k/1909880k available (2071k kernel code, 34068k reserved, 1080k data, 188k init, 1164416k highmem) virtual kernel memory layout: fixmap : 0xf5315000 - 0xf57fe000 (5028 kB) pkmap : 0xf5000000 - 0xf5200000 (2048 kB) vmalloc : 0xee000000 - 0xf4ffe000 ( 111 MB) lowmem : 0xc0000000 - 0xed7fe000 ( 727 MB) .init : 0xc1319000 - 0xc1348000 ( 188 kB) .data : 0xc1205e5e - 0xc1313fd4 (1080 kB) .text : 0xc1000000 - 0xc1205e5e (2071 kB) Checking if this processor honours the WP bit even in supervisor mode... Ok. Calibrating delay using timer specific routine.. 5991.67 BogoMIPS (lpj=11983344) Security Framework v1.0.0 initialized SELinux: Initializing. SELinux: Starting in permissive mode selinux_register_security: Registering secondary module capability Capability LSM initialized as secondary Mount-cache hash table entries: 512 CPU: After generic identify, caps: bfebd3f1 20100000 00000000 00000000 0000e3fd 00000000 00000001 CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 4096K CPU: After all inits, caps: bfebd3f1 20100000 00000000 00003940 0000e3fd 00000000 00000001 Checking 'hlt' instruction... OK. 
SMP alternatives: switching to UP code ACPI: Core revision 20070126 CPU 1 irqstacks, hard=c136d000 soft=c134d000 ENABLING IO-APIC IRQs SMP alternatives: switching to SMP code Brought up 2 CPUs Initializing CPU#1 sizeof(vma)=88 bytes sizeof(page)=32 bytes sizeof(inode)=336 bytes sizeof(dentry)=132 bytes sizeof(ext3inode)=488 bytes sizeof(buffer_head)=56 bytes sizeof(skbuff)=176 bytes sizeof(task_struct)=1376 bytes migration_cost=19 NET: Registered protocol family 16 ACPI: bus type pci registered PCI: Using configuration type 1 Setting up standard PCI resources Allocating PCI resources starting at 80000000 (gap: 7d6b0000:82950000) ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing Error attaching device data Error attaching device data Error attaching device data Error attaching device data ACPI: PCI Root Bridge [PCI0] (0000:00) PCI: Probing PCI hardware (bus 00) Boot video device is 0000:00:02.0 PCI: Transparent bridge - 0000:00:1e.0 ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P4._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P6._PRT] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 12 14 *15) ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *10 12 14 15) ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 7 10 12 14 15) ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 12 *14 15) ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 *10 12 14 15) ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 *5 6 7 10 12 14 15) ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 *5 6 7 10 12 14 15) ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 10 12 14 *15) Linux Plug and Play Support v0.97 (c) Adam Belay pnp: PnP ACPI init pnp: PnP ACPI: found 23 devices xen_mem: Initialising balloon driver. 
usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report NetLabel: Initializing NetLabel: domain hash size = 128 NetLabel: protocols = UNLABELED CIPSOv4 NetLabel: unlabeled traffic allowed by default pnp: 00:01: iomem range 0xfed14000-0xfed19fff has been reserved pnp: 00:0a: ioport range 0xa20-0xa3f has been reserved pnp: 00:0a: ioport range 0xa00-0xa0f has been reserved pnp: 00:0a: ioport range 0xa10-0xa1f has been reserved pnp: 00:0a: ioport range 0xa40-0xa5f has been reserved pnp: 00:0b: iomem range 0xfed1c000-0xfed1ffff has been reserved pnp: 00:0b: iomem range 0xfed20000-0xfed8ffff has been reserved pnp: 00:0e: iomem range 0xffc00000-0xffefffff has been reserved pnp: 00:0f: iomem range 0xfec00000-0xfec00fff has been reserved pnp: 00:0f: iomem range 0xfee00000-0xfee00fff has been reserved pnp: 00:10: iomem range 0xe0000000-0xefffffff has been reserved pnp: 00:11: iomem range 0xffa77000-0xffa77fff has been reserved pnp: 00:12: iomem range 0x0-0x0 could not be reserved pnp: 00:13: iomem range 0x0-0x0 could not be reserved pnp: 00:14: iomem range 0x0-0x0 could not be reserved pnp: 00:15: iomem range 0x0-0x0 could not be reserved pnp: 00:16: iomem range 0x0-0x9ffff could not be reserved pnp: 00:16: iomem range 0xc0000-0xcffff could not be reserved pnp: 00:16: iomem range 0xe0000-0xfffff could not be reserved pnp: 00:16: iomem range 0x100000-0x7d6fffff could not be reserved PCI: Ignore bogus resource 6 [0:0] of 0000:00:02.0 PCI: Bridge: 0000:00:1c.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:02:00.0 IO window: disabled. MEM window: ff600000-ff6fffff PREFETCH window: disabled. PCI: Bridge: 0000:00:1c.2 IO window: disabled. MEM window: ff600000-ff6fffff PREFETCH window: disabled. PCI: Bridge: 0000:00:1e.0 IO window: disabled. 
MEM window: disabled. PREFETCH window: disabled. ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 16 PCI: Setting latency timer of device 0000:00:1c.0 to 64 ACPI: PCI Interrupt 0000:00:1c.2[C] -> GSI 18 (level, low) -> IRQ 17 PCI: Setting latency timer of device 0000:00:1c.2 to 64 PCI: Setting latency timer of device 0000:02:00.0 to 64 PCI: Setting latency timer of device 0000:00:1e.0 to 64 NET: Registered protocol family 2 IP route cache hash table entries: 32768 (order: 5, 131072 bytes) TCP established hash table entries: 131072 (order: 8, 1572864 bytes) TCP bind hash table entries: 65536 (order: 7, 524288 bytes) TCP: Hash tables configured (established 131072 bind 65536) TCP reno registered checking if image is initramfs... it is Freeing initrd memory: 8554k freed audit: initializing netlink socket (disabled) audit(1196713193.812:1): initialized highmem bounce pool size: 64 pages VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 1024 (order 0, 4096 bytes) SELinux: Registering netfilter hooks ksign: Installing public key data Loading keyring io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered (default) PCI: Setting latency timer of device 0000:00:1c.0 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:1c.0:pcie00] Allocate Port Service[0000:00:1c.0:pcie02] PCI: Setting latency timer of device 0000:00:1c.2 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:1c.2:pcie00] Allocate Port Service[0000:00:1c.2:pcie02] pci_hotplug: PCI Hot Plug PCI Core version: 0.5 ACPI Warning (tbutils-0158): Incorrect checksum in table [OEMB] - 5E, should be 4F [20070126] ACPI: SSDT 7D6C01A0, 02F4 (r1 DpgPmm P001Ist 11 INTL 20051117) Monitor-Mwait will be used to enter C-1 state ACPI: CPU0 (power states: C1[C1] C2[C2]) ACPI: SSDT 7D6C06B0, 02F4 (r1 DpgPmm P002Ist 12 INTL 20051117) ACPI: CPU1 (power states: C1[C1] C2[C2]) ACPI 
Exception (processor_core-0783): AE_NOT_FOUND, Processor Device is not present [20070126] ACPI Exception (processor_core-0783): AE_NOT_FOUND, Processor Device is not present [20070126] Real Time Clock Driver v1.12ac Non-volatile memory driver v1.2 Linux agpgart interface v0.102 (c) Dave Jones RAMDISK driver initialized: 16 RAM disks of 16384K size 4096 blocksize input: Macintosh mouse button emulation as /class/input/input0 Xen virtual console successfully installed as xvc0 Event-channel device installed. usbcore: registered new interface driver hiddev usbcore: registered new interface driver usbhid drivers/usb/input/hid-core.c: v2.6:USB HID core driver PNP: No PS/2 controller found. Probing ports directly. serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 mice: PS/2 mouse device common for all mice TCP bic registered Initializing XFRM netlink socket NET: Registered protocol family 1 NET: Registered protocol family 17 Using IPI No-Shortcut mode drivers/rtc/hctosys.c: unable to open rtc device (rtc0) Freeing unused kernel memory: 188k freed Write protecting the kernel read-only data: 795k ACPI: PCI Interrupt 0000:00:1a.7[D] -> GSI 19 (level, low) -> IRQ 18 PCI: Setting latency timer of device 0000:00:1a.7 to 64 ehci_hcd 0000:00:1a.7: EHCI Host Controller ehci_hcd 0000:00:1a.7: new USB bus registered, assigned bus number 1 ehci_hcd 0000:00:1a.7: debug port 1 PCI: cache line size of 32 is not supported by device 0000:00:1a.7 ehci_hcd 0000:00:1a.7: irq 18, io mem 0xffa7b400 ehci_hcd 0000:00:1a.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 6 ports detected ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 23 (level, low) -> IRQ 19 PCI: Setting latency timer of device 0000:00:1d.7 to 64 ehci_hcd 0000:00:1d.7: EHCI Host Controller ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 2 ehci_hcd 0000:00:1d.7: debug port 1 PCI: cache line size 
of 32 is not supported by device 0000:00:1d.7 ehci_hcd 0000:00:1d.7: irq 19, io mem 0xffa7b000 ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found hub 2-0:1.0: 6 ports detected ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver USB Universal Host Controller Interface driver v3.0 ACPI: PCI Interrupt 0000:00:1a.0[A] -> GSI 16 (level, low) -> IRQ 20 PCI: Setting latency timer of device 0000:00:1a.0 to 64 uhci_hcd 0000:00:1a.0: UHCI Host Controller uhci_hcd 0000:00:1a.0: new USB bus registered, assigned bus number 3 uhci_hcd 0000:00:1a.0: irq 20, io base 0x0000d880 usb usb3: configuration #1 chosen from 1 choice hub 3-0:1.0: USB hub found hub 3-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:1a.1[B] -> GSI 21 (level, low) -> IRQ 21 PCI: Setting latency timer of device 0000:00:1a.1 to 64 uhci_hcd 0000:00:1a.1: UHCI Host Controller uhci_hcd 0000:00:1a.1: new USB bus registered, assigned bus number 4 uhci_hcd 0000:00:1a.1: irq 21, io base 0x0000d800 usb usb4: configuration #1 chosen from 1 choice hub 4-0:1.0: USB hub found hub 4-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:1a.2[C] -> GSI 18 (level, low) -> IRQ 17 PCI: Setting latency timer of device 0000:00:1a.2 to 64 uhci_hcd 0000:00:1a.2: UHCI Host Controller uhci_hcd 0000:00:1a.2: new USB bus registered, assigned bus number 5 uhci_hcd 0000:00:1a.2: irq 17, io base 0x0000d480 usb usb5: configuration #1 chosen from 1 choice hub 5-0:1.0: USB hub found hub 5-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 23 (level, low) -> IRQ 19 PCI: Setting latency timer of device 0000:00:1d.0 to 64 uhci_hcd 0000:00:1d.0: UHCI Host Controller uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 6 uhci_hcd 0000:00:1d.0: irq 19, io base 0x0000d400 usb usb6: configuration #1 chosen from 1 choice hub 6-0:1.0: USB hub found hub 6-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:1d.1[B] 
-> GSI 19 (level, low) -> IRQ 18 PCI: Setting latency timer of device 0000:00:1d.1 to 64 uhci_hcd 0000:00:1d.1: UHCI Host Controller uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 7 uhci_hcd 0000:00:1d.1: irq 18, io base 0x0000d080 usb usb7: configuration #1 chosen from 1 choice hub 7-0:1.0: USB hub found hub 7-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:1d.2[D] -> GSI 16 (level, low) -> IRQ 20 PCI: Setting latency timer of device 0000:00:1d.2 to 64 uhci_hcd 0000:00:1d.2: UHCI Host Controller uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 8 uhci_hcd 0000:00:1d.2: irq 20, io base 0x0000d000 usb usb8: configuration #1 chosen from 1 choice hub 8-0:1.0: USB hub found hub 8-0:1.0: 2 ports detected SCSI subsystem initialized libata version 2.20 loaded. ahci 0000:00:1f.2: version 2.1 ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 18 usb 6-1: new low speed USB device using uhci_hcd and address 2 usb 6-1: configuration #1 chosen from 1 choice input: SOLIDTEK USB Composite Keyboard as /class/input/input1 input: USB HID v1.10 Keyboard [SOLIDTEK USB Composite Keyboard] on usb-0000:00:1d.0-1 input: SOLIDTEK USB Composite Keyboard as /class/input/input2 input,hiddev96: USB HID v1.10 Device [SOLIDTEK USB Composite Keyboard] on usb-0000:00:1d.0-1 PCI: Setting latency timer of device 0000:00:1f.2 to 64 ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 6 ports ? 
Gbps 0x3f impl SATA mode ahci 0000:00:1f.2: flags: 64bit ncq stag led clo pmp pio slum part ata1: SATA max UDMA/133 cmd 0xee074900 ctl 0x00000000 bmdma 0x00000000 irq 18 ata2: SATA max UDMA/133 cmd 0xee074980 ctl 0x00000000 bmdma 0x00000000 irq 18 ata3: SATA max UDMA/133 cmd 0xee074a00 ctl 0x00000000 bmdma 0x00000000 irq 18 ata4: SATA max UDMA/133 cmd 0xee074a80 ctl 0x00000000 bmdma 0x00000000 irq 18 ata5: SATA max UDMA/133 cmd 0xee074b00 ctl 0x00000000 bmdma 0x00000000 irq 18 ata6: SATA max UDMA/133 cmd 0xee074b80 ctl 0x00000000 bmdma 0x00000000 irq 18 scsi0 : ahci ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata1.00: ata_hpa_resize 1: sectors = 321672960, hpa_sectors = 321672960 ata1.00: ATA-7: Hitachi HDS721616PLA380, P22OAB3A, max UDMA/133 ata1.00: 321672960 sectors, multi 0: LBA48 NCQ (depth 31/32) ata1.00: ata_hpa_resize 1: sectors = 321672960, hpa_sectors = 321672960 ata1.00: configured for UDMA/133 scsi1 : ahci ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: ATAPI, max UDMA/66 ata2.00: configured for UDMA/66 scsi2 : ahci ata3: SATA link down (SStatus 0 SControl 300) scsi3 : ahci ata4: SATA link down (SStatus 0 SControl 300) scsi4 : ahci ata5: SATA link down (SStatus 0 SControl 300) scsi5 : ahci ata6: SATA link down (SStatus 0 SControl 300) scsi 0:0:0:0: Direct-Access ATA Hitachi HDS72161 P22O PQ: 0 ANSI: 5 sd 0:0:0:0: [sda] 321672960 512-byte hardware sectors (164697 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:0:0: [sda] 321672960 512-byte hardware sectors (164697 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sda: sda1 sda2 sda3 sd 0:0:0:0: [sda] Attached SCSI disk scsi 1:0:0:0: CD-ROM PLEXTOR DVDR PX-755A 1.04 PQ: 0 ANSI: 5 device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) 
initialised: dm-devel@redhat.com EXT3-fs: INFO: recovery required on readonly filesystem. EXT3-fs: write access will be enabled during recovery. ata1: D2H reg with I during NCQ, this message won't be printed again kjournald starting. Commit interval 5 seconds EXT3-fs: recovery complete. EXT3-fs: mounted filesystem with ordered data mode. security: 5 users, 11 roles, 2391 types, 114 bools, 1 sens, 1024 cats security: 67 classes, 215624 rules SELinux: Completing initialization. SELinux: Setting up existing superblocks. SELinux: initialized (dev dm-0, type ext3), uses xattr SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs SELinux: initialized (dev debugfs, type debugfs), uses genfs_contexts SELinux: initialized (dev selinuxfs, type selinuxfs), uses genfs_contexts SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs SELinux: initialized (dev devpts, type devpts), uses transition SIDs SELinux: initialized (dev eventpollfs, type eventpollfs), uses task SIDs SELinux: initialized (dev inotifyfs, type inotifyfs), uses genfs_contexts SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs SELinux: initialized (dev futexfs, type futexfs), uses genfs_contexts SELinux: initialized (dev pipefs, type pipefs), uses task SIDs SELinux: initialized (dev sockfs, type sockfs), uses task SIDs SELinux: initialized (dev cpuset, type cpuset), uses genfs_contexts SELinux: initialized (dev proc, type proc), uses genfs_contexts SELinux: initialized (dev bdev, type bdev), uses genfs_contexts SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts audit(1196713206.804:2): policy loaded auid=4294967295 sr0: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray Uniform CD-ROM driver Revision: 3.20 sr 1:0:0:0: Attached scsi CD-ROM sr0 sd 0:0:0:0: Attached scsi generic sg0 type 0 sr 1:0:0:0: Attached scsi 
generic sg1 type 5 Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled input: PC Speaker as /class/input/input3 8250_pnp: Unknown symbol serial8250_unregister_port 8250_pnp: Unknown symbol serial8250_register_port 8250_pnp: Unknown symbol serial8250_unregister_port 8250_pnp: Unknown symbol serial8250_register_port serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A FDC 0 is a National Semiconductor PC87306 Intel(R) PRO/1000 Network Driver - version 7.5.5.1-NAPI Copyright (c) 1999-2007 Intel Corporation. ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 20 (level, low) -> IRQ 22 PCI: Setting latency timer of device 0000:00:19.0 to 64 e1000: 0000:00:19.0: e1000_probe: PHY reset is blocked due to SOL/IDER session. parport: PnPBIOS parport detected. parport0: PC-style at 0x378 (0x778), irq 7 [PCSPP,TRISTATE,EPP] e1000: 0000:00:19.0: e1000_check_copper_options: Link active due to SoL/IDER Session. Speed/Duplex/AutoNeg parameter ignored. e1000: 0000:00:19.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:13:20:f5:f8:50 e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection ACPI: PCI Interrupt 0000:00:1f.3[C] -> GSI 18 (level, low) -> IRQ 17 ACPI: PCI Interrupt 0000:00:1b.0[A] -> GSI 22 (level, low) -> IRQ 23 PCI: Setting latency timer of device 0000:00:1b.0 to 64 serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A ACPI: PCI Interrupt 0000:00:03.3[B] -> GSI 17 (level, low) -> IRQ 16 device-mapper: multipath: version 1.0.5 loaded loop: loaded (max 8 devices) EXT3 FS on dm-0, internal journal kjournald starting. Commit interval 5 seconds EXT3 FS on dm-2, internal journal EXT3-fs: mounted filesystem with ordered data mode. SELinux: initialized (dev dm-2, type ext3), uses xattr SELinux: initialized (dev sda1, type ext2), uses xattr SELinux: initialized (dev sda2, type ext2), uses xattr SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs Adding 4194296k swap on /dev/mapper/xeni-swap. 
Priority:-1 extents:1 across:4194296k SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts IA-32 Microcode Update Driver: v1.14a-xen <tigran@veritas.com> NET: Registered protocol family 10 lo: Disabled Privacy Extensions Mobile IPv6 ip6_tables: (C) 2000-2006 Netfilter Core Team ip_tables: (C) 2000-2006 Netfilter Core Team Netfilter messages via NETLINK v0.30. nf_conntrack version 0.5.0 (8192 buckets, 65536 max) ADDRCONF(NETDEV_UP): eth0: link is not ready e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None e1000: eth0: e1000_watchdog_task: 10/100 speed: disabling TSO ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready audit(1196713221.215:3): audit_pid=1901 old=0 by auid=4294967295 subj=system_u:system_r:auditd_t:s0 SELinux: initialized (dev rpc_pipefs, type rpc_pipefs), uses genfs_contexts eth0: no IPv6 routers present SELinux: initialized (dev autofs, type autofs), uses genfs_contexts SELinux: initialized (dev autofs, type autofs), uses genfs_contexts SELinux: initialized (dev autofs, type autofs), uses genfs_contexts Bridge firewalling registered ADDRCONF(NETDEV_UP): peth0: link is not ready e1000: peth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None e1000: peth0: e1000_watchdog_task: 10/100 speed: disabling TSO ADDRCONF(NETDEV_CHANGE): peth0: link becomes ready device peth0 entered promiscuous mode eth0: port 1(peth0) entering learning state eth0: topology change detected, propagating eth0: port 1(peth0) entering forwarding state virbr0: no IPv6 routers present peth0: no IPv6 routers present eth0: no IPv6 routers present xen-vbd: registered block device major 202 blkfront: xvda: barriers enabled xvda:<0>Eeek! page_mapcount(page) went negative! 
(-1) page pfn = 29384 page->flags = 835 page->count = 3 page->mapping = eb093d98 vma->vm_ops = gntdev_vmops+0x0/0x34 vma->vm_ops->nopage = 0x0 vma->vm_file->f_op->mmap = gntdev_mmap+0x0/0x467 ------------[ cut here ]------------ kernel BUG at mm/rmap.c:574! invalid opcode: 0000 [#1] SMP last sysfs file: /devices/xen/vbd-51712/devtype Modules linked in: xenblk(U) ipt_MASQUERADE(U) iptable_nat(U) nf_nat(U) bridge(U) autofs4(U) sunrpc(U) nf_conntrack_netbios_ns(U) nf_conntrack_ipv4(U) xt_state(U) nf_conntrack(U) nfnetlink(U) ipt_REJECT(U) iptable_filter(U) ip_tables(U) xt_tcpudp(U) ip6t_REJECT(U) ip6table_filter(U) ip6_tables(U) x_tables(U) ipv6(U) ext2(U) loop(U) dm_multipath(U) 8250_pci(U) snd_hda_intel(U) snd_hda_codec(U) snd_seq_dummy(U) snd_seq_oss(U) snd_seq_midi_event(U) snd_seq(U) snd_seq_device(U) snd_pcm_oss(U) snd_mixer_oss(U) snd_pcm(U) snd_timer(U) parport_pc(U) snd(U) i2c_i801(U) ata_generic(U) e1000(U) parport(U) floppy(U) serio_raw(U) pcspkr(U) soundcore(U) i2c_core(U) 8250(U) snd_page_alloc(U) sg(U) sr_mod(U) serial_core(U) cdrom(U) ata_piix(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) ahci(U) liba ta(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) mbcache(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U) CPU: 1 EIP: 0061:[<c1065c32>] Not tainted VLI EFLAGS: 00210282 (2.6.21-2952.fc8xen #1) EIP is at page_remove_rmap+0xce/0xed eax: 00000036 ebx: c2371080 ecx: 00000001 edx: 00000000 esi: e92aed84 edi: 00000020 ebp: 53584067 esp: e9245ea4 ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0069 Process blkbackd (pid: 3154, ti=e9245000 task=c1d95150 task.ti=e9245000) Stack: c1291346 eb093d98 c1156c6b 00000000 c105d7fa 00000000 00000000 ffffffef b7f03000 e92aed84 e9245f68 53584067 00000000 003ff000 b7f03000 00000000 00000000 b7f04000 e95a1010 00000000 eb128580 e92df818 c2371080 c1c64900 Call Trace: [<c1156c6b>] gntdev_clear_pte+0x0/0x289 [<c105d7fa>] unmap_vmas+0x62e/0x8bd [<c101d6f4>] __wake_up+0x32/0x43 [<c106269f>] unmap_region+0x93/0xf7 [<c1063073>] do_munmap+0x15a/0x1ac 
 [<c10630f5>] sys_munmap+0x30/0x3e [<c1005688>] syscall_call+0x7/0xb ======================= Code: c0 74 0d 8b 50 08 b8 76 13 29 c1 e8 35 af fd ff 8b 46 4c 85 c0 74 14 8b 40 10 85 c0 74 0d 8b 50 2c b8 95 13 29 c1 e8 1a af fd ff <0f> 0b eb fe 8b 53 10 89 d8 59 5b 5b 83 e2 01 5e f7 da 83 c2 04 EIP: [<c1065c32>] page_remove_rmap+0xce/0xed SS:ESP 0069:e9245ea4

^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-03 20:38 ` Gerd Hoffmann @ 2007-12-04 9:40 ` Derek Murray 2007-12-04 12:01 ` Gerd Hoffmann 2007-12-04 20:59 ` Ian Main 0 siblings, 2 replies; 57+ messages in thread
From: Derek Murray @ 2007-12-04 9:40 UTC (permalink / raw)
To: Gerd Hoffmann
Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization

Gerd Hoffmann wrote:
>> On this point I completely agree with you! If anyone has any less
>> radical suggestions, then I'd be delighted to refactor the gntdev code
>> to use them. However, I'm not currently aware of any alternative that
>> maintains robustness to process crashes.
>
> Oh, for me it isn't robust at all, it crashes on the first munmap
> syscall. It is the Fedora 8 kernel. See attachment. Didn't try
> xensource 2.6.18 yet.

My gut feeling is that something changed in mm between 2.6.18 and
2.6.21, but that seems like a cop out so...

> Ideas what is wrong?

Since the bug appears to be in page_remove_rmap, that would tend to
imply that there is never a corresponding page_add_*_rmap
(page_add_file_rmap?). My knowledge of the Linux mm code is a bit shaky
here: should gntdev be doing this? Should we be using install_page (or a
modified version thereof) to set the PTE?

Also, does a simple program that opens gntdev, maps a grant,
accesses/writes to the page, and unmaps it (all using the xc_gnttab_*
functions) work?

> Who uses the gntdev device right now?

Good question! I'm aware of it being used in a few research projects,
and it seems to work for them (though I think it is mostly used with the
linux-2.6.18-xen kernel). Anyone else?

>> I think this would represent good progress, though I wonder if there
>> would be a performance penalty due to performing the mapping and
>> unmapping in user-space (multiple syscalls per mapping versus a single
>> hypercall).
>
> I'd expect the hard disk (and how I/O is scheduled) being the
> bottleneck, not the syscall overhead. Nevertheless I plan to benchmark
> it once I have it up and running.

Great to hear that you're working on this! Let me know if there's any
other help I can provide with gntdev.

Cheers,

Derek.

^ permalink raw reply [flat|nested] 57+ messages in thread
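The hypothesis in the message above (a page_remove_rmap with no matching page_add_*_rmap) can be illustrated with a toy model of the mapping counter. This is a simplified userspace sketch, not kernel code: the struct and function names are hypothetical, and the only behaviour assumed from the thread is that the count goes up when a mapping is accounted, down when it is removed, and that a negative result is reported as "page_mapcount(page) went negative".

```c
#include <stdbool.h>

/* Toy stand-in for a page; NOT the kernel's struct page. */
struct toy_page {
    int mapcount;   /* number of mappings currently accounted to the page */
};

/* Account for a new mapping (models the page_add_*_rmap side). */
static void toy_add_rmap(struct toy_page *page)
{
    page->mapcount++;
}

/*
 * Account for a removed mapping (models the page_remove_rmap side).
 * Returns false if the count went negative, i.e. the case the oops
 * in this thread complains about.
 */
static bool toy_remove_rmap(struct toy_page *page)
{
    page->mapcount--;
    return page->mapcount >= 0;
}

/* A PTE set up without accounting: munmap still performs the removal. */
static bool toy_unmap_without_add(void)
{
    struct toy_page page = { .mapcount = 0 };
    return toy_remove_rmap(&page);   /* count drops to -1 */
}

/* The balanced case: account on map, remove on unmap. */
static bool toy_unmap_with_add(void)
{
    struct toy_page page = { .mapcount = 0 };
    toy_add_rmap(&page);
    return toy_remove_rmap(&page);   /* count returns to 0 */
}
```

In this model toy_unmap_without_add() fails and toy_unmap_with_add() succeeds, which is the asymmetry the question about install_page is getting at. Whether gntdev should do the accounting itself is exactly what the thread leaves open.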
* Re: Re: Next steps with pv_ops for Xen 2007-12-04 9:40 ` Derek Murray @ 2007-12-04 12:01 ` Gerd Hoffmann 2007-12-04 12:39 ` Stephen C. Tweedie 2007-12-04 20:59 ` Ian Main 1 sibling, 1 reply; 57+ messages in thread
From: Gerd Hoffmann @ 2007-12-04 12:01 UTC (permalink / raw)
To: Derek Murray
Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization

Derek Murray wrote:
> Gerd Hoffmann wrote:
>> Oh, for me it isn't robust at all, it crashes on the first munmap
>> syscall. It is the Fedora 8 kernel. See attachment. Didn't try
>> xensource 2.6.18 yet.
>
> My gut feeling is that something changed in mm between 2.6.18 and
> 2.6.21, but that seems like a cop out so...

Could be. Cross-checking failed though: 2.6.18 doesn't boot the machine
in question (an Intel development box with ICH9). It doesn't find the
disk; probably the ahci driver is too old.

>> Ideas what is wrong?
>
> Since the bug appears to be in page_remove_rmap, that would tend to
> imply that there is never a corresponding page_add_*_rmap
> (page_add_file_rmap?). My knowledge of the Linux mm code is a bit shaky
> here: should gntdev be doing this? Should we be using install_page (or a
> modified version thereof) to set the PTE?

Don't know, I'm just trying to use it. I did some mm handling for
device drivers back in my video4linux days, but that never required
getting involved in setting or clearing PTEs. I just had a ->nopage
handler allocate the pages the way I needed them for the userspace
mappings of video DMA buffers.

> Also, does a simple program that opens gntdev, maps a grant,
> accesses/writes to the page, and unmaps it (all using the xc_gnttab_*
> functions) work?

Didn't try yet. The application in question (blkbackd) does this:

 * map the blk shared ring.
 * see the first request come in (the kernel trying to read the
   partition table).
 * map the grants of the request.
 * perform I/O.
 * try to unmap the grants of the request.

On the first unmap call the kernel oopses. All of this happens without
even starting a guest; I'm just using "xm block-attach" to create a
blkfront device in Dom0.

>> Who uses the gntdev device right now?
>
> Good question! I'm aware of it being used in a few research projects,
> and it seems to work for them (though I think it is mostly used with the
> linux-2.6.18-xen kernel). Anyone else?

So it effectively got no real-world testing yet ...

cheers,
  Gerd

^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-04 12:01 ` Gerd Hoffmann @ 2007-12-04 12:39 ` Stephen C. Tweedie 2007-12-04 19:58 ` Gerd Hoffmann ` (3 more replies) 0 siblings, 4 replies; 57+ messages in thread
From: Stephen C. Tweedie @ 2007-12-04 12:39 UTC (permalink / raw)
To: Gerd Hoffmann
Cc: Derek Murray, xen-devel, Eduardo Habkost, Juan Quintela, Stephen Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization

Hi,

On Tue, 2007-12-04 at 13:01 +0100, Gerd Hoffmann wrote:

> >> Who uses the gntdev device right now?
> >
> > Good question! I'm aware of it being used in a few research projects,
> > and it seems to work for them (though I think it is mostly used with the
> > linux-2.6.18-xen kernel). Anyone else?
>
> So it effectively got no real-world testing yet ...

So... the interface (a) cannot be used on the Linux VM without at least
one invasive VM modification, due to the requirement of ptes being
explicitly unmapped via hypercall; and (b) isn't used significantly in
real life yet.

I can't help wondering if this is a hint that now is the time to find a
better API, which doesn't have the requirement (a) that seems to be
causing such trouble? Are other PV guests --- *BSD, Solaris --- going
to have the same problems with their VM layers if they try to implement
this API? Upstream Linux pv_ops certainly will, and it would be good if
we could avoid tying unprivileged guests to ABIs which cannot hope to be
merged into pv_ops.

(Just what is the cost of not having this functionality in blktap,
anyway?)

--Stephen

^ permalink raw reply [flat|nested] 57+ messages in thread
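The ordering constraint behind point (a) can be sketched with a toy model. Nothing here is Xen or Linux code: the types and function names are hypothetical, and the only rule taken from the thread is that an unmap succeeds only while the PTE still points to the granted frame.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy page-table entry; NOT a real x86 PTE. */
struct toy_pte {
    bool     present;
    uint64_t frame;
};

/* Toy grant mapping: the granted frame and whether it is still held. */
struct toy_grant {
    uint64_t frame;
    bool     mapped;
};

/* Models the generic VM tearing down a PTE without a hypercall. */
static void toy_zap_pte(struct toy_pte *pte)
{
    pte->present = false;
    pte->frame = 0;
}

/*
 * Models the check described in the thread: the unmap is refused
 * unless the PTE still points at the granted frame.
 */
static bool toy_unmap_grant(struct toy_grant *grant, struct toy_pte *pte)
{
    if (!pte->present || pte->frame != grant->frame)
        return false;            /* the grant stays mapped */
    toy_zap_pte(pte);
    grant->mapped = false;
    return true;
}

/* Runs one teardown; zap_first selects the problematic ordering. */
static bool toy_teardown(bool zap_first)
{
    /* 0x1234 is an arbitrary frame number used only for this example. */
    struct toy_pte pte = { .present = true, .frame = 0x1234 };
    struct toy_grant grant = { .frame = 0x1234, .mapped = true };

    if (zap_first)
        toy_zap_pte(&pte);
    return toy_unmap_grant(&grant, &pte);
}
```

In this model toy_teardown(false) succeeds and toy_teardown(true) fails, which is why a generic munmap path that clears PTEs on its own cannot simply be followed by the grant unmap.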
* Re: Re: Next steps with pv_ops for Xen 2007-12-04 12:39 ` Stephen C. Tweedie @ 2007-12-04 19:58 ` Gerd Hoffmann 2007-12-05 11:48 ` [Xen-devel] " Derek Murray ` (2 more replies) 2007-12-04 21:08 ` Ian Main ` (2 subsequent siblings) 3 siblings, 3 replies; 57+ messages in thread From: Gerd Hoffmann @ 2007-12-04 19:58 UTC (permalink / raw) To: Stephen C. Tweedie Cc: Derek Murray, xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization [-- Attachment #1: Type: text/plain, Size: 1883 bytes --]

Stephen C. Tweedie wrote:
> Hi,
>
> On Tue, 2007-12-04 at 13:01 +0100, Gerd Hoffmann wrote:
>
>>>> Who uses the gntdev device right now?
>>> Good question! I'm aware of it being used in a few research projects,
>>> and it seems to work for them (though I think it is mostly used with the
>>> linux-2.6.18-xen kernel). Anyone else?
>> So it effectively got no real-world testing yet ...
>
> So... the interface (a) cannot be used on the Linux VM without at least
> one invasive VM modification, due to the requirement of ptes being
> explicitly unmapped via hypercall; and (b) isn't used significantly in
> real life yet.

(c) seems not to work for anything non-trivial.

I've compiled and tested a xensource 2.6.18 kernel (3.1 testing mercurial tree head, should be 3.1.2-release), it fails in a similar way. See attachment.

Want to reproduce? Here we go:

 * grab xenner 0.8 from http://dl.bytesex.org/releases/xenner/
 * grab a xenified dom0 kernel without blktap driver (either not
   compiled or module not loaded).
 * start xend
 * start blkbackd from xenner package (you probably want the -d switch
   for debug output, twice for more).
 * run "xm block-attach 0 tap:aio:/path/to/some/file xvda r"
 * watch it blow up ;)

> I can't help wondering if this is a hint that now is the time to find a
> better API, which doesn't have the requirement (a) that seems to be
> causing such trouble?
Are other PV guests --- *BSD, Solaris --- going > to have the same problems with their VM layers if they try to implement > this API? Upstream Linux pv_ops certainly will, and it would be good if > we could avoid tying unprivileged guests to ABIs which cannot hope to be > merged into pv_ops. And I fear the problems I've trapped into up to now is only the tip of the iceberg. What happens if an application with active grant table mappings calls fork() ? cheers, Gerd [-- Attachment #2: oops --] [-- Type: text/plain, Size: 15610 bytes --] Linux version 2.6.18-xen (kraxel@zweiblum.travel.kraxel.org) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1 SMP Tue Dec 4 18:17:24 CET 2007 BIOS-provided physical RAM map: Xen: 0000000000000000 - 000000000adc3000 (usable) 0MB HIGHMEM available. 173MB LOWMEM available. On node 0 totalpages: 44483 DMA zone: 44483 pages, LIFO batch:7 DMI 2.3 present. ACPI: RSDP (v000 OID_00 ) @ 0x000f0010 ACPI: RSDT (v001 OID_00 RSDT_000 0x30303030 & 0x00010000) @ 0x0bfffbd0 ACPI: FADT (v001 OID_00 FACP_000 0x30303030 & 0x00010000) @ 0x0bfffb20 ACPI: BOOT (v001 OID_00 BOOT_000 0x30303030 & 0x00010000) @ 0x0bfffba0 ACPI: DSDT (v001 INT440 SYSFexxx 0x00001001 MSFT 0x0100000b) @ 0x00000000 ACPI: Vendor "INT440" System "SYSFexxx" Revision 0x1001 has a known ACPI BIOS problem. ACPI: Reason: Does not use _REG to protect EC OpRegions. This is a non-recoverable error ACPI: Disabling ACPI support Allocating PCI resources starting at 10000000 (gap: 0c000000:f3fc0000) Detected 600.047 MHz processor. Built 1 zonelists. Total pages: 44483 Kernel command line: ro root=/dev/zen/rhel5 apm=off vga=0x317 panic=30 Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Initializing CPU#0 PID hash table entries: 1024 (order: 10, 4096 bytes) Xen reported: 600.034 MHz processor. 
Console: colour VGA+ 80x50 Dentry cache hash table entries: 32768 (order: 5, 131072 bytes) Inode-cache hash table entries: 16384 (order: 4, 65536 bytes) Software IO TLB enabled: Aperture: 2 megabytes Kernel range: c0aad000 - c0cad000 Address size: 24 bits vmalloc area: cb800000-f51fe000, maxmem 2d7fe000 Memory: 155572k/177932k available (1972k kernel code, 14020k reserved, 693k data, 192k init, 0k highmem) Checking if this processor honours the WP bit even in supervisor mode... Ok. Calibrating delay using timer specific routine.. 1502.07 BogoMIPS (lpj=7510358) Security Framework v1.0.0 initialized Capability LSM initialized Mount-cache hash table entries: 512 CPU: After generic identify, caps: 0387d1f1 00000000 00000000 00000000 00000000 00000000 00000000 CPU: After vendor identify, caps: 0387d1f1 00000000 00000000 00000000 00000000 00000000 00000000 CPU: L1 I cache: 16K, L1 D cache: 16K CPU: L2 cache: 256K CPU serial number disabled. CPU: After all inits, caps: 0383d1f1 00000000 00000000 00000040 00000000 00000000 00000000 Checking 'hlt' instruction... OK. SMP alternatives: switching to UP code Freeing SMP alternatives: 12k freed Brought up 1 CPUs migration_cost=0 checking if image is initramfs... it is Freeing initrd memory: 6538k freed NET: Registered protocol family 16 PCI: Using configuration type 1 Setting up standard PCI resources ACPI: Interpreter disabled. Linux Plug and Play Support v0.97 (c) Adam Belay pnp: PnP ACPI: disabled xen_mem: Initialising balloon driver. 
PCI: Probing PCI hardware PCI: Probing PCI hardware (bus 00) PCI quirk: region 1000-103f claimed by PIIX4 ACPI PCI quirk: region 1400-140f claimed by PIIX4 SMB PIIX4 devres C PIO at 0398-0399 Boot video device is 0000:00:09.0 PCI: Using IRQ router PIIX/ICH [8086/7198] at 0000:00:07.0 PCI: Cannot allocate resource region 0 of device 0000:00:0b.0 PCI: Bus 1, cardbus bridge: 0000:00:08.0 IO window: 00001c00-00001cff IO window: 00002000-000020ff PREFETCH window: 10000000-11ffffff MEM window: 12000000-13ffffff PCI: setting IRQ 10 as level-triggered PCI: Found IRQ 10 for device 0000:00:08.0 NET: Registered protocol family 2 IP route cache hash table entries: 2048 (order: 1, 8192 bytes) TCP established hash table entries: 8192 (order: 4, 65536 bytes) TCP bind hash table entries: 4096 (order: 3, 32768 bytes) TCP: Hash tables configured (established 8192 bind 4096) TCP reno registered Simple Boot Flag at 0x37 set to 0x1 IA-32 Microcode Update Driver: v1.14a-xen <tigran@veritas.com> audit: initializing netlink socket (disabled) audit(1196794944.970:1): initialized VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 1024 (order 0, 4096 bytes) Initializing Cryptographic API io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered (default) floppy0: no floppy controllers found RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize loop: loaded (max 8 devices) Xen virtual console successfully installed as ttyS0 Event-channel device installed. Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx PNP: No PS/2 controller found. Probing ports directly. 
serio: i8042 AUX port at 0x60,0x64 irq 12 serio: i8042 KBD port at 0x60,0x64 irq 1 mice: PS/2 mouse device common for all mice md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27 md: bitmap version 4.39 NET: Registered protocol family 1 NET: Registered protocol family 17 Using IPI No-Shortcut mode Freeing unused kernel memory: 192k freed piix: no version for "struct_module" found: kernel tainted. PIIX4: IDE controller at PCI slot 0000:00:07.1 PIIX4: chipset revision 0 PIIX4: not 100% native mode: will probe irqs later ide0: BM-DMA at 0x1100-0x1107, BIOS settings: hda:DMA, hdb:pio Probing IDE interface ide0... input: AT Translated Set 2 keyboard as /class/input/input0 hda: HTS548040M9AT00, ATA DISK drive input: PS/2 Mouse as /class/input/input1 input: AlpsPS/2 ALPS GlidePoint as /class/input/input2 ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 usbcore: registered new driver usbfs usbcore: registered new driver hub USB Universal Host Controller Interface driver v3.0 PCI: IRQ 11 for device 0000:00:07.2 doesn't match PIRQ mask - try pci=usepirqmask <7>PCI: setting IRQ 11 as level-triggered PCI: Found IRQ 11 for device 0000:00:07.2 PCI: Sharing IRQ 11 with 0000:00:0a.0 uhci_hcd 0000:00:07.2: UHCI Host Controller uhci_hcd 0000:00:07.2: new USB bus registered, assigned bus number 1 uhci_hcd 0000:00:07.2: irq 11, io base 0x00001200 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 2 ports detected ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI) hda: max request size: 512KiB usb 1-2: new full speed USB device using uhci_hcd and address 2 hda: 78140160 sectors (40007 MB) w/7877KiB Cache, CHS=16383/255/63, UDMA(33) hda: cache flushes supported hda:<6>usb 1-2: configuration #1 chosen from 1 choice hub 1-2:1.0: USB hub found hub 1-2:1.0: 3 ports detected hda1 hda2 hda3 < hda5 > hda4 device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com EXT3-fs: INFO: recovery required on readonly filesystem. 
EXT3-fs: write access will be enabled during recovery. kjournald starting. Commit interval 5 seconds EXT3-fs: recovery complete. EXT3-fs: mounted filesystem with ordered data mode. Real Time Clock Driver v1.12ac input: PC Speaker as /class/input/input3 piix4_smbus 0000:00:07.3: Found 0000:00:07.3 device Yenta: CardBus bridge found at 0000:00:08.0 [1071:7722] Yenta: Enabling burst memory read transactions Yenta: Using CSCINT to route CSC interrupts to PCI Yenta: Routing CardBus interrupts to PCI Yenta TI: socket 0000:00:08.0, mfunc 0x017c1602, devctl 0x64 Intel 810 + AC97 Audio, version 1.01, 18:04:40 Dec 4 2007 Yenta: ISA IRQ mask 0x02d8, PCI irq 10 Socket status: 30000010 PCI: Setting latency timer of device 0000:00:00.1 to 64 i810: Intel 440MX found at IO 0x1500 and 0x1600, MEM 0x0000 and 0x0000, IRQ 5 ieee1394: Initialized config rom entry `ip1394' i810_audio: Audio Controller supports 2 channels. i810_audio: Defaulting to base 2 channel mode. i810_audio: Resetting connection 0 ac97_codec: AC97 Audio codec, id: CRY52 (Cirrus Logic CS4299 rev D) i810_audio: AC'97 codec 0 supports AMAP, total channels = 2 i810_audio: setting clocking to 38348 PCI: Found IRQ 10 for device 0000:00:0b.0 ohci1394: fw-host0: SelfID received outside of bus reset sequence ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[10] MMIO=[14021000-140217ff] Max Packet=[1024] IR/IT contexts=[4/4] pccard: PCMCIA card inserted into slot 0 8139cp: 10/100 PCI Ethernet driver v1.2 (Mar 22, 2004) 8139cp 0000:00:0a.0: This (id 10ec:8139 rev 10) is not an 8139C+ compatible chip 8139cp 0000:00:0a.0: Try the "8139too" driver instead. 
8139too Fast Ethernet driver 0.9.27 PCI: Found IRQ 11 for device 0000:00:0a.0 PCI: Sharing IRQ 11 with 0000:00:07.2 eth0: RealTek RTL8139 at 0xcb980000, 00:40:d0:12:f3:b4, IRQ 11 eth0: Identified 8139 chip type 'RTL-8139B' PCI: Setting latency timer of device 0000:00:00.2 to 64 evbug.c: Connected device: "AT Translated Set 2 keyboard", isa0060/serio0/input0 evbug.c: Connected device: "PS/2 Mouse", isa0060/serio1/input1 evbug.c: Connected device: "AlpsPS/2 ALPS GlidePoint", isa0060/serio1/input0 evbug.c: Connected device: "PC Speaker", isa0061/input0 Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled ts: Compaq touchscreen protocol output 8250_pci: Unknown symbol serial8250_unregister_port 8250_pci: Unknown symbol serial8250_resume_port 8250_pci: Unknown symbol serial8250_register_port 8250_pci: Unknown symbol serial8250_suspend_port ieee1394: Host added: ID:BUS[0-00:1023] GUID[0040d00100000b49] cs: memory probe 0x0c0000-0x0fffff: excluding 0xc0000-0xcffff 0xe0000-0xfffff cs: memory probe 0x60000000-0x60ffffff: clean. cs: memory probe 0xa0000000-0xa0ffffff: clean. pcmcia: registering new device pcmcia0.0 orinoco 0.15 (David Gibson <hermes@gibson.dropbear.id.au>, Pavel Roskin <proski@gnu.org>, et al) orinoco_cs 0.15 (David Gibson <hermes@gibson.dropbear.id.au>, Pavel Roskin <proski@gnu.org>, et al) pcmcia: request for exclusive IRQ could not be fulfilled. pcmcia: the driver needs updating to supported shared IRQ lines. eth1: Hardware identity 8008:0000:0001:0000 eth1: Station identity 001f:0004:0001:0003 eth1: Firmware determined as Intersil 1.3.4 eth1: Ad-hoc demo mode supported eth1: IEEE standard IBSS ad-hoc mode supported eth1: WEP supported, 104-bit key eth1: MAC address 00:30:AB:0F:69:F6 eth1: Station name "Prism I" eth1: ready eth1: orinoco_cs at 0.0, irq 10, io 0x0100-0x013f Non-volatile memory driver v1.2 lp: driver loaded but no devices found md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. 
device-mapper: multipath: version 1.0.4 loaded EXT3 FS on dm-1, internal journal kjournald starting. Commit interval 5 seconds EXT3 FS on dm-2, internal journal EXT3-fs: mounted filesystem with ordered data mode. Adding 1048568k swap on /dev/zen/swap. Priority:-1 extents:1 across:1048568k NET: Registered protocol family 10 lo: Disabled Privacy Extensions IPv6 over IPv4 tunneling driver ip6_tables: (C) 2000-2006 Netfilter Core Team ip_tables: (C) 2000-2006 Netfilter Core Team Netfilter messages via NETLINK v0.30. ip_conntrack version 2.4 (1390 buckets, 11120 max) - 228 bytes per conntrack process `sysctl' is using deprecated sysctl (syscall) net.ipv6.neigh.lo.retrans_time; Use net.ipv6.neigh.lo.retrans_time_ms instead. eth0: link down ADDRCONF(NETDEV_UP): eth0: link is not ready ADDRCONF(NETDEV_UP): eth1: link is not ready eth1: New link status: Connected (0001) ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready audit(1196794998.576:2): audit_pid=3073 old=0 by auid=4294967295 tun: Universal TUN/TAP device driver, 1.6 tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com> Bluetooth: Core ver 2.10 NET: Registered protocol family 31 Bluetooth: HCI device and connection manager initialized Bluetooth: HCI socket layer initialized Bluetooth: L2CAP ver 2.8 Bluetooth: L2CAP socket layer initialized Bluetooth: RFCOMM socket layer initialized Bluetooth: RFCOMM TTY layer initialized Bluetooth: RFCOMM ver 1.8 eth1: no IPv6 routers present Bluetooth: HIDP (Human Interface Emulation) ver 1.1 Bridge firewalling registered openvpn0: no IPv6 routers present virbr0: no IPv6 routers present xen-vbd: registered block device major 202 blkfront: xvda: barriers enabled xvda:<0>------------[ cut here ]------------ kernel BUG at /home/kraxel/xen/xen31/linux-2.6.18-xen/mm/rmap.c:522! 
invalid opcode: 0000 [#1] SMP
Modules linked in: xenblk ipt_MASQUERADE iptable_nat ip_nat bridge hidp rfcomm l2cap bluetooth tun sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 binfmt_misc dm_multipath parport_pc lp parport nvram orinoco_cs orinoco hermes joydev pcmcia firmware_class tsdev evbug evdev serial_core snd_intel8x0m snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq 8139too serio_raw 8139cp mii snd_intel8x0 snd_ac97_codec snd_ac97_bus ohci1394 ieee1394 snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i810_audio snd_timer ac97_codec snd snd_page_alloc soundcore yenta_socket rsrc_nonstatic pcmcia_core i2c_piix4 i2c_core pcspkr rtc dm_snapshot dm_zero dm_mirror dm_mod ide_disk ext3 jbd ehci_hcd ohci_hcd uhci_hcd usbcore piix
CPU: 0
EIP: 0061:[<c0169688>] Tainted: GF VLI
EFLAGS: 00010286 (2.6.18-xen #1)
EIP is at page_remove_rmap+0x28/0x40
eax: ffffffff ebx: c1080780 ecx: c1080780 edx: 00000000
esi: c4e65a14 edi: 00000020 ebp: c536ab80 esp: c407bea8
ds: 007b es: 007b ss: 0069
Process blkbackd (pid: 3973, ti=c407a000 task=c05eda30 task.ti=c407a000)
Stack: c0160b51 c536ab80 00000000 c05eda30 00000000 00000000 00000002 c01ea764
       b7f70000 c4e65a14 c407bf68 07a3c067 00000000 003ff000 b7f70000 00000000
       00000000 b7f71000 c4ccd010 c99e9740 c1080780 c1161c00 00000000 ffffffff
Call Trace:
 [<c0160b51>] unmap_vmas+0x4a1/0x910
 [<c01ea764>] copy_from_user+0x34/0x80
 [<c016594b>] unmap_region+0x9b/0x120
 [<c016645c>] do_munmap+0x14c/0x1e0
 [<c0166522>] sys_munmap+0x32/0x50
 [<c010568f>] syscall_call+0x7/0xb
Code: 00 00 00 89 c1 90 83 40 08 ff 0f 98 c0 84 c0 75 02 f3 c3 8b 41 08 83 c0 01 78 10 8b 51 10 89 c8 83 f2 01 83 e2 01 e9 e8 42 ff ff <0f> 0b 0a 02 48 84 30 c0 eb e6 8d b4 26 00 00 00 00 8d bc 27 00
EIP: [<c0169688>] page_remove_rmap+0x28/0x40 SS:ESP 0069:c407bea8
<7>evbug.c: Event. Dev: isa0060/serio0/input0, Type: 4, Code: 4, Value: 42
evbug.c: Event.
Dev: isa0060/serio0/input0, Type: 1, Code: 42, Value: 1 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 0, Code: 0, Value: 0 XENBUS: Waiting for devices to initialise: 295s...<7>evbug.c: Event. Dev: isa0060/serio0/input0, Type: 4, Code: 4, Value: 201 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 1, Code: 104, Value: 1 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 0, Code: 0, Value: 0 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 4, Code: 4, Value: 201 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 1, Code: 104, Value: 0 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 0, Code: 0, Value: 0 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 4, Code: 4, Value: 201 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 1, Code: 104, Value: 1 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 0, Code: 0, Value: 0 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 4, Code: 4, Value: 201 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 1, Code: 104, Value: 0 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 0, Code: 0, Value: 0 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 4, Code: 4, Value: 42 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 1, Code: 42, Value: 0 evbug.c: Event. Dev: isa0060/serio0/input0, Type: 0, Code: 0, Value: 0 290s...285s... [-- Attachment #3: Type: text/plain, Size: 138 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Xen-devel] Re: Next steps with pv_ops for Xen 2007-12-04 19:58 ` Gerd Hoffmann @ 2007-12-05 11:48 ` Derek Murray 2007-12-05 11:48 ` Derek Murray 2007-12-05 13:19 ` Derek Murray 2 siblings, 0 replies; 57+ messages in thread From: Derek Murray @ 2007-12-05 11:48 UTC (permalink / raw) To: Gerd Hoffmann Cc: xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization

Hi Gerd,

Gerd Hoffmann wrote:
> Want to reproduce? Here we go:
>
> * grab xenner 0.8 from http://dl.bytesex.org/releases/xenner/
> * grab a xenified dom0 kernel without blktap driver (either not
>   compiled or module not loaded).
> * start xend
> * start blkbackd from xenner package (you probably want the -d switch
>   for debug output, twice for more).
> * run "xm block-attach 0 tap:aio:/path/to/some/file xvda r"
> * watch it blow up ;)

Thanks for the repro details. I'll have a go at this later. One thing we haven't tested AFAIK is mapping grants in the same domain: could you check to see if the bug is the same if you attach a block device to a domain other than Dom0? Also, could you send any Xen console output, if it contains errors or warnings?

>> I can't help wondering if this is a hint that now is the time to find a
>> better API, which doesn't have the requirement (a) that seems to be
>> causing such trouble? Are other PV guests --- *BSD, Solaris --- going
>> to have the same problems with their VM layers if they try to implement
>> this API? Upstream Linux pv_ops certainly will, and it would be good if
>> we could avoid tying unprivileged guests to ABIs which cannot hope to be
>> merged into pv_ops.
>
> And I fear the problems I've trapped into up to now is only the tip of
> the iceberg. What happens if an application with active grant table
> mappings calls fork() ?

Ultimately, fork calls dup_mm, which calls dup_mmap, which calls copy_{page,pud,pmd,pte}_range, which calls copy_one_pte, which calls set_pte_at, which hypercalls HYPERVISOR_update_va_mapping. The hypercall will not succeed and will return an error code indicating the reason for this. Therefore the PTE will not be set.

There appears to be no way to propagate this error through the Linux VM code, because there is no concept of a PTE update failing. I could add return codes to all those functions, but I don't fancy their chances upstream....

A possibility for solving that might be to carry out the mappings upon a page fault: I believe this would be compatible with copy_page_range. (In fact, it's possible that a forked process would attempt to demand-page in the granted page, bypassing the copy_page_range code. Since there is no nopage handler for a gntdev VMA, that would lead to an anonymous page being mapped into memory instead.)

So, as far as I can tell, there would be no kernel BUG() or domain_crash() in the event of a fork(). It looks like implementing nopage in gntdev would enable grants to be remapped after a fork() and the correct behaviour to happen.

Regards,

Derek.

^ permalink raw reply [flat|nested] 57+ messages in thread
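The error-propagation problem in the message above can be modelled in user space. The following is only a toy sketch under stated assumptions, not kernel code: `fake_update_va_mapping`, `toy_set_pte`, `toy_copy_range` and `TOY_PTE_GRANT` are hypothetical names invented for illustration, and the only thing taken from the thread is the shape of the problem (a PTE setter that returns void wrapping a hypercall that can fail).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not a kernel API. */
#define TOY_PTE_GRANT 0x1u          /* marks a PTE that maps a granted frame */

typedef struct { unsigned long val; } toy_pte_t;

/* Stand-in for the hypercall: assumed to refuse to duplicate a grant mapping. */
static int fake_update_va_mapping(toy_pte_t *dst, toy_pte_t src)
{
    if (src.val & TOY_PTE_GRANT)
        return -1;                  /* update rejected, dst left untouched */
    *dst = src;
    return 0;
}

/* Shaped like a PTE setter that returns void: the error code is dropped. */
static void toy_set_pte(toy_pte_t *dst, toy_pte_t src)
{
    (void)fake_update_va_mapping(dst, src);
}

/*
 * Shaped like a range copy at fork time. It cannot report a failed update,
 * so the caller only finds out by inspecting the child's entries afterwards.
 * Returns the number of child entries that ended up populated.
 */
static size_t toy_copy_range(toy_pte_t *child, const toy_pte_t *parent, size_t n)
{
    size_t populated = 0;
    for (size_t i = 0; i < n; i++) {
        child[i].val = 0;           /* child starts with an empty entry */
        toy_set_pte(&child[i], parent[i]);
        if (child[i].val != 0)
            populated++;
    }
    return populated;
}
```

In this model a parent with one grant-mapped entry among three ends up with a child that has two populated entries and one silent hole, which is the situation a fault-time (nopage-style) handler would then have to repair.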
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 11:48 ` Derek Murray @ 2007-12-05 14:12 ` Gerd Hoffmann 2007-12-05 14:22 ` Keir Fraser 2007-12-05 18:12 ` Jeremy Fitzhardinge 1 sibling, 1 reply; 57+ messages in thread From: Gerd Hoffmann @ 2007-12-05 14:12 UTC (permalink / raw) To: Derek Murray Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization Hi, > Thanks for the repro details. I'll have a go at this later. One thing we > haven't tested AFAIK is mapping grants in the same domain: could you > check to see if the bug is the same if you attach a block device to a > domain other than Dom0? Also, could you send any Xen console output, if > it contains errors or warnings? Attaching to another domain works better. blkbackd needs some fixes as well though ... cheers, Gerd ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 14:12 ` Gerd Hoffmann @ 2007-12-05 14:22 ` Keir Fraser 2007-12-05 14:30 ` Derek Murray 0 siblings, 1 reply; 57+ messages in thread From: Keir Fraser @ 2007-12-05 14:22 UTC (permalink / raw) To: Gerd Hoffmann, Derek Murray Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization On 5/12/07 14:12, "Gerd Hoffmann" <kraxel@redhat.com> wrote: >> Thanks for the repro details. I'll have a go at this later. One thing we >> haven't tested AFAIK is mapping grants in the same domain: could you >> check to see if the bug is the same if you attach a block device to a >> domain other than Dom0? Also, could you send any Xen console output, if >> it contains errors or warnings? > > Attaching to another domain works better. blkbackd needs some fixes as > well though ... Is this patch to go into linux-2.6.18-xen.hg then? It needs a signed-off-by line if so. -- Keir ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 14:22 ` Keir Fraser @ 2007-12-05 14:30 ` Derek Murray 2007-12-05 16:58 ` Keir Fraser 0 siblings, 1 reply; 57+ messages in thread From: Derek Murray @ 2007-12-05 14:30 UTC (permalink / raw) To: Keir Fraser Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann [-- Attachment #1: Type: text/plain, Size: 253 bytes --]

Keir Fraser wrote:
> Is this patch to go into linux-2.6.18-xen.hg then?

Yes, even if it doesn't fix the exact bug we're seeing here, I think it should go in. I've attached a version with my signed-off-by and a better commit comment.

Cheers,

Derek.

[-- Attachment #2: gntdev_vm_pfnmap.patch --]
[-- Type: text/x-patch, Size: 1328 bytes --]

# HG changeset patch
# User dgm36@ise.cl.cam.ac.uk
# Date 1196860382 0
# Node ID af26b3dd23822190acbec1872a47259e1fed88b8
# Parent b2768401db943e66af9d64bd610ffa225f560c0b
Add VM_PFNMAP flag to gntdev-mmaped VM areas. This prevents an attempt in zap_pte_range to decrement the reverse-mapping count of the non-existent (but occasionally spuriously present) page_struct associated with the granted PFN.

Signed-off-by: Derek Murray <Derek.Murray@cl.cam.ac.uk>

diff -r b2768401db94 -r af26b3dd2382 drivers/xen/gntdev/gntdev.c
--- a/drivers/xen/gntdev/gntdev.c Mon Dec 03 08:50:12 2007 +0000
+++ b/drivers/xen/gntdev/gntdev.c Wed Dec 05 13:13:02 2007 +0000
@@ -501,6 +501,17 @@ static int gntdev_mmap (struct file *fli
 	/* The VM area contains pages from another VM. */
 	vma->vm_flags |= VM_FOREIGN;
+
+	/* The VM area contains pages that are not backed by page_structs in
+	 * this domain's memory map.
+	 *
+	 * TODO/FIXME?: We should probably use the VM_FOREIGN workaround as
+	 * used by get_user_pages() to provide access to the
+	 * page_structs for each page, but I'm not sure if that's
+	 * necessary.
+	 */
+	vma->vm_flags |= VM_PFNMAP;
+
 	vma->vm_private_data = kzalloc(size * sizeof(struct page_struct *),
 				       GFP_KERNEL);
 	if (vma->vm_private_data == NULL) {

^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 14:30 ` Derek Murray @ 2007-12-05 16:58 ` Keir Fraser 2007-12-05 17:17 ` Derek Murray 0 siblings, 1 reply; 57+ messages in thread From: Keir Fraser @ 2007-12-05 16:58 UTC (permalink / raw) To: Derek Murray Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann On 5/12/07 14:30, "Derek Murray" <Derek.Murray@cl.cam.ac.uk> wrote: > Keir Fraser wrote: >> Is this patch to go into linux-2.6.18-xen.hg then? > > Yes, even if it doesn't fix the exact bug we're seeing here, I think it > should go in. I've attached a version with my signed-off-by and a better > commit comment. Actually I'm not so sure now. Presumably you add VM_PFNMAP to make vm_normal_page() return NULL? But actually I would expect pte_pfn() to return max_mapnr because the mapped page is not a local page. And that should cause vm_normal_page() to return NULL always, regardless of whether you assert VM_PFNMAP. Is gntdev being used to grant-and-map local pages in the test that causes the crash? -- Keir ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 16:58 ` Keir Fraser @ 2007-12-05 17:17 ` Derek Murray 2007-12-05 17:22 ` Keir Fraser 0 siblings, 1 reply; 57+ messages in thread From: Derek Murray @ 2007-12-05 17:17 UTC (permalink / raw) To: Keir Fraser Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann Keir Fraser wrote: > > Actually I'm not so sure now. Presumably you add VM_PFNMAP to make > vm_normal_page() return NULL? But actually I would expect pte_pfn() to > return max_mapnr because the mapped page is not a local page. And that > should cause vm_normal_page() to return NULL always, regardless of whether > you assert VM_PFNMAP. Is gntdev being used to grant-and-map local pages in > the test that causes the crash? That's right (gntdev is being used to map (but not grant) a local page). The test case creates a virtual block device in Dom0, and attempts to map its ring buffer in a user-space daemon in Dom0. Therefore pte_pfn succeeds. Regards, Derek. ^ permalink raw reply [flat|nested] 57+ messages in thread
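The distinction discussed in the two messages above (foreign frame versus locally granted frame, with and without VM_PFNMAP) can be laid out as a small decision function. This is a deliberately simplified user-space model, not the kernel's vm_normal_page(): `toy_is_normal_page`, `TOY_VM_PFNMAP` and `TOY_MAX_MAPNR` are hypothetical names, and the real function has additional cases for VM_PFNMAP areas that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: names and values are hypothetical. */
#define TOY_VM_PFNMAP 0x1u          /* area is a raw PFN mapping */
#define TOY_MAX_MAPNR 1024ul        /* number of local page frames in the toy */

/*
 * Returns true when the mapping would be treated as an ordinary local page,
 * i.e. when unmapping it would touch that page's reverse-mapping count.
 */
static bool toy_is_normal_page(unsigned vm_flags, unsigned long pfn)
{
    if (vm_flags & TOY_VM_PFNMAP)
        return false;               /* flag hides the page from accounting */
    if (pfn >= TOY_MAX_MAPNR)
        return false;               /* foreign frame: no local struct page */
    return true;                    /* local frame: accounting is applied */
}
```

Read against the thread: a foreign grant yields a PFN at or beyond the local range and is skipped either way, whereas a Dom0-to-Dom0 grant yields a valid local PFN, so without the flag the unmap path adjusts a count that was never incremented.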
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 17:17 ` Derek Murray @ 2007-12-05 17:22 ` Keir Fraser 2007-12-05 17:48 ` Derek Murray 0 siblings, 1 reply; 57+ messages in thread From: Keir Fraser @ 2007-12-05 17:22 UTC (permalink / raw) To: Derek Murray Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann On 5/12/07 17:17, "Derek Murray" <Derek.Murray@cl.cam.ac.uk> wrote: >> Actually I'm not so sure now. Presumably you add VM_PFNMAP to make >> vm_normal_page() return NULL? But actually I would expect pte_pfn() to >> return max_mapnr because the mapped page is not a local page. And that >> should cause vm_normal_page() to return NULL always, regardless of whether >> you assert VM_PFNMAP. Is gntdev being used to grant-and-map local pages in >> the test that causes the crash? > > That's right (gntdev is being used to map (but not grant) a local page). > The test case creates a virtual block device in Dom0, and attempts to > map its ring buffer in a user-space daemon in Dom0. Therefore pte_pfn > succeeds. Need to bite the bullet and fix this properly by setting a software flag in ptes that are not subject to reference counting. Unfortunately that also needs a hypervisor interface change, to allow setting of those pte flags. Easily done though, and we should definitely get that piece in for 3.2.0. Setting VM_PFNMAP is bogus. We used to do that for privcmd mappings too, but we stopped because IIRC it had other unwanted side effects. -- Keir ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 17:22 ` Keir Fraser @ 2007-12-05 17:48 ` Derek Murray 2007-12-05 17:59 ` Keir Fraser 0 siblings, 1 reply; 57+ messages in thread From: Derek Murray @ 2007-12-05 17:48 UTC (permalink / raw) To: Keir Fraser Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann Keir Fraser wrote: > Need to bite the bullet and fix this properly by setting a software flag in > ptes that are not subject to reference counting. Could we get away with testing the VM_FOREIGN flag in vm_normal_page()? Although I get the impression that this wouldn't be easily justified if trying to merge with upstream Linux.... > Unfortunately that also needs a hypervisor interface change, to allow > setting of those pte flags. Easily done though, and we should definitely get > that piece in for 3.2.0. Alternatively, could we use the _PAGE_GNTTAB PTE flag that is used for debugging? Indeed, if we did this, could we obviate the need for the PTE-zapping hook, by instead catching the case where this flag is set, and unmapping the grant implicitly? Otherwise, what would the semantics of this new flag be? Regards, Derek. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 17:48 ` Derek Murray @ 2007-12-05 17:59 ` Keir Fraser 2007-12-05 18:15 ` Derek Murray 2007-12-05 20:06 ` Gerd Hoffmann 0 siblings, 2 replies; 57+ messages in thread From: Keir Fraser @ 2007-12-05 17:59 UTC (permalink / raw) To: Derek Murray Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann On 5/12/07 17:48, "Derek Murray" <Derek.Murray@cl.cam.ac.uk> wrote: > Keir Fraser wrote: >> Need to bite the bullet and fix this properly by setting a software flag in >> ptes that are not subject to reference counting. > > Could we get away with testing the VM_FOREIGN flag in vm_normal_page()? > Although I get the impression that this wouldn't be easily justified if > trying to merge with upstream Linux.... Yes, this would work okay I suspect. Good enough as a stop-gap measure? Are there any other responsibilities that you acquire if you make use of VM_FOREIGN (in particular, how would this affect get_user_pages)? > Alternatively, could we use the _PAGE_GNTTAB PTE flag that is used for > debugging? Indeed, if we did this, could be obviate the need for the > PTE-zapping hook, by instead catching the case where this flag is set, > and unmapping the grant implicitly? Well, in the general case you don't have enough info to know which grant to release (a single page can be granted multiple times). > Otherwise, what would the semantics of this new flag be? It would cause pte_pfn() to return max_mapnr. It would be set for any foreign page mapping, and replace mfn_to_local_pfn() in pte_pfn(). -- Keir ^ permalink raw reply [flat|nested] 57+ messages in thread
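A minimal sketch of the flag semantics Keir describes in the message above, written as a user-space model rather than kernel code. The bit position, the model_ names and the value of model_max_mapnr are assumptions made for illustration only; the thread does not fix any of them. The real pte_pfn() also translates machine frames to local frames (mfn_to_local_pfn()), which this model leaves out.

```c
#include <stdint.h>

/* Hypothetical layout: 4 KiB pages, and one of the software-available
 * PTE bits (bit 9 here, an assumption) marks a foreign mapping. */
#define MODEL_PAGE_SHIFT   12
#define MODEL_PAGE_FOREIGN (1ULL << 9)

typedef uint64_t model_pte_t;

/* Number of local page frames that have a struct page (illustrative). */
static unsigned long model_max_mapnr = 1024;

/* Foreign PTEs report max_mapnr, i.e. "one past the last local frame",
 * so no caller can mistake them for a frame backed by a struct page. */
unsigned long model_pte_pfn(model_pte_t pte)
{
    if (pte & MODEL_PAGE_FOREIGN)
        return model_max_mapnr;
    return (unsigned long)(pte >> MODEL_PAGE_SHIFT);
}

/* Stand-in for the check that lets vm_normal_page() return NULL:
 * only frames below max_mapnr are reference counted locally. */
int model_has_struct_page(model_pte_t pte)
{
    return model_pte_pfn(pte) < model_max_mapnr;
}
```

With this shape, the reference-counting paths would need no VM_PFNMAP or VM_FOREIGN test of their own: the out-of-range pfn is enough for them to skip the page.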
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 17:59 ` Keir Fraser @ 2007-12-05 18:15 ` Derek Murray 2007-12-12 8:27 ` Isaku Yamahata 2007-12-05 20:06 ` Gerd Hoffmann 1 sibling, 1 reply; 57+ messages in thread From: Derek Murray @ 2007-12-05 18:15 UTC (permalink / raw) To: Keir Fraser Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann [-- Attachment #1: Type: text/plain, Size: 526 bytes --] Keir Fraser wrote: > Yes, this would work okay I suspect. Good enough as a stop-gap measure? Are > there any other responsibilities that you acquire if you make use of > VM_FOREIGN (in particular, how would this affect get_user_pages)? VM_FOREIGN is already set for the gntdev VMA (mostly because it's directly based on the blktap code). That means that it has the array of page_structs in its vm_private_data, which can be used to fulfill a get_user_pages call. I've attached a patch based on this fix. Regards, Derek. [-- Attachment #2: gntdev_vm_foreign.patch --] [-- Type: text/x-patch, Size: 714 bytes --] # HG changeset patch # User dgm36@ise.cl.cam.ac.uk # Date 1196878124 0 # Node ID df7d0555ec3847bd5915063d8ee79123d6ebc67a # Parent ba918cb2cf7520604dee724dd80dad5ce4bee8a1 Changed vm_normal_page to return NULL when presented with a VMA marked as being VM_FOREIGN. Signed-off-by: Derek Murray <Derek.Murray@cl.cam.ac.uk> diff -r ba918cb2cf75 -r df7d0555ec38 mm/memory.c --- a/mm/memory.c Tue Dec 04 11:54:22 2007 +0000 +++ b/mm/memory.c Wed Dec 05 18:08:44 2007 +0000 @@ -395,6 +395,9 @@ struct page *vm_normal_page(struct vm_ar if (!is_cow_mapping(vma->vm_flags)) return NULL; } + + if (unlikely(vma->vm_flags & VM_FOREIGN)) + return NULL; /* * Add some anal sanity checks for now. 
Eventually, [-- Attachment #3: Type: text/plain, Size: 138 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 18:15 ` Derek Murray @ 2007-12-12 8:27 ` Isaku Yamahata 2007-12-12 8:39 ` Keir Fraser 0 siblings, 1 reply; 57+ messages in thread From: Isaku Yamahata @ 2007-12-12 8:27 UTC (permalink / raw) To: Derek Murray Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann, xen-ia64-devel On Wed, Dec 05, 2007 at 06:15:49PM +0000, Derek Murray wrote: > Keir Fraser wrote: > >Yes, this would work okay I suspect. Good enough as a stop-gap measure? Are > >there any other responsibilities that you acquire if you make use of > >VM_FOREIGN (in particular, how would this affect get_user_pages)? > > VM_FOREIGN is already set for the gntdev VMA (mostly because it's > directly based on the blktap code). That means that it has the array of > page_structs in its vm_private_data, which can be used to fulfill a > get_user_pages call. I've attached a patch based on this fix. > > Regards, > > Derek. Hi Derek. Sorry for the late alert. This patch breaks blktap and gntdev on ia64. With auto-translated physmap mode enabled, blktap/gntdev update the pte entry with vm_insert_page() rather than updating it directly with the hypercall. So when zapping the pte entry, it is necessary to drop the page reference count, the rmap entry, and so on. Thus vm_normal_page() has to return the struct page when auto-translated physmap mode is enabled. How about passing a struct page ** to the zap_pte callback and setting it to NULL if necessary? (Or can the condition be changed to check for auto-translated physmap mode? Or should the cleanup be done in the zap_pte callback?) -- yamahata ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-12 8:27 ` Isaku Yamahata @ 2007-12-12 8:39 ` Keir Fraser 2007-12-12 8:44 ` Isaku Yamahata 0 siblings, 1 reply; 57+ messages in thread From: Keir Fraser @ 2007-12-12 8:39 UTC (permalink / raw) To: Isaku Yamahata, Derek Murray Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann, xen-ia64-devel We already make the VM_FOREIGN check conditional on defined(CONFIG_XEN). We could add defined(CONFIG_X86) as well? This would seem reasonable as a temporary measure for the old 2.6.18 tree. -- Keir On 12/12/07 08:27, "Isaku Yamahata" <yamahata@valinux.co.jp> wrote: > This patch breaks blktap and gntdev on ia64. > With auto translated physmap mode enabled, bktap/gntdev update > the pte entry with vm_insert_page(). Not direct updating it with > the hypercall. > So when zapping the pte entry, it is necessary to release page > reference counting, rmapping and etc. Thus vm_normal_page() have > to return the struct page when auto translated physmap mode is enabled. > > How about passing the page struct** to the zap_pte call back > and set it to NULL if necessary? > (or > Can the condition be changed to check auto trasnalted physmap mode? > or > Should the clean up be done in zap_pte callback?) ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-12 8:39 ` Keir Fraser @ 2007-12-12 8:44 ` Isaku Yamahata 0 siblings, 0 replies; 57+ messages in thread From: Isaku Yamahata @ 2007-12-12 8:44 UTC (permalink / raw) To: Keir Fraser Cc: Derek Murray, xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann, xen-ia64-devel On Wed, Dec 12, 2007 at 08:39:41AM +0000, Keir Fraser wrote: > We already make the VM_FOREIGN check conditional on defined(CONFIG_XEN). We > could add defined(CONFIG_X86) as well? This would seem reasonable as a > temporary measure for the old 2.6.18 tree. Yes, ok for IA64. -- yamahata ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 17:59 ` Keir Fraser 2007-12-05 18:15 ` Derek Murray @ 2007-12-05 20:06 ` Gerd Hoffmann 1 sibling, 0 replies; 57+ messages in thread From: Gerd Hoffmann @ 2007-12-05 20:06 UTC (permalink / raw) To: Keir Fraser Cc: Derek Murray, xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization >> Alternatively, could we use the _PAGE_GNTTAB PTE flag that is used for >> debugging? Indeed, if we did this, could be obviate the need for the >> PTE-zapping hook, by instead catching the case where this flag is set, >> and unmapping the grant implicitly? > > Well, in the general case you don't have enough info to know which grant to > release (a single page can be granted multiple times). You'll also get the mm and the addr, which should make it sufficiently unique, so this looks like a doable approach to me. ptep_get_and_clear_full() in include/asm-x86/pgtable_32.h needs to be changed to take care of this, but that should be possible to do, and the change is local to x86 paravirt_ops, which looks much better to me than touching generic mm code. cheers, Gerd ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 11:48 ` Derek Murray 2007-12-05 14:12 ` Gerd Hoffmann @ 2007-12-05 18:12 ` Jeremy Fitzhardinge 2007-12-05 18:29 ` Derek Murray 1 sibling, 1 reply; 57+ messages in thread From: Jeremy Fitzhardinge @ 2007-12-05 18:12 UTC (permalink / raw) To: Derek Murray Cc: xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann Derek Murray wrote: > Ultimately, fork calls dup_mm, which calls, dup_mmap, which calls > copy_{page,pud,pmd,pte}_range, which calls copy_one_pte, which calls > set_pte_at, which hypercalls HYPERVISOR_update_va_mapping. > > The hypercall will not succeed and will return an error code > indicating the reason for this. Therefore the PTE will not be set. > There appears to be no way to propagate this error through the Linux > VM code, because there is no concept of a PTE update failing. I could > add return codes to all those functions, but I don't fancy their > chances upstream.... Could we use one of the software-defined bits in the PTE to indicate that this is a foreign/granted PTE, and have set_pte_at behave differently if you pass it a pte with this bit set? J ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 18:12 ` Jeremy Fitzhardinge @ 2007-12-05 18:29 ` Derek Murray 2007-12-05 20:15 ` Jeremy Fitzhardinge 0 siblings, 1 reply; 57+ messages in thread From: Derek Murray @ 2007-12-05 18:29 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann Jeremy Fitzhardinge wrote: > Could we use one of the software-defined bits in the PTE to indicate > that this is a foreign/granted PTE, and have set_pte_at behave > differently if you pass it a pte with this bit set? Actually, as Gerd pointed out in his answer to his own question, the use of VM_DONTCOPY cuts out this entire code path, so we don't need to worry about it. Mind you, it looks like we're going to go ahead and use one of the PTE bits to signify foreign PTEs anyway, per Keir's suggestion. Either way, it's going to involve making Xen-specific changes to the mm code... have you any ideas how we can either (i) get rid of the zap_pte hook in the vm_operations_struct, or (ii) make a really compelling case to the kernel maintainers that it really should get in? Regards, Derek. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 18:29 ` Derek Murray @ 2007-12-05 20:15 ` Jeremy Fitzhardinge 2007-12-05 20:35 ` Geoffrey Lefebvre 2007-12-05 20:44 ` Keir Fraser 0 siblings, 2 replies; 57+ messages in thread From: Jeremy Fitzhardinge @ 2007-12-05 20:15 UTC (permalink / raw) To: Derek Murray Cc: xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann Derek Murray wrote: > Jeremy Fitzhardinge wrote: >> Could we use one of the software-defined bits in the PTE to indicate >> that this is a foreign/granted PTE, and have set_pte_at behave >> differently if you pass it a pte with this bit set? > > Actually, as Gerd pointed out in his answer to his own question, the > use of VM_DONTCOPY cuts out this entire code path, so we don't need to > worry about it. > > Mind you, it looks like we're going to go ahead and use one of the PTE > bits to signify foreign PTEs anyway, per Keir's suggestion. Either > way, it's going to involve making Xen-specific changes to the mm code... Sneaking in a user for the otherwise completely unused PTE bits should be fairly straightforward. > have you any ideas how we can either (i) get rid of the zap_pte hook > in the vm_operations_struct, or (ii) make a really compelling case to > the kernel maintainers that it really should get in? Hm, I haven't spent much time looking at how grant tables and their mappings work yet, so I can't say I really understand all this myself. Hence, questions: Can we take a different approach from the zap_pte hook? Given that we're 1) planning on claiming a pte bit for grant mappings, and 2) need to hook ptep_get_and_clear anyway to solve the mprotect performance problems, couldn't we just special-case grant mapping pte_clears? In 2.6.18-xen the only two implementations of zap_pte are blktap_clear_pte and gntdev_clear_pte. 
Given a ptep with the grant-mapping bit set, could we determine which of these need calling and do the appropriate thing? Do we even need separate implementations of the core pte-clearing functionality? Could we just say something like: if (pte & _PAGE_XEN_FOREIGN) HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, ...); else xen_set_pte_at(...); blktap_clear_pte and gntdev_clear_pte do other housekeeping, but do they have to be done at the same instant as the grant mapping clear? Could they be done via some other hook? (I see Gerd just proposed this, pretty much.) J ^ permalink raw reply [flat|nested] 57+ messages in thread
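A minimal sketch of the dispatch Jeremy proposes, as a user-space model. The bit value and all sketch_ names are assumptions; the two stub functions stand in for the GNTTABOP_unmap_grant_ref hypercall and the ordinary PTE clear, and only count how often each path is taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed software PTE bit marking a grant/foreign mapping. */
#define SKETCH_PAGE_XEN_FOREIGN (1ULL << 9)

typedef uint64_t sketch_pte_t;

/* Counters replace real side effects so the two paths can be observed. */
static unsigned sketch_unmap_calls;
static unsigned sketch_normal_clears;

/* Stand-in for HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, ...). */
static void sketch_grant_unmap(sketch_pte_t *ptep)
{
    *ptep = 0;
    sketch_unmap_calls++;
}

/* Stand-in for the ordinary PTE clear path. */
static void sketch_normal_clear(sketch_pte_t *ptep)
{
    *ptep = 0;
    sketch_normal_clears++;
}

/* Read the old PTE, then clear it through the path its flag selects. */
sketch_pte_t sketch_ptep_get_and_clear(sketch_pte_t *ptep)
{
    sketch_pte_t old;

    assert(ptep != NULL);
    old = *ptep;
    if (old & SKETCH_PAGE_XEN_FOREIGN)
        sketch_grant_unmap(ptep);
    else
        sketch_normal_clear(ptep);
    return old;
}
```

The model leaves out how the grant handle is found and any housekeeping done by blktap_clear_pte and gntdev_clear_pte.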
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 20:15 ` Jeremy Fitzhardinge @ 2007-12-05 20:35 ` Geoffrey Lefebvre 2007-12-06 10:15 ` Gerd Hoffmann 2007-12-05 20:44 ` Keir Fraser 1 sibling, 1 reply; 57+ messages in thread From: Geoffrey Lefebvre @ 2007-12-05 20:35 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Derek Murray, xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann > Can we take a different approach from the zap_pte hook? Given that > we're 1) planning on claiming a pte bit for grant mappings, and 2) need > to hook ptep_get_and_clear anyway to solve the mprotect performance > problems, couldn't we just special-case grant mapping pte_clears? > > In 2.6.18-xen the only two implementations of zap_pte are > blktap_clear_pte and gntdev_clear_pte. Given a ptep with the > grant-mapping bit set, could we determine which of these need calling > and do the appropriate thing? Do we even need separate implementations > of the core pte-clearing functionality? Could we just say something like: > > if (pte & _PAGE_XEN_FOREIGN) > HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, ...); > else > xen_set_pte_at(...); > Hi, In order to unmap a grant, you need the grant handle obtained when the grant is mapped. That handle needs to be stored somewhere for the lifetime of the mapping. Where would the handle be stored (as Gerd proposed) in order to be able to unmap from ptep_get_and_clear_full? I haven't looked at the paravirt ops in details so I could be missing something obvious here. cheers, geoffrey ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 20:35 ` Geoffrey Lefebvre @ 2007-12-06 10:15 ` Gerd Hoffmann 0 siblings, 0 replies; 57+ messages in thread From: Gerd Hoffmann @ 2007-12-06 10:15 UTC (permalink / raw) To: Geoffrey Lefebvre Cc: Derek Murray, Jeremy Fitzhardinge, xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization Geoffrey Lefebvre wrote: > In order to unmap a grant, you need the grant handle obtained when the > grant is mapped. That handle needs to be stored somewhere for the > lifetime of the mapping. Where would the handle be stored (as Gerd > proposed) in order to be able to unmap from ptep_get_and_clear_full? Sure. The kernel has to keep track of the grant mappings somewhere, so it can look up the grant handle from the available information. Hashing by machine address should work reasonably fast. It's probably useful to have an in-kernel API for that, which can then be used by both gntdev and the in-kernel backend drivers. This API can also abstract out arch-specific bits to make life easier for the ia64 guys ... cheers, Gerd ^ permalink raw reply [flat|nested] 57+ messages in thread
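A minimal sketch of the tracking table Gerd suggests: a chained hash from a machine address to the grant handle returned at map time. Everything named grant_track_* is hypothetical, as are the table size and the hash function. The sketch assumes the key is unique per mapping (for instance the machine address of the PTE rather than of the granted frame, since a frame can be granted more than once); the thread does not say which address is meant.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define GRANT_TRACK_BITS    6
#define GRANT_TRACK_BUCKETS (1u << GRANT_TRACK_BITS)

struct grant_track_entry {
    uint64_t maddr;                  /* key: machine address (assumed unique) */
    uint32_t handle;                 /* grant handle obtained when mapping */
    struct grant_track_entry *next;  /* bucket chain */
};

static struct grant_track_entry *grant_track_table[GRANT_TRACK_BUCKETS];

/* Multiplicative hash; the top GRANT_TRACK_BITS bits select the bucket. */
static unsigned grant_track_hash(uint64_t maddr)
{
    return (unsigned)((maddr * 0x9E3779B97F4A7C15ULL)
                      >> (64 - GRANT_TRACK_BITS));
}

/* Record a mapping. Returns 0 on success, -1 if allocation fails. */
int grant_track_add(uint64_t maddr, uint32_t handle)
{
    unsigned bucket = grant_track_hash(maddr);
    struct grant_track_entry *entry = malloc(sizeof(*entry));

    if (entry == NULL)
        return -1;
    entry->maddr = maddr;
    entry->handle = handle;
    entry->next = grant_track_table[bucket];
    grant_track_table[bucket] = entry;
    return 0;
}

/* Look up and forget a mapping. On success stores the handle in
 * *handle and returns 0; returns -1 if the address is not tracked. */
int grant_track_remove(uint64_t maddr, uint32_t *handle)
{
    struct grant_track_entry **link = &grant_track_table[grant_track_hash(maddr)];

    assert(handle != NULL);
    while (*link != NULL) {
        struct grant_track_entry *entry = *link;

        if (entry->maddr == maddr) {
            *handle = entry->handle;
            *link = entry->next;
            free(entry);
            return 0;
        }
        link = &entry->next;
    }
    return -1;
}
```

A kernel version would need locking and would presumably use the kernel's own list and allocation helpers; both are omitted here.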
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 20:15 ` Jeremy Fitzhardinge 2007-12-05 20:35 ` Geoffrey Lefebvre @ 2007-12-05 20:44 ` Keir Fraser 2007-12-06 10:00 ` Derek Murray 1 sibling, 1 reply; 57+ messages in thread From: Keir Fraser @ 2007-12-05 20:44 UTC (permalink / raw) To: Jeremy Fitzhardinge, Derek Murray Cc: xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann On 5/12/07 20:15, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote: > In 2.6.18-xen the only two implementations of zap_pte are > blktap_clear_pte and gntdev_clear_pte. Given a ptep with the > grant-mapping bit set, could we determine which of these need calling > and do the appropriate thing? Do we even need separate implementations > of the core pte-clearing functionality? Could we just say something like: > > if (pte & _PAGE_XEN_FOREIGN) > HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, ...); > else > xen_set_pte_at(...); You'd need to track pte->grant_handle mappings somewhere, but it could certainly be done this way, yes. -- Keir ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 20:44 ` Keir Fraser @ 2007-12-06 10:00 ` Derek Murray 2007-12-06 19:55 ` [Xen-devel] " Jeremy Fitzhardinge 0 siblings, 1 reply; 57+ messages in thread From: Derek Murray @ 2007-12-06 10:00 UTC (permalink / raw) To: Keir Fraser Cc: Jeremy Fitzhardinge, xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann Keir Fraser wrote: > You'd need to track pte->grant_handle mappings somewhere, but it could > certainly be done this way, yes. At the moment, blktap and gntdev provide struct pages to get_user_pages by smuggling them in the vm_private_data field of the relevant vm_area_struct. Could we use this field to get the handles to ptep_get_and_clear_full as well? Only downside that I can see is that we would need to find the vma for each PTE that needs to be cleared this way (since we don't get this passed to ptep_get_and_clear_full), but this is mitigated by (i) it only happening in the erroneous, unclean-shutdown case, and (ii) getting a hit in the mm->mmap_cache for consecutive runs of mapped grants. Regards, Derek. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Xen-devel] Re: Next steps with pv_ops for Xen 2007-12-06 10:00 ` Derek Murray @ 2007-12-06 19:55 ` Jeremy Fitzhardinge 0 siblings, 0 replies; 57+ messages in thread From: Jeremy Fitzhardinge @ 2007-12-06 19:55 UTC (permalink / raw) To: Derek Murray Cc: xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Keir Fraser Derek Murray wrote: > Keir Fraser wrote: >> You'd need to track pte->grant_handle mappings somewhere, but it could >> certainly be done this way, yes. > > At the moment, blktap and gntdev provide struct pages to > get_user_pages by smuggling them in the vm_private_data field of the > relevant vm_area_struct. Could we use this field to get the handles to > ptep_get_and_clear_full as well? Yes. Given the mm and a vaddr passed to ptep_get_and_clear, find_vma() will return the vm_area_struct. If we assert that anyone who sets the "I'm foreign" bit in a pte has a standard format for the vm_private_data field, then we can stash a callback pointer there and make the appropriate callback. > Only downside that I can see is that we would need to find the vma for > each PTE that needs to be cleared this way (since we don't get this > passed to ptep_get_and_clear_full), but this is mitigated by (i) it > only happening in the erroneous, unclean-shutdown case, and (ii) > getting a hit in the mm->mmap_cache for consecutive runs of mapped > grants. Yes. find_vma is fairly hot, since it's used on every fault, so it should be reasonably fast. And it doesn't sound like our case is particularly performance critical. J ^ permalink raw reply [flat|nested] 57+ messages in thread
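A minimal sketch of the "standard format for vm_private_data" idea, as a user-space model. All sketch_ types and functions are hypothetical stand-ins for mm_struct, vm_area_struct and find_vma(). The only convention assumed is that a driver which sets the foreign bit puts a common header, holding the callback pointer, at the start of its private data.

```c
#include <stddef.h>

struct sketch_vma;

/* Common header every foreign-mapping driver would place first. */
struct sketch_foreign_hdr {
    void (*clear_pte)(struct sketch_vma *vma, unsigned long addr);
};

struct sketch_vma {
    unsigned long vm_start;       /* inclusive */
    unsigned long vm_end;         /* exclusive */
    void *vm_private_data;        /* starts with sketch_foreign_hdr */
    struct sketch_vma *vm_next;   /* list sorted by address */
};

struct sketch_mm {
    struct sketch_vma *mmap;
};

/* Like find_vma(): the first VMA whose vm_end lies above addr.
 * The result may start above addr, so callers check vm_start. */
struct sketch_vma *sketch_find_vma(struct sketch_mm *mm, unsigned long addr)
{
    struct sketch_vma *vma;

    for (vma = mm->mmap; vma != NULL; vma = vma->vm_next)
        if (addr < vma->vm_end)
            return vma;
    return NULL;
}

/* Called when a PTE with the foreign bit is cleared: find the owning
 * VMA and hand the address to the driver's callback. Returns 0 if a
 * callback ran, -1 otherwise. */
int sketch_clear_foreign(struct sketch_mm *mm, unsigned long addr)
{
    struct sketch_vma *vma = sketch_find_vma(mm, addr);
    struct sketch_foreign_hdr *hdr;

    if (vma == NULL || addr < vma->vm_start || vma->vm_private_data == NULL)
        return -1;
    hdr = vma->vm_private_data;
    if (hdr->clear_pte == NULL)
        return -1;
    hdr->clear_pte(vma, addr);
    return 0;
}

/* Example driver data: header first, driver-specific fields after it. */
struct sketch_driver_private {
    struct sketch_foreign_hdr hdr;
    unsigned long last_addr;
    unsigned calls;
};

/* Example callback: records the call where a driver would unmap the grant. */
void sketch_driver_clear(struct sketch_vma *vma, unsigned long addr)
{
    struct sketch_driver_private *priv = vma->vm_private_data;

    priv->last_addr = addr;
    priv->calls++;
}
```

Putting the header first is what lets the generic code cast vm_private_data without knowing whether blktap, gntdev or another driver owns the VMA.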
* Re: Re: Next steps with pv_ops for Xen 2007-12-04 19:58 ` Gerd Hoffmann 2007-12-05 11:48 ` [Xen-devel] " Derek Murray 2007-12-05 11:48 ` Derek Murray @ 2007-12-05 13:19 ` Derek Murray 2 siblings, 0 replies; 57+ messages in thread From: Derek Murray @ 2007-12-05 13:19 UTC (permalink / raw) To: Gerd Hoffmann Cc: xen-devel, Eduardo Habkost, Juan Quintela, Stephen C. Tweedie, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization [-- Attachment #1: Type: text/plain, Size: 488 bytes --] Gerd, Can you try the attached patch against linux-2.6.18-xen.hg? I think the problem was that the gntdev VMA is not marked as being VM_PFNMAP, therefore it tries to get a struct page_struct for each granted page when it is unmapped (and maybe sometimes succeeds (incorrectly), which could be why I haven't seen the bug). With this flag, vm_normal_page will return NULL in zap_pte_range, and so the code that decrements that reference count will not be executed. Regards, Derek. [-- Attachment #2: gntdev_vm_pfnmap.patch --] [-- Type: text/x-patch, Size: 1073 bytes --] # HG changeset patch # User dgm36@ise.cl.cam.ac.uk # Date 1196860382 0 # Node ID af26b3dd23822190acbec1872a47259e1fed88b8 # Parent b2768401db943e66af9d64bd610ffa225f560c0b Set gntdev VMA to be VM_PFNMAP. diff -r b2768401db94 -r af26b3dd2382 drivers/xen/gntdev/gntdev.c --- a/drivers/xen/gntdev/gntdev.c Mon Dec 03 08:50:12 2007 +0000 +++ b/drivers/xen/gntdev/gntdev.c Wed Dec 05 13:13:02 2007 +0000 @@ -501,6 +501,17 @@ static int gntdev_mmap (struct file *fli /* The VM area contains pages from another VM. */ vma->vm_flags |= VM_FOREIGN; + + /* The VM area contains pages that are not backed by page_structs in + * this domain's memory map. + * + * TODO/FIXME?: We should probably use the VM_FOREIGN workaround as + * used by get_user_pages() to provide access to the + * page_structs for each page, but I'm not sure if that's + * necessary. 
+ */ + vma->vm_flags |= VM_PFNMAP; + vma->vm_private_data = kzalloc(size * sizeof(struct page_struct *), GFP_KERNEL); if (vma->vm_private_data == NULL) { ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-04 12:39 ` Stephen C. Tweedie 2007-12-04 19:58 ` Gerd Hoffmann @ 2007-12-04 21:08 ` Ian Main 2007-12-05 10:03 ` Gerd Hoffmann 2007-12-05 10:11 ` Derek Murray 3 siblings, 0 replies; 57+ messages in thread From: Ian Main @ 2007-12-04 21:08 UTC (permalink / raw) To: xen-devel On Tue, 04 Dec 2007 12:39:59 +0000 "Stephen C. Tweedie" <sct@redhat.com> wrote: > So... the interface (a) cannot be used on the Linux VM without at least > one invasive VM modification, due to the requirement of ptes being > explicitly unmapped via hypercall; and (b) isn't used significantly in > real life yet. > > I can't help wondering if this is a hint that now is the time to find a > better API, which doesn't have the requirement (a) that seems to be > causing such trouble? Are other PV guests --- *BSD, Solaris --- going > to have the same problems with their VM layers if they try to implement > this API? Upstream Linux pv_ops certainly will, and it would be good if > we could avoid tying unprivileged guests to ABIs which cannot hope to be > merged into pv_ops. I posted up and said we were using the current interface, but if there are fundamental issues with the API then I'd be in favor of changing it, even if there is some work involved on our side. Ian ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-04 12:39 ` Stephen C. Tweedie 2007-12-04 19:58 ` Gerd Hoffmann 2007-12-04 21:08 ` Ian Main @ 2007-12-05 10:03 ` Gerd Hoffmann 2007-12-05 12:51 ` Gerd Hoffmann 2007-12-05 10:11 ` Derek Murray 3 siblings, 1 reply; 57+ messages in thread From: Gerd Hoffmann @ 2007-12-05 10:03 UTC (permalink / raw) To: Stephen C. Tweedie Cc: Derek Murray, xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization Stephen C. Tweedie wrote: > I can't help wondering if this is a hint that now is the time to find a > better API, which doesn't have the requirement (a) that seems to be > causing such trouble? Are other PV guests --- *BSD, Solaris --- going > to have the same problems with their VM layers if they try to implement > this API? Well, it isn't that easy, unfortunately. We have to separate two things here: (a) the grant table hypercall API (linux kernel <-> xen). (b) the grant table device (userspace interface). The hypercall API *is* heavily used; block and network drivers are using it, for example. It works quite well as long as the drivers live in kernel space, so that the grants are also mapped in kernel space only. It isn't very hard to control map and unmap then. The problems start when gntdev comes into play, which wants to allow userspace applications to map grant references. At this point the whole VM subsystem becomes involved, and the requirement of the hypercall API to do any pte manipulation using grant table hypercalls becomes a real burden. The linux VM design simply doesn't allow that. Consequently the current gntdev implementation tries to get the job done by bypassing the VM (and hooking into it). It establishes mappings by doing the page table manipulations itself in the fops->mmap function. It tears down mappings using the hook discussed earlier. gntdev doesn't even try to handle forking. I wouldn't be surprised if that is a great way to kill Domain-0. 
The xen hypervisor will most likely not be amused to find a pte referring to a granted (but foreign) page which wasn't established using the grant table interface. Pinning the pgd of the child process will most likely fail and make the kernel BUG(). cheers, Gerd ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen 2007-12-05 10:03 ` Gerd Hoffmann @ 2007-12-05 12:51 ` Gerd Hoffmann 0 siblings, 0 replies; 57+ messages in thread From: Gerd Hoffmann @ 2007-12-05 12:51 UTC (permalink / raw) To: Stephen C. Tweedie Cc: Derek Murray, xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization Hi, > gntdev doesn't even try to handle forking. I wouldn't be surprised if > that is a great way to kill Domain-0. The xen hypervisor will most > likely not be amused to find a pte refering to a granted (but foreign) > page which wasn't established using the grant table interface. Pinning > the pgd of the child process will most likely fail and make the kernel > BUG(). Ok, isn't that bad thanks to the VM_DONTCOPY. The child just doesn't get the grant mapping. cheers, Gerd ^ permalink raw reply [flat|nested] 57+ messages in thread
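A minimal sketch of why VM_DONTCOPY keeps the grant mapping out of the child, as a user-space model of the VMA walk done at fork time. The fork_ names and the flag value are illustrative assumptions, not the kernel's definitions; the model only counts which areas a child would inherit.

```c
#include <stddef.h>

/* Illustrative flag value; the kernel's VM_DONTCOPY differs. */
#define FORK_VM_DONTCOPY 0x1UL

struct fork_vma {
    unsigned long vm_flags;
    struct fork_vma *vm_next;
};

/* Walk the parent's VMA list the way a fork would, skipping every
 * area marked "don't copy". Returns how many areas the child gets. */
size_t fork_count_inherited(const struct fork_vma *parent)
{
    size_t inherited = 0;

    for (; parent != NULL; parent = parent->vm_next) {
        if (parent->vm_flags & FORK_VM_DONTCOPY)
            continue;   /* never duplicated, so no PTE is copied for it */
        inherited++;
    }
    return inherited;
}
```

Because the skipped area is never duplicated, the PTE copy path and its update hypercall are never reached for the grant mapping.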
* Re: Re: Next steps with pv_ops for Xen 2007-12-04 12:39 ` Stephen C. Tweedie ` (2 preceding siblings ...) 2007-12-05 10:03 ` Gerd Hoffmann @ 2007-12-05 10:11 ` Derek Murray 3 siblings, 0 replies; 57+ messages in thread From: Derek Murray @ 2007-12-05 10:11 UTC (permalink / raw) To: Stephen C. Tweedie Cc: xen-devel, Eduardo Habkost, Juan Quintela, Jan Beulich, Glauber de Oliveira Costa, Chris Wright, virtualization, Gerd Hoffmann Stephen C. Tweedie wrote: > So... the interface (a) cannot be used on the Linux VM without at least > one invasive VM modification, due to the requirement of ptes being > explicitly unmapped via hypercall; Also there is the use of VM_FOREIGN (http://xenbits.xensource.com/linux-2.6.18-xen.hg?file/b2768401db94/mm/memory.c lines 1040--1059), which has been used quite happily in blktap since 2005 (http://lists.xensource.com/archives/html/xen-changelog/2005-07/msg00053.html). While it may not be a priority to get gntdev into pv-ops Linux, I should imagine that blktap would be fairly critical. > I can't help wondering if this is a hint that now is the time to find a > better API, which doesn't have the requirement (a) that seems to be > causing such trouble? Are other PV guests --- *BSD, Solaris --- going > to have the same problems with their VM layers if they try to implement > this API? Upstream Linux pv_ops certainly will, and it would be good if > we could avoid tying unprivileged guests to ABIs which cannot hope to be > merged into pv_ops. I'm open to suggestions... but I think it always reduces to needing a hook that is called on process exit before the PTEs are zapped. > (Just what is the cost of not having this functionality in blktap, > anyway?) If tapdisk dies whilst holding a granted page, the page can never be ungranted, so we leak that page. Regards, Derek. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen
  2007-12-04 9:40 ` Derek Murray
  2007-12-04 12:01 ` Gerd Hoffmann
@ 2007-12-04 20:59 ` Ian Main
  2007-12-05 11:54 ` Derek Murray
  1 sibling, 1 reply; 57+ messages in thread
From: Ian Main @ 2007-12-04 20:59 UTC (permalink / raw)
To: xen-devel

On Tue, 04 Dec 2007 09:40:49 +0000 Derek Murray <Derek.Murray@cl.cam.ac.uk> wrote:

> Gerd Hoffmann wrote:
> >> On this point I completely agree with you! If anyone has any less
> >> radical suggestions, then I'd be delighted to refactor the gntdev code
> >> to use them. However, I'm not currently aware of any alternative that
> >> maintains robustness to process crashes.
> >
> > Oh, for me it isn't robust at all, it crashes on the first munmap
> > syscall. It is the Fedora 8 kernel. See attachment. Didn't try
> > xensource 2.6.18 yet.
>
> My gut feeling is that something changed in mm between 2.6.18 and
> 2.6.21, but that seems like a cop out so...
>
> > Ideas what is wrong?
>
> Since the bug appears to be in page_remove_rmap, that would tend to
> imply that there is never a corresponding page_add_*_rmap
> (page_add_file_rmap?). My knowledge of the Linux mm code is a bit shaky
> here: should gntdev be doing this? Should we be using install_page (or a
> modified version thereof) to set the PTE?
>
> Also, does a simple program that opens gntdev, maps a grant,
> accesses/writes to the page, and unmaps it (all using the xc_gnttab_*
> functions) work?

I am part of a team working on a project with Intel that is using it a fair bit in a number of places. We actually have no such simple test right now that I'm aware of, but we are certainly using it in larger applications and it does work. The only problem we're seeing is that killed processes using it cause a BUG to fire. I haven't explored it more than that yet, and I can't say for sure that gntdev is causing that either as it's a complex program (although I'm not aware of anything else in there that might cause it).

> > Who uses the gntdev device right now?
>
> Good question! I'm aware of it being used in a few research projects,
> and it seems to work for them (though I think it is mostly used with the
> linux-2.6.18-xen kernel). Anyone else?

We are using it with 2.6.18 xen kernel.

Ian

^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: Re: Next steps with pv_ops for Xen
  2007-12-04 20:59 ` Ian Main
@ 2007-12-05 11:54 ` Derek Murray
  0 siblings, 0 replies; 57+ messages in thread
From: Derek Murray @ 2007-12-05 11:54 UTC (permalink / raw)
To: Ian Main; +Cc: xen-devel

Hi Ian,

Ian Main wrote:
> We actually have no such simple test right now that I'm aware of,
> but we are certainly using it in larger applications and it does work.
> The only problem we're seeing is that killed processes using it cause a
> BUG to fire. I haven't explored it more than that yet, and I can't say
> for sure that gntdev is causing that either as it's a complex program
> (although I'm not aware of anything else in there that might cause it).

Does killing your process using gntdev *always* cause a BUG()? If you send me the kernel log and Xen console log, then I can have a look at what the problem might be.

Regards,

Derek.

^ permalink raw reply [flat|nested] 57+ messages in thread