From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51926) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b69dX-0006sj-2q for qemu-devel@nongnu.org; Fri, 27 May 2016 00:44:40 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1b69dU-0006ca-44 for qemu-devel@nongnu.org; Fri, 27 May 2016 00:44:38 -0400 Date: Fri, 27 May 2016 14:44:28 +1000 From: David Gibson Message-ID: <20160527044428.GY17226@voom.fritz.box> References: <1462344751-28281-1-git-send-email-aik@ozlabs.ru> <1462344751-28281-20-git-send-email-aik@ozlabs.ru> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ztQAJXOExAP3xDFw" Content-Disposition: inline In-Reply-To: Subject: Re: [Qemu-devel] [PATCH qemu v16 19/19] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Bharata B Rao Cc: Alexey Kardashevskiy , "qemu-devel@nongnu.org" , Alexander Graf , Alex Williamson , "qemu-ppc@nongnu.org" , Paolo Bonzini --ztQAJXOExAP3xDFw Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, May 17, 2016 at 11:02:48AM +0530, Bharata B Rao wrote: > On Mon, May 16, 2016 at 11:55 AM, Alexey Kardashevskiy wr= ote: > > On 05/13/2016 06:41 PM, Bharata B Rao wrote: > >> > >> On Wed, May 4, 2016 at 12:22 PM, Alexey Kardashevskiy > >> wrote: > > > > > >> > >>> + > >>> + avail =3D SPAPR_PCI_DMA_MAX_WINDOWS - > >>> spapr_phb_get_active_win_num(sphb); > >>> + > >>> + rtas_st(rets, 0, RTAS_OUT_SUCCESS); > >>> + rtas_st(rets, 1, avail); > >>> + rtas_st(rets, 2, max_window_size); > >>> + rtas_st(rets, 3, pgmask); > >>> + rtas_st(rets, 4, 0); /* DMA migration mask, not supported */ > >>> + > >>> + trace_spapr_iommu_ddw_query(buid, addr, avail, max_window_size, > >>> pgmask); > >>> + return; > >>> + > >>> +param_error_exit: > >>> + rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR); > >>> +} > >>> + > >>> +static void rtas_ibm_create_pe_dma_window(PowerPCCPU *cpu, > >>> + sPAPRMachineState *spapr, > >>> + uint32_t token, uint32_t > >>> nargs, > >>> + target_ulong args, > >>> + uint32_t nret, target_ulong > >>> rets) > >>> +{ > >>> + sPAPRPHBState *sphb; > >>> + sPAPRTCETable *tcet =3D NULL; > >>> + uint32_t addr, page_shift, window_shift, liobn; > >>> + uint64_t buid; > >>> + > >>> + if ((nargs !=3D 5) || (nret !=3D 4)) { > >>> + goto param_error_exit; > >>> + } > >>> + > >>> + buid =3D ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2); > >>> + addr =3D rtas_ld(args, 0); > >>> + sphb =3D spapr_pci_find_phb(spapr, buid); > >>> + if (!sphb || !sphb->ddw_enabled) { > >>> + goto param_error_exit; > >>> + } > >>> + > >>> + page_shift =3D rtas_ld(args, 3); > >>> + window_shift =3D rtas_ld(args, 4); > >> > >> > >> Kernel has a bug due to which wrong window_shift gets returned here. I > >> have posted possible fix here: > >> https://patchwork.ozlabs.org/patch/621497/ > >> > >> I have tried to work around this issue in QEMU too > >> https://lists.nongnu.org/archive/html/qemu-ppc/2016-04/msg00226.html > >> > >> But the above work around involves changing the memory representation > >> in DT. > > > > > > What is wrong with this workaround? >=20 > The above workaround will result in different representations for > memory in DT before and after the workaround. >=20 > Currently for -m 2G, -numa node,nodeid=3D0,mem=3D1G -numa > node,nodeid=3D1,mem=3D0.5G, we will have the following nodes in DT: >=20 > memory@0 > memory@40000000 > ibm,dynamic-reconfiguration-memory >=20 > ibm,dynamic-memory will have only DR LMBs: >=20 > [root@localhost ibm,dynamic-reconfiguration-memory]# hexdump ibm,dynamic-= memory > 0000000 0000 000a 0000 0000 8000 0000 8000 0008 > 0000010 0000 0000 ffff ffff 0000 0000 0000 0000 > 0000020 9000 0000 8000 0009 0000 0000 ffff ffff > 0000030 0000 0000 0000 0000 a000 0000 8000 000a > 0000040 0000 0000 ffff ffff 0000 0000 0000 0000 > 0000050 b000 0000 8000 000b 0000 0000 ffff ffff > 0000060 0000 0000 0000 0000 c000 0000 8000 000c > 0000070 0000 0000 ffff ffff 0000 0000 0000 0000 > 0000080 d000 0000 8000 000d 0000 0000 ffff ffff > 0000090 0000 0000 0000 0000 e000 0000 8000 000e > 00000a0 0000 0000 ffff ffff 0000 0000 0000 0000 > 00000b0 f000 0000 8000 000f 0000 0000 ffff ffff > 00000c0 0000 0000 0000 0001 0000 0000 8000 0010 > 00000d0 0000 0000 ffff ffff 0000 0000 0000 0001 > 00000e0 1000 0000 8000 0011 0000 0000 ffff ffff > 00000f0 0000 0000 >=20 > The memory region looks like this: >=20 > memory-region: system > 0000000000000000-ffffffffffffffff (prio 0, RW): system > 0000000000000000-000000005fffffff (prio 0, RW): ppc_spapr.ram > 0000000080000000-000000011fffffff (prio 0, RW): hotplug-memory >=20 > After this workaround, all this will change like below: >=20 > memory@0 > ibm,dynamic-reconfiguration-memory >=20 > All LMBs in ibm,dynamic-memory: >=20 > [root@localhost ibm,dynamic-reconfiguration-memory]# hexdump ibm,dynamic-= memory >=20 > 0000000 0000 0010 0000 0000 0000 0000 8000 0000 > 0000010 0000 0000 0000 0000 0000 0080 0000 0000 > 0000020 1000 0000 8000 0001 0000 0000 0000 0000 > 0000030 0000 0080 0000 0000 2000 0000 8000 0002 > 0000040 0000 0000 0000 0000 0000 0080 0000 0000 > 0000050 3000 0000 8000 0003 0000 0000 0000 0000 > 0000060 0000 0080 0000 0000 4000 0000 8000 0004 > 0000070 0000 0000 0000 0001 0000 0008 0000 0000 > 0000080 5000 0000 8000 0005 0000 0000 0000 0001 > 0000090 0000 0008 0000 0000 6000 0000 8000 0006 > 00000a0 0000 0000 ffff ffff 0000 0000 0000 0000 > 00000b0 7000 0000 8000 0007 0000 0000 ffff ffff > 00000c0 0000 0000 0000 0000 8000 0000 8000 0008 > 00000d0 0000 0000 ffff ffff 0000 0000 0000 0000 > 00000e0 9000 0000 8000 0009 0000 0000 ffff ffff > 00000f0 0000 0000 0000 0000 a000 0000 8000 000a > 0000100 0000 0000 ffff ffff 0000 0000 0000 0000 > 0000110 b000 0000 8000 000b 0000 0000 ffff ffff > 0000120 0000 0000 0000 0000 c000 0000 8000 000c > 0000130 0000 0000 ffff ffff 0000 0000 0000 0000 > 0000140 d000 0000 8000 000d 0000 0000 ffff ffff > 0000150 0000 0000 0000 0000 e000 0000 8000 000e > 0000160 0000 0000 ffff ffff 0000 0000 0000 0000 > 0000170 f000 0000 8000 000f 0000 0000 ffff ffff > 0000180 0000 0000 >=20 > Hotplug memory region gets a new address range now: >=20 > memory-region: system > 0000000000000000-ffffffffffffffff (prio 0, RW): system > 0000000000000000-000000005fffffff (prio 0, RW): ppc_spapr.ram > 0000000060000000-00000000ffffffff (prio 0, RW): hotplug-memory >=20 >=20 > So when a guest that was booted with older QEMU is migrated to a newer > QEMU that has this workaround, then it will start exhibiting the above > changes after first reboot post migration. Ok.. why is that bad? > If user has done memory hotplug by explicitly specifying address in > the source, then even migration would fail because the addr specified > at the target will not be part of hotplug-memory range. Sorry, not really following the situation you're describing here. > Hence I believe we shoudn't workaround in this manner but have the > workaround in the DDW code where the window can be easily fixed. >=20 > > > >> Hence I feel until the guest kernel changes are available, a > >> simpler work around would be to discard the window_shift value above > >> and recalculate the right value as below: > >> > >> if (machine->ram_size =3D=3D machine->maxram_size) { > >> max_window_size =3D machine->ram_size; > >> } else { > >> MemoryHotplugState *hpms =3D &spapr->hotplug_memory; > >> max_window_size =3D hpms->base + memory_region_size(&hpms->mr); > >> } > >> window_shift =3D max_window_size >> SPAPR_TCE_PAGE_SHIFT; > >> > >> and create DDW based on this calculated window_shift value. Does that > >> sound reasonable ? > > > > > > The workaround should only do that for the second window, at least, or = for > > the default one but with page size at least 64K; otherwise it is going = to be > > a waste of memory (2MB per each 1GB of guest RAM). >=20 > Ok, will sync up with you separately to understand more about the > 'two' windows here. >=20 > Regards, > Bharata. >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --ztQAJXOExAP3xDFw Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJXR9CrAAoJEGw4ysog2bOS6tAP/jNWyBC9LZFNF+MS1U616+Pa Uez2Gi2rvoA/M4R4cOmb5XS1883reSqY0NmkrKpItupeBH/3EAF8P6T9eyXzV7SY 7PaC6RVQCdKSqYEJItp2qT92ndqcb/z//snOcdjITNixkBGJ1zcD1E1Wj2vtpO6V nSpJPbrr4tt4XUzjyCLf+WNqueHUJcV7uxunyndm9jmT4GYf6KqvLmKaqjVaA2Bn NilB/cUgvpeJf3LdLevU9ngqMNnzuHiDw4JUjIZ4vrwGVAbp8RgdyB/YLN7obgnE kCk2hPUB1Vugu4E3d+Nmk5GeMWrxj3vI/8x+7CzUiK6+eafwSZjcZAsqH+YIC38q xNCfsdUc3n37O/56OLvHxAuBqrJD4OOgtAL1simERmklPpxQ/01KnOwF3H/IRw1k CnBKfJ9RhsuRVeTA7FVCD0GAPOFgJjoYcWk3fEQKw0ss+97xyxEnQef4eSk4p+at KnrhYyspQ/vqJuGxd2qxUCxTI0y1VghIkQf/NS0h1hA17Z+DDkPkT40ngiGjhxBR xohrF2bRb2EXBWx1om8p34QNLgC8eV1UAd1ZbCGN5qevxdUUOfjMzc1f4y823j67 IR7IIAoFi4I38mVNUC2489WaoQr+bEyKQM+IcliQuJbQIDkeXXawImYHzhKsxhOv Mi436+R/UX560A5QnWIM =PSV2 -----END PGP SIGNATURE----- --ztQAJXOExAP3xDFw--