Re: [PATCH] riscv: Support non-coherency memory model

From: Gary Guo <gary@garyguo.net>
To: Guo Ren <guoren@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	Palmer Dabbelt <palmer@sifive.com>,
	Andrew Waterman <andrew@sifive.com>,
	Arnd Bergmann <arnd@arndb.de>, Anup Patel <anup.patel@wdc.com>,
	Xiang Xiaoyan <xiaoyan_xiang@c-sky.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Vincent Chen <vincentc@andestech.com>,
	Greentime Hu <green.hu@gmail.com>,
	"ren_guo@c-sky.com" <ren_guo@c-sky.com>,
	"linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Scott Wood <swood@redhat.com>,
	"tech-privileged@lists.riscv.org"
	<tech-privileged@lists.riscv.org>
Subject: Re: [PATCH] riscv: Support non-coherency memory model
Date: Wed, 24 Apr 2019 12:45:56 +0000
Message-ID: <4e6b0816-3fe9-8c0b-a749-f7f6ef7e5742@garyguo.net>
In-Reply-To: <20190424055703.GA3417@guoren-Inspiron-7460>

On 24/04/2019 06:57, Guo Ren wrote:
> Hi Gary,
> 
> On Wed, Apr 24, 2019 at 03:21:14AM +0000, Gary Guo wrote:
>>> Look:
>>> linux-next git:(riscv_asid_allocator_v2)$ grep GLOBAL arch/riscv -r
>>> arch/riscv/include/asm/pgtable-bits.h:#define _PAGE_GLOBAL    (1 << 5)    /* Global */
>>> arch/riscv/include/asm/pgtable-bits.h:                                    _PAGE_USER | _PAGE_GLOBAL))
>>>
>>> Your patch tells us _PAGE_USER and _PAGE_GLOBAL are duplicates, so why
>>> couldn't we make _PAGE_USER imply _PAGE_GLOBAL? Can you give an example
>>> of a real scenario in a PTE with:
>>>    _PAGE_USER:0 + _PAGE_GLOBAL:1
>>> or
>>>    _PAGE_USER:1 + _PAGE_GLOBAL:0
>>>
>>> Of course I know USER & GLOBAL are conceptually very different, but
>>> there are only 10 attribute bits for riscv (in fact we've wasted two bits
>>> to support huge RV32 pfns :P). So I think it is time to merge these two bits
>>> before hardware supports GLOBAL. Reserve them for the future!
>>
>> Two cases I can think of:
>> * vdso-like things. They're user pages that can really be shared across
>> address spaces (i.e. global). Kernels like L4 implement most system calls
>> in a way similar to the VDSO, so USER + GLOBAL is useful.
> The vdso is a user space mapping in Linux. See fs/binfmt_elf.c:
> 
> static int load_elf_binary(struct linux_binprm *bprm) {
> ...
> #ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
> 	retval = arch_setup_additional_pages(bprm, !!elf_interpreter);
> 	if (retval < 0)
> 		goto out;
> #endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */
> 
> All linux archs use arch_setup_additional_pages for vdso mapping and
> every process has its own vdso mapping to the same pages.

But we shouldn't prevent a kernel from mapping a USER page globally. As 
I said, the fact that Linux doesn't do it isn't a valid reason for 
omitting the possibility.

> 
> I don't think vdso is a real scenario for GLOBAL in a PTE.
> 
>> * hypervisor without H-extension: This requires shadow page tables. Supervisor
>> pages are mapped to supervisor shadow pages. However these shadow pages cannot
>> be GLOBAL because they can't be shared between VMs. So !USER + !GLOBAL is useful.
> Hypervisors use 2-stage TLB translation in hardware, and shadow page
> tables are for stage 2 translation. Shadow page tables care about the
> vmid, not the asid.

When the H-extension is present, stage 2 translation uses the VMID and
is performed by hardware. Without the H-extension there is no such
thing as a VMID: both the hypervisor and the guest supervisor run in
supervisor mode, and the hypervisor uses MSTATUS.TVM to trap the guest
supervisor's virtual memory operations. The shadow page table is
populated by doing the 2-stage page walk in software. In this setup the
hypervisor likely needs to use some bits of the ASID to emulate the
VMID feature, so GLOBAL pages cannot be used: GLOBAL means that the
page exists in all physical ASIDs, and each physical ASID now encodes
both the emulated VMID and the guest ASID. Making supervisor pages
GLOBAL would make the semantics incorrect!
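
To make the ASID argument concrete, here is a minimal sketch in C. It
is not taken from any existing hypervisor: the 16-bit ASID width, the
4/12 bit split, and the names hw_asid() and shadow_pte_flags() are all
assumptions chosen for illustration. Only the bit positions of V, U and
G come from the privileged spec.

```c
#include <assert.h>
#include <stdint.h>

/* PTE flag bits as defined by the RISC-V privileged spec. */
#define PTE_V (UINT64_C(1) << 0)
#define PTE_U (UINT64_C(1) << 4)
#define PTE_G (UINT64_C(1) << 5)

/*
 * Hypothetical split of a 16-bit hardware ASID: the top 4 bits carry
 * the emulated VMID, the low 12 bits carry the guest's own ASID.
 */
#define EMU_VMID_BITS   4
#define GUEST_ASID_BITS 12

/* Combine an emulated VMID and a guest ASID into one hardware ASID. */
static inline uint16_t hw_asid(unsigned int vmid, unsigned int guest_asid)
{
	assert(vmid < (1u << EMU_VMID_BITS));
	assert(guest_asid < (1u << GUEST_ASID_BITS));
	return (uint16_t)((vmid << GUEST_ASID_BITS) | guest_asid);
}

/*
 * Derive shadow PTE flags from guest PTE flags. G is always cleared:
 * a hardware-global entry would match every hardware ASID, including
 * those that belong to other VMs.
 */
static inline uint64_t shadow_pte_flags(uint64_t guest_flags)
{
	return guest_flags & ~PTE_G;
}
```

With this split, the same guest ASID in two different VMs maps to two
different hardware ASIDs, which is why a supervisor page in the shadow
table ends up as !USER + !GLOBAL.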

> If hardware doesn't support the H-extension (MMU 2-stage translation),
> the virtualization performance is hard to accept.

The RISC-V privileged spec is explicitly designed to allow the
techniques described above (this is the sole purpose of MSTATUS.TVM). It
might not be as high performance as hardware with the H-extension, but
it is definitely a legit use case. In fact, it is vital for use cases
like recursive virtualization.

Also, I believe the PTE format of RISC-V is already frozen, so it is
now impossible to merge the GLOBAL and USER bits, or to replace an RSW
bit with another bit.
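
For reference, a small C sketch of the bits in question. The bit
positions (U = bit 4, G = bit 5, RSW = bits 9:8) are from the
privileged spec; the macro and helper names are mine and are not the
kernel's. The comment lists the four U/G combinations using the cases
from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_U   (UINT64_C(1) << 4)  /* accessible from U-mode */
#define PTE_G   (UINT64_C(1) << 5)  /* valid in all address spaces */
#define PTE_RSW (UINT64_C(3) << 8)  /* two bits reserved for software */

/* U and G are separate bits, so neither can imply the other. */
static_assert((PTE_U & PTE_G) == 0, "U and G must not overlap");

/*
 * U=1 G=0: ordinary per-process user page
 * U=1 G=1: user page shared by every address space (vdso-like)
 * U=0 G=1: kernel mapping shared by every address space
 * U=0 G=0: supervisor page in a shadow page table (per VM)
 */
static inline bool pte_user(uint64_t pte)
{
	return (pte & PTE_U) != 0;
}

static inline bool pte_global(uint64_t pte)
{
	return (pte & PTE_G) != 0;
}
```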

> 
> I don't think hypervisor is a real scenario for GLOBAL in a PTE.
> 
> Are there other scenarios for GLOBAL in a PTE?
> 
> Best Regards
>   Guo Ren
>