From: Anton Ivanov <anton.ivanov@kot-begemot.co.uk>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-kernel@vger.kernel.org, linux-mips@linux-mips.org,
	Rich Felker <dalias@libc.org>,
	linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Will Deacon <will.deacon@arm.com>,
	mhocko@kernel.org, linux-mm@kvack.org, lokeshgidra@google.com,
	linux-riscv@lists.infradead.org, elfring@users.sourceforge.net,
	Jonas Bonn <jonas@southpole.se>,
	linux-s390@vger.kernel.org, dancol@google.com,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org,
	linux-hexagon@vger.kernel.org, Helge Deller <deller@gmx.de>,
	"maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT" <x86@kernel.org>,
	hughd@google.com, "James E.J. Bottomley" <jejb@parisc-linux.org>,
	kasan-dev@googlegroups.com, kvmarm@lists.cs.columbia.edu,
	Ingo Molnar <mingo@redhat.com>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	linux-snps-arc@lists.infradead.org, kernel-team@android.com,
	Sam Creasey <sammy@sammy.net>, Fenghua Yu <fenghua.yu@intel.com>,
	Jeff Dike <jdike@addtoit.com>,
	linux-um@lists.infradead.org,
	Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
	Julia Lawall <Julia.Lawall@lip6.fr>,
	linux-m68k@lists.linux-m68k.org, openrisc@lists.librecores.org,
	Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
	nios2-dev@lists.rocketboards.org,
	Stafford Horne <shorne@gmail.com>, Guan Xuetao <gxt@pku.edu.cn>,
	linux-arm-kernel@lists.infradead.org,
	Chris Zankel <chris@zankel.net>, Tony Luck <tony.luck@intel.com>,
	Richard Weinberger <richard@nod.at>,
	linux-parisc@vger.kernel.org, pantin@google.com,
	Max Filippov <jcmvbkbc@gmail.com>,
	minchan@kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	linux-alpha@vger.kernel.org, Ley Foon Tan <lftan@altera.com>,
	akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions
Date: Fri, 12 Oct 2018 17:58:40 +0100	[thread overview]
Message-ID: <4f969958-913e-cb9f-48fb-e3a88e1d288c@kot-begemot.co.uk> (raw)
In-Reply-To: <20181012165012.GD223066@joelaf.mtv.corp.google.com>


On 10/12/18 5:50 PM, Joel Fernandes wrote:
> On Fri, Oct 12, 2018 at 05:42:24PM +0100, Anton Ivanov wrote:
>> On 10/12/18 3:48 PM, Anton Ivanov wrote:
>>> On 12/10/2018 15:37, Kirill A. Shutemov wrote:
>>>> On Fri, Oct 12, 2018 at 03:09:49PM +0100, Anton Ivanov wrote:
>>>>> On 10/12/18 2:37 AM, Joel Fernandes (Google) wrote:
>>>>>> Android needs to mremap large regions of memory during
>>>>>> memory management
>>>>>> related operations. The mremap system call can be really
>>>>>> slow if THP is
>>>>>> not enabled. The bottleneck is move_page_tables, which is copying each
>>>>>> pte at a time, and can be really slow across a large map.
>>>>>> Turning on THP
>>>>>> may not be a viable option, and is not for us. This patch
>>>>>> speeds up the
>>>>>> performance for non-THP systems by copying at the PMD level
>>>>>> when possible.
>>>>>>
>>>>>> The speed up is three orders of magnitude. On a 1GB mremap, the mremap
>>>>>> completion time drops from 160-250 milliseconds to 380-400
>>>>>> microseconds.
>>>>>>
>>>>>> Before:
>>>>>> Total mremap time for 1GB data: 242321014 nanoseconds.
>>>>>> Total mremap time for 1GB data: 196842467 nanoseconds.
>>>>>> Total mremap time for 1GB data: 167051162 nanoseconds.
>>>>>>
>>>>>> After:
>>>>>> Total mremap time for 1GB data: 385781 nanoseconds.
>>>>>> Total mremap time for 1GB data: 388959 nanoseconds.
>>>>>> Total mremap time for 1GB data: 402813 nanoseconds.
>>>>>>
>>>>>> In case THP is enabled, the optimization is skipped. I also flush the
>>>>>> tlb every time we do this optimization since I couldn't find a way to
>>>>>> determine if the low-level PTEs are dirty. It is seen that the cost of
>>>>>> doing so is not much compared to the improvement, on both
>>>>>> x86-64 and arm64.
>>>>>>
>>>>>> Cc: minchan@kernel.org
>>>>>> Cc: pantin@google.com
>>>>>> Cc: hughd@google.com
>>>>>> Cc: lokeshgidra@google.com
>>>>>> Cc: dancol@google.com
>>>>>> Cc: mhocko@kernel.org
>>>>>> Cc: kirill@shutemov.name
>>>>>> Cc: akpm@linux-foundation.org
>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>>>>>> ---
>>>>>>     mm/mremap.c | 62
>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>     1 file changed, 62 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>>>> index 9e68a02a52b1..d82c485822ef 100644
>>>>>> --- a/mm/mremap.c
>>>>>> +++ b/mm/mremap.c
>>>>>> @@ -191,6 +191,54 @@ static void move_ptes(struct
>>>>>> vm_area_struct *vma, pmd_t *old_pmd,
>>>>>>             drop_rmap_locks(vma);
>>>>>>     }
>>>>>> +static bool move_normal_pmd(struct vm_area_struct *vma,
>>>>>> unsigned long old_addr,
>>>>>> +          unsigned long new_addr, unsigned long old_end,
>>>>>> +          pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush)
>>>>>> +{
>>>>>> +    spinlock_t *old_ptl, *new_ptl;
>>>>>> +    struct mm_struct *mm = vma->vm_mm;
>>>>>> +
>>>>>> +    if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
>>>>>> +        || old_end - old_addr < PMD_SIZE)
>>>>>> +        return false;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * The destination pmd shouldn't be established, free_pgtables()
>>>>>> +     * should have release it.
>>>>>> +     */
>>>>>> +    if (WARN_ON(!pmd_none(*new_pmd)))
>>>>>> +        return false;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * We don't have to worry about the ordering of src and dst
>>>>>> +     * ptlocks because exclusive mmap_sem prevents deadlock.
>>>>>> +     */
>>>>>> +    old_ptl = pmd_lock(vma->vm_mm, old_pmd);
>>>>>> +    if (old_ptl) {
>>>>>> +        pmd_t pmd;
>>>>>> +
>>>>>> +        new_ptl = pmd_lockptr(mm, new_pmd);
>>>>>> +        if (new_ptl != old_ptl)
>>>>>> +            spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
>>>>>> +
>>>>>> +        /* Clear the pmd */
>>>>>> +        pmd = *old_pmd;
>>>>>> +        pmd_clear(old_pmd);
>>>>>> +
>>>>>> +        VM_BUG_ON(!pmd_none(*new_pmd));
>>>>>> +
>>>>>> +        /* Set the new pmd */
>>>>>> +        set_pmd_at(mm, new_addr, new_pmd, pmd);
>>>>> UML does not have set_pmd_at at all
>>>> Every architecture does. :)
>>> I tried to build it patched against 4.19-rc before I made this statement and
>>> ran into that.
>>>
>>> Presently it does not.
>>>
>>> https://elixir.bootlin.com/linux/v4.19-rc7/ident/set_pmd_at - UML is not
>>> on the list.
>> Once this problem as well as the omissions in the include changes for UML in
>> patch one have been fixed, it appears to be working.
>>
>> What it needs is attached.
>>
>>
>>>> But it may not come from the arch code.
>>> There is no generic definition as far as I can see. All 12 defines in
>>> 4.19 are in arch-specific code. Unless I am missing something...
>>>
>>>>> If I read the code right, MIPS completely ignores the address
>>>>> argument so
>>>>> set_pmd_at there may not have the effect which this patch is trying to
>>>>> achieve.
>>>> Ignoring the address is fine. Most architectures do that..
>>>> The idea is to move the page table to the new pmd slot. It's nothing to do
>>>> with the address passed to set_pmd_at().
>>> If that is its only function, then I am going to appropriate the code
>>> out of the MIPS tree for further UML testing. It does exactly that -
>>> just moves the pmd to the new slot.
>>>
>>> A.
>>
>> A.
>>
>>  From ac265d96897a346b05646fce91784ed4922c7f8d Mon Sep 17 00:00:00 2001
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>> Date: Fri, 12 Oct 2018 17:24:10 +0100
>> Subject: [PATCH] Incremental fixes to the mmremap patch
>>
>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>> ---
>>   arch/um/include/asm/pgalloc.h | 4 ++--
>>   arch/um/include/asm/pgtable.h | 3 +++
>>   arch/um/kernel/tlb.c          | 6 ++++++
>>   3 files changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/um/include/asm/pgalloc.h b/arch/um/include/asm/pgalloc.h
>> index bf90b2aa2002..99eb5682792a 100644
>> --- a/arch/um/include/asm/pgalloc.h
>> +++ b/arch/um/include/asm/pgalloc.h
>> @@ -25,8 +25,8 @@
>>   extern pgd_t *pgd_alloc(struct mm_struct *);
>>   extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
>>   
>> -extern pte_t *pte_alloc_one_kernel(struct mm_struct *, unsigned long);
>> -extern pgtable_t pte_alloc_one(struct mm_struct *, unsigned long);
>> +extern pte_t *pte_alloc_one_kernel(struct mm_struct *);
>> +extern pgtable_t pte_alloc_one(struct mm_struct *);
> If it's OK, let me handle this bit since otherwise it complicates things for
> me.
>
>>   static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>>   {
>> diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
>> index 7485398d0737..1692da55e63a 100644
>> --- a/arch/um/include/asm/pgtable.h
>> +++ b/arch/um/include/asm/pgtable.h
>> @@ -359,4 +359,7 @@ do {						\
>>   	__flush_tlb_one((vaddr));		\
>>   } while (0)
>>   
>> +extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>> +		pmd_t *pmdp, pmd_t pmd);
>> +
>>   #endif
>> diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c
>> index 763d35bdda01..d17b74184ba0 100644
>> --- a/arch/um/kernel/tlb.c
>> +++ b/arch/um/kernel/tlb.c
>> @@ -647,3 +647,9 @@ void force_flush_all(void)
>>   		vma = vma->vm_next;
>>   	}
>>   }
>> +void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>> +		pmd_t *pmdp, pmd_t pmd)
>> +{
>> +	*pmdp = pmd;
>> +}
>> +
> I believe this should be included in a separate patch since it is not related
> specifically to pte_alloc argument removal. If you want, I could split it
> into a separate patch for my series with you as author.


Whichever is more convenient for you.

One thing to note - TLB flush is extremely expensive on UML.

I have lifted the definition of set_pmd_at from the MIPS tree and
removed the tlb_flush_all from it for this exact reason.

If I read the original patch correctly, it does its own flush control, so
set_pmd_at does not need to do a force flush every time. It is done
further up the chain.
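
A rough sketch of what I mean, purely as illustration - the tail of
move_normal_pmd() is not quoted above, so the exact unlock order and the
caller-side flush below are my assumptions, not the patch verbatim:

        /* Set the new pmd */
        set_pmd_at(mm, new_addr, new_pmd, pmd);
        if (new_ptl != old_ptl)
            spin_unlock(new_ptl);
        spin_unlock(old_ptl);

        /* tell the caller the moved range still needs a TLB flush */
        *need_flush = true;
        return true;
    }

    return false;

With flush control done like that, an arch set_pmd_at() which only does
*pmdp = pmd is sufficient for this path, and the flush is issued once by
whatever set need_flush further up (move_page_tables, if I read it right)
instead of being forced on every pmd move.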

Brgds,

A.


>
> thanks,
>
> - Joel
>
>

WARNING: multiple messages have this Message-ID (diff)
From: anton.ivanov@kot-begemot.co.uk (Anton Ivanov)
To: linux-riscv@lists.infradead.org
Subject: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions
Date: Fri, 12 Oct 2018 17:58:40 +0100	[thread overview]
Message-ID: <4f969958-913e-cb9f-48fb-e3a88e1d288c@kot-begemot.co.uk> (raw)
In-Reply-To: <20181012165012.GD223066@joelaf.mtv.corp.google.com>


On 10/12/18 5:50 PM, Joel Fernandes wrote:
> On Fri, Oct 12, 2018 at 05:42:24PM +0100, Anton Ivanov wrote:
>> On 10/12/18 3:48 PM, Anton Ivanov wrote:
>>> On 12/10/2018 15:37, Kirill A. Shutemov wrote:
>>>> On Fri, Oct 12, 2018 at 03:09:49PM +0100, Anton Ivanov wrote:
>>>>> On 10/12/18 2:37 AM, Joel Fernandes (Google) wrote:
>>>>>> Android needs to mremap large regions of memory during
>>>>>> memory management
>>>>>> related operations. The mremap system call can be really
>>>>>> slow if THP is
>>>>>> not enabled. The bottleneck is move_page_tables, which is copying each
>>>>>> pte at a time, and can be really slow across a large map.
>>>>>> Turning on THP
>>>>>> may not be a viable option, and is not for us. This patch
>>>>>> speeds up the
>>>>>> performance for non-THP system by copying at the PMD level
>>>>>> when possible.
>>>>>>
>>>>>> The speed up is three orders of magnitude. On a 1GB mremap, the mremap
>>>>>> completion times drops from 160-250 millesconds to 380-400
>>>>>> microseconds.
>>>>>>
>>>>>> Before:
>>>>>> Total mremap time for 1GB data: 242321014 nanoseconds.
>>>>>> Total mremap time for 1GB data: 196842467 nanoseconds.
>>>>>> Total mremap time for 1GB data: 167051162 nanoseconds.
>>>>>>
>>>>>> After:
>>>>>> Total mremap time for 1GB data: 385781 nanoseconds.
>>>>>> Total mremap time for 1GB data: 388959 nanoseconds.
>>>>>> Total mremap time for 1GB data: 402813 nanoseconds.
>>>>>>
>>>>>> Incase THP is enabled, the optimization is skipped. I also flush the
>>>>>> tlb every time we do this optimization since I couldn't find a way to
>>>>>> determine if the low-level PTEs are dirty. It is seen that the cost of
>>>>>> doing so is not much compared the improvement, on both
>>>>>> x86-64 and arm64.
>>>>>>
>>>>>> Cc: minchan at kernel.org
>>>>>> Cc: pantin at google.com
>>>>>> Cc: hughd at google.com
>>>>>> Cc: lokeshgidra at google.com
>>>>>> Cc: dancol at google.com
>>>>>> Cc: mhocko at kernel.org
>>>>>> Cc: kirill at shutemov.name
>>>>>> Cc: akpm at linux-foundation.org
>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>>>>>> ---
>>>>>>  ?? mm/mremap.c | 62
>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  ?? 1 file changed, 62 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>>>> index 9e68a02a52b1..d82c485822ef 100644
>>>>>> --- a/mm/mremap.c
>>>>>> +++ b/mm/mremap.c
>>>>>> @@ -191,6 +191,54 @@ static void move_ptes(struct
>>>>>> vm_area_struct *vma, pmd_t *old_pmd,
>>>>>>  ?????????? drop_rmap_locks(vma);
>>>>>>  ?? }
>>>>>> +static bool move_normal_pmd(struct vm_area_struct *vma,
>>>>>> unsigned long old_addr,
>>>>>> +????????? unsigned long new_addr, unsigned long old_end,
>>>>>> +????????? pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush)
>>>>>> +{
>>>>>> +??? spinlock_t *old_ptl, *new_ptl;
>>>>>> +??? struct mm_struct *mm = vma->vm_mm;
>>>>>> +
>>>>>> +??? if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
>>>>>> +??????? || old_end - old_addr < PMD_SIZE)
>>>>>> +??????? return false;
>>>>>> +
>>>>>> +??? /*
>>>>>> +???? * The destination pmd shouldn't be established, free_pgtables()
>>>>>> +???? * should have release it.
>>>>>> +???? */
>>>>>> +??? if (WARN_ON(!pmd_none(*new_pmd)))
>>>>>> +??????? return false;
>>>>>> +
>>>>>> +??? /*
>>>>>> +???? * We don't have to worry about the ordering of src and dst
>>>>>> +???? * ptlocks because exclusive mmap_sem prevents deadlock.
>>>>>> +???? */
>>>>>> +??? old_ptl = pmd_lock(vma->vm_mm, old_pmd);
>>>>>> +??? if (old_ptl) {
>>>>>> +??????? pmd_t pmd;
>>>>>> +
>>>>>> +??????? new_ptl = pmd_lockptr(mm, new_pmd);
>>>>>> +??????? if (new_ptl != old_ptl)
>>>>>> +??????????? spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
>>>>>> +
>>>>>> +??????? /* Clear the pmd */
>>>>>> +??????? pmd = *old_pmd;
>>>>>> +??????? pmd_clear(old_pmd);
>>>>>> +
>>>>>> +??????? VM_BUG_ON(!pmd_none(*new_pmd));
>>>>>> +
>>>>>> +??????? /* Set the new pmd */
>>>>>> +??????? set_pmd_at(mm, new_addr, new_pmd, pmd);
>>>>> UML does not have set_pmd_at at all
>>>> Every architecture does. :)
>>> I tried to build it patching vs 4.19-rc before I made this statement and
>>> ran into that.
>>>
>>> Presently it does not.
>>>
>>> https://elixir.bootlin.com/linux/v4.19-rc7/ident/set_pmd_at - UML is not
>>> on the list.
>> Once this problem as well as the omissions in the include changes for UML in
>> patch one have been fixed it appears to be working.
>>
>> What it needs is attached.
>>
>>
>>>> But it may come not from the arch code.
>>> There is no generic definition as far as I can see. All 12 defines in
>>> 4.19 are in arch specific code. Unless i am missing something...
>>>
>>>>> If I read the code right, MIPS completely ignores the address
>>>>> argument so
>>>>> set_pmd_at there may not have the effect which this patch is trying to
>>>>> achieve.
>>>> Ignoring address is fine. Most architectures do that..
>>>> The ideas is to move page table to the new pmd slot. It's nothing to do
>>>> with the address passed to set_pmd_at().
>>> If that is it's only function, then I am going to appropriate the code
>>> out of the MIPS tree for further uml testing. It does exactly that -
>>> just move the pmd the new slot.
>>>
>>> A.
>>
>> A.
>>
>>  From ac265d96897a346b05646fce91784ed4922c7f8d Mon Sep 17 00:00:00 2001
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>> Date: Fri, 12 Oct 2018 17:24:10 +0100
>> Subject: [PATCH] Incremental fixes to the mmremap patch
>>
>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>> ---
>>   arch/um/include/asm/pgalloc.h | 4 ++--
>>   arch/um/include/asm/pgtable.h | 3 +++
>>   arch/um/kernel/tlb.c          | 6 ++++++
>>   3 files changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/um/include/asm/pgalloc.h b/arch/um/include/asm/pgalloc.h
>> index bf90b2aa2002..99eb5682792a 100644
>> --- a/arch/um/include/asm/pgalloc.h
>> +++ b/arch/um/include/asm/pgalloc.h
>> @@ -25,8 +25,8 @@
>>   extern pgd_t *pgd_alloc(struct mm_struct *);
>>   extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
>>   
>> -extern pte_t *pte_alloc_one_kernel(struct mm_struct *, unsigned long);
>> -extern pgtable_t pte_alloc_one(struct mm_struct *, unsigned long);
>> +extern pte_t *pte_alloc_one_kernel(struct mm_struct *);
>> +extern pgtable_t pte_alloc_one(struct mm_struct *);
> If its Ok, let me handle this bit since otherwise it complicates things for
> me.
>
>>   static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>>   {
>> diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
>> index 7485398d0737..1692da55e63a 100644
>> --- a/arch/um/include/asm/pgtable.h
>> +++ b/arch/um/include/asm/pgtable.h
>> @@ -359,4 +359,7 @@ do {						\
>>   	__flush_tlb_one((vaddr));		\
>>   } while (0)
>>   
>> +extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>> +		pmd_t *pmdp, pmd_t pmd);
>> +
>>   #endif
>> diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c
>> index 763d35bdda01..d17b74184ba0 100644
>> --- a/arch/um/kernel/tlb.c
>> +++ b/arch/um/kernel/tlb.c
>> @@ -647,3 +647,9 @@ void force_flush_all(void)
>>   		vma = vma->vm_next;
>>   	}
>>   }
>> +void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>> +		pmd_t *pmdp, pmd_t pmd)
>> +{
>> +	*pmdp = pmd;
>> +}
>> +
> I believe this should be included in a separate patch since it is not related
> specifically to pte_alloc argument removal. If you want, I could split it
> into a separate patch for my series with you as author.


Whichever is more convenient for you.

One thing to note - tlb flush is extremely expensive on uml.

I have lifted the definition of set_pmd_at from the mips tree and 
removed the tlb_flush_all from it for this exact reason.

If I read the original patch correctly, it does its own flush control so 
set_pmd_at does not need to do a force flush every time. It is done 
further up the chain.

Brgds,

A.


>
> thanks,
>
> - Joel
>
>

WARNING: multiple messages have this Message-ID (diff)
From: Anton Ivanov <anton.ivanov@kot-begemot.co.uk>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: linux-mips@linux-mips.org, Rich Felker <dalias@libc.org>,
	linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Will Deacon <will.deacon@arm.com>,
	mhocko@kernel.org, linux-mm@kvack.org, lokeshgidra@google.com,
	sparclinux@vger.kernel.org, linux-riscv@lists.infradead.org,
	elfring@users.sourceforge.net, Jonas Bonn <jonas@southpole.se>,
	linux-s390@vger.kernel.org, dancol@google.com,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	Max Filippov <jcmvbkbc@gmail.com>,
	linux-hexagon@vger.kernel.org, Helge Deller <deller@gmx.de>,
	"maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT" <x86@kernel.org>,
	hughd@google.com, "James E.J. Bottomley" <jejb@parisc-linux.org>,
	kasan-dev@googlegroups.com, kvmarm@lists.cs.columbia.edu,
	Ingo Molnar <mingo@redhat.com>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	linux-snps-arc@lists.infradead.org, kernel-team@android.com,
	Sam Creasey <sammy@sammy.net>,
	linux-xtensa@linux-xtensa.org, Jeff Dike <jdike@addtoit.com>,
	linux-alpha@vger.kernel.org, linux-um@lists.infradead.org,
	Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
	Julia Lawall <Julia.Lawall@lip6.fr>,
	linux-m68k@lists.linux-m68k.org, openrisc@lists.librecores.org,
	Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
	Ley Foon Tan <lftan@altera.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Stafford Horne <shorne@gmail.com>, Guan Xuetao <gxt@pku.edu.cn>,
	linux-arm-kernel@lists.infradead.org,
	Chris Zankel <chris@zankel.net>, Tony Luck <tony.luck@intel.com>,
	linux-parisc@vger.kernel.org, pantin@google.com,
	linux-kernel@vger.kernel.org, Fenghua Yu <fenghua.yu@intel.com>,
	minchan@kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	Richard Weinberger <richard@nod.at>,
	nios2-dev@lists.rocketboards.org, akpm@linux-foundation.org,
	linuxppc-dev@lists.ozlabs.org,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions
Date: Fri, 12 Oct 2018 17:58:40 +0100	[thread overview]
Message-ID: <4f969958-913e-cb9f-48fb-e3a88e1d288c@kot-begemot.co.uk> (raw)
Message-ID: <20181012165840.dqj_dqCuDEfoQhhMAs1XhoyoZjSpDktAHDyV-irfOEw@z> (raw)
In-Reply-To: <20181012165012.GD223066@joelaf.mtv.corp.google.com>


On 10/12/18 5:50 PM, Joel Fernandes wrote:
> On Fri, Oct 12, 2018 at 05:42:24PM +0100, Anton Ivanov wrote:
>> On 10/12/18 3:48 PM, Anton Ivanov wrote:
>>> On 12/10/2018 15:37, Kirill A. Shutemov wrote:
>>>> On Fri, Oct 12, 2018 at 03:09:49PM +0100, Anton Ivanov wrote:
>>>>> On 10/12/18 2:37 AM, Joel Fernandes (Google) wrote:
>>>>>> Android needs to mremap large regions of memory during
>>>>>> memory management
>>>>>> related operations. The mremap system call can be really
>>>>>> slow if THP is
>>>>>> not enabled. The bottleneck is move_page_tables, which is copying each
>>>>>> pte at a time, and can be really slow across a large map.
>>>>>> Turning on THP
>>>>>> may not be a viable option, and is not for us. This patch
>>>>>> speeds up the
>>>>>> performance for non-THP system by copying at the PMD level
>>>>>> when possible.
>>>>>>
>>>>>> The speed up is three orders of magnitude. On a 1GB mremap, the mremap
>>>>>> completion times drops from 160-250 millesconds to 380-400
>>>>>> microseconds.
>>>>>>
>>>>>> Before:
>>>>>> Total mremap time for 1GB data: 242321014 nanoseconds.
>>>>>> Total mremap time for 1GB data: 196842467 nanoseconds.
>>>>>> Total mremap time for 1GB data: 167051162 nanoseconds.
>>>>>>
>>>>>> After:
>>>>>> Total mremap time for 1GB data: 385781 nanoseconds.
>>>>>> Total mremap time for 1GB data: 388959 nanoseconds.
>>>>>> Total mremap time for 1GB data: 402813 nanoseconds.
>>>>>>
>>>>>> Incase THP is enabled, the optimization is skipped. I also flush the
>>>>>> tlb every time we do this optimization since I couldn't find a way to
>>>>>> determine if the low-level PTEs are dirty. It is seen that the cost of
>>>>>> doing so is not much compared the improvement, on both
>>>>>> x86-64 and arm64.
>>>>>>
>>>>>> Cc: minchan@kernel.org
>>>>>> Cc: pantin@google.com
>>>>>> Cc: hughd@google.com
>>>>>> Cc: lokeshgidra@google.com
>>>>>> Cc: dancol@google.com
>>>>>> Cc: mhocko@kernel.org
>>>>>> Cc: kirill@shutemov.name
>>>>>> Cc: akpm@linux-foundation.org
>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>>>>>> ---
>>>>>>     mm/mremap.c | 62
>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>     1 file changed, 62 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>>>> index 9e68a02a52b1..d82c485822ef 100644
>>>>>> --- a/mm/mremap.c
>>>>>> +++ b/mm/mremap.c
>>>>>> @@ -191,6 +191,54 @@ static void move_ptes(struct
>>>>>> vm_area_struct *vma, pmd_t *old_pmd,
>>>>>>             drop_rmap_locks(vma);
>>>>>>     }
>>>>>> +static bool move_normal_pmd(struct vm_area_struct *vma,
>>>>>> unsigned long old_addr,
>>>>>> +          unsigned long new_addr, unsigned long old_end,
>>>>>> +          pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush)
>>>>>> +{
>>>>>> +    spinlock_t *old_ptl, *new_ptl;
>>>>>> +    struct mm_struct *mm = vma->vm_mm;
>>>>>> +
>>>>>> +    if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
>>>>>> +        || old_end - old_addr < PMD_SIZE)
>>>>>> +        return false;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * The destination pmd shouldn't be established, free_pgtables()
>>>>>> +     * should have release it.
>>>>>> +     */
>>>>>> +    if (WARN_ON(!pmd_none(*new_pmd)))
>>>>>> +        return false;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * We don't have to worry about the ordering of src and dst
>>>>>> +     * ptlocks because exclusive mmap_sem prevents deadlock.
>>>>>> +     */
>>>>>> +    old_ptl = pmd_lock(vma->vm_mm, old_pmd);
>>>>>> +    if (old_ptl) {
>>>>>> +        pmd_t pmd;
>>>>>> +
>>>>>> +        new_ptl = pmd_lockptr(mm, new_pmd);
>>>>>> +        if (new_ptl != old_ptl)
>>>>>> +            spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
>>>>>> +
>>>>>> +        /* Clear the pmd */
>>>>>> +        pmd = *old_pmd;
>>>>>> +        pmd_clear(old_pmd);
>>>>>> +
>>>>>> +        VM_BUG_ON(!pmd_none(*new_pmd));
>>>>>> +
>>>>>> +        /* Set the new pmd */
>>>>>> +        set_pmd_at(mm, new_addr, new_pmd, pmd);
>>>>> UML does not have set_pmd_at at all
>>>> Every architecture does. :)
>>> I tried to build it patching vs 4.19-rc before I made this statement and
>>> ran into that.
>>>
>>> Presently it does not.
>>>
>>> https://elixir.bootlin.com/linux/v4.19-rc7/ident/set_pmd_at - UML is not
>>> on the list.
>> Once this problem as well as the omissions in the include changes for UML in
>> patch one have been fixed it appears to be working.
>>
>> What it needs is attached.
>>
>>
>>>> But it may come not from the arch code.
>>> There is no generic definition as far as I can see. All 12 defines in
>>> 4.19 are in arch specific code. Unless i am missing something...
>>>
>>>>> If I read the code right, MIPS completely ignores the address
>>>>> argument so
>>>>> set_pmd_at there may not have the effect which this patch is trying to
>>>>> achieve.
>>>> Ignoring address is fine. Most architectures do that..
>>>> The ideas is to move page table to the new pmd slot. It's nothing to do
>>>> with the address passed to set_pmd_at().
>>> If that is it's only function, then I am going to appropriate the code
>>> out of the MIPS tree for further uml testing. It does exactly that -
>>> just move the pmd the new slot.
>>>
>>> A.
>>
>> A.
>>
>>  From ac265d96897a346b05646fce91784ed4922c7f8d Mon Sep 17 00:00:00 2001
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>> Date: Fri, 12 Oct 2018 17:24:10 +0100
>> Subject: [PATCH] Incremental fixes to the mmremap patch
>>
>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>> ---
>>   arch/um/include/asm/pgalloc.h | 4 ++--
>>   arch/um/include/asm/pgtable.h | 3 +++
>>   arch/um/kernel/tlb.c          | 6 ++++++
>>   3 files changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/um/include/asm/pgalloc.h b/arch/um/include/asm/pgalloc.h
>> index bf90b2aa2002..99eb5682792a 100644
>> --- a/arch/um/include/asm/pgalloc.h
>> +++ b/arch/um/include/asm/pgalloc.h
>> @@ -25,8 +25,8 @@
>>   extern pgd_t *pgd_alloc(struct mm_struct *);
>>   extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
>>   
>> -extern pte_t *pte_alloc_one_kernel(struct mm_struct *, unsigned long);
>> -extern pgtable_t pte_alloc_one(struct mm_struct *, unsigned long);
>> +extern pte_t *pte_alloc_one_kernel(struct mm_struct *);
>> +extern pgtable_t pte_alloc_one(struct mm_struct *);
> If its Ok, let me handle this bit since otherwise it complicates things for
> me.
>
>>   static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>>   {
>> diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
>> index 7485398d0737..1692da55e63a 100644
>> --- a/arch/um/include/asm/pgtable.h
>> +++ b/arch/um/include/asm/pgtable.h
>> @@ -359,4 +359,7 @@ do {						\
>>   	__flush_tlb_one((vaddr));		\
>>   } while (0)
>>   
>> +extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>> +		pmd_t *pmdp, pmd_t pmd);
>> +
>>   #endif
>> diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c
>> index 763d35bdda01..d17b74184ba0 100644
>> --- a/arch/um/kernel/tlb.c
>> +++ b/arch/um/kernel/tlb.c
>> @@ -647,3 +647,9 @@ void force_flush_all(void)
>>   		vma = vma->vm_next;
>>   	}
>>   }
>> +void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>> +		pmd_t *pmdp, pmd_t pmd)
>> +{
>> +	*pmdp = pmd;
>> +}
>> +
> I believe this should be included in a separate patch since it is not related
> specifically to pte_alloc argument removal. If you want, I could split it
> into a separate patch for my series with you as author.


Whichever is more convenient for you.

One thing to note - tlb flush is extremely expensive on uml.

I have lifted the definition of set_pmd_at from the mips tree and 
removed the tlb_flush_all from it for this exact reason.

If I read the original patch correctly, it does its own flush control so 
set_pmd_at does not need to do a force flush every time. It is done 
further up the chain.

Brgds,

A.


>
> thanks,
>
> - Joel
>
>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

WARNING: multiple messages have this Message-ID (diff)
From: Anton Ivanov <anton.ivanov@kot-begemot.co.uk>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-kernel@vger.kernel.org, linux-mips@linux-mips.org,
	Rich Felker <dalias@libc.org>,
	linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Will Deacon <will.deacon@arm.com>,
	mhocko@kernel.org, linux-mm@kvack.org, lokeshgidra@google.com,
	linux-riscv@lists.infradead.org, elfring@users.sourceforge.net,
	Jonas Bonn <jonas@southpole.se>,
	linux-s390@vger.kernel.org, dancol@google.com,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org,
	linux-hexagon@vger.kernel.org, Helge Deller <deller@gmx.de>,
	"maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT" <x86@kernel.org>,
	hughd@google.com, "James E.J. Bottomley" <jejb@parisc-linux.org>,
	kasan-dev@googlegroups.com, kvmarm@lists.cs.columbia.edu,
	Ingo Molnar <mingo@redhat.com>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	linux-snps-arc@lists.infradead.org, kernel-team@android.com,
	Sam Creasey <sammy@sammy.net>, Fenghua Yu <fenghua.yu@intel.com>,
	Jeff Dike <jdike@addtoit.com>,
	linux-um@lists.infradead.org,
	Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
	Julia Lawall <Julia.Lawall@lip6.fr>,
	linux-m68k@lists.linux-m68k.org, openrisc@lists.librecores.org,
	Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
	nios2-dev@lists.rocketboards.org,
	Stafford Horne <shorne@gmail.com>, Guan Xuetao <gxt@pku.edu.cn>,
	linux-arm-kernel@lists.infradead.org,
	Chris Zankel <chris@zankel.net>, Tony Luck <tony.luck@intel.com>,
	Richard Weinberger <richard@nod.at>,
	linux-parisc@vger.kernel.org, pantin@google.com,
	Max Filippov <jcmvbkbc@gmail.com>,
	minchan@kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	linux-alpha@vger.kernel.org, Ley Foon Tan <lftan@altera.com>,
	akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions
Date: Fri, 12 Oct 2018 17:58:40 +0100	[thread overview]
Message-ID: <4f969958-913e-cb9f-48fb-e3a88e1d288c@kot-begemot.co.uk> (raw)
In-Reply-To: <20181012165012.GD223066@joelaf.mtv.corp.google.com>


On 10/12/18 5:50 PM, Joel Fernandes wrote:
> On Fri, Oct 12, 2018 at 05:42:24PM +0100, Anton Ivanov wrote:
>> On 10/12/18 3:48 PM, Anton Ivanov wrote:
>>> On 12/10/2018 15:37, Kirill A. Shutemov wrote:
>>>> On Fri, Oct 12, 2018 at 03:09:49PM +0100, Anton Ivanov wrote:
>>>>> On 10/12/18 2:37 AM, Joel Fernandes (Google) wrote:
>>>>>> Android needs to mremap large regions of memory during
>>>>>> memory management
>>>>>> related operations. The mremap system call can be really
>>>>>> slow if THP is
>>>>>> not enabled. The bottleneck is move_page_tables, which is copying each
>>>>>> pte at a time, and can be really slow across a large map.
>>>>>> Turning on THP
>>>>>> may not be a viable option, and is not for us. This patch
>>>>>> speeds up the
>>>>>> performance for non-THP system by copying at the PMD level
>>>>>> when possible.
>>>>>>
>>>>>> The speed up is three orders of magnitude. On a 1GB mremap, the mremap
>>>>>> completion times drops from 160-250 millesconds to 380-400
>>>>>> microseconds.
>>>>>>
>>>>>> Before:
>>>>>> Total mremap time for 1GB data: 242321014 nanoseconds.
>>>>>> Total mremap time for 1GB data: 196842467 nanoseconds.
>>>>>> Total mremap time for 1GB data: 167051162 nanoseconds.
>>>>>>
>>>>>> After:
>>>>>> Total mremap time for 1GB data: 385781 nanoseconds.
>>>>>> Total mremap time for 1GB data: 388959 nanoseconds.
>>>>>> Total mremap time for 1GB data: 402813 nanoseconds.
>>>>>>
>>>>>> Incase THP is enabled, the optimization is skipped. I also flush the
>>>>>> tlb every time we do this optimization since I couldn't find a way to
>>>>>> determine if the low-level PTEs are dirty. It is seen that the cost of
>>>>>> doing so is not much compared the improvement, on both
>>>>>> x86-64 and arm64.
>>>>>>
>>>>>> Cc: minchan@kernel.org
>>>>>> Cc: pantin@google.com
>>>>>> Cc: hughd@google.com
>>>>>> Cc: lokeshgidra@google.com
>>>>>> Cc: dancol@google.com
>>>>>> Cc: mhocko@kernel.org
>>>>>> Cc: kirill@shutemov.name
>>>>>> Cc: akpm@linux-foundation.org
>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>>>>>> ---
>>>>>>  A A  mm/mremap.c | 62
>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  A A  1 file changed, 62 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>>>> index 9e68a02a52b1..d82c485822ef 100644
>>>>>> --- a/mm/mremap.c
>>>>>> +++ b/mm/mremap.c
>>>>>> @@ -191,6 +191,54 @@ static void move_ptes(struct
>>>>>> vm_area_struct *vma, pmd_t *old_pmd,
>>>>>>             drop_rmap_locks(vma);
>>>>>>     }
>>>>>> +static bool move_normal_pmd(struct vm_area_struct *vma,
>>>>>> unsigned long old_addr,
>>>>>> +          unsigned long new_addr, unsigned long old_end,
>>>>>> +          pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush)
>>>>>> +{
>>>>>> +    spinlock_t *old_ptl, *new_ptl;
>>>>>> +    struct mm_struct *mm = vma->vm_mm;
>>>>>> +
>>>>>> +    if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
>>>>>> +        || old_end - old_addr < PMD_SIZE)
>>>>>> +        return false;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * The destination pmd shouldn't be established, free_pgtables()
>>>>>> +     * should have release it.
>>>>>> +     */
>>>>>> +    if (WARN_ON(!pmd_none(*new_pmd)))
>>>>>> +        return false;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * We don't have to worry about the ordering of src and dst
>>>>>> +     * ptlocks because exclusive mmap_sem prevents deadlock.
>>>>>> +     */
>>>>>> +    old_ptl = pmd_lock(vma->vm_mm, old_pmd);
>>>>>> +    if (old_ptl) {
>>>>>> +        pmd_t pmd;
>>>>>> +
>>>>>> +        new_ptl = pmd_lockptr(mm, new_pmd);
>>>>>> +        if (new_ptl != old_ptl)
>>>>>> +            spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
>>>>>> +
>>>>>> +        /* Clear the pmd */
>>>>>> +        pmd = *old_pmd;
>>>>>> +        pmd_clear(old_pmd);
>>>>>> +
>>>>>> +        VM_BUG_ON(!pmd_none(*new_pmd));
>>>>>> +
>>>>>> +        /* Set the new pmd */
>>>>>> +        set_pmd_at(mm, new_addr, new_pmd, pmd);
>>>>> UML does not have set_pmd_at at all
>>>> Every architecture does. :)
>>> I tried to build it, patching against 4.19-rc, before I made this
>>> statement and ran into that.
>>>
>>> Presently it does not.
>>>
>>> https://elixir.bootlin.com/linux/v4.19-rc7/ident/set_pmd_at - UML is not
>>> on the list.
>> Once this problem and the omissions in the include changes for UML in
>> patch one have been fixed, it appears to be working.
>>
>> What it needs is attached.
>>
>>
>>>> But it may not come from the arch code.
>>> There is no generic definition as far as I can see. All 12 definitions in
>>> 4.19 are in arch-specific code. Unless I am missing something...
>>>
>>>>> If I read the code right, MIPS completely ignores the address argument,
>>>>> so set_pmd_at there may not have the effect which this patch is trying
>>>>> to achieve.
>>>> Ignoring the address is fine. Most architectures do that.
>>>> The idea is to move the page table to the new pmd slot. It has nothing
>>>> to do with the address passed to set_pmd_at().
>>> If that is its only function, then I am going to appropriate the code
>>> out of the MIPS tree for further UML testing. It does exactly that -
>>> just moves the pmd to the new slot.
>>>
>>> A.
>>
>> A.
>>
>>  From ac265d96897a346b05646fce91784ed4922c7f8d Mon Sep 17 00:00:00 2001
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>> Date: Fri, 12 Oct 2018 17:24:10 +0100
>> Subject: [PATCH] Incremental fixes to the mremap patch
>>
>> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>> ---
>>   arch/um/include/asm/pgalloc.h | 4 ++--
>>   arch/um/include/asm/pgtable.h | 3 +++
>>   arch/um/kernel/tlb.c          | 6 ++++++
>>   3 files changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/um/include/asm/pgalloc.h b/arch/um/include/asm/pgalloc.h
>> index bf90b2aa2002..99eb5682792a 100644
>> --- a/arch/um/include/asm/pgalloc.h
>> +++ b/arch/um/include/asm/pgalloc.h
>> @@ -25,8 +25,8 @@
>>   extern pgd_t *pgd_alloc(struct mm_struct *);
>>   extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
>>   
>> -extern pte_t *pte_alloc_one_kernel(struct mm_struct *, unsigned long);
>> -extern pgtable_t pte_alloc_one(struct mm_struct *, unsigned long);
>> +extern pte_t *pte_alloc_one_kernel(struct mm_struct *);
>> +extern pgtable_t pte_alloc_one(struct mm_struct *);
> If it's OK, let me handle this bit, since otherwise it complicates things
> for me.
>
>>   static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>>   {
>> diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
>> index 7485398d0737..1692da55e63a 100644
>> --- a/arch/um/include/asm/pgtable.h
>> +++ b/arch/um/include/asm/pgtable.h
>> @@ -359,4 +359,7 @@ do {						\
>>   	__flush_tlb_one((vaddr));		\
>>   } while (0)
>>   
>> +extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>> +		pmd_t *pmdp, pmd_t pmd);
>> +
>>   #endif
>> diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c
>> index 763d35bdda01..d17b74184ba0 100644
>> --- a/arch/um/kernel/tlb.c
>> +++ b/arch/um/kernel/tlb.c
>> @@ -647,3 +647,9 @@ void force_flush_all(void)
>>   		vma = vma->vm_next;
>>   	}
>>   }
>> +void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>> +		pmd_t *pmdp, pmd_t pmd)
>> +{
>> +	*pmdp = pmd;
>> +}
>> +
> I believe this should be included in a separate patch since it is not related
> specifically to pte_alloc argument removal. If you want, I could split it
> into a separate patch for my series with you as author.


Whichever is more convenient for you.

One thing to note - a TLB flush is extremely expensive on UML.

I have lifted the definition of set_pmd_at from the MIPS tree and
removed the flush_tlb_all() call from it for this exact reason.
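
For reference, a minimal sketch of what I started from - this is my
assumption of the MIPS helper's shape, and the flush at the end is the
part I dropped for the UML version in the attached patch:

void set_pmd_at(struct mm_struct *mm, unsigned long addr,
		pmd_t *pmdp, pmd_t pmd)
{
	*pmdp = pmd;
	flush_tlb_all();	/* global flush - prohibitively expensive on UML */
}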

If I read the original patch correctly, it does its own flush control, so
set_pmd_at does not need to force a flush every time; that is done
further up the chain.
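
In other words, something along these lines - a sketch of how I read the
flush control in the v2 patch, not a verbatim quote of it (old_start and
old_end are illustrative names):

	/* in move_normal_pmd(): move the entry, just note that a flush is owed */
	set_pmd_at(mm, new_addr, new_pmd, pmd);
	*need_flush = true;

	/* in the caller, move_page_tables(): one ranged flush covers the whole move */
	if (need_flush)
		flush_tlb_range(vma, old_start, old_end);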

Brgds,

A.


>
> thanks,
>
> - Joel
>
>

Thread overview: 317+ messages

2018-10-12  1:37 [PATCH v2 1/2] treewide: remove unused address argument from pte_alloc functions Joel Fernandes (Google)
2018-10-12  1:37 ` [PATCH v2 2/2] mm: speed up mremap by 500x on large regions Joel Fernandes (Google)
2018-10-12  6:40   ` Anton Ivanov
2018-10-12 11:30   ` Kirill A. Shutemov
2018-10-12 11:36     ` Kirill A. Shutemov
2018-10-12 12:50     ` Joel Fernandes
2018-10-12 13:19       ` Kirill A. Shutemov
2018-10-12 16:57         ` Joel Fernandes
2018-10-12 21:33           ` Kirill A. Shutemov
2018-10-12 18:18       ` David Miller
2018-10-13  1:35         ` Joel Fernandes
2018-10-13  1:39           ` Daniel Colascione
2018-10-13  1:44             ` Joel Fernandes
2018-10-13  1:54               ` Daniel Colascione
2018-10-13  2:10                 ` Joel Fernandes
2018-10-13  2:25                   ` Daniel Colascione
2018-10-13 17:50                     ` Joel Fernandes
2018-10-12 18:02     ` David Miller
2018-10-12 14:09   ` Anton Ivanov
2018-10-12 14:37     ` Kirill A. Shutemov
2018-10-12 14:48       ` Anton Ivanov
2018-10-12 16:42         ` Anton Ivanov
2018-10-12 16:50           ` Joel Fernandes
2018-10-12 16:58             ` Anton Ivanov [this message]
2018-10-12 17:06               ` Joel Fernandes
2018-10-12 21:40           ` Kirill A. Shutemov
2018-10-13  6:10             ` Anton Ivanov
2018-10-15  7:10   ` Christian Borntraeger
2018-10-15  8:18     ` Martin Schwidefsky
2018-10-16  2:08       ` Joel Fernandes
2018-10-12 11:09 ` [PATCH v2 1/2] treewide: remove unused address argument from pte_alloc functions Kirill A. Shutemov
2018-10-12 16:37   ` Joel Fernandes
2018-10-12 13:56 ` Anton Ivanov
2018-10-12 16:34   ` Joel Fernandes
2018-10-12 16:38     ` Julia Lawall
2018-10-12 16:46       ` Joel Fernandes
2018-10-12 18:51 ` SF Markus Elfring
2018-10-12 19:42   ` Joel Fernandes
2018-10-13  9:22     ` SF Markus Elfring
