From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]:37740 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1753795AbdJaPwC (ORCPT );
	Tue, 31 Oct 2017 11:52:02 -0400
From: Dave Martin <Dave.Martin@arm.com>
Subject: [PATCH v5 16/30] arm64/sve: Backend logic for setting the vector length
Date: Tue, 31 Oct 2017 15:51:08 +0000
Message-ID: <1509465082-30427-17-git-send-email-Dave.Martin@arm.com>
In-Reply-To: <1509465082-30427-1-git-send-email-Dave.Martin@arm.com>
References: <1509465082-30427-1-git-send-email-Dave.Martin@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-arch-owner@vger.kernel.org
List-ID:
To: linux-arm-kernel@lists.infradead.org
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Alex Bennée,
	Szabolcs Nagy, Okamoto Takayuki, kvmarm@lists.cs.columbia.edu,
	libc-alpha@sourceware.org, linux-arch@vger.kernel.org
Message-ID: <20171031155108.njzOpwoUlXW-WaaiZdJEcMKZsNJrJEstHN9HxyJixnE@z>

This patch implements the core logic for changing a task's vector
length on request from userspace.  This will be used by the ptrace
and prctl frontends that are implemented in later patches.

The SVE architecture permits, but does not require, implementations
to support vector lengths that are not a power of two.  To handle
this, logic is added to check a requested vector length against a
possibly sparse bitmap of available vector lengths at runtime, so
that the best supported value can be chosen.
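
As a rough illustration of the lookup described above (a userspace
sketch, not the kernel code): storing each VQ at bit index
VQ_MAX - vq lets a forward scan return the largest available VQ that
does not exceed the request.  VQ_MAX, VQ_BYTES, vq_map, next_set()
and best_supported_vl() are hypothetical stand-ins for SVE_VQ_MAX,
the 16-byte quadword size, sve_vq_map, find_next_bit() and
find_supported_vector_length(); the value 512 is an assumption.

```c
#include <assert.h>

#define VQ_MAX   512u  /* assumed stand-in for SVE_VQ_MAX */
#define VQ_BYTES 16u   /* one quadword (128 bits) per VQ unit */

/* Hypothetical availability map: vq_map[bit] != 0 means that VQ is usable. */
static unsigned char vq_map[VQ_MAX];

/* Bit 0 holds the largest VQ, so a forward scan visits VQs in descending order. */
static unsigned int vq_to_bit(unsigned int vq)
{
	return VQ_MAX - vq;
}

static unsigned int bit_to_vq(unsigned int bit)
{
	return VQ_MAX - bit;
}

/* Simplified stand-in for find_next_bit(): returns size if no bit is set. */
static unsigned int next_set(const unsigned char *map, unsigned int size,
			     unsigned int start)
{
	unsigned int i;

	for (i = start; i < size; i++)
		if (map[i])
			return i;

	return size;
}

/*
 * Largest available VL in bytes that does not exceed vl, or 0 if the
 * map holds nothing suitable.  The real patch warns and falls back to
 * the minimum VL in that case instead of returning 0.
 */
static unsigned int best_supported_vl(unsigned int vl)
{
	unsigned int bit;

	/* This sketch assumes vl is a multiple of 16 within the legal range. */
	assert(vl >= VQ_BYTES && vl <= VQ_MAX * VQ_BYTES && vl % VQ_BYTES == 0);

	bit = next_set(vq_map, VQ_MAX, vq_to_bit(vl / VQ_BYTES));
	return bit < VQ_MAX ? bit_to_vq(bit) * VQ_BYTES : 0;
}
```

With only VQ 1, 2 and 4 marked available (VL 16, 32 and 64 bytes), a
request for 48 bytes resolves to 32 and a request for 256 bytes
resolves to 64.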

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
---
 arch/arm64/include/asm/fpsimd.h |   8 +++
 arch/arm64/kernel/fpsimd.c      | 137 +++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/prctl.h      |   5 ++
 3 files changed, 149 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 9bbd74c..86f550c 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -20,6 +20,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/cache.h>
 #include <linux/stddef.h>
 
 /*
@@ -70,17 +71,24 @@ extern void fpsimd_update_current_state(struct fpsimd_state *state);
 
 extern void fpsimd_flush_task_state(struct task_struct *target);
 
+/* Maximum VL that SVE VL-agnostic software can transparently support */
+#define SVE_VL_ARCH_MAX 0x100
+
 extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 
+extern int __ro_after_init sve_max_vl;
+
 #ifdef CONFIG_ARM64_SVE
 
 extern size_t sve_state_size(struct task_struct const *task);
 
 extern void sve_alloc(struct task_struct *task);
 extern void fpsimd_release_task(struct task_struct *task);
+extern int sve_set_vector_length(struct task_struct *task,
+				 unsigned long vl, unsigned long flags);
 
 #else /* ! CONFIG_ARM64_SVE */
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index e0b5ef5..1ceb069 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -17,8 +17,10 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/bitmap.h>
 #include <linux/bottom_half.h>
 #include <linux/bug.h>
+#include <linux/cache.h>
 #include <linux/compat.h>
 #include <linux/cpu.h>
 #include <linux/cpu_pm.h>
@@ -28,6 +30,7 @@
 #include <linux/init.h>
 #include <linux/percpu.h>
 #include <linux/preempt.h>
+#include <linux/prctl.h>
 #include <linux/ptrace.h>
 #include <linux/sched/signal.h>
 #include <linux/signal.h>
@@ -113,6 +116,20 @@ static DEFINE_PER_CPU(struct fpsimd_state *, fpsimd_last_state);
 /* Default VL for tasks that don't set it explicitly: */
 static int sve_default_vl = SVE_VL_MIN;
 
+#ifdef CONFIG_ARM64_SVE
+
+/* Maximum supported vector length across all CPUs (initially poisoned) */
+int __ro_after_init sve_max_vl = -1;
+/* Set of available vector lengths, as vq_to_bit(vq): */
+static DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
+
+#else /* ! CONFIG_ARM64_SVE */
+
+/* Dummy declaration for code that will be optimised out: */
+extern DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
+
+#endif /* ! CONFIG_ARM64_SVE */
+
 /*
  * Call __sve_free() directly only if you know task can't be scheduled
  * or preempted.
@@ -270,6 +287,50 @@ static void task_fpsimd_save(void)
 	}
 }
 
+/*
+ * Helpers to translate bit indices in sve_vq_map to VQ values (and
+ * vice versa).  This allows find_next_bit() to be used to find the
+ * _maximum_ VQ not exceeding a certain value.
+ */
+
+static unsigned int vq_to_bit(unsigned int vq)
+{
+	return SVE_VQ_MAX - vq;
+}
+
+static unsigned int bit_to_vq(unsigned int bit)
+{
+	if (WARN_ON(bit >= SVE_VQ_MAX))
+		bit = SVE_VQ_MAX - 1;
+
+	return SVE_VQ_MAX - bit;
+}
+
+/*
+ * All vector length selection from userspace comes through here.
+ * We're on a slow path, so some sanity-checks are included.
+ * If things go wrong there's a bug somewhere, but try to fall back to a
+ * safe choice.
+ */
+static unsigned int find_supported_vector_length(unsigned int vl)
+{
+	int bit;
+	int max_vl = sve_max_vl;
+
+	if (WARN_ON(!sve_vl_valid(vl)))
+		vl = SVE_VL_MIN;
+
+	if (WARN_ON(!sve_vl_valid(max_vl)))
+		max_vl = SVE_VL_MIN;
+
+	if (vl > max_vl)
+		vl = max_vl;
+
+	bit = find_next_bit(sve_vq_map, SVE_VQ_MAX,
+			    vq_to_bit(sve_vq_from_vl(vl)));
+	return sve_vl_from_vq(bit_to_vq(bit));
+}
+
 #define ZREG(sve_state, vq, n) ((char *)(sve_state) +		\
 	(SVE_SIG_ZREG_OFFSET(vq, n) - SVE_SIG_REGS_OFFSET))
 
@@ -364,6 +425,76 @@ void sve_alloc(struct task_struct *task)
 	BUG_ON(!task->thread.sve_state);
 }
 
+int sve_set_vector_length(struct task_struct *task,
+			  unsigned long vl, unsigned long flags)
+{
+	if (flags & ~(unsigned long)(PR_SVE_VL_INHERIT |
+				     PR_SVE_SET_VL_ONEXEC))
+		return -EINVAL;
+
+	if (!sve_vl_valid(vl))
+		return -EINVAL;
+
+	/*
+	 * Clamp to the maximum vector length that VL-agnostic SVE code can
+	 * work with.  A flag may be assigned in the future to allow setting
+	 * of larger vector lengths without confusing older software.
+	 */
+	if (vl > SVE_VL_ARCH_MAX)
+		vl = SVE_VL_ARCH_MAX;
+
+	vl = find_supported_vector_length(vl);
+
+	if (flags & (PR_SVE_VL_INHERIT |
+		     PR_SVE_SET_VL_ONEXEC))
+		task->thread.sve_vl_onexec = vl;
+	else
+		/* Reset VL to system default on next exec: */
+		task->thread.sve_vl_onexec = 0;
+
+	/* Only actually set the VL if not deferred: */
+	if (flags & PR_SVE_SET_VL_ONEXEC)
+		goto out;
+
+	if (vl == task->thread.sve_vl)
+		goto out;
+
+	/*
+	 * To ensure the FPSIMD bits of the SVE vector registers are preserved,
+	 * write any live register state back to task_struct, and convert to a
+	 * non-SVE thread.
+	 */
+	if (task == current) {
+		local_bh_disable();
+
+		task_fpsimd_save();
+		set_thread_flag(TIF_FOREIGN_FPSTATE);
+	}
+
+	fpsimd_flush_task_state(task);
+	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
+		sve_to_fpsimd(task);
+
+	if (task == current)
+		local_bh_enable();
+
+	/*
+	 * Force reallocation of task SVE state to the correct size
+	 * on next use:
+	 */
+	sve_free(task);
+
+	task->thread.sve_vl = vl;
+
+out:
+	if (flags & PR_SVE_VL_INHERIT)
+		set_tsk_thread_flag(task, TIF_SVE_VL_INHERIT);
+	else
+		clear_tsk_thread_flag(task, TIF_SVE_VL_INHERIT);
+
+	return 0;
+}
+
 /*
  * Called from the put_task_struct() path, which cannot get here
  * unless dead_task is really dead and not schedulable.
@@ -480,7 +611,7 @@ void fpsimd_thread_switch(struct task_struct *next)
 
 void fpsimd_flush_thread(void)
 {
-	int vl;
+	int vl, supported_vl;
 
 	if (!system_supports_fpsimd())
 		return;
@@ -508,6 +639,10 @@ void fpsimd_flush_thread(void)
 		if (WARN_ON(!sve_vl_valid(vl)))
 			vl = SVE_VL_MIN;
 
+		supported_vl = find_supported_vector_length(vl);
+		if (WARN_ON(supported_vl != vl))
+			vl = supported_vl;
+
 		current->thread.sve_vl = vl;
 
 		/*
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759..1b64901 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,9 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER		3
 # define PR_CAP_AMBIENT_CLEAR_ALL	4
 
+/* arm64 Scalable Vector Extension controls */
+# define PR_SVE_SET_VL_ONEXEC		(1 << 18) /* defer effect until exec */
+# define PR_SVE_VL_LEN_MASK		0xffff
+# define PR_SVE_VL_INHERIT		(1 << 17) /* inherit across exec */
+
 #endif /* _LINUX_PRCTL_H */
-- 
2.1.4