* [PATCH v9 0/2] blk-mq: Rework blk-mq timeout handling again @ 2018-05-14 18:46 Bart Van Assche 2018-05-14 18:46 ` [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 Bart Van Assche 2018-05-14 18:46 ` [PATCH v9 2/2] blk-mq: Rework blk-mq timeout handling again Bart Van Assche 0 siblings, 2 replies; 14+ messages in thread From: Bart Van Assche @ 2018-05-14 18:46 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, linux-kernel, Christoph Hellwig, Bart Van Assche Hello Jens, This is the ninth incarnation of the blk-mq timeout handling rework. All previously posted comments have been addressed. Please consider this patch series for inclusion in the upstream kernel. Bart. Changes compared to v8: - Split into two patches. - Moved the spin_lock_init() call from blk_mq_rq_ctx_init() into blk_mq_init_request(). - Fixed the deadline set by blk_add_timer(). - Surrounded the das_lock member with #ifndef CONFIG_ARCH_HAVE_CMPXCHG64 / #endif. Changes compared to v7: - Fixed the generation number mechanism. Note: with this patch applied the behavior of the block layer does not depend on the generation number. - Added more 32-bit architectures to the list of architectures on which cmpxchg64() should not be used. Changes compared to v6: - Used a union instead of bit manipulations to store multiple values into a single 64-bit field. - Reduced the size of the timeout field from 64 to 32 bits. - Made sure that the block layer still builds with this patch applied for the sh and mips architectures. - Fixed two sparse warnings that were introduced by this patch in the WRITE_ONCE() calls. Changes compared to v5: - Restored the synchronize_rcu() call between marking a request for timeout handling and the actual timeout handling to avoid that timeout handling starts while .queue_rq() is still in progress if the timeout is very short. - Only use cmpxchg() if another context could attempt to change the request state concurrently. Use WRITE_ONCE() otherwise. 
Changes compared to v4: - Addressed multiple review comments from Christoph. The most important are that atomic_long_cmpxchg() has been changed into cmpxchg() and also that there is now a nice and clean split between the legacy and blk-mq versions of blk_add_timer(). - Changed the patch name and modified the patch description because there is disagreement about whether or not the v4.16 blk-mq core can complete a single request twice. Kept the "Cc: stable" tag because of https://bugzilla.kernel.org/show_bug.cgi?id=199077. Changes compared to v3 (see also https://www.mail-archive.com/linux-block@vger.kernel.org/msg20073.html): - Removed the spinlock again that was introduced to protect the request state. v4 uses atomic_long_cmpxchg() instead. - Split __deadline into two variables - one for the legacy block layer and one for blk-mq. Changes compared to v2 (https://www.mail-archive.com/linux-block@vger.kernel.org/msg18338.html): - Rebased and retested on top of kernel v4.16. Changes compared to v1 (https://www.mail-archive.com/linux-block@vger.kernel.org/msg18089.html): - Removed the gstate and aborted_gstate members of struct request and used the __deadline member to encode both the generation and state information. 
Bart Van Assche (2): arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 blk-mq: Rework blk-mq timeout handling again .../features/locking/cmpxchg64/arch-support.txt | 31 ++++ arch/Kconfig | 3 + arch/alpha/Kconfig | 1 + arch/arm/Kconfig | 1 + arch/arm64/Kconfig | 1 + arch/ia64/Kconfig | 1 + arch/m68k/Kconfig | 1 + arch/mips/Kconfig | 1 + arch/parisc/Kconfig | 1 + arch/powerpc/Kconfig | 1 + arch/s390/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/x86/Kconfig | 1 + arch/xtensa/Kconfig | 1 + block/blk-core.c | 6 - block/blk-mq-debugfs.c | 1 - block/blk-mq.c | 183 ++++++--------------- block/blk-mq.h | 117 ++++++++++--- block/blk-timeout.c | 95 ++++++----- block/blk.h | 14 +- include/linux/blkdev.h | 47 +++--- 21 files changed, 276 insertions(+), 233 deletions(-) create mode 100644 Documentation/features/locking/cmpxchg64/arch-support.txt -- 2.16.3 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 2018-05-14 18:46 [PATCH v9 0/2] blk-mq: Rework blk-mq timeout handling again Bart Van Assche @ 2018-05-14 18:46 ` Bart Van Assche 2018-05-14 18:50 ` Max Filippov ` (2 more replies) 2018-05-14 18:46 ` [PATCH v9 2/2] blk-mq: Rework blk-mq timeout handling again Bart Van Assche 1 sibling, 3 replies; 14+ messages in thread From: Bart Van Assche @ 2018-05-14 18:46 UTC (permalink / raw) To: Jens Axboe Cc: linux-block, linux-kernel, Christoph Hellwig, Bart Van Assche, Catalin Marinas, Will Deacon, Tony Luck, Fenghua Yu, Geert Uytterhoeven, James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Martin Schwidefsky, Heiko Carstens, David S . Miller, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Chris Zankel, Max Filippov, Arnd Bergmann The next patch in this series introduces a call to cmpxchg64() in the block layer core for those architectures on which this functionality is available. Make it possible to test whether cmpxchg64() is available by introducing CONFIG_ARCH_HAVE_CMPXCHG64. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. 
Peter Anvin <hpa@zytor.com> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> --- .../features/locking/cmpxchg64/arch-support.txt | 31 ++++++++++++++++++++++ arch/Kconfig | 3 +++ arch/alpha/Kconfig | 1 + arch/arm/Kconfig | 1 + arch/arm64/Kconfig | 1 + arch/ia64/Kconfig | 1 + arch/m68k/Kconfig | 1 + arch/mips/Kconfig | 1 + arch/parisc/Kconfig | 1 + arch/powerpc/Kconfig | 1 + arch/s390/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/x86/Kconfig | 1 + arch/xtensa/Kconfig | 1 + 14 files changed, 46 insertions(+) create mode 100644 Documentation/features/locking/cmpxchg64/arch-support.txt diff --git a/Documentation/features/locking/cmpxchg64/arch-support.txt b/Documentation/features/locking/cmpxchg64/arch-support.txt new file mode 100644 index 000000000000..65b3290ce5d5 --- /dev/null +++ b/Documentation/features/locking/cmpxchg64/arch-support.txt @@ -0,0 +1,31 @@ +# +# Feature name: cmpxchg64 +# Kconfig: ARCH_HAVE_CMPXCHG64 +# description: arch supports the cmpxchg64() API +# + ----------------------- + | arch |status| + ----------------------- + | alpha: | ok | + | arc: | TODO | + | arm: |!thumb| + | arm64: | ok | + | c6x: | TODO | + | h8300: | TODO | + | hexagon: | TODO | + | ia64: | ok | + | m68k: | ok | + | microblaze: | TODO | + | mips: |64-bit| + | nios2: | TODO | + | openrisc: | TODO | + | parisc: | ok | + | powerpc: |64-bit| + | s390: | ok | + | sh: | TODO | + | sparc: | ok | + | um: | TODO | + | unicore32: | TODO | + | x86: | ok | + | xtensa: | ok | + ----------------------- diff --git a/arch/Kconfig b/arch/Kconfig index 8e0d665c8d53..bd54eb125b15 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -358,6 +358,9 @@ config HAVE_ALIGNED_STRUCT_PAGE on a struct page for better performance. However selecting this might increase the size of a struct page by a word. 
+config ARCH_HAVE_CMPXCHG64 + bool + config HAVE_CMPXCHG_LOCAL bool diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig index b2022885ced8..94fc28e30ed2 100644 --- a/arch/alpha/Kconfig +++ b/arch/alpha/Kconfig @@ -16,6 +16,7 @@ config ALPHA select GENERIC_IRQ_SHOW select ARCH_WANT_IPC_PARSE_VERSION select ARCH_HAVE_NMI_SAFE_CMPXCHG + select ARCH_HAVE_CMPXCHG64 select AUDIT_ARCH select GENERIC_CLOCKEVENTS select GENERIC_CPU_VULNERABILITIES diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index a7f8e7f4b88f..7be06e46d329 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -21,6 +21,7 @@ config ARM select ARCH_SUPPORTS_ATOMIC_RMW select ARCH_USE_BUILTIN_BSWAP select ARCH_USE_CMPXCHG_LOCKREF + select ARCH_HAVE_CMPXCHG64 if !THUMB2_KERNEL select ARCH_WANT_IPC_PARSE_VERSION select BUILDTIME_EXTABLE_SORT if MMU select CLONE_BACKWARDS diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index eb2cf4938f6d..998e454a51b5 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -23,6 +23,7 @@ config ARM64 select ARCH_HAS_STRICT_MODULE_RWX select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_HAVE_NMI_SAFE_CMPXCHG + select ARCH_HAVE_CMPXCHG64 select ARCH_INLINE_READ_LOCK if !PREEMPT select ARCH_INLINE_READ_LOCK_BH if !PREEMPT select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPT diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index bbe12a038d21..b99a7fd9efec 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -42,6 +42,7 @@ config IA64 select GENERIC_IRQ_SHOW select GENERIC_IRQ_LEGACY select ARCH_HAVE_NMI_SAFE_CMPXCHG + select ARCH_HAVE_CMPXCHG64 select GENERIC_IOMAP select GENERIC_SMP_IDLE_THREAD select ARCH_TASK_STRUCT_ON_STACK diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig index 785612b576f7..9249de31414b 100644 --- a/arch/m68k/Kconfig +++ b/arch/m68k/Kconfig @@ -12,6 +12,7 @@ config M68K select HAVE_UID16 select VIRT_TO_BUS select ARCH_HAVE_NMI_SAFE_CMPXCHG if RMW_INSNS + select ARCH_HAVE_CMPXCHG64 select GENERIC_CPU_DEVICES select 
GENERIC_IOMAP select GENERIC_STRNCPY_FROM_USER if MMU diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 225c95da23ce..7f453e7cb9e4 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -10,6 +10,7 @@ config MIPS select ARCH_SUPPORTS_UPROBES select ARCH_USE_BUILTIN_BSWAP select ARCH_USE_CMPXCHG_LOCKREF if 64BIT + select ARCH_HAVE_CMPXCHG64 if 64BIT select ARCH_USE_QUEUED_RWLOCKS select ARCH_USE_QUEUED_SPINLOCKS select ARCH_WANT_IPC_PARSE_VERSION diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig index fc5a574c3482..deae8720861e 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -31,6 +31,7 @@ config PARISC select GENERIC_IRQ_PROBE select GENERIC_PCI_IOMAP select ARCH_HAVE_NMI_SAFE_CMPXCHG + select ARCH_HAVE_CMPXCHG64 select GENERIC_SMP_IDLE_THREAD select GENERIC_CPU_DEVICES select GENERIC_STRNCPY_FROM_USER diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index c32a181a7cbb..315fc1e46baa 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -150,6 +150,7 @@ config PPC select ARCH_HAS_UBSAN_SANITIZE_ALL select ARCH_HAS_ZONE_DEVICE if PPC_BOOK3S_64 select ARCH_HAVE_NMI_SAFE_CMPXCHG + select ARCH_HAVE_CMPXCHG64 select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_MIGHT_HAVE_PC_SERIO select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 199ac3e4da1d..b06b34ae8b3d 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -74,6 +74,7 @@ config S390 select ARCH_HAS_STRICT_MODULE_RWX select ARCH_HAS_UBSAN_SANITIZE_ALL select ARCH_HAVE_NMI_SAFE_CMPXCHG + select ARCH_HAVE_CMPXCHG64 select ARCH_INLINE_READ_LOCK select ARCH_INLINE_READ_LOCK_BH select ARCH_INLINE_READ_LOCK_IRQ diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 8767e45f1b2b..cd41bf0d5afd 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -76,6 +76,7 @@ config SPARC64 select PERF_USE_VMALLOC select IRQ_PREFLOW_FASTEOI select ARCH_HAVE_NMI_SAFE_CMPXCHG + select ARCH_HAVE_CMPXCHG64 select 
HAVE_C_RECORDMCOUNT select NO_BOOTMEM select HAVE_ARCH_AUDITSYSCALL diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c07f492b871a..a5bcbb179ff2 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -68,6 +68,7 @@ config X86 select ARCH_HAS_UBSAN_SANITIZE_ALL select ARCH_HAS_ZONE_DEVICE if X86_64 select ARCH_HAVE_NMI_SAFE_CMPXCHG + select ARCH_HAVE_CMPXCHG64 select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_MIGHT_HAVE_PC_SERIO diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig index c921e8bccdc8..8234278a821d 100644 --- a/arch/xtensa/Kconfig +++ b/arch/xtensa/Kconfig @@ -23,6 +23,7 @@ config XTENSA select HAVE_DMA_CONTIGUOUS select HAVE_EXIT_THREAD select HAVE_FUNCTION_TRACER + select ARCH_HAVE_CMPXCHG64 select HAVE_FUTEX_CMPXCHG if !MMU select HAVE_HW_BREAKPOINT if PERF_EVENTS select HAVE_IRQ_TIME_ACCOUNTING -- 2.16.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 2018-05-14 18:46 ` [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 Bart Van Assche @ 2018-05-14 18:50 ` Max Filippov 2018-05-14 20:42 ` Bart Van Assche 2018-05-15 2:54 ` Michael Ellerman 2018-05-15 8:15 ` Andrea Parri 2 siblings, 1 reply; 14+ messages in thread From: Max Filippov @ 2018-05-14 18:50 UTC (permalink / raw) To: Bart Van Assche Cc: Jens Axboe, linux-block, LKML, Christoph Hellwig, Catalin Marinas, Will Deacon, Tony Luck, Fenghua Yu, Geert Uytterhoeven, James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Martin Schwidefsky, Heiko Carstens, David S . Miller, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Chris Zankel, Arnd Bergmann On Mon, May 14, 2018 at 11:46 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote: > The next patch in this series introduces a call to cmpxchg64() > in the block layer core for those architectures on which this > functionality is available. Make it possible to test whether > cmpxchg64() is available by introducing CONFIG_ARCH_HAVE_CMPXCHG64. > > --- [...] > diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig > index c921e8bccdc8..8234278a821d 100644 > --- a/arch/xtensa/Kconfig > +++ b/arch/xtensa/Kconfig > @@ -23,6 +23,7 @@ config XTENSA > select HAVE_DMA_CONTIGUOUS > select HAVE_EXIT_THREAD > select HAVE_FUNCTION_TRACER > + select ARCH_HAVE_CMPXCHG64 This breaks alphabetical sorting of Kconfig entries. > select HAVE_FUTEX_CMPXCHG if !MMU > select HAVE_HW_BREAKPOINT if PERF_EVENTS > select HAVE_IRQ_TIME_ACCOUNTING -- Thanks. -- Max ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 @ 2018-05-14 20:42 ` Bart Van Assche 0 siblings, 0 replies; 14+ messages in thread From: Bart Van Assche @ 2018-05-14 20:42 UTC (permalink / raw) To: jcmvbkbc Cc: schwidefsky, linux-kernel, linux-block, tglx, deller, heiko.carstens, hch, fenghua.yu, axboe, hpa, catalin.marinas, will.deacon, mingo, jejb, tony.luck, chris, davem, arnd, geert, benh, mpe, paulus On Mon, 2018-05-14 at 11:50 -0700, Max Filippov wrote: > On Mon, May 14, 2018 at 11:46 AM, Bart Van Assche > <bart.vanassche@wdc.com> wrote: > > The next patch in this series introduces a call to cmpxchg64() > > in the block layer core for those architectures on which this > > functionality is available. Make it possible to test whether > > cmpxchg64() is available by introducing CONFIG_ARCH_HAVE_CMPXCHG64. > > > > --- > > [...] > > > diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig > > index c921e8bccdc8..8234278a821d 100644 > > --- a/arch/xtensa/Kconfig > > +++ b/arch/xtensa/Kconfig > > @@ -23,6 +23,7 @@ config XTENSA > > select HAVE_DMA_CONTIGUOUS > > select HAVE_EXIT_THREAD > > select HAVE_FUNCTION_TRACER > > + select ARCH_HAVE_CMPXCHG64 > > This breaks alphabetical sorting of Kconfig entries. Hello Max, Thanks for the feedback. Do you perhaps know whether keeping names in alphabetical order is a requirement for arch/xtensa/Kconfig only or whether this is required for all arch/*/Kconfig files? Bart. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 2018-05-14 20:42 ` Bart Van Assche (?) @ 2018-05-14 21:23 ` Geert Uytterhoeven -1 siblings, 0 replies; 14+ messages in thread From: Geert Uytterhoeven @ 2018-05-14 21:23 UTC (permalink / raw) To: Bart Van Assche Cc: jcmvbkbc, schwidefsky, linux-kernel, linux-block, tglx, deller, heiko.carstens, hch, fenghua.yu, axboe, hpa, catalin.marinas, will.deacon, mingo, jejb, tony.luck, chris, davem, arnd, benh, mpe, paulus Hi Bart, On Mon, May 14, 2018 at 10:42 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Mon, 2018-05-14 at 11:50 -0700, Max Filippov wrote: >> On Mon, May 14, 2018 at 11:46 AM, Bart Van Assche >> <bart.vanassche@wdc.com> wrote: >> > The next patch in this series introduces a call to cmpxchg64() >> > in the block layer core for those architectures on which this >> > functionality is available. Make it possible to test whether >> > cmpxchg64() is available by introducing CONFIG_ARCH_HAVE_CMPXCHG64. >> > >> > --- >> >> [...] >> >> > diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig >> > index c921e8bccdc8..8234278a821d 100644 >> > --- a/arch/xtensa/Kconfig >> > +++ b/arch/xtensa/Kconfig >> > @@ -23,6 +23,7 @@ config XTENSA >> > select HAVE_DMA_CONTIGUOUS >> > select HAVE_EXIT_THREAD >> > select HAVE_FUNCTION_TRACER >> > + select ARCH_HAVE_CMPXCHG64 >> >> This breaks alphabetical sorting of Kconfig entries. > > Hello Max, > > Thanks for the feedback. Do you perhaps know whether keeping names in > alphabetical order is a requirement for arch/xtensa/Kconfig only or > whether this is required for all arch/*/Kconfig files? It's a good practice anyway, as it reduces the probability of merge conflicts. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. 
-- Linus Torvalds ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 2018-05-14 18:46 ` [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 Bart Van Assche 2018-05-14 18:50 ` Max Filippov @ 2018-05-15 2:54 ` Michael Ellerman 2018-05-15 15:14 ` Bart Van Assche 2018-05-15 8:15 ` Andrea Parri 2 siblings, 1 reply; 14+ messages in thread From: Michael Ellerman @ 2018-05-15 2:54 UTC (permalink / raw) To: Bart Van Assche, Jens Axboe Cc: linux-block, linux-kernel, Christoph Hellwig, Bart Van Assche, Catalin Marinas, Will Deacon, Tony Luck, Fenghua Yu, Geert Uytterhoeven, James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt, Paul Mackerras, Martin Schwidefsky, Heiko Carstens, David S . Miller, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Chris Zankel, Max Filippov, Arnd Bergmann Hi Bart, Bart Van Assche <bart.vanassche@wdc.com> writes: > ... > diff --git a/Documentation/features/locking/cmpxchg64/arch-support.txt b/Documentation/features/locking/cmpxchg64/arch-support.txt > new file mode 100644 > index 000000000000..65b3290ce5d5 > --- /dev/null > +++ b/Documentation/features/locking/cmpxchg64/arch-support.txt > @@ -0,0 +1,31 @@ > +# > +# Feature name: cmpxchg64 > +# Kconfig: ARCH_HAVE_CMPXCHG64 > +# description: arch supports the cmpxchg64() API > +# > + ----------------------- > + | arch |status| > + ----------------------- > + | alpha: | ok | > + | arc: | TODO | > + | arm: |!thumb| > + | arm64: | ok | > + | c6x: | TODO | > + | h8300: | TODO | > + | hexagon: | TODO | > + | ia64: | ok | > + | m68k: | ok | > + | microblaze: | TODO | > + | mips: |64-bit| > + | nios2: | TODO | > + | openrisc: | TODO | > + | parisc: | ok | > + | powerpc: |64-bit| I think that is correct for powerpc, we don't have a 32-bit implementation and there's no fallback it seems. 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -150,6 +150,7 @@ config PPC > select ARCH_HAS_UBSAN_SANITIZE_ALL > select ARCH_HAS_ZONE_DEVICE if PPC_BOOK3S_64 > select ARCH_HAVE_NMI_SAFE_CMPXCHG > + select ARCH_HAVE_CMPXCHG64 So shouldn't this be: + select ARCH_HAVE_CMPXCHG64 if PPC64 And it should be sorted alphabetically, i.e. above the previous NMI entry. cheers ^ permalink raw reply [flat|nested] 14+ messages in thread
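Folding Michael's two remarks into the hunk, the powerpc select block would presumably end up as below. This is a sketch of the suggested fix only (conditional select plus alphabetical placement), not a hunk from a posted revision.

```kconfig
config PPC
	select ARCH_HAS_UBSAN_SANITIZE_ALL
	select ARCH_HAS_ZONE_DEVICE		if PPC_BOOK3S_64
	select ARCH_HAVE_CMPXCHG64		if PPC64
	select ARCH_HAVE_NMI_SAFE_CMPXCHG
```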
* Re: [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 @ 2018-05-15 15:14 ` Bart Van Assche 0 siblings, 0 replies; 14+ messages in thread From: Bart Van Assche @ 2018-05-15 15:14 UTC (permalink / raw) To: mpe, axboe Cc: linux-kernel, linux-block, jcmvbkbc, tglx, deller, heiko.carstens, hch, fenghua.yu, hpa, catalin.marinas, will.deacon, mingo, jejb, tony.luck, chris, davem, arnd, geert, benh, schwidefsky, paulus On Tue, 2018-05-15 at 12:54 +1000, Michael Ellerman wrote: > Bart Van Assche <bart.vanassche@wdc.com> writes: > > > > + ----------------------- > > + | arch |status| > > + ----------------------- > > + | alpha: | ok | > > + | arc: | TODO | > > + | arm: |!thumb| > > + | arm64: | ok | > > + | c6x: | TODO | > > + | h8300: | TODO | > > + | hexagon: | TODO | > > + | ia64: | ok | > > + | m68k: | ok | > > + | microblaze: | TODO | > > + | mips: |64-bit| > > + | nios2: | TODO | > > + | openrisc: | TODO | > > + | parisc: | ok | > > + | powerpc: |64-bit| > > I think that is correct for powerpc, we don't have a 32-bit > implementation and there's no fallback it seems. > > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > > --- a/arch/powerpc/Kconfig > > +++ b/arch/powerpc/Kconfig > > @@ -150,6 +150,7 @@ config PPC > > select ARCH_HAS_UBSAN_SANITIZE_ALL > > select ARCH_HAS_ZONE_DEVICE if PPC_BOOK3S_64 > > select ARCH_HAVE_NMI_SAFE_CMPXCHG > > + select ARCH_HAVE_CMPXCHG64 > > So shouldn't this should be: > > + select ARCH_HAVE_CMPXCHG64 if PPC64 > > And it should be sorted alphabetically, ie. above the previous NMI entry. Hello Michael, Thanks, I will make these changes. Bart. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 2018-05-14 18:46 ` [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 Bart Van Assche 2018-05-14 18:50 ` Max Filippov 2018-05-15 2:54 ` Michael Ellerman @ 2018-05-15 8:15 ` Andrea Parri 2 siblings, 0 replies; 14+ messages in thread From: Andrea Parri @ 2018-05-15 8:15 UTC (permalink / raw) To: Bart Van Assche Cc: Jens Axboe, linux-block, linux-kernel, Christoph Hellwig, Catalin Marinas, Will Deacon, Tony Luck, Fenghua Yu, Geert Uytterhoeven, James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Martin Schwidefsky, Heiko Carstens, David S . Miller, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Chris Zankel, Max Filippov, Arnd Bergmann Hi Bart, On Mon, May 14, 2018 at 11:46:33AM -0700, Bart Van Assche wrote: [...] > diff --git a/Documentation/features/locking/cmpxchg64/arch-support.txt b/Documentation/features/locking/cmpxchg64/arch-support.txt > new file mode 100644 > index 000000000000..65b3290ce5d5 > --- /dev/null > +++ b/Documentation/features/locking/cmpxchg64/arch-support.txt > @@ -0,0 +1,31 @@ > +# > +# Feature name: cmpxchg64 > +# Kconfig: ARCH_HAVE_CMPXCHG64 > +# description: arch supports the cmpxchg64() API > +# > + ----------------------- > + | arch |status| > + ----------------------- > + | alpha: | ok | > + | arc: | TODO | > + | arm: |!thumb| > + | arm64: | ok | > + | c6x: | TODO | > + | h8300: | TODO | > + | hexagon: | TODO | > + | ia64: | ok | > + | m68k: | ok | > + | microblaze: | TODO | > + | mips: |64-bit| > + | nios2: | TODO | > + | openrisc: | TODO | > + | parisc: | ok | > + | powerpc: |64-bit| > + | s390: | ok | > + | sh: | TODO | > + | sparc: | ok | > + | um: | TODO | > + | unicore32: | TODO | > + | x86: | ok | > + | xtensa: | ok | > + ----------------------- nds32 and riscv seem to be missing from the table. 
I'd also suggest sticking to the three entries documented in Documentation/features/arch-support.txt and using the header comment to provide any additional information. A script that refreshes the arch support status file in place (from the Kconfig files) is currently available in linux-next: c.f., Documentation/features/scripts/features-refresh.sh Andrea ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v9 2/2] blk-mq: Rework blk-mq timeout handling again 2018-05-14 18:46 [PATCH v9 0/2] blk-mq: Rework blk-mq timeout handling again Bart Van Assche 2018-05-14 18:46 ` [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 Bart Van Assche @ 2018-05-14 18:46 ` Bart Van Assche 2018-05-15 13:49 ` Sebastian Ott 1 sibling, 1 reply; 14+ messages in thread From: Bart Van Assche @ 2018-05-14 18:46 UTC (permalink / raw) To: Jens Axboe Cc: linux-block, linux-kernel, Christoph Hellwig, Bart Van Assche, Tejun Heo, Jianchao Wang, Ming Lei, Sebastian Ott, Sagi Grimberg, Israel Rukshin, Max Gurtovoy Recently the blk-mq timeout handling code was reworked. See also Tejun Heo, "[PATCHSET v4] blk-mq: reimplement timeout handling", 08 Jan 2018 (https://www.mail-archive.com/linux-block@vger.kernel.org/msg16985.html). This patch reworks the blk-mq timeout handling code again. The timeout handling code is simplified by introducing a state machine per request. This change prevents the blk-mq timeout handling code from ignoring completions that occur after blk_mq_check_expired() has been called and before blk_mq_rq_timed_out() has reset rq->aborted_gstate. If a block driver timeout handler always returns BLK_EH_RESET_TIMER, the request never terminates. Fix this race as follows: - Replace the __deadline member of struct request with a new member called das that contains the generation number, state and deadline. Only 32 bits are used for the deadline field such that all three fields occupy only 64 bits. This change reduces the maximum supported request timeout value from (2**63/HZ) to (2**31/HZ). - Remove all request member variables that became superfluous due to this change: gstate, gstate_seq and aborted_gstate_sync. - Remove the request state information that became superfluous due to this patch, namely RQF_MQ_TIMEOUT_EXPIRED. 
- Remove the code that became superfluous due to this change, namely the RCU lock and unlock statements in blk_mq_complete_request() and also the synchronize_rcu() call in the timeout handler. Notes: - A spinlock is used to protect atomic changes of rq->das on those architectures that do not provide a cmpxchg64() implementation. - Atomic instructions are only used to update the request state if a concurrent request state change could be in progress. - blk_add_timer() has been split into two functions - one for the legacy block layer and one for blk-mq. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Sebastian Ott <sebott@linux.ibm.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Israel Rukshin <israelr@mellanox.com>, Cc: Max Gurtovoy <maxg@mellanox.com> --- block/blk-core.c | 6 -- block/blk-mq-debugfs.c | 1 - block/blk-mq.c | 183 ++++++++++++++----------------------------------- block/blk-mq.h | 117 +++++++++++++++++++++++-------- block/blk-timeout.c | 95 ++++++++++++++----------- block/blk.h | 14 ++-- include/linux/blkdev.h | 47 ++++++------- 7 files changed, 230 insertions(+), 233 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 341501c5e239..a7301fcda734 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -198,12 +198,6 @@ void blk_rq_init(struct request_queue *q, struct request *rq) rq->internal_tag = -1; rq->start_time_ns = ktime_get_ns(); rq->part = NULL; - seqcount_init(&rq->gstate_seq); - u64_stats_init(&rq->aborted_gstate_sync); - /* - * See comment of blk_mq_init_request - */ - WRITE_ONCE(rq->gstate, MQ_RQ_GEN_INC); } EXPORT_SYMBOL(blk_rq_init); diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 3080e18cb859..ffa622366922 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -344,7 +344,6 @@ static const char *const rqf_name[] = { RQF_NAME(STATS), 
RQF_NAME(SPECIAL_PAYLOAD), RQF_NAME(ZONE_WRITE_LOCKED), - RQF_NAME(MQ_TIMEOUT_EXPIRED), RQF_NAME(MQ_POLL_SLEPT), }; #undef RQF_NAME diff --git a/block/blk-mq.c b/block/blk-mq.c index 64630caaf27e..40c9aa085613 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -318,7 +318,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data, rq->special = NULL; /* tag was already set */ rq->extra_len = 0; - rq->__deadline = 0; + WARN_ON_ONCE(rq->das.state != MQ_RQ_IDLE); INIT_LIST_HEAD(&rq->timeout_list); rq->timeout = 0; @@ -494,7 +494,8 @@ void blk_mq_free_request(struct request *rq) if (blk_rq_rl(rq)) blk_put_rl(blk_rq_rl(rq)); - blk_mq_rq_update_state(rq, MQ_RQ_IDLE); + if (!blk_mq_change_rq_state(rq, blk_mq_rq_state(rq), MQ_RQ_IDLE)) + WARN_ON_ONCE(true); if (rq->tag != -1) blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag); if (sched_tag != -1) @@ -547,8 +548,7 @@ static void __blk_mq_complete_request(struct request *rq) bool shared = false; int cpu; - WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT); - blk_mq_rq_update_state(rq, MQ_RQ_COMPLETE); + WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_COMPLETE); if (rq->internal_tag != -1) blk_mq_sched_completed_request(rq); @@ -593,36 +593,6 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx) *srcu_idx = srcu_read_lock(hctx->srcu); } -static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate) -{ - unsigned long flags; - - /* - * blk_mq_rq_aborted_gstate() is used from the completion path and - * can thus be called from irq context. u64_stats_fetch in the - * middle of update on the same CPU leads to lockup. Disable irq - * while updating. 
- */ - local_irq_save(flags); - u64_stats_update_begin(&rq->aborted_gstate_sync); - rq->aborted_gstate = gstate; - u64_stats_update_end(&rq->aborted_gstate_sync); - local_irq_restore(flags); -} - -static u64 blk_mq_rq_aborted_gstate(struct request *rq) -{ - unsigned int start; - u64 aborted_gstate; - - do { - start = u64_stats_fetch_begin(&rq->aborted_gstate_sync); - aborted_gstate = rq->aborted_gstate; - } while (u64_stats_fetch_retry(&rq->aborted_gstate_sync, start)); - - return aborted_gstate; -} - /** * blk_mq_complete_request - end I/O on a request * @rq: the request being processed @@ -634,27 +604,12 @@ static u64 blk_mq_rq_aborted_gstate(struct request *rq) void blk_mq_complete_request(struct request *rq) { struct request_queue *q = rq->q; - struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu); - int srcu_idx; if (unlikely(blk_should_fake_timeout(q))) return; - /* - * If @rq->aborted_gstate equals the current instance, timeout is - * claiming @rq and we lost. This is synchronized through - * hctx_lock(). See blk_mq_timeout_work() for details. - * - * Completion path never blocks and we can directly use RCU here - * instead of hctx_lock() which can be either RCU or SRCU. - * However, that would complicate paths which want to synchronize - * against us. Let stay in sync with the issue path so that - * hctx_lock() covers both issue and completion paths. - */ - hctx_lock(hctx, &srcu_idx); - if (blk_mq_rq_aborted_gstate(rq) != rq->gstate) + if (blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE)) __blk_mq_complete_request(rq); - hctx_unlock(hctx, srcu_idx); } EXPORT_SYMBOL(blk_mq_complete_request); @@ -681,27 +636,7 @@ void blk_mq_start_request(struct request *rq) wbt_issue(q->rq_wb, rq); } - WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IDLE); - - /* - * Mark @rq in-flight which also advances the generation number, - * and register for timeout. 
Protect with a seqcount to allow the - * timeout path to read both @rq->gstate and @rq->deadline - * coherently. - * - * This is the only place where a request is marked in-flight. If - * the timeout path reads an in-flight @rq->gstate, the - * @rq->deadline it reads together under @rq->gstate_seq is - * guaranteed to be the matching one. - */ - preempt_disable(); - write_seqcount_begin(&rq->gstate_seq); - - blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT); - blk_add_timer(rq); - - write_seqcount_end(&rq->gstate_seq); - preempt_enable(); + blk_mq_add_timer(rq, MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT); if (q->dma_drain_size && blk_rq_bytes(rq)) { /* @@ -714,22 +649,19 @@ void blk_mq_start_request(struct request *rq) } EXPORT_SYMBOL(blk_mq_start_request); -/* - * When we reach here because queue is busy, it's safe to change the state - * to IDLE without checking @rq->aborted_gstate because we should still be - * holding the RCU read lock and thus protected against timeout. - */ static void __blk_mq_requeue_request(struct request *rq) { struct request_queue *q = rq->q; + enum mq_rq_state old_state = blk_mq_rq_state(rq); blk_mq_put_driver_tag(rq); trace_block_rq_requeue(q, rq); wbt_requeue(q->rq_wb, rq); - if (blk_mq_rq_state(rq) != MQ_RQ_IDLE) { - blk_mq_rq_update_state(rq, MQ_RQ_IDLE); + if (old_state != MQ_RQ_IDLE) { + if (!blk_mq_change_rq_state(rq, old_state, MQ_RQ_IDLE)) + WARN_ON_ONCE(true); if (q->dma_drain_size && blk_rq_bytes(rq)) rq->nr_phys_segments--; } @@ -838,8 +770,6 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved) const struct blk_mq_ops *ops = req->q->mq_ops; enum blk_eh_timer_return ret = BLK_EH_RESET_TIMER; - req->rq_flags |= RQF_MQ_TIMEOUT_EXPIRED; - if (ops->timeout) ret = ops->timeout(req, reserved); @@ -848,13 +778,7 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved) __blk_mq_complete_request(req); break; case BLK_EH_RESET_TIMER: - /* - * As nothing prevents from completion happening while - * ->aborted_gstate is set, 
this may lead to ignored - * completions and further spurious timeouts. - */ - blk_mq_rq_update_aborted_gstate(req, 0); - blk_add_timer(req); + blk_mq_add_timer(req, MQ_RQ_COMPLETE, MQ_RQ_IN_FLIGHT); break; case BLK_EH_NOT_HANDLED: break; @@ -868,48 +792,50 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx, struct request *rq, void *priv, bool reserved) { struct blk_mq_timeout_data *data = priv; - unsigned long gstate, deadline; - int start; - - might_sleep(); - - if (rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) - return; + union blk_deadline_and_state das = READ_ONCE(rq->das); + unsigned long now = jiffies; + int32_t diff_jiffies = das.deadline - now; + int32_t diff_next = das.deadline - data->next; + enum mq_rq_state rq_state = das.state; - /* read coherent snapshots of @rq->state_gen and @rq->deadline */ - while (true) { - start = read_seqcount_begin(&rq->gstate_seq); - gstate = READ_ONCE(rq->gstate); - deadline = blk_rq_deadline(rq); - if (!read_seqcount_retry(&rq->gstate_seq, start)) - break; - cond_resched(); - } - - /* if in-flight && overdue, mark for abortion */ - if ((gstate & MQ_RQ_STATE_MASK) == MQ_RQ_IN_FLIGHT && - time_after_eq(jiffies, deadline)) { - blk_mq_rq_update_aborted_gstate(rq, gstate); + /* + * Make sure that rq->aborted_gstate != rq->das if rq->deadline has not + * yet been reached even if a request gets recycled before + * blk_mq_terminate_expired() is called and the value of rq->deadline + * is not modified due to the request recycling. + */ + rq->aborted_gstate = das; + rq->aborted_gstate.generation ^= (1UL << 29); + if (diff_jiffies <= 0 && rq_state == MQ_RQ_IN_FLIGHT) { + /* request timed out */ + rq->aborted_gstate = das; data->nr_expired++; hctx->nr_expired++; - } else if (!data->next_set || time_after(data->next, deadline)) { - data->next = deadline; + } else if (!data->next_set) { + /* data->next is not yet set; set it to deadline. 
*/ + data->next = now + diff_jiffies; data->next_set = 1; + } else if (diff_next < 0) { + /* data->next is later than deadline; reduce data->next. */ + data->next += diff_next; } + } static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx, struct request *rq, void *priv, bool reserved) { + union blk_deadline_and_state old_val = rq->aborted_gstate; + union blk_deadline_and_state new_val = old_val; + + new_val.state = MQ_RQ_COMPLETE; + /* - * We marked @rq->aborted_gstate and waited for RCU. If there were - * completions that we lost to, they would have finished and - * updated @rq->gstate by now; otherwise, the completion path is - * now guaranteed to see @rq->aborted_gstate and yield. If - * @rq->aborted_gstate still matches @rq->gstate, @rq is ours. + * We marked @rq->aborted_gstate and waited for ongoing .queue_rq() + * calls. If rq->das has not changed that means that it + * is now safe to change the request state and to handle the timeout. */ - if (!(rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) && - READ_ONCE(rq->gstate) == rq->aborted_gstate) + if (blk_mq_set_rq_state(rq, old_val, new_val)) blk_mq_rq_timed_out(rq, reserved); } @@ -948,10 +874,10 @@ static void blk_mq_timeout_work(struct work_struct *work) bool has_rcu = false; /* - * Wait till everyone sees ->aborted_gstate. The - * sequential waits for SRCUs aren't ideal. If this ever - * becomes a problem, we can add per-hw_ctx rcu_head and - * wait in parallel. + * For very short timeouts it can happen that + * blk_mq_check_expired() modifies the state of a request + * while .queue_rq() is still in progress. Hence wait until + * these .queue_rq() calls have finished. */ queue_for_each_hw_ctx(q, hctx, i) { if (!hctx->nr_expired) @@ -967,7 +893,7 @@ static void blk_mq_timeout_work(struct work_struct *work) if (has_rcu) synchronize_rcu(); - /* terminate the ones we won */ + /* Terminate the requests marked by blk_mq_check_expired(). 
*/ blk_mq_queue_tag_busy_iter(q, blk_mq_terminate_expired, NULL); } @@ -2057,21 +1983,16 @@ static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq, { int ret; +#ifndef CONFIG_ARCH_HAVE_CMPXCHG64 + spin_lock_init(&rq->das_lock); +#endif + if (set->ops->init_request) { ret = set->ops->init_request(set, rq, hctx_idx, node); if (ret) return ret; } - seqcount_init(&rq->gstate_seq); - u64_stats_init(&rq->aborted_gstate_sync); - /* - * start gstate with gen 1 instead of 0, otherwise it will be equal - * to aborted_gstate, and be identified timed out by - * blk_mq_terminate_expired. - */ - WRITE_ONCE(rq->gstate, MQ_RQ_GEN_INC); - return 0; } diff --git a/block/blk-mq.h b/block/blk-mq.h index e1bb420dc5d6..42320011d264 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -2,6 +2,7 @@ #ifndef INT_BLK_MQ_H #define INT_BLK_MQ_H +#include <asm/cmpxchg.h> #include "blk-stat.h" #include "blk-mq-tag.h" @@ -30,18 +31,11 @@ struct blk_mq_ctx { struct kobject kobj; } ____cacheline_aligned_in_smp; -/* - * Bits for request->gstate. The lower two bits carry MQ_RQ_* state value - * and the upper bits the generation number. - */ +/* See also blk_deadline_and_state.state. 
*/ enum mq_rq_state { MQ_RQ_IDLE = 0, MQ_RQ_IN_FLIGHT = 1, MQ_RQ_COMPLETE = 2, - - MQ_RQ_STATE_BITS = 2, - MQ_RQ_STATE_MASK = (1 << MQ_RQ_STATE_BITS) - 1, - MQ_RQ_GEN_INC = 1 << MQ_RQ_STATE_BITS, }; void blk_mq_freeze_queue(struct request_queue *q); @@ -103,37 +97,106 @@ extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx); void blk_mq_release(struct request_queue *q); +static inline bool blk_mq_set_rq_state(struct request *rq, + union blk_deadline_and_state old_val, + union blk_deadline_and_state new_val) +{ +#ifdef CONFIG_ARCH_HAVE_CMPXCHG64 + return cmpxchg64(&rq->das.das, old_val.das, new_val.das) == old_val.das; +#else + unsigned long flags; + bool res = false; + + spin_lock_irqsave(&rq->das_lock, flags); + if (rq->das.das == old_val.das) { + rq->das = new_val; + res = true; + } + spin_unlock_irqrestore(&rq->das_lock, flags); + + return res; +#endif +} + +/* + * If the state of request @rq equals @old_state, update deadline and request + * state atomically to @time and @new_state. blk-mq only. cmpxchg64() is only + * used if there could be a concurrent update attempt from another context. 
+ */ +static inline bool blk_mq_rq_set_deadline(struct request *rq, + unsigned long new_time, + enum mq_rq_state old_state, + enum mq_rq_state new_state) +{ + union blk_deadline_and_state old_val, new_val; + + if (old_state != MQ_RQ_IN_FLIGHT) { + old_val = READ_ONCE(rq->das); + if (old_val.state != old_state) + return false; + new_val = old_val; + new_val.deadline = new_time; + new_val.state = new_state; + if (new_state == MQ_RQ_IN_FLIGHT) + new_val.generation++; + WRITE_ONCE(rq->das.das, new_val.das); + return true; + } + + do { + old_val = READ_ONCE(rq->das); + if (old_val.state != old_state) + return false; + new_val = old_val; + new_val.deadline = new_time; + new_val.state = new_state; + if (new_state == MQ_RQ_IN_FLIGHT) + new_val.generation++; + } while (!blk_mq_set_rq_state(rq, old_val, new_val)); + + return true; +} + /** * blk_mq_rq_state() - read the current MQ_RQ_* state of a request * @rq: target request. */ -static inline int blk_mq_rq_state(struct request *rq) +static inline enum mq_rq_state blk_mq_rq_state(struct request *rq) { - return READ_ONCE(rq->gstate) & MQ_RQ_STATE_MASK; + return rq->das.state; } /** - * blk_mq_rq_update_state() - set the current MQ_RQ_* state of a request - * @rq: target request. - * @state: new state to set. + * blk_mq_change_rq_state - atomically test and set request state + * @rq: Request pointer. + * @old_state: Old request state. + * @new_state: New request state. * - * Set @rq's state to @state. The caller is responsible for ensuring that - * there are no other updaters. A request can transition into IN_FLIGHT - * only from IDLE and doing so increments the generation number. + * Returns %true if and only if the old state was @old and if the state has + * been changed into @new. 
*/ -static inline void blk_mq_rq_update_state(struct request *rq, - enum mq_rq_state state) +static inline bool blk_mq_change_rq_state(struct request *rq, + enum mq_rq_state old_state, + enum mq_rq_state new_state) { - u64 old_val = READ_ONCE(rq->gstate); - u64 new_val = (old_val & ~MQ_RQ_STATE_MASK) | state; - - if (state == MQ_RQ_IN_FLIGHT) { - WARN_ON_ONCE((old_val & MQ_RQ_STATE_MASK) != MQ_RQ_IDLE); - new_val += MQ_RQ_GEN_INC; + union blk_deadline_and_state das = READ_ONCE(rq->das); + union blk_deadline_and_state old_val = das; + union blk_deadline_and_state new_val = das; + + old_val.state = old_state; + new_val.state = new_state; + if (new_state == MQ_RQ_IN_FLIGHT) + new_val.generation++; + /* + * For transitions from state in-flight to another state cmpxchg64() + * must be used. For other state transitions it is safe to use + * WRITE_ONCE(). + */ + if (old_state != MQ_RQ_IN_FLIGHT) { + WRITE_ONCE(rq->das.das, new_val.das); + return true; } - - /* avoid exposing interim values */ - WRITE_ONCE(rq->gstate, new_val); + return blk_mq_set_rq_state(rq, old_val, new_val); } static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q, diff --git a/block/blk-timeout.c b/block/blk-timeout.c index 652d4d4d3e97..549dfda7190c 100644 --- a/block/blk-timeout.c +++ b/block/blk-timeout.c @@ -162,8 +162,9 @@ void blk_abort_request(struct request *req) * immediately and that scan sees the new timeout value. * No need for fancy synchronizations. */ - blk_rq_set_deadline(req, jiffies); - kblockd_schedule_work(&req->q->timeout_work); + if (blk_mq_rq_set_deadline(req, jiffies, MQ_RQ_IN_FLIGHT, + MQ_RQ_IN_FLIGHT)) + kblockd_schedule_work(&req->q->timeout_work); } else { if (blk_mark_rq_complete(req)) return; @@ -184,52 +185,17 @@ unsigned long blk_rq_timeout(unsigned long timeout) return timeout; } -/** - * blk_add_timer - Start timeout timer for a single request - * @req: request that is about to start running. 
- * - * Notes: - * Each request has its own timer, and as it is added to the queue, we - * set up the timer. When the request completes, we cancel the timer. - */ -void blk_add_timer(struct request *req) +static void __blk_add_timer(struct request *req, unsigned long deadline) { struct request_queue *q = req->q; unsigned long expiry; - if (!q->mq_ops) - lockdep_assert_held(q->queue_lock); - - /* blk-mq has its own handler, so we don't need ->rq_timed_out_fn */ - if (!q->mq_ops && !q->rq_timed_out_fn) - return; - - BUG_ON(!list_empty(&req->timeout_list)); - - /* - * Some LLDs, like scsi, peek at the timeout to prevent a - * command from being retried forever. - */ - if (!req->timeout) - req->timeout = q->rq_timeout; - - blk_rq_set_deadline(req, jiffies + req->timeout); - req->rq_flags &= ~RQF_MQ_TIMEOUT_EXPIRED; - - /* - * Only the non-mq case needs to add the request to a protected list. - * For the mq case we simply scan the tag map. - */ - if (!q->mq_ops) - list_add_tail(&req->timeout_list, &req->q->timeout_list); - /* * If the timer isn't already pending or this timeout is earlier * than an existing one, modify the timer. Round up to next nearest * second. */ - expiry = blk_rq_timeout(round_jiffies_up(blk_rq_deadline(req))); - + expiry = blk_rq_timeout(round_jiffies_up(deadline)); if (!timer_pending(&q->timeout) || time_before(expiry, q->timeout.expires)) { unsigned long diff = q->timeout.expires - expiry; @@ -244,5 +210,56 @@ void blk_add_timer(struct request *req) if (!timer_pending(&q->timeout) || (diff >= HZ / 2)) mod_timer(&q->timeout, expiry); } +} + +/** + * blk_add_timer - Start timeout timer for a single request + * @req: request that is about to start running. + * + * Notes: + * Each request has its own timer, and as it is added to the queue, we + * set up the timer. When the request completes, we cancel the timer. 
+ */ +void blk_add_timer(struct request *req) +{ + struct request_queue *q = req->q; + unsigned long deadline; + + lockdep_assert_held(q->queue_lock); + + if (!q->rq_timed_out_fn) + return; + if (!req->timeout) + req->timeout = q->rq_timeout; + deadline = jiffies + req->timeout; + blk_rq_set_deadline(req, deadline); + list_add_tail(&req->timeout_list, &req->q->timeout_list); + + return __blk_add_timer(req, deadline); +} + +/** + * blk_mq_add_timer - set the deadline for a single request + * @req: request for which to set the deadline. + * @old: current request state. + * @new: new request state. + * + * Sets the deadline of a request if and only if it has state @old and + * at the same time changes the request state from @old into @new. The caller + * must guarantee that the request state won't be modified while this function + * is in progress. + */ +void blk_mq_add_timer(struct request *req, enum mq_rq_state old, + enum mq_rq_state new) +{ + struct request_queue *q = req->q; + unsigned long deadline; + + if (!req->timeout) + req->timeout = q->rq_timeout; + deadline = jiffies + req->timeout; + if (!blk_mq_rq_set_deadline(req, deadline, old, new)) + WARN_ON_ONCE(true); + return __blk_add_timer(req, deadline); } diff --git a/block/blk.h b/block/blk.h index eaf1a8e87d11..984367308b11 100644 --- a/block/blk.h +++ b/block/blk.h @@ -170,6 +170,8 @@ static inline bool bio_integrity_endio(struct bio *bio) void blk_timeout_work(struct work_struct *work); unsigned long blk_rq_timeout(unsigned long timeout); void blk_add_timer(struct request *req); +void blk_mq_add_timer(struct request *req, enum mq_rq_state old, + enum mq_rq_state new); void blk_delete_timer(struct request *); @@ -191,21 +193,21 @@ void blk_account_io_done(struct request *req, u64 now); /* * EH timer and IO completion will both attempt to 'grab' the request, make * sure that only one of them succeeds. Steal the bottom bit of the - * __deadline field for this. + * das field for this. 
*/ static inline int blk_mark_rq_complete(struct request *rq) { - return test_and_set_bit(0, &rq->__deadline); + return test_and_set_bit(0, &rq->das.legacy_deadline); } static inline void blk_clear_rq_complete(struct request *rq) { - clear_bit(0, &rq->__deadline); + clear_bit(0, &rq->das.legacy_deadline); } static inline bool blk_rq_is_complete(struct request *rq) { - return test_bit(0, &rq->__deadline); + return test_bit(0, &rq->das.legacy_deadline); } /* @@ -314,12 +316,12 @@ static inline void req_set_nomerge(struct request_queue *q, struct request *req) */ static inline void blk_rq_set_deadline(struct request *rq, unsigned long time) { - rq->__deadline = time & ~0x1UL; + rq->das.legacy_deadline = time & ~0x1UL; } static inline unsigned long blk_rq_deadline(struct request *rq) { - return rq->__deadline & ~0x1UL; + return rq->das.legacy_deadline & ~0x1UL; } /* diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f3999719f828..771dbdc8758c 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -27,8 +27,6 @@ #include <linux/percpu-refcount.h> #include <linux/scatterlist.h> #include <linux/blkzoned.h> -#include <linux/seqlock.h> -#include <linux/u64_stats_sync.h> struct module; struct scsi_ioctl_command; @@ -125,15 +123,23 @@ typedef __u32 __bitwise req_flags_t; #define RQF_SPECIAL_PAYLOAD ((__force req_flags_t)(1 << 18)) /* The per-zone write lock is held for this request */ #define RQF_ZONE_WRITE_LOCKED ((__force req_flags_t)(1 << 19)) -/* timeout is expired */ -#define RQF_MQ_TIMEOUT_EXPIRED ((__force req_flags_t)(1 << 20)) /* already slept for hybrid poll */ -#define RQF_MQ_POLL_SLEPT ((__force req_flags_t)(1 << 21)) +#define RQF_MQ_POLL_SLEPT ((__force req_flags_t)(1 << 20)) /* flags that prevent us from merging requests: */ #define RQF_NOMERGE_FLAGS \ (RQF_STARTED | RQF_SOFTBARRIER | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD) +union blk_deadline_and_state { + struct { + uint32_t generation:30; + uint32_t state:2; + uint32_t deadline; + }; 
+ unsigned long legacy_deadline; + uint64_t das; +}; + /* * Try to put the fields that are referenced together in the same cacheline. * @@ -236,29 +242,24 @@ struct request { unsigned int extra_len; /* length of alignment and padding */ - /* - * On blk-mq, the lower bits of ->gstate (generation number and - * state) carry the MQ_RQ_* state value and the upper bits the - * generation number which is monotonically incremented and used to - * distinguish the reuse instances. - * - * ->gstate_seq allows updates to ->gstate and other fields - * (currently ->deadline) during request start to be read - * atomically from the timeout path, so that it can operate on a - * coherent set of information. - */ - seqcount_t gstate_seq; - u64 gstate; - /* * ->aborted_gstate is used by the timeout to claim a specific * recycle instance of this request. See blk_mq_timeout_work(). */ - struct u64_stats_sync aborted_gstate_sync; - u64 aborted_gstate; + union blk_deadline_and_state aborted_gstate; + +#ifndef CONFIG_ARCH_HAVE_CMPXCHG64 + spinlock_t das_lock; +#endif - /* access through blk_rq_set_deadline, blk_rq_deadline */ - unsigned long __deadline; + /* + * Access through blk_rq_deadline() and blk_rq_set_deadline(), + * blk_mark_rq_complete(), blk_clear_rq_complete() and + * blk_rq_is_complete() for legacy queues or blk_mq_rq_state(), + * blk_mq_change_rq_state() and blk_mq_rq_set_deadline() for + * blk-mq queues. + */ + union blk_deadline_and_state das; struct list_head timeout_list; -- 2.16.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v9 2/2] blk-mq: Rework blk-mq timeout handling again 2018-05-14 18:46 ` [PATCH v9 2/2] blk-mq: Rework blk-mq timeout handling again Bart Van Assche @ 2018-05-15 13:49 ` Sebastian Ott 0 siblings, 0 replies; 14+ messages in thread From: Sebastian Ott @ 2018-05-15 13:49 UTC (permalink / raw) To: Bart Van Assche Cc: Jens Axboe, linux-block, linux-kernel, Christoph Hellwig, Tejun Heo, Jianchao Wang, Ming Lei, Sagi Grimberg, Israel Rukshin, Max Gurtovoy On Mon, 14 May 2018, Bart Van Assche wrote: > Recently the blk-mq timeout handling code was reworked. See also Tejun > Heo, "[PATCHSET v4] blk-mq: reimplement timeout handling", 08 Jan 2018 > (https://www.mail-archive.com/linux-block@vger.kernel.org/msg16985.html). > This patch reworks the blk-mq timeout handling code again. The timeout > handling code is simplified by introducing a state machine per request. > This change avoids that the blk-mq timeout handling code ignores > completions that occur after blk_mq_check_expired() has been called and > before blk_mq_rq_timed_out() has reset rq->aborted_gstate. If a block > driver timeout handler always returns BLK_EH_RESET_TIMER then the result > will be that the request never terminates. > > Fix this race as follows: > - Replace the __deadline member of struct request by a new member > called das that contains the generation number, state and deadline. > Only 32 bits are used for the deadline field such that all three > fields occupy only 64 bits. This change reduces the maximum supported > request timeout value from (2**63/HZ) to (2**31/HZ). > - Remove all request member variables that became superfluous due to > this change: gstate, gstate_seq and aborted_gstate_sync. > - Remove the request state information that became superfluous due to > this patch, namely RQF_MQ_TIMEOUT_EXPIRED. 
> - Remove the code that became superfluous due to this change, namely > the RCU lock and unlock statements in blk_mq_complete_request() and > also the synchronize_rcu() call in the timeout handler. > > Notes: > - A spinlock is used to protect atomic changes of rq->das on those > architectures that do not provide a cmpxchg64() implementation. > - Atomic instructions are only used to update the request state if > a concurrent request state change could be in progress. > - blk_add_timer() has been split into two functions - one for the > legacy block layer and one for blk-mq. > I tested your patch on top of block/for-next (with forced timeouts) - works as expected. The lockdep warnings with regard to gstate_seq are gone (surprise with gstate_seq gone) - thanks for that! Regards, Sebastian ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v10 0/2] blk-mq: Rework blk-mq timeout handling again @ 2018-05-15 22:51 Bart Van Assche 2018-05-15 22:51 ` [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 Bart Van Assche 0 siblings, 1 reply; 14+ messages in thread From: Bart Van Assche @ 2018-05-15 22:51 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, linux-kernel, Christoph Hellwig, Bart Van Assche Hello Jens, This is the tenth incarnation of the blk-mq timeout handling rework. All previously posted comments should have been addressed. Please consider this patch series for inclusion in the upstream kernel. Bart. Changes compared to v9: - Addressed multiple comments related to patch 1/2: added CONFIG_ARCH_HAVE_CMPXCHG64 for riscv, modified features/locking/cmpxchg64/arch-support.txt as requested and made the order of the symbols in the arch/*/Kconfig alphabetical where possible. Changes compared to v8: - Split into two patches. - Moved the spin_lock_init() call from blk_mq_rq_ctx_init() into blk_mq_init_request(). - Fixed the deadline set by blk_add_timer(). - Surrounded the das_lock member with #ifndef CONFIG_ARCH_HAVE_CMPXCHG64 / #endif. Changes compared to v7: - Fixed the generation number mechanism. Note: with this patch applied the behavior of the block layer does not depend on the generation number. - Added more 32-bit architectures to the list of architectures on which cmpxchg64() should not be used. Changes compared to v6: - Used a union instead of bit manipulations to store multiple values into a single 64-bit field. - Reduced the size of the timeout field from 64 to 32 bits. - Made sure that the block layer still builds with this patch applied for the sh and mips architectures. - Fixed two sparse warnings that were introduced by this patch in the WRITE_ONCE() calls. 
Changes compared to v5: - Restored the synchronize_rcu() call between marking a request for timeout handling and the actual timeout handling to avoid that timeout handling starts while .queue_rq() is still in progress if the timeout is very short. - Only use cmpxchg() if another context could attempt to change the request state concurrently. Use WRITE_ONCE() otherwise. Changes compared to v4: - Addressed multiple review comments from Christoph. The most important are that atomic_long_cmpxchg() has been changed into cmpxchg() and also that there is now a nice and clean split between the legacy and blk-mq versions of blk_add_timer(). - Changed the patch name and modified the patch description because there is disagreement about whether or not the v4.16 blk-mq core can complete a single request twice. Kept the "Cc: stable" tag because of https://bugzilla.kernel.org/show_bug.cgi?id=199077. Changes compared to v3 (see also https://www.mail-archive.com/linux-block@vger.kernel.org/msg20073.html): - Removed the spinlock again that was introduced to protect the request state. v4 uses atomic_long_cmpxchg() instead. - Split __deadline into two variables - one for the legacy block layer and one for blk-mq. Changes compared to v2 (https://www.mail-archive.com/linux-block@vger.kernel.org/msg18338.html): - Rebased and retested on top of kernel v4.16. Changes compared to v1 (https://www.mail-archive.com/linux-block@vger.kernel.org/msg18089.html): - Removed the gstate and aborted_gstate members of struct request and used the __deadline member to encode both the generation and state information. 
Bart Van Assche (2): arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 blk-mq: Rework blk-mq timeout handling again .../features/locking/cmpxchg64/arch-support.txt | 33 ++++ arch/Kconfig | 3 + arch/alpha/Kconfig | 1 + arch/arm/Kconfig | 1 + arch/arm64/Kconfig | 1 + arch/ia64/Kconfig | 1 + arch/m68k/Kconfig | 1 + arch/mips/Kconfig | 1 + arch/parisc/Kconfig | 1 + arch/powerpc/Kconfig | 1 + arch/riscv/Kconfig | 1 + arch/s390/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/x86/Kconfig | 1 + arch/xtensa/Kconfig | 1 + block/blk-core.c | 6 - block/blk-mq-debugfs.c | 1 - block/blk-mq.c | 183 ++++++--------------- block/blk-mq.h | 117 ++++++++++--- block/blk-timeout.c | 95 ++++++----- block/blk.h | 14 +- include/linux/blkdev.h | 47 +++--- 22 files changed, 279 insertions(+), 233 deletions(-) create mode 100644 Documentation/features/locking/cmpxchg64/arch-support.txt -- 2.16.3 ^ permalink raw reply [flat|nested] 14+ messages in thread
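The changelog item above about reducing the timeout field from 64 to 32 bits works because the expiry check in blk_mq_check_expired() compares jiffies through a signed 32-bit difference, which stays correct across 32-bit wraparound as long as timeouts are below 2**31 jiffies — hence the reduced maximum timeout of (2**31/HZ). A minimal sketch of that comparison (`deadline_expired()` is a name invented here, not a kernel function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the patch's expiry test: only the low 32 bits of jiffies are
 * stored in das.deadline, and (deadline - now) is interpreted as a signed
 * 32-bit value so the comparison survives wraparound. */
static bool deadline_expired(uint32_t deadline, uint32_t now)
{
	int32_t diff_jiffies = (int32_t)(deadline - now);

	return diff_jiffies <= 0;	/* deadline reached: request timed out */
}
```

This is the same idiom as the kernel's time_after()/time_before() macros, applied to a 32-bit field carved out of the packed 64-bit das word.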
* [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64
  2018-05-15 22:51 [PATCH v10 0/2] " Bart Van Assche
@ 2018-05-15 22:51 ` Bart Van Assche
  2018-05-16  6:07   ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2018-05-15 22:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, Christoph Hellwig, Bart Van Assche,
	Catalin Marinas, Will Deacon, Tony Luck, Fenghua Yu,
	Geert Uytterhoeven, James E.J. Bottomley, Helge Deller,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, David S . Miller,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Chris Zankel,
	Max Filippov, Arnd Bergmann

The next patch in this series introduces a call to cmpxchg64() in the
block layer core for those architectures on which this functionality is
available. Make it possible to test whether cmpxchg64() is available by
introducing CONFIG_ARCH_HAVE_CMPXCHG64.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
---
 .../features/locking/cmpxchg64/arch-support.txt | 31 ++++++++++++++++++++++
 arch/Kconfig                                    |  3 +++
 arch/alpha/Kconfig                              |  1 +
 arch/arm/Kconfig                                |  1 +
 arch/arm64/Kconfig                              |  1 +
 arch/ia64/Kconfig                               |  1 +
 arch/m68k/Kconfig                               |  1 +
 arch/mips/Kconfig                               |  1 +
 arch/parisc/Kconfig                             |  1 +
 arch/powerpc/Kconfig                            |  1 +
 arch/s390/Kconfig                               |  1 +
 arch/sparc/Kconfig                              |  1 +
 arch/x86/Kconfig                                |  1 +
 arch/xtensa/Kconfig                             |  1 +
 14 files changed, 46 insertions(+)
 create mode 100644 Documentation/features/locking/cmpxchg64/arch-support.txt

diff --git a/Documentation/features/locking/cmpxchg64/arch-support.txt b/Documentation/features/locking/cmpxchg64/arch-support.txt
new file mode 100644
index 000000000000..65b3290ce5d5
--- /dev/null
+++ b/Documentation/features/locking/cmpxchg64/arch-support.txt
@@ -0,0 +1,31 @@
+#
+# Feature name:          cmpxchg64
+#         Kconfig:       ARCH_HAVE_CMPXCHG64
+#         description:   arch supports the cmpxchg64() API
+#
+    -----------------------
+    |         arch |status|
+    -----------------------
+    |        alpha: |  ok  |
+    |          arc: | TODO |
+    |          arm: |!thumb|
+    |        arm64: |  ok  |
+    |          c6x: | TODO |
+    |        h8300: | TODO |
+    |      hexagon: | TODO |
+    |         ia64: |  ok  |
+    |         m68k: |  ok  |
+    |   microblaze: | TODO |
+    |         mips: |64-bit|
+    |        nios2: | TODO |
+    |     openrisc: | TODO |
+    |       parisc: |  ok  |
+    |      powerpc: |64-bit|
+    |         s390: |  ok  |
+    |           sh: | TODO |
+    |        sparc: |  ok  |
+    |           um: | TODO |
+    |    unicore32: | TODO |
+    |          x86: |  ok  |
+    |       xtensa: |  ok  |
+    -----------------------
diff --git a/arch/Kconfig b/arch/Kconfig
index 8e0d665c8d53..bd54eb125b15 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -358,6 +358,9 @@ config HAVE_ALIGNED_STRUCT_PAGE
 	  on a struct page for better performance. However selecting this
 	  might increase the size of a struct page by a word.
 
+config ARCH_HAVE_CMPXCHG64
+	bool
+
 config HAVE_CMPXCHG_LOCAL
 	bool
 
diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index b2022885ced8..94fc28e30ed2 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -16,6 +16,7 @@ config ALPHA
 	select GENERIC_IRQ_SHOW
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
+	select ARCH_HAVE_CMPXCHG64
 	select AUDIT_ARCH
 	select GENERIC_CLOCKEVENTS
 	select GENERIC_CPU_VULNERABILITIES
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a7f8e7f4b88f..7be06e46d329 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -21,6 +21,7 @@ config ARM
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select ARCH_HAVE_CMPXCHG64 if !THUMB2_KERNEL
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select BUILDTIME_EXTABLE_SORT if MMU
 	select CLONE_BACKWARDS
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index eb2cf4938f6d..998e454a51b5 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -23,6 +23,7 @@ config ARM64
 	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
+	select ARCH_HAVE_CMPXCHG64
 	select ARCH_INLINE_READ_LOCK if !PREEMPT
 	select ARCH_INLINE_READ_LOCK_BH if !PREEMPT
 	select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPT
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index bbe12a038d21..b99a7fd9efec 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -42,6 +42,7 @@ config IA64
 	select GENERIC_IRQ_SHOW
 	select GENERIC_IRQ_LEGACY
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
+	select ARCH_HAVE_CMPXCHG64
 	select GENERIC_IOMAP
 	select GENERIC_SMP_IDLE_THREAD
 	select ARCH_TASK_STRUCT_ON_STACK
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 785612b576f7..9249de31414b 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -12,6 +12,7 @@ config M68K
 	select HAVE_UID16
 	select VIRT_TO_BUS
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG if RMW_INSNS
+	select ARCH_HAVE_CMPXCHG64
 	select GENERIC_CPU_DEVICES
 	select GENERIC_IOMAP
 	select GENERIC_STRNCPY_FROM_USER if MMU
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 225c95da23ce..7f453e7cb9e4 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -10,6 +10,7 @@ config MIPS
 	select ARCH_SUPPORTS_UPROBES
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF if 64BIT
+	select ARCH_HAVE_CMPXCHG64 if 64BIT
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
 	select ARCH_WANT_IPC_PARSE_VERSION
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index fc5a574c3482..deae8720861e 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -31,6 +31,7 @@ config PARISC
 	select GENERIC_IRQ_PROBE
 	select GENERIC_PCI_IOMAP
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
+	select ARCH_HAVE_CMPXCHG64
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_CPU_DEVICES
 	select GENERIC_STRNCPY_FROM_USER
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c32a181a7cbb..315fc1e46baa 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -150,6 +150,7 @@ config PPC
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAS_ZONE_DEVICE		if PPC_BOOK3S_64
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
+	select ARCH_HAVE_CMPXCHG64
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select ARCH_OPTIONAL_KERNEL_RWX		if ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 199ac3e4da1d..b06b34ae8b3d 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -74,6 +74,7 @@ config S390
 	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
+	select ARCH_HAVE_CMPXCHG64
 	select ARCH_INLINE_READ_LOCK
 	select ARCH_INLINE_READ_LOCK_BH
 	select ARCH_INLINE_READ_LOCK_IRQ
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 8767e45f1b2b..cd41bf0d5afd 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -76,6 +76,7 @@ config SPARC64
 	select PERF_USE_VMALLOC
 	select IRQ_PREFLOW_FASTEOI
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
+	select ARCH_HAVE_CMPXCHG64
 	select HAVE_C_RECORDMCOUNT
 	select NO_BOOTMEM
 	select HAVE_ARCH_AUDITSYSCALL
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c07f492b871a..a5bcbb179ff2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -68,6 +68,7 @@ config X86
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAS_ZONE_DEVICE		if X86_64
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
+	select ARCH_HAVE_CMPXCHG64
 	select ARCH_MIGHT_HAVE_ACPI_PDC		if ACPI
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index c921e8bccdc8..8234278a821d 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -23,6 +23,7 @@ config XTENSA
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_EXIT_THREAD
 	select HAVE_FUNCTION_TRACER
+	select ARCH_HAVE_CMPXCHG64
 	select HAVE_FUTEX_CMPXCHG if !MMU
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS
 	select HAVE_IRQ_TIME_ACCOUNTING
-- 
2.16.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread
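For readers unfamiliar with the primitive this Kconfig symbol gates: cmpxchg64() atomically replaces a 64-bit value only if it still holds an expected old value, and returns the previous contents. Below is a minimal userspace sketch of those semantics, not kernel code: the GCC/Clang `__atomic_compare_exchange_n` builtin stands in for the kernel's cmpxchg64(), and the name `cmpxchg64_sketch` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of cmpxchg64() semantics: atomically set *ptr to 'new_val'
 * only if *ptr still equals 'old', and return the value *ptr held
 * beforehand.
 */
static inline uint64_t cmpxchg64_sketch(uint64_t *ptr, uint64_t old,
					uint64_t new_val)
{
	/*
	 * On failure the builtin writes the current value of *ptr back
	 * into 'old', so 'old' ends up holding the previous contents of
	 * *ptr whether or not the exchange happened.
	 */
	__atomic_compare_exchange_n(ptr, &old, new_val, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}
```

As with cmpxchg-style primitives in general, a caller detects success by comparing the return value against the 'old' value it passed in: equality means the exchange took place.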
* Re: [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64
  2018-05-15 22:51 ` [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 Bart Van Assche
@ 2018-05-16  6:07   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2018-05-16  6:07 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, linux-kernel, Christoph Hellwig,
	Catalin Marinas, Will Deacon, Tony Luck, Fenghua Yu,
	Geert Uytterhoeven, James E.J. Bottomley, Helge Deller,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, David S . Miller,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Chris Zankel,
	Max Filippov, Arnd Bergmann

> +config ARCH_HAVE_CMPXCHG64
> +	bool

64-bit architectures must support this, as long is 64 bits wide. So this
should have a "default y if 64BIT", which also means you only need to
explicitly select it on 32-bit architectures that support 64-bit cmpxchg.

^ permalink raw reply	[flat|nested] 14+ messages in thread
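Concretely, Christoph's suggestion amounts to something like the following Kconfig definition (a sketch of the proposed direction, not code from the posted patch):

```kconfig
config ARCH_HAVE_CMPXCHG64
	bool
	default y if 64BIT
```

With that default, the per-architecture `select ARCH_HAVE_CMPXCHG64` lines would be needed only in the entries of 32-bit architectures that implement a 64-bit cmpxchg; the 64-bit architectures would pick it up automatically.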
end of thread, other threads:[~2018-05-16  6:07 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-14 18:46 [PATCH v9 0/2] blk-mq: Rework blk-mq timeout handling again Bart Van Assche
2018-05-14 18:46 ` [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 Bart Van Assche
2018-05-14 18:50   ` Max Filippov
2018-05-14 20:42     ` Bart Van Assche
2018-05-14 20:42       ` Bart Van Assche
2018-05-14 21:23   ` Geert Uytterhoeven
2018-05-15  2:54   ` Michael Ellerman
2018-05-15 15:14     ` Bart Van Assche
2018-05-15 15:14       ` Bart Van Assche
2018-05-15  8:15   ` Andrea Parri
2018-05-14 18:46 ` [PATCH v9 2/2] blk-mq: Rework blk-mq timeout handling again Bart Van Assche
2018-05-15 13:49   ` Sebastian Ott
2018-05-15 22:51 [PATCH v10 0/2] " Bart Van Assche
2018-05-15 22:51 ` [PATCH v9 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64 Bart Van Assche
2018-05-16  6:07   ` Christoph Hellwig