* Re:Re: [PATCH] mmc: block: use REQ_HIPRI flag to complete request directly in own complete workqueue
@ 2021-03-23 15:40 xiang fei
  2021-03-23 19:11 ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: xiang fei @ 2021-03-23 15:40 UTC (permalink / raw)
  To: ulf.hansson; +Cc: linux-mmc, linux-kernel, adrian.hunter, hch

At 2021-02-06 00:22:21, "Christoph Hellwig" <hch@lst.de> wrote:
>On Fri, Feb 05, 2021 at 03:24:06PM +0100, Ulf Hansson wrote:
>> On Thu, 21 Jan 2021 at 09:13, Liu Xiang <liu.xiang@zlingsmart.com> wrote:
>> >
>> > After commit "40d09b53bfc557af7481b9d80f060a7ac9c7d314", requests are
>> > completed in softirq context. This may cause the system to suffer long
>> > preemptoff times.
>> > The mmc driver has its own complete workqueue, but it cannot work
>> > well now.
>> > The REQ_HIPRI flag can be used to complete a request directly in the
>> > driver's own complete workqueue, so the preemptoff problem can be avoided.
>>
>> I am trying to understand the whole problem, but I don't quite get
>> it, sorry. Would it be possible for you to extend the description in
>> the commit message a bit?
>
>Yes, the message sounds weird.  The mentioned commit should obviously
>not make any difference for drivers not using it.
>
>> More exactly, what will happen if we tag a request with REQ_HIPRI
>> before completing it? Apologies for my ignorance, but I am currently a
>> bit overwhelmed with work, so I didn't have the time to really look it
>> up myself.
>
>Drivers must never set REQ_HIPRI!  This is a flag that is set by
>the submitter, and it is actually cleared by the block layer for the
>majority of drivers, which don't support it.


Sorry for not describing this clearly in the commit message.
I enabled CONFIG_PREEMPT_TRACER and ran an iozone test on the mmc driver.
This is the trace result on mainline:

# tracer: preemptoff
#
# preemptoff latency trace v1.1.5 on 5.11.0-rc4-ga95f9eb3f6cf
# --------------------------------------------------------------------
# latency: 1130504 us, #51247/49333412, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
#    -----------------
#    | task: ksoftirqd/3-27 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: __do_softirq
#  => ended at:   __do_softirq
#
#
#                    _------=> CPU#
#                   / _-----=> irqs-off
#                  | / _----=> need-resched
#                  || / _---=> hardirq/softirq
#                  ||| / _--=> preempt-depth
#                  |||| /     delay
#  cmd     pid     ||||| time  |   caller
#     \   /        |||||  \    |   /
ksoftirq-27        3.Ns1 1055999us : end_page_writeback <-ext4_finish_bio
ksoftirq-27        3.Ns1 1056000us : test_clear_page_writeback <-end_page_writeback
ksoftirq-27        3.Ns1 1056002us : page_mapping <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056003us : lock_page_memcg <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056004us : __rcu_read_lock <-lock_page_memcg
ksoftirq-27        3.Ns1 1056006us : _raw_spin_lock_irqsave <-test_clear_page_writeback
ksoftirq-27        3dNs1 1056007us : preempt_count_add <-_raw_spin_lock_irqsave
ksoftirq-27        3dNs2 1056009us : preempt_count_add <-percpu_counter_add_batch
ksoftirq-27        3dNs3 1056010us : preempt_count_sub <-percpu_counter_add_batch
ksoftirq-27        3dNs2 1056012us : preempt_count_add <-percpu_counter_add_batch
ksoftirq-27        3dNs3 1056013us : preempt_count_sub <-percpu_counter_add_batch
ksoftirq-27        3dNs2 1056014us : preempt_count_add <-percpu_counter_add_batch
ksoftirq-27        3dNs3 1056016us : preempt_count_sub <-percpu_counter_add_batch
ksoftirq-27        3dNs2 1056017us : preempt_count_add <-percpu_counter_add_batch
ksoftirq-27        3dNs3 1056018us : preempt_count_sub <-percpu_counter_add_batch
ksoftirq-27        3dNs2 1056020us : mem_cgroup_wb_domain <-test_clear_page_writeback
ksoftirq-27        3dNs2 1056021us : _raw_spin_unlock_irqrestore <-test_clear_page_writeback
ksoftirq-27        3.Ns2 1056023us : preempt_count_sub <-_raw_spin_unlock_irqrestore
ksoftirq-27        3dNs1 1056025us : __mod_lruvec_state <-test_clear_page_writeback
ksoftirq-27        3dNs1 1056026us : __mod_node_page_state <-__mod_lruvec_state
ksoftirq-27        3dNs1 1056027us : __mod_memcg_lruvec_state <-__mod_lruvec_state
ksoftirq-27        3dNs1 1056029us : __mod_memcg_state <-__mod_memcg_lruvec_state
ksoftirq-27        3.Ns1 1056031us : dec_zone_page_state <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056032us : inc_node_page_state <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056033us : __unlock_page_memcg <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056035us : __rcu_read_unlock <-__unlock_page_memcg
ksoftirq-27        3.Ns1 1056036us : wake_up_page_bit <-end_page_writeback
ksoftirq-27        3.Ns1 1056037us : _raw_spin_lock_irqsave <-wake_up_page_bit
ksoftirq-27        3dNs1 1056039us : preempt_count_add <-_raw_spin_lock_irqsave
ksoftirq-27        3dNs2 1056041us : __wake_up_locked_key_bookmark <-wake_up_page_bit
ksoftirq-27        3dNs2 1056042us : __wake_up_common <-__wake_up_locked_key_bookmark
ksoftirq-27        3dNs2 1056043us : wake_page_function <-__wake_up_common
ksoftirq-27        3dNs2 1056044us : wake_up_state <-wake_page_function
ksoftirq-27        3dNs2 1056045us : try_to_wake_up <-wake_up_state
ksoftirq-27        3dNs2 1056047us : preempt_count_add <-try_to_wake_up
ksoftirq-27        3dNs3 1056048us : _raw_spin_lock_irqsave <-try_to_wake_up
ksoftirq-27        3dNs3 1056049us : preempt_count_add <-_raw_spin_lock_irqsave
ksoftirq-27        3dNs4 1056051us : select_task_rq_fair <-try_to_wake_up
ksoftirq-27        3dNs4 1056052us : __rcu_read_lock <-select_task_rq_fair
ksoftirq-27        3dNs4 1056053us : available_idle_cpu <-select_task_rq_fair
ksoftirq-27        3dNs4 1056055us : available_idle_cpu <-select_task_rq_fair
ksoftirq-27        3dNs4 1056056us : available_idle_cpu <-select_task_rq_fair
ksoftirq-27        3dNs4 1056057us : __rcu_read_unlock <-select_task_rq_fair
ksoftirq-27        3dNs4 1056059us : ttwu_queue_wakelist <-try_to_wake_up
ksoftirq-27        3dNs4 1056060us : _raw_spin_lock <-try_to_wake_up
ksoftirq-27        3dNs4 1056061us : preempt_count_add <-_raw_spin_lock
ksoftirq-27        3dNs5 1056062us : update_rq_clock.part.104 <-try_to_wake_up
ksoftirq-27        3dNs5 1056064us : ttwu_do_activate.isra.115 <-try_to_wake_up
ksoftirq-27        3dNs5 1056065us : __delayacct_blkio_end <-ttwu_do_activate.isra.115
ksoftirq-27        3dNs5 1056066us : delayacct_end <-__delayacct_blkio_end
ksoftirq-27        3dNs5 1056068us : ktime_get <-delayacct_end
ksoftirq-27        3dNs5 1056069us : arch_counter_read <-ktime_get
ksoftirq-27        3dNs5 1056070us : _raw_spin_lock_irqsave <-delayacct_end
ksoftirq-27        3dNs5 1056072us : preempt_count_add <-_raw_spin_lock_irqsave
ksoftirq-27        3dNs6 1056073us : _raw_spin_unlock_irqrestore <-delayacct_end
ksoftirq-27        3dNs6 1056074us : preempt_count_sub <-_raw_spin_unlock_irqrestore
ksoftirq-27        3dNs5 1056075us : enqueue_task_fair <-ttwu_do_activate.isra.115
ksoftirq-27        3dNs5 1056077us : update_curr <-enqueue_task_fair
ksoftirq-27        3dNs5 1056078us : __update_load_avg_se <-update_load_avg
ksoftirq-27        3dNs5 1056080us : __update_load_avg_cfs_rq <-update_load_avg
ksoftirq-27        3dNs5 1056081us : update_cfs_group <-enqueue_task_fair
ksoftirq-27        3dNs5 1056082us : __enqueue_entity <-enqueue_task_fair
ksoftirq-27        3dNs5 1056084us : update_curr <-enqueue_task_fair
ksoftirq-27        3dNs5 1056085us : __update_load_avg_se <-update_load_avg
ksoftirq-27        3dNs5 1056086us : __update_load_avg_cfs_rq <-update_load_avg
ksoftirq-27        3dNs5 1056088us : update_cfs_group <-enqueue_task_fair
ksoftirq-27        3dNs5 1056089us : reweight_entity <-update_cfs_group
ksoftirq-27        3dNs5 1056090us : __enqueue_entity <-enqueue_task_fair
ksoftirq-27        3dNs5 1056092us : ttwu_do_wakeup.isra.114 <-ttwu_do_activate.isra.115
ksoftirq-27        3dNs5 1056093us : check_preempt_curr <-ttwu_do_wakeup.isra.114
ksoftirq-27        3dNs5 1056095us : resched_curr <-check_preempt_curr
ksoftirq-27        3dNs5 1056096us : smp_send_reschedule <-resched_curr
ksoftirq-27        3dNs5 1056097us : smp_cross_call <-smp_send_reschedule
ksoftirq-27        3dNs5 1056098us : __ipi_send_mask <-smp_cross_call
ksoftirq-27        3dNs5 1056100us : gic_ipi_send_mask <-__ipi_send_mask
ksoftirq-27        3dNs5 1056101us : _raw_spin_unlock <-try_to_wake_up
ksoftirq-27        3dNs5 1056103us : preempt_count_sub <-_raw_spin_unlock
ksoftirq-27        3dNs4 1056104us : _raw_spin_unlock_irqrestore <-try_to_wake_up
ksoftirq-27        3dNs4 1056105us : preempt_count_sub <-_raw_spin_unlock_irqrestore
ksoftirq-27        3dNs3 1056107us : preempt_count_sub <-try_to_wake_up
ksoftirq-27        3dNs2 1056108us : _raw_spin_unlock_irqrestore <-wake_up_page_bit
ksoftirq-27        3.Ns2 1056110us : preempt_count_sub <-_raw_spin_unlock_irqrestore
ksoftirq-27        3.Ns1 1056112us : _raw_spin_lock_irqsave <-ext4_finish_bio
ksoftirq-27        3dNs1 1056113us : preempt_count_add <-_raw_spin_lock_irqsave
ksoftirq-27        3dNs2 1056115us : _raw_spin_unlock_irqrestore <-ext4_finish_bio
ksoftirq-27        3.Ns2 1056117us : preempt_count_sub <-_raw_spin_unlock_irqrestore
ksoftirq-27        3.Ns1 1056119us : end_page_writeback <-ext4_finish_bio
ksoftirq-27        3.Ns1 1056120us : test_clear_page_writeback <-end_page_writeback
ksoftirq-27        3.Ns1 1056121us : page_mapping <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056122us : lock_page_memcg <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056124us : __rcu_read_lock <-lock_page_memcg
ksoftirq-27        3.Ns1 1056125us : _raw_spin_lock_irqsave <-test_clear_page_writeback
ksoftirq-27        3dNs1 1056127us : preempt_count_add <-_raw_spin_lock_irqsave
ksoftirq-27        3dNs2 1056129us : preempt_count_add <-percpu_counter_add_batch
ksoftirq-27        3dNs3 1056130us : preempt_count_sub <-percpu_counter_add_batch
ksoftirq-27        3dNs2 1056131us : preempt_count_add <-percpu_counter_add_batch
ksoftirq-27        3dNs3 1056132us : preempt_count_sub <-percpu_counter_add_batch
ksoftirq-27        3dNs2 1056134us : preempt_count_add <-percpu_counter_add_batch
ksoftirq-27        3dNs3 1056135us : preempt_count_sub <-percpu_counter_add_batch
ksoftirq-27        3dNs2 1056136us : preempt_count_add <-percpu_counter_add_batch
ksoftirq-27        3dNs3 1056138us : preempt_count_sub <-percpu_counter_add_batch
ksoftirq-27        3dNs2 1056139us : mem_cgroup_wb_domain <-test_clear_page_writeback
ksoftirq-27        3dNs2 1056140us : _raw_spin_unlock_irqrestore <-test_clear_page_writeback
ksoftirq-27        3.Ns2 1056142us : preempt_count_sub <-_raw_spin_unlock_irqrestore
ksoftirq-27        3dNs1 1056144us : __mod_lruvec_state <-test_clear_page_writeback
ksoftirq-27        3dNs1 1056146us : __mod_node_page_state <-__mod_lruvec_state
ksoftirq-27        3dNs1 1056147us : __mod_memcg_lruvec_state <-__mod_lruvec_state
ksoftirq-27        3dNs1 1056148us : __mod_memcg_state <-__mod_memcg_lruvec_state
ksoftirq-27        3.Ns1 1056150us : dec_zone_page_state <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056152us : inc_node_page_state <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056153us : __unlock_page_memcg <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056154us : __rcu_read_unlock <-__unlock_page_memcg
ksoftirq-27        3.Ns1 1056156us : _raw_spin_lock_irqsave <-ext4_finish_bio
ksoftirq-27        3dNs1 1056158us : preempt_count_add <-_raw_spin_lock_irqsave
ksoftirq-27        3dNs2 1056159us : _raw_spin_unlock_irqrestore <-ext4_finish_bio
ksoftirq-27        3.Ns2 1056161us : preempt_count_sub <-_raw_spin_unlock_irqrestore
ksoftirq-27        3.Ns1 1056163us : end_page_writeback <-ext4_finish_bio
ksoftirq-27        3.Ns1 1056164us : test_clear_page_writeback <-end_page_writeback
ksoftirq-27        3.Ns1 1056165us : page_mapping <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056167us : lock_page_memcg <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056168us : __rcu_read_lock <-lock_page_memcg
ksoftirq-27        3.Ns1 1056169us : _raw_spin_lock_irqsave <-test_clear_page_writeback
ksoftirq-27        3dNs1 1056171us : preempt_count_add <-_raw_spin_lock_irqsave
ksoftirq-27        3dNs2 1056173us : preempt_count_add <-percpu_counter_add_batch
ksoftirq-27        3dNs3 1056174us : preempt_count_sub <-percpu_counter_add_batch
ksoftirq-27        3dNs2 1056175us : preempt_count_add <-percpu_counter_add_batch
ksoftirq-27        3dNs3 1056177us : preempt_count_sub <-percpu_counter_add_batch
ksoftirq-27        3dNs2 1056178us : preempt_count_add <-percpu_counter_add_batch
ksoftirq-27        3dNs3 1056179us : preempt_count_sub <-percpu_counter_add_batch
ksoftirq-27        3dNs2 1056180us : preempt_count_add <-percpu_counter_add_batch
ksoftirq-27        3dNs3 1056182us : preempt_count_sub <-percpu_counter_add_batch
ksoftirq-27        3dNs2 1056183us : mem_cgroup_wb_domain <-test_clear_page_writeback
ksoftirq-27        3dNs2 1056184us : _raw_spin_unlock_irqrestore <-test_clear_page_writeback
ksoftirq-27        3.Ns2 1056186us : preempt_count_sub <-_raw_spin_unlock_irqrestore
ksoftirq-27        3dNs1 1056188us : __mod_lruvec_state <-test_clear_page_writeback
ksoftirq-27        3dNs1 1056190us : __mod_node_page_state <-__mod_lruvec_state
ksoftirq-27        3dNs1 1056191us : __mod_memcg_lruvec_state <-__mod_lruvec_state
ksoftirq-27        3dNs1 1056192us : __mod_memcg_state <-__mod_memcg_lruvec_state
ksoftirq-27        3.Ns1 1056194us : dec_zone_page_state <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056196us : inc_node_page_state <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056197us : __unlock_page_memcg <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056198us : __rcu_read_unlock <-__unlock_page_memcg
ksoftirq-27        3.Ns1 1056200us : _raw_spin_lock_irqsave <-ext4_finish_bio
ksoftirq-27        3dNs1 1056202us : preempt_count_add <-_raw_spin_lock_irqsave
ksoftirq-27        3dNs2 1056204us : _raw_spin_unlock_irqrestore <-ext4_finish_bio
ksoftirq-27        3.Ns2 1056207us : preempt_count_sub <-_raw_spin_unlock_irqrestore
ksoftirq-27        3.Ns1 1056208us : end_page_writeback <-ext4_finish_bio
ksoftirq-27        3.Ns1 1056209us : test_clear_page_writeback <-end_page_writeback
ksoftirq-27        3.Ns1 1056211us : page_mapping <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056212us : lock_page_memcg <-test_clear_page_writeback
ksoftirq-27        3.Ns1 1056213us : __rcu_read_lock <-lock_page_memcg
ksoftirq-27        3.Ns1 1056215us : _raw_spin_lock_irqsave <-test_clear_page_writeback


This is the trace result with the patch applied:

# tracer: preemptoff
#
# preemptoff latency trace v1.1.5 on 5.11.0-rc4-ga95f9eb3f6cf-dirty
# --------------------------------------------------------------------
# latency: 37647 us, #1084/1084, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
#    -----------------
#    | task: kworker/1:1H-117 (uid:0 nice:-20 policy:0 rt_prio:0)
#    -----------------
#  => started at: __mutex_lock.isra.10
#  => ended at:   __mutex_lock.isra.10
#
#
#                    _------=> CPU#
#                   / _-----=> irqs-off
#                  | / _----=> need-resched
#                  || / _---=> hardirq/softirq
#                  ||| / _--=> preempt-depth
#                  |||| /     delay
#  cmd     pid     ||||| time  |   caller
#     \   /        |||||  \    |   /
kworker/-117       1...1    0us : __mutex_lock.isra.10 <-__mutex_lock.isra.10
kworker/-117       1...2    2us : __rcu_read_lock <-__mutex_lock.isra.10
kworker/-117       1...2    4us : __rcu_read_unlock <-__mutex_lock.isra.10
kworker/-117       1...2    5us : osq_lock <-__mutex_lock.isra.10
kworker/-117       1...2    7us : mutex_spin_on_owner <-__mutex_lock.isra.10
kworker/-117       1...2    9us!: __rcu_read_lock <-mutex_spin_on_owner
kworker/-117       1d..2  426us : gic_handle_irq <-el1_irq
kworker/-117       1d..2  427us : __handle_domain_irq <-gic_handle_irq
kworker/-117       1d..2  429us : irq_enter <-__handle_domain_irq
kworker/-117       1d..2  430us : irq_enter_rcu <-irq_enter
kworker/-117       1d..2  431us : preempt_count_add <-irq_enter_rcu
kworker/-117       1d.h2  433us : irqtime_account_irq <-irq_enter_rcu
kworker/-117       1d.h2  435us : irq_find_mapping <-__handle_domain_irq
kworker/-117       1d.h2  436us : generic_handle_irq <-__handle_domain_irq
kworker/-117       1d.h2  438us : handle_percpu_devid_irq <-generic_handle_irq
kworker/-117       1d.h2  439us : arch_timer_handler_phys <-handle_percpu_devid_irq
kworker/-117       1d.h2  440us : hrtimer_interrupt <-arch_timer_handler_phys
kworker/-117       1d.h2  442us : _raw_spin_lock_irqsave <-hrtimer_interrupt
kworker/-117       1d.h2  443us : preempt_count_add <-_raw_spin_lock_irqsave
kworker/-117       1d.h3  444us : ktime_get_update_offsets_now <-hrtimer_interrupt
kworker/-117       1d.h3  445us : arch_counter_read <-ktime_get_update_offsets_now
kworker/-117       1d.h3  447us : __hrtimer_run_queues <-hrtimer_interrupt
kworker/-117       1d.h3  449us : __remove_hrtimer <-__hrtimer_run_queues
kworker/-117       1d.h3  450us : _raw_spin_unlock_irqrestore <-__hrtimer_run_queues
kworker/-117       1d.h3  451us : preempt_count_sub <-_raw_spin_unlock_irqrestore
kworker/-117       1d.h2  453us : tick_sched_timer <-__hrtimer_run_queues
kworker/-117       1d.h2  454us : ktime_get <-tick_sched_timer
kworker/-117       1d.h2  455us : arch_counter_read <-ktime_get
kworker/-117       1d.h2  457us : tick_sched_do_timer <-tick_sched_timer
kworker/-117       1d.h2  458us : tick_do_update_jiffies64 <-tick_sched_do_timer
kworker/-117       1d.h2  459us : _raw_spin_lock <-tick_do_update_jiffies64
kworker/-117       1d.h2  460us : preempt_count_add <-_raw_spin_lock
kworker/-117       1d.h3  462us : calc_global_load <-tick_do_update_jiffies64
kworker/-117       1d.h3  465us : _raw_spin_unlock <-tick_do_update_jiffies64
kworker/-117       1d.h3  466us : preempt_count_sub <-_raw_spin_unlock
kworker/-117       1d.h2  467us : update_wall_time <-tick_do_update_jiffies64
kworker/-117       1d.h2  469us : timekeeping_advance <-update_wall_time
kworker/-117       1d.h2  470us : _raw_spin_lock_irqsave <-timekeeping_advance
kworker/-117       1d.h2  472us : preempt_count_add <-_raw_spin_lock_irqsave
kworker/-117       1d.h3  473us : arch_counter_read <-timekeeping_advance
kworker/-117       1d.h3  474us : ntp_tick_length <-timekeeping_advance
kworker/-117       1d.h3  476us : ntp_tick_length <-timekeeping_advance
kworker/-117       1d.h3  478us : timekeeping_update <-timekeeping_advance
kworker/-117       1d.h3  479us : ntp_get_next_leap <-timekeeping_update
kworker/-117       1d.h3  481us : update_vsyscall <-timekeeping_update
kworker/-117       1d.h3  483us : raw_notifier_call_chain <-timekeeping_update
kworker/-117       1d.h3  485us : update_fast_timekeeper <-timekeeping_update
kworker/-117       1d.h3  486us : update_fast_timekeeper <-timekeeping_update
kworker/-117       1d.h3  487us : _raw_spin_unlock_irqrestore <-timekeeping_advance
kworker/-117       1d.h3  489us : preempt_count_sub <-_raw_spin_unlock_irqrestore
kworker/-117       1d.h2  490us : tick_sched_handle.isra.16 <-tick_sched_timer
kworker/-117       1d.h2  492us : update_process_times <-tick_sched_handle.isra.16
kworker/-117       1d.h2  493us : account_process_tick <-update_process_times
kworker/-117       1d.h2  494us : irqtime_account_process_tick <-account_process_tick
kworker/-117       1d.h2  496us : account_system_index_time <-irqtime_account_process_tick
kworker/-117       1d.h2  497us : cpuacct_account_field <-account_system_index_time
kworker/-117       1d.h2  499us : __rcu_read_lock <-cpuacct_account_field
kworker/-117       1d.h2  500us : __rcu_read_unlock <-cpuacct_account_field
kworker/-117       1d.h2  501us : __rcu_read_lock <-account_system_index_time
kworker/-117       1d.h2  503us : __rcu_read_unlock <-account_system_index_time
kworker/-117       1d.h2  504us : acct_account_cputime <-account_system_index_time
kworker/-117       1d.h2  505us : __acct_update_integrals <-acct_account_cputime



Before commit "40d09b53bfc557af7481b9d80f060a7ac9c7d314", block I/O requests
were completed in mmc_blk_mq_complete_work() and there was no problem.
After the commit, block I/O requests are completed in softirq context, which
can cause the preemptoff problem shown above.

The REQ_HIPRI flag is used here so that rq->q->mq_ops->complete() executes
directly in mmc_blk_mq_complete_work(), not in softirq.
I think this avoids the preemptoff problem without changing too much.
Maybe there is a better way to solve the problem.
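
For reference, the short-circuit this relies on is in the completion path
added by that commit. Simplified from v5.11 block/blk-mq.c (remote-completion
details trimmed), it looks roughly like this:

bool blk_mq_complete_request_remote(struct request *rq)
{
	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);

	/* Polled requests are always completed in the caller's context. */
	if (rq->cmd_flags & REQ_HIPRI)
		return false;

	if (blk_mq_complete_need_ipi(rq)) {
		blk_mq_complete_send_ipi(rq);
		return true;
	}
	if (rq->q->nr_hw_queues == 1) {
		blk_mq_raise_softirq(rq);
		return true;
	}
	return false;
}

void blk_mq_complete_request(struct request *rq)
{
	if (!blk_mq_complete_request_remote(rq))
		rq->q->mq_ops->complete(rq);	/* runs in the caller's context */
}

So with REQ_HIPRI set, the single-hw-queue mmc case never reaches
blk_mq_raise_softirq(), and ->complete() runs in whatever context called
blk_mq_complete_request() -- here, the mmc complete workqueue.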


* Re: [PATCH] mmc: block: use REQ_HIPRI flag to complete request directly in own complete workqueue
  2021-03-23 15:40 Re:Re: [PATCH] mmc: block: use REQ_HIPRI flag to complete request directly in own complete workqueue xiang fei
@ 2021-03-23 19:11 ` Christoph Hellwig
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2021-03-23 19:11 UTC (permalink / raw)
  To: xiang fei; +Cc: ulf.hansson, linux-mmc, linux-kernel, adrian.hunter, sagi

On Tue, Mar 23, 2021 at 11:40:47PM +0800, xiang fei wrote:
> Before commit "40d09b53bfc557af7481b9d80f060a7ac9c7d314", block I/O requests
> were completed in mmc_blk_mq_complete_work() and there was no problem.
> After the commit, block I/O requests are completed in softirq context, which
> can cause the preemptoff problem shown above.

I see how they are executed in softirq context, but I don't see how the
above commit could have introduced that.  It literally just refactored the
existing checks.

> The REQ_HIPRI flag is used here so that rq->q->mq_ops->complete() executes
> directly in mmc_blk_mq_complete_work(), not in softirq.
> I think this avoids the preemptoff problem without changing too much.
> Maybe there is a better way to solve the problem.

Well, there isn't really much of a point in bouncing to a softirq
context.  The patch below cleans the mmc completion code up to
avoid that internally, but for that it uses an API that right now
isn't intended for that kind of use.  I'm not sure yet if it just
needs updated documentation or maybe a different API.  Jens, any
comments?  Sagi, this might also make sense for nvme-tcp, doesn't it?
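
For context, blk_mq_set_request_complete() is currently nothing more than
the state transition below (include/linux/blk-mq.h at the time of writing),
which is why the open question is about its documented intent rather than
its mechanics:

static inline void blk_mq_set_request_complete(struct request *rq)
{
	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
}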

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index d666e24fbe0e0a..7ad7a4efd10481 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1364,17 +1364,28 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
 
 #define MMC_CQE_RETRIES 2
 
-static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_cqe_complete_rq(struct request *req)
 {
 	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
 	struct mmc_request *mrq = &mqrq->brq.mrq;
 	struct request_queue *q = req->q;
+	struct mmc_queue *mq = q->queuedata;
 	struct mmc_host *host = mq->card->host;
 	enum mmc_issue_type issue_type = mmc_issue_type(mq, req);
 	unsigned long flags;
 	bool put_card;
 	int err;
 
+	/*
+	 * Block layer timeouts race with completions which means the normal
+	 * completion path cannot be used during recovery.
+	 */
+	if (!mq->in_recovery) {
+		if (unlikely(blk_should_fake_timeout(req->q)))
+			return;
+		blk_mq_set_request_complete(req);
+	}
+
 	mmc_cqe_post_req(host, mrq);
 
 	if (mrq->cmd && mrq->cmd->error)
@@ -1437,17 +1448,8 @@ static void mmc_blk_cqe_req_done(struct mmc_request *mrq)
 	struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req,
 						  brq.mrq);
 	struct request *req = mmc_queue_req_to_req(mqrq);
-	struct request_queue *q = req->q;
-	struct mmc_queue *mq = q->queuedata;
 
-	/*
-	 * Block layer timeouts race with completions which means the normal
-	 * completion path cannot be used during recovery.
-	 */
-	if (mq->in_recovery)
-		mmc_blk_cqe_complete_rq(mq, req);
-	else if (likely(!blk_should_fake_timeout(req->q)))
-		blk_mq_complete_request(req);
+	mmc_blk_cqe_complete_rq(req);
 }
 
 static int mmc_blk_cqe_start_req(struct mmc_host *host, struct mmc_request *mrq)
@@ -1864,6 +1866,16 @@ static void mmc_blk_mq_complete_rq(struct mmc_queue *mq, struct request *req)
 	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
 	unsigned int nr_bytes = mqrq->brq.data.bytes_xfered;
 
+	/*
+	 * Block layer timeouts race with completions which means the normal
+	 * completion path cannot be used during recovery.
+	 */
+	if (!mq->in_recovery) {
+		if (unlikely(blk_should_fake_timeout(req->q)))
+			return;
+		blk_mq_set_request_complete(req);
+	}
+
 	if (nr_bytes) {
 		if (blk_update_request(req, BLK_STS_OK, nr_bytes))
 			blk_mq_requeue_request(req, true);
@@ -1920,24 +1932,7 @@ static void mmc_blk_hsq_req_done(struct mmc_request *mrq)
 
 	mmc_blk_rw_reset_success(mq, req);
 
-	/*
-	 * Block layer timeouts race with completions which means the normal
-	 * completion path cannot be used during recovery.
-	 */
-	if (mq->in_recovery)
-		mmc_blk_cqe_complete_rq(mq, req);
-	else if (likely(!blk_should_fake_timeout(req->q)))
-		blk_mq_complete_request(req);
-}
-
-void mmc_blk_mq_complete(struct request *req)
-{
-	struct mmc_queue *mq = req->q->queuedata;
-
-	if (mq->use_cqe)
-		mmc_blk_cqe_complete_rq(mq, req);
-	else if (likely(!blk_should_fake_timeout(req->q)))
-		mmc_blk_mq_complete_rq(mq, req);
+	mmc_blk_cqe_complete_rq(req);
 }
 
 static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
@@ -1981,16 +1976,7 @@ static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req)
 	struct mmc_host *host = mq->card->host;
 
 	mmc_post_req(host, mrq, 0);
-
-	/*
-	 * Block layer timeouts race with completions which means the normal
-	 * completion path cannot be used during recovery.
-	 */
-	if (mq->in_recovery)
-		mmc_blk_mq_complete_rq(mq, req);
-	else if (likely(!blk_should_fake_timeout(req->q)))
-		blk_mq_complete_request(req);
-
+	mmc_blk_mq_complete_rq(mq, req);
 	mmc_blk_mq_dec_in_flight(mq, req);
 }
 
diff --git a/drivers/mmc/core/block.h b/drivers/mmc/core/block.h
index 31153f656f4129..f15ab390f4f5b9 100644
--- a/drivers/mmc/core/block.h
+++ b/drivers/mmc/core/block.h
@@ -10,7 +10,6 @@ void mmc_blk_cqe_recovery(struct mmc_queue *mq);
 enum mmc_issued;
 
 enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req);
-void mmc_blk_mq_complete(struct request *req);
 void mmc_blk_mq_recovery(struct mmc_queue *mq);
 
 struct work_struct;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2c473c9b899089..dbf4262d470fcf 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -491,11 +491,7 @@ static inline int blk_mq_request_completed(struct request *rq)
 }
 
 /*
- * 
- * Set the state to complete when completing a request from inside ->queue_rq.
- * This is used by drivers that want to ensure special complete actions that
- * need access to the request are called on failure, e.g. by nvme for
- * multipathing.
+ * XXX: needs better documentation
  */
 static inline void blk_mq_set_request_complete(struct request *rq)
 {


* Re: [PATCH] mmc: block: use REQ_HIPRI flag to complete request directly in own complete workqueue
  2021-02-05 14:24 ` Ulf Hansson
@ 2021-02-05 16:22   ` Christoph Hellwig
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2021-02-05 16:22 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Liu Xiang, linux-mmc, Linux Kernel Mailing List, liuxiang_1999,
	Christoph Hellwig, Adrian Hunter

On Fri, Feb 05, 2021 at 03:24:06PM +0100, Ulf Hansson wrote:
> On Thu, 21 Jan 2021 at 09:13, Liu Xiang <liu.xiang@zlingsmart.com> wrote:
> >
> > After commit "40d09b53bfc557af7481b9d80f060a7ac9c7d314", requests are
> > completed in softirq context. This may cause the system to suffer long
> > preemptoff times.
> > The mmc driver has its own complete workqueue, but it cannot work
> > well now.
> > The REQ_HIPRI flag can be used to complete a request directly in the
> > driver's own complete workqueue, so the preemptoff problem can be avoided.
> 
> I am trying to understand the whole problem, but I don't quite get
> it, sorry. Would it be possible for you to extend the description in
> the commit message a bit?

Yes, the message sounds weird.  The mentioned commit should obviously
not make any difference for drivers not using it.

> More exactly, what will happen if we tag a request with REQ_HIPRI
> before completing it? Apologies for my ignorance, but I am currently a
> bit overwhelmed with work, so I didn't have the time to really look it
> up myself.

Drivers must never set REQ_HIPRI!  This is a flag that is set by
the submitter, and it is actually cleared by the block layer for the
majority of drivers, which don't support it.
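
For illustration, the submitter side sets the flag roughly like this when a
caller asks for polled I/O -- a simplified sketch based on include/linux/bio.h
around v5.11 (exact placement varies by version):

/* Mark a bio for polled completion; only the submitter does this. */
static inline void bio_set_polled(struct bio *bio, struct kiocb *kiocb)
{
	bio->bi_opf |= REQ_HIPRI;
	if (!is_sync_kiocb(kiocb))
		bio->bi_opf |= REQ_NOWAIT;
}

And on submission the block layer drops the flag for queues that cannot
poll, e.g. in submit_bio_checks() (block/blk-core.c, simplified):

	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
		bio->bi_opf &= ~REQ_HIPRI;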


* Re: [PATCH] mmc: block: use REQ_HIPRI flag to complete request directly in own complete workqueue
  2021-01-21  8:13 Liu Xiang
@ 2021-02-05 14:24 ` Ulf Hansson
  2021-02-05 16:22   ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Ulf Hansson @ 2021-02-05 14:24 UTC (permalink / raw)
  To: Liu Xiang
  Cc: linux-mmc, Linux Kernel Mailing List, liuxiang_1999,
	Christoph Hellwig, Adrian Hunter

+ Adrian, Christoph

On Thu, 21 Jan 2021 at 09:13, Liu Xiang <liu.xiang@zlingsmart.com> wrote:
>
> After commit "40d09b53bfc557af7481b9d80f060a7ac9c7d314", requests are
> completed in softirq context. This may cause the system to suffer long
> preemptoff times.
> The mmc driver has its own complete workqueue, but it cannot work
> well now.
> The REQ_HIPRI flag can be used to complete a request directly in the
> driver's own complete workqueue, so the preemptoff problem can be avoided.

I am trying to understand the whole problem, but I don't quite get
it, sorry. Would it be possible for you to extend the description in
the commit message a bit?

More exactly, what will happen if we tag a request with REQ_HIPRI
before completing it? Apologies for my ignorance, but I am currently a
bit overwhelmed with work, so I didn't have the time to really look it
up myself.

>
> Signed-off-by: Liu Xiang <liu.xiang@zlingsmart.com>
> ---
>  drivers/mmc/core/block.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index 42e27a298..c27239a89 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -1985,8 +1985,10 @@ static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req)
>          */
>         if (mq->in_recovery)
>                 mmc_blk_mq_complete_rq(mq, req);
> -       else if (likely(!blk_should_fake_timeout(req->q)))
> +       else if (likely(!blk_should_fake_timeout(req->q))) {
> +               req->cmd_flags |= REQ_HIPRI;
>                 blk_mq_complete_request(req);

Is there a specific reason why REQ_HIPRI is applicable only in the
mmc_blk_mq_post_req() case?

We have other paths where we complete requests for MMC as well; are
those not relevant?

> +       }
>
>         mmc_blk_mq_dec_in_flight(mq, req);
>  }
> --
> 2.17.1
>

Kind regards
Uffe


* [PATCH] mmc: block: use REQ_HIPRI flag to complete request directly in own complete workqueue
@ 2021-01-21  8:13 Liu Xiang
  2021-02-05 14:24 ` Ulf Hansson
  0 siblings, 1 reply; 5+ messages in thread
From: Liu Xiang @ 2021-01-21  8:13 UTC (permalink / raw)
  To: linux-mmc; +Cc: ulf.hansson, linux-kernel, liuxiang_1999, Liu Xiang

After commit "40d09b53bfc557af7481b9d80f060a7ac9c7d314", requests are
completed in softirq context. This may cause the system to suffer long
preemptoff times.
The mmc driver has its own complete workqueue, but it cannot work
well now.
The REQ_HIPRI flag can be used to complete a request directly in the
driver's own complete workqueue, so the preemptoff problem can be avoided.

Signed-off-by: Liu Xiang <liu.xiang@zlingsmart.com>
---
 drivers/mmc/core/block.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 42e27a298..c27239a89 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1985,8 +1985,10 @@ static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req)
 	 */
 	if (mq->in_recovery)
 		mmc_blk_mq_complete_rq(mq, req);
-	else if (likely(!blk_should_fake_timeout(req->q)))
+	else if (likely(!blk_should_fake_timeout(req->q))) {
+		req->cmd_flags |= REQ_HIPRI;
 		blk_mq_complete_request(req);
+	}
 
 	mmc_blk_mq_dec_in_flight(mq, req);
 }
-- 
2.17.1

