All of lore.kernel.org
 help / color / mirror / Atom feed
* EIO with io_uring O_DIRECT writes on ext4
@ 2019-07-23  8:07 Stefan Hajnoczi
  2019-07-23 15:20 ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2019-07-23  8:07 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Aarushi Mehta, Julia Suvorova, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 28483 bytes --]

Hi,
io_uring O_DIRECT writes can fail with EIO on ext4.  Please see the
function graph trace from Linux 5.3.0-rc1 below for details.  It was
produced with the following qemu-io command (using Aarushi's QEMU
patches from https://github.com/rooshm/qemu/commits/io_uring):

  $ qemu-io --cache=none --aio=io_uring --format=qcow2 -c 'writev -P 185 131072 65536' tests/qemu-iotests/scratch/test.qcow2

This issue is specific to ext4.  XFS and the underlying LVM logical
volume both work.

The storage configuration is an LVM logical volume (device-mapper linear
target), on top of LUKS, on top of a SATA disk.  The logical volume's
request_queue does not have mq_ops and this causes
generic_make_request_checks() to fail:

  if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q))
          goto not_supported;

I guess this could be worked around by deferring the request to the
io_uring work queue to avoid REQ_NOWAIT.  But XFS handles this fine so
how can io_uring.c detect this case cleanly or is there a bug in ext4?

Stefan
---

 2)               |  __x64_sys_io_uring_enter() {
 2)               |    __fdget() {
 2)               |      __fget_light() {
 2)   0.207 us    |        __fget();
 2)   0.683 us    |      }
 2)   1.097 us    |    }
 2)               |    mutex_lock() {
 2)               |      _cond_resched() {
 2)               |        rcu_note_context_switch() {
 2)   0.233 us    |          rcu_qs();
 2)   0.700 us    |        }
 2)   0.235 us    |        _raw_spin_lock();
 2)   0.258 us    |        update_rq_clock();
 2)               |        pick_next_task_fair() {
 2)               |          update_curr() {
 2)   0.240 us    |            __calc_delta();
 2)   0.262 us    |            update_min_vruntime();
 2)   1.240 us    |          }
 2)   0.227 us    |          check_cfs_rq_runtime();
 2)               |          pick_next_entity() {
 2)   0.235 us    |            wakeup_preempt_entity.isra.0();
 2)   0.274 us    |            clear_buddies();
 2)   1.218 us    |          }
 2)               |          put_prev_entity() {
 2)               |            update_curr() {
 2)   0.222 us    |              update_min_vruntime();
 2)   0.226 us    |              cpuacct_charge();
 2)               |              __cgroup_account_cputime() {
 2)   0.226 us    |                cgroup_rstat_updated();
 2)   0.672 us    |              }
 2)   2.021 us    |            }
 2)   0.225 us    |            check_cfs_rq_runtime();
 2)   0.242 us    |            __enqueue_entity();
 2)   0.209 us    |            __update_load_avg_se();
 2)   0.183 us    |            __update_load_avg_cfs_rq();
 2)   4.119 us    |          }
 2)               |          put_prev_entity() {
 2)   0.235 us    |            update_curr();
 2)   0.237 us    |            check_cfs_rq_runtime();
 2)   0.274 us    |            __enqueue_entity();
 2)   0.246 us    |            __update_load_avg_se();
 2)   0.220 us    |            __update_load_avg_cfs_rq();
 2)   2.550 us    |          }
 2)               |          set_next_entity() {
 2)   0.266 us    |            __update_load_avg_se();
 2)   0.223 us    |            __update_load_avg_cfs_rq();
 2)   1.263 us    |          }
 2) + 12.628 us   |        }
 2)   0.237 us    |        enter_lazy_tlb();
 2)   0.277 us    |        finish_task_switch();
 2) + 25.560 us   |      }
 2) + 26.057 us   |    }
 2)               |    io_ring_submit() {
 2)   0.280 us    |      io_get_sqring.isra.0();
 2)               |      io_submit_sqe() {
 2)               |        kmem_cache_alloc() {
 2)               |          _cond_resched() {
 2)   0.230 us    |            rcu_all_qs();
 2)   0.697 us    |          }
 2)   0.226 us    |          should_failslab();
 2)   0.209 us    |          memcg_kmem_put_cache();
 2)   2.141 us    |        }
 2)               |        fget() {
 2)   0.263 us    |          __fget();
 2)   0.705 us    |        }
 2)               |        io_queue_sqe() {
 2)               |          __io_submit_sqe() {
 2)               |            io_write() {
 2)   0.303 us    |              io_prep_rw();
 2)               |              io_import_iovec.isra.0() {
 2)               |                rw_copy_check_uvector() {
 2)               |                  __check_object_size() {
 2)   0.255 us    |                    check_stack_object();
 2)   0.708 us    |                  }
 2)   1.296 us    |                }
 2)   1.854 us    |              }
 2)               |              rw_verify_area() {
 2)               |                security_file_permission() {
 2)               |                  selinux_file_permission() {
 2)               |                    __inode_security_revalidate() {
 2)               |                      _cond_resched() {
 2)   0.219 us    |                        rcu_all_qs();
 2)   0.666 us    |                      }
 2)   1.143 us    |                    }
 2)   0.207 us    |                    avc_policy_seqno();
 2)   2.090 us    |                  }
 2)   2.759 us    |                }
 2)   3.161 us    |              }
 2)               |              __sb_start_write() {
 2)               |                _cond_resched() {
 2)   0.223 us    |                  rcu_all_qs();
 2)   0.662 us    |                }
 2)   1.195 us    |              }
 2)               |              ext4_file_write_iter() {
 2)   0.221 us    |                down_write_trylock();
 2)               |                ext4_write_checks() {
 2)   0.238 us    |                  generic_write_check_limits.isra.0();
 2)   0.739 us    |                }
 2)               |                ext4_map_blocks() {
 2)               |                  ext4_es_lookup_extent() {
 2)   0.231 us    |                    _raw_read_lock();
 2)   0.728 us    |                  }
 2)               |                  __check_block_validity.constprop.0() {
 2)   0.294 us    |                    ext4_data_block_valid();
 2)   0.796 us    |                  }
 2)   2.211 us    |                }
 2)               |                __generic_file_write_iter() {
 2)   0.225 us    |                  file_remove_privs();
 2)               |                  file_update_time() {
 2)               |                    current_time() {
 2)   0.225 us    |                      ktime_get_coarse_real_ts64();
 2)   0.244 us    |                      timespec64_trunc();
 2)   1.155 us    |                    }
 2)   0.307 us    |                    __mnt_want_write_file();
 2)               |                    generic_update_time() {
 2)               |                      __mark_inode_dirty() {
 2)               |                        ext4_dirty_inode() {
 2)               |                          __ext4_journal_start_sb() {
 2)               |                            ext4_journal_check_start() {
 2)               |                              _cond_resched() {
 2)   9.606 us    |                                rcu_all_qs();
 2) + 10.042 us   |                              }
 2) + 10.507 us   |                            }
 2)               |                            jbd2__journal_start() {
 2)               |                              kmem_cache_alloc() {
 2)               |                                _cond_resched() {
 2)   0.208 us    |                                  rcu_all_qs();
 2)   0.608 us    |                                }
 2)   0.207 us    |                                should_failslab();
 2)   0.257 us    |                                memcg_kmem_put_cache();
 2)   2.063 us    |                              }
 2)               |                              start_this_handle() {
 2)   0.218 us    |                                _raw_read_lock();
 2)   0.248 us    |                                add_transaction_credits();
 2)   1.308 us    |                              }
 2)   4.047 us    |                            }
 2) + 15.251 us   |                          }
 2)               |                          ext4_mark_inode_dirty() {
 2)               |                            _cond_resched() {
 2)   0.194 us    |                              rcu_all_qs();
 2)   0.616 us    |                            }
 2)               |                            ext4_reserve_inode_write() {
 2)               |                              __ext4_get_inode_loc() {
 2)   0.325 us    |                                ext4_get_group_desc();
 2)   0.238 us    |                                ext4_inode_table();
 2)               |                                __getblk_gfp() {
 2)               |                                  __find_get_block() {
 2)   0.265 us    |                                    mark_page_accessed();
 2)   1.071 us    |                                  } /* __find_get_block */
 2)               |                                  _cond_resched() {
 2)   0.211 us    |                                    rcu_all_qs();
 2)   0.643 us    |                                  }
 2)   2.560 us    |                                }
 2)   4.076 us    |                              }
 2)               |                              __ext4_journal_get_write_access() {
 2)               |                                _cond_resched() {
 2)   0.174 us    |                                  rcu_all_qs();
 2)   0.609 us    |                                }
 2)               |                                jbd2_journal_get_write_access() {
 2)   0.259 us    |                                  jbd2_write_access_granted();
 2)   0.695 us    |                                }
 2)   1.937 us    |                              }
 2)   6.681 us    |                            }
 2)               |                            ext4_mark_iloc_dirty() {
 2)   0.205 us    |                              _raw_spin_lock();
 2)               |                              from_kuid() {
 2)   0.196 us    |                                map_id_up();
 2)   0.604 us    |                              }
 2)               |                              from_kgid() {
 2)   0.223 us    |                                map_id_up();
 2)   0.641 us    |                              }
 2)               |                              from_kprojid() {
 2)   0.221 us    |                                map_id_up();
 2)   0.644 us    |                              }
 2)               |                              ext4_inode_csum_set() {
 2)               |                                ext4_inode_csum.isra.0() {
 2)               |                                  crypto_shash_update() {
 2)   0.309 us    |                                    crc32c_pcl_intel_update [crc32c_intel]();
 2)   0.956 us    |                                  }
 2)               |                                  crypto_shash_update() {
 2)   0.250 us    |                                    crc32c_pcl_intel_update [crc32c_intel]();
 2)   0.692 us    |                                  }
 2)               |                                  crypto_shash_update() {
 2)   0.244 us    |                                    crc32c_pcl_intel_update [crc32c_intel]();
 2)   0.662 us    |                                  }
 2)               |                                  crypto_shash_update() {
 2)   0.172 us    |                                    crc32c_pcl_intel_update [crc32c_intel]();
 2)   0.562 us    |                                  }
 2)               |                                  crypto_shash_update() {
 2)   0.223 us    |                                    crc32c_pcl_intel_update [crc32c_intel]();
 2)   0.658 us    |                                  }
 2)               |                                  crypto_shash_update() {
 2)   0.236 us    |                                    crc32c_pcl_intel_update [crc32c_intel]();
 2)   0.681 us    |                                  }
 2)   5.786 us    |                                }
 2)   6.383 us    |                              }
 2)               |                              __ext4_handle_dirty_metadata() {
 2)               |                                _cond_resched() {
 2)   0.220 us    |                                  rcu_all_qs();
 2)   0.649 us    |                                }
 2)   0.250 us    |                                jbd2_journal_dirty_metadata();
 2)   1.576 us    |                              }
 2)   0.208 us    |                              __brelse();
 2) + 12.384 us   |                            }
 2) + 20.577 us   |                          }
 2)               |                          __ext4_journal_stop() {
 2)               |                            jbd2_journal_stop() {
 2)               |                              __wake_up() {
 2)               |                                __wake_up_common_lock() {
 2)   0.222 us    |                                  _raw_spin_lock_irqsave();
 2)   0.218 us    |                                  __wake_up_common();
 2)   0.216 us    |                                  _raw_spin_unlock_irqrestore();
 2)   1.485 us    |                                }
 2)   1.902 us    |                              }
 2)   0.347 us    |                              kmem_cache_free();
 2)   3.030 us    |                            }
 2)   3.521 us    |                          }
 2) + 40.202 us   |                        }
 2) + 40.826 us   |                      }
 2) + 41.419 us   |                    }
 2)   0.204 us    |                    __mnt_drop_write_file();
 2) + 44.224 us   |                  }
 2)               |                  generic_file_direct_write() {
 2)   0.337 us    |                    filemap_range_has_page();
 2)   0.336 us    |                    invalidate_inode_pages2_range();
 2)               |                    ext4_direct_IO() {
 2)               |                      __blockdev_direct_IO() {
 2)               |                        kmem_cache_alloc() {
 2)               |                          _cond_resched() {
 2)   0.210 us    |                            rcu_all_qs();
 2)   0.660 us    |                          }
 2)   0.225 us    |                          should_failslab();
 2)   0.215 us    |                          memcg_kmem_put_cache();
 2)   2.042 us    |                        }
 2)   0.229 us    |                        blk_start_plug();
 2)               |                        get_user_pages_fast() {
 2)               |                          gup_pgd_range() {
 2)   0.226 us    |                            pud_huge();
 2)   0.212 us    |                            pmd_huge();
 2)   1.862 us    |                          }
 2)   2.330 us    |                        }
 2)               |                        ext4_dio_get_block_unwritten_async() {
 2)               |                          ext4_get_block_trans() {
 2)               |                            ext4_meta_trans_blocks() {
 2)   0.231 us    |                              ext4_ext_index_trans_blocks();
 2)   0.775 us    |                            }
 2)               |                            __ext4_journal_start_sb() {
 2)               |                              ext4_journal_check_start() {
 2)               |                                _cond_resched() {
 2)   0.210 us    |                                  rcu_all_qs();
 2)   0.636 us    |                                }
 2)   1.078 us    |                              }
 2)               |                              jbd2__journal_start() {
 2)               |                                kmem_cache_alloc() {
 2)               |                                  _cond_resched() {
 2)   0.211 us    |                                    rcu_all_qs();
 2)   0.643 us    |                                  }
 2)   0.216 us    |                                  should_failslab();
 2)   0.223 us    |                                  memcg_kmem_put_cache();
 2)   1.993 us    |                                }
 2)               |                                start_this_handle() {
 2)   0.213 us    |                                  _raw_read_lock();
 2)   0.237 us    |                                  add_transaction_credits();
 2)   1.139 us    |                                }
 2)   3.808 us    |                              }
 2)   5.548 us    |                            }
 2)               |                            _ext4_get_block() {
 2)               |                              ext4_map_blocks() {
 2)               |                                ext4_es_lookup_extent() {
 2)   0.211 us    |                                  _raw_read_lock();
 2)   0.686 us    |                                }
 2)               |                                __check_block_validity.constprop.0() {
 2)   0.248 us    |                                  ext4_data_block_valid();
 2)   0.673 us    |                                }
 2)   2.023 us    |                              }
 2)   0.177 us    |                              ext4_update_bh_state();
 2)   2.809 us    |                            }
 2)               |                            __ext4_journal_stop() {
 2)               |                              jbd2_journal_stop() {
 2)               |                                __wake_up() {
 2)               |                                  __wake_up_common_lock() {
 2)   0.219 us    |                                    _raw_spin_lock_irqsave();
 2)   0.236 us    |                                    __wake_up_common();
 2)   0.226 us    |                                    _raw_spin_unlock_irqrestore();
 2)   1.539 us    |                                  }
 2)   1.950 us    |                                }
 2)   0.191 us    |                                kmem_cache_free();
 2)   2.770 us    |                              }
 2)   3.191 us    |                            }
 2) + 13.443 us   |                          }
 2) + 13.901 us   |                        }
 2)               |                        bio_alloc_bioset() {
 2)               |                          mempool_alloc() {
 2)               |                            _cond_resched() {
 2)   0.215 us    |                              rcu_all_qs();
 2)   0.655 us    |                            }
 2)               |                            mempool_alloc_slab() {
 2)               |                              kmem_cache_alloc() {
 2)   0.184 us    |                                should_failslab();
 2)   0.171 us    |                                memcg_kmem_put_cache();
 2)   0.958 us    |                              }
 2)   1.372 us    |                            }
 2)   2.724 us    |                          }
 2)   0.229 us    |                          bio_init();
 2)               |                          bvec_alloc() {
 2)               |                            kmem_cache_alloc() {
 2)   0.217 us    |                              should_failslab();
 2)   0.228 us    |                              memcg_kmem_put_cache();
 2)   1.259 us    |                            }
 2)   1.723 us    |                          }
 2)   5.627 us    |                        }
 2)               |                        bio_associate_blkg() {
 2)   0.202 us    |                          kthread_blkcg();
 2)               |                          bio_associate_blkg_from_css() {
 2)   0.313 us    |                            __bio_associate_blkg.isra.0();
 2)   0.761 us    |                          }
 2)   1.676 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.220 us    |                          __bio_try_merge_page();
 2)   0.238 us    |                          __bio_add_page();
 2)   1.092 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.250 us    |                          __bio_try_merge_page();
 2)   0.229 us    |                          __bio_add_page();
 2)   1.132 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.174 us    |                          __bio_try_merge_page();
 2)   0.220 us    |                          __bio_add_page();
 2)   0.965 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.220 us    |                          __bio_try_merge_page();
 2)   0.218 us    |                          __bio_add_page();
 2)   1.086 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.218 us    |                          __bio_try_merge_page();
 2)   0.226 us    |                          __bio_add_page();
 2)   1.082 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.190 us    |                          __bio_try_merge_page();
 2)   0.224 us    |                          __bio_add_page();
 2)   1.014 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.218 us    |                          __bio_try_merge_page();
 2)   0.211 us    |                          __bio_add_page();
 2)   1.066 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.227 us    |                          __bio_try_merge_page();
 2)   0.213 us    |                          __bio_add_page();
 2)   1.039 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.230 us    |                          __bio_try_merge_page();
 2)   0.218 us    |                          __bio_add_page();
 2)   1.073 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.220 us    |                          __bio_try_merge_page();
 2)   0.211 us    |                          __bio_add_page();
 2)   1.081 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.229 us    |                          __bio_try_merge_page();
 2)   0.171 us    |                          __bio_add_page();
 2)   0.996 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.224 us    |                          __bio_try_merge_page();
 2)   0.218 us    |                          __bio_add_page();
 2)   1.073 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.222 us    |                          __bio_try_merge_page();
 2)   0.224 us    |                          __bio_add_page();
 2)   1.091 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.176 us    |                          __bio_try_merge_page();
 2)   0.224 us    |                          __bio_add_page();
 2)   1.009 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.224 us    |                          __bio_try_merge_page();
 2)   0.227 us    |                          __bio_add_page();
 2)   1.105 us    |                        }
 2)               |                        bio_add_page() {
 2)   0.222 us    |                          __bio_try_merge_page();
 2)   0.221 us    |                          __bio_add_page();
 2)   1.139 us    |                        }
 2)   0.216 us    |                        _raw_spin_lock_irqsave();
 2)   0.356 us    |                        _raw_spin_unlock_irqrestore();
 2)               |                        submit_bio() {
 2)               |                          generic_make_request() {
 2)               |                            generic_make_request_checks() {
 2)               |                              _cond_resched() {
 2)   0.224 us    |                                rcu_all_qs();
 2)   0.653 us    |                              }
 2)               |                              bio_endio() {
 2)               |                                __rq_qos_done_bio() {
 2)               |                                  blkcg_iolatency_done_bio() {
 2)   0.272 us    |                                    ktime_get();
 2)   0.738 us    |                                  }
 2)   1.264 us    |                                }
 2)               |                                dio_bio_end_aio() {
 2)               |                                  dio_bio_complete() {
 2)               |                                    bio_release_pages() {
 2)   0.433 us    |                                      bio_release_pages.part.0();
 2)   0.870 us    |                                    }
 2)               |                                    bio_put() {
 2)               |                                      bio_free() {
 2)               |                                        bvec_free() {
 2)   0.237 us    |                                          kmem_cache_free();
 2)   0.696 us    |                                        } /* bvec_free */
 2)               |                                        mempool_free() {
 2)               |                                          mempool_free_slab() {
 2)   0.237 us    |                                            kmem_cache_free();
 2)   0.619 us    |                                          }
 2)   1.002 us    |                                        }
 2)   2.337 us    |                                      }
 2)   2.770 us    |                                    }
 2)   4.270 us    |                                  }
 2)   0.216 us    |                                  _raw_spin_lock_irqsave();
 2)   0.227 us    |                                  _raw_spin_unlock_irqrestore();
 2)   5.584 us    |                                }
 2)   7.567 us    |                              }
 2)   8.915 us    |                            }
 2)   9.398 us    |                          }
 2)   9.849 us    |                        }
 2)               |                        blk_finish_plug() {
 2)   0.246 us    |                          blk_flush_plug_list();
 2)   0.701 us    |                        }
 2)   0.227 us    |                        _raw_spin_lock_irqsave();
 2)   0.225 us    |                        _raw_spin_unlock_irqrestore();
 2)               |                        dio_complete() {
 2)   0.224 us    |                          ext4_end_io_dio();
 2)   0.259 us    |                          kmem_cache_free();
 2)   1.215 us    |                        }
 2) + 63.921 us   |                      }
 2)               |                      wake_up_bit() {
 2)   0.221 us    |                        __wake_up_bit();
 2)   0.703 us    |                      }
 2) + 65.463 us   |                    }
 2) + 67.271 us   |                  }
 2) ! 112.659 us  |                }
 2)   0.219 us    |                up_write();
 2) ! 117.784 us  |              }
 2)               |              io_complete_rw() {
 2)               |                kiocb_end_write.isra.0.part.0() {
 2)   0.242 us    |                  __sb_end_write();
 2)   0.723 us    |                }
 2)               |                io_cqring_add_event() {
 2)   0.221 us    |                  _raw_spin_lock_irqsave();
 2)   0.254 us    |                  io_commit_cqring();
 2)   0.215 us    |                  _raw_spin_unlock_irqrestore();
 2)   0.225 us    |                  io_cqring_ev_posted();
 2)   1.958 us    |                }
 2)   0.218 us    |                io_put_req();
 2)   3.795 us    |              }
 2)   0.219 us    |              kfree();
 2) ! 130.276 us  |            }
 2) ! 130.807 us  |          }
 2)               |          io_put_req() {
 2)               |            io_free_req() {
 2)               |              __io_free_req() {
 2)               |                fput() {
 2)   0.223 us    |                  fput_many();
 2)   0.636 us    |                }
 2)   0.230 us    |                io_ring_drop_ctx_refs();
 2)   0.242 us    |                kmem_cache_free();
 2)   1.993 us    |              }
 2)   2.376 us    |            }
 2)   2.781 us    |          }
 2) ! 134.263 us  |        }
 2) ! 138.150 us  |      }
 2) ! 139.215 us  |    }
 2)   0.230 us    |    mutex_unlock();
 2)   0.234 us    |    io_cqring_wait();
 2)   0.230 us    |    io_ring_drop_ctx_refs();
 2)               |    fput() {
 2)   0.209 us    |      fput_many();
 2)   0.617 us    |    }
 2) ! 186.975 us  |  }

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: EIO with io_uring O_DIRECT writes on ext4
  2019-07-23  8:07 EIO with io_uring O_DIRECT writes on ext4 Stefan Hajnoczi
@ 2019-07-23 15:20 ` Jens Axboe
  2019-07-23 20:07   ` Theodore Y. Ts'o
  2019-07-23 22:05   ` Dave Chinner
  0 siblings, 2 replies; 7+ messages in thread
From: Jens Axboe @ 2019-07-23 15:20 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, Aarushi Mehta, Julia Suvorova, linux-fsdevel,
	Christoph Hellwig

On 7/23/19 2:07 AM, Stefan Hajnoczi wrote:
> Hi,
> io_uring O_DIRECT writes can fail with EIO on ext4.  Please see the
> function graph trace from Linux 5.3.0-rc1 below for details.  It was
> produced with the following qemu-io command (using Aarushi's QEMU
> patches from https://github.com/rooshm/qemu/commits/io_uring):
> 
>    $ qemu-io --cache=none --aio=io_uring --format=qcow2 -c 'writev -P 185 131072 65536' tests/qemu-iotests/scratch/test.qcow2
> 
> This issue is specific to ext4.  XFS and the underlying LVM logical
> volume both work.
> 
> The storage configuration is an LVM logical volume (device-mapper linear
> target), on top of LUKS, on top of a SATA disk.  The logical volume's
> request_queue does not have mq_ops and this causes
> generic_make_request_checks() to fail:
> 
>    if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q))
>            goto not_supported;
> 
> I guess this could be worked around by deferring the request to the
> io_uring work queue to avoid REQ_NOWAIT.  But XFS handles this fine so
> how can io_uring.c detect this case cleanly or is there a bug in ext4?

I actually think it's XFS that's broken here, it's not passing down
the IOCB_NOWAIT -> IOMAP_NOWAIT -> REQ_NOWAIT. This means we lose that
important request bit, and we just block instead of triggering the
not_supported case.

Outside of that, that case needs similar treatment to what I did for
the EAGAIN case here:

http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=893a1c97205a3ece0cbb3f571a3b972080f3b4c7

It was a big mistake to pass back these values in an async fashion,
and it also means that some accounting in other drivers are broken
as we can get completions without the bio actually being submitted.
This was fixed for just the EAGAIN case here:

http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=c9b3007feca018d3f7061f5d5a14cb00766ffe9b

but that's still broken for EOPNOTSUPP.

tldr is that we should pass these errors back sync, and it was a
mistake to EVER try and do that through ->bi_end_io(). That behavior
was introduced by:

http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=03a07c92a9ed9938d828ca7f1d11b8bc63a7bb89

I'll add a patch on top of my for-linus branch that makes us handle
the EOPNOTSUPP case similarly. You are right that in those cases we
should just punt to the async worker internally.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: EIO with io_uring O_DIRECT writes on ext4
  2019-07-23 15:20 ` Jens Axboe
@ 2019-07-23 20:07   ` Theodore Y. Ts'o
  2019-07-23 20:11     ` Jens Axboe
  2019-07-23 22:05   ` Dave Chinner
  1 sibling, 1 reply; 7+ messages in thread
From: Theodore Y. Ts'o @ 2019-07-23 20:07 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Stefan Hajnoczi, linux-block, Aarushi Mehta, Julia Suvorova,
	linux-fsdevel, Christoph Hellwig

On Tue, Jul 23, 2019 at 09:20:05AM -0600, Jens Axboe wrote:
> 
> I actually think it's XFS that's broken here, it's not passing down
> the IOCB_NOWAIT -> IOMAP_NOWAIT -> REQ_NOWAIT. This means we lose that
> important request bit, and we just block instead of triggering the
> not_supported case.
> 
> Outside of that, that case needs similar treatment to what I did for
> the EAGAIN case here:
> 
> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=893a1c97205a3ece0cbb3f571a3b972080f3b4c7
> 
> It was a big mistake to pass back these values in an async fashion,
> and it also means that some accounting in other drivers are broken
> as we can get completions without the bio actually being submitted.

Hmmm, I had been trying to track down a similar case with virtio-scsi
on top of LVM, using a Google Compute Engine VM.  In that case,
xfstests generic/471 was failing with EIO, when it would pass just
*fine* when I was using KVM with a virtio-scsi device w/o LVM.

So it sounds like that what's going on is if the device driver (or
LVM, or anything else in the storage stack) needs to do a blocking
memory allocation, and NOWAIT is requested, we'll end up returning EIO
because an asynchronous error is getting reported, where as if we
could return it synchronously, the file system could properly return
EOPNOTSUP.  Am I understanding you correctly?

I guess there's a separate question hiding here, which is whether
there's a way to allow dm-linear or dm-crypt to handle NOWAIT requests
without needing to block.

					- Ted

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: EIO with io_uring O_DIRECT writes on ext4
  2019-07-23 20:07   ` Theodore Y. Ts'o
@ 2019-07-23 20:11     ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2019-07-23 20:11 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Stefan Hajnoczi, linux-block, Aarushi Mehta, Julia Suvorova,
	linux-fsdevel, Christoph Hellwig

On 7/23/19 2:07 PM, Theodore Y. Ts'o wrote:
> On Tue, Jul 23, 2019 at 09:20:05AM -0600, Jens Axboe wrote:
>>
>> I actually think it's XFS that's broken here, it's not passing down
>> the IOCB_NOWAIT -> IOMAP_NOWAIT -> REQ_NOWAIT. This means we lose that
>> important request bit, and we just block instead of triggering the
>> not_supported case.
>>
>> Outside of that, that case needs similar treatment to what I did for
>> the EAGAIN case here:
>>
>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=893a1c97205a3ece0cbb3f571a3b972080f3b4c7
>>
>> It was a big mistake to pass back these values in an async fashion,
>> and it also means that some accounting in other drivers are broken
>> as we can get completions without the bio actually being submitted.
> 
> Hmmm, I had been trying to track down a similar case with virtio-scsi
> on top of LVM, using a Google Compute Engine VM.  In that case,
> xfstests generic/471 was failing with EIO, when it would pass just
> *fine* when I was using KVM with a virtio-scsi device w/o LVM.
> 
> So it sounds like that what's going on is if the device driver (or
> LVM, or anything else in the storage stack) needs to do a blocking
> memory allocation, and NOWAIT is requested, we'll end up returning EIO
> because an asynchronous error is getting reported, where as if we
> could return it synchronously, the file system could properly return
> EOPNOTSUP.  Am I understanding you correctly?

Yes, that's exactly right. The EAGAIN/EOPNOTSUPP for blocking reasons
should be returned sync, so ext4/xfs can return that error as well. This
enables us to punt the IO appropriately to a workqueue. It should NOT
result in being translated into an EIO to the application.

Working on this change right now...

> I guess there's a separate question hiding here, which is whether
> there's a way to allow dm-linear or dm-crypt to handle NOWAIT requests
> without needing to block.

That's certainly the next step. Right now we just guard this with
queue_is_mq(), but in reality it should be a queue flag so that stacking
drivers can opt in when they have been vetted/changed to support NOWAIT
properly.

But wading through this stuff is leaving me a little disappointed in the
level of NOWAIT support we have right now. It seems mostly half-assed
and there are plenty of cases where we don't really do the right thing.
I'll try and work through all that, to the extent possible.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: EIO with io_uring O_DIRECT writes on ext4
  2019-07-23 15:20 ` Jens Axboe
  2019-07-23 20:07   ` Theodore Y. Ts'o
@ 2019-07-23 22:05   ` Dave Chinner
  2019-07-23 22:19     ` Jens Axboe
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2019-07-23 22:05 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Stefan Hajnoczi, linux-block, Aarushi Mehta, Julia Suvorova,
	linux-fsdevel, Christoph Hellwig

On Tue, Jul 23, 2019 at 09:20:05AM -0600, Jens Axboe wrote:
> On 7/23/19 2:07 AM, Stefan Hajnoczi wrote:
> > Hi,
> > io_uring O_DIRECT writes can fail with EIO on ext4.  Please see the
> > function graph trace from Linux 5.3.0-rc1 below for details.  It was
> > produced with the following qemu-io command (using Aarushi's QEMU
> > patches from https://github.com/rooshm/qemu/commits/io_uring):
> > 
> >    $ qemu-io --cache=none --aio=io_uring --format=qcow2 -c 'writev -P 185 131072 65536' tests/qemu-iotests/scratch/test.qcow2
> > 
> > This issue is specific to ext4.  XFS and the underlying LVM logical
> > volume both work.
> > 
> > The storage configuration is an LVM logical volume (device-mapper linear
> > target), on top of LUKS, on top of a SATA disk.  The logical volume's
> > request_queue does not have mq_ops and this causes
> > generic_make_request_checks() to fail:
> > 
> >    if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q))
> >            goto not_supported;
> > 
> > I guess this could be worked around by deferring the request to the
> > io_uring work queue to avoid REQ_NOWAIT.  But XFS handles this fine so
> > how can io_uring.c detect this case cleanly or is there a bug in ext4?
> 
> I actually think it's XFS that's broken here, it's not passing down
> the IOCB_NOWAIT -> IOMAP_NOWAIT -> REQ_NOWAIT. This means we lose that
> important request bit, and we just block instead of triggering the
> not_supported case.

I wouldn't say XFS is broken, we didn't implement it because it
meant that IOCB_NOWAIT did not work on all block devices. i.e. the
biggest issue IOCB_NOWAIT is avoiding is blocking on filesytem
locks, and blocking in the request queue was largely just noise for
the applications RWF_NOWAIT was initially implemented for.

IOWs, if you have the wrong hardware, you can't use RWF_NOWAIT at
all, despite it providing massive benefits for AIO at the filesystem
level. Hence to say how IOMAP_NOWAIT is implemented (i.e. does not
set REQ_NOWAIT) is broken ignores the fact that RWF_NOWAIT was
originally intended as a "don't block on contended filesystem locks" 
directive, not as something that is conditional on block layer
functionality...


> Outside of that, that case needs similar treatment to what I did for
> the EAGAIN case here:
> 
> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=893a1c97205a3ece0cbb3f571a3b972080f3b4c7

I don't see REQ_NOWAIT_INLINE in 5.3-rc1.

However, nobody checks the cookie returned by submit_bio() for error
status. It's only a recent addition for block polling and so the
only time it is checked is if we are polling and it gets passed to
blk_poll when RWF_HIPRI is set. So this change, by itself, doesn't
solve any problem.

In fact, the way the direct IO code is right now a multi-bio DIO
submission will overwrite the submit cookie repeatedly and hence may
end up only doing partial submission but still report success
because the last bio in the chain didn't block and REQ_NOWAIT_INLINE
doesn't actually mark the bio itself with an error, so the bio
completion function won't report it, either.

> It was a big mistake to pass back these values in an async fashion,

IMO, the big mistake was to have only some block device
configurations support REQ_NOWAIT - that was an expedient hack to
get block layer polling into the kernel fast. The way the error is
passed back is largely irrelevant from that perspective, and
REQ_NOWAIT_INLINE doesn't resolve this problem at all.

Indeed, I think REQ_NOWAIT is largely redundant, because if we care
about IO submission blocking because the request queue is full, then
we simply use the existing bdi_congested() interface to check.
That works for all types of block devices - not just random mq
devices - and matches code we have all over the kernel to avoid
blocking async IO submission on congested reuqest queues...

So, yeah, I think REQ_NOWAIT needs to die and the direct IO callers
should do just congestion checks on IOCB_NOWAIT/IOMAP_NOWAIT rather
than try to add new error reporting mechanisms into bios that lots
of code will need to be changed to support....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: EIO with io_uring O_DIRECT writes on ext4
  2019-07-23 22:05   ` Dave Chinner
@ 2019-07-23 22:19     ` Jens Axboe
  2019-07-24  2:23       ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2019-07-23 22:19 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Stefan Hajnoczi, linux-block, Aarushi Mehta, Julia Suvorova,
	linux-fsdevel, Christoph Hellwig

On 7/23/19 4:05 PM, Dave Chinner wrote:
> On Tue, Jul 23, 2019 at 09:20:05AM -0600, Jens Axboe wrote:
>> On 7/23/19 2:07 AM, Stefan Hajnoczi wrote:
>>> Hi,
>>> io_uring O_DIRECT writes can fail with EIO on ext4.  Please see the
>>> function graph trace from Linux 5.3.0-rc1 below for details.  It was
>>> produced with the following qemu-io command (using Aarushi's QEMU
>>> patches from https://github.com/rooshm/qemu/commits/io_uring):
>>>
>>>     $ qemu-io --cache=none --aio=io_uring --format=qcow2 -c 'writev -P 185 131072 65536' tests/qemu-iotests/scratch/test.qcow2
>>>
>>> This issue is specific to ext4.  XFS and the underlying LVM logical
>>> volume both work.
>>>
>>> The storage configuration is an LVM logical volume (device-mapper linear
>>> target), on top of LUKS, on top of a SATA disk.  The logical volume's
>>> request_queue does not have mq_ops and this causes
>>> generic_make_request_checks() to fail:
>>>
>>>     if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q))
>>>             goto not_supported;
>>>
>>> I guess this could be worked around by deferring the request to the
>>> io_uring work queue to avoid REQ_NOWAIT.  But XFS handles this fine so
>>> how can io_uring.c detect this case cleanly or is there a bug in ext4?
>>
>> I actually think it's XFS that's broken here, it's not passing down
>> the IOCB_NOWAIT -> IOMAP_NOWAIT -> REQ_NOWAIT. This means we lose that
>> important request bit, and we just block instead of triggering the
>> not_supported case.
> 
> I wouldn't say XFS is broken, we didn't implement it because it
> meant that IOCB_NOWAIT did not work on all block devices. i.e. the
> biggest issue IOCB_NOWAIT is avoiding is blocking on filesytem
> locks, and blocking in the request queue was largely just noise for
> the applications RWF_NOWAIT was initially implemented for.

Blocking due to resource starvation (like requests) is definitely not
just noise, in some case it's cases it's an equal or larger amount of
time.

> IOWs, if you have the wrong hardware, you can't use RWF_NOWAIT at

Define wrong...

> all, despite it providing massive benefits for AIO at the filesystem
> level. Hence to say how IOMAP_NOWAIT is implemented (i.e. does not
> set REQ_NOWAIT) is broken ignores the fact that RWF_NOWAIT was
> originally intended as a "don't block on contended filesystem locks"
> directive, not as something that is conditional on block layer
> functionality...

RWF_NOWAIT should have nothing to do with the block layer at all, each
storage layer would have to support it.

>> Outside of that, that case needs similar treatment to what I did for
>> the EAGAIN case here:
>>
>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=893a1c97205a3ece0cbb3f571a3b972080f3b4c7
> 
> I don't see REQ_NOWAIT_INLINE in 5.3-rc1.

That's because it isn't in 5.3-rc1 :-)

> However, nobody checks the cookie returned by submit_bio() for error
> status. It's only a recent addition for block polling and so the
> only time it is checked is if we are polling and it gets passed to
> blk_poll when RWF_HIPRI is set. So this change, by itself, doesn't
> solve any problem.

REQ_NOWAIT wasn't introduced as part of the polling work, it was done
earlier for libaio.

You don't have to check the cookie for REQ_NOWAIT, you'd only have to
check it for REQ_NOWAIT_INLINE.

> In fact, the way the direct IO code is right now a multi-bio DIO
> submission will overwrite the submit cookie repeatedly and hence may
> end up only doing partial submission but still report success
> because the last bio in the chain didn't block and REQ_NOWAIT_INLINE
> doesn't actually mark the bio itself with an error, so the bio
> completion function won't report it, either.

Agree, details around multi-bio was largely ignored for the polling,
since multi-bio implies larger IO and that was somewhat ignored (as less
interesting).

>> It was a big mistake to pass back these values in an async fashion,
> 
> IMO, the big mistake was to have only some block device
> configurations support REQ_NOWAIT - that was an expedient hack to
> get block layer polling into the kernel fast. The way the error is
> passed back is largely irrelevant from that perspective, and
> REQ_NOWAIT_INLINE doesn't resolve this problem at all.

Again, it has nothing to do with polling. But yes, going forward it
needs to get divorced from being tied to the fact that the queue is
blk-mq or not, and stacking drivers should opt-in to supporting it.

> Indeed, I think REQ_NOWAIT is largely redundant, because if we care
> about IO submission blocking because the request queue is full, then
> we simply use the existing bdi_congested() interface to check.
> That works for all types of block devices - not just random mq
> devices - and matches code we have all over the kernel to avoid
> blocking async IO submission on congested reuqest queues...

No... The congestion logic is silly and I think a design mistake from
way back when. There's no race free way to answer that question and
utilize the answer safely, it can only truly be done as part of the
resource allocation when the IO is going through the stack. That's
looking at the simple case of just a basic storage setup, with stacked
layers it becomes even worse and a nightmare to support. And you still
get the racy answer.

> So, yeah, I think REQ_NOWAIT needs to die and the direct IO callers
> should do just congestion checks on IOCB_NOWAIT/IOMAP_NOWAIT rather
> than try to add new error reporting mechanisms into bios that lots
> of code will need to be changed to support....

See above, totally disagree on that conclusion.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: EIO with io_uring O_DIRECT writes on ext4
  2019-07-23 22:19     ` Jens Axboe
@ 2019-07-24  2:23       ` Dave Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2019-07-24  2:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Stefan Hajnoczi, linux-block, Aarushi Mehta, Julia Suvorova,
	linux-fsdevel, Christoph Hellwig

On Tue, Jul 23, 2019 at 04:19:31PM -0600, Jens Axboe wrote:
> On 7/23/19 4:05 PM, Dave Chinner wrote:
> > On Tue, Jul 23, 2019 at 09:20:05AM -0600, Jens Axboe wrote:
> >> On 7/23/19 2:07 AM, Stefan Hajnoczi wrote:
> >>> Hi,
> >>> io_uring O_DIRECT writes can fail with EIO on ext4.  Please see the
> >>> function graph trace from Linux 5.3.0-rc1 below for details.  It was
> >>> produced with the following qemu-io command (using Aarushi's QEMU
> >>> patches from https://github.com/rooshm/qemu/commits/io_uring):
> >>>
> >>>     $ qemu-io --cache=none --aio=io_uring --format=qcow2 -c 'writev -P 185 131072 65536' tests/qemu-iotests/scratch/test.qcow2
> >>>
> >>> This issue is specific to ext4.  XFS and the underlying LVM logical
> >>> volume both work.
> >>>
> >>> The storage configuration is an LVM logical volume (device-mapper linear
> >>> target), on top of LUKS, on top of a SATA disk.  The logical volume's
> >>> request_queue does not have mq_ops and this causes
> >>> generic_make_request_checks() to fail:
> >>>
> >>>     if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q))
> >>>             goto not_supported;
> >>>
> >>> I guess this could be worked around by deferring the request to the
> >>> io_uring work queue to avoid REQ_NOWAIT.  But XFS handles this fine so
> >>> how can io_uring.c detect this case cleanly or is there a bug in ext4?
> >>
> >> I actually think it's XFS that's broken here, it's not passing down
> >> the IOCB_NOWAIT -> IOMAP_NOWAIT -> REQ_NOWAIT. This means we lose that
> >> important request bit, and we just block instead of triggering the
> >> not_supported case.
> > 
> > I wouldn't say XFS is broken, we didn't implement it because it
> > meant that IOCB_NOWAIT did not work on all block devices. i.e. the
> > biggest issue IOCB_NOWAIT is avoiding is blocking on filesytem
> > locks, and blocking in the request queue was largely just noise for
> > the applications RWF_NOWAIT was initially implemented for.
> 
> Blocking due to resource starvation (like requests) is definitely not
> just noise, in some case it's cases it's an equal or larger amount of
> time.

Yes, it can be, but that wasn't what RWF_NOWAIT was targetted at -
it was for non-blocking AIO-based applications like the seastar DB
which was primarily deployed on XFS.

Indeed, it was largely for avoiding latencies in filesystem extent
allocation - having a filesystem block on space allocation means it
might issue and wait synchronously for tens of metadata IOs (which
may all throttled in the block layer individually), block on memory
reclaim trying to allocate the memory for all that metadata, block
on log space whihc might require thousands of IOs to be issued and
complete, etc.

i.e.  the latency of space allocation during IO submission can
be orders of magnitude worse than just submission of a single bio
that hits a congested queue. Blocking in the request queue for a
single IO dispatch is just noise compared to the long tail latencies
RWF_NOWAIT avoids in the filesystem layers.

> > IOWs, if you have the wrong hardware, you can't use RWF_NOWAIT at
> 
> Define wrong...

Wrong as in any hardware that doesn't have a blk_mq driver
interface.

Which means it's not just hardware that is a problem here, it's also
stacked software device configs (e.g. various dm/md configs,
loopback, etc) that can cause RWF_NOWAIT IO to mysteriously fail...

> > level. Hence to say how IOMAP_NOWAIT is implemented (i.e. does not
> > set REQ_NOWAIT) is broken ignores the fact that RWF_NOWAIT was
> > originally intended as a "don't block on contended filesystem locks"
> > directive, not as something that is conditional on block layer
> > functionality...
> 
> RWF_NOWAIT should have nothing to do with the block layer at all, each
> storage layer would have to support it.

Jens, that's -exactly- the problem I'm telling you exists - the
fs/direct-io.c code path sets REQ_NOWAIT unconditionally when
RWF_NOWAIT is set in userspace via:

RWF_NOWAIT -> IOCB_NOWAIT -> fs/direct-io.c -> REQ_NOWAIT

And hence userspace IO fails in very unpredictable ways. Indeed, it
only does this for write direct IOs - why would you want to not
block DIO writes but allow blocking DIO reads is beyond me, but
that's what the code does, and that's why it breaks....

> >> Outside of that, that case needs similar treatment to what I did for
> >> the EAGAIN case here:
> >>
> >> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=893a1c97205a3ece0cbb3f571a3b972080f3b4c7
> > 
> > I don't see REQ_NOWAIT_INLINE in 5.3-rc1.
> 
> That's because it isn't in 5.3-rc1 :-)
> 
> > However, nobody checks the cookie returned by submit_bio() for error
> > status. It's only a recent addition for block polling and so the
> > only time it is checked is if we are polling and it gets passed to
> > blk_poll when RWF_HIPRI is set. So this change, by itself, doesn't
> > solve any problem.
> 
> REQ_NOWAIT wasn't introduced as part of the polling work, it was done
> earlier for libaio.

I know that.

I was talking about the introduction of returning of errors from
submit_bio() - that came with RWF_HIPRI, not the introduction of
REQ_NOWAIT. Two different things altogether, and that means there is
no support anywhere for REQ_NOWAIT_INLINE error handling except
the block polling code.

> > In fact, the way the direct IO code is right now a multi-bio DIO
> > submission will overwrite the submit cookie repeatedly and hence may
> > end up only doing partial submission but still report success
> > because the last bio in the chain didn't block and REQ_NOWAIT_INLINE
> > doesn't actually mark the bio itself with an error, so the bio
> > completion function won't report it, either.
> 
> Agree, details around multi-bio was largely ignored for the polling,
> since multi-bio implies larger IO and that was somewhat ignored (as less
> interesting).

So it was never tested and so nobody developing that functionality
noticed that block device congestion can result in random data
corruption when RWF_HIPRI is used.

I have absolutely nothing good to say in response to this
relevation.

> > Indeed, I think REQ_NOWAIT is largely redundant, because if we care
> > about IO submission blocking because the request queue is full, then
> > we simply use the existing bdi_congested() interface to check.
> > That works for all types of block devices - not just random mq
> > devices - and matches code we have all over the kernel to avoid
> > blocking async IO submission on congested reuqest queues...
> 
> No... The congestion logic is silly and I think a design mistake from
> way back when. There's no race free way to answer that question and
> utilize the answer safely,

We don't care that it's racy - RWF_NOWAIT is best effort only, so if
we occasionally get it wrong and block it's not the end of the
world. Perfect is the enemy of good; having something that is good
that works for everything is far, far better than having something
perfect that only works for certain configurations.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-07-24  2:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-23  8:07 EIO with io_uring O_DIRECT writes on ext4 Stefan Hajnoczi
2019-07-23 15:20 ` Jens Axboe
2019-07-23 20:07   ` Theodore Y. Ts'o
2019-07-23 20:11     ` Jens Axboe
2019-07-23 22:05   ` Dave Chinner
2019-07-23 22:19     ` Jens Axboe
2019-07-24  2:23       ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.