* [PATCH 00/80] staging: lustre: majority of missing fixes for 2.6 release
From: James Simmons @ 2016-08-16 20:18 UTC
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, James Simmons

This series combines two previous patch sets that cover the fixes missing
between lustre version 2.5.56 and the 2.6 release, minus a few patches that
introduce regressions. This update also includes the latest LNet fixes.

Alexander Boyko (1):
  staging: lustre: lnet: make connection more stable with packet loss

Andriy Skulysh (1):
  staging: lustre: ptlrpc: request gets stuck in UNREGISTERING phase

Brian Behlendorf (1):
  staging: lustre: obd: limit lu_object cache

Chris Horn (1):
  staging: lustre: ptlrpc: Early replies need to honor at_max

Christopher J. Morrone (1):
  staging: lustre: Remove static declaration in anonymous union

Dmitry Eremin (2):
  staging: lustre: lmv: fix issue found by Klocwork Insight tool
  staging: lustre: lmv: build error with gcc 4.7.0 20110509

Doug Oucharek (3):
  staging: lustre: lnet: Do not drop message when shutting down LNet
  staging: lustre: lnet: Correct position of lnet_ni_decref()
  staging: lustre: lnet: Stop Infinite CON RACE Condition

Emoly Liu (1):
  staging: lustre: ldlm: improve ldlm_lock_create() return value

Fan Yong (6):
  staging: lustre: obdclass: bug fixes for lu_device_type handling
  staging: lustre: llite: enable clients to inject error for lfsck
  staging: lustre: obdclass: unified flow control interfaces
  staging: lustre: reorder LOV_MAGIC_* definition
  staging: lustre: lov: new pattern flag for partially repaired file
  staging: lustre: lmv: build master LMV EA dynamically build via readdir

Gregoire Pichon (1):
  staging: lustre: llite: fix inconsistencies of root squash feature

Hongchao Zhang (2):
  staging: lustre: llite: set dir LOV xattr length variable
  staging: lustre: osc: Automatically increase the max_dirty_mb

James Simmons (3):
  staging: lustre: obdclass: compile issues with variable not being initialized
  staging: lustre: include: fix one off errors in lustre_id.h
  staging: lustre: llite: remove assert for acl refcount

Jian Yu (1):
  staging: lustre: obdclass: fix lmd_parse() to handle comma-separated NIDs

Jinshan Xiong (7):
  staging: lustre: osc: allow to call brw_commit() multiple times
  staging: lustre: llite: avoid a deadlock in page write
  staging: lustre: lov: handle the case of stripe size is not power 2
  staging: lustre: llite: Fix the deadlock in balance_dirty_pages()
  staging: lustre: llite: Change readdir BRW metrics
  staging: lustre: clio: Reduce memory overhead of per-page allocation
  staging: lustre: osc: revise unstable pages accounting

John L. Hammond (13):
  staging: lustre: mdc: fixup MDS_SWAP_LAYOUTS ELC handling
  staging: lustre: don't need to const __u64 parameters for lustre_idl.h
  staging: lustre: const correct FID/OSTID/... helpers
  staging: lustre: use bool for several function in lustre_idl.h/lustre_fid.h
  staging: lustre: simplify inline functions in lustre_fid.h
  staging: lustre: lmv: access lum_stripe_offset as little endian
  staging: lustre: lmv: cleanup req in lmv_getattr_name()
  staging: lustre: lmv: rename request to preq in lmv_getattr_name()
  staging: lustre: move ioctls to lustre_ioctl.h
  staging: lustre: llite: validate names
  staging: lustre: uapi: reduce scope of lustre_idl.h
  staging: lustre: mdt: add mbo_ prefix to members of struct mdt_body
  staging: lustre: obd: validate open handle cookies

Lai Siyao (2):
  staging: lustre: fid: do open-by-fid by default
  staging: lustre: ptlrpc: add OBD_CONNECT_UNLINK_CLOSE flag

Liang Zhen (1):
  staging: lustre: lnet: lock improvement for ko2iblnd

Mikhail Pershin (1):
  staging: lustre: llog: keep llog ctxt indices constant

Nathaniel Clark (1):
  staging: lustre: lmv: Ensure lmv_intent_lookup cleans up reqp

Niu Yawei (1):
  staging: lustre: obd: rename lsr_padding to lsr_valid

Patrick Farrell (1):
  staging: lustre: fld: add fld description documentation

Ryan Haasken (1):
  staging: lustre: libcfs: Only dump log once per sec. to avoid EEXIST

Vitaly Fertman (1):
  staging: lustre: ldlm: flock completion fixes.

wang di (27):
  staging: lustre: llite: add md_op_data parameter to ll_get_dir_page
  staging: lustre: llite: remove comment from ll_dir_read
  staging: lustre: llite: style cleanup for llite_internal.h
  staging: lustre: llite: pass inode to ll_release_page
  staging: lustre: llite: change remove parameter to bool
  staging: lustre: mdc: don't take rpc lock for readdir case
  staging: lustre: lmv: remove unused lmv_get_mea function
  staging: lustre: lmv: remove duplicate MAX_HASH_*
  staging: lustre: lmv: change handling of lmv striping information
  staging: lustre: lmv: remove lmv_get_easize
  staging: lustre: lmv: replace obd_free_memmd with lmv_free_memmd
  staging: lustre: create striped directory
  staging: lustre: llite: fix "getdirstripe" to show stripe info
  staging: lustre: delete striped directory
  staging: lustre: add ability to migrate inodes.
  staging: lustre: llite: a few fixes for migration.
  staging: lustre: lmv: lookup remote migrating object in LMV
  staging: lustre: llite: add error handler in inode prepare phase
  staging: lustre: lmv: separate master object with master stripe
  staging: lustre: llite: a few fixes about readdir of striped dir.
  staging: lustre: lmv: validate lock with correct stripe FID
  staging: lustre: lmv: Match MDT where the FID locates first
  staging: lustre: llite: use the correct mode for striped directory
  staging: lustre: mdc: always use D_INFO for debug info when mdc_put_rpc_lock fails
  staging: lustre: lmv: try all stripes for unknown hash functions
  staging: lustre: obd: implement md_read_page
  staging: lustre: llite: set op_max_pages

 .../lustre/include/linux/libcfs/libcfs_fail.h      |    3 +
 .../lustre/include/linux/libcfs/libcfs_private.h   |    9 -
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |    2 +
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |   78 +-
 drivers/staging/lustre/lnet/libcfs/debug.c         |    9 +-
 drivers/staging/lustre/lnet/libcfs/fail.c          |    6 +-
 drivers/staging/lustre/lnet/libcfs/libcfs_string.c |    2 -
 drivers/staging/lustre/lnet/lnet/lib-move.c        |    3 +
 drivers/staging/lustre/lustre/fld/fld_internal.h   |   19 +
 drivers/staging/lustre/lustre/fld/fld_request.c    |   55 +--
 drivers/staging/lustre/lustre/include/cl_object.h  |   75 +-
 .../staging/lustre/lustre/include/lprocfs_status.h |    6 +
 drivers/staging/lustre/lustre/include/lu_object.h  |   19 +-
 .../lustre/lustre/include/lustre/lustre_idl.h      |  341 +++++----
 .../lustre/lustre/include/lustre/lustre_ioctl.h    |  412 ++++++++++
 .../lustre/lustre/include/lustre/lustre_user.h     |   73 +-
 drivers/staging/lustre/lustre/include/lustre_dlm.h |   11 +-
 .../lustre/lustre/include/lustre_dlm_flags.h       |   36 +-
 drivers/staging/lustre/lustre/include/lustre_fid.h |   30 +-
 .../staging/lustre/lustre/include/lustre_handles.h |    3 +-
 drivers/staging/lustre/lustre/include/lustre_lib.h |  286 +-------
 .../staging/lustre/lustre/include/lustre_lite.h    |    1 -
 drivers/staging/lustre/lustre/include/lustre_lmv.h |   67 ++-
 drivers/staging/lustre/lustre/include/lustre_mdc.h |   23 +-
 drivers/staging/lustre/lustre/include/lustre_mds.h |    3 -
 drivers/staging/lustre/lustre/include/lustre_ver.h |   13 +-
 drivers/staging/lustre/lustre/include/obd.h        |  123 ++--
 drivers/staging/lustre/lustre/include/obd_class.h  |   66 ++-
 .../staging/lustre/lustre/include/obd_support.h    |    9 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_flock.c    |  102 ++-
 drivers/staging/lustre/lustre/ldlm/ldlm_lib.c      |   16 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_lock.c     |   53 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_request.c  |   29 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_resource.c |   33 +-
 drivers/staging/lustre/lustre/llite/dir.c          |  382 ++++++---
 drivers/staging/lustre/lustre/llite/file.c         |  311 ++++++--
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   |    2 +-
 .../staging/lustre/lustre/llite/llite_internal.h   |   57 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |  434 ++++++++--
 drivers/staging/lustre/lustre/llite/llite_nfs.c    |   57 +-
 drivers/staging/lustre/lustre/llite/lproc_llite.c  |  109 +++-
 drivers/staging/lustre/lustre/llite/namei.c        |  162 +++--
 drivers/staging/lustre/lustre/llite/rw.c           |    4 +
 drivers/staging/lustre/lustre/llite/rw26.c         |    7 +
 drivers/staging/lustre/lustre/llite/statahead.c    |   46 +-
 drivers/staging/lustre/lustre/llite/symlink.c      |    6 +-
 drivers/staging/lustre/lustre/llite/vvp_dev.c      |    6 -
 drivers/staging/lustre/lustre/llite/vvp_internal.h |    6 +-
 drivers/staging/lustre/lustre/llite/vvp_req.c      |    2 +
 drivers/staging/lustre/lustre/llite/xattr.c        |   29 +-
 drivers/staging/lustre/lustre/llite/xattr_cache.c  |   12 +-
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |  339 +++++++--
 drivers/staging/lustre/lustre/lmv/lmv_internal.h   |  127 ++-
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  865 ++++++++++++++++----
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |    4 +-
 drivers/staging/lustre/lustre/lov/lov_io.c         |   22 +-
 drivers/staging/lustre/lustre/lov/lov_obd.c        |    1 +
 drivers/staging/lustre/lustre/lov/lov_object.c     |    1 +
 drivers/staging/lustre/lustre/lov/lov_page.c       |   12 +-
 drivers/staging/lustre/lustre/mdc/lproc_mdc.c      |   17 +-
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |    7 +-
 drivers/staging/lustre/lustre/mdc/mdc_lib.c        |  200 ++---
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |   69 +-
 drivers/staging/lustre/lustre/mdc/mdc_reint.c      |   12 +-
 drivers/staging/lustre/lustre/mdc/mdc_request.c    |   73 +-
 drivers/staging/lustre/lustre/obdclass/cl_io.c     |   10 +-
 drivers/staging/lustre/lustre/obdclass/cl_page.c   |   12 +-
 drivers/staging/lustre/lustre/obdclass/class_obd.c |   10 +-
 drivers/staging/lustre/lustre/obdclass/genops.c    |  134 +++-
 .../lustre/lustre/obdclass/linux/linux-module.c    |    1 +
 drivers/staging/lustre/lustre/obdclass/llog_swab.c |    1 +
 .../lustre/lustre/obdclass/lprocfs_status.c        |  144 ++++
 drivers/staging/lustre/lustre/obdclass/lu_object.c |  125 ++-
 .../lustre/lustre/obdclass/lustre_handles.c        |    4 +-
 .../staging/lustre/lustre/obdclass/obd_config.c    |    1 +
 drivers/staging/lustre/lustre/obdclass/obd_mount.c |   23 +-
 .../staging/lustre/lustre/obdecho/echo_client.c    |    1 +
 drivers/staging/lustre/lustre/osc/lproc_osc.c      |   10 +-
 drivers/staging/lustre/lustre/osc/osc_cache.c      |  135 +---
 drivers/staging/lustre/lustre/osc/osc_internal.h   |    3 +-
 drivers/staging/lustre/lustre/osc/osc_io.c         |    7 +-
 drivers/staging/lustre/lustre/osc/osc_page.c       |  208 ++++-
 drivers/staging/lustre/lustre/osc/osc_request.c    |   64 +-
 drivers/staging/lustre/lustre/ptlrpc/client.c      |   12 +-
 drivers/staging/lustre/lustre/ptlrpc/import.c      |   12 +-
 .../staging/lustre/lustre/ptlrpc/pack_generic.c    |   93 ++-
 drivers/staging/lustre/lustre/ptlrpc/service.c     |   18 +-
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    |  346 +++++----
 88 files changed, 4588 insertions(+), 2183 deletions(-)
 create mode 100644 drivers/staging/lustre/lustre/include/lustre/lustre_ioctl.h

* [PATCH 01/80] staging: lustre: llite: add md_op_data parameter to ll_get_dir_page
From: James Simmons @ 2016-08-16 20:18 UTC
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Pass in struct md_op_data for ll_get_dir_page function.

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/7043
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/llite/dir.c          |    8 ++++----
 .../staging/lustre/lustre/llite/llite_internal.h   |    4 ++--
 drivers/staging/lustre/lustre/llite/statahead.c    |   15 +++++++++++----
 3 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 031c9e4..82c7f88 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -322,8 +322,8 @@ static struct page *ll_dir_page_locate(struct inode *dir, __u64 *hash,
 	return page;
 }
 
-struct page *ll_get_dir_page(struct inode *dir, __u64 hash,
-			     struct ll_dir_chain *chain)
+struct page *ll_get_dir_page(struct inode *dir, struct md_op_data *op_data,
+			     __u64 hash, struct ll_dir_chain *chain)
 {
 	ldlm_policy_data_t policy = {.l_inodebits = {MDS_INODELOCK_UPDATE} };
 	struct address_space *mapping = dir->i_mapping;
@@ -503,7 +503,7 @@ int ll_dir_read(struct inode *inode, __u64 *ppos, struct md_op_data *op_data,
 
 	ll_dir_chain_init(&chain);
 
-	page = ll_get_dir_page(inode, pos, &chain);
+	page = ll_get_dir_page(inode, op_data, pos, &chain);
 
 	while (rc == 0 && !done) {
 		struct lu_dirpage *dp;
@@ -585,7 +585,7 @@ int ll_dir_read(struct inode *inode, __u64 *ppos, struct md_op_data *op_data,
 					le32_to_cpu(dp->ldp_flags) &
 					LDF_COLLIDE);
 			next = pos;
-			page = ll_get_dir_page(inode, pos,
+			page = ll_get_dir_page(inode, op_data, pos,
 					       &chain);
 		}
 	}
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index dc15957..fc0c72c 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -651,8 +651,8 @@ void ll_rw_stats_tally(struct ll_sb_info *sbi, pid_t pid,
 void ll_release_page(struct page *page, int remove);
 extern const struct file_operations ll_dir_operations;
 extern const struct inode_operations ll_dir_inode_operations;
-struct page *ll_get_dir_page(struct inode *dir, __u64 hash,
-			     struct ll_dir_chain *chain);
+struct page *ll_get_dir_page(struct inode *dir, struct md_op_data *op_data,
+			     __u64 hash, struct ll_dir_chain *chain);
 int ll_dir_read(struct inode *inode, __u64 *ppos, struct md_op_data *op_data,
 		struct dir_context *ctx);
 
diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
index 54ed52e..1b222c7 100644
--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -1067,7 +1067,7 @@ static int ll_statahead_thread(void *arg)
 	wake_up(&thread->t_ctl_waitq);
 
 	ll_dir_chain_init(&chain);
-	page = ll_get_dir_page(dir, pos, &chain);
+	page = ll_get_dir_page(dir, op_data, pos, &chain);
 
 	while (1) {
 		struct lu_dirpage *dp;
@@ -1232,7 +1232,7 @@ do_it:
 			ll_release_page(page, le32_to_cpu(dp->ldp_flags) &
 					      LDF_COLLIDE);
 			sai->sai_in_readpage = 1;
-			page = ll_get_dir_page(dir, pos, &chain);
+			page = ll_get_dir_page(dir, op_data, pos, &chain);
 			sai->sai_in_readpage = 0;
 		}
 	}
@@ -1344,13 +1344,19 @@ static int is_first_dirent(struct inode *dir, struct dentry *dentry)
 {
 	struct ll_dir_chain   chain;
 	const struct qstr  *target = &dentry->d_name;
+	struct md_op_data *op_data;
 	struct page	  *page;
 	__u64		 pos    = 0;
 	int		   dot_de;
 	int		   rc     = LS_NONE_FIRST_DE;
 
+	op_data = ll_prep_md_op_data(NULL, dir, dir, NULL, 0, 0,
+				     LUSTRE_OPC_ANY, dir);
+	if (IS_ERR(op_data))
+		return PTR_ERR(op_data);
+
 	ll_dir_chain_init(&chain);
-	page = ll_get_dir_page(dir, pos, &chain);
+	page = ll_get_dir_page(dir, op_data, pos, &chain);
 
 	while (1) {
 		struct lu_dirpage *dp;
@@ -1438,12 +1444,13 @@ static int is_first_dirent(struct inode *dir, struct dentry *dentry)
 			 */
 			ll_release_page(page, le32_to_cpu(dp->ldp_flags) &
 					      LDF_COLLIDE);
-			page = ll_get_dir_page(dir, pos, &chain);
+			page = ll_get_dir_page(dir, op_data, pos, &chain);
 		}
 	}
 
 out:
 	ll_dir_chain_fini(&chain);
+	ll_finish_md_op_data(op_data);
 	return rc;
 }
 
-- 
1.7.1
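
The is_first_dirent() hunk above follows a common kernel idiom: build the operation data first, return PTR_ERR() if the constructor handed back an encoded error pointer, and release the data at the single exit label. The following is a minimal userspace sketch of that idiom, not the Lustre code. The err_ptr()/is_err()/ptr_err() helpers are simplified stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), and struct op_data, prep_op_data(), finish_op_data(), scan_dir() and the live_op_data counter are hypothetical names invented for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's error-pointer helpers (assumption:
 * simplified; the kernel versions live in <linux/err.h>). */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical placeholder for struct md_op_data. */
struct op_data {
	int opc;
};

/* Counts objects that were prepared but not yet finished (leak check). */
static int live_op_data;

/* Hypothetical analogue of ll_prep_md_op_data(); fail_alloc simulates an
 * allocation failure so the error path can be exercised. */
static struct op_data *prep_op_data(int fail_alloc)
{
	struct op_data *op;

	if (fail_alloc)
		return err_ptr(-ENOMEM);

	op = calloc(1, sizeof(*op));
	if (!op)
		return err_ptr(-ENOMEM);

	live_op_data++;
	return op;
}

/* Hypothetical analogue of ll_finish_md_op_data(). */
static void finish_op_data(struct op_data *op)
{
	free(op);
	live_op_data--;
}

/* Mirrors the control flow of the patched function: prepare, early return
 * on error, then a single exit label that always releases the data. */
static int scan_dir(int fail_alloc, int fail_scan)
{
	struct op_data *op;
	int rc = 0;

	op = prep_op_data(fail_alloc);
	if (is_err(op))
		return (int)ptr_err(op);	/* nothing to release yet */

	if (fail_scan) {
		rc = -EIO;	/* simulated failure while walking pages */
		goto out;
	}

out:
	finish_op_data(op);
	return rc;
}
```

Calling scan_dir(1, 0) exercises the early-return path, where nothing has been allocated and so nothing needs to be released; scan_dir(0, 1) exercises the goto out path, where the data is released exactly once.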

* [PATCH 02/80] staging: lustre: llite: remove comment from ll_dir_read
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Remove the comment about fixing swabbing, as it is not needed.

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/7043
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/llite/dir.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 82c7f88..d854edd 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -526,10 +526,6 @@ int ll_dir_read(struct inode *inode, __u64 *ppos, struct md_op_data *op_data,
 			__u64	  lhash;
 			__u64	  ino;
 
-			/*
-			 * XXX: implement correct swabbing here.
-			 */
-
 			hash = le64_to_cpu(ent->lde_hash);
 			if (hash < pos)
 				/*
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 03/80] staging: lustre: llite: style cleanup for llite_internal.h
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Group the function prototypes related to dir.c together. Move
ll_release_page so it sits with the other function declarations.

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/7043
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../staging/lustre/lustre/llite/llite_internal.h   |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index fc0c72c..1ced397 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -648,15 +648,15 @@ void ll_rw_stats_tally(struct ll_sb_info *sbi, pid_t pid,
 		       size_t count, int rw);
 
 /* llite/dir.c */
-void ll_release_page(struct page *page, int remove);
 extern const struct file_operations ll_dir_operations;
 extern const struct inode_operations ll_dir_inode_operations;
-struct page *ll_get_dir_page(struct inode *dir, struct md_op_data *op_data,
-			     __u64 hash, struct ll_dir_chain *chain);
 int ll_dir_read(struct inode *inode, __u64 *ppos, struct md_op_data *op_data,
 		struct dir_context *ctx);
-
 int ll_get_mdt_idx(struct inode *inode);
+struct page *ll_get_dir_page(struct inode *dir, struct md_op_data *op_data,
+			     __u64 hash, struct ll_dir_chain *chain);
+void ll_release_page(struct page *page, int remove);
+
 /* llite/namei.c */
 extern const struct inode_operations ll_special_inode_operations;
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 04/80] staging: lustre: llite: pass inode to ll_release_page
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Add an inode parameter to ll_release_page. The parameter is
not used yet; it will be used in the future.

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/7043
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/llite/dir.c          |   17 +++++++++--------
 .../staging/lustre/lustre/llite/llite_internal.h   |    2 +-
 drivers/staging/lustre/lustre/llite/statahead.c    |   20 +++++++++++---------
 3 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index d854edd..3a800b2 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -236,7 +236,7 @@ static int ll_dir_filler(void *_hash, struct page *page0)
 	return rc;
 }
 
-void ll_release_page(struct page *page, int remove)
+void ll_release_page(struct inode *inode, struct page *page, int remove)
 {
 	kunmap(page);
 	if (remove) {
@@ -297,7 +297,7 @@ static struct page *ll_dir_page_locate(struct inode *dir, __u64 *hash,
 			CDEBUG(D_VFSTRACE, "page %lu [%llu %llu], hash %llu\n",
 			       offset, *start, *end, *hash);
 			if (*hash > *end) {
-				ll_release_page(page, 0);
+				ll_release_page(dir, page, 0);
 				page = NULL;
 			} else if (*end != *start && *hash == *end) {
 				/*
@@ -306,8 +306,9 @@ static struct page *ll_dir_page_locate(struct inode *dir, __u64 *hash,
 				 * ll_get_dir_page() will issue RPC to fetch
 				 * the page we want.
 				 */
-				ll_release_page(page,
-				    le32_to_cpu(dp->ldp_flags) & LDF_COLLIDE);
+				ll_release_page(dir, page,
+						le32_to_cpu(dp->ldp_flags) &
+						LDF_COLLIDE);
 				page = NULL;
 			}
 		} else {
@@ -462,7 +463,7 @@ out_unlock:
 	return page;
 
 fail:
-	ll_release_page(page, 1);
+	ll_release_page(dir, page, 1);
 	page = ERR_PTR(-EIO);
 	goto out_unlock;
 }
@@ -560,7 +561,7 @@ int ll_dir_read(struct inode *inode, __u64 *ppos, struct md_op_data *op_data,
 
 		if (done) {
 			pos = hash;
-			ll_release_page(page, 0);
+			ll_release_page(inode, page, 0);
 			break;
 		}
 
@@ -571,13 +572,13 @@ int ll_dir_read(struct inode *inode, __u64 *ppos, struct md_op_data *op_data,
 			 * End of directory reached.
 			 */
 			done = 1;
-			ll_release_page(page, 0);
+			ll_release_page(inode, page, 0);
 		} else {
 			/*
 			 * Normal case: continue to the next
 			 * page.
 			 */
-			ll_release_page(page,
+			ll_release_page(inode, page,
 					le32_to_cpu(dp->ldp_flags) &
 					LDF_COLLIDE);
 			next = pos;
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 1ced397..4b03a64 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -655,7 +655,7 @@ int ll_dir_read(struct inode *inode, __u64 *ppos, struct md_op_data *op_data,
 int ll_get_mdt_idx(struct inode *inode);
 struct page *ll_get_dir_page(struct inode *dir, struct md_op_data *op_data,
 			     __u64 hash, struct ll_dir_chain *chain);
-void ll_release_page(struct page *page, int remove);
+void ll_release_page(struct inode *inode, struct page *page, int remove);
 
 /* llite/namei.c */
 extern const struct inode_operations ll_special_inode_operations;
diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
index 1b222c7..2949ff6 100644
--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -1142,7 +1142,7 @@ interpret_it:
 				ll_post_statahead(sai);
 
 			if (unlikely(!thread_is_running(thread))) {
-				ll_release_page(page, 0);
+				ll_release_page(dir, page, 0);
 				rc = 0;
 				goto out;
 			}
@@ -1166,7 +1166,7 @@ interpret_it:
 
 					if (unlikely(
 						!thread_is_running(thread))) {
-						ll_release_page(page, 0);
+						ll_release_page(dir, page, 0);
 						rc = 0;
 						goto out;
 					}
@@ -1189,7 +1189,7 @@ do_it:
 			/*
 			 * End of directory reached.
 			 */
-			ll_release_page(page, 0);
+			ll_release_page(dir, page, 0);
 			while (1) {
 				l_wait_event(thread->t_ctl_waitq,
 					     !list_empty(&sai->sai_entries_received) ||
@@ -1229,8 +1229,9 @@ do_it:
 			 * chain is exhausted.
 			 * Normal case: continue to the next page.
 			 */
-			ll_release_page(page, le32_to_cpu(dp->ldp_flags) &
-					      LDF_COLLIDE);
+			ll_release_page(dir, page,
+					le32_to_cpu(dp->ldp_flags) &
+					LDF_COLLIDE);
 			sai->sai_in_readpage = 1;
 			page = ll_get_dir_page(dir, op_data, pos, &chain);
 			sai->sai_in_readpage = 0;
@@ -1427,7 +1428,7 @@ static int is_first_dirent(struct inode *dir, struct dentry *dentry)
 			else
 				rc = LS_FIRST_DOT_DE;
 
-			ll_release_page(page, 0);
+			ll_release_page(dir, page, 0);
 			goto out;
 		}
 		pos = le64_to_cpu(dp->ldp_hash_end);
@@ -1435,15 +1436,16 @@ static int is_first_dirent(struct inode *dir, struct dentry *dentry)
 			/*
 			 * End of directory reached.
 			 */
-			ll_release_page(page, 0);
+			ll_release_page(dir, page, 0);
 			goto out;
 		} else {
 			/*
 			 * chain is exhausted
 			 * Normal case: continue to the next page.
 			 */
-			ll_release_page(page, le32_to_cpu(dp->ldp_flags) &
-					      LDF_COLLIDE);
+			ll_release_page(dir, page,
+					le32_to_cpu(dp->ldp_flags) &
+					LDF_COLLIDE);
 			page = ll_get_dir_page(dir, op_data, pos, &chain);
 		}
 	}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 05/80] staging: lustre: llite: change remove parameter to bool
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Change the third parameter of ll_release_page, remove, from
int to bool.

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/7043
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/llite/dir.c          |   10 +++++-----
 .../staging/lustre/lustre/llite/llite_internal.h   |    2 +-
 drivers/staging/lustre/lustre/llite/statahead.c    |   10 +++++-----
 3 files changed, 11 insertions(+), 11 deletions(-)
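
Note: the call sites that pass a masked flag word (ldp_flags & LDF_COLLIDE)
are not touched by this patch. With a bool parameter any nonzero mask result
converts to true, so those callers keep working as before. A minimal
userspace sketch of that conversion follows; every sketch_* and SK_* name and
the flag value are hypothetical stand-ins, not the real Lustre definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value, chosen only for illustration. */
#define SK_LDF_COLLIDE 0x0002u

/* Hypothetical stand-in for a cached, mapped directory page. */
struct sketch_page {
	bool mapped;
	bool cached;
};

/* Mirrors the shape of the new signature: the last argument is a bool. */
static void sketch_release_page(struct sketch_page *page, bool remove)
{
	assert(page != NULL);
	page->mapped = false;		/* stands in for kunmap() */
	if (remove)
		page->cached = false;	/* stands in for the removal branch */
}

/* A masked flag word converts to true whenever the bit is set. */
static bool sketch_collide(uint32_t flags)
{
	return flags & SK_LDF_COLLIDE;
}
```

Calling sketch_release_page(&page, flags & SK_LDF_COLLIDE) with the bit set
takes the removal branch even though the mask result is 2 rather than 1.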

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 3a800b2..a72b486 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -236,7 +236,7 @@ static int ll_dir_filler(void *_hash, struct page *page0)
 	return rc;
 }
 
-void ll_release_page(struct inode *inode, struct page *page, int remove)
+void ll_release_page(struct inode *inode, struct page *page, bool remove)
 {
 	kunmap(page);
 	if (remove) {
@@ -297,7 +297,7 @@ static struct page *ll_dir_page_locate(struct inode *dir, __u64 *hash,
 			CDEBUG(D_VFSTRACE, "page %lu [%llu %llu], hash %llu\n",
 			       offset, *start, *end, *hash);
 			if (*hash > *end) {
-				ll_release_page(dir, page, 0);
+				ll_release_page(dir, page, false);
 				page = NULL;
 			} else if (*end != *start && *hash == *end) {
 				/*
@@ -463,7 +463,7 @@ out_unlock:
 	return page;
 
 fail:
-	ll_release_page(dir, page, 1);
+	ll_release_page(dir, page, true);
 	page = ERR_PTR(-EIO);
 	goto out_unlock;
 }
@@ -561,7 +561,7 @@ int ll_dir_read(struct inode *inode, __u64 *ppos, struct md_op_data *op_data,
 
 		if (done) {
 			pos = hash;
-			ll_release_page(inode, page, 0);
+			ll_release_page(inode, page, false);
 			break;
 		}
 
@@ -572,7 +572,7 @@ int ll_dir_read(struct inode *inode, __u64 *ppos, struct md_op_data *op_data,
 			 * End of directory reached.
 			 */
 			done = 1;
-			ll_release_page(inode, page, 0);
+			ll_release_page(inode, page, false);
 		} else {
 			/*
 			 * Normal case: continue to the next
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 4b03a64..07b6918 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -655,7 +655,7 @@ int ll_dir_read(struct inode *inode, __u64 *ppos, struct md_op_data *op_data,
 int ll_get_mdt_idx(struct inode *inode);
 struct page *ll_get_dir_page(struct inode *dir, struct md_op_data *op_data,
 			     __u64 hash, struct ll_dir_chain *chain);
-void ll_release_page(struct inode *inode, struct page *page, int remove);
+void ll_release_page(struct inode *inode, struct page *page, bool remove);
 
 /* llite/namei.c */
 extern const struct inode_operations ll_special_inode_operations;
diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
index 2949ff6..6ce7442 100644
--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -1142,7 +1142,7 @@ interpret_it:
 				ll_post_statahead(sai);
 
 			if (unlikely(!thread_is_running(thread))) {
-				ll_release_page(dir, page, 0);
+				ll_release_page(dir, page, false);
 				rc = 0;
 				goto out;
 			}
@@ -1166,7 +1166,7 @@ interpret_it:
 
 					if (unlikely(
 						!thread_is_running(thread))) {
-						ll_release_page(dir, page, 0);
+						ll_release_page(dir, page, false);
 						rc = 0;
 						goto out;
 					}
@@ -1189,7 +1189,7 @@ do_it:
 			/*
 			 * End of directory reached.
 			 */
-			ll_release_page(dir, page, 0);
+			ll_release_page(dir, page, false);
 			while (1) {
 				l_wait_event(thread->t_ctl_waitq,
 					     !list_empty(&sai->sai_entries_received) ||
@@ -1428,7 +1428,7 @@ static int is_first_dirent(struct inode *dir, struct dentry *dentry)
 			else
 				rc = LS_FIRST_DOT_DE;
 
-			ll_release_page(dir, page, 0);
+			ll_release_page(dir, page, false);
 			goto out;
 		}
 		pos = le64_to_cpu(dp->ldp_hash_end);
@@ -1436,7 +1436,7 @@ static int is_first_dirent(struct inode *dir, struct dentry *dentry)
 			/*
 			 * End of directory reached.
 			 */
-			ll_release_page(dir, page, 0);
+			ll_release_page(dir, page, false);
 			goto out;
 		} else {
 			/*
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 06/80] staging: lustre: mdc: don't take rpc lock for readdir case
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

If the operation is IT_READDIR, there is no need to take or
release the mdc RPC lock.

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/10761
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4906
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/lustre_mdc.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_mdc.h b/drivers/staging/lustre/lustre/include/lustre_mdc.h
index fa62b95..0a8c639 100644
--- a/drivers/staging/lustre/lustre/include/lustre_mdc.h
+++ b/drivers/staging/lustre/lustre/include/lustre_mdc.h
@@ -96,7 +96,7 @@ static inline void mdc_get_rpc_lock(struct mdc_rpc_lock *lck,
 				    struct lookup_intent *it)
 {
 	if (it && (it->it_op == IT_GETATTR || it->it_op == IT_LOOKUP ||
-		   it->it_op == IT_LAYOUT))
+		   it->it_op == IT_LAYOUT || it->it_op == IT_READDIR))
 		return;
 
 	/* This would normally block until the existing request finishes.
@@ -136,7 +136,7 @@ static inline void mdc_put_rpc_lock(struct mdc_rpc_lock *lck,
 				    struct lookup_intent *it)
 {
 	if (it && (it->it_op == IT_GETATTR || it->it_op == IT_LOOKUP ||
-		   it->it_op == IT_LAYOUT))
+		   it->it_op == IT_LAYOUT || it->it_op == IT_READDIR))
 		return;
 
 	if (lck->rpcl_it == MDC_FAKE_RPCL_IT) { /* OBD_FAIL_MDC_RPCS_SEM */
-- 
1.7.1
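
The check that this patch extends can be sketched outside the kernel as a
single predicate. This is a minimal illustration, not the Lustre code: the
enum values and the helper name intent_skips_rpc_lock() are hypothetical,
chosen only to show which intent operations return early in both
mdc_get_rpc_lock() and mdc_put_rpc_lock() once IT_READDIR is added.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Lustre intent opcodes; the real values
 * live in the Lustre headers and are not reproduced here. */
enum it_op {
	IT_OPEN    = 1 << 0,
	IT_GETATTR = 1 << 1,
	IT_LOOKUP  = 1 << 2,
	IT_LAYOUT  = 1 << 3,
	IT_READDIR = 1 << 4,
};

/* Reduced to the one field the check needs. */
struct lookup_intent {
	enum it_op it_op;
};

/* Returns true when the get/put helpers would return early without
 * touching the RPC lock. A NULL intent never skips the lock. */
static bool intent_skips_rpc_lock(const struct lookup_intent *it)
{
	return it && (it->it_op == IT_GETATTR || it->it_op == IT_LOOKUP ||
		      it->it_op == IT_LAYOUT || it->it_op == IT_READDIR);
}
```

Keeping the condition identical in the get and put paths matters: if only
one side skipped the lock for a given opcode, the lock would be taken
without being released, or released without having been taken.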

* [PATCH 07/80] staging: lustre: lmv: remove unused lmv_get_mea function
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

The function lmv_get_mea() is not used so remove it.

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/7043
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lmv/lmv_internal.h |   24 ----------------------
 1 files changed, 0 insertions(+), 24 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_internal.h b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
index 471470b..ab01560 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_internal.h
+++ b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
@@ -55,30 +55,6 @@ int __lmv_fid_alloc(struct lmv_obd *lmv, struct lu_fid *fid, u32 mds);
 int lmv_fid_alloc(struct obd_export *exp, struct lu_fid *fid,
 		  struct md_op_data *op_data);
 
-static inline struct lmv_stripe_md *lmv_get_mea(struct ptlrpc_request *req)
-{
-	struct mdt_body	 *body;
-	struct lmv_stripe_md    *mea;
-
-	LASSERT(req);
-
-	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
-
-	if (!body || !S_ISDIR(body->mode) || !body->eadatasize)
-		return NULL;
-
-	mea = req_capsule_server_sized_get(&req->rq_pill, &RMF_MDT_MD,
-					   body->eadatasize);
-	if (mea->mea_count == 0)
-		return NULL;
-	if (mea->mea_magic != MEA_MAGIC_LAST_CHAR &&
-	    mea->mea_magic != MEA_MAGIC_ALL_CHARS &&
-	    mea->mea_magic != MEA_MAGIC_HASH_SEGMENT)
-		return NULL;
-
-	return mea;
-}
-
 static inline int lmv_get_easize(struct lmv_obd *lmv)
 {
 	return sizeof(struct lmv_stripe_md) +
-- 
1.7.1

* [PATCH 08/80] staging: lustre: lmv: remove duplicate MAX_HASH_*
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

The MAX_HASH_* macros already exist in obd.h. Remove
the duplicated defines in lustre_idl.h.

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/7043
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 32471a6..5f31724 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2486,10 +2486,6 @@ struct lmv_desc {
 #define MEA_MAGIC_ALL_CHARS      0xb222a11c
 #define MEA_MAGIC_HASH_SEGMENT   0xb222a11b
 
-#define MAX_HASH_SIZE_32	 0x7fffffffUL
-#define MAX_HASH_SIZE	    0x7fffffffffffffffULL
-#define MAX_HASH_HIGHEST_BIT     0x1000000000000000ULL
-
 /* lmv structures */
 #define LMV_MAGIC_V1	0x0CD10CD0	/* normal stripe lmv magic */
 #define LMV_USER_MAGIC	0x0CD20CD0	/* default lmv magic*/
-- 
1.7.1

* [PATCH 09/80] staging: lustre: lmv: change handling of lmv striping information
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

The lmv_[un]pack_md functions are used to calculate the
size of the data used to represent the LMV striping data.
The original code was straightforward in its calculation
with lmv_get_easize, since only one data format could
exist. We want to be able to support different versions
of this data in the future, so this patch moves to
generating the size of the data from the stripe count
and the LMV_MAGIC_* version.
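
As a rough illustration of sizing by stripe count and magic, the standalone
sketch below computes the size of a header followed by a variable number of
per-stripe entries. The struct layouts, the DEMO_* names and demo_md_size()
are hypothetical and much smaller than the real Lustre structures; only the
numeric value of LMV_MAGIC_V1 is taken from the diff context. Returning 0
for an unknown magic is an assumption suggested by the "if (!lmm_size)"
check in the patch, since the body of lmv_mds_md_size() is not part of this
message.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAGIC_V1 0x0CD10CD0u /* same value as LMV_MAGIC_V1 in the diff */

/* Hypothetical per-stripe entry; not the Lustre lu_fid layout. */
struct demo_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* Hypothetical version 1 header followed by one entry per stripe. */
struct demo_md_v1 {
	uint32_t magic;
	uint32_t stripe_count;
	struct demo_fid stripe_fids[]; /* flexible array member */
};

/* Size in bytes of the buffer needed for stripe_count stripes, or 0 when
 * the magic is not a known format or the count is negative. */
static size_t demo_md_size(int stripe_count, uint32_t magic)
{
	if (stripe_count < 0)
		return 0;

	switch (magic) {
	case DEMO_MAGIC_V1:
		return sizeof(struct demo_md_v1) +
		       (size_t)stripe_count * sizeof(struct demo_fid);
	default:
		return 0;
	}
}
```

Supporting a further format in this scheme would mean adding one more case
with its own header and entry sizes, which is the extensibility the
description above is aiming for.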

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/7043
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |    4 -
 drivers/staging/lustre/lustre/include/lustre_lmv.h |   15 +-
 drivers/staging/lustre/lustre/lmv/lmv_internal.h   |    7 +
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  245 +++++++++++++++-----
 4 files changed, 198 insertions(+), 73 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 5f31724..0ad6605 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2482,10 +2482,6 @@ struct lmv_desc {
 	struct obd_uuid ld_uuid;
 };
 
-#define MEA_MAGIC_LAST_CHAR      0xb2221ca1
-#define MEA_MAGIC_ALL_CHARS      0xb222a11c
-#define MEA_MAGIC_HASH_SEGMENT   0xb222a11b
-
 /* lmv structures */
 #define LMV_MAGIC_V1	0x0CD10CD0	/* normal stripe lmv magic */
 #define LMV_USER_MAGIC	0x0CD20CD0	/* default lmv magic*/
diff --git a/drivers/staging/lustre/lustre/include/lustre_lmv.h b/drivers/staging/lustre/lustre/include/lustre_lmv.h
index 0620c8c..784d67b 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lmv.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lmv.h
@@ -41,12 +41,15 @@ struct lmv_oinfo {
 };
 
 struct lmv_stripe_md {
-	__u32	mea_magic;
-	__u32	mea_count;
-	__u32	mea_master;
-	__u32	mea_padding;
-	char	mea_pool_name[LOV_MAXPOOLNAME];
-	struct lu_fid mea_ids[0];
+	__u32	lsm_md_magic;
+	__u32	lsm_md_stripe_count;
+	__u32	lsm_md_master_mdt_index;
+	__u32	lsm_md_hash_type;
+	__u32	lsm_md_layout_version;
+	__u32	lsm_md_default_count;
+	__u32	lsm_md_default_index;
+	char	lsm_md_pool_name[LOV_MAXPOOLNAME];
+	struct lmv_oinfo lsm_md_oinfo[0];
 };
 
 union lmv_mds_md;
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_internal.h b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
index ab01560..90a9786 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_internal.h
+++ b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
@@ -94,6 +94,13 @@ lmv_find_target(struct lmv_obd *lmv, const struct lu_fid *fid)
 	return lmv_get_target(lmv, mds);
 }
 
+static inline int lmv_stripe_md_size(int stripe_count)
+{
+	struct lmv_stripe_md *lsm;
+
+	return sizeof(*lsm) + stripe_count * sizeof(lsm->lsm_md_oinfo[0]);
+}
+
 struct lmv_tgt_desc
 *lmv_locate_mds(struct lmv_obd *lmv, struct md_op_data *op_data,
 		struct lu_fid *fid);
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 8e83263..1ba5900 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -2376,105 +2376,224 @@ static int lmv_set_info_async(const struct lu_env *env, struct obd_export *exp,
 	return -EINVAL;
 }
 
-static int lmv_packmd(struct obd_export *exp, struct lov_mds_md **lmmp,
-		      struct lov_stripe_md *lsm)
+static int lmv_pack_md_v1(const struct lmv_stripe_md *lsm,
+			  struct lmv_mds_md_v1 *lmm1)
 {
-	struct obd_device	 *obd = class_exp2obd(exp);
-	struct lmv_obd	    *lmv = &obd->u.lmv;
-	struct lmv_stripe_md      *meap;
-	struct lmv_stripe_md      *lsmp;
-	int			mea_size;
-	int			i;
+	int cplen;
+	int i;
+
+	lmm1->lmv_magic = cpu_to_le32(lsm->lsm_md_magic);
+	lmm1->lmv_stripe_count = cpu_to_le32(lsm->lsm_md_stripe_count);
+	lmm1->lmv_master_mdt_index = cpu_to_le32(lsm->lsm_md_master_mdt_index);
+	lmm1->lmv_hash_type = cpu_to_le32(lsm->lsm_md_hash_type);
+	cplen = strlcpy(lmm1->lmv_pool_name, lsm->lsm_md_pool_name,
+			sizeof(lmm1->lmv_pool_name));
+	if (cplen >= sizeof(lmm1->lmv_pool_name))
+		return -E2BIG;
+
+	for (i = 0; i < lsm->lsm_md_stripe_count; i++)
+		fid_cpu_to_le(&lmm1->lmv_stripe_fids[i],
+			      &lsm->lsm_md_oinfo[i].lmo_fid);
+	return 0;
+}
+
+int lmv_pack_md(union lmv_mds_md **lmmp, const struct lmv_stripe_md *lsm,
+		int stripe_count)
+{
+	int lmm_size = 0, rc = 0;
+	bool allocated = false;
 
-	mea_size = lmv_get_easize(lmv);
-	if (!lmmp)
-		return mea_size;
+	LASSERT(lmmp);
 
+	/* Free lmm */
 	if (*lmmp && !lsm) {
+		int stripe_cnt;
+
+		stripe_cnt = lmv_mds_md_stripe_count_get(*lmmp);
+		lmm_size = lmv_mds_md_size(stripe_cnt,
+					   le32_to_cpu((*lmmp)->lmv_magic));
+		if (!lmm_size)
+			return -EINVAL;
 		kvfree(*lmmp);
 		*lmmp = NULL;
 		return 0;
 	}
 
+	/* Alloc lmm */
+	if (!*lmmp && !lsm) {
+		lmm_size = lmv_mds_md_size(stripe_count, LMV_MAGIC);
+		LASSERT(lmm_size > 0);
+		*lmmp = libcfs_kvzalloc(lmm_size, GFP_NOFS);
+		if (!*lmmp)
+			return -ENOMEM;
+		lmv_mds_md_stripe_count_set(*lmmp, stripe_count);
+		(*lmmp)->lmv_magic = cpu_to_le32(LMV_MAGIC);
+		return lmm_size;
+	}
+
+	/* pack lmm */
+	LASSERT(lsm);
+	lmm_size = lmv_mds_md_size(lsm->lsm_md_stripe_count,
+				   lsm->lsm_md_magic);
 	if (!*lmmp) {
-		*lmmp = libcfs_kvzalloc(mea_size, GFP_NOFS);
+		*lmmp = libcfs_kvzalloc(lmm_size, GFP_NOFS);
 		if (!*lmmp)
 			return -ENOMEM;
+		allocated = true;
 	}
 
-	if (!lsm)
-		return mea_size;
+	switch (lsm->lsm_md_magic) {
+	case LMV_MAGIC_V1:
+		rc = lmv_pack_md_v1(lsm, &(*lmmp)->lmv_md_v1);
+		break;
+	default:
+		rc = -EINVAL;
+		break;
+	}
 
-	lsmp = (struct lmv_stripe_md *)lsm;
-	meap = (struct lmv_stripe_md *)*lmmp;
+	if (rc && allocated) {
+		kvfree(*lmmp);
+		*lmmp = NULL;
+	}
 
-	if (lsmp->mea_magic != MEA_MAGIC_LAST_CHAR &&
-	    lsmp->mea_magic != MEA_MAGIC_ALL_CHARS)
-		return -EINVAL;
+	return lmm_size;
+}
+EXPORT_SYMBOL(lmv_pack_md);
 
-	meap->mea_magic = cpu_to_le32(lsmp->mea_magic);
-	meap->mea_count = cpu_to_le32(lsmp->mea_count);
-	meap->mea_master = cpu_to_le32(lsmp->mea_master);
+static int lmv_unpack_md_v1(struct obd_export *exp, struct lmv_stripe_md *lsm,
+			    const struct lmv_mds_md_v1 *lmm1)
+{
+	struct lmv_obd *lmv = &exp->exp_obd->u.lmv;
+	int stripe_count;
+	int rc = 0;
+	int cplen;
+	int i;
 
-	for (i = 0; i < lmv->desc.ld_tgt_count; i++) {
-		meap->mea_ids[i] = lsmp->mea_ids[i];
-		fid_cpu_to_le(&meap->mea_ids[i], &lsmp->mea_ids[i]);
+	lsm->lsm_md_magic = le32_to_cpu(lmm1->lmv_magic);
+	lsm->lsm_md_stripe_count = le32_to_cpu(lmm1->lmv_stripe_count);
+	lsm->lsm_md_master_mdt_index = le32_to_cpu(lmm1->lmv_master_mdt_index);
+	lsm->lsm_md_hash_type = le32_to_cpu(lmm1->lmv_hash_type);
+	lsm->lsm_md_layout_version = le32_to_cpu(lmm1->lmv_layout_version);
+	cplen = strlcpy(lsm->lsm_md_pool_name, lmm1->lmv_pool_name,
+			sizeof(lsm->lsm_md_pool_name));
+
+	if (cplen >= sizeof(lsm->lsm_md_pool_name))
+		return -E2BIG;
+
+	CDEBUG(D_INFO, "unpack lsm count %d, master %d hash_type %d layout_version %d\n",
+	       lsm->lsm_md_stripe_count, lsm->lsm_md_master_mdt_index,
+	       lsm->lsm_md_hash_type, lsm->lsm_md_layout_version);
+
+	stripe_count = le32_to_cpu(lmm1->lmv_stripe_count);
+	for (i = 0; i < le32_to_cpu(stripe_count); i++) {
+		fid_le_to_cpu(&lsm->lsm_md_oinfo[i].lmo_fid,
+			      &lmm1->lmv_stripe_fids[i]);
+		rc = lmv_fld_lookup(lmv, &lsm->lsm_md_oinfo[i].lmo_fid,
+				    &lsm->lsm_md_oinfo[i].lmo_mds);
+		if (rc)
+			return rc;
+		CDEBUG(D_INFO, "unpack fid #%d "DFID"\n", i,
+		       PFID(&lsm->lsm_md_oinfo[i].lmo_fid));
 	}
 
-	return mea_size;
+	return rc;
 }
 
-static int lmv_unpackmd(struct obd_export *exp, struct lov_stripe_md **lsmp,
-			struct lov_mds_md *lmm, int lmm_size)
+int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
+		  const union lmv_mds_md *lmm, int stripe_count)
 {
-	struct obd_device	  *obd = class_exp2obd(exp);
-	struct lmv_stripe_md      **tmea = (struct lmv_stripe_md **)lsmp;
-	struct lmv_stripe_md       *mea = (struct lmv_stripe_md *)lmm;
-	struct lmv_obd	     *lmv = &obd->u.lmv;
-	int			 mea_size;
-	int			 i;
-	__u32		       magic;
+	struct lmv_stripe_md *lsm;
+	bool allocated = false;
+	int lsm_size, rc;
+
+	LASSERT(lsmp);
+
+	lsm = *lsmp;
+	/* Free memmd */
+	if (lsm && !lmm) {
+		int i;
 
-	mea_size = lmv_get_easize(lmv);
-	if (!lsmp)
-		return mea_size;
+		for (i = 1; i < lsm->lsm_md_stripe_count; i++) {
+			if (lsm->lsm_md_oinfo[i].lmo_root)
+				iput(lsm->lsm_md_oinfo[i].lmo_root);
+		}
 
-	if (*lsmp && !lmm) {
-		kvfree(*tmea);
+		kvfree(lsm);
 		*lsmp = NULL;
 		return 0;
 	}
 
-	LASSERT(mea_size == lmm_size);
+	/* Alloc memmd */
+	if (!lsm && !lmm) {
+		lsm_size = lmv_stripe_md_size(stripe_count);
+		lsm = libcfs_kvzalloc(lsm_size, GFP_NOFS);
+		if (!lsm)
+			return -ENOMEM;
+		lsm->lsm_md_stripe_count = stripe_count;
+		*lsmp = lsm;
+		return 0;
+	}
 
-	*tmea = libcfs_kvzalloc(mea_size, GFP_NOFS);
-	if (!*tmea)
-		return -ENOMEM;
+	/* Unpack memmd */
+	if (le32_to_cpu(lmm->lmv_magic) != LMV_MAGIC_V1) {
+		CERROR("%s: invalid magic %x.\n", exp->exp_obd->obd_name,
+		       le32_to_cpu(lmm->lmv_magic));
+		return -EINVAL;
+	}
 
-	if (!lmm)
-		return mea_size;
+	lsm_size = lmv_stripe_md_size(lmv_mds_md_stripe_count_get(lmm));
+	if (!lsm) {
+		lsm = libcfs_kvzalloc(lsm_size, GFP_NOFS);
+		if (!lsm)
+			return -ENOMEM;
+		allocated = true;
+		*lsmp = lsm;
+	}
 
-	if (mea->mea_magic == MEA_MAGIC_LAST_CHAR ||
-	    mea->mea_magic == MEA_MAGIC_ALL_CHARS ||
-	    mea->mea_magic == MEA_MAGIC_HASH_SEGMENT) {
-		magic = le32_to_cpu(mea->mea_magic);
-	} else {
-		/*
-		 * Old mea is not handled here.
-		 */
-		CERROR("Old not supportable EA is found\n");
-		LBUG();
+	switch (le32_to_cpu(lmm->lmv_magic)) {
+	case LMV_MAGIC_V1:
+		rc = lmv_unpack_md_v1(exp, lsm, &lmm->lmv_md_v1);
+		break;
+	default:
+		CERROR("%s: unrecognized magic %x\n", exp->exp_obd->obd_name,
+		       le32_to_cpu(lmm->lmv_magic));
+		rc = -EINVAL;
+		break;
 	}
 
-	(*tmea)->mea_magic = magic;
-	(*tmea)->mea_count = le32_to_cpu(mea->mea_count);
-	(*tmea)->mea_master = le32_to_cpu(mea->mea_master);
+	if (rc && allocated) {
+		kvfree(lsm);
+		*lsmp = NULL;
+		lsm_size = rc;
+	}
+	return lsm_size;
+}
 
-	for (i = 0; i < (*tmea)->mea_count; i++) {
-		(*tmea)->mea_ids[i] = mea->mea_ids[i];
-		fid_le_to_cpu(&(*tmea)->mea_ids[i], &(*tmea)->mea_ids[i]);
+int lmv_unpackmd(struct obd_export *exp, struct lov_stripe_md **lsmp,
+		 struct lov_mds_md *lmm, int disk_len)
+{
+	return lmv_unpack_md(exp, (struct lmv_stripe_md **)lsmp,
+			     (union lmv_mds_md *)lmm, disk_len);
+}
+
+int lmv_packmd(struct obd_export *exp, struct lov_mds_md **lmmp,
+	       struct lov_stripe_md *lsm)
+{
+	const struct lmv_stripe_md *lmv = (struct lmv_stripe_md *)lsm;
+	struct obd_device *obd = exp->exp_obd;
+	struct lmv_obd *lmv_obd = &obd->u.lmv;
+	int stripe_count;
+
+	if (!lmmp) {
+		if (lsm)
+			stripe_count = lmv->lsm_md_stripe_count;
+		else
+			stripe_count = lmv_obd->desc.ld_tgt_count;
+
+		return lmv_mds_md_size(stripe_count, LMV_MAGIC_V1);
 	}
-	return mea_size;
+
+	return lmv_pack_md((union lmv_mds_md **)lmmp, lmv, 0);
 }
 
 static int lmv_cancel_unused(struct obd_export *exp, const struct lu_fid *fid,
-- 
1.7.1
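
The allocate-if-needed, dispatch-on-magic and free-only-what-we-allocated
flow used by the new pack and unpack helpers can be sketched in userspace
C. Everything named demo_* below is hypothetical, calloc()/free() stand in
for libcfs_kvzalloc()/kvfree(), and the byte-order conversion and stripe
array are left out to keep the example short. Unlike the helper in the
patch, this sketch returns the negative error code on failure; that is a
choice made for the example only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAGIC_V1 0x0CD10CD0u /* same value as LMV_MAGIC_V1 in the diff */

/* Hypothetical in-memory form. */
struct demo_src {
	uint32_t magic;
	uint32_t stripe_count;
};

/* Hypothetical packed form. */
struct demo_buf {
	uint32_t magic;
	uint32_t stripe_count;
};

/* Version 1 packer; a real one would also convert byte order. */
static int demo_pack_v1(const struct demo_src *src, struct demo_buf *dst)
{
	dst->magic = src->magic;
	dst->stripe_count = src->stripe_count;
	return 0;
}

/* Packs *src into *bufp, allocating the buffer when *bufp is NULL.
 * Returns the buffer size on success or a negative errno on failure.
 * A buffer supplied by the caller is never freed here. */
static int demo_pack(struct demo_buf **bufp, const struct demo_src *src)
{
	bool allocated = false;
	int rc;

	if (!bufp || !src)
		return -EINVAL;

	if (!*bufp) {
		*bufp = calloc(1, sizeof(**bufp));
		if (!*bufp)
			return -ENOMEM;
		allocated = true;
	}

	switch (src->magic) {
	case DEMO_MAGIC_V1:
		rc = demo_pack_v1(src, *bufp);
		break;
	default:
		rc = -EINVAL;
		break;
	}

	if (rc) {
		/* Undo only our own allocation. */
		if (allocated) {
			free(*bufp);
			*bufp = NULL;
		}
		return rc;
	}

	return (int)sizeof(**bufp);
}
```

Tracking ownership with the allocated flag lets one entry point serve both
callers that pass in a preallocated buffer and callers that pass NULL,
without risking a free of memory the caller still owns.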

* [lustre-devel] [PATCH 09/80] staging: lustre: lmv: change handling of lmv striping information
@ 2016-08-16 20:18   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

The lmv_[un]pack_md functions are used to calculate the
size of the data used to represent the LMV striping data.
The original code was straightforward in its calculation
with lmv_get_easize, since only one data format could
exist. We want to be able to support different versions
of this data in the future, so this patch moves to
generating the size of the data from the stripe count
and the LMV_MAGIC_* version.

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/7043
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |    4 -
 drivers/staging/lustre/lustre/include/lustre_lmv.h |   15 +-
 drivers/staging/lustre/lustre/lmv/lmv_internal.h   |    7 +
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  245 +++++++++++++++-----
 4 files changed, 198 insertions(+), 73 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 5f31724..0ad6605 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2482,10 +2482,6 @@ struct lmv_desc {
 	struct obd_uuid ld_uuid;
 };
 
-#define MEA_MAGIC_LAST_CHAR      0xb2221ca1
-#define MEA_MAGIC_ALL_CHARS      0xb222a11c
-#define MEA_MAGIC_HASH_SEGMENT   0xb222a11b
-
 /* lmv structures */
 #define LMV_MAGIC_V1	0x0CD10CD0	/* normal stripe lmv magic */
 #define LMV_USER_MAGIC	0x0CD20CD0	/* default lmv magic*/
diff --git a/drivers/staging/lustre/lustre/include/lustre_lmv.h b/drivers/staging/lustre/lustre/include/lustre_lmv.h
index 0620c8c..784d67b 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lmv.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lmv.h
@@ -41,12 +41,15 @@ struct lmv_oinfo {
 };
 
 struct lmv_stripe_md {
-	__u32	mea_magic;
-	__u32	mea_count;
-	__u32	mea_master;
-	__u32	mea_padding;
-	char	mea_pool_name[LOV_MAXPOOLNAME];
-	struct lu_fid mea_ids[0];
+	__u32	lsm_md_magic;
+	__u32	lsm_md_stripe_count;
+	__u32	lsm_md_master_mdt_index;
+	__u32	lsm_md_hash_type;
+	__u32	lsm_md_layout_version;
+	__u32	lsm_md_default_count;
+	__u32	lsm_md_default_index;
+	char	lsm_md_pool_name[LOV_MAXPOOLNAME];
+	struct lmv_oinfo lsm_md_oinfo[0];
 };
 
 union lmv_mds_md;
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_internal.h b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
index ab01560..90a9786 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_internal.h
+++ b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
@@ -94,6 +94,13 @@ lmv_find_target(struct lmv_obd *lmv, const struct lu_fid *fid)
 	return lmv_get_target(lmv, mds);
 }
 
+static inline int lmv_stripe_md_size(int stripe_count)
+{
+	struct lmv_stripe_md *lsm;
+
+	return sizeof(*lsm) + stripe_count * sizeof(lsm->lsm_md_oinfo[0]);
+}
+
 struct lmv_tgt_desc
 *lmv_locate_mds(struct lmv_obd *lmv, struct md_op_data *op_data,
 		struct lu_fid *fid);
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 8e83263..1ba5900 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -2376,105 +2376,224 @@ static int lmv_set_info_async(const struct lu_env *env, struct obd_export *exp,
 	return -EINVAL;
 }
 
-static int lmv_packmd(struct obd_export *exp, struct lov_mds_md **lmmp,
-		      struct lov_stripe_md *lsm)
+static int lmv_pack_md_v1(const struct lmv_stripe_md *lsm,
+			  struct lmv_mds_md_v1 *lmm1)
 {
-	struct obd_device	 *obd = class_exp2obd(exp);
-	struct lmv_obd	    *lmv = &obd->u.lmv;
-	struct lmv_stripe_md      *meap;
-	struct lmv_stripe_md      *lsmp;
-	int			mea_size;
-	int			i;
+	int cplen;
+	int i;
+
+	lmm1->lmv_magic = cpu_to_le32(lsm->lsm_md_magic);
+	lmm1->lmv_stripe_count = cpu_to_le32(lsm->lsm_md_stripe_count);
+	lmm1->lmv_master_mdt_index = cpu_to_le32(lsm->lsm_md_master_mdt_index);
+	lmm1->lmv_hash_type = cpu_to_le32(lsm->lsm_md_hash_type);
+	cplen = strlcpy(lmm1->lmv_pool_name, lsm->lsm_md_pool_name,
+			sizeof(lmm1->lmv_pool_name));
+	if (cplen >= sizeof(lmm1->lmv_pool_name))
+		return -E2BIG;
+
+	for (i = 0; i < lsm->lsm_md_stripe_count; i++)
+		fid_cpu_to_le(&lmm1->lmv_stripe_fids[i],
+			      &lsm->lsm_md_oinfo[i].lmo_fid);
+	return 0;
+}
+
+int lmv_pack_md(union lmv_mds_md **lmmp, const struct lmv_stripe_md *lsm,
+		int stripe_count)
+{
+	int lmm_size = 0, rc = 0;
+	bool allocated = false;
 
-	mea_size = lmv_get_easize(lmv);
-	if (!lmmp)
-		return mea_size;
+	LASSERT(lmmp);
 
+	/* Free lmm */
 	if (*lmmp && !lsm) {
+		int stripe_cnt;
+
+		stripe_cnt = lmv_mds_md_stripe_count_get(*lmmp);
+		lmm_size = lmv_mds_md_size(stripe_cnt,
+					   le32_to_cpu((*lmmp)->lmv_magic));
+		if (!lmm_size)
+			return -EINVAL;
 		kvfree(*lmmp);
 		*lmmp = NULL;
 		return 0;
 	}
 
+	/* Alloc lmm */
+	if (!*lmmp && !lsm) {
+		lmm_size = lmv_mds_md_size(stripe_count, LMV_MAGIC);
+		LASSERT(lmm_size > 0);
+		*lmmp = libcfs_kvzalloc(lmm_size, GFP_NOFS);
+		if (!*lmmp)
+			return -ENOMEM;
+		lmv_mds_md_stripe_count_set(*lmmp, stripe_count);
+		(*lmmp)->lmv_magic = cpu_to_le32(LMV_MAGIC);
+		return lmm_size;
+	}
+
+	/* pack lmm */
+	LASSERT(lsm);
+	lmm_size = lmv_mds_md_size(lsm->lsm_md_stripe_count,
+				   lsm->lsm_md_magic);
 	if (!*lmmp) {
-		*lmmp = libcfs_kvzalloc(mea_size, GFP_NOFS);
+		*lmmp = libcfs_kvzalloc(lmm_size, GFP_NOFS);
 		if (!*lmmp)
 			return -ENOMEM;
+		allocated = true;
 	}
 
-	if (!lsm)
-		return mea_size;
+	switch (lsm->lsm_md_magic) {
+	case LMV_MAGIC_V1:
+		rc = lmv_pack_md_v1(lsm, &(*lmmp)->lmv_md_v1);
+		break;
+	default:
+		rc = -EINVAL;
+		break;
+	}
 
-	lsmp = (struct lmv_stripe_md *)lsm;
-	meap = (struct lmv_stripe_md *)*lmmp;
+	if (rc && allocated) {
+		kvfree(*lmmp);
+		*lmmp = NULL;
+	}
 
-	if (lsmp->mea_magic != MEA_MAGIC_LAST_CHAR &&
-	    lsmp->mea_magic != MEA_MAGIC_ALL_CHARS)
-		return -EINVAL;
+	return lmm_size;
+}
+EXPORT_SYMBOL(lmv_pack_md);
 
-	meap->mea_magic = cpu_to_le32(lsmp->mea_magic);
-	meap->mea_count = cpu_to_le32(lsmp->mea_count);
-	meap->mea_master = cpu_to_le32(lsmp->mea_master);
+static int lmv_unpack_md_v1(struct obd_export *exp, struct lmv_stripe_md *lsm,
+			    const struct lmv_mds_md_v1 *lmm1)
+{
+	struct lmv_obd *lmv = &exp->exp_obd->u.lmv;
+	int stripe_count;
+	int rc = 0;
+	int cplen;
+	int i;
 
-	for (i = 0; i < lmv->desc.ld_tgt_count; i++) {
-		meap->mea_ids[i] = lsmp->mea_ids[i];
-		fid_cpu_to_le(&meap->mea_ids[i], &lsmp->mea_ids[i]);
+	lsm->lsm_md_magic = le32_to_cpu(lmm1->lmv_magic);
+	lsm->lsm_md_stripe_count = le32_to_cpu(lmm1->lmv_stripe_count);
+	lsm->lsm_md_master_mdt_index = le32_to_cpu(lmm1->lmv_master_mdt_index);
+	lsm->lsm_md_hash_type = le32_to_cpu(lmm1->lmv_hash_type);
+	lsm->lsm_md_layout_version = le32_to_cpu(lmm1->lmv_layout_version);
+	cplen = strlcpy(lsm->lsm_md_pool_name, lmm1->lmv_pool_name,
+			sizeof(lsm->lsm_md_pool_name));
+
+	if (cplen >= sizeof(lsm->lsm_md_pool_name))
+		return -E2BIG;
+
+	CDEBUG(D_INFO, "unpack lsm count %d, master %d hash_type %d layout_version %d\n",
+	       lsm->lsm_md_stripe_count, lsm->lsm_md_master_mdt_index,
+	       lsm->lsm_md_hash_type, lsm->lsm_md_layout_version);
+
+	stripe_count = le32_to_cpu(lmm1->lmv_stripe_count);
+	for (i = 0; i < stripe_count; i++) {
+		fid_le_to_cpu(&lsm->lsm_md_oinfo[i].lmo_fid,
+			      &lmm1->lmv_stripe_fids[i]);
+		rc = lmv_fld_lookup(lmv, &lsm->lsm_md_oinfo[i].lmo_fid,
+				    &lsm->lsm_md_oinfo[i].lmo_mds);
+		if (rc)
+			return rc;
+		CDEBUG(D_INFO, "unpack fid #%d "DFID"\n", i,
+		       PFID(&lsm->lsm_md_oinfo[i].lmo_fid));
 	}
 
-	return mea_size;
+	return rc;
 }
 
-static int lmv_unpackmd(struct obd_export *exp, struct lov_stripe_md **lsmp,
-			struct lov_mds_md *lmm, int lmm_size)
+int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
+		  const union lmv_mds_md *lmm, int stripe_count)
 {
-	struct obd_device	  *obd = class_exp2obd(exp);
-	struct lmv_stripe_md      **tmea = (struct lmv_stripe_md **)lsmp;
-	struct lmv_stripe_md       *mea = (struct lmv_stripe_md *)lmm;
-	struct lmv_obd	     *lmv = &obd->u.lmv;
-	int			 mea_size;
-	int			 i;
-	__u32		       magic;
+	struct lmv_stripe_md *lsm;
+	bool allocated = false;
+	int lsm_size, rc;
+
+	LASSERT(lsmp);
+
+	lsm = *lsmp;
+	/* Free memmd */
+	if (lsm && !lmm) {
+		int i;
 
-	mea_size = lmv_get_easize(lmv);
-	if (!lsmp)
-		return mea_size;
+		for (i = 1; i < lsm->lsm_md_stripe_count; i++) {
+			if (lsm->lsm_md_oinfo[i].lmo_root)
+				iput(lsm->lsm_md_oinfo[i].lmo_root);
+		}
 
-	if (*lsmp && !lmm) {
-		kvfree(*tmea);
+		kvfree(lsm);
 		*lsmp = NULL;
 		return 0;
 	}
 
-	LASSERT(mea_size == lmm_size);
+	/* Alloc memmd */
+	if (!lsm && !lmm) {
+		lsm_size = lmv_stripe_md_size(stripe_count);
+		lsm = libcfs_kvzalloc(lsm_size, GFP_NOFS);
+		if (!lsm)
+			return -ENOMEM;
+		lsm->lsm_md_stripe_count = stripe_count;
+		*lsmp = lsm;
+		return 0;
+	}
 
-	*tmea = libcfs_kvzalloc(mea_size, GFP_NOFS);
-	if (!*tmea)
-		return -ENOMEM;
+	/* Unpack memmd */
+	if (le32_to_cpu(lmm->lmv_magic) != LMV_MAGIC_V1) {
+		CERROR("%s: invalid magic %x.\n", exp->exp_obd->obd_name,
+		       le32_to_cpu(lmm->lmv_magic));
+		return -EINVAL;
+	}
 
-	if (!lmm)
-		return mea_size;
+	lsm_size = lmv_stripe_md_size(lmv_mds_md_stripe_count_get(lmm));
+	if (!lsm) {
+		lsm = libcfs_kvzalloc(lsm_size, GFP_NOFS);
+		if (!lsm)
+			return -ENOMEM;
+		allocated = true;
+		*lsmp = lsm;
+	}
 
-	if (mea->mea_magic == MEA_MAGIC_LAST_CHAR ||
-	    mea->mea_magic == MEA_MAGIC_ALL_CHARS ||
-	    mea->mea_magic == MEA_MAGIC_HASH_SEGMENT) {
-		magic = le32_to_cpu(mea->mea_magic);
-	} else {
-		/*
-		 * Old mea is not handled here.
-		 */
-		CERROR("Old not supportable EA is found\n");
-		LBUG();
+	switch (le32_to_cpu(lmm->lmv_magic)) {
+	case LMV_MAGIC_V1:
+		rc = lmv_unpack_md_v1(exp, lsm, &lmm->lmv_md_v1);
+		break;
+	default:
+		CERROR("%s: unrecognized magic %x\n", exp->exp_obd->obd_name,
+		       le32_to_cpu(lmm->lmv_magic));
+		rc = -EINVAL;
+		break;
 	}
 
-	(*tmea)->mea_magic = magic;
-	(*tmea)->mea_count = le32_to_cpu(mea->mea_count);
-	(*tmea)->mea_master = le32_to_cpu(mea->mea_master);
+	if (rc && allocated) {
+		kvfree(lsm);
+		*lsmp = NULL;
+		lsm_size = rc;
+	}
+	return lsm_size;
+}
 
-	for (i = 0; i < (*tmea)->mea_count; i++) {
-		(*tmea)->mea_ids[i] = mea->mea_ids[i];
-		fid_le_to_cpu(&(*tmea)->mea_ids[i], &(*tmea)->mea_ids[i]);
+int lmv_unpackmd(struct obd_export *exp, struct lov_stripe_md **lsmp,
+		 struct lov_mds_md *lmm, int disk_len)
+{
+	return lmv_unpack_md(exp, (struct lmv_stripe_md **)lsmp,
+			     (union lmv_mds_md *)lmm, disk_len);
+}
+
+int lmv_packmd(struct obd_export *exp, struct lov_mds_md **lmmp,
+	       struct lov_stripe_md *lsm)
+{
+	const struct lmv_stripe_md *lmv = (struct lmv_stripe_md *)lsm;
+	struct obd_device *obd = exp->exp_obd;
+	struct lmv_obd *lmv_obd = &obd->u.lmv;
+	int stripe_count;
+
+	if (!lmmp) {
+		if (lsm)
+			stripe_count = lmv->lsm_md_stripe_count;
+		else
+			stripe_count = lmv_obd->desc.ld_tgt_count;
+
+		return lmv_mds_md_size(stripe_count, LMV_MAGIC_V1);
 	}
-	return mea_size;
+
+	return lmv_pack_md((union lmv_mds_md **)lmmp, lmv, 0);
 }
 
 static int lmv_cancel_unused(struct obd_export *exp, const struct lu_fid *fid,
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 10/80] staging: lustre: lmv: remove lmv_get_easize
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Completely replace lmv_get_easize with lmv_mds_md_size.
With this change we can delete lmv_get_easize.

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/7043
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
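For illustration only, a minimal userspace sketch of the size computation
this patch switches to: a fixed header plus one FID per stripe, selected by
the magic. The struct layouts, the sketch_* names and the -1 return for an
unknown magic are assumptions made for this example; the real definitions
are in lustre_idl.h and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value taken from the LMV_MAGIC_V1 define in lustre_idl.h. */
#define SKETCH_LMV_MAGIC_V1 0x0CD10CD0u

/* Hypothetical, abbreviated stand-ins for lu_fid and lmv_mds_md_v1. */
struct sketch_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

struct sketch_lmv_md_v1 {
	uint32_t magic;
	uint32_t stripe_count;
	struct sketch_fid stripe_fids[];	/* one FID per stripe */
};

/* Hypothetical stand-in for the ld_tgt_count field of struct lmv_desc. */
struct sketch_lmv_desc {
	uint32_t tgt_count;
};

/* Returns the EA size in bytes, or -1 for a magic this sketch does not know. */
static long sketch_lmv_md_size(uint32_t stripe_count, uint32_t magic)
{
	switch (magic) {
	case SKETCH_LMV_MAGIC_V1:
		return (long)(sizeof(struct sketch_lmv_md_v1) +
			      (size_t)stripe_count * sizeof(struct sketch_fid));
	default:
		return -1;
	}
}

/* What lmv_check_connect() now asks for: room for one stripe per target. */
static long sketch_default_easize(const struct sketch_lmv_desc *desc)
{
	assert(desc != NULL);
	return sketch_lmv_md_size(desc->tgt_count, SKETCH_LMV_MAGIC_V1);
}
```

Passing the magic explicitly lets a later layout version carry its own
header size without another helper per version.
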
 drivers/staging/lustre/lustre/lmv/lmv_internal.h |    7 -------
 drivers/staging/lustre/lustre/lmv/lmv_obd.c      |    2 +-
 2 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_internal.h b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
index 90a9786..f4c917b 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_internal.h
+++ b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
@@ -55,13 +55,6 @@ int __lmv_fid_alloc(struct lmv_obd *lmv, struct lu_fid *fid, u32 mds);
 int lmv_fid_alloc(struct obd_export *exp, struct lu_fid *fid,
 		  struct md_op_data *op_data);
 
-static inline int lmv_get_easize(struct lmv_obd *lmv)
-{
-	return sizeof(struct lmv_stripe_md) +
-		lmv->desc.ld_tgt_count *
-		sizeof(struct lu_fid);
-}
-
 static inline struct lmv_tgt_desc *
 lmv_get_target(struct lmv_obd *lmv, u32 mds)
 {
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 1ba5900..0b1260d 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -558,7 +558,7 @@ int lmv_check_connect(struct obd_device *obd)
 	lmv_set_timeouts(obd);
 	class_export_put(lmv->exp);
 	lmv->connected = 1;
-	easize = lmv_get_easize(lmv);
+	easize = lmv_mds_md_size(lmv->desc.ld_tgt_count, LMV_MAGIC);
 	lmv_init_ea_size(obd->obd_self_export, easize, 0, 0, 0);
 	mutex_unlock(&lmv->lmv_init_mutex);
 	return 0;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 11/80] staging: lustre: lmv: replace obd_free_memmd with lmv_free_memmd
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Use lmv_free_memmd for proper cleanup instead of
the generic obd_free_memmd.

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/7043
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
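For illustration only, a userspace sketch of why an LMV-specific free helper
is useful: the "Free memmd" branch of lmv_unpack_md() drops the inode
references held by the slave stripes before it frees the buffer. All
sketch_* types and functions below are hypothetical stand-ins (a plain
counter replaces the kernel inode refcount), not the Lustre implementation.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_inode {
	int refcount;
};

struct sketch_oinfo {
	struct sketch_inode *root;	/* NULL if the stripe was never set up */
};

struct sketch_lsm {
	int stripe_count;
	struct sketch_oinfo oinfo[];
};

static void sketch_iput(struct sketch_inode *inode)
{
	assert(inode->refcount > 0);
	inode->refcount--;
}

static struct sketch_lsm *sketch_alloc_memmd(int stripe_count)
{
	struct sketch_lsm *lsm;

	lsm = calloc(1, sizeof(*lsm) +
			(size_t)stripe_count * sizeof(lsm->oinfo[0]));
	if (lsm)
		lsm->stripe_count = stripe_count;
	return lsm;
}

/* Release the slave inodes first, then the buffer itself. */
static void sketch_free_memmd(struct sketch_lsm **lsmp)
{
	struct sketch_lsm *lsm = *lsmp;
	int i;

	if (!lsm)
		return;

	/* Slot 0 is skipped, matching the loop that starts at i = 1. */
	for (i = 1; i < lsm->stripe_count; i++) {
		if (lsm->oinfo[i].root)
			sketch_iput(lsm->oinfo[i].root);
	}

	free(lsm);
	*lsmp = NULL;
}
```

A caller that only freed the buffer would leave the slave references held,
which is presumably the cleanup the commit message refers to.
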
 drivers/staging/lustre/lustre/lmv/lmv_obd.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 0b1260d..6be2afc 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -2692,7 +2692,7 @@ static int lmv_free_lustre_md(struct obd_export *exp, struct lustre_md *md)
 	struct lmv_tgt_desc *tgt = lmv->tgts[0];
 
 	if (md->lmv)
-		obd_free_memmd(exp, (void *)&md->lmv);
+		lmv_free_memmd(md->lmv);
 	if (!tgt || !tgt->ltd_exp)
 		return -EINVAL;
 	return md_free_lustre_md(tgt->ltd_exp, md);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 12/80] staging: lustre: create striped directory
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

1. The client sends the create request to the master MDT, which
   allocates FIDs for all of the slaves and creates them.

2. The client needs to revalidate the slaves during intent getattr
   and open requests.

3. lmv_stripe_md will include attributes (size, nlink, etc.)
   from all of the stripes, which are protected by the UPDATE lock.
   The client needs to merge these attributes when it updates the inode.

4. Send the create request to the MDT where the file is located,
   which helps with creating the master stripe of the striped directory.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3529
Reviewed-on: http://review.whamcloud.com/7196
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
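For illustration only, a userspace sketch of the attribute merge described
in point 3 of the commit message. The merge policy here (sum the sizes and
link counts, keep the newest timestamps) is an assumption made for this
example, and the sketch_* names are hypothetical; the real merge_attr
method may treat nlink or the timestamps differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-stripe attributes, loosely modelled on struct cl_attr. */
struct sketch_attr {
	uint64_t size;
	uint64_t nlink;
	uint64_t atime;
	uint64_t mtime;
	uint64_t ctime;
};

static uint64_t sketch_max(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/* Fold the attributes of every stripe into one directory-wide view. */
static void sketch_merge_attr(const struct sketch_attr *stripes, size_t count,
			      struct sketch_attr *out)
{
	size_t i;

	assert(stripes != NULL && out != NULL);
	*out = (struct sketch_attr){ 0 };

	for (i = 0; i < count; i++) {
		out->size += stripes[i].size;	/* assumed: sizes add up */
		out->nlink += stripes[i].nlink;	/* assumed: links add up */
		out->atime = sketch_max(out->atime, stripes[i].atime);
		out->mtime = sketch_max(out->mtime, stripes[i].mtime);
		out->ctime = sketch_max(out->ctime, stripes[i].ctime);
	}
}
```

The merged values are what stat() on the striped directory would then
report in place of the master inode's own size and nlink.
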
 drivers/staging/lustre/lustre/include/cl_object.h  |    3 +
 .../lustre/lustre/include/lustre/lustre_idl.h      |   40 +++-
 .../lustre/lustre/include/lustre/lustre_user.h     |   16 +-
 drivers/staging/lustre/lustre/include/lustre_lib.h |    2 +
 drivers/staging/lustre/lustre/include/lustre_lmv.h |   59 +++++
 drivers/staging/lustre/lustre/include/obd.h        |   16 +-
 drivers/staging/lustre/lustre/include/obd_class.h  |   19 ++
 drivers/staging/lustre/lustre/llite/dir.c          |   26 ++-
 drivers/staging/lustre/lustre/llite/file.c         |   40 +++-
 .../staging/lustre/lustre/llite/llite_internal.h   |   12 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |  193 +++++++++++++++-
 drivers/staging/lustre/lustre/llite/llite_nfs.c    |    7 +-
 drivers/staging/lustre/lustre/llite/namei.c        |   42 +++-
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |  244 +++++++++++++++++---
 drivers/staging/lustre/lustre/lmv/lmv_internal.h   |   32 +++
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  221 +++++++++++++++---
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |    3 +
 .../staging/lustre/lustre/ptlrpc/pack_generic.c    |   11 +
 18 files changed, 880 insertions(+), 106 deletions(-)
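
For illustration only, a standalone userspace version of the FNV-1a 64-bit
hash that this patch adds to lustre_idl.h, using the same prime and offset
basis. The sketch_* names are hypothetical, and reducing the hash modulo the
stripe count is an assumption about how a name could be mapped to a stripe,
not something this patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FNV_1A_64_PRIME	0x100000001b3ULL
#define SKETCH_FNV_1A_64_OFFSET	0xcbf29ce484222325ULL

/* FNV-1a: XOR each octet into the hash, then multiply by the prime. */
static uint64_t sketch_fnv_1a_64(const void *buf, size_t size)
{
	const unsigned char *p = buf;
	uint64_t hash = SKETCH_FNV_1A_64_OFFSET;
	size_t i;

	for (i = 0; i < size; i++) {
		hash ^= p[i];
		hash *= SKETCH_FNV_1A_64_PRIME;
	}

	return hash;
}

/* Hypothetical: choose a stripe index from the hash of an entry name. */
static unsigned int sketch_name_to_stripe(const char *name, size_t namelen,
					  unsigned int stripe_count)
{
	assert(stripe_count > 0);
	return (unsigned int)(sketch_fnv_1a_64(name, namelen) % stripe_count);
}
```

An empty buffer hashes to the offset basis, which makes a quick sanity
check for any port of the function.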

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 3cd4a25..0fa71a5 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -191,6 +191,9 @@ struct cl_attr {
 	 * Group identifier for quota purposes.
 	 */
 	gid_t  cat_gid;
+
+	/* nlink of the directory */
+	__u64  cat_nlink;
 };
 
 /**
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 0ad6605..a612080 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -1610,6 +1610,7 @@ static inline void lmm_oi_cpu_to_le(struct ost_id *dst_oi,
 #define XATTR_NAME_LOV	  "trusted.lov"
 #define XATTR_NAME_LMA	  "trusted.lma"
 #define XATTR_NAME_LMV	  "trusted.lmv"
+#define XATTR_NAME_DEFAULT_LMV	"trusted.dmv"
 #define XATTR_NAME_LINK	 "trusted.link"
 #define XATTR_NAME_FID	  "trusted.fid"
 #define XATTR_NAME_VERSION      "trusted.version"
@@ -2472,7 +2473,7 @@ struct lmv_desc {
 	__u32 ld_tgt_count;		/* how many MDS's */
 	__u32 ld_active_tgt_count;	 /* how many active */
 	__u32 ld_default_stripe_count;     /* how many objects are used */
-	__u32 ld_pattern;		  /* default MEA_MAGIC_* */
+	__u32 ld_pattern;		  /* default hash pattern */
 	__u64 ld_default_hash_size;
 	__u64 ld_padding_1;		/* also fix lustre_swab_lmv_desc */
 	__u32 ld_padding_2;		/* also fix lustre_swab_lmv_desc */
@@ -2486,6 +2487,43 @@ struct lmv_desc {
 #define LMV_MAGIC_V1	0x0CD10CD0	/* normal stripe lmv magic */
 #define LMV_USER_MAGIC	0x0CD20CD0	/* default lmv magic*/
 #define LMV_MAGIC	LMV_MAGIC_V1
+
+enum lmv_hash_type {
+	LMV_HASH_TYPE_ALL_CHARS = 1,
+	LMV_HASH_TYPE_FNV_1A_64 = 2,
+};
+
+#define LMV_HASH_NAME_ALL_CHARS		"all_char"
+#define LMV_HASH_NAME_FNV_1A_64		"fnv_1a_64"
+
+/**
+ * The FNV-1a hash algorithm is as follows:
+ *     hash = FNV_offset_basis
+ *     for each octet_of_data to be hashed
+ *             hash = hash XOR octet_of_data
+ *             hash = hash × FNV_prime
+ *     return hash
+ * http://en.wikipedia.org/wiki/Fowler–Noll–Vo_hash_function#FNV-1a_hash
+ *
+ * http://www.isthe.com/chongo/tech/comp/fnv/index.html#FNV-reference-source
+ * FNV_prime is 2^40 + 2^8 + 0xb3 = 0x100000001b3ULL
+ **/
+#define LUSTRE_FNV_1A_64_PRIME		0x100000001b3ULL
+#define LUSTRE_FNV_1A_64_OFFSET_BIAS	0xcbf29ce484222325ULL
+static inline __u64 lustre_hash_fnv_1a_64(const void *buf, size_t size)
+{
+	__u64 hash = LUSTRE_FNV_1A_64_OFFSET_BIAS;
+	const unsigned char *p = buf;
+	size_t i;
+
+	for (i = 0; i < size; i++) {
+		hash ^= p[i];
+		hash *= LUSTRE_FNV_1A_64_PRIME;
+	}
+
+	return hash;
+}
+
 struct lmv_mds_md_v1 {
 	__u32 lmv_magic;
 	__u32 lmv_stripe_count;		/* stripe count */
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index ef6f38f..d496d0e 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -374,19 +374,17 @@ struct lov_user_mds_data_v3 {
 } __packed;
 #endif
 
-/* keep this to be the same size as lov_user_ost_data_v1 */
 struct lmv_user_mds_data {
 	struct lu_fid	lum_fid;
 	__u32		lum_padding;
 	__u32		lum_mds;
 };
 
-/* lum_type */
-enum {
-	LMV_STRIPE_TYPE = 0,
-	LMV_DEFAULT_TYPE = 1,
-};
-
+/*
+ * Derived the same way as LOV_MAX_STRIPE_COUNT (see above):
+ * (max buffer size - lmv+rpc header) / sizeof(struct lmv_user_mds_data)
+ */
+#define LMV_MAX_STRIPE_COUNT 2000  /* ((12 * 4096 - 256) / 24) */
 #define lmv_user_md lmv_user_md_v1
 struct lmv_user_md_v1 {
 	__u32	lum_magic;	 /* must be the first field */
@@ -399,7 +397,7 @@ struct lmv_user_md_v1 {
 	__u32	lum_padding3;
 	char	lum_pool_name[LOV_MAXPOOLNAME];
 	struct	lmv_user_mds_data  lum_objects[0];
-};
+} __packed;
 
 static inline int lmv_user_md_size(int stripes, int lmm_magic)
 {
@@ -407,6 +405,8 @@ static inline int lmv_user_md_size(int stripes, int lmm_magic)
 		      stripes * sizeof(struct lmv_user_mds_data);
 }
 
+void lustre_swab_lmv_user_md(struct lmv_user_md *lum);
+
 struct ll_recreate_obj {
 	__u64 lrc_id;
 	__u32 lrc_ost_idx;
diff --git a/drivers/staging/lustre/lustre/include/lustre_lib.h b/drivers/staging/lustre/lustre/include/lustre_lib.h
index 06958f2..def0193 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lib.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lib.h
@@ -391,6 +391,8 @@ static inline void obd_ioctl_freedata(char *buf, int len)
 #define LOVEA_DELETE_VALUES(size, count, offset) (size == 0 && count == 0 && \
 						 offset == (typeof(offset))(-1))
 
+#define LMVEA_DELETE_VALUES(count, offset) ((count) == 0 && \
+					    (offset) == (typeof(offset))(-1))
 /* #define POISON_BULK 0 */
 
 /*
diff --git a/drivers/staging/lustre/lustre/include/lustre_lmv.h b/drivers/staging/lustre/lustre/include/lustre_lmv.h
index 784d67b..4036fce 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lmv.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lmv.h
@@ -66,4 +66,63 @@ static inline void lmv_free_memmd(struct lmv_stripe_md *lsm)
 {
 	lmv_unpack_md(NULL, &lsm, NULL, 0);
 }
+
+static inline void lmv1_cpu_to_le(struct lmv_mds_md_v1 *lmv_dst,
+				  const struct lmv_mds_md_v1 *lmv_src)
+{
+	int i;
+
+	lmv_dst->lmv_magic = cpu_to_le32(lmv_src->lmv_magic);
+	lmv_dst->lmv_stripe_count = cpu_to_le32(lmv_src->lmv_stripe_count);
+	lmv_dst->lmv_master_mdt_index =
+		cpu_to_le32(lmv_src->lmv_master_mdt_index);
+	lmv_dst->lmv_hash_type = cpu_to_le32(lmv_src->lmv_hash_type);
+	lmv_dst->lmv_layout_version = cpu_to_le32(lmv_src->lmv_layout_version);
+
+	for (i = 0; i < lmv_src->lmv_stripe_count; i++)
+		fid_cpu_to_le(&lmv_dst->lmv_stripe_fids[i],
+			      &lmv_src->lmv_stripe_fids[i]);
+}
+
+static inline void lmv1_le_to_cpu(struct lmv_mds_md_v1 *lmv_dst,
+				  const struct lmv_mds_md_v1 *lmv_src)
+{
+	int i;
+
+	lmv_dst->lmv_magic = le32_to_cpu(lmv_src->lmv_magic);
+	lmv_dst->lmv_stripe_count = le32_to_cpu(lmv_src->lmv_stripe_count);
+	lmv_dst->lmv_master_mdt_index =
+		le32_to_cpu(lmv_src->lmv_master_mdt_index);
+	lmv_dst->lmv_hash_type = le32_to_cpu(lmv_src->lmv_hash_type);
+	lmv_dst->lmv_layout_version = le32_to_cpu(lmv_src->lmv_layout_version);
+
+	for (i = 0; i < lmv_src->lmv_stripe_count; i++)
+		fid_le_to_cpu(&lmv_dst->lmv_stripe_fids[i],
+			      &lmv_src->lmv_stripe_fids[i]);
+}
+
+static inline void lmv_cpu_to_le(union lmv_mds_md *lmv_dst,
+				 const union lmv_mds_md *lmv_src)
+{
+	switch (lmv_src->lmv_magic) {
+	case LMV_MAGIC_V1:
+		lmv1_cpu_to_le(&lmv_dst->lmv_md_v1, &lmv_src->lmv_md_v1);
+		break;
+	default:
+		break;
+	}
+}
+
+static inline void lmv_le_to_cpu(union lmv_mds_md *lmv_dst,
+				 const union lmv_mds_md *lmv_src)
+{
+	switch (le32_to_cpu(lmv_src->lmv_magic)) {
+	case LMV_MAGIC_V1:
+		lmv1_le_to_cpu(&lmv_dst->lmv_md_v1, &lmv_src->lmv_md_v1);
+		break;
+	default:
+		break;
+	}
+}
+
 #endif
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index 17b8d22..a9f4e13 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -1022,14 +1022,6 @@ enum {
 };
 
 /* lmv structures */
-#define MEA_MAGIC_LAST_CHAR      0xb2221ca1
-#define MEA_MAGIC_ALL_CHARS      0xb222a11c
-#define MEA_MAGIC_HASH_SEGMENT   0xb222a11b
-
-#define MAX_HASH_SIZE_32	 0x7fffffffUL
-#define MAX_HASH_SIZE	    0x7fffffffffffffffULL
-#define MAX_HASH_HIGHEST_BIT     0x1000000000000000ULL
-
 struct lustre_md {
 	struct mdt_body	 *body;
 	struct lov_stripe_md    *lsm;
@@ -1049,6 +1041,7 @@ struct md_open_data {
 };
 
 struct lookup_intent;
+struct cl_attr;
 
 struct md_ops {
 	int (*getstatus)(struct obd_export *, struct lu_fid *);
@@ -1109,6 +1102,13 @@ struct md_ops {
 
 	int (*free_lustre_md)(struct obd_export *, struct lustre_md *);
 
+	int (*merge_attr)(struct obd_export *,
+			  const struct lmv_stripe_md *lsm,
+			  struct cl_attr *attr);
+
+	int (*update_lsm_md)(struct obd_export *, struct lmv_stripe_md *lsm,
+			     struct mdt_body *, ldlm_blocking_callback);
+
 	int (*set_open_replay_data)(struct obd_export *,
 				    struct obd_client_handle *,
 				    struct lookup_intent *);
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index 6482a93..2f111a8 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -1559,6 +1559,25 @@ static inline int md_free_lustre_md(struct obd_export *exp,
 	return MDP(exp->exp_obd, free_lustre_md)(exp, md);
 }
 
+static inline int md_update_lsm_md(struct obd_export *exp,
+				   struct lmv_stripe_md *lsm,
+				   struct mdt_body *body,
+				   ldlm_blocking_callback cb)
+{
+	EXP_CHECK_MD_OP(exp, update_lsm_md);
+	EXP_MD_COUNTER_INCREMENT(exp, update_lsm_md);
+	return MDP(exp->exp_obd, update_lsm_md)(exp, lsm, body, cb);
+}
+
+static inline int md_merge_attr(struct obd_export *exp,
+				const struct lmv_stripe_md *lsm,
+				struct cl_attr *attr)
+{
+	EXP_CHECK_MD_OP(exp, merge_attr);
+	EXP_MD_COUNTER_INCREMENT(exp, merge_attr);
+	return MDP(exp->exp_obd, merge_attr)(exp, lsm, attr);
+}
+
 static inline int md_setxattr(struct obd_export *exp, const struct lu_fid *fid,
 			      u64 valid, const char *name,
 			      const char *input, int input_size,
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index a72b486..a0560b6 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -668,7 +668,7 @@ static int ll_send_mgc_param(struct obd_export *mgc, char *string)
 }
 
 static int ll_dir_setdirstripe(struct inode *dir, struct lmv_user_md *lump,
-			       char *filename)
+			       const char *filename)
 {
 	struct ptlrpc_request *request = NULL;
 	struct md_op_data *op_data;
@@ -676,6 +676,26 @@ static int ll_dir_setdirstripe(struct inode *dir, struct lmv_user_md *lump,
 	int mode;
 	int err;
 
+	if (unlikely(lump->lum_magic != LMV_USER_MAGIC))
+		return -EINVAL;
+
+	if (lump->lum_stripe_offset == (__u32)-1) {
+		int mdtidx;
+
+		mdtidx = ll_get_mdt_idx(dir);
+		if (mdtidx < 0)
+			return mdtidx;
+
+		lump->lum_stripe_offset = mdtidx;
+	}
+
+	CDEBUG(D_VFSTRACE, "VFS Op:inode="DFID"(%p) name %s stripe_offset %d, stripe_count: %u\n",
+	       PFID(ll_inode2fid(dir)), dir, filename,
+	       (int)lump->lum_stripe_offset, lump->lum_stripe_count);
+
+	if (lump->lum_magic != cpu_to_le32(LMV_USER_MAGIC))
+		lustre_swab_lmv_user_md(lump);
+
 	mode = (~current_umask() & 0755) | S_IFDIR;
 	op_data = ll_prep_md_op_data(NULL, dir, NULL, filename,
 				     strlen(filename), mode, LUSTRE_OPC_MKDIR,
@@ -745,9 +765,6 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump,
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
-	if (lump && lump->lmm_magic == cpu_to_le32(LMV_USER_MAGIC))
-		op_data->op_cli_flags |= CLI_SET_MEA;
-
 	/* swabbing is done in lov_setstripe() on server side */
 	rc = md_setattr(sbi->ll_md_exp, op_data, lump, lum_size,
 			NULL, 0, &req, NULL);
@@ -1424,7 +1441,6 @@ lmv_out_free:
 		}
 
 		*tmp = lum;
-		tmp->lum_type = LMV_STRIPE_TYPE;
 		tmp->lum_stripe_count = 1;
 		mdtindex = ll_get_mdt_idx(inode);
 		if (mdtindex < 0) {
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 58a7401..18fb713 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -3015,6 +3015,27 @@ out:
 	return rc;
 }
 
+static int ll_merge_md_attr(struct inode *inode)
+{
+	struct cl_attr attr = { 0 };
+	int rc;
+
+	LASSERT(ll_i2info(inode)->lli_lsm_md);
+	rc = md_merge_attr(ll_i2mdexp(inode), ll_i2info(inode)->lli_lsm_md,
+			   &attr);
+	if (rc)
+		return rc;
+
+	ll_i2info(inode)->lli_stripe_dir_size = attr.cat_size;
+	ll_i2info(inode)->lli_stripe_dir_nlink = attr.cat_nlink;
+
+	ll_i2info(inode)->lli_atime = attr.cat_atime;
+	ll_i2info(inode)->lli_mtime = attr.cat_mtime;
+	ll_i2info(inode)->lli_ctime = attr.cat_ctime;
+
+	return 0;
+}
+
 static int ll_inode_revalidate(struct dentry *dentry, __u64 ibits)
 {
 	struct inode *inode = d_inode(dentry);
@@ -3026,6 +3047,13 @@ static int ll_inode_revalidate(struct dentry *dentry, __u64 ibits)
 
 	/* if object isn't regular file, don't validate size */
 	if (!S_ISREG(inode->i_mode)) {
+		if (S_ISDIR(inode->i_mode) &&
+		    ll_i2info(inode)->lli_lsm_md) {
+			rc = ll_merge_md_attr(inode);
+			if (rc)
+				return rc;
+		}
+
 		LTIME_S(inode->i_atime) = ll_i2info(inode)->lli_atime;
 		LTIME_S(inode->i_mtime) = ll_i2info(inode)->lli_mtime;
 		LTIME_S(inode->i_ctime) = ll_i2info(inode)->lli_ctime;
@@ -3063,7 +3091,6 @@ int ll_getattr(struct vfsmount *mnt, struct dentry *de, struct kstat *stat)
 	else
 		stat->ino = inode->i_ino;
 	stat->mode = inode->i_mode;
-	stat->nlink = inode->i_nlink;
 	stat->uid = inode->i_uid;
 	stat->gid = inode->i_gid;
 	stat->rdev = inode->i_rdev;
@@ -3071,10 +3098,17 @@ int ll_getattr(struct vfsmount *mnt, struct dentry *de, struct kstat *stat)
 	stat->mtime = inode->i_mtime;
 	stat->ctime = inode->i_ctime;
 	stat->blksize = 1 << inode->i_blkbits;
-
-	stat->size = i_size_read(inode);
 	stat->blocks = inode->i_blocks;
 
+	if (S_ISDIR(inode->i_mode) &&
+	    ll_i2info(inode)->lli_lsm_md) {
+		stat->nlink = lli->lli_stripe_dir_nlink;
+		stat->size = lli->lli_stripe_dir_size;
+	} else {
+		stat->nlink = inode->i_nlink;
+		stat->size = i_size_read(inode);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 07b6918..f3b8504 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -39,6 +39,7 @@
 
 /* for struct cl_lock_descr and struct cl_io */
 #include "../include/cl_object.h"
+#include "../include/lustre_lmv.h"
 #include "../include/lustre_mdc.h"
 #include "../include/lustre_intent.h"
 #include <linux/compat.h>
@@ -174,7 +175,11 @@ struct ll_inode_info {
 			 */
 			pid_t			   d_opendir_pid;
 			/* directory stripe information */
-			struct lmv_stripe_md		*d_lmv_md;
+			struct lmv_stripe_md		*d_lsm_md;
+			/* striped directory size */
+			loff_t				d_stripe_size;
+			/* striped directory nlink */
+			__u64				d_stripe_nlink;
 		} d;
 
 #define lli_readdir_mutex       u.d.d_readdir_mutex
@@ -182,7 +187,9 @@ struct ll_inode_info {
 #define lli_sai		 u.d.d_sai
 #define lli_sa_lock	     u.d.d_sa_lock
 #define lli_opendir_pid	 u.d.d_opendir_pid
-#define lli_lmv_md		u.d.d_lmv_md
+#define lli_lsm_md		u.d.d_lsm_md
+#define lli_stripe_dir_size	u.d.d_stripe_size
+#define lli_stripe_dir_nlink	u.d.d_stripe_nlink
 
 		/* for non-directory */
 		struct {
@@ -664,6 +671,7 @@ int ll_objects_destroy(struct ptlrpc_request *request,
 		       struct inode *dir);
 struct inode *ll_iget(struct super_block *sb, ino_t hash,
 		      struct lustre_md *lic);
+int ll_test_inode_by_fid(struct inode *inode, void *opaque);
 int ll_md_blocking_ast(struct ldlm_lock *, struct ldlm_lock_desc *,
 		       void *data, int flag);
 struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de);
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index eb715be..ef8d87a 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -992,6 +992,188 @@ struct inode *ll_inode_from_resource_lock(struct ldlm_lock *lock)
 	return inode;
 }
 
+static void ll_dir_clear_lsm_md(struct inode *inode)
+{
+	struct ll_inode_info *lli = ll_i2info(inode);
+
+	LASSERT(S_ISDIR(inode->i_mode));
+
+	if (lli->lli_lsm_md) {
+		lmv_free_memmd(lli->lli_lsm_md);
+		lli->lli_lsm_md = NULL;
+	}
+}
+
+static struct inode *ll_iget_anon_dir(struct super_block *sb,
+				      const struct lu_fid *fid,
+				      struct lustre_md *md)
+{
+	struct ll_sb_info *sbi = ll_s2sbi(sb);
+	struct mdt_body *body = md->body;
+	struct inode *inode;
+	ino_t ino;
+
+	ino = cl_fid_build_ino(fid, sbi->ll_flags & LL_SBI_32BIT_API);
+	inode = iget_locked(sb, ino);
+	if (!inode) {
+		CERROR("%s: failed get simple inode "DFID": rc = -ENOENT\n",
+		       ll_get_fsname(sb, NULL, 0), PFID(fid));
+		return ERR_PTR(-ENOENT);
+	}
+
+	if (inode->i_state & I_NEW) {
+		struct ll_inode_info *lli = ll_i2info(inode);
+		struct lmv_stripe_md *lsm = md->lmv;
+
+		inode->i_mode = (inode->i_mode & ~S_IFMT) |
+				(body->mode & S_IFMT);
+		LASSERTF(S_ISDIR(inode->i_mode), "Not slave inode "DFID"\n",
+			 PFID(fid));
+
+		LTIME_S(inode->i_mtime) = 0;
+		LTIME_S(inode->i_atime) = 0;
+		LTIME_S(inode->i_ctime) = 0;
+		inode->i_rdev = 0;
+
+		inode->i_op = &ll_dir_inode_operations;
+		inode->i_fop = &ll_dir_operations;
+		lli->lli_fid = *fid;
+		ll_lli_init(lli);
+
+		LASSERT(lsm);
+		/* master stripe FID */
+		lli->lli_pfid = lsm->lsm_md_oinfo[0].lmo_fid;
+		CDEBUG(D_INODE, "lli %p master "DFID" slave "DFID"\n",
+		       lli, PFID(fid), PFID(&lli->lli_pfid));
+		unlock_new_inode(inode);
+	}
+
+	return inode;
+}
+
+static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md)
+{
+	struct lmv_stripe_md *lsm = md->lmv;
+	struct lu_fid *fid;
+	int i;
+
+	LASSERT(lsm);
+	/*
+	 * XXX sigh, this lsm_root initialization should be in
+	 * LMV layer, but it needs ll_iget right now, so we
+	 * put this here right now.
+	 */
+	for (i = 0; i < lsm->lsm_md_stripe_count; i++) {
+		fid = &lsm->lsm_md_oinfo[i].lmo_fid;
+		LASSERT(!lsm->lsm_md_oinfo[i].lmo_root);
+		if (!i) {
+			lsm->lsm_md_oinfo[i].lmo_root = inode;
+		} else {
+			/*
+			 * Unfortunately ll_iget will call ll_update_inode,
+			 * where the initialization of slave inode is slightly
+			 * different, so it reset lsm_md to NULL to avoid
+			 * initializing lsm for slave inode.
+			 */
+			lsm->lsm_md_oinfo[i].lmo_root =
+				ll_iget_anon_dir(inode->i_sb, fid, md);
+			if (IS_ERR(lsm->lsm_md_oinfo[i].lmo_root)) {
+				int rc = PTR_ERR(lsm->lsm_md_oinfo[i].lmo_root);
+
+				lsm->lsm_md_oinfo[i].lmo_root = NULL;
+				return rc;
+			}
+		}
+	}
+
+	/*
+	 * This is where the lsm is initialized (lmo_info filled in) after
+	 * the client retrieves MD stripe information from the MDT.
+	 */
+	return md_update_lsm_md(ll_i2mdexp(inode), lsm, md->body,
+				ll_md_blocking_ast);
+}
+
+static inline int lli_lsm_md_eq(const struct lmv_stripe_md *lsm_md1,
+				const struct lmv_stripe_md *lsm_md2)
+{
+	return lsm_md1->lsm_md_magic == lsm_md2->lsm_md_magic &&
+	       lsm_md1->lsm_md_stripe_count == lsm_md2->lsm_md_stripe_count &&
+	       lsm_md1->lsm_md_master_mdt_index ==
+			lsm_md2->lsm_md_master_mdt_index &&
+	       lsm_md1->lsm_md_hash_type == lsm_md2->lsm_md_hash_type &&
+	       lsm_md1->lsm_md_layout_version ==
+			lsm_md2->lsm_md_layout_version &&
+	       !strcmp(lsm_md1->lsm_md_pool_name,
+		       lsm_md2->lsm_md_pool_name);
+}
+
+static void ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
+{
+	struct ll_inode_info *lli = ll_i2info(inode);
+	struct lmv_stripe_md *lsm = md->lmv;
+	int idx;
+
+	LASSERT(lsm);
+	LASSERT(S_ISDIR(inode->i_mode));
+	if (!lli->lli_lsm_md) {
+		int rc;
+
+		rc = ll_init_lsm_md(inode, md);
+		if (rc) {
+			CERROR("%s: init "DFID" failed: rc = %d\n",
+			       ll_get_fsname(inode->i_sb, NULL, 0),
+			       PFID(&lli->lli_fid), rc);
+			return;
+		}
+		lli->lli_lsm_md = lsm;
+		/*
+		 * set lsm_md to NULL, so the following free lustre_md
+		 * will not free this lsm
+		 */
+		md->lmv = NULL;
+		return;
+	}
+
+	/* Compare the old and new stripe information */
+	if (!lli_lsm_md_eq(lli->lli_lsm_md, lsm)) {
+		CERROR("inode %p %lu mismatch\n"
+		       "    new(%p)     vs     lli_lsm_md(%p):\n"
+		       "    magic:      %x                   %x\n"
+		       "    count:      %x                   %x\n"
+		       "    master:     %x                   %x\n"
+		       "    hash_type:  %x                   %x\n"
+		       "    layout:     %x                   %x\n"
+		       "    pool:       %s                   %s\n",
+		       inode, inode->i_ino, lsm, lli->lli_lsm_md,
+		       lsm->lsm_md_magic, lli->lli_lsm_md->lsm_md_magic,
+		       lsm->lsm_md_stripe_count,
+		       lli->lli_lsm_md->lsm_md_stripe_count,
+		       lsm->lsm_md_master_mdt_index,
+		       lli->lli_lsm_md->lsm_md_master_mdt_index,
+		       lsm->lsm_md_hash_type, lli->lli_lsm_md->lsm_md_hash_type,
+		       lsm->lsm_md_layout_version,
+		       lli->lli_lsm_md->lsm_md_layout_version,
+		       lsm->lsm_md_pool_name,
+		       lli->lli_lsm_md->lsm_md_pool_name);
+		return;
+	}
+
+	for (idx = 0; idx < lli->lli_lsm_md->lsm_md_stripe_count; idx++) {
+		if (!lu_fid_eq(&lli->lli_lsm_md->lsm_md_oinfo[idx].lmo_fid,
+			       &lsm->lsm_md_oinfo[idx].lmo_fid)) {
+			CERROR("%s: FID in lsm mismatch idx %d, old: "DFID" new:"DFID"\n",
+			       ll_get_fsname(inode->i_sb, NULL, 0), idx,
+			       PFID(&lli->lli_lsm_md->lsm_md_oinfo[idx].lmo_fid),
+			       PFID(&lsm->lsm_md_oinfo[idx].lmo_fid));
+			return;
+		}
+	}
+
+	md_update_lsm_md(ll_i2mdexp(inode), ll_i2info(inode)->lli_lsm_md,
+			 md->body, ll_md_blocking_ast);
+}
+
 void ll_clear_inode(struct inode *inode)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
@@ -1039,7 +1221,9 @@ void ll_clear_inode(struct inode *inode)
 #endif
 	lli->lli_inode_magic = LLI_INODE_DEAD;
 
-	if (!S_ISDIR(inode->i_mode))
+	if (S_ISDIR(inode->i_mode))
+		ll_dir_clear_lsm_md(inode);
+	else
 		LASSERT(list_empty(&lli->lli_agl_list));
 
 	/*
@@ -1484,6 +1668,9 @@ void ll_update_inode(struct inode *inode, struct lustre_md *md)
 			lli->lli_maxbytes = MAX_LFS_FILESIZE;
 	}
 
+	if (S_ISDIR(inode->i_mode) && md->lmv)
+		ll_update_lsm_md(inode, md);
+
 #ifdef CONFIG_FS_POSIX_ACL
 	if (body->valid & OBD_MD_FLACL) {
 		spin_lock(&lli->lli_lock);
@@ -2091,12 +2278,12 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 	ll_i2gids(op_data->op_suppgids, i1, i2);
 	op_data->op_fid1 = *ll_inode2fid(i1);
 	if (S_ISDIR(i1->i_mode))
-		op_data->op_mea1 = ll_i2info(i1)->lli_lmv_md;
+		op_data->op_mea1 = ll_i2info(i1)->lli_lsm_md;
 
 	if (i2) {
 		op_data->op_fid2 = *ll_inode2fid(i2);
 		if (S_ISDIR(i2->i_mode))
-			op_data->op_mea2 = ll_i2info(i2)->lli_lmv_md;
+			op_data->op_mea2 = ll_i2info(i2)->lli_lsm_md;
 	} else {
 		fid_zero(&op_data->op_fid2);
 	}
diff --git a/drivers/staging/lustre/lustre/llite/llite_nfs.c b/drivers/staging/lustre/lustre/llite/llite_nfs.c
index 74eb1fc..ab9d5cc 100644
--- a/drivers/staging/lustre/lustre/llite/llite_nfs.c
+++ b/drivers/staging/lustre/lustre/llite/llite_nfs.c
@@ -73,11 +73,6 @@ void get_uuid2fsid(const char *name, int len, __kernel_fsid_t *fsid)
 	fsid->val[1] = key >> 32;
 }
 
-static int ll_nfs_test_inode(struct inode *inode, void *opaque)
-{
-	return lu_fid_eq(&ll_i2info(inode)->lli_fid, opaque);
-}
-
 struct inode *search_inode_for_lustre(struct super_block *sb,
 				      const struct lu_fid *fid)
 {
@@ -92,7 +87,7 @@ struct inode *search_inode_for_lustre(struct super_block *sb,
 
 	CDEBUG(D_INFO, "searching inode for:(%lu,"DFID")\n", hash, PFID(fid));
 
-	inode = ilookup5(sb, hash, ll_nfs_test_inode, (void *)fid);
+	inode = ilookup5(sb, hash, ll_test_inode_by_fid, (void *)fid);
 	if (inode)
 		return inode;
 
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index 1e75f5b..e32d08b 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -158,6 +158,11 @@ static void ll_invalidate_negative_children(struct inode *dir)
 	spin_unlock(&dir->i_lock);
 }
 
+int ll_test_inode_by_fid(struct inode *inode, void *opaque)
+{
+	return lu_fid_eq(&ll_i2info(inode)->lli_fid, opaque);
+}
+
 int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc,
 		       void *data, int flag)
 {
@@ -253,10 +258,41 @@ int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc,
 		}
 
 		if ((bits & MDS_INODELOCK_UPDATE) && S_ISDIR(inode->i_mode)) {
-			CDEBUG(D_INODE, "invalidating inode "DFID"\n",
-			       PFID(ll_inode2fid(inode)));
+			struct ll_inode_info *lli = ll_i2info(inode);
+
+			CDEBUG(D_INODE, "invalidating inode "DFID" lli = %p, pfid  = "DFID"\n",
+			       PFID(ll_inode2fid(inode)), lli,
+			       PFID(&lli->lli_pfid));
+
 			truncate_inode_pages(inode->i_mapping, 0);
-			ll_invalidate_negative_children(inode);
+
+			if (unlikely(!fid_is_zero(&lli->lli_pfid))) {
+				struct inode *master_inode = NULL;
+				unsigned long hash;
+
+				/*
+				 * This is a slave inode. All child dentries
+				 * are connected to the master inode, so we
+				 * have to invalidate the negative children
+				 * on the master inode.
+				 */
+				CDEBUG(D_INODE, "Invalidate s"DFID" m"DFID"\n",
+				       PFID(ll_inode2fid(inode)),
+				       PFID(&lli->lli_pfid));
+
+				hash = cl_fid_build_ino(&lli->lli_pfid,
+							ll_need_32bit_api(ll_i2sbi(inode)));
+
+				master_inode = ilookup5(inode->i_sb, hash,
+							ll_test_inode_by_fid,
+							(void *)&lli->lli_pfid);
+				if (master_inode && !IS_ERR(master_inode)) {
+					ll_invalidate_negative_children(master_inode);
+					iput(master_inode);
+				}
+			} else {
+				ll_invalidate_negative_children(inode);
+			}
 		}
 
 		if ((bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM)) &&
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index 2f58fda..1b9bbb2 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -150,6 +150,160 @@ out:
 	return rc;
 }
 
+int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
+			  struct lmv_stripe_md *lsm,
+			  ldlm_blocking_callback cb_blocking,
+			  int extra_lock_flags)
+{
+	struct obd_device *obd = exp->exp_obd;
+	struct lmv_obd *lmv = &obd->u.lmv;
+	struct mdt_body *body;
+	struct md_op_data *op_data;
+	unsigned long size = 0;
+	unsigned long nlink = 0;
+	__s64 atime = 0;
+	__s64 ctime = 0;
+	__s64 mtime = 0;
+	int rc = 0, i;
+
+	/**
+	 * revalidate slaves has some problems, temporarily return,
+	 * we may not need that
+	 */
+	if (lsm->lsm_md_stripe_count <= 1)
+		return 0;
+
+	op_data = kzalloc(sizeof(*op_data), GFP_NOFS);
+	if (!op_data)
+		return -ENOMEM;
+
+	/**
+	 * Loop over the stripe information, check validity and update them
+	 * from MDS if needed.
+	 */
+	for (i = 0; i < lsm->lsm_md_stripe_count; i++) {
+		struct lookup_intent it = { .it_op = IT_GETATTR };
+		struct ptlrpc_request *req = NULL;
+		struct lustre_handle *lockh = NULL;
+		struct lmv_tgt_desc *tgt = NULL;
+		struct inode *inode;
+		struct lu_fid fid;
+
+		fid = lsm->lsm_md_oinfo[i].lmo_fid;
+		inode = lsm->lsm_md_oinfo[i].lmo_root;
+		if (!i) {
+			if (mbody) {
+				body = mbody;
+				goto update;
+			} else {
+				goto release_lock;
+			}
+		}
+
+		/*
+		 * Prepare op_data for revalidating. Note that @fid2 should be
+		 * defined otherwise it will go to server and take new lock
+		 * which is not needed here.
+		 */
+		memset(op_data, 0, sizeof(*op_data));
+		op_data->op_fid1 = fid;
+		op_data->op_fid2 = fid;
+
+		tgt = lmv_locate_mds(lmv, op_data, &fid);
+		if (IS_ERR(tgt)) {
+			rc = PTR_ERR(tgt);
+			goto cleanup;
+		}
+
+		CDEBUG(D_INODE, "Revalidate slave "DFID" -> mds #%d\n",
+		       PFID(&fid), tgt->ltd_idx);
+
+		rc = md_intent_lock(tgt->ltd_exp, op_data, NULL, 0, &it, 0,
+				    &req, cb_blocking, extra_lock_flags);
+		if (rc < 0)
+			goto cleanup;
+
+		lockh = (struct lustre_handle *)&it.it_lock_handle;
+		if (rc > 0 && !req) {
+			/* slave inode is still valid */
+			CDEBUG(D_INODE, "slave "DFID" is still valid.\n",
+			       PFID(&fid));
+			rc = 0;
+		} else {
+			/* refresh slave from server */
+			body = req_capsule_server_get(&req->rq_pill,
+						      &RMF_MDT_BODY);
+			LASSERT(body);
+update:
+			if (unlikely(body->nlink < 2)) {
+				CERROR("%s: nlink %d < 2 corrupt stripe %d "DFID":" DFID"\n",
+				       obd->obd_name, body->nlink, i,
+				       PFID(&lsm->lsm_md_oinfo[i].lmo_fid),
+				       PFID(&lsm->lsm_md_oinfo[0].lmo_fid));
+
+				if (req)
+					ptlrpc_req_finished(req);
+
+				rc = -EIO;
+				goto cleanup;
+			}
+
+			if (i)
+				md_set_lock_data(tgt->ltd_exp, &lockh->cookie,
+						 inode, NULL);
+
+			i_size_write(inode, body->size);
+			set_nlink(inode, body->nlink);
+			LTIME_S(inode->i_atime) = body->atime;
+			LTIME_S(inode->i_ctime) = body->ctime;
+			LTIME_S(inode->i_mtime) = body->mtime;
+
+			if (req)
+				ptlrpc_req_finished(req);
+		}
+release_lock:
+		size += i_size_read(inode);
+
+		if (i != 0)
+			nlink += inode->i_nlink - 2;
+		else
+			nlink += inode->i_nlink;
+
+		atime = LTIME_S(inode->i_atime) > atime ?
+				LTIME_S(inode->i_atime) : atime;
+		ctime = LTIME_S(inode->i_ctime) > ctime ?
+				LTIME_S(inode->i_ctime) : ctime;
+		mtime = LTIME_S(inode->i_mtime) > mtime ?
+				LTIME_S(inode->i_mtime) : mtime;
+
+		if (it.it_lock_mode && lockh) {
+			ldlm_lock_decref(lockh, it.it_lock_mode);
+			it.it_lock_mode = 0;
+		}
+
+		CDEBUG(D_INODE, "i %d "DFID" size %llu, nlink %u, atime %lu, mtime %lu, ctime %lu.\n",
+		       i, PFID(&fid), i_size_read(inode), inode->i_nlink,
+		       LTIME_S(inode->i_atime), LTIME_S(inode->i_mtime),
+		       LTIME_S(inode->i_ctime));
+	}
+
+	/*
+	 * update attr of master request.
+	 */
+	CDEBUG(D_INODE, "Return refreshed attrs: size = %lu nlink %lu atime %llu ctime %llu mtime %llu for " DFID"\n",
+	       size, nlink, atime, ctime, mtime,
+	       PFID(&lsm->lsm_md_oinfo[0].lmo_fid));
+
+	if (mbody) {
+		mbody->atime = atime;
+		mbody->ctime = ctime;
+		mbody->mtime = mtime;
+	}
+cleanup:
+	kfree(op_data);
+	return rc;
+}
+
 /*
  * IT_OPEN is intended to open (and create, possible) an object. Parent (pid)
  * may be split dir.
@@ -166,9 +320,26 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 	struct mdt_body		*body;
 	int			rc;
 
-	tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
-	if (IS_ERR(tgt))
-		return PTR_ERR(tgt);
+	if (it->it_flags & MDS_OPEN_BY_FID && fid_is_sane(&op_data->op_fid2)) {
+		if (op_data->op_mea1) {
+			struct lmv_stripe_md *lsm = op_data->op_mea1;
+			const struct lmv_oinfo *oinfo;
+
+			oinfo = lsm_name_to_stripe_info(lsm, op_data->op_name,
+							op_data->op_namelen);
+			op_data->op_fid1 = oinfo->lmo_fid;
+		}
+
+		tgt = lmv_find_target(lmv, &op_data->op_fid2);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+
+		op_data->op_mds = tgt->ltd_idx;
+	} else {
+		tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+	}
 
 	/* If it is ready to open the file by FID, do not need
 	 * allocate FID at all, otherwise it will confuse MDT
@@ -205,31 +376,18 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 	body = req_capsule_server_get(&(*reqp)->rq_pill, &RMF_MDT_BODY);
 	if (!body)
 		return -EPROTO;
-	/*
-	 * Not cross-ref case, just get out of here.
-	 */
-	if (likely(!(body->valid & OBD_MD_MDS)))
-		return 0;
 
-	/*
-	 * Okay, MDS has returned success. Probably name has been resolved in
-	 * remote inode.
-	 */
-	rc = lmv_intent_remote(exp, lmm, lmmsize, it, &op_data->op_fid1, flags,
-			       reqp, cb_blocking, extra_lock_flags);
-	if (rc != 0) {
-		LASSERT(rc < 0);
-		/*
-		 * This is possible, that some userspace application will try to
-		 * open file as directory and we will have -ENOTDIR here. As
-		 * this is normal situation, we should not print error here,
-		 * only debug info.
-		 */
-		CDEBUG(D_INODE, "Can't handle remote %s: dir " DFID "(" DFID "):%*s: %d\n",
-		       LL_IT2STR(it), PFID(&op_data->op_fid2),
-		       PFID(&op_data->op_fid1), op_data->op_namelen,
-		       op_data->op_name, rc);
-		return rc;
+	/* Cross-ref case, follow the reference to the remote MDT. */
+	if (unlikely((body->valid & OBD_MD_MDS))) {
+		rc = lmv_intent_remote(exp, lmm, lmmsize, it, &op_data->op_fid1,
+				       flags, reqp, cb_blocking,
+				       extra_lock_flags);
+		if (rc != 0)
+			return rc;
+
+		body = req_capsule_server_get(&(*reqp)->rq_pill, &RMF_MDT_BODY);
+		if (!body)
+			return -EPROTO;
 	}
 
 	return rc;
@@ -269,8 +427,23 @@ static int lmv_intent_lookup(struct obd_export *exp,
 	rc = md_intent_lock(tgt->ltd_exp, op_data, lmm, lmmsize, it,
 			    flags, reqp, cb_blocking, extra_lock_flags);
 
-	if (rc < 0 || !*reqp)
+	if (rc < 0)
+		return rc;
+
+	if (!*reqp) {
+		/*
+		 * If RPC happens, lsm information will be revalidated
+		 * during update_inode process (see ll_update_lsm_md)
+		 */
+		if (op_data->op_mea2) {
+			rc = lmv_revalidate_slaves(exp, NULL, op_data->op_mea2,
+						   cb_blocking,
+						   extra_lock_flags);
+			if (rc != 0)
+				return rc;
+		}
 		return rc;
+	}
 
 	/*
 	 * MDS has returned success. Probably name has been resolved in
@@ -279,12 +452,17 @@ static int lmv_intent_lookup(struct obd_export *exp,
 	body = req_capsule_server_get(&(*reqp)->rq_pill, &RMF_MDT_BODY);
 	if (!body)
 		return -EPROTO;
-	/* Not cross-ref case, just get out of here. */
-	if (likely(!(body->valid & OBD_MD_MDS)))
-		return 0;
 
-	rc = lmv_intent_remote(exp, lmm, lmmsize, it, NULL, flags, reqp,
-			       cb_blocking, extra_lock_flags);
+	/* Cross-ref case, follow the reference to the remote MDT. */
+	if (unlikely((body->valid & OBD_MD_MDS))) {
+		rc = lmv_intent_remote(exp, lmm, lmmsize, it, NULL, flags,
+				       reqp, cb_blocking, extra_lock_flags);
+		if (rc != 0)
+			return rc;
+		body = req_capsule_server_get(&(*reqp)->rq_pill, &RMF_MDT_BODY);
+		if (!body)
+			return -EPROTO;
+	}
 
 	return rc;
 }
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_internal.h b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
index f4c917b..ed02927 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_internal.h
+++ b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
@@ -55,6 +55,14 @@ int __lmv_fid_alloc(struct lmv_obd *lmv, struct lu_fid *fid, u32 mds);
 int lmv_fid_alloc(struct obd_export *exp, struct lu_fid *fid,
 		  struct md_op_data *op_data);
 
+int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
+		  const union lmv_mds_md *lmm, int stripe_count);
+
+int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
+			  struct lmv_stripe_md *lsm,
+			  ldlm_blocking_callback cb_blocking,
+			  int extra_lock_flags);
+
 static inline struct lmv_tgt_desc *
 lmv_get_target(struct lmv_obd *lmv, u32 mds)
 {
@@ -94,6 +102,30 @@ static inline int lmv_stripe_md_size(int stripe_count)
 	return sizeof(*lsm) + stripe_count * sizeof(lsm->lsm_md_oinfo[0]);
 }
 
+int lmv_name_to_stripe_index(enum lmv_hash_type hashtype,
+			     unsigned int max_mdt_index,
+			     const char *name, int namelen);
+
+static inline const struct lmv_oinfo *
+lsm_name_to_stripe_info(const struct lmv_stripe_md *lsm, const char *name,
+			int namelen)
+{
+	int stripe_index;
+
+	stripe_index = lmv_name_to_stripe_index(lsm->lsm_md_hash_type,
+						lsm->lsm_md_stripe_count,
+						name, namelen);
+	if (stripe_index < 0)
+		return ERR_PTR(stripe_index);
+
+	LASSERTF(stripe_index < lsm->lsm_md_stripe_count,
+		 "stripe_index = %d, stripe_count = %d hash_type = %x name = %.*s\n",
+		 stripe_index, lsm->lsm_md_stripe_count,
+		 lsm->lsm_md_hash_type, namelen, name);
+
+	return &lsm->lsm_md_oinfo[stripe_index];
+}
+
 struct lmv_tgt_desc
 *lmv_locate_mds(struct lmv_obd *lmv, struct md_op_data *op_data,
 		struct lu_fid *fid);
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 6be2afc..da4855d 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -48,11 +48,63 @@
 #include "../include/obd_class.h"
 #include "../include/lustre_lmv.h"
 #include "../include/lprocfs_status.h"
+#include "../include/cl_object.h"
 #include "../include/lustre_lite.h"
 #include "../include/lustre_fid.h"
 #include "../include/lustre_kernelcomm.h"
 #include "lmv_internal.h"
 
+/* This hash is only for testing purpose */
+static inline unsigned int
+lmv_hash_all_chars(unsigned int count, const char *name, int namelen)
+{
+	const unsigned char *p = (const unsigned char *)name;
+	unsigned int c = 0;
+
+	while (--namelen >= 0)
+		c += p[namelen];
+
+	c = c % count;
+
+	return c;
+}
+
+static inline unsigned int
+lmv_hash_fnv1a(unsigned int count, const char *name, int namelen)
+{
+	__u64 hash;
+
+	hash = lustre_hash_fnv_1a_64(name, namelen);
+
+	return do_div(hash, count);
+}
+
+int lmv_name_to_stripe_index(enum lmv_hash_type hashtype,
+			     unsigned int max_mdt_index,
+			     const char *name, int namelen)
+{
+	int idx;
+
+	LASSERT(namelen > 0);
+	if (max_mdt_index <= 1)
+		return 0;
+
+	switch (hashtype) {
+	case LMV_HASH_TYPE_ALL_CHARS:
+		idx = lmv_hash_all_chars(max_mdt_index, name, namelen);
+		break;
+	case LMV_HASH_TYPE_FNV_1A_64:
+		idx = lmv_hash_fnv1a(max_mdt_index, name, namelen);
+		break;
+	default:
+		CERROR("Unknown hash type 0x%x\n", hashtype);
+		return -EINVAL;
+	}
+
+	LASSERT(idx < max_mdt_index);
+	return idx;
+}
+
 static void lmv_activate_target(struct lmv_obd *lmv,
 				struct lmv_tgt_desc *tgt,
 				int activate)
@@ -1174,28 +1226,19 @@ static int lmv_placement_policy(struct obd_device *obd,
 	 * If stripe_offset is provided during setdirstripe
 	 * (setdirstripe -i xx), xx MDS will be chosen.
 	 */
-	if (op_data->op_cli_flags & CLI_SET_MEA) {
+	if (op_data->op_cli_flags & CLI_SET_MEA && op_data->op_data) {
 		struct lmv_user_md *lum;
 
-		lum = (struct lmv_user_md *)op_data->op_data;
-		if (lum->lum_type == LMV_STRIPE_TYPE &&
-		    lum->lum_stripe_offset != -1) {
-			if (lum->lum_stripe_offset >= lmv->desc.ld_tgt_count) {
-				CERROR("%s: Stripe_offset %d > MDT count %d: rc = %d\n",
-				       obd->obd_name,
-				       lum->lum_stripe_offset,
-				       lmv->desc.ld_tgt_count, -ERANGE);
-				return -ERANGE;
-			}
-			*mds = lum->lum_stripe_offset;
-			return 0;
-		}
+		lum = op_data->op_data;
+		*mds = lum->lum_stripe_offset;
+	} else {
+		/*
+		 * Allocate new fid on target according to operation type and
+		 * parent home mds.
+		 */
+		*mds = op_data->op_mds;
 	}
 
-	/* Allocate new fid on target according to operation type and parent
-	 * home mds.
-	 */
-	*mds = op_data->op_mds;
 	return 0;
 }
 
@@ -1597,17 +1640,38 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data,
 	return rc;
 }
 
+/**
+ * Choosing the MDT by name or FID in @op_data.
+ * For non-striped directory, it will locate MDT by fid.
+ * For striped-directory, it will locate MDT by name. And also
+ * it will reset op_fid1 with the FID of the chosen stripe.
+ **/
 struct lmv_tgt_desc
 *lmv_locate_mds(struct lmv_obd *lmv, struct md_op_data *op_data,
 		struct lu_fid *fid)
 {
+	struct lmv_stripe_md *lsm = op_data->op_mea1;
+	const struct lmv_oinfo *oinfo;
 	struct lmv_tgt_desc *tgt;
 
-	tgt = lmv_find_target(lmv, fid);
-	if (IS_ERR(tgt))
+	if (!lsm || lsm->lsm_md_stripe_count <= 1 ||
+	    !op_data->op_namelen) {
+		tgt = lmv_find_target(lmv, fid);
+		if (IS_ERR(tgt))
+			return tgt;
+
+		op_data->op_mds = tgt->ltd_idx;
+
 		return tgt;
+	}
 
-	op_data->op_mds = tgt->ltd_idx;
+	oinfo = lsm_name_to_stripe_info(lsm, op_data->op_name,
+					op_data->op_namelen);
+	*fid = oinfo->lmo_fid;
+	op_data->op_mds = oinfo->lmo_mds;
+	tgt = lmv_get_target(lmv, op_data->op_mds);
+
+	CDEBUG(D_INFO, "locate on mds %u\n", op_data->op_mds);
 
 	return tgt;
 }
@@ -1633,13 +1697,26 @@ static int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 	if (IS_ERR(tgt))
 		return PTR_ERR(tgt);
 
+	CDEBUG(D_INODE, "CREATE name '%.*s' on "DFID" -> mds #%x\n",
+	       op_data->op_namelen, op_data->op_name, PFID(&op_data->op_fid1),
+	       op_data->op_mds);
+
 	rc = lmv_fid_alloc(exp, &op_data->op_fid2, op_data);
 	if (rc)
 		return rc;
 
-	CDEBUG(D_INODE, "CREATE '%*s' on "DFID" -> mds #%x\n",
-	       op_data->op_namelen, op_data->op_name, PFID(&op_data->op_fid1),
-	       op_data->op_mds);
+	/*
+	 * Send the create request to the MDT where the object
+	 * will be located
+	 */
+	tgt = lmv_find_target(lmv, &op_data->op_fid2);
+	if (IS_ERR(tgt))
+		return PTR_ERR(tgt);
+
+	op_data->op_mds = tgt->ltd_idx;
+
+	CDEBUG(D_INODE, "CREATE obj "DFID" -> mds #%x\n",
+	       PFID(&op_data->op_fid1), op_data->op_mds);
 
 	op_data->op_flags |= MF_MDC_CANCEL_FID1;
 	rc = md_create(tgt->ltd_exp, op_data, data, datalen, mode, uid, gid,
@@ -1889,6 +1966,15 @@ static int lmv_link(struct obd_export *exp, struct md_op_data *op_data,
 	op_data->op_fsuid = from_kuid(&init_user_ns, current_fsuid());
 	op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid());
 	op_data->op_cap = cfs_curproc_cap_pack();
+	if (op_data->op_mea2) {
+		struct lmv_stripe_md *lsm = op_data->op_mea2;
+		const struct lmv_oinfo *oinfo;
+
+		oinfo = lsm_name_to_stripe_info(lsm, op_data->op_name,
+						op_data->op_namelen);
+		op_data->op_fid2 = oinfo->lmo_fid;
+	}
+
 	tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid2);
 	if (IS_ERR(tgt))
 		return PTR_ERR(tgt);
@@ -1914,14 +2000,15 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data,
 	struct obd_device       *obd = exp->exp_obd;
 	struct lmv_obd	  *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc     *src_tgt;
-	struct lmv_tgt_desc     *tgt_tgt;
 	int			rc;
 
 	LASSERT(oldlen != 0);
 
-	CDEBUG(D_INODE, "RENAME %*s in "DFID" to %*s in "DFID"\n",
+	CDEBUG(D_INODE, "RENAME %.*s in "DFID":%d to %.*s in "DFID":%d\n",
 	       oldlen, old, PFID(&op_data->op_fid1),
-	       newlen, new, PFID(&op_data->op_fid2));
+	       op_data->op_mea1 ? op_data->op_mea1->lsm_md_stripe_count : 0,
+	       newlen, new, PFID(&op_data->op_fid2),
+	       op_data->op_mea2 ? op_data->op_mea2->lsm_md_stripe_count : 0);
 
 	rc = lmv_check_connect(obd);
 	if (rc)
@@ -1930,13 +2017,33 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data,
 	op_data->op_fsuid = from_kuid(&init_user_ns, current_fsuid());
 	op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid());
 	op_data->op_cap = cfs_curproc_cap_pack();
-	src_tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
-	if (IS_ERR(src_tgt))
-		return PTR_ERR(src_tgt);
 
-	tgt_tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid2);
-	if (IS_ERR(tgt_tgt))
-		return PTR_ERR(tgt_tgt);
+	if (op_data->op_mea1) {
+		struct lmv_stripe_md *lsm = op_data->op_mea1;
+		const struct lmv_oinfo *oinfo;
+
+		oinfo = lsm_name_to_stripe_info(lsm, old, oldlen);
+		op_data->op_fid1 = oinfo->lmo_fid;
+		op_data->op_mds = oinfo->lmo_mds;
+		src_tgt = lmv_get_target(lmv, op_data->op_mds);
+		if (IS_ERR(src_tgt))
+			return PTR_ERR(src_tgt);
+	} else {
+		src_tgt = lmv_find_target(lmv, &op_data->op_fid1);
+		if (IS_ERR(src_tgt))
+			return PTR_ERR(src_tgt);
+
+		op_data->op_mds = src_tgt->ltd_idx;
+	}
+
+	if (op_data->op_mea2) {
+		struct lmv_stripe_md *lsm = op_data->op_mea2;
+		const struct lmv_oinfo *oinfo;
+
+		oinfo = lsm_name_to_stripe_info(lsm, new, newlen);
+		op_data->op_fid2 = oinfo->lmo_fid;
+	}
+
 	/*
 	 * LOOKUP lock on src child (fid3) should also be cancelled for
 	 * src_tgt in mdc_rename.
@@ -2568,6 +2675,7 @@ int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 	}
 	return lsm_size;
 }
+EXPORT_SYMBOL(lmv_unpack_md);
 
 int lmv_unpackmd(struct obd_export *exp, struct lov_stripe_md **lsmp,
 		 struct lov_mds_md *lmm, int disk_len)
@@ -2741,7 +2849,7 @@ static int lmv_intent_getattr_async(struct obd_export *exp,
 	if (rc)
 		return rc;
 
-	tgt = lmv_find_target(lmv, &op_data->op_fid1);
+	tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
 	if (IS_ERR(tgt))
 		return PTR_ERR(tgt);
 
@@ -2843,6 +2951,49 @@ static int lmv_quotacheck(struct obd_device *unused, struct obd_export *exp,
 	return rc;
 }
 
+int lmv_update_lsm_md(struct obd_export *exp, struct lmv_stripe_md *lsm,
+		      struct mdt_body *body, ldlm_blocking_callback cb_blocking)
+{
+	if (lsm->lsm_md_stripe_count <= 1)
+		return 0;
+
+	return lmv_revalidate_slaves(exp, body, lsm, cb_blocking, 0);
+}
+
+int lmv_merge_attr(struct obd_export *exp, const struct lmv_stripe_md *lsm,
+		   struct cl_attr *attr)
+{
+	int i;
+
+	for (i = 0; i < lsm->lsm_md_stripe_count; i++) {
+		struct inode *inode = lsm->lsm_md_oinfo[i].lmo_root;
+
+		CDEBUG(D_INFO, ""DFID" size %llu, nlink %u, atime %lu ctime %lu, mtime %lu.\n",
+		       PFID(&lsm->lsm_md_oinfo[i].lmo_fid),
+		       i_size_read(inode), inode->i_nlink,
+		       LTIME_S(inode->i_atime), LTIME_S(inode->i_ctime),
+		       LTIME_S(inode->i_mtime));
+
+		/* for slave stripe, it needs to subtract nlink for . and .. */
+		if (i)
+			attr->cat_nlink += inode->i_nlink - 2;
+		else
+			attr->cat_nlink = inode->i_nlink;
+
+		attr->cat_size += i_size_read(inode);
+
+		if (attr->cat_atime < LTIME_S(inode->i_atime))
+			attr->cat_atime = LTIME_S(inode->i_atime);
+
+		if (attr->cat_ctime < LTIME_S(inode->i_ctime))
+			attr->cat_ctime = LTIME_S(inode->i_ctime);
+
+		if (attr->cat_mtime < LTIME_S(inode->i_mtime))
+			attr->cat_mtime = LTIME_S(inode->i_mtime);
+	}
+	return 0;
+}
+
 static struct obd_ops lmv_obd_ops = {
 	.owner		= THIS_MODULE,
 	.setup		= lmv_setup,
@@ -2888,6 +3039,8 @@ static struct md_ops lmv_md_ops = {
 	.lock_match		= lmv_lock_match,
 	.get_lustre_md		= lmv_get_lustre_md,
 	.free_lustre_md		= lmv_free_lustre_md,
+	.update_lsm_md		= lmv_update_lsm_md,
+	.merge_attr		= lmv_merge_attr,
 	.set_open_replay_data	= lmv_set_open_replay_data,
 	.clear_open_replay_data	= lmv_clear_open_replay_data,
 	.intent_getattr_async	= lmv_intent_getattr_async,
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 06a1274..626fce5 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -325,6 +325,9 @@ static struct ptlrpc_request *mdc_intent_open_pack(struct obd_export *exp,
 	mdc_open_pack(req, op_data, it->it_create_mode, 0, it->it_flags, lmm,
 		      lmmsize);
 
+	req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER,
+			     obddev->u.cli.cl_max_mds_easize);
+
 	ptlrpc_request_set_replen(req);
 	return req;
 }
diff --git a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
index b514f18..07e23d1 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
@@ -1878,6 +1878,17 @@ void lustre_swab_lov_desc(struct lov_desc *ld)
 }
 EXPORT_SYMBOL(lustre_swab_lov_desc);
 
+void lustre_swab_lmv_user_md(struct lmv_user_md *lum)
+{
+	__swab32s(&lum->lum_magic);
+	__swab32s(&lum->lum_stripe_count);
+	__swab32s(&lum->lum_stripe_offset);
+	__swab32s(&lum->lum_hash_type);
+	__swab32s(&lum->lum_type);
+	CLASSERT(offsetof(typeof(*lum), lum_padding1));
+}
+EXPORT_SYMBOL(lustre_swab_lmv_user_md);
+
 static void print_lum(struct lov_user_md *lum)
 {
 	CDEBUG(D_OTHER, "lov_user_md %p:\n", lum);
-- 
1.7.1


* [lustre-devel] [PATCH 12/80] staging: lustre: create striped directory
@ 2016-08-16 20:18   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

1. The client sends the create request to the master MDT, which
   allocates FIDs for all of the slaves and creates them.

2. The client needs to revalidate the slaves during intent getattr
   and open requests.

3. lmv_stripe_md will include attributes (size, nlink, etc.)
   from all of the stripes, protected by the UPDATE lock.
   The client needs to merge these attributes when updating the inode.

4. Send the create request to the MDT where the file is located,
   which helps with creating the master stripe of a striped directory.
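
For illustration only, the attribute merge described in point 3 can be
sketched in user-space C as below. This is not the kernel code: struct
stripe_attr, max_s64() and merge_stripe_attrs() are hypothetical names
standing in for the per-stripe inodes and for lmv_merge_attr() in this
patch. The sketch assumes stripe 0 is the master and that every slave
has nlink >= 2, since each slave counts its own "." and ".." entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the attributes kept in each stripe inode. */
struct stripe_attr {
	uint64_t size;
	uint32_t nlink;
	int64_t  atime;
	int64_t  mtime;
	int64_t  ctime;
};

static int64_t max_s64(int64_t a, int64_t b)
{
	return a > b ? a : b;
}

/*
 * Merge the attributes of all stripes into *out.
 * stripes[0] is assumed to be the master stripe.
 */
static void merge_stripe_attrs(const struct stripe_attr *stripes,
			       size_t count, struct stripe_attr *out)
{
	size_t i;

	assert(count > 0);

	/* The master contributes its attributes unchanged. */
	*out = stripes[0];

	for (i = 1; i < count; i++) {
		/* A slave with nlink < 2 would be a corrupt stripe. */
		assert(stripes[i].nlink >= 2);

		out->size += stripes[i].size;
		/* Do not count "." and ".." once per slave. */
		out->nlink += stripes[i].nlink - 2;

		/* Timestamps are the newest value seen on any stripe. */
		out->atime = max_s64(out->atime, stripes[i].atime);
		out->mtime = max_s64(out->mtime, stripes[i].mtime);
		out->ctime = max_s64(out->ctime, stripes[i].ctime);
	}
}
```

Sizes and link counts are summed while timestamps take the maximum, so
the merged result looks like the attributes of a single directory no
matter how many MDTs hold its stripes.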

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3529
Reviewed-on: http://review.whamcloud.com/7196
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |    3 +
 .../lustre/lustre/include/lustre/lustre_idl.h      |   40 +++-
 .../lustre/lustre/include/lustre/lustre_user.h     |   16 +-
 drivers/staging/lustre/lustre/include/lustre_lib.h |    2 +
 drivers/staging/lustre/lustre/include/lustre_lmv.h |   59 +++++
 drivers/staging/lustre/lustre/include/obd.h        |   16 +-
 drivers/staging/lustre/lustre/include/obd_class.h  |   19 ++
 drivers/staging/lustre/lustre/llite/dir.c          |   26 ++-
 drivers/staging/lustre/lustre/llite/file.c         |   40 +++-
 .../staging/lustre/lustre/llite/llite_internal.h   |   12 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |  193 +++++++++++++++-
 drivers/staging/lustre/lustre/llite/llite_nfs.c    |    7 +-
 drivers/staging/lustre/lustre/llite/namei.c        |   42 +++-
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |  244 +++++++++++++++++---
 drivers/staging/lustre/lustre/lmv/lmv_internal.h   |   32 +++
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  221 +++++++++++++++---
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |    3 +
 .../staging/lustre/lustre/ptlrpc/pack_generic.c    |   11 +
 18 files changed, 880 insertions(+), 106 deletions(-)
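
Editorial note, not part of the patch: the lustre_idl.h hunk below adds a
64-bit FNV-1a hash (LMV_HASH_TYPE_FNV_1A_64). The following userspace sketch
repeats that hash with standard C types so it can be tried outside the
kernel. The helper name_to_stripe_index() is hypothetical: it assumes a
stripe is chosen as hash modulo stripe count, which this section of the
patch does not show (the real selection goes through
lsm_name_to_stripe_info(), whose body is not included here).

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_1A_64_PRIME		0x100000001b3ULL
#define FNV_1A_64_OFFSET_BASIS	0xcbf29ce484222325ULL

/* 64-bit FNV-1a: XOR each octet into the hash, then multiply by the prime. */
static uint64_t fnv_1a_64(const void *buf, size_t size)
{
	uint64_t hash = FNV_1A_64_OFFSET_BASIS;
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i < size; i++) {
		hash ^= p[i];
		hash *= FNV_1A_64_PRIME;
	}

	return hash;
}

/*
 * Hypothetical helper: map a name to a stripe index.
 * Assumption for illustration only: index = hash % stripe_count.
 */
static unsigned int name_to_stripe_index(const char *name, size_t namelen,
					 unsigned int stripe_count)
{
	if (stripe_count <= 1)
		return 0;

	return (unsigned int)(fnv_1a_64(name, namelen) % stripe_count);
}
```

Hashing zero bytes returns the offset basis unchanged, which is a quick
sanity check for any reimplementation.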

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 3cd4a25..0fa71a5 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -191,6 +191,9 @@ struct cl_attr {
 	 * Group identifier for quota purposes.
 	 */
 	gid_t  cat_gid;
+
+	/* nlink of the directory */
+	__u64  cat_nlink;
 };
 
 /**
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 0ad6605..a612080 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -1610,6 +1610,7 @@ static inline void lmm_oi_cpu_to_le(struct ost_id *dst_oi,
 #define XATTR_NAME_LOV	  "trusted.lov"
 #define XATTR_NAME_LMA	  "trusted.lma"
 #define XATTR_NAME_LMV	  "trusted.lmv"
+#define XATTR_NAME_DEFAULT_LMV	"trusted.dmv"
 #define XATTR_NAME_LINK	 "trusted.link"
 #define XATTR_NAME_FID	  "trusted.fid"
 #define XATTR_NAME_VERSION      "trusted.version"
@@ -2472,7 +2473,7 @@ struct lmv_desc {
 	__u32 ld_tgt_count;		/* how many MDS's */
 	__u32 ld_active_tgt_count;	 /* how many active */
 	__u32 ld_default_stripe_count;     /* how many objects are used */
-	__u32 ld_pattern;		  /* default MEA_MAGIC_* */
+	__u32 ld_pattern;		  /* default hash pattern */
 	__u64 ld_default_hash_size;
 	__u64 ld_padding_1;		/* also fix lustre_swab_lmv_desc */
 	__u32 ld_padding_2;		/* also fix lustre_swab_lmv_desc */
@@ -2486,6 +2487,43 @@ struct lmv_desc {
 #define LMV_MAGIC_V1	0x0CD10CD0	/* normal stripe lmv magic */
 #define LMV_USER_MAGIC	0x0CD20CD0	/* default lmv magic*/
 #define LMV_MAGIC	LMV_MAGIC_V1
+
+enum lmv_hash_type {
+	LMV_HASH_TYPE_ALL_CHARS = 1,
+	LMV_HASH_TYPE_FNV_1A_64 = 2,
+};
+
+#define LMV_HASH_NAME_ALL_CHARS		"all_char"
+#define LMV_HASH_NAME_FNV_1A_64		"fnv_1a_64"
+
+/**
+ * The FNV-1a hash algorithm is as follows:
+ *     hash = FNV_offset_basis
+ *     for each octet_of_data to be hashed
+ *             hash = hash XOR octet_of_data
+ *             hash = hash × FNV_prime
+ *     return hash
+ * http://en.wikipedia.org/wiki/Fowler–Noll–Vo_hash_function#FNV-1a_hash
+ *
+ * http://www.isthe.com/chongo/tech/comp/fnv/index.html#FNV-reference-source
+ * FNV_prime is 2^40 + 2^8 + 0xb3 = 0x100000001b3ULL
+ **/
+#define LUSTRE_FNV_1A_64_PRIME		0x100000001b3ULL
+#define LUSTRE_FNV_1A_64_OFFSET_BIAS	0xcbf29ce484222325ULL
+static inline __u64 lustre_hash_fnv_1a_64(const void *buf, size_t size)
+{
+	__u64 hash = LUSTRE_FNV_1A_64_OFFSET_BIAS;
+	const unsigned char *p = buf;
+	size_t i;
+
+	for (i = 0; i < size; i++) {
+		hash ^= p[i];
+		hash *= LUSTRE_FNV_1A_64_PRIME;
+	}
+
+	return hash;
+}
+
 struct lmv_mds_md_v1 {
 	__u32 lmv_magic;
 	__u32 lmv_stripe_count;		/* stripe count */
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index ef6f38f..d496d0e 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -374,19 +374,17 @@ struct lov_user_mds_data_v3 {
 } __packed;
 #endif
 
-/* keep this to be the same size as lov_user_ost_data_v1 */
 struct lmv_user_mds_data {
 	struct lu_fid	lum_fid;
 	__u32		lum_padding;
 	__u32		lum_mds;
 };
 
-/* lum_type */
-enum {
-	LMV_STRIPE_TYPE = 0,
-	LMV_DEFAULT_TYPE = 1,
-};
-
+/*
+ * Derived the same way as LOV_MAX_STRIPE_COUNT (see above):
+ * (max buffer size - lmv+rpc header) / sizeof(struct lmv_user_mds_data)
+ */
+#define LMV_MAX_STRIPE_COUNT 2000  /* ((12 * 4096 - 256) / 24) */
 #define lmv_user_md lmv_user_md_v1
 struct lmv_user_md_v1 {
 	__u32	lum_magic;	 /* must be the first field */
@@ -399,7 +397,7 @@ struct lmv_user_md_v1 {
 	__u32	lum_padding3;
 	char	lum_pool_name[LOV_MAXPOOLNAME];
 	struct	lmv_user_mds_data  lum_objects[0];
-};
+} __packed;
 
 static inline int lmv_user_md_size(int stripes, int lmm_magic)
 {
@@ -407,6 +405,8 @@ static inline int lmv_user_md_size(int stripes, int lmm_magic)
 		      stripes * sizeof(struct lmv_user_mds_data);
 }
 
+void lustre_swab_lmv_user_md(struct lmv_user_md *lum);
+
 struct ll_recreate_obj {
 	__u64 lrc_id;
 	__u32 lrc_ost_idx;
diff --git a/drivers/staging/lustre/lustre/include/lustre_lib.h b/drivers/staging/lustre/lustre/include/lustre_lib.h
index 06958f2..def0193 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lib.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lib.h
@@ -391,6 +391,8 @@ static inline void obd_ioctl_freedata(char *buf, int len)
 #define LOVEA_DELETE_VALUES(size, count, offset) (size == 0 && count == 0 && \
 						 offset == (typeof(offset))(-1))
 
+#define LMVEA_DELETE_VALUES(count, offset) ((count) == 0 && \
+					    (offset) == (typeof(offset))(-1))
 /* #define POISON_BULK 0 */
 
 /*
diff --git a/drivers/staging/lustre/lustre/include/lustre_lmv.h b/drivers/staging/lustre/lustre/include/lustre_lmv.h
index 784d67b..4036fce 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lmv.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lmv.h
@@ -66,4 +66,63 @@ static inline void lmv_free_memmd(struct lmv_stripe_md *lsm)
 {
 	lmv_unpack_md(NULL, &lsm, NULL, 0);
 }
+
+static inline void lmv1_cpu_to_le(struct lmv_mds_md_v1 *lmv_dst,
+				  const struct lmv_mds_md_v1 *lmv_src)
+{
+	int i;
+
+	lmv_dst->lmv_magic = cpu_to_le32(lmv_src->lmv_magic);
+	lmv_dst->lmv_stripe_count = cpu_to_le32(lmv_src->lmv_stripe_count);
+	lmv_dst->lmv_master_mdt_index =
+		cpu_to_le32(lmv_src->lmv_master_mdt_index);
+	lmv_dst->lmv_hash_type = cpu_to_le32(lmv_src->lmv_hash_type);
+	lmv_dst->lmv_layout_version = cpu_to_le32(lmv_src->lmv_layout_version);
+
+	for (i = 0; i < lmv_src->lmv_stripe_count; i++)
+		fid_cpu_to_le(&lmv_dst->lmv_stripe_fids[i],
+			      &lmv_src->lmv_stripe_fids[i]);
+}
+
+static inline void lmv1_le_to_cpu(struct lmv_mds_md_v1 *lmv_dst,
+				  const struct lmv_mds_md_v1 *lmv_src)
+{
+	int i;
+
+	lmv_dst->lmv_magic = le32_to_cpu(lmv_src->lmv_magic);
+	lmv_dst->lmv_stripe_count = le32_to_cpu(lmv_src->lmv_stripe_count);
+	lmv_dst->lmv_master_mdt_index =
+		le32_to_cpu(lmv_src->lmv_master_mdt_index);
+	lmv_dst->lmv_hash_type = le32_to_cpu(lmv_src->lmv_hash_type);
+	lmv_dst->lmv_layout_version = le32_to_cpu(lmv_src->lmv_layout_version);
+
+	for (i = 0; i < lmv_src->lmv_stripe_count; i++)
+		fid_le_to_cpu(&lmv_dst->lmv_stripe_fids[i],
+			      &lmv_src->lmv_stripe_fids[i]);
+}
+
+static inline void lmv_cpu_to_le(union lmv_mds_md *lmv_dst,
+				 const union lmv_mds_md *lmv_src)
+{
+	switch (lmv_src->lmv_magic) {
+	case LMV_MAGIC_V1:
+		lmv1_cpu_to_le(&lmv_dst->lmv_md_v1, &lmv_src->lmv_md_v1);
+		break;
+	default:
+		break;
+	}
+}
+
+static inline void lmv_le_to_cpu(union lmv_mds_md *lmv_dst,
+				 const union lmv_mds_md *lmv_src)
+{
+	switch (le32_to_cpu(lmv_src->lmv_magic)) {
+	case LMV_MAGIC_V1:
+		lmv1_le_to_cpu(&lmv_dst->lmv_md_v1, &lmv_src->lmv_md_v1);
+		break;
+	default:
+		break;
+	}
+}
+
 #endif
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index 17b8d22..a9f4e13 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -1022,14 +1022,6 @@ enum {
 };
 
 /* lmv structures */
-#define MEA_MAGIC_LAST_CHAR      0xb2221ca1
-#define MEA_MAGIC_ALL_CHARS      0xb222a11c
-#define MEA_MAGIC_HASH_SEGMENT   0xb222a11b
-
-#define MAX_HASH_SIZE_32	 0x7fffffffUL
-#define MAX_HASH_SIZE	    0x7fffffffffffffffULL
-#define MAX_HASH_HIGHEST_BIT     0x1000000000000000ULL
-
 struct lustre_md {
 	struct mdt_body	 *body;
 	struct lov_stripe_md    *lsm;
@@ -1049,6 +1041,7 @@ struct md_open_data {
 };
 
 struct lookup_intent;
+struct cl_attr;
 
 struct md_ops {
 	int (*getstatus)(struct obd_export *, struct lu_fid *);
@@ -1109,6 +1102,13 @@ struct md_ops {
 
 	int (*free_lustre_md)(struct obd_export *, struct lustre_md *);
 
+	int (*merge_attr)(struct obd_export *,
+			  const struct lmv_stripe_md *lsm,
+			  struct cl_attr *attr);
+
+	int (*update_lsm_md)(struct obd_export *, struct lmv_stripe_md *lsm,
+			     struct mdt_body *, ldlm_blocking_callback);
+
 	int (*set_open_replay_data)(struct obd_export *,
 				    struct obd_client_handle *,
 				    struct lookup_intent *);
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index 6482a93..2f111a8 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -1559,6 +1559,25 @@ static inline int md_free_lustre_md(struct obd_export *exp,
 	return MDP(exp->exp_obd, free_lustre_md)(exp, md);
 }
 
+static inline int md_update_lsm_md(struct obd_export *exp,
+				   struct lmv_stripe_md *lsm,
+				   struct mdt_body *body,
+				   ldlm_blocking_callback cb)
+{
+	EXP_CHECK_MD_OP(exp, update_lsm_md);
+	EXP_MD_COUNTER_INCREMENT(exp, update_lsm_md);
+	return MDP(exp->exp_obd, update_lsm_md)(exp, lsm, body, cb);
+}
+
+static inline int md_merge_attr(struct obd_export *exp,
+				const struct lmv_stripe_md *lsm,
+				struct cl_attr *attr)
+{
+	EXP_CHECK_MD_OP(exp, merge_attr);
+	EXP_MD_COUNTER_INCREMENT(exp, merge_attr);
+	return MDP(exp->exp_obd, merge_attr)(exp, lsm, attr);
+}
+
 static inline int md_setxattr(struct obd_export *exp, const struct lu_fid *fid,
 			      u64 valid, const char *name,
 			      const char *input, int input_size,
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index a72b486..a0560b6 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -668,7 +668,7 @@ static int ll_send_mgc_param(struct obd_export *mgc, char *string)
 }
 
 static int ll_dir_setdirstripe(struct inode *dir, struct lmv_user_md *lump,
-			       char *filename)
+			       const char *filename)
 {
 	struct ptlrpc_request *request = NULL;
 	struct md_op_data *op_data;
@@ -676,6 +676,26 @@ static int ll_dir_setdirstripe(struct inode *dir, struct lmv_user_md *lump,
 	int mode;
 	int err;
 
+	if (unlikely(lump->lum_magic != LMV_USER_MAGIC))
+		return -EINVAL;
+
+	if (lump->lum_stripe_offset == (__u32)-1) {
+		int mdtidx;
+
+		mdtidx = ll_get_mdt_idx(dir);
+		if (mdtidx < 0)
+			return mdtidx;
+
+		lump->lum_stripe_offset = mdtidx;
+	}
+
+	CDEBUG(D_VFSTRACE, "VFS Op:inode="DFID"(%p) name %s stripe_offset %d, stripe_count: %u\n",
+	       PFID(ll_inode2fid(dir)), dir, filename,
+	       (int)lump->lum_stripe_offset, lump->lum_stripe_count);
+
+	if (lump->lum_magic != cpu_to_le32(LMV_USER_MAGIC))
+		lustre_swab_lmv_user_md(lump);
+
 	mode = (~current_umask() & 0755) | S_IFDIR;
 	op_data = ll_prep_md_op_data(NULL, dir, NULL, filename,
 				     strlen(filename), mode, LUSTRE_OPC_MKDIR,
@@ -745,9 +765,6 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump,
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
-	if (lump && lump->lmm_magic == cpu_to_le32(LMV_USER_MAGIC))
-		op_data->op_cli_flags |= CLI_SET_MEA;
-
 	/* swabbing is done in lov_setstripe() on server side */
 	rc = md_setattr(sbi->ll_md_exp, op_data, lump, lum_size,
 			NULL, 0, &req, NULL);
@@ -1424,7 +1441,6 @@ lmv_out_free:
 		}
 
 		*tmp = lum;
-		tmp->lum_type = LMV_STRIPE_TYPE;
 		tmp->lum_stripe_count = 1;
 		mdtindex = ll_get_mdt_idx(inode);
 		if (mdtindex < 0) {
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 58a7401..18fb713 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -3015,6 +3015,27 @@ out:
 	return rc;
 }
 
+static int ll_merge_md_attr(struct inode *inode)
+{
+	struct cl_attr attr = { 0 };
+	int rc;
+
+	LASSERT(ll_i2info(inode)->lli_lsm_md);
+	rc = md_merge_attr(ll_i2mdexp(inode), ll_i2info(inode)->lli_lsm_md,
+			   &attr);
+	if (rc)
+		return rc;
+
+	ll_i2info(inode)->lli_stripe_dir_size = attr.cat_size;
+	ll_i2info(inode)->lli_stripe_dir_nlink = attr.cat_nlink;
+
+	ll_i2info(inode)->lli_atime = attr.cat_atime;
+	ll_i2info(inode)->lli_mtime = attr.cat_mtime;
+	ll_i2info(inode)->lli_ctime = attr.cat_ctime;
+
+	return 0;
+}
+
 static int ll_inode_revalidate(struct dentry *dentry, __u64 ibits)
 {
 	struct inode *inode = d_inode(dentry);
@@ -3026,6 +3047,13 @@ static int ll_inode_revalidate(struct dentry *dentry, __u64 ibits)
 
 	/* if object isn't regular file, don't validate size */
 	if (!S_ISREG(inode->i_mode)) {
+		if (S_ISDIR(inode->i_mode) &&
+		    ll_i2info(inode)->lli_lsm_md) {
+			rc = ll_merge_md_attr(inode);
+			if (rc)
+				return rc;
+		}
+
 		LTIME_S(inode->i_atime) = ll_i2info(inode)->lli_atime;
 		LTIME_S(inode->i_mtime) = ll_i2info(inode)->lli_mtime;
 		LTIME_S(inode->i_ctime) = ll_i2info(inode)->lli_ctime;
@@ -3063,7 +3091,6 @@ int ll_getattr(struct vfsmount *mnt, struct dentry *de, struct kstat *stat)
 	else
 		stat->ino = inode->i_ino;
 	stat->mode = inode->i_mode;
-	stat->nlink = inode->i_nlink;
 	stat->uid = inode->i_uid;
 	stat->gid = inode->i_gid;
 	stat->rdev = inode->i_rdev;
@@ -3071,10 +3098,17 @@ int ll_getattr(struct vfsmount *mnt, struct dentry *de, struct kstat *stat)
 	stat->mtime = inode->i_mtime;
 	stat->ctime = inode->i_ctime;
 	stat->blksize = 1 << inode->i_blkbits;
-
-	stat->size = i_size_read(inode);
 	stat->blocks = inode->i_blocks;
 
+	if (S_ISDIR(inode->i_mode) &&
+	    ll_i2info(inode)->lli_lsm_md) {
+		stat->nlink = lli->lli_stripe_dir_nlink;
+		stat->size = lli->lli_stripe_dir_size;
+	} else {
+		stat->nlink = inode->i_nlink;
+		stat->size = i_size_read(inode);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 07b6918..f3b8504 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -39,6 +39,7 @@
 
 /* for struct cl_lock_descr and struct cl_io */
 #include "../include/cl_object.h"
+#include "../include/lustre_lmv.h"
 #include "../include/lustre_mdc.h"
 #include "../include/lustre_intent.h"
 #include <linux/compat.h>
@@ -174,7 +175,11 @@ struct ll_inode_info {
 			 */
 			pid_t			   d_opendir_pid;
 			/* directory stripe information */
-			struct lmv_stripe_md		*d_lmv_md;
+			struct lmv_stripe_md		*d_lsm_md;
+			/* striped directory size */
+			loff_t				d_stripe_size;
+			/* striped directory nlink */
+			__u64				d_stripe_nlink;
 		} d;
 
 #define lli_readdir_mutex       u.d.d_readdir_mutex
@@ -182,7 +187,9 @@ struct ll_inode_info {
 #define lli_sai		 u.d.d_sai
 #define lli_sa_lock	     u.d.d_sa_lock
 #define lli_opendir_pid	 u.d.d_opendir_pid
-#define lli_lmv_md		u.d.d_lmv_md
+#define lli_lsm_md		u.d.d_lsm_md
+#define lli_stripe_dir_size	u.d.d_stripe_size
+#define lli_stripe_dir_nlink	u.d.d_stripe_nlink
 
 		/* for non-directory */
 		struct {
@@ -664,6 +671,7 @@ int ll_objects_destroy(struct ptlrpc_request *request,
 		       struct inode *dir);
 struct inode *ll_iget(struct super_block *sb, ino_t hash,
 		      struct lustre_md *lic);
+int ll_test_inode_by_fid(struct inode *inode, void *opaque);
 int ll_md_blocking_ast(struct ldlm_lock *, struct ldlm_lock_desc *,
 		       void *data, int flag);
 struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de);
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index eb715be..ef8d87a 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -992,6 +992,188 @@ struct inode *ll_inode_from_resource_lock(struct ldlm_lock *lock)
 	return inode;
 }
 
+static void ll_dir_clear_lsm_md(struct inode *inode)
+{
+	struct ll_inode_info *lli = ll_i2info(inode);
+
+	LASSERT(S_ISDIR(inode->i_mode));
+
+	if (lli->lli_lsm_md) {
+		lmv_free_memmd(lli->lli_lsm_md);
+		lli->lli_lsm_md = NULL;
+	}
+}
+
+static struct inode *ll_iget_anon_dir(struct super_block *sb,
+				      const struct lu_fid *fid,
+				      struct lustre_md *md)
+{
+	struct ll_sb_info *sbi = ll_s2sbi(sb);
+	struct mdt_body *body = md->body;
+	struct inode *inode;
+	ino_t ino;
+
+	ino = cl_fid_build_ino(fid, sbi->ll_flags & LL_SBI_32BIT_API);
+	inode = iget_locked(sb, ino);
+	if (!inode) {
+		CERROR("%s: failed get simple inode "DFID": rc = -ENOENT\n",
+		       ll_get_fsname(sb, NULL, 0), PFID(fid));
+		return ERR_PTR(-ENOENT);
+	}
+
+	if (inode->i_state & I_NEW) {
+		struct ll_inode_info *lli = ll_i2info(inode);
+		struct lmv_stripe_md *lsm = md->lmv;
+
+		inode->i_mode = (inode->i_mode & ~S_IFMT) |
+				(body->mode & S_IFMT);
+		LASSERTF(S_ISDIR(inode->i_mode), "Not slave inode "DFID"\n",
+			 PFID(fid));
+
+		LTIME_S(inode->i_mtime) = 0;
+		LTIME_S(inode->i_atime) = 0;
+		LTIME_S(inode->i_ctime) = 0;
+		inode->i_rdev = 0;
+
+		inode->i_op = &ll_dir_inode_operations;
+		inode->i_fop = &ll_dir_operations;
+		lli->lli_fid = *fid;
+		ll_lli_init(lli);
+
+		LASSERT(lsm);
+		/* master stripe FID */
+		lli->lli_pfid = lsm->lsm_md_oinfo[0].lmo_fid;
+		CDEBUG(D_INODE, "lli %p master "DFID" slave "DFID"\n",
+		       lli, PFID(fid), PFID(&lli->lli_pfid));
+		unlock_new_inode(inode);
+	}
+
+	return inode;
+}
+
+static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md)
+{
+	struct lmv_stripe_md *lsm = md->lmv;
+	struct lu_fid *fid;
+	int i;
+
+	LASSERT(lsm);
+	/*
+	 * XXX sigh, this lsm_root initialization should be in
+	 * LMV layer, but it needs ll_iget right now, so we
+	 * put this here right now.
+	 */
+	for (i = 0; i < lsm->lsm_md_stripe_count; i++) {
+		fid = &lsm->lsm_md_oinfo[i].lmo_fid;
+		LASSERT(!lsm->lsm_md_oinfo[i].lmo_root);
+		if (!i) {
+			lsm->lsm_md_oinfo[i].lmo_root = inode;
+		} else {
+			/*
+			 * Unfortunately ll_iget will call ll_update_inode,
+			 * where the initialization of slave inode is slightly
+			 * different, so it reset lsm_md to NULL to avoid
+			 * initializing lsm for slave inode.
+			 */
+			lsm->lsm_md_oinfo[i].lmo_root =
+				ll_iget_anon_dir(inode->i_sb, fid, md);
+			if (IS_ERR(lsm->lsm_md_oinfo[i].lmo_root)) {
+				int rc = PTR_ERR(lsm->lsm_md_oinfo[i].lmo_root);
+
+				lsm->lsm_md_oinfo[i].lmo_root = NULL;
+				return rc;
+			}
+		}
+	}
+
+	/*
+	 * This is where the lsm is initialized (lmo_info is filled in) after
+	 * the client retrieves the MD stripe information from the MDT.
+	 */
+	return md_update_lsm_md(ll_i2mdexp(inode), lsm, md->body,
+				ll_md_blocking_ast);
+}
+
+static inline int lli_lsm_md_eq(const struct lmv_stripe_md *lsm_md1,
+				const struct lmv_stripe_md *lsm_md2)
+{
+	return lsm_md1->lsm_md_magic == lsm_md2->lsm_md_magic &&
+	       lsm_md1->lsm_md_stripe_count == lsm_md2->lsm_md_stripe_count &&
+	       lsm_md1->lsm_md_master_mdt_index ==
+			lsm_md2->lsm_md_master_mdt_index &&
+	       lsm_md1->lsm_md_hash_type == lsm_md2->lsm_md_hash_type &&
+	       lsm_md1->lsm_md_layout_version ==
+			lsm_md2->lsm_md_layout_version &&
+	       !strcmp(lsm_md1->lsm_md_pool_name,
+		       lsm_md2->lsm_md_pool_name);
+}
+
+static void ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
+{
+	struct ll_inode_info *lli = ll_i2info(inode);
+	struct lmv_stripe_md *lsm = md->lmv;
+	int idx;
+
+	LASSERT(lsm);
+	LASSERT(S_ISDIR(inode->i_mode));
+	if (!lli->lli_lsm_md) {
+		int rc;
+
+		rc = ll_init_lsm_md(inode, md);
+		if (rc) {
+			CERROR("%s: init "DFID" failed: rc = %d\n",
+			       ll_get_fsname(inode->i_sb, NULL, 0),
+			       PFID(&lli->lli_fid), rc);
+			return;
+		}
+		lli->lli_lsm_md = lsm;
+		/*
+		 * set lsm_md to NULL, so the following free lustre_md
+		 * will not free this lsm
+		 */
+		md->lmv = NULL;
+		return;
+	}
+
+	/* Compare the old and new stripe information */
+	if (!lli_lsm_md_eq(lli->lli_lsm_md, lsm)) {
+		CERROR("inode %p %lu mismatch\n"
+		       "    new(%p)     vs     lli_lsm_md(%p):\n"
+		       "    magic:      %x                   %x\n"
+		       "    count:      %x                   %x\n"
+		       "    master:     %x                   %x\n"
+		       "    hash_type:  %x                   %x\n"
+		       "    layout:     %x                   %x\n"
+		       "    pool:       %s                   %s\n",
+		       inode, inode->i_ino, lsm, lli->lli_lsm_md,
+		       lsm->lsm_md_magic, lli->lli_lsm_md->lsm_md_magic,
+		       lsm->lsm_md_stripe_count,
+		       lli->lli_lsm_md->lsm_md_stripe_count,
+		       lsm->lsm_md_master_mdt_index,
+		       lli->lli_lsm_md->lsm_md_master_mdt_index,
+		       lsm->lsm_md_hash_type, lli->lli_lsm_md->lsm_md_hash_type,
+		       lsm->lsm_md_layout_version,
+		       lli->lli_lsm_md->lsm_md_layout_version,
+		       lsm->lsm_md_pool_name,
+		       lli->lli_lsm_md->lsm_md_pool_name);
+		return;
+	}
+
+	for (idx = 0; idx < lli->lli_lsm_md->lsm_md_stripe_count; idx++) {
+		if (!lu_fid_eq(&lli->lli_lsm_md->lsm_md_oinfo[idx].lmo_fid,
+			       &lsm->lsm_md_oinfo[idx].lmo_fid)) {
+			CERROR("%s: FID in lsm mismatch idx %d, old: "DFID" new:"DFID"\n",
+			       ll_get_fsname(inode->i_sb, NULL, 0), idx,
+			       PFID(&lli->lli_lsm_md->lsm_md_oinfo[idx].lmo_fid),
+			       PFID(&lsm->lsm_md_oinfo[idx].lmo_fid));
+			return;
+		}
+	}
+
+	md_update_lsm_md(ll_i2mdexp(inode), ll_i2info(inode)->lli_lsm_md,
+			 md->body, ll_md_blocking_ast);
+}
+
 void ll_clear_inode(struct inode *inode)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
@@ -1039,7 +1221,9 @@ void ll_clear_inode(struct inode *inode)
 #endif
 	lli->lli_inode_magic = LLI_INODE_DEAD;
 
-	if (!S_ISDIR(inode->i_mode))
+	if (S_ISDIR(inode->i_mode))
+		ll_dir_clear_lsm_md(inode);
+	else
 		LASSERT(list_empty(&lli->lli_agl_list));
 
 	/*
@@ -1484,6 +1668,9 @@ void ll_update_inode(struct inode *inode, struct lustre_md *md)
 			lli->lli_maxbytes = MAX_LFS_FILESIZE;
 	}
 
+	if (S_ISDIR(inode->i_mode) && md->lmv)
+		ll_update_lsm_md(inode, md);
+
 #ifdef CONFIG_FS_POSIX_ACL
 	if (body->valid & OBD_MD_FLACL) {
 		spin_lock(&lli->lli_lock);
@@ -2091,12 +2278,12 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 	ll_i2gids(op_data->op_suppgids, i1, i2);
 	op_data->op_fid1 = *ll_inode2fid(i1);
 	if (S_ISDIR(i1->i_mode))
-		op_data->op_mea1 = ll_i2info(i1)->lli_lmv_md;
+		op_data->op_mea1 = ll_i2info(i1)->lli_lsm_md;
 
 	if (i2) {
 		op_data->op_fid2 = *ll_inode2fid(i2);
 		if (S_ISDIR(i2->i_mode))
-			op_data->op_mea2 = ll_i2info(i2)->lli_lmv_md;
+			op_data->op_mea2 = ll_i2info(i2)->lli_lsm_md;
 	} else {
 		fid_zero(&op_data->op_fid2);
 	}
diff --git a/drivers/staging/lustre/lustre/llite/llite_nfs.c b/drivers/staging/lustre/lustre/llite/llite_nfs.c
index 74eb1fc..ab9d5cc 100644
--- a/drivers/staging/lustre/lustre/llite/llite_nfs.c
+++ b/drivers/staging/lustre/lustre/llite/llite_nfs.c
@@ -73,11 +73,6 @@ void get_uuid2fsid(const char *name, int len, __kernel_fsid_t *fsid)
 	fsid->val[1] = key >> 32;
 }
 
-static int ll_nfs_test_inode(struct inode *inode, void *opaque)
-{
-	return lu_fid_eq(&ll_i2info(inode)->lli_fid, opaque);
-}
-
 struct inode *search_inode_for_lustre(struct super_block *sb,
 				      const struct lu_fid *fid)
 {
@@ -92,7 +87,7 @@ struct inode *search_inode_for_lustre(struct super_block *sb,
 
 	CDEBUG(D_INFO, "searching inode for:(%lu,"DFID")\n", hash, PFID(fid));
 
-	inode = ilookup5(sb, hash, ll_nfs_test_inode, (void *)fid);
+	inode = ilookup5(sb, hash, ll_test_inode_by_fid, (void *)fid);
 	if (inode)
 		return inode;
 
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index 1e75f5b..e32d08b 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -158,6 +158,11 @@ static void ll_invalidate_negative_children(struct inode *dir)
 	spin_unlock(&dir->i_lock);
 }
 
+int ll_test_inode_by_fid(struct inode *inode, void *opaque)
+{
+	return lu_fid_eq(&ll_i2info(inode)->lli_fid, opaque);
+}
+
 int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc,
 		       void *data, int flag)
 {
@@ -253,10 +258,41 @@ int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc,
 		}
 
 		if ((bits & MDS_INODELOCK_UPDATE) && S_ISDIR(inode->i_mode)) {
-			CDEBUG(D_INODE, "invalidating inode "DFID"\n",
-			       PFID(ll_inode2fid(inode)));
+			struct ll_inode_info *lli = ll_i2info(inode);
+
+			CDEBUG(D_INODE, "invalidating inode "DFID" lli = %p, pfid  = "DFID"\n",
+			       PFID(ll_inode2fid(inode)), lli,
+			       PFID(&lli->lli_pfid));
+
 			truncate_inode_pages(inode->i_mapping, 0);
-			ll_invalidate_negative_children(inode);
+
+			if (unlikely(!fid_is_zero(&lli->lli_pfid))) {
+				struct inode *master_inode = NULL;
+				unsigned long hash;
+
+				/*
+				 * This is a slave inode. All of the child
+				 * dentries are connected to the master inode,
+				 * so we have to invalidate the negative
+				 * children on the master inode.
+				 */
+				CDEBUG(D_INODE, "Invalidate s"DFID" m"DFID"\n",
+				       PFID(ll_inode2fid(inode)),
+				       PFID(&lli->lli_pfid));
+
+				hash = cl_fid_build_ino(&lli->lli_pfid,
+							ll_need_32bit_api(ll_i2sbi(inode)));
+
+				master_inode = ilookup5(inode->i_sb, hash,
+							ll_test_inode_by_fid,
+							(void *)&lli->lli_pfid);
+				if (master_inode && !IS_ERR(master_inode)) {
+					ll_invalidate_negative_children(master_inode);
+					iput(master_inode);
+				}
+			} else {
+				ll_invalidate_negative_children(inode);
+			}
 		}
 
 		if ((bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM)) &&
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index 2f58fda..1b9bbb2 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -150,6 +150,160 @@ out:
 	return rc;
 }
 
+int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
+			  struct lmv_stripe_md *lsm,
+			  ldlm_blocking_callback cb_blocking,
+			  int extra_lock_flags)
+{
+	struct obd_device *obd = exp->exp_obd;
+	struct lmv_obd *lmv = &obd->u.lmv;
+	struct mdt_body *body;
+	struct md_op_data *op_data;
+	unsigned long size = 0;
+	unsigned long nlink = 0;
+	__s64 atime = 0;
+	__s64 ctime = 0;
+	__s64 mtime = 0;
+	int rc = 0, i;
+
+	/**
+	 * revalidate slaves has some problems, temporarily return,
+	 * we may not need that
+	 */
+	if (lsm->lsm_md_stripe_count <= 1)
+		return 0;
+
+	op_data = kzalloc(sizeof(*op_data), GFP_NOFS);
+	if (!op_data)
+		return -ENOMEM;
+
+	/**
+	 * Loop over the stripe information, check validity and update them
+	 * from MDS if needed.
+	 */
+	for (i = 0; i < lsm->lsm_md_stripe_count; i++) {
+		struct lookup_intent it = { .it_op = IT_GETATTR };
+		struct ptlrpc_request *req = NULL;
+		struct lustre_handle *lockh = NULL;
+		struct lmv_tgt_desc *tgt = NULL;
+		struct inode *inode;
+		struct lu_fid fid;
+
+		fid = lsm->lsm_md_oinfo[i].lmo_fid;
+		inode = lsm->lsm_md_oinfo[i].lmo_root;
+		if (!i) {
+			if (mbody) {
+				body = mbody;
+				goto update;
+			} else {
+				goto release_lock;
+			}
+		}
+
+		/*
+		 * Prepare op_data for revalidating. Note that @fid2 should be
+		 * defined otherwise it will go to server and take new lock
+		 * which is not needed here.
+		 */
+		memset(op_data, 0, sizeof(*op_data));
+		op_data->op_fid1 = fid;
+		op_data->op_fid2 = fid;
+
+		tgt = lmv_locate_mds(lmv, op_data, &fid);
+		if (IS_ERR(tgt)) {
+			rc = PTR_ERR(tgt);
+			goto cleanup;
+		}
+
+		CDEBUG(D_INODE, "Revalidate slave "DFID" -> mds #%d\n",
+		       PFID(&fid), tgt->ltd_idx);
+
+		rc = md_intent_lock(tgt->ltd_exp, op_data, NULL, 0, &it, 0,
+				    &req, cb_blocking, extra_lock_flags);
+		if (rc < 0)
+			goto cleanup;
+
+		lockh = (struct lustre_handle *)&it.it_lock_handle;
+		if (rc > 0 && !req) {
+			/* slave inode is still valid */
+			CDEBUG(D_INODE, "slave "DFID" is still valid.\n",
+			       PFID(&fid));
+			rc = 0;
+		} else {
+			/* refresh slave from server */
+			body = req_capsule_server_get(&req->rq_pill,
+						      &RMF_MDT_BODY);
+			LASSERT(body);
+update:
+			if (unlikely(body->nlink < 2)) {
+				CERROR("%s: nlink %d < 2 corrupt stripe %d "DFID":" DFID"\n",
+				       obd->obd_name, body->nlink, i,
+				       PFID(&lsm->lsm_md_oinfo[i].lmo_fid),
+				       PFID(&lsm->lsm_md_oinfo[0].lmo_fid));
+
+				if (req)
+					ptlrpc_req_finished(req);
+
+				rc = -EIO;
+				goto cleanup;
+			}
+
+			if (i)
+				md_set_lock_data(tgt->ltd_exp, &lockh->cookie,
+						 inode, NULL);
+
+			i_size_write(inode, body->size);
+			set_nlink(inode, body->nlink);
+			LTIME_S(inode->i_atime) = body->atime;
+			LTIME_S(inode->i_ctime) = body->ctime;
+			LTIME_S(inode->i_mtime) = body->mtime;
+
+			if (req)
+				ptlrpc_req_finished(req);
+		}
+release_lock:
+		size += i_size_read(inode);
+
+		if (i != 0)
+			nlink += inode->i_nlink - 2;
+		else
+			nlink += inode->i_nlink;
+
+		atime = LTIME_S(inode->i_atime) > atime ?
+				LTIME_S(inode->i_atime) : atime;
+		ctime = LTIME_S(inode->i_ctime) > ctime ?
+				LTIME_S(inode->i_ctime) : ctime;
+		mtime = LTIME_S(inode->i_mtime) > mtime ?
+				LTIME_S(inode->i_mtime) : mtime;
+
+		if (it.it_lock_mode && lockh) {
+			ldlm_lock_decref(lockh, it.it_lock_mode);
+			it.it_lock_mode = 0;
+		}
+
+		CDEBUG(D_INODE, "i %d "DFID" size %llu, nlink %u, atime %lu, mtime %lu, ctime %lu.\n",
+		       i, PFID(&fid), i_size_read(inode), inode->i_nlink,
+		       LTIME_S(inode->i_atime), LTIME_S(inode->i_mtime),
+		       LTIME_S(inode->i_ctime));
+	}
+
+	/*
+	 * update attr of master request.
+	 */
+	CDEBUG(D_INODE, "Return refreshed attrs: size = %lu nlink %lu atime %llu ctime %llu mtime %llu for " DFID"\n",
+	       size, nlink, atime, ctime, mtime,
+	       PFID(&lsm->lsm_md_oinfo[0].lmo_fid));
+
+	if (mbody) {
+		mbody->atime = atime;
+		mbody->ctime = ctime;
+		mbody->mtime = mtime;
+	}
+cleanup:
+	kfree(op_data);
+	return rc;
+}
+
 /*
  * IT_OPEN is intended to open (and create, possible) an object. Parent (pid)
  * may be split dir.
@@ -166,9 +320,26 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 	struct mdt_body		*body;
 	int			rc;
 
-	tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
-	if (IS_ERR(tgt))
-		return PTR_ERR(tgt);
+	if (it->it_flags & MDS_OPEN_BY_FID && fid_is_sane(&op_data->op_fid2)) {
+		if (op_data->op_mea1) {
+			struct lmv_stripe_md *lsm = op_data->op_mea1;
+			const struct lmv_oinfo *oinfo;
+
+			oinfo = lsm_name_to_stripe_info(lsm, op_data->op_name,
+							op_data->op_namelen);
+			op_data->op_fid1 = oinfo->lmo_fid;
+		}
+
+		tgt = lmv_find_target(lmv, &op_data->op_fid2);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+
+		op_data->op_mds = tgt->ltd_idx;
+	} else {
+		tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+	}
 
 	/* If it is ready to open the file by FID, do not need
 	 * allocate FID at all, otherwise it will confuse MDT
@@ -205,31 +376,18 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 	body = req_capsule_server_get(&(*reqp)->rq_pill, &RMF_MDT_BODY);
 	if (!body)
 		return -EPROTO;
-	/*
-	 * Not cross-ref case, just get out of here.
-	 */
-	if (likely(!(body->valid & OBD_MD_MDS)))
-		return 0;
 
-	/*
-	 * Okay, MDS has returned success. Probably name has been resolved in
-	 * remote inode.
-	 */
-	rc = lmv_intent_remote(exp, lmm, lmmsize, it, &op_data->op_fid1, flags,
-			       reqp, cb_blocking, extra_lock_flags);
-	if (rc != 0) {
-		LASSERT(rc < 0);
-		/*
-		 * This is possible, that some userspace application will try to
-		 * open file as directory and we will have -ENOTDIR here. As
-		 * this is normal situation, we should not print error here,
-		 * only debug info.
-		 */
-		CDEBUG(D_INODE, "Can't handle remote %s: dir " DFID "(" DFID "):%*s: %d\n",
-		       LL_IT2STR(it), PFID(&op_data->op_fid2),
-		       PFID(&op_data->op_fid1), op_data->op_namelen,
-		       op_data->op_name, rc);
-		return rc;
+	/* Cross-ref case: name was probably resolved in a remote inode. */
+	if (unlikely((body->valid & OBD_MD_MDS))) {
+		rc = lmv_intent_remote(exp, lmm, lmmsize, it, &op_data->op_fid1,
+				       flags, reqp, cb_blocking,
+				       extra_lock_flags);
+		if (rc != 0)
+			return rc;
+
+		body = req_capsule_server_get(&(*reqp)->rq_pill, &RMF_MDT_BODY);
+		if (!body)
+			return -EPROTO;
 	}
 
 	return rc;
@@ -269,8 +427,23 @@ static int lmv_intent_lookup(struct obd_export *exp,
 	rc = md_intent_lock(tgt->ltd_exp, op_data, lmm, lmmsize, it,
 			    flags, reqp, cb_blocking, extra_lock_flags);
 
-	if (rc < 0 || !*reqp)
+	if (rc < 0)
+		return rc;
+
+	if (!*reqp) {
+		/*
+		 * If an RPC happens, lsm information is revalidated during
+		 * the update_inode process (see ll_update_lsm_md).
+		 */
+		if (op_data->op_mea2) {
+			rc = lmv_revalidate_slaves(exp, NULL, op_data->op_mea2,
+						   cb_blocking,
+						   extra_lock_flags);
+			if (rc != 0)
+				return rc;
+		}
 		return rc;
+	}
 
 	/*
 	 * MDS has returned success. Probably name has been resolved in
@@ -279,12 +452,17 @@ static int lmv_intent_lookup(struct obd_export *exp,
 	body = req_capsule_server_get(&(*reqp)->rq_pill, &RMF_MDT_BODY);
 	if (!body)
 		return -EPROTO;
-	/* Not cross-ref case, just get out of here. */
-	if (likely(!(body->valid & OBD_MD_MDS)))
-		return 0;
 
-	rc = lmv_intent_remote(exp, lmm, lmmsize, it, NULL, flags, reqp,
-			       cb_blocking, extra_lock_flags);
+	/* Cross-ref case: name was probably resolved in a remote inode. */
+	if (unlikely((body->valid & OBD_MD_MDS))) {
+		rc = lmv_intent_remote(exp, lmm, lmmsize, it, NULL, flags,
+				       reqp, cb_blocking, extra_lock_flags);
+		if (rc != 0)
+			return rc;
+		body = req_capsule_server_get(&(*reqp)->rq_pill, &RMF_MDT_BODY);
+		if (!body)
+			return -EPROTO;
+	}
 
 	return rc;
 }
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_internal.h b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
index f4c917b..ed02927 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_internal.h
+++ b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
@@ -55,6 +55,14 @@ int __lmv_fid_alloc(struct lmv_obd *lmv, struct lu_fid *fid, u32 mds);
 int lmv_fid_alloc(struct obd_export *exp, struct lu_fid *fid,
 		  struct md_op_data *op_data);
 
+int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
+		  const union lmv_mds_md *lmm, int stripe_count);
+
+int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
+			  struct lmv_stripe_md *lsm,
+			  ldlm_blocking_callback cb_blocking,
+			  int extra_lock_flags);
+
 static inline struct lmv_tgt_desc *
 lmv_get_target(struct lmv_obd *lmv, u32 mds)
 {
@@ -94,6 +102,30 @@ static inline int lmv_stripe_md_size(int stripe_count)
 	return sizeof(*lsm) + stripe_count * sizeof(lsm->lsm_md_oinfo[0]);
 }
 
+int lmv_name_to_stripe_index(enum lmv_hash_type hashtype,
+			     unsigned int max_mdt_index,
+			     const char *name, int namelen);
+
+static inline const struct lmv_oinfo *
+lsm_name_to_stripe_info(const struct lmv_stripe_md *lsm, const char *name,
+			int namelen)
+{
+	int stripe_index;
+
+	stripe_index = lmv_name_to_stripe_index(lsm->lsm_md_hash_type,
+						lsm->lsm_md_stripe_count,
+						name, namelen);
+	if (stripe_index < 0)
+		return ERR_PTR(stripe_index);
+
+	LASSERTF(stripe_index < lsm->lsm_md_stripe_count,
+		 "stripe_index = %d, stripe_count = %d hash_type = %x name = %.*s\n",
+		 stripe_index, lsm->lsm_md_stripe_count,
+		 lsm->lsm_md_hash_type, namelen, name);
+
+	return &lsm->lsm_md_oinfo[stripe_index];
+}
+
 struct lmv_tgt_desc
 *lmv_locate_mds(struct lmv_obd *lmv, struct md_op_data *op_data,
 		struct lu_fid *fid);
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 6be2afc..da4855d 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -48,11 +48,63 @@
 #include "../include/obd_class.h"
 #include "../include/lustre_lmv.h"
 #include "../include/lprocfs_status.h"
+#include "../include/cl_object.h"
 #include "../include/lustre_lite.h"
 #include "../include/lustre_fid.h"
 #include "../include/lustre_kernelcomm.h"
 #include "lmv_internal.h"
 
+/* This hash is only for testing purposes */
+static inline unsigned int
+lmv_hash_all_chars(unsigned int count, const char *name, int namelen)
+{
+	const unsigned char *p = (const unsigned char *)name;
+	unsigned int c = 0;
+
+	while (--namelen >= 0)
+		c += p[namelen];
+
+	c = c % count;
+
+	return c;
+}
+
+static inline unsigned int
+lmv_hash_fnv1a(unsigned int count, const char *name, int namelen)
+{
+	__u64 hash;
+
+	hash = lustre_hash_fnv_1a_64(name, namelen);
+
+	return do_div(hash, count);
+}
+
+int lmv_name_to_stripe_index(enum lmv_hash_type hashtype,
+			     unsigned int max_mdt_index,
+			     const char *name, int namelen)
+{
+	int idx;
+
+	LASSERT(namelen > 0);
+	if (max_mdt_index <= 1)
+		return 0;
+
+	switch (hashtype) {
+	case LMV_HASH_TYPE_ALL_CHARS:
+		idx = lmv_hash_all_chars(max_mdt_index, name, namelen);
+		break;
+	case LMV_HASH_TYPE_FNV_1A_64:
+		idx = lmv_hash_fnv1a(max_mdt_index, name, namelen);
+		break;
+	default:
+		CERROR("Unknown hash type 0x%x\n", hashtype);
+		return -EINVAL;
+	}
+
+	LASSERT(idx < max_mdt_index);
+	return idx;
+}
+
 static void lmv_activate_target(struct lmv_obd *lmv,
 				struct lmv_tgt_desc *tgt,
 				int activate)
@@ -1174,28 +1226,19 @@ static int lmv_placement_policy(struct obd_device *obd,
 	 * If stripe_offset is provided during setdirstripe
 	 * (setdirstripe -i xx), xx MDS will be chosen.
 	 */
-	if (op_data->op_cli_flags & CLI_SET_MEA) {
+	if (op_data->op_cli_flags & CLI_SET_MEA && op_data->op_data) {
 		struct lmv_user_md *lum;
 
-		lum = (struct lmv_user_md *)op_data->op_data;
-		if (lum->lum_type == LMV_STRIPE_TYPE &&
-		    lum->lum_stripe_offset != -1) {
-			if (lum->lum_stripe_offset >= lmv->desc.ld_tgt_count) {
-				CERROR("%s: Stripe_offset %d > MDT count %d: rc = %d\n",
-				       obd->obd_name,
-				       lum->lum_stripe_offset,
-				       lmv->desc.ld_tgt_count, -ERANGE);
-				return -ERANGE;
-			}
-			*mds = lum->lum_stripe_offset;
-			return 0;
-		}
+		lum = op_data->op_data;
+		*mds = lum->lum_stripe_offset;
+	} else {
+		/*
+		 * Allocate new fid on target according to operation type and
+		 * parent home mds.
+		 */
+		*mds = op_data->op_mds;
 	}
 
-	/* Allocate new fid on target according to operation type and parent
-	 * home mds.
-	 */
-	*mds = op_data->op_mds;
 	return 0;
 }
 
@@ -1597,17 +1640,38 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data,
 	return rc;
 }
 
+/**
+ * Choose the MDT by name or FID in @op_data.
+ * For a non-striped directory, locate the MDT by FID.
+ * For a striped directory, locate the MDT by name, and
+ * reset @fid to the FID of the chosen stripe.
+ **/
 struct lmv_tgt_desc
 *lmv_locate_mds(struct lmv_obd *lmv, struct md_op_data *op_data,
 		struct lu_fid *fid)
 {
+	struct lmv_stripe_md *lsm = op_data->op_mea1;
+	const struct lmv_oinfo *oinfo;
 	struct lmv_tgt_desc *tgt;
 
-	tgt = lmv_find_target(lmv, fid);
-	if (IS_ERR(tgt))
+	if (!lsm || lsm->lsm_md_stripe_count <= 1 ||
+	    !op_data->op_namelen) {
+		tgt = lmv_find_target(lmv, fid);
+		if (IS_ERR(tgt))
+			return tgt;
+
+		op_data->op_mds = tgt->ltd_idx;
+
 		return tgt;
+	}
 
-	op_data->op_mds = tgt->ltd_idx;
+	oinfo = lsm_name_to_stripe_info(lsm, op_data->op_name,
+					op_data->op_namelen);
+	*fid = oinfo->lmo_fid;
+	op_data->op_mds = oinfo->lmo_mds;
+	tgt = lmv_get_target(lmv, op_data->op_mds);
+
+	CDEBUG(D_INFO, "locate on mds %u\n", op_data->op_mds);
 
 	return tgt;
 }
@@ -1633,13 +1697,26 @@ static int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 	if (IS_ERR(tgt))
 		return PTR_ERR(tgt);
 
+	CDEBUG(D_INODE, "CREATE name '%.*s' on "DFID" -> mds #%x\n",
+	       op_data->op_namelen, op_data->op_name, PFID(&op_data->op_fid1),
+	       op_data->op_mds);
+
 	rc = lmv_fid_alloc(exp, &op_data->op_fid2, op_data);
 	if (rc)
 		return rc;
 
-	CDEBUG(D_INODE, "CREATE '%*s' on "DFID" -> mds #%x\n",
-	       op_data->op_namelen, op_data->op_name, PFID(&op_data->op_fid1),
-	       op_data->op_mds);
+	/*
+	 * Send the create request to the MDT where the object
+	 * will be located
+	 */
+	tgt = lmv_find_target(lmv, &op_data->op_fid2);
+	if (IS_ERR(tgt))
+		return PTR_ERR(tgt);
+
+	op_data->op_mds = tgt->ltd_idx;
+
+	CDEBUG(D_INODE, "CREATE obj "DFID" -> mds #%x\n",
+	       PFID(&op_data->op_fid2), op_data->op_mds);
 
 	op_data->op_flags |= MF_MDC_CANCEL_FID1;
 	rc = md_create(tgt->ltd_exp, op_data, data, datalen, mode, uid, gid,
@@ -1889,6 +1966,15 @@ static int lmv_link(struct obd_export *exp, struct md_op_data *op_data,
 	op_data->op_fsuid = from_kuid(&init_user_ns, current_fsuid());
 	op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid());
 	op_data->op_cap = cfs_curproc_cap_pack();
+	if (op_data->op_mea2) {
+		struct lmv_stripe_md *lsm = op_data->op_mea2;
+		const struct lmv_oinfo *oinfo;
+
+		oinfo = lsm_name_to_stripe_info(lsm, op_data->op_name,
+						op_data->op_namelen);
+		op_data->op_fid2 = oinfo->lmo_fid;
+	}
+
 	tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid2);
 	if (IS_ERR(tgt))
 		return PTR_ERR(tgt);
@@ -1914,14 +2000,15 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data,
 	struct obd_device       *obd = exp->exp_obd;
 	struct lmv_obd	  *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc     *src_tgt;
-	struct lmv_tgt_desc     *tgt_tgt;
 	int			rc;
 
 	LASSERT(oldlen != 0);
 
-	CDEBUG(D_INODE, "RENAME %*s in "DFID" to %*s in "DFID"\n",
+	CDEBUG(D_INODE, "RENAME %.*s in "DFID":%d to %.*s in "DFID":%d\n",
 	       oldlen, old, PFID(&op_data->op_fid1),
-	       newlen, new, PFID(&op_data->op_fid2));
+	       op_data->op_mea1 ? op_data->op_mea1->lsm_md_stripe_count : 0,
+	       newlen, new, PFID(&op_data->op_fid2),
+	       op_data->op_mea2 ? op_data->op_mea2->lsm_md_stripe_count : 0);
 
 	rc = lmv_check_connect(obd);
 	if (rc)
@@ -1930,13 +2017,33 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data,
 	op_data->op_fsuid = from_kuid(&init_user_ns, current_fsuid());
 	op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid());
 	op_data->op_cap = cfs_curproc_cap_pack();
-	src_tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
-	if (IS_ERR(src_tgt))
-		return PTR_ERR(src_tgt);
 
-	tgt_tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid2);
-	if (IS_ERR(tgt_tgt))
-		return PTR_ERR(tgt_tgt);
+	if (op_data->op_mea1) {
+		struct lmv_stripe_md *lsm = op_data->op_mea1;
+		const struct lmv_oinfo *oinfo;
+
+		oinfo = lsm_name_to_stripe_info(lsm, old, oldlen);
+		op_data->op_fid1 = oinfo->lmo_fid;
+		op_data->op_mds = oinfo->lmo_mds;
+		src_tgt = lmv_get_target(lmv, op_data->op_mds);
+		if (IS_ERR(src_tgt))
+			return PTR_ERR(src_tgt);
+	} else {
+		src_tgt = lmv_find_target(lmv, &op_data->op_fid1);
+		if (IS_ERR(src_tgt))
+			return PTR_ERR(src_tgt);
+
+		op_data->op_mds = src_tgt->ltd_idx;
+	}
+
+	if (op_data->op_mea2) {
+		struct lmv_stripe_md *lsm = op_data->op_mea2;
+		const struct lmv_oinfo *oinfo;
+
+		oinfo = lsm_name_to_stripe_info(lsm, new, newlen);
+		op_data->op_fid2 = oinfo->lmo_fid;
+	}
+
 	/*
 	 * LOOKUP lock on src child (fid3) should also be cancelled for
 	 * src_tgt in mdc_rename.
@@ -2568,6 +2675,7 @@ int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 	}
 	return lsm_size;
 }
+EXPORT_SYMBOL(lmv_unpack_md);
 
 int lmv_unpackmd(struct obd_export *exp, struct lov_stripe_md **lsmp,
 		 struct lov_mds_md *lmm, int disk_len)
@@ -2741,7 +2849,7 @@ static int lmv_intent_getattr_async(struct obd_export *exp,
 	if (rc)
 		return rc;
 
-	tgt = lmv_find_target(lmv, &op_data->op_fid1);
+	tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
 	if (IS_ERR(tgt))
 		return PTR_ERR(tgt);
 
@@ -2843,6 +2951,49 @@ static int lmv_quotacheck(struct obd_device *unused, struct obd_export *exp,
 	return rc;
 }
 
+int lmv_update_lsm_md(struct obd_export *exp, struct lmv_stripe_md *lsm,
+		      struct mdt_body *body, ldlm_blocking_callback cb_blocking)
+{
+	if (lsm->lsm_md_stripe_count <= 1)
+		return 0;
+
+	return lmv_revalidate_slaves(exp, body, lsm, cb_blocking, 0);
+}
+
+int lmv_merge_attr(struct obd_export *exp, const struct lmv_stripe_md *lsm,
+		   struct cl_attr *attr)
+{
+	int i;
+
+	for (i = 0; i < lsm->lsm_md_stripe_count; i++) {
+		struct inode *inode = lsm->lsm_md_oinfo[i].lmo_root;
+
+		CDEBUG(D_INFO, ""DFID" size %llu, nlink %u, atime %lu ctime %lu, mtime %lu.\n",
+		       PFID(&lsm->lsm_md_oinfo[i].lmo_fid),
+		       i_size_read(inode), inode->i_nlink,
+		       LTIME_S(inode->i_atime), LTIME_S(inode->i_ctime),
+		       LTIME_S(inode->i_mtime));
+
+		/* for a slave stripe, subtract the nlink of . and .. */
+		if (i)
+			attr->cat_nlink += inode->i_nlink - 2;
+		else
+			attr->cat_nlink = inode->i_nlink;
+
+		attr->cat_size += i_size_read(inode);
+
+		if (attr->cat_atime < LTIME_S(inode->i_atime))
+			attr->cat_atime = LTIME_S(inode->i_atime);
+
+		if (attr->cat_ctime < LTIME_S(inode->i_ctime))
+			attr->cat_ctime = LTIME_S(inode->i_ctime);
+
+		if (attr->cat_mtime < LTIME_S(inode->i_mtime))
+			attr->cat_mtime = LTIME_S(inode->i_mtime);
+	}
+	return 0;
+}
+
 static struct obd_ops lmv_obd_ops = {
 	.owner		= THIS_MODULE,
 	.setup		= lmv_setup,
@@ -2888,6 +3039,8 @@ static struct md_ops lmv_md_ops = {
 	.lock_match		= lmv_lock_match,
 	.get_lustre_md		= lmv_get_lustre_md,
 	.free_lustre_md		= lmv_free_lustre_md,
+	.update_lsm_md		= lmv_update_lsm_md,
+	.merge_attr		= lmv_merge_attr,
 	.set_open_replay_data	= lmv_set_open_replay_data,
 	.clear_open_replay_data	= lmv_clear_open_replay_data,
 	.intent_getattr_async	= lmv_intent_getattr_async,
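
Reader's note, not part of the posted patch: the attribute merge done by
lmv_merge_attr() above can be sketched in plain userspace C. This is only an
illustration under stated assumptions. The struct names stripe_attr and
merged_attr and the helper merge_stripe_attrs() are hypothetical stand-ins for
the per-stripe inode and struct cl_attr, and the caller is assumed to pass a
zero-initialised result, since the loop accumulates into it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the attributes read from each stripe inode. */
struct stripe_attr {
	uint64_t size;
	uint32_t nlink;
	uint64_t atime;
	uint64_t ctime;
	uint64_t mtime;
};

/* Hypothetical stand-in for the merged result (struct cl_attr in the patch). */
struct merged_attr {
	uint64_t size;
	uint32_t nlink;
	uint64_t atime;
	uint64_t ctime;
	uint64_t mtime;
};

/*
 * Merge per-stripe attributes: sizes are summed, timestamps take the
 * newest value, and every slave stripe (index > 0) contributes its
 * nlink minus the two links for "." and "..".
 * Assumption: *out is zero-initialised by the caller.
 */
static void merge_stripe_attrs(const struct stripe_attr *stripes,
			       size_t count, struct merged_attr *out)
{
	size_t i;

	assert(stripes && out);

	for (i = 0; i < count; i++) {
		if (i)
			out->nlink += stripes[i].nlink - 2;
		else
			out->nlink = stripes[i].nlink;

		out->size += stripes[i].size;

		if (out->atime < stripes[i].atime)
			out->atime = stripes[i].atime;
		if (out->ctime < stripes[i].ctime)
			out->ctime = stripes[i].ctime;
		if (out->mtime < stripes[i].mtime)
			out->mtime = stripes[i].mtime;
	}
}
```

With a master stripe of nlink 3 and one slave stripe of nlink 4, the merged
nlink is 3 + (4 - 2) = 5, so "." and ".." are counted only once for the whole
striped directory.
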
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 06a1274..626fce5 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -325,6 +325,9 @@ static struct ptlrpc_request *mdc_intent_open_pack(struct obd_export *exp,
 	mdc_open_pack(req, op_data, it->it_create_mode, 0, it->it_flags, lmm,
 		      lmmsize);
 
+	req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER,
+			     obddev->u.cli.cl_max_mds_easize);
+
 	ptlrpc_request_set_replen(req);
 	return req;
 }
diff --git a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
index b514f18..07e23d1 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
@@ -1878,6 +1878,17 @@ void lustre_swab_lov_desc(struct lov_desc *ld)
 }
 EXPORT_SYMBOL(lustre_swab_lov_desc);
 
+void lustre_swab_lmv_user_md(struct lmv_user_md *lum)
+{
+	__swab32s(&lum->lum_magic);
+	__swab32s(&lum->lum_stripe_count);
+	__swab32s(&lum->lum_stripe_offset);
+	__swab32s(&lum->lum_hash_type);
+	__swab32s(&lum->lum_type);
+	CLASSERT(offsetof(typeof(*lum), lum_padding1));
+}
+EXPORT_SYMBOL(lustre_swab_lmv_user_md);
+
 static void print_lum(struct lov_user_md *lum)
 {
 	CDEBUG(D_OTHER, "lov_user_md %p:\n", lum);
-- 
1.7.1
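
Reader's note, not part of the posted patch: the name-to-stripe mapping used by
lmv_name_to_stripe_index() for LMV_HASH_TYPE_FNV_1A_64 can be sketched in
userspace as below. This assumes that lustre_hash_fnv_1a_64() is the standard
64-bit FNV-1a hash, which the patch itself does not show. The names
fnv_1a_64() and name_to_stripe_index() are hypothetical, and the plain "%"
replaces the kernel's do_div().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 64-bit FNV-1a parameters (assumed to match the Lustre helper). */
#define FNV_1A_64_OFFSET_BIAS	0xcbf29ce484222325ULL
#define FNV_1A_64_PRIME		0x100000001b3ULL

/* 64-bit FNV-1a: xor each byte into the hash, then multiply by the prime. */
static uint64_t fnv_1a_64(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t hash = FNV_1A_64_OFFSET_BIAS;
	size_t i;

	for (i = 0; i < len; i++) {
		hash ^= p[i];
		hash *= FNV_1A_64_PRIME;
	}
	return hash;
}

/*
 * Map an entry name to a stripe index in [0, stripe_count).
 * A directory with at most one stripe always maps to index 0,
 * mirroring the early return in the patch.
 */
static unsigned int name_to_stripe_index(unsigned int stripe_count,
					 const char *name, size_t namelen)
{
	assert(name && namelen > 0);

	if (stripe_count <= 1)
		return 0;

	return (unsigned int)(fnv_1a_64(name, namelen) % stripe_count);
}
```

Because the index depends only on the name and the stripe count, every client
picks the same stripe for a given entry without asking the master MDT, which
is why lmv_locate_mds() can route by name for striped directories.
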


* [PATCH 13/80] staging: lustre: llite: fix "getdirstripe" to show stripe info
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Fix "lfs getdirstripe" so that it shows the layout information
of a striped directory:

[root@testnode tests]# ../utils/lfs getdirstripe /mnt/lustre/test1
/mnt/lustre/test1
lmv_stripe_count: 2
lmv_stripe_offset: 0
mdtidx               FID[seq:oid:ver]
     0               [0x280000400:0x1:0x0]
     1               [0x2c0000400:0x1:0x0]

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-on: http://review.whamcloud.com/7228
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
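Reader's note, not part of the posted patch: the per-stripe lines in the
sample output above pair an MDT index with a FID printed as [seq:oid:ver].
A minimal userspace sketch of that formatting follows. The struct
example_fid and the helper format_fid() are hypothetical; the field widths
(64-bit sequence, 32-bit object id and version) are an assumption taken from
the sample output, not from a header shown in this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a Lustre FID: sequence, object id, version. */
struct example_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/*
 * Format a FID as "[0xSEQ:0xOID:0xVER]" into buf.
 * Returns the number of characters that the full string needs,
 * as snprintf() does.
 */
static int format_fid(char *buf, size_t len, const struct example_fid *fid)
{
	assert(buf && fid);

	return snprintf(buf, len, "[0x%llx:0x%x:0x%x]",
			(unsigned long long)fid->seq, fid->oid, fid->ver);
}

/* Print one table row: MDT index followed by the stripe FID. */
static void print_stripe(int mdt_index, const struct example_fid *fid)
{
	char buf[64];

	format_fid(buf, sizeof(buf), fid);
	printf("%6d\t\t %s\n", mdt_index, buf);
}
```

Calling print_stripe() once per lum_objects[] entry returned by
LL_IOC_LMV_GETSTRIPE would give one row per stripe, in the spirit of the table
shown in the commit message.
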
 .../lustre/lustre/include/lustre/lustre_idl.h      |    4 +
 .../lustre/lustre/include/lustre/lustre_user.h     |    1 +
 drivers/staging/lustre/lustre/llite/dir.c          |  184 +++++++++++++++-----
 .../staging/lustre/lustre/llite/llite_internal.h   |    4 +-
 drivers/staging/lustre/lustre/llite/xattr.c        |    7 +-
 .../staging/lustre/lustre/ptlrpc/pack_generic.c    |   26 +++
 6 files changed, 180 insertions(+), 46 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index a612080..0ff30c6 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -1728,6 +1728,8 @@ lov_mds_md_max_stripe_count(size_t buf_size, __u32 lmm_magic)
 #define OBD_MD_FLDATAVERSION (0x0010000000000000ULL) /* iversion sum */
 #define OBD_MD_FLRELEASED    (0x0020000000000000ULL) /* file released */
 
+#define OBD_MD_DEFAULT_MEA   (0x0040000000000000ULL) /* default MEA */
+
 #define OBD_MD_FLGETATTR (OBD_MD_FLID    | OBD_MD_FLATIME | OBD_MD_FLMTIME | \
 			  OBD_MD_FLCTIME | OBD_MD_FLSIZE  | OBD_MD_FLBLKSZ | \
 			  OBD_MD_FLMODE  | OBD_MD_FLTYPE  | OBD_MD_FLUID   | \
@@ -2543,6 +2545,8 @@ union lmv_mds_md {
 	struct lmv_user_md	lmv_user_md;
 };
 
+void lustre_swab_lmv_mds_md(union lmv_mds_md *lmm);
+
 static inline ssize_t lmv_mds_md_size(int stripe_count, unsigned int lmm_magic)
 {
 	ssize_t len = -EINVAL;
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index d496d0e..26dbda0 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -242,6 +242,7 @@ struct ost_id {
 #define LL_IOC_SET_LEASE		_IOWR('f', 243, long)
 #define LL_IOC_GET_LEASE		_IO('f', 244)
 #define LL_IOC_HSM_IMPORT		_IOWR('f', 245, struct hsm_user_import)
+#define LL_IOC_LMV_SET_DEFAULT_STRIPE	_IOWR('f', 246, struct lmv_user_md)
 
 #define LL_STATFS_LMV	   1
 #define LL_STATFS_LOV	   2
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index a0560b6..5288750 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -749,6 +749,13 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump,
 			lum_size = sizeof(struct lov_user_md_v3);
 			break;
 		}
+		case LMV_USER_MAGIC: {
+			if (lump->lmm_magic != cpu_to_le32(LMV_USER_MAGIC))
+				lustre_swab_lmv_user_md(
+					(struct lmv_user_md *)lump);
+			lum_size = sizeof(struct lmv_user_md);
+			break;
+		}
 		default: {
 			CDEBUG(D_IOCTL, "bad userland LOV MAGIC: %#08x != %#08x nor %#08x\n",
 			       lump->lmm_magic, LOV_USER_MAGIC_V1,
@@ -819,8 +826,16 @@ end:
 	return rc;
 }
 
-int ll_dir_getstripe(struct inode *inode, struct lov_mds_md **lmmp,
-		     int *lmm_size, struct ptlrpc_request **request)
+/**
+ * Get the default LOV EA, the LMV EA or the default LMV EA of a directory.
+ * @valid indicates which stripe EA to retrieve:
+ *	OBD_MD_MEA		LMV stripe EA
+ *	OBD_MD_DEFAULT_MEA	Default LMV stripe EA
+ *	otherwise		Default LOV EA.
+ * Only one stripe EA can be retrieved per call.
+ **/
+int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size,
+		     struct ptlrpc_request **request, u64 valid)
 {
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	struct mdt_body   *body;
@@ -829,7 +844,7 @@ int ll_dir_getstripe(struct inode *inode, struct lov_mds_md **lmmp,
 	int rc, lmmsize;
 	struct md_op_data *op_data;
 
-	rc = ll_get_default_mdsize(sbi, &lmmsize);
+	rc = ll_get_max_mdsize(sbi, &lmmsize);
 	if (rc)
 		return rc;
 
@@ -860,6 +875,7 @@ int ll_dir_getstripe(struct inode *inode, struct lov_mds_md **lmmp,
 
 	lmm = req_capsule_server_sized_get(&req->rq_pill,
 					   &RMF_MDT_MD, lmmsize);
+	LASSERT(lmm);
 
 	/*
 	 * This is coming from the MDS, so is probably in
@@ -876,40 +892,48 @@ int ll_dir_getstripe(struct inode *inode, struct lov_mds_md **lmmp,
 		if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC)
 			lustre_swab_lov_user_md_v3((struct lov_user_md_v3 *)lmm);
 		break;
+	case LMV_USER_MAGIC:
+		if (cpu_to_le32(LMV_USER_MAGIC) != LMV_USER_MAGIC)
+			lustre_swab_lmv_user_md((struct lmv_user_md *)lmm);
+		break;
 	default:
 		CERROR("unknown magic: %lX\n", (unsigned long)lmm->lmm_magic);
 		rc = -EPROTO;
 	}
 out:
-	*lmmp = lmm;
-	*lmm_size = lmmsize;
+	*plmm = lmm;
+	*plmm_size = lmmsize;
 	*request = req;
 	return rc;
 }
 
-/*
- *  Get MDT index for the inode.
- */
-int ll_get_mdt_idx(struct inode *inode)
+static int ll_get_mdt_idx_by_fid(struct ll_sb_info *sbi,
+				 const struct lu_fid *fid)
 {
-	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	struct md_op_data *op_data;
-	int rc, mdtidx;
+	int mdt_index, rc;
 
-	op_data = ll_prep_md_op_data(NULL, inode, NULL, NULL, 0,
-				     0, LUSTRE_OPC_ANY, NULL);
-	if (IS_ERR(op_data))
-		return PTR_ERR(op_data);
+	op_data = kzalloc(sizeof(*op_data), GFP_NOFS);
+	if (!op_data)
+		return -ENOMEM;
 
 	op_data->op_flags |= MF_GET_MDT_IDX;
+	op_data->op_fid1 = *fid;
 	rc = md_getattr(sbi->ll_md_exp, op_data, NULL);
-	mdtidx = op_data->op_mds;
-	ll_finish_md_op_data(op_data);
-	if (rc < 0) {
-		CDEBUG(D_INFO, "md_getattr_name: %d\n", rc);
+	mdt_index = op_data->op_mds;
+	kfree(op_data);
+	if (rc < 0)
 		return rc;
-	}
-	return mdtidx;
+
+	return mdt_index;
+}
+
+/*
+ *  Get MDT index for the inode.
+ */
+int ll_get_mdt_idx(struct inode *inode)
+{
+	return ll_get_mdt_idx_by_fid(ll_i2sbi(inode), ll_inode2fid(inode));
 }
 
 /**
@@ -1391,6 +1415,22 @@ lmv_out_free:
 		obd_ioctl_freedata(buf, len);
 		return rc;
 	}
+	case LL_IOC_LMV_SET_DEFAULT_STRIPE: {
+		struct lmv_user_md __user *ulump;
+		struct lmv_user_md lum;
+		int rc;
+
+		ulump = (struct lmv_user_md __user *)arg;
+		if (copy_from_user(&lum, ulump, sizeof(lum)))
+			return -EFAULT;
+
+		if (lum.lum_magic != LMV_USER_MAGIC)
+			return -EINVAL;
+
+		rc = ll_dir_setstripe(inode, (struct lov_user_md *)&lum, 0);
+
+		return rc;
+	}
 	case LL_IOC_LOV_SETSTRIPE: {
 		struct lov_user_md_v3 lumv3;
 		struct lov_user_md_v1 *lumv1 = (struct lov_user_md_v1 *)&lumv3;
@@ -1420,46 +1460,107 @@ lmv_out_free:
 		return rc;
 	}
 	case LL_IOC_LMV_GETSTRIPE: {
-		struct lmv_user_md __user *lump = (void __user *)arg;
+		struct lmv_user_md __user *ulmv;
 		struct lmv_user_md lum;
-		struct lmv_user_md *tmp;
+		struct ptlrpc_request *request = NULL;
+		struct lmv_user_md *tmp = NULL;
+		union lmv_mds_md *lmm = NULL;
+		u64 valid = 0;
+		int stripe_count;
+		int mdt_index;
 		int lum_size;
-		int rc = 0;
-		int mdtindex;
+		int lmmsize;
+		int rc;
+		int i;
 
-		if (copy_from_user(&lum, lump, sizeof(struct lmv_user_md)))
+		ulmv = (struct lmv_user_md __user *)arg;
+		if (copy_from_user(&lum, ulmv, sizeof(*ulmv)))
 			return -EFAULT;
 
-		if (lum.lum_magic != LMV_MAGIC_V1)
+		/*
+		 * lum_magic indicates which stripe EA the ioctl should
+		 * return: LMV_MAGIC_V1 for the normal LMV stripe,
+		 * LMV_USER_MAGIC for the default LMV stripe.
+		 */
+		if (lum.lum_magic == LMV_MAGIC_V1)
+			valid |= OBD_MD_MEA;
+		else if (lum.lum_magic == LMV_USER_MAGIC)
+			valid |= OBD_MD_DEFAULT_MEA;
+		else
 			return -EINVAL;
 
-		lum_size = lmv_user_md_size(1, LMV_MAGIC_V1);
+		rc = ll_dir_getstripe(inode, (void **)&lmm, &lmmsize, &request,
+				      valid);
+		if (rc && rc != -ENODATA)
+			goto finish_req;
+
+		/* Get default LMV EA */
+		if (lum.lum_magic == LMV_USER_MAGIC) {
+			if (rc)
+				goto finish_req;
+
+			if (lmmsize > sizeof(*ulmv)) {
+				rc = -EINVAL;
+				goto finish_req;
+			}
+
+			if (copy_to_user(ulmv, lmm, lmmsize))
+				rc = -EFAULT;
+
+			goto finish_req;
+		}
+
+		/* Get normal LMV EA */
+		if (rc == -ENODATA) {
+			stripe_count = 1;
+		} else {
+			LASSERT(lmm);
+			stripe_count = lmv_mds_md_stripe_count_get(lmm);
+		}
+
+		lum_size = lmv_user_md_size(stripe_count, LMV_MAGIC_V1);
 		tmp = kzalloc(lum_size, GFP_NOFS);
 		if (!tmp) {
 			rc = -ENOMEM;
-			goto free_lmv;
+			goto finish_req;
 		}
 
-		*tmp = lum;
+		tmp->lum_magic = LMV_MAGIC_V1;
 		tmp->lum_stripe_count = 1;
-		mdtindex = ll_get_mdt_idx(inode);
-		if (mdtindex < 0) {
+		mdt_index = ll_get_mdt_idx(inode);
+		if (mdt_index < 0) {
 			rc = -ENOMEM;
-			goto free_lmv;
+			goto out_tmp;
+		}
+		tmp->lum_stripe_offset = mdt_index;
+		tmp->lum_objects[0].lum_mds = mdt_index;
+		tmp->lum_objects[0].lum_fid = *ll_inode2fid(inode);
+		for (i = 1; i < stripe_count; i++) {
+			struct lmv_mds_md_v1 *lmm1;
+
+			lmm1 = &lmm->lmv_md_v1;
+			mdt_index = ll_get_mdt_idx_by_fid(sbi,
+							  &lmm1->lmv_stripe_fids[i]);
+			if (mdt_index < 0) {
+				rc = mdt_index;
+				goto out_tmp;
+			}
+			tmp->lum_objects[i].lum_mds = mdt_index;
+			tmp->lum_objects[i].lum_fid = lmm1->lmv_stripe_fids[i];
+			tmp->lum_stripe_count++;
 		}
 
-		tmp->lum_stripe_offset = mdtindex;
-		tmp->lum_objects[0].lum_mds = mdtindex;
-		memcpy(&tmp->lum_objects[0].lum_fid, ll_inode2fid(inode),
-		       sizeof(struct lu_fid));
-		if (copy_to_user((void __user *)arg, tmp, lum_size)) {
+		if (copy_to_user(ulmv, tmp, lum_size)) {
 			rc = -EFAULT;
-			goto free_lmv;
+			goto out_tmp;
 		}
-free_lmv:
+out_tmp:
 		kfree(tmp);
+finish_req:
+		ptlrpc_req_finished(request);
 		return rc;
 	}
+
 	case LL_IOC_LOV_SWAP_LAYOUTS:
 		return -EPERM;
 	case LL_IOC_OBD_STATFS:
@@ -1484,7 +1585,8 @@ free_lmv:
 			rc = ll_lov_getstripe_ea_info(inode, filename, &lmm,
 						      &lmmsize, &request);
 		} else {
-			rc = ll_dir_getstripe(inode, &lmm, &lmmsize, &request);
+			rc = ll_dir_getstripe(inode, (void **)&lmm, &lmmsize,
+					      &request, 0);
 		}
 
 		if (request) {
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index f3b8504..82c3a88 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -728,8 +728,8 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 			     struct ptlrpc_request **request);
 int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump,
 		     int set_default);
-int ll_dir_getstripe(struct inode *inode, struct lov_mds_md **lmmp,
-		     int *lmm_size, struct ptlrpc_request **request);
+int ll_dir_getstripe(struct inode *inode, void **lmmp, int *lmm_size,
+		     struct ptlrpc_request **request, u64 valid);
 int ll_fsync(struct file *file, loff_t start, loff_t end, int data);
 int ll_merge_attr(const struct lu_env *env, struct inode *inode);
 int ll_fid2path(struct inode *inode, void __user *arg);
diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index a02b802..aa0738b 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -390,8 +390,8 @@ static int ll_xattr_get(const struct xattr_handler *handler,
 		lsm = ccc_inode_lsm_get(inode);
 		if (!lsm) {
 			if (S_ISDIR(inode->i_mode)) {
-				rc = ll_dir_getstripe(inode, &lmm,
-						      &lmmsize, &request);
+				rc = ll_dir_getstripe(inode, (void **)&lmm,
+						      &lmmsize, &request, 0);
 			} else {
 				rc = -ENODATA;
 			}
@@ -491,7 +491,8 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size)
 		if (!ll_i2info(inode)->lli_has_smd)
 			rc2 = -1;
 	} else if (S_ISDIR(inode->i_mode)) {
-		rc2 = ll_dir_getstripe(inode, &lmm, &lmmsize, &request);
+		rc2 = ll_dir_getstripe(inode, (void **)&lmm, &lmmsize,
+				       &request, 0);
 	}
 
 	if (rc2 < 0) {
diff --git a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
index 07e23d1..6ddc9c7 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
@@ -1878,6 +1878,32 @@ void lustre_swab_lov_desc(struct lov_desc *ld)
 }
 EXPORT_SYMBOL(lustre_swab_lov_desc);
 
+/* This structure is always in little-endian */
+static void lustre_swab_lmv_mds_md_v1(struct lmv_mds_md_v1 *lmm1)
+{
+	int i;
+
+	__swab32s(&lmm1->lmv_magic);
+	__swab32s(&lmm1->lmv_stripe_count);
+	__swab32s(&lmm1->lmv_master_mdt_index);
+	__swab32s(&lmm1->lmv_hash_type);
+	__swab32s(&lmm1->lmv_layout_version);
+	for (i = 0; i < lmm1->lmv_stripe_count; i++)
+		lustre_swab_lu_fid(&lmm1->lmv_stripe_fids[i]);
+}
+
+void lustre_swab_lmv_mds_md(union lmv_mds_md *lmm)
+{
+	switch (lmm->lmv_magic) {
+	case LMV_MAGIC_V1:
+		lustre_swab_lmv_mds_md_v1(&lmm->lmv_md_v1);
+		break;
+	default:
+		break;
+	}
+}
+EXPORT_SYMBOL(lustre_swab_lmv_mds_md);
+
 void lustre_swab_lmv_user_md(struct lmv_user_md *lum)
 {
 	__swab32s(&lum->lum_magic);
-- 
1.7.1


* [lustre-devel] [PATCH 13/80] staging: lustre: llite: fix "getdirstripe" to show stripe info
@ 2016-08-16 20:18   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Fix "lfs getdirstripe" so that it shows the layout information
of a striped directory:

[root@testnode tests]# ../utils/lfs getdirstripe /mnt/lustre/test1
/mnt/lustre/test1
lmv_stripe_count: 2
lmv_stripe_offset: 0
mdtidx               FID[seq:oid:ver]
     0               [0x280000400:0x1:0x0]
     1               [0x2c0000400:0x1:0x0]

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-on: http://review.whamcloud.com/7228
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |    4 +
 .../lustre/lustre/include/lustre/lustre_user.h     |    1 +
 drivers/staging/lustre/lustre/llite/dir.c          |  184 +++++++++++++++-----
 .../staging/lustre/lustre/llite/llite_internal.h   |    4 +-
 drivers/staging/lustre/lustre/llite/xattr.c        |    7 +-
 .../staging/lustre/lustre/ptlrpc/pack_generic.c    |   26 +++
 6 files changed, 180 insertions(+), 46 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index a612080..0ff30c6 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -1728,6 +1728,8 @@ lov_mds_md_max_stripe_count(size_t buf_size, __u32 lmm_magic)
 #define OBD_MD_FLDATAVERSION (0x0010000000000000ULL) /* iversion sum */
 #define OBD_MD_FLRELEASED    (0x0020000000000000ULL) /* file released */
 
+#define OBD_MD_DEFAULT_MEA   (0x0040000000000000ULL) /* default MEA */
+
 #define OBD_MD_FLGETATTR (OBD_MD_FLID    | OBD_MD_FLATIME | OBD_MD_FLMTIME | \
 			  OBD_MD_FLCTIME | OBD_MD_FLSIZE  | OBD_MD_FLBLKSZ | \
 			  OBD_MD_FLMODE  | OBD_MD_FLTYPE  | OBD_MD_FLUID   | \
@@ -2543,6 +2545,8 @@ union lmv_mds_md {
 	struct lmv_user_md	lmv_user_md;
 };
 
+void lustre_swab_lmv_mds_md(union lmv_mds_md *lmm);
+
 static inline ssize_t lmv_mds_md_size(int stripe_count, unsigned int lmm_magic)
 {
 	ssize_t len = -EINVAL;
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index d496d0e..26dbda0 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -242,6 +242,7 @@ struct ost_id {
 #define LL_IOC_SET_LEASE		_IOWR('f', 243, long)
 #define LL_IOC_GET_LEASE		_IO('f', 244)
 #define LL_IOC_HSM_IMPORT		_IOWR('f', 245, struct hsm_user_import)
+#define LL_IOC_LMV_SET_DEFAULT_STRIPE	_IOWR('f', 246, struct lmv_user_md)
 
 #define LL_STATFS_LMV	   1
 #define LL_STATFS_LOV	   2
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index a0560b6..5288750 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -749,6 +749,13 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump,
 			lum_size = sizeof(struct lov_user_md_v3);
 			break;
 		}
+		case LMV_USER_MAGIC: {
+			if (lump->lmm_magic != cpu_to_le32(LMV_USER_MAGIC))
+				lustre_swab_lmv_user_md(
+					(struct lmv_user_md *)lump);
+			lum_size = sizeof(struct lmv_user_md);
+			break;
+		}
 		default: {
 			CDEBUG(D_IOCTL, "bad userland LOV MAGIC: %#08x != %#08x nor %#08x\n",
 			       lump->lmm_magic, LOV_USER_MAGIC_V1,
@@ -819,8 +826,16 @@ end:
 	return rc;
 }
 
-int ll_dir_getstripe(struct inode *inode, struct lov_mds_md **lmmp,
-		     int *lmm_size, struct ptlrpc_request **request)
+/**
+ * This function will be used to get default LOV/LMV/Default LMV
+ * @valid will be used to indicate which stripe it will retrieve
+ *	OBD_MD_MEA		LMV stripe EA
+ *	OBD_MD_DEFAULT_MEA	Default LMV stripe EA
+ *	otherwise		Default LOV EA.
+ * Each time, it can only retrieve 1 stripe EA
+ **/
+int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size,
+		     struct ptlrpc_request **request, u64 valid)
 {
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	struct mdt_body   *body;
@@ -829,7 +844,7 @@ int ll_dir_getstripe(struct inode *inode, struct lov_mds_md **lmmp,
 	int rc, lmmsize;
 	struct md_op_data *op_data;
 
-	rc = ll_get_default_mdsize(sbi, &lmmsize);
+	rc = ll_get_max_mdsize(sbi, &lmmsize);
 	if (rc)
 		return rc;
 
@@ -860,6 +875,7 @@ int ll_dir_getstripe(struct inode *inode, struct lov_mds_md **lmmp,
 
 	lmm = req_capsule_server_sized_get(&req->rq_pill,
 					   &RMF_MDT_MD, lmmsize);
+	LASSERT(lmm);
 
 	/*
 	 * This is coming from the MDS, so is probably in
@@ -876,40 +892,48 @@ int ll_dir_getstripe(struct inode *inode, struct lov_mds_md **lmmp,
 		if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC)
 			lustre_swab_lov_user_md_v3((struct lov_user_md_v3 *)lmm);
 		break;
+	case LMV_USER_MAGIC:
+		if (cpu_to_le32(LMV_USER_MAGIC) != LMV_USER_MAGIC)
+			lustre_swab_lmv_user_md((struct lmv_user_md *)lmm);
+		break;
 	default:
 		CERROR("unknown magic: %lX\n", (unsigned long)lmm->lmm_magic);
 		rc = -EPROTO;
 	}
 out:
-	*lmmp = lmm;
-	*lmm_size = lmmsize;
+	*plmm = lmm;
+	*plmm_size = lmmsize;
 	*request = req;
 	return rc;
 }
 
-/*
- *  Get MDT index for the inode.
- */
-int ll_get_mdt_idx(struct inode *inode)
+static int ll_get_mdt_idx_by_fid(struct ll_sb_info *sbi,
+				 const struct lu_fid *fid)
 {
-	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	struct md_op_data *op_data;
-	int rc, mdtidx;
+	int mdt_index, rc;
 
-	op_data = ll_prep_md_op_data(NULL, inode, NULL, NULL, 0,
-				     0, LUSTRE_OPC_ANY, NULL);
-	if (IS_ERR(op_data))
-		return PTR_ERR(op_data);
+	op_data = kzalloc(sizeof(*op_data), GFP_NOFS);
+	if (!op_data)
+		return -ENOMEM;
 
 	op_data->op_flags |= MF_GET_MDT_IDX;
+	op_data->op_fid1 = *fid;
 	rc = md_getattr(sbi->ll_md_exp, op_data, NULL);
-	mdtidx = op_data->op_mds;
-	ll_finish_md_op_data(op_data);
-	if (rc < 0) {
-		CDEBUG(D_INFO, "md_getattr_name: %d\n", rc);
+	mdt_index = op_data->op_mds;
+	kvfree(op_data);
+	if (rc < 0)
 		return rc;
-	}
-	return mdtidx;
+
+	return mdt_index;
+}
+
+/*
+ *  Get MDT index for the inode.
+ */
+int ll_get_mdt_idx(struct inode *inode)
+{
+	return ll_get_mdt_idx_by_fid(ll_i2sbi(inode), ll_inode2fid(inode));
 }
 
 /**
@@ -1391,6 +1415,22 @@ lmv_out_free:
 		obd_ioctl_freedata(buf, len);
 		return rc;
 	}
+	case LL_IOC_LMV_SET_DEFAULT_STRIPE: {
+		struct lmv_user_md __user *ulump;
+		struct lmv_user_md lum;
+		int rc;
+
+		ulump = (struct lmv_user_md __user *)arg;
+		if (copy_from_user(&lum, ulump, sizeof(lum)))
+			return -EFAULT;
+
+		if (lum.lum_magic != LMV_USER_MAGIC)
+			return -EINVAL;
+
+		rc = ll_dir_setstripe(inode, (struct lov_user_md *)&lum, 0);
+
+		return rc;
+	}
 	case LL_IOC_LOV_SETSTRIPE: {
 		struct lov_user_md_v3 lumv3;
 		struct lov_user_md_v1 *lumv1 = (struct lov_user_md_v1 *)&lumv3;
@@ -1420,46 +1460,107 @@ lmv_out_free:
 		return rc;
 	}
 	case LL_IOC_LMV_GETSTRIPE: {
-		struct lmv_user_md __user *lump = (void __user *)arg;
+		struct lmv_user_md __user *ulmv;
 		struct lmv_user_md lum;
-		struct lmv_user_md *tmp;
+		struct ptlrpc_request *request = NULL;
+		struct lmv_user_md *tmp = NULL;
+		union lmv_mds_md *lmm = NULL;
+		u64 valid = 0;
+		int stripe_count;
+		int mdt_index;
 		int lum_size;
-		int rc = 0;
-		int mdtindex;
+		int lmmsize;
+		int rc;
+		int i;
 
-		if (copy_from_user(&lum, lump, sizeof(struct lmv_user_md)))
+		ulmv = (struct lmv_user_md __user *)arg;
+		if (copy_from_user(&lum, ulmv, sizeof(*ulmv)))
 			return -EFAULT;
 
-		if (lum.lum_magic != LMV_MAGIC_V1)
+		/*
+		 * lum_magic will indicate which stripe the ioctl will like
+		 * to get, LMV_MAGIC_V1 is for normal LMV stripe, LMV_USER_MAGIC
+		 * is for default LMV stripe
+		 */
+		if (lum.lum_magic == LMV_MAGIC_V1)
+			valid |= OBD_MD_MEA;
+		else if (lum.lum_magic == LMV_USER_MAGIC)
+			valid |= OBD_MD_DEFAULT_MEA;
+		else
 			return -EINVAL;
 
-		lum_size = lmv_user_md_size(1, LMV_MAGIC_V1);
+		rc = ll_dir_getstripe(inode, (void **)&lmm, &lmmsize, &request,
+				      valid);
+		if (rc && rc != -ENODATA)
+			goto finish_req;
+
+		/* Get default LMV EA */
+		if (lum.lum_magic == LMV_USER_MAGIC) {
+			if (rc)
+				goto finish_req;
+
+			if (lmmsize > sizeof(*ulmv)) {
+				rc = -EINVAL;
+				goto finish_req;
+			}
+
+			if (copy_to_user(ulmv, lmm, lmmsize))
+				rc = -EFAULT;
+
+			goto finish_req;
+		}
+
+		/* Get normal LMV EA */
+		if (rc == -ENODATA) {
+			stripe_count = 1;
+		} else {
+			LASSERT(lmm);
+			stripe_count = lmv_mds_md_stripe_count_get(lmm);
+		}
+
+		lum_size = lmv_user_md_size(stripe_count, LMV_MAGIC_V1);
 		tmp = kzalloc(lum_size, GFP_NOFS);
 		if (!tmp) {
 			rc = -ENOMEM;
-			goto free_lmv;
+			goto finish_req;
 		}
 
-		*tmp = lum;
+		tmp->lum_magic = LMV_MAGIC_V1;
 		tmp->lum_stripe_count = 1;
-		mdtindex = ll_get_mdt_idx(inode);
-		if (mdtindex < 0) {
+		mdt_index = ll_get_mdt_idx(inode);
+		if (mdt_index < 0) {
 			rc = -ENOMEM;
-			goto free_lmv;
+			goto out_tmp;
+		}
+		tmp->lum_stripe_offset = mdt_index;
+		tmp->lum_objects[0].lum_mds = mdt_index;
+		tmp->lum_objects[0].lum_fid = *ll_inode2fid(inode);
+		for (i = 1; i < stripe_count; i++) {
+			struct lmv_mds_md_v1 *lmm1;
+
+			lmm1 = &lmm->lmv_md_v1;
+			mdt_index = ll_get_mdt_idx_by_fid(sbi,
+							  &lmm1->lmv_stripe_fids[i]);
+			if (mdt_index < 0) {
+				rc = mdt_index;
+				goto out_tmp;
+			}
+			tmp->lum_objects[i].lum_mds = mdt_index;
+			tmp->lum_objects[i].lum_fid = lmm1->lmv_stripe_fids[i];
+			tmp->lum_stripe_count++;
 		}
 
-		tmp->lum_stripe_offset = mdtindex;
-		tmp->lum_objects[0].lum_mds = mdtindex;
-		memcpy(&tmp->lum_objects[0].lum_fid, ll_inode2fid(inode),
-		       sizeof(struct lu_fid));
-		if (copy_to_user((void __user *)arg, tmp, lum_size)) {
+		if (copy_to_user(ulmv, tmp, lum_size)) {
 			rc = -EFAULT;
-			goto free_lmv;
+			goto out_tmp;
 		}
-free_lmv:
+out_tmp:
 		kfree(tmp);
+finish_req:
+		ptlrpc_req_finished(request);
 		return rc;
 	}
+
 	case LL_IOC_LOV_SWAP_LAYOUTS:
 		return -EPERM;
 	case LL_IOC_OBD_STATFS:
@@ -1484,7 +1585,8 @@ free_lmv:
 			rc = ll_lov_getstripe_ea_info(inode, filename, &lmm,
 						      &lmmsize, &request);
 		} else {
-			rc = ll_dir_getstripe(inode, &lmm, &lmmsize, &request);
+			rc = ll_dir_getstripe(inode, (void **)&lmm, &lmmsize,
+					      &request, 0);
 		}
 
 		if (request) {
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index f3b8504..82c3a88 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -728,8 +728,8 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 			     struct ptlrpc_request **request);
 int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump,
 		     int set_default);
-int ll_dir_getstripe(struct inode *inode, struct lov_mds_md **lmmp,
-		     int *lmm_size, struct ptlrpc_request **request);
+int ll_dir_getstripe(struct inode *inode, void **lmmp, int *lmm_size,
+		     struct ptlrpc_request **request, u64 valid);
 int ll_fsync(struct file *file, loff_t start, loff_t end, int data);
 int ll_merge_attr(const struct lu_env *env, struct inode *inode);
 int ll_fid2path(struct inode *inode, void __user *arg);
diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index a02b802..aa0738b 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -390,8 +390,8 @@ static int ll_xattr_get(const struct xattr_handler *handler,
 		lsm = ccc_inode_lsm_get(inode);
 		if (!lsm) {
 			if (S_ISDIR(inode->i_mode)) {
-				rc = ll_dir_getstripe(inode, &lmm,
-						      &lmmsize, &request);
+				rc = ll_dir_getstripe(inode, (void **)&lmm,
+						      &lmmsize, &request, 0);
 			} else {
 				rc = -ENODATA;
 			}
@@ -491,7 +491,8 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size)
 		if (!ll_i2info(inode)->lli_has_smd)
 			rc2 = -1;
 	} else if (S_ISDIR(inode->i_mode)) {
-		rc2 = ll_dir_getstripe(inode, &lmm, &lmmsize, &request);
+		rc2 = ll_dir_getstripe(inode, (void **)&lmm, &lmmsize,
+				       &request, 0);
 	}
 
 	if (rc2 < 0) {
diff --git a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
index 07e23d1..6ddc9c7 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
@@ -1878,6 +1878,32 @@ void lustre_swab_lov_desc(struct lov_desc *ld)
 }
 EXPORT_SYMBOL(lustre_swab_lov_desc);
 
+/* This structure is always in little-endian */
+static void lustre_swab_lmv_mds_md_v1(struct lmv_mds_md_v1 *lmm1)
+{
+	int i;
+
+	__swab32s(&lmm1->lmv_magic);
+	__swab32s(&lmm1->lmv_stripe_count);
+	__swab32s(&lmm1->lmv_master_mdt_index);
+	__swab32s(&lmm1->lmv_hash_type);
+	__swab32s(&lmm1->lmv_layout_version);
+	for (i = 0; i < lmm1->lmv_stripe_count; i++)
+		lustre_swab_lu_fid(&lmm1->lmv_stripe_fids[i]);
+}
+
+void lustre_swab_lmv_mds_md(union lmv_mds_md *lmm)
+{
+	switch (lmm->lmv_magic) {
+	case LMV_MAGIC_V1:
+		lustre_swab_lmv_mds_md_v1(&lmm->lmv_md_v1);
+		break;
+	default:
+		break;
+	}
+}
+EXPORT_SYMBOL(lustre_swab_lmv_mds_md);
+
 void lustre_swab_lmv_user_md(struct lmv_user_md *lum)
 {
 	__swab32s(&lum->lum_magic);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 14/80] staging: lustre: delete striped directory
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Add support for deleting striped directories. This includes:

1. Enable the sync log between MDTs, so that slave objects are
   deleted via the unlink log, similar to how OST objects are
   deleted.

2. Retrieve the layout information of the striped directory on
   the MDT, then lock all of the slave objects before unlink.
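
One client-side detail in the diff is worth spelling out: handling of
a stripe offset of (__u32)-1 moves out of ll_dir_setdirstripe() and
into lmv_placement_policy(), where -1 is treated as "default" and
resolved to the MDT index already held in op_data->op_mds. The rule
can be sketched in userspace as follows. This is an illustration
only: struct demo_lum, DEMO_OFFSET_DEFAULT and demo_place() are
hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker mirroring the (__u32)-1 "default" offset. */
#define DEMO_OFFSET_DEFAULT ((uint32_t)-1)

/* Hypothetical, reduced stand-in for struct lmv_user_md. */
struct demo_lum {
	uint32_t stripe_offset;
};

/*
 * Choose the target MDT index. An explicit offset is used as-is;
 * the default marker resolves to op_mds and is written back into
 * the request so later code sees a concrete index.
 */
static void demo_place(struct demo_lum *lum, uint32_t op_mds, uint32_t *mds)
{
	assert(lum && mds);
	if (lum->stripe_offset != DEMO_OFFSET_DEFAULT) {
		*mds = lum->stripe_offset;
	} else {
		*mds = op_mds;
		lum->stripe_offset = op_mds;
	}
}
```

With stripe_offset = 2 and op_mds = 5 the sketch selects MDT 2; with
stripe_offset = DEMO_OFFSET_DEFAULT it selects MDT 5 and rewrites the
offset to 5.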

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-on: http://review.whamcloud.com/7445
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/lustre_dlm.h |    1 +
 drivers/staging/lustre/lustre/include/lustre_fid.h |    1 +
 drivers/staging/lustre/lustre/llite/dir.c          |   10 ---
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |    5 ++
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |   75 +++++++++++++++-----
 5 files changed, 65 insertions(+), 27 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm.h b/drivers/staging/lustre/lustre/include/lustre_dlm.h
index 60051a5..f7805cc 100644
--- a/drivers/staging/lustre/lustre/include/lustre_dlm.h
+++ b/drivers/staging/lustre/lustre/include/lustre_dlm.h
@@ -968,6 +968,7 @@ struct ldlm_enqueue_info {
 	void *ei_cb_cp;  /** lock completion callback */
 	void *ei_cb_gl;  /** lock glimpse callback */
 	void *ei_cbdata; /** Data to be passed into callbacks. */
+	unsigned int ei_enq_slave:1; /* whether enqueue slave stripes */
 };
 
 extern struct obd_ops ldlm_obd_ops;
diff --git a/drivers/staging/lustre/lustre/include/lustre_fid.h b/drivers/staging/lustre/lustre/include/lustre_fid.h
index 743671a..61f3930 100644
--- a/drivers/staging/lustre/lustre/include/lustre_fid.h
+++ b/drivers/staging/lustre/lustre/include/lustre_fid.h
@@ -229,6 +229,7 @@ enum local_oid {
 	MDD_LOV_OBJ_OSEQ	= 4121UL,
 	LFSCK_NAMESPACE_OID     = 4122UL,
 	REMOTE_PARENT_DIR_OID	= 4123UL,
+	SLAVE_LLOG_CATALOGS_OID	= 4124UL,
 };
 
 static inline void lu_local_obj_fid(struct lu_fid *fid, __u32 oid)
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 5288750..96ae7d5 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -679,16 +679,6 @@ static int ll_dir_setdirstripe(struct inode *dir, struct lmv_user_md *lump,
 	if (unlikely(lump->lum_magic != LMV_USER_MAGIC))
 		return -EINVAL;
 
-	if (lump->lum_stripe_offset == (__u32)-1) {
-		int mdtidx;
-
-		mdtidx = ll_get_mdt_idx(dir);
-		if (mdtidx < 0)
-			return mdtidx;
-
-		lump->lum_stripe_offset = mdtidx;
-	}
-
 	CDEBUG(D_VFSTRACE, "VFS Op:inode="DFID"(%p) name %s stripe_offset %d, stripe_count: %u\n",
 	       PFID(ll_inode2fid(dir)), dir, filename,
 	       (int)lump->lum_stripe_offset, lump->lum_stripe_count);
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index 1b9bbb2..5313dfc 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -244,6 +244,11 @@ update:
 				if (req)
 					ptlrpc_req_finished(req);
 
+				if (it.it_lock_mode && lockh) {
+					ldlm_lock_decref(lockh, it.it_lock_mode);
+					it.it_lock_mode = 0;
+				}
+
 				rc = -EIO;
 				goto cleanup;
 			}
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index da4855d..81dcc0a 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -101,6 +101,9 @@ int lmv_name_to_stripe_index(enum lmv_hash_type hashtype,
 		return -EINVAL;
 	}
 
+	CDEBUG(D_INFO, "name %.*s hash_type %d idx %d\n", namelen, name,
+	       hashtype, idx);
+
 	LASSERT(idx < max_mdt_index);
 	return idx;
 }
@@ -1230,7 +1233,16 @@ static int lmv_placement_policy(struct obd_device *obd,
 		struct lmv_user_md *lum;
 
 		lum = op_data->op_data;
-		*mds = lum->lum_stripe_offset;
+		if (lum->lum_stripe_offset != (__u32)-1) {
+			*mds = lum->lum_stripe_offset;
+		} else {
+			/*
+			 * -1 means default, which will be in the same MDT with
+			 * the stripe
+			 */
+			*mds = op_data->op_mds;
+			lum->lum_stripe_offset = op_data->op_mds;
+		}
 	} else {
 		/*
 		 * Allocate new fid on target according to operation type and
@@ -1646,12 +1658,28 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data,
  * For striped-directory, it will locate MDT by name. And also
  * it will reset op_fid1 with the FID of the chosen stripe.
  **/
+struct lmv_tgt_desc *
+lmv_locate_target_for_name(struct lmv_obd *lmv, struct lmv_stripe_md *lsm,
+			   const char *name, int namelen, struct lu_fid *fid,
+			   u32 *mds)
+{
+	const struct lmv_oinfo *oinfo;
+	struct lmv_tgt_desc *tgt;
+
+	oinfo = lsm_name_to_stripe_info(lsm, name, namelen);
+	*fid = oinfo->lmo_fid;
+	*mds = oinfo->lmo_mds;
+	tgt = lmv_get_target(lmv, *mds);
+
+	CDEBUG(D_INFO, "locate on mds %u "DFID"\n", *mds, PFID(fid));
+	return tgt;
+}
+
 struct lmv_tgt_desc
 *lmv_locate_mds(struct lmv_obd *lmv, struct md_op_data *op_data,
 		struct lu_fid *fid)
 {
 	struct lmv_stripe_md *lsm = op_data->op_mea1;
-	const struct lmv_oinfo *oinfo;
 	struct lmv_tgt_desc *tgt;
 
 	if (!lsm || lsm->lsm_md_stripe_count <= 1 ||
@@ -1665,15 +1693,9 @@ struct lmv_tgt_desc
 		return tgt;
 	}
 
-	oinfo = lsm_name_to_stripe_info(lsm, op_data->op_name,
-					op_data->op_namelen);
-	*fid = oinfo->lmo_fid;
-	op_data->op_mds = oinfo->lmo_mds;
-	tgt = lmv_get_target(lmv, op_data->op_mds);
-
-	CDEBUG(D_INFO, "locate on mds %u\n", op_data->op_mds);
-
-	return tgt;
+	return lmv_locate_target_for_name(lmv, lsm, op_data->op_name,
+					  op_data->op_namelen, fid,
+					  &op_data->op_mds);
 }
 
 static int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
@@ -2075,6 +2097,9 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data,
 				      LCK_EX, MDS_INODELOCK_FULL,
 				      MF_MDC_CANCEL_FID4);
 
+	CDEBUG(D_INODE, DFID":m%d to "DFID"\n", PFID(&op_data->op_fid1),
+	       op_data->op_mds, PFID(&op_data->op_fid2));
+
 	if (rc == 0)
 		rc = md_rename(src_tgt->ltd_exp, op_data, old, oldlen,
 			       new, newlen, request);
@@ -2288,12 +2313,26 @@ static int lmv_unlink(struct obd_export *exp, struct md_op_data *op_data,
 		return rc;
 retry:
 	/* Send unlink requests to the MDT where the child is located */
-	if (likely(!fid_is_zero(&op_data->op_fid2)))
-		tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid2);
-	else
+	if (likely(!fid_is_zero(&op_data->op_fid2))) {
+		tgt = lmv_find_target(lmv, &op_data->op_fid2);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+
+		/* For striped dir, we need to locate the parent as well */
+		if (op_data->op_mea1 &&
+		    op_data->op_mea1->lsm_md_stripe_count > 1) {
+			LASSERT(op_data->op_name && op_data->op_namelen);
+			lmv_locate_target_for_name(lmv, op_data->op_mea1,
+						   op_data->op_name,
+						   op_data->op_namelen,
+						   &op_data->op_fid1,
+						   &op_data->op_mds);
+		}
+	} else {
 		tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
-	if (IS_ERR(tgt))
-		return PTR_ERR(tgt);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+	}
 
 	op_data->op_fsuid = from_kuid(&init_user_ns, current_fsuid());
 	op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid());
@@ -2799,8 +2838,10 @@ static int lmv_free_lustre_md(struct obd_export *exp, struct lustre_md *md)
 	struct lmv_obd	  *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc *tgt = lmv->tgts[0];
 
-	if (md->lmv)
+	if (md->lmv) {
 		lmv_free_memmd(md->lmv);
+		md->lmv = NULL;
+	}
 	if (!tgt || !tgt->ltd_exp)
 		return -EINVAL;
 	return md_free_lustre_md(tgt->ltd_exp, md);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 15/80] staging: lustre: obdclass: fix lmd_parse() to handle comma-separated NIDs
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Jian Yu,
	James Simmons

From: Jian Yu <jian.yu@intel.com>

This patch handles the upgrade case in which old mountdata already
contains comma-separated NIDs. The correct way to fix the original
issue is to parse comma-separated NIDs in lmd_parse().
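
The length computation in the "param=" branch of the diff can be
sketched in userspace as follows. This is a simplified illustration,
not the kernel code: class_parse_nid_quiet() is not reproduced here,
and demo_parse_nid() is a hypothetical stand-in that treats any token
containing '@' as a NID. The idea is that after the first comma the
parser keeps consuming tokens for as long as they look like NIDs, so
a value such as "mgsnode=A@tcp,B@tcp" is kept whole instead of being
cut at the first comma.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for class_parse_nid_quiet(): skip leading
 * separators, then accept the next token only if it contains '@'
 * (for example "10.0.0.2@tcp"). Returns 0 and sets *end to the
 * delimiter after the token on success, 1 otherwise.
 */
static int demo_parse_nid(const char *buf, const char **end)
{
	const char *stop;

	while (*buf == ',' || *buf == ':')
		buf++;
	if (*buf == ' ' || *buf == '/' || *buf == '\0')
		return 1;
	stop = buf + strcspn(buf, ",: /");
	if (!memchr(buf, '@', (size_t)(stop - buf)))
		return 1;
	*end = stop;
	return 0;
}

/* Length of the value after "param=", keeping any trailing NIDs. */
static size_t demo_param_value_len(const char *opt)
{
	const char *tail;
	size_t length;

	assert(opt && strncmp(opt, "param=", 6) == 0);
	tail = strchr(opt + 6, ',');
	if (!tail) {
		length = strlen(opt);
	} else {
		const char *p = tail + 1;
		int supplementary = 1;	/* stays 1 if no NID follows */

		while (!demo_parse_nid(p, &p))
			supplementary = 0;
		/* with no NID, p == tail + 1, so this yields tail - opt */
		length = (size_t)(p - opt) - supplementary;
	}
	return length - 6;	/* drop the "param=" prefix */
}
```

For "param=mgsnode=10.0.0.1@tcp,10.0.0.2@tcp,flock" the sketch keeps
both NIDs in the value and stops before "flock"; for
"param=foo=bar,flock" it stops at the first comma, as the code did
before this change.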

Signed-off-by: Jian Yu <jian.yu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4460
Reviewed-on: http://review.whamcloud.com/8918
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Sebastien Buisson <sebastien.buisson@bull.net>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/obdclass/obd_mount.c |   20 ++++++++++++++++----
 1 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/obd_mount.c b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
index aa84a50..4931e37 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_mount.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
@@ -880,7 +880,7 @@ static int lmd_parse_mgs(struct lustre_mount_data *lmd, char **ptr)
  */
 static int lmd_parse(char *options, struct lustre_mount_data *lmd)
 {
-	char *s1, *s2, *devname = NULL;
+	char *s1, *s2, *s3, *devname = NULL;
 	struct lustre_mount_data *raw = (struct lustre_mount_data *)options;
 	int rc = 0;
 
@@ -913,6 +913,7 @@ static int lmd_parse(char *options, struct lustre_mount_data *lmd)
 		/* Skip whitespace and extra commas */
 		while (*s1 == ' ' || *s1 == ',')
 			s1++;
+		s3 = s1;
 
 		/* Client options are parsed in ll_options: eg. flock,
 		 * user_xattr, acl
@@ -970,6 +971,7 @@ static int lmd_parse(char *options, struct lustre_mount_data *lmd)
 			rc = lmd_parse_mgssec(lmd, s1 + 7);
 			if (rc)
 				goto invalid;
+			s3 = s2;
 			clear++;
 		/* ost exclusion list */
 		} else if (strncmp(s1, "exclude=", 8) == 0) {
@@ -990,10 +992,19 @@ static int lmd_parse(char *options, struct lustre_mount_data *lmd)
 			size_t length, params_length;
 			char *tail = strchr(s1 + 6, ',');
 
-			if (!tail)
+			if (!tail) {
 				length = strlen(s1);
-			else
-				length = tail - s1;
+			} else {
+				lnet_nid_t nid;
+				char *param_str = tail + 1;
+				int supplementary = 1;
+
+				while (!class_parse_nid_quiet(param_str, &nid,
+							      &param_str)) {
+					supplementary = 0;
+				}
+				length = param_str - s1 - supplementary;
+			}
 			length -= 6;
 			params_length = strlen(lmd->lmd_params);
 			if (params_length + length + 1 >= LMD_PARAMS_MAXLEN)
@@ -1001,6 +1012,7 @@ static int lmd_parse(char *options, struct lustre_mount_data *lmd)
 			strncat(lmd->lmd_params, s1 + 6, length);
 			lmd->lmd_params[params_length + length] = '\0';
 			strlcat(lmd->lmd_params, " ", LMD_PARAMS_MAXLEN);
+			s3 = s1 + 6 + length;
 			clear++;
 		} else if (strncmp(s1, "osd=", 4) == 0) {
 			rc = lmd_parse_string(&lmd->lmd_osd_type, s1 + 4);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [lustre-devel] [PATCH 15/80] staging: lustre: obdclass: fix lmd_parse() to handle comma-separated NIDs
@ 2016-08-16 20:18   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Jian Yu,
	James Simmons

From: Jian Yu <jian.yu@intel.com>

This patch handles the upgrade case in which old mountdata already
contains comma-separated NIDs. The correct way to fix the original
issue is to parse comma-separated NIDs in lmd_parse().

Signed-off-by: Jian Yu <jian.yu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4460
Reviewed-on: http://review.whamcloud.com/8918
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Sebastien Buisson <sebastien.buisson@bull.net>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/obdclass/obd_mount.c |   20 ++++++++++++++++----
 1 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/obd_mount.c b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
index aa84a50..4931e37 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_mount.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
@@ -880,7 +880,7 @@ static int lmd_parse_mgs(struct lustre_mount_data *lmd, char **ptr)
  */
 static int lmd_parse(char *options, struct lustre_mount_data *lmd)
 {
-	char *s1, *s2, *devname = NULL;
+	char *s1, *s2, *s3, *devname = NULL;
 	struct lustre_mount_data *raw = (struct lustre_mount_data *)options;
 	int rc = 0;
 
@@ -913,6 +913,7 @@ static int lmd_parse(char *options, struct lustre_mount_data *lmd)
 		/* Skip whitespace and extra commas */
 		while (*s1 == ' ' || *s1 == ',')
 			s1++;
+		s3 = s1;
 
 		/* Client options are parsed in ll_options: eg. flock,
 		 * user_xattr, acl
@@ -970,6 +971,7 @@ static int lmd_parse(char *options, struct lustre_mount_data *lmd)
 			rc = lmd_parse_mgssec(lmd, s1 + 7);
 			if (rc)
 				goto invalid;
+			s3 = s2;
 			clear++;
 		/* ost exclusion list */
 		} else if (strncmp(s1, "exclude=", 8) == 0) {
@@ -990,10 +992,19 @@ static int lmd_parse(char *options, struct lustre_mount_data *lmd)
 			size_t length, params_length;
 			char *tail = strchr(s1 + 6, ',');
 
-			if (!tail)
+			if (!tail) {
 				length = strlen(s1);
-			else
-				length = tail - s1;
+			} else {
+				lnet_nid_t nid;
+				char *param_str = tail + 1;
+				int supplementary = 1;
+
+				while (!class_parse_nid_quiet(param_str, &nid,
+							      &param_str)) {
+					supplementary = 0;
+				}
+				length = param_str - s1 - supplementary;
+			}
 			length -= 6;
 			params_length = strlen(lmd->lmd_params);
 			if (params_length + length + 1 >= LMD_PARAMS_MAXLEN)
@@ -1001,6 +1012,7 @@ static int lmd_parse(char *options, struct lustre_mount_data *lmd)
 			strncat(lmd->lmd_params, s1 + 6, length);
 			lmd->lmd_params[params_length + length] = '\0';
 			strlcat(lmd->lmd_params, " ", LMD_PARAMS_MAXLEN);
+			s3 = s1 + 6 + length;
 			clear++;
 		} else if (strncmp(s1, "osd=", 4) == 0) {
 			rc = lmd_parse_string(&lmd->lmd_osd_type, s1 + 4);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 16/80] staging: lustre: obdclass: bug fixes for lu_device_type handling
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Fan Yong,
	James Simmons

From: Fan Yong <fan.yong@intel.com>

There was no protection when incrementing or decrementing
lu_device_type::ldt_device_nr, which may cause ldt_device_nr to be
wrong and trigger an assertion. This patch redefines
lu_device_type::ldt_device_nr as an atomic type.

There was no protection when adding lu_device_type::ldt_linkage to,
or deleting it from, the global lu_device_types list, which may cause
bad address accesses. This patch uses the existing obd_types_lock
to protect these operations.

lu_types_stop() is no longer needed. That function scans the global
lu_device_types list and, for each type on it whose
lu_device_type::ldt_device_nr is zero, calls its stop() method. In
fact, ldt_device_nr only drops to zero when the last lu_device_fini()
is called, and lu_device_fini() itself calls the stop() method at
that point. So calling stop() again via lu_types_stop() is
unnecessary.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4604
Reviewed-on: http://review.whamcloud.com/8694
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
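The first-start/last-stop counting described above can be sketched in
userspace as follows. This is only an analogue, assuming C11 <stdatomic.h>
in place of the kernel's atomic_t; dev_type, dev_type_init(), dev_init()
and dev_fini() are hypothetical stand-ins for lu_device_type,
lu_device_type_init(), lu_device_init() and lu_device_fini(), and the
started/stopped counters exist purely to make the hooks observable.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct lu_device_type. */
struct dev_type {
	atomic_int device_nr;	/* number of live devices of this type */
	int started;		/* times the start hook ran (illustration) */
	int stopped;		/* times the stop hook ran (illustration) */
};

static void dev_type_init(struct dev_type *t)
{
	atomic_init(&t->device_nr, 0);
	t->started = 0;
	t->stopped = 0;
}

/* Mirrors atomic_inc_return() == 1: only the first device starts the type. */
static void dev_init(struct dev_type *t)
{
	/* atomic_fetch_add() returns the old value; old 0 means new 1. */
	if (atomic_fetch_add(&t->device_nr, 1) == 0)
		t->started++;
}

/* Mirrors atomic_dec_and_test(): only the last device stops the type. */
static void dev_fini(struct dev_type *t)
{
	assert(atomic_load(&t->device_nr) > 0);
	/* Old value 1 means the counter has just reached zero. */
	if (atomic_fetch_sub(&t->device_nr, 1) == 1)
		t->stopped++;
}
```

Creating two devices and destroying both runs each hook once, which is why
a separate pass that re-checks for a zero count adds nothing in this model.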
 drivers/staging/lustre/lustre/include/lu_object.h  |    3 +-
 drivers/staging/lustre/lustre/llite/vvp_dev.c      |    6 ---
 drivers/staging/lustre/lustre/obdclass/lu_object.c |   34 ++++++++++----------
 drivers/staging/lustre/lustre/obdclass/obd_mount.c |    1 -
 4 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lu_object.h b/drivers/staging/lustre/lustre/include/lu_object.h
index 6e25c1b..25c12d8 100644
--- a/drivers/staging/lustre/lustre/include/lu_object.h
+++ b/drivers/staging/lustre/lustre/include/lu_object.h
@@ -327,7 +327,7 @@ struct lu_device_type {
 	/**
 	 * Number of existing device type instances.
 	 */
-	unsigned				ldt_device_nr;
+	atomic_t				ldt_device_nr;
 	/**
 	 * Linkage into a global list of all device types.
 	 *
@@ -673,7 +673,6 @@ void lu_object_add(struct lu_object *before, struct lu_object *o);
 
 int  lu_device_type_init(struct lu_device_type *ldt);
 void lu_device_type_fini(struct lu_device_type *ldt);
-void lu_types_stop(void);
 
 /** @} ctors */
 
diff --git a/drivers/staging/lustre/lustre/llite/vvp_dev.c b/drivers/staging/lustre/lustre/llite/vvp_dev.c
index e623216..771c0bd 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_dev.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_dev.c
@@ -368,12 +368,6 @@ int cl_sb_fini(struct super_block *sb)
 		CERROR("Cannot cleanup cl-stack due to memory shortage.\n");
 		result = PTR_ERR(env);
 	}
-	/*
-	 * If mount failed (sbi->ll_cl == NULL), and this there are no other
-	 * mounts, stop device types manually (this usually happens
-	 * automatically when last device is destroyed).
-	 */
-	lu_types_stop();
 	return result;
 }
 
diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index 9b03059..0c00bf8 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -726,34 +726,31 @@ int lu_device_type_init(struct lu_device_type *ldt)
 {
 	int result = 0;
 
+	atomic_set(&ldt->ldt_device_nr, 0);
 	INIT_LIST_HEAD(&ldt->ldt_linkage);
 	if (ldt->ldt_ops->ldto_init)
 		result = ldt->ldt_ops->ldto_init(ldt);
-	if (result == 0)
+
+	if (!result) {
+		spin_lock(&obd_types_lock);
 		list_add(&ldt->ldt_linkage, &lu_device_types);
+		spin_unlock(&obd_types_lock);
+	}
+
 	return result;
 }
 EXPORT_SYMBOL(lu_device_type_init);
 
 void lu_device_type_fini(struct lu_device_type *ldt)
 {
+	spin_lock(&obd_types_lock);
 	list_del_init(&ldt->ldt_linkage);
+	spin_unlock(&obd_types_lock);
 	if (ldt->ldt_ops->ldto_fini)
 		ldt->ldt_ops->ldto_fini(ldt);
 }
 EXPORT_SYMBOL(lu_device_type_fini);
 
-void lu_types_stop(void)
-{
-	struct lu_device_type *ldt;
-
-	list_for_each_entry(ldt, &lu_device_types, ldt_linkage) {
-		if (ldt->ldt_device_nr == 0 && ldt->ldt_ops->ldto_stop)
-			ldt->ldt_ops->ldto_stop(ldt);
-	}
-}
-EXPORT_SYMBOL(lu_types_stop);
-
 /**
  * Global list of all sites on this node
  */
@@ -1082,8 +1079,10 @@ EXPORT_SYMBOL(lu_device_put);
  */
 int lu_device_init(struct lu_device *d, struct lu_device_type *t)
 {
-	if (t->ldt_device_nr++ == 0 && t->ldt_ops->ldto_start)
+	if (atomic_inc_return(&t->ldt_device_nr) == 1 &&
+	    t->ldt_ops->ldto_start)
 		t->ldt_ops->ldto_start(t);
+
 	memset(d, 0, sizeof(*d));
 	atomic_set(&d->ld_ref, 0);
 	d->ld_type = t;
@@ -1098,9 +1097,8 @@ EXPORT_SYMBOL(lu_device_init);
  */
 void lu_device_fini(struct lu_device *d)
 {
-	struct lu_device_type *t;
+	struct lu_device_type *t = d->ld_type;
 
-	t = d->ld_type;
 	if (d->ld_obd) {
 		d->ld_obd->obd_lu_dev = NULL;
 		d->ld_obd = NULL;
@@ -1109,8 +1107,10 @@ void lu_device_fini(struct lu_device *d)
 	lu_ref_fini(&d->ld_reference);
 	LASSERTF(atomic_read(&d->ld_ref) == 0,
 		 "Refcount is %u\n", atomic_read(&d->ld_ref));
-	LASSERT(t->ldt_device_nr > 0);
-	if (--t->ldt_device_nr == 0 && t->ldt_ops->ldto_stop)
+	LASSERT(atomic_read(&t->ldt_device_nr) > 0);
+
+	if (atomic_dec_and_test(&t->ldt_device_nr) &&
+	    t->ldt_ops->ldto_stop)
 		t->ldt_ops->ldto_stop(t);
 }
 EXPORT_SYMBOL(lu_device_fini);
diff --git a/drivers/staging/lustre/lustre/obdclass/obd_mount.c b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
index 4931e37..33d6c42 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_mount.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
@@ -670,7 +670,6 @@ int lustre_common_put_super(struct super_block *sb)
 	}
 	/* Drop a ref to the mounted disk */
 	lustre_put_lsi(sb);
-	lu_types_stop();
 	return rc;
 }
 EXPORT_SYMBOL(lustre_common_put_super);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 17/80] staging: lustre: add ability to migrate inodes.
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Add client support for migrating individual inodes from one MDT
to another. This only migrates the inode layout on the MDT; it
does not touch the data objects on the OSTs.

A directory is migrated from the top down, i.e. the parent is
migrated first, then its children.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2430
Reviewed-on: http://review.whamcloud.com/6662
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
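The parent-first ordering mentioned above is a pre-order walk. The sketch
below only illustrates that ordering on a hypothetical in-memory tree; it
does not show how this patch or any userspace tool actually walks a
directory, and struct node and migrate_order() are names invented for the
example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory directory tree node; not a Lustre structure. */
struct node {
	const char *name;
	struct node *child;	/* first entry inside this directory */
	struct node *sibling;	/* next entry in the same directory */
};

/*
 * Record names in the order they would be migrated: each parent before
 * any of its children. Returns the number of entries written to out.
 */
static size_t migrate_order(const struct node *n, const char **out,
			    size_t max)
{
	size_t used = 0;

	assert(out);
	for (; n && used < max; n = n->sibling) {
		out[used++] = n->name;	/* parent first */
		used += migrate_order(n->child, out + used, max - used);
	}
	return used;
}
```

For a tree a -> { b -> { c }, d } the recorded order is a, b, c, d, so no
child is visited before the directory that contains it.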
 .../lustre/lustre/include/lustre/lustre_idl.h      |   11 +-
 .../lustre/lustre/include/lustre/lustre_user.h     |    1 +
 drivers/staging/lustre/lustre/include/lustre_lmv.h |    2 +
 drivers/staging/lustre/lustre/include/obd.h        |   12 +--
 drivers/staging/lustre/lustre/llite/dir.c          |   43 +++++-
 drivers/staging/lustre/lustre/llite/file.c         |  113 ++++++++++++-
 .../staging/lustre/lustre/llite/llite_internal.h   |   14 ++-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   33 ++++-
 drivers/staging/lustre/lustre/llite/namei.c        |    3 +-
 drivers/staging/lustre/lustre/llite/rw.c           |    4 +
 drivers/staging/lustre/lustre/llite/statahead.c    |    1 +
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |   32 +++-
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  176 ++++++++++++++------
 drivers/staging/lustre/lustre/mdc/mdc_lib.c        |    2 +
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    |    4 +-
 15 files changed, 368 insertions(+), 83 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 0ff30c6..6853f62 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -1482,6 +1482,7 @@ enum obdo_flags {
 #define LOV_MAGIC	 LOV_MAGIC_V1
 #define LOV_MAGIC_JOIN_V1 0x0BD20BD0
 #define LOV_MAGIC_V3      0x0BD30BD0
+#define LOV_MAGIC_MIGRATE 0x0BD40BD0
 
 /*
  * magic for fully defined striping
@@ -1987,7 +1988,7 @@ enum mdt_reint_cmd {
 	REINT_OPEN     = 6,
 	REINT_SETXATTR = 7,
 	REINT_RMENTRY  = 8,
-/*      REINT_WRITE    = 9, */
+	REINT_MIGRATE  = 9,
 	REINT_MAX
 };
 
@@ -2280,6 +2281,7 @@ enum mds_op_bias {
 	MDS_CREATE_VOLATILE	= 1 << 10,
 	MDS_OWNEROVERRIDE	= 1 << 11,
 	MDS_HSM_RELEASE		= 1 << 12,
+	MDS_RENAME_MIGRATE	= BIT(13),
 };
 
 /* instance of mdt_reint_rec */
@@ -2488,11 +2490,13 @@ struct lmv_desc {
 /* lmv structures */
 #define LMV_MAGIC_V1	0x0CD10CD0	/* normal stripe lmv magic */
 #define LMV_USER_MAGIC	0x0CD20CD0	/* default lmv magic*/
+#define LMV_MAGIC_MIGRATE	0x0CD30CD0	/* migrate stripe lmv magic */
 #define LMV_MAGIC	LMV_MAGIC_V1
 
 enum lmv_hash_type {
 	LMV_HASH_TYPE_ALL_CHARS = 1,
 	LMV_HASH_TYPE_FNV_1A_64 = 2,
+	LMV_HASH_TYPE_MIGRATION = 3,
 };
 
 #define LMV_HASH_NAME_ALL_CHARS		"all_char"
@@ -2552,7 +2556,8 @@ static inline ssize_t lmv_mds_md_size(int stripe_count, unsigned int lmm_magic)
 	ssize_t len = -EINVAL;
 
 	switch (lmm_magic) {
-	case LMV_MAGIC_V1: {
+	case LMV_MAGIC_V1:
+	case LMV_MAGIC_MIGRATE: {
 		struct lmv_mds_md_v1 *lmm1;
 
 		len = sizeof(*lmm1);
@@ -2568,6 +2573,7 @@ static inline int lmv_mds_md_stripe_count_get(const union lmv_mds_md *lmm)
 {
 	switch (le32_to_cpu(lmm->lmv_magic)) {
 	case LMV_MAGIC_V1:
+	case LMV_MAGIC_MIGRATE:
 		return le32_to_cpu(lmm->lmv_md_v1.lmv_stripe_count);
 	case LMV_USER_MAGIC:
 		return le32_to_cpu(lmm->lmv_user_md.lum_stripe_count);
@@ -2583,6 +2589,7 @@ static inline int lmv_mds_md_stripe_count_set(union lmv_mds_md *lmm,
 
 	switch (le32_to_cpu(lmm->lmv_magic)) {
 	case LMV_MAGIC_V1:
+	case LMV_MAGIC_MIGRATE:
 		lmm->lmv_md_v1.lmv_stripe_count = cpu_to_le32(stripe_count);
 		break;
 	case LMV_USER_MAGIC:
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index 26dbda0..4746320 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -243,6 +243,7 @@ struct ost_id {
 #define LL_IOC_GET_LEASE		_IO('f', 244)
 #define LL_IOC_HSM_IMPORT		_IOWR('f', 245, struct hsm_user_import)
 #define LL_IOC_LMV_SET_DEFAULT_STRIPE	_IOWR('f', 246, struct lmv_user_md)
+#define LL_IOC_MIGRATE			_IOR('f', 247, int)
 
 #define LL_STATFS_LMV	   1
 #define LL_STATFS_LOV	   2
diff --git a/drivers/staging/lustre/lustre/include/lustre_lmv.h b/drivers/staging/lustre/lustre/include/lustre_lmv.h
index 4036fce..feee981 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lmv.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lmv.h
@@ -106,6 +106,7 @@ static inline void lmv_cpu_to_le(union lmv_mds_md *lmv_dst,
 {
 	switch (lmv_src->lmv_magic) {
 	case LMV_MAGIC_V1:
+	case LMV_MAGIC_MIGRATE:
 		lmv1_cpu_to_le(&lmv_dst->lmv_md_v1, &lmv_src->lmv_md_v1);
 		break;
 	default:
@@ -118,6 +119,7 @@ static inline void lmv_le_to_cpu(union lmv_mds_md *lmv_dst,
 {
 	switch (le32_to_cpu(lmv_src->lmv_magic)) {
 	case LMV_MAGIC_V1:
+	case LMV_MAGIC_MIGRATE:
 		lmv1_le_to_cpu(&lmv_dst->lmv_md_v1, &lmv_src->lmv_md_v1);
 		break;
 	default:
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index a9f4e13..f5eeb05 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -847,9 +847,6 @@ struct md_op_data {
 	/* Various operation flags. */
 	enum mds_op_bias        op_bias;
 
-	/* Operation type */
-	__u32		   op_opc;
-
 	/* Used by readdir */
 	__u64		   op_offset;
 
@@ -871,6 +868,7 @@ enum op_cli_flags {
 	CLI_RM_ENTRY	= 1 << 1,
 	CLI_HASH64	= BIT(2),
 	CLI_API32	= BIT(3),
+	CLI_MIGRATE	= BIT(4),
 };
 
 struct md_enqueue_info;
@@ -1013,14 +1011,6 @@ struct obd_ops {
 	 */
 };
 
-enum {
-	LUSTRE_OPC_MKDIR    = (1 << 0),
-	LUSTRE_OPC_SYMLINK  = (1 << 1),
-	LUSTRE_OPC_MKNOD    = (1 << 2),
-	LUSTRE_OPC_CREATE   = (1 << 3),
-	LUSTRE_OPC_ANY      = (1 << 4)
-};
-
 /* lmv structures */
 struct lustre_md {
 	struct mdt_body	 *body;
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 96ae7d5..ef7322e 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -883,6 +883,7 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size,
 			lustre_swab_lov_user_md_v3((struct lov_user_md_v3 *)lmm);
 		break;
 	case LMV_USER_MAGIC:
+	case LMV_MAGIC_MIGRATE:
 		if (cpu_to_le32(LMV_USER_MAGIC) != LMV_USER_MAGIC)
 			lustre_swab_lmv_user_md((struct lmv_user_md *)lmm);
 		break;
@@ -897,8 +898,7 @@ out:
 	return rc;
 }
 
-static int ll_get_mdt_idx_by_fid(struct ll_sb_info *sbi,
-				 const struct lu_fid *fid)
+int ll_get_mdt_idx_by_fid(struct ll_sb_info *sbi, const struct lu_fid *fid)
 {
 	struct md_op_data *op_data;
 	int mdt_index, rc;
@@ -1960,6 +1960,45 @@ out_quotactl:
 		kfree(copy);
 		return rc;
 	}
+	case LL_IOC_MIGRATE: {
+		char *buf = NULL;
+		const char *filename;
+		int namelen = 0;
+		int len;
+		int rc;
+		int mdtidx;
+
+		rc = obd_ioctl_getdata(&buf, &len, (void __user *)arg);
+		if (rc < 0)
+			return rc;
+
+		data = (struct obd_ioctl_data *)buf;
+		if (!data->ioc_inlbuf1 || !data->ioc_inlbuf2 ||
+		    !data->ioc_inllen1 || !data->ioc_inllen2) {
+			rc = -EINVAL;
+			goto migrate_free;
+		}
+
+		filename = data->ioc_inlbuf1;
+		namelen = data->ioc_inllen1;
+		if (namelen < 1) {
+			rc = -EINVAL;
+			goto migrate_free;
+		}
+
+		if (data->ioc_inllen2 != sizeof(mdtidx)) {
+			rc = -EINVAL;
+			goto migrate_free;
+		}
+		mdtidx = *(int *)data->ioc_inlbuf2;
+
+		rc = ll_migrate(inode, file, mdtidx, filename, namelen);
+migrate_free:
+		obd_ioctl_freedata(buf, len);
+
+		return rc;
+	}
+
 	default:
 		return obd_iocontrol(cmd, sbi->ll_dt_exp, 0, NULL,
 				     (void __user *)arg);
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 18fb713..8d98db6 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -364,7 +364,8 @@ int ll_file_release(struct inode *inode, struct file *file)
 	}
 
 	if (!S_ISDIR(inode->i_mode)) {
-		lov_read_and_clear_async_rc(lli->lli_clob);
+		if (lli->lli_clob)
+			lov_read_and_clear_async_rc(lli->lli_clob);
 		lli->lli_async_rc = 0;
 	}
 
@@ -2593,9 +2594,11 @@ static int ll_flush(struct file *file, fl_owner_t id)
 	 */
 	rc = lli->lli_async_rc;
 	lli->lli_async_rc = 0;
-	err = lov_read_and_clear_async_rc(lli->lli_clob);
-	if (rc == 0)
-		rc = err;
+	if (lli->lli_clob) {
+		err = lov_read_and_clear_async_rc(lli->lli_clob);
+		if (!rc)
+			rc = err;
+	}
 
 	/* The application has been told about write failure already.
 	 * Do not report failure again.
@@ -2825,6 +2828,108 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 	return rc;
 }
 
+static int ll_get_fid_by_name(struct inode *parent, const char *name,
+			      int namelen, struct lu_fid *fid)
+{
+	struct md_op_data *op_data = NULL;
+	struct ptlrpc_request *req;
+	struct mdt_body *body;
+	int rc;
+
+	op_data = ll_prep_md_op_data(NULL, parent, NULL, name, namelen, 0,
+				     LUSTRE_OPC_ANY, NULL);
+	if (IS_ERR(op_data))
+		return PTR_ERR(op_data);
+
+	op_data->op_valid = OBD_MD_FLID;
+	rc = md_getattr_name(ll_i2sbi(parent)->ll_md_exp, op_data, &req);
+	if (rc < 0)
+		goto out_free;
+
+	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
+	if (!body) {
+		rc = -EFAULT;
+		goto out_req;
+	}
+	*fid = body->fid1;
+out_req:
+	ptlrpc_req_finished(req);
+out_free:
+	if (op_data)
+		ll_finish_md_op_data(op_data);
+	return rc;
+}
+
+int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
+	       const char *name, int namelen)
+{
+	struct ptlrpc_request *request = NULL;
+	struct dentry *dchild = NULL;
+	struct md_op_data *op_data;
+	struct qstr qstr;
+	int rc;
+
+	CDEBUG(D_VFSTRACE, "migrate %s under"DFID" to MDT%d\n",
+	       name, PFID(ll_inode2fid(parent)), mdtidx);
+
+	op_data = ll_prep_md_op_data(NULL, parent, NULL, name, namelen,
+				     0, LUSTRE_OPC_ANY, NULL);
+	if (IS_ERR(op_data))
+		return PTR_ERR(op_data);
+
+	/* Get child FID first */
+	qstr.hash = full_name_hash(parent, name, namelen);
+	qstr.name = name;
+	qstr.len = namelen;
+	dchild = d_lookup(file_dentry(file), &qstr);
+	if (dchild && dchild->d_inode) {
+		op_data->op_fid3 = *ll_inode2fid(dchild->d_inode);
+	} else {
+		rc = ll_get_fid_by_name(parent, name, strnlen(name, namelen),
+					&op_data->op_fid3);
+		if (rc)
+			goto out_free;
+	}
+
+	if (!fid_is_sane(&op_data->op_fid3)) {
+		CERROR("%s: migrate %s, but fid "DFID" is insane\n",
+		       ll_get_fsname(parent->i_sb, NULL, 0), name,
+		       PFID(&op_data->op_fid3));
+		goto out_free;
+	}
+
+	rc = ll_get_mdt_idx_by_fid(ll_i2sbi(parent), &op_data->op_fid3);
+	if (rc < 0)
+		goto out_free;
+
+	if (rc == mdtidx) {
+		CDEBUG(D_INFO, "%s:"DFID" is already on MDT%d.\n", name,
+		       PFID(&op_data->op_fid3), mdtidx);
+		rc = 0;
+		goto out_free;
+	}
+
+	op_data->op_mds = mdtidx;
+	op_data->op_cli_flags = CLI_MIGRATE;
+	rc = md_rename(ll_i2sbi(parent)->ll_md_exp, op_data, name,
+		       strnlen(name, namelen), name, strnlen(name, namelen),
+		       &request);
+	if (!rc)
+		ll_update_times(request, parent);
+
+	ptlrpc_req_finished(request);
+
+out_free:
+	if (dchild) {
+		if (dchild->d_inode)
+			ll_delete_inode(dchild->d_inode);
+		dput(dchild);
+	}
+
+	ll_finish_md_op_data(op_data);
+	return rc;
+}
+
 static int
 ll_file_noflock(struct file *file, int cmd, struct file_lock *file_lock)
 {
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 82c3a88..69492f0 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -660,6 +660,7 @@ extern const struct inode_operations ll_dir_inode_operations;
 int ll_dir_read(struct inode *inode, __u64 *ppos, struct md_op_data *op_data,
 		struct dir_context *ctx);
 int ll_get_mdt_idx(struct inode *inode);
+int ll_get_mdt_idx_by_fid(struct ll_sb_info *sbi, const struct lu_fid *fid);
 struct page *ll_get_dir_page(struct inode *dir, struct md_op_data *op_data,
 			     __u64 hash, struct ll_dir_chain *chain);
 void ll_release_page(struct inode *inode, struct page *page, bool remove);
@@ -675,6 +676,7 @@ int ll_test_inode_by_fid(struct inode *inode, void *opaque);
 int ll_md_blocking_ast(struct ldlm_lock *, struct ldlm_lock_desc *,
 		       void *data, int flag);
 struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de);
+void ll_update_times(struct ptlrpc_request *request, struct inode *inode);
 
 /* llite/rw.c */
 int ll_writepage(struct page *page, struct writeback_control *wbc);
@@ -717,7 +719,8 @@ void ll_pack_inode2opdata(struct inode *inode, struct md_op_data *op_data,
 			  struct lustre_handle *fh);
 int ll_getattr(struct vfsmount *mnt, struct dentry *de, struct kstat *stat);
 struct posix_acl *ll_get_acl(struct inode *inode, int type);
-
+int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
+	       const char *name, int namelen);
 int ll_inode_permission(struct inode *inode, int mask);
 
 int ll_lov_setstripe_ea_info(struct inode *inode, struct dentry *dentry,
@@ -777,6 +780,15 @@ int ll_obd_statfs(struct inode *inode, void __user *arg);
 int ll_get_max_mdsize(struct ll_sb_info *sbi, int *max_mdsize);
 int ll_get_default_mdsize(struct ll_sb_info *sbi, int *default_mdsize);
 int ll_process_config(struct lustre_cfg *lcfg);
+
+enum {
+	LUSTRE_OPC_MKDIR	= 0,
+	LUSTRE_OPC_SYMLINK	= 1,
+	LUSTRE_OPC_MKNOD	= 2,
+	LUSTRE_OPC_CREATE	= 3,
+	LUSTRE_OPC_ANY		= 5,
+};
+
 struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 				      struct inode *i1, struct inode *i2,
 				      const char *name, int namelen,
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index ef8d87a..e320400 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1114,8 +1114,34 @@ static void ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 	struct lmv_stripe_md *lsm = md->lmv;
 	int idx;
 
-	LASSERT(lsm);
 	LASSERT(S_ISDIR(inode->i_mode));
+	CDEBUG(D_INODE, "update lsm %p of "DFID"\n", lli->lli_lsm_md,
+	       PFID(ll_inode2fid(inode)));
+
+	/* no striped information from request. */
+	if (!lsm) {
+		if (!lli->lli_lsm_md) {
+			return;
+		} else if (lli->lli_lsm_md->lsm_md_magic == LMV_MAGIC_MIGRATE) {
+			/*
+			 * migration is done, the temporay MIGRATE layout has
+			 * been removed
+			 */
+			CDEBUG(D_INODE, DFID" finish migration.\n",
+			       PFID(ll_inode2fid(inode)));
+			lmv_free_memmd(lli->lli_lsm_md);
+			lli->lli_lsm_md = NULL;
+			return;
+		} else {
+			/*
+			 * The lustre_md from req does not include stripeEA,
+			 * see ll_md_setattr
+			 */
+			return;
+		}
+	}
+
+	/* set the directory layout */
 	if (!lli->lli_lsm_md) {
 		int rc;
 
@@ -1132,6 +1158,8 @@ static void ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 		 * will not free this lsm
 		 */
 		md->lmv = NULL;
+		CDEBUG(D_INODE, "Set lsm %p magic %x to "DFID"\n", lsm,
+		       lsm->lsm_md_magic, PFID(ll_inode2fid(inode)));
 		return;
 	}
 
@@ -1668,7 +1696,7 @@ void ll_update_inode(struct inode *inode, struct lustre_md *md)
 			lli->lli_maxbytes = MAX_LFS_FILESIZE;
 	}
 
-	if (S_ISDIR(inode->i_mode) && md->lmv)
+	if (S_ISDIR(inode->i_mode))
 		ll_update_lsm_md(inode, md);
 
 #ifdef CONFIG_FS_POSIX_ACL
@@ -2306,7 +2334,6 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 	if ((opc == LUSTRE_OPC_CREATE) && name &&
 	    filename_is_volatile(name, namelen, NULL))
 		op_data->op_bias |= MDS_CREATE_VOLATILE;
-	op_data->op_opc = opc;
 	op_data->op_mds = 0;
 	op_data->op_data = data;
 
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index e32d08b..f059882 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -752,8 +752,7 @@ static int ll_create_it(struct inode *dir, struct dentry *dentry, int mode,
 	return 0;
 }
 
-static void ll_update_times(struct ptlrpc_request *request,
-			    struct inode *inode)
+void ll_update_times(struct ptlrpc_request *request, struct inode *inode)
 {
 	struct mdt_body *body = req_capsule_server_get(&request->rq_pill,
 						       &RMF_MDT_BODY);
diff --git a/drivers/staging/lustre/lustre/llite/rw.c b/drivers/staging/lustre/lustre/llite/rw.c
index 87393c4..01aee84 100644
--- a/drivers/staging/lustre/lustre/llite/rw.c
+++ b/drivers/staging/lustre/lustre/llite/rw.c
@@ -1015,6 +1015,10 @@ int ll_writepages(struct address_space *mapping, struct writeback_control *wbc)
 		 * is called later on.
 		 */
 		ignore_layout = 1;
+
+	if (!ll_i2info(inode)->lli_clob)
+		return 0;
+
 	result = cl_sync_file_range(inode, start, end, mode, ignore_layout);
 	if (result > 0) {
 		wbc->nr_to_write -= result;
diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
index 6ce7442..e8c1959 100644
--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -1607,6 +1607,7 @@ int do_statahead_enter(struct inode *dir, struct dentry **dentryp,
 					       *dentryp,
 					       PFID(ll_inode2fid(d_inode(*dentryp))),
 					       PFID(ll_inode2fid(inode)));
+					ll_intent_release(&it);
 					ll_sai_unplug(sai, entry);
 					return -ESTALE;
 				} else {
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index 5313dfc..2bc1098 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -43,6 +43,7 @@
 #include "../include/lustre_lib.h"
 #include "../include/lustre_net.h"
 #include "../include/lustre_dlm.h"
+#include "../include/lustre_mdc.h"
 #include "../include/obd_class.h"
 #include "../include/lprocfs_status.h"
 #include "lmv_internal.h"
@@ -332,6 +333,8 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 
 			oinfo = lsm_name_to_stripe_info(lsm, op_data->op_name,
 							op_data->op_namelen);
+			if (IS_ERR(oinfo))
+				return PTR_ERR(oinfo);
 			op_data->op_fid1 = oinfo->lmo_fid;
 		}
 
@@ -408,6 +411,7 @@ static int lmv_intent_lookup(struct obd_export *exp,
 			     ldlm_blocking_callback cb_blocking,
 			     __u64 extra_lock_flags)
 {
+	struct lmv_stripe_md *lsm = op_data->op_mea1;
 	struct obd_device      *obd = exp->exp_obd;
 	struct lmv_obd	 *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc    *tgt = NULL;
@@ -421,17 +425,15 @@ static int lmv_intent_lookup(struct obd_export *exp,
 	if (!fid_is_sane(&op_data->op_fid2))
 		fid_zero(&op_data->op_fid2);
 
-	CDEBUG(D_INODE, "LOOKUP_INTENT with fid1="DFID", fid2="DFID
-	       ", name='%s' -> mds #%d\n", PFID(&op_data->op_fid1),
-	       PFID(&op_data->op_fid2),
+	CDEBUG(D_INODE, "LOOKUP_INTENT with fid1="DFID", fid2="DFID", name='%s' -> mds #%d lsm=%p lsm_magic=%x\n",
+	       PFID(&op_data->op_fid1), PFID(&op_data->op_fid2),
 	       op_data->op_name ? op_data->op_name : "<NULL>",
-	       tgt->ltd_idx);
+	       tgt->ltd_idx, lsm, !lsm ? -1 : lsm->lsm_md_magic);
 
 	op_data->op_bias &= ~MDS_CROSS_REF;
 
 	rc = md_intent_lock(tgt->ltd_exp, op_data, lmm, lmmsize, it,
 			    flags, reqp, cb_blocking, extra_lock_flags);
-
 	if (rc < 0)
 		return rc;
 
@@ -448,6 +450,26 @@ static int lmv_intent_lookup(struct obd_export *exp,
 				return rc;
 		}
 		return rc;
+	} else if (it_disposition(it, DISP_LOOKUP_NEG) && lsm &&
+		   lsm->lsm_md_magic == LMV_MAGIC_MIGRATE) {
+		/*
+		 * For a migrating directory, if the child cannot be found in
+		 * the source directory (master stripe), try the target
+		 * directory (stripe 1).
+		 */
+		tgt = lmv_find_target(lmv, &lsm->lsm_md_oinfo[1].lmo_fid);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+
+		ptlrpc_req_finished(*reqp);
+		CDEBUG(D_INODE, "For migrating dir, try target dir "DFID"\n",
+		       PFID(&lsm->lsm_md_oinfo[1].lmo_fid));
+
+		op_data->op_fid1 = lsm->lsm_md_oinfo[1].lmo_fid;
+		it->it_disposition &= ~DISP_ENQ_COMPLETE;
+		rc = md_intent_lock(tgt->ltd_exp, op_data, lmm, lmmsize, it,
+				    flags, reqp, cb_blocking, extra_lock_flags);
+		return rc;
 	}
 
 	/*
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 81dcc0a..09b2efe 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -96,6 +96,15 @@ int lmv_name_to_stripe_index(enum lmv_hash_type hashtype,
 	case LMV_HASH_TYPE_FNV_1A_64:
 		idx = lmv_hash_fnv1a(max_mdt_index, name, namelen);
 		break;
+	/*
+	 * LMV_HASH_TYPE_MIGRATION means the file is being migrated
+	 * and should not be accessed by the client, except for
+	 * lookup (see lmv_intent_lookup), so return -EACCES here.
+	 */
+	case LMV_HASH_TYPE_MIGRATION:
+		CERROR("%.*s is being migrated: rc = %d\n", namelen,
+		       name, -EACCES);
+		return -EACCES;
 	default:
 		CERROR("Unknown hash type 0x%x\n", hashtype);
 		return -EINVAL;
@@ -1667,6 +1676,9 @@ lmv_locate_target_for_name(struct lmv_obd *lmv, struct lmv_stripe_md *lsm,
 	struct lmv_tgt_desc *tgt;
 
 	oinfo = lsm_name_to_stripe_info(lsm, name, namelen);
+	if (IS_ERR(oinfo))
+		return ERR_CAST(oinfo);
+
 	*fid = oinfo->lmo_fid;
 	*mds = oinfo->lmo_mds;
 	tgt = lmv_get_target(lmv, *mds);
@@ -1683,7 +1695,8 @@ struct lmv_tgt_desc
 	struct lmv_tgt_desc *tgt;
 
 	if (!lsm || lsm->lsm_md_stripe_count <= 1 ||
-	    !op_data->op_namelen) {
+	    !op_data->op_namelen ||
+	    lsm->lsm_md_magic == LMV_MAGIC_MIGRATE) {
 		tgt = lmv_find_target(lmv, fid);
 		if (IS_ERR(tgt))
 			return tgt;
@@ -1929,23 +1942,24 @@ lmv_getattr_name(struct obd_export *exp, struct md_op_data *op_data,
 	 fl == MF_MDC_CANCEL_FID4 ? &op_data->op_fid4 : \
 	 NULL)
 
-static int lmv_early_cancel(struct obd_export *exp, struct md_op_data *op_data,
-			    int op_tgt, enum ldlm_mode mode, int bits,
-			    int flag)
+static int lmv_early_cancel(struct obd_export *exp, struct lmv_tgt_desc *tgt,
+			    struct md_op_data *op_data, int op_tgt,
+			    enum ldlm_mode mode, int bits, int flag)
 {
 	struct lu_fid	  *fid = md_op_data_fid(op_data, flag);
 	struct obd_device      *obd = exp->exp_obd;
 	struct lmv_obd	 *lmv = &obd->u.lmv;
-	struct lmv_tgt_desc    *tgt;
 	ldlm_policy_data_t      policy = { {0} };
 	int		     rc = 0;
 
 	if (!fid_is_sane(fid))
 		return 0;
 
-	tgt = lmv_find_target(lmv, fid);
-	if (IS_ERR(tgt))
-		return PTR_ERR(tgt);
+	if (!tgt) {
+		tgt = lmv_find_target(lmv, fid);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+	}
 
 	if (tgt->ltd_idx != op_tgt) {
 		CDEBUG(D_INODE, "EARLY_CANCEL on "DFID"\n", PFID(fid));
@@ -1994,6 +2008,9 @@ static int lmv_link(struct obd_export *exp, struct md_op_data *op_data,
 
 		oinfo = lsm_name_to_stripe_info(lsm, op_data->op_name,
 						op_data->op_namelen);
+		if (IS_ERR(oinfo))
+			return PTR_ERR(oinfo);
+
 		op_data->op_fid2 = oinfo->lmo_fid;
 	}
 
@@ -2005,7 +2022,7 @@ static int lmv_link(struct obd_export *exp, struct md_op_data *op_data,
 	 * Cancel UPDATE lock on child (fid1).
 	 */
 	op_data->op_flags |= MF_MDC_CANCEL_FID2;
-	rc = lmv_early_cancel(exp, op_data, tgt->ltd_idx, LCK_EX,
+	rc = lmv_early_cancel(exp, NULL, op_data, tgt->ltd_idx, LCK_EX,
 			      MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID1);
 	if (rc != 0)
 		return rc;
@@ -2040,31 +2057,44 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data,
 	op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid());
 	op_data->op_cap = cfs_curproc_cap_pack();
 
-	if (op_data->op_mea1) {
-		struct lmv_stripe_md *lsm = op_data->op_mea1;
-		const struct lmv_oinfo *oinfo;
-
-		oinfo = lsm_name_to_stripe_info(lsm, old, oldlen);
-		op_data->op_fid1 = oinfo->lmo_fid;
-		op_data->op_mds = oinfo->lmo_mds;
-		src_tgt = lmv_get_target(lmv, op_data->op_mds);
-		if (IS_ERR(src_tgt))
-			return PTR_ERR(src_tgt);
+	if (op_data->op_cli_flags & CLI_MIGRATE) {
+		LASSERTF(fid_is_sane(&op_data->op_fid3), "invalid FID "DFID"\n",
+			 PFID(&op_data->op_fid3));
+		rc = lmv_fid_alloc(exp, &op_data->op_fid2, op_data);
+		if (rc)
+			return rc;
+		src_tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid3);
 	} else {
-		src_tgt = lmv_find_target(lmv, &op_data->op_fid1);
-		if (IS_ERR(src_tgt))
-			return PTR_ERR(src_tgt);
+		if (op_data->op_mea1) {
+			struct lmv_stripe_md *lsm = op_data->op_mea1;
+
+			src_tgt = lmv_locate_target_for_name(lmv, lsm, old,
+							     oldlen,
+							     &op_data->op_fid1,
+							     &op_data->op_mds);
+			if (IS_ERR(src_tgt))
+				return PTR_ERR(src_tgt);
+		} else {
+			src_tgt = lmv_find_target(lmv, &op_data->op_fid1);
+			if (IS_ERR(src_tgt))
+				return PTR_ERR(src_tgt);
 
-		op_data->op_mds = src_tgt->ltd_idx;
-	}
+			op_data->op_mds = src_tgt->ltd_idx;
+		}
 
-	if (op_data->op_mea2) {
-		struct lmv_stripe_md *lsm = op_data->op_mea2;
-		const struct lmv_oinfo *oinfo;
+		if (op_data->op_mea2) {
+			struct lmv_stripe_md *lsm = op_data->op_mea2;
+			const struct lmv_oinfo *oinfo;
 
-		oinfo = lsm_name_to_stripe_info(lsm, new, newlen);
-		op_data->op_fid2 = oinfo->lmo_fid;
+			oinfo = lsm_name_to_stripe_info(lsm, new, newlen);
+			if (IS_ERR(oinfo))
+				return PTR_ERR(oinfo);
+
+			op_data->op_fid2 = oinfo->lmo_fid;
+		}
 	}
+	if (IS_ERR(src_tgt))
+		return PTR_ERR(src_tgt);
 
 	/*
 	 * LOOKUP lock on src child (fid3) should also be cancelled for
@@ -2076,33 +2106,48 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data,
 	 * Cancel UPDATE locks on tgt parent (fid2), tgt_tgt is its
 	 * own target.
 	 */
-	rc = lmv_early_cancel(exp, op_data, src_tgt->ltd_idx,
+	rc = lmv_early_cancel(exp, NULL, op_data, src_tgt->ltd_idx,
 			      LCK_EX, MDS_INODELOCK_UPDATE,
 			      MF_MDC_CANCEL_FID2);
-
+	if (rc)
+		return rc;
 	/*
-	 * Cancel LOOKUP locks on tgt child (fid4) for parent tgt_tgt.
+	 * Cancel LOOKUP locks on source child (fid3) for parent tgt_tgt.
 	 */
-	if (rc == 0) {
-		rc = lmv_early_cancel(exp, op_data, src_tgt->ltd_idx,
+	if (fid_is_sane(&op_data->op_fid3)) {
+		struct lmv_tgt_desc *tgt;
+
+		tgt = lmv_find_target(lmv, &op_data->op_fid1);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+
+		/* Cancel LOOKUP lock on its parent */
+		rc = lmv_early_cancel(exp, tgt, op_data, src_tgt->ltd_idx,
 				      LCK_EX, MDS_INODELOCK_LOOKUP,
-				      MF_MDC_CANCEL_FID4);
+				      MF_MDC_CANCEL_FID3);
+		if (rc)
+			return rc;
+
+		rc = lmv_early_cancel(exp, NULL, op_data, src_tgt->ltd_idx,
+				      LCK_EX, MDS_INODELOCK_FULL,
+				      MF_MDC_CANCEL_FID3);
+		if (rc)
+			return rc;
 	}
 
 	/*
 	 * Cancel all the locks on tgt child (fid4).
 	 */
-	if (rc == 0)
-		rc = lmv_early_cancel(exp, op_data, src_tgt->ltd_idx,
+	if (fid_is_sane(&op_data->op_fid4))
+		rc = lmv_early_cancel(exp, NULL, op_data, src_tgt->ltd_idx,
 				      LCK_EX, MDS_INODELOCK_FULL,
 				      MF_MDC_CANCEL_FID4);
 
 	CDEBUG(D_INODE, DFID":m%d to "DFID"\n", PFID(&op_data->op_fid1),
 	       op_data->op_mds, PFID(&op_data->op_fid2));
 
-	if (rc == 0)
-		rc = md_rename(src_tgt->ltd_exp, op_data, old, oldlen,
-			       new, newlen, request);
+	rc = md_rename(src_tgt->ltd_exp, op_data, old, oldlen,
+		       new, newlen, request);
 	return rc;
 }
 
@@ -2304,6 +2349,7 @@ static int lmv_unlink(struct obd_export *exp, struct md_op_data *op_data,
 {
 	struct obd_device       *obd = exp->exp_obd;
 	struct lmv_obd	  *lmv = &obd->u.lmv;
+	struct lmv_tgt_desc *parent_tgt = NULL;
 	struct lmv_tgt_desc     *tgt = NULL;
 	struct mdt_body		*body;
 	int		     rc;
@@ -2321,12 +2367,16 @@ retry:
 		/* For striped dir, we need to locate the parent as well */
 		if (op_data->op_mea1 &&
 		    op_data->op_mea1->lsm_md_stripe_count > 1) {
+			struct lmv_tgt_desc *tmp;
+
 			LASSERT(op_data->op_name && op_data->op_namelen);
-			lmv_locate_target_for_name(lmv, op_data->op_mea1,
-						   op_data->op_name,
-						   op_data->op_namelen,
-						   &op_data->op_fid1,
-						   &op_data->op_mds);
+			tmp = lmv_locate_target_for_name(lmv, op_data->op_mea1,
+							 op_data->op_name,
+							 op_data->op_namelen,
+							 &op_data->op_fid1,
+							 &op_data->op_mds);
+			if (IS_ERR(tmp))
+				return PTR_ERR(tmp);
 		}
 	} else {
 		tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
@@ -2350,9 +2400,18 @@ retry:
 	/*
 	 * Cancel FULL locks on child (fid3).
 	 */
-	rc = lmv_early_cancel(exp, op_data, tgt->ltd_idx, LCK_EX,
-			      MDS_INODELOCK_FULL, MF_MDC_CANCEL_FID3);
+	parent_tgt = lmv_find_target(lmv, &op_data->op_fid1);
+	if (IS_ERR(parent_tgt))
+		return PTR_ERR(parent_tgt);
+
+	if (parent_tgt != tgt) {
+		rc = lmv_early_cancel(exp, parent_tgt, op_data, tgt->ltd_idx,
+				      LCK_EX, MDS_INODELOCK_LOOKUP,
+				      MF_MDC_CANCEL_FID3);
+	}
 
+	rc = lmv_early_cancel(exp, NULL, op_data, tgt->ltd_idx, LCK_EX,
+			      MDS_INODELOCK_FULL, MF_MDC_CANCEL_FID3);
 	if (rc != 0)
 		return rc;
 
@@ -2681,13 +2740,25 @@ int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 	}
 
 	/* Unpack memmd */
-	if (le32_to_cpu(lmm->lmv_magic) != LMV_MAGIC_V1) {
-		CERROR("%s: invalid magic %x.\n", exp->exp_obd->obd_name,
-		       le32_to_cpu(lmm->lmv_magic));
-		return -EINVAL;
+	if (le32_to_cpu(lmm->lmv_magic) != LMV_MAGIC_V1 &&
+	    le32_to_cpu(lmm->lmv_magic) != LMV_MAGIC_MIGRATE &&
+	    le32_to_cpu(lmm->lmv_magic) != LMV_USER_MAGIC) {
+		CERROR("%s: invalid lmv magic %x: rc = %d\n",
+		       exp->exp_obd->obd_name, le32_to_cpu(lmm->lmv_magic),
+		       -EIO);
+		return -EIO;
 	}
 
-	lsm_size = lmv_stripe_md_size(lmv_mds_md_stripe_count_get(lmm));
+	if (le32_to_cpu(lmm->lmv_magic) == LMV_MAGIC_V1 ||
+	    le32_to_cpu(lmm->lmv_magic) == LMV_MAGIC_MIGRATE)
+		lsm_size = lmv_stripe_md_size(lmv_mds_md_stripe_count_get(lmm));
+	else
+		/**
+		 * Unpack default dirstripe(lmv_user_md) to lmv_stripe_md,
+		 * stripecount should be 0 then.
+		 */
+		lsm_size = lmv_stripe_md_size(0);
+
 	if (!lsm) {
 		lsm = libcfs_kvzalloc(lsm_size, GFP_NOFS);
 		if (!lsm)
@@ -2698,6 +2769,7 @@ int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 
 	switch (le32_to_cpu(lmm->lmv_magic)) {
 	case LMV_MAGIC_V1:
+	case LMV_MAGIC_MIGRATE:
 		rc = lmv_unpack_md_v1(exp, lsm, &lmm->lmv_md_v1);
 		break;
 	default:
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_lib.c b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
index 143bd76..95c4550 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_lib.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
@@ -390,6 +390,7 @@ void mdc_rename_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 	rec = req_capsule_client_get(&req->rq_pill, &RMF_REC_REINT);
 
 	/* XXX do something about time, uid, gid */
-	rec->rn_opcode   = REINT_RENAME;
+	rec->rn_opcode	 = op_data->op_cli_flags & CLI_MIGRATE ?
+				REINT_MIGRATE : REINT_RENAME;
 	rec->rn_fsuid    = op_data->op_fsuid;
 	rec->rn_fsgid    = op_data->op_fsgid;
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index 4c500a9..bc27f8d 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -190,7 +190,9 @@ void lustre_assert_wire_constants(void)
 		 (long long)REINT_SETXATTR);
 	LASSERTF(REINT_RMENTRY == 8, "found %lld\n",
 		 (long long)REINT_RMENTRY);
-	LASSERTF(REINT_MAX == 9, "found %lld\n",
+	LASSERTF(REINT_MIGRATE == 9, "found %lld\n",
+		 (long long)REINT_MIGRATE);
+	LASSERTF(REINT_MAX == 10, "found %lld\n",
 		 (long long)REINT_MAX);
 	LASSERTF(DISP_IT_EXECD == 0x00000001UL, "found 0x%.8xUL\n",
 		(unsigned)DISP_IT_EXECD);
-- 
1.7.1
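
The lookup fallback that the lmv_intent_lookup() hunk adds for a
migrating directory can be modelled in userspace roughly as follows.
This is only a sketch under simplifying assumptions: struct toy_stripe,
struct toy_layout and toy_lookup() are hypothetical stand-ins, a stripe
is reduced to a list of names, and no RPCs or locks are involved. Only
the two magic values are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values copied from the patch; everything else here is a toy model. */
#define LMV_MAGIC_V1      0x0CD10CD0
#define LMV_MAGIC_MIGRATE 0x0CD30CD0

/* Hypothetical stand-in for one stripe: a NULL-terminated list of names. */
struct toy_stripe {
	const char *const *names;
};

/* Hypothetical stand-in for struct lmv_stripe_md. */
struct toy_layout {
	unsigned int magic;
	struct toy_stripe stripe[2]; /* [0] = source (master), [1] = target */
};

/* Return 1 if 'name' is listed in stripe 's', 0 otherwise. */
static int toy_stripe_has(const struct toy_stripe *s, const char *name)
{
	size_t i;

	for (i = 0; s->names && s->names[i]; i++)
		if (strcmp(s->names[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Return the index of the stripe that holds 'name', or -1 if not found.
 * Stripe 1 is consulted only when the layout is mid-migration and the
 * lookup in stripe 0 came back negative.
 */
static int toy_lookup(const struct toy_layout *lsm, const char *name)
{
	assert(lsm && name);

	if (toy_stripe_has(&lsm->stripe[0], name))
		return 0;
	if (lsm->magic == LMV_MAGIC_MIGRATE &&
	    toy_stripe_has(&lsm->stripe[1], name))
		return 1;
	return -1;
}
```

With a MIGRATE layout, a name present only in stripe 1 is still found;
with a plain V1 layout the same name is reported as missing, because
the second stripe is never tried.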

* [lustre-devel] [PATCH 17/80] staging: lustre: add ability to migrate inodes.
@ 2016-08-16 20:18   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Add client support for migrating individual inodes from one
MDT to another. Only the inode layout on the MDT is migrated;
the data objects on the OSTs are not touched.

Directories are migrated from the top down, i.e. the parent
is migrated first, then its children.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2430
Reviewed-on: http://review.whamcloud.com/6662
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
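
A note on the ordering described in the commit message: the top-down
walk (parent first, then children) can be pictured with a small
userspace sketch. Nothing in it is Lustre code. struct toy_node and
toy_migrate() are hypothetical names made up for illustration, and
skipping entries that already sit on the target MDT only loosely
mirrors the early return in ll_migrate().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical in-memory stand-in for a namespace entry; not a Lustre type. */
struct toy_node {
	const char *name;
	int mdt;                    /* index of the MDT holding the inode */
	struct toy_node *children;  /* first child, or NULL */
	struct toy_node *next;      /* next sibling, or NULL */
};

/*
 * Migrate 'node' first, then recurse into its children (pre-order walk).
 * Returns the number of entries whose MDT index actually changed.
 */
static int toy_migrate(struct toy_node *node, int target_mdt)
{
	struct toy_node *child;
	int moved = 0;

	assert(node != NULL);

	if (node->mdt != target_mdt) {
		printf("migrate %s: MDT%d -> MDT%d\n",
		       node->name, node->mdt, target_mdt);
		node->mdt = target_mdt;
		moved++;
	}

	for (child = node->children; child; child = child->next)
		moved += toy_migrate(child, target_mdt);

	return moved;
}
```

Calling toy_migrate() on the root of a small tree visits the root
before any of its descendants, so a child is never moved before the
directory that contains it.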
 .../lustre/lustre/include/lustre/lustre_idl.h      |   11 +-
 .../lustre/lustre/include/lustre/lustre_user.h     |    1 +
 drivers/staging/lustre/lustre/include/lustre_lmv.h |    2 +
 drivers/staging/lustre/lustre/include/obd.h        |   12 +--
 drivers/staging/lustre/lustre/llite/dir.c          |   43 +++++-
 drivers/staging/lustre/lustre/llite/file.c         |  113 ++++++++++++-
 .../staging/lustre/lustre/llite/llite_internal.h   |   14 ++-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   33 ++++-
 drivers/staging/lustre/lustre/llite/namei.c        |    3 +-
 drivers/staging/lustre/lustre/llite/rw.c           |    4 +
 drivers/staging/lustre/lustre/llite/statahead.c    |    1 +
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |   32 +++-
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  176 ++++++++++++++------
 drivers/staging/lustre/lustre/mdc/mdc_lib.c        |    2 +
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    |    4 +-
 15 files changed, 368 insertions(+), 83 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 0ff30c6..6853f62 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -1482,6 +1482,7 @@ enum obdo_flags {
 #define LOV_MAGIC	 LOV_MAGIC_V1
 #define LOV_MAGIC_JOIN_V1 0x0BD20BD0
 #define LOV_MAGIC_V3      0x0BD30BD0
+#define LOV_MAGIC_MIGRATE 0x0BD40BD0
 
 /*
  * magic for fully defined striping
@@ -1987,7 +1988,7 @@ enum mdt_reint_cmd {
 	REINT_OPEN     = 6,
 	REINT_SETXATTR = 7,
 	REINT_RMENTRY  = 8,
-/*      REINT_WRITE    = 9, */
+	REINT_MIGRATE  = 9,
 	REINT_MAX
 };
 
@@ -2280,6 +2281,7 @@ enum mds_op_bias {
 	MDS_CREATE_VOLATILE	= 1 << 10,
 	MDS_OWNEROVERRIDE	= 1 << 11,
 	MDS_HSM_RELEASE		= 1 << 12,
+	MDS_RENAME_MIGRATE	= BIT(13),
 };
 
 /* instance of mdt_reint_rec */
@@ -2488,11 +2490,13 @@ struct lmv_desc {
 /* lmv structures */
 #define LMV_MAGIC_V1	0x0CD10CD0	/* normal stripe lmv magic */
 #define LMV_USER_MAGIC	0x0CD20CD0	/* default lmv magic*/
+#define LMV_MAGIC_MIGRATE	0x0CD30CD0	/* migrate stripe lmv magic */
 #define LMV_MAGIC	LMV_MAGIC_V1
 
 enum lmv_hash_type {
 	LMV_HASH_TYPE_ALL_CHARS = 1,
 	LMV_HASH_TYPE_FNV_1A_64 = 2,
+	LMV_HASH_TYPE_MIGRATION = 3,
 };
 
 #define LMV_HASH_NAME_ALL_CHARS		"all_char"
@@ -2552,7 +2556,8 @@ static inline ssize_t lmv_mds_md_size(int stripe_count, unsigned int lmm_magic)
 	ssize_t len = -EINVAL;
 
 	switch (lmm_magic) {
-	case LMV_MAGIC_V1: {
+	case LMV_MAGIC_V1:
+	case LMV_MAGIC_MIGRATE: {
 		struct lmv_mds_md_v1 *lmm1;
 
 		len = sizeof(*lmm1);
@@ -2568,6 +2573,7 @@ static inline int lmv_mds_md_stripe_count_get(const union lmv_mds_md *lmm)
 {
 	switch (le32_to_cpu(lmm->lmv_magic)) {
 	case LMV_MAGIC_V1:
+	case LMV_MAGIC_MIGRATE:
 		return le32_to_cpu(lmm->lmv_md_v1.lmv_stripe_count);
 	case LMV_USER_MAGIC:
 		return le32_to_cpu(lmm->lmv_user_md.lum_stripe_count);
@@ -2583,6 +2589,7 @@ static inline int lmv_mds_md_stripe_count_set(union lmv_mds_md *lmm,
 
 	switch (le32_to_cpu(lmm->lmv_magic)) {
 	case LMV_MAGIC_V1:
+	case LMV_MAGIC_MIGRATE:
 		lmm->lmv_md_v1.lmv_stripe_count = cpu_to_le32(stripe_count);
 		break;
 	case LMV_USER_MAGIC:
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index 26dbda0..4746320 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -243,6 +243,7 @@ struct ost_id {
 #define LL_IOC_GET_LEASE		_IO('f', 244)
 #define LL_IOC_HSM_IMPORT		_IOWR('f', 245, struct hsm_user_import)
 #define LL_IOC_LMV_SET_DEFAULT_STRIPE	_IOWR('f', 246, struct lmv_user_md)
+#define LL_IOC_MIGRATE			_IOR('f', 247, int)
 
 #define LL_STATFS_LMV	   1
 #define LL_STATFS_LOV	   2
diff --git a/drivers/staging/lustre/lustre/include/lustre_lmv.h b/drivers/staging/lustre/lustre/include/lustre_lmv.h
index 4036fce..feee981 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lmv.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lmv.h
@@ -106,6 +106,7 @@ static inline void lmv_cpu_to_le(union lmv_mds_md *lmv_dst,
 {
 	switch (lmv_src->lmv_magic) {
 	case LMV_MAGIC_V1:
+	case LMV_MAGIC_MIGRATE:
 		lmv1_cpu_to_le(&lmv_dst->lmv_md_v1, &lmv_src->lmv_md_v1);
 		break;
 	default:
@@ -118,6 +119,7 @@ static inline void lmv_le_to_cpu(union lmv_mds_md *lmv_dst,
 {
 	switch (le32_to_cpu(lmv_src->lmv_magic)) {
 	case LMV_MAGIC_V1:
+	case LMV_MAGIC_MIGRATE:
 		lmv1_le_to_cpu(&lmv_dst->lmv_md_v1, &lmv_src->lmv_md_v1);
 		break;
 	default:
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index a9f4e13..f5eeb05 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -847,9 +847,6 @@ struct md_op_data {
 	/* Various operation flags. */
 	enum mds_op_bias        op_bias;
 
-	/* Operation type */
-	__u32		   op_opc;
-
 	/* Used by readdir */
 	__u64		   op_offset;
 
@@ -871,6 +868,7 @@ enum op_cli_flags {
 	CLI_RM_ENTRY	= 1 << 1,
 	CLI_HASH64	= BIT(2),
 	CLI_API32	= BIT(3),
+	CLI_MIGRATE	= BIT(4),
 };
 
 struct md_enqueue_info;
@@ -1013,14 +1011,6 @@ struct obd_ops {
 	 */
 };
 
-enum {
-	LUSTRE_OPC_MKDIR    = (1 << 0),
-	LUSTRE_OPC_SYMLINK  = (1 << 1),
-	LUSTRE_OPC_MKNOD    = (1 << 2),
-	LUSTRE_OPC_CREATE   = (1 << 3),
-	LUSTRE_OPC_ANY      = (1 << 4)
-};
-
 /* lmv structures */
 struct lustre_md {
 	struct mdt_body	 *body;
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 96ae7d5..ef7322e 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -883,6 +883,7 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size,
 			lustre_swab_lov_user_md_v3((struct lov_user_md_v3 *)lmm);
 		break;
 	case LMV_USER_MAGIC:
+	case LMV_MAGIC_MIGRATE:
 		if (cpu_to_le32(LMV_USER_MAGIC) != LMV_USER_MAGIC)
 			lustre_swab_lmv_user_md((struct lmv_user_md *)lmm);
 		break;
@@ -897,8 +898,7 @@ out:
 	return rc;
 }
 
-static int ll_get_mdt_idx_by_fid(struct ll_sb_info *sbi,
-				 const struct lu_fid *fid)
+int ll_get_mdt_idx_by_fid(struct ll_sb_info *sbi, const struct lu_fid *fid)
 {
 	struct md_op_data *op_data;
 	int mdt_index, rc;
@@ -1960,6 +1960,45 @@ out_quotactl:
 		kfree(copy);
 		return rc;
 	}
+	case LL_IOC_MIGRATE: {
+		char *buf = NULL;
+		const char *filename;
+		int namelen = 0;
+		int len;
+		int rc;
+		int mdtidx;
+
+		rc = obd_ioctl_getdata(&buf, &len, (void __user *)arg);
+		if (rc < 0)
+			return rc;
+
+		data = (struct obd_ioctl_data *)buf;
+		if (!data->ioc_inlbuf1 || !data->ioc_inlbuf2 ||
+		    !data->ioc_inllen1 || !data->ioc_inllen2) {
+			rc = -EINVAL;
+			goto migrate_free;
+		}
+
+		filename = data->ioc_inlbuf1;
+		namelen = data->ioc_inllen1;
+		if (namelen < 1) {
+			rc = -EINVAL;
+			goto migrate_free;
+		}
+
+		if (data->ioc_inllen2 != sizeof(mdtidx)) {
+			rc = -EINVAL;
+			goto migrate_free;
+		}
+		mdtidx = *(int *)data->ioc_inlbuf2;
+
+		rc = ll_migrate(inode, file, mdtidx, filename, namelen);
+migrate_free:
+		obd_ioctl_freedata(buf, len);
+
+		return rc;
+	}
+
 	default:
 		return obd_iocontrol(cmd, sbi->ll_dt_exp, 0, NULL,
 				     (void __user *)arg);
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 18fb713..8d98db6 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -364,7 +364,8 @@ int ll_file_release(struct inode *inode, struct file *file)
 	}
 
 	if (!S_ISDIR(inode->i_mode)) {
-		lov_read_and_clear_async_rc(lli->lli_clob);
+		if (lli->lli_clob)
+			lov_read_and_clear_async_rc(lli->lli_clob);
 		lli->lli_async_rc = 0;
 	}
 
@@ -2593,9 +2594,11 @@ static int ll_flush(struct file *file, fl_owner_t id)
 	 */
 	rc = lli->lli_async_rc;
 	lli->lli_async_rc = 0;
-	err = lov_read_and_clear_async_rc(lli->lli_clob);
-	if (rc == 0)
-		rc = err;
+	if (lli->lli_clob) {
+		err = lov_read_and_clear_async_rc(lli->lli_clob);
+		if (!rc)
+			rc = err;
+	}
 
 	/* The application has been told about write failure already.
 	 * Do not report failure again.
@@ -2825,6 +2828,108 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 	return rc;
 }
 
+static int ll_get_fid_by_name(struct inode *parent, const char *name,
+			      int namelen, struct lu_fid *fid)
+{
+	struct md_op_data *op_data = NULL;
+	struct ptlrpc_request *req;
+	struct mdt_body *body;
+	int rc;
+
+	op_data = ll_prep_md_op_data(NULL, parent, NULL, name, namelen, 0,
+				     LUSTRE_OPC_ANY, NULL);
+	if (IS_ERR(op_data))
+		return PTR_ERR(op_data);
+
+	op_data->op_valid = OBD_MD_FLID;
+	rc = md_getattr_name(ll_i2sbi(parent)->ll_md_exp, op_data, &req);
+	if (rc < 0)
+		goto out_free;
+
+	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
+	if (!body) {
+		rc = -EFAULT;
+		goto out_req;
+	}
+	*fid = body->fid1;
+out_req:
+	ptlrpc_req_finished(req);
+out_free:
+	if (op_data)
+		ll_finish_md_op_data(op_data);
+	return rc;
+}
+
+int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
+	       const char *name, int namelen)
+{
+	struct ptlrpc_request *request = NULL;
+	struct dentry *dchild = NULL;
+	struct md_op_data *op_data;
+	struct qstr qstr;
+	int rc;
+
+	CDEBUG(D_VFSTRACE, "migrate %s under "DFID" to MDT%d\n",
+	       name, PFID(ll_inode2fid(parent)), mdtidx);
+
+	op_data = ll_prep_md_op_data(NULL, parent, NULL, name, namelen,
+				     0, LUSTRE_OPC_ANY, NULL);
+	if (IS_ERR(op_data))
+		return PTR_ERR(op_data);
+
+	/* Get child FID first */
+	qstr.hash = full_name_hash(parent, name, namelen);
+	qstr.name = name;
+	qstr.len = namelen;
+	dchild = d_lookup(file_dentry(file), &qstr);
+	if (dchild && dchild->d_inode) {
+		op_data->op_fid3 = *ll_inode2fid(dchild->d_inode);
+	} else {
+		rc = ll_get_fid_by_name(parent, name, strnlen(name, namelen),
+					&op_data->op_fid3);
+		if (rc)
+			goto out_free;
+	}
+	if (!fid_is_sane(&op_data->op_fid3)) {
+		CERROR("%s: migrate %s, but fid "DFID" is insane\n",
+		       ll_get_fsname(parent->i_sb, NULL, 0), name,
+		       PFID(&op_data->op_fid3));
+		rc = -EINVAL;
+		goto out_free;
+	}
+
+	rc = ll_get_mdt_idx_by_fid(ll_i2sbi(parent), &op_data->op_fid3);
+	if (rc < 0)
+		goto out_free;
+
+	if (rc == mdtidx) {
+		CDEBUG(D_INFO, "%s:"DFID" is already on MDT%d.\n", name,
+		       PFID(&op_data->op_fid3), mdtidx);
+		rc = 0;
+		goto out_free;
+	}
+
+	op_data->op_mds = mdtidx;
+	op_data->op_cli_flags = CLI_MIGRATE;
+	rc = md_rename(ll_i2sbi(parent)->ll_md_exp, op_data, name,
+		       strnlen(name, namelen), name, strnlen(name, namelen),
+		       &request);
+	if (!rc)
+		ll_update_times(request, parent);
+
+	ptlrpc_req_finished(request);
+
+out_free:
+	if (dchild) {
+		if (dchild->d_inode)
+			ll_delete_inode(dchild->d_inode);
+		dput(dchild);
+	}
+
+	ll_finish_md_op_data(op_data);
+	return rc;
+}
+
 static int
 ll_file_noflock(struct file *file, int cmd, struct file_lock *file_lock)
 {
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 82c3a88..69492f0 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -660,6 +660,7 @@ extern const struct inode_operations ll_dir_inode_operations;
 int ll_dir_read(struct inode *inode, __u64 *ppos, struct md_op_data *op_data,
 		struct dir_context *ctx);
 int ll_get_mdt_idx(struct inode *inode);
+int ll_get_mdt_idx_by_fid(struct ll_sb_info *sbi, const struct lu_fid *fid);
 struct page *ll_get_dir_page(struct inode *dir, struct md_op_data *op_data,
 			     __u64 hash, struct ll_dir_chain *chain);
 void ll_release_page(struct inode *inode, struct page *page, bool remove);
@@ -675,6 +676,7 @@ int ll_test_inode_by_fid(struct inode *inode, void *opaque);
 int ll_md_blocking_ast(struct ldlm_lock *, struct ldlm_lock_desc *,
 		       void *data, int flag);
 struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de);
+void ll_update_times(struct ptlrpc_request *request, struct inode *inode);
 
 /* llite/rw.c */
 int ll_writepage(struct page *page, struct writeback_control *wbc);
@@ -717,7 +719,8 @@ void ll_pack_inode2opdata(struct inode *inode, struct md_op_data *op_data,
 			  struct lustre_handle *fh);
 int ll_getattr(struct vfsmount *mnt, struct dentry *de, struct kstat *stat);
 struct posix_acl *ll_get_acl(struct inode *inode, int type);
-
+int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
+	       const char *name, int namelen);
 int ll_inode_permission(struct inode *inode, int mask);
 
 int ll_lov_setstripe_ea_info(struct inode *inode, struct dentry *dentry,
@@ -777,6 +780,15 @@ int ll_obd_statfs(struct inode *inode, void __user *arg);
 int ll_get_max_mdsize(struct ll_sb_info *sbi, int *max_mdsize);
 int ll_get_default_mdsize(struct ll_sb_info *sbi, int *default_mdsize);
 int ll_process_config(struct lustre_cfg *lcfg);
+
+enum {
+	LUSTRE_OPC_MKDIR	= 0,
+	LUSTRE_OPC_SYMLINK	= 1,
+	LUSTRE_OPC_MKNOD	= 2,
+	LUSTRE_OPC_CREATE	= 3,
+	LUSTRE_OPC_ANY		= 5,
+};
+
 struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 				      struct inode *i1, struct inode *i2,
 				      const char *name, int namelen,
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index ef8d87a..e320400 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1114,8 +1114,34 @@ static void ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 	struct lmv_stripe_md *lsm = md->lmv;
 	int idx;
 
-	LASSERT(lsm);
 	LASSERT(S_ISDIR(inode->i_mode));
+	CDEBUG(D_INODE, "update lsm %p of "DFID"\n", lli->lli_lsm_md,
+	       PFID(ll_inode2fid(inode)));
+
+	/* no stripe information in the request */
+	if (!lsm) {
+		if (!lli->lli_lsm_md) {
+			return;
+		} else if (lli->lli_lsm_md->lsm_md_magic == LMV_MAGIC_MIGRATE) {
+			/*
+			 * migration is done, the temporary MIGRATE layout has
+			 * been removed
+			 */
+			CDEBUG(D_INODE, DFID" finish migration.\n",
+			       PFID(ll_inode2fid(inode)));
+			lmv_free_memmd(lli->lli_lsm_md);
+			lli->lli_lsm_md = NULL;
+			return;
+		} else {
+			/*
+			 * The lustre_md from req does not include stripeEA,
+			 * see ll_md_setattr
+			 */
+			return;
+		}
+	}
+
+	/* set the directory layout */
 	if (!lli->lli_lsm_md) {
 		int rc;
 
@@ -1132,6 +1158,8 @@ static void ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 		 * will not free this lsm
 		 */
 		md->lmv = NULL;
+		CDEBUG(D_INODE, "Set lsm %p magic %x to "DFID"\n", lsm,
+		       lsm->lsm_md_magic, PFID(ll_inode2fid(inode)));
 		return;
 	}
 
@@ -1668,7 +1696,7 @@ void ll_update_inode(struct inode *inode, struct lustre_md *md)
 			lli->lli_maxbytes = MAX_LFS_FILESIZE;
 	}
 
-	if (S_ISDIR(inode->i_mode) && md->lmv)
+	if (S_ISDIR(inode->i_mode))
 		ll_update_lsm_md(inode, md);
 
 #ifdef CONFIG_FS_POSIX_ACL
@@ -2306,7 +2334,6 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 	if ((opc == LUSTRE_OPC_CREATE) && name &&
 	    filename_is_volatile(name, namelen, NULL))
 		op_data->op_bias |= MDS_CREATE_VOLATILE;
-	op_data->op_opc = opc;
 	op_data->op_mds = 0;
 	op_data->op_data = data;
 
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index e32d08b..f059882 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -752,8 +752,7 @@ static int ll_create_it(struct inode *dir, struct dentry *dentry, int mode,
 	return 0;
 }
 
-static void ll_update_times(struct ptlrpc_request *request,
-			    struct inode *inode)
+void ll_update_times(struct ptlrpc_request *request, struct inode *inode)
 {
 	struct mdt_body *body = req_capsule_server_get(&request->rq_pill,
 						       &RMF_MDT_BODY);
diff --git a/drivers/staging/lustre/lustre/llite/rw.c b/drivers/staging/lustre/lustre/llite/rw.c
index 87393c4..01aee84 100644
--- a/drivers/staging/lustre/lustre/llite/rw.c
+++ b/drivers/staging/lustre/lustre/llite/rw.c
@@ -1015,6 +1015,10 @@ int ll_writepages(struct address_space *mapping, struct writeback_control *wbc)
 		 * is called later on.
 		 */
 		ignore_layout = 1;
+
+	if (!ll_i2info(inode)->lli_clob)
+		return 0;
+
 	result = cl_sync_file_range(inode, start, end, mode, ignore_layout);
 	if (result > 0) {
 		wbc->nr_to_write -= result;
diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
index 6ce7442..e8c1959 100644
--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -1607,6 +1607,7 @@ int do_statahead_enter(struct inode *dir, struct dentry **dentryp,
 					       *dentryp,
 					       PFID(ll_inode2fid(d_inode(*dentryp))),
 					       PFID(ll_inode2fid(inode)));
+					ll_intent_release(&it);
 					ll_sai_unplug(sai, entry);
 					return -ESTALE;
 				} else {
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index 5313dfc..2bc1098 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -43,6 +43,7 @@
 #include "../include/lustre_lib.h"
 #include "../include/lustre_net.h"
 #include "../include/lustre_dlm.h"
+#include "../include/lustre_mdc.h"
 #include "../include/obd_class.h"
 #include "../include/lprocfs_status.h"
 #include "lmv_internal.h"
@@ -332,6 +333,8 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 
 			oinfo = lsm_name_to_stripe_info(lsm, op_data->op_name,
 							op_data->op_namelen);
+			if (IS_ERR(oinfo))
+				return PTR_ERR(oinfo);
 			op_data->op_fid1 = oinfo->lmo_fid;
 		}
 
@@ -408,6 +411,7 @@ static int lmv_intent_lookup(struct obd_export *exp,
 			     ldlm_blocking_callback cb_blocking,
 			     __u64 extra_lock_flags)
 {
+	struct lmv_stripe_md *lsm = op_data->op_mea1;
 	struct obd_device      *obd = exp->exp_obd;
 	struct lmv_obd	 *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc    *tgt = NULL;
@@ -421,17 +425,15 @@ static int lmv_intent_lookup(struct obd_export *exp,
 	if (!fid_is_sane(&op_data->op_fid2))
 		fid_zero(&op_data->op_fid2);
 
-	CDEBUG(D_INODE, "LOOKUP_INTENT with fid1="DFID", fid2="DFID
-	       ", name='%s' -> mds #%d\n", PFID(&op_data->op_fid1),
-	       PFID(&op_data->op_fid2),
+	CDEBUG(D_INODE, "LOOKUP_INTENT with fid1="DFID", fid2="DFID", name='%s' -> mds #%d lsm=%p lsm_magic=%x\n",
+	       PFID(&op_data->op_fid1), PFID(&op_data->op_fid2),
 	       op_data->op_name ? op_data->op_name : "<NULL>",
-	       tgt->ltd_idx);
+	       tgt->ltd_idx, lsm, !lsm ? -1 : lsm->lsm_md_magic);
 
 	op_data->op_bias &= ~MDS_CROSS_REF;
 
 	rc = md_intent_lock(tgt->ltd_exp, op_data, lmm, lmmsize, it,
 			    flags, reqp, cb_blocking, extra_lock_flags);
-
 	if (rc < 0)
 		return rc;
 
@@ -448,6 +450,26 @@ static int lmv_intent_lookup(struct obd_export *exp,
 				return rc;
 		}
 		return rc;
+	} else if (it_disposition(it, DISP_LOOKUP_NEG) && lsm &&
+		   lsm->lsm_md_magic == LMV_MAGIC_MIGRATE) {
+		/*
+		 * For migrating directory, if it can not find the child in
+		 * the source directory(master stripe), try the targeting
+		 * directory(stripe 1)
+		 */
+		tgt = lmv_find_target(lmv, &lsm->lsm_md_oinfo[1].lmo_fid);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+
+		ptlrpc_req_finished(*reqp);
+		CDEBUG(D_INODE, "For migrating dir, try target dir "DFID"\n",
+		       PFID(&lsm->lsm_md_oinfo[1].lmo_fid));
+
+		op_data->op_fid1 = lsm->lsm_md_oinfo[1].lmo_fid;
+		it->it_disposition &= ~DISP_ENQ_COMPLETE;
+		rc = md_intent_lock(tgt->ltd_exp, op_data, lmm, lmmsize, it,
+				    flags, reqp, cb_blocking, extra_lock_flags);
+		return rc;
 	}
 
 	/*
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 81dcc0a..09b2efe 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -96,6 +96,15 @@ int lmv_name_to_stripe_index(enum lmv_hash_type hashtype,
 	case LMV_HASH_TYPE_FNV_1A_64:
 		idx = lmv_hash_fnv1a(max_mdt_index, name, namelen);
 		break;
+	/*
+	 * LMV_HASH_TYPE_MIGRATION means the file is being migrated,
+	 * and the file should not be accessed by client, except for
+	 * lookup(see lmv_intent_lookup), return -EACCES here
+	 */
+	case LMV_HASH_TYPE_MIGRATION:
+		CERROR("%.*s is being migrated: rc = %d\n", namelen,
+		       name, -EACCES);
+		return -EACCES;
 	default:
 		CERROR("Unknown hash type 0x%x\n", hashtype);
 		return -EINVAL;
@@ -1667,6 +1676,9 @@ lmv_locate_target_for_name(struct lmv_obd *lmv, struct lmv_stripe_md *lsm,
 	struct lmv_tgt_desc *tgt;
 
 	oinfo = lsm_name_to_stripe_info(lsm, name, namelen);
+	if (IS_ERR(oinfo))
+		return ERR_CAST(oinfo);
+
 	*fid = oinfo->lmo_fid;
 	*mds = oinfo->lmo_mds;
 	tgt = lmv_get_target(lmv, *mds);
@@ -1683,7 +1695,8 @@ struct lmv_tgt_desc
 	struct lmv_tgt_desc *tgt;
 
 	if (!lsm || lsm->lsm_md_stripe_count <= 1 ||
-	    !op_data->op_namelen) {
+	    !op_data->op_namelen ||
+	    lsm->lsm_md_magic == LMV_MAGIC_MIGRATE) {
 		tgt = lmv_find_target(lmv, fid);
 		if (IS_ERR(tgt))
 			return tgt;
@@ -1929,23 +1942,24 @@ lmv_getattr_name(struct obd_export *exp, struct md_op_data *op_data,
 	 fl == MF_MDC_CANCEL_FID4 ? &op_data->op_fid4 : \
 	 NULL)
 
-static int lmv_early_cancel(struct obd_export *exp, struct md_op_data *op_data,
-			    int op_tgt, enum ldlm_mode mode, int bits,
-			    int flag)
+static int lmv_early_cancel(struct obd_export *exp, struct lmv_tgt_desc *tgt,
+			    struct md_op_data *op_data, int op_tgt,
+			    enum ldlm_mode mode, int bits, int flag)
 {
 	struct lu_fid	  *fid = md_op_data_fid(op_data, flag);
 	struct obd_device      *obd = exp->exp_obd;
 	struct lmv_obd	 *lmv = &obd->u.lmv;
-	struct lmv_tgt_desc    *tgt;
 	ldlm_policy_data_t      policy = { {0} };
 	int		     rc = 0;
 
 	if (!fid_is_sane(fid))
 		return 0;
 
-	tgt = lmv_find_target(lmv, fid);
-	if (IS_ERR(tgt))
-		return PTR_ERR(tgt);
+	if (!tgt) {
+		tgt = lmv_find_target(lmv, fid);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+	}
 
 	if (tgt->ltd_idx != op_tgt) {
 		CDEBUG(D_INODE, "EARLY_CANCEL on "DFID"\n", PFID(fid));
@@ -1994,6 +2008,9 @@ static int lmv_link(struct obd_export *exp, struct md_op_data *op_data,
 
 		oinfo = lsm_name_to_stripe_info(lsm, op_data->op_name,
 						op_data->op_namelen);
+		if (IS_ERR(oinfo))
+			return PTR_ERR(oinfo);
+
 		op_data->op_fid2 = oinfo->lmo_fid;
 	}
 
@@ -2005,7 +2022,7 @@ static int lmv_link(struct obd_export *exp, struct md_op_data *op_data,
 	 * Cancel UPDATE lock on child (fid1).
 	 */
 	op_data->op_flags |= MF_MDC_CANCEL_FID2;
-	rc = lmv_early_cancel(exp, op_data, tgt->ltd_idx, LCK_EX,
+	rc = lmv_early_cancel(exp, NULL, op_data, tgt->ltd_idx, LCK_EX,
 			      MDS_INODELOCK_UPDATE, MF_MDC_CANCEL_FID1);
 	if (rc != 0)
 		return rc;
@@ -2040,31 +2057,44 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data,
 	op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid());
 	op_data->op_cap = cfs_curproc_cap_pack();
 
-	if (op_data->op_mea1) {
-		struct lmv_stripe_md *lsm = op_data->op_mea1;
-		const struct lmv_oinfo *oinfo;
-
-		oinfo = lsm_name_to_stripe_info(lsm, old, oldlen);
-		op_data->op_fid1 = oinfo->lmo_fid;
-		op_data->op_mds = oinfo->lmo_mds;
-		src_tgt = lmv_get_target(lmv, op_data->op_mds);
-		if (IS_ERR(src_tgt))
-			return PTR_ERR(src_tgt);
+	if (op_data->op_cli_flags & CLI_MIGRATE) {
+		LASSERTF(fid_is_sane(&op_data->op_fid3), "invalid FID "DFID"\n",
+			 PFID(&op_data->op_fid3));
+		rc = lmv_fid_alloc(exp, &op_data->op_fid2, op_data);
+		if (rc)
+			return rc;
+		src_tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid3);
 	} else {
-		src_tgt = lmv_find_target(lmv, &op_data->op_fid1);
-		if (IS_ERR(src_tgt))
-			return PTR_ERR(src_tgt);
+		if (op_data->op_mea1) {
+			struct lmv_stripe_md *lsm = op_data->op_mea1;
+
+			src_tgt = lmv_locate_target_for_name(lmv, lsm, old,
+							     oldlen,
+							     &op_data->op_fid1,
+							     &op_data->op_mds);
+			if (IS_ERR(src_tgt))
+				return PTR_ERR(src_tgt);
+		} else {
+			src_tgt = lmv_find_target(lmv, &op_data->op_fid1);
+			if (IS_ERR(src_tgt))
+				return PTR_ERR(src_tgt);
 
-		op_data->op_mds = src_tgt->ltd_idx;
-	}
+			op_data->op_mds = src_tgt->ltd_idx;
+		}
 
-	if (op_data->op_mea2) {
-		struct lmv_stripe_md *lsm = op_data->op_mea2;
-		const struct lmv_oinfo *oinfo;
+		if (op_data->op_mea2) {
+			struct lmv_stripe_md *lsm = op_data->op_mea2;
+			const struct lmv_oinfo *oinfo;
 
-		oinfo = lsm_name_to_stripe_info(lsm, new, newlen);
-		op_data->op_fid2 = oinfo->lmo_fid;
+			oinfo = lsm_name_to_stripe_info(lsm, new, newlen);
+			if (IS_ERR(oinfo))
+				return PTR_ERR(oinfo);
+
+			op_data->op_fid2 = oinfo->lmo_fid;
+		}
 	}
+	if (IS_ERR(src_tgt))
+		return PTR_ERR(src_tgt);
 
 	/*
 	 * LOOKUP lock on src child (fid3) should also be cancelled for
@@ -2076,33 +2106,48 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data,
 	 * Cancel UPDATE locks on tgt parent (fid2), tgt_tgt is its
 	 * own target.
 	 */
-	rc = lmv_early_cancel(exp, op_data, src_tgt->ltd_idx,
+	rc = lmv_early_cancel(exp, NULL, op_data, src_tgt->ltd_idx,
 			      LCK_EX, MDS_INODELOCK_UPDATE,
 			      MF_MDC_CANCEL_FID2);
-
+	if (rc)
+		return rc;
 	/*
-	 * Cancel LOOKUP locks on tgt child (fid4) for parent tgt_tgt.
+	 * Cancel LOOKUP locks on source child (fid3) for parent tgt_tgt.
 	 */
-	if (rc == 0) {
-		rc = lmv_early_cancel(exp, op_data, src_tgt->ltd_idx,
+	if (fid_is_sane(&op_data->op_fid3)) {
+		struct lmv_tgt_desc *tgt;
+
+		tgt = lmv_find_target(lmv, &op_data->op_fid1);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+
+		/* Cancel LOOKUP lock on its parent */
+		rc = lmv_early_cancel(exp, tgt, op_data, src_tgt->ltd_idx,
 				      LCK_EX, MDS_INODELOCK_LOOKUP,
-				      MF_MDC_CANCEL_FID4);
+				      MF_MDC_CANCEL_FID3);
+		if (rc)
+			return rc;
+
+		rc = lmv_early_cancel(exp, NULL, op_data, src_tgt->ltd_idx,
+				      LCK_EX, MDS_INODELOCK_FULL,
+				      MF_MDC_CANCEL_FID3);
+		if (rc)
+			return rc;
 	}
 
 	/*
 	 * Cancel all the locks on tgt child (fid4).
 	 */
-	if (rc == 0)
-		rc = lmv_early_cancel(exp, op_data, src_tgt->ltd_idx,
+	if (fid_is_sane(&op_data->op_fid4))
+		rc = lmv_early_cancel(exp, NULL, op_data, src_tgt->ltd_idx,
 				      LCK_EX, MDS_INODELOCK_FULL,
 				      MF_MDC_CANCEL_FID4);
 
 	CDEBUG(D_INODE, DFID":m%d to "DFID"\n", PFID(&op_data->op_fid1),
 	       op_data->op_mds, PFID(&op_data->op_fid2));
 
-	if (rc == 0)
-		rc = md_rename(src_tgt->ltd_exp, op_data, old, oldlen,
-			       new, newlen, request);
+	rc = md_rename(src_tgt->ltd_exp, op_data, old, oldlen,
+		       new, newlen, request);
 	return rc;
 }
 
@@ -2304,6 +2349,7 @@ static int lmv_unlink(struct obd_export *exp, struct md_op_data *op_data,
 {
 	struct obd_device       *obd = exp->exp_obd;
 	struct lmv_obd	  *lmv = &obd->u.lmv;
+	struct lmv_tgt_desc *parent_tgt = NULL;
 	struct lmv_tgt_desc     *tgt = NULL;
 	struct mdt_body		*body;
 	int		     rc;
@@ -2321,12 +2367,16 @@ retry:
 		/* For striped dir, we need to locate the parent as well */
 		if (op_data->op_mea1 &&
 		    op_data->op_mea1->lsm_md_stripe_count > 1) {
+			struct lmv_tgt_desc *tmp;
+
 			LASSERT(op_data->op_name && op_data->op_namelen);
-			lmv_locate_target_for_name(lmv, op_data->op_mea1,
-						   op_data->op_name,
-						   op_data->op_namelen,
-						   &op_data->op_fid1,
-						   &op_data->op_mds);
+			tmp = lmv_locate_target_for_name(lmv, op_data->op_mea1,
+							 op_data->op_name,
+							 op_data->op_namelen,
+							 &op_data->op_fid1,
+							 &op_data->op_mds);
+			if (IS_ERR(tmp))
+				return PTR_ERR(tmp);
 		}
 	} else {
 		tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
@@ -2350,9 +2400,18 @@ retry:
 	/*
 	 * Cancel FULL locks on child (fid3).
 	 */
-	rc = lmv_early_cancel(exp, op_data, tgt->ltd_idx, LCK_EX,
-			      MDS_INODELOCK_FULL, MF_MDC_CANCEL_FID3);
+	parent_tgt = lmv_find_target(lmv, &op_data->op_fid1);
+	if (IS_ERR(parent_tgt))
+		return PTR_ERR(parent_tgt);
+
+	if (parent_tgt != tgt) {
+		rc = lmv_early_cancel(exp, parent_tgt, op_data, tgt->ltd_idx,
+				      LCK_EX, MDS_INODELOCK_LOOKUP,
+				      MF_MDC_CANCEL_FID3);
+	}
 
+	rc = lmv_early_cancel(exp, NULL, op_data, tgt->ltd_idx, LCK_EX,
+			      MDS_INODELOCK_FULL, MF_MDC_CANCEL_FID3);
 	if (rc != 0)
 		return rc;
 
@@ -2681,13 +2740,25 @@ int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 	}
 
 	/* Unpack memmd */
-	if (le32_to_cpu(lmm->lmv_magic) != LMV_MAGIC_V1) {
-		CERROR("%s: invalid magic %x.\n", exp->exp_obd->obd_name,
-		       le32_to_cpu(lmm->lmv_magic));
-		return -EINVAL;
+	if (le32_to_cpu(lmm->lmv_magic) != LMV_MAGIC_V1 &&
+	    le32_to_cpu(lmm->lmv_magic) != LMV_MAGIC_MIGRATE &&
+	    le32_to_cpu(lmm->lmv_magic) != LMV_USER_MAGIC) {
+		CERROR("%s: invalid lmv magic %x: rc = %d\n",
+		       exp->exp_obd->obd_name, le32_to_cpu(lmm->lmv_magic),
+		       -EIO);
+		return -EIO;
 	}
 
-	lsm_size = lmv_stripe_md_size(lmv_mds_md_stripe_count_get(lmm));
+	if (le32_to_cpu(lmm->lmv_magic) == LMV_MAGIC_V1 ||
+	    le32_to_cpu(lmm->lmv_magic) == LMV_MAGIC_MIGRATE)
+		lsm_size = lmv_stripe_md_size(lmv_mds_md_stripe_count_get(lmm));
+	else
+		/**
+		 * Unpack default dirstripe(lmv_user_md) to lmv_stripe_md,
+		 * stripecount should be 0 then.
+		 */
+		lsm_size = lmv_stripe_md_size(0);
+
 	if (!lsm) {
 		lsm = libcfs_kvzalloc(lsm_size, GFP_NOFS);
 		if (!lsm)
@@ -2698,6 +2769,7 @@ int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 
 	switch (le32_to_cpu(lmm->lmv_magic)) {
 	case LMV_MAGIC_V1:
+	case LMV_MAGIC_MIGRATE:
 		rc = lmv_unpack_md_v1(exp, lsm, &lmm->lmv_md_v1);
 		break;
 	default:
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_lib.c b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
index 143bd76..95c4550 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_lib.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
@@ -390,6 +390,7 @@ void mdc_rename_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 	rec = req_capsule_client_get(&req->rq_pill, &RMF_REC_REINT);
 
 	/* XXX do something about time, uid, gid */
-	rec->rn_opcode   = REINT_RENAME;
+	rec->rn_opcode	 = op_data->op_cli_flags & CLI_MIGRATE ?
+				REINT_MIGRATE : REINT_RENAME;
 	rec->rn_fsuid    = op_data->op_fsuid;
 	rec->rn_fsgid    = op_data->op_fsgid;
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index 4c500a9..bc27f8d 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -190,7 +190,9 @@ void lustre_assert_wire_constants(void)
 		 (long long)REINT_SETXATTR);
 	LASSERTF(REINT_RMENTRY == 8, "found %lld\n",
 		 (long long)REINT_RMENTRY);
-	LASSERTF(REINT_MAX == 9, "found %lld\n",
+	LASSERTF(REINT_MIGRATE == 9, "found %lld\n",
+		 (long long)REINT_MIGRATE);
+	LASSERTF(REINT_MAX == 10, "found %lld\n",
 		 (long long)REINT_MAX);
 	LASSERTF(DISP_IT_EXECD == 0x00000001UL, "found 0x%.8xUL\n",
 		(unsigned)DISP_IT_EXECD);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 18/80] staging: lustre: lmv: fix issue found by Klocwork Insight tool
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Dmitry Eremin, James Simmons

From: Dmitry Eremin <dmitry.eremin@intel.com>

'plock.cookie' might be used uninitialized in this function.

sscanf format specification '%d' expects type 'int *' for 'd',
but parameter 3 has a different type '__u32*'
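
The two warnings can be reproduced outside the kernel. What follows is
a minimal userspace sketch, not the lmv code: parse_index(), set_lock()
and struct handle are hypothetical names used only for illustration.
It shows "%u" paired with an unsigned argument, and a handle that is
copied only when a lock mode was actually granted.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the lock handle filled in by an enqueue. */
struct handle {
	unsigned long long cookie;
};

/* Parse an unsigned index; "%u" matches the 'unsigned int *' argument. */
static int parse_index(const char *buf, unsigned int *index)
{
	assert(buf && index);
	return sscanf(buf, "%u", index) == 1 ? 0 : -1;
}

/* Copy the handle only when a lock mode was granted (pmode != 0). */
static void set_lock(unsigned long long *out_cookie, int *out_mode,
		     const struct handle *plock, int pmode)
{
	assert(out_cookie && out_mode && plock);
	if (pmode) {
		*out_cookie = plock->cookie;
		*out_mode = pmode;
	}
}
```

With pmode equal to zero the outputs keep their previous values, so a
handle that was never filled in is never read.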

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4629
Reviewed-on: http://review.whamcloud.com/9390
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lmv/lmv_intent.c |    6 ++++--
 drivers/staging/lustre/lustre/lmv/lmv_obd.c    |    2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index 2bc1098..51b7048 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -137,8 +137,10 @@ static int lmv_intent_remote(struct obd_export *exp, void *lmm,
 		it->it_remote_lock_mode = it->it_lock_mode;
 	}
 
-	it->it_lock_handle = plock.cookie;
-	it->it_lock_mode = pmode;
+	if (pmode) {
+		it->it_lock_handle = plock.cookie;
+		it->it_lock_mode = pmode;
+	}
 
 out_free_op_data:
 	kfree(op_data);
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 09b2efe..c005a66 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -1419,7 +1419,7 @@ static int lmv_process_config(struct obd_device *obd, u32 len, void *buf)
 
 		obd_str2uuid(&obd_uuid,  lustre_cfg_buf(lcfg, 1));
 
-		if (sscanf(lustre_cfg_buf(lcfg, 2), "%d", &index) != 1) {
+		if (sscanf(lustre_cfg_buf(lcfg, 2), "%u", &index) != 1) {
 			rc = -EINVAL;
 			goto out;
 		}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 19/80] staging: lustre: libcfs: Only dump log once per sec. to avoid EEXIST
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Ryan Haasken,
	James Simmons

From: Ryan Haasken <haasken@cray.com>

Since the log file name contains the current time in seconds, dumping
the logs more than once per second causes EEXIST errors to be emitted.
Add a static variable to libcfs_debug_dumplog_internal that records
the time of the last Lustre log dump.  If the current time in seconds
is equal to the last time, do not dump logs again.

Note that this is not thread-safe.  However, in the rare case that two
threads try to access last_dump_time simultaneously, the worst thing
that could happen is that one of the threads will get an EEXIST error
when trying to write the log file.  This is no worse than the current
situation, and it is not likely to happen.
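
The once-per-second guard described above can be sketched in plain C.
This is a hedged userspace illustration, not the libcfs function:
dump_allowed() is a hypothetical name, and the caller passes in the
current time so that the clock source stays an assumption of the
caller. As in the patch, the static variable is not protected by a
lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Return true at most once per distinct second.
 * Not thread-safe: two racing callers may both see the old value.
 */
static bool dump_allowed(time_t now)
{
	static time_t last_dump_time;	/* zero-initialised */

	assert(now != (time_t)-1);	/* value time() returns on failure */
	if (now <= last_dump_time)
		return false;
	last_dump_time = now;
	return true;
}
```

A caller would build the file name from the same 'now' value it passed
in, so the name and the guard can never disagree.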

Signed-off-by: Ryan Haasken <haasken@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4129
Reviewed-on: http://review.whamcloud.com/8964
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lnet/libcfs/debug.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/debug.c b/drivers/staging/lustre/lnet/libcfs/debug.c
index 42b15a7..23b36b8 100644
--- a/drivers/staging/lustre/lnet/libcfs/debug.c
+++ b/drivers/staging/lustre/lnet/libcfs/debug.c
@@ -328,15 +328,20 @@ libcfs_debug_str2mask(int *mask, const char *str, int is_subsys)
  */
 void libcfs_debug_dumplog_internal(void *arg)
 {
+	static time64_t last_dump_time;
+	time64_t current_time;
 	void *journal_info;
 
 	journal_info = current->journal_info;
 	current->journal_info = NULL;
+	current_time = ktime_get_real_seconds();
 
-	if (strncmp(libcfs_debug_file_path_arr, "NONE", 4) != 0) {
+	if (strncmp(libcfs_debug_file_path_arr, "NONE", 4) &&
+	    current_time > last_dump_time) {
+		last_dump_time = current_time;
 		snprintf(debug_file_name, sizeof(debug_file_name) - 1,
 			 "%s.%lld.%ld", libcfs_debug_file_path_arr,
-			 (s64)ktime_get_real_seconds(), (long_ptr_t)arg);
+			 (s64)current_time, (long_ptr_t)arg);
 		pr_alert("LustreError: dumping log to %s\n", debug_file_name);
 		cfs_tracefile_dump_all_pages(debug_file_name);
 		libcfs_run_debug_log_upcall(debug_file_name);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 20/80] staging: lustre: llite: enable clients to inject error for lfsck
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Fan Yong,
	James Simmons

From: Fan Yong <fan.yong@intel.com>

This enables the client to inject an error by altering
the parent FID in order to test if the server side file
system checker behaves properly.
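
The idea of a fail point that corrupts the parent identifier can be
shown with a small userspace sketch. Everything here is an assumption
made for illustration: fail_loc, fail_check(), struct attrs and
set_parent() are hypothetical stand-ins, not the OBD_FAIL_* machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fail-point id and global switch (0 means disarmed). */
#define FAIL_INVALID_PFID 0x1619u
static unsigned int fail_loc;

static bool fail_check(unsigned int id)
{
	return fail_loc == id;
}

struct attrs {
	uint32_t parent_oid;
};

/* Record the parent object id, corrupting it when the fault is armed. */
static void set_parent(struct attrs *oa, uint32_t parent_oid)
{
	assert(oa);
	oa->parent_oid = parent_oid;
	if (fail_check(FAIL_INVALID_PFID))
		oa->parent_oid++;	/* deliberately wrong parent */
}
```

A test would arm the fail point, write some data, and then verify that
the checker reports and repairs the mismatched parent.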

Signed-off-by: Fan Yong <fan.yong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3951
Reviewed-on: http://review.whamcloud.com/7667
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../staging/lustre/lustre/include/obd_support.h    |    1 +
 drivers/staging/lustre/lustre/llite/vvp_req.c      |    2 ++
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd_support.h b/drivers/staging/lustre/lustre/include/obd_support.h
index 845e64a..71bf844 100644
--- a/drivers/staging/lustre/lustre/include/obd_support.h
+++ b/drivers/staging/lustre/lustre/include/obd_support.h
@@ -474,6 +474,7 @@ extern char obd_jobid_var[];
 #define OBD_FAIL_LFSCK_CRASH		0x160a
 #define OBD_FAIL_LFSCK_NO_AUTO		0x160b
 #define OBD_FAIL_LFSCK_NO_DOUBLESCAN	0x160c
+#define OBD_FAIL_LFSCK_INVALID_PFID	0x1619
 
 /* UPDATE */
 #define OBD_FAIL_UPDATE_OBJ_NET			0x1700
diff --git a/drivers/staging/lustre/lustre/llite/vvp_req.c b/drivers/staging/lustre/lustre/llite/vvp_req.c
index 9fe9d6c..0567a15 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_req.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_req.c
@@ -83,6 +83,8 @@ static void vvp_req_attr_set(const struct lu_env *env,
 	}
 	obdo_from_inode(oa, inode, valid_flags & flags);
 	obdo_set_parent_fid(oa, &ll_i2info(inode)->lli_fid);
+	if (OBD_FAIL_CHECK(OBD_FAIL_LFSCK_INVALID_PFID))
+		oa->o_parent_oid++;
 	memcpy(attr->cra_jobid, ll_i2info(inode)->lli_jobid,
 	       JOBSTATS_JOBID_SIZE);
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 21/80] staging: lustre: osc: allow to call brw_commit() multiple times
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, James Simmons

From: Jinshan Xiong <jinshan.xiong@intel.com>

Sometimes the rq_commit_cb of a BRW RPC can be called twice if that RPC
has already been committed at reply time. This makes the unstable page
accounting inaccurate and eventually triggers an assertion.
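
A commit callback that tolerates being called twice can be sketched as
follows. This is a simplified single-threaded illustration under
stated assumptions, not the osc code: struct request, inc_unstable()
and commit_cb() are hypothetical names, and the locking that the real
code needs around the flags is omitted.

```c
#include <assert.h>
#include <stdbool.h>

static long unstable_pages;	/* hypothetical global accounting */

struct request {
	bool committed;		/* commit callback has run */
	bool unstable;		/* pages are counted in unstable_pages */
	long page_count;
};

/* Count the request's pages unless the commit already happened. */
static void inc_unstable(struct request *req)
{
	assert(req && req->page_count >= 0);
	if (req->committed)
		return;		/* no later callback will undo the count */
	unstable_pages += req->page_count;
	req->unstable = true;
}

/* Commit callback: safe to call more than once. */
static void commit_cb(struct request *req)
{
	assert(req);
	req->committed = true;
	if (req->unstable) {
		req->unstable = false;
		unstable_pages -= req->page_count;
	}
}
```

The 'unstable' flag records whether a decrement is still owed, so a
second call finds nothing to undo and the counter never goes negative.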

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3274
Reviewed-on: http://review.whamcloud.com/8215
Reviewed-by: Prakash Surya <surya1@llnl.gov>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/osc/osc_cache.c   |   19 ++++---------------
 drivers/staging/lustre/lustre/osc/osc_request.c |    8 ++++----
 2 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index 53b5d73..683b3c2 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -1875,11 +1875,6 @@ void osc_dec_unstable_pages(struct ptlrpc_request *req)
 	atomic_sub(page_count, &obd_unstable_pages);
 	LASSERT(atomic_read(&obd_unstable_pages) >= 0);
 
-	spin_lock(&req->rq_lock);
-	req->rq_committed = 1;
-	req->rq_unstable  = 0;
-	spin_unlock(&req->rq_lock);
-
 	wake_up_all(&cli->cl_cache->ccc_unstable_waitq);
 }
 
@@ -1909,27 +1904,21 @@ void osc_inc_unstable_pages(struct ptlrpc_request *req)
 	LASSERT(atomic_read(&obd_unstable_pages) >= 0);
 	atomic_add(page_count, &obd_unstable_pages);
 
-	spin_lock(&req->rq_lock);
-
 	/*
 	 * If the request has already been committed (i.e. brw_commit
 	 * called via rq_commit_cb), we need to undo the unstable page
 	 * increments we just performed because rq_commit_cb wont be
-	 * called again. Otherwise, just set the commit callback so the
-	 * unstable page accounting is properly updated when the request
-	 * is committed
+	 * called again.
 	 */
-	if (req->rq_committed) {
+	spin_lock(&req->rq_lock);
+	if (unlikely(req->rq_committed)) {
 		/* Drop lock before calling osc_dec_unstable_pages */
 		spin_unlock(&req->rq_lock);
 		osc_dec_unstable_pages(req);
-		spin_lock(&req->rq_lock);
 	} else {
 		req->rq_unstable = 1;
-		req->rq_commit_cb = osc_dec_unstable_pages;
+		spin_unlock(&req->rq_lock);
 	}
-
-	spin_unlock(&req->rq_lock);
 }
 
 /* this must be called holding the loi list lock to give coverage to exit_cache,
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 536b868..a2d948f 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -1847,21 +1847,21 @@ static int brw_interpret(const struct lu_env *env,
 
 static void brw_commit(struct ptlrpc_request *req)
 {
-	spin_lock(&req->rq_lock);
 	/*
 	 * If osc_inc_unstable_pages (via osc_extent_finish) races with
 	 * this called via the rq_commit_cb, I need to ensure
 	 * osc_dec_unstable_pages is still called. Otherwise unstable
 	 * pages may be leaked.
 	 */
-	if (req->rq_unstable) {
+	spin_lock(&req->rq_lock);
+	if (unlikely(req->rq_unstable)) {
+		req->rq_unstable = 0;
 		spin_unlock(&req->rq_lock);
 		osc_dec_unstable_pages(req);
-		spin_lock(&req->rq_lock);
 	} else {
 		req->rq_committed = 1;
+		spin_unlock(&req->rq_lock);
 	}
-	spin_unlock(&req->rq_lock);
 }
 
 /**
-- 
1.7.1


* [PATCH 22/80] staging: lustre: llite: a few fixes for migration.
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

1. Clear the client dentry cache before migrating a file/directory
   to the remote MDT.

2. Do not return stripe information to the client if it did not get
   the layout lock.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4682
Reviewed-on: http://review.whamcloud.com/9522
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
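The migrate ioctl in this patch now requires the declared name length
to equal strlen(filename) + 1 and passes namelen - 1 on to
ll_migrate(). A minimal sketch of that check follows.
checked_name_len() is a hypothetical helper, and it uses strnlen()
instead of the patch's strlen() so the scan stays inside the declared
buffer; that substitution is an assumption of this sketch, not
something the patch does.

```c
#define _POSIX_C_SOURCE 200809L	/* for strnlen() */
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: buflen is the declared buffer length and must
 * count exactly one trailing NUL. Returns the name length without the
 * NUL, or -EINVAL if the buffer is empty, unterminated, or contains an
 * embedded NUL.
 */
static int checked_name_len(const char *buf, int buflen)
{
	if (buflen < 1)
		return -EINVAL;

	/* Bounded scan: never reads more than buflen bytes. */
	if (strnlen(buf, (size_t)buflen) + 1 != (size_t)buflen)
		return -EINVAL;

	return buflen - 1;
}
```

With this shape the callee can use the returned length directly and
does not need to call strnlen() on the name again.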
 drivers/staging/lustre/lustre/llite/dir.c          |   22 +++---------
 drivers/staging/lustre/lustre/llite/file.c         |   34 +++++++++++---------
 .../staging/lustre/lustre/llite/llite_internal.h   |    2 +
 drivers/staging/lustre/lustre/lov/lov_object.c     |    1 +
 4 files changed, 28 insertions(+), 31 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index ef7322e..84bec03 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -1318,11 +1318,9 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		return 0;
 	}
 	case IOC_MDC_LOOKUP: {
-		struct ptlrpc_request *request = NULL;
 		int namelen, len = 0;
 		char *buf = NULL;
 		char *filename;
-		struct md_op_data *op_data;
 
 		rc = obd_ioctl_getdata(&buf, &len, (void __user *)arg);
 		if (rc)
@@ -1338,21 +1336,13 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 			goto out_free;
 		}
 
-		op_data = ll_prep_md_op_data(NULL, inode, NULL, filename, namelen,
-					     0, LUSTRE_OPC_ANY, NULL);
-		if (IS_ERR(op_data)) {
-			rc = PTR_ERR(op_data);
-			goto out_free;
-		}
-
-		op_data->op_valid = OBD_MD_FLID;
-		rc = md_getattr_name(sbi->ll_md_exp, op_data, &request);
-		ll_finish_md_op_data(op_data);
+		rc = ll_get_fid_by_name(inode, filename, namelen, NULL);
 		if (rc < 0) {
-			CDEBUG(D_INFO, "md_getattr_name: %d\n", rc);
+			CERROR("%s: lookup %.*s failed: rc = %d\n",
+			       ll_get_fsname(inode->i_sb, NULL, 0), namelen,
+			       filename, rc);
 			goto out_free;
 		}
-		ptlrpc_req_finished(request);
 out_free:
 		obd_ioctl_freedata(buf, len);
 		return rc;
@@ -1981,7 +1971,7 @@ out_quotactl:
 
 		filename = data->ioc_inlbuf1;
 		namelen = data->ioc_inllen1;
-		if (namelen < 1) {
+		if (namelen < 1 || namelen != strlen(filename) + 1) {
 			rc = -EINVAL;
 			goto migrate_free;
 		}
@@ -1992,7 +1982,7 @@ out_quotactl:
 		}
 		mdtidx = *(int *)data->ioc_inlbuf2;
 
-		rc = ll_migrate(inode, file, mdtidx, filename, namelen);
+		rc = ll_migrate(inode, file, mdtidx, filename, namelen - 1);
 migrate_free:
 		obd_ioctl_freedata(buf, len);
 
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 8d98db6..769b028 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -2828,8 +2828,8 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 	return rc;
 }
 
-static int ll_get_fid_by_name(struct inode *parent, const char *name,
-			      int namelen, struct lu_fid *fid)
+int ll_get_fid_by_name(struct inode *parent, const char *name,
+		       int namelen, struct lu_fid *fid)
 {
 	struct md_op_data *op_data = NULL;
 	struct ptlrpc_request *req;
@@ -2843,20 +2843,19 @@ static int ll_get_fid_by_name(struct inode *parent, const char *name,
 
 	op_data->op_valid = OBD_MD_FLID;
 	rc = md_getattr_name(ll_i2sbi(parent)->ll_md_exp, op_data, &req);
+	ll_finish_md_op_data(op_data);
 	if (rc < 0)
-		goto out_free;
+		return rc;
 
 	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
 	if (!body) {
 		rc = -EFAULT;
 		goto out_req;
 	}
-	*fid = body->fid1;
+	if (fid)
+		*fid = body->fid1;
 out_req:
 	ptlrpc_req_finished(req);
-out_free:
-	if (op_data)
-		ll_finish_md_op_data(op_data);
 	return rc;
 }
 
@@ -2864,12 +2863,13 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
 	       const char *name, int namelen)
 {
 	struct ptlrpc_request *request = NULL;
+	struct inode *child_inode = NULL;
 	struct dentry *dchild = NULL;
 	struct md_op_data *op_data;
 	struct qstr qstr;
 	int rc;
 
-	CDEBUG(D_VFSTRACE, "migrate %s under"DFID" to MDT%d\n",
+	CDEBUG(D_VFSTRACE, "migrate %s under "DFID" to MDT%d\n",
 	       name, PFID(ll_inode2fid(parent)), mdtidx);
 
 	op_data = ll_prep_md_op_data(NULL, parent, NULL, name, namelen,
@@ -2884,8 +2884,13 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
 	dchild = d_lookup(file_dentry(file), &qstr);
 	if (dchild && dchild->d_inode) {
 		op_data->op_fid3 = *ll_inode2fid(dchild->d_inode);
+		if (dchild->d_inode) {
+			child_inode = igrab(dchild->d_inode);
+			ll_invalidate_aliases(child_inode);
+		}
+		dput(dchild);
 	} else {
-		rc = ll_get_fid_by_name(parent, name, strnlen(name, namelen),
+		rc = ll_get_fid_by_name(parent, name, namelen,
 					&op_data->op_fid3);
 		if (rc)
 			goto out_free;
@@ -2895,6 +2900,7 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
 		CERROR("%s: migrate %s, but fid "DFID" is insane\n",
 		       ll_get_fsname(parent->i_sb, NULL, 0), name,
 		       PFID(&op_data->op_fid3));
+		rc = -EINVAL;
 		goto out_free;
 	}
 
@@ -2912,18 +2918,16 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
 	op_data->op_mds = mdtidx;
 	op_data->op_cli_flags = CLI_MIGRATE;
 	rc = md_rename(ll_i2sbi(parent)->ll_md_exp, op_data, name,
-		       strnlen(name, namelen), name, strnlen(name, namelen),
-		       &request);
+		       namelen, name, namelen, &request);
 	if (!rc)
 		ll_update_times(request, parent);
 
 	ptlrpc_req_finished(request);
 
 out_free:
-	if (dchild) {
-		if (dchild->d_inode)
-			ll_delete_inode(dchild->d_inode);
-		dput(dchild);
+	if (child_inode) {
+		clear_nlink(child_inode);
+		iput(child_inode);
 	}
 
 	ll_finish_md_op_data(op_data);
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 69492f0..120aca3 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -721,6 +721,8 @@ int ll_getattr(struct vfsmount *mnt, struct dentry *de, struct kstat *stat);
 struct posix_acl *ll_get_acl(struct inode *inode, int type);
 int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
 	       const char *name, int namelen);
+int ll_get_fid_by_name(struct inode *parent, const char *name,
+		       int namelen, struct lu_fid *fid);
 int ll_inode_permission(struct inode *inode, int mask);
 
 int ll_lov_setstripe_ea_info(struct inode *inode, struct dentry *dentry,
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index f9621b0..2a52d0c 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -224,6 +224,7 @@ static int lov_init_raid0(const struct lu_env *env,
 
 	LASSERT(!lov->lo_lsm);
 	lov->lo_lsm = lsm_addref(lsm);
+	lov->lo_layout_invalid = true;
 	r0->lo_nr  = lsm->lsm_stripe_count;
 	LASSERT(r0->lo_nr <= lov_targets_nr(dev));
 
-- 
1.7.1


* [PATCH 23/80] staging: lustre: mdc: fixup MDS_SWAP_LAYOUTS ELC handling
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

In mdc_ioc_swap_layouts() cancel *any* unused locks with LAYOUT or
XATTR IBITS set on the two files. (This matches the locks acquired in
mdt_swap_layouts(). Previously only locks that conflicted with a CR
LAYOUT lock were cancelled.)

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4320
Reviewed-on: http://review.whamcloud.com/9329
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
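Widening the mask to LAYOUT | XATTR means a cached lock qualifies for
early cancellation if it carries either bit. The sketch below only
illustrates that overlap test. lock_matches_bits() is a hypothetical
helper, the EX_* values are illustrative rather than copied from
lustre_idl.h, and it is an assumption here that the ibits selection
behind mdc_resource_get_unused() reduces to a bitwise AND.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the real ones are defined in lustre_idl.h. */
#define EX_INODELOCK_UPDATE	0x02u
#define EX_INODELOCK_LAYOUT	0x08u
#define EX_INODELOCK_XATTR	0x20u

/* A lock is a cancel candidate if it shares any bit with the mask. */
static bool lock_matches_bits(uint64_t lock_bits, uint64_t cancel_bits)
{
	return (lock_bits & cancel_bits) != 0;
}
```

Under the old LAYOUT-only mask a lock holding just the XATTR bit would
not match; with the combined mask it does.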
 drivers/staging/lustre/lustre/mdc/mdc_request.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 702ced9..030295f 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -1670,9 +1670,11 @@ static int mdc_ioc_swap_layouts(struct obd_export *exp,
 	 * with the request RPC to avoid extra RPC round trips
 	 */
 	count = mdc_resource_get_unused(exp, &op_data->op_fid1, &cancels,
-					LCK_CR, MDS_INODELOCK_LAYOUT);
+					LCK_CR, MDS_INODELOCK_LAYOUT |
+					MDS_INODELOCK_XATTR);
 	count += mdc_resource_get_unused(exp, &op_data->op_fid2, &cancels,
-					 LCK_CR, MDS_INODELOCK_LAYOUT);
+					 LCK_CR, MDS_INODELOCK_LAYOUT |
+					 MDS_INODELOCK_XATTR);
 
 	req = ptlrpc_request_alloc(class_exp2cliimp(exp),
 				   &RQF_MDS_SWAP_LAYOUTS);
-- 
1.7.1


* [PATCH 24/80] staging: lustre: don't need to const __u64 parameters for lustre_idl.h
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

Remove the const qualifier from the __u64 parameters of the inline
functions in lustre_idl.h.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/8641
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
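For a parameter passed by value, a top-level const only restricts the
function body; it is not part of the function's type as seen by
callers, which is why dropping it changes nothing for them. A minimal
sketch, where seq_is_norm() and EX_SEQ_NORMAL are illustrative
stand-ins for the helpers and constants in lustre_idl.h:

```c
#include <stdint.h>

/* Illustrative threshold, standing in for FID_SEQ_NORMAL. */
#define EX_SEQ_NORMAL 0x200000400ULL

/* Declaration without const on the by-value parameter ... */
static inline int seq_is_norm(uint64_t seq);

/*
 * ... is compatible with a definition that adds it: qualifiers on the
 * parameter itself are ignored when function types are compared.
 */
static inline int seq_is_norm(const uint64_t seq)
{
	return seq >= EX_SEQ_NORMAL;
}
```

Both spellings compile to the same interface, so the shorter one is
preferred.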
 .../lustre/lustre/include/lustre/lustre_idl.h      |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 6853f62..87e79b9 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -442,7 +442,7 @@ static inline int fid_seq_is_mdt0(__u64 seq)
 	return (seq == FID_SEQ_OST_MDT0);
 }
 
-static inline int fid_seq_is_mdt(const __u64 seq)
+static inline int fid_seq_is_mdt(__u64 seq)
 {
 	return seq == FID_SEQ_OST_MDT0 || seq >= FID_SEQ_NORMAL;
 };
@@ -468,33 +468,33 @@ static inline int fid_is_llog(const struct lu_fid *fid)
 	return fid_seq_is_llog(fid_seq(fid)) && fid_oid(fid) > 0;
 }
 
-static inline int fid_seq_is_rsvd(const __u64 seq)
+static inline int fid_seq_is_rsvd(__u64 seq)
 {
 	return (seq > FID_SEQ_OST_MDT0 && seq <= FID_SEQ_RSVD);
 };
 
-static inline int fid_seq_is_special(const __u64 seq)
+static inline int fid_seq_is_special(__u64 seq)
 {
 	return seq == FID_SEQ_SPECIAL;
 };
 
-static inline int fid_seq_is_local_file(const __u64 seq)
+static inline int fid_seq_is_local_file(__u64 seq)
 {
 	return seq == FID_SEQ_LOCAL_FILE ||
 	       seq == FID_SEQ_LOCAL_NAME;
 };
 
-static inline int fid_seq_is_root(const __u64 seq)
+static inline int fid_seq_is_root(__u64 seq)
 {
 	return seq == FID_SEQ_ROOT;
 }
 
-static inline int fid_seq_is_dot(const __u64 seq)
+static inline int fid_seq_is_dot(__u64 seq)
 {
 	return seq == FID_SEQ_DOT_LUSTRE;
 }
 
-static inline int fid_seq_is_default(const __u64 seq)
+static inline int fid_seq_is_default(__u64 seq)
 {
 	return seq == FID_SEQ_LOV_DEFAULT;
 }
@@ -516,7 +516,7 @@ static inline void lu_root_fid(struct lu_fid *fid)
  * \param fid the fid to be tested.
  * \return true if the fid is a igif; otherwise false.
  */
-static inline int fid_seq_is_igif(const __u64 seq)
+static inline int fid_seq_is_igif(__u64 seq)
 {
 	return seq >= FID_SEQ_IGIF && seq <= FID_SEQ_IGIF_MAX;
 }
@@ -531,7 +531,7 @@ static inline int fid_is_igif(const struct lu_fid *fid)
  * \param fid the fid to be tested.
  * \return true if the fid is a idif; otherwise false.
  */
-static inline int fid_seq_is_idif(const __u64 seq)
+static inline int fid_seq_is_idif(__u64 seq)
 {
 	return seq >= FID_SEQ_IDIF && seq <= FID_SEQ_IDIF_MAX;
 }
@@ -546,7 +546,7 @@ static inline int fid_is_local_file(const struct lu_fid *fid)
 	return fid_seq_is_local_file(fid_seq(fid));
 }
 
-static inline int fid_seq_is_norm(const __u64 seq)
+static inline int fid_seq_is_norm(__u64 seq)
 {
 	return (seq >= FID_SEQ_NORMAL);
 }
-- 
1.7.1


* [PATCH 25/80] staging: lustre: const correct FID/OSTID/... helpers
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

Add a const qualifier wherever possible to the pointer parameters of
the inline helper functions in lustre_idl.h and lustre_fid.h.
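
A small sketch of what const-qualified pointer parameters buy the
caller, assuming a user-space build. struct handle and the helper
names below are simplified stand-ins for illustration, not the Lustre
definitions. With const on the read-only parameters, a caller holding
a const object can use the helpers without a cast.

```c
#include <stdint.h>

struct handle {
	uint64_t cookie;
};

/* Read-only access: the pointee is never modified, so take it const. */
static inline int handle_is_used(const struct handle *lh)
{
	return lh->cookie != 0;
}

/* Only the destination is written; the source can be const. */
static inline void handle_copy(struct handle *tgt, const struct handle *src)
{
	tgt->cookie = src->cookie;
}

/* A caller that only has a const object can now use both helpers. */
static inline int copy_from_const(void)
{
	static const struct handle ro = { .cookie = 42 };
	struct handle rw = { 0 };

	handle_copy(&rw, &ro);	/* would need a cast if src were non-const */
	return handle_is_used(&rw);
}
```

Unlike const on a by-value parameter, const on the pointee is part of
the function's type and is checked at every call site.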

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/8641
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |   28 ++++++++++----------
 drivers/staging/lustre/lustre/include/lustre_fid.h |    9 +++---
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 87e79b9..c932e20 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -1033,7 +1033,7 @@ static inline int lu_dirent_calc_size(int namelen, __u16 attr)
 	return (size + 7) & ~7;
 }
 
-static inline int lu_dirent_size(struct lu_dirent *ent)
+static inline int lu_dirent_size(const struct lu_dirent *ent)
 {
 	if (le16_to_cpu(ent->lde_reclen) == 0) {
 		return lu_dirent_calc_size(le16_to_cpu(ent->lde_namelen),
@@ -1067,7 +1067,7 @@ struct lustre_handle {
 
 #define DEAD_HANDLE_MAGIC 0xdeadbeefcafebabeULL
 
-static inline int lustre_handle_is_used(struct lustre_handle *lh)
+static inline int lustre_handle_is_used(const struct lustre_handle *lh)
 {
 	return lh->cookie != 0ull;
 }
@@ -1079,7 +1079,7 @@ static inline int lustre_handle_equal(const struct lustre_handle *lh1,
 }
 
 static inline void lustre_handle_copy(struct lustre_handle *tgt,
-				      struct lustre_handle *src)
+				      const struct lustre_handle *src)
 {
 	tgt->cookie = src->cookie;
 }
@@ -1570,25 +1570,25 @@ static inline void lmm_oi_set_id(struct ost_id *oi, __u64 oid)
 	oi->oi.oi_id = oid;
 }
 
-static inline __u64 lmm_oi_id(struct ost_id *oi)
+static inline __u64 lmm_oi_id(const struct ost_id *oi)
 {
 	return oi->oi.oi_id;
 }
 
-static inline __u64 lmm_oi_seq(struct ost_id *oi)
+static inline __u64 lmm_oi_seq(const struct ost_id *oi)
 {
 	return oi->oi.oi_seq;
 }
 
 static inline void lmm_oi_le_to_cpu(struct ost_id *dst_oi,
-				    struct ost_id *src_oi)
+				    const struct ost_id *src_oi)
 {
 	dst_oi->oi.oi_id = le64_to_cpu(src_oi->oi.oi_id);
 	dst_oi->oi.oi_seq = le64_to_cpu(src_oi->oi.oi_seq);
 }
 
 static inline void lmm_oi_cpu_to_le(struct ost_id *dst_oi,
-				    struct ost_id *src_oi)
+				    const struct ost_id *src_oi)
 {
 	dst_oi->oi.oi_id = cpu_to_le64(src_oi->oi.oi_id);
 	dst_oi->oi.oi_seq = cpu_to_le64(src_oi->oi.oi_seq);
@@ -2724,15 +2724,15 @@ struct ldlm_extent {
 
 #define LDLM_GID_ANY ((__u64)-1)
 
-static inline int ldlm_extent_overlap(struct ldlm_extent *ex1,
-				      struct ldlm_extent *ex2)
+static inline int ldlm_extent_overlap(const struct ldlm_extent *ex1,
+				      const struct ldlm_extent *ex2)
 {
 	return (ex1->start <= ex2->end) && (ex2->start <= ex1->end);
 }
 
 /* check if @ex1 contains @ex2 */
-static inline int ldlm_extent_contain(struct ldlm_extent *ex1,
-				      struct ldlm_extent *ex2)
+static inline int ldlm_extent_contain(const struct ldlm_extent *ex1,
+				      const struct ldlm_extent *ex2)
 {
 	return (ex1->start <= ex2->start) && (ex1->end >= ex2->end);
 }
@@ -3092,7 +3092,7 @@ enum agent_req_status {
 	ARS_SUCCEED,
 };
 
-static inline char *agent_req_status2name(enum agent_req_status ars)
+static inline const char *agent_req_status2name(const enum agent_req_status ars)
 {
 	switch (ars) {
 	case ARS_WAITING:
@@ -3268,7 +3268,7 @@ struct obdo {
 #define o_cksum   o_nlink
 #define o_grant_used o_data_version
 
-static inline void lustre_set_wire_obdo(struct obd_connect_data *ocd,
+static inline void lustre_set_wire_obdo(const struct obd_connect_data *ocd,
 					struct obdo *wobdo,
 					const struct obdo *lobdo)
 {
@@ -3287,7 +3287,7 @@ static inline void lustre_set_wire_obdo(struct obd_connect_data *ocd,
 	}
 }
 
-static inline void lustre_get_wire_obdo(struct obd_connect_data *ocd,
+static inline void lustre_get_wire_obdo(const struct obd_connect_data *ocd,
 					struct obdo *lobdo,
 					const struct obdo *wobdo)
 {
diff --git a/drivers/staging/lustre/lustre/include/lustre_fid.h b/drivers/staging/lustre/lustre/include/lustre_fid.h
index 61f3930..a85183b 100644
--- a/drivers/staging/lustre/lustre/include/lustre_fid.h
+++ b/drivers/staging/lustre/lustre/include/lustre_fid.h
@@ -483,7 +483,7 @@ fid_build_pdo_res_name(const struct lu_fid *fid, unsigned int hash,
  *    res will be built from normal FID directly, i.e. res[0] = f_seq,
  *    res[1] = f_oid + f_ver.
  */
-static inline void ostid_build_res_name(struct ost_id *oi,
+static inline void ostid_build_res_name(const struct ost_id *oi,
 					struct ldlm_res_id *name)
 {
 	memset(name, 0, sizeof(*name));
@@ -498,8 +498,8 @@ static inline void ostid_build_res_name(struct ost_id *oi,
 /**
  * Return true if the resource is for the object identified by this id & group.
  */
-static inline int ostid_res_name_eq(struct ost_id *oi,
-				    struct ldlm_res_id *name)
+static inline int ostid_res_name_eq(const struct ost_id *oi,
+				    const struct ldlm_res_id *name)
 {
 	/* Note: it is just a trick here to save some effort, probably the
 	 * correct way would be turn them into the FID and compare
@@ -610,7 +610,8 @@ static inline __u32 fid_flatten32(const struct lu_fid *fid)
 	return ino ? ino : fid_oid(fid);
 }
 
-static inline int lu_fid_diff(struct lu_fid *fid1, struct lu_fid *fid2)
+static inline int lu_fid_diff(const struct lu_fid *fid1,
+			      const struct lu_fid *fid2)
 {
 	LASSERTF(fid_seq(fid1) == fid_seq(fid2), "fid1:"DFID", fid2:"DFID"\n",
 		 PFID(fid1), PFID(fid2));
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 26/80] staging: lustre: use bool for several function in lustre_idl.h/lustre_fid.h
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

Change the return type of several predicate functions from int to bool.
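
A short sketch of the difference between int and bool return types,
assuming a user-space build with <stdbool.h> (the kernel provides bool
through its own headers). FLAG_OST and the two helpers are hypothetical
names for illustration. Conversion to bool maps any non-zero value to
true, whereas an int return passes the raw value through.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_OST 0x4u	/* hypothetical flag bit */

/* int return: the caller sees the raw masked value (0 or 4). */
static inline int has_flag_int(uint32_t flags)
{
	return flags & FLAG_OST;
}

/* bool return: any non-zero value is converted to true (1). */
static inline bool has_flag_bool(uint32_t flags)
{
	return flags & FLAG_OST;
}
```

In the hunks shown in this patch, each converted function already
returns the result of a comparison or logical operator, which is 0 or 1
in C, so the change documents intent without altering the values
returned.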

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/8641
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |   72 ++++++++++----------
 drivers/staging/lustre/lustre/include/lustre_fid.h |    4 +-
 2 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index c932e20..d3a9db9 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -196,12 +196,12 @@ static inline unsigned fld_range_type(const struct lu_seq_range *range)
 	return range->lsr_flags & LU_SEQ_RANGE_MASK;
 }
 
-static inline int fld_range_is_ost(const struct lu_seq_range *range)
+static inline bool fld_range_is_ost(const struct lu_seq_range *range)
 {
 	return fld_range_type(range) == LU_SEQ_RANGE_OST;
 }
 
-static inline int fld_range_is_mdt(const struct lu_seq_range *range)
+static inline bool fld_range_is_mdt(const struct lu_seq_range *range)
 {
 	return fld_range_type(range) == LU_SEQ_RANGE_MDT;
 }
@@ -260,23 +260,23 @@ static inline void range_init(struct lu_seq_range *range)
  * check if given seq id \a s is within given range \a r
  */
 
-static inline int range_within(const struct lu_seq_range *range,
-			       __u64 s)
+static inline bool range_within(const struct lu_seq_range *range,
+				__u64 s)
 {
 	return s >= range->lsr_start && s < range->lsr_end;
 }
 
-static inline int range_is_sane(const struct lu_seq_range *range)
+static inline bool range_is_sane(const struct lu_seq_range *range)
 {
 	return (range->lsr_end >= range->lsr_start);
 }
 
-static inline int range_is_zero(const struct lu_seq_range *range)
+static inline bool range_is_zero(const struct lu_seq_range *range)
 {
 	return (range->lsr_start == 0 && range->lsr_end == 0);
 }
 
-static inline int range_is_exhausted(const struct lu_seq_range *range)
+static inline bool range_is_exhausted(const struct lu_seq_range *range)
 
 {
 	return range_space(range) == 0;
@@ -437,69 +437,69 @@ enum dot_lustre_oid {
 	FID_OID_DOT_LUSTRE_OBF = 2UL,
 };
 
-static inline int fid_seq_is_mdt0(__u64 seq)
+static inline bool fid_seq_is_mdt0(__u64 seq)
 {
 	return (seq == FID_SEQ_OST_MDT0);
 }
 
-static inline int fid_seq_is_mdt(__u64 seq)
+static inline bool fid_seq_is_mdt(__u64 seq)
 {
 	return seq == FID_SEQ_OST_MDT0 || seq >= FID_SEQ_NORMAL;
 };
 
-static inline int fid_seq_is_echo(__u64 seq)
+static inline bool fid_seq_is_echo(__u64 seq)
 {
 	return (seq == FID_SEQ_ECHO);
 }
 
-static inline int fid_is_echo(const struct lu_fid *fid)
+static inline bool fid_is_echo(const struct lu_fid *fid)
 {
 	return fid_seq_is_echo(fid_seq(fid));
 }
 
-static inline int fid_seq_is_llog(__u64 seq)
+static inline bool fid_seq_is_llog(__u64 seq)
 {
 	return (seq == FID_SEQ_LLOG);
 }
 
-static inline int fid_is_llog(const struct lu_fid *fid)
+static inline bool fid_is_llog(const struct lu_fid *fid)
 {
 	/* file with OID == 0 is not llog but contains last oid */
 	return fid_seq_is_llog(fid_seq(fid)) && fid_oid(fid) > 0;
 }
 
-static inline int fid_seq_is_rsvd(__u64 seq)
+static inline bool fid_seq_is_rsvd(__u64 seq)
 {
 	return (seq > FID_SEQ_OST_MDT0 && seq <= FID_SEQ_RSVD);
 };
 
-static inline int fid_seq_is_special(__u64 seq)
+static inline bool fid_seq_is_special(__u64 seq)
 {
 	return seq == FID_SEQ_SPECIAL;
 };
 
-static inline int fid_seq_is_local_file(__u64 seq)
+static inline bool fid_seq_is_local_file(__u64 seq)
 {
 	return seq == FID_SEQ_LOCAL_FILE ||
 	       seq == FID_SEQ_LOCAL_NAME;
 };
 
-static inline int fid_seq_is_root(__u64 seq)
+static inline bool fid_seq_is_root(__u64 seq)
 {
 	return seq == FID_SEQ_ROOT;
 }
 
-static inline int fid_seq_is_dot(__u64 seq)
+static inline bool fid_seq_is_dot(__u64 seq)
 {
 	return seq == FID_SEQ_DOT_LUSTRE;
 }
 
-static inline int fid_seq_is_default(__u64 seq)
+static inline bool fid_seq_is_default(__u64 seq)
 {
 	return seq == FID_SEQ_LOV_DEFAULT;
 }
 
-static inline int fid_is_mdt0(const struct lu_fid *fid)
+static inline bool fid_is_mdt0(const struct lu_fid *fid)
 {
 	return fid_seq_is_mdt0(fid_seq(fid));
 }
@@ -516,12 +516,12 @@ static inline void lu_root_fid(struct lu_fid *fid)
  * \param fid the fid to be tested.
  * \return true if the fid is a igif; otherwise false.
  */
-static inline int fid_seq_is_igif(__u64 seq)
+static inline bool fid_seq_is_igif(__u64 seq)
 {
 	return seq >= FID_SEQ_IGIF && seq <= FID_SEQ_IGIF_MAX;
 }
 
-static inline int fid_is_igif(const struct lu_fid *fid)
+static inline bool fid_is_igif(const struct lu_fid *fid)
 {
 	return fid_seq_is_igif(fid_seq(fid));
 }
@@ -531,27 +531,27 @@ static inline int fid_is_igif(const struct lu_fid *fid)
  * \param fid the fid to be tested.
  * \return true if the fid is a idif; otherwise false.
  */
-static inline int fid_seq_is_idif(__u64 seq)
+static inline bool fid_seq_is_idif(__u64 seq)
 {
 	return seq >= FID_SEQ_IDIF && seq <= FID_SEQ_IDIF_MAX;
 }
 
-static inline int fid_is_idif(const struct lu_fid *fid)
+static inline bool fid_is_idif(const struct lu_fid *fid)
 {
 	return fid_seq_is_idif(fid_seq(fid));
 }
 
-static inline int fid_is_local_file(const struct lu_fid *fid)
+static inline bool fid_is_local_file(const struct lu_fid *fid)
 {
 	return fid_seq_is_local_file(fid_seq(fid));
 }
 
-static inline int fid_seq_is_norm(__u64 seq)
+static inline bool fid_seq_is_norm(__u64 seq)
 {
 	return (seq >= FID_SEQ_NORMAL);
 }
 
-static inline int fid_is_norm(const struct lu_fid *fid)
+static inline bool fid_is_norm(const struct lu_fid *fid)
 {
 	return fid_seq_is_norm(fid_seq(fid));
 }
@@ -769,7 +769,7 @@ static inline int fid_to_ostid(const struct lu_fid *fid, struct ost_id *ostid)
 }
 
 /* Check whether the fid is for LAST_ID */
-static inline int fid_is_last_id(const struct lu_fid *fid)
+static inline bool fid_is_last_id(const struct lu_fid *fid)
 {
 	return (fid_oid(fid) == 0);
 }
@@ -838,7 +838,7 @@ static inline void fid_be_to_cpu(struct lu_fid *dst, const struct lu_fid *src)
 	dst->f_ver = be32_to_cpu(fid_ver(src));
 }
 
-static inline int fid_is_sane(const struct lu_fid *fid)
+static inline bool fid_is_sane(const struct lu_fid *fid)
 {
 	return fid &&
 	       ((fid_seq(fid) >= FID_SEQ_START && fid_ver(fid) == 0) ||
@@ -846,7 +846,7 @@ static inline int fid_is_sane(const struct lu_fid *fid)
 		fid_seq_is_rsvd(fid_seq(fid)));
 }
 
-static inline int fid_is_zero(const struct lu_fid *fid)
+static inline bool fid_is_zero(const struct lu_fid *fid)
 {
 	return fid_seq(fid) == 0 && fid_oid(fid) == 0;
 }
@@ -854,7 +854,7 @@ static inline int fid_is_zero(const struct lu_fid *fid)
 void lustre_swab_lu_fid(struct lu_fid *fid);
 void lustre_swab_lu_seq_range(struct lu_seq_range *range);
 
-static inline int lu_fid_eq(const struct lu_fid *f0, const struct lu_fid *f1)
+static inline bool lu_fid_eq(const struct lu_fid *f0, const struct lu_fid *f1)
 {
 	return memcmp(f0, f1, sizeof(*f0)) == 0;
 }
@@ -1067,13 +1067,13 @@ struct lustre_handle {
 
 #define DEAD_HANDLE_MAGIC 0xdeadbeefcafebabeULL
 
-static inline int lustre_handle_is_used(const struct lustre_handle *lh)
+static inline bool lustre_handle_is_used(const struct lustre_handle *lh)
 {
 	return lh->cookie != 0ull;
 }
 
-static inline int lustre_handle_equal(const struct lustre_handle *lh1,
-				      const struct lustre_handle *lh2)
+static inline bool lustre_handle_equal(const struct lustre_handle *lh1,
+				       const struct lustre_handle *lh2)
 {
 	return lh1->cookie == lh2->cookie;
 }
@@ -2684,8 +2684,8 @@ struct ldlm_res_id {
 #define PLDLMRES(res)	(res)->lr_name.name[0], (res)->lr_name.name[1], \
 			(res)->lr_name.name[2], (res)->lr_name.name[3]
 
-static inline int ldlm_res_eq(const struct ldlm_res_id *res0,
-			      const struct ldlm_res_id *res1)
+static inline bool ldlm_res_eq(const struct ldlm_res_id *res0,
+			       const struct ldlm_res_id *res1)
 {
 	return !memcmp(res0, res1, sizeof(*res0));
 }
diff --git a/drivers/staging/lustre/lustre/include/lustre_fid.h b/drivers/staging/lustre/lustre/include/lustre_fid.h
index a85183b..6f7dc15 100644
--- a/drivers/staging/lustre/lustre/include/lustre_fid.h
+++ b/drivers/staging/lustre/lustre/include/lustre_fid.h
@@ -406,8 +406,8 @@ fid_build_reg_res_name(const struct lu_fid *fid, struct ldlm_res_id *res)
 /*
  * Return true if resource is for object identified by FID.
  */
-static inline int fid_res_name_eq(const struct lu_fid *fid,
-				  const struct ldlm_res_id *res)
+static inline bool fid_res_name_eq(const struct lu_fid *fid,
+				   const struct ldlm_res_id *res)
 {
 	return res->name[LUSTRE_RES_ID_SEQ_OFF] == fid_seq(fid) &&
 	       res->name[LUSTRE_RES_ID_VER_OID_OFF] == fid_ver_oid(fid);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 27/80] staging: lustre: simplify inline functions in lustre_fid.h
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

Several inline functions return a pointer to a structure that was
passed in. There is no need for this, so just make these functions
void.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/8641
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/lustre_fid.h |   16 ++++------------
 1 files changed, 4 insertions(+), 12 deletions(-)
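
For readers unfamiliar with the layout, the following is a minimal
userspace sketch of the packing these helpers perform and of the
out-parameter style this patch adopts. It assumes that
LUSTRE_RES_ID_SEQ_OFF is 0, that LUSTRE_RES_ID_VER_OID_OFF is 1, and
that fid_ver_oid() places the version in the upper 32 bits, which is
consistent with the extraction code in the diff below. All sketch_*
names are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for struct lu_fid and struct ldlm_res_id. */
struct sketch_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

struct sketch_res_id {
	uint64_t name[4];
};

/* Assumed offsets; the real values live in the Lustre headers. */
enum { SKETCH_SEQ_OFF = 0, SKETCH_VER_OID_OFF = 1 };

/* Pack version (upper 32 bits) and OID (lower 32 bits) into one word. */
static uint64_t sketch_ver_oid(const struct sketch_fid *fid)
{
	return ((uint64_t)fid->f_ver << 32) | fid->f_oid;
}

/* Fills *res in place; nothing is returned, as in the patched helpers. */
static void sketch_build_res_name(const struct sketch_fid *fid,
				  struct sketch_res_id *res)
{
	memset(res, 0, sizeof(*res));
	res->name[SKETCH_SEQ_OFF] = fid->f_seq;
	res->name[SKETCH_VER_OID_OFF] = sketch_ver_oid(fid);
}

/* Reverse of sketch_build_res_name(); also fills its output in place. */
static void sketch_extract_fid(struct sketch_fid *fid,
			       const struct sketch_res_id *res)
{
	fid->f_seq = res->name[SKETCH_SEQ_OFF];
	fid->f_oid = (uint32_t)res->name[SKETCH_VER_OID_OFF];
	fid->f_ver = (uint32_t)(res->name[SKETCH_VER_OID_OFF] >> 32);
}
```

A caller that used to chain on the returned pointer would instead build
the name into a local variable and then pass the address of that same
variable to the next call.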

diff --git a/drivers/staging/lustre/lustre/include/lustre_fid.h b/drivers/staging/lustre/lustre/include/lustre_fid.h
index 6f7dc15..f1d5bbd 100644
--- a/drivers/staging/lustre/lustre/include/lustre_fid.h
+++ b/drivers/staging/lustre/lustre/include/lustre_fid.h
@@ -393,14 +393,12 @@ struct ldlm_namespace;
  * but was moved into name[1] along with the OID to avoid consuming the
  * renaming name[2,3] fields that need to be used for the quota identifier.
  */
-static inline struct ldlm_res_id *
+static inline void
 fid_build_reg_res_name(const struct lu_fid *fid, struct ldlm_res_id *res)
 {
 	memset(res, 0, sizeof(*res));
 	res->name[LUSTRE_RES_ID_SEQ_OFF] = fid_seq(fid);
 	res->name[LUSTRE_RES_ID_VER_OID_OFF] = fid_ver_oid(fid);
-
-	return res;
 }
 
 /*
@@ -416,29 +414,25 @@ static inline bool fid_res_name_eq(const struct lu_fid *fid,
 /*
  * Extract FID from LDLM resource. Reverse of fid_build_reg_res_name().
  */
-static inline struct lu_fid *
+static inline void
 fid_extract_from_res_name(struct lu_fid *fid, const struct ldlm_res_id *res)
 {
 	fid->f_seq = res->name[LUSTRE_RES_ID_SEQ_OFF];
 	fid->f_oid = (__u32)(res->name[LUSTRE_RES_ID_VER_OID_OFF]);
 	fid->f_ver = (__u32)(res->name[LUSTRE_RES_ID_VER_OID_OFF] >> 32);
 	LASSERT(fid_res_name_eq(fid, res));
-
-	return fid;
 }
 
 /*
  * Build (DLM) resource identifier from global quota FID and quota ID.
  */
-static inline struct ldlm_res_id *
+static inline void
 fid_build_quota_res_name(const struct lu_fid *glb_fid, union lquota_id *qid,
 			 struct ldlm_res_id *res)
 {
 	fid_build_reg_res_name(glb_fid, res);
 	res->name[LUSTRE_RES_ID_QUOTA_SEQ_OFF] = fid_seq(&qid->qid_fid);
 	res->name[LUSTRE_RES_ID_QUOTA_VER_OID_OFF] = fid_ver_oid(&qid->qid_fid);
-
-	return res;
 }
 
 /*
@@ -455,14 +449,12 @@ static inline void fid_extract_from_quota_res(struct lu_fid *glb_fid,
 		(__u32)(res->name[LUSTRE_RES_ID_QUOTA_VER_OID_OFF] >> 32);
 }
 
-static inline struct ldlm_res_id *
+static inline void
 fid_build_pdo_res_name(const struct lu_fid *fid, unsigned int hash,
 		       struct ldlm_res_id *res)
 {
 	fid_build_reg_res_name(fid, res);
 	res->name[LUSTRE_RES_ID_HSH_OFF] = hash;
-
-	return res;
 }
 
 /**
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 28/80] staging: lustre: lmv: access lum_stripe_offset as little endian
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

By the time that a struct lmv_user_md reaches lmv_placement_policy()
it has already been converted to little endian. Therefore use the
appropriate macros around accesses to its lum_stripe_offset field.
This issue was found by rewriting the definition of struct
lmv_user_md to use the __leXX typedefs and running sparse.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4738
Reviewed-on: http://review.whamcloud.com/9671
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Swapnil Pimpale <spimpale@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lmv/lmv_obd.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)
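
To illustrate why the conversion matters, here is a small userspace
sketch of the patched branch. The sketch_* helpers are hypothetical
stand-ins for the kernel's le32_to_cpu() and cpu_to_le32(); they are
written with byte operations so that they behave the same on a host of
either endianness. struct sketch_lum models only the one field touched
here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit value stored in little-endian byte order. */
static uint32_t sketch_le32_to_cpu(uint32_t le)
{
	unsigned char b[4];

	memcpy(b, &le, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Encode a host-order value into little-endian byte order. */
static uint32_t sketch_cpu_to_le32(uint32_t v)
{
	unsigned char b[4] = {
		(unsigned char)(v & 0xff),
		(unsigned char)((v >> 8) & 0xff),
		(unsigned char)((v >> 16) & 0xff),
		(unsigned char)(v >> 24)
	};
	uint32_t le;

	memcpy(&le, b, sizeof(le));
	return le;
}

/* Stand-in for the one lmv_user_md field touched by this patch. */
struct sketch_lum {
	uint32_t lum_stripe_offset; /* little endian in memory */
};

/* Mirrors the patched branch: -1 means "use the default MDT". */
static uint32_t sketch_pick_mds(struct sketch_lum *lum, uint32_t default_mds)
{
	uint32_t offset = sketch_le32_to_cpu(lum->lum_stripe_offset);

	if (offset != (uint32_t)-1)
		return offset;

	/* Store the default back in little-endian order. */
	lum->lum_stripe_offset = sketch_cpu_to_le32(default_mds);
	return default_mds;
}
```

On a little-endian host both helpers are no-ops, so the unpatched code
only misbehaves on big-endian machines. The comparison against -1
happens to work unconverted as well, because 0xffffffff reads the same
in either byte order; the assignments are where the value would be
byte-swapped.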

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index c005a66..5929994 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -1242,15 +1242,15 @@ static int lmv_placement_policy(struct obd_device *obd,
 		struct lmv_user_md *lum;
 
 		lum = op_data->op_data;
-		if (lum->lum_stripe_offset != (__u32)-1) {
-			*mds = lum->lum_stripe_offset;
+		if (le32_to_cpu(lum->lum_stripe_offset) != (__u32)-1) {
+			*mds = le32_to_cpu(lum->lum_stripe_offset);
 		} else {
 			/*
 			 * -1 means default, which will be in the same MDT with
 			 * the stripe
 			 */
 			*mds = op_data->op_mds;
-			lum->lum_stripe_offset = op_data->op_mds;
+			lum->lum_stripe_offset = cpu_to_le32(op_data->op_mds);
 		}
 	} else {
 		/*
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 29/80] staging: lustre: lmv: lookup remote migrating object in LMV
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

If a remote object is found in a migrating directory,
lmv_intent_lookup() should continue to look up the object on the
remote MDT instead of returning.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4805
Reviewed-on: http://review.whamcloud.com/9806
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lmv/lmv_intent.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index 51b7048..a38d343 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -471,7 +471,6 @@ static int lmv_intent_lookup(struct obd_export *exp,
 		it->it_disposition &= ~DISP_ENQ_COMPLETE;
 		rc = md_intent_lock(tgt->ltd_exp, op_data, lmm, lmmsize, it,
 				    flags, reqp, cb_blocking, extra_lock_flags);
-		return rc;
 	}
 
 	/*
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 30/80] staging: lustre: lmv: Ensure lmv_intent_lookup cleans up reqp
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Nathaniel Clark, James Simmons

From: Nathaniel Clark <nathaniel.l.clark@intel.com>

Ensure there aren't invalid pointers hanging around after
ptlrpc_req_finished is called.

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4826
Reviewed-on: http://review.whamcloud.com/9841
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lmv/lmv_intent.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index a38d343..d7e165f 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -464,6 +464,9 @@ static int lmv_intent_lookup(struct obd_export *exp,
 			return PTR_ERR(tgt);
 
 		ptlrpc_req_finished(*reqp);
+		it->it_request = NULL;
+		*reqp = NULL;
+
 		CDEBUG(D_INODE, "For migrating dir, try target dir "DFID"\n",
 		       PFID(&lsm->lsm_md_oinfo[1].lmo_fid));
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 31/80] staging: lustre: llite: avoid a deadlock in page write
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, James Simmons

From: Jinshan Xiong <jinshan.xiong@intel.com>

A partial page write must first issue a READ RPC to get a full
up-to-date page. If another page is already locked by this thread,
this can easily cause a deadlock.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4873
Reviewed-on: http://review.whamcloud.com/9928
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/llite/rw26.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index d98c7ac..c14a1b6 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -506,8 +506,9 @@ static int ll_write_begin(struct file *file, struct address_space *mapping,
 	env = lcc->lcc_env;
 	io  = lcc->lcc_io;
 
-	/* To avoid deadlock, try to lock page first. */
-	vmpage = grab_cache_page_nowait(mapping, index);
+	if (likely(to == PAGE_SIZE)) /* LU-4873 */
+		/* To avoid deadlock, try to lock page first. */
+		vmpage = grab_cache_page_nowait(mapping, index);
 	if (unlikely(!vmpage || PageDirty(vmpage) || PageWriteback(vmpage))) {
 		struct vvp_io *vio = vvp_env_io(env);
 		struct cl_page_list *plist = &vio->u.write.vui_queue;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 32/80] staging: lustre: lov: handle the case of stripe size is not power 2
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, James Simmons

From: Jinshan Xiong <jinshan.xiong@intel.com>

Calculate the end of the current stripe correctly when the stripe
size is not a power of 2.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4860
Reviewed-on: http://review.whamcloud.com/9882
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_page.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)
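
As a worked example of the arithmetic, the sketch below places the old
mask-based formula next to the modulo-based one from this patch. The
function names are hypothetical, and both arguments are unsigned long
for simplicity, whereas the kernel code uses pgoff_t for the index and
unsigned int for pps. With pps = 3 the pages 3..5 form one chunk: for
index 4 the mask version yields 4 (too small), and for index 5 it
yields 7 (past the end of the chunk), while the modulo version yields
5 in both cases.

```c
#include <assert.h>

/* Old formula: rounds with a bit mask, valid only if pps is a power of 2. */
static unsigned long sketch_stripe_end_mask(unsigned long index,
					    unsigned long pps)
{
	return ((index + pps) & ~(pps - 1)) - 1;
}

/* New formula: last page index of the pps-page chunk containing index. */
static unsigned long sketch_stripe_end_mod(unsigned long index,
					   unsigned long pps)
{
	return index + pps - index % pps - 1;
}
```

For a power-of-2 pps the two agree, for instance both give 7 for
index 5 with pps = 4, so the change should not alter behaviour for the
common stripe sizes.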

diff --git a/drivers/staging/lustre/lustre/lov/lov_page.c b/drivers/staging/lustre/lustre/lov/lov_page.c
index c17026f..45b5ae9 100644
--- a/drivers/staging/lustre/lustre/lov/lov_page.c
+++ b/drivers/staging/lustre/lustre/lov/lov_page.c
@@ -65,7 +65,9 @@ static int lov_raid0_page_is_under_lock(const struct lu_env *env,
 	pgoff_t index = *max_index;
 	unsigned int pps; /* pages per stripe */
 
-	CDEBUG(D_READA, "*max_index = %lu, nr = %d\n", index, r0->lo_nr);
+	CDEBUG(D_READA, DFID "*max_index = %lu, nr = %d\n",
+	       PFID(lu_object_fid(lov2lu(loo))), index, r0->lo_nr);
+
 	if (index == 0) /* the page is not covered by any lock */
 		return 0;
 
@@ -80,7 +82,12 @@ static int lov_raid0_page_is_under_lock(const struct lu_env *env,
 
 	/* calculate the end of current stripe */
 	pps = loo->lo_lsm->lsm_stripe_size >> PAGE_SHIFT;
-	index = ((slice->cpl_index + pps) & ~(pps - 1)) - 1;
+	index = slice->cpl_index + pps - slice->cpl_index % pps - 1;
+
+	CDEBUG(D_READA, DFID "*max_index = %lu, index = %lu, pps = %u, stripe_size = %u, stripe no = %u, page index = %lu\n",
+	       PFID(lu_object_fid(lov2lu(loo))), *max_index, index, pps,
+	       loo->lo_lsm->lsm_stripe_size, lov_page_stripe(slice->cpl_page),
+	       slice->cpl_index);
 
 	/* never exceed the end of the stripe */
 	*max_index = min_t(pgoff_t, *max_index, index);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 33/80] staging: lustre: lmv: cleanup req in lmv_getattr_name()
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

In lmv_getattr_name() don't return a freed request in the error path.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4826
Reviewed-on: http://review.whamcloud.com/9863
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lmv/lmv_obd.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
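
The pattern can be sketched in userspace as follows. Everything here
is hypothetical: struct sketch_req stands in for struct
ptlrpc_request, sketch_req_finished() for ptlrpc_req_finished(), and
the error codes are arbitrary. The point is only that an output
pointer must not keep referring to a request that the function has
already released.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted request standing in for struct ptlrpc_request. */
struct sketch_req {
	int refcount;
};

static struct sketch_req *sketch_req_alloc(void)
{
	struct sketch_req *req = malloc(sizeof(*req));

	if (req)
		req->refcount = 1;
	return req;
}

/* Drop one reference and free on the last one; NULL is tolerated here. */
static void sketch_req_finished(struct sketch_req *req)
{
	if (req && --req->refcount == 0)
		free(req);
}

/*
 * Returns 0 on success with *preq owned by the caller, or a negative
 * errno on failure with *preq reset to NULL.
 */
static int sketch_getattr(struct sketch_req **preq, int target_lookup_fails)
{
	*preq = sketch_req_alloc();
	if (!*preq)
		return -ENOMEM;

	if (target_lookup_fails) {
		sketch_req_finished(*preq);
		*preq = NULL; /* corresponds to the line this patch adds */
		return -ENODEV;
	}
	return 0;
}
```

A caller that releases its request pointer unconditionally in a common
cleanup path then sees NULL rather than a dangling pointer, which in
this sketch is safe because sketch_req_finished() ignores NULL.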

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 5929994..f1b8ae9 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -1920,6 +1920,7 @@ lmv_getattr_name(struct obd_export *exp, struct md_op_data *op_data,
 		tgt = lmv_find_target(lmv, &rid);
 		if (IS_ERR(tgt)) {
 			ptlrpc_req_finished(*request);
+			*request = NULL;
 			return PTR_ERR(tgt);
 		}
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 34/80] staging: lustre: lmv: rename request to preq in lmv_getattr_name()
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

Rename request to preq.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4826
Reviewed-on: http://review.whamcloud.com/9863
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lmv/lmv_obd.c |   16 +++++++---------
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index f1b8ae9..d07fd17 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -1883,7 +1883,7 @@ lmv_enqueue(struct obd_export *exp, struct ldlm_enqueue_info *einfo,
 
 static int
 lmv_getattr_name(struct obd_export *exp, struct md_op_data *op_data,
-		 struct ptlrpc_request **request)
+		 struct ptlrpc_request **preq)
 {
 	struct ptlrpc_request   *req = NULL;
 	struct obd_device       *obd = exp->exp_obd;
@@ -1904,13 +1904,11 @@ lmv_getattr_name(struct obd_export *exp, struct md_op_data *op_data,
 	       op_data->op_namelen, op_data->op_name, PFID(&op_data->op_fid1),
 	       tgt->ltd_idx);
 
-	rc = md_getattr_name(tgt->ltd_exp, op_data, request);
+	rc = md_getattr_name(tgt->ltd_exp, op_data, preq);
 	if (rc != 0)
 		return rc;
 
-	body = req_capsule_server_get(&(*request)->rq_pill,
-				      &RMF_MDT_BODY);
-
+	body = req_capsule_server_get(&(*preq)->rq_pill, &RMF_MDT_BODY);
 	if (body->valid & OBD_MD_MDS) {
 		struct lu_fid rid = body->fid1;
 
@@ -1919,8 +1917,8 @@ lmv_getattr_name(struct obd_export *exp, struct md_op_data *op_data,
 
 		tgt = lmv_find_target(lmv, &rid);
 		if (IS_ERR(tgt)) {
-			ptlrpc_req_finished(*request);
-			*request = NULL;
+			ptlrpc_req_finished(*preq);
+			*preq = NULL;
 			return PTR_ERR(tgt);
 		}
 
@@ -1929,8 +1927,8 @@ lmv_getattr_name(struct obd_export *exp, struct md_op_data *op_data,
 		op_data->op_namelen = 0;
 		op_data->op_name = NULL;
 		rc = md_getattr_name(tgt->ltd_exp, op_data, &req);
-		ptlrpc_req_finished(*request);
-		*request = req;
+		ptlrpc_req_finished(*preq);
+		*preq = req;
 	}
 
 	return rc;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 35/80] staging: lustre: obdclass: unified flow control interfaces
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Fan Yong,
	James Simmons

From: Fan Yong <fan.yong@intel.com>

Unify the flow control interfaces for MDC RPC and FLD RPC.
Allow the maximum number of in-flight RPCs to be adjusted via
the /sys interface.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4687
Reviewed-on: http://review.whamcloud.com/9562
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
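The slot accounting this patch moves into genops.c can be followed
more easily in a single-threaded model. The sketch below is only an
illustration, written for userspace: the model_* names and the
slot_model struct are hypothetical and not part of the patch, and the
spinlock, wait queue and signal handling of the real code are left out
on purpose. Only the counter arithmetic is meant to mirror
obd_get_request_slot(), obd_put_request_slot() and
obd_set_max_rpcs_in_flight().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model; these names are not kernel API. */
#define MODEL_MAX_RIF_MAX 512	/* mirrors OBD_MAX_RIF_MAX in the patch */

struct slot_model {
	unsigned int in_flight;	/* plays the role of cl_r_in_flight */
	unsigned int max;	/* plays the role of cl_max_rpcs_in_flight */
	unsigned int waiters;	/* length of the waiter list */
};

/* Returns 1 if a slot was taken at once, 0 if the caller was queued. */
static int model_get_slot(struct slot_model *m)
{
	if (m->in_flight < m->max) {
		m->in_flight++;
		return 1;
	}
	m->waiters++;
	return 0;
}

/* Releases a slot; returns how many waiters were handed one (0 or 1). */
static int model_put_slot(struct slot_model *m)
{
	assert(m->in_flight > 0);
	m->in_flight--;
	if (m->waiters > 0 && m->in_flight < m->max) {
		m->waiters--;
		m->in_flight++;	/* the slot passes straight to the waiter */
		return 1;
	}
	return 0;
}

/* Changes the limit; returns the number of waiters woken, or -ERANGE. */
static int model_set_max(struct slot_model *m, unsigned int max)
{
	unsigned int old = m->max;
	int woken = 0;

	if (max < 1 || max > MODEL_MAX_RIF_MAX)
		return -ERANGE;

	m->max = max;
	/* Only an increase wakes waiters, at most (max - old) of them. */
	while (max > old && woken < (int)(max - old) && m->waiters > 0) {
		m->waiters--;
		m->in_flight++;
		woken++;
	}
	return woken;
}
```

In this model a released slot is handed directly to the first waiter,
so in_flight is incremented on the waiter's behalf before it runs
again; that matches the way the patch bumps cl_r_in_flight before
calling wake_up().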
 drivers/staging/lustre/lustre/fld/fld_request.c    |   55 +--------
 drivers/staging/lustre/lustre/include/lustre_mdc.h |    5 -
 drivers/staging/lustre/lustre/include/obd.h        |   14 +--
 drivers/staging/lustre/lustre/include/obd_class.h  |    5 +
 drivers/staging/lustre/lustre/ldlm/ldlm_lib.c      |    4 +-
 drivers/staging/lustre/lustre/mdc/lproc_mdc.c      |   17 +--
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |    2 -
 drivers/staging/lustre/lustre/mdc/mdc_lib.c        |   64 ----------
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |   10 +-
 drivers/staging/lustre/lustre/mdc/mdc_request.c    |    6 +-
 drivers/staging/lustre/lustre/obdclass/genops.c    |  132 ++++++++++++++++++++
 11 files changed, 161 insertions(+), 153 deletions(-)

diff --git a/drivers/staging/lustre/lustre/fld/fld_request.c b/drivers/staging/lustre/lustre/fld/fld_request.c
index e59d626..ed7962e 100644
--- a/drivers/staging/lustre/lustre/fld/fld_request.c
+++ b/drivers/staging/lustre/lustre/fld/fld_request.c
@@ -53,57 +53,6 @@
 #include "../include/lustre_mdc.h"
 #include "fld_internal.h"
 
-/* TODO: these 3 functions are copies of flow-control code from mdc_lib.c
- * It should be common thing. The same about mdc RPC lock
- */
-static int fld_req_avail(struct client_obd *cli, struct mdc_cache_waiter *mcw)
-{
-	int rc;
-
-	spin_lock(&cli->cl_loi_list_lock);
-	rc = list_empty(&mcw->mcw_entry);
-	spin_unlock(&cli->cl_loi_list_lock);
-	return rc;
-};
-
-static void fld_enter_request(struct client_obd *cli)
-{
-	struct mdc_cache_waiter mcw;
-	struct l_wait_info lwi = { 0 };
-
-	spin_lock(&cli->cl_loi_list_lock);
-	if (cli->cl_r_in_flight >= cli->cl_max_rpcs_in_flight) {
-		list_add_tail(&mcw.mcw_entry, &cli->cl_cache_waiters);
-		init_waitqueue_head(&mcw.mcw_waitq);
-		spin_unlock(&cli->cl_loi_list_lock);
-		l_wait_event(mcw.mcw_waitq, fld_req_avail(cli, &mcw), &lwi);
-	} else {
-		cli->cl_r_in_flight++;
-		spin_unlock(&cli->cl_loi_list_lock);
-	}
-}
-
-static void fld_exit_request(struct client_obd *cli)
-{
-	struct list_head *l, *tmp;
-	struct mdc_cache_waiter *mcw;
-
-	spin_lock(&cli->cl_loi_list_lock);
-	cli->cl_r_in_flight--;
-	list_for_each_safe(l, tmp, &cli->cl_cache_waiters) {
-		if (cli->cl_r_in_flight >= cli->cl_max_rpcs_in_flight) {
-			/* No free request slots anymore */
-			break;
-		}
-
-		mcw = list_entry(l, struct mdc_cache_waiter, mcw_entry);
-		list_del_init(&mcw->mcw_entry);
-		cli->cl_r_in_flight++;
-		wake_up(&mcw->mcw_waitq);
-	}
-	spin_unlock(&cli->cl_loi_list_lock);
-}
-
 static int fld_rrb_hash(struct lu_client_fld *fld, u64 seq)
 {
 	LASSERT(fld->lcf_count > 0);
@@ -439,9 +388,9 @@ int fld_client_rpc(struct obd_export *exp,
 	req->rq_reply_portal = MDC_REPLY_PORTAL;
 	ptlrpc_at_set_req_timeout(req);
 
-	fld_enter_request(&exp->exp_obd->u.cli);
+	obd_get_request_slot(&exp->exp_obd->u.cli);
 	rc = ptlrpc_queue_wait(req);
-	fld_exit_request(&exp->exp_obd->u.cli);
+	obd_put_request_slot(&exp->exp_obd->u.cli);
 	if (rc)
 		goto out_req;
 
diff --git a/drivers/staging/lustre/lustre/include/lustre_mdc.h b/drivers/staging/lustre/lustre/include/lustre_mdc.h
index 0a8c639..bf6f87a 100644
--- a/drivers/staging/lustre/lustre/include/lustre_mdc.h
+++ b/drivers/staging/lustre/lustre/include/lustre_mdc.h
@@ -179,11 +179,6 @@ static inline void mdc_update_max_ea_from_body(struct obd_export *exp,
 	}
 }
 
-struct mdc_cache_waiter {
-	struct list_head	      mcw_entry;
-	wait_queue_head_t	     mcw_waitq;
-};
-
 /* mdc/mdc_locks.c */
 int it_open_error(int phase, struct lookup_intent *it);
 
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index f5eeb05..cacd472 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -211,11 +211,12 @@ struct timeout_item {
 	struct list_head	 ti_chain;
 };
 
-#define OSC_MAX_RIF_DEFAULT       8
-#define OSC_MAX_RIF_MAX	 256
-#define OSC_MAX_DIRTY_DEFAULT  (OSC_MAX_RIF_DEFAULT * 4)
-#define OSC_MAX_DIRTY_MB_MAX   2048     /* arbitrary, but < MAX_LONG bytes */
-#define OSC_DEFAULT_RESENDS      10
+#define OBD_MAX_RIF_DEFAULT	8
+#define OBD_MAX_RIF_MAX		512
+#define OSC_MAX_RIF_MAX		256
+#define OSC_MAX_DIRTY_DEFAULT	(OBD_MAX_RIF_DEFAULT * 4)
+#define OSC_MAX_DIRTY_MB_MAX	2048	/* arbitrary, but < MAX_LONG bytes */
+#define OSC_DEFAULT_RESENDS	10
 
 /* possible values for fo_sync_lock_cancel */
 enum {
@@ -225,9 +226,6 @@ enum {
 	NUM_SYNC_ON_CANCEL_STATES
 };
 
-#define MDC_MAX_RIF_DEFAULT       8
-#define MDC_MAX_RIF_MAX	 512
-
 enum obd_cl_sem_lock_class {
 	OBD_CLI_SEM_NORMAL,
 	OBD_CLI_SEM_MGC,
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index 2f111a8..de808ee 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -97,6 +97,11 @@ int obd_zombie_impexp_init(void);
 void obd_zombie_impexp_stop(void);
 void obd_zombie_barrier(void);
 
+int obd_get_request_slot(struct client_obd *cli);
+void obd_put_request_slot(struct client_obd *cli);
+__u32 obd_get_max_rpcs_in_flight(struct client_obd *cli);
+int obd_set_max_rpcs_in_flight(struct client_obd *cli, __u32 max);
+
 struct llog_handle;
 struct llog_rec_hdr;
 typedef int (*llog_cb_t)(const struct lu_env *, struct llog_handle *,
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
index 7c832aa..ee40006 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
@@ -360,7 +360,7 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg)
 	cli->cl_chunkbits = PAGE_SHIFT;
 
 	if (!strcmp(name, LUSTRE_MDC_NAME)) {
-		cli->cl_max_rpcs_in_flight = MDC_MAX_RIF_DEFAULT;
+		cli->cl_max_rpcs_in_flight = OBD_MAX_RIF_DEFAULT;
 	} else if (totalram_pages >> (20 - PAGE_SHIFT) <= 128 /* MB */) {
 		cli->cl_max_rpcs_in_flight = 2;
 	} else if (totalram_pages >> (20 - PAGE_SHIFT) <= 256 /* MB */) {
@@ -368,7 +368,7 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg)
 	} else if (totalram_pages >> (20 - PAGE_SHIFT) <= 512 /* MB */) {
 		cli->cl_max_rpcs_in_flight = 4;
 	} else {
-		cli->cl_max_rpcs_in_flight = OSC_MAX_RIF_DEFAULT;
+		cli->cl_max_rpcs_in_flight = OBD_MAX_RIF_DEFAULT;
 	}
 	rc = ldlm_get_ref();
 	if (rc) {
diff --git a/drivers/staging/lustre/lustre/mdc/lproc_mdc.c b/drivers/staging/lustre/lustre/mdc/lproc_mdc.c
index 98d15fb..fca9450 100644
--- a/drivers/staging/lustre/lustre/mdc/lproc_mdc.c
+++ b/drivers/staging/lustre/lustre/mdc/lproc_mdc.c
@@ -43,11 +43,10 @@ static ssize_t max_rpcs_in_flight_show(struct kobject *kobj,
 	int len;
 	struct obd_device *dev = container_of(kobj, struct obd_device,
 					      obd_kobj);
-	struct client_obd *cli = &dev->u.cli;
+	__u32 max;
 
-	spin_lock(&cli->cl_loi_list_lock);
-	len = sprintf(buf, "%u\n", cli->cl_max_rpcs_in_flight);
-	spin_unlock(&cli->cl_loi_list_lock);
+	max = obd_get_max_rpcs_in_flight(&dev->u.cli);
+	len = sprintf(buf, "%u\n", max);
 
 	return len;
 }
@@ -59,7 +58,6 @@ static ssize_t max_rpcs_in_flight_store(struct kobject *kobj,
 {
 	struct obd_device *dev = container_of(kobj, struct obd_device,
 					      obd_kobj);
-	struct client_obd *cli = &dev->u.cli;
 	int rc;
 	unsigned long val;
 
@@ -67,12 +65,9 @@ static ssize_t max_rpcs_in_flight_store(struct kobject *kobj,
 	if (rc)
 		return rc;
 
-	if (val < 1 || val > MDC_MAX_RIF_MAX)
-		return -ERANGE;
-
-	spin_lock(&cli->cl_loi_list_lock);
-	cli->cl_max_rpcs_in_flight = val;
-	spin_unlock(&cli->cl_loi_list_lock);
+	rc = obd_set_max_rpcs_in_flight(&dev->u.cli, val);
+	if (rc)
+		count = rc;
 
 	return count;
 }
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_internal.h b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
index 58f2841..53b4063 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_internal.h
+++ b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
@@ -61,8 +61,6 @@ void mdc_link_pack(struct ptlrpc_request *req, struct md_op_data *op_data);
 void mdc_rename_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 		     const char *old, int oldlen, const char *new, int newlen);
 void mdc_close_pack(struct ptlrpc_request *req, struct md_op_data *op_data);
-int mdc_enter_request(struct client_obd *cli);
-void mdc_exit_request(struct client_obd *cli);
 
 /* mdc/mdc_locks.c */
 int mdc_set_lock_data(struct obd_export *exp,
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_lib.c b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
index 95c4550..b532623 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_lib.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
@@ -484,67 +484,3 @@ void mdc_close_pack(struct ptlrpc_request *req, struct md_op_data *op_data)
 	mdc_ioepoch_pack(epoch, op_data);
 	mdc_hsm_release_pack(req, op_data);
 }
-
-static int mdc_req_avail(struct client_obd *cli, struct mdc_cache_waiter *mcw)
-{
-	int rc;
-
-	spin_lock(&cli->cl_loi_list_lock);
-	rc = list_empty(&mcw->mcw_entry);
-	spin_unlock(&cli->cl_loi_list_lock);
-	return rc;
-};
-
-/* We record requests in flight in cli->cl_r_in_flight here.
- * There is only one write rpc possible in mdc anyway. If this to change
- * in the future - the code may need to be revisited.
- */
-int mdc_enter_request(struct client_obd *cli)
-{
-	int rc = 0;
-	struct mdc_cache_waiter mcw;
-	struct l_wait_info lwi = LWI_INTR(LWI_ON_SIGNAL_NOOP, NULL);
-
-	spin_lock(&cli->cl_loi_list_lock);
-	if (cli->cl_r_in_flight >= cli->cl_max_rpcs_in_flight) {
-		list_add_tail(&mcw.mcw_entry, &cli->cl_cache_waiters);
-		init_waitqueue_head(&mcw.mcw_waitq);
-		spin_unlock(&cli->cl_loi_list_lock);
-		rc = l_wait_event(mcw.mcw_waitq, mdc_req_avail(cli, &mcw),
-				  &lwi);
-		if (rc) {
-			spin_lock(&cli->cl_loi_list_lock);
-			if (list_empty(&mcw.mcw_entry))
-				cli->cl_r_in_flight--;
-			list_del_init(&mcw.mcw_entry);
-			spin_unlock(&cli->cl_loi_list_lock);
-		}
-	} else {
-		cli->cl_r_in_flight++;
-		spin_unlock(&cli->cl_loi_list_lock);
-	}
-	return rc;
-}
-
-void mdc_exit_request(struct client_obd *cli)
-{
-	struct list_head *l, *tmp;
-	struct mdc_cache_waiter *mcw;
-
-	spin_lock(&cli->cl_loi_list_lock);
-	cli->cl_r_in_flight--;
-	list_for_each_safe(l, tmp, &cli->cl_cache_waiters) {
-		if (cli->cl_r_in_flight >= cli->cl_max_rpcs_in_flight) {
-			/* No free request slots anymore */
-			break;
-		}
-
-		mcw = list_entry(l, struct mdc_cache_waiter, mcw_entry);
-		list_del_init(&mcw->mcw_entry);
-		cli->cl_r_in_flight++;
-		wake_up(&mcw->mcw_waitq);
-	}
-	/* Empty waiting list? Decrease reqs in-flight number */
-
-	spin_unlock(&cli->cl_loi_list_lock);
-}
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 626fce5..d8406d5 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -809,7 +809,7 @@ resend:
 	 */
 	if (it) {
 		mdc_get_rpc_lock(obddev->u.cli.cl_rpc_lock, it);
-		rc = mdc_enter_request(&obddev->u.cli);
+		rc = obd_get_request_slot(&obddev->u.cli);
 		if (rc != 0) {
 			mdc_put_rpc_lock(obddev->u.cli.cl_rpc_lock, it);
 			mdc_clear_replay_flag(req, 0);
@@ -837,7 +837,7 @@ resend:
 		return rc;
 	}
 
-	mdc_exit_request(&obddev->u.cli);
+	obd_put_request_slot(&obddev->u.cli);
 	mdc_put_rpc_lock(obddev->u.cli.cl_rpc_lock, it);
 
 	if (rc < 0) {
@@ -1179,7 +1179,7 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env,
 
 	obddev = class_exp2obd(exp);
 
-	mdc_exit_request(&obddev->u.cli);
+	obd_put_request_slot(&obddev->u.cli);
 	if (OBD_FAIL_CHECK(OBD_FAIL_MDC_GETATTR_ENQUEUE))
 		rc = -ETIMEDOUT;
 
@@ -1239,7 +1239,7 @@ int mdc_intent_getattr_async(struct obd_export *exp,
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	rc = mdc_enter_request(&obddev->u.cli);
+	rc = obd_get_request_slot(&obddev->u.cli);
 	if (rc != 0) {
 		ptlrpc_req_finished(req);
 		return rc;
@@ -1248,7 +1248,7 @@ int mdc_intent_getattr_async(struct obd_export *exp,
 	rc = ldlm_cli_enqueue(exp, &req, einfo, &res_id, &policy, &flags, NULL,
 			      0, LVB_T_NONE, &minfo->mi_lockh, 1);
 	if (rc < 0) {
-		mdc_exit_request(&obddev->u.cli);
+		obd_put_request_slot(&obddev->u.cli);
 		ptlrpc_req_finished(req);
 		return rc;
 	}
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 030295f..558f33b 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -58,16 +58,16 @@ static inline int mdc_queue_wait(struct ptlrpc_request *req)
 	struct client_obd *cli = &req->rq_import->imp_obd->u.cli;
 	int rc;
 
-	/* mdc_enter_request() ensures that this client has no more
+	/* obd_get_request_slot() ensures that this client has no more
 	 * than cl_max_rpcs_in_flight RPCs simultaneously inf light
 	 * against an MDT.
 	 */
-	rc = mdc_enter_request(cli);
+	rc = obd_get_request_slot(cli);
 	if (rc != 0)
 		return rc;
 
 	rc = ptlrpc_queue_wait(req);
-	mdc_exit_request(cli);
+	obd_put_request_slot(cli);
 
 	return rc;
 }
diff --git a/drivers/staging/lustre/lustre/obdclass/genops.c b/drivers/staging/lustre/lustre/obdclass/genops.c
index 99c2da6..be25434 100644
--- a/drivers/staging/lustre/lustre/obdclass/genops.c
+++ b/drivers/staging/lustre/lustre/obdclass/genops.c
@@ -1312,3 +1312,135 @@ void obd_zombie_impexp_stop(void)
 	obd_zombie_impexp_notify();
 	wait_for_completion(&obd_zombie_stop);
 }
+
+struct obd_request_slot_waiter {
+	struct list_head	orsw_entry;
+	wait_queue_head_t	orsw_waitq;
+	bool			orsw_signaled;
+};
+
+static bool obd_request_slot_avail(struct client_obd *cli,
+				   struct obd_request_slot_waiter *orsw)
+{
+	bool avail;
+
+	spin_lock(&cli->cl_loi_list_lock);
+	avail = !!list_empty(&orsw->orsw_entry);
+	spin_unlock(&cli->cl_loi_list_lock);
+
+	return avail;
+}
+
+/*
+ * For network flow control, the RPC sponsor needs to acquire a credit
+ * before sending the RPC. The credits count for a connection is defined
+ * by the "cl_max_rpcs_in_flight". If all the credits are occupied, then
+ * the subsequent RPC sponsors need to wait until others release their
+ * credits, or the administrator increases the "cl_max_rpcs_in_flight".
+ */
+int obd_get_request_slot(struct client_obd *cli)
+{
+	struct obd_request_slot_waiter orsw;
+	struct l_wait_info lwi;
+	int rc;
+
+	spin_lock(&cli->cl_loi_list_lock);
+	if (cli->cl_r_in_flight < cli->cl_max_rpcs_in_flight) {
+		cli->cl_r_in_flight++;
+		spin_unlock(&cli->cl_loi_list_lock);
+		return 0;
+	}
+
+	init_waitqueue_head(&orsw.orsw_waitq);
+	list_add_tail(&orsw.orsw_entry, &cli->cl_loi_read_list);
+	orsw.orsw_signaled = false;
+	spin_unlock(&cli->cl_loi_list_lock);
+
+	lwi = LWI_INTR(LWI_ON_SIGNAL_NOOP, NULL);
+	rc = l_wait_event(orsw.orsw_waitq,
+			  obd_request_slot_avail(cli, &orsw) ||
+			  orsw.orsw_signaled,
+			  &lwi);
+
+	/*
+	 * Take the lock here so that the on-stack 'orsw' is not freed
+	 * while another thread (e.g. in obd_put_request_slot()) uses it.
+	 */
+	spin_lock(&cli->cl_loi_list_lock);
+	if (rc) {
+		if (!orsw.orsw_signaled) {
+			if (list_empty(&orsw.orsw_entry))
+				cli->cl_r_in_flight--;
+			else
+				list_del(&orsw.orsw_entry);
+		}
+	}
+
+	if (orsw.orsw_signaled) {
+		LASSERT(list_empty(&orsw.orsw_entry));
+
+		rc = -EINTR;
+	}
+	spin_unlock(&cli->cl_loi_list_lock);
+
+	return rc;
+}
+EXPORT_SYMBOL(obd_get_request_slot);
+
+void obd_put_request_slot(struct client_obd *cli)
+{
+	struct obd_request_slot_waiter *orsw;
+
+	spin_lock(&cli->cl_loi_list_lock);
+	cli->cl_r_in_flight--;
+
+	/* If there is a free slot, wake up the first waiter. */
+	if (!list_empty(&cli->cl_loi_read_list) &&
+	    likely(cli->cl_r_in_flight < cli->cl_max_rpcs_in_flight)) {
+		orsw = list_entry(cli->cl_loi_read_list.next,
+				  struct obd_request_slot_waiter, orsw_entry);
+		list_del_init(&orsw->orsw_entry);
+		cli->cl_r_in_flight++;
+		wake_up(&orsw->orsw_waitq);
+	}
+	spin_unlock(&cli->cl_loi_list_lock);
+}
+EXPORT_SYMBOL(obd_put_request_slot);
+
+__u32 obd_get_max_rpcs_in_flight(struct client_obd *cli)
+{
+	return cli->cl_max_rpcs_in_flight;
+}
+EXPORT_SYMBOL(obd_get_max_rpcs_in_flight);
+
+int obd_set_max_rpcs_in_flight(struct client_obd *cli, __u32 max)
+{
+	struct obd_request_slot_waiter *orsw;
+	__u32 old;
+	int diff;
+	int i;
+
+	if (max > OBD_MAX_RIF_MAX || max < 1)
+		return -ERANGE;
+
+	spin_lock(&cli->cl_loi_list_lock);
+	old = cli->cl_max_rpcs_in_flight;
+	cli->cl_max_rpcs_in_flight = max;
+	diff = max - old;
+
+	/* If max_rpcs_in_flight was increased, wake up some waiters. */
+	for (i = 0; i < diff; i++) {
+		if (list_empty(&cli->cl_loi_read_list))
+			break;
+
+		orsw = list_entry(cli->cl_loi_read_list.next,
+				  struct obd_request_slot_waiter, orsw_entry);
+		list_del_init(&orsw->orsw_entry);
+		cli->cl_r_in_flight++;
+		wake_up(&orsw->orsw_waitq);
+	}
+	spin_unlock(&cli->cl_loi_list_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL(obd_set_max_rpcs_in_flight);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [lustre-devel] [PATCH 35/80] staging: lustre: obdclass: unified flow control interfaces
@ 2016-08-16 20:18   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Fan Yong,
	James Simmons

From: Fan Yong <fan.yong@intel.com>

Unify the flow control interfaces for MDC RPC and FLD RPC.
Allow the maximum number of in-flight RPCs to be adjusted via
the /sys interface.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4687
Reviewed-on: http://review.whamcloud.com/9562
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/fld/fld_request.c    |   55 +--------
 drivers/staging/lustre/lustre/include/lustre_mdc.h |    5 -
 drivers/staging/lustre/lustre/include/obd.h        |   14 +--
 drivers/staging/lustre/lustre/include/obd_class.h  |    5 +
 drivers/staging/lustre/lustre/ldlm/ldlm_lib.c      |    4 +-
 drivers/staging/lustre/lustre/mdc/lproc_mdc.c      |   17 +--
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |    2 -
 drivers/staging/lustre/lustre/mdc/mdc_lib.c        |   64 ----------
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |   10 +-
 drivers/staging/lustre/lustre/mdc/mdc_request.c    |    6 +-
 drivers/staging/lustre/lustre/obdclass/genops.c    |  132 ++++++++++++++++++++
 11 files changed, 161 insertions(+), 153 deletions(-)

diff --git a/drivers/staging/lustre/lustre/fld/fld_request.c b/drivers/staging/lustre/lustre/fld/fld_request.c
index e59d626..ed7962e 100644
--- a/drivers/staging/lustre/lustre/fld/fld_request.c
+++ b/drivers/staging/lustre/lustre/fld/fld_request.c
@@ -53,57 +53,6 @@
 #include "../include/lustre_mdc.h"
 #include "fld_internal.h"
 
-/* TODO: these 3 functions are copies of flow-control code from mdc_lib.c
- * It should be common thing. The same about mdc RPC lock
- */
-static int fld_req_avail(struct client_obd *cli, struct mdc_cache_waiter *mcw)
-{
-	int rc;
-
-	spin_lock(&cli->cl_loi_list_lock);
-	rc = list_empty(&mcw->mcw_entry);
-	spin_unlock(&cli->cl_loi_list_lock);
-	return rc;
-};
-
-static void fld_enter_request(struct client_obd *cli)
-{
-	struct mdc_cache_waiter mcw;
-	struct l_wait_info lwi = { 0 };
-
-	spin_lock(&cli->cl_loi_list_lock);
-	if (cli->cl_r_in_flight >= cli->cl_max_rpcs_in_flight) {
-		list_add_tail(&mcw.mcw_entry, &cli->cl_cache_waiters);
-		init_waitqueue_head(&mcw.mcw_waitq);
-		spin_unlock(&cli->cl_loi_list_lock);
-		l_wait_event(mcw.mcw_waitq, fld_req_avail(cli, &mcw), &lwi);
-	} else {
-		cli->cl_r_in_flight++;
-		spin_unlock(&cli->cl_loi_list_lock);
-	}
-}
-
-static void fld_exit_request(struct client_obd *cli)
-{
-	struct list_head *l, *tmp;
-	struct mdc_cache_waiter *mcw;
-
-	spin_lock(&cli->cl_loi_list_lock);
-	cli->cl_r_in_flight--;
-	list_for_each_safe(l, tmp, &cli->cl_cache_waiters) {
-		if (cli->cl_r_in_flight >= cli->cl_max_rpcs_in_flight) {
-			/* No free request slots anymore */
-			break;
-		}
-
-		mcw = list_entry(l, struct mdc_cache_waiter, mcw_entry);
-		list_del_init(&mcw->mcw_entry);
-		cli->cl_r_in_flight++;
-		wake_up(&mcw->mcw_waitq);
-	}
-	spin_unlock(&cli->cl_loi_list_lock);
-}
-
 static int fld_rrb_hash(struct lu_client_fld *fld, u64 seq)
 {
 	LASSERT(fld->lcf_count > 0);
@@ -439,9 +388,9 @@ int fld_client_rpc(struct obd_export *exp,
 	req->rq_reply_portal = MDC_REPLY_PORTAL;
 	ptlrpc_at_set_req_timeout(req);
 
-	fld_enter_request(&exp->exp_obd->u.cli);
+	obd_get_request_slot(&exp->exp_obd->u.cli);
 	rc = ptlrpc_queue_wait(req);
-	fld_exit_request(&exp->exp_obd->u.cli);
+	obd_put_request_slot(&exp->exp_obd->u.cli);
 	if (rc)
 		goto out_req;
 
diff --git a/drivers/staging/lustre/lustre/include/lustre_mdc.h b/drivers/staging/lustre/lustre/include/lustre_mdc.h
index 0a8c639..bf6f87a 100644
--- a/drivers/staging/lustre/lustre/include/lustre_mdc.h
+++ b/drivers/staging/lustre/lustre/include/lustre_mdc.h
@@ -179,11 +179,6 @@ static inline void mdc_update_max_ea_from_body(struct obd_export *exp,
 	}
 }
 
-struct mdc_cache_waiter {
-	struct list_head	      mcw_entry;
-	wait_queue_head_t	     mcw_waitq;
-};
-
 /* mdc/mdc_locks.c */
 int it_open_error(int phase, struct lookup_intent *it);
 
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index f5eeb05..cacd472 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -211,11 +211,12 @@ struct timeout_item {
 	struct list_head	 ti_chain;
 };
 
-#define OSC_MAX_RIF_DEFAULT       8
-#define OSC_MAX_RIF_MAX	 256
-#define OSC_MAX_DIRTY_DEFAULT  (OSC_MAX_RIF_DEFAULT * 4)
-#define OSC_MAX_DIRTY_MB_MAX   2048     /* arbitrary, but < MAX_LONG bytes */
-#define OSC_DEFAULT_RESENDS      10
+#define OBD_MAX_RIF_DEFAULT	8
+#define OBD_MAX_RIF_MAX		512
+#define OSC_MAX_RIF_MAX		256
+#define OSC_MAX_DIRTY_DEFAULT	(OBD_MAX_RIF_DEFAULT * 4)
+#define OSC_MAX_DIRTY_MB_MAX	2048	/* arbitrary, but < MAX_LONG bytes */
+#define OSC_DEFAULT_RESENDS	10
 
 /* possible values for fo_sync_lock_cancel */
 enum {
@@ -225,9 +226,6 @@ enum {
 	NUM_SYNC_ON_CANCEL_STATES
 };
 
-#define MDC_MAX_RIF_DEFAULT       8
-#define MDC_MAX_RIF_MAX	 512
-
 enum obd_cl_sem_lock_class {
 	OBD_CLI_SEM_NORMAL,
 	OBD_CLI_SEM_MGC,
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index 2f111a8..de808ee 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -97,6 +97,11 @@ int obd_zombie_impexp_init(void);
 void obd_zombie_impexp_stop(void);
 void obd_zombie_barrier(void);
 
+int obd_get_request_slot(struct client_obd *cli);
+void obd_put_request_slot(struct client_obd *cli);
+__u32 obd_get_max_rpcs_in_flight(struct client_obd *cli);
+int obd_set_max_rpcs_in_flight(struct client_obd *cli, __u32 max);
+
 struct llog_handle;
 struct llog_rec_hdr;
 typedef int (*llog_cb_t)(const struct lu_env *, struct llog_handle *,
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
index 7c832aa..ee40006 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
@@ -360,7 +360,7 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg)
 	cli->cl_chunkbits = PAGE_SHIFT;
 
 	if (!strcmp(name, LUSTRE_MDC_NAME)) {
-		cli->cl_max_rpcs_in_flight = MDC_MAX_RIF_DEFAULT;
+		cli->cl_max_rpcs_in_flight = OBD_MAX_RIF_DEFAULT;
 	} else if (totalram_pages >> (20 - PAGE_SHIFT) <= 128 /* MB */) {
 		cli->cl_max_rpcs_in_flight = 2;
 	} else if (totalram_pages >> (20 - PAGE_SHIFT) <= 256 /* MB */) {
@@ -368,7 +368,7 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg)
 	} else if (totalram_pages >> (20 - PAGE_SHIFT) <= 512 /* MB */) {
 		cli->cl_max_rpcs_in_flight = 4;
 	} else {
-		cli->cl_max_rpcs_in_flight = OSC_MAX_RIF_DEFAULT;
+		cli->cl_max_rpcs_in_flight = OBD_MAX_RIF_DEFAULT;
 	}
 	rc = ldlm_get_ref();
 	if (rc) {
diff --git a/drivers/staging/lustre/lustre/mdc/lproc_mdc.c b/drivers/staging/lustre/lustre/mdc/lproc_mdc.c
index 98d15fb..fca9450 100644
--- a/drivers/staging/lustre/lustre/mdc/lproc_mdc.c
+++ b/drivers/staging/lustre/lustre/mdc/lproc_mdc.c
@@ -43,11 +43,10 @@ static ssize_t max_rpcs_in_flight_show(struct kobject *kobj,
 	int len;
 	struct obd_device *dev = container_of(kobj, struct obd_device,
 					      obd_kobj);
-	struct client_obd *cli = &dev->u.cli;
+	__u32 max;
 
-	spin_lock(&cli->cl_loi_list_lock);
-	len = sprintf(buf, "%u\n", cli->cl_max_rpcs_in_flight);
-	spin_unlock(&cli->cl_loi_list_lock);
+	max = obd_get_max_rpcs_in_flight(&dev->u.cli);
+	len = sprintf(buf, "%u\n", max);
 
 	return len;
 }
@@ -59,7 +58,6 @@ static ssize_t max_rpcs_in_flight_store(struct kobject *kobj,
 {
 	struct obd_device *dev = container_of(kobj, struct obd_device,
 					      obd_kobj);
-	struct client_obd *cli = &dev->u.cli;
 	int rc;
 	unsigned long val;
 
@@ -67,12 +65,9 @@ static ssize_t max_rpcs_in_flight_store(struct kobject *kobj,
 	if (rc)
 		return rc;
 
-	if (val < 1 || val > MDC_MAX_RIF_MAX)
-		return -ERANGE;
-
-	spin_lock(&cli->cl_loi_list_lock);
-	cli->cl_max_rpcs_in_flight = val;
-	spin_unlock(&cli->cl_loi_list_lock);
+	rc = obd_set_max_rpcs_in_flight(&dev->u.cli, val);
+	if (rc)
+		count = rc;
 
 	return count;
 }
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_internal.h b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
index 58f2841..53b4063 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_internal.h
+++ b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
@@ -61,8 +61,6 @@ void mdc_link_pack(struct ptlrpc_request *req, struct md_op_data *op_data);
 void mdc_rename_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 		     const char *old, int oldlen, const char *new, int newlen);
 void mdc_close_pack(struct ptlrpc_request *req, struct md_op_data *op_data);
-int mdc_enter_request(struct client_obd *cli);
-void mdc_exit_request(struct client_obd *cli);
 
 /* mdc/mdc_locks.c */
 int mdc_set_lock_data(struct obd_export *exp,
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_lib.c b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
index 95c4550..b532623 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_lib.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
@@ -484,67 +484,3 @@ void mdc_close_pack(struct ptlrpc_request *req, struct md_op_data *op_data)
 	mdc_ioepoch_pack(epoch, op_data);
 	mdc_hsm_release_pack(req, op_data);
 }
-
-static int mdc_req_avail(struct client_obd *cli, struct mdc_cache_waiter *mcw)
-{
-	int rc;
-
-	spin_lock(&cli->cl_loi_list_lock);
-	rc = list_empty(&mcw->mcw_entry);
-	spin_unlock(&cli->cl_loi_list_lock);
-	return rc;
-};
-
-/* We record requests in flight in cli->cl_r_in_flight here.
- * There is only one write rpc possible in mdc anyway. If this to change
- * in the future - the code may need to be revisited.
- */
-int mdc_enter_request(struct client_obd *cli)
-{
-	int rc = 0;
-	struct mdc_cache_waiter mcw;
-	struct l_wait_info lwi = LWI_INTR(LWI_ON_SIGNAL_NOOP, NULL);
-
-	spin_lock(&cli->cl_loi_list_lock);
-	if (cli->cl_r_in_flight >= cli->cl_max_rpcs_in_flight) {
-		list_add_tail(&mcw.mcw_entry, &cli->cl_cache_waiters);
-		init_waitqueue_head(&mcw.mcw_waitq);
-		spin_unlock(&cli->cl_loi_list_lock);
-		rc = l_wait_event(mcw.mcw_waitq, mdc_req_avail(cli, &mcw),
-				  &lwi);
-		if (rc) {
-			spin_lock(&cli->cl_loi_list_lock);
-			if (list_empty(&mcw.mcw_entry))
-				cli->cl_r_in_flight--;
-			list_del_init(&mcw.mcw_entry);
-			spin_unlock(&cli->cl_loi_list_lock);
-		}
-	} else {
-		cli->cl_r_in_flight++;
-		spin_unlock(&cli->cl_loi_list_lock);
-	}
-	return rc;
-}
-
-void mdc_exit_request(struct client_obd *cli)
-{
-	struct list_head *l, *tmp;
-	struct mdc_cache_waiter *mcw;
-
-	spin_lock(&cli->cl_loi_list_lock);
-	cli->cl_r_in_flight--;
-	list_for_each_safe(l, tmp, &cli->cl_cache_waiters) {
-		if (cli->cl_r_in_flight >= cli->cl_max_rpcs_in_flight) {
-			/* No free request slots anymore */
-			break;
-		}
-
-		mcw = list_entry(l, struct mdc_cache_waiter, mcw_entry);
-		list_del_init(&mcw->mcw_entry);
-		cli->cl_r_in_flight++;
-		wake_up(&mcw->mcw_waitq);
-	}
-	/* Empty waiting list? Decrease reqs in-flight number */
-
-	spin_unlock(&cli->cl_loi_list_lock);
-}
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 626fce5..d8406d5 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -809,7 +809,7 @@ resend:
 	 */
 	if (it) {
 		mdc_get_rpc_lock(obddev->u.cli.cl_rpc_lock, it);
-		rc = mdc_enter_request(&obddev->u.cli);
+		rc = obd_get_request_slot(&obddev->u.cli);
 		if (rc != 0) {
 			mdc_put_rpc_lock(obddev->u.cli.cl_rpc_lock, it);
 			mdc_clear_replay_flag(req, 0);
@@ -837,7 +837,7 @@ resend:
 		return rc;
 	}
 
-	mdc_exit_request(&obddev->u.cli);
+	obd_put_request_slot(&obddev->u.cli);
 	mdc_put_rpc_lock(obddev->u.cli.cl_rpc_lock, it);
 
 	if (rc < 0) {
@@ -1179,7 +1179,7 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env,
 
 	obddev = class_exp2obd(exp);
 
-	mdc_exit_request(&obddev->u.cli);
+	obd_put_request_slot(&obddev->u.cli);
 	if (OBD_FAIL_CHECK(OBD_FAIL_MDC_GETATTR_ENQUEUE))
 		rc = -ETIMEDOUT;
 
@@ -1239,7 +1239,7 @@ int mdc_intent_getattr_async(struct obd_export *exp,
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	rc = mdc_enter_request(&obddev->u.cli);
+	rc = obd_get_request_slot(&obddev->u.cli);
 	if (rc != 0) {
 		ptlrpc_req_finished(req);
 		return rc;
@@ -1248,7 +1248,7 @@ int mdc_intent_getattr_async(struct obd_export *exp,
 	rc = ldlm_cli_enqueue(exp, &req, einfo, &res_id, &policy, &flags, NULL,
 			      0, LVB_T_NONE, &minfo->mi_lockh, 1);
 	if (rc < 0) {
-		mdc_exit_request(&obddev->u.cli);
+		obd_put_request_slot(&obddev->u.cli);
 		ptlrpc_req_finished(req);
 		return rc;
 	}
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 030295f..558f33b 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -58,16 +58,16 @@ static inline int mdc_queue_wait(struct ptlrpc_request *req)
 	struct client_obd *cli = &req->rq_import->imp_obd->u.cli;
 	int rc;
 
-	/* mdc_enter_request() ensures that this client has no more
+	/* obd_get_request_slot() ensures that this client has no more
 	 * than cl_max_rpcs_in_flight RPCs simultaneously inf light
 	 * against an MDT.
 	 */
-	rc = mdc_enter_request(cli);
+	rc = obd_get_request_slot(cli);
 	if (rc != 0)
 		return rc;
 
 	rc = ptlrpc_queue_wait(req);
-	mdc_exit_request(cli);
+	obd_put_request_slot(cli);
 
 	return rc;
 }
diff --git a/drivers/staging/lustre/lustre/obdclass/genops.c b/drivers/staging/lustre/lustre/obdclass/genops.c
index 99c2da6..be25434 100644
--- a/drivers/staging/lustre/lustre/obdclass/genops.c
+++ b/drivers/staging/lustre/lustre/obdclass/genops.c
@@ -1312,3 +1312,135 @@ void obd_zombie_impexp_stop(void)
 	obd_zombie_impexp_notify();
 	wait_for_completion(&obd_zombie_stop);
 }
+
+struct obd_request_slot_waiter {
+	struct list_head	orsw_entry;
+	wait_queue_head_t	orsw_waitq;
+	bool			orsw_signaled;
+};
+
+static bool obd_request_slot_avail(struct client_obd *cli,
+				   struct obd_request_slot_waiter *orsw)
+{
+	bool avail;
+
+	spin_lock(&cli->cl_loi_list_lock);
+	avail = !!list_empty(&orsw->orsw_entry);
+	spin_unlock(&cli->cl_loi_list_lock);
+
+	return avail;
+}
+
+/*
+ * For network flow control, the RPC sponsor needs to acquire a credit
+ * before sending the RPC. The number of credits for a connection is set
+ * by "cl_max_rpcs_in_flight". If all the credits are occupied, then
+ * subsequent RPC sponsors must wait until others release their
+ * credits, or the administrator increases "cl_max_rpcs_in_flight".
+ */
+int obd_get_request_slot(struct client_obd *cli)
+{
+	struct obd_request_slot_waiter orsw;
+	struct l_wait_info lwi;
+	int rc;
+
+	spin_lock(&cli->cl_loi_list_lock);
+	if (cli->cl_r_in_flight < cli->cl_max_rpcs_in_flight) {
+		cli->cl_r_in_flight++;
+		spin_unlock(&cli->cl_loi_list_lock);
+		return 0;
+	}
+
+	init_waitqueue_head(&orsw.orsw_waitq);
+	list_add_tail(&orsw.orsw_entry, &cli->cl_loi_read_list);
+	orsw.orsw_signaled = false;
+	spin_unlock(&cli->cl_loi_list_lock);
+
+	lwi = LWI_INTR(LWI_ON_SIGNAL_NOOP, NULL);
+	rc = l_wait_event(orsw.orsw_waitq,
+			  obd_request_slot_avail(cli, &orsw) ||
+			  orsw.orsw_signaled,
+			  &lwi);
+
+	/*
+	 * Here, we must take the lock so that the on-stack 'orsw' is not
+	 * freed while others (such as obd_put_request_slot) are using it.
+	 */
+	spin_lock(&cli->cl_loi_list_lock);
+	if (rc) {
+		if (!orsw.orsw_signaled) {
+			if (list_empty(&orsw.orsw_entry))
+				cli->cl_r_in_flight--;
+			else
+				list_del(&orsw.orsw_entry);
+		}
+	}
+
+	if (orsw.orsw_signaled) {
+		LASSERT(list_empty(&orsw.orsw_entry));
+
+		rc = -EINTR;
+	}
+	spin_unlock(&cli->cl_loi_list_lock);
+
+	return rc;
+}
+EXPORT_SYMBOL(obd_get_request_slot);
+
+void obd_put_request_slot(struct client_obd *cli)
+{
+	struct obd_request_slot_waiter *orsw;
+
+	spin_lock(&cli->cl_loi_list_lock);
+	cli->cl_r_in_flight--;
+
+	/* If there is a free slot, wake up the first waiter. */
+	if (!list_empty(&cli->cl_loi_read_list) &&
+	    likely(cli->cl_r_in_flight < cli->cl_max_rpcs_in_flight)) {
+		orsw = list_entry(cli->cl_loi_read_list.next,
+				  struct obd_request_slot_waiter, orsw_entry);
+		list_del_init(&orsw->orsw_entry);
+		cli->cl_r_in_flight++;
+		wake_up(&orsw->orsw_waitq);
+	}
+	spin_unlock(&cli->cl_loi_list_lock);
+}
+EXPORT_SYMBOL(obd_put_request_slot);
+
+__u32 obd_get_max_rpcs_in_flight(struct client_obd *cli)
+{
+	return cli->cl_max_rpcs_in_flight;
+}
+EXPORT_SYMBOL(obd_get_max_rpcs_in_flight);
+
+int obd_set_max_rpcs_in_flight(struct client_obd *cli, __u32 max)
+{
+	struct obd_request_slot_waiter *orsw;
+	__u32 old;
+	int diff;
+	int i;
+
+	if (max > OBD_MAX_RIF_MAX || max < 1)
+		return -ERANGE;
+
+	spin_lock(&cli->cl_loi_list_lock);
+	old = cli->cl_max_rpcs_in_flight;
+	cli->cl_max_rpcs_in_flight = max;
+	diff = max - old;
+
+	/* If max_rpcs_in_flight was increased, wake up some waiters. */
+	for (i = 0; i < diff; i++) {
+		if (list_empty(&cli->cl_loi_read_list))
+			break;
+
+		orsw = list_entry(cli->cl_loi_read_list.next,
+				  struct obd_request_slot_waiter, orsw_entry);
+		list_del_init(&orsw->orsw_entry);
+		cli->cl_r_in_flight++;
+		wake_up(&orsw->orsw_waitq);
+	}
+	spin_unlock(&cli->cl_loi_list_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL(obd_set_max_rpcs_in_flight);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread
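
The slot accounting in obd_get_request_slot(), obd_put_request_slot() and
obd_set_max_rpcs_in_flight() above can be followed more easily in a
single-threaded model. The sketch below is only an illustration under
stated assumptions: it omits the spinlock, the wait queues and signal
handling entirely, and every name in it (struct slot_model, model_get_slot,
model_put_slot, model_set_max, MODEL_MAX_RIF_MAX) is hypothetical and not
part of the patch. It shows the one property that is easy to miss: a freed
slot is handed directly to the first waiter, so the in-flight count does
not drop while waiters are queued.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical upper bound; the real OBD_MAX_RIF_MAX is defined elsewhere. */
#define MODEL_MAX_RIF_MAX 256u

/* Single-threaded bookkeeping model; locking and wait queues are omitted. */
struct slot_model {
	unsigned int in_flight;	/* models cl_r_in_flight */
	unsigned int max;	/* models cl_max_rpcs_in_flight */
	unsigned int waiters;	/* models the length of the waiter list */
};

/* Returns 1 if a slot was taken, 0 if the caller was queued as a waiter. */
static int model_get_slot(struct slot_model *m)
{
	if (m->in_flight < m->max) {
		m->in_flight++;
		return 1;
	}
	m->waiters++;
	return 0;
}

/* Releases a slot; returns the number of waiters woken (0 or 1). */
static int model_put_slot(struct slot_model *m)
{
	assert(m->in_flight > 0);
	m->in_flight--;
	if (m->waiters > 0 && m->in_flight < m->max) {
		/* The freed slot is handed directly to the first waiter. */
		m->waiters--;
		m->in_flight++;
		return 1;
	}
	return 0;
}

/* Changes the limit; returns waiters woken, or -ERANGE on a bad limit. */
static int model_set_max(struct slot_model *m, unsigned int max)
{
	unsigned int old = m->max;
	int woken = 0;

	if (max < 1 || max > MODEL_MAX_RIF_MAX)
		return -ERANGE;

	m->max = max;
	/* Only an increase wakes waiters, at most one per added slot. */
	while (max > old && m->waiters > 0) {
		m->waiters--;
		m->in_flight++;
		old++;
		woken++;
	}
	return woken;
}
```

With a limit of 2, two callers get slots and later callers are queued.
Releasing one slot wakes exactly one waiter, and raising the limit wakes at
most as many waiters as slots were added.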

* [PATCH 36/80] staging: lustre: reorder LOV_MAGIC_* definition
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Fan Yong,
	James Simmons

From: Fan Yong <fan.yong@intel.com>

Since all the LOV_MAGIC_* definitions share the same
postfix value, break that value out into its own
definition. With this we can check whether a magic's
postfix matches LOV_MAGIC_MAGIC: if it does, then it
is quite possible that the client has encountered a
newer LOV magic. This extra information lets us
handle those cases more gracefully.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4941
Reviewed-on: http://review.whamcloud.com/10045
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |   20 +++++++++++++++-----
 1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index d3a9db9..3444add 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -1478,11 +1478,21 @@ enum obdo_flags {
 	OBD_FL_LOCAL_MASK   = 0xF0000000,
 };
 
-#define LOV_MAGIC_V1      0x0BD10BD0
-#define LOV_MAGIC	 LOV_MAGIC_V1
-#define LOV_MAGIC_JOIN_V1 0x0BD20BD0
-#define LOV_MAGIC_V3      0x0BD30BD0
-#define LOV_MAGIC_MIGRATE 0x0BD40BD0
+/*
+ * All LOV EA magics should have the same postfix. If a newer version of
+ * Lustre introduces a new LOV EA magic, then after a downgrade to an old
+ * Lustre, even though the old version does not recognize the new
+ * magic, it can still distinguish it from the corrupted cases by checking
+ * the magic's postfix.
+ */
+#define LOV_MAGIC_MAGIC 0x0BD0
+#define LOV_MAGIC_MASK  0xFFFF
+
+#define LOV_MAGIC_V1		(0x0BD10000 | LOV_MAGIC_MAGIC)
+#define LOV_MAGIC_JOIN_V1	(0x0BD20000 | LOV_MAGIC_MAGIC)
+#define LOV_MAGIC_V3		(0x0BD30000 | LOV_MAGIC_MAGIC)
+#define LOV_MAGIC_MIGRATE	(0x0BD40000 | LOV_MAGIC_MAGIC)
+#define LOV_MAGIC		LOV_MAGIC_V1
 
 /*
  * magic for fully defined striping
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread
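
As an illustration of the postfix check described in the commit message of
PATCH 36/80, the sketch below classifies a magic value using the
definitions from the patch. The macro values are copied from the diff; the
enum and the function lov_magic_classify() are hypothetical names
introduced only for this example and do not appear in the patch or, as far
as this thread shows, in Lustre itself.

```c
#include <assert.h>
#include <stdint.h>

/* Definitions copied from the patch. */
#define LOV_MAGIC_MAGIC 0x0BD0
#define LOV_MAGIC_MASK  0xFFFF

#define LOV_MAGIC_V1		(0x0BD10000 | LOV_MAGIC_MAGIC)
#define LOV_MAGIC_JOIN_V1	(0x0BD20000 | LOV_MAGIC_MAGIC)
#define LOV_MAGIC_V3		(0x0BD30000 | LOV_MAGIC_MAGIC)
#define LOV_MAGIC_MIGRATE	(0x0BD40000 | LOV_MAGIC_MAGIC)

/* Hypothetical classification, for illustration only. */
enum lov_magic_class {
	LOV_MAGIC_CLASS_KNOWN,		/* one of the magics defined above */
	LOV_MAGIC_CLASS_MAYBE_NEWER,	/* unknown, but the postfix matches */
	LOV_MAGIC_CLASS_CORRUPT,	/* postfix does not match */
};

static enum lov_magic_class lov_magic_classify(uint32_t magic)
{
	switch (magic) {
	case LOV_MAGIC_V1:
	case LOV_MAGIC_JOIN_V1:
	case LOV_MAGIC_V3:
	case LOV_MAGIC_MIGRATE:
		return LOV_MAGIC_CLASS_KNOWN;
	default:
		break;
	}

	/* Unknown magic: a matching postfix suggests a newer format. */
	if ((magic & LOV_MAGIC_MASK) == LOV_MAGIC_MAGIC)
		return LOV_MAGIC_CLASS_MAYBE_NEWER;

	return LOV_MAGIC_CLASS_CORRUPT;
}
```

A caller could then return a "not supported" style error for the
MAYBE_NEWER case and a corruption error otherwise; which error codes to use
is left open here.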

* [PATCH 37/80] staging: lustre: ldlm: flock completion fixes.
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Vitaly Fertman, Andriy Skulysh, James Simmons

From: Vitaly Fertman <vitaly_fertman@xyratex.com>

Move the checks for the FAILED and DESTROYED flags under the ldlm
spinlock, and destroy the flock atomically with the check that it
is not destroyed yet. Do not put the granted flock into the
resource if it is an UNLOCK, TEST, or DEADLOCK'ed flock.

Later, a regression caused by this patch was reported under LU-7626:
"refcount nonzero (1) after lock cleanup" errors were reported.
The reason is that the LCK_NL case was not handled for obdecho.
Patch 17791 resolved this issue and has been combined into
this upstream patch.

Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2177
Reviewed-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Reviewed-by: Andriy Skulysh <andriy_skulysh@xyratex.com>
Reviewed-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Xyratex-bug-id: MRP-1588
Reviewed-on: http://review.whamcloud.com/10005
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7626
Reviewed-by: Mirza Arshad Mirza Hussain <arshad.hussain@seagate.com>
Reviewed-by: Alexey Leonidovich Lyashkov <alexey.lyashkov@seagate.com>
Reviewed-on: http://review.whamcloud.com/17791
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/include/linux/libcfs/libcfs_fail.h      |    3 +
 drivers/staging/lustre/lnet/libcfs/fail.c          |    6 +-
 .../lustre/lustre/include/lustre_dlm_flags.h       |   36 ++++---
 .../staging/lustre/lustre/include/obd_support.h    |    4 +
 drivers/staging/lustre/lustre/ldlm/ldlm_flock.c    |   98 ++++++++++++++------
 drivers/staging/lustre/lustre/ldlm/ldlm_lock.c     |   23 ++++-
 drivers/staging/lustre/lustre/ldlm/ldlm_request.c  |   16 ++--
 drivers/staging/lustre/lustre/ldlm/ldlm_resource.c |    8 ++-
 drivers/staging/lustre/lustre/llite/file.c         |   20 +++--
 9 files changed, 148 insertions(+), 66 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_fail.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_fail.h
index d3f9a60..bdbbe93 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_fail.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_fail.h
@@ -143,6 +143,9 @@ static inline int cfs_fail_timeout_set(__u32 id, __u32 value, int ms, int set)
 #define CFS_FAIL_TIMEOUT_ORSET(id, value, secs) \
 	cfs_fail_timeout_set(id, value, secs * 1000, CFS_FAIL_LOC_ORSET)
 
+#define CFS_FAIL_TIMEOUT_RESET(id, value, secs) \
+	cfs_fail_timeout_set(id, value, secs * 1000, CFS_FAIL_LOC_RESET)
+
 #define CFS_FAIL_TIMEOUT_MS_ORSET(id, value, ms) \
 	cfs_fail_timeout_set(id, value, ms, CFS_FAIL_LOC_ORSET)
 
diff --git a/drivers/staging/lustre/lnet/libcfs/fail.c b/drivers/staging/lustre/lnet/libcfs/fail.c
index 9288ee0..e4b1a0a 100644
--- a/drivers/staging/lustre/lnet/libcfs/fail.c
+++ b/drivers/staging/lustre/lnet/libcfs/fail.c
@@ -90,8 +90,10 @@ int __cfs_fail_check_set(__u32 id, __u32 value, int set)
 		}
 	}
 
-	if ((set == CFS_FAIL_LOC_ORSET || set == CFS_FAIL_LOC_RESET) &&
-	    (value & CFS_FAIL_ONCE))
+	/* Take into account the current call for FAIL_ONCE for ORSET only,
+	 * as RESET is a new fail_loc, it does not change the current call
+	 */
+	if ((set == CFS_FAIL_LOC_ORSET) && (value & CFS_FAIL_ONCE))
 		set_bit(CFS_FAIL_ONCE_BIT, &cfs_fail_loc);
 	/* Lost race to set CFS_FAILED_BIT. */
 	if (test_and_set_bit(CFS_FAILED_BIT, &cfs_fail_loc)) {
diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm_flags.h b/drivers/staging/lustre/lustre/include/lustre_dlm_flags.h
index e7e0c21..a0f064d 100644
--- a/drivers/staging/lustre/lustre/include/lustre_dlm_flags.h
+++ b/drivers/staging/lustre/lustre/include/lustre_dlm_flags.h
@@ -28,21 +28,6 @@
 /** l_flags bits marked as "all_flags" bits */
 #define LDLM_FL_ALL_FLAGS_MASK          0x00FFFFFFC08F932FULL
 
-/** l_flags bits marked as "ast" bits */
-#define LDLM_FL_AST_MASK                0x0000000080008000ULL
-
-/** l_flags bits marked as "blocked" bits */
-#define LDLM_FL_BLOCKED_MASK            0x000000000000000EULL
-
-/** l_flags bits marked as "gone" bits */
-#define LDLM_FL_GONE_MASK               0x0006004000000000ULL
-
-/** l_flags bits marked as "inherit" bits */
-#define LDLM_FL_INHERIT_MASK            0x0000000000800000ULL
-
-/** l_flags bits marked as "off_wire" bits */
-#define LDLM_FL_OFF_WIRE_MASK           0x00FFFFFF00000000ULL
-
 /** extent, mode, or resource changed */
 #define LDLM_FL_LOCK_CHANGED            0x0000000000000001ULL /* bit 0 */
 #define ldlm_is_lock_changed(_l)        LDLM_TEST_FLAG((_l), 1ULL <<  0)
@@ -372,6 +357,27 @@
 #define ldlm_set_excl(_l)               LDLM_SET_FLAG((_l), 1ULL << 55)
 #define ldlm_clear_excl(_l)             LDLM_CLEAR_FLAG((_l), 1ULL << 55)
 
+/** l_flags bits marked as "ast" bits */
+#define LDLM_FL_AST_MASK		(LDLM_FL_FLOCK_DEADLOCK		|\
+					 LDLM_FL_AST_DISCARD_DATA)
+
+/** l_flags bits marked as "blocked" bits */
+#define LDLM_FL_BLOCKED_MASK		(LDLM_FL_BLOCK_GRANTED		|\
+					 LDLM_FL_BLOCK_CONV		|\
+					 LDLM_FL_BLOCK_WAIT)
+
+/** l_flags bits marked as "gone" bits */
+#define LDLM_FL_GONE_MASK		(LDLM_FL_DESTROYED		|\
+					 LDLM_FL_FAILED)
+
+/** l_flags bits marked as "inherit" bits */
+/* Flags inherited from wire on enqueue/reply between client/server. */
+/* NO_TIMEOUT flag to force ldlm_lock_match() to wait with no timeout. */
+/* TEST_LOCK flag to not let TEST lock to be granted. */
+#define LDLM_FL_INHERIT_MASK		(LDLM_FL_CANCEL_ON_BLOCK	|\
+					 LDLM_FL_NO_TIMEOUT		|\
+					 LDLM_FL_TEST_LOCK)
+
 /** test for ldlm_lock flag bit set */
 #define LDLM_TEST_FLAG(_l, _b)        (((_l)->l_flags & (_b)) != 0)
 
diff --git a/drivers/staging/lustre/lustre/include/obd_support.h b/drivers/staging/lustre/lustre/include/obd_support.h
index 71bf844..26fdff6 100644
--- a/drivers/staging/lustre/lustre/include/obd_support.h
+++ b/drivers/staging/lustre/lustre/include/obd_support.h
@@ -318,6 +318,10 @@ extern char obd_jobid_var[];
 #define OBD_FAIL_LDLM_AGL_NOLOCK	 0x31b
 #define OBD_FAIL_LDLM_OST_LVB		 0x31c
 #define OBD_FAIL_LDLM_ENQUEUE_HANG	 0x31d
+#define OBD_FAIL_LDLM_CP_CB_WAIT2	 0x320
+#define OBD_FAIL_LDLM_CP_CB_WAIT3	 0x321
+#define OBD_FAIL_LDLM_CP_CB_WAIT4	 0x322
+#define OBD_FAIL_LDLM_CP_CB_WAIT5	 0x323
 
 /* LOCKLESS IO */
 #define OBD_FAIL_LDLM_SET_CONTENTION     0x385
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
index d6b61bc..65e8e14 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
@@ -97,7 +97,7 @@ ldlm_flock_destroy(struct ldlm_lock *lock, enum ldlm_mode mode, __u64 flags)
 	LASSERT(hlist_unhashed(&lock->l_exp_flock_hash));
 
 	list_del_init(&lock->l_res_link);
-	if (flags == LDLM_FL_WAIT_NOREPROC && !ldlm_is_failed(lock)) {
+	if (flags == LDLM_FL_WAIT_NOREPROC) {
 		/* client side - set a flag to prevent sending a CANCEL */
 		lock->l_flags |= LDLM_FL_LOCAL_ONLY | LDLM_FL_CBPENDING;
 
@@ -455,27 +455,21 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 flags, void *data)
 	enum ldlm_error		    err;
 	int			     rc = 0;
 
+	OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CP_CB_WAIT2, 4);
+	if (OBD_FAIL_PRECHECK(OBD_FAIL_LDLM_CP_CB_WAIT3)) {
+		lock_res_and_lock(lock);
+		lock->l_flags |= LDLM_FL_FAIL_LOC;
+		unlock_res_and_lock(lock);
+		OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CP_CB_WAIT3, 4);
+	}
 	CDEBUG(D_DLMTRACE, "flags: 0x%llx data: %p getlk: %p\n",
 	       flags, data, getlk);
 
-	/* Import invalidation. We need to actually release the lock
-	 * references being held, so that it can go away. No point in
-	 * holding the lock even if app still believes it has it, since
-	 * server already dropped it anyway. Only for granted locks too.
-	 */
-	if ((lock->l_flags & (LDLM_FL_FAILED|LDLM_FL_LOCAL_ONLY)) ==
-	    (LDLM_FL_FAILED|LDLM_FL_LOCAL_ONLY)) {
-		if (lock->l_req_mode == lock->l_granted_mode &&
-		    lock->l_granted_mode != LCK_NL && !data)
-			ldlm_lock_decref_internal(lock, lock->l_req_mode);
-
-		/* Need to wake up the waiter if we were evicted */
-		wake_up(&lock->l_waitq);
-		return 0;
-	}
-
 	LASSERT(flags != LDLM_FL_WAIT_NOREPROC);
 
+	if (flags & LDLM_FL_FAILED)
+		goto granted;
+
 	if (!(flags & (LDLM_FL_BLOCK_WAIT | LDLM_FL_BLOCK_GRANTED |
 		       LDLM_FL_BLOCK_CONV))) {
 		if (!data)
@@ -514,12 +508,21 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 flags, void *data)
 granted:
 	OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CP_CB_WAIT, 10);
 
-	if (ldlm_is_failed(lock)) {
-		LDLM_DEBUG(lock, "client-side enqueue waking up: failed");
-		return -EIO;
+	if (OBD_FAIL_PRECHECK(OBD_FAIL_LDLM_CP_CB_WAIT4)) {
+		lock_res_and_lock(lock);
+		/* DEADLOCK is always set with CBPENDING */
+		lock->l_flags |= LDLM_FL_FLOCK_DEADLOCK | LDLM_FL_CBPENDING;
+		unlock_res_and_lock(lock);
+		OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CP_CB_WAIT4, 4);
+	}
+	if (OBD_FAIL_PRECHECK(OBD_FAIL_LDLM_CP_CB_WAIT5)) {
+		lock_res_and_lock(lock);
+		/* DEADLOCK is always set with CBPENDING */
+		lock->l_flags |= LDLM_FL_FAIL_LOC |
+				 LDLM_FL_FLOCK_DEADLOCK | LDLM_FL_CBPENDING;
+		unlock_res_and_lock(lock);
+		OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CP_CB_WAIT5, 4);
 	}
-
-	LDLM_DEBUG(lock, "client-side enqueue granted");
 
 	lock_res_and_lock(lock);
 
@@ -530,20 +533,59 @@ granted:
 	if (ldlm_is_destroyed(lock)) {
 		unlock_res_and_lock(lock);
 		LDLM_DEBUG(lock, "client-side enqueue waking up: destroyed");
-		return 0;
+		/*
+		 * An error is still to be returned, to propagate it up to
+		 * ldlm_cli_enqueue_fini() caller.
+		 */
+		return -EIO;
 	}
 
 	/* ldlm_lock_enqueue() has already placed lock on the granted list. */
-	list_del_init(&lock->l_res_link);
+	ldlm_resource_unlink_lock(lock);
+
+	/*
+	 * Import invalidation. We need to actually release the lock
+	 * references being held, so that it can go away. No point in
+	 * holding the lock even if app still believes it has it, since
+	 * server already dropped it anyway. Only for granted locks too.
+	 */
+	/* Do the same for DEADLOCK'ed locks. */
+	if (ldlm_is_failed(lock) || ldlm_is_flock_deadlock(lock)) {
+		int mode;
+
+		if (flags & LDLM_FL_TEST_LOCK)
+			LASSERT(ldlm_is_test_lock(lock));
+
+		if (ldlm_is_test_lock(lock) || ldlm_is_flock_deadlock(lock))
+			mode = getlk->fl_type;
+		else
+			mode = lock->l_granted_mode;
+
+		if (ldlm_is_flock_deadlock(lock)) {
+			LDLM_DEBUG(lock, "client-side enqueue deadlock received");
+			rc = -EDEADLK;
+		}
+		ldlm_flock_destroy(lock, mode, LDLM_FL_WAIT_NOREPROC);
+		unlock_res_and_lock(lock);
+
+		/* Need to wake up the waiter if we were evicted */
+		wake_up(&lock->l_waitq);
+
+		/*
+		 * An error is still to be returned, to propagate it up to
+		 * ldlm_cli_enqueue_fini() caller.
+		 */
+		return rc ? : -EIO;
+	}
+
+	LDLM_DEBUG(lock, "client-side enqueue granted");
 
-	if (ldlm_is_flock_deadlock(lock)) {
-		LDLM_DEBUG(lock, "client-side enqueue deadlock received");
-		rc = -EDEADLK;
-	} else if (flags & LDLM_FL_TEST_LOCK) {
+	if (flags & LDLM_FL_TEST_LOCK) {
 		/* fcntl(F_GETLK) request */
 		/* The old mode was saved in getlk->fl_type so that if the mode
 		 * in the lock changes we can decref the appropriate refcount.
 		 */
+		LASSERT(ldlm_is_test_lock(lock));
 		ldlm_flock_destroy(lock, getlk->fl_type, LDLM_FL_WAIT_NOREPROC);
 		switch (lock->l_granted_mode) {
 		case LCK_PR:
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
index a5993f7..1a0fce1 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
@@ -1028,15 +1028,28 @@ void ldlm_grant_lock(struct ldlm_lock *lock, struct list_head *work_list)
 	check_res_locked(res);
 
 	lock->l_granted_mode = lock->l_req_mode;
+
+	if (work_list && lock->l_completion_ast)
+		ldlm_add_ast_work_item(lock, NULL, work_list);
+
 	if (res->lr_type == LDLM_PLAIN || res->lr_type == LDLM_IBITS)
 		ldlm_grant_lock_with_skiplist(lock);
 	else if (res->lr_type == LDLM_EXTENT)
 		ldlm_extent_add_lock(res, lock);
-	else
+	else if (res->lr_type == LDLM_FLOCK) {
+		/*
+		 * We should not add locks to granted list in the following cases:
+		 * - this is an UNLOCK but not a real lock;
+		 * - this is a TEST lock;
+		 * - this is a F_CANCELLK lock (async flock has req_mode == 0)
+		 * - this is a deadlock (flock cannot be granted)
+		 */
+		if (!lock->l_req_mode || lock->l_req_mode == LCK_NL ||
+		    ldlm_is_test_lock(lock) || ldlm_is_flock_deadlock(lock))
+			return;
 		ldlm_resource_add_lock(res, &res->lr_granted, lock);
-
-	if (work_list && lock->l_completion_ast)
-		ldlm_add_ast_work_item(lock, NULL, work_list);
+	} else
+		LBUG();
 
 	ldlm_pool_add(&ldlm_res_to_ns(res)->ns_pool, lock);
 }
@@ -1546,6 +1559,8 @@ enum ldlm_error ldlm_lock_enqueue(struct ldlm_namespace *ns,
 	 */
 	if (*flags & LDLM_FL_AST_DISCARD_DATA)
 		ldlm_set_ast_discard_data(lock);
+	if (*flags & LDLM_FL_TEST_LOCK)
+		ldlm_set_test_lock(lock);
 
 	/*
 	 * This distinction between local lock trees is very important; a client
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
index af487f9..984a460 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
@@ -309,8 +309,6 @@ static void failed_lock_cleanup(struct ldlm_namespace *ns,
 	else
 		LDLM_DEBUG(lock, "lock was granted or failed in race");
 
-	ldlm_lock_decref_internal(lock, mode);
-
 	/* XXX - HACK because we shouldn't call ldlm_lock_destroy()
 	 *       from llite/file.c/ll_file_flock().
 	 */
@@ -321,9 +319,14 @@ static void failed_lock_cleanup(struct ldlm_namespace *ns,
 	 */
 	if (lock->l_resource->lr_type == LDLM_FLOCK) {
 		lock_res_and_lock(lock);
-		ldlm_resource_unlink_lock(lock);
-		ldlm_lock_destroy_nolock(lock);
+		if (!ldlm_is_destroyed(lock)) {
+			ldlm_resource_unlink_lock(lock);
+			ldlm_lock_decref_internal_nolock(lock, mode);
+			ldlm_lock_destroy_nolock(lock);
+		}
 		unlock_res_and_lock(lock);
+	} else {
+		ldlm_lock_decref_internal(lock, mode);
 	}
 }
 
@@ -418,11 +421,6 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 	*flags = ldlm_flags_from_wire(reply->lock_flags);
 	lock->l_flags |= ldlm_flags_from_wire(reply->lock_flags &
 					      LDLM_FL_INHERIT_MASK);
-	/* move NO_TIMEOUT flag to the lock to force ldlm_lock_match()
-	 * to wait with no timeout as well
-	 */
-	lock->l_flags |= ldlm_flags_from_wire(reply->lock_flags &
-					      LDLM_FL_NO_TIMEOUT);
 	unlock_res_and_lock(lock);
 
 	CDEBUG(D_INFO, "local: %p, remote cookie: %#llx, flags: 0x%llx\n",
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
index 51a28d9..5866b00 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
@@ -793,8 +793,14 @@ static void cleanup_resource(struct ldlm_resource *res, struct list_head *q,
 			 */
 			unlock_res(res);
 			LDLM_DEBUG(lock, "setting FL_LOCAL_ONLY");
+			if (lock->l_flags & LDLM_FL_FAIL_LOC) {
+				set_current_state(TASK_UNINTERRUPTIBLE);
+				schedule_timeout(cfs_time_seconds(4));
+				set_current_state(TASK_RUNNING);
+			}
 			if (lock->l_completion_ast)
-				lock->l_completion_ast(lock, 0, NULL);
+				lock->l_completion_ast(lock, LDLM_FL_FAILED,
+						       NULL);
 			LDLM_LOCK_RELEASE(lock);
 			continue;
 		}
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 769b028..89e93dc 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -2717,6 +2717,7 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 	struct md_op_data *op_data;
 	struct lustre_handle lockh = {0};
 	ldlm_policy_data_t flock = { {0} };
+	int fl_type = file_lock->fl_type;
 	__u64 flags = 0;
 	int rc;
 	int rc2 = 0;
@@ -2747,7 +2748,7 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 	if (file_lock->fl_lmops && file_lock->fl_lmops->lm_compare_owner)
 		flock.l_flock.owner = (unsigned long)file_lock->fl_pid;
 
-	switch (file_lock->fl_type) {
+	switch (fl_type) {
 	case F_RDLCK:
 		einfo.ei_mode = LCK_PR;
 		break;
@@ -2767,8 +2768,7 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 		einfo.ei_mode = LCK_PW;
 		break;
 	default:
-		CDEBUG(D_INFO, "Unknown fcntl lock type: %d\n",
-		       file_lock->fl_type);
+		CDEBUG(D_INFO, "Unknown fcntl lock type: %d\n", fl_type);
 		return -ENOTSUPP;
 	}
 
@@ -2790,16 +2790,18 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 	case F_GETLK64:
 #endif
 		flags = LDLM_FL_TEST_LOCK;
-		/* Save the old mode so that if the mode in the lock changes we
-		 * can decrement the appropriate reader or writer refcount.
-		 */
-		file_lock->fl_type = einfo.ei_mode;
 		break;
 	default:
 		CERROR("unknown fcntl lock command: %d\n", cmd);
 		return -EINVAL;
 	}
 
+	/*
+	 * Save the old mode so that if the mode in the lock changes we
+	 * can decrement the appropriate reader or writer refcount.
+	 */
+	file_lock->fl_type = einfo.ei_mode;
+
 	op_data = ll_prep_md_op_data(NULL, inode, NULL, NULL, 0, 0,
 				     LUSTRE_OPC_ANY, NULL);
 	if (IS_ERR(op_data))
@@ -2812,6 +2814,10 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 	rc = md_enqueue(sbi->ll_md_exp, &einfo, NULL,
 			op_data, &lockh, &flock, 0, NULL /* req */, flags);
 
+	/* Restore the file lock type if not TEST lock. */
+	if (!(flags & LDLM_FL_TEST_LOCK))
+		file_lock->fl_type = fl_type;
+
 	if ((rc == 0 || file_lock->fl_type == F_UNLCK) &&
 	    !(flags & LDLM_FL_TEST_LOCK))
 		rc2  = locks_lock_file_wait(file, file_lock);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread
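
The two decisions PATCH 37/80 adds for flocks can be summarized in a small
stand-alone model: whether ldlm_grant_lock() puts the flock on the granted
list, and which error the completion path returns for a failed or
deadlocked flock. This is a sketch under stated assumptions, not the
kernel code: struct model_flock, enum model_mode and both functions are
hypothetical, the mode values are placeholders rather than the real LCK_*
constants, and all locking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder modes; the real LCK_* values are defined elsewhere. */
enum model_mode {
	MODEL_MODE_NONE = 0,	/* models req_mode == 0 (F_CANCELLK) */
	MODEL_LCK_PR,
	MODEL_LCK_PW,
	MODEL_LCK_NL,		/* models an UNLOCK request */
};

struct model_flock {
	enum model_mode req_mode;
	bool test_lock;		/* models ldlm_is_test_lock() */
	bool deadlock;		/* models ldlm_is_flock_deadlock() */
	bool failed;		/* models ldlm_is_failed() */
};

/* Mirrors the new LDLM_FLOCK branch: only real locks are listed. */
static bool model_goes_on_granted_list(const struct model_flock *l)
{
	if (l->req_mode == MODEL_MODE_NONE || l->req_mode == MODEL_LCK_NL)
		return false;
	if (l->test_lock || l->deadlock)
		return false;
	return true;
}

/*
 * Mirrors the error chosen for failed or deadlocked flocks:
 * -EDEADLK for a deadlock, -EIO for any other failure, 0 otherwise.
 */
static int model_completion_error(const struct model_flock *l)
{
	if (l->deadlock)
		return -EDEADLK;
	if (l->failed)
		return -EIO;
	return 0;
}
```

In the patch the second decision is written as "return rc ? : -EIO;", which
relies on the GNU C conditional with an omitted middle operand; the model
spells the same choice out with plain if statements.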

* [lustre-devel] [PATCH 37/80] staging: lustre: ldlm: flock completion fixes.
@ 2016-08-16 20:18   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Vitaly Fertman, Andriy Skulysh, James Simmons

From: Vitaly Fertman <vitaly_fertman@xyratex.com>

Move the checks for the FAILED and DESTROYED flags under the ldlm
spinlock, and destroy the flock atomically with the check that it has
not been destroyed yet. Do not put a granted flock into the resource
if it is an UNLOCK, TEST, or DEADLOCK'ed flock.

A regression from this patch was later reported under LU-7626:
"refcount nonzero (1) after lock cleanup" errors were seen. The cause
was that the LCK_NL case was not handled for obdecho. Patch 17791
resolved that issue and has been folded into this upstream patch.

Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Signed-off-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2177
Reviewed-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Reviewed-by: Andriy Skulysh <andriy_skulysh@xyratex.com>
Reviewed-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Xyratex-bug-id: MRP-1588
Reviewed-on: http://review.whamcloud.com/10005
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7626
Reviewed-by: Mirza Arshad Mirza Hussain <arshad.hussain@seagate.com>
Reviewed-by: Alexey Leonidovich Lyashkov <alexey.lyashkov@seagate.com>
Reviewed-on: http://review.whamcloud.com/17791
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
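Note on the lustre_dlm_flags.h hunk: the group masks change from hex
literals to ORs of the named flags, and LDLM_FL_INHERIT_MASK grows from
a single bit to three flags. The following is only a sketch of that
style, not code from the patch. The DEMO_* names and the helper are
hypothetical, and the bit positions are assumptions that should be
checked against lustre_dlm_flags.h. The static assertions check that the
composed masks match the literals being removed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define DEMO_FL_BLOCK_GRANTED		(1ULL << 1)
#define DEMO_FL_BLOCK_CONV		(1ULL << 2)
#define DEMO_FL_BLOCK_WAIT		(1ULL << 3)
#define DEMO_FL_FLOCK_DEADLOCK		(1ULL << 15)
#define DEMO_FL_AST_DISCARD_DATA	(1ULL << 31)

/* Masks written as ORs of named flags instead of hex literals. */
#define DEMO_FL_AST_MASK	(DEMO_FL_FLOCK_DEADLOCK		|\
				 DEMO_FL_AST_DISCARD_DATA)
#define DEMO_FL_BLOCKED_MASK	(DEMO_FL_BLOCK_GRANTED		|\
				 DEMO_FL_BLOCK_CONV		|\
				 DEMO_FL_BLOCK_WAIT)

/* The composed masks must equal the literals removed by the hunk. */
static_assert(DEMO_FL_AST_MASK == 0x0000000080008000ULL,
	      "AST mask differs from the removed literal");
static_assert(DEMO_FL_BLOCKED_MASK == 0x000000000000000EULL,
	      "BLOCKED mask differs from the removed literal");

/* Hypothetical helper: keep only the flags selected by a mask. */
static inline uint64_t demo_filter_flags(uint64_t flags, uint64_t mask)
{
	return flags & mask;
}
```

Writing a mask this way keeps it in step with the flag definitions, and
a flag added to the mask is visible by name in the diff.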
 .../lustre/include/linux/libcfs/libcfs_fail.h      |    3 +
 drivers/staging/lustre/lnet/libcfs/fail.c          |    6 +-
 .../lustre/lustre/include/lustre_dlm_flags.h       |   36 ++++---
 .../staging/lustre/lustre/include/obd_support.h    |    4 +
 drivers/staging/lustre/lustre/ldlm/ldlm_flock.c    |   98 ++++++++++++++------
 drivers/staging/lustre/lustre/ldlm/ldlm_lock.c     |   23 ++++-
 drivers/staging/lustre/lustre/ldlm/ldlm_request.c  |   16 ++--
 drivers/staging/lustre/lustre/ldlm/ldlm_resource.c |    8 ++-
 drivers/staging/lustre/lustre/llite/file.c         |   20 +++--
 9 files changed, 148 insertions(+), 66 deletions(-)
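Note on "destroy flock atomically with the check it is not destroyed
yet": a minimal userspace sketch of the pattern, assuming a pthread
mutex in place of the ldlm resource lock. struct demo_lock and
demo_destroy_once() are hypothetical names and do not exist in Lustre.
The flag test and the teardown happen under the same lock, so only one
of several racing callers drops the reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock object, not struct ldlm_lock. */
struct demo_lock {
	pthread_mutex_t	mtx;
	bool		destroyed;
	int		refcount;
};

/*
 * Check the destroyed flag and tear the lock down under one mutex.
 * Returns true if this call did the destruction, false if another
 * caller had already done it.
 */
static bool demo_destroy_once(struct demo_lock *lk)
{
	bool did_destroy = false;

	pthread_mutex_lock(&lk->mtx);
	if (!lk->destroyed) {
		/* A live lock must still hold a reference to drop. */
		assert(lk->refcount > 0);
		lk->destroyed = true;
		lk->refcount--;
		did_destroy = true;
	}
	pthread_mutex_unlock(&lk->mtx);

	return did_destroy;
}
```

Called twice on a lock that starts with refcount 1, the first call
returns true and leaves refcount at 0, and the second returns false and
changes nothing. If the check is done outside the lock, both callers
can see the lock as live and the reference is dropped twice.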

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_fail.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_fail.h
index d3f9a60..bdbbe93 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_fail.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_fail.h
@@ -143,6 +143,9 @@ static inline int cfs_fail_timeout_set(__u32 id, __u32 value, int ms, int set)
 #define CFS_FAIL_TIMEOUT_ORSET(id, value, secs) \
 	cfs_fail_timeout_set(id, value, secs * 1000, CFS_FAIL_LOC_ORSET)
 
+#define CFS_FAIL_TIMEOUT_RESET(id, value, secs) \
+	cfs_fail_timeout_set(id, value, secs * 1000, CFS_FAIL_LOC_RESET)
+
 #define CFS_FAIL_TIMEOUT_MS_ORSET(id, value, ms) \
 	cfs_fail_timeout_set(id, value, ms, CFS_FAIL_LOC_ORSET)
 
diff --git a/drivers/staging/lustre/lnet/libcfs/fail.c b/drivers/staging/lustre/lnet/libcfs/fail.c
index 9288ee0..e4b1a0a 100644
--- a/drivers/staging/lustre/lnet/libcfs/fail.c
+++ b/drivers/staging/lustre/lnet/libcfs/fail.c
@@ -90,8 +90,10 @@ int __cfs_fail_check_set(__u32 id, __u32 value, int set)
 		}
 	}
 
-	if ((set == CFS_FAIL_LOC_ORSET || set == CFS_FAIL_LOC_RESET) &&
-	    (value & CFS_FAIL_ONCE))
+	/* Take into account the current call for FAIL_ONCE for ORSET only,
+	 * as RESET is a new fail_loc, it does not change the current call
+	 */
+	if ((set == CFS_FAIL_LOC_ORSET) && (value & CFS_FAIL_ONCE))
 		set_bit(CFS_FAIL_ONCE_BIT, &cfs_fail_loc);
 	/* Lost race to set CFS_FAILED_BIT. */
 	if (test_and_set_bit(CFS_FAILED_BIT, &cfs_fail_loc)) {
diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm_flags.h b/drivers/staging/lustre/lustre/include/lustre_dlm_flags.h
index e7e0c21..a0f064d 100644
--- a/drivers/staging/lustre/lustre/include/lustre_dlm_flags.h
+++ b/drivers/staging/lustre/lustre/include/lustre_dlm_flags.h
@@ -28,21 +28,6 @@
 /** l_flags bits marked as "all_flags" bits */
 #define LDLM_FL_ALL_FLAGS_MASK          0x00FFFFFFC08F932FULL
 
-/** l_flags bits marked as "ast" bits */
-#define LDLM_FL_AST_MASK                0x0000000080008000ULL
-
-/** l_flags bits marked as "blocked" bits */
-#define LDLM_FL_BLOCKED_MASK            0x000000000000000EULL
-
-/** l_flags bits marked as "gone" bits */
-#define LDLM_FL_GONE_MASK               0x0006004000000000ULL
-
-/** l_flags bits marked as "inherit" bits */
-#define LDLM_FL_INHERIT_MASK            0x0000000000800000ULL
-
-/** l_flags bits marked as "off_wire" bits */
-#define LDLM_FL_OFF_WIRE_MASK           0x00FFFFFF00000000ULL
-
 /** extent, mode, or resource changed */
 #define LDLM_FL_LOCK_CHANGED            0x0000000000000001ULL /* bit 0 */
 #define ldlm_is_lock_changed(_l)        LDLM_TEST_FLAG((_l), 1ULL <<  0)
@@ -372,6 +357,27 @@
 #define ldlm_set_excl(_l)               LDLM_SET_FLAG((_l), 1ULL << 55)
 #define ldlm_clear_excl(_l)             LDLM_CLEAR_FLAG((_l), 1ULL << 55)
 
+/** l_flags bits marked as "ast" bits */
+#define LDLM_FL_AST_MASK		(LDLM_FL_FLOCK_DEADLOCK		|\
+					 LDLM_FL_AST_DISCARD_DATA)
+
+/** l_flags bits marked as "blocked" bits */
+#define LDLM_FL_BLOCKED_MASK		(LDLM_FL_BLOCK_GRANTED		|\
+					 LDLM_FL_BLOCK_CONV		|\
+					 LDLM_FL_BLOCK_WAIT)
+
+/** l_flags bits marked as "gone" bits */
+#define LDLM_FL_GONE_MASK		(LDLM_FL_DESTROYED		|\
+					 LDLM_FL_FAILED)
+
+/** l_flags bits marked as "inherit" bits */
+/* Flags inherited from wire on enqueue/reply between client/server. */
+/* NO_TIMEOUT flag to force ldlm_lock_match() to wait with no timeout. */
+/* TEST_LOCK flag to not let TEST lock to be granted. */
+#define LDLM_FL_INHERIT_MASK		(LDLM_FL_CANCEL_ON_BLOCK	|\
+					 LDLM_FL_NO_TIMEOUT		|\
+					 LDLM_FL_TEST_LOCK)
+
 /** test for ldlm_lock flag bit set */
 #define LDLM_TEST_FLAG(_l, _b)        (((_l)->l_flags & (_b)) != 0)
 
diff --git a/drivers/staging/lustre/lustre/include/obd_support.h b/drivers/staging/lustre/lustre/include/obd_support.h
index 71bf844..26fdff6 100644
--- a/drivers/staging/lustre/lustre/include/obd_support.h
+++ b/drivers/staging/lustre/lustre/include/obd_support.h
@@ -318,6 +318,10 @@ extern char obd_jobid_var[];
 #define OBD_FAIL_LDLM_AGL_NOLOCK	 0x31b
 #define OBD_FAIL_LDLM_OST_LVB		 0x31c
 #define OBD_FAIL_LDLM_ENQUEUE_HANG	 0x31d
+#define OBD_FAIL_LDLM_CP_CB_WAIT2	 0x320
+#define OBD_FAIL_LDLM_CP_CB_WAIT3	 0x321
+#define OBD_FAIL_LDLM_CP_CB_WAIT4	 0x322
+#define OBD_FAIL_LDLM_CP_CB_WAIT5	 0x323
 
 /* LOCKLESS IO */
 #define OBD_FAIL_LDLM_SET_CONTENTION     0x385
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
index d6b61bc..65e8e14 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
@@ -97,7 +97,7 @@ ldlm_flock_destroy(struct ldlm_lock *lock, enum ldlm_mode mode, __u64 flags)
 	LASSERT(hlist_unhashed(&lock->l_exp_flock_hash));
 
 	list_del_init(&lock->l_res_link);
-	if (flags == LDLM_FL_WAIT_NOREPROC && !ldlm_is_failed(lock)) {
+	if (flags == LDLM_FL_WAIT_NOREPROC) {
 		/* client side - set a flag to prevent sending a CANCEL */
 		lock->l_flags |= LDLM_FL_LOCAL_ONLY | LDLM_FL_CBPENDING;
 
@@ -455,27 +455,21 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 flags, void *data)
 	enum ldlm_error		    err;
 	int			     rc = 0;
 
+	OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CP_CB_WAIT2, 4);
+	if (OBD_FAIL_PRECHECK(OBD_FAIL_LDLM_CP_CB_WAIT3)) {
+		lock_res_and_lock(lock);
+		lock->l_flags |= LDLM_FL_FAIL_LOC;
+		unlock_res_and_lock(lock);
+		OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CP_CB_WAIT3, 4);
+	}
 	CDEBUG(D_DLMTRACE, "flags: 0x%llx data: %p getlk: %p\n",
 	       flags, data, getlk);
 
-	/* Import invalidation. We need to actually release the lock
-	 * references being held, so that it can go away. No point in
-	 * holding the lock even if app still believes it has it, since
-	 * server already dropped it anyway. Only for granted locks too.
-	 */
-	if ((lock->l_flags & (LDLM_FL_FAILED|LDLM_FL_LOCAL_ONLY)) ==
-	    (LDLM_FL_FAILED|LDLM_FL_LOCAL_ONLY)) {
-		if (lock->l_req_mode == lock->l_granted_mode &&
-		    lock->l_granted_mode != LCK_NL && !data)
-			ldlm_lock_decref_internal(lock, lock->l_req_mode);
-
-		/* Need to wake up the waiter if we were evicted */
-		wake_up(&lock->l_waitq);
-		return 0;
-	}
-
 	LASSERT(flags != LDLM_FL_WAIT_NOREPROC);
 
+	if (flags & LDLM_FL_FAILED)
+		goto granted;
+
 	if (!(flags & (LDLM_FL_BLOCK_WAIT | LDLM_FL_BLOCK_GRANTED |
 		       LDLM_FL_BLOCK_CONV))) {
 		if (!data)
@@ -514,12 +508,21 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 flags, void *data)
 granted:
 	OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CP_CB_WAIT, 10);
 
-	if (ldlm_is_failed(lock)) {
-		LDLM_DEBUG(lock, "client-side enqueue waking up: failed");
-		return -EIO;
+	if (OBD_FAIL_PRECHECK(OBD_FAIL_LDLM_CP_CB_WAIT4)) {
+		lock_res_and_lock(lock);
+		/* DEADLOCK is always set with CBPENDING */
+		lock->l_flags |= LDLM_FL_FLOCK_DEADLOCK | LDLM_FL_CBPENDING;
+		unlock_res_and_lock(lock);
+		OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CP_CB_WAIT4, 4);
+	}
+	if (OBD_FAIL_PRECHECK(OBD_FAIL_LDLM_CP_CB_WAIT5)) {
+		lock_res_and_lock(lock);
+		/* DEADLOCK is always set with CBPENDING */
+		lock->l_flags |= LDLM_FL_FAIL_LOC |
+				 LDLM_FL_FLOCK_DEADLOCK | LDLM_FL_CBPENDING;
+		unlock_res_and_lock(lock);
+		OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CP_CB_WAIT5, 4);
 	}
-
-	LDLM_DEBUG(lock, "client-side enqueue granted");
 
 	lock_res_and_lock(lock);
 
@@ -530,20 +533,59 @@ granted:
 	if (ldlm_is_destroyed(lock)) {
 		unlock_res_and_lock(lock);
 		LDLM_DEBUG(lock, "client-side enqueue waking up: destroyed");
-		return 0;
+		/*
+		 * An error is still to be returned, to propagate it up to
+		 * ldlm_cli_enqueue_fini() caller.
+		 */
+		return -EIO;
 	}
 
 	/* ldlm_lock_enqueue() has already placed lock on the granted list. */
-	list_del_init(&lock->l_res_link);
+	ldlm_resource_unlink_lock(lock);
+
+	/*
+	 * Import invalidation. We need to actually release the lock
+	 * references being held, so that it can go away. No point in
+	 * holding the lock even if app still believes it has it, since
+	 * server already dropped it anyway. Only for granted locks too.
+	 */
+	/* Do the same for DEADLOCK'ed locks. */
+	if (ldlm_is_failed(lock) || ldlm_is_flock_deadlock(lock)) {
+		int mode;
+
+		if (flags & LDLM_FL_TEST_LOCK)
+			LASSERT(ldlm_is_test_lock(lock));
+
+		if (ldlm_is_test_lock(lock) || ldlm_is_flock_deadlock(lock))
+			mode = getlk->fl_type;
+		else
+			mode = lock->l_granted_mode;
+
+		if (ldlm_is_flock_deadlock(lock)) {
+			LDLM_DEBUG(lock, "client-side enqueue deadlock received");
+			rc = -EDEADLK;
+		}
+		ldlm_flock_destroy(lock, mode, LDLM_FL_WAIT_NOREPROC);
+		unlock_res_and_lock(lock);
+
+		/* Need to wake up the waiter if we were evicted */
+		wake_up(&lock->l_waitq);
+
+		/*
+		 * An error is still to be returned, to propagate it up to
+		 * ldlm_cli_enqueue_fini() caller.
+		 */
+		return rc ? : -EIO;
+	}
+
+	LDLM_DEBUG(lock, "client-side enqueue granted");
 
-	if (ldlm_is_flock_deadlock(lock)) {
-		LDLM_DEBUG(lock, "client-side enqueue deadlock received");
-		rc = -EDEADLK;
-	} else if (flags & LDLM_FL_TEST_LOCK) {
+	if (flags & LDLM_FL_TEST_LOCK) {
 		/* fcntl(F_GETLK) request */
 		/* The old mode was saved in getlk->fl_type so that if the mode
 		 * in the lock changes we can decref the appropriate refcount.
 		 */
+		LASSERT(ldlm_is_test_lock(lock));
 		ldlm_flock_destroy(lock, getlk->fl_type, LDLM_FL_WAIT_NOREPROC);
 		switch (lock->l_granted_mode) {
 		case LCK_PR:
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
index a5993f7..1a0fce1 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
@@ -1028,15 +1028,28 @@ void ldlm_grant_lock(struct ldlm_lock *lock, struct list_head *work_list)
 	check_res_locked(res);
 
 	lock->l_granted_mode = lock->l_req_mode;
+
+	if (work_list && lock->l_completion_ast)
+		ldlm_add_ast_work_item(lock, NULL, work_list);
+
 	if (res->lr_type == LDLM_PLAIN || res->lr_type == LDLM_IBITS)
 		ldlm_grant_lock_with_skiplist(lock);
 	else if (res->lr_type == LDLM_EXTENT)
 		ldlm_extent_add_lock(res, lock);
-	else
+	else if (res->lr_type == LDLM_FLOCK) {
+		/*
+		 * We should not add locks to granted list in the following cases:
+		 * - this is an UNLOCK but not a real lock;
+		 * - this is a TEST lock;
+		 * - this is a F_CANCELLK lock (async flock has req_mode == 0)
+		 * - this is a deadlock (flock cannot be granted)
+		 */
+		if (!lock->l_req_mode || lock->l_req_mode == LCK_NL ||
+		    ldlm_is_test_lock(lock) || ldlm_is_flock_deadlock(lock))
+			return;
 		ldlm_resource_add_lock(res, &res->lr_granted, lock);
-
-	if (work_list && lock->l_completion_ast)
-		ldlm_add_ast_work_item(lock, NULL, work_list);
+	} else
+		LBUG();
 
 	ldlm_pool_add(&ldlm_res_to_ns(res)->ns_pool, lock);
 }
@@ -1546,6 +1559,8 @@ enum ldlm_error ldlm_lock_enqueue(struct ldlm_namespace *ns,
 	 */
 	if (*flags & LDLM_FL_AST_DISCARD_DATA)
 		ldlm_set_ast_discard_data(lock);
+	if (*flags & LDLM_FL_TEST_LOCK)
+		ldlm_set_test_lock(lock);
 
 	/*
 	 * This distinction between local lock trees is very important; a client
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
index af487f9..984a460 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
@@ -309,8 +309,6 @@ static void failed_lock_cleanup(struct ldlm_namespace *ns,
 	else
 		LDLM_DEBUG(lock, "lock was granted or failed in race");
 
-	ldlm_lock_decref_internal(lock, mode);
-
 	/* XXX - HACK because we shouldn't call ldlm_lock_destroy()
 	 *       from llite/file.c/ll_file_flock().
 	 */
@@ -321,9 +319,14 @@ static void failed_lock_cleanup(struct ldlm_namespace *ns,
 	 */
 	if (lock->l_resource->lr_type == LDLM_FLOCK) {
 		lock_res_and_lock(lock);
-		ldlm_resource_unlink_lock(lock);
-		ldlm_lock_destroy_nolock(lock);
+		if (!ldlm_is_destroyed(lock)) {
+			ldlm_resource_unlink_lock(lock);
+			ldlm_lock_decref_internal_nolock(lock, mode);
+			ldlm_lock_destroy_nolock(lock);
+		}
 		unlock_res_and_lock(lock);
+	} else {
+		ldlm_lock_decref_internal(lock, mode);
 	}
 }
 
@@ -418,11 +421,6 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 	*flags = ldlm_flags_from_wire(reply->lock_flags);
 	lock->l_flags |= ldlm_flags_from_wire(reply->lock_flags &
 					      LDLM_FL_INHERIT_MASK);
-	/* move NO_TIMEOUT flag to the lock to force ldlm_lock_match()
-	 * to wait with no timeout as well
-	 */
-	lock->l_flags |= ldlm_flags_from_wire(reply->lock_flags &
-					      LDLM_FL_NO_TIMEOUT);
 	unlock_res_and_lock(lock);
 
 	CDEBUG(D_INFO, "local: %p, remote cookie: %#llx, flags: 0x%llx\n",
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
index 51a28d9..5866b00 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
@@ -793,8 +793,14 @@ static void cleanup_resource(struct ldlm_resource *res, struct list_head *q,
 			 */
 			unlock_res(res);
 			LDLM_DEBUG(lock, "setting FL_LOCAL_ONLY");
+			if (lock->l_flags & LDLM_FL_FAIL_LOC) {
+				set_current_state(TASK_UNINTERRUPTIBLE);
+				schedule_timeout(cfs_time_seconds(4));
+				set_current_state(TASK_RUNNING);
+			}
 			if (lock->l_completion_ast)
-				lock->l_completion_ast(lock, 0, NULL);
+				lock->l_completion_ast(lock, LDLM_FL_FAILED,
+						       NULL);
 			LDLM_LOCK_RELEASE(lock);
 			continue;
 		}
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 769b028..89e93dc 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -2717,6 +2717,7 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 	struct md_op_data *op_data;
 	struct lustre_handle lockh = {0};
 	ldlm_policy_data_t flock = { {0} };
+	int fl_type = file_lock->fl_type;
 	__u64 flags = 0;
 	int rc;
 	int rc2 = 0;
@@ -2747,7 +2748,7 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 	if (file_lock->fl_lmops && file_lock->fl_lmops->lm_compare_owner)
 		flock.l_flock.owner = (unsigned long)file_lock->fl_pid;
 
-	switch (file_lock->fl_type) {
+	switch (fl_type) {
 	case F_RDLCK:
 		einfo.ei_mode = LCK_PR;
 		break;
@@ -2767,8 +2768,7 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 		einfo.ei_mode = LCK_PW;
 		break;
 	default:
-		CDEBUG(D_INFO, "Unknown fcntl lock type: %d\n",
-		       file_lock->fl_type);
+		CDEBUG(D_INFO, "Unknown fcntl lock type: %d\n", fl_type);
 		return -ENOTSUPP;
 	}
 
@@ -2790,16 +2790,18 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 	case F_GETLK64:
 #endif
 		flags = LDLM_FL_TEST_LOCK;
-		/* Save the old mode so that if the mode in the lock changes we
-		 * can decrement the appropriate reader or writer refcount.
-		 */
-		file_lock->fl_type = einfo.ei_mode;
 		break;
 	default:
 		CERROR("unknown fcntl lock command: %d\n", cmd);
 		return -EINVAL;
 	}
 
+	/*
+	 * Save the old mode so that if the mode in the lock changes we
+	 * can decrement the appropriate reader or writer refcount.
+	 */
+	file_lock->fl_type = einfo.ei_mode;
+
 	op_data = ll_prep_md_op_data(NULL, inode, NULL, NULL, 0, 0,
 				     LUSTRE_OPC_ANY, NULL);
 	if (IS_ERR(op_data))
@@ -2812,6 +2814,10 @@ ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock)
 	rc = md_enqueue(sbi->ll_md_exp, &einfo, NULL,
 			op_data, &lockh, &flock, 0, NULL /* req */, flags);
 
+	/* Restore the file lock type if not TEST lock. */
+	if (!(flags & LDLM_FL_TEST_LOCK))
+		file_lock->fl_type = fl_type;
+
 	if ((rc == 0 || file_lock->fl_type == F_UNLCK) &&
 	    !(flags & LDLM_FL_TEST_LOCK))
 		rc2  = locks_lock_file_wait(file, file_lock);
-- 
1.7.1

* [PATCH 38/80] staging: lustre: move ioctls to lustre_ioctl.h
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

Move ioctl definitions and related functions from lustre_dlm.h,
lustre_lib.h, and obd.h to lustre_ioctl.h. Replace the definitions of
retired ioctls with comments.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4961
Reviewed-on: http://review.whamcloud.com/10139
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Robert Read <robert.read@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_ioctl.h    |  412 ++++++++++++++++++++
 .../lustre/lustre/include/lustre/lustre_user.h     |   21 +-
 drivers/staging/lustre/lustre/include/lustre_dlm.h |   10 -
 drivers/staging/lustre/lustre/include/lustre_lib.h |  284 --------------
 drivers/staging/lustre/lustre/include/obd.h        |   10 -
 drivers/staging/lustre/lustre/llite/dir.c          |    9 +-
 drivers/staging/lustre/lustre/llite/file.c         |    1 +
 drivers/staging/lustre/lustre/llite/llite_lib.c    |    1 +
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |    1 +
 drivers/staging/lustre/lustre/lov/lov_obd.c        |    1 +
 drivers/staging/lustre/lustre/mdc/mdc_request.c    |    1 +
 drivers/staging/lustre/lustre/obdclass/class_obd.c |    8 +-
 .../lustre/lustre/obdclass/linux/linux-module.c    |    1 +
 .../staging/lustre/lustre/obdclass/obd_config.c    |    1 +
 .../staging/lustre/lustre/obdecho/echo_client.c    |    1 +
 drivers/staging/lustre/lustre/osc/osc_request.c    |    1 +
 16 files changed, 430 insertions(+), 333 deletions(-)
 create mode 100644 drivers/staging/lustre/lustre/include/lustre/lustre_ioctl.h
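Note on obd_ioctl_packlen() and obd_ioctl_is_invalid() in the new
header: a simplified userspace sketch of the length accounting, not the
code being moved. struct demo_ioctl_hdr and the demo_* helpers are
hypothetical, and rounding to a multiple of 8 bytes is an assumption
about what cfs_size_round() does. The sum is kept in 64 bits so that
four maximum-sized lengths cannot wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ALIGN	8u
#define DEMO_MAX_LEN	(1u << 30)

static_assert((DEMO_ALIGN & (DEMO_ALIGN - 1)) == 0,
	      "alignment must be a power of two");

/* Hypothetical, reduced header: total length plus four inline lengths. */
struct demo_ioctl_hdr {
	uint32_t len;
	uint32_t version;
	uint32_t inllen[4];
};

/* Round up to the next multiple of DEMO_ALIGN. */
static inline uint64_t demo_size_round(uint64_t val)
{
	return (val + (DEMO_ALIGN - 1)) & ~(uint64_t)(DEMO_ALIGN - 1);
}

/* Packed size: rounded header plus each rounded inline buffer. */
static inline uint64_t demo_packlen(const struct demo_ioctl_hdr *h)
{
	uint64_t len = demo_size_round(sizeof(*h));
	size_t i;

	for (i = 0; i < 4; i++)
		len += demo_size_round(h->inllen[i]);

	return len;
}

/* Returns 1 if a length is out of range or the buffers do not fit. */
static inline int demo_is_invalid(const struct demo_ioctl_hdr *h)
{
	size_t i;

	if (h->len > DEMO_MAX_LEN)
		return 1;
	for (i = 0; i < 4; i++)
		if (h->inllen[i] > DEMO_MAX_LEN)
			return 1;

	return demo_packlen(h) > h->len;
}
```

With a 24-byte header and inline lengths of 5, 0, 16 and 1, the packed
length is 24 + 8 + 0 + 16 + 8 = 56 bytes, so a declared length of 56
passes the check and 48 does not.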

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_ioctl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_ioctl.h
new file mode 100644
index 0000000..f3d7c94
--- /dev/null
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_ioctl.h
@@ -0,0 +1,412 @@
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * Copyright (c) 2011, 2015, Intel Corporation.
+ */
+#ifndef LUSTRE_IOCTL_H_
+#define LUSTRE_IOCTL_H_
+
+#include <linux/types.h>
+#include "../../../include/linux/libcfs/libcfs.h"
+#include "lustre_idl.h"
+
+#ifdef __KERNEL__
+# include <linux/ioctl.h>
+# include <linux/string.h>
+# include "../obd_support.h"
+#else /* __KERNEL__ */
+# include <malloc.h>
+# include <string.h>
+#include <libcfs/util/ioctl.h>
+#endif /* !__KERNEL__ */
+
+#if !defined(__KERNEL__) && !defined(LUSTRE_UTILS)
+# error This file is for Lustre internal use only.
+#endif
+
+enum md_echo_cmd {
+	ECHO_MD_CREATE		= 1, /* Open/Create file on MDT */
+	ECHO_MD_MKDIR		= 2, /* Mkdir on MDT */
+	ECHO_MD_DESTROY		= 3, /* Unlink file on MDT */
+	ECHO_MD_RMDIR		= 4, /* Rmdir on MDT */
+	ECHO_MD_LOOKUP		= 5, /* Lookup on MDT */
+	ECHO_MD_GETATTR		= 6, /* Getattr on MDT */
+	ECHO_MD_SETATTR		= 7, /* Setattr on MDT */
+	ECHO_MD_ALLOC_FID	= 8, /* Get FIDs from MDT */
+};
+
+#define OBD_DEV_ID 1
+#define OBD_DEV_NAME "obd"
+#define OBD_DEV_PATH "/dev/" OBD_DEV_NAME
+#define OBD_DEV_MAJOR 10
+#define OBD_DEV_MINOR 241
+
+#define OBD_IOCTL_VERSION	0x00010004
+#define OBD_DEV_BY_DEVNAME	0xffffd0de
+#define OBD_MAX_IOCTL_BUFFER	CONFIG_LUSTRE_OBD_MAX_IOCTL_BUFFER
+
+struct obd_ioctl_data {
+	__u32		ioc_len;
+	__u32		ioc_version;
+
+	union {
+		__u64	ioc_cookie;
+		__u64	ioc_u64_1;
+	};
+	union {
+		__u32	ioc_conn1;
+		__u32	ioc_u32_1;
+	};
+	union {
+		__u32	ioc_conn2;
+		__u32	ioc_u32_2;
+	};
+
+	struct obdo	ioc_obdo1;
+	struct obdo	ioc_obdo2;
+
+	__u64		ioc_count;
+	__u64		ioc_offset;
+	__u32		ioc_dev;
+	__u32		ioc_command;
+
+	__u64		ioc_nid;
+	__u32		ioc_nal;
+	__u32		ioc_type;
+
+	/* buffers the kernel will treat as user pointers */
+	__u32		ioc_plen1;
+	char __user    *ioc_pbuf1;
+	__u32		ioc_plen2;
+	char __user    *ioc_pbuf2;
+
+	/* inline buffers for various arguments */
+	__u32		ioc_inllen1;
+	char	       *ioc_inlbuf1;
+	__u32		ioc_inllen2;
+	char	       *ioc_inlbuf2;
+	__u32		ioc_inllen3;
+	char	       *ioc_inlbuf3;
+	__u32		ioc_inllen4;
+	char	       *ioc_inlbuf4;
+
+	char		ioc_bulk[0];
+};
+
+struct obd_ioctl_hdr {
+	__u32		ioc_len;
+	__u32		ioc_version;
+};
+
+static inline __u32 obd_ioctl_packlen(struct obd_ioctl_data *data)
+{
+	__u32 len = cfs_size_round(sizeof(*data));
+
+	len += cfs_size_round(data->ioc_inllen1);
+	len += cfs_size_round(data->ioc_inllen2);
+	len += cfs_size_round(data->ioc_inllen3);
+	len += cfs_size_round(data->ioc_inllen4);
+
+	return len;
+}
+
+static inline int obd_ioctl_is_invalid(struct obd_ioctl_data *data)
+{
+	if (data->ioc_len > (1 << 30)) {
+		CERROR("OBD ioctl: ioc_len larger than 1<<30\n");
+		return 1;
+	}
+
+	if (data->ioc_inllen1 > (1 << 30)) {
+		CERROR("OBD ioctl: ioc_inllen1 larger than 1<<30\n");
+		return 1;
+	}
+
+	if (data->ioc_inllen2 > (1 << 30)) {
+		CERROR("OBD ioctl: ioc_inllen2 larger than 1<<30\n");
+		return 1;
+	}
+
+	if (data->ioc_inllen3 > (1 << 30)) {
+		CERROR("OBD ioctl: ioc_inllen3 larger than 1<<30\n");
+		return 1;
+	}
+
+	if (data->ioc_inllen4 > (1 << 30)) {
+		CERROR("OBD ioctl: ioc_inllen4 larger than 1<<30\n");
+		return 1;
+	}
+
+	if (data->ioc_inlbuf1 && !data->ioc_inllen1) {
+		CERROR("OBD ioctl: inlbuf1 pointer but 0 length\n");
+		return 1;
+	}
+
+	if (data->ioc_inlbuf2 && !data->ioc_inllen2) {
+		CERROR("OBD ioctl: inlbuf2 pointer but 0 length\n");
+		return 1;
+	}
+
+	if (data->ioc_inlbuf3 && !data->ioc_inllen3) {
+		CERROR("OBD ioctl: inlbuf3 pointer but 0 length\n");
+		return 1;
+	}
+
+	if (data->ioc_inlbuf4 && !data->ioc_inllen4) {
+		CERROR("OBD ioctl: inlbuf4 pointer but 0 length\n");
+		return 1;
+	}
+
+	if (data->ioc_pbuf1 && !data->ioc_plen1) {
+		CERROR("OBD ioctl: pbuf1 pointer but 0 length\n");
+		return 1;
+	}
+
+	if (data->ioc_pbuf2 && !data->ioc_plen2) {
+		CERROR("OBD ioctl: pbuf2 pointer but 0 length\n");
+		return 1;
+	}
+
+	if (!data->ioc_pbuf1 && data->ioc_plen1) {
+		CERROR("OBD ioctl: plen1 set but NULL pointer\n");
+		return 1;
+	}
+
+	if (!data->ioc_pbuf2 && data->ioc_plen2) {
+		CERROR("OBD ioctl: plen2 set but NULL pointer\n");
+		return 1;
+	}
+
+	if (obd_ioctl_packlen(data) > data->ioc_len) {
+		CERROR("OBD ioctl: packlen exceeds ioc_len (%d > %d)\n",
+		       obd_ioctl_packlen(data), data->ioc_len);
+		return 1;
+	}
+
+	return 0;
+}
+
+#ifdef __KERNEL__
+
+int obd_ioctl_getdata(char **buf, int *len, void __user *arg);
+int obd_ioctl_popdata(void __user *arg, void *data, int len);
+
+static inline void obd_ioctl_freedata(char *buf, size_t len)
+{
+	kvfree(buf);
+}
+
+#else /* __KERNEL__ */
+
+static inline int obd_ioctl_pack(struct obd_ioctl_data *data, char **pbuf,
+				 int max_len)
+{
+	char *ptr;
+	struct obd_ioctl_data *overlay;
+
+	data->ioc_len = obd_ioctl_packlen(data);
+	data->ioc_version = OBD_IOCTL_VERSION;
+
+	if (*pbuf && data->ioc_len > max_len) {
+		fprintf(stderr, "pbuf = %p, ioc_len = %u, max_len = %d\n",
+			*pbuf, data->ioc_len, max_len);
+		return -EINVAL;
+	}
+
+	if (!*pbuf)
+		*pbuf = malloc(data->ioc_len);
+
+	if (!*pbuf)
+		return -ENOMEM;
+
+	overlay = (struct obd_ioctl_data *)*pbuf;
+	memcpy(*pbuf, data, sizeof(*data));
+
+	ptr = overlay->ioc_bulk;
+	if (data->ioc_inlbuf1)
+		LOGL(data->ioc_inlbuf1, data->ioc_inllen1, ptr);
+
+	if (data->ioc_inlbuf2)
+		LOGL(data->ioc_inlbuf2, data->ioc_inllen2, ptr);
+
+	if (data->ioc_inlbuf3)
+		LOGL(data->ioc_inlbuf3, data->ioc_inllen3, ptr);
+
+	if (data->ioc_inlbuf4)
+		LOGL(data->ioc_inlbuf4, data->ioc_inllen4, ptr);
+
+	if (obd_ioctl_is_invalid(overlay)) {
+		fprintf(stderr, "invalid ioctl data: ioc_len = %u, max_len = %d\n",
+			data->ioc_len, max_len);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static inline int
+obd_ioctl_unpack(struct obd_ioctl_data *data, char *pbuf, int max_len)
+{
+	char *ptr;
+	struct obd_ioctl_data *overlay;
+
+	if (!pbuf)
+		return 1;
+
+	overlay = (struct obd_ioctl_data *)pbuf;
+
+	/* Preserve the caller's buffer pointers */
+	overlay->ioc_inlbuf1 = data->ioc_inlbuf1;
+	overlay->ioc_inlbuf2 = data->ioc_inlbuf2;
+	overlay->ioc_inlbuf3 = data->ioc_inlbuf3;
+	overlay->ioc_inlbuf4 = data->ioc_inlbuf4;
+
+	memcpy(data, pbuf, sizeof(*data));
+
+	ptr = overlay->ioc_bulk;
+	if (data->ioc_inlbuf1)
+		LOGU(data->ioc_inlbuf1, data->ioc_inllen1, ptr);
+
+	if (data->ioc_inlbuf2)
+		LOGU(data->ioc_inlbuf2, data->ioc_inllen2, ptr);
+
+	if (data->ioc_inlbuf3)
+		LOGU(data->ioc_inlbuf3, data->ioc_inllen3, ptr);
+
+	if (data->ioc_inlbuf4)
+		LOGU(data->ioc_inlbuf4, data->ioc_inllen4, ptr);
+
+	return 0;
+}
+
+#endif /* !__KERNEL__ */
+
+/*
+ * OBD_IOC_DATA_TYPE is only for compatibility reasons with older
+ * Linux Lustre user tools. New ioctls should NOT use this macro as
+ * the ioctl "size". Instead the ioctl should get a "size" argument
+ * which is the actual data type used by the ioctl, to ensure the
+ * ioctl interface is versioned correctly.
+ */
+#define OBD_IOC_DATA_TYPE	long
+
+/*	IOC_LDLM_TEST		_IOWR('f', 40, long) */
+/*	IOC_LDLM_DUMP		_IOWR('f', 41, long) */
+/*	IOC_LDLM_REGRESS_START	_IOWR('f', 42, long) */
+/*	IOC_LDLM_REGRESS_STOP	_IOWR('f', 43, long) */
+
+#define OBD_IOC_CREATE		_IOWR('f', 101, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_DESTROY		_IOW('f', 104, OBD_IOC_DATA_TYPE)
+/*	OBD_IOC_PREALLOCATE	_IOWR('f', 105, OBD_IOC_DATA_TYPE) */
+
+#define OBD_IOC_SETATTR		_IOW('f', 107, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_GETATTR		_IOWR('f', 108, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_READ		_IOWR('f', 109, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_WRITE		_IOWR('f', 110, OBD_IOC_DATA_TYPE)
+
+#define OBD_IOC_STATFS		_IOWR('f', 113, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_SYNC		_IOW('f', 114, OBD_IOC_DATA_TYPE)
+/*	OBD_IOC_READ2		_IOWR('f', 115, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_FORMAT		_IOWR('f', 116, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_PARTITION	_IOWR('f', 117, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_COPY		_IOWR('f', 120, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_MIGR		_IOWR('f', 121, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_PUNCH		_IOWR('f', 122, OBD_IOC_DATA_TYPE) */
+
+/*	OBD_IOC_MODULE_DEBUG	_IOWR('f', 124, OBD_IOC_DATA_TYPE) */
+#define OBD_IOC_BRW_READ	_IOWR('f', 125, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_BRW_WRITE	_IOWR('f', 126, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_NAME2DEV	_IOWR('f', 127, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_UUID2DEV	_IOWR('f', 130, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_GETNAME		_IOWR('f', 131, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_GETMDNAME	_IOR('f', 131, char[MAX_OBD_NAME])
+#define OBD_IOC_GETDTNAME	OBD_IOC_GETNAME
+#define OBD_IOC_LOV_GET_CONFIG	_IOWR('f', 132, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_CLIENT_RECOVER	_IOW('f', 133, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_PING_TARGET	_IOW('f', 136, OBD_IOC_DATA_TYPE)
+
+/*	OBD_IOC_DEC_FS_USE_COUNT _IO('f', 139) */
+#define OBD_IOC_NO_TRANSNO	_IOW('f', 140, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_SET_READONLY	_IOW('f', 141, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_ABORT_RECOVERY	_IOR('f', 142, OBD_IOC_DATA_TYPE)
+/*	OBD_IOC_ROOT_SQUASH	_IOWR('f', 143, OBD_IOC_DATA_TYPE) */
+#define OBD_GET_VERSION		_IOWR('f', 144, OBD_IOC_DATA_TYPE)
+/*	OBD_IOC_GSS_SUPPORT	_IOWR('f', 145, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_CLOSE_UUID	_IOWR('f', 147, OBD_IOC_DATA_TYPE) */
+#define OBD_IOC_CHANGELOG_SEND	_IOW('f', 148, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_GETDEVICE	_IOWR('f', 149, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_FID2PATH	_IOWR('f', 150, OBD_IOC_DATA_TYPE)
+/*	lustre/lustre_user.h	151-153 */
+/*	OBD_IOC_LOV_SETSTRIPE	154 LL_IOC_LOV_SETSTRIPE */
+/*	OBD_IOC_LOV_GETSTRIPE	155 LL_IOC_LOV_GETSTRIPE */
+/*	OBD_IOC_LOV_SETEA	156 LL_IOC_LOV_SETEA */
+/*	lustre/lustre_user.h	157-159 */
+#define	OBD_IOC_QUOTACHECK	_IOW('f', 160, int)
+#define	OBD_IOC_POLL_QUOTACHECK	_IOR('f', 161, struct if_quotacheck *)
+#define OBD_IOC_QUOTACTL	_IOWR('f', 162, struct if_quotactl)
+/*	lustre/lustre_user.h	163-176 */
+#define OBD_IOC_CHANGELOG_REG	_IOW('f', 177, struct obd_ioctl_data)
+#define OBD_IOC_CHANGELOG_DEREG	_IOW('f', 178, struct obd_ioctl_data)
+#define OBD_IOC_CHANGELOG_CLEAR	_IOW('f', 179, struct obd_ioctl_data)
+/*	OBD_IOC_RECORD		_IOWR('f', 180, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_ENDRECORD	_IOWR('f', 181, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_PARSE		_IOWR('f', 182, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_DORECORD	_IOWR('f', 183, OBD_IOC_DATA_TYPE) */
+#define OBD_IOC_PROCESS_CFG	_IOWR('f', 184, OBD_IOC_DATA_TYPE)
+/*	OBD_IOC_DUMP_LOG	_IOWR('f', 185, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_CLEAR_LOG	_IOWR('f', 186, OBD_IOC_DATA_TYPE) */
+#define OBD_IOC_PARAM		_IOW('f', 187, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_POOL		_IOWR('f', 188, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_REPLACE_NIDS	_IOWR('f', 189, OBD_IOC_DATA_TYPE)
+
+#define OBD_IOC_CATLOGLIST	_IOWR('f', 190, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_LLOG_INFO	_IOWR('f', 191, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_LLOG_PRINT	_IOWR('f', 192, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_LLOG_CANCEL	_IOWR('f', 193, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_LLOG_REMOVE	_IOWR('f', 194, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_LLOG_CHECK	_IOWR('f', 195, OBD_IOC_DATA_TYPE)
+/*	OBD_IOC_LLOG_CATINFO	_IOWR('f', 196, OBD_IOC_DATA_TYPE) */
+#define OBD_IOC_NODEMAP		_IOWR('f', 197, OBD_IOC_DATA_TYPE)
+
+/*	ECHO_IOC_GET_STRIPE	_IOWR('f', 200, OBD_IOC_DATA_TYPE) */
+/*	ECHO_IOC_SET_STRIPE	_IOWR('f', 201, OBD_IOC_DATA_TYPE) */
+/*	ECHO_IOC_ENQUEUE	_IOWR('f', 202, OBD_IOC_DATA_TYPE) */
+/*	ECHO_IOC_CANCEL		_IOWR('f', 203, OBD_IOC_DATA_TYPE) */
+
+#define OBD_IOC_GET_OBJ_VERSION	_IOR('f', 210, OBD_IOC_DATA_TYPE)
+
+/*	lustre/lustre_user.h	212-217 */
+#define OBD_IOC_GET_MNTOPT	_IOW('f', 220, mntopt_t)
+#define OBD_IOC_ECHO_MD		_IOR('f', 221, struct obd_ioctl_data)
+#define OBD_IOC_ECHO_ALLOC_SEQ	_IOWR('f', 222, struct obd_ioctl_data)
+#define OBD_IOC_START_LFSCK	_IOWR('f', 230, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_STOP_LFSCK	_IOW('f', 231, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_QUERY_LFSCK	_IOR('f', 232, struct obd_ioctl_data)
+/*	lustre/lustre_user.h	240-249 */
+/*	LIBCFS_IOC_DEBUG_MASK	250 */
+
+#define IOC_OSC_SET_ACTIVE	_IOWR('h', 21, void *)
+
+#endif /* LUSTRE_IOCTL_H_ */
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index 4746320..75a78a3 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -188,26 +188,20 @@ struct ost_id {
  * *STRIPE* - set/get lov_user_md
  * *INFO    - set/get lov_user_mds_data
  */
-/* see <lustre_lib.h> for ioctl numberss 101-150 */
+/*	lustre_ioctl.h			101-150 */
 #define LL_IOC_GETFLAGS		 _IOR('f', 151, long)
 #define LL_IOC_SETFLAGS		 _IOW('f', 152, long)
 #define LL_IOC_CLRFLAGS		 _IOW('f', 153, long)
-/* LL_IOC_LOV_SETSTRIPE: See also OBD_IOC_LOV_SETSTRIPE */
 #define LL_IOC_LOV_SETSTRIPE	    _IOW('f', 154, long)
-/* LL_IOC_LOV_GETSTRIPE: See also OBD_IOC_LOV_GETSTRIPE */
 #define LL_IOC_LOV_GETSTRIPE	    _IOW('f', 155, long)
-/* LL_IOC_LOV_SETEA: See also OBD_IOC_LOV_SETEA */
 #define LL_IOC_LOV_SETEA		_IOW('f', 156, long)
 #define LL_IOC_RECREATE_OBJ	     _IOW('f', 157, long)
 #define LL_IOC_RECREATE_FID	     _IOW('f', 157, struct lu_fid)
 #define LL_IOC_GROUP_LOCK	       _IOW('f', 158, long)
 #define LL_IOC_GROUP_UNLOCK	     _IOW('f', 159, long)
-/* LL_IOC_QUOTACHECK: See also OBD_IOC_QUOTACHECK */
-#define LL_IOC_QUOTACHECK	       _IOW('f', 160, int)
-/* LL_IOC_POLL_QUOTACHECK: See also OBD_IOC_POLL_QUOTACHECK */
-#define LL_IOC_POLL_QUOTACHECK	  _IOR('f', 161, struct if_quotacheck *)
-/* LL_IOC_QUOTACTL: See also OBD_IOC_QUOTACTL */
-#define LL_IOC_QUOTACTL		 _IOWR('f', 162, struct if_quotactl)
+/* #define LL_IOC_QUOTACHECK		160 OBD_IOC_QUOTACHECK */
+/* #define LL_IOC_POLL_QUOTACHECK	161 OBD_IOC_POLL_QUOTACHECK */
+/* #define LL_IOC_QUOTACTL		162 OBD_IOC_QUOTACTL */
 #define IOC_OBD_STATFS		  _IOWR('f', 164, struct obd_statfs *)
 #define IOC_LOV_GETINFO		 _IOWR('f', 165, struct lov_user_mds_data *)
 #define LL_IOC_FLUSHCTX		 _IOW('f', 166, long)
@@ -221,8 +215,7 @@ struct ost_id {
 #define LL_IOC_GET_CONNECT_FLAGS	_IOWR('f', 174, __u64 *)
 #define LL_IOC_GET_MDTIDX	       _IOR('f', 175, int)
 
-/* see <lustre_lib.h> for ioctl numbers 177-210 */
-
+/*	lustre_ioctl.h			177-210 */
 #define LL_IOC_HSM_STATE_GET		_IOR('f', 211, struct hsm_user_state)
 #define LL_IOC_HSM_STATE_SET		_IOW('f', 212, struct hsm_state_set)
 #define LL_IOC_HSM_CT_START		_IOW('f', 213, struct lustre_kernelcomm)
@@ -255,10 +248,6 @@ struct ost_id {
 #define IOC_MDC_GETFILEINFO     _IOWR(IOC_MDC_TYPE, 22, struct lov_user_mds_data *)
 #define LL_IOC_MDC_GETINFO      _IOWR(IOC_MDC_TYPE, 23, struct lov_user_mds_data *)
 
-/* Keep these for backward compartability. */
-#define LL_IOC_OBD_STATFS       IOC_OBD_STATFS
-#define IOC_MDC_GETSTRIPE       IOC_MDC_GETFILESTRIPE
-
 #define MAX_OBD_NAME 128 /* If this changes, a NEW ioctl must be added */
 
 /* Define O_LOV_DELAY_CREATE to be a mask that is not useful for regular
diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm.h b/drivers/staging/lustre/lustre/include/lustre_dlm.h
index f7805cc..1ec4231 100644
--- a/drivers/staging/lustre/lustre/include/lustre_dlm.h
+++ b/drivers/staging/lustre/lustre/include/lustre_dlm.h
@@ -1282,16 +1282,6 @@ int ldlm_cli_cancel_list(struct list_head *head, int count,
 int intent_disposition(struct ldlm_reply *rep, int flag);
 void intent_set_disposition(struct ldlm_reply *rep, int flag);
 
-/* ioctls for trying requests */
-#define IOC_LDLM_TYPE		   'f'
-#define IOC_LDLM_MIN_NR		 40
-
-#define IOC_LDLM_TEST		   _IOWR('f', 40, long)
-#define IOC_LDLM_DUMP		   _IOWR('f', 41, long)
-#define IOC_LDLM_REGRESS_START	  _IOWR('f', 42, long)
-#define IOC_LDLM_REGRESS_STOP	   _IOWR('f', 43, long)
-#define IOC_LDLM_MAX_NR		 43
-
 /**
  * "Modes" of acquiring lock_res, necessary to tell lockdep that taking more
  * than one lock_res is dead-lock safe.
diff --git a/drivers/staging/lustre/lustre/include/lustre_lib.h b/drivers/staging/lustre/lustre/include/lustre_lib.h
index def0193..adb8c47 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lib.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lib.h
@@ -75,7 +75,6 @@ int do_set_info_async(struct obd_import *imp,
 		      struct ptlrpc_request_set *set);
 
 #define OBD_RECOVERY_MAX_TIME (obd_timeout * 18) /* b13079 */
-#define OBD_MAX_IOCTL_BUFFER CONFIG_LUSTRE_OBD_MAX_IOCTL_BUFFER
 
 void target_send_reply(struct ptlrpc_request *req, int rc, int fail_id);
 
@@ -99,289 +98,6 @@ struct obd_client_handle {
 /* statfs_pack.c */
 void statfs_unpack(struct kstatfs *sfs, struct obd_statfs *osfs);
 
-/*
- * For md echo client
- */
-enum md_echo_cmd {
-	ECHO_MD_CREATE       = 1, /* Open/Create file on MDT */
-	ECHO_MD_MKDIR	= 2, /* Mkdir on MDT */
-	ECHO_MD_DESTROY      = 3, /* Unlink file on MDT */
-	ECHO_MD_RMDIR	= 4, /* Rmdir on MDT */
-	ECHO_MD_LOOKUP       = 5, /* Lookup on MDT */
-	ECHO_MD_GETATTR      = 6, /* Getattr on MDT */
-	ECHO_MD_SETATTR      = 7, /* Setattr on MDT */
-	ECHO_MD_ALLOC_FID    = 8, /* Get FIDs from MDT */
-};
-
-/*
- *   OBD IOCTLS
- */
-#define OBD_IOCTL_VERSION 0x00010004
-
-struct obd_ioctl_data {
-	__u32 ioc_len;
-	__u32 ioc_version;
-
-	union {
-		__u64 ioc_cookie;
-		__u64 ioc_u64_1;
-	};
-	union {
-		__u32 ioc_conn1;
-		__u32 ioc_u32_1;
-	};
-	union {
-		__u32 ioc_conn2;
-		__u32 ioc_u32_2;
-	};
-
-	struct obdo ioc_obdo1;
-	struct obdo ioc_obdo2;
-
-	u64	 ioc_count;
-	u64	 ioc_offset;
-	__u32    ioc_dev;
-	__u32    ioc_command;
-
-	__u64 ioc_nid;
-	__u32 ioc_nal;
-	__u32 ioc_type;
-
-	/* buffers the kernel will treat as user pointers */
-	__u32  ioc_plen1;
-	void __user *ioc_pbuf1;
-	__u32  ioc_plen2;
-	void __user *ioc_pbuf2;
-
-	/* inline buffers for various arguments */
-	__u32  ioc_inllen1;
-	char  *ioc_inlbuf1;
-	__u32  ioc_inllen2;
-	char  *ioc_inlbuf2;
-	__u32  ioc_inllen3;
-	char  *ioc_inlbuf3;
-	__u32  ioc_inllen4;
-	char  *ioc_inlbuf4;
-
-	char    ioc_bulk[0];
-};
-
-struct obd_ioctl_hdr {
-	__u32 ioc_len;
-	__u32 ioc_version;
-};
-
-static inline int obd_ioctl_packlen(struct obd_ioctl_data *data)
-{
-	int len = cfs_size_round(sizeof(struct obd_ioctl_data));
-
-	len += cfs_size_round(data->ioc_inllen1);
-	len += cfs_size_round(data->ioc_inllen2);
-	len += cfs_size_round(data->ioc_inllen3);
-	len += cfs_size_round(data->ioc_inllen4);
-	return len;
-}
-
-static inline int obd_ioctl_is_invalid(struct obd_ioctl_data *data)
-{
-	if (data->ioc_len > OBD_MAX_IOCTL_BUFFER) {
-		CERROR("OBD ioctl: ioc_len larger than %d\n",
-		       OBD_MAX_IOCTL_BUFFER);
-		return 1;
-	}
-	if (data->ioc_inllen1 > OBD_MAX_IOCTL_BUFFER) {
-		CERROR("OBD ioctl: ioc_inllen1 larger than ioc_len\n");
-		return 1;
-	}
-	if (data->ioc_inllen2 > OBD_MAX_IOCTL_BUFFER) {
-		CERROR("OBD ioctl: ioc_inllen2 larger than ioc_len\n");
-		return 1;
-	}
-	if (data->ioc_inllen3 > OBD_MAX_IOCTL_BUFFER) {
-		CERROR("OBD ioctl: ioc_inllen3 larger than ioc_len\n");
-		return 1;
-	}
-	if (data->ioc_inllen4 > OBD_MAX_IOCTL_BUFFER) {
-		CERROR("OBD ioctl: ioc_inllen4 larger than ioc_len\n");
-		return 1;
-	}
-	if (data->ioc_inlbuf1 && !data->ioc_inllen1) {
-		CERROR("OBD ioctl: inlbuf1 pointer but 0 length\n");
-		return 1;
-	}
-	if (data->ioc_inlbuf2 && !data->ioc_inllen2) {
-		CERROR("OBD ioctl: inlbuf2 pointer but 0 length\n");
-		return 1;
-	}
-	if (data->ioc_inlbuf3 && !data->ioc_inllen3) {
-		CERROR("OBD ioctl: inlbuf3 pointer but 0 length\n");
-		return 1;
-	}
-	if (data->ioc_inlbuf4 && !data->ioc_inllen4) {
-		CERROR("OBD ioctl: inlbuf4 pointer but 0 length\n");
-		return 1;
-	}
-	if (data->ioc_pbuf1 && !data->ioc_plen1) {
-		CERROR("OBD ioctl: pbuf1 pointer but 0 length\n");
-		return 1;
-	}
-	if (data->ioc_pbuf2 && !data->ioc_plen2) {
-		CERROR("OBD ioctl: pbuf2 pointer but 0 length\n");
-		return 1;
-	}
-	if (data->ioc_plen1 && !data->ioc_pbuf1) {
-		CERROR("OBD ioctl: plen1 set but NULL pointer\n");
-		return 1;
-	}
-	if (data->ioc_plen2 && !data->ioc_pbuf2) {
-		CERROR("OBD ioctl: plen2 set but NULL pointer\n");
-		return 1;
-	}
-	if (obd_ioctl_packlen(data) > data->ioc_len) {
-		CERROR("OBD ioctl: packlen exceeds ioc_len (%d > %d)\n",
-		       obd_ioctl_packlen(data), data->ioc_len);
-		return 1;
-	}
-	return 0;
-}
-
-#include "obd_support.h"
-
-/* function defined in lustre/obdclass/<platform>/<platform>-module.c */
-int obd_ioctl_getdata(char **buf, int *len, void __user *arg);
-int obd_ioctl_popdata(void __user *arg, void *data, int len);
-
-static inline void obd_ioctl_freedata(char *buf, int len)
-{
-	kvfree(buf);
-	return;
-}
-
-/*
- * BSD ioctl description:
- * #define IOC_V1       _IOR(g, n1, long)
- * #define IOC_V2       _IOW(g, n2, long)
- *
- * ioctl(f, IOC_V1, arg);
- * arg will be treated as a long value,
- *
- * ioctl(f, IOC_V2, arg)
- * arg will be treated as a pointer, bsd will call
- * copyin(buf, arg, sizeof(long))
- *
- * To make BSD ioctl handles argument correctly and simplely,
- * we change _IOR to _IOWR so BSD will copyin obd_ioctl_data
- * for us. Does this change affect Linux?  (XXX Liang)
- */
-#define OBD_IOC_DATA_TYPE long
-
-#define OBD_IOC_CREATE		 _IOWR('f', 101, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_DESTROY		_IOW('f', 104, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PREALLOCATE	    _IOWR('f', 105, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_SETATTR		_IOW('f', 107, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_GETATTR		_IOWR ('f', 108, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_READ		   _IOWR('f', 109, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_WRITE		  _IOWR('f', 110, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_STATFS		 _IOWR('f', 113, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_SYNC		   _IOW('f', 114, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_READ2		  _IOWR('f', 115, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_FORMAT		 _IOWR('f', 116, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PARTITION	      _IOWR('f', 117, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_COPY		   _IOWR('f', 120, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_MIGR		   _IOWR('f', 121, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PUNCH		  _IOWR('f', 122, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_MODULE_DEBUG	   _IOWR('f', 124, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_BRW_READ	       _IOWR('f', 125, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_BRW_WRITE	      _IOWR('f', 126, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_NAME2DEV	       _IOWR('f', 127, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_UUID2DEV	       _IOWR('f', 130, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_GETNAME		_IOWR('f', 131, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_GETMDNAME	      _IOR('f', 131, char[MAX_OBD_NAME])
-#define OBD_IOC_GETDTNAME	       OBD_IOC_GETNAME
-
-#define OBD_IOC_LOV_GET_CONFIG	 _IOWR('f', 132, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_CLIENT_RECOVER	 _IOW('f', 133, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PING_TARGET	    _IOW('f', 136, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_DEC_FS_USE_COUNT       _IO  ('f', 139)
-#define OBD_IOC_NO_TRANSNO	     _IOW('f', 140, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_SET_READONLY	   _IOW('f', 141, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_ABORT_RECOVERY	 _IOR('f', 142, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_ROOT_SQUASH	    _IOWR('f', 143, OBD_IOC_DATA_TYPE)
-
-#define OBD_GET_VERSION		_IOWR ('f', 144, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_GSS_SUPPORT	    _IOWR('f', 145, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_CLOSE_UUID	     _IOWR ('f', 147, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_CHANGELOG_SEND	 _IOW('f', 148, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_GETDEVICE	      _IOWR ('f', 149, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_FID2PATH	       _IOWR ('f', 150, OBD_IOC_DATA_TYPE)
-/* see also <lustre/lustre_user.h> for ioctls 151-153 */
-/* OBD_IOC_LOV_SETSTRIPE: See also LL_IOC_LOV_SETSTRIPE */
-#define OBD_IOC_LOV_SETSTRIPE	  _IOW('f', 154, OBD_IOC_DATA_TYPE)
-/* OBD_IOC_LOV_GETSTRIPE: See also LL_IOC_LOV_GETSTRIPE */
-#define OBD_IOC_LOV_GETSTRIPE	  _IOW('f', 155, OBD_IOC_DATA_TYPE)
-/* OBD_IOC_LOV_SETEA: See also LL_IOC_LOV_SETEA */
-#define OBD_IOC_LOV_SETEA	      _IOW('f', 156, OBD_IOC_DATA_TYPE)
-/* see <lustre/lustre_user.h> for ioctls 157-159 */
-/* OBD_IOC_QUOTACHECK: See also LL_IOC_QUOTACHECK */
-#define OBD_IOC_QUOTACHECK	     _IOW('f', 160, int)
-/* OBD_IOC_POLL_QUOTACHECK: See also LL_IOC_POLL_QUOTACHECK */
-#define OBD_IOC_POLL_QUOTACHECK	_IOR('f', 161, struct if_quotacheck *)
-/* OBD_IOC_QUOTACTL: See also LL_IOC_QUOTACTL */
-#define OBD_IOC_QUOTACTL	       _IOWR('f', 162, struct if_quotactl)
-/* see  also <lustre/lustre_user.h> for ioctls 163-176 */
-#define OBD_IOC_CHANGELOG_REG	  _IOW('f', 177, struct obd_ioctl_data)
-#define OBD_IOC_CHANGELOG_DEREG	_IOW('f', 178, struct obd_ioctl_data)
-#define OBD_IOC_CHANGELOG_CLEAR	_IOW('f', 179, struct obd_ioctl_data)
-#define OBD_IOC_RECORD		 _IOWR('f', 180, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_ENDRECORD	      _IOWR('f', 181, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PARSE		  _IOWR('f', 182, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_DORECORD	       _IOWR('f', 183, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PROCESS_CFG	    _IOWR('f', 184, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_DUMP_LOG	       _IOWR('f', 185, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_CLEAR_LOG	      _IOWR('f', 186, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PARAM		  _IOW('f', 187, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_POOL		   _IOWR('f', 188, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_REPLACE_NIDS	   _IOWR('f', 189, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_CATLOGLIST	     _IOWR('f', 190, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_LLOG_INFO	      _IOWR('f', 191, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_LLOG_PRINT	     _IOWR('f', 192, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_LLOG_CANCEL	    _IOWR('f', 193, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_LLOG_REMOVE	    _IOWR('f', 194, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_LLOG_CHECK	     _IOWR('f', 195, OBD_IOC_DATA_TYPE)
-/* OBD_IOC_LLOG_CATINFO is deprecated */
-#define OBD_IOC_LLOG_CATINFO	   _IOWR('f', 196, OBD_IOC_DATA_TYPE)
-
-/*	#define ECHO_IOC_GET_STRIPE    _IOWR('f', 200, OBD_IOC_DATA_TYPE) */
-/*	#define ECHO_IOC_SET_STRIPE    _IOWR('f', 201, OBD_IOC_DATA_TYPE) */
-/*	#define ECHO_IOC_ENQUEUE       _IOWR('f', 202, OBD_IOC_DATA_TYPE) */
-/*	#define ECHO_IOC_CANCEL        _IOWR('f', 203, OBD_IOC_DATA_TYPE) */
-
-#define OBD_IOC_GET_OBJ_VERSION	_IOR('f', 210, OBD_IOC_DATA_TYPE)
-
-/* <lustre/lustre_user.h> defines ioctl number 218-219 */
-#define OBD_IOC_GET_MNTOPT	     _IOW('f', 220, mntopt_t)
-
-#define OBD_IOC_ECHO_MD		_IOR('f', 221, struct obd_ioctl_data)
-#define OBD_IOC_ECHO_ALLOC_SEQ	 _IOWR('f', 222, struct obd_ioctl_data)
-
-#define OBD_IOC_START_LFSCK	       _IOWR('f', 230, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_STOP_LFSCK	       _IOW('f', 231, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PAUSE_LFSCK	       _IOW('f', 232, OBD_IOC_DATA_TYPE)
-
-/* XXX _IOWR('f', 250, long) has been defined in
- * libcfs/include/libcfs/libcfs_private.h for debug, don't use it
- */
-
 /* Until such time as we get_info the per-stripe maximum from the OST,
  * we define this to be 2T - 4k, which is the ext3 maxbytes.
  */
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index cacd472..0dae273 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -35,15 +35,6 @@
 
 #include <linux/spinlock.h>
 
-#define IOC_OSC_TYPE	 'h'
-#define IOC_OSC_MIN_NR       20
-#define IOC_OSC_SET_ACTIVE   _IOWR(IOC_OSC_TYPE, 21, struct obd_device *)
-#define IOC_OSC_MAX_NR       50
-
-#define IOC_MDC_TYPE	 'i'
-#define IOC_MDC_MIN_NR       20
-#define IOC_MDC_MAX_NR       50
-
 #include "lustre/lustre_idl.h"
 #include "lustre_lib.h"
 #include "lu_ref.h"
@@ -623,7 +614,6 @@ struct obd_llog_group {
 
 /* corresponds to one of the obd's */
 #define OBD_DEVICE_MAGIC	0XAB5CD6EF
-#define OBD_DEV_BY_DEVNAME      0xffffd0de
 
 struct lvfs_run_ctxt {
 	struct dt_device *dt;
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 84bec03..257c9a4 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -46,8 +46,8 @@
 
 #include "../include/obd_support.h"
 #include "../include/obd_class.h"
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/lustre_lib.h"
-#include "../include/lustre/lustre_idl.h"
 #include "../include/lustre_lite.h"
 #include "../include/lustre_dlm.h"
 #include "../include/lustre_fid.h"
@@ -1543,7 +1543,7 @@ finish_req:
 
 	case LL_IOC_LOV_SWAP_LAYOUTS:
 		return -EPERM;
-	case LL_IOC_OBD_STATFS:
+	case IOC_OBD_STATFS:
 		return ll_obd_statfs(inode, (void __user *)arg);
 	case LL_IOC_LOV_GETSTRIPE:
 	case LL_IOC_MDC_GETINFO:
@@ -1708,9 +1708,6 @@ free_lmm:
 		kvfree(lmm);
 		return rc;
 	}
-	case OBD_IOC_LLOG_CATINFO: {
-		return -EOPNOTSUPP;
-	}
 	case OBD_IOC_QUOTACHECK: {
 		struct obd_quotactl *oqctl;
 		int error = 0;
@@ -1768,7 +1765,7 @@ out_poll:
 		kfree(check);
 		return rc;
 	}
-	case LL_IOC_QUOTACTL: {
+	case OBD_IOC_QUOTACTL: {
 		struct if_quotactl *qctl;
 
 		qctl = kzalloc(sizeof(*qctl), GFP_NOFS);
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 89e93dc..519db53 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -44,6 +44,7 @@
 #include <linux/mount.h>
 #include "llite_internal.h"
 #include "../include/lustre/ll_fiemap.h"
+#include "../include/lustre/lustre_ioctl.h"
 
 #include "../include/cl_object.h"
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index e320400..111264e 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -41,6 +41,7 @@
 #include <linux/types.h>
 #include <linux/mm.h>
 
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/lustre_lite.h"
 #include "../include/lustre_ha.h"
 #include "../include/lustre_dlm.h"
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index d07fd17..e516a84 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -51,6 +51,7 @@
 #include "../include/cl_object.h"
 #include "../include/lustre_lite.h"
 #include "../include/lustre_fid.h"
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/lustre_kernelcomm.h"
 #include "lmv_internal.h"
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_obd.c b/drivers/staging/lustre/lustre/lov/lov_obd.c
index 9b92d55..d904f44 100644
--- a/drivers/staging/lustre/lustre/lov/lov_obd.c
+++ b/drivers/staging/lustre/lustre/lov/lov_obd.c
@@ -41,6 +41,7 @@
 #include "../../include/linux/libcfs/libcfs.h"
 
 #include "../include/obd_support.h"
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/lustre_lib.h"
 #include "../include/lustre_net.h"
 #include "../include/lustre/lustre_idl.h"
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 558f33b..394ef3c 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -39,6 +39,7 @@
 # include <linux/utsname.h>
 
 #include "../include/lustre_acl.h"
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/obd_class.h"
 #include "../include/lustre_lmv.h"
 #include "../include/lustre_fid.h"
diff --git a/drivers/staging/lustre/lustre/obdclass/class_obd.c b/drivers/staging/lustre/lustre/obdclass/class_obd.c
index d9d2a19..6edf53e 100644
--- a/drivers/staging/lustre/lustre/obdclass/class_obd.c
+++ b/drivers/staging/lustre/lustre/obdclass/class_obd.c
@@ -40,6 +40,7 @@
 #include "../include/lprocfs_status.h"
 #include <linux/list.h>
 #include "../include/cl_object.h"
+#include "../include/lustre/lustre_ioctl.h"
 #include "llog_internal.h"
 
 struct obd_device *obd_devs[MAX_OBD_DEVICES];
@@ -287,13 +288,6 @@ int class_handle_ioctl(unsigned int cmd, unsigned long arg)
 		goto out;
 	}
 
-	case OBD_IOC_CLOSE_UUID: {
-		CDEBUG(D_IOCTL, "closing all connections to uuid %s (NOOP)\n",
-		       data->ioc_inlbuf1);
-		err = 0;
-		goto out;
-	}
-
 	case OBD_IOC_GETDEVICE: {
 		int     index = data->ioc_count;
 		char    *status, *str;
diff --git a/drivers/staging/lustre/lustre/obdclass/linux/linux-module.c b/drivers/staging/lustre/lustre/obdclass/linux/linux-module.c
index 33342bf..27a72d8 100644
--- a/drivers/staging/lustre/lustre/obdclass/linux/linux-module.c
+++ b/drivers/staging/lustre/lustre/obdclass/linux/linux-module.c
@@ -65,6 +65,7 @@
 #include "../../include/obd_support.h"
 #include "../../include/obd_class.h"
 #include "../../include/lprocfs_status.h"
+#include "../../include/lustre/lustre_ioctl.h"
 #include "../../include/lustre_ver.h"
 
 /* buffer MUST be at least the size of obd_ioctl_hdr */
diff --git a/drivers/staging/lustre/lustre/obdclass/obd_config.c b/drivers/staging/lustre/lustre/obdclass/obd_config.c
index 0eab123..6d0890f 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_config.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_config.c
@@ -37,6 +37,7 @@
 #define DEBUG_SUBSYSTEM S_CLASS
 #include "../include/obd_class.h"
 #include <linux/string.h>
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/lustre_log.h"
 #include "../include/lprocfs_status.h"
 #include "../include/lustre_param.h"
diff --git a/drivers/staging/lustre/lustre/obdecho/echo_client.c b/drivers/staging/lustre/lustre/obdecho/echo_client.c
index 5b29c4a..2cb487b 100644
--- a/drivers/staging/lustre/lustre/obdecho/echo_client.c
+++ b/drivers/staging/lustre/lustre/obdecho/echo_client.c
@@ -41,6 +41,7 @@
 #include "../include/cl_object.h"
 #include "../include/lustre_fid.h"
 #include "../include/lustre_acl.h"
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/lustre_net.h"
 
 #include "echo_internal.h"
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index a2d948f..d231827 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -41,6 +41,7 @@
 
 #include "../include/lustre_ha.h"
 #include "../include/lprocfs_status.h"
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/lustre_debug.h"
 #include "../include/lustre_param.h"
 #include "../include/lustre_fid.h"
-- 
1.7.1


* [lustre-devel] [PATCH 38/80] staging: lustre: move ioctls to lustre_ioctl.h
@ 2016-08-16 20:18   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

Move ioctl definitions and related functions from lustre_dlm.h,
lustre_lib.h, and obd.h to lustre_ioctl.h. Replace the definitions of
retired ioctls with comments.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4961
Reviewed-on: http://review.whamcloud.com/10139
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Robert Read <robert.read@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
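Reader's note, not part of the commit: the type passed as the last argument
of _IOW()/_IOWR() is encoded into the command number as a size field, which
is why the comment above OBD_IOC_DATA_TYPE in the new header asks new ioctls
to name their real data type. Below is a minimal userspace sketch, assuming
a Linux host that provides <linux/ioctl.h>. The DEMO_* macros and
demo_ioc_decode() are hypothetical names that only mirror the shape of
OBD_IOC_CREATE and OBD_IOC_QUOTACHECK; they are not definitions from this
patch.

```c
#include <linux/ioctl.h>	/* _IOW, _IOWR and the _IOC_* decoders */

/* Hypothetical stand-ins for illustration only. */
#define DEMO_IOC_DATA_TYPE	long
#define DEMO_IOC_CREATE		_IOWR('f', 101, DEMO_IOC_DATA_TYPE)
#define DEMO_IOC_QUOTACHECK	_IOW('f', 160, int)

/* The four fields packed into an ioctl command number. */
struct demo_ioc_fields {
	unsigned int dir;	/* _IOC_READ and/or _IOC_WRITE */
	unsigned int type;	/* the "magic" character, 'f' here */
	unsigned int nr;	/* the per-type sequence number */
	unsigned int size;	/* sizeof() of the type given to _IOW/_IOWR */
};

/* Split a command number back into its fields. */
static struct demo_ioc_fields demo_ioc_decode(unsigned long cmd)
{
	struct demo_ioc_fields f;

	f.dir  = _IOC_DIR(cmd);
	f.type = _IOC_TYPE(cmd);
	f.nr   = _IOC_NR(cmd);
	f.size = _IOC_SIZE(cmd);
	return f;
}
```

Decoding DEMO_IOC_CREATE gives nr 101 with a size of sizeof(long), while
DEMO_IOC_QUOTACHECK gives nr 160 with a size of sizeof(int). Two ioctls that
share a number but name different types therefore end up with different
command values.
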
 .../lustre/lustre/include/lustre/lustre_ioctl.h    |  412 ++++++++++++++++++++
 .../lustre/lustre/include/lustre/lustre_user.h     |   21 +-
 drivers/staging/lustre/lustre/include/lustre_dlm.h |   10 -
 drivers/staging/lustre/lustre/include/lustre_lib.h |  284 --------------
 drivers/staging/lustre/lustre/include/obd.h        |   10 -
 drivers/staging/lustre/lustre/llite/dir.c          |    9 +-
 drivers/staging/lustre/lustre/llite/file.c         |    1 +
 drivers/staging/lustre/lustre/llite/llite_lib.c    |    1 +
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |    1 +
 drivers/staging/lustre/lustre/lov/lov_obd.c        |    1 +
 drivers/staging/lustre/lustre/mdc/mdc_request.c    |    1 +
 drivers/staging/lustre/lustre/obdclass/class_obd.c |    8 +-
 .../lustre/lustre/obdclass/linux/linux-module.c    |    1 +
 .../staging/lustre/lustre/obdclass/obd_config.c    |    1 +
 .../staging/lustre/lustre/obdecho/echo_client.c    |    1 +
 drivers/staging/lustre/lustre/osc/osc_request.c    |    1 +
 16 files changed, 430 insertions(+), 333 deletions(-)
 create mode 100644 drivers/staging/lustre/lustre/include/lustre/lustre_ioctl.h
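
Reader's note, not part of the commit: obd_ioctl_packlen() and the userspace
obd_ioctl_pack() in the new header lay the inline buffers out after the fixed
structure, each rounded up by cfs_size_round(). The sketch below shows that
layout in a trimmed-down form. Everything named demo_* is hypothetical, the
structure keeps only two inline buffers, and rounding to a multiple of 8 is
an assumption about what cfs_size_round() does.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct obd_ioctl_data. */
struct demo_ioctl_data {
	uint32_t	ioc_len;
	uint32_t	ioc_version;
	uint32_t	ioc_inllen1;
	uint32_t	ioc_inllen2;
	const char     *ioc_inlbuf1;
	const char     *ioc_inlbuf2;
};

/* Round up to a multiple of 8 (assumed to mirror cfs_size_round()). */
static uint32_t demo_size_round(uint32_t val)
{
	return (val + 7u) & ~7u;
}

/* Total packed size: rounded header plus each rounded inline buffer. */
static uint32_t demo_packlen(const struct demo_ioctl_data *d)
{
	return demo_size_round(sizeof(*d)) +
	       demo_size_round(d->ioc_inllen1) +
	       demo_size_round(d->ioc_inllen2);
}

/*
 * Pack the header and inline buffers into one zeroed allocation.
 * Returns the buffer (caller frees it) or NULL on bad input or ENOMEM.
 */
static char *demo_pack(struct demo_ioctl_data *d)
{
	char *buf, *ptr;

	/* A pointer with zero length is rejected, as in the validity check. */
	if ((d->ioc_inlbuf1 && !d->ioc_inllen1) ||
	    (d->ioc_inlbuf2 && !d->ioc_inllen2))
		return NULL;

	d->ioc_len = demo_packlen(d);
	buf = calloc(1, d->ioc_len);
	if (!buf)
		return NULL;

	memcpy(buf, d, sizeof(*d));
	ptr = buf + demo_size_round(sizeof(*d));

	if (d->ioc_inlbuf1)
		memcpy(ptr, d->ioc_inlbuf1, d->ioc_inllen1);
	ptr += demo_size_round(d->ioc_inllen1);

	if (d->ioc_inlbuf2)
		memcpy(ptr, d->ioc_inlbuf2, d->ioc_inllen2);

	return buf;
}
```

A caller would fill in ioc_inlbuf1/ioc_inllen1, call demo_pack(), hand the
buffer to ioctl(), and free it afterwards. The real helper also stamps
ioc_version and runs obd_ioctl_is_invalid() on the packed result, which this
sketch leaves out.
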

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_ioctl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_ioctl.h
new file mode 100644
index 0000000..f3d7c94
--- /dev/null
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_ioctl.h
@@ -0,0 +1,412 @@
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * Copyright (c) 2011, 2015, Intel Corporation.
+ */
+#ifndef LUSTRE_IOCTL_H_
+#define LUSTRE_IOCTL_H_
+
+#include <linux/types.h>
+#include "../../../include/linux/libcfs/libcfs.h"
+#include "lustre_idl.h"
+
+#ifdef __KERNEL__
+# include <linux/ioctl.h>
+# include <linux/string.h>
+# include "../obd_support.h"
+#else /* __KERNEL__ */
+# include <malloc.h>
+# include <string.h>
+#include <libcfs/util/ioctl.h>
+#endif /* !__KERNEL__ */
+
+#if !defined(__KERNEL__) && !defined(LUSTRE_UTILS)
+# error This file is for Lustre internal use only.
+#endif
+
+enum md_echo_cmd {
+	ECHO_MD_CREATE		= 1, /* Open/Create file on MDT */
+	ECHO_MD_MKDIR		= 2, /* Mkdir on MDT */
+	ECHO_MD_DESTROY		= 3, /* Unlink file on MDT */
+	ECHO_MD_RMDIR		= 4, /* Rmdir on MDT */
+	ECHO_MD_LOOKUP		= 5, /* Lookup on MDT */
+	ECHO_MD_GETATTR		= 6, /* Getattr on MDT */
+	ECHO_MD_SETATTR		= 7, /* Setattr on MDT */
+	ECHO_MD_ALLOC_FID	= 8, /* Get FIDs from MDT */
+};
+
+#define OBD_DEV_ID 1
+#define OBD_DEV_NAME "obd"
+#define OBD_DEV_PATH "/dev/" OBD_DEV_NAME
+#define OBD_DEV_MAJOR 10
+#define OBD_DEV_MINOR 241
+
+#define OBD_IOCTL_VERSION	0x00010004
+#define OBD_DEV_BY_DEVNAME	0xffffd0de
+#define OBD_MAX_IOCTL_BUFFER	CONFIG_LUSTRE_OBD_MAX_IOCTL_BUFFER
+
+struct obd_ioctl_data {
+	__u32		ioc_len;
+	__u32		ioc_version;
+
+	union {
+		__u64	ioc_cookie;
+		__u64	ioc_u64_1;
+	};
+	union {
+		__u32	ioc_conn1;
+		__u32	ioc_u32_1;
+	};
+	union {
+		__u32	ioc_conn2;
+		__u32	ioc_u32_2;
+	};
+
+	struct obdo	ioc_obdo1;
+	struct obdo	ioc_obdo2;
+
+	__u64		ioc_count;
+	__u64		ioc_offset;
+	__u32		ioc_dev;
+	__u32		ioc_command;
+
+	__u64		ioc_nid;
+	__u32		ioc_nal;
+	__u32		ioc_type;
+
+	/* buffers the kernel will treat as user pointers */
+	__u32		ioc_plen1;
+	char __user    *ioc_pbuf1;
+	__u32		ioc_plen2;
+	char __user    *ioc_pbuf2;
+
+	/* inline buffers for various arguments */
+	__u32		ioc_inllen1;
+	char	       *ioc_inlbuf1;
+	__u32		ioc_inllen2;
+	char	       *ioc_inlbuf2;
+	__u32		ioc_inllen3;
+	char	       *ioc_inlbuf3;
+	__u32		ioc_inllen4;
+	char	       *ioc_inlbuf4;
+
+	char		ioc_bulk[0];
+};
+
+struct obd_ioctl_hdr {
+	__u32		ioc_len;
+	__u32		ioc_version;
+};
+
+static inline __u32 obd_ioctl_packlen(struct obd_ioctl_data *data)
+{
+	__u32 len = cfs_size_round(sizeof(*data));
+
+	len += cfs_size_round(data->ioc_inllen1);
+	len += cfs_size_round(data->ioc_inllen2);
+	len += cfs_size_round(data->ioc_inllen3);
+	len += cfs_size_round(data->ioc_inllen4);
+
+	return len;
+}
+
+static inline int obd_ioctl_is_invalid(struct obd_ioctl_data *data)
+{
+	if (data->ioc_len > (1 << 30)) {
+		CERROR("OBD ioctl: ioc_len larger than 1<<30\n");
+		return 1;
+	}
+
+	if (data->ioc_inllen1 > (1 << 30)) {
+		CERROR("OBD ioctl: ioc_inllen1 larger than 1<<30\n");
+		return 1;
+	}
+
+	if (data->ioc_inllen2 > (1 << 30)) {
+		CERROR("OBD ioctl: ioc_inllen2 larger than 1<<30\n");
+		return 1;
+	}
+
+	if (data->ioc_inllen3 > (1 << 30)) {
+		CERROR("OBD ioctl: ioc_inllen3 larger than 1<<30\n");
+		return 1;
+	}
+
+	if (data->ioc_inllen4 > (1 << 30)) {
+		CERROR("OBD ioctl: ioc_inllen4 larger than 1<<30\n");
+		return 1;
+	}
+
+	if (data->ioc_inlbuf1 && !data->ioc_inllen1) {
+		CERROR("OBD ioctl: inlbuf1 pointer but 0 length\n");
+		return 1;
+	}
+
+	if (data->ioc_inlbuf2 && !data->ioc_inllen2) {
+		CERROR("OBD ioctl: inlbuf2 pointer but 0 length\n");
+		return 1;
+	}
+
+	if (data->ioc_inlbuf3 && !data->ioc_inllen3) {
+		CERROR("OBD ioctl: inlbuf3 pointer but 0 length\n");
+		return 1;
+	}
+
+	if (data->ioc_inlbuf4 && !data->ioc_inllen4) {
+		CERROR("OBD ioctl: inlbuf4 pointer but 0 length\n");
+		return 1;
+	}
+
+	if (data->ioc_pbuf1 && !data->ioc_plen1) {
+		CERROR("OBD ioctl: pbuf1 pointer but 0 length\n");
+		return 1;
+	}
+
+	if (data->ioc_pbuf2 && !data->ioc_plen2) {
+		CERROR("OBD ioctl: pbuf2 pointer but 0 length\n");
+		return 1;
+	}
+
+	if (!data->ioc_pbuf1 && data->ioc_plen1) {
+		CERROR("OBD ioctl: plen1 set but NULL pointer\n");
+		return 1;
+	}
+
+	if (!data->ioc_pbuf2 && data->ioc_plen2) {
+		CERROR("OBD ioctl: plen2 set but NULL pointer\n");
+		return 1;
+	}
+
+	if (obd_ioctl_packlen(data) > data->ioc_len) {
+		CERROR("OBD ioctl: packlen exceeds ioc_len (%u > %u)\n",
+		       obd_ioctl_packlen(data), data->ioc_len);
+		return 1;
+	}
+
+	return 0;
+}
+
+#ifdef __KERNEL__
+
+int obd_ioctl_getdata(char **buf, int *len, void __user *arg);
+int obd_ioctl_popdata(void __user *arg, void *data, int len);
+
+static inline void obd_ioctl_freedata(char *buf, size_t len)
+{
+	kvfree(buf);
+}
+
+#else /* __KERNEL__ */
+
+static inline int obd_ioctl_pack(struct obd_ioctl_data *data, char **pbuf,
+				 int max_len)
+{
+	char *ptr;
+	struct obd_ioctl_data *overlay;
+
+	data->ioc_len = obd_ioctl_packlen(data);
+	data->ioc_version = OBD_IOCTL_VERSION;
+
+	if (*pbuf && data->ioc_len > max_len) {
+		fprintf(stderr, "pbuf = %p, ioc_len = %u, max_len = %d\n",
+			*pbuf, data->ioc_len, max_len);
+		return -EINVAL;
+	}
+
+	if (!*pbuf)
+		*pbuf = malloc(data->ioc_len);
+
+	if (!*pbuf)
+		return -ENOMEM;
+
+	overlay = (struct obd_ioctl_data *)*pbuf;
+	memcpy(*pbuf, data, sizeof(*data));
+
+	ptr = overlay->ioc_bulk;
+	if (data->ioc_inlbuf1)
+		LOGL(data->ioc_inlbuf1, data->ioc_inllen1, ptr);
+
+	if (data->ioc_inlbuf2)
+		LOGL(data->ioc_inlbuf2, data->ioc_inllen2, ptr);
+
+	if (data->ioc_inlbuf3)
+		LOGL(data->ioc_inlbuf3, data->ioc_inllen3, ptr);
+
+	if (data->ioc_inlbuf4)
+		LOGL(data->ioc_inlbuf4, data->ioc_inllen4, ptr);
+
+	if (obd_ioctl_is_invalid(overlay)) {
+		fprintf(stderr, "invalid ioctl data: ioc_len = %u, max_len = %d\n",
+			data->ioc_len, max_len);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static inline int
+obd_ioctl_unpack(struct obd_ioctl_data *data, char *pbuf, int max_len)
+{
+	char *ptr;
+	struct obd_ioctl_data *overlay;
+
+	if (!pbuf)
+		return 1;
+
+	overlay = (struct obd_ioctl_data *)pbuf;
+
+	/* Preserve the caller's buffer pointers */
+	overlay->ioc_inlbuf1 = data->ioc_inlbuf1;
+	overlay->ioc_inlbuf2 = data->ioc_inlbuf2;
+	overlay->ioc_inlbuf3 = data->ioc_inlbuf3;
+	overlay->ioc_inlbuf4 = data->ioc_inlbuf4;
+
+	memcpy(data, pbuf, sizeof(*data));
+
+	ptr = overlay->ioc_bulk;
+	if (data->ioc_inlbuf1)
+		LOGU(data->ioc_inlbuf1, data->ioc_inllen1, ptr);
+
+	if (data->ioc_inlbuf2)
+		LOGU(data->ioc_inlbuf2, data->ioc_inllen2, ptr);
+
+	if (data->ioc_inlbuf3)
+		LOGU(data->ioc_inlbuf3, data->ioc_inllen3, ptr);
+
+	if (data->ioc_inlbuf4)
+		LOGU(data->ioc_inlbuf4, data->ioc_inllen4, ptr);
+
+	return 0;
+}
+
+#endif /* !__KERNEL__ */
+
+/*
+ * OBD_IOC_DATA_TYPE is only for compatibility reasons with older
+ * Linux Lustre user tools. New ioctls should NOT use this macro as
+ * the ioctl "size". Instead the ioctl should get a "size" argument
+ * which is the actual data type used by the ioctl, to ensure the
+ * ioctl interface is versioned correctly.
+ */
+#define OBD_IOC_DATA_TYPE	long
+
+/*	IOC_LDLM_TEST		_IOWR('f', 40, long) */
+/*	IOC_LDLM_DUMP		_IOWR('f', 41, long) */
+/*	IOC_LDLM_REGRESS_START	_IOWR('f', 42, long) */
+/*	IOC_LDLM_REGRESS_STOP	_IOWR('f', 43, long) */
+
+#define OBD_IOC_CREATE		_IOWR('f', 101, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_DESTROY		_IOW('f', 104, OBD_IOC_DATA_TYPE)
+/*	OBD_IOC_PREALLOCATE	_IOWR('f', 105, OBD_IOC_DATA_TYPE) */
+
+#define OBD_IOC_SETATTR		_IOW('f', 107, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_GETATTR		_IOWR('f', 108, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_READ		_IOWR('f', 109, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_WRITE		_IOWR('f', 110, OBD_IOC_DATA_TYPE)
+
+#define OBD_IOC_STATFS		_IOWR('f', 113, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_SYNC		_IOW('f', 114, OBD_IOC_DATA_TYPE)
+/*	OBD_IOC_READ2		_IOWR('f', 115, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_FORMAT		_IOWR('f', 116, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_PARTITION	_IOWR('f', 117, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_COPY		_IOWR('f', 120, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_MIGR		_IOWR('f', 121, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_PUNCH		_IOWR('f', 122, OBD_IOC_DATA_TYPE) */
+
+/*	OBD_IOC_MODULE_DEBUG	_IOWR('f', 124, OBD_IOC_DATA_TYPE) */
+#define OBD_IOC_BRW_READ	_IOWR('f', 125, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_BRW_WRITE	_IOWR('f', 126, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_NAME2DEV	_IOWR('f', 127, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_UUID2DEV	_IOWR('f', 130, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_GETNAME		_IOWR('f', 131, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_GETMDNAME	_IOR('f', 131, char[MAX_OBD_NAME])
+#define OBD_IOC_GETDTNAME	OBD_IOC_GETNAME
+#define OBD_IOC_LOV_GET_CONFIG	_IOWR('f', 132, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_CLIENT_RECOVER	_IOW('f', 133, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_PING_TARGET	_IOW('f', 136, OBD_IOC_DATA_TYPE)
+
+/*	OBD_IOC_DEC_FS_USE_COUNT _IO('f', 139) */
+#define OBD_IOC_NO_TRANSNO	_IOW('f', 140, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_SET_READONLY	_IOW('f', 141, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_ABORT_RECOVERY	_IOR('f', 142, OBD_IOC_DATA_TYPE)
+/*	OBD_IOC_ROOT_SQUASH	_IOWR('f', 143, OBD_IOC_DATA_TYPE) */
+#define OBD_GET_VERSION		_IOWR('f', 144, OBD_IOC_DATA_TYPE)
+/*	OBD_IOC_GSS_SUPPORT	_IOWR('f', 145, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_CLOSE_UUID	_IOWR('f', 147, OBD_IOC_DATA_TYPE) */
+#define OBD_IOC_CHANGELOG_SEND	_IOW('f', 148, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_GETDEVICE	_IOWR('f', 149, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_FID2PATH	_IOWR('f', 150, OBD_IOC_DATA_TYPE)
+/*	lustre/lustre_user.h	151-153 */
+/*	OBD_IOC_LOV_SETSTRIPE	154 LL_IOC_LOV_SETSTRIPE */
+/*	OBD_IOC_LOV_GETSTRIPE	155 LL_IOC_LOV_GETSTRIPE */
+/*	OBD_IOC_LOV_SETEA	156 LL_IOC_LOV_SETEA */
+/*	lustre/lustre_user.h	157-159 */
+#define	OBD_IOC_QUOTACHECK	_IOW('f', 160, int)
+#define	OBD_IOC_POLL_QUOTACHECK	_IOR('f', 161, struct if_quotacheck *)
+#define OBD_IOC_QUOTACTL	_IOWR('f', 162, struct if_quotactl)
+/*	lustre/lustre_user.h	163-176 */
+#define OBD_IOC_CHANGELOG_REG	_IOW('f', 177, struct obd_ioctl_data)
+#define OBD_IOC_CHANGELOG_DEREG	_IOW('f', 178, struct obd_ioctl_data)
+#define OBD_IOC_CHANGELOG_CLEAR	_IOW('f', 179, struct obd_ioctl_data)
+/*	OBD_IOC_RECORD		_IOWR('f', 180, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_ENDRECORD	_IOWR('f', 181, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_PARSE		_IOWR('f', 182, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_DORECORD	_IOWR('f', 183, OBD_IOC_DATA_TYPE) */
+#define OBD_IOC_PROCESS_CFG	_IOWR('f', 184, OBD_IOC_DATA_TYPE)
+/*	OBD_IOC_DUMP_LOG	_IOWR('f', 185, OBD_IOC_DATA_TYPE) */
+/*	OBD_IOC_CLEAR_LOG	_IOWR('f', 186, OBD_IOC_DATA_TYPE) */
+#define OBD_IOC_PARAM		_IOW('f', 187, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_POOL		_IOWR('f', 188, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_REPLACE_NIDS	_IOWR('f', 189, OBD_IOC_DATA_TYPE)
+
+#define OBD_IOC_CATLOGLIST	_IOWR('f', 190, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_LLOG_INFO	_IOWR('f', 191, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_LLOG_PRINT	_IOWR('f', 192, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_LLOG_CANCEL	_IOWR('f', 193, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_LLOG_REMOVE	_IOWR('f', 194, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_LLOG_CHECK	_IOWR('f', 195, OBD_IOC_DATA_TYPE)
+/*	OBD_IOC_LLOG_CATINFO	_IOWR('f', 196, OBD_IOC_DATA_TYPE) */
+#define OBD_IOC_NODEMAP		_IOWR('f', 197, OBD_IOC_DATA_TYPE)
+
+/*	ECHO_IOC_GET_STRIPE	_IOWR('f', 200, OBD_IOC_DATA_TYPE) */
+/*	ECHO_IOC_SET_STRIPE	_IOWR('f', 201, OBD_IOC_DATA_TYPE) */
+/*	ECHO_IOC_ENQUEUE	_IOWR('f', 202, OBD_IOC_DATA_TYPE) */
+/*	ECHO_IOC_CANCEL		_IOWR('f', 203, OBD_IOC_DATA_TYPE) */
+
+#define OBD_IOC_GET_OBJ_VERSION	_IOR('f', 210, OBD_IOC_DATA_TYPE)
+
+/*	lustre/lustre_user.h	212-217 */
+#define OBD_IOC_GET_MNTOPT	_IOW('f', 220, mntopt_t)
+#define OBD_IOC_ECHO_MD		_IOR('f', 221, struct obd_ioctl_data)
+#define OBD_IOC_ECHO_ALLOC_SEQ	_IOWR('f', 222, struct obd_ioctl_data)
+#define OBD_IOC_START_LFSCK	_IOWR('f', 230, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_STOP_LFSCK	_IOW('f', 231, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_QUERY_LFSCK	_IOR('f', 232, struct obd_ioctl_data)
+/*	lustre/lustre_user.h	240-249 */
+/*	LIBCFS_IOC_DEBUG_MASK	250 */
+
+#define IOC_OSC_SET_ACTIVE	_IOWR('h', 21, void *)
+
+#endif /* LUSTRE_IOCTL_H_ */
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index 4746320..75a78a3 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -188,26 +188,20 @@ struct ost_id {
  * *STRIPE* - set/get lov_user_md
  * *INFO    - set/get lov_user_mds_data
  */
-/* see <lustre_lib.h> for ioctl numberss 101-150 */
+/*	lustre_ioctl.h			101-150 */
 #define LL_IOC_GETFLAGS		 _IOR('f', 151, long)
 #define LL_IOC_SETFLAGS		 _IOW('f', 152, long)
 #define LL_IOC_CLRFLAGS		 _IOW('f', 153, long)
-/* LL_IOC_LOV_SETSTRIPE: See also OBD_IOC_LOV_SETSTRIPE */
 #define LL_IOC_LOV_SETSTRIPE	    _IOW('f', 154, long)
-/* LL_IOC_LOV_GETSTRIPE: See also OBD_IOC_LOV_GETSTRIPE */
 #define LL_IOC_LOV_GETSTRIPE	    _IOW('f', 155, long)
-/* LL_IOC_LOV_SETEA: See also OBD_IOC_LOV_SETEA */
 #define LL_IOC_LOV_SETEA		_IOW('f', 156, long)
 #define LL_IOC_RECREATE_OBJ	     _IOW('f', 157, long)
 #define LL_IOC_RECREATE_FID	     _IOW('f', 157, struct lu_fid)
 #define LL_IOC_GROUP_LOCK	       _IOW('f', 158, long)
 #define LL_IOC_GROUP_UNLOCK	     _IOW('f', 159, long)
-/* LL_IOC_QUOTACHECK: See also OBD_IOC_QUOTACHECK */
-#define LL_IOC_QUOTACHECK	       _IOW('f', 160, int)
-/* LL_IOC_POLL_QUOTACHECK: See also OBD_IOC_POLL_QUOTACHECK */
-#define LL_IOC_POLL_QUOTACHECK	  _IOR('f', 161, struct if_quotacheck *)
-/* LL_IOC_QUOTACTL: See also OBD_IOC_QUOTACTL */
-#define LL_IOC_QUOTACTL		 _IOWR('f', 162, struct if_quotactl)
+/* #define LL_IOC_QUOTACHECK		160 OBD_IOC_QUOTACHECK */
+/* #define LL_IOC_POLL_QUOTACHECK	161 OBD_IOC_POLL_QUOTACHECK */
+/* #define LL_IOC_QUOTACTL		162 OBD_IOC_QUOTACTL */
 #define IOC_OBD_STATFS		  _IOWR('f', 164, struct obd_statfs *)
 #define IOC_LOV_GETINFO		 _IOWR('f', 165, struct lov_user_mds_data *)
 #define LL_IOC_FLUSHCTX		 _IOW('f', 166, long)
@@ -221,8 +215,7 @@ struct ost_id {
 #define LL_IOC_GET_CONNECT_FLAGS	_IOWR('f', 174, __u64 *)
 #define LL_IOC_GET_MDTIDX	       _IOR('f', 175, int)
 
-/* see <lustre_lib.h> for ioctl numbers 177-210 */
-
+/*	lustre_ioctl.h			177-210 */
 #define LL_IOC_HSM_STATE_GET		_IOR('f', 211, struct hsm_user_state)
 #define LL_IOC_HSM_STATE_SET		_IOW('f', 212, struct hsm_state_set)
 #define LL_IOC_HSM_CT_START		_IOW('f', 213, struct lustre_kernelcomm)
@@ -255,10 +248,6 @@ struct ost_id {
 #define IOC_MDC_GETFILEINFO     _IOWR(IOC_MDC_TYPE, 22, struct lov_user_mds_data *)
 #define LL_IOC_MDC_GETINFO      _IOWR(IOC_MDC_TYPE, 23, struct lov_user_mds_data *)
 
-/* Keep these for backward compartability. */
-#define LL_IOC_OBD_STATFS       IOC_OBD_STATFS
-#define IOC_MDC_GETSTRIPE       IOC_MDC_GETFILESTRIPE
-
 #define MAX_OBD_NAME 128 /* If this changes, a NEW ioctl must be added */
 
 /* Define O_LOV_DELAY_CREATE to be a mask that is not useful for regular
diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm.h b/drivers/staging/lustre/lustre/include/lustre_dlm.h
index f7805cc..1ec4231 100644
--- a/drivers/staging/lustre/lustre/include/lustre_dlm.h
+++ b/drivers/staging/lustre/lustre/include/lustre_dlm.h
@@ -1282,16 +1282,6 @@ int ldlm_cli_cancel_list(struct list_head *head, int count,
 int intent_disposition(struct ldlm_reply *rep, int flag);
 void intent_set_disposition(struct ldlm_reply *rep, int flag);
 
-/* ioctls for trying requests */
-#define IOC_LDLM_TYPE		   'f'
-#define IOC_LDLM_MIN_NR		 40
-
-#define IOC_LDLM_TEST		   _IOWR('f', 40, long)
-#define IOC_LDLM_DUMP		   _IOWR('f', 41, long)
-#define IOC_LDLM_REGRESS_START	  _IOWR('f', 42, long)
-#define IOC_LDLM_REGRESS_STOP	   _IOWR('f', 43, long)
-#define IOC_LDLM_MAX_NR		 43
-
 /**
  * "Modes" of acquiring lock_res, necessary to tell lockdep that taking more
  * than one lock_res is dead-lock safe.
diff --git a/drivers/staging/lustre/lustre/include/lustre_lib.h b/drivers/staging/lustre/lustre/include/lustre_lib.h
index def0193..adb8c47 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lib.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lib.h
@@ -75,7 +75,6 @@ int do_set_info_async(struct obd_import *imp,
 		      struct ptlrpc_request_set *set);
 
 #define OBD_RECOVERY_MAX_TIME (obd_timeout * 18) /* b13079 */
-#define OBD_MAX_IOCTL_BUFFER CONFIG_LUSTRE_OBD_MAX_IOCTL_BUFFER
 
 void target_send_reply(struct ptlrpc_request *req, int rc, int fail_id);
 
@@ -99,289 +98,6 @@ struct obd_client_handle {
 /* statfs_pack.c */
 void statfs_unpack(struct kstatfs *sfs, struct obd_statfs *osfs);
 
-/*
- * For md echo client
- */
-enum md_echo_cmd {
-	ECHO_MD_CREATE       = 1, /* Open/Create file on MDT */
-	ECHO_MD_MKDIR	= 2, /* Mkdir on MDT */
-	ECHO_MD_DESTROY      = 3, /* Unlink file on MDT */
-	ECHO_MD_RMDIR	= 4, /* Rmdir on MDT */
-	ECHO_MD_LOOKUP       = 5, /* Lookup on MDT */
-	ECHO_MD_GETATTR      = 6, /* Getattr on MDT */
-	ECHO_MD_SETATTR      = 7, /* Setattr on MDT */
-	ECHO_MD_ALLOC_FID    = 8, /* Get FIDs from MDT */
-};
-
-/*
- *   OBD IOCTLS
- */
-#define OBD_IOCTL_VERSION 0x00010004
-
-struct obd_ioctl_data {
-	__u32 ioc_len;
-	__u32 ioc_version;
-
-	union {
-		__u64 ioc_cookie;
-		__u64 ioc_u64_1;
-	};
-	union {
-		__u32 ioc_conn1;
-		__u32 ioc_u32_1;
-	};
-	union {
-		__u32 ioc_conn2;
-		__u32 ioc_u32_2;
-	};
-
-	struct obdo ioc_obdo1;
-	struct obdo ioc_obdo2;
-
-	u64	 ioc_count;
-	u64	 ioc_offset;
-	__u32    ioc_dev;
-	__u32    ioc_command;
-
-	__u64 ioc_nid;
-	__u32 ioc_nal;
-	__u32 ioc_type;
-
-	/* buffers the kernel will treat as user pointers */
-	__u32  ioc_plen1;
-	void __user *ioc_pbuf1;
-	__u32  ioc_plen2;
-	void __user *ioc_pbuf2;
-
-	/* inline buffers for various arguments */
-	__u32  ioc_inllen1;
-	char  *ioc_inlbuf1;
-	__u32  ioc_inllen2;
-	char  *ioc_inlbuf2;
-	__u32  ioc_inllen3;
-	char  *ioc_inlbuf3;
-	__u32  ioc_inllen4;
-	char  *ioc_inlbuf4;
-
-	char    ioc_bulk[0];
-};
-
-struct obd_ioctl_hdr {
-	__u32 ioc_len;
-	__u32 ioc_version;
-};
-
-static inline int obd_ioctl_packlen(struct obd_ioctl_data *data)
-{
-	int len = cfs_size_round(sizeof(struct obd_ioctl_data));
-
-	len += cfs_size_round(data->ioc_inllen1);
-	len += cfs_size_round(data->ioc_inllen2);
-	len += cfs_size_round(data->ioc_inllen3);
-	len += cfs_size_round(data->ioc_inllen4);
-	return len;
-}
-
-static inline int obd_ioctl_is_invalid(struct obd_ioctl_data *data)
-{
-	if (data->ioc_len > OBD_MAX_IOCTL_BUFFER) {
-		CERROR("OBD ioctl: ioc_len larger than %d\n",
-		       OBD_MAX_IOCTL_BUFFER);
-		return 1;
-	}
-	if (data->ioc_inllen1 > OBD_MAX_IOCTL_BUFFER) {
-		CERROR("OBD ioctl: ioc_inllen1 larger than ioc_len\n");
-		return 1;
-	}
-	if (data->ioc_inllen2 > OBD_MAX_IOCTL_BUFFER) {
-		CERROR("OBD ioctl: ioc_inllen2 larger than ioc_len\n");
-		return 1;
-	}
-	if (data->ioc_inllen3 > OBD_MAX_IOCTL_BUFFER) {
-		CERROR("OBD ioctl: ioc_inllen3 larger than ioc_len\n");
-		return 1;
-	}
-	if (data->ioc_inllen4 > OBD_MAX_IOCTL_BUFFER) {
-		CERROR("OBD ioctl: ioc_inllen4 larger than ioc_len\n");
-		return 1;
-	}
-	if (data->ioc_inlbuf1 && !data->ioc_inllen1) {
-		CERROR("OBD ioctl: inlbuf1 pointer but 0 length\n");
-		return 1;
-	}
-	if (data->ioc_inlbuf2 && !data->ioc_inllen2) {
-		CERROR("OBD ioctl: inlbuf2 pointer but 0 length\n");
-		return 1;
-	}
-	if (data->ioc_inlbuf3 && !data->ioc_inllen3) {
-		CERROR("OBD ioctl: inlbuf3 pointer but 0 length\n");
-		return 1;
-	}
-	if (data->ioc_inlbuf4 && !data->ioc_inllen4) {
-		CERROR("OBD ioctl: inlbuf4 pointer but 0 length\n");
-		return 1;
-	}
-	if (data->ioc_pbuf1 && !data->ioc_plen1) {
-		CERROR("OBD ioctl: pbuf1 pointer but 0 length\n");
-		return 1;
-	}
-	if (data->ioc_pbuf2 && !data->ioc_plen2) {
-		CERROR("OBD ioctl: pbuf2 pointer but 0 length\n");
-		return 1;
-	}
-	if (data->ioc_plen1 && !data->ioc_pbuf1) {
-		CERROR("OBD ioctl: plen1 set but NULL pointer\n");
-		return 1;
-	}
-	if (data->ioc_plen2 && !data->ioc_pbuf2) {
-		CERROR("OBD ioctl: plen2 set but NULL pointer\n");
-		return 1;
-	}
-	if (obd_ioctl_packlen(data) > data->ioc_len) {
-		CERROR("OBD ioctl: packlen exceeds ioc_len (%d > %d)\n",
-		       obd_ioctl_packlen(data), data->ioc_len);
-		return 1;
-	}
-	return 0;
-}
-
-#include "obd_support.h"
-
-/* function defined in lustre/obdclass/<platform>/<platform>-module.c */
-int obd_ioctl_getdata(char **buf, int *len, void __user *arg);
-int obd_ioctl_popdata(void __user *arg, void *data, int len);
-
-static inline void obd_ioctl_freedata(char *buf, int len)
-{
-	kvfree(buf);
-	return;
-}
-
-/*
- * BSD ioctl description:
- * #define IOC_V1       _IOR(g, n1, long)
- * #define IOC_V2       _IOW(g, n2, long)
- *
- * ioctl(f, IOC_V1, arg);
- * arg will be treated as a long value,
- *
- * ioctl(f, IOC_V2, arg)
- * arg will be treated as a pointer, bsd will call
- * copyin(buf, arg, sizeof(long))
- *
- * To make BSD ioctl handles argument correctly and simplely,
- * we change _IOR to _IOWR so BSD will copyin obd_ioctl_data
- * for us. Does this change affect Linux?  (XXX Liang)
- */
-#define OBD_IOC_DATA_TYPE long
-
-#define OBD_IOC_CREATE		 _IOWR('f', 101, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_DESTROY		_IOW('f', 104, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PREALLOCATE	    _IOWR('f', 105, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_SETATTR		_IOW('f', 107, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_GETATTR		_IOWR ('f', 108, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_READ		   _IOWR('f', 109, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_WRITE		  _IOWR('f', 110, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_STATFS		 _IOWR('f', 113, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_SYNC		   _IOW('f', 114, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_READ2		  _IOWR('f', 115, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_FORMAT		 _IOWR('f', 116, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PARTITION	      _IOWR('f', 117, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_COPY		   _IOWR('f', 120, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_MIGR		   _IOWR('f', 121, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PUNCH		  _IOWR('f', 122, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_MODULE_DEBUG	   _IOWR('f', 124, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_BRW_READ	       _IOWR('f', 125, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_BRW_WRITE	      _IOWR('f', 126, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_NAME2DEV	       _IOWR('f', 127, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_UUID2DEV	       _IOWR('f', 130, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_GETNAME		_IOWR('f', 131, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_GETMDNAME	      _IOR('f', 131, char[MAX_OBD_NAME])
-#define OBD_IOC_GETDTNAME	       OBD_IOC_GETNAME
-
-#define OBD_IOC_LOV_GET_CONFIG	 _IOWR('f', 132, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_CLIENT_RECOVER	 _IOW('f', 133, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PING_TARGET	    _IOW('f', 136, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_DEC_FS_USE_COUNT       _IO  ('f', 139)
-#define OBD_IOC_NO_TRANSNO	     _IOW('f', 140, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_SET_READONLY	   _IOW('f', 141, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_ABORT_RECOVERY	 _IOR('f', 142, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_ROOT_SQUASH	    _IOWR('f', 143, OBD_IOC_DATA_TYPE)
-
-#define OBD_GET_VERSION		_IOWR ('f', 144, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_GSS_SUPPORT	    _IOWR('f', 145, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_CLOSE_UUID	     _IOWR ('f', 147, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_CHANGELOG_SEND	 _IOW('f', 148, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_GETDEVICE	      _IOWR ('f', 149, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_FID2PATH	       _IOWR ('f', 150, OBD_IOC_DATA_TYPE)
-/* see also <lustre/lustre_user.h> for ioctls 151-153 */
-/* OBD_IOC_LOV_SETSTRIPE: See also LL_IOC_LOV_SETSTRIPE */
-#define OBD_IOC_LOV_SETSTRIPE	  _IOW('f', 154, OBD_IOC_DATA_TYPE)
-/* OBD_IOC_LOV_GETSTRIPE: See also LL_IOC_LOV_GETSTRIPE */
-#define OBD_IOC_LOV_GETSTRIPE	  _IOW('f', 155, OBD_IOC_DATA_TYPE)
-/* OBD_IOC_LOV_SETEA: See also LL_IOC_LOV_SETEA */
-#define OBD_IOC_LOV_SETEA	      _IOW('f', 156, OBD_IOC_DATA_TYPE)
-/* see <lustre/lustre_user.h> for ioctls 157-159 */
-/* OBD_IOC_QUOTACHECK: See also LL_IOC_QUOTACHECK */
-#define OBD_IOC_QUOTACHECK	     _IOW('f', 160, int)
-/* OBD_IOC_POLL_QUOTACHECK: See also LL_IOC_POLL_QUOTACHECK */
-#define OBD_IOC_POLL_QUOTACHECK	_IOR('f', 161, struct if_quotacheck *)
-/* OBD_IOC_QUOTACTL: See also LL_IOC_QUOTACTL */
-#define OBD_IOC_QUOTACTL	       _IOWR('f', 162, struct if_quotactl)
-/* see  also <lustre/lustre_user.h> for ioctls 163-176 */
-#define OBD_IOC_CHANGELOG_REG	  _IOW('f', 177, struct obd_ioctl_data)
-#define OBD_IOC_CHANGELOG_DEREG	_IOW('f', 178, struct obd_ioctl_data)
-#define OBD_IOC_CHANGELOG_CLEAR	_IOW('f', 179, struct obd_ioctl_data)
-#define OBD_IOC_RECORD		 _IOWR('f', 180, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_ENDRECORD	      _IOWR('f', 181, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PARSE		  _IOWR('f', 182, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_DORECORD	       _IOWR('f', 183, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PROCESS_CFG	    _IOWR('f', 184, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_DUMP_LOG	       _IOWR('f', 185, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_CLEAR_LOG	      _IOWR('f', 186, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PARAM		  _IOW('f', 187, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_POOL		   _IOWR('f', 188, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_REPLACE_NIDS	   _IOWR('f', 189, OBD_IOC_DATA_TYPE)
-
-#define OBD_IOC_CATLOGLIST	     _IOWR('f', 190, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_LLOG_INFO	      _IOWR('f', 191, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_LLOG_PRINT	     _IOWR('f', 192, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_LLOG_CANCEL	    _IOWR('f', 193, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_LLOG_REMOVE	    _IOWR('f', 194, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_LLOG_CHECK	     _IOWR('f', 195, OBD_IOC_DATA_TYPE)
-/* OBD_IOC_LLOG_CATINFO is deprecated */
-#define OBD_IOC_LLOG_CATINFO	   _IOWR('f', 196, OBD_IOC_DATA_TYPE)
-
-/*	#define ECHO_IOC_GET_STRIPE    _IOWR('f', 200, OBD_IOC_DATA_TYPE) */
-/*	#define ECHO_IOC_SET_STRIPE    _IOWR('f', 201, OBD_IOC_DATA_TYPE) */
-/*	#define ECHO_IOC_ENQUEUE       _IOWR('f', 202, OBD_IOC_DATA_TYPE) */
-/*	#define ECHO_IOC_CANCEL        _IOWR('f', 203, OBD_IOC_DATA_TYPE) */
-
-#define OBD_IOC_GET_OBJ_VERSION	_IOR('f', 210, OBD_IOC_DATA_TYPE)
-
-/* <lustre/lustre_user.h> defines ioctl number 218-219 */
-#define OBD_IOC_GET_MNTOPT	     _IOW('f', 220, mntopt_t)
-
-#define OBD_IOC_ECHO_MD		_IOR('f', 221, struct obd_ioctl_data)
-#define OBD_IOC_ECHO_ALLOC_SEQ	 _IOWR('f', 222, struct obd_ioctl_data)
-
-#define OBD_IOC_START_LFSCK	       _IOWR('f', 230, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_STOP_LFSCK	       _IOW('f', 231, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PAUSE_LFSCK	       _IOW('f', 232, OBD_IOC_DATA_TYPE)
-
-/* XXX _IOWR('f', 250, long) has been defined in
- * libcfs/include/libcfs/libcfs_private.h for debug, don't use it
- */
-
 /* Until such time as we get_info the per-stripe maximum from the OST,
  * we define this to be 2T - 4k, which is the ext3 maxbytes.
  */
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index cacd472..0dae273 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -35,15 +35,6 @@
 
 #include <linux/spinlock.h>
 
-#define IOC_OSC_TYPE	 'h'
-#define IOC_OSC_MIN_NR       20
-#define IOC_OSC_SET_ACTIVE   _IOWR(IOC_OSC_TYPE, 21, struct obd_device *)
-#define IOC_OSC_MAX_NR       50
-
-#define IOC_MDC_TYPE	 'i'
-#define IOC_MDC_MIN_NR       20
-#define IOC_MDC_MAX_NR       50
-
 #include "lustre/lustre_idl.h"
 #include "lustre_lib.h"
 #include "lu_ref.h"
@@ -623,7 +614,6 @@ struct obd_llog_group {
 
 /* corresponds to one of the obd's */
 #define OBD_DEVICE_MAGIC	0XAB5CD6EF
-#define OBD_DEV_BY_DEVNAME      0xffffd0de
 
 struct lvfs_run_ctxt {
 	struct dt_device *dt;
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 84bec03..257c9a4 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -46,8 +46,8 @@
 
 #include "../include/obd_support.h"
 #include "../include/obd_class.h"
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/lustre_lib.h"
-#include "../include/lustre/lustre_idl.h"
 #include "../include/lustre_lite.h"
 #include "../include/lustre_dlm.h"
 #include "../include/lustre_fid.h"
@@ -1543,7 +1543,7 @@ finish_req:
 
 	case LL_IOC_LOV_SWAP_LAYOUTS:
 		return -EPERM;
-	case LL_IOC_OBD_STATFS:
+	case IOC_OBD_STATFS:
 		return ll_obd_statfs(inode, (void __user *)arg);
 	case LL_IOC_LOV_GETSTRIPE:
 	case LL_IOC_MDC_GETINFO:
@@ -1708,9 +1708,6 @@ free_lmm:
 		kvfree(lmm);
 		return rc;
 	}
-	case OBD_IOC_LLOG_CATINFO: {
-		return -EOPNOTSUPP;
-	}
 	case OBD_IOC_QUOTACHECK: {
 		struct obd_quotactl *oqctl;
 		int error = 0;
@@ -1768,7 +1765,7 @@ out_poll:
 		kfree(check);
 		return rc;
 	}
-	case LL_IOC_QUOTACTL: {
+	case OBD_IOC_QUOTACTL: {
 		struct if_quotactl *qctl;
 
 		qctl = kzalloc(sizeof(*qctl), GFP_NOFS);
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 89e93dc..519db53 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -44,6 +44,7 @@
 #include <linux/mount.h>
 #include "llite_internal.h"
 #include "../include/lustre/ll_fiemap.h"
+#include "../include/lustre/lustre_ioctl.h"
 
 #include "../include/cl_object.h"
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index e320400..111264e 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -41,6 +41,7 @@
 #include <linux/types.h>
 #include <linux/mm.h>
 
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/lustre_lite.h"
 #include "../include/lustre_ha.h"
 #include "../include/lustre_dlm.h"
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index d07fd17..e516a84 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -51,6 +51,7 @@
 #include "../include/cl_object.h"
 #include "../include/lustre_lite.h"
 #include "../include/lustre_fid.h"
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/lustre_kernelcomm.h"
 #include "lmv_internal.h"
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_obd.c b/drivers/staging/lustre/lustre/lov/lov_obd.c
index 9b92d55..d904f44 100644
--- a/drivers/staging/lustre/lustre/lov/lov_obd.c
+++ b/drivers/staging/lustre/lustre/lov/lov_obd.c
@@ -41,6 +41,7 @@
 #include "../../include/linux/libcfs/libcfs.h"
 
 #include "../include/obd_support.h"
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/lustre_lib.h"
 #include "../include/lustre_net.h"
 #include "../include/lustre/lustre_idl.h"
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 558f33b..394ef3c 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -39,6 +39,7 @@
 # include <linux/utsname.h>
 
 #include "../include/lustre_acl.h"
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/obd_class.h"
 #include "../include/lustre_lmv.h"
 #include "../include/lustre_fid.h"
diff --git a/drivers/staging/lustre/lustre/obdclass/class_obd.c b/drivers/staging/lustre/lustre/obdclass/class_obd.c
index d9d2a19..6edf53e 100644
--- a/drivers/staging/lustre/lustre/obdclass/class_obd.c
+++ b/drivers/staging/lustre/lustre/obdclass/class_obd.c
@@ -40,6 +40,7 @@
 #include "../include/lprocfs_status.h"
 #include <linux/list.h>
 #include "../include/cl_object.h"
+#include "../include/lustre/lustre_ioctl.h"
 #include "llog_internal.h"
 
 struct obd_device *obd_devs[MAX_OBD_DEVICES];
@@ -287,13 +288,6 @@ int class_handle_ioctl(unsigned int cmd, unsigned long arg)
 		goto out;
 	}
 
-	case OBD_IOC_CLOSE_UUID: {
-		CDEBUG(D_IOCTL, "closing all connections to uuid %s (NOOP)\n",
-		       data->ioc_inlbuf1);
-		err = 0;
-		goto out;
-	}
-
 	case OBD_IOC_GETDEVICE: {
 		int     index = data->ioc_count;
 		char    *status, *str;
diff --git a/drivers/staging/lustre/lustre/obdclass/linux/linux-module.c b/drivers/staging/lustre/lustre/obdclass/linux/linux-module.c
index 33342bf..27a72d8 100644
--- a/drivers/staging/lustre/lustre/obdclass/linux/linux-module.c
+++ b/drivers/staging/lustre/lustre/obdclass/linux/linux-module.c
@@ -65,6 +65,7 @@
 #include "../../include/obd_support.h"
 #include "../../include/obd_class.h"
 #include "../../include/lprocfs_status.h"
+#include "../../include/lustre/lustre_ioctl.h"
 #include "../../include/lustre_ver.h"
 
 /* buffer MUST be at least the size of obd_ioctl_hdr */
diff --git a/drivers/staging/lustre/lustre/obdclass/obd_config.c b/drivers/staging/lustre/lustre/obdclass/obd_config.c
index 0eab123..6d0890f 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_config.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_config.c
@@ -37,6 +37,7 @@
 #define DEBUG_SUBSYSTEM S_CLASS
 #include "../include/obd_class.h"
 #include <linux/string.h>
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/lustre_log.h"
 #include "../include/lprocfs_status.h"
 #include "../include/lustre_param.h"
diff --git a/drivers/staging/lustre/lustre/obdecho/echo_client.c b/drivers/staging/lustre/lustre/obdecho/echo_client.c
index 5b29c4a..2cb487b 100644
--- a/drivers/staging/lustre/lustre/obdecho/echo_client.c
+++ b/drivers/staging/lustre/lustre/obdecho/echo_client.c
@@ -41,6 +41,7 @@
 #include "../include/cl_object.h"
 #include "../include/lustre_fid.h"
 #include "../include/lustre_acl.h"
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/lustre_net.h"
 
 #include "echo_internal.h"
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index a2d948f..d231827 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -41,6 +41,7 @@
 
 #include "../include/lustre_ha.h"
 #include "../include/lprocfs_status.h"
+#include "../include/lustre/lustre_ioctl.h"
 #include "../include/lustre_debug.h"
 #include "../include/lustre_param.h"
 #include "../include/lustre_fid.h"
-- 
1.7.1


* [PATCH 39/80] staging: lustre: llite: add error handler in inode prepare phase
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Add an error handler during inode initialization, so the inode
becomes a bad inode if something goes wrong during the inode prepare
phase; otherwise the striped directory will not get its layout
and will be mistaken for a normal directory.
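
As a rough illustration of the pattern described above (not the patch
itself), the userspace C sketch below shows a prepare step that returns
an error code instead of void, with the caller flagging the inode as bad
on failure. All names here (sketch_inode, sketch_update_inode,
sketch_prepare_inode) are hypothetical stand-ins, not Lustre or VFS
symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode; not the kernel's struct inode. */
struct sketch_inode {
	bool is_dir;     /* directory that expects a striped layout */
	bool has_layout; /* set once the layout was applied */
	bool bad;        /* set when preparation failed */
};

/*
 * Prepare step: returns 0 on success or a negative errno, so the
 * caller can see the failure (a void function would hide it).
 */
static int sketch_update_inode(struct sketch_inode *inode, bool layout_ok)
{
	assert(inode != NULL);

	if (inode->is_dir && !layout_ok)
		return -EIO; /* layout could not be set up */

	inode->has_layout = inode->is_dir;
	return 0;
}

/* Caller: mark the inode bad instead of using it half-initialized. */
static int sketch_prepare_inode(struct sketch_inode *inode, bool layout_ok)
{
	int rc = sketch_update_inode(inode, layout_ok);

	if (rc)
		inode->bad = true;
	return rc;
}
```

The patch follows the same idea by changing ll_update_inode() and
ll_read_inode2() from void to int, so their callers can react to a
failure rather than silently continuing.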

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4930
Reviewed-on: http://review.whamcloud.com/10170
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../staging/lustre/lustre/llite/llite_internal.h   |    4 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   66 +++++++++++--------
 drivers/staging/lustre/lustre/llite/namei.c        |   59 ++++++++++--------
 3 files changed, 72 insertions(+), 57 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 120aca3..e101dd8 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -766,8 +766,8 @@ int ll_setattr(struct dentry *de, struct iattr *attr);
 int ll_statfs(struct dentry *de, struct kstatfs *sfs);
 int ll_statfs_internal(struct super_block *sb, struct obd_statfs *osfs,
 		       __u64 max_age, __u32 flags);
-void ll_update_inode(struct inode *inode, struct lustre_md *md);
-void ll_read_inode2(struct inode *inode, void *opaque);
+int ll_update_inode(struct inode *inode, struct lustre_md *md);
+int ll_read_inode2(struct inode *inode, void *opaque);
 void ll_delete_inode(struct inode *inode);
 int ll_iocontrol(struct inode *inode, struct file *file,
 		 unsigned int cmd, unsigned long arg);
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 111264e..ea79ca3 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -464,7 +464,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt,
 	md_free_lustre_md(sbi->ll_md_exp, &lmd);
 	ptlrpc_req_finished(request);
 
-	if (!(root)) {
+	if (IS_ERR(root)) {
 		if (lmd.lsm)
 			obd_free_memmd(sbi->ll_dt_exp, &lmd.lsm);
 #ifdef CONFIG_FS_POSIX_ACL
@@ -1109,11 +1109,11 @@ static inline int lli_lsm_md_eq(const struct lmv_stripe_md *lsm_md1,
 		       lsm_md2->lsm_md_pool_name);
 }
 
-static void ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
+static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct lmv_stripe_md *lsm = md->lmv;
-	int idx;
+	int idx, rc;
 
 	LASSERT(S_ISDIR(inode->i_mode));
 	CDEBUG(D_INODE, "update lsm %p of "DFID"\n", lli->lli_lsm_md,
@@ -1122,7 +1122,7 @@ static void ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 	/* no striped information from request. */
 	if (!lsm) {
 		if (!lli->lli_lsm_md) {
-			return;
+			return 0;
 		} else if (lli->lli_lsm_md->lsm_md_magic == LMV_MAGIC_MIGRATE) {
 			/*
 			 * migration is done, the temporay MIGRATE layout has
@@ -1132,27 +1132,22 @@ static void ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 			       PFID(ll_inode2fid(inode)));
 			lmv_free_memmd(lli->lli_lsm_md);
 			lli->lli_lsm_md = NULL;
-			return;
+			return 0;
 		} else {
 			/*
 			 * The lustre_md from req does not include stripeEA,
 			 * see ll_md_setattr
 			 */
-			return;
+			return 0;
 		}
 	}
 
 	/* set the directory layout */
 	if (!lli->lli_lsm_md) {
-		int rc;
-
 		rc = ll_init_lsm_md(inode, md);
-		if (rc) {
-			CERROR("%s: init "DFID" failed: rc = %d\n",
-			       ll_get_fsname(inode->i_sb, NULL, 0),
-			       PFID(&lli->lli_fid), rc);
-			return;
-		}
+		if (rc)
+			return rc;
+
 		lli->lli_lsm_md = lsm;
 		/*
 		 * set lsm_md to NULL, so the following free lustre_md
@@ -1161,7 +1156,7 @@ static void ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 		md->lmv = NULL;
 		CDEBUG(D_INODE, "Set lsm %p magic %x to "DFID"\n", lsm,
 		       lsm->lsm_md_magic, PFID(ll_inode2fid(inode)));
-		return;
+		return 0;
 	}
 
 	/* Compare the old and new stripe information */
@@ -1185,7 +1180,7 @@ static void ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 		       lli->lli_lsm_md->lsm_md_layout_version,
 		       lsm->lsm_md_pool_name,
 		       lli->lli_lsm_md->lsm_md_pool_name);
-		return;
+		return -EIO;
 	}
 
 	for (idx = 0; idx < lli->lli_lsm_md->lsm_md_stripe_count; idx++) {
@@ -1195,12 +1190,13 @@ static void ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 			       ll_get_fsname(inode->i_sb, NULL, 0), idx,
 			       PFID(&lli->lli_lsm_md->lsm_md_oinfo[idx].lmo_fid),
 			       PFID(&lsm->lsm_md_oinfo[idx].lmo_fid));
-			return;
+			return -EIO;
 		}
 	}
 
-	md_update_lsm_md(ll_i2mdexp(inode), ll_i2info(inode)->lli_lsm_md,
-			 md->body, ll_md_blocking_ast);
+	rc = md_update_lsm_md(ll_i2mdexp(inode), ll_i2info(inode)->lli_lsm_md,
+			      md->body, ll_md_blocking_ast);
+	return rc;
 }
 
 void ll_clear_inode(struct inode *inode)
@@ -1252,7 +1248,7 @@ void ll_clear_inode(struct inode *inode)
 
 	if (S_ISDIR(inode->i_mode))
 		ll_dir_clear_lsm_md(inode);
-	else
+	if (S_ISREG(inode->i_mode) && !is_bad_inode(inode))
 		LASSERT(list_empty(&lli->lli_agl_list));
 
 	/*
@@ -1320,7 +1316,7 @@ static int ll_md_setattr(struct dentry *dentry, struct md_op_data *op_data,
 	op_data->op_handle = md.body->handle;
 	op_data->op_ioepoch = md.body->ioepoch;
 
-	ll_update_inode(inode, &md);
+	rc = ll_update_inode(inode, &md);
 	ptlrpc_req_finished(request);
 
 	return rc;
@@ -1679,7 +1675,7 @@ void ll_inode_size_unlock(struct inode *inode)
 	mutex_unlock(&lli->lli_size_mutex);
 }
 
-void ll_update_inode(struct inode *inode, struct lustre_md *md)
+int ll_update_inode(struct inode *inode, struct lustre_md *md)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct mdt_body *body = md->body;
@@ -1697,8 +1693,13 @@ void ll_update_inode(struct inode *inode, struct lustre_md *md)
 			lli->lli_maxbytes = MAX_LFS_FILESIZE;
 	}
 
-	if (S_ISDIR(inode->i_mode))
-		ll_update_lsm_md(inode, md);
+	if (S_ISDIR(inode->i_mode)) {
+		int rc;
+
+		rc = ll_update_lsm_md(inode, md);
+		if (rc)
+			return rc;
+	}
 
 #ifdef CONFIG_FS_POSIX_ACL
 	if (body->valid & OBD_MD_FLACL) {
@@ -1819,12 +1820,15 @@ void ll_update_inode(struct inode *inode, struct lustre_md *md)
 		if (body->t_state & MS_RESTORE)
 			lli->lli_flags |= LLIF_FILE_RESTORING;
 	}
+
+	return 0;
 }
 
-void ll_read_inode2(struct inode *inode, void *opaque)
+int ll_read_inode2(struct inode *inode, void *opaque)
 {
 	struct lustre_md *md = opaque;
 	struct ll_inode_info *lli = ll_i2info(inode);
+	int rc;
 
 	CDEBUG(D_VFSTRACE, "VFS Op:inode="DFID"(%p)\n",
 	       PFID(&lli->lli_fid), inode);
@@ -1840,7 +1844,9 @@ void ll_read_inode2(struct inode *inode, void *opaque)
 	LTIME_S(inode->i_atime) = 0;
 	LTIME_S(inode->i_ctime) = 0;
 	inode->i_rdev = 0;
-	ll_update_inode(inode, md);
+	rc = ll_update_inode(inode, md);
+	if (rc)
+		return rc;
 
 	/* OIDEBUG(inode); */
 
@@ -1861,6 +1867,8 @@ void ll_read_inode2(struct inode *inode, void *opaque)
 		init_special_inode(inode, inode->i_mode,
 				   inode->i_rdev);
 	}
+
+	return 0;
 }
 
 void ll_delete_inode(struct inode *inode)
@@ -2127,7 +2135,9 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req,
 		goto cleanup;
 
 	if (*inode) {
-		ll_update_inode(*inode, &md);
+		rc = ll_update_inode(*inode, &md);
+		if (rc)
+			goto out;
 	} else {
 		LASSERT(sb);
 
@@ -2146,7 +2156,7 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req,
 		*inode = ll_iget(sb, cl_fid_build_ino(&md.body->fid1,
 					     sbi->ll_flags & LL_SBI_32BIT_API),
 				 &md);
-		if (!*inode) {
+		if (IS_ERR(*inode)) {
 #ifdef CONFIG_FS_POSIX_ACL
 			if (md.posix_acl) {
 				posix_acl_release(md.posix_acl);
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index f059882..6e11b99 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -96,41 +96,46 @@ static int ll_set_inode(struct inode *inode, void *opaque)
 	return 0;
 }
 
-/*
- * Get an inode by inode number (already instantiated by the intent lookup).
- * Returns inode or NULL
+/**
+ * Get an inode by inode number (@hash), which is already instantiated by
+ * the intent lookup.
  */
 struct inode *ll_iget(struct super_block *sb, ino_t hash,
 		      struct lustre_md *md)
 {
 	struct inode	 *inode;
+	int rc = 0;
 
 	LASSERT(hash != 0);
 	inode = iget5_locked(sb, hash, ll_test_inode, ll_set_inode, md);
-
-	if (inode) {
-		if (inode->i_state & I_NEW) {
-			int rc = 0;
-
-			ll_read_inode2(inode, md);
-			if (S_ISREG(inode->i_mode) &&
-			    !ll_i2info(inode)->lli_clob) {
-				CDEBUG(D_INODE,
-				       "%s: apply lsm %p to inode " DFID ".\n",
-				       ll_get_fsname(sb, NULL, 0), md->lsm,
-				       PFID(ll_inode2fid(inode)));
-				rc = cl_file_inode_init(inode, md);
-			}
-			if (rc != 0) {
-				iget_failed(inode);
-				inode = NULL;
-			} else {
-				unlock_new_inode(inode);
-			}
-		} else if (!(inode->i_state & (I_FREEING | I_CLEAR))) {
-			ll_update_inode(inode, md);
-			CDEBUG(D_VFSTRACE, "got inode: "DFID"(%p)\n",
-			       PFID(&md->body->fid1), inode);
+	if (!inode)
+		return ERR_PTR(-ENOMEM);
+
+	if (inode->i_state & I_NEW) {
+		rc = ll_read_inode2(inode, md);
+		if (!rc && S_ISREG(inode->i_mode) &&
+		    !ll_i2info(inode)->lli_clob) {
+			CDEBUG(D_INODE, "%s: apply lsm %p to inode "DFID"\n",
+			       ll_get_fsname(sb, NULL, 0), md->lsm,
+			       PFID(ll_inode2fid(inode)));
+			rc = cl_file_inode_init(inode, md);
+		}
+		if (rc) {
+			make_bad_inode(inode);
+			unlock_new_inode(inode);
+			iput(inode);
+			inode = ERR_PTR(rc);
+		} else {
+			unlock_new_inode(inode);
+		}
+	} else if (!(inode->i_state & (I_FREEING | I_CLEAR))) {
+		rc = ll_update_inode(inode, md);
+		CDEBUG(D_VFSTRACE, "got inode: "DFID"(%p): rc = %d\n",
+		       PFID(&md->body->fid1), inode, rc);
+		if (rc) {
+			make_bad_inode(inode);
+			iput(inode);
+			inode = ERR_PTR(rc);
 		}
 	}
 	return inode;
-- 
1.7.1

* [PATCH 40/80] staging: lustre: ptlrpc: Early replies need to honor at_max
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Chris Horn,
	James Simmons

From: Chris Horn <hornc@cray.com>

When determining whether an early reply can be sent, the server
calculates the new deadline as an offset from the request arrival
time. However, when actually setting the new deadline, the server
offsets from the current time instead. This can result in deadlines
being extended more than at_max seconds past the request arrival
time. Instead, the server should offset from the arrival time when
updating its request timeout.

When a client receives an early reply, it does not know the server-side
arrival time, so it uses the original sent time as an approximation.

Signed-off-by: Chris Horn <hornc@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4578
Reviewed-on: http://review.whamcloud.com/9100
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Christopher J. Morrone <chris.morrone.llnl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/ptlrpc/client.c  |    8 +++++---
 drivers/staging/lustre/lustre/ptlrpc/import.c  |   11 +++++++----
 drivers/staging/lustre/lustre/ptlrpc/service.c |   18 +++++++++++++-----
 3 files changed, 25 insertions(+), 12 deletions(-)
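
To make the arithmetic in the description concrete, here is a small
userspace sketch of the two ways of computing the deadline. It is an
illustration under simplifying assumptions, not the ptlrpc code. All
ex_* names are hypothetical, the service estimate is modelled as a
plain clamp to [at_min, at_max], and the adaptive-timeout history bins
are ignored.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t ex_time_t;	/* seconds; stand-in for time64_t */

/* Bound a value to [lo, hi], as the at_measured() comment describes. */
static ex_time_t ex_clamp(ex_time_t v, ex_time_t lo, ex_time_t hi)
{
	assert(lo <= hi);
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Service estimate: time spent so far plus at_extra, bounded by at_max. */
static ex_time_t ex_service_estimate(ex_time_t now, ex_time_t arrival,
				     ex_time_t at_extra,
				     ex_time_t at_min, ex_time_t at_max)
{
	return ex_clamp(at_extra + now - arrival, at_min, at_max);
}

/* Before the patch: the server offset the estimate from the current time. */
static ex_time_t ex_deadline_from_now(ex_time_t now, ex_time_t estimate)
{
	return now + estimate;
}

/* After the patch: the server offsets the estimate from the arrival time. */
static ex_time_t ex_deadline_from_arrival(ex_time_t arrival,
					  ex_time_t estimate)
{
	return arrival + estimate;
}

/* Client side after the patch: the sent time approximates the arrival. */
static ex_time_t ex_client_deadline(ex_time_t sent, ex_time_t timeout,
				    ex_time_t net_latency)
{
	return sent + timeout + net_latency;
}
```

With arrival = 1000, now = 1590, at_extra = 30 and at_max = 600, the
estimate is capped at 600. Offsetting from the current time puts the
deadline at 2190, i.e. 1190 seconds after arrival. Offsetting from the
arrival time puts it at 1600, exactly at_max seconds after arrival.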

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index 549c62c..f2e71b4 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -385,10 +385,12 @@ static int ptlrpc_at_recv_early_reply(struct ptlrpc_request *req)
 	spin_lock(&req->rq_lock);
 	olddl = req->rq_deadline;
 	/*
-	 * server assumes it now has rq_timeout from when it sent the
-	 * early reply, so client should give it at least that long.
+	 * server assumes it now has rq_timeout from when the request
+	 * arrived, so the client should give it at least that long.
+	 * since we don't know the arrival time we'll use the original
+	 * sent time
 	 */
-	req->rq_deadline = ktime_get_real_seconds() + req->rq_timeout +
+	req->rq_deadline = req->rq_sent + req->rq_timeout +
 			   ptlrpc_at_get_net_latency(req);
 
 	DEBUG_REQ(D_ADAPTTO, req,
diff --git a/drivers/staging/lustre/lustre/ptlrpc/import.c b/drivers/staging/lustre/lustre/ptlrpc/import.c
index 3292e6e..af8ffbc 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/import.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/import.c
@@ -1497,10 +1497,13 @@ EXPORT_SYMBOL(ptlrpc_disconnect_import);
 /* Adaptive Timeout utils */
 extern unsigned int at_min, at_max, at_history;
 
-/* Bin into timeslices using AT_BINS bins.
- * This gives us a max of the last binlimit*AT_BINS secs without the storage,
- * but still smoothing out a return to normalcy from a slow response.
- * (E.g. remember the maximum latency in each minute of the last 4 minutes.)
+/*
+ * Update at_current with the specified value (bounded by at_min and at_max),
+ * as well as the AT history "bins".
+ *  - Bin into timeslices using AT_BINS bins.
+ *  - This gives us a max of the last at_history seconds without the storage,
+ *    but still smoothing out a return to normalcy from a slow response.
+ *  - (E.g. remember the maximum latency in each minute of the last 4 minutes.)
  */
 int at_measured(struct adaptive_timeout *at, unsigned int val)
 {
diff --git a/drivers/staging/lustre/lustre/ptlrpc/service.c b/drivers/staging/lustre/lustre/ptlrpc/service.c
index 4788c49..30d8b72 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/service.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/service.c
@@ -1005,13 +1005,16 @@ ptlrpc_at_remove_timed(struct ptlrpc_request *req)
 	array->paa_count--;
 }
 
+/*
+ * Attempt to extend the request deadline by sending an early reply to the
+ * client.
+ */
 static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req)
 {
 	struct ptlrpc_service_part *svcpt = req->rq_rqbd->rqbd_svcpt;
 	struct ptlrpc_request *reqcopy;
 	struct lustre_msg *reqmsg;
 	long olddl = req->rq_deadline - ktime_get_real_seconds();
-	time64_t newdl;
 	int rc;
 
 	/* deadline is when the client expects us to reply, margin is the
@@ -1039,8 +1042,13 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req)
 		return -ENOSYS;
 	}
 
-	/* Fake our processing time into the future to ask the clients
-	 * for some extra amount of time
+	/*
+	 * We want to extend the request deadline by at_extra seconds,
+	 * so we set our service estimate to reflect how much time has
+	 * passed since this request arrived plus an additional
+	 * at_extra seconds. The client will calculate the new deadline
+	 * based on this service estimate (plus some additional time to
+	 * account for network latency). See ptlrpc_at_recv_early_reply
 	 */
 	at_measured(&svcpt->scp_at_estimate, at_extra +
 		    ktime_get_real_seconds() - req->rq_arrival_time.tv_sec);
@@ -1056,7 +1064,6 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req)
 			  ktime_get_real_seconds());
 		return -ETIMEDOUT;
 	}
-	newdl = ktime_get_real_seconds() + at_get(&svcpt->scp_at_estimate);
 
 	reqcopy = ptlrpc_request_cache_alloc(GFP_NOFS);
 	if (!reqcopy)
@@ -1110,7 +1117,8 @@ static int ptlrpc_at_send_early_reply(struct ptlrpc_request *req)
 
 	if (!rc) {
 		/* Adjust our own deadline to what we told the client */
-		req->rq_deadline = newdl;
+		req->rq_deadline = req->rq_arrival_time.tv_sec +
+				   at_get(&svcpt->scp_at_estimate);
 		req->rq_early_count++; /* number sent, server side */
 	} else {
 		DEBUG_REQ(D_ERROR, req, "Early reply send failed %d", rc);
-- 
1.7.1

* [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Separate the master stripe from the master object, so that:
1. stripeEA only exists on the master object.
2. Each sub-stripe object is inserted into the master object
as a sub-directory, and it can reach the master object via "..".

This removes the special cases for stripe 0 in LMV and LOD. It
also simplifies LFSCK, i.e. the consistency check becomes easier.

When the master object becomes an orphan, we should
mark all of its sub-stripes as dead objects as well,
otherwise a client might still be able to create files
under these stripes.

A few fixes for the striped directory layout lock:

 1. Stripe 0 should be locked as EX, same as the other stripes.
 2. Acquire the layout for a directory when it is being unlinked.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4690
Reviewed-on: http://review.whamcloud.com/9511
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |   64 +++++++++-----
 .../lustre/lustre/include/lustre/lustre_user.h     |    3 +-
 drivers/staging/lustre/lustre/include/lustre_lmv.h |   25 +++++-
 drivers/staging/lustre/lustre/include/obd.h        |    4 +-
 drivers/staging/lustre/lustre/include/obd_class.h  |    5 +-
 drivers/staging/lustre/lustre/llite/dir.c          |   31 ++-----
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   89 ++++++++++----------
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |   25 +-----
 drivers/staging/lustre/lustre/lmv/lmv_internal.h   |    4 +-
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |   70 ++++++++--------
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |    4 +-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |    2 +-
 drivers/staging/lustre/lustre/mdc/mdc_reint.c      |    6 +-
 drivers/staging/lustre/lustre/mdc/mdc_request.c    |    8 +-
 14 files changed, 174 insertions(+), 166 deletions(-)
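
Note: the lmv_obd.c hunk below changes lmv_name_to_stripe_index() to
mask off the status flags before choosing a hash function, and to
return stripe 0 for a migrating directory. A rough userspace sketch of
that control flow follows. It is an illustration under assumptions, not
the driver code: the sketch_* names are hypothetical, the FNV-1a
constants are the standard published 64-bit ones (assumed to match
lustre_hash_fnv_1a_64), and the byte-sum variant is only my reading of
lmv_hash_all_chars, which this patch does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the lustre_idl.h hunk of this patch. */
#define LMV_HASH_TYPE_ALL_CHARS  1U
#define LMV_HASH_TYPE_FNV_1A_64  2U
#define LMV_HASH_TYPE_MASK       0x0000ffffU
#define LMV_HASH_FLAG_MIGRATION  0x80000000U

/* Standard 64-bit FNV-1a over a byte buffer. */
static uint64_t sketch_fnv_1a_64(const void *buf, size_t size)
{
	const unsigned char *p = buf;
	uint64_t hash = 0xcbf29ce484222325ULL;	/* offset basis */
	size_t i;

	for (i = 0; i < size; i++) {
		hash ^= p[i];
		hash *= 0x100000001b3ULL;	/* FNV prime */
	}
	return hash;
}

/* Map a file name to a stripe index, mirroring the patched control flow. */
static int sketch_name_to_stripe_index(uint32_t lmv_hash_type,
				       unsigned int stripe_count,
				       const char *name, int namelen)
{
	uint32_t hash_type = lmv_hash_type & LMV_HASH_TYPE_MASK;
	unsigned int sum = 0;
	int i;

	assert(namelen > 0);
	if (stripe_count <= 1)
		return 0;

	/* For a migrating object, always start from stripe 0. */
	if (lmv_hash_type & LMV_HASH_FLAG_MIGRATION)
		return 0;

	switch (hash_type) {
	case LMV_HASH_TYPE_ALL_CHARS:
		/* Assumption: sum of the name bytes modulo stripe count. */
		for (i = 0; i < namelen; i++)
			sum += (unsigned char)name[i];
		return (int)(sum % stripe_count);
	case LMV_HASH_TYPE_FNV_1A_64:
		return (int)(sketch_fnv_1a_64(name, (size_t)namelen) %
			     stripe_count);
	default:
		return -EINVAL;
	}
}
```

The flag test comes before the switch on purpose: once the flags are
masked off, a migrating directory still has a valid hash function id,
so the switch alone could not tell it apart from a normal one.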

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 3444add..8736826 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2497,18 +2497,52 @@ struct lmv_desc {
 	struct obd_uuid ld_uuid;
 };
 
-/* lmv structures */
-#define LMV_MAGIC_V1	0x0CD10CD0	/* normal stripe lmv magic */
-#define LMV_USER_MAGIC	0x0CD20CD0	/* default lmv magic*/
-#define LMV_MAGIC_MIGRATE	0x0CD30CD0	/* migrate stripe lmv magic */
-#define LMV_MAGIC	LMV_MAGIC_V1
+/* LMV layout EA, and it will be stored both in master and slave object */
+struct lmv_mds_md_v1 {
+	__u32 lmv_magic;
+	__u32 lmv_stripe_count;
+	__u32 lmv_master_mdt_index;	/* On master object, it is master
+					 * MDT index, on slave object, it
+					 * is stripe index of the slave obj
+					 */
+	__u32 lmv_hash_type;		/* dir stripe policy, i.e. indicate
+					 * which hash function to be used,
+					 * Note: only lower 16 bits is being
+					 * used for now. Higher 16 bits will
+					 * be used to mark the object status,
+					 * for example migrating or dead.
+					 */
+	__u32 lmv_layout_version;	/* Used for directory restriping */
+	__u32 lmv_padding;
+	struct lu_fid lmv_master_fid;	/* The FID of the master object, which
+					 * is the namespace-visible dir FID
+					 */
+	char lmv_pool_name[LOV_MAXPOOLNAME];	/* pool name */
+	struct lu_fid lmv_stripe_fids[0];	/* FIDs for each stripe */
+};
 
+#define LMV_MAGIC_V1	 0x0CD20CD0	/* normal stripe lmv magic */
+#define LMV_MAGIC	 LMV_MAGIC_V1
+
+/* #define LMV_USER_MAGIC 0x0CD30CD0 */
+#define LMV_MAGIC_STRIPE 0x0CD40CD0	/* magic for dir sub_stripe */
+
+/*
+ * Right now only the lower 16 bits of lmv_hash_type are being used,
+ * and the higher part will be the flag to indicate the status of object,
+ * for example the object is being migrated. And the hash function
+ * might be interpreted differently with different flags.
+ */
 enum lmv_hash_type {
 	LMV_HASH_TYPE_ALL_CHARS = 1,
 	LMV_HASH_TYPE_FNV_1A_64 = 2,
-	LMV_HASH_TYPE_MIGRATION = 3,
 };
 
+#define LMV_HASH_TYPE_MASK		0x0000ffff
+
+#define LMV_HASH_FLAG_MIGRATION		0x80000000
+#define LMV_HASH_FLAG_DEAD		0x40000000
+
 #define LMV_HASH_NAME_ALL_CHARS		"all_char"
 #define LMV_HASH_NAME_FNV_1A_64		"fnv_1a_64"
 
@@ -2540,19 +2574,6 @@ static inline __u64 lustre_hash_fnv_1a_64(const void *buf, size_t size)
 	return hash;
 }
 
-struct lmv_mds_md_v1 {
-	__u32 lmv_magic;
-	__u32 lmv_stripe_count;		/* stripe count */
-	__u32 lmv_master_mdt_index;	/* master MDT index */
-	__u32 lmv_hash_type;		/* dir stripe policy, i.e. indicate
-					 * which hash function to be used
-					 */
-	__u32 lmv_layout_version;	/* Used for directory restriping */
-	__u32 lmv_padding;
-	char lmv_pool_name[LOV_MAXPOOLNAME];	/* pool name */
-	struct lu_fid lmv_stripe_fids[0];	/* FIDs for each stripe */
-};
-
 union lmv_mds_md {
 	__u32			lmv_magic;
 	struct lmv_mds_md_v1	lmv_md_v1;
@@ -2566,8 +2587,7 @@ static inline ssize_t lmv_mds_md_size(int stripe_count, unsigned int lmm_magic)
 	ssize_t len = -EINVAL;
 
 	switch (lmm_magic) {
-	case LMV_MAGIC_V1:
-	case LMV_MAGIC_MIGRATE: {
+	case LMV_MAGIC_V1: {
 		struct lmv_mds_md_v1 *lmm1;
 
 		len = sizeof(*lmm1);
@@ -2583,7 +2603,6 @@ static inline int lmv_mds_md_stripe_count_get(const union lmv_mds_md *lmm)
 {
 	switch (le32_to_cpu(lmm->lmv_magic)) {
 	case LMV_MAGIC_V1:
-	case LMV_MAGIC_MIGRATE:
 		return le32_to_cpu(lmm->lmv_md_v1.lmv_stripe_count);
 	case LMV_USER_MAGIC:
 		return le32_to_cpu(lmm->lmv_user_md.lum_stripe_count);
@@ -2599,7 +2618,6 @@ static inline int lmv_mds_md_stripe_count_set(union lmv_mds_md *lmm,
 
 	switch (le32_to_cpu(lmm->lmv_magic)) {
 	case LMV_MAGIC_V1:
-	case LMV_MAGIC_MIGRATE:
 		lmm->lmv_md_v1.lmv_stripe_count = cpu_to_le32(stripe_count);
 		break;
 	case LMV_USER_MAGIC:
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index 75a78a3..4b2553c 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -269,8 +269,7 @@ struct ost_id {
 #define LOV_USER_MAGIC_JOIN_V1 0x0BD20BD0
 #define LOV_USER_MAGIC_V3 0x0BD30BD0
 
-#define LMV_MAGIC_V1      0x0CD10CD0    /*normal stripe lmv magic */
-#define LMV_USER_MAGIC    0x0CD20CD0    /*default lmv magic*/
+#define LMV_USER_MAGIC    0x0CD30CD0    /*default lmv magic*/
 
 #define LOV_PATTERN_RAID0 0x001
 #define LOV_PATTERN_RAID1 0x002
diff --git a/drivers/staging/lustre/lustre/include/lustre_lmv.h b/drivers/staging/lustre/lustre/include/lustre_lmv.h
index feee981..1dd3e92 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lmv.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lmv.h
@@ -48,10 +48,33 @@ struct lmv_stripe_md {
 	__u32	lsm_md_layout_version;
 	__u32	lsm_md_default_count;
 	__u32	lsm_md_default_index;
+	struct lu_fid lsm_md_master_fid;
 	char	lsm_md_pool_name[LOV_MAXPOOLNAME];
 	struct lmv_oinfo lsm_md_oinfo[0];
 };
 
+static inline bool
+lsm_md_eq(const struct lmv_stripe_md *lsm1, const struct lmv_stripe_md *lsm2)
+{
+	int idx;
+
+	if (lsm1->lsm_md_magic != lsm2->lsm_md_magic ||
+	    lsm1->lsm_md_stripe_count != lsm2->lsm_md_stripe_count ||
+	    lsm1->lsm_md_master_mdt_index != lsm2->lsm_md_master_mdt_index ||
+	    lsm1->lsm_md_hash_type != lsm2->lsm_md_hash_type ||
+	    lsm1->lsm_md_layout_version != lsm2->lsm_md_layout_version ||
+	    strcmp(lsm1->lsm_md_pool_name, lsm2->lsm_md_pool_name) != 0)
+		return false;
+
+	for (idx = 0; idx < lsm1->lsm_md_stripe_count; idx++) {
+		if (!lu_fid_eq(&lsm1->lsm_md_oinfo[idx].lmo_fid,
+			       &lsm2->lsm_md_oinfo[idx].lmo_fid))
+			return false;
+	}
+
+	return true;
+}
+
 union lmv_mds_md;
 
 int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
@@ -106,7 +129,6 @@ static inline void lmv_cpu_to_le(union lmv_mds_md *lmv_dst,
 {
 	switch (lmv_src->lmv_magic) {
 	case LMV_MAGIC_V1:
-	case LMV_MAGIC_MIGRATE:
 		lmv1_cpu_to_le(&lmv_dst->lmv_md_v1, &lmv_src->lmv_md_v1);
 		break;
 	default:
@@ -119,7 +141,6 @@ static inline void lmv_le_to_cpu(union lmv_mds_md *lmv_dst,
 {
 	switch (le32_to_cpu(lmv_src->lmv_magic)) {
 	case LMV_MAGIC_V1:
-	case LMV_MAGIC_MIGRATE:
 		lmv1_le_to_cpu(&lmv_dst->lmv_md_v1, &lmv_src->lmv_md_v1);
 		break;
 	default:
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index 0dae273..52020a9 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -917,8 +917,8 @@ struct obd_ops {
 	int (*fid_fini)(struct obd_device *obd);
 
 	/* Allocate new fid according to passed @hint. */
-	int (*fid_alloc)(struct obd_export *exp, struct lu_fid *fid,
-			 struct md_op_data *op_data);
+	int (*fid_alloc)(const struct lu_env *env, struct obd_export *exp,
+			 struct lu_fid *fid, struct md_op_data *op_data);
 
 	/*
 	 * Object with @fid is getting deleted, we may want to do something
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index de808ee..a288995 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -930,7 +930,8 @@ static inline int obd_fid_fini(struct obd_device *obd)
 	return rc;
 }
 
-static inline int obd_fid_alloc(struct obd_export *exp,
+static inline int obd_fid_alloc(const struct lu_env *env,
+				struct obd_export *exp,
 				struct lu_fid *fid,
 				struct md_op_data *op_data)
 {
@@ -939,7 +940,7 @@ static inline int obd_fid_alloc(struct obd_export *exp,
 	EXP_CHECK_DT_OP(exp, fid_alloc);
 	EXP_COUNTER_INCREMENT(exp, fid_alloc);
 
-	rc = OBP(exp->exp_obd, fid_alloc)(exp, fid, op_data);
+	rc = OBP(exp->exp_obd, fid_alloc)(env, exp, fid, op_data);
 	return rc;
 }
 
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 257c9a4..47fbcd2 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -883,7 +883,6 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size,
 			lustre_swab_lov_user_md_v3((struct lov_user_md_v3 *)lmm);
 		break;
 	case LMV_USER_MAGIC:
-	case LMV_MAGIC_MIGRATE:
 		if (cpu_to_le32(LMV_USER_MAGIC) != LMV_USER_MAGIC)
 			lustre_swab_lmv_user_md((struct lmv_user_md *)lmm);
 		break;
@@ -1471,7 +1470,7 @@ lmv_out_free:
 
 		rc = ll_dir_getstripe(inode, (void **)&lmm, &lmmsize, &request,
 				      valid);
-		if (rc && rc != -ENODATA)
+		if (rc)
 			goto finish_req;
 
 		/* Get default LMV EA */
@@ -1490,14 +1489,7 @@ lmv_out_free:
 			goto finish_req;
 		}
 
-		/* Get normal LMV EA */
-		if (rc == -ENODATA) {
-			stripe_count = 1;
-		} else {
-			LASSERT(lmm);
-			stripe_count = lmv_mds_md_stripe_count_get(lmm);
-		}
-
+		stripe_count = lmv_mds_md_stripe_count_get(lmm);
 		lum_size = lmv_user_md_size(stripe_count, LMV_MAGIC_V1);
 		tmp = kzalloc(lum_size, GFP_NOFS);
 		if (!tmp) {
@@ -1505,28 +1497,25 @@ lmv_out_free:
 			goto finish_req;
 		}
 
-		tmp->lum_magic = LMV_MAGIC_V1;
-		tmp->lum_stripe_count = 1;
 		mdt_index = ll_get_mdt_idx(inode);
 		if (mdt_index < 0) {
 			rc = -ENOMEM;
 			goto out_tmp;
 		}
+		tmp->lum_magic = LMV_MAGIC_V1;
+		tmp->lum_stripe_count = 0;
 		tmp->lum_stripe_offset = mdt_index;
-		tmp->lum_objects[0].lum_mds = mdt_index;
-		tmp->lum_objects[0].lum_fid = *ll_inode2fid(inode);
-		for (i = 1; i < stripe_count; i++) {
-			struct lmv_mds_md_v1 *lmm1;
-
-			lmm1 = &lmm->lmv_md_v1;
-			mdt_index = ll_get_mdt_idx_by_fid(sbi,
-							  &lmm1->lmv_stripe_fids[i]);
+		for (i = 0; i < stripe_count; i++) {
+			struct lu_fid   *fid;
+
+			fid = &lmm->lmv_md_v1.lmv_stripe_fids[i];
+			mdt_index = ll_get_mdt_idx_by_fid(sbi, fid);
 			if (mdt_index < 0) {
 				rc = mdt_index;
 				goto out_tmp;
 			}
 			tmp->lum_objects[i].lum_mds = mdt_index;
-			tmp->lum_objects[i].lum_fid = lmm1->lmv_stripe_fids[i];
+			tmp->lum_objects[i].lum_fid = *fid;
 			tmp->lum_stripe_count++;
 		}
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index ea79ca3..2f6e770 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1042,9 +1042,9 @@ static struct inode *ll_iget_anon_dir(struct super_block *sb,
 		ll_lli_init(lli);
 
 		LASSERT(lsm);
-		/* master stripe FID */
-		lli->lli_pfid = lsm->lsm_md_oinfo[0].lmo_fid;
-		CDEBUG(D_INODE, "lli %p master "DFID" slave "DFID"\n",
+		/* master object FID */
+		lli->lli_pfid = body->fid1;
+		CDEBUG(D_INODE, "lli %p slave "DFID" master "DFID"\n",
 		       lli, PFID(fid), PFID(&lli->lli_pfid));
 		unlock_new_inode(inode);
 	}
@@ -1067,23 +1067,24 @@ static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md)
 	for (i = 0; i < lsm->lsm_md_stripe_count; i++) {
 		fid = &lsm->lsm_md_oinfo[i].lmo_fid;
 		LASSERT(!lsm->lsm_md_oinfo[i].lmo_root);
-		if (!i) {
+		/* Unfortunately ll_iget will call ll_update_inode,
+		 * where the initialization of slave inode is slightly
+		 * different, so it reset lsm_md to NULL to avoid
+		 * initializing lsm for slave inode.
+		 */
+		/* For a migrating inode, the master stripe and master object
+		 * are the same, so we only need to assign this inode.
+		 */
+		if (lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION && !i)
 			lsm->lsm_md_oinfo[i].lmo_root = inode;
-		} else {
-			/*
-			 * Unfortunately ll_iget will call ll_update_inode,
-			 * where the initialization of slave inode is slightly
-			 * different, so it reset lsm_md to NULL to avoid
-			 * initializing lsm for slave inode.
-			 */
+		else
 			lsm->lsm_md_oinfo[i].lmo_root =
 				ll_iget_anon_dir(inode->i_sb, fid, md);
-			if (IS_ERR(lsm->lsm_md_oinfo[i].lmo_root)) {
-				int rc = PTR_ERR(lsm->lsm_md_oinfo[i].lmo_root);
+		if (IS_ERR(lsm->lsm_md_oinfo[i].lmo_root)) {
+			int rc = PTR_ERR(lsm->lsm_md_oinfo[i].lmo_root);
 
-				lsm->lsm_md_oinfo[i].lmo_root = NULL;
-				return rc;
-			}
+			lsm->lsm_md_oinfo[i].lmo_root = NULL;
+			return rc;
 		}
 	}
 
@@ -1113,7 +1114,7 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct lmv_stripe_md *lsm = md->lmv;
-	int idx, rc;
+	int rc;
 
 	LASSERT(S_ISDIR(inode->i_mode));
 	CDEBUG(D_INODE, "update lsm %p of "DFID"\n", lli->lli_lsm_md,
@@ -1123,7 +1124,8 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 	if (!lsm) {
 		if (!lli->lli_lsm_md) {
 			return 0;
-		} else if (lli->lli_lsm_md->lsm_md_magic == LMV_MAGIC_MIGRATE) {
+		} else if (lli->lli_lsm_md->lsm_md_hash_type &
+			   LMV_HASH_FLAG_MIGRATION) {
 			/*
 			 * migration is done, the temporay MIGRATE layout has
 			 * been removed
@@ -1160,43 +1162,40 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 	}
 
 	/* Compare the old and new stripe information */
-	if (!lli_lsm_md_eq(lli->lli_lsm_md, lsm)) {
-		CERROR("inode %p %lu mismatch\n"
-		       "    new(%p)     vs     lli_lsm_md(%p):\n"
-		       "    magic:      %x                   %x\n"
-		       "    count:      %x                   %x\n"
-		       "    master:     %x                   %x\n"
-		       "    hash_type:  %x                   %x\n"
-		       "    layout:     %x                   %x\n"
-		       "    pool:       %s                   %s\n",
-		       inode, inode->i_ino, lsm, lli->lli_lsm_md,
-		       lsm->lsm_md_magic, lli->lli_lsm_md->lsm_md_magic,
+	if (!lsm_md_eq(lli->lli_lsm_md, lsm)) {
+		struct lmv_stripe_md *old_lsm = lli->lli_lsm_md;
+		int idx;
+
+		CERROR("%s: inode "DFID"(%p)'s lmv layout mismatch (%p)/(%p) magic:0x%x/0x%x stripe count: %d/%d master_mdt: %d/%d hash_type:0x%x/0x%x layout: 0x%x/0x%x pool:%s/%s\n",
+		       ll_get_fsname(inode->i_sb, NULL, 0), PFID(&lli->lli_fid),
+		       inode, lsm, old_lsm,
+		       lsm->lsm_md_magic, old_lsm->lsm_md_magic,
 		       lsm->lsm_md_stripe_count,
-		       lli->lli_lsm_md->lsm_md_stripe_count,
+		       old_lsm->lsm_md_stripe_count,
 		       lsm->lsm_md_master_mdt_index,
-		       lli->lli_lsm_md->lsm_md_master_mdt_index,
-		       lsm->lsm_md_hash_type, lli->lli_lsm_md->lsm_md_hash_type,
+		       old_lsm->lsm_md_master_mdt_index,
+		       lsm->lsm_md_hash_type, old_lsm->lsm_md_hash_type,
 		       lsm->lsm_md_layout_version,
-		       lli->lli_lsm_md->lsm_md_layout_version,
+		       old_lsm->lsm_md_layout_version,
 		       lsm->lsm_md_pool_name,
-		       lli->lli_lsm_md->lsm_md_pool_name);
-		return -EIO;
-	}
+		       old_lsm->lsm_md_pool_name);
+
+		for (idx = 0; idx < old_lsm->lsm_md_stripe_count; idx++) {
+			CERROR("%s: sub FIDs in old lsm idx %d, old: "DFID"\n",
+			       ll_get_fsname(inode->i_sb, NULL, 0), idx,
+			       PFID(&old_lsm->lsm_md_oinfo[idx].lmo_fid));
+		}
 
-	for (idx = 0; idx < lli->lli_lsm_md->lsm_md_stripe_count; idx++) {
-		if (!lu_fid_eq(&lli->lli_lsm_md->lsm_md_oinfo[idx].lmo_fid,
-			       &lsm->lsm_md_oinfo[idx].lmo_fid)) {
-			CERROR("%s: FID in lsm mismatch idx %d, old: "DFID" new:"DFID"\n",
+		for (idx = 0; idx < lsm->lsm_md_stripe_count; idx++) {
+			CERROR("%s: sub FIDs in new lsm idx %d, new: "DFID"\n",
 			       ll_get_fsname(inode->i_sb, NULL, 0), idx,
-			       PFID(&lli->lli_lsm_md->lsm_md_oinfo[idx].lmo_fid),
 			       PFID(&lsm->lsm_md_oinfo[idx].lmo_fid));
-			return -EIO;
 		}
+
+		return -EIO;
 	}
 
-	rc = md_update_lsm_md(ll_i2mdexp(inode), ll_i2info(inode)->lli_lsm_md,
-			      md->body, ll_md_blocking_ast);
-	return rc;
+	return 0;
 }
 
 void ll_clear_inode(struct inode *inode)
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index d7e165f..7f81e78 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -173,9 +173,6 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
 	 * revalidate slaves has some problems, temporarily return,
 	 * we may not need that
 	 */
-	if (lsm->lsm_md_stripe_count <= 1)
-		return 0;
-
 	op_data = kzalloc(sizeof(*op_data), GFP_NOFS);
 	if (!op_data)
 		return -ENOMEM;
@@ -194,14 +191,6 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
 
 		fid = lsm->lsm_md_oinfo[i].lmo_fid;
 		inode = lsm->lsm_md_oinfo[i].lmo_root;
-		if (!i) {
-			if (mbody) {
-				body = mbody;
-				goto update;
-			} else {
-				goto release_lock;
-			}
-		}
 
 		/*
 		 * Prepare op_data for revalidating. Note that @fid2 shluld be
@@ -237,7 +226,7 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
 			body = req_capsule_server_get(&req->rq_pill,
 						      &RMF_MDT_BODY);
 			LASSERT(body);
-update:
+
 			if (unlikely(body->nlink < 2)) {
 				CERROR("%s: nlink %d < 2 corrupt stripe %d "DFID":" DFID"\n",
 				       obd->obd_name, body->nlink, i,
@@ -256,10 +245,6 @@ update:
 				goto cleanup;
 			}
 
-			if (i)
-				md_set_lock_data(tgt->ltd_exp, &lockh->cookie,
-						 inode, NULL);
-
 			i_size_write(inode, body->size);
 			set_nlink(inode, body->nlink);
 			LTIME_S(inode->i_atime) = body->atime;
@@ -269,8 +254,8 @@ update:
 			if (req)
 				ptlrpc_req_finished(req);
 		}
-release_lock:
-		size += i_size_read(inode);
+
+		md_set_lock_data(tgt->ltd_exp, &lockh->cookie, inode, NULL);
 
 		if (i != 0)
 			nlink += inode->i_nlink - 2;
@@ -361,7 +346,7 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 		 * fid and setup FLD for it.
 		 */
 		op_data->op_fid3 = op_data->op_fid2;
-		rc = lmv_fid_alloc(exp, &op_data->op_fid2, op_data);
+		rc = lmv_fid_alloc(NULL, exp, &op_data->op_fid2, op_data);
 		if (rc != 0)
 			return rc;
 	}
@@ -453,7 +438,7 @@ static int lmv_intent_lookup(struct obd_export *exp,
 		}
 		return rc;
 	} else if (it_disposition(it, DISP_LOOKUP_NEG) && lsm &&
-		   lsm->lsm_md_magic == LMV_MAGIC_MIGRATE) {
+		   lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION) {
 		/*
 		 * For migrating directory, if it can not find the child in
 		 * the source directory(master stripe), try the targeting
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_internal.h b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
index ed02927..dbd1da6 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_internal.h
+++ b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
@@ -52,8 +52,8 @@ int lmv_intent_lock(struct obd_export *exp, struct md_op_data *op_data,
 
 int lmv_fld_lookup(struct lmv_obd *lmv, const struct lu_fid *fid, u32 *mds);
 int __lmv_fid_alloc(struct lmv_obd *lmv, struct lu_fid *fid, u32 mds);
-int lmv_fid_alloc(struct obd_export *exp, struct lu_fid *fid,
-		  struct md_op_data *op_data);
+int lmv_fid_alloc(const struct lu_env *env, struct obd_export *exp,
+		  struct lu_fid *fid, struct md_op_data *op_data);
 
 int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 		  const union lmv_mds_md *lmm, int stripe_count);
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index e516a84..03594f0 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -80,41 +80,35 @@ lmv_hash_fnv1a(unsigned int count, const char *name, int namelen)
 	return do_div(hash, count);
 }
 
-int lmv_name_to_stripe_index(enum lmv_hash_type hashtype,
-			     unsigned int max_mdt_index,
+int lmv_name_to_stripe_index(__u32 lmv_hash_type, unsigned int stripe_count,
 			     const char *name, int namelen)
 {
+	__u32 hash_type = lmv_hash_type & LMV_HASH_TYPE_MASK;
 	int idx;
 
 	LASSERT(namelen > 0);
-	if (max_mdt_index <= 1)
+	if (stripe_count <= 1)
 		return 0;
 
-	switch (hashtype) {
+	/* for migrating object, always start from 0 stripe */
+	if (lmv_hash_type & LMV_HASH_FLAG_MIGRATION)
+		return 0;
+
+	switch (hash_type) {
 	case LMV_HASH_TYPE_ALL_CHARS:
-		idx = lmv_hash_all_chars(max_mdt_index, name, namelen);
+		idx = lmv_hash_all_chars(stripe_count, name, namelen);
 		break;
 	case LMV_HASH_TYPE_FNV_1A_64:
-		idx = lmv_hash_fnv1a(max_mdt_index, name, namelen);
+		idx = lmv_hash_fnv1a(stripe_count, name, namelen);
 		break;
-	/*
-	 * LMV_HASH_TYPE_MIGRATION means the file is being migrated,
-	 * and the file should be accessed by client, except for
-	 * lookup(see lmv_intent_lookup), return -EACCES here
-	 */
-	case LMV_HASH_TYPE_MIGRATION:
-		CERROR("%.*s is being migrated: rc = %d\n", namelen,
-		       name, -EACCES);
-		return -EACCES;
 	default:
-		CERROR("Unknown hash type 0x%x\n", hashtype);
+		CERROR("Unknown hash type 0x%x\n", hash_type);
 		return -EINVAL;
 	}
 
 	CDEBUG(D_INFO, "name %.*s hash_type %d idx %d\n", namelen, name,
-	       hashtype, idx);
+	       hash_type, idx);
 
-	LASSERT(idx < max_mdt_index);
 	return idx;
 }
 
@@ -1287,7 +1281,7 @@ int __lmv_fid_alloc(struct lmv_obd *lmv, struct lu_fid *fid, u32 mds)
 	/*
 	 * Asking underlaying tgt layer to allocate new fid.
 	 */
-	rc = obd_fid_alloc(tgt->ltd_exp, fid, NULL);
+	rc = obd_fid_alloc(NULL, tgt->ltd_exp, fid, NULL);
 	if (rc > 0) {
 		LASSERT(fid_is_sane(fid));
 		rc = 0;
@@ -1298,8 +1292,8 @@ out:
 	return rc;
 }
 
-int lmv_fid_alloc(struct obd_export *exp, struct lu_fid *fid,
-		  struct md_op_data *op_data)
+int lmv_fid_alloc(const struct lu_env *env, struct obd_export *exp,
+		  struct lu_fid *fid, struct md_op_data *op_data)
 {
 	struct obd_device     *obd = class_exp2obd(exp);
 	struct lmv_obd	*lmv = &obd->u.lmv;
@@ -1695,9 +1689,7 @@ struct lmv_tgt_desc
 	struct lmv_stripe_md *lsm = op_data->op_mea1;
 	struct lmv_tgt_desc *tgt;
 
-	if (!lsm || lsm->lsm_md_stripe_count <= 1 ||
-	    !op_data->op_namelen ||
-	    lsm->lsm_md_magic == LMV_MAGIC_MIGRATE) {
+	if (!lsm || !op_data->op_namelen) {
 		tgt = lmv_find_target(lmv, fid);
 		if (IS_ERR(tgt))
 			return tgt;
@@ -1737,7 +1729,7 @@ static int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 	       op_data->op_namelen, op_data->op_name, PFID(&op_data->op_fid1),
 	       op_data->op_mds);
 
-	rc = lmv_fid_alloc(exp, &op_data->op_fid2, op_data);
+	rc = lmv_fid_alloc(NULL, exp, &op_data->op_fid2, op_data);
 	if (rc)
 		return rc;
 
@@ -2060,7 +2052,7 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data,
 	if (op_data->op_cli_flags & CLI_MIGRATE) {
 		LASSERTF(fid_is_sane(&op_data->op_fid3), "invalid FID "DFID"\n",
 			 PFID(&op_data->op_fid3));
-		rc = lmv_fid_alloc(exp, &op_data->op_fid2, op_data);
+		rc = lmv_fid_alloc(NULL, exp, &op_data->op_fid2, op_data);
 		if (rc)
 			return rc;
 		src_tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid3);
@@ -2365,8 +2357,7 @@ retry:
 			return PTR_ERR(tgt);
 
 		/* For striped dir, we need to locate the parent as well */
-		if (op_data->op_mea1 &&
-		    op_data->op_mea1->lsm_md_stripe_count > 1) {
+		if (op_data->op_mea1) {
 			struct lmv_tgt_desc *tmp;
 
 			LASSERT(op_data->op_name && op_data->op_namelen);
@@ -2679,9 +2670,13 @@ static int lmv_unpack_md_v1(struct obd_export *exp, struct lmv_stripe_md *lsm,
 	lsm->lsm_md_master_mdt_index = le32_to_cpu(lmm1->lmv_master_mdt_index);
 	lsm->lsm_md_hash_type = le32_to_cpu(lmm1->lmv_hash_type);
 	lsm->lsm_md_layout_version = le32_to_cpu(lmm1->lmv_layout_version);
+	fid_le_to_cpu(&lsm->lsm_md_master_fid, &lmm1->lmv_master_fid);
 	cplen = strlcpy(lsm->lsm_md_pool_name, lmm1->lmv_pool_name,
 			sizeof(lsm->lsm_md_pool_name));
 
+	if (!fid_is_sane(&lsm->lsm_md_master_fid))
+		return -EPROTO;
+
 	if (cplen >= sizeof(lsm->lsm_md_pool_name))
 		return -E2BIG;
 
@@ -2719,7 +2714,13 @@ int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 		int i;
 
 		for (i = 1; i < lsm->lsm_md_stripe_count; i++) {
-			if (lsm->lsm_md_oinfo[i].lmo_root)
+			/*
+			 * For migrating inode, the master stripe and master
+			 * object will be the same, so do not need iput, see
+			 * ll_update_lsm_md
+			 */
+			if (!(lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION &&
+			      !i) && lsm->lsm_md_oinfo[i].lmo_root)
 				iput(lsm->lsm_md_oinfo[i].lmo_root);
 		}
 
@@ -2739,9 +2740,11 @@ int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 		return 0;
 	}
 
+	if (le32_to_cpu(lmm->lmv_magic) == LMV_MAGIC_STRIPE)
+		return -EPERM;
+
 	/* Unpack memmd */
 	if (le32_to_cpu(lmm->lmv_magic) != LMV_MAGIC_V1 &&
-	    le32_to_cpu(lmm->lmv_magic) != LMV_MAGIC_MIGRATE &&
 	    le32_to_cpu(lmm->lmv_magic) != LMV_USER_MAGIC) {
 		CERROR("%s: invalid lmv magic %x: rc = %d\n",
 		       exp->exp_obd->obd_name, le32_to_cpu(lmm->lmv_magic),
@@ -2749,8 +2752,7 @@ int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 		return -EIO;
 	}
 
-	if (le32_to_cpu(lmm->lmv_magic) == LMV_MAGIC_V1 ||
-	    le32_to_cpu(lmm->lmv_magic) == LMV_MAGIC_MIGRATE)
+	if (le32_to_cpu(lmm->lmv_magic) == LMV_MAGIC_V1)
 		lsm_size = lmv_stripe_md_size(lmv_mds_md_stripe_count_get(lmm));
 	else
 		/**
@@ -2769,7 +2771,6 @@ int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 
 	switch (le32_to_cpu(lmm->lmv_magic)) {
 	case LMV_MAGIC_V1:
-	case LMV_MAGIC_MIGRATE:
 		rc = lmv_unpack_md_v1(exp, lsm, &lmm->lmv_md_v1);
 		break;
 	default:
@@ -3067,9 +3068,6 @@ static int lmv_quotacheck(struct obd_device *unused, struct obd_export *exp,
 int lmv_update_lsm_md(struct obd_export *exp, struct lmv_stripe_md *lsm,
 		      struct mdt_body *body, ldlm_blocking_callback cb_blocking)
 {
-	if (lsm->lsm_md_stripe_count <= 1)
-		return 0;
-
 	return lmv_revalidate_slaves(exp, body, lsm, cb_blocking, 0);
 }
 
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_internal.h b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
index 53b4063..00e8435 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_internal.h
+++ b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
@@ -87,8 +87,8 @@ int mdc_resource_get_unused(struct obd_export *exp, const struct lu_fid *fid,
 			    struct list_head *cancels, enum ldlm_mode  mode,
 			    __u64 bits);
 /* mdc/mdc_request.c */
-int mdc_fid_alloc(struct obd_export *exp, struct lu_fid *fid,
-		  struct md_op_data *op_data);
+int mdc_fid_alloc(const struct lu_env *env, struct obd_export *exp,
+		  struct lu_fid *fid, struct md_op_data *op_data);
 struct obd_client_handle;
 
 int mdc_set_open_replay_data(struct obd_export *exp,
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index d8406d5..20b15f6 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -1144,7 +1144,7 @@ int mdc_intent_lock(struct obd_export *exp, struct md_op_data *op_data,
 
 	/* For case if upper layer did not alloc fid, do it now. */
 	if (!fid_is_sane(&op_data->op_fid2) && it->it_op & IT_CREAT) {
-		rc = mdc_fid_alloc(exp, &op_data->op_fid2, op_data);
+		rc = mdc_fid_alloc(NULL, exp, &op_data->op_fid2, op_data);
 		if (rc < 0) {
 			CERROR("Can't alloc new fid, rc %d\n", rc);
 			return rc;
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_reint.c b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
index 5dba2c8..c3781a6 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_reint.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
@@ -214,11 +214,9 @@ int mdc_create(struct obd_export *exp, struct md_op_data *op_data,
 		 * mdc_fid_alloc() may return errno 1 in case of switch to new
 		 * sequence, handle this.
 		 */
-		rc = mdc_fid_alloc(exp, &op_data->op_fid2, op_data);
-		if (rc < 0) {
-			CERROR("Can't alloc new fid, rc %d\n", rc);
+		rc = mdc_fid_alloc(NULL, exp, &op_data->op_fid2, op_data);
+		if (rc < 0)
 			return rc;
-		}
 	}
 
 rebuild:
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 394ef3c..e26d0d7 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -765,7 +765,7 @@ static int mdc_close(struct obd_export *exp, struct md_op_data *op_data,
 		req_fmt = &RQF_MDS_RELEASE_CLOSE;
 
 		/* allocate a FID for volatile file */
-		rc = mdc_fid_alloc(exp, &op_data->op_fid2, op_data);
+		rc = mdc_fid_alloc(NULL, exp, &op_data->op_fid2, op_data);
 		if (rc < 0) {
 			CERROR("%s: "DFID" failed to allocate FID: %d\n",
 			       obd->obd_name, PFID(&op_data->op_fid1), rc);
@@ -2203,13 +2203,13 @@ static int mdc_import_event(struct obd_device *obd, struct obd_import *imp,
 	return rc;
 }
 
-int mdc_fid_alloc(struct obd_export *exp, struct lu_fid *fid,
-		  struct md_op_data *op_data)
+int mdc_fid_alloc(const struct lu_env *env, struct obd_export *exp,
+		  struct lu_fid *fid, struct md_op_data *op_data)
 {
 	struct client_obd *cli = &exp->exp_obd->u.cli;
 	struct lu_client_seq *seq = cli->cl_seq;
 
-	return seq_client_alloc_fid(NULL, seq, fid);
+	return seq_client_alloc_fid(env, seq, fid);
 }
 
 static struct obd_uuid *mdc_get_uuid(struct obd_export *exp)
-- 
1.7.1

* [lustre-devel] [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
@ 2016-08-16 20:18   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Separate the master stripe from the master object, so that:
1. The stripe EA only exists on the master object.
2. Each sub-stripe object is inserted into the master object
as a sub-directory, and it can reach the master object via "..".

This removes the special cases for stripe 0 in LMV and LOD.
It also simplifies LFSCK, i.e. consistency checking becomes
easier.

When the master object becomes an orphan, we should
mark all of its sub-stripes as dead objects as well,
otherwise a client might still be able to create files
under these stripes.

A few fixes for the striped directory layout lock:

 1. Stripe 0 should be locked as EX, same as the other stripes.
 2. Acquire the layout for the directory when it is being unlinked.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4690
Reviewed-on: http://review.whamcloud.com/9511
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |   64 +++++++++-----
 .../lustre/lustre/include/lustre/lustre_user.h     |    3 +-
 drivers/staging/lustre/lustre/include/lustre_lmv.h |   25 +++++-
 drivers/staging/lustre/lustre/include/obd.h        |    4 +-
 drivers/staging/lustre/lustre/include/obd_class.h  |    5 +-
 drivers/staging/lustre/lustre/llite/dir.c          |   31 ++-----
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   89 ++++++++++----------
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |   25 +-----
 drivers/staging/lustre/lustre/lmv/lmv_internal.h   |    4 +-
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |   70 ++++++++--------
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |    4 +-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |    2 +-
 drivers/staging/lustre/lustre/mdc/mdc_reint.c      |    6 +-
 drivers/staging/lustre/lustre/mdc/mdc_request.c    |    8 +-
 14 files changed, 174 insertions(+), 166 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 3444add..8736826 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2497,18 +2497,52 @@ struct lmv_desc {
 	struct obd_uuid ld_uuid;
 };
 
-/* lmv structures */
-#define LMV_MAGIC_V1	0x0CD10CD0	/* normal stripe lmv magic */
-#define LMV_USER_MAGIC	0x0CD20CD0	/* default lmv magic*/
-#define LMV_MAGIC_MIGRATE	0x0CD30CD0	/* migrate stripe lmv magic */
-#define LMV_MAGIC	LMV_MAGIC_V1
+/* LMV layout EA, and it will be stored both in master and slave object */
+struct lmv_mds_md_v1 {
+	__u32 lmv_magic;
+	__u32 lmv_stripe_count;
+	__u32 lmv_master_mdt_index;	/* On master object, it is master
+					 * MDT index, on slave object, it
+					 * is stripe index of the slave obj
+					 */
+	__u32 lmv_hash_type;		/* dir stripe policy, i.e. indicate
+					 * which hash function to be used,
+					 * Note: only lower 16 bits is being
+					 * used for now. Higher 16 bits will
+					 * be used to mark the object status,
+					 * for example migrating or dead.
+					 */
+	__u32 lmv_layout_version;	/* Used for directory restriping */
+	__u32 lmv_padding;
+	struct lu_fid lmv_master_fid;	/* The FID of the master object, which
+					 * is the namespace-visible dir FID
+					 */
+	char lmv_pool_name[LOV_MAXPOOLNAME];	/* pool name */
+	struct lu_fid lmv_stripe_fids[0];	/* FIDs for each stripe */
+};
 
+#define LMV_MAGIC_V1	 0x0CD20CD0	/* normal stripe lmv magic */
+#define LMV_MAGIC	 LMV_MAGIC_V1
+
+/* #define LMV_USER_MAGIC 0x0CD30CD0 */
+#define LMV_MAGIC_STRIPE 0x0CD40CD0	/* magic for dir sub_stripe */
+
+/*
+ * Right now only the lower part (bits 0-15) of lmv_hash_type is being used,
+ * and the higher part will be the flag to indicate the status of object,
+ * for example the object is being migrated. And the hash function
+ * might be interpreted differently with different flags.
+ */
 enum lmv_hash_type {
 	LMV_HASH_TYPE_ALL_CHARS = 1,
 	LMV_HASH_TYPE_FNV_1A_64 = 2,
-	LMV_HASH_TYPE_MIGRATION = 3,
 };
 
+#define LMV_HASH_TYPE_MASK		0x0000ffff
+
+#define LMV_HASH_FLAG_MIGRATION		0x80000000
+#define LMV_HASH_FLAG_DEAD		0x40000000
+
 #define LMV_HASH_NAME_ALL_CHARS		"all_char"
 #define LMV_HASH_NAME_FNV_1A_64		"fnv_1a_64"
 
@@ -2540,19 +2574,6 @@ static inline __u64 lustre_hash_fnv_1a_64(const void *buf, size_t size)
 	return hash;
 }
 
-struct lmv_mds_md_v1 {
-	__u32 lmv_magic;
-	__u32 lmv_stripe_count;		/* stripe count */
-	__u32 lmv_master_mdt_index;	/* master MDT index */
-	__u32 lmv_hash_type;		/* dir stripe policy, i.e. indicate
-					 * which hash function to be used
-					 */
-	__u32 lmv_layout_version;	/* Used for directory restriping */
-	__u32 lmv_padding;
-	char lmv_pool_name[LOV_MAXPOOLNAME];	/* pool name */
-	struct lu_fid lmv_stripe_fids[0];	/* FIDs for each stripe */
-};
-
 union lmv_mds_md {
 	__u32			lmv_magic;
 	struct lmv_mds_md_v1	lmv_md_v1;
@@ -2566,8 +2587,7 @@ static inline ssize_t lmv_mds_md_size(int stripe_count, unsigned int lmm_magic)
 	ssize_t len = -EINVAL;
 
 	switch (lmm_magic) {
-	case LMV_MAGIC_V1:
-	case LMV_MAGIC_MIGRATE: {
+	case LMV_MAGIC_V1: {
 		struct lmv_mds_md_v1 *lmm1;
 
 		len = sizeof(*lmm1);
@@ -2583,7 +2603,6 @@ static inline int lmv_mds_md_stripe_count_get(const union lmv_mds_md *lmm)
 {
 	switch (le32_to_cpu(lmm->lmv_magic)) {
 	case LMV_MAGIC_V1:
-	case LMV_MAGIC_MIGRATE:
 		return le32_to_cpu(lmm->lmv_md_v1.lmv_stripe_count);
 	case LMV_USER_MAGIC:
 		return le32_to_cpu(lmm->lmv_user_md.lum_stripe_count);
@@ -2599,7 +2618,6 @@ static inline int lmv_mds_md_stripe_count_set(union lmv_mds_md *lmm,
 
 	switch (le32_to_cpu(lmm->lmv_magic)) {
 	case LMV_MAGIC_V1:
-	case LMV_MAGIC_MIGRATE:
 		lmm->lmv_md_v1.lmv_stripe_count = cpu_to_le32(stripe_count);
 		break;
 	case LMV_USER_MAGIC:
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index 75a78a3..4b2553c 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -269,8 +269,7 @@ struct ost_id {
 #define LOV_USER_MAGIC_JOIN_V1 0x0BD20BD0
 #define LOV_USER_MAGIC_V3 0x0BD30BD0
 
-#define LMV_MAGIC_V1      0x0CD10CD0    /*normal stripe lmv magic */
-#define LMV_USER_MAGIC    0x0CD20CD0    /*default lmv magic*/
+#define LMV_USER_MAGIC    0x0CD30CD0    /*default lmv magic*/
 
 #define LOV_PATTERN_RAID0 0x001
 #define LOV_PATTERN_RAID1 0x002
diff --git a/drivers/staging/lustre/lustre/include/lustre_lmv.h b/drivers/staging/lustre/lustre/include/lustre_lmv.h
index feee981..1dd3e92 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lmv.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lmv.h
@@ -48,10 +48,33 @@ struct lmv_stripe_md {
 	__u32	lsm_md_layout_version;
 	__u32	lsm_md_default_count;
 	__u32	lsm_md_default_index;
+	struct lu_fid lsm_md_master_fid;
 	char	lsm_md_pool_name[LOV_MAXPOOLNAME];
 	struct lmv_oinfo lsm_md_oinfo[0];
 };
 
+static inline bool
+lsm_md_eq(const struct lmv_stripe_md *lsm1, const struct lmv_stripe_md *lsm2)
+{
+	int idx;
+
+	if (lsm1->lsm_md_magic != lsm2->lsm_md_magic ||
+	    lsm1->lsm_md_stripe_count != lsm2->lsm_md_stripe_count ||
+	    lsm1->lsm_md_master_mdt_index != lsm2->lsm_md_master_mdt_index ||
+	    lsm1->lsm_md_hash_type != lsm2->lsm_md_hash_type ||
+	    lsm1->lsm_md_layout_version != lsm2->lsm_md_layout_version ||
+	    strcmp(lsm1->lsm_md_pool_name, lsm2->lsm_md_pool_name) != 0)
+		return false;
+
+	for (idx = 0; idx < lsm1->lsm_md_stripe_count; idx++) {
+		if (!lu_fid_eq(&lsm1->lsm_md_oinfo[idx].lmo_fid,
+			       &lsm2->lsm_md_oinfo[idx].lmo_fid))
+			return false;
+	}
+
+	return true;
+}
+
 union lmv_mds_md;
 
 int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
@@ -106,7 +129,6 @@ static inline void lmv_cpu_to_le(union lmv_mds_md *lmv_dst,
 {
 	switch (lmv_src->lmv_magic) {
 	case LMV_MAGIC_V1:
-	case LMV_MAGIC_MIGRATE:
 		lmv1_cpu_to_le(&lmv_dst->lmv_md_v1, &lmv_src->lmv_md_v1);
 		break;
 	default:
@@ -119,7 +141,6 @@ static inline void lmv_le_to_cpu(union lmv_mds_md *lmv_dst,
 {
 	switch (le32_to_cpu(lmv_src->lmv_magic)) {
 	case LMV_MAGIC_V1:
-	case LMV_MAGIC_MIGRATE:
 		lmv1_le_to_cpu(&lmv_dst->lmv_md_v1, &lmv_src->lmv_md_v1);
 		break;
 	default:
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index 0dae273..52020a9 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -917,8 +917,8 @@ struct obd_ops {
 	int (*fid_fini)(struct obd_device *obd);
 
 	/* Allocate new fid according to passed @hint. */
-	int (*fid_alloc)(struct obd_export *exp, struct lu_fid *fid,
-			 struct md_op_data *op_data);
+	int (*fid_alloc)(const struct lu_env *env, struct obd_export *exp,
+			 struct lu_fid *fid, struct md_op_data *op_data);
 
 	/*
 	 * Object with @fid is getting deleted, we may want to do something
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index de808ee..a288995 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -930,7 +930,8 @@ static inline int obd_fid_fini(struct obd_device *obd)
 	return rc;
 }
 
-static inline int obd_fid_alloc(struct obd_export *exp,
+static inline int obd_fid_alloc(const struct lu_env *env,
+				struct obd_export *exp,
 				struct lu_fid *fid,
 				struct md_op_data *op_data)
 {
@@ -939,7 +940,7 @@ static inline int obd_fid_alloc(struct obd_export *exp,
 	EXP_CHECK_DT_OP(exp, fid_alloc);
 	EXP_COUNTER_INCREMENT(exp, fid_alloc);
 
-	rc = OBP(exp->exp_obd, fid_alloc)(exp, fid, op_data);
+	rc = OBP(exp->exp_obd, fid_alloc)(env, exp, fid, op_data);
 	return rc;
 }
 
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 257c9a4..47fbcd2 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -883,7 +883,6 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size,
 			lustre_swab_lov_user_md_v3((struct lov_user_md_v3 *)lmm);
 		break;
 	case LMV_USER_MAGIC:
-	case LMV_MAGIC_MIGRATE:
 		if (cpu_to_le32(LMV_USER_MAGIC) != LMV_USER_MAGIC)
 			lustre_swab_lmv_user_md((struct lmv_user_md *)lmm);
 		break;
@@ -1471,7 +1470,7 @@ lmv_out_free:
 
 		rc = ll_dir_getstripe(inode, (void **)&lmm, &lmmsize, &request,
 				      valid);
-		if (rc && rc != -ENODATA)
+		if (rc)
 			goto finish_req;
 
 		/* Get default LMV EA */
@@ -1490,14 +1489,7 @@ lmv_out_free:
 			goto finish_req;
 		}
 
-		/* Get normal LMV EA */
-		if (rc == -ENODATA) {
-			stripe_count = 1;
-		} else {
-			LASSERT(lmm);
-			stripe_count = lmv_mds_md_stripe_count_get(lmm);
-		}
-
+		stripe_count = lmv_mds_md_stripe_count_get(lmm);
 		lum_size = lmv_user_md_size(stripe_count, LMV_MAGIC_V1);
 		tmp = kzalloc(lum_size, GFP_NOFS);
 		if (!tmp) {
@@ -1505,28 +1497,25 @@ lmv_out_free:
 			goto finish_req;
 		}
 
-		tmp->lum_magic = LMV_MAGIC_V1;
-		tmp->lum_stripe_count = 1;
 		mdt_index = ll_get_mdt_idx(inode);
 		if (mdt_index < 0) {
 			rc = -ENOMEM;
 			goto out_tmp;
 		}
+		tmp->lum_magic = LMV_MAGIC_V1;
+		tmp->lum_stripe_count = 0;
 		tmp->lum_stripe_offset = mdt_index;
-		tmp->lum_objects[0].lum_mds = mdt_index;
-		tmp->lum_objects[0].lum_fid = *ll_inode2fid(inode);
-		for (i = 1; i < stripe_count; i++) {
-			struct lmv_mds_md_v1 *lmm1;
-
-			lmm1 = &lmm->lmv_md_v1;
-			mdt_index = ll_get_mdt_idx_by_fid(sbi,
-							  &lmm1->lmv_stripe_fids[i]);
+		for (i = 0; i < stripe_count; i++) {
+			struct lu_fid   *fid;
+
+			fid = &lmm->lmv_md_v1.lmv_stripe_fids[i];
+			mdt_index = ll_get_mdt_idx_by_fid(sbi, fid);
 			if (mdt_index < 0) {
 				rc = mdt_index;
 				goto out_tmp;
 			}
 			tmp->lum_objects[i].lum_mds = mdt_index;
-			tmp->lum_objects[i].lum_fid = lmm1->lmv_stripe_fids[i];
+			tmp->lum_objects[i].lum_fid = *fid;
 			tmp->lum_stripe_count++;
 		}
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index ea79ca3..2f6e770 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1042,9 +1042,9 @@ static struct inode *ll_iget_anon_dir(struct super_block *sb,
 		ll_lli_init(lli);
 
 		LASSERT(lsm);
-		/* master stripe FID */
-		lli->lli_pfid = lsm->lsm_md_oinfo[0].lmo_fid;
-		CDEBUG(D_INODE, "lli %p master "DFID" slave "DFID"\n",
+		/* master object FID */
+		lli->lli_pfid = body->fid1;
+		CDEBUG(D_INODE, "lli %p slave "DFID" master "DFID"\n",
 		       lli, PFID(fid), PFID(&lli->lli_pfid));
 		unlock_new_inode(inode);
 	}
@@ -1067,23 +1067,24 @@ static int ll_init_lsm_md(struct inode *inode, struct lustre_md *md)
 	for (i = 0; i < lsm->lsm_md_stripe_count; i++) {
 		fid = &lsm->lsm_md_oinfo[i].lmo_fid;
 		LASSERT(!lsm->lsm_md_oinfo[i].lmo_root);
-		if (!i) {
+		/* Unfortunately ll_iget will call ll_update_inode,
+		 * where the initialization of slave inode is slightly
+		 * different, so it reset lsm_md to NULL to avoid
+		 * initializing lsm for slave inode.
+		 */
+		/* For migrating inode, master stripe and master object will
+		 * be same, so we only need assign this inode
+		 */
+		if (lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION && !i)
 			lsm->lsm_md_oinfo[i].lmo_root = inode;
-		} else {
-			/*
-			 * Unfortunately ll_iget will call ll_update_inode,
-			 * where the initialization of slave inode is slightly
-			 * different, so it reset lsm_md to NULL to avoid
-			 * initializing lsm for slave inode.
-			 */
+		else
 			lsm->lsm_md_oinfo[i].lmo_root =
 				ll_iget_anon_dir(inode->i_sb, fid, md);
-			if (IS_ERR(lsm->lsm_md_oinfo[i].lmo_root)) {
-				int rc = PTR_ERR(lsm->lsm_md_oinfo[i].lmo_root);
+		if (IS_ERR(lsm->lsm_md_oinfo[i].lmo_root)) {
+			int rc = PTR_ERR(lsm->lsm_md_oinfo[i].lmo_root);
 
-				lsm->lsm_md_oinfo[i].lmo_root = NULL;
-				return rc;
-			}
+			lsm->lsm_md_oinfo[i].lmo_root = NULL;
+			return rc;
 		}
 	}
 
@@ -1113,7 +1114,7 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct lmv_stripe_md *lsm = md->lmv;
-	int idx, rc;
+	int rc;
 
 	LASSERT(S_ISDIR(inode->i_mode));
 	CDEBUG(D_INODE, "update lsm %p of "DFID"\n", lli->lli_lsm_md,
@@ -1123,7 +1124,8 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 	if (!lsm) {
 		if (!lli->lli_lsm_md) {
 			return 0;
-		} else if (lli->lli_lsm_md->lsm_md_magic == LMV_MAGIC_MIGRATE) {
+		} else if (lli->lli_lsm_md->lsm_md_hash_type &
+			   LMV_HASH_FLAG_MIGRATION) {
 			/*
 			 * migration is done, the temporay MIGRATE layout has
 			 * been removed
@@ -1160,43 +1162,40 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 	}
 
 	/* Compare the old and new stripe information */
-	if (!lli_lsm_md_eq(lli->lli_lsm_md, lsm)) {
-		CERROR("inode %p %lu mismatch\n"
-		       "    new(%p)     vs     lli_lsm_md(%p):\n"
-		       "    magic:      %x                   %x\n"
-		       "    count:      %x                   %x\n"
-		       "    master:     %x                   %x\n"
-		       "    hash_type:  %x                   %x\n"
-		       "    layout:     %x                   %x\n"
-		       "    pool:       %s                   %s\n",
-		       inode, inode->i_ino, lsm, lli->lli_lsm_md,
-		       lsm->lsm_md_magic, lli->lli_lsm_md->lsm_md_magic,
+	if (!lsm_md_eq(lli->lli_lsm_md, lsm)) {
+		struct lmv_stripe_md *old_lsm = lli->lli_lsm_md;
+		int idx;
+
+		CERROR("%s: inode "DFID"(%p)'s lmv layout mismatch (%p)/(%p) magic:0x%x/0x%x stripe count: %d/%d master_mdt: %d/%d hash_type:0x%x/0x%x layout: 0x%x/0x%x pool:%s/%s\n",
+		       ll_get_fsname(inode->i_sb, NULL, 0), PFID(&lli->lli_fid),
+		       inode, lsm, old_lsm,
+		       lsm->lsm_md_magic, old_lsm->lsm_md_magic,
 		       lsm->lsm_md_stripe_count,
-		       lli->lli_lsm_md->lsm_md_stripe_count,
+		       old_lsm->lsm_md_stripe_count,
 		       lsm->lsm_md_master_mdt_index,
-		       lli->lli_lsm_md->lsm_md_master_mdt_index,
-		       lsm->lsm_md_hash_type, lli->lli_lsm_md->lsm_md_hash_type,
+		       old_lsm->lsm_md_master_mdt_index,
+		       lsm->lsm_md_hash_type, old_lsm->lsm_md_hash_type,
 		       lsm->lsm_md_layout_version,
-		       lli->lli_lsm_md->lsm_md_layout_version,
+		       old_lsm->lsm_md_layout_version,
 		       lsm->lsm_md_pool_name,
-		       lli->lli_lsm_md->lsm_md_pool_name);
-		return -EIO;
-	}
+		       old_lsm->lsm_md_pool_name);
+
+		for (idx = 0; idx < old_lsm->lsm_md_stripe_count; idx++) {
+			CERROR("%s: sub FIDs in old lsm idx %d, old: "DFID"\n",
+			       ll_get_fsname(inode->i_sb, NULL, 0), idx,
+			       PFID(&old_lsm->lsm_md_oinfo[idx].lmo_fid));
+		}
 
-	for (idx = 0; idx < lli->lli_lsm_md->lsm_md_stripe_count; idx++) {
-		if (!lu_fid_eq(&lli->lli_lsm_md->lsm_md_oinfo[idx].lmo_fid,
-			       &lsm->lsm_md_oinfo[idx].lmo_fid)) {
-			CERROR("%s: FID in lsm mismatch idx %d, old: "DFID" new:"DFID"\n",
+		for (idx = 0; idx < lsm->lsm_md_stripe_count; idx++) {
+			CERROR("%s: sub FIDs in new lsm idx %d, new: "DFID"\n",
 			       ll_get_fsname(inode->i_sb, NULL, 0), idx,
-			       PFID(&lli->lli_lsm_md->lsm_md_oinfo[idx].lmo_fid),
 			       PFID(&lsm->lsm_md_oinfo[idx].lmo_fid));
-			return -EIO;
 		}
+
+		return -EIO;
 	}
 
-	rc = md_update_lsm_md(ll_i2mdexp(inode), ll_i2info(inode)->lli_lsm_md,
-			      md->body, ll_md_blocking_ast);
-	return rc;
+	return 0;
 }
 
 void ll_clear_inode(struct inode *inode)
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index d7e165f..7f81e78 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -173,9 +173,6 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
 	 * revalidate slaves has some problems, temporarily return,
 	 * we may not need that
 	 */
-	if (lsm->lsm_md_stripe_count <= 1)
-		return 0;
-
 	op_data = kzalloc(sizeof(*op_data), GFP_NOFS);
 	if (!op_data)
 		return -ENOMEM;
@@ -194,14 +191,6 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
 
 		fid = lsm->lsm_md_oinfo[i].lmo_fid;
 		inode = lsm->lsm_md_oinfo[i].lmo_root;
-		if (!i) {
-			if (mbody) {
-				body = mbody;
-				goto update;
-			} else {
-				goto release_lock;
-			}
-		}
 
 		/*
 		 * Prepare op_data for revalidating. Note that @fid2 shluld be
@@ -237,7 +226,7 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
 			body = req_capsule_server_get(&req->rq_pill,
 						      &RMF_MDT_BODY);
 			LASSERT(body);
-update:
+
 			if (unlikely(body->nlink < 2)) {
 				CERROR("%s: nlink %d < 2 corrupt stripe %d "DFID":" DFID"\n",
 				       obd->obd_name, body->nlink, i,
@@ -256,10 +245,6 @@ update:
 				goto cleanup;
 			}
 
-			if (i)
-				md_set_lock_data(tgt->ltd_exp, &lockh->cookie,
-						 inode, NULL);
-
 			i_size_write(inode, body->size);
 			set_nlink(inode, body->nlink);
 			LTIME_S(inode->i_atime) = body->atime;
@@ -269,8 +254,8 @@ update:
 			if (req)
 				ptlrpc_req_finished(req);
 		}
-release_lock:
-		size += i_size_read(inode);
+
+		md_set_lock_data(tgt->ltd_exp, &lockh->cookie, inode, NULL);
 
 		if (i != 0)
 			nlink += inode->i_nlink - 2;
@@ -361,7 +346,7 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 		 * fid and setup FLD for it.
 		 */
 		op_data->op_fid3 = op_data->op_fid2;
-		rc = lmv_fid_alloc(exp, &op_data->op_fid2, op_data);
+		rc = lmv_fid_alloc(NULL, exp, &op_data->op_fid2, op_data);
 		if (rc != 0)
 			return rc;
 	}
@@ -453,7 +438,7 @@ static int lmv_intent_lookup(struct obd_export *exp,
 		}
 		return rc;
 	} else if (it_disposition(it, DISP_LOOKUP_NEG) && lsm &&
-		   lsm->lsm_md_magic == LMV_MAGIC_MIGRATE) {
+		   lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION) {
 		/*
 		 * For migrating directory, if it can not find the child in
 		 * the source directory(master stripe), try the targeting
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_internal.h b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
index ed02927..dbd1da6 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_internal.h
+++ b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
@@ -52,8 +52,8 @@ int lmv_intent_lock(struct obd_export *exp, struct md_op_data *op_data,
 
 int lmv_fld_lookup(struct lmv_obd *lmv, const struct lu_fid *fid, u32 *mds);
 int __lmv_fid_alloc(struct lmv_obd *lmv, struct lu_fid *fid, u32 mds);
-int lmv_fid_alloc(struct obd_export *exp, struct lu_fid *fid,
-		  struct md_op_data *op_data);
+int lmv_fid_alloc(const struct lu_env *env, struct obd_export *exp,
+		  struct lu_fid *fid, struct md_op_data *op_data);
 
 int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 		  const union lmv_mds_md *lmm, int stripe_count);
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index e516a84..03594f0 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -80,41 +80,35 @@ lmv_hash_fnv1a(unsigned int count, const char *name, int namelen)
 	return do_div(hash, count);
 }
 
-int lmv_name_to_stripe_index(enum lmv_hash_type hashtype,
-			     unsigned int max_mdt_index,
+int lmv_name_to_stripe_index(__u32 lmv_hash_type, unsigned int stripe_count,
 			     const char *name, int namelen)
 {
+	__u32 hash_type = lmv_hash_type & LMV_HASH_TYPE_MASK;
 	int idx;
 
 	LASSERT(namelen > 0);
-	if (max_mdt_index <= 1)
+	if (stripe_count <= 1)
 		return 0;
 
-	switch (hashtype) {
+	/* for migrating object, always start from 0 stripe */
+	if (lmv_hash_type & LMV_HASH_FLAG_MIGRATION)
+		return 0;
+
+	switch (hash_type) {
 	case LMV_HASH_TYPE_ALL_CHARS:
-		idx = lmv_hash_all_chars(max_mdt_index, name, namelen);
+		idx = lmv_hash_all_chars(stripe_count, name, namelen);
 		break;
 	case LMV_HASH_TYPE_FNV_1A_64:
-		idx = lmv_hash_fnv1a(max_mdt_index, name, namelen);
+		idx = lmv_hash_fnv1a(stripe_count, name, namelen);
 		break;
-	/*
-	 * LMV_HASH_TYPE_MIGRATION means the file is being migrated,
-	 * and the file should be accessed by client, except for
-	 * lookup(see lmv_intent_lookup), return -EACCES here
-	 */
-	case LMV_HASH_TYPE_MIGRATION:
-		CERROR("%.*s is being migrated: rc = %d\n", namelen,
-		       name, -EACCES);
-		return -EACCES;
 	default:
-		CERROR("Unknown hash type 0x%x\n", hashtype);
+		CERROR("Unknown hash type 0x%x\n", hash_type);
 		return -EINVAL;
 	}
 
 	CDEBUG(D_INFO, "name %.*s hash_type %d idx %d\n", namelen, name,
-	       hashtype, idx);
+	       hash_type, idx);
 
-	LASSERT(idx < max_mdt_index);
 	return idx;
 }
 
@@ -1287,7 +1281,7 @@ int __lmv_fid_alloc(struct lmv_obd *lmv, struct lu_fid *fid, u32 mds)
 	/*
 	 * Asking underlaying tgt layer to allocate new fid.
 	 */
-	rc = obd_fid_alloc(tgt->ltd_exp, fid, NULL);
+	rc = obd_fid_alloc(NULL, tgt->ltd_exp, fid, NULL);
 	if (rc > 0) {
 		LASSERT(fid_is_sane(fid));
 		rc = 0;
@@ -1298,8 +1292,8 @@ out:
 	return rc;
 }
 
-int lmv_fid_alloc(struct obd_export *exp, struct lu_fid *fid,
-		  struct md_op_data *op_data)
+int lmv_fid_alloc(const struct lu_env *env, struct obd_export *exp,
+		  struct lu_fid *fid, struct md_op_data *op_data)
 {
 	struct obd_device     *obd = class_exp2obd(exp);
 	struct lmv_obd	*lmv = &obd->u.lmv;
@@ -1695,9 +1689,7 @@ struct lmv_tgt_desc
 	struct lmv_stripe_md *lsm = op_data->op_mea1;
 	struct lmv_tgt_desc *tgt;
 
-	if (!lsm || lsm->lsm_md_stripe_count <= 1 ||
-	    !op_data->op_namelen ||
-	    lsm->lsm_md_magic == LMV_MAGIC_MIGRATE) {
+	if (!lsm || !op_data->op_namelen) {
 		tgt = lmv_find_target(lmv, fid);
 		if (IS_ERR(tgt))
 			return tgt;
@@ -1737,7 +1729,7 @@ static int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 	       op_data->op_namelen, op_data->op_name, PFID(&op_data->op_fid1),
 	       op_data->op_mds);
 
-	rc = lmv_fid_alloc(exp, &op_data->op_fid2, op_data);
+	rc = lmv_fid_alloc(NULL, exp, &op_data->op_fid2, op_data);
 	if (rc)
 		return rc;
 
@@ -2060,7 +2052,7 @@ static int lmv_rename(struct obd_export *exp, struct md_op_data *op_data,
 	if (op_data->op_cli_flags & CLI_MIGRATE) {
 		LASSERTF(fid_is_sane(&op_data->op_fid3), "invalid FID "DFID"\n",
 			 PFID(&op_data->op_fid3));
-		rc = lmv_fid_alloc(exp, &op_data->op_fid2, op_data);
+		rc = lmv_fid_alloc(NULL, exp, &op_data->op_fid2, op_data);
 		if (rc)
 			return rc;
 		src_tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid3);
@@ -2365,8 +2357,7 @@ retry:
 			return PTR_ERR(tgt);
 
 		/* For striped dir, we need to locate the parent as well */
-		if (op_data->op_mea1 &&
-		    op_data->op_mea1->lsm_md_stripe_count > 1) {
+		if (op_data->op_mea1) {
 			struct lmv_tgt_desc *tmp;
 
 			LASSERT(op_data->op_name && op_data->op_namelen);
@@ -2679,9 +2670,13 @@ static int lmv_unpack_md_v1(struct obd_export *exp, struct lmv_stripe_md *lsm,
 	lsm->lsm_md_master_mdt_index = le32_to_cpu(lmm1->lmv_master_mdt_index);
 	lsm->lsm_md_hash_type = le32_to_cpu(lmm1->lmv_hash_type);
 	lsm->lsm_md_layout_version = le32_to_cpu(lmm1->lmv_layout_version);
+	fid_le_to_cpu(&lsm->lsm_md_master_fid, &lmm1->lmv_master_fid);
 	cplen = strlcpy(lsm->lsm_md_pool_name, lmm1->lmv_pool_name,
 			sizeof(lsm->lsm_md_pool_name));
 
+	if (!fid_is_sane(&lsm->lsm_md_master_fid))
+		return -EPROTO;
+
 	if (cplen >= sizeof(lsm->lsm_md_pool_name))
 		return -E2BIG;
 
@@ -2719,7 +2714,13 @@ int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 		int i;
 
 		for (i = 1; i < lsm->lsm_md_stripe_count; i++) {
-			if (lsm->lsm_md_oinfo[i].lmo_root)
+			/*
+			 * For migrating inode, the master stripe and master
+			 * object will be the same, so do not need iput, see
+			 * ll_update_lsm_md
+			 */
+			if (!(lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION &&
+			      !i) && lsm->lsm_md_oinfo[i].lmo_root)
 				iput(lsm->lsm_md_oinfo[i].lmo_root);
 		}
 
@@ -2739,9 +2740,11 @@ int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 		return 0;
 	}
 
+	if (le32_to_cpu(lmm->lmv_magic) == LMV_MAGIC_STRIPE)
+		return -EPERM;
+
 	/* Unpack memmd */
 	if (le32_to_cpu(lmm->lmv_magic) != LMV_MAGIC_V1 &&
-	    le32_to_cpu(lmm->lmv_magic) != LMV_MAGIC_MIGRATE &&
 	    le32_to_cpu(lmm->lmv_magic) != LMV_USER_MAGIC) {
 		CERROR("%s: invalid lmv magic %x: rc = %d\n",
 		       exp->exp_obd->obd_name, le32_to_cpu(lmm->lmv_magic),
@@ -2749,8 +2752,7 @@ int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 		return -EIO;
 	}
 
-	if (le32_to_cpu(lmm->lmv_magic) == LMV_MAGIC_V1 ||
-	    le32_to_cpu(lmm->lmv_magic) == LMV_MAGIC_MIGRATE)
+	if (le32_to_cpu(lmm->lmv_magic) == LMV_MAGIC_V1)
 		lsm_size = lmv_stripe_md_size(lmv_mds_md_stripe_count_get(lmm));
 	else
 		/**
@@ -2769,7 +2771,6 @@ int lmv_unpack_md(struct obd_export *exp, struct lmv_stripe_md **lsmp,
 
 	switch (le32_to_cpu(lmm->lmv_magic)) {
 	case LMV_MAGIC_V1:
-	case LMV_MAGIC_MIGRATE:
 		rc = lmv_unpack_md_v1(exp, lsm, &lmm->lmv_md_v1);
 		break;
 	default:
@@ -3067,9 +3068,6 @@ static int lmv_quotacheck(struct obd_device *unused, struct obd_export *exp,
 int lmv_update_lsm_md(struct obd_export *exp, struct lmv_stripe_md *lsm,
 		      struct mdt_body *body, ldlm_blocking_callback cb_blocking)
 {
-	if (lsm->lsm_md_stripe_count <= 1)
-		return 0;
-
 	return lmv_revalidate_slaves(exp, body, lsm, cb_blocking, 0);
 }
 
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_internal.h b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
index 53b4063..00e8435 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_internal.h
+++ b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
@@ -87,8 +87,8 @@ int mdc_resource_get_unused(struct obd_export *exp, const struct lu_fid *fid,
 			    struct list_head *cancels, enum ldlm_mode  mode,
 			    __u64 bits);
 /* mdc/mdc_request.c */
-int mdc_fid_alloc(struct obd_export *exp, struct lu_fid *fid,
-		  struct md_op_data *op_data);
+int mdc_fid_alloc(const struct lu_env *env, struct obd_export *exp,
+		  struct lu_fid *fid, struct md_op_data *op_data);
 struct obd_client_handle;
 
 int mdc_set_open_replay_data(struct obd_export *exp,
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index d8406d5..20b15f6 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -1144,7 +1144,7 @@ int mdc_intent_lock(struct obd_export *exp, struct md_op_data *op_data,
 
 	/* For case if upper layer did not alloc fid, do it now. */
 	if (!fid_is_sane(&op_data->op_fid2) && it->it_op & IT_CREAT) {
-		rc = mdc_fid_alloc(exp, &op_data->op_fid2, op_data);
+		rc = mdc_fid_alloc(NULL, exp, &op_data->op_fid2, op_data);
 		if (rc < 0) {
 			CERROR("Can't alloc new fid, rc %d\n", rc);
 			return rc;
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_reint.c b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
index 5dba2c8..c3781a6 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_reint.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
@@ -214,11 +214,9 @@ int mdc_create(struct obd_export *exp, struct md_op_data *op_data,
 		 * mdc_fid_alloc() may return errno 1 in case of switch to new
 		 * sequence, handle this.
 		 */
-		rc = mdc_fid_alloc(exp, &op_data->op_fid2, op_data);
-		if (rc < 0) {
-			CERROR("Can't alloc new fid, rc %d\n", rc);
+		rc = mdc_fid_alloc(NULL, exp, &op_data->op_fid2, op_data);
+		if (rc < 0)
 			return rc;
-		}
 	}
 
 rebuild:
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 394ef3c..e26d0d7 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -765,7 +765,7 @@ static int mdc_close(struct obd_export *exp, struct md_op_data *op_data,
 		req_fmt = &RQF_MDS_RELEASE_CLOSE;
 
 		/* allocate a FID for volatile file */
-		rc = mdc_fid_alloc(exp, &op_data->op_fid2, op_data);
+		rc = mdc_fid_alloc(NULL, exp, &op_data->op_fid2, op_data);
 		if (rc < 0) {
 			CERROR("%s: "DFID" failed to allocate FID: %d\n",
 			       obd->obd_name, PFID(&op_data->op_fid1), rc);
@@ -2203,13 +2203,13 @@ static int mdc_import_event(struct obd_device *obd, struct obd_import *imp,
 	return rc;
 }
 
-int mdc_fid_alloc(struct obd_export *exp, struct lu_fid *fid,
-		  struct md_op_data *op_data)
+int mdc_fid_alloc(const struct lu_env *env, struct obd_export *exp,
+		  struct lu_fid *fid, struct md_op_data *op_data)
 {
 	struct client_obd *cli = &exp->exp_obd->u.cli;
 	struct lu_client_seq *seq = cli->cl_seq;
 
-	return seq_client_alloc_fid(NULL, seq, fid);
+	return seq_client_alloc_fid(env, seq, fid);
 }
 
 static struct obd_uuid *mdc_get_uuid(struct obd_export *exp)
-- 
1.7.1


* [PATCH 42/80] staging: lustre: llite: validate names
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

In ll_prep_md_op_data() validate names according to the same formula
used in mdd_name_check(). Add mdc_pack_name() to validate the name
actually packed in the request.
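
For readers who want to try the rule outside the kernel, here is a
userspace sketch of the predicate that this patch adds to lu_object.h as
lu_name_is_valid_2(). The body follows the diff; the name
name_is_valid() is chosen for this example only. As in the original,
the caller must guarantee that name[name_len] is readable.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * A name (path component) is accepted when it is non-NULL, non-empty,
 * NUL-terminated at exactly name_len, and contains no '/'.
 * The INT_MAX bound only guards against bad integer handling; the real
 * length limit is enforced elsewhere (llite and the server).
 */
static inline bool name_is_valid(const char *name, size_t name_len)
{
	return name && name_len > 0 && name_len < INT_MAX &&
	       name[name_len] == '\0' && strlen(name) == name_len &&
	       !memchr(name, '/', name_len);
}
```

With this predicate "foo" with length 3 is accepted, while "foo/bar",
an empty name, or a length that does not match the terminating NUL is
rejected.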

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4992
Reviewed-on: http://review.whamcloud.com/10198
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/include/linux/libcfs/libcfs_private.h   |    9 ---
 drivers/staging/lustre/lustre/include/lu_object.h  |   16 +++++
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   13 +++-
 drivers/staging/lustre/lustre/mdc/mdc_lib.c        |   67 +++++++++++++-------
 4 files changed, 70 insertions(+), 35 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h
index 4daa382..d401ae1 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h
@@ -360,13 +360,4 @@ do {							    \
 	ptr += cfs_size_round(len);			     \
 } while (0)
 
-#define LOGL0(var, len, ptr)			      \
-do {						    \
-	if (!len)				       \
-		break;				  \
-	memcpy((char *)ptr, (const char *)var, len);    \
-	*((char *)(ptr) + len) = 0;		     \
-	ptr += cfs_size_round(len + 1);		 \
-} while (0)
-
 #endif
diff --git a/drivers/staging/lustre/lustre/include/lu_object.h b/drivers/staging/lustre/lustre/include/lu_object.h
index 25c12d8..6ab1782 100644
--- a/drivers/staging/lustre/lustre/include/lu_object.h
+++ b/drivers/staging/lustre/lustre/include/lu_object.h
@@ -1263,6 +1263,22 @@ struct lu_name {
 };
 
 /**
+ * Validate names (path components)
+ *
+ * To be valid \a name must be non-empty, '\0' terminated of length \a
+ * name_len, and not contain '/'. The maximum length of a name (before
+ * say -ENAMETOOLONG will be returned) is really controlled by llite
+ * and the server. We only check for something insane coming from bad
+ * integer handling here.
+ */
+static inline bool lu_name_is_valid_2(const char *name, size_t name_len)
+{
+	return name && name_len > 0 && name_len < INT_MAX &&
+	       name[name_len] == '\0' && strlen(name) == name_len &&
+	       !memchr(name, '/', name_len);
+}
+
+/**
  * Common buffer structure to be passed around for various xattr_{s,g}et()
  * methods.
  */
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 2f6e770..a3b4c97 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -2304,8 +2304,17 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 				      const char *name, int namelen,
 				      int mode, __u32 opc, void *data)
 {
-	if (namelen > ll_i2sbi(i1)->ll_namelen)
-		return ERR_PTR(-ENAMETOOLONG);
+	if (!name) {
+		/* Do not reuse namelen for something else. */
+		if (namelen)
+			return ERR_PTR(-EINVAL);
+	} else {
+		if (namelen > ll_i2sbi(i1)->ll_namelen)
+			return ERR_PTR(-ENAMETOOLONG);
+
+		if (!lu_name_is_valid_2(name, namelen))
+			return ERR_PTR(-EINVAL);
+	}
 
 	if (!op_data)
 		op_data = kzalloc(sizeof(*op_data), GFP_NOFS);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_lib.c b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
index b532623..16c3571 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_lib.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
@@ -87,6 +87,37 @@ void mdc_pack_body(struct ptlrpc_request *req, const struct lu_fid *fid,
 	}
 }
 
+/**
+ * Pack a name (path component) into a request
+ *
+ * \param[in] req	request
+ * \param[in] field	request field (usually RMF_NAME)
+ * \param[in] name	path component
+ * \param[in] name_len	length of path component
+ *
+ * \a field must be present in \a req and of size \a name_len + 1.
+ *
+ * \a name must be '\0' terminated of length \a name_len and represent
+ * a single path component (not contain '/').
+ */
+static void mdc_pack_name(struct ptlrpc_request *req,
+			  const struct req_msg_field *field,
+			  const char *name, size_t name_len)
+{
+	size_t buf_size;
+	size_t cpy_len;
+	char *buf;
+
+	buf = req_capsule_client_get(&req->rq_pill, field);
+	buf_size = req_capsule_get_size(&req->rq_pill, field, RCL_CLIENT);
+
+	LASSERT(name && name_len && buf && buf_size == name_len + 1);
+
+	cpy_len = strlcpy(buf, name, buf_size);
+
+	LASSERT(cpy_len == name_len && lu_name_is_valid_2(buf, cpy_len));
+}
+
 void mdc_readdir_pack(struct ptlrpc_request *req, __u64 pgoff,
 		      __u32 size, const struct lu_fid *fid)
 {
@@ -130,9 +161,7 @@ void mdc_create_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 	rec->cr_bias     = op_data->op_bias;
 	rec->cr_umask    = current_umask();
 
-	tmp = req_capsule_client_get(&req->rq_pill, &RMF_NAME);
-	LOGL0(op_data->op_name, op_data->op_namelen, tmp);
-
+	mdc_pack_name(req, &RMF_NAME, op_data->op_name, op_data->op_namelen);
 	if (data) {
 		tmp = req_capsule_client_get(&req->rq_pill, &RMF_EADATA);
 		memcpy(tmp, data, datalen);
@@ -200,8 +229,9 @@ void mdc_open_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 	rec->cr_old_handle = op_data->op_handle;
 
 	if (op_data->op_name) {
-		tmp = req_capsule_client_get(&req->rq_pill, &RMF_NAME);
-		LOGL0(op_data->op_name, op_data->op_namelen, tmp);
+		mdc_pack_name(req, &RMF_NAME, op_data->op_name,
+			      op_data->op_namelen);
+
 		if (op_data->op_bias & MDS_CREATE_VOLATILE)
 			cr_flags |= MDS_OPEN_VOLATILE;
 	}
@@ -334,7 +364,6 @@ void mdc_setattr_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 void mdc_unlink_pack(struct ptlrpc_request *req, struct md_op_data *op_data)
 {
 	struct mdt_rec_unlink *rec;
-	char *tmp;
 
 	CLASSERT(sizeof(struct mdt_rec_reint) == sizeof(struct mdt_rec_unlink));
 	rec = req_capsule_client_get(&req->rq_pill, &RMF_REC_REINT);
@@ -352,15 +381,12 @@ void mdc_unlink_pack(struct ptlrpc_request *req, struct md_op_data *op_data)
 	rec->ul_time     = op_data->op_mod_time;
 	rec->ul_bias     = op_data->op_bias;
 
-	tmp = req_capsule_client_get(&req->rq_pill, &RMF_NAME);
-	LASSERT(tmp);
-	LOGL0(op_data->op_name, op_data->op_namelen, tmp);
+	mdc_pack_name(req, &RMF_NAME, op_data->op_name, op_data->op_namelen);
 }
 
 void mdc_link_pack(struct ptlrpc_request *req, struct md_op_data *op_data)
 {
 	struct mdt_rec_link *rec;
-	char *tmp;
 
 	CLASSERT(sizeof(struct mdt_rec_reint) == sizeof(struct mdt_rec_link));
 	rec = req_capsule_client_get(&req->rq_pill, &RMF_REC_REINT);
@@ -376,15 +402,13 @@ void mdc_link_pack(struct ptlrpc_request *req, struct md_op_data *op_data)
 	rec->lk_time     = op_data->op_mod_time;
 	rec->lk_bias     = op_data->op_bias;
 
-	tmp = req_capsule_client_get(&req->rq_pill, &RMF_NAME);
-	LOGL0(op_data->op_name, op_data->op_namelen, tmp);
+	mdc_pack_name(req, &RMF_NAME, op_data->op_name, op_data->op_namelen);
 }
 
 void mdc_rename_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 		     const char *old, int oldlen, const char *new, int newlen)
 {
 	struct mdt_rec_rename *rec;
-	char *tmp;
 
 	CLASSERT(sizeof(struct mdt_rec_reint) == sizeof(struct mdt_rec_rename));
 	rec = req_capsule_client_get(&req->rq_pill, &RMF_REC_REINT);
@@ -404,13 +428,10 @@ void mdc_rename_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 	rec->rn_mode     = op_data->op_mode;
 	rec->rn_bias     = op_data->op_bias;
 
-	tmp = req_capsule_client_get(&req->rq_pill, &RMF_NAME);
-	LOGL0(old, oldlen, tmp);
+	mdc_pack_name(req, &RMF_NAME, old, oldlen);
 
-	if (new) {
-		tmp = req_capsule_client_get(&req->rq_pill, &RMF_SYMTGT);
-		LOGL0(new, newlen, tmp);
-	}
+	if (new)
+		mdc_pack_name(req, &RMF_SYMTGT, new, newlen);
 }
 
 void mdc_getattr_pack(struct ptlrpc_request *req, __u64 valid, int flags,
@@ -432,11 +453,9 @@ void mdc_getattr_pack(struct ptlrpc_request *req, __u64 valid, int flags,
 	b->fid2 = op_data->op_fid2;
 	b->valid |= OBD_MD_FLID;
 
-	if (op_data->op_name) {
-		char *tmp = req_capsule_client_get(&req->rq_pill, &RMF_NAME);
-
-		LOGL0(op_data->op_name, op_data->op_namelen, tmp);
-	}
+	if (op_data->op_name)
+		mdc_pack_name(req, &RMF_NAME, op_data->op_name,
+			      op_data->op_namelen);
 }
 
 static void mdc_hsm_release_pack(struct ptlrpc_request *req,
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread
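
For readers who want to try the validation rule outside the kernel tree, the check added to lu_object.h can be exercised in userspace. The following is a minimal sketch under stated assumptions, not the Lustre code: name_is_valid() restates lu_name_is_valid_2() using libc, and pack_name() is a hypothetical stand-in for mdc_pack_name() that takes a plain buffer instead of a request capsule and returns -1 where the kernel code would trip an LASSERT.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace restatement of lu_name_is_valid_2(): the name must be non-NULL,
 * non-empty, NUL-terminated at exactly name_len, and must not contain '/'.
 * The caller must guarantee that name[name_len] is readable.
 */
static bool name_is_valid(const char *name, size_t name_len)
{
	return name && name_len > 0 && name_len < INT_MAX &&
	       name[name_len] == '\0' && strlen(name) == name_len &&
	       !memchr(name, '/', name_len);
}

/*
 * Hypothetical stand-in for mdc_pack_name(). buf must be exactly
 * name_len + 1 bytes. Returns 0 on success and -1 in the cases where the
 * kernel code would trip an LASSERT.
 */
static int pack_name(char *buf, size_t buf_size,
		     const char *name, size_t name_len)
{
	if (!name || !name_len || !buf || buf_size != name_len + 1)
		return -1;

	/* Like the strlcpy() return check: the source length must match. */
	if (strlen(name) != name_len)
		return -1;

	memcpy(buf, name, name_len + 1);	/* includes the terminator */
	assert(buf[name_len] == '\0');

	/* Validate what was actually packed, not just the input. */
	return name_is_valid(buf, name_len) ? 0 : -1;
}
```

Checking the packed buffer rather than only the input mirrors the intent stated in the commit message: the name that is validated is the one that actually goes into the request.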

* [PATCH 43/80] staging: lustre: llite: fix inconsistencies of root squash feature
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Gregoire Pichon, James Simmons

From: Gregoire Pichon <gregoire.pichon@bull.net>

When root squash is enabled, a client behaves inconsistently. If a
file is not cached on the client, root gets a permission denied error
when accessing it. If the file was recently accessed by a regular
user and is still in cache, root can access it without error, because
the permission check is then done only by the client, which is not
aware of root squash.

While the only real security benefit from root squash is to deny
clients access to files owned by root itself, it also makes sense
to treat file access on the client in a consistent manner
regardless of whether the file is in cache or not.

This patch adds root squash settings to llite so that the client can
apply root squashing when it is relevant.

Configuration of MDT root squash settings will automatically be
applied to the llite config log as well.

Update the cfs_str2num_check() routine so that it no longer modifies
the string passed to it. Since the string can come from the ls_str
field of a lstr structure, this avoids leaving ls_len inconsistent.
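
The hazard described in the last paragraph can be illustrated in userspace. The following is a sketch under stated assumptions rather than libcfs code: struct span is a hypothetical pointer-plus-length pair standing in for a structure with ls_str and ls_len fields, and parse_unsigned() is a hypothetical parser that skips surrounding whitespace by moving local indices instead of writing a terminator into the caller's buffer. The patch itself only drops the trimming call; tolerating whitespace this way is one possible alternative, not something the patch does.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pointer-plus-length string, in the spirit of ls_str/ls_len. */
struct span {
	const char *str;
	size_t len;
};

/*
 * Parse an unsigned decimal number from s without modifying s->str.
 * Leading and trailing whitespace is skipped by adjusting local indices,
 * so s->len remains a correct description of the buffer afterwards.
 * Returns true and stores the value in *num on success.
 */
static bool parse_unsigned(const struct span *s, unsigned int *num)
{
	size_t start = 0, end, i;
	unsigned int val = 0;

	assert(s && s->str && num);
	end = s->len;

	while (start < end && isspace((unsigned char)s->str[start]))
		start++;
	while (end > start && isspace((unsigned char)s->str[end - 1]))
		end--;
	if (start == end)
		return false;	/* empty or whitespace only */

	for (i = start; i < end; i++) {
		unsigned int digit;

		if (!isdigit((unsigned char)s->str[i]))
			return false;
		digit = (unsigned int)(s->str[i] - '0');
		if (val > (UINT_MAX - digit) / 10)
			return false;	/* would overflow */
		val = val * 10 + digit;
	}

	*num = val;
	return true;
}
```

Because the input is const and only indices move, a caller that tracks the length separately never sees the buffer contents and the recorded length disagree.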

Signed-off-by: Gregoire Pichon <gregoire.pichon@bull.net>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1778
Reviewed-on: http://review.whamcloud.com/5700
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lnet/libcfs/libcfs_string.c |    2 -
 .../staging/lustre/lustre/include/lprocfs_status.h |    6 +
 drivers/staging/lustre/lustre/include/obd_class.h  |    9 ++
 drivers/staging/lustre/lustre/llite/file.c         |   44 ++++++
 .../staging/lustre/lustre/llite/llite_internal.h   |    6 +
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   47 +++++++
 drivers/staging/lustre/lustre/llite/lproc_llite.c  |   68 ++++++++++
 .../lustre/lustre/obdclass/lprocfs_status.c        |  140 ++++++++++++++++++++
 8 files changed, 320 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_string.c b/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
index fc697cd..56a614d 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
@@ -229,8 +229,6 @@ cfs_str2num_check(char *str, int nob, unsigned *num,
 	char *endp, cache;
 	int rc;
 
-	str = cfs_trimwhite(str);
-
 	/**
 	 * kstrouint can only handle strings composed
 	 * of only numbers. We need to scan the string
diff --git a/drivers/staging/lustre/lustre/include/lprocfs_status.h b/drivers/staging/lustre/lustre/include/lprocfs_status.h
index d68e60e..ff35e63 100644
--- a/drivers/staging/lustre/lustre/include/lprocfs_status.h
+++ b/drivers/staging/lustre/lustre/include/lprocfs_status.h
@@ -681,6 +681,12 @@ static struct lustre_attr lustre_attr_##name = __ATTR(name, mode, show, store)
 
 extern const struct sysfs_ops lustre_sysfs_ops;
 
+struct root_squash_info;
+int lprocfs_wr_root_squash(const char *buffer, unsigned long count,
+			   struct root_squash_info *squash, char *name);
+int lprocfs_wr_nosquash_nids(const char *buffer, unsigned long count,
+			     struct root_squash_info *squash, char *name);
+
 /* all quota proc functions */
 int lprocfs_quota_rd_bunit(char *page, char **start,
 			   loff_t off, int count,
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index a288995..e86961c 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -1759,4 +1759,13 @@ extern spinlock_t obd_types_lock;
 /* prng.c */
 #define ll_generate_random_uuid(uuid_out) cfs_get_random_bytes(uuid_out, sizeof(class_uuid_t))
 
+/* root squash info */
+struct rw_semaphore;
+struct root_squash_info {
+	uid_t			rsi_uid;
+	gid_t			rsi_gid;
+	struct list_head	rsi_nosquash_nids;
+	struct rw_semaphore	rsi_sem;
+};
+
 #endif /* __LINUX_OBD_CLASS_H */
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 519db53..90a7170 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -41,6 +41,7 @@
 #include "../include/lustre_lite.h"
 #include <linux/pagemap.h>
 #include <linux/file.h>
+#include <linux/sched.h>
 #include <linux/mount.h>
 #include "llite_internal.h"
 #include "../include/lustre/ll_fiemap.h"
@@ -3289,6 +3290,12 @@ struct posix_acl *ll_get_acl(struct inode *inode, int type)
 
 int ll_inode_permission(struct inode *inode, int mask)
 {
+	struct ll_sb_info *sbi;
+	struct root_squash_info *squash;
+	const struct cred *old_cred = NULL;
+	struct cred *cred = NULL;
+	bool squash_id = false;
+	cfs_cap_t cap;
 	int rc = 0;
 
 	if (mask & MAY_NOT_BLOCK)
@@ -3308,9 +3315,46 @@ int ll_inode_permission(struct inode *inode, int mask)
 	CDEBUG(D_VFSTRACE, "VFS Op:inode="DFID"(%p), inode mode %x mask %o\n",
 	       PFID(ll_inode2fid(inode)), inode, inode->i_mode, mask);
 
+	/* squash fsuid/fsgid if needed */
+	sbi = ll_i2sbi(inode);
+	squash = &sbi->ll_squash;
+	if (unlikely(squash->rsi_uid &&
+		     uid_eq(current_fsuid(), GLOBAL_ROOT_UID) &&
+		     !(sbi->ll_flags & LL_SBI_NOROOTSQUASH))) {
+		squash_id = true;
+	}
+
+	if (squash_id) {
+		CDEBUG(D_OTHER, "squash creds (%d:%d)=>(%d:%d)\n",
+		       __kuid_val(current_fsuid()), __kgid_val(current_fsgid()),
+		       squash->rsi_uid, squash->rsi_gid);
+
+		/*
+		 * update current process's credentials
+		 * and FS capability
+		 */
+		cred = prepare_creds();
+		if (!cred)
+			return -ENOMEM;
+
+		cred->fsuid = make_kuid(&init_user_ns, squash->rsi_uid);
+		cred->fsgid = make_kgid(&init_user_ns, squash->rsi_gid);
+		for (cap = 0; cap < sizeof(cfs_cap_t) * 8; cap++) {
+			if ((1 << cap) & CFS_CAP_FS_MASK)
+				cap_lower(cred->cap_effective, cap);
+		}
+		old_cred = override_creds(cred);
+	}
+
 	ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_INODE_PERM, 1);
 	rc = generic_permission(inode, mask);
 
+	/* restore current process's credentials and FS capability */
+	if (squash_id) {
+		revert_creds(old_cred);
+		put_cred(cred);
+	}
+
 	return rc;
 }
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index e101dd8..500b5ec 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -412,6 +412,7 @@ enum stats_track_type {
 #define LL_SBI_LAYOUT_LOCK    0x20000 /* layout lock support */
 #define LL_SBI_USER_FID2PATH  0x40000 /* allow fid2path by unprivileged users */
 #define LL_SBI_XATTR_CACHE    0x80000 /* support for xattr cache */
+#define LL_SBI_NOROOTSQUASH	0x100000 /* do not apply root squash */
 
 #define LL_SBI_FLAGS {	\
 	"nolck",	\
@@ -434,6 +435,7 @@ enum stats_track_type {
 	"layout",	\
 	"user_fid2path",\
 	"xattr",	\
+	"norootsquash",	\
 }
 
 struct ll_sb_info {
@@ -500,6 +502,9 @@ struct ll_sb_info {
 	dev_t			  ll_sdev_orig; /* save s_dev before assign for
 						 * clustered nfs
 						 */
+	/* root squash */
+	struct root_squash_info	  ll_squash;
+
 	__kernel_fsid_t		  ll_fsid;
 	struct kobject		 ll_kobj; /* sysfs object */
 	struct super_block	*ll_sb; /* struct super_block (for sysfs code)*/
@@ -798,6 +803,7 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 void ll_finish_md_op_data(struct md_op_data *op_data);
 int ll_get_obd_name(struct inode *inode, unsigned int cmd, unsigned long arg);
 char *ll_get_fsname(struct super_block *sb, char *buf, int buflen);
+void ll_compute_rootsquash_state(struct ll_sb_info *sbi);
 void ll_open_cleanup(struct super_block *sb, struct ptlrpc_request *open_req);
 
 /* llite/llite_nfs.c */
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index a3b4c97..0a28925 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -119,6 +119,12 @@ static struct ll_sb_info *ll_init_sbi(struct super_block *sb)
 	atomic_set(&sbi->ll_agl_total, 0);
 	sbi->ll_flags |= LL_SBI_AGL_ENABLED;
 
+	/* root squash */
+	sbi->ll_squash.rsi_uid = 0;
+	sbi->ll_squash.rsi_gid = 0;
+	INIT_LIST_HEAD(&sbi->ll_squash.rsi_nosquash_nids);
+	init_rwsem(&sbi->ll_squash.rsi_sem);
+
 	sbi->ll_sb = sb;
 
 	return sbi;
@@ -129,6 +135,8 @@ static void ll_free_sbi(struct super_block *sb)
 	struct ll_sb_info *sbi = ll_s2sbi(sb);
 
 	if (sbi->ll_cache) {
+		if (!list_empty(&sbi->ll_squash.rsi_nosquash_nids))
+			cfs_free_nidlist(&sbi->ll_squash.rsi_nosquash_nids);
 		cl_cache_decref(sbi->ll_cache);
 		sbi->ll_cache = NULL;
 	}
@@ -2496,3 +2504,42 @@ void ll_dirty_page_discard_warn(struct page *page, int ioret)
 	if (buf)
 		free_page((unsigned long)buf);
 }
+
+/*
+ * Compute llite root squash state after a change of root squash
+ * configuration setting or add/remove of a lnet nid
+ */
+void ll_compute_rootsquash_state(struct ll_sb_info *sbi)
+{
+	struct root_squash_info *squash = &sbi->ll_squash;
+	lnet_process_id_t id;
+	bool matched;
+	int i;
+
+	/* Update norootsquash flag */
+	down_write(&squash->rsi_sem);
+	if (list_empty(&squash->rsi_nosquash_nids)) {
+		sbi->ll_flags &= ~LL_SBI_NOROOTSQUASH;
+	} else {
+		/*
+		 * Do not apply root squash as soon as one of our NIDs is
+		 * in the nosquash_nids list
+		 */
+		matched = false;
+		i = 0;
+
+		while (LNetGetId(i++, &id) != -ENOENT) {
+			if (LNET_NETTYP(LNET_NIDNET(id.nid)) == LOLND)
+				continue;
+			if (cfs_match_nid(id.nid, &squash->rsi_nosquash_nids)) {
+				matched = true;
+				break;
+			}
+		}
+		if (matched)
+			sbi->ll_flags |= LL_SBI_NOROOTSQUASH;
+		else
+			sbi->ll_flags &= ~LL_SBI_NOROOTSQUASH;
+	}
+	up_write(&squash->rsi_sem);
+}
diff --git a/drivers/staging/lustre/lustre/llite/lproc_llite.c b/drivers/staging/lustre/lustre/llite/lproc_llite.c
index e86bf3c..2f1f389 100644
--- a/drivers/staging/lustre/lustre/llite/lproc_llite.c
+++ b/drivers/staging/lustre/lustre/llite/lproc_llite.c
@@ -833,6 +833,71 @@ static ssize_t unstable_stats_show(struct kobject *kobj,
 }
 LUSTRE_RO_ATTR(unstable_stats);
 
+static ssize_t root_squash_show(struct kobject *kobj, struct attribute *attr,
+				char *buf)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kobj);
+	struct root_squash_info *squash = &sbi->ll_squash;
+
+	return sprintf(buf, "%u:%u\n", squash->rsi_uid, squash->rsi_gid);
+}
+
+static ssize_t root_squash_store(struct kobject *kobj, struct attribute *attr,
+				 const char *buffer, size_t count)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kobj);
+	struct root_squash_info *squash = &sbi->ll_squash;
+
+	return lprocfs_wr_root_squash(buffer, count, squash,
+				      ll_get_fsname(sbi->ll_sb, NULL, 0));
+}
+LUSTRE_RW_ATTR(root_squash);
+
+static int ll_nosquash_nids_seq_show(struct seq_file *m, void *v)
+{
+	struct super_block *sb = m->private;
+	struct ll_sb_info *sbi = ll_s2sbi(sb);
+	struct root_squash_info *squash = &sbi->ll_squash;
+	int len;
+
+	down_read(&squash->rsi_sem);
+	if (!list_empty(&squash->rsi_nosquash_nids)) {
+		len = cfs_print_nidlist(m->buf + m->count, m->size - m->count,
+					&squash->rsi_nosquash_nids);
+		m->count += len;
+		seq_puts(m, "\n");
+	} else {
+		seq_puts(m, "NONE\n");
+	}
+	up_read(&squash->rsi_sem);
+
+	return 0;
+}
+
+static ssize_t ll_nosquash_nids_seq_write(struct file *file,
+					  const char __user *buffer,
+					  size_t count, loff_t *off)
+{
+	struct seq_file *m = file->private_data;
+	struct super_block *sb = m->private;
+	struct ll_sb_info *sbi = ll_s2sbi(sb);
+	struct root_squash_info *squash = &sbi->ll_squash;
+	int rc;
+
+	rc = lprocfs_wr_nosquash_nids(buffer, count, squash,
+				      ll_get_fsname(sb, NULL, 0));
+	if (rc < 0)
+		return rc;
+
+	ll_compute_rootsquash_state(sbi);
+
+	return rc;
+}
+
+LPROC_SEQ_FOPS(ll_nosquash_nids);
+
 static struct lprocfs_vars lprocfs_llite_obd_vars[] = {
 	/* { "mntpt_path",   ll_rd_path,	     0, 0 }, */
 	{ "site",	  &ll_site_stats_fops,    NULL, 0 },
@@ -840,6 +905,8 @@ static struct lprocfs_vars lprocfs_llite_obd_vars[] = {
 	{ "max_cached_mb",    &ll_max_cached_mb_fops, NULL },
 	{ "statahead_stats",  &ll_statahead_stats_fops, NULL, 0 },
 	{ "sbi_flags",	      &ll_sbi_flags_fops, NULL, 0 },
+	{ .name =		"nosquash_nids",
+	  .fops =		&ll_nosquash_nids_fops		},
 	{ NULL }
 };
 
@@ -869,6 +936,7 @@ static struct attribute *llite_attrs[] = {
 	&lustre_attr_default_easize.attr,
 	&lustre_attr_xattr_cache.attr,
 	&lustre_attr_unstable_stats.attr,
+	&lustre_attr_root_squash.attr,
 	NULL,
 };
 
diff --git a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
index 279b625..c83d28e 100644
--- a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
+++ b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
@@ -1547,6 +1547,146 @@ void lprocfs_oh_clear(struct obd_histogram *oh)
 }
 EXPORT_SYMBOL(lprocfs_oh_clear);
 
+int lprocfs_wr_root_squash(const char __user *buffer, unsigned long count,
+			   struct root_squash_info *squash, char *name)
+{
+	char kernbuf[64], *tmp, *errmsg;
+	unsigned long uid, gid;
+	int rc;
+
+	if (count >= sizeof(kernbuf)) {
+		errmsg = "string too long";
+		rc = -EINVAL;
+		goto failed_noprint;
+	}
+	if (copy_from_user(kernbuf, buffer, count)) {
+		errmsg = "bad address";
+		rc = -EFAULT;
+		goto failed_noprint;
+	}
+	kernbuf[count] = '\0';
+
+	/* look for uid gid separator */
+	tmp = strchr(kernbuf, ':');
+	if (!tmp) {
+		errmsg = "needs uid:gid format";
+		rc = -EINVAL;
+		goto failed;
+	}
+	*tmp = '\0';
+	tmp++;
+
+	/* parse uid */
+	if (kstrtoul(kernbuf, 0, &uid) != 0) {
+		errmsg = "bad uid";
+		rc = -EINVAL;
+		goto failed;
+	}
+	/* parse gid */
+	if (kstrtoul(tmp, 0, &gid) != 0) {
+		errmsg = "bad gid";
+		rc = -EINVAL;
+		goto failed;
+	}
+
+	squash->rsi_uid = uid;
+	squash->rsi_gid = gid;
+
+	LCONSOLE_INFO("%s: root_squash is set to %u:%u\n",
+		      name, squash->rsi_uid, squash->rsi_gid);
+	return count;
+
+failed:
+	if (tmp) {
+		tmp--;
+		*tmp = ':';
+	}
+	CWARN("%s: failed to set root_squash to \"%s\", %s, rc = %d\n",
+	      name, kernbuf, errmsg, rc);
+	return rc;
+failed_noprint:
+	CWARN("%s: failed to set root_squash due to %s, rc = %d\n",
+	      name, errmsg, rc);
+	return rc;
+}
+EXPORT_SYMBOL(lprocfs_wr_root_squash);
+
+int lprocfs_wr_nosquash_nids(const char __user *buffer, unsigned long count,
+			     struct root_squash_info *squash, char *name)
+{
+	char *kernbuf = NULL, *errmsg;
+	struct list_head tmp;
+	int len = count;
+	int rc;
+
+	if (count > 4096) {
+		errmsg = "string too long";
+		rc = -EINVAL;
+		goto failed;
+	}
+
+	kernbuf = kzalloc(count + 1, GFP_NOFS);
+	if (!kernbuf) {
+		errmsg = "no memory";
+		rc = -ENOMEM;
+		goto failed;
+	}
+
+	if (copy_from_user(kernbuf, buffer, count)) {
+		errmsg = "bad address";
+		rc = -EFAULT;
+		goto failed;
+	}
+	kernbuf[count] = '\0';
+
+	if (count > 0 && kernbuf[count - 1] == '\n')
+		len = count - 1;
+
+	if ((len == 4 && !strncmp(kernbuf, "NONE", len)) ||
+	    (len == 5 && !strncmp(kernbuf, "clear", len))) {
+		/* empty string is special case */
+		down_write(&squash->rsi_sem);
+		if (!list_empty(&squash->rsi_nosquash_nids))
+			cfs_free_nidlist(&squash->rsi_nosquash_nids);
+		up_write(&squash->rsi_sem);
+		LCONSOLE_INFO("%s: nosquash_nids is cleared\n", name);
+		kfree(kernbuf);
+		return count;
+	}
+
+	INIT_LIST_HEAD(&tmp);
+	if (cfs_parse_nidlist(kernbuf, count, &tmp) <= 0) {
+		errmsg = "can't parse";
+		rc = -EINVAL;
+		goto failed;
+	}
+	LCONSOLE_INFO("%s: nosquash_nids set to %s\n",
+		      name, kernbuf);
+	kfree(kernbuf);
+	kernbuf = NULL;
+
+	down_write(&squash->rsi_sem);
+	if (!list_empty(&squash->rsi_nosquash_nids))
+		cfs_free_nidlist(&squash->rsi_nosquash_nids);
+	list_splice(&tmp, &squash->rsi_nosquash_nids);
+	up_write(&squash->rsi_sem);
+
+	return count;
+
+failed:
+	if (kernbuf) {
+		CWARN("%s: failed to set nosquash_nids to \"%s\", %s rc = %d\n",
+		      name, kernbuf, errmsg, rc);
+		kfree(kernbuf);
+		kernbuf = NULL;
+	} else {
+		CWARN("%s: failed to set nosquash_nids due to %s rc = %d\n",
+		      name, errmsg, rc);
+	}
+	return rc;
+}
+EXPORT_SYMBOL(lprocfs_wr_nosquash_nids);
+
 static ssize_t lustre_attr_show(struct kobject *kobj,
 				struct attribute *attr, char *buf)
 {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread
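
The "uid:gid" parsing in lprocfs_wr_root_squash() and the squash test in ll_inode_permission() can be approximated in userspace as follows. This is a sketch under stated assumptions, not the kernel code: struct squash_ids, parse_id(), parse_root_squash() and should_squash() are hypothetical names, strtoul() stands in for kstrtoul(), and the range check against UINT_MAX is an addition that the patch does not make. parse_id() is also slightly stricter than kstrtoul() in that it rejects a leading '+'.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace mirror of rsi_uid/rsi_gid in root_squash_info. */
struct squash_ids {
	unsigned int uid;
	unsigned int gid;
};

/* Parse one whole number (base auto-detected), allowing one trailing '\n'. */
static int parse_id(const char *s, unsigned int *out)
{
	char *end;
	unsigned long val;

	if (!isdigit((unsigned char)s[0]))
		return -EINVAL;	/* rejects signs and leading whitespace */

	errno = 0;
	val = strtoul(s, &end, 0);
	if (errno == ERANGE || val > UINT_MAX)
		return -ERANGE;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*out = (unsigned int)val;
	return 0;
}

/* Parse "uid:gid"; *ids is updated only if both halves are valid. */
static int parse_root_squash(const char *input, struct squash_ids *ids)
{
	char buf[64];
	char *sep;
	unsigned int uid, gid;
	size_t len;

	assert(input && ids);
	len = strlen(input);
	if (len >= sizeof(buf))
		return -EINVAL;	/* string too long */
	memcpy(buf, input, len + 1);

	sep = strchr(buf, ':');
	if (!sep)
		return -EINVAL;	/* needs uid:gid format */
	*sep++ = '\0';

	if (parse_id(buf, &uid) != 0 || parse_id(sep, &gid) != 0)
		return -EINVAL;

	ids->uid = uid;
	ids->gid = gid;
	return 0;
}

/*
 * Same condition as in ll_inode_permission(): squash only when a non-zero
 * squash uid is configured, the caller's fsuid is root, and the client is
 * not exempted through the norootsquash flag.
 */
static bool should_squash(const struct squash_ids *ids, unsigned int fsuid,
			  bool norootsquash)
{
	return ids->uid != 0 && fsuid == 0 && !norootsquash;
}
```

With the default of 0:0 set in ll_init_sbi(), should_squash() is always false, which matches the patch: squashing stays off until a non-zero uid is configured.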

* [lustre-devel] [PATCH 43/80] staging: lustre: llite: fix inconsistencies of root squash feature
@ 2016-08-16 20:18   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Gregoire Pichon, James Simmons

From: Gregoire Pichon <gregoire.pichon@bull.net>

Root squash behaves inconsistently on a client when enabled.
If a file is not cached on the client, root gets a permission
denied error when accessing the file. If the file has recently
been accessed by a regular user and is still in cache, root can
access the file without error, because the permission check is
done only by the client, which isn't aware of root squash.

While the only real security benefit from root squash is to deny
clients access to files owned by root itself, it also makes sense
to treat file access on the client in a consistent manner
regardless of whether the file is in cache or not.

This patch adds root squash settings to llite so that the client can
apply root squashing when it is relevant.
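
As a rough illustration of the client-side decision, the check that
ll_inode_permission() performs in the diff below can be sketched in
userspace C. The struct and function names here are hypothetical
stand-ins, not the kernel types; only the condition itself is taken
from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the settings the patch adds. */
struct squash_settings {
	uint32_t rsi_uid;	/* uid root is mapped to; 0 disables squashing */
	uint32_t rsi_gid;	/* gid root is mapped to */
	bool	 norootsquash;	/* stands in for LL_SBI_NOROOTSQUASH */
};

/*
 * True when a caller with this fsuid should be permission-checked
 * as rsi_uid:rsi_gid instead of as root.
 */
static bool should_squash(const struct squash_settings *s, uint32_t fsuid)
{
	return s->rsi_uid != 0 && fsuid == 0 && !s->norootsquash;
}
```

Only root (fsuid 0) is ever squashed, and only while a non-zero squash
uid is configured and none of the client's NIDs is on the nosquash list.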

Configuration of MDT root squash settings will automatically be
applied to llite config log as well.
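
The root_squash setting is written as a "uid:gid" string (see
lprocfs_wr_root_squash() in the diff below). A minimal userspace
sketch of that parsing follows, assuming strtoul() in place of the
kernel's kstrtoul(); the helper names are hypothetical, and the sketch
is slightly stricter than the kernel code because it requires each
field to start with a digit.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse one unsigned field; a single trailing newline is tolerated. */
static int parse_one(const char *s, unsigned long *out)
{
	char *end;

	if (!isdigit((unsigned char)*s))
		return -EINVAL;
	errno = 0;
	*out = strtoul(s, &end, 0);	/* base 0: decimal, octal or hex */
	if (errno != 0 || end == s)
		return -EINVAL;
	if (*end == '\n')
		end++;
	return *end == '\0' ? 0 : -EINVAL;
}

/* Split "uid:gid" at the separator and parse both halves. */
static int parse_root_squash(const char *in, unsigned long *uid,
			     unsigned long *gid)
{
	char buf[64];
	char *sep;

	if (strlen(in) >= sizeof(buf))
		return -EINVAL;		/* string too long */
	strcpy(buf, in);

	sep = strchr(buf, ':');
	if (!sep)
		return -EINVAL;		/* needs uid:gid format */
	*sep++ = '\0';

	if (parse_one(buf, uid) || parse_one(sep, gid))
		return -EINVAL;
	return 0;
}
```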

Update the cfs_str2num_check() routine so that it no longer modifies
the string passed to it. Since the string can come from the ls_str
field of a lstr structure, this avoids an inconsistent ls_len field.

Signed-off-by: Gregoire Pichon <gregoire.pichon@bull.net>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1778
Reviewed-on: http://review.whamcloud.com/5700
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lnet/libcfs/libcfs_string.c |    2 -
 .../staging/lustre/lustre/include/lprocfs_status.h |    6 +
 drivers/staging/lustre/lustre/include/obd_class.h  |    9 ++
 drivers/staging/lustre/lustre/llite/file.c         |   44 ++++++
 .../staging/lustre/lustre/llite/llite_internal.h   |    6 +
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   47 +++++++
 drivers/staging/lustre/lustre/llite/lproc_llite.c  |   68 ++++++++++
 .../lustre/lustre/obdclass/lprocfs_status.c        |  140 ++++++++++++++++++++
 8 files changed, 320 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_string.c b/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
index fc697cd..56a614d 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
@@ -229,8 +229,6 @@ cfs_str2num_check(char *str, int nob, unsigned *num,
 	char *endp, cache;
 	int rc;
 
-	str = cfs_trimwhite(str);
-
 	/**
 	 * kstrouint can only handle strings composed
 	 * of only numbers. We need to scan the string
diff --git a/drivers/staging/lustre/lustre/include/lprocfs_status.h b/drivers/staging/lustre/lustre/include/lprocfs_status.h
index d68e60e..ff35e63 100644
--- a/drivers/staging/lustre/lustre/include/lprocfs_status.h
+++ b/drivers/staging/lustre/lustre/include/lprocfs_status.h
@@ -681,6 +681,12 @@ static struct lustre_attr lustre_attr_##name = __ATTR(name, mode, show, store)
 
 extern const struct sysfs_ops lustre_sysfs_ops;
 
+struct root_squash_info;
+int lprocfs_wr_root_squash(const char *buffer, unsigned long count,
+			   struct root_squash_info *squash, char *name);
+int lprocfs_wr_nosquash_nids(const char *buffer, unsigned long count,
+			     struct root_squash_info *squash, char *name);
+
 /* all quota proc functions */
 int lprocfs_quota_rd_bunit(char *page, char **start,
 			   loff_t off, int count,
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index a288995..e86961c 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -1759,4 +1759,13 @@ extern spinlock_t obd_types_lock;
 /* prng.c */
 #define ll_generate_random_uuid(uuid_out) cfs_get_random_bytes(uuid_out, sizeof(class_uuid_t))
 
+/* root squash info */
+struct rw_semaphore;
+struct root_squash_info {
+	uid_t			rsi_uid;
+	gid_t			rsi_gid;
+	struct list_head	rsi_nosquash_nids;
+	struct rw_semaphore	rsi_sem;
+};
+
 #endif /* __LINUX_OBD_CLASS_H */
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 519db53..90a7170 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -41,6 +41,7 @@
 #include "../include/lustre_lite.h"
 #include <linux/pagemap.h>
 #include <linux/file.h>
+#include <linux/sched.h>
 #include <linux/mount.h>
 #include "llite_internal.h"
 #include "../include/lustre/ll_fiemap.h"
@@ -3289,6 +3290,12 @@ struct posix_acl *ll_get_acl(struct inode *inode, int type)
 
 int ll_inode_permission(struct inode *inode, int mask)
 {
+	struct ll_sb_info *sbi;
+	struct root_squash_info *squash;
+	const struct cred *old_cred = NULL;
+	struct cred *cred = NULL;
+	bool squash_id = false;
+	cfs_cap_t cap;
 	int rc = 0;
 
 	if (mask & MAY_NOT_BLOCK)
@@ -3308,9 +3315,46 @@ int ll_inode_permission(struct inode *inode, int mask)
 	CDEBUG(D_VFSTRACE, "VFS Op:inode="DFID"(%p), inode mode %x mask %o\n",
 	       PFID(ll_inode2fid(inode)), inode, inode->i_mode, mask);
 
+	/* squash fsuid/fsgid if needed */
+	sbi = ll_i2sbi(inode);
+	squash = &sbi->ll_squash;
+	if (unlikely(squash->rsi_uid &&
+		     uid_eq(current_fsuid(), GLOBAL_ROOT_UID) &&
+		     !(sbi->ll_flags & LL_SBI_NOROOTSQUASH))) {
+		squash_id = true;
+	}
+
+	if (squash_id) {
+		CDEBUG(D_OTHER, "squash creds (%d:%d)=>(%d:%d)\n",
+		       __kuid_val(current_fsuid()), __kgid_val(current_fsgid()),
+		       squash->rsi_uid, squash->rsi_gid);
+
+		/*
+		 * update current process's credentials
+		 * and FS capability
+		 */
+		cred = prepare_creds();
+		if (!cred)
+			return -ENOMEM;
+
+		cred->fsuid = make_kuid(&init_user_ns, squash->rsi_uid);
+		cred->fsgid = make_kgid(&init_user_ns, squash->rsi_gid);
+		for (cap = 0; cap < sizeof(cfs_cap_t) * 8; cap++) {
+			if ((1 << cap) & CFS_CAP_FS_MASK)
+				cap_lower(cred->cap_effective, cap);
+		}
+		old_cred = override_creds(cred);
+	}
+
 	ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_INODE_PERM, 1);
 	rc = generic_permission(inode, mask);
 
+	/* restore current process's credentials and FS capability */
+	if (squash_id) {
+		revert_creds(old_cred);
+		put_cred(cred);
+	}
+
 	return rc;
 }
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index e101dd8..500b5ec 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -412,6 +412,7 @@ enum stats_track_type {
 #define LL_SBI_LAYOUT_LOCK    0x20000 /* layout lock support */
 #define LL_SBI_USER_FID2PATH  0x40000 /* allow fid2path by unprivileged users */
 #define LL_SBI_XATTR_CACHE    0x80000 /* support for xattr cache */
+#define LL_SBI_NOROOTSQUASH	0x100000 /* do not apply root squash */
 
 #define LL_SBI_FLAGS {	\
 	"nolck",	\
@@ -434,6 +435,7 @@ enum stats_track_type {
 	"layout",	\
 	"user_fid2path",\
 	"xattr",	\
+	"norootsquash",	\
 }
 
 struct ll_sb_info {
@@ -500,6 +502,9 @@ struct ll_sb_info {
 	dev_t			  ll_sdev_orig; /* save s_dev before assign for
 						 * clustered nfs
 						 */
+	/* root squash */
+	struct root_squash_info	  ll_squash;
+
 	__kernel_fsid_t		  ll_fsid;
 	struct kobject		 ll_kobj; /* sysfs object */
 	struct super_block	*ll_sb; /* struct super_block (for sysfs code)*/
@@ -798,6 +803,7 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 void ll_finish_md_op_data(struct md_op_data *op_data);
 int ll_get_obd_name(struct inode *inode, unsigned int cmd, unsigned long arg);
 char *ll_get_fsname(struct super_block *sb, char *buf, int buflen);
+void ll_compute_rootsquash_state(struct ll_sb_info *sbi);
 void ll_open_cleanup(struct super_block *sb, struct ptlrpc_request *open_req);
 
 /* llite/llite_nfs.c */
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index a3b4c97..0a28925 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -119,6 +119,12 @@ static struct ll_sb_info *ll_init_sbi(struct super_block *sb)
 	atomic_set(&sbi->ll_agl_total, 0);
 	sbi->ll_flags |= LL_SBI_AGL_ENABLED;
 
+	/* root squash */
+	sbi->ll_squash.rsi_uid = 0;
+	sbi->ll_squash.rsi_gid = 0;
+	INIT_LIST_HEAD(&sbi->ll_squash.rsi_nosquash_nids);
+	init_rwsem(&sbi->ll_squash.rsi_sem);
+
 	sbi->ll_sb = sb;
 
 	return sbi;
@@ -129,6 +135,8 @@ static void ll_free_sbi(struct super_block *sb)
 	struct ll_sb_info *sbi = ll_s2sbi(sb);
 
 	if (sbi->ll_cache) {
+		if (!list_empty(&sbi->ll_squash.rsi_nosquash_nids))
+			cfs_free_nidlist(&sbi->ll_squash.rsi_nosquash_nids);
 		cl_cache_decref(sbi->ll_cache);
 		sbi->ll_cache = NULL;
 	}
@@ -2496,3 +2504,42 @@ void ll_dirty_page_discard_warn(struct page *page, int ioret)
 	if (buf)
 		free_page((unsigned long)buf);
 }
+
+/*
+ * Compute llite root squash state after a change of root squash
+ * configuration setting or add/remove of a lnet nid
+ */
+void ll_compute_rootsquash_state(struct ll_sb_info *sbi)
+{
+	struct root_squash_info *squash = &sbi->ll_squash;
+	lnet_process_id_t id;
+	bool matched;
+	int i;
+
+	/* Update norootsquash flag */
+	down_write(&squash->rsi_sem);
+	if (list_empty(&squash->rsi_nosquash_nids)) {
+		sbi->ll_flags &= ~LL_SBI_NOROOTSQUASH;
+	} else {
+		/*
+		 * Do not apply root squash as soon as one of our NIDs is
+		 * in the nosquash_nids list
+		 */
+		matched = false;
+		i = 0;
+
+		while (LNetGetId(i++, &id) != -ENOENT) {
+			if (LNET_NETTYP(LNET_NIDNET(id.nid)) == LOLND)
+				continue;
+			if (cfs_match_nid(id.nid, &squash->rsi_nosquash_nids)) {
+				matched = true;
+				break;
+			}
+		}
+		if (matched)
+			sbi->ll_flags |= LL_SBI_NOROOTSQUASH;
+		else
+			sbi->ll_flags &= ~LL_SBI_NOROOTSQUASH;
+	}
+	up_write(&squash->rsi_sem);
+}
diff --git a/drivers/staging/lustre/lustre/llite/lproc_llite.c b/drivers/staging/lustre/lustre/llite/lproc_llite.c
index e86bf3c..2f1f389 100644
--- a/drivers/staging/lustre/lustre/llite/lproc_llite.c
+++ b/drivers/staging/lustre/lustre/llite/lproc_llite.c
@@ -833,6 +833,71 @@ static ssize_t unstable_stats_show(struct kobject *kobj,
 }
 LUSTRE_RO_ATTR(unstable_stats);
 
+static ssize_t root_squash_show(struct kobject *kobj, struct attribute *attr,
+				char *buf)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kobj);
+	struct root_squash_info *squash = &sbi->ll_squash;
+
+	return sprintf(buf, "%u:%u\n", squash->rsi_uid, squash->rsi_gid);
+}
+
+static ssize_t root_squash_store(struct kobject *kobj, struct attribute *attr,
+				 const char *buffer, size_t count)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kobj);
+	struct root_squash_info *squash = &sbi->ll_squash;
+
+	return lprocfs_wr_root_squash(buffer, count, squash,
+				      ll_get_fsname(sbi->ll_sb, NULL, 0));
+}
+LUSTRE_RW_ATTR(root_squash);
+
+static int ll_nosquash_nids_seq_show(struct seq_file *m, void *v)
+{
+	struct super_block *sb = m->private;
+	struct ll_sb_info *sbi = ll_s2sbi(sb);
+	struct root_squash_info *squash = &sbi->ll_squash;
+	int len;
+
+	down_read(&squash->rsi_sem);
+	if (!list_empty(&squash->rsi_nosquash_nids)) {
+		len = cfs_print_nidlist(m->buf + m->count, m->size - m->count,
+					&squash->rsi_nosquash_nids);
+		m->count += len;
+		seq_puts(m, "\n");
+	} else {
+		seq_puts(m, "NONE\n");
+	}
+	up_read(&squash->rsi_sem);
+
+	return 0;
+}
+
+static ssize_t ll_nosquash_nids_seq_write(struct file *file,
+					  const char __user *buffer,
+					  size_t count, loff_t *off)
+{
+	struct seq_file *m = file->private_data;
+	struct super_block *sb = m->private;
+	struct ll_sb_info *sbi = ll_s2sbi(sb);
+	struct root_squash_info *squash = &sbi->ll_squash;
+	int rc;
+
+	rc = lprocfs_wr_nosquash_nids(buffer, count, squash,
+				      ll_get_fsname(sb, NULL, 0));
+	if (rc < 0)
+		return rc;
+
+	ll_compute_rootsquash_state(sbi);
+
+	return rc;
+}
+
+LPROC_SEQ_FOPS(ll_nosquash_nids);
+
 static struct lprocfs_vars lprocfs_llite_obd_vars[] = {
 	/* { "mntpt_path",   ll_rd_path,	     0, 0 }, */
 	{ "site",	  &ll_site_stats_fops,    NULL, 0 },
@@ -840,6 +905,8 @@ static struct lprocfs_vars lprocfs_llite_obd_vars[] = {
 	{ "max_cached_mb",    &ll_max_cached_mb_fops, NULL },
 	{ "statahead_stats",  &ll_statahead_stats_fops, NULL, 0 },
 	{ "sbi_flags",	      &ll_sbi_flags_fops, NULL, 0 },
+	{ .name =		"nosquash_nids",
+	  .fops =		&ll_nosquash_nids_fops		},
 	{ NULL }
 };
 
@@ -869,6 +936,7 @@ static struct attribute *llite_attrs[] = {
 	&lustre_attr_default_easize.attr,
 	&lustre_attr_xattr_cache.attr,
 	&lustre_attr_unstable_stats.attr,
+	&lustre_attr_root_squash.attr,
 	NULL,
 };
 
diff --git a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
index 279b625..c83d28e 100644
--- a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
+++ b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
@@ -1547,6 +1547,146 @@ void lprocfs_oh_clear(struct obd_histogram *oh)
 }
 EXPORT_SYMBOL(lprocfs_oh_clear);
 
+int lprocfs_wr_root_squash(const char __user *buffer, unsigned long count,
+			   struct root_squash_info *squash, char *name)
+{
+	char kernbuf[64], *tmp, *errmsg;
+	unsigned long uid, gid;
+	int rc;
+
+	if (count >= sizeof(kernbuf)) {
+		errmsg = "string too long";
+		rc = -EINVAL;
+		goto failed_noprint;
+	}
+	if (copy_from_user(kernbuf, buffer, count)) {
+		errmsg = "bad address";
+		rc = -EFAULT;
+		goto failed_noprint;
+	}
+	kernbuf[count] = '\0';
+
+	/* look for uid gid separator */
+	tmp = strchr(kernbuf, ':');
+	if (!tmp) {
+		errmsg = "needs uid:gid format";
+		rc = -EINVAL;
+		goto failed;
+	}
+	*tmp = '\0';
+	tmp++;
+
+	/* parse uid */
+	if (kstrtoul(kernbuf, 0, &uid) != 0) {
+		errmsg = "bad uid";
+		rc = -EINVAL;
+		goto failed;
+	}
+	/* parse gid */
+	if (kstrtoul(tmp, 0, &gid) != 0) {
+		errmsg = "bad gid";
+		rc = -EINVAL;
+		goto failed;
+	}
+
+	squash->rsi_uid = uid;
+	squash->rsi_gid = gid;
+
+	LCONSOLE_INFO("%s: root_squash is set to %u:%u\n",
+		      name, squash->rsi_uid, squash->rsi_gid);
+	return count;
+
+failed:
+	if (tmp) {
+		tmp--;
+		*tmp = ':';
+	}
+	CWARN("%s: failed to set root_squash to \"%s\", %s, rc = %d\n",
+	      name, kernbuf, errmsg, rc);
+	return rc;
+failed_noprint:
+	CWARN("%s: failed to set root_squash due to %s, rc = %d\n",
+	      name, errmsg, rc);
+	return rc;
+}
+EXPORT_SYMBOL(lprocfs_wr_root_squash);
+
+int lprocfs_wr_nosquash_nids(const char __user *buffer, unsigned long count,
+			     struct root_squash_info *squash, char *name)
+{
+	char *kernbuf = NULL, *errmsg;
+	struct list_head tmp;
+	int len = count;
+	int rc;
+
+	if (count > 4096) {
+		errmsg = "string too long";
+		rc = -EINVAL;
+		goto failed;
+	}
+
+	kernbuf = kzalloc(count + 1, GFP_NOFS);
+	if (!kernbuf) {
+		errmsg = "no memory";
+		rc = -ENOMEM;
+		goto failed;
+	}
+
+	if (copy_from_user(kernbuf, buffer, count)) {
+		errmsg = "bad address";
+		rc = -EFAULT;
+		goto failed;
+	}
+	kernbuf[count] = '\0';
+
+	if (count > 0 && kernbuf[count - 1] == '\n')
+		len = count - 1;
+
+	if ((len == 4 && !strncmp(kernbuf, "NONE", len)) ||
+	    (len == 5 && !strncmp(kernbuf, "clear", len))) {
+		/* empty string is special case */
+		down_write(&squash->rsi_sem);
+		if (!list_empty(&squash->rsi_nosquash_nids))
+			cfs_free_nidlist(&squash->rsi_nosquash_nids);
+		up_write(&squash->rsi_sem);
+		LCONSOLE_INFO("%s: nosquash_nids is cleared\n", name);
+		kfree(kernbuf);
+		return count;
+	}
+
+	INIT_LIST_HEAD(&tmp);
+	if (cfs_parse_nidlist(kernbuf, count, &tmp) <= 0) {
+		errmsg = "can't parse";
+		rc = -EINVAL;
+		goto failed;
+	}
+	LCONSOLE_INFO("%s: nosquash_nids set to %s\n",
+		      name, kernbuf);
+	kfree(kernbuf);
+	kernbuf = NULL;
+
+	down_write(&squash->rsi_sem);
+	if (!list_empty(&squash->rsi_nosquash_nids))
+		cfs_free_nidlist(&squash->rsi_nosquash_nids);
+	list_splice(&tmp, &squash->rsi_nosquash_nids);
+	up_write(&squash->rsi_sem);
+
+	return count;
+
+failed:
+	if (kernbuf) {
+		CWARN("%s: failed to set nosquash_nids to \"%s\", %s rc = %d\n",
+		      name, kernbuf, errmsg, rc);
+		kfree(kernbuf);
+		kernbuf = NULL;
+	} else {
+		CWARN("%s: failed to set nosquash_nids due to %s rc = %d\n",
+		      name, errmsg, rc);
+	}
+	return rc;
+}
+EXPORT_SYMBOL(lprocfs_wr_nosquash_nids);
+
 static ssize_t lustre_attr_show(struct kobject *kobj,
 				struct attribute *attr, char *buf)
 {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 44/80] staging: lustre: Remove static declaration in anonymous union
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Christopher J. Morrone, James Simmons

From: Christopher J. Morrone <morrone2@llnl.gov>

It is not permitted in C++ to have a static declaration inside
of an anonymous union. The g++ compiler will complain with an
error like this:

 error: struct ost_id::<anonymous union>::ostid invalid; an
 anonymous union can only have non-static data members [-fpermissive]

This patch changes the code to use an unnamed struct in place of
"struct ostid" inside of the anonymous union. That name declaration
was completely unnecessary anyway, since it was not used anywhere else.
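
For illustration, the resulting layout looks roughly like the sketch
below. The type name ost_id_sketch and the raw[] member are
hypothetical and exist only to make the example self-contained; the
point is that the inner struct carries no tag, so nothing is declared
inside the anonymous union.

```c
#include <stdint.h>

struct ost_id_sketch {
	union {
		struct {		/* no tag: the union declares no new type */
			uint64_t oi_id;
			uint64_t oi_seq;
		} oi;
		uint64_t raw[2];	/* hypothetical second view of the same bytes */
	};
};
```

Members are still reached as id.oi.oi_id and id.oi.oi_seq, so existing
users of the oi member should not need to change.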

Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4987
Reviewed-on: http://review.whamcloud.com/10176
Reviewed-by: Robert Read <robert.read@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_user.h     |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index 4b2553c..59d45de 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -167,7 +167,7 @@ struct lustre_mdt_attrs {
  */
 struct ost_id {
 	union {
-		struct ostid {
+		struct {
 			__u64	oi_id;
 			__u64	oi_seq;
 		} oi;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 45/80] staging: lustre: llite: Fix the deadlock in balance_dirty_pages()
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, James Simmons

From: Jinshan Xiong <jinshan.xiong@intel.com>

If the page is already dirty in ll_write_end() and the kernel calls
balance_dirty_pages() to write back dirty pages in the same thread,
a deadlock occurs when the page is already held by clio.

This also fixes the issue reported in LU-4873.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4977
Reviewed-on: http://review.whamcloud.com/10149
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/llite/rw26.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index c14a1b6..8c8c100 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -506,9 +506,8 @@ static int ll_write_begin(struct file *file, struct address_space *mapping,
 	env = lcc->lcc_env;
 	io  = lcc->lcc_io;
 
-	if (likely(to == PAGE_SIZE)) /* LU-4873 */
-		/* To avoid deadlock, try to lock page first. */
-		vmpage = grab_cache_page_nowait(mapping, index);
+	/* To avoid deadlock, try to lock page first. */
+	vmpage = grab_cache_page_nowait(mapping, index);
 	if (unlikely(!vmpage || PageDirty(vmpage) || PageWriteback(vmpage))) {
 		struct vvp_io *vio = vvp_env_io(env);
 		struct cl_page_list *plist = &vio->u.write.vui_queue;
@@ -617,6 +616,13 @@ static int ll_write_end(struct file *file, struct address_space *mapping,
 			LASSERT(from == 0);
 		vio->u.write.vui_to = from + copied;
 
+		/*
+		 * To address the deadlock in balance_dirty_pages() where
+		 * this dirty page may be written back in the same thread.
+		 */
+		if (PageDirty(vmpage))
+			unplug = true;
+
 		/* We may have one full RPC, commit it soon */
 		if (plist->pl_nr >= PTLRPC_MAX_BRW_PAGES)
 			unplug = true;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 46/80] staging: lustre: llite: Change readdir BRW metrics
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:18   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Andreas Dilger, James Simmons

From: Jinshan Xiong <jinshan.xiong@intel.com>

To simplify the code, change the metrics from bytes to pages.
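
A small sketch of the conversion, which now happens once at connect
time instead of on every readdir. SKETCH_PAGE_SHIFT is an assumption
(4 KiB pages) and md_brw_pages() is a hypothetical helper; the kernel
code uses PAGE_SHIFT and assigns to sbi->ll_md_brw_pages directly.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Pages per readdir RPC: the negotiated bulk size in bytes shifted
 * down to pages, or a single page when the server did not negotiate
 * a bulk size.
 */
static unsigned int md_brw_pages(unsigned int ocd_brw_size, bool has_brw_size)
{
	return has_brw_size ? ocd_brw_size >> SKETCH_PAGE_SHIFT : 1;
}
```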

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5034
Reviewed-on: http://review.whamcloud.com/10275
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/llite/dir.c          |    2 +-
 .../staging/lustre/lustre/llite/llite_internal.h   |    2 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |    4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 47fbcd2..924b5df 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -148,7 +148,7 @@ static int ll_dir_filler(void *_hash, struct page *page0)
 	struct page **page_pool;
 	struct page *page;
 	struct lu_dirpage *dp;
-	int max_pages = ll_i2sbi(inode)->ll_md_brw_size >> PAGE_SHIFT;
+	int max_pages = ll_i2sbi(inode)->ll_md_brw_pages;
 	int nrdpgs = 0; /* number of pages read actually */
 	int npages;
 	int i;
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 500b5ec..3d7fa9a 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -474,7 +474,7 @@ struct ll_sb_info {
 	unsigned int	      ll_namelen;
 	struct file_operations   *ll_fop;
 
-	unsigned int	      ll_md_brw_size; /* used by readdir */
+	unsigned int		  ll_md_brw_pages; /* readdir pages per RPC */
 
 	struct lu_site	   *ll_site;
 	struct cl_device	 *ll_cl;
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 0a28925..ac59cd6 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -319,9 +319,9 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt,
 		sbi->ll_flags |= LL_SBI_64BIT_HASH;
 
 	if (data->ocd_connect_flags & OBD_CONNECT_BRW_SIZE)
-		sbi->ll_md_brw_size = data->ocd_brw_size;
+		sbi->ll_md_brw_pages = data->ocd_brw_size >> PAGE_SHIFT;
 	else
-		sbi->ll_md_brw_size = PAGE_SIZE;
+		sbi->ll_md_brw_pages = 1;
 
 	if (data->ocd_connect_flags & OBD_CONNECT_LAYOUTLOCK)
 		sbi->ll_flags |= LL_SBI_LAYOUT_LOCK;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 47/80] staging: lustre: uapi: reduce scope of lustre_idl.h
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

Move the definition of OBD_OCD_VERSION() and similar macros from
lustre_idl.h to lustre_ver.h. These macros are primarily used in
comparisons to LUSTRE_VERSION_CODE which is defined in lustre_ver.h
and so should be defined there as well. Move a few definitions
(related to FIDs, quota and striping) from lustre_idl.h to
lustre_user.h.
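
As a quick illustration of why these macros sit naturally next to
LUSTRE_VERSION_CODE, here is a minimal user-space sketch of the packing
they implement (one byte per component). The macro bodies are copied from
the lustre_ver.h hunk of this patch; peer_at_least() is a hypothetical
helper added only for this example, and the 2.4.60.0 sample value is an
assumption taken from the LUSTRE_VERSION_STRING shown in the diff context.

```c
#include <stdint.h>

/* Same packing as in the patch: major, minor, patch, fix, one byte each. */
#define OBD_OCD_VERSION(major, minor, patch, fix)			\
	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))

#define OBD_OCD_VERSION_MAJOR(version)	((int)((version) >> 24) & 255)
#define OBD_OCD_VERSION_MINOR(version)	((int)((version) >> 16) & 255)
#define OBD_OCD_VERSION_PATCH(version)	((int)((version) >>  8) & 255)
#define OBD_OCD_VERSION_FIX(version)	((int)((version) >>  0) & 255)

/*
 * Hypothetical helper, not part of the patch: because the most significant
 * component occupies the highest byte, a plain integer comparison orders
 * packed versions the same way as comparing (major, minor, patch, fix).
 */
static inline int peer_at_least(uint32_t peer_version,
				int major, int minor, int patch, int fix)
{
	return peer_version >= (uint32_t)OBD_OCD_VERSION(major, minor,
							 patch, fix);
}
```

With these definitions, OBD_OCD_VERSION(2, 4, 60, 0) packs to 0x02043C00,
and the accessor macros recover 2, 4, 60 and 0 from that value.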

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5065
Reviewed-on: http://review.whamcloud.com/10336
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |   38 +------------------
 .../lustre/lustre/include/lustre/lustre_user.h     |   32 +++++++++++++++--
 drivers/staging/lustre/lustre/include/lustre_ver.h |   13 +++++--
 3 files changed, 41 insertions(+), 42 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 8736826..69bed64 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -93,6 +93,7 @@
 /* Defn's shared with user-space. */
 #include "lustre_user.h"
 #include "lustre_errno.h"
+#include "../lustre_ver.h"
 
 /*
  *  GENERAL STUFF
@@ -846,11 +847,6 @@ static inline bool fid_is_sane(const struct lu_fid *fid)
 		fid_seq_is_rsvd(fid_seq(fid)));
 }
 
-static inline bool fid_is_zero(const struct lu_fid *fid)
-{
-	return fid_seq(fid) == 0 && fid_oid(fid) == 0;
-}
-
 void lustre_swab_lu_fid(struct lu_fid *fid);
 void lustre_swab_lu_seq_range(struct lu_seq_range *range);
 
@@ -1318,14 +1314,6 @@ void lustre_swab_ptlrpc_body(struct ptlrpc_body *pb);
 #define CLIENT_CONNECT_MDT_REQD (OBD_CONNECT_IBITS | OBD_CONNECT_FID | \
 				 OBD_CONNECT_FULL20)
 
-#define OBD_OCD_VERSION(major, minor, patch, fix) (((major)<<24) + \
-						  ((minor)<<16) + \
-						  ((patch)<<8) + (fix))
-#define OBD_OCD_VERSION_MAJOR(version) ((int)((version)>>24)&255)
-#define OBD_OCD_VERSION_MINOR(version) ((int)((version)>>16)&255)
-#define OBD_OCD_VERSION_PATCH(version) ((int)((version)>>8)&255)
-#define OBD_OCD_VERSION_FIX(version)   ((int)(version)&255)
-
 /* This structure is used for both request and reply.
  *
  * If we eventually have separate connect data for different types, which we
@@ -1509,14 +1497,6 @@ enum obdo_flags {
 #define LOV_MAGIC_V1_DEF  0x0CD10BD0
 #define LOV_MAGIC_V3_DEF  0x0CD30BD0
 
-#define LOV_PATTERN_RAID0	0x001   /* stripes are used round-robin */
-#define LOV_PATTERN_RAID1	0x002   /* stripes are mirrors of each other */
-#define LOV_PATTERN_FIRST	0x100   /* first stripe is not in round-robin */
-#define LOV_PATTERN_CMOBD	0x200
-
-#define LOV_PATTERN_F_MASK	0xffff0000
-#define LOV_PATTERN_F_RELEASED	0x80000000 /* HSM released file */
-
 #define lov_pattern(pattern)		(pattern & ~LOV_PATTERN_F_MASK)
 #define lov_pattern_flags(pattern)	(pattern & LOV_PATTERN_F_MASK)
 
@@ -1796,7 +1776,7 @@ void lustre_swab_obd_statfs(struct obd_statfs *os);
 				      * it to sync quickly
 				      */
 
-#define OBD_OBJECT_EOF 0xffffffffffffffffULL
+#define OBD_OBJECT_EOF	LUSTRE_EOF
 
 #define OST_MIN_PRECREATE 32
 #define OST_MAX_PRECREATE 20000
@@ -1892,12 +1872,6 @@ struct obd_quotactl {
 
 void lustre_swab_obd_quotactl(struct obd_quotactl *q);
 
-#define Q_QUOTACHECK	0x800100 /* deprecated as of 2.4 */
-#define Q_INITQUOTA	0x800101 /* deprecated as of 2.4  */
-#define Q_GETOINFO	0x800102 /* get obd quota info */
-#define Q_GETOQUOTA	0x800103 /* get obd quotas */
-#define Q_FINVALIDATE	0x800104 /* deprecated as of 2.4 */
-
 #define Q_COPY(out, in, member) (out)->member = (in)->member
 
 #define QCTL_COPY(out, in)		\
@@ -2533,19 +2507,11 @@ struct lmv_mds_md_v1 {
  * for example the object is being migrated. And the hash function
  * might be interpreted differently with different flags.
  */
-enum lmv_hash_type {
-	LMV_HASH_TYPE_ALL_CHARS = 1,
-	LMV_HASH_TYPE_FNV_1A_64 = 2,
-};
-
 #define LMV_HASH_TYPE_MASK		0x0000ffff
 
 #define LMV_HASH_FLAG_MIGRATION		0x80000000
 #define LMV_HASH_FLAG_DEAD		0x40000000
 
-#define LMV_HASH_NAME_ALL_CHARS		"all_char"
-#define LMV_HASH_NAME_FNV_1A_64		"fnv_1a_64"
-
 /**
  * The FNV-1a hash algorithm is as follows:
  *     hash = FNV_offset_basis
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index 59d45de..8398c4f 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -45,6 +45,8 @@
 #include "ll_fiemap.h"
 #include "../linux/lustre_user.h"
 
+#define LUSTRE_EOF 0xffffffffffffffffULL
+
 /* for statfs() */
 #define LL_SUPER_MAGIC 0x0BD00BD0
 
@@ -117,6 +119,11 @@ struct lu_fid {
 	__u32 f_ver;
 };
 
+static inline bool fid_is_zero(const struct lu_fid *fid)
+{
+	return !fid->f_seq && !fid->f_oid;
+}
+
 struct filter_fid {
 	struct lu_fid	ff_parent;  /* ff_parent.f_ver == file stripe number */
 };
@@ -271,9 +278,14 @@ struct ost_id {
 
 #define LMV_USER_MAGIC    0x0CD30CD0    /*default lmv magic*/
 
-#define LOV_PATTERN_RAID0 0x001
-#define LOV_PATTERN_RAID1 0x002
-#define LOV_PATTERN_FIRST 0x100
+#define LOV_PATTERN_RAID0	0x001
+#define LOV_PATTERN_RAID1	0x002
+#define LOV_PATTERN_FIRST	0x100
+#define LOV_PATTERN_CMOBD	0x200
+
+#define LOV_PATTERN_F_MASK	0xffff0000
+#define LOV_PATTERN_F_RELEASED	0x80000000 /* HSM released file */
+
 
 #define LOV_MAXPOOLNAME 16
 #define LOV_POOLNAMEF "%.16s"
@@ -370,6 +382,14 @@ struct lmv_user_mds_data {
 	__u32		lum_mds;
 };
 
+enum lmv_hash_type {
+	LMV_HASH_TYPE_ALL_CHARS = 1,
+	LMV_HASH_TYPE_FNV_1A_64 = 2,
+};
+
+#define LMV_HASH_NAME_ALL_CHARS		"all_char"
+#define LMV_HASH_NAME_FNV_1A_64		"fnv_1a_64"
+
 /*
  * Got this according to how get LOV_MAX_STRIPE_COUNT, see above,
  * (max buffer size - lmv+rpc header) / sizeof(struct lmv_user_mds_data)
@@ -488,6 +508,12 @@ static inline void obd_uuid2fsname(char *buf, char *uuid, int buflen)
 
 /********* Quotas **********/
 
+#define Q_QUOTACHECK   0x800100 /* deprecated as of 2.4 */
+#define Q_INITQUOTA    0x800101 /* deprecated as of 2.4  */
+#define Q_GETOINFO     0x800102 /* get obd quota info */
+#define Q_GETOQUOTA    0x800103 /* get obd quotas */
+#define Q_FINVALIDATE  0x800104 /* deprecated as of 2.4 */
+
 /* these must be explicitly translated into linux Q_* in ll_dir_ioctl */
 #define LUSTRE_Q_QUOTAON    0x800002     /* turn quotas on */
 #define LUSTRE_Q_QUOTAOFF   0x800003     /* turn quotas off */
diff --git a/drivers/staging/lustre/lustre/include/lustre_ver.h b/drivers/staging/lustre/lustre/include/lustre_ver.h
index 64559a1..2bb59b2 100644
--- a/drivers/staging/lustre/lustre/include/lustre_ver.h
+++ b/drivers/staging/lustre/lustre/include/lustre_ver.h
@@ -7,9 +7,16 @@
 #define LUSTRE_FIX 0
 #define LUSTRE_VERSION_STRING "2.4.60"
 
-#define LUSTRE_VERSION_CODE OBD_OCD_VERSION(LUSTRE_MAJOR, \
-					    LUSTRE_MINOR, LUSTRE_PATCH, \
-					    LUSTRE_FIX)
+#define OBD_OCD_VERSION(major, minor, patch, fix)			\
+	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
+
+#define OBD_OCD_VERSION_MAJOR(version)	((int)((version) >> 24) & 255)
+#define OBD_OCD_VERSION_MINOR(version)	((int)((version) >> 16) & 255)
+#define OBD_OCD_VERSION_PATCH(version)	((int)((version) >>  8) & 255)
+#define OBD_OCD_VERSION_FIX(version)	((int)((version) >>  0) & 255)
+
+#define LUSTRE_VERSION_CODE						\
+	OBD_OCD_VERSION(LUSTRE_MAJOR, LUSTRE_MINOR, LUSTRE_PATCH, LUSTRE_FIX)
 
 /*
  * If lustre version of client and servers it connects to differs by more
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 48/80] staging: lustre: llite: a few fixes about readdir of striped dir.
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	Li Xi, James Simmons

From: wang di <di.wang@intel.com>

Normally we know the value of op_mea1 when ll_readdir is called.
In the case of '.' or '..', op_mea1 is unknown, so for that case
fetch the real parent's FID.
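
The fallback order used in the dir.c hunk of this patch (take the parent
FID from the cached parent if it is covered by an UPDATE lock, and only
look up ".." when the FID is still zero) can be sketched in user space
roughly as follows. This is a simplified model, not the kernel code:
struct parent_cache, resolve_parent_fid() and fake_dotdot_lookup() are
hypothetical stand-ins for the dentry/lock check and for
ll_dir_get_parent_fid(); only struct lu_fid and fid_is_zero() follow the
definitions in lustre_user.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout follows struct lu_fid in lustre_user.h. */
struct lu_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

static inline bool fid_is_zero(const struct lu_fid *fid)
{
	return !fid->f_seq && !fid->f_oid;
}

/* Hypothetical: a cached parent whose FID is only trusted under a lock. */
struct parent_cache {
	bool have_update_lock;
	struct lu_fid fid;
};

/* Hypothetical: stands in for the ".." lookup on the master object. */
typedef int (*dotdot_lookup_fn)(struct lu_fid *out);

/* Sample lookup used for illustration; returns a fixed, made-up FID. */
int fake_dotdot_lookup(struct lu_fid *out)
{
	out->f_seq = 0x200000401ULL;
	out->f_oid = 2;
	out->f_ver = 0;
	return 0;
}

/* Try the cache first; fall back to the lookup only if no FID was found. */
int resolve_parent_fid(const struct parent_cache *cache,
		       dotdot_lookup_fn lookup, struct lu_fid *out)
{
	*out = (struct lu_fid){ 0 };

	if (cache && cache->have_update_lock)
		*out = cache->fid;

	if (fid_is_zero(out))
		return lookup(out);

	return 0;
}
```

The zero FID doubles as the "not found yet" marker, which is why the
patch series moves fid_is_zero() to a header that llite can use.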

Signed-off-by: wang di <di.wang@intel.com>
Signed-off-by: Li Xi <lixi@ddn.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4603
Reviewed-on: http://review.whamcloud.com/9191
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Li Xi <pkuelelixi@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/llite/dir.c          |   27 +++++++++++++++++
 .../staging/lustre/lustre/llite/llite_internal.h   |    1 +
 drivers/staging/lustre/lustre/llite/llite_nfs.c    |   31 ++++++++++++++------
 3 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 924b5df..3fed80d 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -622,6 +622,33 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx)
 		goto out;
 	}
 
+	if (unlikely(op_data->op_mea1)) {
+		/*
+		 * This is only needed for striped dir to fill ..,
+		 * see lmv_read_page
+		 */
+		if (file_dentry(filp)->d_parent &&
+		    file_dentry(filp)->d_parent->d_inode) {
+			__u64 ibits = MDS_INODELOCK_UPDATE;
+			struct inode *parent;
+
+			parent = file_dentry(filp)->d_parent->d_inode;
+			if (ll_have_md_lock(parent, &ibits, LCK_MINMODE))
+				op_data->op_fid3 = *ll_inode2fid(parent);
+		}
+
+		/*
+		 * If it can not find in cache, do lookup .. on the master
+		 * object
+		 */
+		if (fid_is_zero(&op_data->op_fid3)) {
+			rc = ll_dir_get_parent_fid(inode, &op_data->op_fid3);
+			if (rc) {
+				ll_finish_md_op_data(op_data);
+				return rc;
+			}
+		}
+	}
 	ctx->pos = pos;
 	rc = ll_dir_read(inode, &pos, op_data, ctx);
 	pos = ctx->pos;
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 3d7fa9a..43269aa 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -812,6 +812,7 @@ __u32 get_uuid2int(const char *name, int len);
 void get_uuid2fsid(const char *name, int len, __kernel_fsid_t *fsid);
 struct inode *search_inode_for_lustre(struct super_block *sb,
 				      const struct lu_fid *fid);
+int ll_dir_get_parent_fid(struct inode *dir, struct lu_fid *parent_fid);
 
 /* llite/symlink.c */
 extern const struct inode_operations ll_fast_symlink_inode_operations;
diff --git a/drivers/staging/lustre/lustre/llite/llite_nfs.c b/drivers/staging/lustre/lustre/llite/llite_nfs.c
index ab9d5cc..06a8199 100644
--- a/drivers/staging/lustre/lustre/llite/llite_nfs.c
+++ b/drivers/staging/lustre/lustre/llite/llite_nfs.c
@@ -302,14 +302,12 @@ static struct dentry *ll_fh_to_parent(struct super_block *sb, struct fid *fid,
 	return ll_iget_for_nfs(sb, &nfs_fid->lnf_parent, NULL);
 }
 
-static struct dentry *ll_get_parent(struct dentry *dchild)
+int ll_dir_get_parent_fid(struct inode *dir, struct lu_fid *parent_fid)
 {
 	struct ptlrpc_request *req = NULL;
-	struct inode	  *dir = d_inode(dchild);
 	struct ll_sb_info     *sbi;
-	struct dentry	 *result = NULL;
 	struct mdt_body       *body;
-	static char	   dotdot[] = "..";
+	static const char dotdot[] = "..";
 	struct md_op_data     *op_data;
 	int		   rc;
 	int		      lmmsize;
@@ -324,13 +322,13 @@ static struct dentry *ll_get_parent(struct dentry *dchild)
 
 	rc = ll_get_default_mdsize(sbi, &lmmsize);
 	if (rc != 0)
-		return ERR_PTR(rc);
+		return rc;
 
 	op_data = ll_prep_md_op_data(NULL, dir, NULL, dotdot,
 				     strlen(dotdot), lmmsize,
 				     LUSTRE_OPC_ANY, NULL);
 	if (IS_ERR(op_data))
-		return (void *)op_data;
+		return PTR_ERR(op_data);
 
 	rc = md_getattr_name(sbi->ll_md_exp, op_data, &req);
 	ll_finish_md_op_data(op_data);
@@ -338,7 +336,7 @@ static struct dentry *ll_get_parent(struct dentry *dchild)
 		CERROR("%s: failure inode "DFID" get parent: rc = %d\n",
 		       ll_get_fsname(dir->i_sb, NULL, 0),
 		       PFID(ll_inode2fid(dir)), rc);
-		return ERR_PTR(rc);
+		return rc;
 	}
 	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
 	/*
@@ -348,11 +346,26 @@ static struct dentry *ll_get_parent(struct dentry *dchild)
 	if (body->valid & OBD_MD_FLID) {
 		CDEBUG(D_INFO, "parent for " DFID " is " DFID "\n",
 		       PFID(ll_inode2fid(dir)), PFID(&body->fid1));
+		*parent_fid = body->fid1;
 	}
-	result = ll_iget_for_nfs(dir->i_sb, &body->fid1, NULL);
 
 	ptlrpc_req_finished(req);
-	return result;
+	return 0;
+}
+
+static struct dentry *ll_get_parent(struct dentry *dchild)
+{
+	struct lu_fid parent_fid = { 0 };
+	struct dentry *dentry;
+	int rc;
+
+	rc = ll_dir_get_parent_fid(dchild->d_inode, &parent_fid);
+	if (rc)
+		return ERR_PTR(rc);
+
+	dentry = ll_iget_for_nfs(dchild->d_inode->i_sb, &parent_fid, NULL);
+
+	return dentry;
 }
 
 const struct export_operations lustre_export_operations = {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 49/80] staging: lustre: lmv: validate lock with correct stripe FID
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

In ll_lookup_it_finish, we need to use the real parent (stripe)
FID to validate the parent UPDATE lock.
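
To make "real parent (stripe) FID" concrete: in a striped directory each
name belongs to exactly one stripe, so the lock to validate is the one on
that stripe, not on the master object. The sketch below is a user-space
model under stated assumptions, not the kernel implementation: it assumes
the stripe is chosen as a 64-bit FNV-1a hash of the name modulo the stripe
count, and struct stripe_table and stripe_fid_for_name() are hypothetical
simplifications of struct lmv_stripe_md and lmv_get_fid_from_lsm(). The
real lsm_name_to_stripe_info() may select the stripe differently depending
on the directory's hash type and flags.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout follows struct lu_fid in lustre_user.h. */
struct lu_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Hypothetical, reduced view of a striped directory: one FID per stripe. */
struct stripe_table {
	uint32_t count;
	const struct lu_fid *fids;
};

/* Standard 64-bit FNV-1a: xor each byte in, then multiply by the prime. */
static uint64_t fnv_1a_64(const char *name, size_t len)
{
	uint64_t hash = 0xcbf29ce484222325ULL;	/* FNV offset basis */
	size_t i;

	for (i = 0; i < len; i++) {
		hash ^= (unsigned char)name[i];
		hash *= 0x100000001b3ULL;	/* FNV prime */
	}
	return hash;
}

/* Assumed mapping for this sketch: stripe index = hash(name) % count. */
int stripe_fid_for_name(const struct stripe_table *tbl,
			const char *name, size_t namelen,
			struct lu_fid *fid)
{
	if (!tbl || tbl->count == 0)
		return -EINVAL;

	*fid = tbl->fids[fnv_1a_64(name, namelen) % tbl->count];
	return 0;
}
```

The caller then revalidates the lock against the returned FID, which is
what the namei.c hunk does with the result of md_get_fid_from_lsm().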

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4925
Reviewed-on: http://review.whamcloud.com/10026
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/obd.h       |    5 +++++
 drivers/staging/lustre/lustre/include/obd_class.h |   13 +++++++++++++
 drivers/staging/lustre/lustre/llite/namei.c       |   15 +++++++++++++--
 drivers/staging/lustre/lustre/lmv/lmv_obd.c       |   19 ++++++++++++++++++-
 4 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index 52020a9..b7bdd07 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -1103,6 +1103,11 @@ struct md_ops {
 			     ldlm_policy_data_t *, enum ldlm_mode,
 			     enum ldlm_cancel_flags flags, void *opaque);
 
+	int (*get_fid_from_lsm)(struct obd_export *,
+				const struct lmv_stripe_md *,
+				const char *name, int namelen,
+				struct lu_fid *fid);
+
 	int (*intent_getattr_async)(struct obd_export *,
 				    struct md_enqueue_info *,
 				    struct ldlm_enqueue_info *);
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index e86961c..69b628b 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -1699,6 +1699,19 @@ static inline int md_revalidate_lock(struct obd_export *exp,
 	return rc;
 }
 
+static inline int md_get_fid_from_lsm(struct obd_export *exp,
+				      const struct lmv_stripe_md *lsm,
+				      const char *name, int namelen,
+				      struct lu_fid *fid)
+{
+	int rc;
+
+	EXP_CHECK_MD_OP(exp, get_fid_from_lsm);
+	EXP_MD_COUNTER_INCREMENT(exp, get_fid_from_lsm);
+	rc = MDP(exp->exp_obd, get_fid_from_lsm)(exp, lsm, name, namelen, fid);
+	return rc;
+}
+
 /* OBD Metadata Support */
 
 int obd_init_caches(void);
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index 6e11b99..581b083 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -487,9 +487,20 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
 		struct lookup_intent parent_it = {
 					.it_op = IT_GETATTR,
 					.it_lock_handle = 0 };
+		struct lu_fid fid = ll_i2info(parent)->lli_fid;
+
+		/* If it is striped directory, get the real stripe parent */
+		if (unlikely(ll_i2info(parent)->lli_lsm_md)) {
+			rc = md_get_fid_from_lsm(ll_i2mdexp(parent),
+						 ll_i2info(parent)->lli_lsm_md,
+						 (*de)->d_name.name,
+						 (*de)->d_name.len, &fid);
+			if (rc)
+				return rc;
+		}
 
-		if (md_revalidate_lock(ll_i2mdexp(parent), &parent_it,
-				       &ll_i2info(parent)->lli_fid, NULL)) {
+		if (md_revalidate_lock(ll_i2mdexp(parent), &parent_it, &fid,
+				       NULL)) {
 			d_lustre_revalidate(*de);
 			ll_intent_release(&parent_it);
 		}
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 03594f0..9821f69 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -2991,6 +2991,22 @@ static int lmv_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
 	return rc;
 }
 
+int lmv_get_fid_from_lsm(struct obd_export *exp,
+			 const struct lmv_stripe_md *lsm,
+			 const char *name, int namelen, struct lu_fid *fid)
+{
+	const struct lmv_oinfo *oinfo;
+
+	LASSERT(lsm);
+	oinfo = lsm_name_to_stripe_info(lsm, name, namelen);
+	if (IS_ERR(oinfo))
+		return PTR_ERR(oinfo);
+
+	*fid = oinfo->lmo_fid;
+
+	return 0;
+}
+
 /**
  * For lmv, only need to send request to master MDT, and the master MDT will
  * process with other slave MDTs. The only exception is Q_GETOQUOTA for which
@@ -3155,7 +3171,8 @@ static struct md_ops lmv_md_ops = {
 	.set_open_replay_data	= lmv_set_open_replay_data,
 	.clear_open_replay_data	= lmv_clear_open_replay_data,
 	.intent_getattr_async	= lmv_intent_getattr_async,
-	.revalidate_lock	= lmv_revalidate_lock
+	.revalidate_lock	= lmv_revalidate_lock,
+	.get_fid_from_lsm	= lmv_get_fid_from_lsm,
 };
 
 static int __init lmv_init(void)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [lustre-devel] [PATCH 49/80] staging: lustre: lmv: validate lock with correct stripe FID
@ 2016-08-16 20:19   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

In ll_lookup_it_finish(), we need to use the real parent (stripe)
FID to validate the parent UPDATE lock.
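
As a rough illustration of the idea (not the Lustre code itself), the
choice of FID can be sketched in plain C. The types struct fid and
struct stripe_md, the helper names, and the toy name hash are all
assumptions made for this sketch; the real lookup goes through
lsm_name_to_stripe_info() and its own hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct lu_fid / lmv_stripe_md. */
struct fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

struct stripe_md {
	size_t      count;	/* number of stripes, must be > 0 */
	struct fid *stripes;	/* one FID per stripe */
};

/* Placeholder name hash (djb2 style). This is NOT the hash Lustre uses. */
static size_t name_to_stripe(const struct stripe_md *md,
			     const char *name, size_t len)
{
	uint32_t h = 5381;
	size_t i;

	for (i = 0; i < len; i++)
		h = h * 33 + (unsigned char)name[i];
	return h % md->count;
}

/*
 * Return the FID whose UPDATE lock covers the entry "name": the master
 * FID for a plain directory, the owning stripe's FID for a striped one.
 */
static struct fid lock_fid_for_name(struct fid master,
				    const struct stripe_md *md,
				    const char *name, size_t len)
{
	if (!md)
		return master;

	assert(md->count > 0);
	return md->stripes[name_to_stripe(md, name, len)];
}
```

With a NULL stripe_md the master FID comes back unchanged, which
corresponds to the non-striped case where the old code was already
correct.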

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4925
Reviewed-on: http://review.whamcloud.com/10026
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/obd.h       |    5 +++++
 drivers/staging/lustre/lustre/include/obd_class.h |   13 +++++++++++++
 drivers/staging/lustre/lustre/llite/namei.c       |   15 +++++++++++++--
 drivers/staging/lustre/lustre/lmv/lmv_obd.c       |   19 ++++++++++++++++++-
 4 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index 52020a9..b7bdd07 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -1103,6 +1103,11 @@ struct md_ops {
 			     ldlm_policy_data_t *, enum ldlm_mode,
 			     enum ldlm_cancel_flags flags, void *opaque);
 
+	int (*get_fid_from_lsm)(struct obd_export *,
+				const struct lmv_stripe_md *,
+				const char *name, int namelen,
+				struct lu_fid *fid);
+
 	int (*intent_getattr_async)(struct obd_export *,
 				    struct md_enqueue_info *,
 				    struct ldlm_enqueue_info *);
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index e86961c..69b628b 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -1699,6 +1699,19 @@ static inline int md_revalidate_lock(struct obd_export *exp,
 	return rc;
 }
 
+static inline int md_get_fid_from_lsm(struct obd_export *exp,
+				      const struct lmv_stripe_md *lsm,
+				      const char *name, int namelen,
+				      struct lu_fid *fid)
+{
+	int rc;
+
+	EXP_CHECK_MD_OP(exp, get_fid_from_lsm);
+	EXP_MD_COUNTER_INCREMENT(exp, get_fid_from_lsm);
+	rc = MDP(exp->exp_obd, get_fid_from_lsm)(exp, lsm, name, namelen, fid);
+	return rc;
+}
+
 /* OBD Metadata Support */
 
 int obd_init_caches(void);
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index 6e11b99..581b083 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -487,9 +487,20 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
 		struct lookup_intent parent_it = {
 					.it_op = IT_GETATTR,
 					.it_lock_handle = 0 };
+		struct lu_fid fid = ll_i2info(parent)->lli_fid;
+
+		/* If it is striped directory, get the real stripe parent */
+		if (unlikely(ll_i2info(parent)->lli_lsm_md)) {
+			rc = md_get_fid_from_lsm(ll_i2mdexp(parent),
+						 ll_i2info(parent)->lli_lsm_md,
+						 (*de)->d_name.name,
+						 (*de)->d_name.len, &fid);
+			if (rc)
+				return rc;
+		}
 
-		if (md_revalidate_lock(ll_i2mdexp(parent), &parent_it,
-				       &ll_i2info(parent)->lli_fid, NULL)) {
+		if (md_revalidate_lock(ll_i2mdexp(parent), &parent_it, &fid,
+				       NULL)) {
 			d_lustre_revalidate(*de);
 			ll_intent_release(&parent_it);
 		}
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 03594f0..9821f69 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -2991,6 +2991,22 @@ static int lmv_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
 	return rc;
 }
 
+int lmv_get_fid_from_lsm(struct obd_export *exp,
+			 const struct lmv_stripe_md *lsm,
+			 const char *name, int namelen, struct lu_fid *fid)
+{
+	const struct lmv_oinfo *oinfo;
+
+	LASSERT(lsm);
+	oinfo = lsm_name_to_stripe_info(lsm, name, namelen);
+	if (IS_ERR(oinfo))
+		return PTR_ERR(oinfo);
+
+	*fid = oinfo->lmo_fid;
+
+	return 0;
+}
+
 /**
  * For lmv, only need to send request to master MDT, and the master MDT will
  * process with other slave MDTs. The only exception is Q_GETOQUOTA for which
@@ -3155,7 +3171,8 @@ static struct md_ops lmv_md_ops = {
 	.set_open_replay_data	= lmv_set_open_replay_data,
 	.clear_open_replay_data	= lmv_clear_open_replay_data,
 	.intent_getattr_async	= lmv_intent_getattr_async,
-	.revalidate_lock	= lmv_revalidate_lock
+	.revalidate_lock	= lmv_revalidate_lock,
+	.get_fid_from_lsm	= lmv_get_fid_from_lsm,
 };
 
 static int __init lmv_init(void)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 50/80] staging: lustre: lov: new pattern flag for partially repaired file
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Fan Yong,
	James Simmons

From: Fan Yong <fan.yong@intel.com>

When the layout LFSCK repairs an orphan OST-object: if the parent
MDT-object was lost, it re-creates the MDT-object, regenerates the
LOV EA, fills the target LOV EA slot with the orphan information,
and fills the other slots with zero (LOV hole); if the related LOV
EA slot is invalid or a hole, it refills the target LOV EA slot; if
the target slot exceeds the current LOV EA tail, it extends the LOV
EA and fills the gaps with zero.

Some of the LOV EA holes may never be refilled because some
OST-objects were lost. Even if they can be refilled, a client may
race with the repair and access the LOV EA before the refilling
completes. If the client accesses a LOV EA with hole(s), it may
behave strangely, for example by triggering LBUG()/LASSERT() on the
client.

So we make the client aware that the LOV EA is incomplete by
introducing a new LOV EA pattern flag, LOV_PATTERN_F_HOLE: whenever
the LFSCK repairs a LOV EA with hole(s), the LOV EA is marked with
LOV_PATTERN_F_HOLE; when all the holes in the LOV EA are refilled,
LOV_PATTERN_F_HOLE is dropped.

A new client recognizes the pattern flag LOV_PATTERN_F_HOLE, so it
can permit or forbid operations on a file with LOV holes:

 1) Normal read/write of a file with a LOV EA hole is permitted, but
    the application gets an EIO error when it reads data from, or
    writes data to, the dummy slot.

 2) Users can dump the recovered data with common read tools, such
    as "dd conv=sync,noerror".

 3) Appending data to a file that has a LOV EA hole fails with EIO.

 4) Other operations skip the LOV EA hole(s) and do not fail, such
    as {s,g}etattr, {s,g}etxattr, stat, chown/chgrp, chmod, touch,
    unlink, and so on.

An old client does not recognize the new pattern flag
LOV_PATTERN_F_HOLE, so it discards a LOV EA with holes and fails the
operation, but it does not crash.
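
The client-side rules above can be summarised as a small decision
function. This is only a sketch under stated assumptions: enum io_kind,
layout_has_hole() and check_io() are hypothetical names, and the
per-slot EIO for read/write is modelled with a boolean rather than real
stripe lookup. Only the flag values are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define LOV_PATTERN_F_MASK	0xffff0000u
#define LOV_PATTERN_F_HOLE	0x40000000u	/* value added by this patch */

/* Hypothetical operation classes, for illustration only. */
enum io_kind { IO_READ, IO_WRITE, IO_APPEND, IO_SETATTR };

static bool layout_has_hole(uint32_t pattern)
{
	return (pattern & LOV_PATTERN_F_HOLE) != 0;
}

/*
 * slot_is_hole: whether the stripe slot touched by a read/write is a
 * dummy (zero-filled) slot. Returns 0 or a negative errno.
 */
static int check_io(uint32_t pattern, enum io_kind kind, bool slot_is_hole)
{
	if (!layout_has_hole(pattern))
		return 0;

	switch (kind) {
	case IO_APPEND:
		/* The file tail cannot be located reliably. */
		return -EIO;
	case IO_READ:
	case IO_WRITE:
		return slot_is_hole ? -EIO : 0;
	default:
		/* setattr, stat, unlink, ... skip the holes. */
		return 0;
	}
}
```

In the diff below, only the append case is enforced in
lov_io_slice_init(); the other cases are listed here as the commit
message describes them.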

Signed-off-by: Fan Yong <fan.yong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4675
Reviewed-on: http://review.whamcloud.com/10042
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |    1 +
 .../lustre/lustre/include/lustre/lustre_user.h     |    2 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |    2 +-
 drivers/staging/lustre/lustre/lov/lov_io.c         |   16 +++++++++++++---
 .../lustre/lustre/obdclass/lprocfs_status.c        |    2 ++
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    |    2 ++
 6 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 69bed64..87eef4c 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -1289,6 +1289,7 @@ void lustre_swab_ptlrpc_body(struct ptlrpc_body *pb);
 #define OBD_CONNECT_OPEN_BY_FID	0x20000000000000ULL	/* open by fid won't pack
 							 * name in request
 							 */
+#define OBD_CONNECT_LFSCK	0x40000000000000ULL/* support online LFSCK */
 
 /* XXX README XXX:
  * Please DO NOT add flag values here before first ensuring that this same
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index 8398c4f..9e38ed3 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -284,9 +284,9 @@ struct ost_id {
 #define LOV_PATTERN_CMOBD	0x200
 
 #define LOV_PATTERN_F_MASK	0xffff0000
+#define LOV_PATTERN_F_HOLE	0x40000000 /* there is hole in LOV EA */
 #define LOV_PATTERN_F_RELEASED	0x80000000 /* HSM released file */
 
-
 #define LOV_MAXPOOLNAME 16
 #define LOV_POOLNAMEF "%.16s"
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index ac59cd6..dd44ee8 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -189,7 +189,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt,
 				  OBD_CONNECT_PINGLESS |
 				  OBD_CONNECT_MAX_EASIZE |
 				  OBD_CONNECT_FLOCK_DEAD |
-				  OBD_CONNECT_DISP_STRIPE;
+				  OBD_CONNECT_DISP_STRIPE | OBD_CONNECT_LFSCK;
 
 	if (sbi->ll_flags & LL_SBI_SOM_PREVIEW)
 		data->ocd_connect_flags |= OBD_CONNECT_SOM;
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 84032a5..95126c3 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -298,8 +298,8 @@ static int lov_io_subio_init(const struct lu_env *env, struct lov_io *lio,
 	return result;
 }
 
-static void lov_io_slice_init(struct lov_io *lio,
-			      struct lov_object *obj, struct cl_io *io)
+static int lov_io_slice_init(struct lov_io *lio, struct lov_object *obj,
+			     struct cl_io *io)
 {
 	io->ci_result = 0;
 	lio->lis_object = obj;
@@ -314,6 +314,15 @@ static void lov_io_slice_init(struct lov_io *lio,
 		lio->lis_io_endpos = lio->lis_endpos;
 		if (cl_io_is_append(io)) {
 			LASSERT(io->ci_type == CIT_WRITE);
+
+			/*
+			 * If there is LOV EA hole, then we may cannot locate
+			 * the current file-tail exactly.
+			 */
+			if (unlikely(obj->lo_lsm->lsm_pattern &
+				     LOV_PATTERN_F_HOLE))
+				return -EIO;
+
 			lio->lis_pos = 0;
 			lio->lis_endpos = OBD_OBJECT_EOF;
 		}
@@ -349,6 +358,7 @@ static void lov_io_slice_init(struct lov_io *lio,
 	default:
 		LBUG();
 	}
+	return 0;
 }
 
 static void lov_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
@@ -870,7 +880,7 @@ int lov_io_init_raid0(const struct lu_env *env, struct cl_object *obj,
 	struct lov_object   *lov = cl2lov(obj);
 
 	INIT_LIST_HEAD(&lio->lis_active);
-	lov_io_slice_init(lio, lov, io);
+	io->ci_result = lov_io_slice_init(lio, lov, io);
 	if (io->ci_result == 0) {
 		io->ci_result = lov_io_subio_init(env, lio, io);
 		if (io->ci_result == 0) {
diff --git a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
index c83d28e..f42ed17 100644
--- a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
+++ b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
@@ -97,6 +97,8 @@ static const char * const obd_connect_names[] = {
 	"flock_deadlock",
 	"disp_stripe",
 	"unknown",
+	"lfsck",
+	"unknown",
 	NULL
 };
 
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index bc27f8d..9d5d2c8 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -1071,6 +1071,8 @@ void lustre_assert_wire_constants(void)
 		 "found 0x%.16llxULL\n", OBD_CONNECT_FLOCK_DEAD);
 	LASSERTF(OBD_CONNECT_OPEN_BY_FID == 0x20000000000000ULL,
 		 "found 0x%.16llxULL\n", OBD_CONNECT_OPEN_BY_FID);
+	LASSERTF(OBD_CONNECT_LFSCK == 0x40000000000000ULL, "found 0x%.16llxULL\n",
+		 OBD_CONNECT_LFSCK);
 	LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n",
 		(unsigned)OBD_CKSUM_CRC32);
 	LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n",
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 51/80] staging: lustre: lmv: Match MDT where the FID locates first
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

With DNE every object can have two locks in different namespaces:
a lookup lock in the namespace of the MDT storing the direntry, and
an update/open lock in the namespace of the MDT storing the inode.
lmv_find_cbdata() and lmv_lock_match() should try the MDT that the
FID maps to first, since it is easy to find, and only try the
others if that fails.
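
The search order can be pictured as a rotation over the target table
that starts at the preferred index. The sketch below is a simplified
model, not the lmv code: probe_fn, probe_targets(), struct visit_log
and record_visit() are hypothetical names, and a negative preferred
index stands in for a failed FID lookup, in which case the scan starts
at target 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe callback: non-zero means "found on target idx". */
typedef int (*probe_fn)(int idx, void *arg);

/*
 * Visit each of the count targets exactly once, starting at preferred
 * and wrapping around. Returns the first non-zero probe result, or 0.
 */
static int probe_targets(int count, int preferred, probe_fn probe, void *arg)
{
	int i, tgt;

	if (count <= 0)
		return 0;
	if (preferred < 0 || preferred >= count)
		preferred = 0;	/* lookup failed: fall back to target 0 */

	for (i = 0, tgt = preferred; i < count;
	     i++, tgt = (tgt + 1) % count) {
		int rc = probe(tgt, arg);

		if (rc)
			return rc;
	}
	return 0;
}

/* Example callback that records the visiting order. */
struct visit_log {
	int order[8];
	int n;
	int hit;	/* index that reports a match, or -1 for none */
};

static int record_visit(int idx, void *arg)
{
	struct visit_log *log = (struct visit_log *)arg;

	if (log->n < 8)
		log->order[log->n++] = idx;
	return idx == log->hit ? idx + 1 : 0;
}
```

With four targets and a preferred index of 2, the callback sees 2, 3,
0, 1. When the preferred target matches, no other target is probed,
which is the saving the patch is after.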

In the error handler of lmv_add_target(), only restore ld_tgt_count
if this call increased it, instead of unconditionally decrementing
it.
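
One way to express "undo only what this call did" is to remember
whether the count grew and what it was before. This is a sketch under
assumptions: struct tgt_table, tgt_reserve() and tgt_rollback() are
hypothetical names, and the explicit grew flag is a choice made for
this example rather than something the patch itself uses.

```c
#include <assert.h>

/* Hypothetical stand-in for the field lmv->desc.ld_tgt_count. */
struct tgt_table {
	int count;
};

/*
 * Make sure the table covers slot index. Returns 1 if the count grew,
 * 0 otherwise; *orig always receives the count seen on entry.
 */
static int tgt_reserve(struct tgt_table *t, int index, int *orig)
{
	assert(index >= 0);

	*orig = t->count;
	if (index >= t->count) {
		t->count = index + 1;
		return 1;
	}
	return 0;
}

/*
 * Error path: restore the old count only if this caller grew the table
 * and the table has not been grown further since.
 */
static void tgt_rollback(struct tgt_table *t, int index, int grew, int orig)
{
	if (grew && t->count == index + 1)
		t->count = orig;
}
```

For a table of 2 entries and a failed add at index 5, the count goes
2, 6, then back to 2, whereas a plain decrement would leave it at 5.
For a failed add at an index already inside the table, the count is
left alone, whereas a plain decrement would shrink it by one.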

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4098
Reviewed-on: http://review.whamcloud.com/8019
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lmv/lmv_internal.h |   45 +++++++++++++-----
 drivers/staging/lustre/lustre/lmv/lmv_obd.c      |   57 +++++++++++++++-------
 2 files changed, 73 insertions(+), 29 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_internal.h b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
index dbd1da6..faf6a7b 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_internal.h
+++ b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
@@ -64,35 +64,56 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
 			  int extra_lock_flags);
 
 static inline struct lmv_tgt_desc *
-lmv_get_target(struct lmv_obd *lmv, u32 mds)
+lmv_get_target(struct lmv_obd *lmv, u32 mdt_idx, int *index)
 {
-	int count = lmv->desc.ld_tgt_count;
 	int i;
 
-	for (i = 0; i < count; i++) {
+	for (i = 0; i < lmv->desc.ld_tgt_count; i++) {
 		if (!lmv->tgts[i])
 			continue;
 
-		if (lmv->tgts[i]->ltd_idx == mds)
+		if (lmv->tgts[i]->ltd_idx == mdt_idx) {
+			if (index)
+				*index = i;
 			return lmv->tgts[i];
+		}
 	}
 
 	return ERR_PTR(-ENODEV);
 }
 
-static inline struct lmv_tgt_desc *
-lmv_find_target(struct lmv_obd *lmv, const struct lu_fid *fid)
+static inline int
+lmv_find_target_index(struct lmv_obd *lmv, const struct lu_fid *fid)
 {
-	u32 mds = 0;
-	int rc;
+	struct lmv_tgt_desc *ltd;
+	u32 mdt_idx = 0;
+	int index = 0;
 
 	if (lmv->desc.ld_tgt_count > 1) {
-		rc = lmv_fld_lookup(lmv, fid, &mds);
-		if (rc)
-			return ERR_PTR(rc);
+		int rc;
+
+		rc = lmv_fld_lookup(lmv, fid, &mdt_idx);
+		if (rc < 0)
+			return rc;
 	}
 
-	return lmv_get_target(lmv, mds);
+	ltd = lmv_get_target(lmv, mdt_idx, &index);
+	if (IS_ERR(ltd))
+		return PTR_ERR(ltd);
+
+	return index;
+}
+
+static inline struct lmv_tgt_desc *
+lmv_find_target(struct lmv_obd *lmv, const struct lu_fid *fid)
+{
+	int index;
+
+	index = lmv_find_target_index(lmv, fid);
+	if (index < 0)
+		return ERR_PTR(index);
+
+	return lmv->tgts[index];
 }
 
 static inline int lmv_stripe_md_size(int stripe_count)
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 9821f69..6917a03 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -480,6 +480,7 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
 {
 	struct lmv_obd      *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc *tgt;
+	int orig_tgt_count = 0;
 	int		  rc = 0;
 
 	CDEBUG(D_CONFIG, "Target uuid: %s. index %d\n", uuidp->uuid, index);
@@ -549,14 +550,17 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
 	tgt->ltd_uuid = *uuidp;
 	tgt->ltd_active = 0;
 	lmv->tgts[index] = tgt;
-	if (index >= lmv->desc.ld_tgt_count)
+	if (index >= lmv->desc.ld_tgt_count) {
+		orig_tgt_count = lmv->desc.ld_tgt_count;
 		lmv->desc.ld_tgt_count = index + 1;
+	}
 
 	if (lmv->connected) {
 		rc = lmv_connect_mdc(obd, tgt);
 		if (rc) {
 			spin_lock(&lmv->lmv_lock);
-			lmv->desc.ld_tgt_count--;
+			if (lmv->desc.ld_tgt_count == index + 1)
+				lmv->desc.ld_tgt_count = orig_tgt_count;
 			memset(tgt, 0, sizeof(*tgt));
 			spin_unlock(&lmv->lmv_lock);
 		} else {
@@ -1263,7 +1267,7 @@ int __lmv_fid_alloc(struct lmv_obd *lmv, struct lu_fid *fid, u32 mds)
 	struct lmv_tgt_desc	*tgt;
 	int			 rc;
 
-	tgt = lmv_get_target(lmv, mds);
+	tgt = lmv_get_target(lmv, mds, NULL);
 	if (IS_ERR(tgt))
 		return PTR_ERR(tgt);
 
@@ -1610,6 +1614,7 @@ static int lmv_find_cbdata(struct obd_export *exp, const struct lu_fid *fid,
 {
 	struct obd_device   *obd = exp->exp_obd;
 	struct lmv_obd      *lmv = &obd->u.lmv;
+	int tgt;
 	int		  i;
 	int		  rc;
 
@@ -1622,12 +1627,22 @@ static int lmv_find_cbdata(struct obd_export *exp, const struct lu_fid *fid,
 	/*
 	 * With DNE every object can have two locks in different namespaces:
 	 * lookup lock in space of MDT storing direntry and update/open lock in
-	 * space of MDT storing inode.
+	 * space of MDT storing inode. Try the MDT that the FID maps to first,
+	 * since this can be easily found, and only try others if that fails.
 	 */
-	for (i = 0; i < lmv->desc.ld_tgt_count; i++) {
-		if (!lmv->tgts[i] || !lmv->tgts[i]->ltd_exp)
+	for (i = 0, tgt = lmv_find_target_index(lmv, fid);
+	     i < lmv->desc.ld_tgt_count;
+	     i++, tgt = (tgt + 1) % lmv->desc.ld_tgt_count) {
+		if (tgt < 0) {
+			CDEBUG(D_HA, "%s: "DFID" is inaccessible: rc = %d\n",
+			       obd->obd_name, PFID(fid), tgt);
+			tgt = 0;
+		}
+
+		if (!lmv->tgts[tgt] || !lmv->tgts[tgt]->ltd_exp)
 			continue;
-		rc = md_find_cbdata(lmv->tgts[i]->ltd_exp, fid, it, data);
+
+		rc = md_find_cbdata(lmv->tgts[tgt]->ltd_exp, fid, it, data);
 		if (rc)
 			return rc;
 	}
@@ -1676,7 +1691,7 @@ lmv_locate_target_for_name(struct lmv_obd *lmv, struct lmv_stripe_md *lsm,
 
 	*fid = oinfo->lmo_fid;
 	*mds = oinfo->lmo_mds;
-	tgt = lmv_get_target(lmv, *mds);
+	tgt = lmv_get_target(lmv, *mds, NULL);
 
 	CDEBUG(D_INFO, "locate on mds %u "DFID"\n", *mds, PFID(fid));
 	return tgt;
@@ -2866,24 +2881,32 @@ static enum ldlm_mode lmv_lock_match(struct obd_export *exp, __u64 flags,
 	struct obd_device       *obd = exp->exp_obd;
 	struct lmv_obd	  *lmv = &obd->u.lmv;
 	enum ldlm_mode	      rc;
+	int tgt;
 	int		      i;
 
 	CDEBUG(D_INODE, "Lock match for "DFID"\n", PFID(fid));
 
 	/*
-	 * With CMD every object can have two locks in different namespaces:
-	 * lookup lock in space of mds storing direntry and update/open lock in
-	 * space of mds storing inode. Thus we check all targets, not only that
-	 * one fid was created in.
+	 * With DNE every object can have two locks in different namespaces:
+	 * lookup lock in space of MDT storing direntry and update/open lock in
+	 * space of MDT storing inode.  Try the MDT that the FID maps to first,
+	 * since this can be easily found, and only try others if that fails.
 	 */
-	for (i = 0; i < lmv->desc.ld_tgt_count; i++) {
-		struct lmv_tgt_desc *tgt = lmv->tgts[i];
+	for (i = 0, tgt = lmv_find_target_index(lmv, fid);
+	     i < lmv->desc.ld_tgt_count;
+	     i++, tgt = (tgt + 1) % lmv->desc.ld_tgt_count) {
+		if (tgt < 0) {
+			CDEBUG(D_HA, "%s: "DFID" is inaccessible: rc = %d\n",
+			       obd->obd_name, PFID(fid), tgt);
+			tgt = 0;
+		}
 
-		if (!tgt || !tgt->ltd_exp || !tgt->ltd_active)
+		if (!lmv->tgts[tgt] || !lmv->tgts[tgt]->ltd_exp ||
+		    !lmv->tgts[tgt]->ltd_active)
 			continue;
 
-		rc = md_lock_match(tgt->ltd_exp, flags, fid, type, policy, mode,
-				   lockh);
+		rc = md_lock_match(lmv->tgts[tgt]->ltd_exp, flags, fid,
+				   type, policy, mode, lockh);
 		if (rc)
 			return rc;
 	}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [lustre-devel] [PATCH 51/80] staging: lustre: lmv: Match MDT where the FID locates first
@ 2016-08-16 20:19   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

With DNE every object can have two locks in different namespaces:
a lookup lock in the namespace of the MDT storing the direntry, and
an update/open lock in the namespace of the MDT storing the inode.
lmv_find_cbdata() and lmv_lock_match() should try the MDT that the
FID maps to first, since it is easy to find, and only try the
others if that fails.

In the error handler of lmv_add_target(), only restore ld_tgt_count
if this call increased it, instead of unconditionally decrementing
it.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4098
Reviewed-on: http://review.whamcloud.com/8019
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lmv/lmv_internal.h |   45 +++++++++++++-----
 drivers/staging/lustre/lustre/lmv/lmv_obd.c      |   57 +++++++++++++++-------
 2 files changed, 73 insertions(+), 29 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_internal.h b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
index dbd1da6..faf6a7b 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_internal.h
+++ b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
@@ -64,35 +64,56 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
 			  int extra_lock_flags);
 
 static inline struct lmv_tgt_desc *
-lmv_get_target(struct lmv_obd *lmv, u32 mds)
+lmv_get_target(struct lmv_obd *lmv, u32 mdt_idx, int *index)
 {
-	int count = lmv->desc.ld_tgt_count;
 	int i;
 
-	for (i = 0; i < count; i++) {
+	for (i = 0; i < lmv->desc.ld_tgt_count; i++) {
 		if (!lmv->tgts[i])
 			continue;
 
-		if (lmv->tgts[i]->ltd_idx == mds)
+		if (lmv->tgts[i]->ltd_idx == mdt_idx) {
+			if (index)
+				*index = i;
 			return lmv->tgts[i];
+		}
 	}
 
 	return ERR_PTR(-ENODEV);
 }
 
-static inline struct lmv_tgt_desc *
-lmv_find_target(struct lmv_obd *lmv, const struct lu_fid *fid)
+static inline int
+lmv_find_target_index(struct lmv_obd *lmv, const struct lu_fid *fid)
 {
-	u32 mds = 0;
-	int rc;
+	struct lmv_tgt_desc *ltd;
+	u32 mdt_idx = 0;
+	int index = 0;
 
 	if (lmv->desc.ld_tgt_count > 1) {
-		rc = lmv_fld_lookup(lmv, fid, &mds);
-		if (rc)
-			return ERR_PTR(rc);
+		int rc;
+
+		rc = lmv_fld_lookup(lmv, fid, &mdt_idx);
+		if (rc < 0)
+			return rc;
 	}
 
-	return lmv_get_target(lmv, mds);
+	ltd = lmv_get_target(lmv, mdt_idx, &index);
+	if (IS_ERR(ltd))
+		return PTR_ERR(ltd);
+
+	return index;
+}
+
+static inline struct lmv_tgt_desc *
+lmv_find_target(struct lmv_obd *lmv, const struct lu_fid *fid)
+{
+	int index;
+
+	index = lmv_find_target_index(lmv, fid);
+	if (index < 0)
+		return ERR_PTR(index);
+
+	return lmv->tgts[index];
 }
 
 static inline int lmv_stripe_md_size(int stripe_count)
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 9821f69..6917a03 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -480,6 +480,7 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
 {
 	struct lmv_obd      *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc *tgt;
+	int orig_tgt_count = 0;
 	int		  rc = 0;
 
 	CDEBUG(D_CONFIG, "Target uuid: %s. index %d\n", uuidp->uuid, index);
@@ -549,14 +550,17 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
 	tgt->ltd_uuid = *uuidp;
 	tgt->ltd_active = 0;
 	lmv->tgts[index] = tgt;
-	if (index >= lmv->desc.ld_tgt_count)
+	if (index >= lmv->desc.ld_tgt_count) {
+		orig_tgt_count = lmv->desc.ld_tgt_count;
 		lmv->desc.ld_tgt_count = index + 1;
+	}
 
 	if (lmv->connected) {
 		rc = lmv_connect_mdc(obd, tgt);
 		if (rc) {
 			spin_lock(&lmv->lmv_lock);
-			lmv->desc.ld_tgt_count--;
+			if (lmv->desc.ld_tgt_count == index + 1)
+				lmv->desc.ld_tgt_count = orig_tgt_count;
 			memset(tgt, 0, sizeof(*tgt));
 			spin_unlock(&lmv->lmv_lock);
 		} else {
@@ -1263,7 +1267,7 @@ int __lmv_fid_alloc(struct lmv_obd *lmv, struct lu_fid *fid, u32 mds)
 	struct lmv_tgt_desc	*tgt;
 	int			 rc;
 
-	tgt = lmv_get_target(lmv, mds);
+	tgt = lmv_get_target(lmv, mds, NULL);
 	if (IS_ERR(tgt))
 		return PTR_ERR(tgt);
 
@@ -1610,6 +1614,7 @@ static int lmv_find_cbdata(struct obd_export *exp, const struct lu_fid *fid,
 {
 	struct obd_device   *obd = exp->exp_obd;
 	struct lmv_obd      *lmv = &obd->u.lmv;
+	int tgt;
 	int		  i;
 	int		  rc;
 
@@ -1622,12 +1627,22 @@ static int lmv_find_cbdata(struct obd_export *exp, const struct lu_fid *fid,
 	/*
 	 * With DNE every object can have two locks in different namespaces:
 	 * lookup lock in space of MDT storing direntry and update/open lock in
-	 * space of MDT storing inode.
+	 * space of MDT storing inode. Try the MDT that the FID maps to first,
+	 * since this can be easily found, and only try others if that fails.
 	 */
-	for (i = 0; i < lmv->desc.ld_tgt_count; i++) {
-		if (!lmv->tgts[i] || !lmv->tgts[i]->ltd_exp)
+	for (i = 0, tgt = lmv_find_target_index(lmv, fid);
+	     i < lmv->desc.ld_tgt_count;
+	     i++, tgt = (tgt + 1) % lmv->desc.ld_tgt_count) {
+		if (tgt < 0) {
+			CDEBUG(D_HA, "%s: "DFID" is inaccessible: rc = %d\n",
+			       obd->obd_name, PFID(fid), tgt);
+			tgt = 0;
+		}
+
+		if (!lmv->tgts[tgt] || !lmv->tgts[tgt]->ltd_exp)
 			continue;
-		rc = md_find_cbdata(lmv->tgts[i]->ltd_exp, fid, it, data);
+
+		rc = md_find_cbdata(lmv->tgts[tgt]->ltd_exp, fid, it, data);
 		if (rc)
 			return rc;
 	}
@@ -1676,7 +1691,7 @@ lmv_locate_target_for_name(struct lmv_obd *lmv, struct lmv_stripe_md *lsm,
 
 	*fid = oinfo->lmo_fid;
 	*mds = oinfo->lmo_mds;
-	tgt = lmv_get_target(lmv, *mds);
+	tgt = lmv_get_target(lmv, *mds, NULL);
 
 	CDEBUG(D_INFO, "locate on mds %u "DFID"\n", *mds, PFID(fid));
 	return tgt;
@@ -2866,24 +2881,32 @@ static enum ldlm_mode lmv_lock_match(struct obd_export *exp, __u64 flags,
 	struct obd_device       *obd = exp->exp_obd;
 	struct lmv_obd	  *lmv = &obd->u.lmv;
 	enum ldlm_mode	      rc;
+	int tgt;
 	int		      i;
 
 	CDEBUG(D_INODE, "Lock match for "DFID"\n", PFID(fid));
 
 	/*
-	 * With CMD every object can have two locks in different namespaces:
-	 * lookup lock in space of mds storing direntry and update/open lock in
-	 * space of mds storing inode. Thus we check all targets, not only that
-	 * one fid was created in.
+	 * With DNE every object can have two locks in different namespaces:
+	 * lookup lock in space of MDT storing direntry and update/open lock in
+	 * space of MDT storing inode.  Try the MDT that the FID maps to first,
+	 * since this can be easily found, and only try others if that fails.
 	 */
-	for (i = 0; i < lmv->desc.ld_tgt_count; i++) {
-		struct lmv_tgt_desc *tgt = lmv->tgts[i];
+	for (i = 0, tgt = lmv_find_target_index(lmv, fid);
+	     i < lmv->desc.ld_tgt_count;
+	     i++, tgt = (tgt + 1) % lmv->desc.ld_tgt_count) {
+		if (tgt < 0) {
+			CDEBUG(D_HA, "%s: "DFID" is inaccessible: rc = %d\n",
+			       obd->obd_name, PFID(fid), tgt);
+			tgt = 0;
+		}
 
-		if (!tgt || !tgt->ltd_exp || !tgt->ltd_active)
+		if (!lmv->tgts[tgt] || !lmv->tgts[tgt]->ltd_exp ||
+		    !lmv->tgts[tgt]->ltd_active)
 			continue;
 
-		rc = md_lock_match(tgt->ltd_exp, flags, fid, type, policy, mode,
-				   lockh);
+		rc = md_lock_match(lmv->tgts[tgt]->ltd_exp, flags, fid,
+				   type, policy, mode, lockh);
 		if (rc)
 			return rc;
 	}
-- 
1.7.1

* [PATCH 52/80] staging: lustre: llite: use the correct mode for striped directory
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Create the striped directory with the correct mode, which should be
handled the same way as for mkdir.

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4929
Reviewed-on: http://review.whamcloud.com/10028
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
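The mode calculation that replaces the hard-coded 0755 can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel code: `striped_dir_mode` and `requested_or_default` are hypothetical helper names, `mode_t` stands in for the kernel's `umode_t`, and the `server_applies_umask` flag models the case where the parent uses POSIX ACLs and the export supports umask handling (the combined `IS_POSIXACL()` and `exp_connect_umask()` test in the diff).

```c
#define _XOPEN_SOURCE 700 /* for S_IFDIR and S_ISVTX under strict ISO C */
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Older callers pass no mode at all (0); the patch then falls back to
 * rwxrwxrwx and lets the umask trim it, as mkdir would.
 */
static mode_t requested_or_default(mode_t ioc_type)
{
	return ioc_type != 0 ? ioc_type : (mode_t)(S_IRWXU | S_IRWXG | S_IRWXO);
}

/*
 * Apply the umask on the client unless the server will do it, keep only
 * the permission bits and the sticky bit, then mark the result a directory.
 */
static mode_t striped_dir_mode(mode_t requested, mode_t umask_bits,
			       int server_applies_umask)
{
	mode_t mode = requested;

	if (!server_applies_umask)
		mode &= ~umask_bits;

	return (mode & (S_IRWXU | S_IRWXG | S_IRWXO | S_ISVTX)) | S_IFDIR;
}
```

With a umask of 022, a request for 0777 becomes `S_IFDIR | 0755` when the client applies the umask and stays `S_IFDIR | 0777` when the server is expected to apply it. Set-user-ID and set-group-ID bits in the request are dropped in both cases.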
 drivers/staging/lustre/lustre/llite/dir.c |   40 +++++++++++++++++++---------
 1 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 3fed80d..a1b5143 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -694,28 +694,40 @@ static int ll_send_mgc_param(struct obd_export *mgc, char *string)
 	return rc;
 }
 
-static int ll_dir_setdirstripe(struct inode *dir, struct lmv_user_md *lump,
-			       const char *filename)
+/**
+ * Create striped directory with specified stripe(@lump)
+ *
+ * param[in] parent	the parent of the directory.
+ * param[in] lump	the specified stripes.
+ * param[in] dirname	the name of the directory.
+ * param[in] mode	the specified mode of the directory.
+ *
+ * retval		=0 if striped directory is being created successfully.
+ *			<0 if the creation is failed.
+ */
+static int ll_dir_setdirstripe(struct inode *parent, struct lmv_user_md *lump,
+			       const char *dirname, umode_t mode)
 {
 	struct ptlrpc_request *request = NULL;
 	struct md_op_data *op_data;
-	struct ll_sb_info *sbi = ll_i2sbi(dir);
-	int mode;
+	struct ll_sb_info *sbi = ll_i2sbi(parent);
 	int err;
 
 	if (unlikely(lump->lum_magic != LMV_USER_MAGIC))
 		return -EINVAL;
 
 	CDEBUG(D_VFSTRACE, "VFS Op:inode="DFID"(%p) name %s stripe_offset %d, stripe_count: %u\n",
-	       PFID(ll_inode2fid(dir)), dir, filename,
+	       PFID(ll_inode2fid(parent)), parent, dirname,
 	       (int)lump->lum_stripe_offset, lump->lum_stripe_count);
 
 	if (lump->lum_magic != cpu_to_le32(LMV_USER_MAGIC))
 		lustre_swab_lmv_user_md(lump);
 
-	mode = (~current_umask() & 0755) | S_IFDIR;
-	op_data = ll_prep_md_op_data(NULL, dir, NULL, filename,
-				     strlen(filename), mode, LUSTRE_OPC_MKDIR,
+	if (!IS_POSIXACL(parent) || !exp_connect_umask(ll_i2mdexp(parent)))
+		mode &= ~current_umask();
+	mode = (mode & (S_IRWXUGO | S_ISVTX)) | S_IFDIR;
+	op_data = ll_prep_md_op_data(NULL, parent, NULL, dirname,
+				     strlen(dirname), mode, LUSTRE_OPC_MKDIR,
 				     lump);
 	if (IS_ERR(op_data)) {
 		err = PTR_ERR(op_data);
@@ -1379,6 +1391,7 @@ out_free:
 		char		*filename;
 		int		 namelen = 0;
 		int		 lumlen = 0;
+		umode_t mode;
 		int		 len;
 		int		 rc;
 
@@ -1412,11 +1425,12 @@ out_free:
 			goto lmv_out_free;
 		}
 
-		/**
-		 * ll_dir_setdirstripe will be used to set dir stripe
-		 *  mdc_create--->mdt_reint_create (with dirstripe)
-		 */
-		rc = ll_dir_setdirstripe(inode, lum, filename);
+#if OBD_OCD_VERSION(2, 9, 50, 0) > LUSTRE_VERSION_CODE
+		mode = data->ioc_type != 0 ? data->ioc_type : S_IRWXUGO;
+#else
+		mode = data->ioc_type;
+#endif
+		rc = ll_dir_setdirstripe(inode, lum, filename, mode);
 lmv_out_free:
 		obd_ioctl_freedata(buf, len);
 		return rc;
-- 
1.7.1

* [PATCH 53/80] staging: lustre: obd: rename lsr_padding to lsr_valid
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Niu Yawei,
	James Simmons

From: Niu Yawei <yawei.niu@intel.com>

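Rename the lsr_padding field of struct llog_setattr64_rec to
lsr_valid, and byte-swap it in lustre_swab_llog_rec() now that it
carries data.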

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4345
Reviewed-on: http://review.whamcloud.com/10223
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
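Why the rename comes with a new __swab64s() call can be shown with a small userspace model: a padding field can be left alone when a record is converted between byte orders, but a field that carries data cannot. Everything below is a sketch under assumptions. `model_setattr64_rec` is a hypothetical stand-in whose leading 32 bytes are kept opaque, and only the offsets 48 and 56 and the field size 8 are taken from the wiretest assertions in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable 32-bit byte reversal; stands in for the kernel's __swab32s(). */
static uint32_t model_swab32(uint32_t v)
{
	v = ((v & 0x00ff00ffu) << 8) | ((v >> 8) & 0x00ff00ffu);
	return (v << 16) | (v >> 16);
}

/* Portable 64-bit byte reversal; stands in for the kernel's __swab64s(). */
static uint64_t model_swab64(uint64_t v)
{
	v = ((v & 0x00ff00ff00ff00ffULL) << 8) |
	    ((v >> 8) & 0x00ff00ff00ff00ffULL);
	v = ((v & 0x0000ffff0000ffffULL) << 16) |
	    ((v >> 16) & 0x0000ffff0000ffffULL);
	return (v << 32) | (v >> 32);
}

/* Hypothetical model of the record; head[] and tail[] are opaque here. */
struct model_setattr64_rec {
	uint8_t  head[32];
	uint32_t uid;
	uint32_t uid_h;
	uint32_t gid;
	uint32_t gid_h;
	uint64_t valid; /* was padding before the rename */
	uint8_t  tail[8];
};

/* Same layout checks as the LASSERTF() lines in wiretest.c. */
_Static_assert(offsetof(struct model_setattr64_rec, valid) == 48,
	       "valid must sit at offset 48");
_Static_assert(sizeof(((struct model_setattr64_rec *)0)->valid) == 8,
	       "valid must be 8 bytes");
_Static_assert(offsetof(struct model_setattr64_rec, tail) == 56,
	       "tail must sit at offset 56");

/* Swap every data-carrying field modelled here, including the renamed one. */
static void model_swab_rec(struct model_setattr64_rec *rec)
{
	assert(rec != NULL);
	rec->uid = model_swab32(rec->uid);
	rec->uid_h = model_swab32(rec->uid_h);
	rec->gid = model_swab32(rec->gid);
	rec->gid_h = model_swab32(rec->gid_h);
	rec->valid = model_swab64(rec->valid);
}
```

Swapping a record twice returns it to its original contents, which is a quick sanity check that no field was missed or swapped with the wrong width.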
 .../lustre/lustre/include/lustre/lustre_idl.h      |    2 +-
 drivers/staging/lustre/lustre/obdclass/llog_swab.c |    1 +
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    |    8 ++++----
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 87eef4c..bbf0c8d 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -3036,7 +3036,7 @@ struct llog_setattr64_rec {
 	__u32			lsr_uid_h;
 	__u32			lsr_gid;
 	__u32			lsr_gid_h;
-	__u64			lsr_padding;
+	__u64			lsr_valid;
 	struct llog_rec_tail    lsr_tail;
 } __packed;
 
diff --git a/drivers/staging/lustre/lustre/obdclass/llog_swab.c b/drivers/staging/lustre/lustre/obdclass/llog_swab.c
index f7b9b19..0ec6361 100644
--- a/drivers/staging/lustre/lustre/obdclass/llog_swab.c
+++ b/drivers/staging/lustre/lustre/obdclass/llog_swab.c
@@ -224,6 +224,7 @@ void lustre_swab_llog_rec(struct llog_rec_hdr *rec)
 		__swab32s(&lsr->lsr_uid_h);
 		__swab32s(&lsr->lsr_gid);
 		__swab32s(&lsr->lsr_gid_h);
+		__swab64s(&lsr->lsr_valid);
 		tail = &lsr->lsr_tail;
 		break;
 	}
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index 9d5d2c8..8dbaf32 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -3170,10 +3170,10 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct llog_setattr64_rec, lsr_gid_h));
 	LASSERTF((int)sizeof(((struct llog_setattr64_rec *)0)->lsr_gid_h) == 4, "found %lld\n",
 		 (long long)(int)sizeof(((struct llog_setattr64_rec *)0)->lsr_gid_h));
-	LASSERTF((int)offsetof(struct llog_setattr64_rec, lsr_padding) == 48, "found %lld\n",
-		 (long long)(int)offsetof(struct llog_setattr64_rec, lsr_padding));
-	LASSERTF((int)sizeof(((struct llog_setattr64_rec *)0)->lsr_padding) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct llog_setattr64_rec *)0)->lsr_padding));
+	LASSERTF((int)offsetof(struct llog_setattr64_rec, lsr_valid) == 48, "found %lld\n",
+		 (long long)(int)offsetof(struct llog_setattr64_rec, lsr_valid));
+	LASSERTF((int)sizeof(((struct llog_setattr64_rec *)0)->lsr_valid) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct llog_setattr64_rec *)0)->lsr_valid));
 	LASSERTF((int)offsetof(struct llog_setattr64_rec, lsr_tail) == 56, "found %lld\n",
 		 (long long)(int)offsetof(struct llog_setattr64_rec, lsr_tail));
 	LASSERTF((int)sizeof(((struct llog_setattr64_rec *)0)->lsr_tail) == 8, "found %lld\n",
-- 
1.7.1

* [PATCH 54/80] staging: lustre: llite: set dir LOV xattr length variable
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Hongchao Zhang, James Simmons

From: Hongchao Zhang <hongchao.zhang@intel.com>

The LOV xattr of a directory can be either lov_user_md_v1
(size is 32) or lov_user_md_v3 (size is 48), so the actual
size of the LOV xattr should be returned.

Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5100
Reviewed-on: http://review.whamcloud.com/10453
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: jacques-Charles Lafoucriere <jacques-charles.lafoucriere@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
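The two sizes quoted above can be reproduced with a simplified userspace model, which also shows why a fixed answer to a size query is wrong for one of the two formats. This is a sketch under assumptions: the `model_*` names are hypothetical, the field layout is a simplification of the real structures, and the magic values and the 16-byte pool name length are believed to match the Lustre headers of this period but are not taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed magic values and pool name length; see the note above. */
#define MODEL_LOV_MAGIC_V1	0x0BD10BD0u
#define MODEL_LOV_MAGIC_V3	0x0BD30BD0u
#define MODEL_POOLNAME_LEN	16

/* Simplified v1 header: 4 + 4 + 16 + 4 + 2 + 2 = 32 bytes. */
struct model_lov_v1 {
	uint32_t magic;
	uint32_t pattern;
	uint8_t  object_id[16];
	uint32_t stripe_size;
	uint16_t stripe_count;
	uint16_t stripe_offset;
};

/* Simplified v3 header: the v1 fields plus a pool name, 48 bytes. */
struct model_lov_v3 {
	struct model_lov_v1 v1;
	char pool_name[MODEL_POOLNAME_LEN];
};

_Static_assert(sizeof(struct model_lov_v1) == 32, "v1 header is 32 bytes");
_Static_assert(sizeof(struct model_lov_v3) == 48, "v3 header is 48 bytes");

/*
 * Size a caller should be told when it queries the attribute length.
 * Returns -1 for a magic this model does not know.
 */
static long model_dir_lov_size(uint32_t magic)
{
	switch (magic) {
	case MODEL_LOV_MAGIC_V1:
		return (long)sizeof(struct model_lov_v1);
	case MODEL_LOV_MAGIC_V3:
		return (long)sizeof(struct model_lov_v3);
	default:
		return -1;
	}
}
```

A caller that sizes its buffer from the first, zero-length query and then fetches the attribute would come up 16 bytes short for a v3 layout if the query always reported the v1 size, which is the situation the removed shortcut could create.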
 drivers/staging/lustre/lustre/llite/xattr.c |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index aa0738b..146da6b 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -379,14 +379,6 @@ static int ll_xattr_get(const struct xattr_handler *handler,
 		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
 			return -ENODATA;
 
-		if (size == 0 && S_ISDIR(inode->i_mode)) {
-			/* XXX directory EA is fix for now, optimize to save
-			 * RPC transfer
-			 */
-			rc = sizeof(struct lov_user_md);
-			goto out;
-		}
-
 		lsm = ccc_inode_lsm_get(inode);
 		if (!lsm) {
 			if (S_ISDIR(inode->i_mode)) {
-- 
1.7.1

* [PATCH 55/80] staging: lustre: mdt: add mbo_ prefix to members of struct mdt_body
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

Rename each member of struct mdt_body, adding the prefix mbo_.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/10202
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |   74 +++---
 drivers/staging/lustre/lustre/include/lustre_mdc.h |   14 +-
 drivers/staging/lustre/lustre/llite/dir.c          |   30 +-
 drivers/staging/lustre/lustre/llite/file.c         |   20 +-
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   |    2 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |  110 ++++----
 drivers/staging/lustre/lustre/llite/llite_nfs.c    |    6 +-
 drivers/staging/lustre/lustre/llite/namei.c        |   44 ++--
 drivers/staging/lustre/lustre/llite/statahead.c    |    4 +-
 drivers/staging/lustre/lustre/llite/symlink.c      |    6 +-
 drivers/staging/lustre/lustre/llite/xattr.c        |   14 +-
 drivers/staging/lustre/lustre/llite/xattr_cache.c  |   12 +-
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |   38 ++--
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |   16 +-
 drivers/staging/lustre/lustre/mdc/mdc_lib.c        |   62 +++---
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |   32 ++--
 drivers/staging/lustre/lustre/mdc/mdc_reint.c      |    4 +-
 drivers/staging/lustre/lustre/mdc/mdc_request.c    |   52 ++--
 .../staging/lustre/lustre/ptlrpc/pack_generic.c    |   56 ++--
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    |  268 ++++++++++----------
 20 files changed, 432 insertions(+), 432 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index bbf0c8d..400ab3c 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2097,43 +2097,43 @@ enum md_transient_state {
 };
 
 struct mdt_body {
-	struct lu_fid  fid1;
-	struct lu_fid  fid2;
-	struct lustre_handle handle;
-	__u64	  valid;
-	__u64	  size;   /* Offset, in the case of MDS_READPAGE */
-	__s64	  mtime;
-	__s64	  atime;
-	__s64	  ctime;
-	__u64	  blocks; /* XID, in the case of MDS_READPAGE */
-	__u64	  ioepoch;
-	__u64	  t_state; /* transient file state defined in
-			    * enum md_transient_state
-			    * was "ino" until 2.4.0
-			    */
-	__u32	  fsuid;
-	__u32	  fsgid;
-	__u32	  capability;
-	__u32	  mode;
-	__u32	  uid;
-	__u32	  gid;
-	__u32	  flags; /* from vfs for pin/unpin, LUSTRE_BFLAG close */
-	__u32	  rdev;
-	__u32	  nlink; /* #bytes to read in the case of MDS_READPAGE */
-	__u32	  unused2; /* was "generation" until 2.4.0 */
-	__u32	  suppgid;
-	__u32	  eadatasize;
-	__u32	  aclsize;
-	__u32	  max_mdsize;
-	__u32	  max_cookiesize;
-	__u32	  uid_h; /* high 32-bits of uid, for FUID */
-	__u32	  gid_h; /* high 32-bits of gid, for FUID */
-	__u32	  padding_5; /* also fix lustre_swab_mdt_body */
-	__u64	  padding_6;
-	__u64	  padding_7;
-	__u64	  padding_8;
-	__u64	  padding_9;
-	__u64	  padding_10;
+	struct lu_fid mbo_fid1;
+	struct lu_fid mbo_fid2;
+	struct lustre_handle mbo_handle;
+	__u64	mbo_valid;
+	__u64	mbo_size;	/* Offset, in the case of MDS_READPAGE */
+	__s64	mbo_mtime;
+	__s64	mbo_atime;
+	__s64	mbo_ctime;
+	__u64	mbo_blocks;	/* XID, in the case of MDS_READPAGE */
+	__u64	mbo_ioepoch;
+	__u64	mbo_t_state;	/* transient file state defined in
+				 * enum md_transient_state
+				 * was "ino" until 2.4.0
+				 */
+	__u32	mbo_fsuid;
+	__u32	mbo_fsgid;
+	__u32	mbo_capability;
+	__u32	mbo_mode;
+	__u32	mbo_uid;
+	__u32	mbo_gid;
+	__u32	mbo_flags;
+	__u32	mbo_rdev;
+	__u32	mbo_nlink;	/* #bytes to read in the case of MDS_READPAGE */
+	__u32	mbo_unused2;	/* was "generation" until 2.4.0 */
+	__u32	mbo_suppgid;
+	__u32	mbo_eadatasize;
+	__u32	mbo_aclsize;
+	__u32	mbo_max_mdsize;
+	__u32	mbo_max_cookiesize;
+	__u32	mbo_uid_h;	/* high 32-bits of uid, for FUID */
+	__u32	mbo_gid_h;	/* high 32-bits of gid, for FUID */
+	__u32	mbo_padding_5;	/* also fix lustre_swab_mdt_body */
+	__u64	mbo_padding_6;
+	__u64	mbo_padding_7;
+	__u64	mbo_padding_8;
+	__u64	mbo_padding_9;
+	__u64	mbo_padding_10;
 }; /* 216 */
 
 void lustre_swab_mdt_body(struct mdt_body *b);
diff --git a/drivers/staging/lustre/lustre/include/lustre_mdc.h b/drivers/staging/lustre/lustre/include/lustre_mdc.h
index bf6f87a..9549fb4 100644
--- a/drivers/staging/lustre/lustre/include/lustre_mdc.h
+++ b/drivers/staging/lustre/lustre/include/lustre_mdc.h
@@ -163,18 +163,18 @@ static inline void mdc_put_rpc_lock(struct mdc_rpc_lock *lck,
 static inline void mdc_update_max_ea_from_body(struct obd_export *exp,
 					       struct mdt_body *body)
 {
-	if (body->valid & OBD_MD_FLMODEASIZE) {
+	if (body->mbo_valid & OBD_MD_FLMODEASIZE) {
 		struct client_obd *cli = &exp->exp_obd->u.cli;
 
-		if (cli->cl_max_mds_easize < body->max_mdsize) {
-			cli->cl_max_mds_easize = body->max_mdsize;
+		if (cli->cl_max_mds_easize < body->mbo_max_mdsize) {
+			cli->cl_max_mds_easize = body->mbo_max_mdsize;
 			cli->cl_default_mds_easize =
-			    min_t(__u32, body->max_mdsize, PAGE_SIZE);
+			    min_t(__u32, body->mbo_max_mdsize, PAGE_SIZE);
 		}
-		if (cli->cl_max_mds_cookiesize < body->max_cookiesize) {
-			cli->cl_max_mds_cookiesize = body->max_cookiesize;
+		if (cli->cl_max_mds_cookiesize < body->mbo_max_cookiesize) {
+			cli->cl_max_mds_cookiesize = body->mbo_max_cookiesize;
 			cli->cl_default_mds_cookiesize =
-			    min_t(__u32, body->max_cookiesize, PAGE_SIZE);
+			    min_t(__u32, body->mbo_max_cookiesize, PAGE_SIZE);
 		}
 	}
 }
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index a1b5143..9c7fa8f 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -188,8 +188,8 @@ static int ll_dir_filler(void *_hash, struct page *page0)
 	} else if (rc == 0) {
 		body = req_capsule_server_get(&request->rq_pill, &RMF_MDT_BODY);
 		/* Checked by mdc_readpage() */
-		if (body->valid & OBD_MD_FLSIZE)
-			i_size_write(inode, body->size);
+		if (body->mbo_valid & OBD_MD_FLSIZE)
+			i_size_write(inode, body->mbo_size);
 
 		nrdpgs = (request->rq_bulk->bd_nob_transferred+PAGE_SIZE-1)
 			 >> PAGE_SHIFT;
@@ -894,9 +894,9 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size,
 
 	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
 
-	lmmsize = body->eadatasize;
+	lmmsize = body->mbo_eadatasize;
 
-	if (!(body->valid & (OBD_MD_FLEASIZE | OBD_MD_FLDIREA)) ||
+	if (!(body->mbo_valid & (OBD_MD_FLEASIZE | OBD_MD_FLDIREA)) ||
 	    lmmsize == 0) {
 		rc = -ENODATA;
 		goto out;
@@ -1639,18 +1639,18 @@ skip_lmm:
 			lstat_t st = { 0 };
 
 			st.st_dev     = inode->i_sb->s_dev;
-			st.st_mode    = body->mode;
-			st.st_nlink   = body->nlink;
-			st.st_uid     = body->uid;
-			st.st_gid     = body->gid;
-			st.st_rdev    = body->rdev;
-			st.st_size    = body->size;
+			st.st_mode    = body->mbo_mode;
+			st.st_nlink   = body->mbo_nlink;
+			st.st_uid     = body->mbo_uid;
+			st.st_gid     = body->mbo_gid;
+			st.st_rdev    = body->mbo_rdev;
+			st.st_size    = body->mbo_size;
 			st.st_blksize = PAGE_SIZE;
-			st.st_blocks  = body->blocks;
-			st.st_atime   = body->atime;
-			st.st_mtime   = body->mtime;
-			st.st_ctime   = body->ctime;
-			st.st_ino     = cl_fid_build_ino(&body->fid1,
+			st.st_blocks  = body->mbo_blocks;
+			st.st_atime   = body->mbo_atime;
+			st.st_mtime   = body->mbo_mtime;
+			st.st_ctime   = body->mbo_ctime;
+			st.st_ino     = cl_fid_build_ino(&body->mbo_fid1,
 							 sbi->ll_flags &
 							 LL_SBI_32BIT_API);
 
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 90a7170..563cdf6 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -200,7 +200,7 @@ static int ll_close_inode_openhandle(struct obd_export *md_exp,
 		struct mdt_body *body;
 
 		body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
-		if (!(body->valid & OBD_MD_FLRELEASED))
+		if (!(body->mbo_valid & OBD_MD_FLRELEASED))
 			rc = -EBUSY;
 	}
 
@@ -482,8 +482,8 @@ static int ll_och_fill(struct obd_export *md_exp, struct lookup_intent *it,
 	struct mdt_body *body;
 
 	body = req_capsule_server_get(&it->it_request->rq_pill, &RMF_MDT_BODY);
-	och->och_fh = body->handle;
-	och->och_fid = body->fid1;
+	och->och_fh = body->mbo_handle;
+	och->och_fid = body->mbo_fid1;
 	och->och_lease_handle.cookie = it->it_lock_handle;
 	och->och_magic = OBD_CLIENT_HANDLE_MAGIC;
 	och->och_flags = it->it_flags;
@@ -511,7 +511,7 @@ static int ll_local_open(struct file *file, struct lookup_intent *it,
 
 		body = req_capsule_server_get(&it->it_request->rq_pill,
 					      &RMF_MDT_BODY);
-		ll_ioepoch_open(lli, body->ioepoch);
+		ll_ioepoch_open(lli, body->mbo_ioepoch);
 	}
 
 	LUSTRE_FPRIVATE(file) = fd;
@@ -1451,9 +1451,9 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 
 	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
 
-	lmmsize = body->eadatasize;
+	lmmsize = body->mbo_eadatasize;
 
-	if (!(body->valid & (OBD_MD_FLEASIZE | OBD_MD_FLDIREA)) ||
+	if (!(body->mbo_valid & (OBD_MD_FLEASIZE | OBD_MD_FLDIREA)) ||
 	    lmmsize == 0) {
 		rc = -ENODATA;
 		goto out;
@@ -1484,13 +1484,13 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 		 */
 		if (lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_V1)) {
 			lustre_swab_lov_user_md_v1((struct lov_user_md_v1 *)lmm);
-			if (S_ISREG(body->mode))
+			if (S_ISREG(body->mbo_mode))
 				lustre_swab_lov_user_md_objects(
 				 ((struct lov_user_md_v1 *)lmm)->lmm_objects,
 				 stripe_count);
 		} else if (lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_V3)) {
 			lustre_swab_lov_user_md_v3((struct lov_user_md_v3 *)lmm);
-			if (S_ISREG(body->mode))
+			if (S_ISREG(body->mbo_mode))
 				lustre_swab_lov_user_md_objects(
 				 ((struct lov_user_md_v3 *)lmm)->lmm_objects,
 				 stripe_count);
@@ -2861,7 +2861,7 @@ int ll_get_fid_by_name(struct inode *parent, const char *name,
 		goto out_req;
 	}
 	if (fid)
-		*fid = body->fid1;
+		*fid = body->mbo_fid1;
 out_req:
 	ptlrpc_req_finished(req);
 	return rc;
@@ -3583,7 +3583,7 @@ static int ll_layout_fetch(struct inode *inode, struct ldlm_lock *lock)
 		goto out;
 	}
 
-	lmmsize = body->eadatasize;
+	lmmsize = body->mbo_eadatasize;
 	if (lmmsize == 0) /* empty layout */ {
 		rc = 0;
 		goto out;
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index 396e4e4..eed464b 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -154,7 +154,7 @@ int cl_file_inode_init(struct inode *inode, struct lustre_md *md)
 	int result = 0;
 	int refcheck;
 
-	LASSERT(md->body->valid & OBD_MD_FLID);
+	LASSERT(md->body->mbo_valid & OBD_MD_FLID);
 	LASSERT(S_ISREG(inode->i_mode));
 
 	env = cl_env_get(&refcheck);
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index dd44ee8..5f6343a 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1035,7 +1035,7 @@ static struct inode *ll_iget_anon_dir(struct super_block *sb,
 		struct lmv_stripe_md *lsm = md->lmv;
 
 		inode->i_mode = (inode->i_mode & ~S_IFMT) |
-				(body->mode & S_IFMT);
+				(body->mbo_mode & S_IFMT);
 		LASSERTF(S_ISDIR(inode->i_mode), "Not slave inode "DFID"\n",
 			 PFID(fid));
 
@@ -1051,7 +1051,7 @@ static struct inode *ll_iget_anon_dir(struct super_block *sb,
 
 		LASSERT(lsm);
 		/* master object FID */
-		lli->lli_pfid = body->fid1;
+		lli->lli_pfid = body->mbo_fid1;
 		CDEBUG(D_INODE, "lli %p slave "DFID" master "DFID"\n",
 		       lli, PFID(fid), PFID(&lli->lli_pfid));
 		unlock_new_inode(inode);
@@ -1320,8 +1320,8 @@ static int ll_md_setattr(struct dentry *dentry, struct md_op_data *op_data,
 	op_data->op_attr.ia_valid = ia_valid;
 
 	/* Extract epoch data if obtained. */
-	op_data->op_handle = md.body->handle;
-	op_data->op_ioepoch = md.body->ioepoch;
+	op_data->op_handle = md.body->mbo_handle;
+	op_data->op_ioepoch = md.body->mbo_ioepoch;
 
 	rc = ll_update_inode(inode, &md);
 	ptlrpc_req_finished(request);
@@ -1689,7 +1689,7 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 	struct lov_stripe_md *lsm = md->lsm;
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
 
-	LASSERT((lsm != NULL) == ((body->valid & OBD_MD_FLEASIZE) != 0));
+	LASSERT((lsm != NULL) == ((body->mbo_valid & OBD_MD_FLEASIZE) != 0));
 	if (lsm) {
 		if (!lli->lli_has_smd &&
 		    !(sbi->ll_flags & LL_SBI_LAYOUT_LOCK))
@@ -1709,7 +1709,7 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 	}
 
 #ifdef CONFIG_FS_POSIX_ACL
-	if (body->valid & OBD_MD_FLACL) {
+	if (body->mbo_valid & OBD_MD_FLACL) {
 		spin_lock(&lli->lli_lock);
 		if (lli->lli_posix_acl)
 			posix_acl_release(lli->lli_posix_acl);
@@ -1717,65 +1717,65 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 		spin_unlock(&lli->lli_lock);
 	}
 #endif
-	inode->i_ino = cl_fid_build_ino(&body->fid1,
+	inode->i_ino = cl_fid_build_ino(&body->mbo_fid1,
 					sbi->ll_flags & LL_SBI_32BIT_API);
-	inode->i_generation = cl_fid_build_gen(&body->fid1);
+	inode->i_generation = cl_fid_build_gen(&body->mbo_fid1);
 
-	if (body->valid & OBD_MD_FLATIME) {
-		if (body->atime > LTIME_S(inode->i_atime))
-			LTIME_S(inode->i_atime) = body->atime;
-		lli->lli_atime = body->atime;
+	if (body->mbo_valid & OBD_MD_FLATIME) {
+		if (body->mbo_atime > LTIME_S(inode->i_atime))
+			LTIME_S(inode->i_atime) = body->mbo_atime;
+		lli->lli_atime = body->mbo_atime;
 	}
-	if (body->valid & OBD_MD_FLMTIME) {
-		if (body->mtime > LTIME_S(inode->i_mtime)) {
+	if (body->mbo_valid & OBD_MD_FLMTIME) {
+		if (body->mbo_mtime > LTIME_S(inode->i_mtime)) {
 			CDEBUG(D_INODE, "setting ino %lu mtime from %lu to %llu\n",
 			       inode->i_ino, LTIME_S(inode->i_mtime),
-			       body->mtime);
-			LTIME_S(inode->i_mtime) = body->mtime;
+			       body->mbo_mtime);
+			LTIME_S(inode->i_mtime) = body->mbo_mtime;
 		}
-		lli->lli_mtime = body->mtime;
+		lli->lli_mtime = body->mbo_mtime;
 	}
-	if (body->valid & OBD_MD_FLCTIME) {
-		if (body->ctime > LTIME_S(inode->i_ctime))
-			LTIME_S(inode->i_ctime) = body->ctime;
-		lli->lli_ctime = body->ctime;
+	if (body->mbo_valid & OBD_MD_FLCTIME) {
+		if (body->mbo_ctime > LTIME_S(inode->i_ctime))
+			LTIME_S(inode->i_ctime) = body->mbo_ctime;
+		lli->lli_ctime = body->mbo_ctime;
 	}
-	if (body->valid & OBD_MD_FLMODE)
-		inode->i_mode = (inode->i_mode & S_IFMT)|(body->mode & ~S_IFMT);
-	if (body->valid & OBD_MD_FLTYPE)
-		inode->i_mode = (inode->i_mode & ~S_IFMT)|(body->mode & S_IFMT);
+	if (body->mbo_valid & OBD_MD_FLMODE)
+		inode->i_mode = (inode->i_mode & S_IFMT)|(body->mbo_mode & ~S_IFMT);
+	if (body->mbo_valid & OBD_MD_FLTYPE)
+		inode->i_mode = (inode->i_mode & ~S_IFMT)|(body->mbo_mode & S_IFMT);
 	LASSERT(inode->i_mode != 0);
 	if (S_ISREG(inode->i_mode))
 		inode->i_blkbits = min(PTLRPC_MAX_BRW_BITS + 1,
 				       LL_MAX_BLKSIZE_BITS);
 	else
 		inode->i_blkbits = inode->i_sb->s_blocksize_bits;
-	if (body->valid & OBD_MD_FLUID)
-		inode->i_uid = make_kuid(&init_user_ns, body->uid);
-	if (body->valid & OBD_MD_FLGID)
-		inode->i_gid = make_kgid(&init_user_ns, body->gid);
-	if (body->valid & OBD_MD_FLFLAGS)
-		inode->i_flags = ll_ext_to_inode_flags(body->flags);
-	if (body->valid & OBD_MD_FLNLINK)
-		set_nlink(inode, body->nlink);
-	if (body->valid & OBD_MD_FLRDEV)
-		inode->i_rdev = old_decode_dev(body->rdev);
-
-	if (body->valid & OBD_MD_FLID) {
+	if (body->mbo_valid & OBD_MD_FLUID)
+		inode->i_uid = make_kuid(&init_user_ns, body->mbo_uid);
+	if (body->mbo_valid & OBD_MD_FLGID)
+		inode->i_gid = make_kgid(&init_user_ns, body->mbo_gid);
+	if (body->mbo_valid & OBD_MD_FLFLAGS)
+		inode->i_flags = ll_ext_to_inode_flags(body->mbo_flags);
+	if (body->mbo_valid & OBD_MD_FLNLINK)
+		set_nlink(inode, body->mbo_nlink);
+	if (body->mbo_valid & OBD_MD_FLRDEV)
+		inode->i_rdev = old_decode_dev(body->mbo_rdev);
+
+	if (body->mbo_valid & OBD_MD_FLID) {
 		/* FID shouldn't be changed! */
 		if (fid_is_sane(&lli->lli_fid)) {
-			LASSERTF(lu_fid_eq(&lli->lli_fid, &body->fid1),
+			LASSERTF(lu_fid_eq(&lli->lli_fid, &body->mbo_fid1),
 				 "Trying to change FID "DFID" to the "DFID", inode "DFID"(%p)\n",
-				 PFID(&lli->lli_fid), PFID(&body->fid1),
+				 PFID(&lli->lli_fid), PFID(&body->mbo_fid1),
 				 PFID(ll_inode2fid(inode)), inode);
 		} else {
-			lli->lli_fid = body->fid1;
+			lli->lli_fid = body->mbo_fid1;
 		}
 	}
 
 	LASSERT(fid_seq(&lli->lli_fid) != 0);
 
-	if (body->valid & OBD_MD_FLSIZE) {
+	if (body->mbo_valid & OBD_MD_FLSIZE) {
 		if (exp_connect_som(ll_i2mdexp(inode)) &&
 		    S_ISREG(inode->i_mode)) {
 			struct lustre_handle lockh;
@@ -1802,7 +1802,7 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 					/* Use old size assignment to avoid
 					 * deadlock bz14138 & bz14326
 					 */
-					i_size_write(inode, body->size);
+					i_size_write(inode, body->mbo_size);
 					spin_lock(&lli->lli_lock);
 					lli->lli_flags |= LLIF_MDS_SIZE_LOCK;
 					spin_unlock(&lli->lli_lock);
@@ -1813,18 +1813,18 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 			/* Use old size assignment to avoid
 			 * deadlock bz14138 & bz14326
 			 */
-			i_size_write(inode, body->size);
+			i_size_write(inode, body->mbo_size);
 
 			CDEBUG(D_VFSTRACE, "inode=%lu, updating i_size %llu\n",
-			       inode->i_ino, (unsigned long long)body->size);
+			       inode->i_ino, (unsigned long long)body->mbo_size);
 		}
 
-		if (body->valid & OBD_MD_FLBLOCKS)
-			inode->i_blocks = body->blocks;
+		if (body->mbo_valid & OBD_MD_FLBLOCKS)
+			inode->i_blocks = body->mbo_blocks;
 	}
 
-	if (body->valid & OBD_MD_TSTATE) {
-		if (body->t_state & MS_RESTORE)
+	if (body->mbo_valid & OBD_MD_TSTATE) {
+		if (body->mbo_t_state & MS_RESTORE)
 			lli->lli_flags |= LLIF_FILE_RESTORING;
 	}
 
@@ -1936,7 +1936,7 @@ int ll_iocontrol(struct inode *inode, struct file *file,
 
 		body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
 
-		flags = body->flags;
+		flags = body->mbo_flags;
 
 		ptlrpc_req_finished(req);
 
@@ -2118,9 +2118,9 @@ void ll_open_cleanup(struct super_block *sb, struct ptlrpc_request *open_req)
 	if (!op_data)
 		return;
 
-	op_data->op_fid1 = body->fid1;
-	op_data->op_ioepoch = body->ioepoch;
-	op_data->op_handle = body->handle;
+	op_data->op_fid1 = body->mbo_fid1;
+	op_data->op_ioepoch = body->mbo_ioepoch;
+	op_data->op_handle = body->mbo_handle;
 	op_data->op_mod_time = get_seconds();
 	md_close(exp, op_data, NULL, &close_req);
 	ptlrpc_req_finished(close_req);
@@ -2152,15 +2152,15 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req,
 		 * At this point server returns to client's same fid as client
 		 * generated for creating. So using ->fid1 is okay here.
 		 */
-		if (!fid_is_sane(&md.body->fid1)) {
+		if (!fid_is_sane(&md.body->mbo_fid1)) {
 			CERROR("%s: Fid is insane " DFID "\n",
 			       ll_get_fsname(sb, NULL, 0),
-			       PFID(&md.body->fid1));
+			       PFID(&md.body->mbo_fid1));
 			rc = -EINVAL;
 			goto out;
 		}
 
-		*inode = ll_iget(sb, cl_fid_build_ino(&md.body->fid1,
+		*inode = ll_iget(sb, cl_fid_build_ino(&md.body->mbo_fid1,
 					     sbi->ll_flags & LL_SBI_32BIT_API),
 				 &md);
 		if (IS_ERR(*inode)) {
diff --git a/drivers/staging/lustre/lustre/llite/llite_nfs.c b/drivers/staging/lustre/lustre/llite/llite_nfs.c
index 06a8199..ac96d89 100644
--- a/drivers/staging/lustre/lustre/llite/llite_nfs.c
+++ b/drivers/staging/lustre/lustre/llite/llite_nfs.c
@@ -343,10 +343,10 @@ int ll_dir_get_parent_fid(struct inode *dir, struct lu_fid *parent_fid)
 	 * LU-3952: MDT may lost the FID of its parent, we should not crash
 	 * the NFS server, ll_iget_for_nfs() will handle the error.
 	 */
-	if (body->valid & OBD_MD_FLID) {
+	if (body->mbo_valid & OBD_MD_FLID) {
 		CDEBUG(D_INFO, "parent for " DFID " is " DFID "\n",
-		       PFID(ll_inode2fid(dir)), PFID(&body->fid1));
-		*parent_fid = body->fid1;
+		       PFID(ll_inode2fid(dir)), PFID(&body->mbo_fid1));
+		*parent_fid = body->mbo_fid1;
 	}
 
 	ptlrpc_req_finished(req);
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index 581b083..ac0f442 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -56,12 +56,12 @@ static int ll_test_inode(struct inode *inode, void *opaque)
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct lustre_md     *md = opaque;
 
-	if (unlikely(!(md->body->valid & OBD_MD_FLID))) {
+	if (unlikely(!(md->body->mbo_valid & OBD_MD_FLID))) {
 		CERROR("MDS body missing FID\n");
 		return 0;
 	}
 
-	if (!lu_fid_eq(&lli->lli_fid, &md->body->fid1))
+	if (!lu_fid_eq(&lli->lli_fid, &md->body->mbo_fid1))
 		return 0;
 
 	return 1;
@@ -72,20 +72,20 @@ static int ll_set_inode(struct inode *inode, void *opaque)
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct mdt_body *body = ((struct lustre_md *)opaque)->body;
 
-	if (unlikely(!(body->valid & OBD_MD_FLID))) {
+	if (unlikely(!(body->mbo_valid & OBD_MD_FLID))) {
 		CERROR("MDS body missing FID\n");
 		return -EINVAL;
 	}
 
-	lli->lli_fid = body->fid1;
-	if (unlikely(!(body->valid & OBD_MD_FLTYPE))) {
+	lli->lli_fid = body->mbo_fid1;
+	if (unlikely(!(body->mbo_valid & OBD_MD_FLTYPE))) {
 		CERROR("Can not initialize inode " DFID
 		       " without object type: valid = %#llx\n",
-		       PFID(&lli->lli_fid), body->valid);
+		       PFID(&lli->lli_fid), body->mbo_valid);
 		return -EINVAL;
 	}
 
-	inode->i_mode = (inode->i_mode & ~S_IFMT) | (body->mode & S_IFMT);
+	inode->i_mode = (inode->i_mode & ~S_IFMT) | (body->mbo_mode & S_IFMT);
 	if (unlikely(inode->i_mode == 0)) {
 		CERROR("Invalid inode "DFID" type\n", PFID(&lli->lli_fid));
 		return -EINVAL;
@@ -131,7 +131,7 @@ struct inode *ll_iget(struct super_block *sb, ino_t hash,
 	} else if (!(inode->i_state & (I_FREEING | I_CLEAR))) {
 		rc = ll_update_inode(inode, md);
 		CDEBUG(D_VFSTRACE, "got inode: "DFID"(%p): rc = %d\n",
-		       PFID(&md->body->fid1), inode, rc);
+		       PFID(&md->body->mbo_fid1), inode, rc);
 		if (rc) {
 			make_bad_inode(inode);
 			iput(inode);
@@ -774,16 +774,16 @@ void ll_update_times(struct ptlrpc_request *request, struct inode *inode)
 						       &RMF_MDT_BODY);
 
 	LASSERT(body);
-	if (body->valid & OBD_MD_FLMTIME &&
-	    body->mtime > LTIME_S(inode->i_mtime)) {
+	if (body->mbo_valid & OBD_MD_FLMTIME &&
+	    body->mbo_mtime > LTIME_S(inode->i_mtime)) {
 		CDEBUG(D_INODE, "setting fid "DFID" mtime from %lu to %llu\n",
 		       PFID(ll_inode2fid(inode)), LTIME_S(inode->i_mtime),
-		       body->mtime);
-		LTIME_S(inode->i_mtime) = body->mtime;
+		       body->mbo_mtime);
+		LTIME_S(inode->i_mtime) = body->mbo_mtime;
 	}
-	if (body->valid & OBD_MD_FLCTIME &&
-	    body->ctime > LTIME_S(inode->i_ctime))
-		LTIME_S(inode->i_ctime) = body->ctime;
+	if (body->mbo_valid & OBD_MD_FLCTIME &&
+	    body->mbo_ctime > LTIME_S(inode->i_ctime))
+		LTIME_S(inode->i_ctime) = body->mbo_ctime;
 }
 
 static int ll_new_node(struct inode *dir, struct dentry *dentry,
@@ -899,10 +899,10 @@ int ll_objects_destroy(struct ptlrpc_request *request, struct inode *dir)
 
 	/* req is swabbed so this is safe */
 	body = req_capsule_server_get(&request->rq_pill, &RMF_MDT_BODY);
-	if (!(body->valid & OBD_MD_FLEASIZE))
+	if (!(body->mbo_valid & OBD_MD_FLEASIZE))
 		return 0;
 
-	if (body->eadatasize == 0) {
+	if (body->mbo_eadatasize == 0) {
 		CERROR("OBD_MD_FLEASIZE set but eadatasize zero\n");
 		rc = -EPROTO;
 		goto out;
@@ -914,10 +914,10 @@ int ll_objects_destroy(struct ptlrpc_request *request, struct inode *dir)
 	 * check it is complete and sensible.
 	 */
 	eadata = req_capsule_server_sized_get(&request->rq_pill, &RMF_MDT_MD,
-					      body->eadatasize);
+					      body->mbo_eadatasize);
 	LASSERT(eadata);
 
-	rc = obd_unpackmd(ll_i2dtexp(dir), &lsm, eadata, body->eadatasize);
+	rc = obd_unpackmd(ll_i2dtexp(dir), &lsm, eadata, body->mbo_eadatasize);
 	if (rc < 0) {
 		CERROR("obd_unpackmd: %d\n", rc);
 		goto out;
@@ -931,10 +931,10 @@ int ll_objects_destroy(struct ptlrpc_request *request, struct inode *dir)
 	}
 
 	oa->o_oi = lsm->lsm_oi;
-	oa->o_mode = body->mode & S_IFMT;
+	oa->o_mode = body->mbo_mode & S_IFMT;
 	oa->o_valid = OBD_MD_FLID | OBD_MD_FLTYPE | OBD_MD_FLGROUP;
 
-	if (body->valid & OBD_MD_FLCOOKIE) {
+	if (body->mbo_valid & OBD_MD_FLCOOKIE) {
 		oa->o_valid |= OBD_MD_FLCOOKIE;
 		oti.oti_logcookies =
 			req_capsule_server_sized_get(&request->rq_pill,
@@ -943,7 +943,7 @@ int ll_objects_destroy(struct ptlrpc_request *request, struct inode *dir)
 						     lsm->lsm_stripe_count);
 		if (!oti.oti_logcookies) {
 			oa->o_valid &= ~OBD_MD_FLCOOKIE;
-			body->valid &= ~OBD_MD_FLCOOKIE;
+			body->mbo_valid &= ~OBD_MD_FLCOOKIE;
 		}
 	}
 
diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
index e8c1959..46b8faf 100644
--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -632,7 +632,7 @@ static void ll_post_statahead(struct ll_statahead_info *sai)
 		/* XXX: No fid in reply, this is probably cross-ref case.
 		 * SA can't handle it yet.
 		 */
-		if (body->valid & OBD_MD_MDS) {
+		if (body->mbo_valid & OBD_MD_MDS) {
 			rc = -EAGAIN;
 			goto out;
 		}
@@ -641,7 +641,7 @@ static void ll_post_statahead(struct ll_statahead_info *sai)
 		 * revalidate.
 		 */
 		/* unlinked and re-created with the same name */
-		if (unlikely(!lu_fid_eq(&minfo->mi_data.op_fid2, &body->fid1))) {
+		if (unlikely(!lu_fid_eq(&minfo->mi_data.op_fid2, &body->mbo_fid1))) {
 			entry->se_inode = NULL;
 			iput(child);
 			child = NULL;
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index 4601be9..47fb799 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -80,17 +80,17 @@ static int ll_readlink_internal(struct inode *inode,
 	}
 
 	body = req_capsule_server_get(&(*request)->rq_pill, &RMF_MDT_BODY);
-	if ((body->valid & OBD_MD_LINKNAME) == 0) {
+	if ((body->mbo_valid & OBD_MD_LINKNAME) == 0) {
 		CERROR("OBD_MD_LINKNAME not set on reply\n");
 		rc = -EPROTO;
 		goto failed;
 	}
 
 	LASSERT(symlen != 0);
-	if (body->eadatasize != symlen) {
+	if (body->mbo_eadatasize != symlen) {
 		CERROR("%s: inode "DFID": symlink length %d not expected %d\n",
 		       ll_get_fsname(inode->i_sb, NULL, 0),
-		       PFID(ll_inode2fid(inode)), body->eadatasize - 1,
+		       PFID(ll_inode2fid(inode)), body->mbo_eadatasize - 1,
 		       symlen - 1);
 		rc = -EPROTO;
 		goto failed;
diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index 146da6b..f252c26 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -263,32 +263,32 @@ getxattr_nocache:
 
 		/* only detect the xattr size */
 		if (size == 0) {
-			rc = body->eadatasize;
+			rc = body->mbo_eadatasize;
 			goto out;
 		}
 
-		if (size < body->eadatasize) {
+		if (size < body->mbo_eadatasize) {
 			CERROR("server bug: replied size %u > %u\n",
-			       body->eadatasize, (int)size);
+			       body->mbo_eadatasize, (int)size);
 			rc = -ERANGE;
 			goto out;
 		}
 
-		if (body->eadatasize == 0) {
+		if (body->mbo_eadatasize == 0) {
 			rc = -ENODATA;
 			goto out;
 		}
 
 		/* do not need swab xattr data */
 		xdata = req_capsule_server_sized_get(&req->rq_pill, &RMF_EADATA,
-						     body->eadatasize);
+						     body->mbo_eadatasize);
 		if (!xdata) {
 			rc = -EFAULT;
 			goto out;
 		}
 
-		memcpy(buffer, xdata, body->eadatasize);
-		rc = body->eadatasize;
+		memcpy(buffer, xdata, body->mbo_eadatasize);
+		rc = body->mbo_eadatasize;
 	}
 
 out_xattr:
diff --git a/drivers/staging/lustre/lustre/llite/xattr_cache.c b/drivers/staging/lustre/lustre/llite/xattr_cache.c
index 8089da8..b66542c 100644
--- a/drivers/staging/lustre/lustre/llite/xattr_cache.c
+++ b/drivers/staging/lustre/lustre/llite/xattr_cache.c
@@ -380,25 +380,25 @@ static int ll_xattr_cache_refill(struct inode *inode, struct lookup_intent *oit)
 	}
 	/* do not need swab xattr data */
 	xdata = req_capsule_server_sized_get(&req->rq_pill, &RMF_EADATA,
-					     body->eadatasize);
+					     body->mbo_eadatasize);
 	xval = req_capsule_server_sized_get(&req->rq_pill, &RMF_EAVALS,
-					    body->aclsize);
+					    body->mbo_aclsize);
 	xsizes = req_capsule_server_sized_get(&req->rq_pill, &RMF_EAVALS_LENS,
-					      body->max_mdsize * sizeof(__u32));
+					      body->mbo_max_mdsize * sizeof(__u32));
 	if (!xdata || !xval || !xsizes) {
 		CERROR("wrong setxattr reply\n");
 		rc = -EPROTO;
 		goto out_destroy;
 	}
 
-	xtail = xdata + body->eadatasize;
-	xvtail = xval + body->aclsize;
+	xtail = xdata + body->mbo_eadatasize;
+	xvtail = xval + body->mbo_aclsize;
 
 	CDEBUG(D_CACHE, "caching: xdata=%p xtail=%p\n", xdata, xtail);
 
 	ll_xattr_cache_init(lli);
 
-	for (i = 0; i < body->max_mdsize; i++) {
+	for (i = 0; i < body->mbo_max_mdsize; i++) {
 		CDEBUG(D_CACHE, "caching [%s]=%.*s\n", xdata, *xsizes, xval);
 		/* Perform consistency checks: attr names and vals in pill */
 		if (!memchr(xdata, 0, xtail - xdata)) {
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index 7f81e78..761ab24 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -69,7 +69,7 @@ static int lmv_intent_remote(struct obd_export *exp, void *lmm,
 	if (!body)
 		return -EPROTO;
 
-	LASSERT((body->valid & OBD_MD_MDS));
+	LASSERT((body->mbo_valid & OBD_MD_MDS));
 
 	/*
 	 * Unfortunately, we have to lie to MDC/MDS to retrieve
@@ -88,9 +88,9 @@ static int lmv_intent_remote(struct obd_export *exp, void *lmm,
 		it->it_request = NULL;
 	}
 
-	LASSERT(fid_is_sane(&body->fid1));
+	LASSERT(fid_is_sane(&body->mbo_fid1));
 
-	tgt = lmv_find_target(lmv, &body->fid1);
+	tgt = lmv_find_target(lmv, &body->mbo_fid1);
 	if (IS_ERR(tgt)) {
 		rc = PTR_ERR(tgt);
 		goto out;
@@ -102,7 +102,7 @@ static int lmv_intent_remote(struct obd_export *exp, void *lmm,
 		goto out;
 	}
 
-	op_data->op_fid1 = body->fid1;
+	op_data->op_fid1 = body->mbo_fid1;
 	/* Sent the parent FID to the remote MDT */
 	if (parent_fid) {
 		/* The parent fid is only for remote open to
@@ -114,12 +114,12 @@ static int lmv_intent_remote(struct obd_export *exp, void *lmm,
 		/* Add object FID to op_fid3, in case it needs to check stale
 		 * (M_CHECK_STALE), see mdc_finish_intent_lock
 		 */
-		op_data->op_fid3 = body->fid1;
+		op_data->op_fid3 = body->mbo_fid1;
 	}
 
 	op_data->op_bias = MDS_CROSS_REF;
 	CDEBUG(D_INODE, "REMOTE_INTENT with fid="DFID" -> mds #%d\n",
-	       PFID(&body->fid1), tgt->ltd_idx);
+	       PFID(&body->mbo_fid1), tgt->ltd_idx);
 
 	rc = md_intent_lock(tgt->ltd_exp, op_data, lmm, lmmsize, it,
 			    flags, &req, cb_blocking, extra_lock_flags);
@@ -227,9 +227,9 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
 						      &RMF_MDT_BODY);
 			LASSERT(body);
 
-			if (unlikely(body->nlink < 2)) {
+			if (unlikely(body->mbo_nlink < 2)) {
 				CERROR("%s: nlink %d < 2 corrupt stripe %d "DFID":" DFID"\n",
-				       obd->obd_name, body->nlink, i,
+				       obd->obd_name, body->mbo_nlink, i,
 				       PFID(&lsm->lsm_md_oinfo[i].lmo_fid),
 				       PFID(&lsm->lsm_md_oinfo[0].lmo_fid));
 
@@ -245,11 +245,11 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
 				goto cleanup;
 			}
 
-			i_size_write(inode, body->size);
-			set_nlink(inode, body->nlink);
-			LTIME_S(inode->i_atime) = body->atime;
-			LTIME_S(inode->i_ctime) = body->ctime;
-			LTIME_S(inode->i_mtime) = body->mtime;
+			i_size_write(inode, body->mbo_size);
+			set_nlink(inode, body->mbo_nlink);
+			LTIME_S(inode->i_atime) = body->mbo_atime;
+			LTIME_S(inode->i_ctime) = body->mbo_ctime;
+			LTIME_S(inode->i_mtime) = body->mbo_mtime;
 
 			if (req)
 				ptlrpc_req_finished(req);
@@ -288,9 +288,9 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
 	       PFID(&lsm->lsm_md_oinfo[0].lmo_fid));
 
 	if (mbody) {
-		mbody->atime = atime;
-		mbody->ctime = ctime;
-		mbody->mtime = mtime;
+		mbody->mbo_atime = atime;
+		mbody->mbo_ctime = ctime;
+		mbody->mbo_mtime = mtime;
 	}
 cleanup:
 	kfree(op_data);
@@ -360,7 +360,7 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 	if (rc != 0)
 		return rc;
 	/*
-	 * Nothing is found, do not access body->fid1 as it is zero and thus
+	 * Nothing is found, do not access body->mbo_fid1 as it is zero and thus
 	 * pointless.
 	 */
 	if ((it->it_disposition & DISP_LOOKUP_NEG) &&
@@ -373,7 +373,7 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 		return -EPROTO;
 
 	/* Not cross-ref case, just get out of here. */
-	if (unlikely((body->valid & OBD_MD_MDS))) {
+	if (unlikely((body->mbo_valid & OBD_MD_MDS))) {
 		rc = lmv_intent_remote(exp, lmm, lmmsize, it, &op_data->op_fid1,
 				       flags, reqp, cb_blocking,
 				       extra_lock_flags);
@@ -470,7 +470,7 @@ static int lmv_intent_lookup(struct obd_export *exp,
 		return -EPROTO;
 
 	/* Not cross-ref case, just get out of here. */
-	if (unlikely((body->valid & OBD_MD_MDS))) {
+	if (unlikely((body->mbo_valid & OBD_MD_MDS))) {
 		rc = lmv_intent_remote(exp, lmm, lmmsize, it, NULL, flags,
 				       reqp, cb_blocking, extra_lock_flags);
 		if (rc != 0)
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 6917a03..27a6be1 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -1813,11 +1813,11 @@ lmv_enqueue_remote(struct obd_export *exp, struct ldlm_enqueue_info *einfo,
 
 	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
 
-	if (!(body->valid & OBD_MD_MDS))
+	if (!(body->mbo_valid & OBD_MD_MDS))
 		return 0;
 
 	CDEBUG(D_INODE, "REMOTE_ENQUEUE '%s' on "DFID" -> "DFID"\n",
-	       LL_IT2STR(it), PFID(&op_data->op_fid1), PFID(&body->fid1));
+	       LL_IT2STR(it), PFID(&op_data->op_fid1), PFID(&body->mbo_fid1));
 
 	/*
 	 * We got LOOKUP lock, but we really need attrs.
@@ -1827,7 +1827,7 @@ lmv_enqueue_remote(struct obd_export *exp, struct ldlm_enqueue_info *einfo,
 	memcpy(&plock, lockh, sizeof(plock));
 	it->it_lock_mode = 0;
 	it->it_request = NULL;
-	fid1 = body->fid1;
+	fid1 = body->mbo_fid1;
 
 	ptlrpc_req_finished(req);
 
@@ -1917,8 +1917,8 @@ lmv_getattr_name(struct obd_export *exp, struct md_op_data *op_data,
 		return rc;
 
 	body = req_capsule_server_get(&(*preq)->rq_pill, &RMF_MDT_BODY);
-	if (body->valid & OBD_MD_MDS) {
-		struct lu_fid rid = body->fid1;
+	if (body->mbo_valid & OBD_MD_MDS) {
+		struct lu_fid rid = body->mbo_fid1;
 
 		CDEBUG(D_INODE, "Request attrs for "DFID"\n",
 		       PFID(&rid));
@@ -2433,11 +2433,11 @@ retry:
 		return -EPROTO;
 
 	/* Not cross-ref case, just get out of here. */
-	if (likely(!(body->valid & OBD_MD_MDS)))
+	if (likely(!(body->mbo_valid & OBD_MD_MDS)))
 		return 0;
 
 	CDEBUG(D_INODE, "%s: try unlink to another MDT for "DFID"\n",
-	       exp->exp_obd->obd_name, PFID(&body->fid1));
+	       exp->exp_obd->obd_name, PFID(&body->mbo_fid1));
 
 	/* This is a remote object, try remote MDT, Note: it may
 	 * try more than 1 time here, Considering following case
@@ -2459,7 +2459,7 @@ retry:
 	 * In theory, it might try unlimited time here, but it should
 	 * be very rare case.
 	 */
-	op_data->op_fid2 = body->fid1;
+	op_data->op_fid2 = body->mbo_fid1;
 	ptlrpc_req_finished(*request);
 	*request = NULL;
 
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_lib.c b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
index 16c3571..813f923 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_lib.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
@@ -37,12 +37,12 @@
 
 static void __mdc_pack_body(struct mdt_body *b, __u32 suppgid)
 {
-	b->suppgid = suppgid;
-	b->uid = from_kuid(&init_user_ns, current_uid());
-	b->gid = from_kgid(&init_user_ns, current_gid());
-	b->fsuid = from_kuid(&init_user_ns, current_fsuid());
-	b->fsgid = from_kgid(&init_user_ns, current_fsgid());
-	b->capability = cfs_curproc_cap_pack();
+	b->mbo_suppgid = suppgid;
+	b->mbo_uid = from_kuid(&init_user_ns, current_uid());
+	b->mbo_gid = from_kgid(&init_user_ns, current_gid());
+	b->mbo_fsuid = from_kuid(&init_user_ns, current_fsuid());
+	b->mbo_fsgid = from_kgid(&init_user_ns, current_fsgid());
+	b->mbo_capability = cfs_curproc_cap_pack();
 }
 
 void mdc_is_subdir_pack(struct ptlrpc_request *req, const struct lu_fid *pfid,
@@ -52,12 +52,12 @@ void mdc_is_subdir_pack(struct ptlrpc_request *req, const struct lu_fid *pfid,
 						    &RMF_MDT_BODY);
 
 	if (pfid) {
-		b->fid1 = *pfid;
-		b->valid = OBD_MD_FLID;
+		b->mbo_fid1 = *pfid;
+		b->mbo_valid = OBD_MD_FLID;
 	}
 	if (cfid)
-		b->fid2 = *cfid;
-	b->flags = flags;
+		b->mbo_fid2 = *cfid;
+	b->mbo_flags = flags;
 }
 
 void mdc_swap_layouts_pack(struct ptlrpc_request *req,
@@ -67,9 +67,9 @@ void mdc_swap_layouts_pack(struct ptlrpc_request *req,
 						    &RMF_MDT_BODY);
 
 	__mdc_pack_body(b, op_data->op_suppgids[0]);
-	b->fid1 = op_data->op_fid1;
-	b->fid2 = op_data->op_fid2;
-	b->valid |= OBD_MD_FLID;
+	b->mbo_fid1 = op_data->op_fid1;
+	b->mbo_fid2 = op_data->op_fid2;
+	b->mbo_valid |= OBD_MD_FLID;
 }
 
 void mdc_pack_body(struct ptlrpc_request *req, const struct lu_fid *fid,
@@ -77,13 +77,13 @@ void mdc_pack_body(struct ptlrpc_request *req, const struct lu_fid *fid,
 {
 	struct mdt_body *b = req_capsule_client_get(&req->rq_pill,
 						    &RMF_MDT_BODY);
-	b->valid = valid;
-	b->eadatasize = ea_size;
-	b->flags = flags;
+	b->mbo_valid = valid;
+	b->mbo_eadatasize = ea_size;
+	b->mbo_flags = flags;
 	__mdc_pack_body(b, suppgid);
 	if (fid) {
-		b->fid1 = *fid;
-		b->valid |= OBD_MD_FLID;
+		b->mbo_fid1 = *fid;
+		b->mbo_valid |= OBD_MD_FLID;
 	}
 }
 
@@ -123,12 +123,12 @@ void mdc_readdir_pack(struct ptlrpc_request *req, __u64 pgoff,
 {
 	struct mdt_body *b = req_capsule_client_get(&req->rq_pill,
 						    &RMF_MDT_BODY);
-	b->fid1 = *fid;
-	b->valid |= OBD_MD_FLID;
-	b->size = pgoff;		       /* !! */
-	b->nlink = size;			/* !! */
+	b->mbo_fid1 = *fid;
+	b->mbo_valid |= OBD_MD_FLID;
+	b->mbo_size = pgoff;		       /* !! */
+	b->mbo_nlink = size;			/* !! */
 	__mdc_pack_body(b, -1);
-	b->mode = LUDA_FID | LUDA_TYPE;
+	b->mbo_mode = LUDA_FID | LUDA_TYPE;
 }
 
 /* packing of MDS records */
@@ -440,18 +440,18 @@ void mdc_getattr_pack(struct ptlrpc_request *req, __u64 valid, int flags,
 	struct mdt_body *b = req_capsule_client_get(&req->rq_pill,
 						    &RMF_MDT_BODY);
 
-	b->valid = valid;
+	b->mbo_valid = valid;
 	if (op_data->op_bias & MDS_CHECK_SPLIT)
-		b->valid |= OBD_MD_FLCKSPLIT;
+		b->mbo_valid |= OBD_MD_FLCKSPLIT;
 	if (op_data->op_bias & MDS_CROSS_REF)
-		b->valid |= OBD_MD_FLCROSSREF;
-	b->eadatasize = ea_size;
-	b->flags = flags;
+		b->mbo_valid |= OBD_MD_FLCROSSREF;
+	b->mbo_eadatasize = ea_size;
+	b->mbo_flags = flags;
 	__mdc_pack_body(b, op_data->op_suppgids[0]);
 
-	b->fid1 = op_data->op_fid1;
-	b->fid2 = op_data->op_fid2;
-	b->valid |= OBD_MD_FLID;
+	b->mbo_fid1 = op_data->op_fid1;
+	b->mbo_fid2 = op_data->op_fid2;
+	b->mbo_valid |= OBD_MD_FLID;
 
 	if (op_data->op_name)
 		mdc_pack_name(req, &RMF_NAME, op_data->op_name,
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 20b15f6..551f3d9 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -240,12 +240,12 @@ static void mdc_realloc_openmsg(struct ptlrpc_request *req,
 
 	/* FIXME: remove this explicit offset. */
 	rc = sptlrpc_cli_enlarge_reqbuf(req, DLM_INTENT_REC_OFF + 4,
-					body->eadatasize);
+					body->mbo_eadatasize);
 	if (rc) {
 		CERROR("Can't enlarge segment %d size to %d\n",
-		       DLM_INTENT_REC_OFF + 4, body->eadatasize);
-		body->valid &= ~OBD_MD_FLEASIZE;
-		body->eadatasize = 0;
+		       DLM_INTENT_REC_OFF + 4, body->mbo_eadatasize);
+		body->mbo_valid &= ~OBD_MD_FLEASIZE;
+		body->mbo_eadatasize = 0;
 	}
 }
 
@@ -608,7 +608,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 			mdc_set_open_replay_data(NULL, NULL, it);
 		}
 
-		if ((body->valid & (OBD_MD_FLDIREA | OBD_MD_FLEASIZE)) != 0) {
+		if ((body->mbo_valid & (OBD_MD_FLDIREA | OBD_MD_FLEASIZE)) != 0) {
 			void *eadata;
 
 			mdc_update_max_ea_from_body(exp, body);
@@ -618,7 +618,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 			 * Eventually, obd_unpackmd() will check the contents.
 			 */
 			eadata = req_capsule_server_sized_get(pill, &RMF_MDT_MD,
-							      body->eadatasize);
+							      body->mbo_eadatasize);
 			if (!eadata)
 				return -EPROTO;
 
@@ -626,7 +626,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 			 * lock
 			 */
 			lvb_data = eadata;
-			lvb_len = body->eadatasize;
+			lvb_len = body->mbo_eadatasize;
 
 			/*
 			 * We save the reply LOV EA in case we have to replay a
@@ -642,20 +642,20 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 
 				if (req_capsule_get_size(pill, &RMF_EADATA,
 							 RCL_CLIENT) <
-				    body->eadatasize)
+				    body->mbo_eadatasize)
 					mdc_realloc_openmsg(req, body);
 				else
 					req_capsule_shrink(pill, &RMF_EADATA,
-							   body->eadatasize,
+							   body->mbo_eadatasize,
 							   RCL_CLIENT);
 
 				req_capsule_set_size(pill, &RMF_EADATA,
 						     RCL_CLIENT,
-						     body->eadatasize);
+						     body->mbo_eadatasize);
 
 				lmm = req_capsule_client_get(pill, &RMF_EADATA);
 				if (lmm)
-					memcpy(lmm, eadata, body->eadatasize);
+					memcpy(lmm, eadata, body->mbo_eadatasize);
 			}
 		}
 	} else if (it->it_op & IT_LAYOUT) {
@@ -935,11 +935,11 @@ static int mdc_finish_intent_lock(struct obd_export *exp,
 		 * op_fid3 - existent fid - if file only open.
 		 * op_fid3 is saved in lmv_intent_open
 		 */
-		if ((!lu_fid_eq(&op_data->op_fid2, &mdt_body->fid1)) &&
-		    (!lu_fid_eq(&op_data->op_fid3, &mdt_body->fid1))) {
+		if ((!lu_fid_eq(&op_data->op_fid2, &mdt_body->mbo_fid1)) &&
+		    (!lu_fid_eq(&op_data->op_fid3, &mdt_body->mbo_fid1))) {
 			CDEBUG(D_DENTRY, "Found stale data "DFID"("DFID")/"DFID
 			       "\n", PFID(&op_data->op_fid2),
-			       PFID(&op_data->op_fid2), PFID(&mdt_body->fid1));
+			       PFID(&op_data->op_fid2), PFID(&mdt_body->mbo_fid1));
 			return -ESTALE;
 		}
 	}
@@ -986,10 +986,10 @@ static int mdc_finish_intent_lock(struct obd_export *exp,
 
 		LDLM_DEBUG(lock, "matching against this");
 
-		LASSERTF(fid_res_name_eq(&mdt_body->fid1,
+		LASSERTF(fid_res_name_eq(&mdt_body->mbo_fid1,
 					 &lock->l_resource->lr_name),
 			 "Lock res_id: "DLDLMRES", fid: "DFID"\n",
-			 PLDLMRES(lock->l_resource), PFID(&mdt_body->fid1));
+			 PLDLMRES(lock->l_resource), PFID(&mdt_body->mbo_fid1));
 		LDLM_LOCK_PUT(lock);
 
 		memcpy(&old_lock, lockh, sizeof(*lockh));
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_reint.c b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
index c3781a6..9bec049 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_reint.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
@@ -177,8 +177,8 @@ int mdc_setattr(struct obd_export *exp, struct md_op_data *op_data,
 
 		epoch = req_capsule_client_get(&req->rq_pill, &RMF_MDT_EPOCH);
 		body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
-		epoch->handle = body->handle;
-		epoch->ioepoch = body->ioepoch;
+		epoch->handle = body->mbo_handle;
+		epoch->ioepoch = body->mbo_ioepoch;
 		req->rq_replay_cb = mdc_replay_open;
 	/** bug 3633, open may be committed and estale answer is not error */
 	} else if (rc == -ESTALE && (op_data->op_flags & MF_SOM_CHANGE)) {
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index e26d0d7..74ddec3 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -100,7 +100,7 @@ static int mdc_getstatus(struct obd_export *exp, struct lu_fid *rootfid)
 		goto out;
 	}
 
-	*rootfid = body->fid1;
+	*rootfid = body->mbo_fid1;
 	CDEBUG(D_NET,
 	       "root fid="DFID", last_committed=%llu\n",
 	       PFID(rootfid),
@@ -138,12 +138,12 @@ static int mdc_getattr_common(struct obd_export *exp,
 	if (!body)
 		return -EPROTO;
 
-	CDEBUG(D_NET, "mode: %o\n", body->mode);
+	CDEBUG(D_NET, "mode: %o\n", body->mbo_mode);
 
 	mdc_update_max_ea_from_body(exp, body);
-	if (body->eadatasize != 0) {
+	if (body->mbo_eadatasize != 0) {
 		eadata = req_capsule_server_sized_get(pill, &RMF_MDT_MD,
-						      body->eadatasize);
+						      body->mbo_eadatasize);
 		if (!eadata)
 			return -EPROTO;
 	}
@@ -399,15 +399,15 @@ static int mdc_unpack_acl(struct ptlrpc_request *req, struct lustre_md *md)
 	void		   *buf;
 	int		     rc;
 
-	if (!body->aclsize)
+	if (!body->mbo_aclsize)
 		return 0;
 
-	buf = req_capsule_server_sized_get(pill, &RMF_ACL, body->aclsize);
+	buf = req_capsule_server_sized_get(pill, &RMF_ACL, body->mbo_aclsize);
 
 	if (!buf)
 		return -EPROTO;
 
-	acl = posix_acl_from_xattr(&init_user_ns, buf, body->aclsize);
+	acl = posix_acl_from_xattr(&init_user_ns, buf, body->mbo_aclsize);
 	if (!acl)
 		return 0;
 
@@ -445,24 +445,24 @@ static int mdc_get_lustre_md(struct obd_export *exp,
 
 	md->body = req_capsule_server_get(pill, &RMF_MDT_BODY);
 
-	if (md->body->valid & OBD_MD_FLEASIZE) {
+	if (md->body->mbo_valid & OBD_MD_FLEASIZE) {
 		int lmmsize;
 		struct lov_mds_md *lmm;
 
-		if (!S_ISREG(md->body->mode)) {
+		if (!S_ISREG(md->body->mbo_mode)) {
 			CDEBUG(D_INFO,
 			       "OBD_MD_FLEASIZE set, should be a regular file, but is not\n");
 			rc = -EPROTO;
 			goto out;
 		}
 
-		if (md->body->eadatasize == 0) {
+		if (md->body->mbo_eadatasize == 0) {
 			CDEBUG(D_INFO,
 			       "OBD_MD_FLEASIZE set, but eadatasize 0\n");
 			rc = -EPROTO;
 			goto out;
 		}
-		lmmsize = md->body->eadatasize;
+		lmmsize = md->body->mbo_eadatasize;
 		lmm = req_capsule_server_sized_get(pill, &RMF_MDT_MD, lmmsize);
 		if (!lmm) {
 			rc = -EPROTO;
@@ -481,24 +481,24 @@ static int mdc_get_lustre_md(struct obd_export *exp,
 			goto out;
 		}
 
-	} else if (md->body->valid & OBD_MD_FLDIREA) {
+	} else if (md->body->mbo_valid & OBD_MD_FLDIREA) {
 		int lmvsize;
 		struct lov_mds_md *lmv;
 
-		if (!S_ISDIR(md->body->mode)) {
+		if (!S_ISDIR(md->body->mbo_mode)) {
 			CDEBUG(D_INFO,
 			       "OBD_MD_FLDIREA set, should be a directory, but is not\n");
 			rc = -EPROTO;
 			goto out;
 		}
 
-		if (md->body->eadatasize == 0) {
+		if (md->body->mbo_eadatasize == 0) {
 			CDEBUG(D_INFO,
 			       "OBD_MD_FLDIREA is set, but eadatasize 0\n");
 			return -EPROTO;
 		}
-		if (md->body->valid & OBD_MD_MEA) {
-			lmvsize = md->body->eadatasize;
+		if (md->body->mbo_valid & OBD_MD_MEA) {
+			lmvsize = md->body->mbo_eadatasize;
 			lmv = req_capsule_server_sized_get(pill, &RMF_MDT_MD,
 							   lmvsize);
 			if (!lmv) {
@@ -522,12 +522,12 @@ static int mdc_get_lustre_md(struct obd_export *exp,
 	}
 	rc = 0;
 
-	if (md->body->valid & OBD_MD_FLACL) {
+	if (md->body->mbo_valid & OBD_MD_FLACL) {
 		/* for ACL, it's possible that FLACL is set but aclsize is zero.
 		 * only when aclsize != 0 there's an actual segment for ACL
 		 * in reply buffer.
 		 */
-		if (md->body->aclsize) {
+		if (md->body->mbo_aclsize) {
 			rc = mdc_unpack_acl(req, md);
 			if (rc)
 				goto out;
@@ -582,9 +582,9 @@ void mdc_replay_open(struct ptlrpc_request *req)
 
 		file_fh = &och->och_fh;
 		CDEBUG(D_HA, "updating handle from %#llx to %#llx\n",
-		       file_fh->cookie, body->handle.cookie);
+		       file_fh->cookie, body->mbo_handle.cookie);
 		old = *file_fh;
-		*file_fh = body->handle;
+		*file_fh = body->mbo_handle;
 	}
 	close_req = mod->mod_close_req;
 	if (close_req) {
@@ -599,7 +599,7 @@ void mdc_replay_open(struct ptlrpc_request *req)
 		if (och)
 			LASSERT(!memcmp(&old, &epoch->handle, sizeof(old)));
 		DEBUG_REQ(D_HA, close_req, "updating close body with new fh");
-		epoch->handle = body->handle;
+		epoch->handle = body->mbo_handle;
 	}
 }
 
@@ -681,11 +681,11 @@ int mdc_set_open_replay_data(struct obd_export *exp,
 		spin_unlock(&open_req->rq_lock);
 	}
 
-	rec->cr_fid2 = body->fid1;
-	rec->cr_ioepoch = body->ioepoch;
-	rec->cr_old_handle.cookie = body->handle.cookie;
+	rec->cr_fid2 = body->mbo_fid1;
+	rec->cr_ioepoch = body->mbo_ioepoch;
+	rec->cr_old_handle.cookie = body->mbo_handle.cookie;
 	open_req->rq_replay_cb = mdc_replay_open;
-	if (!fid_is_sane(&body->fid1)) {
+	if (!fid_is_sane(&body->mbo_fid1)) {
 		DEBUG_REQ(D_ERROR, open_req,
 			  "Saving replay request with insane fid");
 		LBUG();
@@ -746,7 +746,7 @@ static void mdc_close_handle_reply(struct ptlrpc_request *req,
 		epoch = req_capsule_client_get(&req->rq_pill, &RMF_MDT_EPOCH);
 
 		epoch->flags |= MF_SOM_AU;
-		if (repbody->valid & OBD_MD_FLGETATTRLOCK)
+		if (repbody->mbo_valid & OBD_MD_FLGETATTRLOCK)
 			op_data->op_flags |= MF_GETATTR_LOCK;
 	}
 }
diff --git a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
index 6ddc9c7..465698b 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
@@ -1674,35 +1674,35 @@ EXPORT_SYMBOL(lustre_swab_lquota_lvb);
 
 void lustre_swab_mdt_body(struct mdt_body *b)
 {
-	lustre_swab_lu_fid(&b->fid1);
-	lustre_swab_lu_fid(&b->fid2);
+	lustre_swab_lu_fid(&b->mbo_fid1);
+	lustre_swab_lu_fid(&b->mbo_fid2);
 	/* handle is opaque */
-	__swab64s(&b->valid);
-	__swab64s(&b->size);
-	__swab64s(&b->mtime);
-	__swab64s(&b->atime);
-	__swab64s(&b->ctime);
-	__swab64s(&b->blocks);
-	__swab64s(&b->ioepoch);
-	__swab64s(&b->t_state);
-	__swab32s(&b->fsuid);
-	__swab32s(&b->fsgid);
-	__swab32s(&b->capability);
-	__swab32s(&b->mode);
-	__swab32s(&b->uid);
-	__swab32s(&b->gid);
-	__swab32s(&b->flags);
-	__swab32s(&b->rdev);
-	__swab32s(&b->nlink);
-	CLASSERT(offsetof(typeof(*b), unused2) != 0);
-	__swab32s(&b->suppgid);
-	__swab32s(&b->eadatasize);
-	__swab32s(&b->aclsize);
-	__swab32s(&b->max_mdsize);
-	__swab32s(&b->max_cookiesize);
-	__swab32s(&b->uid_h);
-	__swab32s(&b->gid_h);
-	CLASSERT(offsetof(typeof(*b), padding_5) != 0);
+	__swab64s(&b->mbo_valid);
+	__swab64s(&b->mbo_size);
+	__swab64s(&b->mbo_mtime);
+	__swab64s(&b->mbo_atime);
+	__swab64s(&b->mbo_ctime);
+	__swab64s(&b->mbo_blocks);
+	__swab64s(&b->mbo_ioepoch);
+	__swab64s(&b->mbo_t_state);
+	__swab32s(&b->mbo_fsuid);
+	__swab32s(&b->mbo_fsgid);
+	__swab32s(&b->mbo_capability);
+	__swab32s(&b->mbo_mode);
+	__swab32s(&b->mbo_uid);
+	__swab32s(&b->mbo_gid);
+	__swab32s(&b->mbo_flags);
+	__swab32s(&b->mbo_rdev);
+	__swab32s(&b->mbo_nlink);
+	CLASSERT(offsetof(typeof(*b), mbo_unused2) != 0);
+	__swab32s(&b->mbo_suppgid);
+	__swab32s(&b->mbo_eadatasize);
+	__swab32s(&b->mbo_aclsize);
+	__swab32s(&b->mbo_max_mdsize);
+	__swab32s(&b->mbo_max_cookiesize);
+	__swab32s(&b->mbo_uid_h);
+	__swab32s(&b->mbo_gid_h);
+	CLASSERT(offsetof(typeof(*b), mbo_padding_5) != 0);
 }
 EXPORT_SYMBOL(lustre_swab_mdt_body);
 
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index 8dbaf32..60d03dd 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -1350,7 +1350,7 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct lov_mds_md_v1, lmm_objects[0]));
 	LASSERTF((int)sizeof(((struct lov_mds_md_v1 *)0)->lmm_objects[0]) == 24, "found %lld\n",
 		 (long long)(int)sizeof(((struct lov_mds_md_v1 *)0)->lmm_objects[0]));
-	CLASSERT(LOV_MAGIC_V1 == 0x0BD10BD0);
+	CLASSERT(LOV_MAGIC_V1 == (0x0BD10000 | 0x0BD0));
 
 	/* Checks for struct lov_mds_md_v3 */
 	LASSERTF((int)sizeof(struct lov_mds_md_v3) == 48, "found %lld\n",
@@ -1388,7 +1388,7 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct lov_mds_md_v3, lmm_objects[0]));
 	LASSERTF((int)sizeof(((struct lov_mds_md_v3 *)0)->lmm_objects[0]) == 24, "found %lld\n",
 		 (long long)(int)sizeof(((struct lov_mds_md_v3 *)0)->lmm_objects[0]));
-	CLASSERT(LOV_MAGIC_V3 == 0x0BD30BD0);
+	CLASSERT(LOV_MAGIC_V3 == (0x0BD30000 | 0x0BD0));
 	LASSERTF(LOV_PATTERN_RAID0 == 0x00000001UL, "found 0x%.8xUL\n",
 		(unsigned)LOV_PATTERN_RAID0);
 	LASSERTF(LOV_PATTERN_RAID1 == 0x00000002UL, "found 0x%.8xUL\n",
@@ -1667,139 +1667,139 @@ void lustre_assert_wire_constants(void)
 	/* Checks for struct mdt_body */
 	LASSERTF((int)sizeof(struct mdt_body) == 216, "found %lld\n",
 		 (long long)(int)sizeof(struct mdt_body));
-	LASSERTF((int)offsetof(struct mdt_body, fid1) == 0, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, fid1));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->fid1) == 16, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->fid1));
-	LASSERTF((int)offsetof(struct mdt_body, fid2) == 16, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, fid2));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->fid2) == 16, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->fid2));
-	LASSERTF((int)offsetof(struct mdt_body, handle) == 32, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, handle));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->handle) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->handle));
-	LASSERTF((int)offsetof(struct mdt_body, valid) == 40, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, valid));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->valid) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->valid));
-	LASSERTF((int)offsetof(struct mdt_body, size) == 48, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, size));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->size) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->size));
-	LASSERTF((int)offsetof(struct mdt_body, mtime) == 56, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, mtime));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->mtime) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->mtime));
-	LASSERTF((int)offsetof(struct mdt_body, atime) == 64, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, atime));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->atime) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->atime));
-	LASSERTF((int)offsetof(struct mdt_body, ctime) == 72, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, ctime));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->ctime) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->ctime));
-	LASSERTF((int)offsetof(struct mdt_body, blocks) == 80, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, blocks));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->blocks) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->blocks));
-	LASSERTF((int)offsetof(struct mdt_body, t_state) == 96, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, t_state));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->t_state) == 8,
+	LASSERTF((int)offsetof(struct mdt_body, mbo_fid1) == 0, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_fid1));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_fid1) == 16, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_fid1));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_fid2) == 16, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_fid2));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_fid2) == 16, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_fid2));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_handle) == 32, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_handle));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_handle) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_handle));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_valid) == 40, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_valid));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_valid) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_valid));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_size) == 48, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_size));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_size) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_size));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_mtime) == 56, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_mtime));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_mtime) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_mtime));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_atime) == 64, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_atime));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_atime) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_atime));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_ctime) == 72, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_ctime));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_ctime) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_ctime));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_blocks) == 80, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_blocks));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_blocks) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_blocks));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_t_state) == 96, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_t_state));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_t_state) == 8,
 		 "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->t_state));
-	LASSERTF((int)offsetof(struct mdt_body, fsuid) == 104, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, fsuid));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->fsuid) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->fsuid));
-	LASSERTF((int)offsetof(struct mdt_body, fsgid) == 108, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, fsgid));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->fsgid) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->fsgid));
-	LASSERTF((int)offsetof(struct mdt_body, capability) == 112, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, capability));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->capability) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->capability));
-	LASSERTF((int)offsetof(struct mdt_body, mode) == 116, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, mode));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->mode) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->mode));
-	LASSERTF((int)offsetof(struct mdt_body, uid) == 120, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, uid));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->uid) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->uid));
-	LASSERTF((int)offsetof(struct mdt_body, gid) == 124, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, gid));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->gid) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->gid));
-	LASSERTF((int)offsetof(struct mdt_body, flags) == 128, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, flags));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->flags) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->flags));
-	LASSERTF((int)offsetof(struct mdt_body, rdev) == 132, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, rdev));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->rdev) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->rdev));
-	LASSERTF((int)offsetof(struct mdt_body, nlink) == 136, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, nlink));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->nlink) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->nlink));
-	LASSERTF((int)offsetof(struct mdt_body, unused2) == 140, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, unused2));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->unused2) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->unused2));
-	LASSERTF((int)offsetof(struct mdt_body, suppgid) == 144, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, suppgid));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->suppgid) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->suppgid));
-	LASSERTF((int)offsetof(struct mdt_body, eadatasize) == 148, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, eadatasize));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->eadatasize) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->eadatasize));
-	LASSERTF((int)offsetof(struct mdt_body, aclsize) == 152, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, aclsize));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->aclsize) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->aclsize));
-	LASSERTF((int)offsetof(struct mdt_body, max_mdsize) == 156, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, max_mdsize));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->max_mdsize) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->max_mdsize));
-	LASSERTF((int)offsetof(struct mdt_body, max_cookiesize) == 160, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, max_cookiesize));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->max_cookiesize) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->max_cookiesize));
-	LASSERTF((int)offsetof(struct mdt_body, uid_h) == 164, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, uid_h));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->uid_h) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->uid_h));
-	LASSERTF((int)offsetof(struct mdt_body, gid_h) == 168, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, gid_h));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->gid_h) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->gid_h));
-	LASSERTF((int)offsetof(struct mdt_body, padding_5) == 172, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, padding_5));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->padding_5) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->padding_5));
-	LASSERTF((int)offsetof(struct mdt_body, padding_6) == 176, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, padding_6));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->padding_6) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->padding_6));
-	LASSERTF((int)offsetof(struct mdt_body, padding_7) == 184, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, padding_7));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->padding_7) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->padding_7));
-	LASSERTF((int)offsetof(struct mdt_body, padding_8) == 192, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, padding_8));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->padding_8) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->padding_8));
-	LASSERTF((int)offsetof(struct mdt_body, padding_9) == 200, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, padding_9));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->padding_9) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->padding_9));
-	LASSERTF((int)offsetof(struct mdt_body, padding_10) == 208, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, padding_10));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->padding_10) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->padding_10));
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_t_state));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_fsuid) == 104, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_fsuid));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_fsuid) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_fsuid));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_fsgid) == 108, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_fsgid));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_fsgid) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_fsgid));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_capability) == 112, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_capability));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_capability) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_capability));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_mode) == 116, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_mode));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_mode) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_mode));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_uid) == 120, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_uid));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_uid) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_uid));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_gid) == 124, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_gid));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_gid) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_gid));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_flags) == 128, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_flags));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_flags) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_flags));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_rdev) == 132, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_rdev));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_rdev) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_rdev));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_nlink) == 136, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_nlink));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_nlink) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_nlink));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_unused2) == 140, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_unused2));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_unused2) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_unused2));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_suppgid) == 144, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_suppgid));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_suppgid) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_suppgid));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_eadatasize) == 148, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_eadatasize));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_eadatasize) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_eadatasize));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_aclsize) == 152, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_aclsize));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_aclsize) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_aclsize));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_max_mdsize) == 156, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_max_mdsize));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_max_mdsize) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_max_mdsize));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_max_cookiesize) == 160, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_max_cookiesize));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_max_cookiesize) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_max_cookiesize));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_uid_h) == 164, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_uid_h));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_uid_h) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_uid_h));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_gid_h) == 168, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_gid_h));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_gid_h) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_gid_h));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_padding_5) == 172, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_padding_5));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_padding_5) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_padding_5));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_padding_6) == 176, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_padding_6));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_padding_6) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_padding_6));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_padding_7) == 184, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_padding_7));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_padding_7) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_padding_7));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_padding_8) == 192, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_padding_8));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_padding_8) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_padding_8));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_padding_9) == 200, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_padding_9));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_padding_9) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_padding_9));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_padding_10) == 208, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_padding_10));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_padding_10) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_padding_10));
 	LASSERTF(MDS_FMODE_CLOSED == 000000000000UL, "found 0%.11oUL\n",
 		MDS_FMODE_CLOSED);
 	LASSERTF(MDS_FMODE_EXEC == 000000000004UL, "found 0%.11oUL\n",
-- 
1.7.1

* [lustre-devel] [PATCH 55/80] staging: lustre: mdt: add mbo_ prefix to members of struct mdt_body
@ 2016-08-16 20:19   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

Rename each member of struct mdt_body, adding the prefix mbo_.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/10202
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
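
Reviewer aid, not part of the commit: a minimal userspace sketch of the
kind of layout check that wiretest.c performs on the renamed members. It
is a sketch under assumptions. struct lu_fid and struct lustre_handle
below are stand-ins sized 16 and 8 bytes, the <stdint.h> types replace
__u64/__s64/__u32, and mdt_body_layout_ok() is a hypothetical helper that
does not exist in the tree. The expected offsets (184, 192, 200, 208) and
the total size (216) are the ones asserted in this patch.

```c
/* Userspace sketch only; stand-in types, not the kernel definitions. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in: 16-byte FID. */
struct lu_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Assumed stand-in: 8-byte handle. */
struct lustre_handle {
	uint64_t cookie;
};

/* Same member order and widths as the renamed struct in this patch. */
struct mdt_body {
	struct lu_fid mbo_fid1;
	struct lu_fid mbo_fid2;
	struct lustre_handle mbo_handle;
	uint64_t mbo_valid;
	uint64_t mbo_size;
	int64_t  mbo_mtime;
	int64_t  mbo_atime;
	int64_t  mbo_ctime;
	uint64_t mbo_blocks;
	uint64_t mbo_ioepoch;
	uint64_t mbo_t_state;
	uint32_t mbo_fsuid;
	uint32_t mbo_fsgid;
	uint32_t mbo_capability;
	uint32_t mbo_mode;
	uint32_t mbo_uid;
	uint32_t mbo_gid;
	uint32_t mbo_flags;
	uint32_t mbo_rdev;
	uint32_t mbo_nlink;
	uint32_t mbo_unused2;
	uint32_t mbo_suppgid;
	uint32_t mbo_eadatasize;
	uint32_t mbo_aclsize;
	uint32_t mbo_max_mdsize;
	uint32_t mbo_max_cookiesize;
	uint32_t mbo_uid_h;
	uint32_t mbo_gid_h;
	uint32_t mbo_padding_5;
	uint64_t mbo_padding_6;
	uint64_t mbo_padding_7;
	uint64_t mbo_padding_8;
	uint64_t mbo_padding_9;
	uint64_t mbo_padding_10;
};

/* Compile-time mirrors of the LASSERTF() checks in wiretest.c. */
static_assert(offsetof(struct mdt_body, mbo_padding_7) == 184, "mbo_padding_7");
static_assert(offsetof(struct mdt_body, mbo_padding_8) == 192, "mbo_padding_8");
static_assert(offsetof(struct mdt_body, mbo_padding_9) == 200, "mbo_padding_9");
static_assert(offsetof(struct mdt_body, mbo_padding_10) == 208, "mbo_padding_10");
static_assert(sizeof(struct mdt_body) == 216, "struct mdt_body size");

/* Hypothetical helper: runtime form of the last offset and size check. */
static inline int mdt_body_layout_ok(void)
{
	return offsetof(struct mdt_body, mbo_padding_10) == 208 &&
	       sizeof(struct mdt_body) == 216;
}
```

Compiling the file as C11 is enough to exercise the static assertions;
any offset that drifted from the values above would fail the build.
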
 .../lustre/lustre/include/lustre/lustre_idl.h      |   74 +++---
 drivers/staging/lustre/lustre/include/lustre_mdc.h |   14 +-
 drivers/staging/lustre/lustre/llite/dir.c          |   30 +-
 drivers/staging/lustre/lustre/llite/file.c         |   20 +-
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   |    2 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |  110 ++++----
 drivers/staging/lustre/lustre/llite/llite_nfs.c    |    6 +-
 drivers/staging/lustre/lustre/llite/namei.c        |   44 ++--
 drivers/staging/lustre/lustre/llite/statahead.c    |    4 +-
 drivers/staging/lustre/lustre/llite/symlink.c      |    6 +-
 drivers/staging/lustre/lustre/llite/xattr.c        |   14 +-
 drivers/staging/lustre/lustre/llite/xattr_cache.c  |   12 +-
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |   38 ++--
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |   16 +-
 drivers/staging/lustre/lustre/mdc/mdc_lib.c        |   62 +++---
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |   32 ++--
 drivers/staging/lustre/lustre/mdc/mdc_reint.c      |    4 +-
 drivers/staging/lustre/lustre/mdc/mdc_request.c    |   52 ++--
 .../staging/lustre/lustre/ptlrpc/pack_generic.c    |   56 ++--
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    |  268 ++++++++++----------
 20 files changed, 432 insertions(+), 432 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index bbf0c8d..400ab3c 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2097,43 +2097,43 @@ enum md_transient_state {
 };
 
 struct mdt_body {
-	struct lu_fid  fid1;
-	struct lu_fid  fid2;
-	struct lustre_handle handle;
-	__u64	  valid;
-	__u64	  size;   /* Offset, in the case of MDS_READPAGE */
-	__s64	  mtime;
-	__s64	  atime;
-	__s64	  ctime;
-	__u64	  blocks; /* XID, in the case of MDS_READPAGE */
-	__u64	  ioepoch;
-	__u64	  t_state; /* transient file state defined in
-			    * enum md_transient_state
-			    * was "ino" until 2.4.0
-			    */
-	__u32	  fsuid;
-	__u32	  fsgid;
-	__u32	  capability;
-	__u32	  mode;
-	__u32	  uid;
-	__u32	  gid;
-	__u32	  flags; /* from vfs for pin/unpin, LUSTRE_BFLAG close */
-	__u32	  rdev;
-	__u32	  nlink; /* #bytes to read in the case of MDS_READPAGE */
-	__u32	  unused2; /* was "generation" until 2.4.0 */
-	__u32	  suppgid;
-	__u32	  eadatasize;
-	__u32	  aclsize;
-	__u32	  max_mdsize;
-	__u32	  max_cookiesize;
-	__u32	  uid_h; /* high 32-bits of uid, for FUID */
-	__u32	  gid_h; /* high 32-bits of gid, for FUID */
-	__u32	  padding_5; /* also fix lustre_swab_mdt_body */
-	__u64	  padding_6;
-	__u64	  padding_7;
-	__u64	  padding_8;
-	__u64	  padding_9;
-	__u64	  padding_10;
+	struct lu_fid mbo_fid1;
+	struct lu_fid mbo_fid2;
+	struct lustre_handle mbo_handle;
+	__u64	mbo_valid;
+	__u64	mbo_size;	/* Offset, in the case of MDS_READPAGE */
+	__s64	mbo_mtime;
+	__s64	mbo_atime;
+	__s64	mbo_ctime;
+	__u64	mbo_blocks;	/* XID, in the case of MDS_READPAGE */
+	__u64	mbo_ioepoch;
+	__u64	mbo_t_state;	/* transient file state defined in
+				 * enum md_transient_state
+				 * was "ino" until 2.4.0
+				 */
+	__u32	mbo_fsuid;
+	__u32	mbo_fsgid;
+	__u32	mbo_capability;
+	__u32	mbo_mode;
+	__u32	mbo_uid;
+	__u32	mbo_gid;
+	__u32	mbo_flags;
+	__u32	mbo_rdev;
+	__u32	mbo_nlink;	/* #bytes to read in the case of MDS_READPAGE */
+	__u32	mbo_unused2;	/* was "generation" until 2.4.0 */
+	__u32	mbo_suppgid;
+	__u32	mbo_eadatasize;
+	__u32	mbo_aclsize;
+	__u32	mbo_max_mdsize;
+	__u32	mbo_max_cookiesize;
+	__u32	mbo_uid_h;	/* high 32-bits of uid, for FUID */
+	__u32	mbo_gid_h;	/* high 32-bits of gid, for FUID */
+	__u32	mbo_padding_5;	/* also fix lustre_swab_mdt_body */
+	__u64	mbo_padding_6;
+	__u64	mbo_padding_7;
+	__u64	mbo_padding_8;
+	__u64	mbo_padding_9;
+	__u64	mbo_padding_10;
 }; /* 216 */
 
 void lustre_swab_mdt_body(struct mdt_body *b);
diff --git a/drivers/staging/lustre/lustre/include/lustre_mdc.h b/drivers/staging/lustre/lustre/include/lustre_mdc.h
index bf6f87a..9549fb4 100644
--- a/drivers/staging/lustre/lustre/include/lustre_mdc.h
+++ b/drivers/staging/lustre/lustre/include/lustre_mdc.h
@@ -163,18 +163,18 @@ static inline void mdc_put_rpc_lock(struct mdc_rpc_lock *lck,
 static inline void mdc_update_max_ea_from_body(struct obd_export *exp,
 					       struct mdt_body *body)
 {
-	if (body->valid & OBD_MD_FLMODEASIZE) {
+	if (body->mbo_valid & OBD_MD_FLMODEASIZE) {
 		struct client_obd *cli = &exp->exp_obd->u.cli;
 
-		if (cli->cl_max_mds_easize < body->max_mdsize) {
-			cli->cl_max_mds_easize = body->max_mdsize;
+		if (cli->cl_max_mds_easize < body->mbo_max_mdsize) {
+			cli->cl_max_mds_easize = body->mbo_max_mdsize;
 			cli->cl_default_mds_easize =
-			    min_t(__u32, body->max_mdsize, PAGE_SIZE);
+			    min_t(__u32, body->mbo_max_mdsize, PAGE_SIZE);
 		}
-		if (cli->cl_max_mds_cookiesize < body->max_cookiesize) {
-			cli->cl_max_mds_cookiesize = body->max_cookiesize;
+		if (cli->cl_max_mds_cookiesize < body->mbo_max_cookiesize) {
+			cli->cl_max_mds_cookiesize = body->mbo_max_cookiesize;
 			cli->cl_default_mds_cookiesize =
-			    min_t(__u32, body->max_cookiesize, PAGE_SIZE);
+			    min_t(__u32, body->mbo_max_cookiesize, PAGE_SIZE);
 		}
 	}
 }
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index a1b5143..9c7fa8f 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -188,8 +188,8 @@ static int ll_dir_filler(void *_hash, struct page *page0)
 	} else if (rc == 0) {
 		body = req_capsule_server_get(&request->rq_pill, &RMF_MDT_BODY);
 		/* Checked by mdc_readpage() */
-		if (body->valid & OBD_MD_FLSIZE)
-			i_size_write(inode, body->size);
+		if (body->mbo_valid & OBD_MD_FLSIZE)
+			i_size_write(inode, body->mbo_size);
 
 		nrdpgs = (request->rq_bulk->bd_nob_transferred+PAGE_SIZE-1)
 			 >> PAGE_SHIFT;
@@ -894,9 +894,9 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size,
 
 	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
 
-	lmmsize = body->eadatasize;
+	lmmsize = body->mbo_eadatasize;
 
-	if (!(body->valid & (OBD_MD_FLEASIZE | OBD_MD_FLDIREA)) ||
+	if (!(body->mbo_valid & (OBD_MD_FLEASIZE | OBD_MD_FLDIREA)) ||
 	    lmmsize == 0) {
 		rc = -ENODATA;
 		goto out;
@@ -1639,18 +1639,18 @@ skip_lmm:
 			lstat_t st = { 0 };
 
 			st.st_dev     = inode->i_sb->s_dev;
-			st.st_mode    = body->mode;
-			st.st_nlink   = body->nlink;
-			st.st_uid     = body->uid;
-			st.st_gid     = body->gid;
-			st.st_rdev    = body->rdev;
-			st.st_size    = body->size;
+			st.st_mode    = body->mbo_mode;
+			st.st_nlink   = body->mbo_nlink;
+			st.st_uid     = body->mbo_uid;
+			st.st_gid     = body->mbo_gid;
+			st.st_rdev    = body->mbo_rdev;
+			st.st_size    = body->mbo_size;
 			st.st_blksize = PAGE_SIZE;
-			st.st_blocks  = body->blocks;
-			st.st_atime   = body->atime;
-			st.st_mtime   = body->mtime;
-			st.st_ctime   = body->ctime;
-			st.st_ino     = cl_fid_build_ino(&body->fid1,
+			st.st_blocks  = body->mbo_blocks;
+			st.st_atime   = body->mbo_atime;
+			st.st_mtime   = body->mbo_mtime;
+			st.st_ctime   = body->mbo_ctime;
+			st.st_ino     = cl_fid_build_ino(&body->mbo_fid1,
 							 sbi->ll_flags &
 							 LL_SBI_32BIT_API);
 
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 90a7170..563cdf6 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -200,7 +200,7 @@ static int ll_close_inode_openhandle(struct obd_export *md_exp,
 		struct mdt_body *body;
 
 		body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
-		if (!(body->valid & OBD_MD_FLRELEASED))
+		if (!(body->mbo_valid & OBD_MD_FLRELEASED))
 			rc = -EBUSY;
 	}
 
@@ -482,8 +482,8 @@ static int ll_och_fill(struct obd_export *md_exp, struct lookup_intent *it,
 	struct mdt_body *body;
 
 	body = req_capsule_server_get(&it->it_request->rq_pill, &RMF_MDT_BODY);
-	och->och_fh = body->handle;
-	och->och_fid = body->fid1;
+	och->och_fh = body->mbo_handle;
+	och->och_fid = body->mbo_fid1;
 	och->och_lease_handle.cookie = it->it_lock_handle;
 	och->och_magic = OBD_CLIENT_HANDLE_MAGIC;
 	och->och_flags = it->it_flags;
@@ -511,7 +511,7 @@ static int ll_local_open(struct file *file, struct lookup_intent *it,
 
 		body = req_capsule_server_get(&it->it_request->rq_pill,
 					      &RMF_MDT_BODY);
-		ll_ioepoch_open(lli, body->ioepoch);
+		ll_ioepoch_open(lli, body->mbo_ioepoch);
 	}
 
 	LUSTRE_FPRIVATE(file) = fd;
@@ -1451,9 +1451,9 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 
 	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
 
-	lmmsize = body->eadatasize;
+	lmmsize = body->mbo_eadatasize;
 
-	if (!(body->valid & (OBD_MD_FLEASIZE | OBD_MD_FLDIREA)) ||
+	if (!(body->mbo_valid & (OBD_MD_FLEASIZE | OBD_MD_FLDIREA)) ||
 	    lmmsize == 0) {
 		rc = -ENODATA;
 		goto out;
@@ -1484,13 +1484,13 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 		 */
 		if (lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_V1)) {
 			lustre_swab_lov_user_md_v1((struct lov_user_md_v1 *)lmm);
-			if (S_ISREG(body->mode))
+			if (S_ISREG(body->mbo_mode))
 				lustre_swab_lov_user_md_objects(
 				 ((struct lov_user_md_v1 *)lmm)->lmm_objects,
 				 stripe_count);
 		} else if (lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_V3)) {
 			lustre_swab_lov_user_md_v3((struct lov_user_md_v3 *)lmm);
-			if (S_ISREG(body->mode))
+			if (S_ISREG(body->mbo_mode))
 				lustre_swab_lov_user_md_objects(
 				 ((struct lov_user_md_v3 *)lmm)->lmm_objects,
 				 stripe_count);
@@ -2861,7 +2861,7 @@ int ll_get_fid_by_name(struct inode *parent, const char *name,
 		goto out_req;
 	}
 	if (fid)
-		*fid = body->fid1;
+		*fid = body->mbo_fid1;
 out_req:
 	ptlrpc_req_finished(req);
 	return rc;
@@ -3583,7 +3583,7 @@ static int ll_layout_fetch(struct inode *inode, struct ldlm_lock *lock)
 		goto out;
 	}
 
-	lmmsize = body->eadatasize;
+	lmmsize = body->mbo_eadatasize;
 	if (lmmsize == 0) /* empty layout */ {
 		rc = 0;
 		goto out;
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index 396e4e4..eed464b 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -154,7 +154,7 @@ int cl_file_inode_init(struct inode *inode, struct lustre_md *md)
 	int result = 0;
 	int refcheck;
 
-	LASSERT(md->body->valid & OBD_MD_FLID);
+	LASSERT(md->body->mbo_valid & OBD_MD_FLID);
 	LASSERT(S_ISREG(inode->i_mode));
 
 	env = cl_env_get(&refcheck);
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index dd44ee8..5f6343a 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1035,7 +1035,7 @@ static struct inode *ll_iget_anon_dir(struct super_block *sb,
 		struct lmv_stripe_md *lsm = md->lmv;
 
 		inode->i_mode = (inode->i_mode & ~S_IFMT) |
-				(body->mode & S_IFMT);
+				(body->mbo_mode & S_IFMT);
 		LASSERTF(S_ISDIR(inode->i_mode), "Not slave inode "DFID"\n",
 			 PFID(fid));
 
@@ -1051,7 +1051,7 @@ static struct inode *ll_iget_anon_dir(struct super_block *sb,
 
 		LASSERT(lsm);
 		/* master object FID */
-		lli->lli_pfid = body->fid1;
+		lli->lli_pfid = body->mbo_fid1;
 		CDEBUG(D_INODE, "lli %p slave "DFID" master "DFID"\n",
 		       lli, PFID(fid), PFID(&lli->lli_pfid));
 		unlock_new_inode(inode);
@@ -1320,8 +1320,8 @@ static int ll_md_setattr(struct dentry *dentry, struct md_op_data *op_data,
 	op_data->op_attr.ia_valid = ia_valid;
 
 	/* Extract epoch data if obtained. */
-	op_data->op_handle = md.body->handle;
-	op_data->op_ioepoch = md.body->ioepoch;
+	op_data->op_handle = md.body->mbo_handle;
+	op_data->op_ioepoch = md.body->mbo_ioepoch;
 
 	rc = ll_update_inode(inode, &md);
 	ptlrpc_req_finished(request);
@@ -1689,7 +1689,7 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 	struct lov_stripe_md *lsm = md->lsm;
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
 
-	LASSERT((lsm != NULL) == ((body->valid & OBD_MD_FLEASIZE) != 0));
+	LASSERT((lsm != NULL) == ((body->mbo_valid & OBD_MD_FLEASIZE) != 0));
 	if (lsm) {
 		if (!lli->lli_has_smd &&
 		    !(sbi->ll_flags & LL_SBI_LAYOUT_LOCK))
@@ -1709,7 +1709,7 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 	}
 
 #ifdef CONFIG_FS_POSIX_ACL
-	if (body->valid & OBD_MD_FLACL) {
+	if (body->mbo_valid & OBD_MD_FLACL) {
 		spin_lock(&lli->lli_lock);
 		if (lli->lli_posix_acl)
 			posix_acl_release(lli->lli_posix_acl);
@@ -1717,65 +1717,65 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 		spin_unlock(&lli->lli_lock);
 	}
 #endif
-	inode->i_ino = cl_fid_build_ino(&body->fid1,
+	inode->i_ino = cl_fid_build_ino(&body->mbo_fid1,
 					sbi->ll_flags & LL_SBI_32BIT_API);
-	inode->i_generation = cl_fid_build_gen(&body->fid1);
+	inode->i_generation = cl_fid_build_gen(&body->mbo_fid1);
 
-	if (body->valid & OBD_MD_FLATIME) {
-		if (body->atime > LTIME_S(inode->i_atime))
-			LTIME_S(inode->i_atime) = body->atime;
-		lli->lli_atime = body->atime;
+	if (body->mbo_valid & OBD_MD_FLATIME) {
+		if (body->mbo_atime > LTIME_S(inode->i_atime))
+			LTIME_S(inode->i_atime) = body->mbo_atime;
+		lli->lli_atime = body->mbo_atime;
 	}
-	if (body->valid & OBD_MD_FLMTIME) {
-		if (body->mtime > LTIME_S(inode->i_mtime)) {
+	if (body->mbo_valid & OBD_MD_FLMTIME) {
+		if (body->mbo_mtime > LTIME_S(inode->i_mtime)) {
 			CDEBUG(D_INODE, "setting ino %lu mtime from %lu to %llu\n",
 			       inode->i_ino, LTIME_S(inode->i_mtime),
-			       body->mtime);
-			LTIME_S(inode->i_mtime) = body->mtime;
+			       body->mbo_mtime);
+			LTIME_S(inode->i_mtime) = body->mbo_mtime;
 		}
-		lli->lli_mtime = body->mtime;
+		lli->lli_mtime = body->mbo_mtime;
 	}
-	if (body->valid & OBD_MD_FLCTIME) {
-		if (body->ctime > LTIME_S(inode->i_ctime))
-			LTIME_S(inode->i_ctime) = body->ctime;
-		lli->lli_ctime = body->ctime;
+	if (body->mbo_valid & OBD_MD_FLCTIME) {
+		if (body->mbo_ctime > LTIME_S(inode->i_ctime))
+			LTIME_S(inode->i_ctime) = body->mbo_ctime;
+		lli->lli_ctime = body->mbo_ctime;
 	}
-	if (body->valid & OBD_MD_FLMODE)
-		inode->i_mode = (inode->i_mode & S_IFMT)|(body->mode & ~S_IFMT);
-	if (body->valid & OBD_MD_FLTYPE)
-		inode->i_mode = (inode->i_mode & ~S_IFMT)|(body->mode & S_IFMT);
+	if (body->mbo_valid & OBD_MD_FLMODE)
+		inode->i_mode = (inode->i_mode & S_IFMT)|(body->mbo_mode & ~S_IFMT);
+	if (body->mbo_valid & OBD_MD_FLTYPE)
+		inode->i_mode = (inode->i_mode & ~S_IFMT)|(body->mbo_mode & S_IFMT);
 	LASSERT(inode->i_mode != 0);
 	if (S_ISREG(inode->i_mode))
 		inode->i_blkbits = min(PTLRPC_MAX_BRW_BITS + 1,
 				       LL_MAX_BLKSIZE_BITS);
 	else
 		inode->i_blkbits = inode->i_sb->s_blocksize_bits;
-	if (body->valid & OBD_MD_FLUID)
-		inode->i_uid = make_kuid(&init_user_ns, body->uid);
-	if (body->valid & OBD_MD_FLGID)
-		inode->i_gid = make_kgid(&init_user_ns, body->gid);
-	if (body->valid & OBD_MD_FLFLAGS)
-		inode->i_flags = ll_ext_to_inode_flags(body->flags);
-	if (body->valid & OBD_MD_FLNLINK)
-		set_nlink(inode, body->nlink);
-	if (body->valid & OBD_MD_FLRDEV)
-		inode->i_rdev = old_decode_dev(body->rdev);
-
-	if (body->valid & OBD_MD_FLID) {
+	if (body->mbo_valid & OBD_MD_FLUID)
+		inode->i_uid = make_kuid(&init_user_ns, body->mbo_uid);
+	if (body->mbo_valid & OBD_MD_FLGID)
+		inode->i_gid = make_kgid(&init_user_ns, body->mbo_gid);
+	if (body->mbo_valid & OBD_MD_FLFLAGS)
+		inode->i_flags = ll_ext_to_inode_flags(body->mbo_flags);
+	if (body->mbo_valid & OBD_MD_FLNLINK)
+		set_nlink(inode, body->mbo_nlink);
+	if (body->mbo_valid & OBD_MD_FLRDEV)
+		inode->i_rdev = old_decode_dev(body->mbo_rdev);
+
+	if (body->mbo_valid & OBD_MD_FLID) {
 		/* FID shouldn't be changed! */
 		if (fid_is_sane(&lli->lli_fid)) {
-			LASSERTF(lu_fid_eq(&lli->lli_fid, &body->fid1),
+			LASSERTF(lu_fid_eq(&lli->lli_fid, &body->mbo_fid1),
 				 "Trying to change FID "DFID" to the "DFID", inode "DFID"(%p)\n",
-				 PFID(&lli->lli_fid), PFID(&body->fid1),
+				 PFID(&lli->lli_fid), PFID(&body->mbo_fid1),
 				 PFID(ll_inode2fid(inode)), inode);
 		} else {
-			lli->lli_fid = body->fid1;
+			lli->lli_fid = body->mbo_fid1;
 		}
 	}
 
 	LASSERT(fid_seq(&lli->lli_fid) != 0);
 
-	if (body->valid & OBD_MD_FLSIZE) {
+	if (body->mbo_valid & OBD_MD_FLSIZE) {
 		if (exp_connect_som(ll_i2mdexp(inode)) &&
 		    S_ISREG(inode->i_mode)) {
 			struct lustre_handle lockh;
@@ -1802,7 +1802,7 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 					/* Use old size assignment to avoid
 					 * deadlock bz14138 & bz14326
 					 */
-					i_size_write(inode, body->size);
+					i_size_write(inode, body->mbo_size);
 					spin_lock(&lli->lli_lock);
 					lli->lli_flags |= LLIF_MDS_SIZE_LOCK;
 					spin_unlock(&lli->lli_lock);
@@ -1813,18 +1813,18 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 			/* Use old size assignment to avoid
 			 * deadlock bz14138 & bz14326
 			 */
-			i_size_write(inode, body->size);
+			i_size_write(inode, body->mbo_size);
 
 			CDEBUG(D_VFSTRACE, "inode=%lu, updating i_size %llu\n",
-			       inode->i_ino, (unsigned long long)body->size);
+			       inode->i_ino, (unsigned long long)body->mbo_size);
 		}
 
-		if (body->valid & OBD_MD_FLBLOCKS)
-			inode->i_blocks = body->blocks;
+		if (body->mbo_valid & OBD_MD_FLBLOCKS)
+			inode->i_blocks = body->mbo_blocks;
 	}
 
-	if (body->valid & OBD_MD_TSTATE) {
-		if (body->t_state & MS_RESTORE)
+	if (body->mbo_valid & OBD_MD_TSTATE) {
+		if (body->mbo_t_state & MS_RESTORE)
 			lli->lli_flags |= LLIF_FILE_RESTORING;
 	}
 
@@ -1936,7 +1936,7 @@ int ll_iocontrol(struct inode *inode, struct file *file,
 
 		body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
 
-		flags = body->flags;
+		flags = body->mbo_flags;
 
 		ptlrpc_req_finished(req);
 
@@ -2118,9 +2118,9 @@ void ll_open_cleanup(struct super_block *sb, struct ptlrpc_request *open_req)
 	if (!op_data)
 		return;
 
-	op_data->op_fid1 = body->fid1;
-	op_data->op_ioepoch = body->ioepoch;
-	op_data->op_handle = body->handle;
+	op_data->op_fid1 = body->mbo_fid1;
+	op_data->op_ioepoch = body->mbo_ioepoch;
+	op_data->op_handle = body->mbo_handle;
 	op_data->op_mod_time = get_seconds();
 	md_close(exp, op_data, NULL, &close_req);
 	ptlrpc_req_finished(close_req);
@@ -2152,15 +2152,15 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req,
 		 * At this point server returns to client's same fid as client
 		 * generated for creating. So using ->fid1 is okay here.
 		 */
-		if (!fid_is_sane(&md.body->fid1)) {
+		if (!fid_is_sane(&md.body->mbo_fid1)) {
 			CERROR("%s: Fid is insane " DFID "\n",
 			       ll_get_fsname(sb, NULL, 0),
-			       PFID(&md.body->fid1));
+			       PFID(&md.body->mbo_fid1));
 			rc = -EINVAL;
 			goto out;
 		}
 
-		*inode = ll_iget(sb, cl_fid_build_ino(&md.body->fid1,
+		*inode = ll_iget(sb, cl_fid_build_ino(&md.body->mbo_fid1,
 					     sbi->ll_flags & LL_SBI_32BIT_API),
 				 &md);
 		if (IS_ERR(*inode)) {
diff --git a/drivers/staging/lustre/lustre/llite/llite_nfs.c b/drivers/staging/lustre/lustre/llite/llite_nfs.c
index 06a8199..ac96d89 100644
--- a/drivers/staging/lustre/lustre/llite/llite_nfs.c
+++ b/drivers/staging/lustre/lustre/llite/llite_nfs.c
@@ -343,10 +343,10 @@ int ll_dir_get_parent_fid(struct inode *dir, struct lu_fid *parent_fid)
 	 * LU-3952: MDT may lost the FID of its parent, we should not crash
 	 * the NFS server, ll_iget_for_nfs() will handle the error.
 	 */
-	if (body->valid & OBD_MD_FLID) {
+	if (body->mbo_valid & OBD_MD_FLID) {
 		CDEBUG(D_INFO, "parent for " DFID " is " DFID "\n",
-		       PFID(ll_inode2fid(dir)), PFID(&body->fid1));
-		*parent_fid = body->fid1;
+		       PFID(ll_inode2fid(dir)), PFID(&body->mbo_fid1));
+		*parent_fid = body->mbo_fid1;
 	}
 
 	ptlrpc_req_finished(req);
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index 581b083..ac0f442 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -56,12 +56,12 @@ static int ll_test_inode(struct inode *inode, void *opaque)
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct lustre_md     *md = opaque;
 
-	if (unlikely(!(md->body->valid & OBD_MD_FLID))) {
+	if (unlikely(!(md->body->mbo_valid & OBD_MD_FLID))) {
 		CERROR("MDS body missing FID\n");
 		return 0;
 	}
 
-	if (!lu_fid_eq(&lli->lli_fid, &md->body->fid1))
+	if (!lu_fid_eq(&lli->lli_fid, &md->body->mbo_fid1))
 		return 0;
 
 	return 1;
@@ -72,20 +72,20 @@ static int ll_set_inode(struct inode *inode, void *opaque)
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct mdt_body *body = ((struct lustre_md *)opaque)->body;
 
-	if (unlikely(!(body->valid & OBD_MD_FLID))) {
+	if (unlikely(!(body->mbo_valid & OBD_MD_FLID))) {
 		CERROR("MDS body missing FID\n");
 		return -EINVAL;
 	}
 
-	lli->lli_fid = body->fid1;
-	if (unlikely(!(body->valid & OBD_MD_FLTYPE))) {
+	lli->lli_fid = body->mbo_fid1;
+	if (unlikely(!(body->mbo_valid & OBD_MD_FLTYPE))) {
 		CERROR("Can not initialize inode " DFID
 		       " without object type: valid = %#llx\n",
-		       PFID(&lli->lli_fid), body->valid);
+		       PFID(&lli->lli_fid), body->mbo_valid);
 		return -EINVAL;
 	}
 
-	inode->i_mode = (inode->i_mode & ~S_IFMT) | (body->mode & S_IFMT);
+	inode->i_mode = (inode->i_mode & ~S_IFMT) | (body->mbo_mode & S_IFMT);
 	if (unlikely(inode->i_mode == 0)) {
 		CERROR("Invalid inode "DFID" type\n", PFID(&lli->lli_fid));
 		return -EINVAL;
@@ -131,7 +131,7 @@ struct inode *ll_iget(struct super_block *sb, ino_t hash,
 	} else if (!(inode->i_state & (I_FREEING | I_CLEAR))) {
 		rc = ll_update_inode(inode, md);
 		CDEBUG(D_VFSTRACE, "got inode: "DFID"(%p): rc = %d\n",
-		       PFID(&md->body->fid1), inode, rc);
+		       PFID(&md->body->mbo_fid1), inode, rc);
 		if (rc) {
 			make_bad_inode(inode);
 			iput(inode);
@@ -774,16 +774,16 @@ void ll_update_times(struct ptlrpc_request *request, struct inode *inode)
 						       &RMF_MDT_BODY);
 
 	LASSERT(body);
-	if (body->valid & OBD_MD_FLMTIME &&
-	    body->mtime > LTIME_S(inode->i_mtime)) {
+	if (body->mbo_valid & OBD_MD_FLMTIME &&
+	    body->mbo_mtime > LTIME_S(inode->i_mtime)) {
 		CDEBUG(D_INODE, "setting fid "DFID" mtime from %lu to %llu\n",
 		       PFID(ll_inode2fid(inode)), LTIME_S(inode->i_mtime),
-		       body->mtime);
-		LTIME_S(inode->i_mtime) = body->mtime;
+		       body->mbo_mtime);
+		LTIME_S(inode->i_mtime) = body->mbo_mtime;
 	}
-	if (body->valid & OBD_MD_FLCTIME &&
-	    body->ctime > LTIME_S(inode->i_ctime))
-		LTIME_S(inode->i_ctime) = body->ctime;
+	if (body->mbo_valid & OBD_MD_FLCTIME &&
+	    body->mbo_ctime > LTIME_S(inode->i_ctime))
+		LTIME_S(inode->i_ctime) = body->mbo_ctime;
 }
 
 static int ll_new_node(struct inode *dir, struct dentry *dentry,
@@ -899,10 +899,10 @@ int ll_objects_destroy(struct ptlrpc_request *request, struct inode *dir)
 
 	/* req is swabbed so this is safe */
 	body = req_capsule_server_get(&request->rq_pill, &RMF_MDT_BODY);
-	if (!(body->valid & OBD_MD_FLEASIZE))
+	if (!(body->mbo_valid & OBD_MD_FLEASIZE))
 		return 0;
 
-	if (body->eadatasize == 0) {
+	if (body->mbo_eadatasize == 0) {
 		CERROR("OBD_MD_FLEASIZE set but eadatasize zero\n");
 		rc = -EPROTO;
 		goto out;
@@ -914,10 +914,10 @@ int ll_objects_destroy(struct ptlrpc_request *request, struct inode *dir)
 	 * check it is complete and sensible.
 	 */
 	eadata = req_capsule_server_sized_get(&request->rq_pill, &RMF_MDT_MD,
-					      body->eadatasize);
+					      body->mbo_eadatasize);
 	LASSERT(eadata);
 
-	rc = obd_unpackmd(ll_i2dtexp(dir), &lsm, eadata, body->eadatasize);
+	rc = obd_unpackmd(ll_i2dtexp(dir), &lsm, eadata, body->mbo_eadatasize);
 	if (rc < 0) {
 		CERROR("obd_unpackmd: %d\n", rc);
 		goto out;
@@ -931,10 +931,10 @@ int ll_objects_destroy(struct ptlrpc_request *request, struct inode *dir)
 	}
 
 	oa->o_oi = lsm->lsm_oi;
-	oa->o_mode = body->mode & S_IFMT;
+	oa->o_mode = body->mbo_mode & S_IFMT;
 	oa->o_valid = OBD_MD_FLID | OBD_MD_FLTYPE | OBD_MD_FLGROUP;
 
-	if (body->valid & OBD_MD_FLCOOKIE) {
+	if (body->mbo_valid & OBD_MD_FLCOOKIE) {
 		oa->o_valid |= OBD_MD_FLCOOKIE;
 		oti.oti_logcookies =
 			req_capsule_server_sized_get(&request->rq_pill,
@@ -943,7 +943,7 @@ int ll_objects_destroy(struct ptlrpc_request *request, struct inode *dir)
 						     lsm->lsm_stripe_count);
 		if (!oti.oti_logcookies) {
 			oa->o_valid &= ~OBD_MD_FLCOOKIE;
-			body->valid &= ~OBD_MD_FLCOOKIE;
+			body->mbo_valid &= ~OBD_MD_FLCOOKIE;
 		}
 	}
 
diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
index e8c1959..46b8faf 100644
--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -632,7 +632,7 @@ static void ll_post_statahead(struct ll_statahead_info *sai)
 		/* XXX: No fid in reply, this is probably cross-ref case.
 		 * SA can't handle it yet.
 		 */
-		if (body->valid & OBD_MD_MDS) {
+		if (body->mbo_valid & OBD_MD_MDS) {
 			rc = -EAGAIN;
 			goto out;
 		}
@@ -641,7 +641,7 @@ static void ll_post_statahead(struct ll_statahead_info *sai)
 		 * revalidate.
 		 */
 		/* unlinked and re-created with the same name */
-		if (unlikely(!lu_fid_eq(&minfo->mi_data.op_fid2, &body->fid1))) {
+		if (unlikely(!lu_fid_eq(&minfo->mi_data.op_fid2, &body->mbo_fid1))) {
 			entry->se_inode = NULL;
 			iput(child);
 			child = NULL;
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index 4601be9..47fb799 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -80,17 +80,17 @@ static int ll_readlink_internal(struct inode *inode,
 	}
 
 	body = req_capsule_server_get(&(*request)->rq_pill, &RMF_MDT_BODY);
-	if ((body->valid & OBD_MD_LINKNAME) == 0) {
+	if ((body->mbo_valid & OBD_MD_LINKNAME) == 0) {
 		CERROR("OBD_MD_LINKNAME not set on reply\n");
 		rc = -EPROTO;
 		goto failed;
 	}
 
 	LASSERT(symlen != 0);
-	if (body->eadatasize != symlen) {
+	if (body->mbo_eadatasize != symlen) {
 		CERROR("%s: inode "DFID": symlink length %d not expected %d\n",
 		       ll_get_fsname(inode->i_sb, NULL, 0),
-		       PFID(ll_inode2fid(inode)), body->eadatasize - 1,
+		       PFID(ll_inode2fid(inode)), body->mbo_eadatasize - 1,
 		       symlen - 1);
 		rc = -EPROTO;
 		goto failed;
diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index 146da6b..f252c26 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -263,32 +263,32 @@ getxattr_nocache:
 
 		/* only detect the xattr size */
 		if (size == 0) {
-			rc = body->eadatasize;
+			rc = body->mbo_eadatasize;
 			goto out;
 		}
 
-		if (size < body->eadatasize) {
+		if (size < body->mbo_eadatasize) {
 			CERROR("server bug: replied size %u > %u\n",
-			       body->eadatasize, (int)size);
+			       body->mbo_eadatasize, (int)size);
 			rc = -ERANGE;
 			goto out;
 		}
 
-		if (body->eadatasize == 0) {
+		if (body->mbo_eadatasize == 0) {
 			rc = -ENODATA;
 			goto out;
 		}
 
 		/* do not need swab xattr data */
 		xdata = req_capsule_server_sized_get(&req->rq_pill, &RMF_EADATA,
-						     body->eadatasize);
+						     body->mbo_eadatasize);
 		if (!xdata) {
 			rc = -EFAULT;
 			goto out;
 		}
 
-		memcpy(buffer, xdata, body->eadatasize);
-		rc = body->eadatasize;
+		memcpy(buffer, xdata, body->mbo_eadatasize);
+		rc = body->mbo_eadatasize;
 	}
 
 out_xattr:
diff --git a/drivers/staging/lustre/lustre/llite/xattr_cache.c b/drivers/staging/lustre/lustre/llite/xattr_cache.c
index 8089da8..b66542c 100644
--- a/drivers/staging/lustre/lustre/llite/xattr_cache.c
+++ b/drivers/staging/lustre/lustre/llite/xattr_cache.c
@@ -380,25 +380,25 @@ static int ll_xattr_cache_refill(struct inode *inode, struct lookup_intent *oit)
 	}
 	/* do not need swab xattr data */
 	xdata = req_capsule_server_sized_get(&req->rq_pill, &RMF_EADATA,
-					     body->eadatasize);
+					     body->mbo_eadatasize);
 	xval = req_capsule_server_sized_get(&req->rq_pill, &RMF_EAVALS,
-					    body->aclsize);
+					    body->mbo_aclsize);
 	xsizes = req_capsule_server_sized_get(&req->rq_pill, &RMF_EAVALS_LENS,
-					      body->max_mdsize * sizeof(__u32));
+					      body->mbo_max_mdsize * sizeof(__u32));
 	if (!xdata || !xval || !xsizes) {
 		CERROR("wrong setxattr reply\n");
 		rc = -EPROTO;
 		goto out_destroy;
 	}
 
-	xtail = xdata + body->eadatasize;
-	xvtail = xval + body->aclsize;
+	xtail = xdata + body->mbo_eadatasize;
+	xvtail = xval + body->mbo_aclsize;
 
 	CDEBUG(D_CACHE, "caching: xdata=%p xtail=%p\n", xdata, xtail);
 
 	ll_xattr_cache_init(lli);
 
-	for (i = 0; i < body->max_mdsize; i++) {
+	for (i = 0; i < body->mbo_max_mdsize; i++) {
 		CDEBUG(D_CACHE, "caching [%s]=%.*s\n", xdata, *xsizes, xval);
 		/* Perform consistency checks: attr names and vals in pill */
 		if (!memchr(xdata, 0, xtail - xdata)) {
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index 7f81e78..761ab24 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -69,7 +69,7 @@ static int lmv_intent_remote(struct obd_export *exp, void *lmm,
 	if (!body)
 		return -EPROTO;
 
-	LASSERT((body->valid & OBD_MD_MDS));
+	LASSERT((body->mbo_valid & OBD_MD_MDS));
 
 	/*
 	 * Unfortunately, we have to lie to MDC/MDS to retrieve
@@ -88,9 +88,9 @@ static int lmv_intent_remote(struct obd_export *exp, void *lmm,
 		it->it_request = NULL;
 	}
 
-	LASSERT(fid_is_sane(&body->fid1));
+	LASSERT(fid_is_sane(&body->mbo_fid1));
 
-	tgt = lmv_find_target(lmv, &body->fid1);
+	tgt = lmv_find_target(lmv, &body->mbo_fid1);
 	if (IS_ERR(tgt)) {
 		rc = PTR_ERR(tgt);
 		goto out;
@@ -102,7 +102,7 @@ static int lmv_intent_remote(struct obd_export *exp, void *lmm,
 		goto out;
 	}
 
-	op_data->op_fid1 = body->fid1;
+	op_data->op_fid1 = body->mbo_fid1;
 	/* Sent the parent FID to the remote MDT */
 	if (parent_fid) {
 		/* The parent fid is only for remote open to
@@ -114,12 +114,12 @@ static int lmv_intent_remote(struct obd_export *exp, void *lmm,
 		/* Add object FID to op_fid3, in case it needs to check stale
 		 * (M_CHECK_STALE), see mdc_finish_intent_lock
 		 */
-		op_data->op_fid3 = body->fid1;
+		op_data->op_fid3 = body->mbo_fid1;
 	}
 
 	op_data->op_bias = MDS_CROSS_REF;
 	CDEBUG(D_INODE, "REMOTE_INTENT with fid="DFID" -> mds #%d\n",
-	       PFID(&body->fid1), tgt->ltd_idx);
+	       PFID(&body->mbo_fid1), tgt->ltd_idx);
 
 	rc = md_intent_lock(tgt->ltd_exp, op_data, lmm, lmmsize, it,
 			    flags, &req, cb_blocking, extra_lock_flags);
@@ -227,9 +227,9 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
 						      &RMF_MDT_BODY);
 			LASSERT(body);
 
-			if (unlikely(body->nlink < 2)) {
+			if (unlikely(body->mbo_nlink < 2)) {
 				CERROR("%s: nlink %d < 2 corrupt stripe %d "DFID":" DFID"\n",
-				       obd->obd_name, body->nlink, i,
+				       obd->obd_name, body->mbo_nlink, i,
 				       PFID(&lsm->lsm_md_oinfo[i].lmo_fid),
 				       PFID(&lsm->lsm_md_oinfo[0].lmo_fid));
 
@@ -245,11 +245,11 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
 				goto cleanup;
 			}
 
-			i_size_write(inode, body->size);
-			set_nlink(inode, body->nlink);
-			LTIME_S(inode->i_atime) = body->atime;
-			LTIME_S(inode->i_ctime) = body->ctime;
-			LTIME_S(inode->i_mtime) = body->mtime;
+			i_size_write(inode, body->mbo_size);
+			set_nlink(inode, body->mbo_nlink);
+			LTIME_S(inode->i_atime) = body->mbo_atime;
+			LTIME_S(inode->i_ctime) = body->mbo_ctime;
+			LTIME_S(inode->i_mtime) = body->mbo_mtime;
 
 			if (req)
 				ptlrpc_req_finished(req);
@@ -288,9 +288,9 @@ int lmv_revalidate_slaves(struct obd_export *exp, struct mdt_body *mbody,
 	       PFID(&lsm->lsm_md_oinfo[0].lmo_fid));
 
 	if (mbody) {
-		mbody->atime = atime;
-		mbody->ctime = ctime;
-		mbody->mtime = mtime;
+		mbody->mbo_atime = atime;
+		mbody->mbo_ctime = ctime;
+		mbody->mbo_mtime = mtime;
 	}
 cleanup:
 	kfree(op_data);
@@ -360,7 +360,7 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 	if (rc != 0)
 		return rc;
 	/*
-	 * Nothing is found, do not access body->fid1 as it is zero and thus
+	 * Nothing is found, do not access body->mbo_fid1 as it is zero and thus
 	 * pointless.
 	 */
 	if ((it->it_disposition & DISP_LOOKUP_NEG) &&
@@ -373,7 +373,7 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 		return -EPROTO;
 
 	/* Not cross-ref case, just get out of here. */
-	if (unlikely((body->valid & OBD_MD_MDS))) {
+	if (unlikely((body->mbo_valid & OBD_MD_MDS))) {
 		rc = lmv_intent_remote(exp, lmm, lmmsize, it, &op_data->op_fid1,
 				       flags, reqp, cb_blocking,
 				       extra_lock_flags);
@@ -470,7 +470,7 @@ static int lmv_intent_lookup(struct obd_export *exp,
 		return -EPROTO;
 
 	/* Not cross-ref case, just get out of here. */
-	if (unlikely((body->valid & OBD_MD_MDS))) {
+	if (unlikely((body->mbo_valid & OBD_MD_MDS))) {
 		rc = lmv_intent_remote(exp, lmm, lmmsize, it, NULL, flags,
 				       reqp, cb_blocking, extra_lock_flags);
 		if (rc != 0)
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 6917a03..27a6be1 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -1813,11 +1813,11 @@ lmv_enqueue_remote(struct obd_export *exp, struct ldlm_enqueue_info *einfo,
 
 	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
 
-	if (!(body->valid & OBD_MD_MDS))
+	if (!(body->mbo_valid & OBD_MD_MDS))
 		return 0;
 
 	CDEBUG(D_INODE, "REMOTE_ENQUEUE '%s' on "DFID" -> "DFID"\n",
-	       LL_IT2STR(it), PFID(&op_data->op_fid1), PFID(&body->fid1));
+	       LL_IT2STR(it), PFID(&op_data->op_fid1), PFID(&body->mbo_fid1));
 
 	/*
 	 * We got LOOKUP lock, but we really need attrs.
@@ -1827,7 +1827,7 @@ lmv_enqueue_remote(struct obd_export *exp, struct ldlm_enqueue_info *einfo,
 	memcpy(&plock, lockh, sizeof(plock));
 	it->it_lock_mode = 0;
 	it->it_request = NULL;
-	fid1 = body->fid1;
+	fid1 = body->mbo_fid1;
 
 	ptlrpc_req_finished(req);
 
@@ -1917,8 +1917,8 @@ lmv_getattr_name(struct obd_export *exp, struct md_op_data *op_data,
 		return rc;
 
 	body = req_capsule_server_get(&(*preq)->rq_pill, &RMF_MDT_BODY);
-	if (body->valid & OBD_MD_MDS) {
-		struct lu_fid rid = body->fid1;
+	if (body->mbo_valid & OBD_MD_MDS) {
+		struct lu_fid rid = body->mbo_fid1;
 
 		CDEBUG(D_INODE, "Request attrs for "DFID"\n",
 		       PFID(&rid));
@@ -2433,11 +2433,11 @@ retry:
 		return -EPROTO;
 
 	/* Not cross-ref case, just get out of here. */
-	if (likely(!(body->valid & OBD_MD_MDS)))
+	if (likely(!(body->mbo_valid & OBD_MD_MDS)))
 		return 0;
 
 	CDEBUG(D_INODE, "%s: try unlink to another MDT for "DFID"\n",
-	       exp->exp_obd->obd_name, PFID(&body->fid1));
+	       exp->exp_obd->obd_name, PFID(&body->mbo_fid1));
 
 	/* This is a remote object, try remote MDT, Note: it may
 	 * try more than 1 time here, Considering following case
@@ -2459,7 +2459,7 @@ retry:
 	 * In theory, it might try unlimited time here, but it should
 	 * be very rare case.
 	 */
-	op_data->op_fid2 = body->fid1;
+	op_data->op_fid2 = body->mbo_fid1;
 	ptlrpc_req_finished(*request);
 	*request = NULL;
 
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_lib.c b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
index 16c3571..813f923 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_lib.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
@@ -37,12 +37,12 @@
 
 static void __mdc_pack_body(struct mdt_body *b, __u32 suppgid)
 {
-	b->suppgid = suppgid;
-	b->uid = from_kuid(&init_user_ns, current_uid());
-	b->gid = from_kgid(&init_user_ns, current_gid());
-	b->fsuid = from_kuid(&init_user_ns, current_fsuid());
-	b->fsgid = from_kgid(&init_user_ns, current_fsgid());
-	b->capability = cfs_curproc_cap_pack();
+	b->mbo_suppgid = suppgid;
+	b->mbo_uid = from_kuid(&init_user_ns, current_uid());
+	b->mbo_gid = from_kgid(&init_user_ns, current_gid());
+	b->mbo_fsuid = from_kuid(&init_user_ns, current_fsuid());
+	b->mbo_fsgid = from_kgid(&init_user_ns, current_fsgid());
+	b->mbo_capability = cfs_curproc_cap_pack();
 }
 
 void mdc_is_subdir_pack(struct ptlrpc_request *req, const struct lu_fid *pfid,
@@ -52,12 +52,12 @@ void mdc_is_subdir_pack(struct ptlrpc_request *req, const struct lu_fid *pfid,
 						    &RMF_MDT_BODY);
 
 	if (pfid) {
-		b->fid1 = *pfid;
-		b->valid = OBD_MD_FLID;
+		b->mbo_fid1 = *pfid;
+		b->mbo_valid = OBD_MD_FLID;
 	}
 	if (cfid)
-		b->fid2 = *cfid;
-	b->flags = flags;
+		b->mbo_fid2 = *cfid;
+	b->mbo_flags = flags;
 }
 
 void mdc_swap_layouts_pack(struct ptlrpc_request *req,
@@ -67,9 +67,9 @@ void mdc_swap_layouts_pack(struct ptlrpc_request *req,
 						    &RMF_MDT_BODY);
 
 	__mdc_pack_body(b, op_data->op_suppgids[0]);
-	b->fid1 = op_data->op_fid1;
-	b->fid2 = op_data->op_fid2;
-	b->valid |= OBD_MD_FLID;
+	b->mbo_fid1 = op_data->op_fid1;
+	b->mbo_fid2 = op_data->op_fid2;
+	b->mbo_valid |= OBD_MD_FLID;
 }
 
 void mdc_pack_body(struct ptlrpc_request *req, const struct lu_fid *fid,
@@ -77,13 +77,13 @@ void mdc_pack_body(struct ptlrpc_request *req, const struct lu_fid *fid,
 {
 	struct mdt_body *b = req_capsule_client_get(&req->rq_pill,
 						    &RMF_MDT_BODY);
-	b->valid = valid;
-	b->eadatasize = ea_size;
-	b->flags = flags;
+	b->mbo_valid = valid;
+	b->mbo_eadatasize = ea_size;
+	b->mbo_flags = flags;
 	__mdc_pack_body(b, suppgid);
 	if (fid) {
-		b->fid1 = *fid;
-		b->valid |= OBD_MD_FLID;
+		b->mbo_fid1 = *fid;
+		b->mbo_valid |= OBD_MD_FLID;
 	}
 }
 
@@ -123,12 +123,12 @@ void mdc_readdir_pack(struct ptlrpc_request *req, __u64 pgoff,
 {
 	struct mdt_body *b = req_capsule_client_get(&req->rq_pill,
 						    &RMF_MDT_BODY);
-	b->fid1 = *fid;
-	b->valid |= OBD_MD_FLID;
-	b->size = pgoff;		       /* !! */
-	b->nlink = size;			/* !! */
+	b->mbo_fid1 = *fid;
+	b->mbo_valid |= OBD_MD_FLID;
+	b->mbo_size = pgoff;		       /* !! */
+	b->mbo_nlink = size;			/* !! */
 	__mdc_pack_body(b, -1);
-	b->mode = LUDA_FID | LUDA_TYPE;
+	b->mbo_mode = LUDA_FID | LUDA_TYPE;
 }
 
 /* packing of MDS records */
@@ -440,18 +440,18 @@ void mdc_getattr_pack(struct ptlrpc_request *req, __u64 valid, int flags,
 	struct mdt_body *b = req_capsule_client_get(&req->rq_pill,
 						    &RMF_MDT_BODY);
 
-	b->valid = valid;
+	b->mbo_valid = valid;
 	if (op_data->op_bias & MDS_CHECK_SPLIT)
-		b->valid |= OBD_MD_FLCKSPLIT;
+		b->mbo_valid |= OBD_MD_FLCKSPLIT;
 	if (op_data->op_bias & MDS_CROSS_REF)
-		b->valid |= OBD_MD_FLCROSSREF;
-	b->eadatasize = ea_size;
-	b->flags = flags;
+		b->mbo_valid |= OBD_MD_FLCROSSREF;
+	b->mbo_eadatasize = ea_size;
+	b->mbo_flags = flags;
 	__mdc_pack_body(b, op_data->op_suppgids[0]);
 
-	b->fid1 = op_data->op_fid1;
-	b->fid2 = op_data->op_fid2;
-	b->valid |= OBD_MD_FLID;
+	b->mbo_fid1 = op_data->op_fid1;
+	b->mbo_fid2 = op_data->op_fid2;
+	b->mbo_valid |= OBD_MD_FLID;
 
 	if (op_data->op_name)
 		mdc_pack_name(req, &RMF_NAME, op_data->op_name,
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 20b15f6..551f3d9 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -240,12 +240,12 @@ static void mdc_realloc_openmsg(struct ptlrpc_request *req,
 
 	/* FIXME: remove this explicit offset. */
 	rc = sptlrpc_cli_enlarge_reqbuf(req, DLM_INTENT_REC_OFF + 4,
-					body->eadatasize);
+					body->mbo_eadatasize);
 	if (rc) {
 		CERROR("Can't enlarge segment %d size to %d\n",
-		       DLM_INTENT_REC_OFF + 4, body->eadatasize);
-		body->valid &= ~OBD_MD_FLEASIZE;
-		body->eadatasize = 0;
+		       DLM_INTENT_REC_OFF + 4, body->mbo_eadatasize);
+		body->mbo_valid &= ~OBD_MD_FLEASIZE;
+		body->mbo_eadatasize = 0;
 	}
 }
 
@@ -608,7 +608,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 			mdc_set_open_replay_data(NULL, NULL, it);
 		}
 
-		if ((body->valid & (OBD_MD_FLDIREA | OBD_MD_FLEASIZE)) != 0) {
+		if ((body->mbo_valid & (OBD_MD_FLDIREA | OBD_MD_FLEASIZE)) != 0) {
 			void *eadata;
 
 			mdc_update_max_ea_from_body(exp, body);
@@ -618,7 +618,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 			 * Eventually, obd_unpackmd() will check the contents.
 			 */
 			eadata = req_capsule_server_sized_get(pill, &RMF_MDT_MD,
-							      body->eadatasize);
+							      body->mbo_eadatasize);
 			if (!eadata)
 				return -EPROTO;
 
@@ -626,7 +626,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 			 * lock
 			 */
 			lvb_data = eadata;
-			lvb_len = body->eadatasize;
+			lvb_len = body->mbo_eadatasize;
 
 			/*
 			 * We save the reply LOV EA in case we have to replay a
@@ -642,20 +642,20 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 
 				if (req_capsule_get_size(pill, &RMF_EADATA,
 							 RCL_CLIENT) <
-				    body->eadatasize)
+				    body->mbo_eadatasize)
 					mdc_realloc_openmsg(req, body);
 				else
 					req_capsule_shrink(pill, &RMF_EADATA,
-							   body->eadatasize,
+							   body->mbo_eadatasize,
 							   RCL_CLIENT);
 
 				req_capsule_set_size(pill, &RMF_EADATA,
 						     RCL_CLIENT,
-						     body->eadatasize);
+						     body->mbo_eadatasize);
 
 				lmm = req_capsule_client_get(pill, &RMF_EADATA);
 				if (lmm)
-					memcpy(lmm, eadata, body->eadatasize);
+					memcpy(lmm, eadata, body->mbo_eadatasize);
 			}
 		}
 	} else if (it->it_op & IT_LAYOUT) {
@@ -935,11 +935,11 @@ static int mdc_finish_intent_lock(struct obd_export *exp,
 		 * op_fid3 - existent fid - if file only open.
 		 * op_fid3 is saved in lmv_intent_open
 		 */
-		if ((!lu_fid_eq(&op_data->op_fid2, &mdt_body->fid1)) &&
-		    (!lu_fid_eq(&op_data->op_fid3, &mdt_body->fid1))) {
+		if ((!lu_fid_eq(&op_data->op_fid2, &mdt_body->mbo_fid1)) &&
+		    (!lu_fid_eq(&op_data->op_fid3, &mdt_body->mbo_fid1))) {
 			CDEBUG(D_DENTRY, "Found stale data "DFID"("DFID")/"DFID
 			       "\n", PFID(&op_data->op_fid2),
-			       PFID(&op_data->op_fid2), PFID(&mdt_body->fid1));
+			       PFID(&op_data->op_fid2), PFID(&mdt_body->mbo_fid1));
 			return -ESTALE;
 		}
 	}
@@ -986,10 +986,10 @@ static int mdc_finish_intent_lock(struct obd_export *exp,
 
 		LDLM_DEBUG(lock, "matching against this");
 
-		LASSERTF(fid_res_name_eq(&mdt_body->fid1,
+		LASSERTF(fid_res_name_eq(&mdt_body->mbo_fid1,
 					 &lock->l_resource->lr_name),
 			 "Lock res_id: "DLDLMRES", fid: "DFID"\n",
-			 PLDLMRES(lock->l_resource), PFID(&mdt_body->fid1));
+			 PLDLMRES(lock->l_resource), PFID(&mdt_body->mbo_fid1));
 		LDLM_LOCK_PUT(lock);
 
 		memcpy(&old_lock, lockh, sizeof(*lockh));
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_reint.c b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
index c3781a6..9bec049 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_reint.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
@@ -177,8 +177,8 @@ int mdc_setattr(struct obd_export *exp, struct md_op_data *op_data,
 
 		epoch = req_capsule_client_get(&req->rq_pill, &RMF_MDT_EPOCH);
 		body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
-		epoch->handle = body->handle;
-		epoch->ioepoch = body->ioepoch;
+		epoch->handle = body->mbo_handle;
+		epoch->ioepoch = body->mbo_ioepoch;
 		req->rq_replay_cb = mdc_replay_open;
 	/** bug 3633, open may be committed and estale answer is not error */
 	} else if (rc == -ESTALE && (op_data->op_flags & MF_SOM_CHANGE)) {
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index e26d0d7..74ddec3 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -100,7 +100,7 @@ static int mdc_getstatus(struct obd_export *exp, struct lu_fid *rootfid)
 		goto out;
 	}
 
-	*rootfid = body->fid1;
+	*rootfid = body->mbo_fid1;
 	CDEBUG(D_NET,
 	       "root fid="DFID", last_committed=%llu\n",
 	       PFID(rootfid),
@@ -138,12 +138,12 @@ static int mdc_getattr_common(struct obd_export *exp,
 	if (!body)
 		return -EPROTO;
 
-	CDEBUG(D_NET, "mode: %o\n", body->mode);
+	CDEBUG(D_NET, "mode: %o\n", body->mbo_mode);
 
 	mdc_update_max_ea_from_body(exp, body);
-	if (body->eadatasize != 0) {
+	if (body->mbo_eadatasize != 0) {
 		eadata = req_capsule_server_sized_get(pill, &RMF_MDT_MD,
-						      body->eadatasize);
+						      body->mbo_eadatasize);
 		if (!eadata)
 			return -EPROTO;
 	}
@@ -399,15 +399,15 @@ static int mdc_unpack_acl(struct ptlrpc_request *req, struct lustre_md *md)
 	void		   *buf;
 	int		     rc;
 
-	if (!body->aclsize)
+	if (!body->mbo_aclsize)
 		return 0;
 
-	buf = req_capsule_server_sized_get(pill, &RMF_ACL, body->aclsize);
+	buf = req_capsule_server_sized_get(pill, &RMF_ACL, body->mbo_aclsize);
 
 	if (!buf)
 		return -EPROTO;
 
-	acl = posix_acl_from_xattr(&init_user_ns, buf, body->aclsize);
+	acl = posix_acl_from_xattr(&init_user_ns, buf, body->mbo_aclsize);
 	if (!acl)
 		return 0;
 
@@ -445,24 +445,24 @@ static int mdc_get_lustre_md(struct obd_export *exp,
 
 	md->body = req_capsule_server_get(pill, &RMF_MDT_BODY);
 
-	if (md->body->valid & OBD_MD_FLEASIZE) {
+	if (md->body->mbo_valid & OBD_MD_FLEASIZE) {
 		int lmmsize;
 		struct lov_mds_md *lmm;
 
-		if (!S_ISREG(md->body->mode)) {
+		if (!S_ISREG(md->body->mbo_mode)) {
 			CDEBUG(D_INFO,
 			       "OBD_MD_FLEASIZE set, should be a regular file, but is not\n");
 			rc = -EPROTO;
 			goto out;
 		}
 
-		if (md->body->eadatasize == 0) {
+		if (md->body->mbo_eadatasize == 0) {
 			CDEBUG(D_INFO,
 			       "OBD_MD_FLEASIZE set, but eadatasize 0\n");
 			rc = -EPROTO;
 			goto out;
 		}
-		lmmsize = md->body->eadatasize;
+		lmmsize = md->body->mbo_eadatasize;
 		lmm = req_capsule_server_sized_get(pill, &RMF_MDT_MD, lmmsize);
 		if (!lmm) {
 			rc = -EPROTO;
@@ -481,24 +481,24 @@ static int mdc_get_lustre_md(struct obd_export *exp,
 			goto out;
 		}
 
-	} else if (md->body->valid & OBD_MD_FLDIREA) {
+	} else if (md->body->mbo_valid & OBD_MD_FLDIREA) {
 		int lmvsize;
 		struct lov_mds_md *lmv;
 
-		if (!S_ISDIR(md->body->mode)) {
+		if (!S_ISDIR(md->body->mbo_mode)) {
 			CDEBUG(D_INFO,
 			       "OBD_MD_FLDIREA set, should be a directory, but is not\n");
 			rc = -EPROTO;
 			goto out;
 		}
 
-		if (md->body->eadatasize == 0) {
+		if (md->body->mbo_eadatasize == 0) {
 			CDEBUG(D_INFO,
 			       "OBD_MD_FLDIREA is set, but eadatasize 0\n");
 			return -EPROTO;
 		}
-		if (md->body->valid & OBD_MD_MEA) {
-			lmvsize = md->body->eadatasize;
+		if (md->body->mbo_valid & OBD_MD_MEA) {
+			lmvsize = md->body->mbo_eadatasize;
 			lmv = req_capsule_server_sized_get(pill, &RMF_MDT_MD,
 							   lmvsize);
 			if (!lmv) {
@@ -522,12 +522,12 @@ static int mdc_get_lustre_md(struct obd_export *exp,
 	}
 	rc = 0;
 
-	if (md->body->valid & OBD_MD_FLACL) {
+	if (md->body->mbo_valid & OBD_MD_FLACL) {
 		/* for ACL, it's possible that FLACL is set but aclsize is zero.
 		 * only when aclsize != 0 there's an actual segment for ACL
 		 * in reply buffer.
 		 */
-		if (md->body->aclsize) {
+		if (md->body->mbo_aclsize) {
 			rc = mdc_unpack_acl(req, md);
 			if (rc)
 				goto out;
@@ -582,9 +582,9 @@ void mdc_replay_open(struct ptlrpc_request *req)
 
 		file_fh = &och->och_fh;
 		CDEBUG(D_HA, "updating handle from %#llx to %#llx\n",
-		       file_fh->cookie, body->handle.cookie);
+		       file_fh->cookie, body->mbo_handle.cookie);
 		old = *file_fh;
-		*file_fh = body->handle;
+		*file_fh = body->mbo_handle;
 	}
 	close_req = mod->mod_close_req;
 	if (close_req) {
@@ -599,7 +599,7 @@ void mdc_replay_open(struct ptlrpc_request *req)
 		if (och)
 			LASSERT(!memcmp(&old, &epoch->handle, sizeof(old)));
 		DEBUG_REQ(D_HA, close_req, "updating close body with new fh");
-		epoch->handle = body->handle;
+		epoch->handle = body->mbo_handle;
 	}
 }
 
@@ -681,11 +681,11 @@ int mdc_set_open_replay_data(struct obd_export *exp,
 		spin_unlock(&open_req->rq_lock);
 	}
 
-	rec->cr_fid2 = body->fid1;
-	rec->cr_ioepoch = body->ioepoch;
-	rec->cr_old_handle.cookie = body->handle.cookie;
+	rec->cr_fid2 = body->mbo_fid1;
+	rec->cr_ioepoch = body->mbo_ioepoch;
+	rec->cr_old_handle.cookie = body->mbo_handle.cookie;
 	open_req->rq_replay_cb = mdc_replay_open;
-	if (!fid_is_sane(&body->fid1)) {
+	if (!fid_is_sane(&body->mbo_fid1)) {
 		DEBUG_REQ(D_ERROR, open_req,
 			  "Saving replay request with insane fid");
 		LBUG();
@@ -746,7 +746,7 @@ static void mdc_close_handle_reply(struct ptlrpc_request *req,
 		epoch = req_capsule_client_get(&req->rq_pill, &RMF_MDT_EPOCH);
 
 		epoch->flags |= MF_SOM_AU;
-		if (repbody->valid & OBD_MD_FLGETATTRLOCK)
+		if (repbody->mbo_valid & OBD_MD_FLGETATTRLOCK)
 			op_data->op_flags |= MF_GETATTR_LOCK;
 	}
 }
diff --git a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
index 6ddc9c7..465698b 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
@@ -1674,35 +1674,35 @@ EXPORT_SYMBOL(lustre_swab_lquota_lvb);
 
 void lustre_swab_mdt_body(struct mdt_body *b)
 {
-	lustre_swab_lu_fid(&b->fid1);
-	lustre_swab_lu_fid(&b->fid2);
+	lustre_swab_lu_fid(&b->mbo_fid1);
+	lustre_swab_lu_fid(&b->mbo_fid2);
 	/* handle is opaque */
-	__swab64s(&b->valid);
-	__swab64s(&b->size);
-	__swab64s(&b->mtime);
-	__swab64s(&b->atime);
-	__swab64s(&b->ctime);
-	__swab64s(&b->blocks);
-	__swab64s(&b->ioepoch);
-	__swab64s(&b->t_state);
-	__swab32s(&b->fsuid);
-	__swab32s(&b->fsgid);
-	__swab32s(&b->capability);
-	__swab32s(&b->mode);
-	__swab32s(&b->uid);
-	__swab32s(&b->gid);
-	__swab32s(&b->flags);
-	__swab32s(&b->rdev);
-	__swab32s(&b->nlink);
-	CLASSERT(offsetof(typeof(*b), unused2) != 0);
-	__swab32s(&b->suppgid);
-	__swab32s(&b->eadatasize);
-	__swab32s(&b->aclsize);
-	__swab32s(&b->max_mdsize);
-	__swab32s(&b->max_cookiesize);
-	__swab32s(&b->uid_h);
-	__swab32s(&b->gid_h);
-	CLASSERT(offsetof(typeof(*b), padding_5) != 0);
+	__swab64s(&b->mbo_valid);
+	__swab64s(&b->mbo_size);
+	__swab64s(&b->mbo_mtime);
+	__swab64s(&b->mbo_atime);
+	__swab64s(&b->mbo_ctime);
+	__swab64s(&b->mbo_blocks);
+	__swab64s(&b->mbo_ioepoch);
+	__swab64s(&b->mbo_t_state);
+	__swab32s(&b->mbo_fsuid);
+	__swab32s(&b->mbo_fsgid);
+	__swab32s(&b->mbo_capability);
+	__swab32s(&b->mbo_mode);
+	__swab32s(&b->mbo_uid);
+	__swab32s(&b->mbo_gid);
+	__swab32s(&b->mbo_flags);
+	__swab32s(&b->mbo_rdev);
+	__swab32s(&b->mbo_nlink);
+	CLASSERT(offsetof(typeof(*b), mbo_unused2) != 0);
+	__swab32s(&b->mbo_suppgid);
+	__swab32s(&b->mbo_eadatasize);
+	__swab32s(&b->mbo_aclsize);
+	__swab32s(&b->mbo_max_mdsize);
+	__swab32s(&b->mbo_max_cookiesize);
+	__swab32s(&b->mbo_uid_h);
+	__swab32s(&b->mbo_gid_h);
+	CLASSERT(offsetof(typeof(*b), mbo_padding_5) != 0);
 }
 EXPORT_SYMBOL(lustre_swab_mdt_body);
 
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index 8dbaf32..60d03dd 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -1350,7 +1350,7 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct lov_mds_md_v1, lmm_objects[0]));
 	LASSERTF((int)sizeof(((struct lov_mds_md_v1 *)0)->lmm_objects[0]) == 24, "found %lld\n",
 		 (long long)(int)sizeof(((struct lov_mds_md_v1 *)0)->lmm_objects[0]));
-	CLASSERT(LOV_MAGIC_V1 == 0x0BD10BD0);
+	CLASSERT(LOV_MAGIC_V1 == (0x0BD10000 | 0x0BD0));
 
 	/* Checks for struct lov_mds_md_v3 */
 	LASSERTF((int)sizeof(struct lov_mds_md_v3) == 48, "found %lld\n",
@@ -1388,7 +1388,7 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct lov_mds_md_v3, lmm_objects[0]));
 	LASSERTF((int)sizeof(((struct lov_mds_md_v3 *)0)->lmm_objects[0]) == 24, "found %lld\n",
 		 (long long)(int)sizeof(((struct lov_mds_md_v3 *)0)->lmm_objects[0]));
-	CLASSERT(LOV_MAGIC_V3 == 0x0BD30BD0);
+	CLASSERT(LOV_MAGIC_V3 == (0x0BD30000 | 0x0BD0));
 	LASSERTF(LOV_PATTERN_RAID0 == 0x00000001UL, "found 0x%.8xUL\n",
 		(unsigned)LOV_PATTERN_RAID0);
 	LASSERTF(LOV_PATTERN_RAID1 == 0x00000002UL, "found 0x%.8xUL\n",
@@ -1667,139 +1667,139 @@ void lustre_assert_wire_constants(void)
 	/* Checks for struct mdt_body */
 	LASSERTF((int)sizeof(struct mdt_body) == 216, "found %lld\n",
 		 (long long)(int)sizeof(struct mdt_body));
-	LASSERTF((int)offsetof(struct mdt_body, fid1) == 0, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, fid1));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->fid1) == 16, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->fid1));
-	LASSERTF((int)offsetof(struct mdt_body, fid2) == 16, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, fid2));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->fid2) == 16, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->fid2));
-	LASSERTF((int)offsetof(struct mdt_body, handle) == 32, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, handle));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->handle) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->handle));
-	LASSERTF((int)offsetof(struct mdt_body, valid) == 40, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, valid));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->valid) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->valid));
-	LASSERTF((int)offsetof(struct mdt_body, size) == 48, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, size));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->size) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->size));
-	LASSERTF((int)offsetof(struct mdt_body, mtime) == 56, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, mtime));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->mtime) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->mtime));
-	LASSERTF((int)offsetof(struct mdt_body, atime) == 64, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, atime));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->atime) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->atime));
-	LASSERTF((int)offsetof(struct mdt_body, ctime) == 72, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, ctime));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->ctime) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->ctime));
-	LASSERTF((int)offsetof(struct mdt_body, blocks) == 80, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, blocks));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->blocks) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->blocks));
-	LASSERTF((int)offsetof(struct mdt_body, t_state) == 96, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, t_state));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->t_state) == 8,
+	LASSERTF((int)offsetof(struct mdt_body, mbo_fid1) == 0, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_fid1));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_fid1) == 16, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_fid1));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_fid2) == 16, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_fid2));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_fid2) == 16, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_fid2));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_handle) == 32, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_handle));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_handle) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_handle));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_valid) == 40, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_valid));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_valid) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_valid));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_size) == 48, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_size));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_size) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_size));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_mtime) == 56, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_mtime));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_mtime) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_mtime));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_atime) == 64, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_atime));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_atime) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_atime));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_ctime) == 72, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_ctime));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_ctime) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_ctime));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_blocks) == 80, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_blocks));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_blocks) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_blocks));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_t_state) == 96, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_t_state));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_t_state) == 8,
 		 "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->t_state));
-	LASSERTF((int)offsetof(struct mdt_body, fsuid) == 104, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, fsuid));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->fsuid) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->fsuid));
-	LASSERTF((int)offsetof(struct mdt_body, fsgid) == 108, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, fsgid));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->fsgid) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->fsgid));
-	LASSERTF((int)offsetof(struct mdt_body, capability) == 112, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, capability));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->capability) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->capability));
-	LASSERTF((int)offsetof(struct mdt_body, mode) == 116, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, mode));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->mode) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->mode));
-	LASSERTF((int)offsetof(struct mdt_body, uid) == 120, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, uid));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->uid) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->uid));
-	LASSERTF((int)offsetof(struct mdt_body, gid) == 124, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, gid));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->gid) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->gid));
-	LASSERTF((int)offsetof(struct mdt_body, flags) == 128, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, flags));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->flags) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->flags));
-	LASSERTF((int)offsetof(struct mdt_body, rdev) == 132, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, rdev));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->rdev) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->rdev));
-	LASSERTF((int)offsetof(struct mdt_body, nlink) == 136, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, nlink));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->nlink) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->nlink));
-	LASSERTF((int)offsetof(struct mdt_body, unused2) == 140, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, unused2));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->unused2) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->unused2));
-	LASSERTF((int)offsetof(struct mdt_body, suppgid) == 144, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, suppgid));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->suppgid) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->suppgid));
-	LASSERTF((int)offsetof(struct mdt_body, eadatasize) == 148, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, eadatasize));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->eadatasize) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->eadatasize));
-	LASSERTF((int)offsetof(struct mdt_body, aclsize) == 152, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, aclsize));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->aclsize) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->aclsize));
-	LASSERTF((int)offsetof(struct mdt_body, max_mdsize) == 156, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, max_mdsize));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->max_mdsize) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->max_mdsize));
-	LASSERTF((int)offsetof(struct mdt_body, max_cookiesize) == 160, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, max_cookiesize));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->max_cookiesize) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->max_cookiesize));
-	LASSERTF((int)offsetof(struct mdt_body, uid_h) == 164, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, uid_h));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->uid_h) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->uid_h));
-	LASSERTF((int)offsetof(struct mdt_body, gid_h) == 168, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, gid_h));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->gid_h) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->gid_h));
-	LASSERTF((int)offsetof(struct mdt_body, padding_5) == 172, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, padding_5));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->padding_5) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->padding_5));
-	LASSERTF((int)offsetof(struct mdt_body, padding_6) == 176, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, padding_6));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->padding_6) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->padding_6));
-	LASSERTF((int)offsetof(struct mdt_body, padding_7) == 184, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, padding_7));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->padding_7) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->padding_7));
-	LASSERTF((int)offsetof(struct mdt_body, padding_8) == 192, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, padding_8));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->padding_8) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->padding_8));
-	LASSERTF((int)offsetof(struct mdt_body, padding_9) == 200, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, padding_9));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->padding_9) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->padding_9));
-	LASSERTF((int)offsetof(struct mdt_body, padding_10) == 208, "found %lld\n",
-		 (long long)(int)offsetof(struct mdt_body, padding_10));
-	LASSERTF((int)sizeof(((struct mdt_body *)0)->padding_10) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct mdt_body *)0)->padding_10));
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_t_state));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_fsuid) == 104, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_fsuid));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_fsuid) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_fsuid));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_fsgid) == 108, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_fsgid));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_fsgid) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_fsgid));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_capability) == 112, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_capability));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_capability) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_capability));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_mode) == 116, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_mode));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_mode) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_mode));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_uid) == 120, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_uid));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_uid) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_uid));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_gid) == 124, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_gid));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_gid) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_gid));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_flags) == 128, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_flags));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_flags) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_flags));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_rdev) == 132, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_rdev));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_rdev) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_rdev));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_nlink) == 136, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_nlink));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_nlink) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_nlink));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_unused2) == 140, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_unused2));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_unused2) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_unused2));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_suppgid) == 144, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_suppgid));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_suppgid) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_suppgid));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_eadatasize) == 148, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_eadatasize));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_eadatasize) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_eadatasize));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_aclsize) == 152, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_aclsize));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_aclsize) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_aclsize));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_max_mdsize) == 156, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_max_mdsize));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_max_mdsize) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_max_mdsize));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_max_cookiesize) == 160, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_max_cookiesize));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_max_cookiesize) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_max_cookiesize));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_uid_h) == 164, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_uid_h));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_uid_h) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_uid_h));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_gid_h) == 168, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_gid_h));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_gid_h) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_gid_h));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_padding_5) == 172, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_padding_5));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_padding_5) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_padding_5));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_padding_6) == 176, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_padding_6));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_padding_6) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_padding_6));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_padding_7) == 184, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_padding_7));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_padding_7) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_padding_7));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_padding_8) == 192, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_padding_8));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_padding_8) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_padding_8));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_padding_9) == 200, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_padding_9));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_padding_9) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_padding_9));
+	LASSERTF((int)offsetof(struct mdt_body, mbo_padding_10) == 208, "found %lld\n",
+		 (long long)(int)offsetof(struct mdt_body, mbo_padding_10));
+	LASSERTF((int)sizeof(((struct mdt_body *)0)->mbo_padding_10) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct mdt_body *)0)->mbo_padding_10));
 	LASSERTF(MDS_FMODE_CLOSED == 000000000000UL, "found 0%.11oUL\n",
 		MDS_FMODE_CLOSED);
 	LASSERTF(MDS_FMODE_EXEC == 000000000004UL, "found 0%.11oUL\n",
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 56/80] staging: lustre: clio: Reduce memory overhead of per-page allocation
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, James Simmons

From: Jinshan Xiong <jinshan.xiong@intel.com>

A page in clio used to occupy 584 bytes, so it was allocated from the
size-1024 slab cache. This patch reduces the per-page overhead to 512
bytes so that it fits in the size-512 cache instead.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4793
Reviewed-on: http://review.whamcloud.com/10070
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |   37 +++++---------------
 drivers/staging/lustre/lustre/llite/vvp_internal.h |    6 ++--
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |    4 +-
 drivers/staging/lustre/lustre/lov/lov_io.c         |    6 +--
 drivers/staging/lustre/lustre/lov/lov_page.c       |    1 +
 drivers/staging/lustre/lustre/obdclass/cl_io.c     |   10 +-----
 drivers/staging/lustre/lustre/obdclass/cl_page.c   |   12 +-----
 drivers/staging/lustre/lustre/osc/osc_internal.h   |    1 -
 drivers/staging/lustre/lustre/osc/osc_io.c         |    7 +++-
 drivers/staging/lustre/lustre/osc/osc_request.c    |    6 ---
 10 files changed, 26 insertions(+), 64 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 0fa71a5..d269b32 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -690,17 +690,6 @@ enum cl_page_type {
 };
 
 /**
- * Flags maintained for every cl_page.
- */
-enum cl_page_flags {
-	/**
-	 * Set when pagein completes. Used for debugging (read completes at
-	 * most once for a page).
-	 */
-	CPF_READ_COMPLETED = 1 << 0
-};
-
-/**
  * Fields are protected by the lock on struct page, except for atomics and
  * immutables.
  *
@@ -712,26 +701,23 @@ enum cl_page_flags {
 struct cl_page {
 	/** Reference counter. */
 	atomic_t	     cp_ref;
+	/** Transfer error. */
+	int			 cp_error;
 	/** An object this page is a part of. Immutable after creation. */
 	struct cl_object	*cp_obj;
-	/** List of slices. Immutable after creation. */
-	struct list_head	       cp_layers;
 	/** vmpage */
 	struct page		*cp_vmpage;
+	/** Linkage of pages within group. Pages must be owned */
+	struct list_head	 cp_batch;
+	/** List of slices. Immutable after creation. */
+	struct list_head	 cp_layers;
+	/** Linkage of pages within cl_req. */
+	struct list_head         cp_flight;
 	/**
 	 * Page state. This field is const to avoid accidental update, it is
 	 * modified only internally within cl_page.c. Protected by a VM lock.
 	 */
 	const enum cl_page_state cp_state;
-	/** Linkage of pages within group. Protected by cl_page::cp_mutex. */
-	struct list_head		cp_batch;
-	/** Mutex serializing membership of a page in a batch. */
-	struct mutex		cp_mutex;
-	/** Linkage of pages within cl_req. */
-	struct list_head	       cp_flight;
-	/** Transfer error. */
-	int		      cp_error;
-
 	/**
 	 * Page type. Only CPT_TRANSIENT is used so far. Immutable after
 	 * creation.
@@ -744,10 +730,6 @@ struct cl_page {
 	 */
 	struct cl_io	    *cp_owner;
 	/**
-	 * Debug information, the task is owning the page.
-	 */
-	struct task_struct	*cp_task;
-	/**
 	 * Owning IO request in cl_page_state::CPS_PAGEOUT and
 	 * cl_page_state::CPS_PAGEIN states. This field is maintained only in
 	 * the top-level pages. Protected by a VM lock.
@@ -759,8 +741,6 @@ struct cl_page {
 	struct lu_ref_link       cp_obj_ref;
 	/** Link to a queue, for debugging. */
 	struct lu_ref_link       cp_queue_ref;
-	/** Per-page flags from enum cl_page_flags. Protected by a VM lock. */
-	unsigned                 cp_flags;
 	/** Assigned if doing a sync_io */
 	struct cl_sync_io       *cp_sync_io;
 };
@@ -2200,6 +2180,7 @@ static inline void cl_object_page_init(struct cl_object *clob, int size)
 {
 	clob->co_slice_off = cl_object_header(clob)->coh_page_bufsize;
 	cl_object_header(clob)->coh_page_bufsize += cfs_size_round(size);
+	WARN_ON(cl_object_header(clob)->coh_page_bufsize > 512);
 }
 
 static inline void *cl_object_page_slice(struct cl_object *clob,
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 79fc428..99437b8 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -247,9 +247,9 @@ struct vvp_object {
  */
 struct vvp_page {
 	struct cl_page_slice vpg_cl;
-	int		  vpg_defer_uptodate;
-	int		  vpg_ra_used;
-	int		  vpg_write_queued;
+	unsigned int	vpg_defer_uptodate:1,
+			vpg_ra_used:1,
+			vpg_write_queued:1;
 	/**
 	 * Non-empty iff this page is already counted in
 	 * vvp_object::vob_pending_list. This list is only used as a flag,
diff --git a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
index 9740568..43d1a3f 100644
--- a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
@@ -289,8 +289,8 @@ struct lov_lock {
 };
 
 struct lov_page {
-	struct cl_page_slice lps_cl;
-	int		  lps_invalid;
+	struct cl_page_slice	lps_cl;
+	unsigned int		lps_stripe; /* stripe index */
 };
 
 /*
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 95126c3..5d47a5a 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -244,14 +244,12 @@ void lov_sub_put(struct lov_io_sub *sub)
 
 int lov_page_stripe(const struct cl_page *page)
 {
-	struct lovsub_object *subobj;
 	const struct cl_page_slice *slice;
 
-	slice = cl_page_at(page, &lovsub_device_type);
+	slice = cl_page_at(page, &lov_device_type);
 	LASSERT(slice->cpl_obj);
 
-	subobj = cl2lovsub(slice->cpl_obj);
-	return subobj->lso_index;
+	return cl2lov_page(slice)->lps_stripe;
 }
 
 struct lov_io_sub *lov_page_subio(const struct lu_env *env, struct lov_io *lio,
diff --git a/drivers/staging/lustre/lustre/lov/lov_page.c b/drivers/staging/lustre/lustre/lov/lov_page.c
index 45b5ae9..00bfaba 100644
--- a/drivers/staging/lustre/lustre/lov/lov_page.c
+++ b/drivers/staging/lustre/lustre/lov/lov_page.c
@@ -129,6 +129,7 @@ int lov_page_init_raid0(const struct lu_env *env, struct cl_object *obj,
 	rc = lov_stripe_offset(loo->lo_lsm, offset, stripe, &suboff);
 	LASSERT(rc == 0);
 
+	lpg->lps_stripe = stripe;
 	cl_page_slice_add(page, &lpg->lps_cl, obj, index, &lov_raid0_page_ops);
 
 	sub = lov_sub_get(env, lio, stripe);
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_io.c b/drivers/staging/lustre/lustre/obdclass/cl_io.c
index e72f1fc..4516fff 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_io.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_io.c
@@ -859,9 +859,6 @@ void cl_page_list_add(struct cl_page_list *plist, struct cl_page *page)
 	LASSERT(page->cp_owner);
 	LINVRNT(plist->pl_owner == current);
 
-	lockdep_off();
-	mutex_lock(&page->cp_mutex);
-	lockdep_on();
 	LASSERT(list_empty(&page->cp_batch));
 	list_add_tail(&page->cp_batch, &plist->pl_pages);
 	++plist->pl_nr;
@@ -877,12 +874,10 @@ void cl_page_list_del(const struct lu_env *env, struct cl_page_list *plist,
 		      struct cl_page *page)
 {
 	LASSERT(plist->pl_nr > 0);
+	LASSERT(cl_page_is_vmlocked(env, page));
 	LINVRNT(plist->pl_owner == current);
 
 	list_del_init(&page->cp_batch);
-	lockdep_off();
-	mutex_unlock(&page->cp_mutex);
-	lockdep_on();
 	--plist->pl_nr;
 	lu_ref_del_at(&page->cp_reference, &page->cp_queue_ref, "queue", plist);
 	cl_page_put(env, page);
@@ -959,9 +954,6 @@ void cl_page_list_disown(const struct lu_env *env,
 		LASSERT(plist->pl_nr > 0);
 
 		list_del_init(&page->cp_batch);
-		lockdep_off();
-		mutex_unlock(&page->cp_mutex);
-		lockdep_on();
 		--plist->pl_nr;
 		/*
 		 * cl_page_disown0 rather than usual cl_page_disown() is used,
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_page.c b/drivers/staging/lustre/lustre/obdclass/cl_page.c
index db2dc6b..bd71859 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_page.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_page.c
@@ -151,7 +151,6 @@ struct cl_page *cl_page_alloc(const struct lu_env *env,
 		INIT_LIST_HEAD(&page->cp_layers);
 		INIT_LIST_HEAD(&page->cp_batch);
 		INIT_LIST_HEAD(&page->cp_flight);
-		mutex_init(&page->cp_mutex);
 		lu_ref_init(&page->cp_reference);
 		head = o->co_lu.lo_header;
 		list_for_each_entry(o, &head->loh_layers, co_lu.lo_linkage) {
@@ -478,7 +477,6 @@ static void cl_page_owner_clear(struct cl_page *page)
 		LASSERT(page->cp_owner->ci_owned_nr > 0);
 		page->cp_owner->ci_owned_nr--;
 		page->cp_owner = NULL;
-		page->cp_task = NULL;
 	}
 }
 
@@ -562,7 +560,6 @@ static int cl_page_own0(const struct lu_env *env, struct cl_io *io,
 			PASSERT(env, pg, !pg->cp_owner);
 			PASSERT(env, pg, !pg->cp_req);
 			pg->cp_owner = cl_io_top(io);
-			pg->cp_task  = current;
 			cl_page_owner_set(pg);
 			if (pg->cp_state != CPS_FREEING) {
 				cl_page_state_set(env, pg, CPS_OWNED);
@@ -619,7 +616,6 @@ void cl_page_assume(const struct lu_env *env,
 	cl_page_invoid(env, io, pg, CL_PAGE_OP(cpo_assume));
 	PASSERT(env, pg, !pg->cp_owner);
 	pg->cp_owner = cl_io_top(io);
-	pg->cp_task = current;
 	cl_page_owner_set(pg);
 	cl_page_state_set(env, pg, CPS_OWNED);
 }
@@ -860,10 +856,6 @@ void cl_page_completion(const struct lu_env *env,
 	PASSERT(env, pg, pg->cp_state == cl_req_type_state(crt));
 
 	CL_PAGE_HEADER(D_TRACE, env, pg, "%d %d\n", crt, ioret);
-	if (crt == CRT_READ && ioret == 0) {
-		PASSERT(env, pg, !(pg->cp_flags & CPF_READ_COMPLETED));
-		pg->cp_flags |= CPF_READ_COMPLETED;
-	}
 
 	cl_page_state_set(env, pg, CPS_CACHED);
 	if (crt >= CRT_NR)
@@ -989,10 +981,10 @@ void cl_page_header_print(const struct lu_env *env, void *cookie,
 			  lu_printer_t printer, const struct cl_page *pg)
 {
 	(*printer)(env, cookie,
-		   "page@%p[%d %p %d %d %d %p %p %#x]\n",
+		   "page@%p[%d %p %d %d %d %p %p]\n",
 		   pg, atomic_read(&pg->cp_ref), pg->cp_obj,
 		   pg->cp_state, pg->cp_error, pg->cp_type,
-		   pg->cp_owner, pg->cp_req, pg->cp_flags);
+		   pg->cp_owner, pg->cp_req);
 }
 EXPORT_SYMBOL(cl_page_header_print);
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_internal.h b/drivers/staging/lustre/lustre/osc/osc_internal.h
index 7a27f09..2038885 100644
--- a/drivers/staging/lustre/lustre/osc/osc_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_internal.h
@@ -71,7 +71,6 @@ struct osc_async_page {
 	struct client_obd       *oap_cli;
 	struct osc_object       *oap_obj;
 
-	struct ldlm_lock	*oap_ldlm_lock;
 	spinlock_t		 oap_lock;
 };
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c b/drivers/staging/lustre/lustre/osc/osc_io.c
index 6e3dcd3..69424ea 100644
--- a/drivers/staging/lustre/lustre/osc/osc_io.c
+++ b/drivers/staging/lustre/lustre/osc/osc_io.c
@@ -163,7 +163,6 @@ static int osc_io_submit(const struct lu_env *env,
 			continue;
 		}
 
-		cl_page_list_move(qout, qin, page);
 		spin_lock(&oap->oap_lock);
 		oap->oap_async_flags = ASYNC_URGENT|ASYNC_READY;
 		oap->oap_async_flags |= ASYNC_COUNT_STABLE;
@@ -171,6 +170,12 @@ static int osc_io_submit(const struct lu_env *env,
 
 		osc_page_submit(env, opg, crt, brw_flags);
 		list_add_tail(&oap->oap_pending_item, &list);
+
+		if (page->cp_sync_io)
+			cl_page_list_move(qout, qin, page);
+		else /* async IO */
+			cl_page_list_del(env, qin, page);
+
 		if (++queued == max_pages) {
 			queued = 0;
 			result = osc_queue_sync_pages(env, osc, &list, cmd,
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index d231827..042a081 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -1882,7 +1882,6 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 	struct osc_async_page *tmp;
 	struct cl_req *clerq = NULL;
 	enum cl_req_type crt = (cmd & OBD_BRW_WRITE) ? CRT_WRITE : CRT_READ;
-	struct ldlm_lock *lock = NULL;
 	struct cl_req_attr *crattr = NULL;
 	u64 starting_offset = OBD_OBJECT_EOF;
 	u64 ending_offset = 0;
@@ -1948,7 +1947,6 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 				rc = PTR_ERR(clerq);
 				goto out;
 			}
-			lock = oap->oap_ldlm_lock;
 		}
 		if (mem_tight)
 			oap->oap_brw_flags |= OBD_BRW_MEMALLOC;
@@ -1965,10 +1963,6 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 	LASSERT(clerq);
 	crattr->cra_oa = oa;
 	cl_req_attr_set(env, clerq, crattr, ~0ULL);
-	if (lock) {
-		oa->o_handle = lock->l_remote_handle;
-		oa->o_valid |= OBD_MD_FLHANDLE;
-	}
 
 	rc = cl_req_prep(env, clerq);
 	if (rc != 0) {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [lustre-devel] [PATCH 56/80] staging: lustre: clio: Reduce memory overhead of per-page allocation
@ 2016-08-16 20:19   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, James Simmons

From: Jinshan Xiong <jinshan.xiong@intel.com>

A page in clio used to occupy 584 bytes, which will use size-1024
slab cache. This patch reduces the per-page overhead to 512 bytes
so it can use size-512 instead.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4793
Reviewed-on: http://review.whamcloud.com/10070
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |   37 +++++---------------
 drivers/staging/lustre/lustre/llite/vvp_internal.h |    6 ++--
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |    4 +-
 drivers/staging/lustre/lustre/lov/lov_io.c         |    6 +--
 drivers/staging/lustre/lustre/lov/lov_page.c       |    1 +
 drivers/staging/lustre/lustre/obdclass/cl_io.c     |   10 +-----
 drivers/staging/lustre/lustre/obdclass/cl_page.c   |   12 +-----
 drivers/staging/lustre/lustre/osc/osc_internal.h   |    1 -
 drivers/staging/lustre/lustre/osc/osc_io.c         |    7 +++-
 drivers/staging/lustre/lustre/osc/osc_request.c    |    6 ---
 10 files changed, 26 insertions(+), 64 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 0fa71a5..d269b32 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -690,17 +690,6 @@ enum cl_page_type {
 };
 
 /**
- * Flags maintained for every cl_page.
- */
-enum cl_page_flags {
-	/**
-	 * Set when pagein completes. Used for debugging (read completes at
-	 * most once for a page).
-	 */
-	CPF_READ_COMPLETED = 1 << 0
-};
-
-/**
  * Fields are protected by the lock on struct page, except for atomics and
  * immutables.
  *
@@ -712,26 +701,23 @@ enum cl_page_flags {
 struct cl_page {
 	/** Reference counter. */
 	atomic_t	     cp_ref;
+	/** Transfer error. */
+	int			 cp_error;
 	/** An object this page is a part of. Immutable after creation. */
 	struct cl_object	*cp_obj;
-	/** List of slices. Immutable after creation. */
-	struct list_head	       cp_layers;
 	/** vmpage */
 	struct page		*cp_vmpage;
+	/** Linkage of pages within group. Pages must be owned */
+	struct list_head	 cp_batch;
+	/** List of slices. Immutable after creation. */
+	struct list_head	 cp_layers;
+	/** Linkage of pages within cl_req. */
+	struct list_head         cp_flight;
 	/**
 	 * Page state. This field is const to avoid accidental update, it is
 	 * modified only internally within cl_page.c. Protected by a VM lock.
 	 */
 	const enum cl_page_state cp_state;
-	/** Linkage of pages within group. Protected by cl_page::cp_mutex. */
-	struct list_head		cp_batch;
-	/** Mutex serializing membership of a page in a batch. */
-	struct mutex		cp_mutex;
-	/** Linkage of pages within cl_req. */
-	struct list_head	       cp_flight;
-	/** Transfer error. */
-	int		      cp_error;
-
 	/**
 	 * Page type. Only CPT_TRANSIENT is used so far. Immutable after
 	 * creation.
@@ -744,10 +730,6 @@ struct cl_page {
 	 */
 	struct cl_io	    *cp_owner;
 	/**
-	 * Debug information, the task is owning the page.
-	 */
-	struct task_struct	*cp_task;
-	/**
 	 * Owning IO request in cl_page_state::CPS_PAGEOUT and
 	 * cl_page_state::CPS_PAGEIN states. This field is maintained only in
 	 * the top-level pages. Protected by a VM lock.
@@ -759,8 +741,6 @@ struct cl_page {
 	struct lu_ref_link       cp_obj_ref;
 	/** Link to a queue, for debugging. */
 	struct lu_ref_link       cp_queue_ref;
-	/** Per-page flags from enum cl_page_flags. Protected by a VM lock. */
-	unsigned                 cp_flags;
 	/** Assigned if doing a sync_io */
 	struct cl_sync_io       *cp_sync_io;
 };
@@ -2200,6 +2180,7 @@ static inline void cl_object_page_init(struct cl_object *clob, int size)
 {
 	clob->co_slice_off = cl_object_header(clob)->coh_page_bufsize;
 	cl_object_header(clob)->coh_page_bufsize += cfs_size_round(size);
+	WARN_ON(cl_object_header(clob)->coh_page_bufsize > 512);
 }
 
 static inline void *cl_object_page_slice(struct cl_object *clob,
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 79fc428..99437b8 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -247,9 +247,9 @@ struct vvp_object {
  */
 struct vvp_page {
 	struct cl_page_slice vpg_cl;
-	int		  vpg_defer_uptodate;
-	int		  vpg_ra_used;
-	int		  vpg_write_queued;
+	unsigned int	vpg_defer_uptodate:1,
+			vpg_ra_used:1,
+			vpg_write_queued:1;
 	/**
 	 * Non-empty iff this page is already counted in
 	 * vvp_object::vob_pending_list. This list is only used as a flag,
diff --git a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
index 9740568..43d1a3f 100644
--- a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
@@ -289,8 +289,8 @@ struct lov_lock {
 };
 
 struct lov_page {
-	struct cl_page_slice lps_cl;
-	int		  lps_invalid;
+	struct cl_page_slice	lps_cl;
+	unsigned int		lps_stripe; /* stripe index */
 };
 
 /*
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 95126c3..5d47a5a 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -244,14 +244,12 @@ void lov_sub_put(struct lov_io_sub *sub)
 
 int lov_page_stripe(const struct cl_page *page)
 {
-	struct lovsub_object *subobj;
 	const struct cl_page_slice *slice;
 
-	slice = cl_page_at(page, &lovsub_device_type);
+	slice = cl_page_at(page, &lov_device_type);
 	LASSERT(slice->cpl_obj);
 
-	subobj = cl2lovsub(slice->cpl_obj);
-	return subobj->lso_index;
+	return cl2lov_page(slice)->lps_stripe;
 }
 
 struct lov_io_sub *lov_page_subio(const struct lu_env *env, struct lov_io *lio,
diff --git a/drivers/staging/lustre/lustre/lov/lov_page.c b/drivers/staging/lustre/lustre/lov/lov_page.c
index 45b5ae9..00bfaba 100644
--- a/drivers/staging/lustre/lustre/lov/lov_page.c
+++ b/drivers/staging/lustre/lustre/lov/lov_page.c
@@ -129,6 +129,7 @@ int lov_page_init_raid0(const struct lu_env *env, struct cl_object *obj,
 	rc = lov_stripe_offset(loo->lo_lsm, offset, stripe, &suboff);
 	LASSERT(rc == 0);
 
+	lpg->lps_stripe = stripe;
 	cl_page_slice_add(page, &lpg->lps_cl, obj, index, &lov_raid0_page_ops);
 
 	sub = lov_sub_get(env, lio, stripe);
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_io.c b/drivers/staging/lustre/lustre/obdclass/cl_io.c
index e72f1fc..4516fff 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_io.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_io.c
@@ -859,9 +859,6 @@ void cl_page_list_add(struct cl_page_list *plist, struct cl_page *page)
 	LASSERT(page->cp_owner);
 	LINVRNT(plist->pl_owner == current);
 
-	lockdep_off();
-	mutex_lock(&page->cp_mutex);
-	lockdep_on();
 	LASSERT(list_empty(&page->cp_batch));
 	list_add_tail(&page->cp_batch, &plist->pl_pages);
 	++plist->pl_nr;
@@ -877,12 +874,10 @@ void cl_page_list_del(const struct lu_env *env, struct cl_page_list *plist,
 		      struct cl_page *page)
 {
 	LASSERT(plist->pl_nr > 0);
+	LASSERT(cl_page_is_vmlocked(env, page));
 	LINVRNT(plist->pl_owner == current);
 
 	list_del_init(&page->cp_batch);
-	lockdep_off();
-	mutex_unlock(&page->cp_mutex);
-	lockdep_on();
 	--plist->pl_nr;
 	lu_ref_del_at(&page->cp_reference, &page->cp_queue_ref, "queue", plist);
 	cl_page_put(env, page);
@@ -959,9 +954,6 @@ void cl_page_list_disown(const struct lu_env *env,
 		LASSERT(plist->pl_nr > 0);
 
 		list_del_init(&page->cp_batch);
-		lockdep_off();
-		mutex_unlock(&page->cp_mutex);
-		lockdep_on();
 		--plist->pl_nr;
 		/*
 		 * cl_page_disown0 rather than usual cl_page_disown() is used,
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_page.c b/drivers/staging/lustre/lustre/obdclass/cl_page.c
index db2dc6b..bd71859 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_page.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_page.c
@@ -151,7 +151,6 @@ struct cl_page *cl_page_alloc(const struct lu_env *env,
 		INIT_LIST_HEAD(&page->cp_layers);
 		INIT_LIST_HEAD(&page->cp_batch);
 		INIT_LIST_HEAD(&page->cp_flight);
-		mutex_init(&page->cp_mutex);
 		lu_ref_init(&page->cp_reference);
 		head = o->co_lu.lo_header;
 		list_for_each_entry(o, &head->loh_layers, co_lu.lo_linkage) {
@@ -478,7 +477,6 @@ static void cl_page_owner_clear(struct cl_page *page)
 		LASSERT(page->cp_owner->ci_owned_nr > 0);
 		page->cp_owner->ci_owned_nr--;
 		page->cp_owner = NULL;
-		page->cp_task = NULL;
 	}
 }
 
@@ -562,7 +560,6 @@ static int cl_page_own0(const struct lu_env *env, struct cl_io *io,
 			PASSERT(env, pg, !pg->cp_owner);
 			PASSERT(env, pg, !pg->cp_req);
 			pg->cp_owner = cl_io_top(io);
-			pg->cp_task  = current;
 			cl_page_owner_set(pg);
 			if (pg->cp_state != CPS_FREEING) {
 				cl_page_state_set(env, pg, CPS_OWNED);
@@ -619,7 +616,6 @@ void cl_page_assume(const struct lu_env *env,
 	cl_page_invoid(env, io, pg, CL_PAGE_OP(cpo_assume));
 	PASSERT(env, pg, !pg->cp_owner);
 	pg->cp_owner = cl_io_top(io);
-	pg->cp_task = current;
 	cl_page_owner_set(pg);
 	cl_page_state_set(env, pg, CPS_OWNED);
 }
@@ -860,10 +856,6 @@ void cl_page_completion(const struct lu_env *env,
 	PASSERT(env, pg, pg->cp_state == cl_req_type_state(crt));
 
 	CL_PAGE_HEADER(D_TRACE, env, pg, "%d %d\n", crt, ioret);
-	if (crt == CRT_READ && ioret == 0) {
-		PASSERT(env, pg, !(pg->cp_flags & CPF_READ_COMPLETED));
-		pg->cp_flags |= CPF_READ_COMPLETED;
-	}
 
 	cl_page_state_set(env, pg, CPS_CACHED);
 	if (crt >= CRT_NR)
@@ -989,10 +981,10 @@ void cl_page_header_print(const struct lu_env *env, void *cookie,
 			  lu_printer_t printer, const struct cl_page *pg)
 {
 	(*printer)(env, cookie,
-		   "page@%p[%d %p %d %d %d %p %p %#x]\n",
+		   "page@%p[%d %p %d %d %d %p %p]\n",
 		   pg, atomic_read(&pg->cp_ref), pg->cp_obj,
 		   pg->cp_state, pg->cp_error, pg->cp_type,
-		   pg->cp_owner, pg->cp_req, pg->cp_flags);
+		   pg->cp_owner, pg->cp_req);
 }
 EXPORT_SYMBOL(cl_page_header_print);
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_internal.h b/drivers/staging/lustre/lustre/osc/osc_internal.h
index 7a27f09..2038885 100644
--- a/drivers/staging/lustre/lustre/osc/osc_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_internal.h
@@ -71,7 +71,6 @@ struct osc_async_page {
 	struct client_obd       *oap_cli;
 	struct osc_object       *oap_obj;
 
-	struct ldlm_lock	*oap_ldlm_lock;
 	spinlock_t		 oap_lock;
 };
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c b/drivers/staging/lustre/lustre/osc/osc_io.c
index 6e3dcd3..69424ea 100644
--- a/drivers/staging/lustre/lustre/osc/osc_io.c
+++ b/drivers/staging/lustre/lustre/osc/osc_io.c
@@ -163,7 +163,6 @@ static int osc_io_submit(const struct lu_env *env,
 			continue;
 		}
 
-		cl_page_list_move(qout, qin, page);
 		spin_lock(&oap->oap_lock);
 		oap->oap_async_flags = ASYNC_URGENT|ASYNC_READY;
 		oap->oap_async_flags |= ASYNC_COUNT_STABLE;
@@ -171,6 +170,12 @@ static int osc_io_submit(const struct lu_env *env,
 
 		osc_page_submit(env, opg, crt, brw_flags);
 		list_add_tail(&oap->oap_pending_item, &list);
+
+		if (page->cp_sync_io)
+			cl_page_list_move(qout, qin, page);
+		else /* async IO */
+			cl_page_list_del(env, qin, page);
+
 		if (++queued == max_pages) {
 			queued = 0;
 			result = osc_queue_sync_pages(env, osc, &list, cmd,
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index d231827..042a081 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -1882,7 +1882,6 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 	struct osc_async_page *tmp;
 	struct cl_req *clerq = NULL;
 	enum cl_req_type crt = (cmd & OBD_BRW_WRITE) ? CRT_WRITE : CRT_READ;
-	struct ldlm_lock *lock = NULL;
 	struct cl_req_attr *crattr = NULL;
 	u64 starting_offset = OBD_OBJECT_EOF;
 	u64 ending_offset = 0;
@@ -1948,7 +1947,6 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 				rc = PTR_ERR(clerq);
 				goto out;
 			}
-			lock = oap->oap_ldlm_lock;
 		}
 		if (mem_tight)
 			oap->oap_brw_flags |= OBD_BRW_MEMALLOC;
@@ -1965,10 +1963,6 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 	LASSERT(clerq);
 	crattr->cra_oa = oa;
 	cl_req_attr_set(env, clerq, crattr, ~0ULL);
-	if (lock) {
-		oa->o_handle = lock->l_remote_handle;
-		oa->o_valid |= OBD_MD_FLHANDLE;
-	}
 
 	rc = cl_req_prep(env, clerq);
 	if (rc != 0) {
-- 
1.7.1

* [PATCH 57/80] staging: lustre: osc: revise unstable pages accounting
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, James Simmons

From: Jinshan Xiong <jinshan.xiong@intel.com>

This patch makes several changes to unstable page tracking:

1. Remove kernel NFS unstable pages tracking because it killed
   performance
2. Track unstable pages as part of the LRU cache. Otherwise Lustre
   can use much more memory than max_cached_mb
3. Remove obd_unstable_pages tracking to avoid using a global
   atomic counter
4. Make unstable page tracking optional. Tracking is turned off by
   default and can be controlled through
   llite.*.unstable_stats.
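
As an illustration of the batched accounting this patch adds in
unstable_page_accounting(), here is a minimal userspace sketch. It is
not the kernel code: struct zone_stat, struct fake_page, mod_zone_stat()
and account_pages() are hypothetical stand-ins for the zone counter,
struct page, mod_zone_page_state() and the new helper. It assumes, as
the patch comment does, that pages in one RPC usually come from the same
zone, so consecutive pages can be folded into one counter update.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-zone counter such as NR_UNSTABLE_NFS. */
struct zone_stat {
	long unstable;	/* current number of unstable pages in the zone */
	int updates;	/* how many times the counter was touched */
};

/* Hypothetical stand-in for a page; only its zone matters here. */
struct fake_page {
	struct zone_stat *zone;	/* assumed to be non-NULL */
};

/* One (expensive, in the kernel) counter update. */
static void mod_zone_stat(struct zone_stat *zone, long delta)
{
	zone->unstable += delta;
	zone->updates++;
}

/*
 * Add (factor == 1) or subtract (factor == -1) nr pages, issuing one
 * counter update per run of consecutive pages that share a zone.
 */
static void account_pages(const struct fake_page *pages, int nr, int factor)
{
	struct zone_stat *zone = NULL;
	int count = 0;
	int i;

	assert(factor == 1 || factor == -1);

	for (i = 0; i < nr; i++) {
		struct zone_stat *pz = pages[i].zone;

		if (pz == zone) {
			/* same zone as the previous page: just accumulate */
			count++;
			continue;
		}
		/* zone changed: flush what was accumulated so far */
		if (count > 0)
			mod_zone_stat(zone, (long)factor * count);
		zone = pz;
		count = 1;
	}
	/* flush the final run */
	if (count > 0)
		mod_zone_stat(zone, (long)factor * count);
}
```

With five pages, three from one zone followed by two from another, the
sketch issues two counter updates instead of five.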

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4841
Reviewed-on: http://review.whamcloud.com/10003
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |   35 +++-
 .../staging/lustre/lustre/include/obd_support.h    |    1 -
 drivers/staging/lustre/lustre/llite/lproc_llite.c  |   41 ++++-
 drivers/staging/lustre/lustre/obdclass/class_obd.c |    2 -
 drivers/staging/lustre/lustre/osc/osc_cache.c      |   96 +---------
 drivers/staging/lustre/lustre/osc/osc_internal.h   |    2 +-
 drivers/staging/lustre/lustre/osc/osc_page.c       |  208 +++++++++++++++++---
 drivers/staging/lustre/lustre/osc/osc_request.c    |   13 +-
 8 files changed, 253 insertions(+), 145 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index d269b32..ec6cf7c 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -1039,23 +1039,32 @@ do {									  \
 	}								     \
 } while (0)
 
-static inline int __page_in_use(const struct cl_page *page, int refc)
-{
-	if (page->cp_type == CPT_CACHEABLE)
-		++refc;
-	LASSERT(atomic_read(&page->cp_ref) > 0);
-	return (atomic_read(&page->cp_ref) > refc);
-}
-
-#define cl_page_in_use(pg)       __page_in_use(pg, 1)
-#define cl_page_in_use_noref(pg) __page_in_use(pg, 0)
-
 static inline struct page *cl_page_vmpage(struct cl_page *page)
 {
 	LASSERT(page->cp_vmpage);
 	return page->cp_vmpage;
 }
 
+/**
+ * Check if a cl_page is in use.
+ *
+ * Client cache holds a refcount, this refcount will be dropped when
+ * the page is taken out of cache, see vvp_page_delete().
+ */
+static inline bool __page_in_use(const struct cl_page *page, int refc)
+{
+	return (atomic_read(&page->cp_ref) > refc + 1);
+}
+
+/**
+ * Caller itself holds a refcount of cl_page.
+ */
+#define cl_page_in_use(pg)	 __page_in_use(pg, 1)
+/**
+ * Caller doesn't hold a refcount.
+ */
+#define cl_page_in_use_noref(pg) __page_in_use(pg, 0)
+
 /** @} cl_page */
 
 /** \addtogroup cl_lock cl_lock
@@ -2331,6 +2340,10 @@ struct cl_client_cache {
 	 */
 	spinlock_t		ccc_lru_lock;
 	/**
+	 * Set if unstable check is enabled
+	 */
+	unsigned int		ccc_unstable_check:1;
+	/**
 	 * # of unstable pages for this mount point
 	 */
 	atomic_t		ccc_unstable_nr;
diff --git a/drivers/staging/lustre/lustre/include/obd_support.h b/drivers/staging/lustre/lustre/include/obd_support.h
index 26fdff6..a11fff1 100644
--- a/drivers/staging/lustre/lustre/include/obd_support.h
+++ b/drivers/staging/lustre/lustre/include/obd_support.h
@@ -54,7 +54,6 @@ extern int at_early_margin;
 extern int at_extra;
 extern unsigned int obd_sync_filter;
 extern unsigned int obd_max_dirty_pages;
-extern atomic_t obd_unstable_pages;
 extern atomic_t obd_dirty_pages;
 extern atomic_t obd_dirty_transit_pages;
 extern char obd_jobid_var[];
diff --git a/drivers/staging/lustre/lustre/llite/lproc_llite.c b/drivers/staging/lustre/lustre/llite/lproc_llite.c
index 2f1f389..5f8e78d 100644
--- a/drivers/staging/lustre/lustre/llite/lproc_llite.c
+++ b/drivers/staging/lustre/lustre/llite/lproc_llite.c
@@ -828,10 +828,45 @@ static ssize_t unstable_stats_show(struct kobject *kobj,
 	pages = atomic_read(&cache->ccc_unstable_nr);
 	mb = (pages * PAGE_SIZE) >> 20;
 
-	return sprintf(buf, "unstable_pages: %8d\n"
-			    "unstable_mb:    %8d\n", pages, mb);
+	return sprintf(buf, "unstable_check: %8d\n"
+			    "unstable_pages: %8d\n"
+			    "unstable_mb:    %8d\n",
+			    cache->ccc_unstable_check, pages, mb);
 }
-LUSTRE_RO_ATTR(unstable_stats);
+
+static ssize_t unstable_stats_store(struct kobject *kobj,
+				    struct attribute *attr,
+				    const char *buffer,
+				    size_t count)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kobj);
+	char kernbuf[128];
+	int val, rc;
+
+	if (!count)
+		return 0;
+	if (count >= sizeof(kernbuf))
+		return -EINVAL;
+
+	if (copy_from_user(kernbuf, buffer, count))
+		return -EFAULT;
+	kernbuf[count] = 0;
+
+	buffer += lprocfs_find_named_value(kernbuf, "unstable_check:", &count) -
+		  kernbuf;
+	rc = lprocfs_write_helper(buffer, count, &val);
+	if (rc < 0)
+		return rc;
+
+	/* borrow lru lock to set the value */
+	spin_lock(&sbi->ll_cache->ccc_lru_lock);
+	sbi->ll_cache->ccc_unstable_check = !!val;
+	spin_unlock(&sbi->ll_cache->ccc_lru_lock);
+
+	return count;
+}
+LUSTRE_RW_ATTR(unstable_stats);
 
 static ssize_t root_squash_show(struct kobject *kobj, struct attribute *attr,
 				char *buf)
diff --git a/drivers/staging/lustre/lustre/obdclass/class_obd.c b/drivers/staging/lustre/lustre/obdclass/class_obd.c
index 6edf53e..90a365b 100644
--- a/drivers/staging/lustre/lustre/obdclass/class_obd.c
+++ b/drivers/staging/lustre/lustre/obdclass/class_obd.c
@@ -57,8 +57,6 @@ unsigned int obd_dump_on_eviction;
 EXPORT_SYMBOL(obd_dump_on_eviction);
 unsigned int obd_max_dirty_pages = 256;
 EXPORT_SYMBOL(obd_max_dirty_pages);
-atomic_t obd_unstable_pages;
-EXPORT_SYMBOL(obd_unstable_pages);
 atomic_t obd_dirty_pages;
 EXPORT_SYMBOL(obd_dirty_pages);
 unsigned int obd_timeout = OBD_TIMEOUT_DEFAULT;   /* seconds */
diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index 683b3c2..deaf912 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -1384,13 +1384,11 @@ static int osc_completion(const struct lu_env *env, struct osc_async_page *oap,
 #define OSC_DUMP_GRANT(lvl, cli, fmt, args...) do {			      \
 	struct client_obd *__tmp = (cli);				      \
 	CDEBUG(lvl, "%s: grant { dirty: %ld/%ld dirty_pages: %d/%d "	      \
-	       "unstable_pages: %d/%d dropped: %ld avail: %ld, "	      \
-	       "reserved: %ld, flight: %d } lru {in list: %d, "		      \
-	       "left: %d, waiters: %d }" fmt,                                 \
+	       "dropped: %ld avail: %ld, reserved: %ld, flight: %d }"	      \
+	       "lru {in list: %d, left: %d, waiters: %d }" fmt,		      \
 	       __tmp->cl_import->imp_obd->obd_name,			      \
 	       __tmp->cl_dirty, __tmp->cl_dirty_max,			      \
 	       atomic_read(&obd_dirty_pages), obd_max_dirty_pages,	      \
-	       atomic_read(&obd_unstable_pages), obd_max_dirty_pages,	      \
 	       __tmp->cl_lost_grant, __tmp->cl_avail_grant,		      \
 	       __tmp->cl_reserved_grant, __tmp->cl_w_in_flight,		      \
 	       atomic_read(&__tmp->cl_lru_in_list),			      \
@@ -1542,8 +1540,7 @@ static int osc_enter_cache_try(struct client_obd *cli,
 		return 0;
 
 	if (cli->cl_dirty + PAGE_SIZE <= cli->cl_dirty_max &&
-	    atomic_read(&obd_unstable_pages) + 1 +
-	    atomic_read(&obd_dirty_pages) <= obd_max_dirty_pages) {
+	    atomic_read(&obd_dirty_pages) + 1 <= obd_max_dirty_pages) {
 		osc_consume_write_grant(cli, &oap->oap_brw_page);
 		if (transient) {
 			cli->cl_dirty_transit += PAGE_SIZE;
@@ -1671,8 +1668,7 @@ void osc_wake_cache_waiters(struct client_obd *cli)
 		ocw->ocw_rc = -EDQUOT;
 		/* we can't dirty more */
 		if ((cli->cl_dirty + PAGE_SIZE > cli->cl_dirty_max) ||
-		    (atomic_read(&obd_unstable_pages) + 1 +
-		     atomic_read(&obd_dirty_pages) > obd_max_dirty_pages)) {
+		    (atomic_read(&obd_dirty_pages) + 1 > obd_max_dirty_pages)) {
 			CDEBUG(D_CACHE, "no dirty room: dirty: %ld osc max %ld, sys max %d\n",
 			       cli->cl_dirty,
 			       cli->cl_dirty_max, obd_max_dirty_pages);
@@ -1843,84 +1839,6 @@ static void osc_process_ar(struct osc_async_rc *ar, __u64 xid,
 		ar->ar_force_sync = 0;
 }
 
-/**
- * Performs "unstable" page accounting. This function balances the
- * increment operations performed in osc_inc_unstable_pages. It is
- * registered as the RPC request callback, and is executed when the
- * bulk RPC is committed on the server. Thus at this point, the pages
- * involved in the bulk transfer are no longer considered unstable.
- */
-void osc_dec_unstable_pages(struct ptlrpc_request *req)
-{
-	struct client_obd *cli = &req->rq_import->imp_obd->u.cli;
-	struct ptlrpc_bulk_desc *desc = req->rq_bulk;
-	int page_count = desc->bd_iov_count;
-	int i;
-
-	/* No unstable page tracking */
-	if (!cli->cl_cache)
-		return;
-
-	LASSERT(page_count >= 0);
-
-	for (i = 0; i < page_count; i++)
-		dec_node_page_state(desc->bd_iov[i].bv_page, NR_UNSTABLE_NFS);
-
-	atomic_sub(page_count, &cli->cl_cache->ccc_unstable_nr);
-	LASSERT(atomic_read(&cli->cl_cache->ccc_unstable_nr) >= 0);
-
-	atomic_sub(page_count, &cli->cl_unstable_count);
-	LASSERT(atomic_read(&cli->cl_unstable_count) >= 0);
-
-	atomic_sub(page_count, &obd_unstable_pages);
-	LASSERT(atomic_read(&obd_unstable_pages) >= 0);
-
-	wake_up_all(&cli->cl_cache->ccc_unstable_waitq);
-}
-
-/* "unstable" page accounting. See: osc_dec_unstable_pages. */
-void osc_inc_unstable_pages(struct ptlrpc_request *req)
-{
-	struct client_obd *cli = &req->rq_import->imp_obd->u.cli;
-	struct ptlrpc_bulk_desc *desc = req->rq_bulk;
-	long page_count = desc->bd_iov_count;
-	int i;
-
-	/* No unstable page tracking */
-	if (!cli->cl_cache)
-		return;
-
-	LASSERT(page_count >= 0);
-
-	for (i = 0; i < page_count; i++)
-		inc_node_page_state(desc->bd_iov[i].bv_page, NR_UNSTABLE_NFS);
-
-	LASSERT(atomic_read(&cli->cl_cache->ccc_unstable_nr) >= 0);
-	atomic_add(page_count, &cli->cl_cache->ccc_unstable_nr);
-
-	LASSERT(atomic_read(&cli->cl_unstable_count) >= 0);
-	atomic_add(page_count, &cli->cl_unstable_count);
-
-	LASSERT(atomic_read(&obd_unstable_pages) >= 0);
-	atomic_add(page_count, &obd_unstable_pages);
-
-	/*
-	 * If the request has already been committed (i.e. brw_commit
-	 * called via rq_commit_cb), we need to undo the unstable page
-	 * increments we just performed because rq_commit_cb wont be
-	 * called again.
-	 */
-	spin_lock(&req->rq_lock);
-	if (unlikely(req->rq_committed)) {
-		/* Drop lock before calling osc_dec_unstable_pages */
-		spin_unlock(&req->rq_lock);
-		osc_dec_unstable_pages(req);
-	} else {
-		req->rq_unstable = 1;
-		spin_unlock(&req->rq_lock);
-	}
-}
-
 /* this must be called holding the loi list lock to give coverage to exit_cache,
  * async_flag maintenance, and oap_request
  */
@@ -1932,9 +1850,6 @@ static void osc_ap_completion(const struct lu_env *env, struct client_obd *cli,
 	__u64 xid = 0;
 
 	if (oap->oap_request) {
-		if (!rc)
-			osc_inc_unstable_pages(oap->oap_request);
-
 		xid = ptlrpc_req_xid(oap->oap_request);
 		ptlrpc_req_finished(oap->oap_request);
 		oap->oap_request = NULL;
@@ -2421,9 +2336,6 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io,
 			return rc;
 	}
 
-	if (osc_over_unstable_soft_limit(cli))
-		brw_flags |= OBD_BRW_SOFT_SYNC;
-
 	oap->oap_cmd = cmd;
 	oap->oap_page_off = ops->ops_from;
 	oap->oap_count = ops->ops_to - ops->ops_from;
diff --git a/drivers/staging/lustre/lustre/osc/osc_internal.h b/drivers/staging/lustre/lustre/osc/osc_internal.h
index 2038885..eca5fef 100644
--- a/drivers/staging/lustre/lustre/osc/osc_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_internal.h
@@ -197,7 +197,7 @@ int osc_quotacheck(struct obd_device *unused, struct obd_export *exp,
 int osc_quota_poll_check(struct obd_export *exp, struct if_quotacheck *qchk);
 void osc_inc_unstable_pages(struct ptlrpc_request *req);
 void osc_dec_unstable_pages(struct ptlrpc_request *req);
-int  osc_over_unstable_soft_limit(struct client_obd *cli);
+bool osc_over_unstable_soft_limit(struct client_obd *cli);
 
 struct ldlm_lock *osc_dlmlock_at_pgoff(const struct lu_env *env,
 				       struct osc_object *obj, pgoff_t index,
diff --git a/drivers/staging/lustre/lustre/osc/osc_page.c b/drivers/staging/lustre/lustre/osc/osc_page.c
index 355f496..583a0af 100644
--- a/drivers/staging/lustre/lustre/osc/osc_page.c
+++ b/drivers/staging/lustre/lustre/osc/osc_page.c
@@ -323,32 +323,6 @@ int osc_page_init(const struct lu_env *env, struct cl_object *obj,
 	return result;
 }
 
-int osc_over_unstable_soft_limit(struct client_obd *cli)
-{
-	long obd_upages, obd_dpages, osc_upages;
-
-	/* Can't check cli->cl_unstable_count, therefore, no soft limit */
-	if (!cli)
-		return 0;
-
-	obd_upages = atomic_read(&obd_unstable_pages);
-	obd_dpages = atomic_read(&obd_dirty_pages);
-
-	osc_upages = atomic_read(&cli->cl_unstable_count);
-
-	/*
-	 * obd_max_dirty_pages is the max number of (dirty + unstable)
-	 * pages allowed at any given time. To simulate an unstable page
-	 * only limit, we subtract the current number of dirty pages
-	 * from this max. This difference is roughly the amount of pages
-	 * currently available for unstable pages. Thus, the soft limit
-	 * is half of that difference. Check osc_upages to ensure we don't
-	 * set SOFT_SYNC for OSCs without any outstanding unstable pages.
-	 */
-	return osc_upages &&
-	       obd_upages >= (obd_max_dirty_pages - obd_dpages) / 2;
-}
-
 /**
  * Helper function called by osc_io_submit() for every page in an immediate
  * transfer (i.e., transferred synchronously).
@@ -368,9 +342,6 @@ void osc_page_submit(const struct lu_env *env, struct osc_page *opg,
 	oap->oap_count = opg->ops_to - opg->ops_from;
 	oap->oap_brw_flags = brw_flags | OBD_BRW_SYNC;
 
-	if (osc_over_unstable_soft_limit(oap->oap_cli))
-		oap->oap_brw_flags |= OBD_BRW_SOFT_SYNC;
-
 	if (capable(CFS_CAP_SYS_RESOURCE)) {
 		oap->oap_brw_flags |= OBD_BRW_NOQUOTA;
 		oap->oap_cmd |= OBD_BRW_NOQUOTA;
@@ -540,6 +511,28 @@ static void discard_pagevec(const struct lu_env *env, struct cl_io *io,
 }
 
 /**
+ * Check if a cl_page can be released, i.e., it's not being used.
+ *
+ * If unstable accounting is turned on, bulk transfer may hold one refcount
+ * for recovery, so we need to check the vmpage refcount as well; otherwise
+ * we could destroy the cl_page while the vmpage still can't be reused.
+ */
+static inline bool lru_page_busy(struct client_obd *cli, struct cl_page *page)
+{
+	if (cl_page_in_use_noref(page))
+		return true;
+
+	if (cli->cl_cache->ccc_unstable_check) {
+		struct page *vmpage = cl_page_vmpage(page);
+
+		/* vmpage has two known users: cl_page and VM page cache */
+		if (page_count(vmpage) - page_mapcount(vmpage) > 2)
+			return true;
+	}
+	return false;
+}
+
+/**
  * Drop @target of pages from LRU at most.
  */
 int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
@@ -584,7 +577,7 @@ int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
 			break;
 
 		page = opg->ops_cl.cpl_page;
-		if (cl_page_in_use_noref(page)) {
+		if (lru_page_busy(cli, page)) {
 			list_move_tail(&opg->ops_lru, &cli->cl_lru_list);
 			continue;
 		}
@@ -620,7 +613,7 @@ int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
 		}
 
 		if (cl_page_own_try(env, io, page) == 0) {
-			if (!cl_page_in_use_noref(page)) {
+			if (!lru_page_busy(cli, page)) {
 				/* remove it from lru list earlier to avoid
 				 * lock contention
 				 */
@@ -742,6 +735,13 @@ out:
 	return rc;
 }
 
+/**
+ * osc_lru_reserve() is called to reserve an LRU slot for a cl_page.
+ *
+ * Usually the LRU slots are reserved in osc_io_iter_rw_init().
+ * Only in the case that the LRU slots are in extreme shortage, it should
+ * have reserved enough slots for an IO.
+ */
 static int osc_lru_reserve(const struct lu_env *env, struct osc_object *obj,
 			   struct osc_page *opg)
 {
@@ -787,4 +787,150 @@ out:
 	return rc;
 }
 
+/**
+ * Atomic operations are expensive. We accumulate the accounting for the
+ * same page zone to get better performance.
+ * In practice this works well because the pages in the same RPC
+ * are likely from the same page zone.
+ */
+static inline void unstable_page_accounting(struct ptlrpc_bulk_desc *desc,
+					    int factor)
+{
+	int page_count = desc->bd_iov_count;
+	void *zone = NULL;
+	int count = 0;
+	int i;
+
+	for (i = 0; i < page_count; i++) {
+		void *pz = page_zone(desc->bd_iov[i].bv_page);
+
+		if (likely(pz == zone)) {
+			++count;
+			continue;
+		}
+
+		if (count > 0) {
+			mod_zone_page_state(zone, NR_UNSTABLE_NFS,
+					    factor * count);
+			count = 0;
+		}
+		zone = pz;
+		++count;
+	}
+	if (count > 0)
+		mod_zone_page_state(zone, NR_UNSTABLE_NFS, factor * count);
+}
+
+static inline void add_unstable_page_accounting(struct ptlrpc_bulk_desc *desc)
+{
+	unstable_page_accounting(desc, 1);
+}
+
+static inline void dec_unstable_page_accounting(struct ptlrpc_bulk_desc *desc)
+{
+	unstable_page_accounting(desc, -1);
+}
+
+/**
+ * Performs "unstable" page accounting. This function balances the
+ * increment operations performed in osc_inc_unstable_pages. It is
+ * registered as the RPC request callback, and is executed when the
+ * bulk RPC is committed on the server. Thus at this point, the pages
+ * involved in the bulk transfer are no longer considered unstable.
+ *
+ * If this function is called, the request should have been committed
+ * or req->rq_unstable must have been set; it implies that the unstable
+ * statistics have been added.
+ */
+void osc_dec_unstable_pages(struct ptlrpc_request *req)
+{
+	struct client_obd *cli = &req->rq_import->imp_obd->u.cli;
+	struct ptlrpc_bulk_desc *desc = req->rq_bulk;
+	int page_count = desc->bd_iov_count;
+	int unstable_count;
+
+	LASSERT(page_count >= 0);
+	dec_unstable_page_accounting(desc);
+
+	unstable_count = atomic_sub_return(page_count, &cli->cl_unstable_count);
+	LASSERT(unstable_count >= 0);
+
+	unstable_count = atomic_sub_return(page_count,
+					   &cli->cl_cache->ccc_unstable_nr);
+	LASSERT(unstable_count >= 0);
+	if (!unstable_count)
+		wake_up_all(&cli->cl_cache->ccc_unstable_waitq);
+
+	if (osc_cache_too_much(cli))
+		(void)ptlrpcd_queue_work(cli->cl_lru_work);
+}
+
+/**
+ * "unstable" page accounting. See: osc_dec_unstable_pages.
+ */
+void osc_inc_unstable_pages(struct ptlrpc_request *req)
+{
+	struct client_obd *cli  = &req->rq_import->imp_obd->u.cli;
+	struct ptlrpc_bulk_desc *desc = req->rq_bulk;
+	int page_count = desc->bd_iov_count;
+
+	/* No unstable page tracking */
+	if (!cli->cl_cache || !cli->cl_cache->ccc_unstable_check)
+		return;
+
+	add_unstable_page_accounting(desc);
+	atomic_add(page_count, &cli->cl_unstable_count);
+	atomic_add(page_count, &cli->cl_cache->ccc_unstable_nr);
+
+	/*
+	 * If the request has already been committed (i.e. brw_commit
+	 * called via rq_commit_cb), we need to undo the unstable page
+	 * increments we just performed because rq_commit_cb won't be
+	 * called again.
+	 */
+	spin_lock(&req->rq_lock);
+	if (unlikely(req->rq_committed)) {
+		spin_unlock(&req->rq_lock);
+
+		osc_dec_unstable_pages(req);
+	} else {
+		req->rq_unstable = 1;
+		spin_unlock(&req->rq_lock);
+	}
+}
+
+/**
+ * Check if it piggybacks SOFT_SYNC flag to OST from this OSC.
+ * This function will be called by every BRW RPC so it's critical
+ * to make this function fast.
+ */
+bool osc_over_unstable_soft_limit(struct client_obd *cli)
+{
+	long unstable_nr, osc_unstable_count;
+
+	/* Can't check cli->cl_unstable_count, therefore, no soft limit */
+	if (!cli->cl_cache || !cli->cl_cache->ccc_unstable_check)
+		return false;
+
+	osc_unstable_count = atomic_read(&cli->cl_unstable_count);
+	unstable_nr = atomic_read(&cli->cl_cache->ccc_unstable_nr);
+
+	CDEBUG(D_CACHE,
+	       "%s: cli: %p unstable pages: %lu, osc unstable pages: %lu\n",
+	       cli->cl_import->imp_obd->obd_name, cli,
+	       unstable_nr, osc_unstable_count);
+
+	/*
+	 * If the LRU slots are in shortage (25% remaining) AND this OSC
+	 * has one full RPC window of unstable pages, it's a good chance
+	 * to piggyback a SOFT_SYNC flag.
+	 * Note that the OST won't respond immediately to the SOFT_SYNC
+	 * request, so active OSCs will have more chances to carry the
+	 * flag, which is reasonable.
+	 */
+	return unstable_nr > cli->cl_cache->ccc_lru_max >> 2 &&
+	       osc_unstable_count > cli->cl_max_pages_per_rpc *
+				    cli->cl_max_rpcs_in_flight;
+}
+
 /** @} osc */
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 042a081..e5669e2 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -807,17 +807,15 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa,
 		CERROR("dirty %lu - %lu > dirty_max %lu\n",
 		       cli->cl_dirty, cli->cl_dirty_transit, cli->cl_dirty_max);
 		oa->o_undirty = 0;
-	} else if (unlikely(atomic_read(&obd_unstable_pages) +
-			    atomic_read(&obd_dirty_pages) -
+	} else if (unlikely(atomic_read(&obd_dirty_pages) -
 			    atomic_read(&obd_dirty_transit_pages) >
 			    (long)(obd_max_dirty_pages + 1))) {
 		/* The atomic_read() allowing the atomic_inc() are
 		 * not covered by a lock thus they may safely race and trip
 		 * this CERROR() unless we add in a small fudge factor (+1).
 		 */
-		CERROR("%s: dirty %d + %d - %d > system dirty_max %d\n",
+		CERROR("%s: dirty %d - %d > system dirty_max %d\n",
 		       cli->cl_import->imp_obd->obd_name,
-		       atomic_read(&obd_unstable_pages),
 		       atomic_read(&obd_dirty_pages),
 		       atomic_read(&obd_dirty_transit_pages),
 		       obd_max_dirty_pages);
@@ -1818,6 +1816,9 @@ static int brw_interpret(const struct lu_env *env,
 	}
 	kmem_cache_free(obdo_cachep, aa->aa_oa);
 
+	if (lustre_msg_get_opc(req->rq_reqmsg) == OST_WRITE && rc == 0)
+		osc_inc_unstable_pages(req);
+
 	list_for_each_entry_safe(ext, tmp, &aa->aa_exts, oe_link) {
 		list_del_init(&ext->oe_link);
 		osc_extent_finish(env, ext, 1, rc);
@@ -1888,6 +1889,7 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 	int mpflag = 0;
 	int mem_tight = 0;
 	int page_count = 0;
+	bool soft_sync = false;
 	int i;
 	int rc;
 	struct ost_body *body;
@@ -1915,6 +1917,7 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 		}
 	}
 
+	soft_sync = osc_over_unstable_soft_limit(cli);
 	if (mem_tight)
 		mpflag = cfs_memory_pressure_get_and_set();
 
@@ -1950,6 +1953,8 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 		}
 		if (mem_tight)
 			oap->oap_brw_flags |= OBD_BRW_MEMALLOC;
+		if (soft_sync)
+			oap->oap_brw_flags |= OBD_BRW_SOFT_SYNC;
 		pga[i] = &oap->oap_brw_page;
 		pga[i]->off = oap->oap_obj_off + oap->oap_page_off;
 		CDEBUG(0, "put page %p index %lu oap %p flg %x to pga\n",
-- 
1.7.1

* [lustre-devel] [PATCH 57/80] staging: lustre: osc: revise unstable pages accounting
@ 2016-08-16 20:19   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, James Simmons

From: Jinshan Xiong <jinshan.xiong@intel.com>

This patch makes several changes to unstable page tracking:

1. Remove kernel NFS unstable pages tracking because it killed
   performance
2. Track unstable pages as part of the LRU cache. Otherwise Lustre
   can use much more memory than max_cached_mb
3. Remove obd_unstable_pages tracking to avoid using a global
   atomic counter
4. Make unstable page tracking optional. Tracking is turned off by
   default and can be controlled through
   llite.*.unstable_stats.
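
The soft limit test that this patch evaluates once per RPC in
osc_build_rpc() can be sketched in userspace as follows. This is only an
illustration under stated assumptions, not the kernel code: struct
cache_sketch and struct osc_sketch are hypothetical stand-ins for the
cl_client_cache and client_obd fields the patch reads
(ccc_unstable_check, ccc_unstable_nr, ccc_lru_max, cl_unstable_count,
cl_max_pages_per_rpc and cl_max_rpcs_in_flight).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-mount cl_client_cache fields. */
struct cache_sketch {
	bool unstable_check;	/* tracking enabled (off by default) */
	long unstable_nr;	/* unstable pages for the whole mount */
	long lru_max;		/* maximum number of LRU slots */
};

/* Hypothetical stand-in for the per-OSC client_obd fields. */
struct osc_sketch {
	struct cache_sketch *cache;	/* may be NULL: no tracking */
	long unstable_count;		/* unstable pages of this OSC */
	long max_pages_per_rpc;
	long max_rpcs_in_flight;
};

/*
 * Return true when a SOFT_SYNC flag should be piggybacked: the mount
 * holds more unstable pages than a quarter of the LRU slots AND this
 * OSC holds more than one full RPC window of unstable pages.
 */
static bool over_unstable_soft_limit(const struct osc_sketch *osc)
{
	/* tracking disabled or unavailable: never request a soft sync */
	if (osc->cache == NULL || !osc->cache->unstable_check)
		return false;

	return osc->cache->unstable_nr > osc->cache->lru_max / 4 &&
	       osc->unstable_count >
	       osc->max_pages_per_rpc * osc->max_rpcs_in_flight;
}
```

Because the predicate only reads a few counters and takes no lock, it is
cheap enough to evaluate for every bulk RPC, which is the property the
patch asks for.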

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4841
Reviewed-on: http://review.whamcloud.com/10003
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |   35 +++-
 .../staging/lustre/lustre/include/obd_support.h    |    1 -
 drivers/staging/lustre/lustre/llite/lproc_llite.c  |   41 ++++-
 drivers/staging/lustre/lustre/obdclass/class_obd.c |    2 -
 drivers/staging/lustre/lustre/osc/osc_cache.c      |   96 +---------
 drivers/staging/lustre/lustre/osc/osc_internal.h   |    2 +-
 drivers/staging/lustre/lustre/osc/osc_page.c       |  208 +++++++++++++++++---
 drivers/staging/lustre/lustre/osc/osc_request.c    |   13 +-
 8 files changed, 253 insertions(+), 145 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index d269b32..ec6cf7c 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -1039,23 +1039,32 @@ do {									  \
 	}								     \
 } while (0)
 
-static inline int __page_in_use(const struct cl_page *page, int refc)
-{
-	if (page->cp_type == CPT_CACHEABLE)
-		++refc;
-	LASSERT(atomic_read(&page->cp_ref) > 0);
-	return (atomic_read(&page->cp_ref) > refc);
-}
-
-#define cl_page_in_use(pg)       __page_in_use(pg, 1)
-#define cl_page_in_use_noref(pg) __page_in_use(pg, 0)
-
 static inline struct page *cl_page_vmpage(struct cl_page *page)
 {
 	LASSERT(page->cp_vmpage);
 	return page->cp_vmpage;
 }
 
+/**
+ * Check if a cl_page is in use.
+ *
+ * Client cache holds a refcount, this refcount will be dropped when
+ * the page is taken out of cache, see vvp_page_delete().
+ */
+static inline bool __page_in_use(const struct cl_page *page, int refc)
+{
+	return (atomic_read(&page->cp_ref) > refc + 1);
+}
+
+/**
+ * Caller itself holds a refcount of cl_page.
+ */
+#define cl_page_in_use(pg)	 __page_in_use(pg, 1)
+/**
+ * Caller doesn't hold a refcount.
+ */
+#define cl_page_in_use_noref(pg) __page_in_use(pg, 0)
+
 /** @} cl_page */
 
 /** \addtogroup cl_lock cl_lock
@@ -2331,6 +2340,10 @@ struct cl_client_cache {
 	 */
 	spinlock_t		ccc_lru_lock;
 	/**
+	 * Set if unstable check is enabled
+	 */
+	unsigned int		ccc_unstable_check:1;
+	/**
 	 * # of unstable pages for this mount point
 	 */
 	atomic_t		ccc_unstable_nr;
diff --git a/drivers/staging/lustre/lustre/include/obd_support.h b/drivers/staging/lustre/lustre/include/obd_support.h
index 26fdff6..a11fff1 100644
--- a/drivers/staging/lustre/lustre/include/obd_support.h
+++ b/drivers/staging/lustre/lustre/include/obd_support.h
@@ -54,7 +54,6 @@ extern int at_early_margin;
 extern int at_extra;
 extern unsigned int obd_sync_filter;
 extern unsigned int obd_max_dirty_pages;
-extern atomic_t obd_unstable_pages;
 extern atomic_t obd_dirty_pages;
 extern atomic_t obd_dirty_transit_pages;
 extern char obd_jobid_var[];
diff --git a/drivers/staging/lustre/lustre/llite/lproc_llite.c b/drivers/staging/lustre/lustre/llite/lproc_llite.c
index 2f1f389..5f8e78d 100644
--- a/drivers/staging/lustre/lustre/llite/lproc_llite.c
+++ b/drivers/staging/lustre/lustre/llite/lproc_llite.c
@@ -828,10 +828,45 @@ static ssize_t unstable_stats_show(struct kobject *kobj,
 	pages = atomic_read(&cache->ccc_unstable_nr);
 	mb = (pages * PAGE_SIZE) >> 20;
 
-	return sprintf(buf, "unstable_pages: %8d\n"
-			    "unstable_mb:    %8d\n", pages, mb);
+	return sprintf(buf, "unstable_check: %8d\n"
+			    "unstable_pages: %8d\n"
+			    "unstable_mb:    %8d\n",
+			    cache->ccc_unstable_check, pages, mb);
 }
-LUSTRE_RO_ATTR(unstable_stats);
+
+static ssize_t unstable_stats_store(struct kobject *kobj,
+				    struct attribute *attr,
+				    const char *buffer,
+				    size_t count)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kobj);
+	char kernbuf[128];
+	int val, rc;
+
+	if (!count)
+		return 0;
+	if (count >= sizeof(kernbuf))
+		return -EINVAL;
+
+	if (copy_from_user(kernbuf, buffer, count))
+		return -EFAULT;
+	kernbuf[count] = 0;
+
+	buffer += lprocfs_find_named_value(kernbuf, "unstable_check:", &count) -
+		  kernbuf;
+	rc = lprocfs_write_helper(buffer, count, &val);
+	if (rc < 0)
+		return rc;
+
+	/* borrow lru lock to set the value */
+	spin_lock(&sbi->ll_cache->ccc_lru_lock);
+	sbi->ll_cache->ccc_unstable_check = !!val;
+	spin_unlock(&sbi->ll_cache->ccc_lru_lock);
+
+	return count;
+}
+LUSTRE_RW_ATTR(unstable_stats);
 
 static ssize_t root_squash_show(struct kobject *kobj, struct attribute *attr,
 				char *buf)
diff --git a/drivers/staging/lustre/lustre/obdclass/class_obd.c b/drivers/staging/lustre/lustre/obdclass/class_obd.c
index 6edf53e..90a365b 100644
--- a/drivers/staging/lustre/lustre/obdclass/class_obd.c
+++ b/drivers/staging/lustre/lustre/obdclass/class_obd.c
@@ -57,8 +57,6 @@ unsigned int obd_dump_on_eviction;
 EXPORT_SYMBOL(obd_dump_on_eviction);
 unsigned int obd_max_dirty_pages = 256;
 EXPORT_SYMBOL(obd_max_dirty_pages);
-atomic_t obd_unstable_pages;
-EXPORT_SYMBOL(obd_unstable_pages);
 atomic_t obd_dirty_pages;
 EXPORT_SYMBOL(obd_dirty_pages);
 unsigned int obd_timeout = OBD_TIMEOUT_DEFAULT;   /* seconds */
diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index 683b3c2..deaf912 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -1384,13 +1384,11 @@ static int osc_completion(const struct lu_env *env, struct osc_async_page *oap,
 #define OSC_DUMP_GRANT(lvl, cli, fmt, args...) do {			      \
 	struct client_obd *__tmp = (cli);				      \
 	CDEBUG(lvl, "%s: grant { dirty: %ld/%ld dirty_pages: %d/%d "	      \
-	       "unstable_pages: %d/%d dropped: %ld avail: %ld, "	      \
-	       "reserved: %ld, flight: %d } lru {in list: %d, "		      \
-	       "left: %d, waiters: %d }" fmt,                                 \
+	       "dropped: %ld avail: %ld, reserved: %ld, flight: %d } "	      \
+	       "lru {in list: %d, left: %d, waiters: %d }" fmt,		      \
 	       __tmp->cl_import->imp_obd->obd_name,			      \
 	       __tmp->cl_dirty, __tmp->cl_dirty_max,			      \
 	       atomic_read(&obd_dirty_pages), obd_max_dirty_pages,	      \
-	       atomic_read(&obd_unstable_pages), obd_max_dirty_pages,	      \
 	       __tmp->cl_lost_grant, __tmp->cl_avail_grant,		      \
 	       __tmp->cl_reserved_grant, __tmp->cl_w_in_flight,		      \
 	       atomic_read(&__tmp->cl_lru_in_list),			      \
@@ -1542,8 +1540,7 @@ static int osc_enter_cache_try(struct client_obd *cli,
 		return 0;
 
 	if (cli->cl_dirty + PAGE_SIZE <= cli->cl_dirty_max &&
-	    atomic_read(&obd_unstable_pages) + 1 +
-	    atomic_read(&obd_dirty_pages) <= obd_max_dirty_pages) {
+	    atomic_read(&obd_dirty_pages) + 1 <= obd_max_dirty_pages) {
 		osc_consume_write_grant(cli, &oap->oap_brw_page);
 		if (transient) {
 			cli->cl_dirty_transit += PAGE_SIZE;
@@ -1671,8 +1668,7 @@ void osc_wake_cache_waiters(struct client_obd *cli)
 		ocw->ocw_rc = -EDQUOT;
 		/* we can't dirty more */
 		if ((cli->cl_dirty + PAGE_SIZE > cli->cl_dirty_max) ||
-		    (atomic_read(&obd_unstable_pages) + 1 +
-		     atomic_read(&obd_dirty_pages) > obd_max_dirty_pages)) {
+		    (atomic_read(&obd_dirty_pages) + 1 > obd_max_dirty_pages)) {
 			CDEBUG(D_CACHE, "no dirty room: dirty: %ld osc max %ld, sys max %d\n",
 			       cli->cl_dirty,
 			       cli->cl_dirty_max, obd_max_dirty_pages);
@@ -1843,84 +1839,6 @@ static void osc_process_ar(struct osc_async_rc *ar, __u64 xid,
 		ar->ar_force_sync = 0;
 }
 
-/**
- * Performs "unstable" page accounting. This function balances the
- * increment operations performed in osc_inc_unstable_pages. It is
- * registered as the RPC request callback, and is executed when the
- * bulk RPC is committed on the server. Thus at this point, the pages
- * involved in the bulk transfer are no longer considered unstable.
- */
-void osc_dec_unstable_pages(struct ptlrpc_request *req)
-{
-	struct client_obd *cli = &req->rq_import->imp_obd->u.cli;
-	struct ptlrpc_bulk_desc *desc = req->rq_bulk;
-	int page_count = desc->bd_iov_count;
-	int i;
-
-	/* No unstable page tracking */
-	if (!cli->cl_cache)
-		return;
-
-	LASSERT(page_count >= 0);
-
-	for (i = 0; i < page_count; i++)
-		dec_node_page_state(desc->bd_iov[i].bv_page, NR_UNSTABLE_NFS);
-
-	atomic_sub(page_count, &cli->cl_cache->ccc_unstable_nr);
-	LASSERT(atomic_read(&cli->cl_cache->ccc_unstable_nr) >= 0);
-
-	atomic_sub(page_count, &cli->cl_unstable_count);
-	LASSERT(atomic_read(&cli->cl_unstable_count) >= 0);
-
-	atomic_sub(page_count, &obd_unstable_pages);
-	LASSERT(atomic_read(&obd_unstable_pages) >= 0);
-
-	wake_up_all(&cli->cl_cache->ccc_unstable_waitq);
-}
-
-/* "unstable" page accounting. See: osc_dec_unstable_pages. */
-void osc_inc_unstable_pages(struct ptlrpc_request *req)
-{
-	struct client_obd *cli = &req->rq_import->imp_obd->u.cli;
-	struct ptlrpc_bulk_desc *desc = req->rq_bulk;
-	long page_count = desc->bd_iov_count;
-	int i;
-
-	/* No unstable page tracking */
-	if (!cli->cl_cache)
-		return;
-
-	LASSERT(page_count >= 0);
-
-	for (i = 0; i < page_count; i++)
-		inc_node_page_state(desc->bd_iov[i].bv_page, NR_UNSTABLE_NFS);
-
-	LASSERT(atomic_read(&cli->cl_cache->ccc_unstable_nr) >= 0);
-	atomic_add(page_count, &cli->cl_cache->ccc_unstable_nr);
-
-	LASSERT(atomic_read(&cli->cl_unstable_count) >= 0);
-	atomic_add(page_count, &cli->cl_unstable_count);
-
-	LASSERT(atomic_read(&obd_unstable_pages) >= 0);
-	atomic_add(page_count, &obd_unstable_pages);
-
-	/*
-	 * If the request has already been committed (i.e. brw_commit
-	 * called via rq_commit_cb), we need to undo the unstable page
-	 * increments we just performed because rq_commit_cb wont be
-	 * called again.
-	 */
-	spin_lock(&req->rq_lock);
-	if (unlikely(req->rq_committed)) {
-		/* Drop lock before calling osc_dec_unstable_pages */
-		spin_unlock(&req->rq_lock);
-		osc_dec_unstable_pages(req);
-	} else {
-		req->rq_unstable = 1;
-		spin_unlock(&req->rq_lock);
-	}
-}
-
 /* this must be called holding the loi list lock to give coverage to exit_cache,
  * async_flag maintenance, and oap_request
  */
@@ -1932,9 +1850,6 @@ static void osc_ap_completion(const struct lu_env *env, struct client_obd *cli,
 	__u64 xid = 0;
 
 	if (oap->oap_request) {
-		if (!rc)
-			osc_inc_unstable_pages(oap->oap_request);
-
 		xid = ptlrpc_req_xid(oap->oap_request);
 		ptlrpc_req_finished(oap->oap_request);
 		oap->oap_request = NULL;
@@ -2421,9 +2336,6 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io,
 			return rc;
 	}
 
-	if (osc_over_unstable_soft_limit(cli))
-		brw_flags |= OBD_BRW_SOFT_SYNC;
-
 	oap->oap_cmd = cmd;
 	oap->oap_page_off = ops->ops_from;
 	oap->oap_count = ops->ops_to - ops->ops_from;
diff --git a/drivers/staging/lustre/lustre/osc/osc_internal.h b/drivers/staging/lustre/lustre/osc/osc_internal.h
index 2038885..eca5fef 100644
--- a/drivers/staging/lustre/lustre/osc/osc_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_internal.h
@@ -197,7 +197,7 @@ int osc_quotacheck(struct obd_device *unused, struct obd_export *exp,
 int osc_quota_poll_check(struct obd_export *exp, struct if_quotacheck *qchk);
 void osc_inc_unstable_pages(struct ptlrpc_request *req);
 void osc_dec_unstable_pages(struct ptlrpc_request *req);
-int  osc_over_unstable_soft_limit(struct client_obd *cli);
+bool osc_over_unstable_soft_limit(struct client_obd *cli);
 
 struct ldlm_lock *osc_dlmlock_at_pgoff(const struct lu_env *env,
 				       struct osc_object *obj, pgoff_t index,
diff --git a/drivers/staging/lustre/lustre/osc/osc_page.c b/drivers/staging/lustre/lustre/osc/osc_page.c
index 355f496..583a0af 100644
--- a/drivers/staging/lustre/lustre/osc/osc_page.c
+++ b/drivers/staging/lustre/lustre/osc/osc_page.c
@@ -323,32 +323,6 @@ int osc_page_init(const struct lu_env *env, struct cl_object *obj,
 	return result;
 }
 
-int osc_over_unstable_soft_limit(struct client_obd *cli)
-{
-	long obd_upages, obd_dpages, osc_upages;
-
-	/* Can't check cli->cl_unstable_count, therefore, no soft limit */
-	if (!cli)
-		return 0;
-
-	obd_upages = atomic_read(&obd_unstable_pages);
-	obd_dpages = atomic_read(&obd_dirty_pages);
-
-	osc_upages = atomic_read(&cli->cl_unstable_count);
-
-	/*
-	 * obd_max_dirty_pages is the max number of (dirty + unstable)
-	 * pages allowed at any given time. To simulate an unstable page
-	 * only limit, we subtract the current number of dirty pages
-	 * from this max. This difference is roughly the amount of pages
-	 * currently available for unstable pages. Thus, the soft limit
-	 * is half of that difference. Check osc_upages to ensure we don't
-	 * set SOFT_SYNC for OSCs without any outstanding unstable pages.
-	 */
-	return osc_upages &&
-	       obd_upages >= (obd_max_dirty_pages - obd_dpages) / 2;
-}
-
 /**
  * Helper function called by osc_io_submit() for every page in an immediate
  * transfer (i.e., transferred synchronously).
@@ -368,9 +342,6 @@ void osc_page_submit(const struct lu_env *env, struct osc_page *opg,
 	oap->oap_count = opg->ops_to - opg->ops_from;
 	oap->oap_brw_flags = brw_flags | OBD_BRW_SYNC;
 
-	if (osc_over_unstable_soft_limit(oap->oap_cli))
-		oap->oap_brw_flags |= OBD_BRW_SOFT_SYNC;
-
 	if (capable(CFS_CAP_SYS_RESOURCE)) {
 		oap->oap_brw_flags |= OBD_BRW_NOQUOTA;
 		oap->oap_cmd |= OBD_BRW_NOQUOTA;
@@ -540,6 +511,28 @@ static void discard_pagevec(const struct lu_env *env, struct cl_io *io,
 }
 
 /**
+ * Check if a cl_page can be released, i.e., it's not being used.
+ *
+ * If unstable accounting is turned on, a bulk transfer may hold one refcount
+ * for recovery, so we need to check the vmpage refcount as well; otherwise,
+ * even if we destroy the cl_page, the corresponding vmpage can't be reused.
+ */
+static inline bool lru_page_busy(struct client_obd *cli, struct cl_page *page)
+{
+	if (cl_page_in_use_noref(page))
+		return true;
+
+	if (cli->cl_cache->ccc_unstable_check) {
+		struct page *vmpage = cl_page_vmpage(page);
+
+		/* vmpage has two known users: cl_page and VM page cache */
+		if (page_count(vmpage) - page_mapcount(vmpage) > 2)
+			return true;
+	}
+	return false;
+}
+
+/**
  * Drop @target of pages from LRU at most.
  */
 int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
@@ -584,7 +577,7 @@ int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
 			break;
 
 		page = opg->ops_cl.cpl_page;
-		if (cl_page_in_use_noref(page)) {
+		if (lru_page_busy(cli, page)) {
 			list_move_tail(&opg->ops_lru, &cli->cl_lru_list);
 			continue;
 		}
@@ -620,7 +613,7 @@ int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
 		}
 
 		if (cl_page_own_try(env, io, page) == 0) {
-			if (!cl_page_in_use_noref(page)) {
+			if (!lru_page_busy(cli, page)) {
 				/* remove it from lru list earlier to avoid
 				 * lock contention
 				 */
@@ -742,6 +735,13 @@ out:
 	return rc;
 }
 
+/**
+ * osc_lru_reserve() is called to reserve an LRU slot for a cl_page.
+ *
+ * Usually the LRU slots are reserved in osc_io_iter_rw_init().
+ * Only in the case that the LRU slots are in extreme shortage, it should
+ * have reserved enough slots for an IO.
+ */
 static int osc_lru_reserve(const struct lu_env *env, struct osc_object *obj,
 			   struct osc_page *opg)
 {
@@ -787,4 +787,150 @@ out:
 	return rc;
 }
 
+/**
+ * Atomic operations are expensive. We accumulate the accounting for the
+ * same page zone to get better performance.
+ * In practice this works well because the pages in the same RPC
+ * are likely to come from the same page zone.
+ */
+static inline void unstable_page_accounting(struct ptlrpc_bulk_desc *desc,
+					    int factor)
+{
+	int page_count = desc->bd_iov_count;
+	void *zone = NULL;
+	int count = 0;
+	int i;
+
+	for (i = 0; i < page_count; i++) {
+		void *pz = page_zone(desc->bd_iov[i].bv_page);
+
+		if (likely(pz == zone)) {
+			++count;
+			continue;
+		}
+
+		if (count > 0) {
+			mod_zone_page_state(zone, NR_UNSTABLE_NFS,
+					    factor * count);
+			count = 0;
+		}
+		zone = pz;
+		++count;
+	}
+	if (count > 0)
+		mod_zone_page_state(zone, NR_UNSTABLE_NFS, factor * count);
+}
+
+static inline void add_unstable_page_accounting(struct ptlrpc_bulk_desc *desc)
+{
+	unstable_page_accounting(desc, 1);
+}
+
+static inline void dec_unstable_page_accounting(struct ptlrpc_bulk_desc *desc)
+{
+	unstable_page_accounting(desc, -1);
+}
+
+/**
+ * Performs "unstable" page accounting. This function balances the
+ * increment operations performed in osc_inc_unstable_pages. It is
+ * registered as the RPC request callback, and is executed when the
+ * bulk RPC is committed on the server. Thus at this point, the pages
+ * involved in the bulk transfer are no longer considered unstable.
+ *
+ * If this function is called, the request should have been committed
+ * or req:rq_unstable must have been set; this implies that the unstable
+ * statistics have already been added.
+ */
+void osc_dec_unstable_pages(struct ptlrpc_request *req)
+{
+	struct client_obd *cli = &req->rq_import->imp_obd->u.cli;
+	struct ptlrpc_bulk_desc *desc = req->rq_bulk;
+	int page_count = desc->bd_iov_count;
+	int unstable_count;
+
+	LASSERT(page_count >= 0);
+	dec_unstable_page_accounting(desc);
+
+	unstable_count = atomic_sub_return(page_count, &cli->cl_unstable_count);
+	LASSERT(unstable_count >= 0);
+
+	unstable_count = atomic_sub_return(page_count,
+					   &cli->cl_cache->ccc_unstable_nr);
+	LASSERT(unstable_count >= 0);
+	if (!unstable_count)
+		wake_up_all(&cli->cl_cache->ccc_unstable_waitq);
+
+	if (osc_cache_too_much(cli))
+		(void)ptlrpcd_queue_work(cli->cl_lru_work);
+}
+
+/**
+ * "unstable" page accounting. See: osc_dec_unstable_pages.
+ */
+void osc_inc_unstable_pages(struct ptlrpc_request *req)
+{
+	struct client_obd *cli  = &req->rq_import->imp_obd->u.cli;
+	struct ptlrpc_bulk_desc *desc = req->rq_bulk;
+	int page_count = desc->bd_iov_count;
+
+	/* No unstable page tracking */
+	if (!cli->cl_cache || !cli->cl_cache->ccc_unstable_check)
+		return;
+
+	add_unstable_page_accounting(desc);
+	atomic_add(page_count, &cli->cl_unstable_count);
+	atomic_add(page_count, &cli->cl_cache->ccc_unstable_nr);
+
+	/*
+	 * If the request has already been committed (i.e. brw_commit
+	 * called via rq_commit_cb), we need to undo the unstable page
+	 * increments we just performed because rq_commit_cb won't be
+	 * called again.
+	 */
+	spin_lock(&req->rq_lock);
+	if (unlikely(req->rq_committed)) {
+		spin_unlock(&req->rq_lock);
+
+		osc_dec_unstable_pages(req);
+	} else {
+		req->rq_unstable = 1;
+		spin_unlock(&req->rq_lock);
+	}
+}
+
+/**
+ * Check whether this OSC should piggyback the SOFT_SYNC flag to the OST.
+ * This function is called for every BRW RPC, so it's critical
+ * to make this function fast.
+ */
+bool osc_over_unstable_soft_limit(struct client_obd *cli)
+{
+	long unstable_nr, osc_unstable_count;
+
+	/* No unstable page tracking, therefore no soft limit */
+	if (!cli->cl_cache || !cli->cl_cache->ccc_unstable_check)
+		return false;
+
+	osc_unstable_count = atomic_read(&cli->cl_unstable_count);
+	unstable_nr = atomic_read(&cli->cl_cache->ccc_unstable_nr);
+
+	CDEBUG(D_CACHE,
+	       "%s: cli: %p unstable pages: %ld, osc unstable pages: %ld\n",
+	       cli->cl_import->imp_obd->obd_name, cli,
+	       unstable_nr, osc_unstable_count);
+
+	/*
+	 * If the LRU slots are in shortage (25% remaining) AND this OSC
+	 * has one full RPC window of unstable pages, it's a good chance
+	 * to piggyback a SOFT_SYNC flag.
+	 * Note that the OST does not act on a SOFT_SYNC request
+	 * immediately, so active OSCs have more chances to carry the
+	 * flag; this is reasonable.
+	 */
+	return unstable_nr > cli->cl_cache->ccc_lru_max >> 2 &&
+	       osc_unstable_count > cli->cl_max_pages_per_rpc *
+				    cli->cl_max_rpcs_in_flight;
+}
+
 /** @} osc */
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 042a081..e5669e2 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -807,17 +807,15 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa,
 		CERROR("dirty %lu - %lu > dirty_max %lu\n",
 		       cli->cl_dirty, cli->cl_dirty_transit, cli->cl_dirty_max);
 		oa->o_undirty = 0;
-	} else if (unlikely(atomic_read(&obd_unstable_pages) +
-			    atomic_read(&obd_dirty_pages) -
+	} else if (unlikely(atomic_read(&obd_dirty_pages) -
 			    atomic_read(&obd_dirty_transit_pages) >
 			    (long)(obd_max_dirty_pages + 1))) {
 		/* The atomic_read() allowing the atomic_inc() are
 		 * not covered by a lock thus they may safely race and trip
 		 * this CERROR() unless we add in a small fudge factor (+1).
 		 */
-		CERROR("%s: dirty %d + %d - %d > system dirty_max %d\n",
+		CERROR("%s: dirty %d - %d > system dirty_max %d\n",
 		       cli->cl_import->imp_obd->obd_name,
-		       atomic_read(&obd_unstable_pages),
 		       atomic_read(&obd_dirty_pages),
 		       atomic_read(&obd_dirty_transit_pages),
 		       obd_max_dirty_pages);
@@ -1818,6 +1816,9 @@ static int brw_interpret(const struct lu_env *env,
 	}
 	kmem_cache_free(obdo_cachep, aa->aa_oa);
 
+	if (lustre_msg_get_opc(req->rq_reqmsg) == OST_WRITE && rc == 0)
+		osc_inc_unstable_pages(req);
+
 	list_for_each_entry_safe(ext, tmp, &aa->aa_exts, oe_link) {
 		list_del_init(&ext->oe_link);
 		osc_extent_finish(env, ext, 1, rc);
@@ -1888,6 +1889,7 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 	int mpflag = 0;
 	int mem_tight = 0;
 	int page_count = 0;
+	bool soft_sync = false;
 	int i;
 	int rc;
 	struct ost_body *body;
@@ -1915,6 +1917,7 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 		}
 	}
 
+	soft_sync = osc_over_unstable_soft_limit(cli);
 	if (mem_tight)
 		mpflag = cfs_memory_pressure_get_and_set();
 
@@ -1950,6 +1953,8 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 		}
 		if (mem_tight)
 			oap->oap_brw_flags |= OBD_BRW_MEMALLOC;
+		if (soft_sync)
+			oap->oap_brw_flags |= OBD_BRW_SOFT_SYNC;
 		pga[i] = &oap->oap_brw_page;
 		pga[i]->off = oap->oap_obj_off + oap->oap_page_off;
 		CDEBUG(0, "put page %p index %lu oap %p flg %x to pga\n",
-- 
1.7.1
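
The per-zone batching described in the comment above unstable_page_accounting() can be illustrated outside the kernel. The following is only a userspace sketch under stated assumptions: zones are plain integers rather than struct zone pointers, and demo_zone_stat, demo_nr_updates, demo_mod_zone_stat() and demo_account_pages() are hypothetical names that do not exist in Lustre or the kernel.

```c
#include <stddef.h>

#define DEMO_NR_ZONES 4

/* Hypothetical per-zone counters standing in for the kernel's zone stats. */
static long demo_zone_stat[DEMO_NR_ZONES];
/* Number of counter updates issued, to show the effect of batching. */
static unsigned long demo_nr_updates;

static void demo_mod_zone_stat(int zone, long delta)
{
	demo_zone_stat[zone] += delta;
	demo_nr_updates++;
}

/*
 * Account factor (+1 or -1) for every page, but issue only one counter
 * update per run of consecutive pages that share a zone.
 */
static void demo_account_pages(const int *page_zone, size_t nr, int factor)
{
	int zone = -1;		/* no run started yet */
	long count = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (page_zone[i] == zone) {
			count++;
			continue;
		}
		/* Zone changed: flush the previous run, start a new one. */
		if (count > 0)
			demo_mod_zone_stat(zone, factor * count);
		zone = page_zone[i];
		count = 1;
	}
	if (count > 0)
		demo_mod_zone_stat(zone, factor * count);
}
```

With six pages whose zones are {1, 1, 1, 2, 2, 1}, this sketch issues three updates instead of six. The saving grows with the length of each same-zone run, which is the case the patch comment expects to be common within one RPC.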

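The soft-limit test in osc_over_unstable_soft_limit() reduces to two comparisons. The sketch below restates that condition in userspace as an illustration only; struct demo_osc and demo_over_soft_limit() are hypothetical, and the fields merely mirror the counters the patch reads (ccc_unstable_nr, ccc_lru_max, cl_unstable_count, cl_max_pages_per_rpc, cl_max_rpcs_in_flight).

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counters the check reads. */
struct demo_osc {
	long cache_unstable_nr;		/* unstable pages in the shared cache */
	long lru_max;			/* LRU slot budget of the shared cache */
	long osc_unstable_count;	/* unstable pages owned by this OSC */
	long max_pages_per_rpc;
	long max_rpcs_in_flight;
};

/*
 * True when unstable pages exceed a quarter of the LRU budget and this
 * OSC holds more than one full RPC window of unstable pages.
 */
static bool demo_over_soft_limit(const struct demo_osc *osc)
{
	long rpc_window = osc->max_pages_per_rpc * osc->max_rpcs_in_flight;

	return osc->cache_unstable_nr > osc->lru_max / 4 &&
	       osc->osc_unstable_count > rpc_window;
}
```

Both comparisons are strict, so an OSC sitting exactly at one RPC window, or a cache sitting exactly at a quarter of its LRU budget, does not request SOFT_SYNC.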
* [PATCH 58/80] staging: lustre: mdc: always use D_INFO for debug info when mdc_put_rpc_lock fails
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	Andreas Dilger, James Simmons

From: wang di <di.wang@intel.com>

Use D_INFO for the "ldlm_cli_enqueue failed" message no matter which
error is returned, instead of D_ERROR for anything other than -EACCES
or -EIDRM.

Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4973
Reviewed-on: http://review.whamcloud.com/10150
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/mdc/mdc_locks.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 551f3d9..3291201 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -841,9 +841,8 @@ resend:
 	mdc_put_rpc_lock(obddev->u.cli.cl_rpc_lock, it);
 
 	if (rc < 0) {
-		CDEBUG_LIMIT((rc == -EACCES || rc == -EIDRM) ? D_INFO : D_ERROR,
-			     "%s: ldlm_cli_enqueue failed: rc = %d\n",
-			     obddev->obd_name, rc);
+		CDEBUG(D_INFO, "%s: ldlm_cli_enqueue failed: rc = %d\n",
+		       obddev->obd_name, rc);
 
 		mdc_clear_replay_flag(req, rc);
 		ptlrpc_req_finished(req);
-- 
1.7.1

* [PATCH 59/80] staging: lustre: fld: add fld description documentation
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Patrick Farrell, James Simmons

From: Patrick Farrell <paf@cray.com>

Add subsystem description from Di Wang to header file.

Signed-off-by: Patrick Farrell <paf@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5153
Reviewed-on: http://review.whamcloud.com/10631
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/fld/fld_internal.h |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/lustre/lustre/fld/fld_internal.h b/drivers/staging/lustre/lustre/fld/fld_internal.h
index f0efe5b..08eaec7 100644
--- a/drivers/staging/lustre/lustre/fld/fld_internal.h
+++ b/drivers/staging/lustre/lustre/fld/fld_internal.h
@@ -31,6 +31,25 @@
  *
  * lustre/fld/fld_internal.h
  *
+ * Subsystem Description:
+ * FLD is the FID Location Database, which stores where (i.e., on which MDT)
+ * FIDs are located.
+ * The database is basically a record file; each record consists of a FID
+ * sequence range, MDT/OST index, and flags. The FLD for the whole FS
+ * is only stored on the sequence controller (MDT0) right now, but each target
+ * also has its local FLD, which only stores the local sequence.
+ *
+ * The FLD subsystem usually has two tasks:
+ * 1. Maintain the database, e.g. when the sequence controller allocates
+ * new sequence ranges to some nodes, it will call the FLD API to insert the
+ * location information <sequence_range, node_index> in FLDB.
+ *
+ * 2. Handle requests from other nodes, e.g. if a client needs to know where
+ * a FID is located and cannot find the information in its local cache,
+ * it will send a FLD lookup RPC to the FLD service, and the FLD service will
+ * look up the FLDB entry and return the location information to the client.
+ *
+ *
  * Author: Yury Umanets <umka@clusterfs.com>
  * Author: Tom WangDi <wangdi@clusterfs.com>
  */
-- 
1.7.1
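
The FLD description in this patch amounts to a table of <sequence_range, node_index> records that is searched by FID sequence. A minimal userspace sketch of such a lookup follows. It is not the Lustre implementation: struct demo_fld_range and demo_fld_lookup() are hypothetical, and the sketch assumes the ranges are sorted by start, half-open, and non-overlapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a FID sequence range mapped to a target index. */
struct demo_fld_range {
	uint64_t start;		/* first sequence in the range (inclusive) */
	uint64_t end;		/* one past the last sequence (exclusive) */
	uint32_t index;		/* MDT/OST index that owns the range */
	uint32_t flags;
};

/*
 * Binary search over ranges sorted by start and assumed not to overlap.
 * Returns the record covering seq, or NULL when no range covers it.
 */
static const struct demo_fld_range *
demo_fld_lookup(const struct demo_fld_range *tbl, size_t nr, uint64_t seq)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (seq < tbl[mid].start)
			hi = mid;
		else if (seq >= tbl[mid].end)
			lo = mid + 1;
		else
			return &tbl[mid];
	}
	return NULL;
}
```

In terms of the description, a client would consult a local table like this first and fall back to a lookup RPC to the FLD service only on a NULL result.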

* [PATCH 60/80] staging: lustre: ldlm: improve ldlm_lock_create() return value
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Emoly Liu,
	James Simmons

From: Emoly Liu <emoly.liu@intel.com>

ldlm_lock_create() and ldlm_resource_get() always return NULL to
report an error, and callers sometimes interpret that NULL incorrectly
as ENOMEM. This patch fixes the problem by returning ERR_PTR() rather
than NULL.
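
For readers unfamiliar with the convention, the difference between a NULL return and an ERR_PTR() return can be shown in a small userspace sketch. The ERR_PTR(), PTR_ERR() and IS_ERR() helpers below are simplified stand-ins for the kernel's <linux/err.h> versions, and demo_lock, demo_lock_create() and demo_enqueue() are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct demo_lock {
	int mode;
};

/* Hypothetical creator: encodes why it failed instead of returning NULL. */
static struct demo_lock *demo_lock_create(int mode, int resource_exists)
{
	struct demo_lock *lock;

	if (!resource_exists)
		return ERR_PTR(-ENOENT);

	lock = malloc(sizeof(*lock));
	if (!lock)
		return ERR_PTR(-ENOMEM);

	lock->mode = mode;
	return lock;
}

/* Hypothetical caller: propagates the real errno instead of guessing. */
static int demo_enqueue(int mode, int resource_exists)
{
	struct demo_lock *lock = demo_lock_create(mode, resource_exists);

	if (IS_ERR(lock))
		return (int)PTR_ERR(lock);

	free(lock);
	return 0;
}
```

With a NULL return the caller in this sketch could only guess at -ENOMEM; with the encoded pointer, a missing resource surfaces as -ENOENT.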

Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4524
Reviewed-on: http://review.whamcloud.com/9004
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/ldlm/ldlm_flock.c    |    4 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_lock.c     |   28 ++++++++++++--------
 drivers/staging/lustre/lustre/ldlm/ldlm_request.c  |   13 +++-----
 drivers/staging/lustre/lustre/ldlm/ldlm_resource.c |   25 +++++------------
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |    2 +-
 drivers/staging/lustre/lustre/mdc/mdc_reint.c      |    2 +-
 drivers/staging/lustre/lustre/osc/osc_request.c    |    2 +-
 7 files changed, 34 insertions(+), 42 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
index 65e8e14..61d649f 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
@@ -339,10 +339,10 @@ reprocess:
 						lock->l_granted_mode, &null_cbs,
 						NULL, 0, LVB_T_NONE);
 			lock_res_and_lock(req);
-			if (!new2) {
+			if (IS_ERR(new2)) {
 				ldlm_flock_destroy(req, lock->l_granted_mode,
 						   *flags);
-				*err = -ENOLCK;
+				*err = PTR_ERR(new2);
 				return LDLM_ITER_STOP;
 			}
 			goto reprocess;
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
index 1a0fce1..a91cdb4 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
@@ -481,8 +481,8 @@ int ldlm_lock_change_resource(struct ldlm_namespace *ns, struct ldlm_lock *lock,
 	unlock_res_and_lock(lock);
 
 	newres = ldlm_resource_get(ns, NULL, new_resid, type, 1);
-	if (!newres)
-		return -ENOMEM;
+	if (IS_ERR(newres))
+		return PTR_ERR(newres);
 
 	lu_ref_add(&newres->lr_reference, "lock", lock);
 	/*
@@ -1227,7 +1227,7 @@ enum ldlm_mode ldlm_lock_match(struct ldlm_namespace *ns, __u64 flags,
 	}
 
 	res = ldlm_resource_get(ns, NULL, res_id, type, 0);
-	if (!res) {
+	if (IS_ERR(res)) {
 		LASSERT(!old_lock);
 		return 0;
 	}
@@ -1475,15 +1475,15 @@ struct ldlm_lock *ldlm_lock_create(struct ldlm_namespace *ns,
 {
 	struct ldlm_lock *lock;
 	struct ldlm_resource *res;
+	int rc;
 
 	res = ldlm_resource_get(ns, NULL, res_id, type, 1);
-	if (!res)
-		return NULL;
+	if (IS_ERR(res))
+		return ERR_CAST(res);
 
 	lock = ldlm_lock_new(res);
-
 	if (!lock)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	lock->l_req_mode = mode;
 	lock->l_ast_data = data;
@@ -1497,27 +1497,33 @@ struct ldlm_lock *ldlm_lock_create(struct ldlm_namespace *ns,
 	lock->l_tree_node = NULL;
 	/* if this is the extent lock, allocate the interval tree node */
 	if (type == LDLM_EXTENT) {
-		if (!ldlm_interval_alloc(lock))
+		if (!ldlm_interval_alloc(lock)) {
+			rc = -ENOMEM;
 			goto out;
+		}
 	}
 
 	if (lvb_len) {
 		lock->l_lvb_len = lvb_len;
 		lock->l_lvb_data = kzalloc(lvb_len, GFP_NOFS);
-		if (!lock->l_lvb_data)
+		if (!lock->l_lvb_data) {
+			rc = -ENOMEM;
 			goto out;
+		}
 	}
 
 	lock->l_lvb_type = lvb_type;
-	if (OBD_FAIL_CHECK(OBD_FAIL_LDLM_NEW_LOCK))
+	if (OBD_FAIL_CHECK(OBD_FAIL_LDLM_NEW_LOCK)) {
+		rc = -ENOENT;
 		goto out;
+	}
 
 	return lock;
 
 out:
 	ldlm_lock_destroy(lock);
 	LDLM_LOCK_RELEASE(lock);
-	return NULL;
+	return ERR_PTR(rc);
 }
 
 /**
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
index 984a460..048214c 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
@@ -694,8 +694,8 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp,
 		lock = ldlm_lock_create(ns, res_id, einfo->ei_type,
 					einfo->ei_mode, &cbs, einfo->ei_cbdata,
 					lvb_len, lvb_type);
-		if (!lock)
-			return -ENOMEM;
+		if (IS_ERR(lock))
+			return PTR_ERR(lock);
 		/* for the local lock, add the reference */
 		ldlm_lock_addref_internal(lock, einfo->ei_mode);
 		ldlm_lock2handle(lock, lockh);
@@ -1658,7 +1658,7 @@ int ldlm_cli_cancel_unused_resource(struct ldlm_namespace *ns,
 	int rc;
 
 	res = ldlm_resource_get(ns, NULL, res_id, 0, 0);
-	if (!res) {
+	if (IS_ERR(res)) {
 		/* This is not a problem. */
 		CDEBUG(D_INFO, "No resource %llu\n", res_id->name[0]);
 		return 0;
@@ -1809,13 +1809,10 @@ int ldlm_resource_iterate(struct ldlm_namespace *ns,
 	struct ldlm_resource *res;
 	int rc;
 
-	if (!ns) {
-		CERROR("must pass in namespace\n");
-		LBUG();
-	}
+	LASSERTF(ns, "must pass in namespace\n");
 
 	res = ldlm_resource_get(ns, NULL, res_id, 0, 0);
-	if (!res)
+	if (IS_ERR(res))
 		return 0;
 
 	LDLM_RESOURCE_ADDREF(res);
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
index 5866b00..c37a7b0 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
@@ -1088,7 +1088,7 @@ ldlm_resource_get(struct ldlm_namespace *ns, struct ldlm_resource *parent,
 		  int create)
 {
 	struct hlist_node     *hnode;
-	struct ldlm_resource *res;
+	struct ldlm_resource *res = NULL;
 	struct cfs_hash_bd	 bd;
 	__u64		 version;
 	int		      ns_refcount = 0;
@@ -1101,31 +1101,20 @@ ldlm_resource_get(struct ldlm_namespace *ns, struct ldlm_resource *parent,
 	hnode = cfs_hash_bd_lookup_locked(ns->ns_rs_hash, &bd, (void *)name);
 	if (hnode) {
 		cfs_hash_bd_unlock(ns->ns_rs_hash, &bd, 0);
-		res = hlist_entry(hnode, struct ldlm_resource, lr_hash);
-		/* Synchronize with regard to resource creation. */
-		if (ns->ns_lvbo && ns->ns_lvbo->lvbo_init) {
-			mutex_lock(&res->lr_lvb_mutex);
-			mutex_unlock(&res->lr_lvb_mutex);
-		}
-
-		if (unlikely(res->lr_lvb_len < 0)) {
-			ldlm_resource_putref(res);
-			res = NULL;
-		}
-		return res;
+		goto lvbo_init;
 	}
 
 	version = cfs_hash_bd_version_get(&bd);
 	cfs_hash_bd_unlock(ns->ns_rs_hash, &bd, 0);
 
 	if (create == 0)
-		return NULL;
+		return ERR_PTR(-ENOENT);
 
 	LASSERTF(type >= LDLM_MIN_TYPE && type < LDLM_MAX_TYPE,
 		 "type: %d\n", type);
 	res = ldlm_resource_new();
 	if (!res)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	res->lr_ns_bucket  = cfs_hash_bd_extra_get(ns->ns_rs_hash, &bd);
 	res->lr_name       = *name;
@@ -1143,7 +1132,7 @@ ldlm_resource_get(struct ldlm_namespace *ns, struct ldlm_resource *parent,
 		/* We have taken lr_lvb_mutex. Drop it. */
 		mutex_unlock(&res->lr_lvb_mutex);
 		kmem_cache_free(ldlm_resource_slab, res);
-
+lvbo_init:
 		res = hlist_entry(hnode, struct ldlm_resource, lr_hash);
 		/* Synchronize with regard to resource creation. */
 		if (ns->ns_lvbo && ns->ns_lvbo->lvbo_init) {
@@ -1153,7 +1142,7 @@ ldlm_resource_get(struct ldlm_namespace *ns, struct ldlm_resource *parent,
 
 		if (unlikely(res->lr_lvb_len < 0)) {
 			ldlm_resource_putref(res);
-			res = NULL;
+			res = ERR_PTR(res->lr_lvb_len);
 		}
 		return res;
 	}
@@ -1175,7 +1164,7 @@ ldlm_resource_get(struct ldlm_namespace *ns, struct ldlm_resource *parent,
 			res->lr_lvb_len = rc;
 			mutex_unlock(&res->lr_lvb_mutex);
 			ldlm_resource_putref(res);
-			return NULL;
+			return ERR_PTR(rc);
 		}
 	}
 
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 3291201..fab83dd 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -174,7 +174,7 @@ int mdc_null_inode(struct obd_export *exp,
 	fid_build_reg_res_name(fid, &res_id);
 
 	res = ldlm_resource_get(ns, NULL, &res_id, 0, 0);
-	if (!res)
+	if (IS_ERR(res))
 		return 0;
 
 	lock_res(res);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_reint.c b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
index 9bec049..0f71392 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_reint.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
@@ -86,7 +86,7 @@ int mdc_resource_get_unused(struct obd_export *exp, const struct lu_fid *fid,
 	fid_build_reg_res_name(fid, &res_id);
 	res = ldlm_resource_get(exp->exp_obd->obd_namespace,
 				NULL, &res_id, 0, 0);
-	if (!res)
+	if (IS_ERR(res))
 		return 0;
 	LDLM_RESOURCE_ADDREF(res);
 	/* Initialize ibits lock policy. */
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index e5669e2..90c8416 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -650,7 +650,7 @@ static int osc_resource_get_unused(struct obd_export *exp, struct obdo *oa,
 
 	ostid_build_res_name(&oa->o_oi, &res_id);
 	res = ldlm_resource_get(ns, NULL, &res_id, 0, 0);
-	if (!res)
+	if (IS_ERR(res))
 		return 0;
 
 	LDLM_RESOURCE_ADDREF(res);
-- 
1.7.1


* [lustre-devel] [PATCH 60/80] staging: lustre: ldlm: improve ldlm_lock_create() return value
@ 2016-08-16 20:19   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Emoly Liu,
	James Simmons

From: Emoly Liu <emoly.liu@intel.com>

ldlm_lock_create() and ldlm_resource_get() always return NULL to
report an error, and callers sometimes interpret that NULL incorrectly
as ENOMEM. This patch fixes the problem by returning ERR_PTR() values
rather than NULL.
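
For readers less familiar with the idiom, here is a minimal userspace
sketch of the pattern. The ERR_PTR()/IS_ERR()/PTR_ERR() helpers below
are stand-ins written to mimic the kernel's <linux/err.h>, and
demo_resource_get()/demo_use_resource() are hypothetical names, not the
real ldlm functions. The point is only that the caller can pass the
real errno along instead of guessing -ENOMEM.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins that mimic the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for error codes. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical resource type, not the real struct ldlm_resource. */
struct demo_resource {
	int id;
};

/*
 * Hypothetical lookup: when create == 0 we pretend the resource was
 * not found; otherwise we allocate a new one.  Each failure carries
 * its own error code.
 */
struct demo_resource *demo_resource_get(int id, int create)
{
	struct demo_resource *res;

	if (!create)
		return ERR_PTR(-ENOENT);

	res = malloc(sizeof(*res));
	if (!res)
		return ERR_PTR(-ENOMEM);

	res->id = id;
	return res;
}

/* The caller propagates the real error instead of assuming -ENOMEM. */
int demo_use_resource(int id, int create)
{
	struct demo_resource *res = demo_resource_get(id, create);

	if (IS_ERR(res))
		return (int)PTR_ERR(res);

	free(res);
	return 0;
}
```

With a NULL-only return, demo_use_resource() could not tell "not
found" apart from "out of memory"; with the encoded pointer it returns
-ENOENT for the first case and -ENOMEM for the second.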

Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4524
Reviewed-on: http://review.whamcloud.com/9004
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/ldlm/ldlm_flock.c    |    4 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_lock.c     |   28 ++++++++++++--------
 drivers/staging/lustre/lustre/ldlm/ldlm_request.c  |   13 +++-----
 drivers/staging/lustre/lustre/ldlm/ldlm_resource.c |   25 +++++------------
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |    2 +-
 drivers/staging/lustre/lustre/mdc/mdc_reint.c      |    2 +-
 drivers/staging/lustre/lustre/osc/osc_request.c    |    2 +-
 7 files changed, 34 insertions(+), 42 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
index 65e8e14..61d649f 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
@@ -339,10 +339,10 @@ reprocess:
 						lock->l_granted_mode, &null_cbs,
 						NULL, 0, LVB_T_NONE);
 			lock_res_and_lock(req);
-			if (!new2) {
+			if (IS_ERR(new2)) {
 				ldlm_flock_destroy(req, lock->l_granted_mode,
 						   *flags);
-				*err = -ENOLCK;
+				*err = PTR_ERR(new2);
 				return LDLM_ITER_STOP;
 			}
 			goto reprocess;
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
index 1a0fce1..a91cdb4 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
@@ -481,8 +481,8 @@ int ldlm_lock_change_resource(struct ldlm_namespace *ns, struct ldlm_lock *lock,
 	unlock_res_and_lock(lock);
 
 	newres = ldlm_resource_get(ns, NULL, new_resid, type, 1);
-	if (!newres)
-		return -ENOMEM;
+	if (IS_ERR(newres))
+		return PTR_ERR(newres);
 
 	lu_ref_add(&newres->lr_reference, "lock", lock);
 	/*
@@ -1227,7 +1227,7 @@ enum ldlm_mode ldlm_lock_match(struct ldlm_namespace *ns, __u64 flags,
 	}
 
 	res = ldlm_resource_get(ns, NULL, res_id, type, 0);
-	if (!res) {
+	if (IS_ERR(res)) {
 		LASSERT(!old_lock);
 		return 0;
 	}
@@ -1475,15 +1475,15 @@ struct ldlm_lock *ldlm_lock_create(struct ldlm_namespace *ns,
 {
 	struct ldlm_lock *lock;
 	struct ldlm_resource *res;
+	int rc;
 
 	res = ldlm_resource_get(ns, NULL, res_id, type, 1);
-	if (!res)
-		return NULL;
+	if (IS_ERR(res))
+		return ERR_CAST(res);
 
 	lock = ldlm_lock_new(res);
-
 	if (!lock)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	lock->l_req_mode = mode;
 	lock->l_ast_data = data;
@@ -1497,27 +1497,33 @@ struct ldlm_lock *ldlm_lock_create(struct ldlm_namespace *ns,
 	lock->l_tree_node = NULL;
 	/* if this is the extent lock, allocate the interval tree node */
 	if (type == LDLM_EXTENT) {
-		if (!ldlm_interval_alloc(lock))
+		if (!ldlm_interval_alloc(lock)) {
+			rc = -ENOMEM;
 			goto out;
+		}
 	}
 
 	if (lvb_len) {
 		lock->l_lvb_len = lvb_len;
 		lock->l_lvb_data = kzalloc(lvb_len, GFP_NOFS);
-		if (!lock->l_lvb_data)
+		if (!lock->l_lvb_data) {
+			rc = -ENOMEM;
 			goto out;
+		}
 	}
 
 	lock->l_lvb_type = lvb_type;
-	if (OBD_FAIL_CHECK(OBD_FAIL_LDLM_NEW_LOCK))
+	if (OBD_FAIL_CHECK(OBD_FAIL_LDLM_NEW_LOCK)) {
+		rc = -ENOENT;
 		goto out;
+	}
 
 	return lock;
 
 out:
 	ldlm_lock_destroy(lock);
 	LDLM_LOCK_RELEASE(lock);
-	return NULL;
+	return ERR_PTR(rc);
 }
 
 /**
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
index 984a460..048214c 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
@@ -694,8 +694,8 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp,
 		lock = ldlm_lock_create(ns, res_id, einfo->ei_type,
 					einfo->ei_mode, &cbs, einfo->ei_cbdata,
 					lvb_len, lvb_type);
-		if (!lock)
-			return -ENOMEM;
+		if (IS_ERR(lock))
+			return PTR_ERR(lock);
 		/* for the local lock, add the reference */
 		ldlm_lock_addref_internal(lock, einfo->ei_mode);
 		ldlm_lock2handle(lock, lockh);
@@ -1658,7 +1658,7 @@ int ldlm_cli_cancel_unused_resource(struct ldlm_namespace *ns,
 	int rc;
 
 	res = ldlm_resource_get(ns, NULL, res_id, 0, 0);
-	if (!res) {
+	if (IS_ERR(res)) {
 		/* This is not a problem. */
 		CDEBUG(D_INFO, "No resource %llu\n", res_id->name[0]);
 		return 0;
@@ -1809,13 +1809,10 @@ int ldlm_resource_iterate(struct ldlm_namespace *ns,
 	struct ldlm_resource *res;
 	int rc;
 
-	if (!ns) {
-		CERROR("must pass in namespace\n");
-		LBUG();
-	}
+	LASSERTF(ns, "must pass in namespace\n");
 
 	res = ldlm_resource_get(ns, NULL, res_id, 0, 0);
-	if (!res)
+	if (IS_ERR(res))
 		return 0;
 
 	LDLM_RESOURCE_ADDREF(res);
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
index 5866b00..c37a7b0 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
@@ -1088,7 +1088,7 @@ ldlm_resource_get(struct ldlm_namespace *ns, struct ldlm_resource *parent,
 		  int create)
 {
 	struct hlist_node     *hnode;
-	struct ldlm_resource *res;
+	struct ldlm_resource *res = NULL;
 	struct cfs_hash_bd	 bd;
 	__u64		 version;
 	int		      ns_refcount = 0;
@@ -1101,31 +1101,20 @@ ldlm_resource_get(struct ldlm_namespace *ns, struct ldlm_resource *parent,
 	hnode = cfs_hash_bd_lookup_locked(ns->ns_rs_hash, &bd, (void *)name);
 	if (hnode) {
 		cfs_hash_bd_unlock(ns->ns_rs_hash, &bd, 0);
-		res = hlist_entry(hnode, struct ldlm_resource, lr_hash);
-		/* Synchronize with regard to resource creation. */
-		if (ns->ns_lvbo && ns->ns_lvbo->lvbo_init) {
-			mutex_lock(&res->lr_lvb_mutex);
-			mutex_unlock(&res->lr_lvb_mutex);
-		}
-
-		if (unlikely(res->lr_lvb_len < 0)) {
-			ldlm_resource_putref(res);
-			res = NULL;
-		}
-		return res;
+		goto lvbo_init;
 	}
 
 	version = cfs_hash_bd_version_get(&bd);
 	cfs_hash_bd_unlock(ns->ns_rs_hash, &bd, 0);
 
 	if (create == 0)
-		return NULL;
+		return ERR_PTR(-ENOENT);
 
 	LASSERTF(type >= LDLM_MIN_TYPE && type < LDLM_MAX_TYPE,
 		 "type: %d\n", type);
 	res = ldlm_resource_new();
 	if (!res)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	res->lr_ns_bucket  = cfs_hash_bd_extra_get(ns->ns_rs_hash, &bd);
 	res->lr_name       = *name;
@@ -1143,7 +1132,7 @@ ldlm_resource_get(struct ldlm_namespace *ns, struct ldlm_resource *parent,
 		/* We have taken lr_lvb_mutex. Drop it. */
 		mutex_unlock(&res->lr_lvb_mutex);
 		kmem_cache_free(ldlm_resource_slab, res);
-
+lvbo_init:
 		res = hlist_entry(hnode, struct ldlm_resource, lr_hash);
 		/* Synchronize with regard to resource creation. */
 		if (ns->ns_lvbo && ns->ns_lvbo->lvbo_init) {
@@ -1153,7 +1142,7 @@ ldlm_resource_get(struct ldlm_namespace *ns, struct ldlm_resource *parent,
 
 		if (unlikely(res->lr_lvb_len < 0)) {
 			ldlm_resource_putref(res);
-			res = NULL;
+			res = ERR_PTR(res->lr_lvb_len);
 		}
 		return res;
 	}
@@ -1175,7 +1164,7 @@ ldlm_resource_get(struct ldlm_namespace *ns, struct ldlm_resource *parent,
 			res->lr_lvb_len = rc;
 			mutex_unlock(&res->lr_lvb_mutex);
 			ldlm_resource_putref(res);
-			return NULL;
+			return ERR_PTR(rc);
 		}
 	}
 
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 3291201..fab83dd 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -174,7 +174,7 @@ int mdc_null_inode(struct obd_export *exp,
 	fid_build_reg_res_name(fid, &res_id);
 
 	res = ldlm_resource_get(ns, NULL, &res_id, 0, 0);
-	if (!res)
+	if (IS_ERR(res))
 		return 0;
 
 	lock_res(res);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_reint.c b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
index 9bec049..0f71392 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_reint.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
@@ -86,7 +86,7 @@ int mdc_resource_get_unused(struct obd_export *exp, const struct lu_fid *fid,
 	fid_build_reg_res_name(fid, &res_id);
 	res = ldlm_resource_get(exp->exp_obd->obd_namespace,
 				NULL, &res_id, 0, 0);
-	if (!res)
+	if (IS_ERR(res))
 		return 0;
 	LDLM_RESOURCE_ADDREF(res);
 	/* Initialize ibits lock policy. */
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index e5669e2..90c8416 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -650,7 +650,7 @@ static int osc_resource_get_unused(struct obd_export *exp, struct obdo *oa,
 
 	ostid_build_res_name(&oa->o_oi, &res_id);
 	res = ldlm_resource_get(ns, NULL, &res_id, 0, 0);
-	if (!res)
+	if (IS_ERR(res))
 		return 0;
 
 	LDLM_RESOURCE_ADDREF(res);
-- 
1.7.1


* [PATCH 61/80] staging: lustre: obdclass: compile issues with variable not being initialized
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	James Simmons, James Simmons

One of the versions of gcc I have refuses to build obd_mount.c because
index is not initialized in lmd_make_exclusion() before it is used.
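
As a rough illustration of this class of warning (this is not the body
of lmd_make_exclusion(); demo_last_ost_index() and its input format are
made up for the example), a variable that is only assigned on some
paths through a parsing loop can be reported as possibly uninitialized.
Giving it a value at the declaration, as the patch does, makes every
path well defined:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical parser: scans a colon-separated list such as
 * "fs-OST0001:fs-OST0003" and returns the last OST index found,
 * or 0 if no entry could be parsed.
 */
unsigned int demo_last_ost_index(const char *list)
{
	unsigned int index = 0;	/* defined even if nothing parses */
	const char *s = list;

	while (s && *s) {
		const char *ost = strstr(s, "-OST");
		unsigned int val;

		if (!ost)
			break;

		/* index is only assigned when the hex field parses */
		if (sscanf(ost, "-OST%4x", &val) == 1)
			index = val;

		/* advance past the next ':' separator, if any */
		s = strchr(ost, ':');
		if (s)
			s++;
	}

	return index;
}
```

Without the "= 0", the empty-list path would return an indeterminate
value, which is what the compiler is complaining about.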

Signed-off-by: James Simmons <uja.ornl@gmail.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4629
Reviewed-on: http://review.whamcloud.com/10705
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/obdclass/obd_mount.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/obd_mount.c b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
index 33d6c42..595ea1f 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_mount.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
@@ -730,7 +730,7 @@ int lustre_check_exclusion(struct super_block *sb, char *svname)
 static int lmd_make_exclusion(struct lustre_mount_data *lmd, const char *ptr)
 {
 	const char *s1 = ptr, *s2;
-	__u32 index, *exclude_list;
+	__u32 index = 0, *exclude_list;
 	int rc = 0, devmax;
 
 	/* The shortest an ost name can be is 8 chars: -OST0000.
-- 
1.7.1


* [PATCH 62/80] staging: lustre: obd: limit lu_object cache
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Brian Behlendorf, James Simmons

From: Brian Behlendorf <behlendorf1@llnl.gov>

As the LU cache grows it can consume enough memory to prevent buffers
for other objects, such as the OIs, from being cached, which severely
impacts the performance of FID lookups. Limit the lu_object cache
to a maximum of lu_cache_nr objects.

NOTES:

* In order to be able to quickly determine the number of objects in
  the hash table the CFS_HASH_COUNTER flag is added.  This adds an
  atomic_inc/dec to the hash insert/remove paths but is not expected
  to have any measurable impact on performance.
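
To make the limit concrete, the arithmetic in lu_object_limit() can be
sketched in plain userspace C roughly as follows. demo_purge_count()
and the DEMO_* names are illustrative only; the values mirror
LU_CACHE_NR_UNLIMITED and LU_CACHE_NR_MAX_ADJUST from the diff. Each
caller asks to purge at most a bounded batch, so several threads racing
on a stale count cannot empty the cache between them.

```c
#include <stdint.h>

#define DEMO_CACHE_NR_UNLIMITED		(-1)
#define DEMO_CACHE_NR_MAX_ADJUST	128

/*
 * Return how many objects a single caller should try to purge, given
 * the current object count and the configured limit.
 */
uint64_t demo_purge_count(uint64_t size, long cache_nr)
{
	uint64_t nr, excess;

	/* No limit configured: never purge from this path. */
	if (cache_nr == DEMO_CACHE_NR_UNLIMITED)
		return 0;

	nr = (uint64_t)cache_nr;
	if (size <= nr)
		return 0;

	/* Over the limit: purge the excess, but in bounded batches. */
	excess = size - nr;
	return excess < DEMO_CACHE_NR_MAX_ADJUST ?
	       excess : DEMO_CACHE_NR_MAX_ADJUST;
}
```

For example, with a limit of 256 objects and 300 in the hash, one call
asks for 44 objects to be purged; with 100000 in the hash it asks for
only 128.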

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5164
Reviewed-on: http://review.whamcloud.com/10237
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/obdclass/lu_object.c |   91 ++++++++++++++------
 1 files changed, 64 insertions(+), 27 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index 0c00bf8..9d1c96b 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -55,6 +55,34 @@
 #include "../include/lu_ref.h"
 #include <linux/list.h>
 
+enum {
+	LU_CACHE_PERCENT_MAX	 = 50,
+	LU_CACHE_PERCENT_DEFAULT = 20
+};
+
+#define LU_CACHE_NR_MAX_ADJUST		128
+#define LU_CACHE_NR_UNLIMITED		-1
+#define LU_CACHE_NR_DEFAULT		LU_CACHE_NR_UNLIMITED
+#define LU_CACHE_NR_LDISKFS_LIMIT	LU_CACHE_NR_UNLIMITED
+#define LU_CACHE_NR_ZFS_LIMIT		256
+
+#define LU_SITE_BITS_MIN	12
+#define LU_SITE_BITS_MAX	24
+/**
+ * total 256 buckets, we don't want too many buckets because:
+ * - consume too much memory
+ * - avoid unbalanced LRU list
+ */
+#define LU_SITE_BKT_BITS	8
+
+static unsigned int lu_cache_percent = LU_CACHE_PERCENT_DEFAULT;
+module_param(lu_cache_percent, int, 0644);
+MODULE_PARM_DESC(lu_cache_percent, "Percentage of memory to be used as lu_object cache");
+
+static long lu_cache_nr = LU_CACHE_NR_DEFAULT;
+module_param(lu_cache_nr, long, 0644);
+MODULE_PARM_DESC(lu_cache_nr, "Maximum number of objects in lu_object cache");
+
 static void lu_object_free(const struct lu_env *env, struct lu_object *o);
 static __u32 ls_stats_read(struct lprocfs_stats *stats, int idx);
 
@@ -573,6 +601,27 @@ static struct lu_object *lu_object_find(const struct lu_env *env,
 	return lu_object_find_at(env, dev->ld_site->ls_top_dev, f, conf);
 }
 
+/*
+ * Limit the lu_object cache to a maximum of lu_cache_nr objects.  Because
+ * the calculation for the number of objects to reclaim is not covered by
+ * a lock the maximum number of objects is capped by LU_CACHE_NR_MAX_ADJUST.
+ * This ensures that many concurrent threads will not accidentally purge
+ * the entire cache.
+ */
+static void lu_object_limit(const struct lu_env *env, struct lu_device *dev)
+{
+	__u64 size, nr;
+
+	if (lu_cache_nr == LU_CACHE_NR_UNLIMITED)
+		return;
+
+	size = cfs_hash_size_get(dev->ld_site->ls_obj_hash);
+	nr = (__u64)lu_cache_nr;
+	if (size > nr)
+		lu_site_purge(env, dev->ld_site,
+			      min_t(__u64, size - nr, LU_CACHE_NR_MAX_ADJUST));
+}
+
 static struct lu_object *lu_object_new(const struct lu_env *env,
 				       struct lu_device *dev,
 				       const struct lu_fid *f,
@@ -590,6 +639,9 @@ static struct lu_object *lu_object_new(const struct lu_env *env,
 	cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 1);
 	cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash);
 	cfs_hash_bd_unlock(hs, &bd, 1);
+
+	lu_object_limit(env, dev);
+
 	return o;
 }
 
@@ -656,6 +708,9 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,
 	if (likely(PTR_ERR(shadow) == -ENOENT)) {
 		cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash);
 		cfs_hash_bd_unlock(hs, &bd, 1);
+
+		lu_object_limit(env, dev);
+
 		return o;
 	}
 
@@ -805,20 +860,12 @@ void lu_site_print(const struct lu_env *env, struct lu_site *s, void *cookie,
 }
 EXPORT_SYMBOL(lu_site_print);
 
-enum {
-	LU_CACHE_PERCENT_MAX     = 50,
-	LU_CACHE_PERCENT_DEFAULT = 20
-};
-
-static unsigned int lu_cache_percent = LU_CACHE_PERCENT_DEFAULT;
-module_param(lu_cache_percent, int, 0644);
-MODULE_PARM_DESC(lu_cache_percent, "Percentage of memory to be used as lu_object cache");
-
 /**
  * Return desired hash table order.
  */
-static int lu_htable_order(void)
+static int lu_htable_order(struct lu_device *top)
 {
+	unsigned long bits_max = LU_SITE_BITS_MAX;
 	unsigned long cache_size;
 	int bits;
 
@@ -851,7 +898,7 @@ static int lu_htable_order(void)
 	for (bits = 1; (1 << bits) < cache_size; ++bits) {
 		;
 	}
-	return bits;
+	return clamp_t(typeof(bits), bits, LU_SITE_BITS_MIN, bits_max);
 }
 
 static unsigned lu_obj_hop_hash(struct cfs_hash *hs,
@@ -927,28 +974,17 @@ static void lu_dev_add_linkage(struct lu_site *s, struct lu_device *d)
 /**
  * Initialize site \a s, with \a d as the top level device.
  */
-#define LU_SITE_BITS_MIN    12
-#define LU_SITE_BITS_MAX    19
-/**
- * total 256 buckets, we don't want too many buckets because:
- * - consume too much memory
- * - avoid unbalanced LRU list
- */
-#define LU_SITE_BKT_BITS    8
-
 int lu_site_init(struct lu_site *s, struct lu_device *top)
 {
 	struct lu_site_bkt_data *bkt;
 	struct cfs_hash_bd bd;
+	unsigned long bits;
+	unsigned long i;
 	char name[16];
-	int bits;
-	int i;
 
 	memset(s, 0, sizeof(*s));
-	bits = lu_htable_order();
 	snprintf(name, 16, "lu_site_%s", top->ld_type->ldt_name);
-	for (bits = min(max(LU_SITE_BITS_MIN, bits), LU_SITE_BITS_MAX);
-	     bits >= LU_SITE_BITS_MIN; bits--) {
+	for (bits = lu_htable_order(top); bits >= LU_SITE_BITS_MIN; bits--) {
 		s->ls_obj_hash = cfs_hash_create(name, bits, bits,
 						 bits - LU_SITE_BKT_BITS,
 						 sizeof(*bkt), 0, 0,
@@ -956,13 +992,14 @@ int lu_site_init(struct lu_site *s, struct lu_device *top)
 						 CFS_HASH_SPIN_BKTLOCK |
 						 CFS_HASH_NO_ITEMREF |
 						 CFS_HASH_DEPTH |
-						 CFS_HASH_ASSERT_EMPTY);
+						 CFS_HASH_ASSERT_EMPTY |
+						 CFS_HASH_COUNTER);
 		if (s->ls_obj_hash)
 			break;
 	}
 
 	if (!s->ls_obj_hash) {
-		CERROR("failed to create lu_site hash with bits: %d\n", bits);
+		CERROR("failed to create lu_site hash with bits: %lu\n", bits);
 		return -ENOMEM;
 	}
 
-- 
1.7.1


* [PATCH 63/80] staging: lustre: fid: do open-by-fid by default
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Lai Siyao,
	James Simmons

From: Lai Siyao <lai.siyao@intel.com>

Currently the client's open-by-fid often packs the name into the
request, but the name may be invalid, e.g. for an NFS export, and even
if it is valid it may cause inconsistency, because the operation is
done on the fid, which is globally unique, while the name is not.

Since open-by-fid doesn't pack the name, for a striped directory we
can't know the parent stripe fid on the client, so we set the parent
fid to be the same as the child fid, and the MDT has to find its parent
fid from linkea (this is already supported by the MDT).

M_CHECK_STALE becomes obsolete.

Unset MDS_OPEN_FL_INTERNAL from open syscall flags, because these
flags are internally used, and should not be set from user space.
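
A small sketch of what "unset" means here, assuming nothing about the
real flag layout: the DEMO_* bit values below are invented for the
example and do not match the MDS_OPEN_* constants in lustre_idl.h, and
demo_sanitize_open_flags() is a hypothetical helper. The internal bits
are simply masked out of whatever user space passed in, while the
ordinary bits are left alone.

```c
#include <stdint.h>

/* Hypothetical bit values, chosen for illustration only. */
#define DEMO_OPEN_READ		0x0001ULL
#define DEMO_OPEN_WRITE		0x0002ULL
#define DEMO_OPEN_BY_FID	0x0100ULL
#define DEMO_OPEN_LOCK		0x0200ULL
#define DEMO_OPEN_LEASE		0x0400ULL

/* All bits that only the client itself is allowed to set. */
#define DEMO_OPEN_FL_INTERNAL	(DEMO_OPEN_BY_FID | DEMO_OPEN_LOCK | \
				 DEMO_OPEN_LEASE)

/* Drop any internal bits that arrived with the user's open flags. */
uint64_t demo_sanitize_open_flags(uint64_t user_flags)
{
	return user_flags & ~DEMO_OPEN_FL_INTERNAL;
}
```

A caller passing DEMO_OPEN_WRITE | DEMO_OPEN_BY_FID ends up with just
DEMO_OPEN_WRITE; the client can then OR in the internal bits it
actually wants.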

It's not necessary to store the parent fid in lli_pfid, because the MDT
can get its parent fid from linkea, and now that a DNE striped
directory stores the master inode fid in lli_pfid, stop storing the
parent fid there to avoid conflict.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3544
Reviewed-on: http://review.whamcloud.com/7476
Reviewed-on: http://review.whamcloud.com/10692
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |    5 ++
 .../staging/lustre/lustre/include/lustre_lite.h    |    1 -
 drivers/staging/lustre/lustre/include/lustre_mds.h |    3 -
 drivers/staging/lustre/lustre/llite/file.c         |   71 +++++++++-----------
 .../staging/lustre/lustre/llite/llite_internal.h   |    4 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   17 +----
 drivers/staging/lustre/lustre/llite/llite_nfs.c    |   14 +++-
 drivers/staging/lustre/lustre/llite/namei.c        |    1 +
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |   41 +++++------
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |    1 -
 drivers/staging/lustre/lustre/mdc/mdc_lib.c        |    5 +-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |   21 ------
 .../lustre/lustre/obdclass/lprocfs_status.c        |    2 +-
 13 files changed, 71 insertions(+), 115 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 400ab3c..a9661c0 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2252,6 +2252,11 @@ void lustre_swab_mdt_rec_setattr(struct mdt_rec_setattr *sa);
 					      */
 #define MDS_OPEN_RELEASE   02000000000000ULL /* Open the file for HSM release */
 
+#define MDS_OPEN_FL_INTERNAL (MDS_OPEN_HAS_EA | MDS_OPEN_HAS_OBJS |	\
+			      MDS_OPEN_OWNEROVERRIDE | MDS_OPEN_LOCK |	\
+			      MDS_OPEN_BY_FID | MDS_OPEN_LEASE |	\
+			      MDS_OPEN_RELEASE)
+
 enum mds_op_bias {
 	MDS_CHECK_SPLIT		= 1 << 0,
 	MDS_CROSS_REF		= 1 << 1,
diff --git a/drivers/staging/lustre/lustre/include/lustre_lite.h b/drivers/staging/lustre/lustre/include/lustre_lite.h
index b168977..a3d7573 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lite.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lite.h
@@ -42,7 +42,6 @@
 
 #include "obd_class.h"
 #include "lustre_net.h"
-#include "lustre_mds.h"
 #include "lustre_ha.h"
 
 /* 4UL * 1024 * 1024 */
diff --git a/drivers/staging/lustre/lustre/include/lustre_mds.h b/drivers/staging/lustre/lustre/include/lustre_mds.h
index 4104bd9..23a7e4f 100644
--- a/drivers/staging/lustre/lustre/include/lustre_mds.h
+++ b/drivers/staging/lustre/lustre/include/lustre_mds.h
@@ -58,9 +58,6 @@ struct mds_group_info {
 #define MDD_OBD_NAME     "mdd_obd"
 #define MDD_OBD_UUID     "mdd_obd_uuid"
 
-/* these are local flags, used only on the client, private */
-#define M_CHECK_STALE	   0200000000
-
 /** @} mds */
 
 #endif
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 563cdf6..015b0ab 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -379,53 +379,35 @@ int ll_file_release(struct inode *inode, struct file *file)
 	return rc;
 }
 
-static int ll_intent_file_open(struct dentry *dentry, void *lmm,
-			       int lmmsize, struct lookup_intent *itp)
+static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize,
+			       struct lookup_intent *itp)
 {
-	struct inode *inode = d_inode(dentry);
+	struct inode *inode = d_inode(de);
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
-	struct dentry *parent = dentry->d_parent;
-	const char *name = dentry->d_name.name;
-	const int len = dentry->d_name.len;
+	struct dentry *parent = de->d_parent;
+	const char *name = NULL;
 	struct md_op_data *op_data;
 	struct ptlrpc_request *req;
-	__u32 opc = LUSTRE_OPC_ANY;
-	int rc;
+	int len = 0, rc;
 
-	/* Usually we come here only for NFSD, and we want open lock. */
-	/* We can also get here if there was cached open handle in revalidate_it
-	 * but it disappeared while we were getting from there to ll_file_open.
-	 * But this means this file was closed and immediately opened which
-	 * makes a good candidate for using OPEN lock
-	 */
-	/* If lmmsize & lmm are not 0, we are just setting stripe info
-	 * parameters. No need for the open lock
+	LASSERT(parent);
+	LASSERT(itp->it_flags & MDS_OPEN_BY_FID);
+
+	/*
+	 * if server supports open-by-fid, or file name is invalid, don't pack
+	 * name in open request
 	 */
-	if (!lmm && lmmsize == 0) {
-		struct ll_dentry_data *ldd = ll_d2d(dentry);
-		/*
-		 * If we came via ll_iget_for_nfs, then we need to request
-		 * struct ll_dentry_data *ldd = ll_d2d(file->f_dentry);
-		 *
-		 * NB: when ldd is NULL, it must have come via normal
-		 * lookup path only, since ll_iget_for_nfs always calls
-		 * ll_d_init().
-		 */
-		if (ldd && ldd->lld_nfs_dentry) {
-			ldd->lld_nfs_dentry = 0;
-			itp->it_flags |= MDS_OPEN_LOCK;
-		}
-		if (itp->it_flags & FMODE_WRITE)
-			opc = LUSTRE_OPC_CREATE;
+	if (!(exp_connect_flags(sbi->ll_md_exp) & OBD_CONNECT_OPEN_BY_FID) &&
+	    lu_name_is_valid_2(de->d_name.name, de->d_name.len)) {
+		name = de->d_name.name;
+		len = de->d_name.len;
 	}
 
-	op_data  = ll_prep_md_op_data(NULL, d_inode(parent),
-				      inode, name, len,
-				      O_RDWR, opc, NULL);
+	op_data  = ll_prep_md_op_data(NULL, d_inode(parent), inode, name, len,
+				      O_RDWR, LUSTRE_OPC_ANY, NULL);
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
-	itp->it_flags |= MDS_OPEN_BY_FID;
 	rc = md_intent_lock(sbi->ll_md_exp, op_data, lmm, lmmsize, itp,
 			    0 /*unused */, &req, ll_md_blocking_ast, 0);
 	ll_finish_md_op_data(op_data);
@@ -655,9 +637,19 @@ restart:
 			 * result in a deadlock
 			 */
 			mutex_unlock(&lli->lli_och_mutex);
-			it->it_create_mode |= M_CHECK_STALE;
+			/*
+			 * Normally called under two situations:
+			 * 1. NFS export.
+			 * 2. revalidate with IT_OPEN (revalidate doesn't
+			 *    execute this intent any more).
+			 *
+			 * Always fetch MDS_OPEN_LOCK if this is not setstripe.
+			 *
+			 * Always specify MDS_OPEN_BY_FID because we don't want
+			 * to get file with different fid.
+			 */
+			it->it_flags |= MDS_OPEN_LOCK | MDS_OPEN_BY_FID;
 			rc = ll_intent_file_open(file->f_path.dentry, NULL, 0, it);
-			it->it_create_mode &= ~M_CHECK_STALE;
 			if (rc)
 				goto out_openerr;
 
@@ -1399,6 +1391,7 @@ int ll_lov_setstripe_ea_info(struct inode *inode, struct dentry *dentry,
 	}
 
 	ll_inode_size_lock(inode);
+	oit.it_flags |= MDS_OPEN_BY_FID;
 	rc = ll_intent_file_open(dentry, lum, lum_size, &oit);
 	if (rc)
 		goto out_unlock;
@@ -3066,7 +3059,6 @@ static int __ll_inode_revalidate(struct dentry *dentry, __u64 ibits)
 		if (IS_ERR(op_data))
 			return PTR_ERR(op_data);
 
-		oit.it_create_mode |= M_CHECK_STALE;
 		rc = md_intent_lock(exp, op_data, NULL, 0,
 				    /* we are not interested in name
 				     * based lookup
@@ -3074,7 +3066,6 @@ static int __ll_inode_revalidate(struct dentry *dentry, __u64 ibits)
 				    &oit, 0, &req,
 				    ll_md_blocking_ast, 0);
 		ll_finish_md_op_data(op_data);
-		oit.it_create_mode &= ~M_CHECK_STALE;
 		if (rc < 0) {
 			rc = ll_inode_revalidate_fini(inode, rc);
 			goto out;
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 43269aa..b4e843a 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -118,9 +118,7 @@ struct ll_inode_info {
 
 	/* identifying fields for both metadata and data stacks. */
 	struct lu_fid		   lli_fid;
-	/* Parent fid for accessing default stripe data on parent directory
-	 * for allocating OST objects after a mknod() and later open-by-FID.
-	 */
+	/* master inode fid for stripe directory */
 	struct lu_fid		   lli_pfid;
 
 	struct list_head	      lli_close_list;
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 5f6343a..da00fbd 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -189,7 +189,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt,
 				  OBD_CONNECT_PINGLESS |
 				  OBD_CONNECT_MAX_EASIZE |
 				  OBD_CONNECT_FLOCK_DEAD |
-				  OBD_CONNECT_DISP_STRIPE | OBD_CONNECT_LFSCK;
+				  OBD_CONNECT_DISP_STRIPE | OBD_CONNECT_LFSCK |
+				  OBD_CONNECT_OPEN_BY_FID;
 
 	if (sbi->ll_flags & LL_SBI_SOM_PREVIEW)
 		data->ocd_connect_flags |= OBD_CONNECT_SOM;
@@ -2364,20 +2365,6 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 	op_data->op_mds = 0;
 	op_data->op_data = data;
 
-	/* If the file is being opened after mknod() (normally due to NFS)
-	 * try to use the default stripe data from parent directory for
-	 * allocating OST objects.  Try to pass the parent FID to MDS.
-	 */
-	if (opc == LUSTRE_OPC_CREATE && i1 == i2 && S_ISREG(i2->i_mode) &&
-	    !ll_i2info(i2)->lli_has_smd) {
-		struct ll_inode_info *lli = ll_i2info(i2);
-
-		spin_lock(&lli->lli_lock);
-		if (likely(!lli->lli_has_smd && !fid_is_zero(&lli->lli_pfid)))
-			op_data->op_fid1 = lli->lli_pfid;
-		spin_unlock(&lli->lli_lock);
-	}
-
 	/* When called by ll_setattr_raw, file is i1. */
 	if (ll_i2info(i1)->lli_flags & LLIF_DATA_MODIFIED)
 		op_data->op_bias |= MDS_DATA_MODIFIED;
diff --git a/drivers/staging/lustre/lustre/llite/llite_nfs.c b/drivers/staging/lustre/lustre/llite/llite_nfs.c
index ac96d89..2b65240 100644
--- a/drivers/staging/lustre/lustre/llite/llite_nfs.c
+++ b/drivers/staging/lustre/lustre/llite/llite_nfs.c
@@ -148,12 +148,18 @@ ll_iget_for_nfs(struct super_block *sb, struct lu_fid *fid, struct lu_fid *paren
 		return ERR_PTR(-ESTALE);
 	}
 
+	result = d_obtain_alias(inode);
+	if (IS_ERR(result)) {
+		iput(inode);
+		return result;
+	}
+
 	/**
-	 * It is an anonymous dentry without OST objects created yet.
-	 * We have to find the parent to tell MDS how to init lov objects.
+	 * In case d_obtain_alias() found a disconnected dentry, always update
+	 * lli_pfid to allow later operation (normally open) have parent fid,
+	 * which may be used by MDS to create data.
 	 */
-	if (S_ISREG(inode->i_mode) && !ll_i2info(inode)->lli_has_smd &&
-	    parent && !fid_is_zero(parent)) {
+	if (parent) {
 		struct ll_inode_info *lli = ll_i2info(inode);
 
 		spin_lock(&lli->lli_lock);
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index ac0f442..ee5a42e 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -650,6 +650,7 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 	}
 	it->it_create_mode = (mode & S_IALLUGO) | S_IFREG;
 	it->it_flags = (open_flags & ~O_ACCMODE) | OPEN_FMODE(open_flags);
+	it->it_flags &= ~MDS_OPEN_FL_INTERNAL;
 
 	/* Dentry added to dcache tree in ll_lookup_it */
 	de = ll_lookup_it(dir, dentry, it, lookup_flags);
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index 761ab24..cde1d7b 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -111,10 +111,6 @@ static int lmv_intent_remote(struct obd_export *exp, void *lmm,
 		 */
 		LASSERT(it->it_op & IT_OPEN);
 		op_data->op_fid2 = *parent_fid;
-		/* Add object FID to op_fid3, in case it needs to check stale
-		 * (M_CHECK_STALE), see mdc_finish_intent_lock
-		 */
-		op_data->op_fid3 = body->mbo_fid1;
 	}
 
 	op_data->op_bias = MDS_CROSS_REF;
@@ -313,17 +309,16 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 	struct mdt_body		*body;
 	int			rc;
 
-	if (it->it_flags & MDS_OPEN_BY_FID && fid_is_sane(&op_data->op_fid2)) {
-		if (op_data->op_mea1) {
-			struct lmv_stripe_md *lsm = op_data->op_mea1;
-			const struct lmv_oinfo *oinfo;
+	if (it->it_flags & MDS_OPEN_BY_FID) {
+		LASSERT(fid_is_sane(&op_data->op_fid2));
 
-			oinfo = lsm_name_to_stripe_info(lsm, op_data->op_name,
-							op_data->op_namelen);
-			if (IS_ERR(oinfo))
-				return PTR_ERR(oinfo);
-			op_data->op_fid1 = oinfo->lmo_fid;
-		}
+		/*
+		 * for striped directory, we can't know parent stripe fid
+		 * without name, but we can set it to child fid, and MDT
+		 * will obtain it from linkea in open in such case.
+		 */
+		if (op_data->op_mea1)
+			op_data->op_fid1 = op_data->op_fid2;
 
 		tgt = lmv_find_target(lmv, &op_data->op_fid2);
 		if (IS_ERR(tgt))
@@ -331,6 +326,10 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 
 		op_data->op_mds = tgt->ltd_idx;
 	} else {
+		LASSERT(fid_is_sane(&op_data->op_fid1));
+		LASSERT(fid_is_zero(&op_data->op_fid2));
+		LASSERT(op_data->op_name);
+
 		tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
 		if (IS_ERR(tgt))
 			return PTR_ERR(tgt);
@@ -339,13 +338,11 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 	/* If it is ready to open the file by FID, do not need
 	 * allocate FID at all, otherwise it will confuse MDT
 	 */
-	if ((it->it_op & IT_CREAT) &&
-	    !(it->it_flags & MDS_OPEN_BY_FID)) {
+	if ((it->it_op & IT_CREAT) && !(it->it_flags & MDS_OPEN_BY_FID)) {
 		/*
-		 * For open with IT_CREATE and for IT_CREATE cases allocate new
-		 * fid and setup FLD for it.
+		 * For lookup(IT_CREATE) cases allocate new fid and setup FLD
+		 * for it.
 		 */
-		op_data->op_fid3 = op_data->op_fid2;
 		rc = lmv_fid_alloc(NULL, exp, &op_data->op_fid2, op_data);
 		if (rc != 0)
 			return rc;
@@ -494,9 +491,9 @@ int lmv_intent_lock(struct obd_export *exp, struct md_op_data *op_data,
 
 	LASSERT(fid_is_sane(&op_data->op_fid1));
 
-	CDEBUG(D_INODE, "INTENT LOCK '%s' for '%*s' on "DFID"\n",
-	       LL_IT2STR(it), op_data->op_namelen, op_data->op_name,
-	       PFID(&op_data->op_fid1));
+	CDEBUG(D_INODE, "INTENT LOCK '%s' for "DFID" '%*s' on "DFID"\n",
+	       LL_IT2STR(it), PFID(&op_data->op_fid2), op_data->op_namelen,
+	       op_data->op_name, PFID(&op_data->op_fid1));
 
 	rc = lmv_check_connect(obd);
 	if (rc)
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_internal.h b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
index 00e8435..1901b93 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_internal.h
+++ b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
@@ -34,7 +34,6 @@
 #define _MDC_INTERNAL_H
 
 #include "../include/lustre_mdc.h"
-#include "../include/lustre_mds.h"
 
 void lprocfs_mdc_init_vars(struct lprocfs_static_vars *lvars);
 
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_lib.c b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
index 813f923..aa496f3 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_lib.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
@@ -171,10 +171,7 @@ void mdc_create_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 static __u64 mds_pack_open_flags(__u64 flags, __u32 mode)
 {
 	__u64 cr_flags = (flags & (FMODE_READ | FMODE_WRITE |
-				   MDS_OPEN_HAS_EA | MDS_OPEN_HAS_OBJS |
-				   MDS_OPEN_OWNEROVERRIDE | MDS_OPEN_LOCK |
-				   MDS_OPEN_BY_FID | MDS_OPEN_LEASE |
-				   MDS_OPEN_RELEASE));
+				   MDS_OPEN_FL_INTERNAL));
 	if (flags & O_CREAT)
 		cr_flags |= MDS_OPEN_CREAT;
 	if (flags & O_EXCL)
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index fab83dd..1c3b78d 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -922,27 +922,6 @@ static int mdc_finish_intent_lock(struct obd_export *exp,
 	mdt_body = req_capsule_server_get(&request->rq_pill, &RMF_MDT_BODY);
 	LASSERT(mdt_body);      /* mdc_enqueue checked */
 
-	/* If we were revalidating a fid/name pair, mark the intent in
-	 * case we fail and get called again from lookup
-	 */
-	if (fid_is_sane(&op_data->op_fid2) &&
-	    it->it_create_mode & M_CHECK_STALE &&
-	    it->it_op != IT_GETATTR) {
-		/* Also: did we find the same inode? */
-		/* sever can return one of two fids:
-		 * op_fid2 - new allocated fid - if file is created.
-		 * op_fid3 - existent fid - if file only open.
-		 * op_fid3 is saved in lmv_intent_open
-		 */
-		if ((!lu_fid_eq(&op_data->op_fid2, &mdt_body->mbo_fid1)) &&
-		    (!lu_fid_eq(&op_data->op_fid3, &mdt_body->mbo_fid1))) {
-			CDEBUG(D_DENTRY, "Found stale data "DFID"("DFID")/"DFID
-			       "\n", PFID(&op_data->op_fid2),
-			       PFID(&op_data->op_fid2), PFID(&mdt_body->mbo_fid1));
-			return -ESTALE;
-		}
-	}
-
 	rc = it_open_error(DISP_LOOKUP_EXECD, it);
 	if (rc)
 		return rc;
diff --git a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
index f42ed17..fbb0851 100644
--- a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
+++ b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
@@ -96,7 +96,7 @@ static const char * const obd_connect_names[] = {
 	"pingless",
 	"flock_deadlock",
 	"disp_stripe",
-	"unknown",
+	"open_by_fid",
 	"lfsck",
 	"unknown",
 	NULL
-- 
1.7.1


* [lustre-devel] [PATCH 63/80] staging: lustre: fid: do open-by-fid by default
@ 2016-08-16 20:19   ` James Simmons
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Lai Siyao,
	James Simmons

From: Lai Siyao <lai.siyao@intel.com>

Currently client open-by-fid often packs the name into the request,
but the name may be invalid, e.g. for an NFS export, and even if it's
valid, it may cause inconsistency because this operation is done
on this fid, which is globally unique, whereas the name is not.

Since open-by-fid doesn't pack the name, the client can't know the
parent stripe fid for a striped directory, so we set the parent fid
to the child fid, and the MDT has to find the parent fid from linkea
(the MDT already supports this).

M_CHECK_STALE becomes obsolete.

Unset MDS_OPEN_FL_INTERNAL from open syscall flags, because these
flags are internally used, and should not be set from user space.

It's not necessary to store the parent fid in lli_pfid, because the
MDT can get the parent fid from linkea. Now that a DNE striped
directory stores the master inode fid in lli_pfid, stop storing the
parent fid there to avoid a conflict.

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3544
Reviewed-on: http://review.whamcloud.com/7476
Reviewed-on: http://review.whamcloud.com/10692
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |    5 ++
 .../staging/lustre/lustre/include/lustre_lite.h    |    1 -
 drivers/staging/lustre/lustre/include/lustre_mds.h |    3 -
 drivers/staging/lustre/lustre/llite/file.c         |   71 +++++++++-----------
 .../staging/lustre/lustre/llite/llite_internal.h   |    4 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   17 +----
 drivers/staging/lustre/lustre/llite/llite_nfs.c    |   14 +++-
 drivers/staging/lustre/lustre/llite/namei.c        |    1 +
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |   41 +++++------
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |    1 -
 drivers/staging/lustre/lustre/mdc/mdc_lib.c        |    5 +-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |   21 ------
 .../lustre/lustre/obdclass/lprocfs_status.c        |    2 +-
 13 files changed, 71 insertions(+), 115 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 400ab3c..a9661c0 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2252,6 +2252,11 @@ void lustre_swab_mdt_rec_setattr(struct mdt_rec_setattr *sa);
 					      */
 #define MDS_OPEN_RELEASE   02000000000000ULL /* Open the file for HSM release */
 
+#define MDS_OPEN_FL_INTERNAL (MDS_OPEN_HAS_EA | MDS_OPEN_HAS_OBJS |	\
+			      MDS_OPEN_OWNEROVERRIDE | MDS_OPEN_LOCK |	\
+			      MDS_OPEN_BY_FID | MDS_OPEN_LEASE |	\
+			      MDS_OPEN_RELEASE)
+
 enum mds_op_bias {
 	MDS_CHECK_SPLIT		= 1 << 0,
 	MDS_CROSS_REF		= 1 << 1,
diff --git a/drivers/staging/lustre/lustre/include/lustre_lite.h b/drivers/staging/lustre/lustre/include/lustre_lite.h
index b168977..a3d7573 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lite.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lite.h
@@ -42,7 +42,6 @@
 
 #include "obd_class.h"
 #include "lustre_net.h"
-#include "lustre_mds.h"
 #include "lustre_ha.h"
 
 /* 4UL * 1024 * 1024 */
diff --git a/drivers/staging/lustre/lustre/include/lustre_mds.h b/drivers/staging/lustre/lustre/include/lustre_mds.h
index 4104bd9..23a7e4f 100644
--- a/drivers/staging/lustre/lustre/include/lustre_mds.h
+++ b/drivers/staging/lustre/lustre/include/lustre_mds.h
@@ -58,9 +58,6 @@ struct mds_group_info {
 #define MDD_OBD_NAME     "mdd_obd"
 #define MDD_OBD_UUID     "mdd_obd_uuid"
 
-/* these are local flags, used only on the client, private */
-#define M_CHECK_STALE	   0200000000
-
 /** @} mds */
 
 #endif
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 563cdf6..015b0ab 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -379,53 +379,35 @@ int ll_file_release(struct inode *inode, struct file *file)
 	return rc;
 }
 
-static int ll_intent_file_open(struct dentry *dentry, void *lmm,
-			       int lmmsize, struct lookup_intent *itp)
+static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize,
+			       struct lookup_intent *itp)
 {
-	struct inode *inode = d_inode(dentry);
+	struct inode *inode = d_inode(de);
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
-	struct dentry *parent = dentry->d_parent;
-	const char *name = dentry->d_name.name;
-	const int len = dentry->d_name.len;
+	struct dentry *parent = de->d_parent;
+	const char *name = NULL;
 	struct md_op_data *op_data;
 	struct ptlrpc_request *req;
-	__u32 opc = LUSTRE_OPC_ANY;
-	int rc;
+	int len = 0, rc;
 
-	/* Usually we come here only for NFSD, and we want open lock. */
-	/* We can also get here if there was cached open handle in revalidate_it
-	 * but it disappeared while we were getting from there to ll_file_open.
-	 * But this means this file was closed and immediately opened which
-	 * makes a good candidate for using OPEN lock
-	 */
-	/* If lmmsize & lmm are not 0, we are just setting stripe info
-	 * parameters. No need for the open lock
+	LASSERT(parent);
+	LASSERT(itp->it_flags & MDS_OPEN_BY_FID);
+
+	/*
+	 * if server supports open-by-fid, or file name is invalid, don't pack
+	 * name in open request
 	 */
-	if (!lmm && lmmsize == 0) {
-		struct ll_dentry_data *ldd = ll_d2d(dentry);
-		/*
-		 * If we came via ll_iget_for_nfs, then we need to request
-		 * struct ll_dentry_data *ldd = ll_d2d(file->f_dentry);
-		 *
-		 * NB: when ldd is NULL, it must have come via normal
-		 * lookup path only, since ll_iget_for_nfs always calls
-		 * ll_d_init().
-		 */
-		if (ldd && ldd->lld_nfs_dentry) {
-			ldd->lld_nfs_dentry = 0;
-			itp->it_flags |= MDS_OPEN_LOCK;
-		}
-		if (itp->it_flags & FMODE_WRITE)
-			opc = LUSTRE_OPC_CREATE;
+	if (!(exp_connect_flags(sbi->ll_md_exp) & OBD_CONNECT_OPEN_BY_FID) &&
+	    lu_name_is_valid_2(de->d_name.name, de->d_name.len)) {
+		name = de->d_name.name;
+		len = de->d_name.len;
 	}
 
-	op_data  = ll_prep_md_op_data(NULL, d_inode(parent),
-				      inode, name, len,
-				      O_RDWR, opc, NULL);
+	op_data  = ll_prep_md_op_data(NULL, d_inode(parent), inode, name, len,
+				      O_RDWR, LUSTRE_OPC_ANY, NULL);
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
-	itp->it_flags |= MDS_OPEN_BY_FID;
 	rc = md_intent_lock(sbi->ll_md_exp, op_data, lmm, lmmsize, itp,
 			    0 /*unused */, &req, ll_md_blocking_ast, 0);
 	ll_finish_md_op_data(op_data);
@@ -655,9 +637,19 @@ restart:
 			 * result in a deadlock
 			 */
 			mutex_unlock(&lli->lli_och_mutex);
-			it->it_create_mode |= M_CHECK_STALE;
+			/*
+			 * Normally called under two situations:
+			 * 1. NFS export.
+			 * 2. revalidate with IT_OPEN (revalidate doesn't
+			 *    execute this intent any more).
+			 *
+			 * Always fetch MDS_OPEN_LOCK if this is not setstripe.
+			 *
+			 * Always specify MDS_OPEN_BY_FID because we don't want
+			 * to get file with different fid.
+			 */
+			it->it_flags |= MDS_OPEN_LOCK | MDS_OPEN_BY_FID;
 			rc = ll_intent_file_open(file->f_path.dentry, NULL, 0, it);
-			it->it_create_mode &= ~M_CHECK_STALE;
 			if (rc)
 				goto out_openerr;
 
@@ -1399,6 +1391,7 @@ int ll_lov_setstripe_ea_info(struct inode *inode, struct dentry *dentry,
 	}
 
 	ll_inode_size_lock(inode);
+	oit.it_flags |= MDS_OPEN_BY_FID;
 	rc = ll_intent_file_open(dentry, lum, lum_size, &oit);
 	if (rc)
 		goto out_unlock;
@@ -3066,7 +3059,6 @@ static int __ll_inode_revalidate(struct dentry *dentry, __u64 ibits)
 		if (IS_ERR(op_data))
 			return PTR_ERR(op_data);
 
-		oit.it_create_mode |= M_CHECK_STALE;
 		rc = md_intent_lock(exp, op_data, NULL, 0,
 				    /* we are not interested in name
 				     * based lookup
@@ -3074,7 +3066,6 @@ static int __ll_inode_revalidate(struct dentry *dentry, __u64 ibits)
 				    &oit, 0, &req,
 				    ll_md_blocking_ast, 0);
 		ll_finish_md_op_data(op_data);
-		oit.it_create_mode &= ~M_CHECK_STALE;
 		if (rc < 0) {
 			rc = ll_inode_revalidate_fini(inode, rc);
 			goto out;
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 43269aa..b4e843a 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -118,9 +118,7 @@ struct ll_inode_info {
 
 	/* identifying fields for both metadata and data stacks. */
 	struct lu_fid		   lli_fid;
-	/* Parent fid for accessing default stripe data on parent directory
-	 * for allocating OST objects after a mknod() and later open-by-FID.
-	 */
+	/* master inode fid for stripe directory */
 	struct lu_fid		   lli_pfid;
 
 	struct list_head	      lli_close_list;
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 5f6343a..da00fbd 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -189,7 +189,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt,
 				  OBD_CONNECT_PINGLESS |
 				  OBD_CONNECT_MAX_EASIZE |
 				  OBD_CONNECT_FLOCK_DEAD |
-				  OBD_CONNECT_DISP_STRIPE | OBD_CONNECT_LFSCK;
+				  OBD_CONNECT_DISP_STRIPE | OBD_CONNECT_LFSCK |
+				  OBD_CONNECT_OPEN_BY_FID;
 
 	if (sbi->ll_flags & LL_SBI_SOM_PREVIEW)
 		data->ocd_connect_flags |= OBD_CONNECT_SOM;
@@ -2364,20 +2365,6 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 	op_data->op_mds = 0;
 	op_data->op_data = data;
 
-	/* If the file is being opened after mknod() (normally due to NFS)
-	 * try to use the default stripe data from parent directory for
-	 * allocating OST objects.  Try to pass the parent FID to MDS.
-	 */
-	if (opc == LUSTRE_OPC_CREATE && i1 == i2 && S_ISREG(i2->i_mode) &&
-	    !ll_i2info(i2)->lli_has_smd) {
-		struct ll_inode_info *lli = ll_i2info(i2);
-
-		spin_lock(&lli->lli_lock);
-		if (likely(!lli->lli_has_smd && !fid_is_zero(&lli->lli_pfid)))
-			op_data->op_fid1 = lli->lli_pfid;
-		spin_unlock(&lli->lli_lock);
-	}
-
 	/* When called by ll_setattr_raw, file is i1. */
 	if (ll_i2info(i1)->lli_flags & LLIF_DATA_MODIFIED)
 		op_data->op_bias |= MDS_DATA_MODIFIED;
diff --git a/drivers/staging/lustre/lustre/llite/llite_nfs.c b/drivers/staging/lustre/lustre/llite/llite_nfs.c
index ac96d89..2b65240 100644
--- a/drivers/staging/lustre/lustre/llite/llite_nfs.c
+++ b/drivers/staging/lustre/lustre/llite/llite_nfs.c
@@ -148,12 +148,18 @@ ll_iget_for_nfs(struct super_block *sb, struct lu_fid *fid, struct lu_fid *paren
 		return ERR_PTR(-ESTALE);
 	}
 
+	result = d_obtain_alias(inode);
+	if (IS_ERR(result)) {
+		iput(inode);
+		return result;
+	}
+
 	/**
-	 * It is an anonymous dentry without OST objects created yet.
-	 * We have to find the parent to tell MDS how to init lov objects.
+	 * In case d_obtain_alias() found a disconnected dentry, always update
+	 * lli_pfid to allow later operation (normally open) have parent fid,
+	 * which may be used by MDS to create data.
 	 */
-	if (S_ISREG(inode->i_mode) && !ll_i2info(inode)->lli_has_smd &&
-	    parent && !fid_is_zero(parent)) {
+	if (parent) {
 		struct ll_inode_info *lli = ll_i2info(inode);
 
 		spin_lock(&lli->lli_lock);
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index ac0f442..ee5a42e 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -650,6 +650,7 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 	}
 	it->it_create_mode = (mode & S_IALLUGO) | S_IFREG;
 	it->it_flags = (open_flags & ~O_ACCMODE) | OPEN_FMODE(open_flags);
+	it->it_flags &= ~MDS_OPEN_FL_INTERNAL;
 
 	/* Dentry added to dcache tree in ll_lookup_it */
 	de = ll_lookup_it(dir, dentry, it, lookup_flags);
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index 761ab24..cde1d7b 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -111,10 +111,6 @@ static int lmv_intent_remote(struct obd_export *exp, void *lmm,
 		 */
 		LASSERT(it->it_op & IT_OPEN);
 		op_data->op_fid2 = *parent_fid;
-		/* Add object FID to op_fid3, in case it needs to check stale
-		 * (M_CHECK_STALE), see mdc_finish_intent_lock
-		 */
-		op_data->op_fid3 = body->mbo_fid1;
 	}
 
 	op_data->op_bias = MDS_CROSS_REF;
@@ -313,17 +309,16 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 	struct mdt_body		*body;
 	int			rc;
 
-	if (it->it_flags & MDS_OPEN_BY_FID && fid_is_sane(&op_data->op_fid2)) {
-		if (op_data->op_mea1) {
-			struct lmv_stripe_md *lsm = op_data->op_mea1;
-			const struct lmv_oinfo *oinfo;
+	if (it->it_flags & MDS_OPEN_BY_FID) {
+		LASSERT(fid_is_sane(&op_data->op_fid2));
 
-			oinfo = lsm_name_to_stripe_info(lsm, op_data->op_name,
-							op_data->op_namelen);
-			if (IS_ERR(oinfo))
-				return PTR_ERR(oinfo);
-			op_data->op_fid1 = oinfo->lmo_fid;
-		}
+		/*
+		 * for striped directory, we can't know parent stripe fid
+		 * without name, but we can set it to child fid, and MDT
+		 * will obtain it from linkea in open in such case.
+		 */
+		if (op_data->op_mea1)
+			op_data->op_fid1 = op_data->op_fid2;
 
 		tgt = lmv_find_target(lmv, &op_data->op_fid2);
 		if (IS_ERR(tgt))
@@ -331,6 +326,10 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 
 		op_data->op_mds = tgt->ltd_idx;
 	} else {
+		LASSERT(fid_is_sane(&op_data->op_fid1));
+		LASSERT(fid_is_zero(&op_data->op_fid2));
+		LASSERT(op_data->op_name);
+
 		tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
 		if (IS_ERR(tgt))
 			return PTR_ERR(tgt);
@@ -339,13 +338,11 @@ static int lmv_intent_open(struct obd_export *exp, struct md_op_data *op_data,
 	/* If it is ready to open the file by FID, do not need
 	 * allocate FID at all, otherwise it will confuse MDT
 	 */
-	if ((it->it_op & IT_CREAT) &&
-	    !(it->it_flags & MDS_OPEN_BY_FID)) {
+	if ((it->it_op & IT_CREAT) && !(it->it_flags & MDS_OPEN_BY_FID)) {
 		/*
-		 * For open with IT_CREATE and for IT_CREATE cases allocate new
-		 * fid and setup FLD for it.
+		 * For lookup(IT_CREATE) cases allocate new fid and setup FLD
+		 * for it.
 		 */
-		op_data->op_fid3 = op_data->op_fid2;
 		rc = lmv_fid_alloc(NULL, exp, &op_data->op_fid2, op_data);
 		if (rc != 0)
 			return rc;
@@ -494,9 +491,9 @@ int lmv_intent_lock(struct obd_export *exp, struct md_op_data *op_data,
 
 	LASSERT(fid_is_sane(&op_data->op_fid1));
 
-	CDEBUG(D_INODE, "INTENT LOCK '%s' for '%*s' on "DFID"\n",
-	       LL_IT2STR(it), op_data->op_namelen, op_data->op_name,
-	       PFID(&op_data->op_fid1));
+	CDEBUG(D_INODE, "INTENT LOCK '%s' for "DFID" '%*s' on "DFID"\n",
+	       LL_IT2STR(it), PFID(&op_data->op_fid2), op_data->op_namelen,
+	       op_data->op_name, PFID(&op_data->op_fid1));
 
 	rc = lmv_check_connect(obd);
 	if (rc)
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_internal.h b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
index 00e8435..1901b93 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_internal.h
+++ b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
@@ -34,7 +34,6 @@
 #define _MDC_INTERNAL_H
 
 #include "../include/lustre_mdc.h"
-#include "../include/lustre_mds.h"
 
 void lprocfs_mdc_init_vars(struct lprocfs_static_vars *lvars);
 
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_lib.c b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
index 813f923..aa496f3 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_lib.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
@@ -171,10 +171,7 @@ void mdc_create_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 static __u64 mds_pack_open_flags(__u64 flags, __u32 mode)
 {
 	__u64 cr_flags = (flags & (FMODE_READ | FMODE_WRITE |
-				   MDS_OPEN_HAS_EA | MDS_OPEN_HAS_OBJS |
-				   MDS_OPEN_OWNEROVERRIDE | MDS_OPEN_LOCK |
-				   MDS_OPEN_BY_FID | MDS_OPEN_LEASE |
-				   MDS_OPEN_RELEASE));
+				   MDS_OPEN_FL_INTERNAL));
 	if (flags & O_CREAT)
 		cr_flags |= MDS_OPEN_CREAT;
 	if (flags & O_EXCL)
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index fab83dd..1c3b78d 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -922,27 +922,6 @@ static int mdc_finish_intent_lock(struct obd_export *exp,
 	mdt_body = req_capsule_server_get(&request->rq_pill, &RMF_MDT_BODY);
 	LASSERT(mdt_body);      /* mdc_enqueue checked */
 
-	/* If we were revalidating a fid/name pair, mark the intent in
-	 * case we fail and get called again from lookup
-	 */
-	if (fid_is_sane(&op_data->op_fid2) &&
-	    it->it_create_mode & M_CHECK_STALE &&
-	    it->it_op != IT_GETATTR) {
-		/* Also: did we find the same inode? */
-		/* sever can return one of two fids:
-		 * op_fid2 - new allocated fid - if file is created.
-		 * op_fid3 - existent fid - if file only open.
-		 * op_fid3 is saved in lmv_intent_open
-		 */
-		if ((!lu_fid_eq(&op_data->op_fid2, &mdt_body->mbo_fid1)) &&
-		    (!lu_fid_eq(&op_data->op_fid3, &mdt_body->mbo_fid1))) {
-			CDEBUG(D_DENTRY, "Found stale data "DFID"("DFID")/"DFID
-			       "\n", PFID(&op_data->op_fid2),
-			       PFID(&op_data->op_fid2), PFID(&mdt_body->mbo_fid1));
-			return -ESTALE;
-		}
-	}
-
 	rc = it_open_error(DISP_LOOKUP_EXECD, it);
 	if (rc)
 		return rc;
diff --git a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
index f42ed17..fbb0851 100644
--- a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
+++ b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
@@ -96,7 +96,7 @@ static const char * const obd_connect_names[] = {
 	"pingless",
 	"flock_deadlock",
 	"disp_stripe",
-	"unknown",
+	"open_by_fid",
 	"lfsck",
 	"unknown",
 	NULL
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 64/80] staging: lustre: ptlrpc: add OBD_CONNECT_UNLINK_CLOSE flag
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Lai Siyao,
	James Simmons

From: Lai Siyao <lai.siyao@intel.com>

Add the OBD_CONNECT_UNLINK_CLOSE flag for interop. Once this is
supported, the client packs the file handle in the unlink RPC, and
the MDT will close the file before unlink.
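
As a rough illustration of how an interop flag like this is usually
consumed (a sketch, not code from this patch): the two flag values are
copied from the lustre_idl.h hunk in this mail, while
negotiate_connect_flags() and unlink_should_pack_handle() are
hypothetical helpers. Treating the usable flags as the intersection of
what client and server advertise is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the lustre_idl.h hunk in this patch. */
#define OBD_CONNECT_LFSCK		0x40000000000000ULL
#define OBD_CONNECT_UNLINK_CLOSE	0x100000000000000ULL

/* Compile-time counterpart of the LASSERTF() added to wiretest.c. */
static_assert(OBD_CONNECT_UNLINK_CLOSE == (1ULL << 56),
	      "OBD_CONNECT_UNLINK_CLOSE wire value changed");

/*
 * Hypothetical helper: assume a feature is usable only when both
 * sides advertise it, i.e. the flag sets are intersected.
 */
static uint64_t negotiate_connect_flags(uint64_t client, uint64_t server)
{
	return client & server;
}

/* Hypothetical helper: should the client pack the file handle in unlink? */
static bool unlink_should_pack_handle(uint64_t negotiated)
{
	return (negotiated & OBD_CONNECT_UNLINK_CLOSE) != 0;
}
```

With these helpers, a client that advertises the flag to a server that
does not would fall back to the old behaviour, which is the point of
gating the change on a connect flag.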

Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4367
Reviewed-on: http://review.whamcloud.com/10426
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |    1 +
 .../lustre/lustre/obdclass/lprocfs_status.c        |    2 ++
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    |    2 ++
 3 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index a9661c0..4a7ccc8 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -1290,6 +1290,7 @@ void lustre_swab_ptlrpc_body(struct ptlrpc_body *pb);
 							 * name in request
 							 */
 #define OBD_CONNECT_LFSCK	0x40000000000000ULL/* support online LFSCK */
+#define OBD_CONNECT_UNLINK_CLOSE 0x100000000000000ULL/* close file in unlink */
 
 /* XXX README XXX:
  * Please DO NOT add flag values here before first ensuring that this same
diff --git a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
index fbb0851..45e3c4a 100644
--- a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
+++ b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
@@ -99,6 +99,8 @@ static const char * const obd_connect_names[] = {
 	"open_by_fid",
 	"lfsck",
 	"unknown",
+	"unlink_close",
+	"unknown",
 	NULL
 };
 
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index 60d03dd..2c718e0 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -1073,6 +1073,8 @@ void lustre_assert_wire_constants(void)
 		 "found 0x%.16llxULL\n", OBD_CONNECT_OPEN_BY_FID);
 	LASSERTF(OBD_CONNECT_LFSCK == 0x40000000000000ULL, "found 0x%.16llxULL\n",
 		 OBD_CONNECT_LFSCK);
+	LASSERTF(OBD_CONNECT_UNLINK_CLOSE == 0x100000000000000ULL, "found 0x%.16llxULL\n",
+		 OBD_CONNECT_UNLINK_CLOSE);
 	LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n",
 		(unsigned)OBD_CKSUM_CRC32);
 	LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n",
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 65/80] staging: lustre: llog: keep llog ctxt indices constant
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Mikhail Pershin, James Simmons

From: Mikhail Pershin <mike.pershin@intel.com>

The llog context id table cannot be shrunk easily because that
would shift the indices and break compatibility between an old
client and a new server, and vice versa.

This patch moves the llog_ctxt_id table to lustre_idl.h because it
is wire protocol data, and adds these values to the wirecheck.
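
The index shifting can be seen in a small standalone sketch. The first
enum repeats the explicit values from the lustre_idl.h hunk in this
mail; the second enum (all S_* names are hypothetical) shows what
implicit numbering would produce if the unused RD1 entries were simply
deleted. The static_assert() lines play the role of the CLASSERT()
checks added to wiretest.c.

```c
#include <assert.h>

/* Explicit values as in this patch; 6-7 and 10-11 stay unused. */
enum llog_ctxt_id {
	LLOG_CONFIG_ORIG_CTXT = 0,
	LLOG_CONFIG_REPL_CTXT = 1,
	LLOG_MDS_OST_ORIG_CTXT = 2,
	LLOG_MDS_OST_REPL_CTXT = 3,
	LLOG_SIZE_ORIG_CTXT = 4,
	LLOG_SIZE_REPL_CTXT = 5,
	LLOG_TEST_ORIG_CTXT = 8,
	LLOG_TEST_REPL_CTXT = 9,
	LLOG_CHANGELOG_ORIG_CTXT = 12,
	LLOG_CHANGELOG_REPL_CTXT = 13,
	LLOG_CHANGELOG_USER_ORIG_CTXT = 14,
	LLOG_AGENT_ORIG_CTXT = 15,
	LLOG_MAX_CTXTS
};

/* Hypothetical: implicit numbering after deleting the RD1 entries. */
enum shrunk_implicit_ctxt_id {
	S_CONFIG_ORIG,
	S_CONFIG_REPL,
	S_MDS_OST_ORIG,
	S_MDS_OST_REPL,
	S_SIZE_ORIG,
	S_SIZE_REPL,
	S_TEST_ORIG	/* silently becomes 6 instead of 8 */
};

/* The explicit table keeps the old wire values ... */
static_assert(LLOG_TEST_ORIG_CTXT == 8, "wire value must not move");
static_assert(LLOG_MAX_CTXTS == 16, "table size must not shrink");
/* ... while the implicitly numbered one would have shifted. */
static_assert(S_TEST_ORIG == 6, "implicit numbering shifts the index");
```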

Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5218
Reviewed-on: http://review.whamcloud.com/10758
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |   24 +++++++++++++++++++-
 drivers/staging/lustre/lustre/include/obd.h        |   21 -----------------
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    |   13 ++++++++++
 3 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 4a7ccc8..05fe359 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2936,7 +2936,29 @@ enum obd_cmd {
 };
 #define OBD_FIRST_OPC OBD_PING
 
-/* catalog of log objects */
+/**
+ * llog contexts indices.
+ *
+ * There is compatibility problem with indexes below, they are not
+ * continuous and must keep their numbers for compatibility needs.
+ * See LU-5218 for details.
+ */
+enum llog_ctxt_id {
+	LLOG_CONFIG_ORIG_CTXT  =  0,
+	LLOG_CONFIG_REPL_CTXT = 1,
+	LLOG_MDS_OST_ORIG_CTXT = 2,
+	LLOG_MDS_OST_REPL_CTXT = 3, /* kept just to avoid re-assignment */
+	LLOG_SIZE_ORIG_CTXT = 4,
+	LLOG_SIZE_REPL_CTXT = 5,
+	LLOG_TEST_ORIG_CTXT = 8,
+	LLOG_TEST_REPL_CTXT = 9, /* kept just to avoid re-assignment */
+	LLOG_CHANGELOG_ORIG_CTXT = 12, /**< changelog generation on mdd */
+	LLOG_CHANGELOG_REPL_CTXT = 13, /**< changelog access on clients */
+	/* for multiple changelog consumers */
+	LLOG_CHANGELOG_USER_ORIG_CTXT = 14,
+	LLOG_AGENT_ORIG_CTXT = 15, /**< agent requests generation on cdt */
+	LLOG_MAX_CTXTS
+};
 
 /** Identifier for a single log object */
 struct llog_logid {
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index b7bdd07..e7e03be 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -172,27 +172,6 @@ struct brw_page {
 	u32 flag;
 };
 
-/* llog contexts */
-enum llog_ctxt_id {
-	LLOG_CONFIG_ORIG_CTXT  =  0,
-	LLOG_CONFIG_REPL_CTXT,
-	LLOG_MDS_OST_ORIG_CTXT,
-	LLOG_MDS_OST_REPL_CTXT,
-	LLOG_SIZE_ORIG_CTXT,
-	LLOG_SIZE_REPL_CTXT,
-	LLOG_RD1_ORIG_CTXT,
-	LLOG_RD1_REPL_CTXT,
-	LLOG_TEST_ORIG_CTXT,
-	LLOG_TEST_REPL_CTXT,
-	LLOG_LOVEA_ORIG_CTXT,
-	LLOG_LOVEA_REPL_CTXT,
-	LLOG_CHANGELOG_ORIG_CTXT,	/**< changelog generation on mdd */
-	LLOG_CHANGELOG_REPL_CTXT,	/**< changelog access on clients */
-	LLOG_CHANGELOG_USER_ORIG_CTXT,	/**< for multiple changelog consumers */
-	LLOG_AGENT_ORIG_CTXT,		/**< agent requests generation on cdt */
-	LLOG_MAX_CTXTS
-};
-
 struct timeout_item {
 	enum timeout_event ti_event;
 	unsigned long	 ti_timeout;
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index 2c718e0..31d3326 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -3483,6 +3483,19 @@ void lustre_assert_wire_constants(void)
 	CLASSERT(LLOG_ORIGIN_HANDLE_DESTROY == 509);
 	CLASSERT(LLOG_FIRST_OPC == 501);
 	CLASSERT(LLOG_LAST_OPC == 510);
+	CLASSERT(LLOG_CONFIG_ORIG_CTXT == 0);
+	CLASSERT(LLOG_CONFIG_REPL_CTXT == 1);
+	CLASSERT(LLOG_MDS_OST_ORIG_CTXT == 2);
+	CLASSERT(LLOG_MDS_OST_REPL_CTXT == 3);
+	CLASSERT(LLOG_SIZE_ORIG_CTXT == 4);
+	CLASSERT(LLOG_SIZE_REPL_CTXT == 5);
+	CLASSERT(LLOG_TEST_ORIG_CTXT == 8);
+	CLASSERT(LLOG_TEST_REPL_CTXT == 9);
+	CLASSERT(LLOG_CHANGELOG_ORIG_CTXT == 12);
+	CLASSERT(LLOG_CHANGELOG_REPL_CTXT == 13);
+	CLASSERT(LLOG_CHANGELOG_USER_ORIG_CTXT == 14);
+	CLASSERT(LLOG_AGENT_ORIG_CTXT == 15);
+	CLASSERT(LLOG_MAX_CTXTS == 16);
 
 	/* Checks for struct llogd_conn_body */
 	LASSERTF((int)sizeof(struct llogd_conn_body) == 40, "found %lld\n",
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 66/80] staging: lustre: lmv: try all stripes for unknown hash functions
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

For an unknown hash type, LMV should try all stripes to locate
the name entry. This is done only for lookup and unlink, i.e.
we can only list and unlink entries under a striped directory
with an unknown hash type.
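
The fallback can be pictured with a standalone sketch, assuming a
simplified in-memory model rather than the real LMV structures: every
sketch_* name below is hypothetical, and a per-stripe lookup is reduced
to a string comparison. When the hash cannot tell us which stripe holds
a name, each stripe is tried in turn and the walk continues only while
the answer is -ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_NAMES 4

/* Hypothetical stand-in for one stripe of a striped directory. */
struct sketch_stripe {
	const char *names[SKETCH_MAX_NAMES];	/* unused slots are NULL */
};

/* Hypothetical per-stripe lookup: 0 if found, -ENOENT otherwise. */
static int sketch_stripe_lookup(const struct sketch_stripe *s,
				const char *name)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_NAMES && s->names[i]; i++)
		if (strcmp(s->names[i], name) == 0)
			return 0;
	return -ENOENT;
}

/*
 * Walk the stripes from index 0. Returns the index of the stripe
 * holding the name, or a negative errno. Any error other than
 * -ENOENT ends the walk at once (this sketch's lookup never
 * produces one, but a real RPC could).
 */
static int sketch_lookup_all_stripes(const struct sketch_stripe *stripes,
				     int stripe_count, const char *name)
{
	int idx;

	for (idx = 0; idx < stripe_count; idx++) {
		int rc = sketch_stripe_lookup(&stripes[idx], name);

		if (rc == 0)
			return idx;
		if (rc != -ENOENT)
			return rc;
	}
	return -ENOENT;
}
```

The cost is up to one request per stripe for each lookup or unlink, so
it suits an interop or recovery fallback rather than the normal path.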

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4921
Reviewed-on: http://review.whamcloud.com/10041
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_user.h     |    1 +
 .../staging/lustre/lustre/include/obd_support.h    |    3 +
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |   70 +++++++---
 drivers/staging/lustre/lustre/lmv/lmv_internal.h   |   12 ++
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  144 ++++++++++++++++----
 5 files changed, 182 insertions(+), 48 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index 9e38ed3..52cd585 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -383,6 +383,7 @@ struct lmv_user_mds_data {
 };
 
 enum lmv_hash_type {
+	LMV_HASH_TYPE_UNKNOWN	= 0,	/* 0 is reserved for testing purpose */
 	LMV_HASH_TYPE_ALL_CHARS = 1,
 	LMV_HASH_TYPE_FNV_1A_64 = 2,
 };
diff --git a/drivers/staging/lustre/lustre/include/obd_support.h b/drivers/staging/lustre/lustre/include/obd_support.h
index a11fff1..f747bca 100644
--- a/drivers/staging/lustre/lustre/include/obd_support.h
+++ b/drivers/staging/lustre/lustre/include/obd_support.h
@@ -483,6 +483,9 @@ extern char obd_jobid_var[];
 #define OBD_FAIL_UPDATE_OBJ_NET			0x1700
 #define OBD_FAIL_UPDATE_OBJ_NET_REP		0x1701
 
+/* LMV */
+#define OBD_FAIL_UNKNOWN_LMV_STRIPE		0x1901
+
 /* Assign references to moved code to reduce code changes */
 #define OBD_FAIL_PRECHECK(id)		   CFS_FAIL_PRECHECK(id)
 #define OBD_FAIL_CHECK(id)		      CFS_FAIL_CHECK(id)
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index cde1d7b..0559445 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -402,10 +402,28 @@ static int lmv_intent_lookup(struct obd_export *exp,
 	struct mdt_body	*body;
 	int		     rc = 0;
 
+	/*
+	 * If it returns ERR_PTR(-EBADFD) then it is an unknown hash type
+	 * it will try all stripes to locate the object
+	 */
 	tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
-	if (IS_ERR(tgt))
+	if (IS_ERR(tgt) && (PTR_ERR(tgt) != -EBADFD))
 		return PTR_ERR(tgt);
 
+	/*
+	 * Both migrating dir and unknown hash dir need to try
+	 * all of sub-stripes
+	 */
+	if (lsm && !lmv_is_known_hash_type(lsm)) {
+		struct lmv_oinfo *oinfo = &lsm->lsm_md_oinfo[0];
+
+		op_data->op_fid1 = oinfo->lmo_fid;
+		op_data->op_mds = oinfo->lmo_mds;
+		tgt = lmv_get_target(lmv, oinfo->lmo_mds, NULL);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+	}
+
 	if (!fid_is_sane(&op_data->op_fid2))
 		fid_zero(&op_data->op_fid2);
 
@@ -435,27 +453,39 @@ static int lmv_intent_lookup(struct obd_export *exp,
 		}
 		return rc;
 	} else if (it_disposition(it, DISP_LOOKUP_NEG) && lsm &&
-		   lsm->lsm_md_magic & LMV_HASH_FLAG_MIGRATION) {
+		   lmv_need_try_all_stripes(lsm)) {
 		/*
-		 * For migrating directory, if it can not find the child in
-		 * the source directory(master stripe), try the targeting
-		 * directory(stripe 1)
+		 * For migrating and unknown hash type directory, it will
+		 * try to target the entry on other stripes
 		 */
-		tgt = lmv_find_target(lmv, &lsm->lsm_md_oinfo[1].lmo_fid);
-		if (IS_ERR(tgt))
-			return PTR_ERR(tgt);
-
-		ptlrpc_req_finished(*reqp);
-		it->it_request = NULL;
-		*reqp = NULL;
-
-		CDEBUG(D_INODE, "For migrating dir, try target dir "DFID"\n",
-		       PFID(&lsm->lsm_md_oinfo[1].lmo_fid));
-
-		op_data->op_fid1 = lsm->lsm_md_oinfo[1].lmo_fid;
-		it->it_disposition &= ~DISP_ENQ_COMPLETE;
-		rc = md_intent_lock(tgt->ltd_exp, op_data, lmm, lmmsize, it,
-				    flags, reqp, cb_blocking, extra_lock_flags);
+		int stripe_index;
+
+		for (stripe_index = 1;
+		     stripe_index < lsm->lsm_md_stripe_count &&
+		     it_disposition(it, DISP_LOOKUP_NEG); stripe_index++) {
+			struct lmv_oinfo *oinfo;
+
+			/* release the previous request */
+			ptlrpc_req_finished(*reqp);
+			it->it_request = NULL;
+			*reqp = NULL;
+
+			oinfo = &lsm->lsm_md_oinfo[stripe_index];
+			tgt = lmv_find_target(lmv, &oinfo->lmo_fid);
+			if (IS_ERR(tgt))
+				return PTR_ERR(tgt);
+
+			CDEBUG(D_INODE, "Try other stripes " DFID"\n",
+			       PFID(&oinfo->lmo_fid));
+
+			op_data->op_fid1 = oinfo->lmo_fid;
+			it->it_disposition &= ~DISP_ENQ_COMPLETE;
+			rc = md_intent_lock(tgt->ltd_exp, op_data, lmm,
+					    lmmsize, it, flags, reqp,
+					    cb_blocking, extra_lock_flags);
+			if (rc)
+				return rc;
+		}
 	}
 
 	/*
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_internal.h b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
index faf6a7b..ea528ae 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_internal.h
+++ b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
@@ -147,6 +147,18 @@ lsm_name_to_stripe_info(const struct lmv_stripe_md *lsm, const char *name,
 	return &lsm->lsm_md_oinfo[stripe_index];
 }
 
+static inline bool lmv_is_known_hash_type(const struct lmv_stripe_md *lsm)
+{
+	return lsm->lsm_md_hash_type == LMV_HASH_TYPE_FNV_1A_64 ||
+	       lsm->lsm_md_hash_type == LMV_HASH_TYPE_ALL_CHARS;
+}
+
+static inline bool lmv_need_try_all_stripes(const struct lmv_stripe_md *lsm)
+{
+	return !lmv_is_known_hash_type(lsm) ||
+	       lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION;
+}
+
 struct lmv_tgt_desc
 *lmv_locate_mds(struct lmv_obd *lmv, struct md_op_data *op_data,
 		struct lu_fid *fid);
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 27a6be1..e9f4e9a 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -102,8 +102,8 @@ int lmv_name_to_stripe_index(__u32 lmv_hash_type, unsigned int stripe_count,
 		idx = lmv_hash_fnv1a(stripe_count, name, namelen);
 		break;
 	default:
-		CERROR("Unknown hash type 0x%x\n", hash_type);
-		return -EINVAL;
+		idx = -EBADFD;
+		break;
 	}
 
 	CDEBUG(D_INFO, "name %.*s hash_type %d idx %d\n", namelen, name,
@@ -1697,6 +1697,23 @@ lmv_locate_target_for_name(struct lmv_obd *lmv, struct lmv_stripe_md *lsm,
 	return tgt;
 }
 
+/**
+ * Locate mds by fid or name
+ *
+ * For striped directory (lsm != NULL), it will locate the stripe
+ * by name hash (see lsm_name_to_stripe_info()). Note: if the hash_type
+ * is unknown, it will return -EBADFD, and lmv_intent_lookup might need
+ * walk through all of stripes to locate the entry.
+ *
+ * For normal directory, it will locate MDS by FID directly.
+ * \param[in] lmv	LMV device
+ * \param[in] op_data	client MD stack parameters, name, namelen
+ *			mds_num etc.
+ * \param[in] fid	object FID used to locate MDS.
+ *
+ * retval		pointer to the lmv_tgt_desc if succeed.
+ *			ERR_PTR(errno) if failed.
+ */
 struct lmv_tgt_desc
 *lmv_locate_mds(struct lmv_obd *lmv, struct md_op_data *op_data,
 		struct lu_fid *fid)
@@ -2351,45 +2368,94 @@ static int lmv_readpage(struct obd_export *exp, struct md_op_data *op_data,
 	return rc;
 }
 
+/**
+ * Unlink a file/directory
+ *
+ * Unlink a file or directory under the parent dir. The unlink request
+ * usually will be sent to the MDT where the child is located, but if
+ * the client does not have the child FID then request will be sent to the
+ * MDT where the parent is located.
+ *
+ * If the parent is a striped directory then it also needs to locate which
+ * stripe the name of the child is located, and replace the parent FID
+ * (@op->op_fid1) with the stripe FID. Note: if the stripe is unknown,
+ * it will walk through all of sub-stripes until the child is being
+ * unlinked finally.
+ *
+ * \param[in] exp	export refer to LMV
+ * \param[in] op_data	different parameters transferred between client
+ *			MD stacks, name, namelen, FIDs etc.
+ *			op_fid1 is the parent FID, op_fid2 is the child
+ *			FID.
+ * \param[out] request point to the request of unlink.
+ *
+ * retval		0 if succeed
+ *			negative errno if failed.
+ */
 static int lmv_unlink(struct obd_export *exp, struct md_op_data *op_data,
 		      struct ptlrpc_request **request)
 {
-	struct obd_device       *obd = exp->exp_obd;
+	struct lmv_stripe_md *lsm = op_data->op_mea1;
+	struct obd_device    *obd = exp->exp_obd;
 	struct lmv_obd	  *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc *parent_tgt = NULL;
 	struct lmv_tgt_desc     *tgt = NULL;
 	struct mdt_body		*body;
+	int stripe_index = 0;
 	int		     rc;
 
 	rc = lmv_check_connect(obd);
 	if (rc)
 		return rc;
-retry:
-	/* Send unlink requests to the MDT where the child is located */
-	if (likely(!fid_is_zero(&op_data->op_fid2))) {
-		tgt = lmv_find_target(lmv, &op_data->op_fid2);
-		if (IS_ERR(tgt))
-			return PTR_ERR(tgt);
+retry_unlink:
+	/* For striped dir, we need to locate the parent as well */
+	if (lsm) {
+		struct lmv_tgt_desc *tmp;
 
-		/* For striped dir, we need to locate the parent as well */
-		if (op_data->op_mea1) {
-			struct lmv_tgt_desc *tmp;
-
-			LASSERT(op_data->op_name && op_data->op_namelen);
-			tmp = lmv_locate_target_for_name(lmv, op_data->op_mea1,
-							 op_data->op_name,
-							 op_data->op_namelen,
-							 &op_data->op_fid1,
-							 &op_data->op_mds);
-			if (IS_ERR(tmp))
-				return PTR_ERR(tmp);
+		LASSERT(op_data->op_name && op_data->op_namelen);
+
+		tmp = lmv_locate_target_for_name(lmv, lsm,
+						 op_data->op_name,
+						 op_data->op_namelen,
+						 &op_data->op_fid1,
+						 &op_data->op_mds);
+
+		/*
+		 * return -EBADFD means unknown hash type, might
+		 * need try all sub-stripe here
+		 */
+		if (IS_ERR(tmp) && PTR_ERR(tmp) != -EBADFD)
+			return PTR_ERR(tmp);
+
+		/*
+		 * Note: both migrating dir and unknown hash dir need to
+		 * try all of sub-stripes, so we need start search the
+		 * name from stripe 0, but migrating dir is already handled
+		 * inside lmv_locate_target_for_name(), so we only check
+		 * unknown hash type directory here
+		 */
+		if (!lmv_is_known_hash_type(lsm)) {
+			struct lmv_oinfo *oinfo;
+
+			oinfo = &lsm->lsm_md_oinfo[stripe_index];
+
+			op_data->op_fid1 = oinfo->lmo_fid;
+			op_data->op_mds = oinfo->lmo_mds;
 		}
-	} else {
-		tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
-		if (IS_ERR(tgt))
-			return PTR_ERR(tgt);
 	}
 
+try_next_stripe:
+	/* Send unlink requests to the MDT where the child is located */
+	if (likely(!fid_is_zero(&op_data->op_fid2)))
+		tgt = lmv_find_target(lmv, &op_data->op_fid2);
+	else if (lsm)
+		tgt = lmv_get_target(lmv, op_data->op_mds, NULL);
+	else
+		tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
+
+	if (IS_ERR(tgt))
+		return PTR_ERR(tgt);
+
 	op_data->op_fsuid = from_kuid(&init_user_ns, current_fsuid());
 	op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid());
 	op_data->op_cap = cfs_curproc_cap_pack();
@@ -2425,9 +2491,28 @@ retry:
 	       PFID(&op_data->op_fid1), PFID(&op_data->op_fid2), tgt->ltd_idx);
 
 	rc = md_unlink(tgt->ltd_exp, op_data, request);
-	if (rc != 0 && rc != -EREMOTE)
+	if (rc != 0 && rc != -EREMOTE  && rc != -ENOENT)
 		return rc;
 
+	/* Try next stripe if it is needed. */
+	if (rc == -ENOENT && lsm && lmv_need_try_all_stripes(lsm)) {
+		struct lmv_oinfo *oinfo;
+
+		stripe_index++;
+		if (stripe_index >= lsm->lsm_md_stripe_count)
+			return rc;
+
+		oinfo = &lsm->lsm_md_oinfo[stripe_index];
+
+		op_data->op_fid1 = oinfo->lmo_fid;
+		op_data->op_mds = oinfo->lmo_mds;
+
+		ptlrpc_req_finished(*request);
+		*request = NULL;
+
+		goto try_next_stripe;
+	}
+
 	body = req_capsule_server_get(&(*request)->rq_pill, &RMF_MDT_BODY);
 	if (!body)
 		return -EPROTO;
@@ -2463,7 +2548,7 @@ retry:
 	ptlrpc_req_finished(*request);
 	*request = NULL;
 
-	goto retry;
+	goto retry_unlink;
 }
 
 static int lmv_precleanup(struct obd_device *obd, enum obd_cleanup_stage stage)
@@ -2683,7 +2768,10 @@ static int lmv_unpack_md_v1(struct obd_export *exp, struct lmv_stripe_md *lsm,
 	lsm->lsm_md_magic = le32_to_cpu(lmm1->lmv_magic);
 	lsm->lsm_md_stripe_count = le32_to_cpu(lmm1->lmv_stripe_count);
 	lsm->lsm_md_master_mdt_index = le32_to_cpu(lmm1->lmv_master_mdt_index);
-	lsm->lsm_md_hash_type = le32_to_cpu(lmm1->lmv_hash_type);
+	if (OBD_FAIL_CHECK(OBD_FAIL_UNKNOWN_LMV_STRIPE))
+		lsm->lsm_md_hash_type = LMV_HASH_TYPE_UNKNOWN;
+	else
+		lsm->lsm_md_hash_type = le32_to_cpu(lmm1->lmv_hash_type);
 	lsm->lsm_md_layout_version = le32_to_cpu(lmm1->lmv_layout_version);
 	fid_le_to_cpu(&lsm->lsm_md_master_fid, &lmm1->lmv_master_fid);
 	cplen = strlcpy(lsm->lsm_md_pool_name, lmm1->lmv_pool_name,
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [lustre-devel] [PATCH 66/80] staging: lustre: lmv: try all stripes for unknown hash functions
@ 2016-08-16 20:19   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

For an unknown hash type, LMV should try all stripes to locate
the name entry. This is done only for lookup and unlink, i.e.
we can only list and unlink entries under a striped directory
with an unknown hash type.
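
The fallback described above can be sketched in plain C outside the kernel. This is only an illustration under stated assumptions: try_all_stripes(), stripe_op_fn and toy_stripe_op() are hypothetical names that do not exist in the patch, and the callback stands in for sending one lookup or unlink request to a single stripe.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for sending one request to a single stripe. */
typedef int (*stripe_op_fn)(int stripe_index, const char *name, void *ctx);

/*
 * Walk stripes 0..stripe_count-1 until the operation stops returning
 * -ENOENT. Returns the first result that is not -ENOENT, or -ENOENT if
 * the name is absent from every stripe. If found_index is not NULL it
 * records which stripe answered.
 */
int try_all_stripes(int stripe_count, const char *name,
		    stripe_op_fn op, void *ctx, int *found_index)
{
	int idx;
	int rc = -ENOENT;

	assert(stripe_count > 0 && name && op);

	for (idx = 0; idx < stripe_count; idx++) {
		rc = op(idx, name, ctx);
		if (rc != -ENOENT) {
			if (found_index)
				*found_index = idx;
			return rc;
		}
	}
	return rc;
}

/* Toy backend: ctx points at the index of the only stripe holding the name. */
int toy_stripe_op(int stripe_index, const char *name, void *ctx)
{
	(void)name;
	return stripe_index == *(const int *)ctx ? 0 : -ENOENT;
}
```

With a toy backend where only stripe 2 of 4 holds the name, the walk stops at index 2. When no stripe holds it, the caller sees -ENOENT only after every stripe has been asked, which is why this path costs up to one request per stripe.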

Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4921
Reviewed-on: http://review.whamcloud.com/10041
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_user.h     |    1 +
 .../staging/lustre/lustre/include/obd_support.h    |    3 +
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     |   70 +++++++---
 drivers/staging/lustre/lustre/lmv/lmv_internal.h   |   12 ++
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  144 ++++++++++++++++----
 5 files changed, 182 insertions(+), 48 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index 9e38ed3..52cd585 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -383,6 +383,7 @@ struct lmv_user_mds_data {
 };
 
 enum lmv_hash_type {
+	LMV_HASH_TYPE_UNKNOWN	= 0,	/* 0 is reserved for testing purpose */
 	LMV_HASH_TYPE_ALL_CHARS = 1,
 	LMV_HASH_TYPE_FNV_1A_64 = 2,
 };
diff --git a/drivers/staging/lustre/lustre/include/obd_support.h b/drivers/staging/lustre/lustre/include/obd_support.h
index a11fff1..f747bca 100644
--- a/drivers/staging/lustre/lustre/include/obd_support.h
+++ b/drivers/staging/lustre/lustre/include/obd_support.h
@@ -483,6 +483,9 @@ extern char obd_jobid_var[];
 #define OBD_FAIL_UPDATE_OBJ_NET			0x1700
 #define OBD_FAIL_UPDATE_OBJ_NET_REP		0x1701
 
+/* LMV */
+#define OBD_FAIL_UNKNOWN_LMV_STRIPE		0x1901
+
 /* Assign references to moved code to reduce code changes */
 #define OBD_FAIL_PRECHECK(id)		   CFS_FAIL_PRECHECK(id)
 #define OBD_FAIL_CHECK(id)		      CFS_FAIL_CHECK(id)
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index cde1d7b..0559445 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -402,10 +402,28 @@ static int lmv_intent_lookup(struct obd_export *exp,
 	struct mdt_body	*body;
 	int		     rc = 0;
 
+	/*
+	 * If it returns ERR_PTR(-EBADFD) then it is an unknown hash type
+	 * it will try all stripes to locate the object
+	 */
 	tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
-	if (IS_ERR(tgt))
+	if (IS_ERR(tgt) && (PTR_ERR(tgt) != -EBADFD))
 		return PTR_ERR(tgt);
 
+	/*
+	 * Both migrating dir and unknown hash dir need to try
+	 * all of sub-stripes
+	 */
+	if (lsm && !lmv_is_known_hash_type(lsm)) {
+		struct lmv_oinfo *oinfo = &lsm->lsm_md_oinfo[0];
+
+		op_data->op_fid1 = oinfo->lmo_fid;
+		op_data->op_mds = oinfo->lmo_mds;
+		tgt = lmv_get_target(lmv, oinfo->lmo_mds, NULL);
+		if (IS_ERR(tgt))
+			return PTR_ERR(tgt);
+	}
+
 	if (!fid_is_sane(&op_data->op_fid2))
 		fid_zero(&op_data->op_fid2);
 
@@ -435,27 +453,39 @@ static int lmv_intent_lookup(struct obd_export *exp,
 		}
 		return rc;
 	} else if (it_disposition(it, DISP_LOOKUP_NEG) && lsm &&
-		   lsm->lsm_md_magic & LMV_HASH_FLAG_MIGRATION) {
+		   lmv_need_try_all_stripes(lsm)) {
 		/*
-		 * For migrating directory, if it can not find the child in
-		 * the source directory(master stripe), try the targeting
-		 * directory(stripe 1)
+		 * For migrating and unknown hash type directory, it will
+		 * try to target the entry on other stripes
 		 */
-		tgt = lmv_find_target(lmv, &lsm->lsm_md_oinfo[1].lmo_fid);
-		if (IS_ERR(tgt))
-			return PTR_ERR(tgt);
-
-		ptlrpc_req_finished(*reqp);
-		it->it_request = NULL;
-		*reqp = NULL;
-
-		CDEBUG(D_INODE, "For migrating dir, try target dir "DFID"\n",
-		       PFID(&lsm->lsm_md_oinfo[1].lmo_fid));
-
-		op_data->op_fid1 = lsm->lsm_md_oinfo[1].lmo_fid;
-		it->it_disposition &= ~DISP_ENQ_COMPLETE;
-		rc = md_intent_lock(tgt->ltd_exp, op_data, lmm, lmmsize, it,
-				    flags, reqp, cb_blocking, extra_lock_flags);
+		int stripe_index;
+
+		for (stripe_index = 1;
+		     stripe_index < lsm->lsm_md_stripe_count &&
+		     it_disposition(it, DISP_LOOKUP_NEG); stripe_index++) {
+			struct lmv_oinfo *oinfo;
+
+			/* release the previous request */
+			ptlrpc_req_finished(*reqp);
+			it->it_request = NULL;
+			*reqp = NULL;
+
+			oinfo = &lsm->lsm_md_oinfo[stripe_index];
+			tgt = lmv_find_target(lmv, &oinfo->lmo_fid);
+			if (IS_ERR(tgt))
+				return PTR_ERR(tgt);
+
+			CDEBUG(D_INODE, "Try other stripes " DFID"\n",
+			       PFID(&oinfo->lmo_fid));
+
+			op_data->op_fid1 = oinfo->lmo_fid;
+			it->it_disposition &= ~DISP_ENQ_COMPLETE;
+			rc = md_intent_lock(tgt->ltd_exp, op_data, lmm,
+					    lmmsize, it, flags, reqp,
+					    cb_blocking, extra_lock_flags);
+			if (rc)
+				return rc;
+		}
 	}
 
 	/*
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_internal.h b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
index faf6a7b..ea528ae 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_internal.h
+++ b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
@@ -147,6 +147,18 @@ lsm_name_to_stripe_info(const struct lmv_stripe_md *lsm, const char *name,
 	return &lsm->lsm_md_oinfo[stripe_index];
 }
 
+static inline bool lmv_is_known_hash_type(const struct lmv_stripe_md *lsm)
+{
+	return lsm->lsm_md_hash_type == LMV_HASH_TYPE_FNV_1A_64 ||
+	       lsm->lsm_md_hash_type == LMV_HASH_TYPE_ALL_CHARS;
+}
+
+static inline bool lmv_need_try_all_stripes(const struct lmv_stripe_md *lsm)
+{
+	return !lmv_is_known_hash_type(lsm) ||
+	       lsm->lsm_md_hash_type & LMV_HASH_FLAG_MIGRATION;
+}
+
 struct lmv_tgt_desc
 *lmv_locate_mds(struct lmv_obd *lmv, struct md_op_data *op_data,
 		struct lu_fid *fid);
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 27a6be1..e9f4e9a 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -102,8 +102,8 @@ int lmv_name_to_stripe_index(__u32 lmv_hash_type, unsigned int stripe_count,
 		idx = lmv_hash_fnv1a(stripe_count, name, namelen);
 		break;
 	default:
-		CERROR("Unknown hash type 0x%x\n", hash_type);
-		return -EINVAL;
+		idx = -EBADFD;
+		break;
 	}
 
 	CDEBUG(D_INFO, "name %.*s hash_type %d idx %d\n", namelen, name,
@@ -1697,6 +1697,23 @@ lmv_locate_target_for_name(struct lmv_obd *lmv, struct lmv_stripe_md *lsm,
 	return tgt;
 }
 
+/**
+ * Locate mds by fid or name
+ *
+ * For striped directory (lsm != NULL), it will locate the stripe
+ * by name hash (see lsm_name_to_stripe_info()). Note: if the hash_type
+ * is unknown, it will return -EBADFD, and lmv_intent_lookup might need
+ * walk through all of stripes to locate the entry.
+ *
+ * For normal directory, it will locate MDS by FID directly.
+ * \param[in] lmv	LMV device
+ * \param[in] op_data	client MD stack parameters, name, namelen
+ *			mds_num etc.
+ * \param[in] fid	object FID used to locate MDS.
+ *
+ * retval		pointer to the lmv_tgt_desc if succeed.
+ *			ERR_PTR(errno) if failed.
+ */
 struct lmv_tgt_desc
 *lmv_locate_mds(struct lmv_obd *lmv, struct md_op_data *op_data,
 		struct lu_fid *fid)
@@ -2351,45 +2368,94 @@ static int lmv_readpage(struct obd_export *exp, struct md_op_data *op_data,
 	return rc;
 }
 
+/**
+ * Unlink a file/directory
+ *
+ * Unlink a file or directory under the parent dir. The unlink request
+ * usually will be sent to the MDT where the child is located, but if
+ * the client does not have the child FID then request will be sent to the
+ * MDT where the parent is located.
+ *
+ * If the parent is a striped directory then it also needs to locate which
+ * stripe the name of the child is located, and replace the parent FID
+ * (@op->op_fid1) with the stripe FID. Note: if the stripe is unknown,
+ * it will walk through all of sub-stripes until the child is being
+ * unlinked finally.
+ *
+ * \param[in] exp	export refer to LMV
+ * \param[in] op_data	different parameters transferred between client
+ *			MD stacks, name, namelen, FIDs etc.
+ *			op_fid1 is the parent FID, op_fid2 is the child
+ *			FID.
+ * \param[out] request point to the request of unlink.
+ *
+ * retval		0 if succeed
+ *			negative errno if failed.
+ */
 static int lmv_unlink(struct obd_export *exp, struct md_op_data *op_data,
 		      struct ptlrpc_request **request)
 {
-	struct obd_device       *obd = exp->exp_obd;
+	struct lmv_stripe_md *lsm = op_data->op_mea1;
+	struct obd_device    *obd = exp->exp_obd;
 	struct lmv_obd	  *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc *parent_tgt = NULL;
 	struct lmv_tgt_desc     *tgt = NULL;
 	struct mdt_body		*body;
+	int stripe_index = 0;
 	int		     rc;
 
 	rc = lmv_check_connect(obd);
 	if (rc)
 		return rc;
-retry:
-	/* Send unlink requests to the MDT where the child is located */
-	if (likely(!fid_is_zero(&op_data->op_fid2))) {
-		tgt = lmv_find_target(lmv, &op_data->op_fid2);
-		if (IS_ERR(tgt))
-			return PTR_ERR(tgt);
+retry_unlink:
+	/* For striped dir, we need to locate the parent as well */
+	if (lsm) {
+		struct lmv_tgt_desc *tmp;
 
-		/* For striped dir, we need to locate the parent as well */
-		if (op_data->op_mea1) {
-			struct lmv_tgt_desc *tmp;
-
-			LASSERT(op_data->op_name && op_data->op_namelen);
-			tmp = lmv_locate_target_for_name(lmv, op_data->op_mea1,
-							 op_data->op_name,
-							 op_data->op_namelen,
-							 &op_data->op_fid1,
-							 &op_data->op_mds);
-			if (IS_ERR(tmp))
-				return PTR_ERR(tmp);
+		LASSERT(op_data->op_name && op_data->op_namelen);
+
+		tmp = lmv_locate_target_for_name(lmv, lsm,
+						 op_data->op_name,
+						 op_data->op_namelen,
+						 &op_data->op_fid1,
+						 &op_data->op_mds);
+
+		/*
+		 * return -EBADFD means unknown hash type, might
+		 * need try all sub-stripe here
+		 */
+		if (IS_ERR(tmp) && PTR_ERR(tmp) != -EBADFD)
+			return PTR_ERR(tmp);
+
+		/*
+		 * Note: both migrating dir and unknown hash dir need to
+		 * try all of sub-stripes, so we need start search the
+		 * name from stripe 0, but migrating dir is already handled
+		 * inside lmv_locate_target_for_name(), so we only check
+		 * unknown hash type directory here
+		 */
+		if (!lmv_is_known_hash_type(lsm)) {
+			struct lmv_oinfo *oinfo;
+
+			oinfo = &lsm->lsm_md_oinfo[stripe_index];
+
+			op_data->op_fid1 = oinfo->lmo_fid;
+			op_data->op_mds = oinfo->lmo_mds;
 		}
-	} else {
-		tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
-		if (IS_ERR(tgt))
-			return PTR_ERR(tgt);
 	}
 
+try_next_stripe:
+	/* Send unlink requests to the MDT where the child is located */
+	if (likely(!fid_is_zero(&op_data->op_fid2)))
+		tgt = lmv_find_target(lmv, &op_data->op_fid2);
+	else if (lsm)
+		tgt = lmv_get_target(lmv, op_data->op_mds, NULL);
+	else
+		tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
+
+	if (IS_ERR(tgt))
+		return PTR_ERR(tgt);
+
 	op_data->op_fsuid = from_kuid(&init_user_ns, current_fsuid());
 	op_data->op_fsgid = from_kgid(&init_user_ns, current_fsgid());
 	op_data->op_cap = cfs_curproc_cap_pack();
@@ -2425,9 +2491,28 @@ retry:
 	       PFID(&op_data->op_fid1), PFID(&op_data->op_fid2), tgt->ltd_idx);
 
 	rc = md_unlink(tgt->ltd_exp, op_data, request);
-	if (rc != 0 && rc != -EREMOTE)
+	if (rc != 0 && rc != -EREMOTE  && rc != -ENOENT)
 		return rc;
 
+	/* Try next stripe if it is needed. */
+	if (rc == -ENOENT && lsm && lmv_need_try_all_stripes(lsm)) {
+		struct lmv_oinfo *oinfo;
+
+		stripe_index++;
+		if (stripe_index >= lsm->lsm_md_stripe_count)
+			return rc;
+
+		oinfo = &lsm->lsm_md_oinfo[stripe_index];
+
+		op_data->op_fid1 = oinfo->lmo_fid;
+		op_data->op_mds = oinfo->lmo_mds;
+
+		ptlrpc_req_finished(*request);
+		*request = NULL;
+
+		goto try_next_stripe;
+	}
+
 	body = req_capsule_server_get(&(*request)->rq_pill, &RMF_MDT_BODY);
 	if (!body)
 		return -EPROTO;
@@ -2463,7 +2548,7 @@ retry:
 	ptlrpc_req_finished(*request);
 	*request = NULL;
 
-	goto retry;
+	goto retry_unlink;
 }
 
 static int lmv_precleanup(struct obd_device *obd, enum obd_cleanup_stage stage)
@@ -2683,7 +2768,10 @@ static int lmv_unpack_md_v1(struct obd_export *exp, struct lmv_stripe_md *lsm,
 	lsm->lsm_md_magic = le32_to_cpu(lmm1->lmv_magic);
 	lsm->lsm_md_stripe_count = le32_to_cpu(lmm1->lmv_stripe_count);
 	lsm->lsm_md_master_mdt_index = le32_to_cpu(lmm1->lmv_master_mdt_index);
-	lsm->lsm_md_hash_type = le32_to_cpu(lmm1->lmv_hash_type);
+	if (OBD_FAIL_CHECK(OBD_FAIL_UNKNOWN_LMV_STRIPE))
+		lsm->lsm_md_hash_type = LMV_HASH_TYPE_UNKNOWN;
+	else
+		lsm->lsm_md_hash_type = le32_to_cpu(lmm1->lmv_hash_type);
 	lsm->lsm_md_layout_version = le32_to_cpu(lmm1->lmv_layout_version);
 	fid_le_to_cpu(&lsm->lsm_md_master_fid, &lmm1->lmv_master_fid);
 	cplen = strlcpy(lsm->lsm_md_pool_name, lmm1->lmv_pool_name,
-- 
1.7.1


* [PATCH 67/80] staging: lustre: ptlrpc: request gets stuck in UNREGISTERING phase
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Andriy Skulysh, James Simmons

From: Andriy Skulysh <Andriy_Skulysh@xyratex.com>

The exit condition from the UNREGISTERING phase is the release
of both the reply and bulk buffers.

Call ptlrpc_unregister_bulk() if ptlrpc_unregister_reply()
did not complete in async mode before switching to the
UNREGISTERING phase.
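
The exit condition can be illustrated with a small user-space model. Everything here is an assumption made for illustration: struct toy_req and the toy_* functions are hypothetical and only mimic the idea that both buffers must be released; they are not the ptlrpc implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a request; field names are illustrative, not ptlrpc's. */
struct toy_req {
	bool reply_registered;		/* reply buffer still registered */
	bool bulk_registered;		/* bulk buffer still registered */
	bool reply_unlink_requested;	/* async release asked for reply */
	bool bulk_unlink_requested;	/* async release asked for bulk */
};

/* Ask for the reply buffer to be released; true if already released. */
bool toy_unregister_reply(struct toy_req *req)
{
	assert(req);
	if (!req->reply_registered)
		return true;
	req->reply_unlink_requested = true;	/* completes later */
	return false;
}

/* Ask for the bulk buffer to be released; true if already released. */
bool toy_unregister_bulk(struct toy_req *req)
{
	assert(req);
	if (!req->bulk_registered)
		return true;
	req->bulk_unlink_requested = true;	/* completes later */
	return false;
}

/* Simulate the network layer finishing whatever releases were requested. */
void toy_complete_unlinks(struct toy_req *req)
{
	if (req->reply_unlink_requested) {
		req->reply_registered = false;
		req->reply_unlink_requested = false;
	}
	if (req->bulk_unlink_requested) {
		req->bulk_registered = false;
		req->bulk_unlink_requested = false;
	}
}

/* The request may leave UNREGISTERING only when both buffers are released. */
bool toy_can_leave_unregistering(const struct toy_req *req)
{
	return !req->reply_registered && !req->bulk_registered;
}

/*
 * One pass of a check loop. Returns true if the caller must wait (the
 * `continue` in the patched code). also_unregister_bulk mirrors the fix.
 */
bool toy_check_step(struct toy_req *req, bool also_unregister_bulk)
{
	if (!toy_unregister_reply(req)) {
		if (also_unregister_bulk)
			toy_unregister_bulk(req);
		return true;
	}
	return false;
}
```

In this model, a request that only asks for the reply release never satisfies toy_can_leave_unregistering(), because nothing ever requests the bulk release. Passing also_unregister_bulk = true lets both releases complete.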

Signed-off-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5259
Xyratex-bug-id: MRP-1960
Reviewed-on: http://review.whamcloud.com/10846
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Ann Koehler <amk@cray.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/ptlrpc/client.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index f2e71b4..bae91bd 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -1630,8 +1630,10 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
 			    req->rq_waiting || req->rq_wait_ctx) {
 				int status;
 
-				if (!ptlrpc_unregister_reply(req, 1))
+				if (!ptlrpc_unregister_reply(req, 1)) {
+					ptlrpc_unregister_bulk(req, 1);
 					continue;
+				}
 
 				spin_lock(&imp->imp_lock);
 				if (ptlrpc_import_delay_req(imp, req,
-- 
1.7.1


* [PATCH 68/80] staging: lustre: lmv: build master LMV EA dynamically build via readdir
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Fan Yong,
	James Simmons

From: Fan Yong <fan.yong@intel.com>

When creating a striped directory, the master object saves the slave
objects (or shards) as internal sub-directories. Each sub-directory's
name is composed of ${shard_FID}:${shard_idx}. From the name, we can
easily tell what the shard is and where it should be.

On the other hand, we need to store some information related to the
striped directory, such as the magic, hash type, shard count, and so
on. That is the LMV EA (header). We do NOT store the FID of each shard
in the LMV EA. Instead, when we need the shards' FIDs (such as for
readdir() on the client side), we can build the entire LMV EA on the
MDT (in RAM) by iterating over the sub-directory entries that are
contained in the master object of the striped directory.
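
As a rough user-space sketch of the idea, the FID array can be rebuilt from the shard names alone. The textual FID form "[0xSEQ:0xOID:0xVER]" is an assumption here (the commit message only says ${shard_FID}:${shard_idx}), and struct toy_fid and the toy_* helpers are hypothetical names, not code from the MDT.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative FID: sequence, object id, version. */
struct toy_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* Format "<FID>:<index>", assuming the FID prints as [0xSEQ:0xOID:0xVER]. */
int toy_format_shard_name(char *buf, size_t len,
			  const struct toy_fid *fid, unsigned int idx)
{
	return snprintf(buf, len, "[0x%llx:0x%x:0x%x]:%u",
			(unsigned long long)fid->seq,
			(unsigned int)fid->oid, (unsigned int)fid->ver, idx);
}

/* Parse a shard name back into FID and index; 0 on success, -1 on error. */
int toy_parse_shard_name(const char *name, struct toy_fid *fid,
			 unsigned int *idx)
{
	unsigned long long seq;
	unsigned int oid, ver, i;

	if (sscanf(name, "[0x%llx:0x%x:0x%x]:%u", &seq, &oid, &ver, &i) != 4)
		return -1;

	fid->seq = seq;
	fid->oid = oid;
	fid->ver = ver;
	*idx = i;
	return 0;
}

/*
 * Rebuild the per-stripe FID array from directory entry names, in any
 * order. Returns 0 on success, -1 on a malformed name or an index that
 * does not fit in the array.
 */
int toy_build_stripe_fids(const char *const *names, unsigned int count,
			  struct toy_fid *fids, unsigned int max_stripes)
{
	unsigned int i;

	assert(names && fids);

	for (i = 0; i < count; i++) {
		struct toy_fid fid;
		unsigned int idx;

		if (toy_parse_shard_name(names[i], &fid, &idx) != 0 ||
		    idx >= max_stripes)
			return -1;
		fids[idx] = fid;
	}
	return 0;
}
```

Because each name carries its own index, the entries can be visited in whatever order the directory iteration yields them, and the array still ends up ordered by shard index.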

This mechanism simplifies the striped directory create operation.
For a very large striped directory, logging the FID array in the LMV
EA would be troublesome. It also simplifies LFSCK verification of
striped directories, because it reduces the sources of inconsistency.

Another fix concerns lmv_master_fid in the master LMV EA header:
it is redundant information and may become one of the sources of
inconsistency. So replace it with two __u64 padding fields.
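
The layout consequence of that replacement can be checked with a user-space mirror of the structure. The toy_ names are hypothetical; the offsets are the ones asserted by the wiretest.c hunk of this patch, and the pool name length of 16 is inferred from those offsets (56 - 40). This sketch assumes a C11 compiler for static_assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAXPOOLNAME 16	/* inferred from the patch's offset checks */

/* Mirror of a 16-byte FID: sequence, object id, version. */
struct toy_lu_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* User-space mirror of the header after the patch. */
struct toy_lmv_mds_md_v1 {
	uint32_t lmv_magic;
	uint32_t lmv_stripe_count;
	uint32_t lmv_master_mdt_index;
	uint32_t lmv_hash_type;
	uint32_t lmv_layout_version;
	uint32_t lmv_padding1;
	uint64_t lmv_padding2;		/* was the first half of the FID */
	uint64_t lmv_padding3;		/* was the second half of the FID */
	char lmv_pool_name[TOY_MAXPOOLNAME];
	struct toy_lu_fid lmv_stripe_fids[];
};

/* The two __u64 paddings occupy exactly the space of the removed FID. */
static_assert(sizeof(struct toy_lu_fid) == 2 * sizeof(uint64_t),
	      "FID and two u64 fields must be the same size");
static_assert(offsetof(struct toy_lmv_mds_md_v1, lmv_padding1) == 20,
	      "padding1 offset");
static_assert(offsetof(struct toy_lmv_mds_md_v1, lmv_padding2) == 24,
	      "padding2 offset");
static_assert(offsetof(struct toy_lmv_mds_md_v1, lmv_padding3) == 32,
	      "padding3 offset");
static_assert(offsetof(struct toy_lmv_mds_md_v1, lmv_pool_name) == 40,
	      "pool name offset");
static_assert(offsetof(struct toy_lmv_mds_md_v1, lmv_stripe_fids) == 56,
	      "stripe FID array offset");
static_assert(sizeof(struct toy_lmv_mds_md_v1) == 56, "header size");

/* Size of a full EA (header plus FID array) for a given stripe count. */
size_t toy_lmv_ea_size(uint32_t stripe_count)
{
	return sizeof(struct toy_lmv_mds_md_v1) +
	       (size_t)stripe_count * sizeof(struct toy_lu_fid);
}
```

Since the header stays at 56 bytes and every later field keeps its offset, the pool name and the stripe FID array stay where they were on the wire; only the meaning of bytes 24 to 39 changes.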

Signed-off-by: Fan Yong <fan.yong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5223
Reviewed-on: http://review.whamcloud.com/10751
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |    7 +--
 drivers/staging/lustre/lustre/include/lustre_lmv.h |   30 ------------
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |    4 --
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    |   49 ++++++++++++++++++++
 4 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 05fe359..17581ba 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2494,10 +2494,9 @@ struct lmv_mds_md_v1 {
 					 * for example migrating or dead.
 					 */
 	__u32 lmv_layout_version;	/* Used for directory restriping */
-	__u32 lmv_padding;
-	struct lu_fid lmv_master_fid;	/* The FID of the master object, which
-					 * is the namespace-visible dir FID
-					 */
+	__u32 lmv_padding1;
+	__u64 lmv_padding2;
+	__u64 lmv_padding3;
 	char lmv_pool_name[LOV_MAXPOOLNAME];	/* pool name */
 	struct lu_fid lmv_stripe_fids[0];	/* FIDs for each stripe */
 };
diff --git a/drivers/staging/lustre/lustre/include/lustre_lmv.h b/drivers/staging/lustre/lustre/include/lustre_lmv.h
index 1dd3e92..085e596 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lmv.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lmv.h
@@ -48,7 +48,6 @@ struct lmv_stripe_md {
 	__u32	lsm_md_layout_version;
 	__u32	lsm_md_default_count;
 	__u32	lsm_md_default_index;
-	struct lu_fid lsm_md_master_fid;
 	char	lsm_md_pool_name[LOV_MAXPOOLNAME];
 	struct lmv_oinfo lsm_md_oinfo[0];
 };
@@ -90,23 +89,6 @@ static inline void lmv_free_memmd(struct lmv_stripe_md *lsm)
 	lmv_unpack_md(NULL, &lsm, NULL, 0);
 }
 
-static inline void lmv1_cpu_to_le(struct lmv_mds_md_v1 *lmv_dst,
-				  const struct lmv_mds_md_v1 *lmv_src)
-{
-	int i;
-
-	lmv_dst->lmv_magic = cpu_to_le32(lmv_src->lmv_magic);
-	lmv_dst->lmv_stripe_count = cpu_to_le32(lmv_src->lmv_stripe_count);
-	lmv_dst->lmv_master_mdt_index =
-		cpu_to_le32(lmv_src->lmv_master_mdt_index);
-	lmv_dst->lmv_hash_type = cpu_to_le32(lmv_src->lmv_hash_type);
-	lmv_dst->lmv_layout_version = cpu_to_le32(lmv_src->lmv_layout_version);
-
-	for (i = 0; i < lmv_src->lmv_stripe_count; i++)
-		fid_cpu_to_le(&lmv_dst->lmv_stripe_fids[i],
-			      &lmv_src->lmv_stripe_fids[i]);
-}
-
 static inline void lmv1_le_to_cpu(struct lmv_mds_md_v1 *lmv_dst,
 				  const struct lmv_mds_md_v1 *lmv_src)
 {
@@ -124,18 +106,6 @@ static inline void lmv1_le_to_cpu(struct lmv_mds_md_v1 *lmv_dst,
 			      &lmv_src->lmv_stripe_fids[i]);
 }
 
-static inline void lmv_cpu_to_le(union lmv_mds_md *lmv_dst,
-				 const union lmv_mds_md *lmv_src)
-{
-	switch (lmv_src->lmv_magic) {
-	case LMV_MAGIC_V1:
-		lmv1_cpu_to_le(&lmv_dst->lmv_md_v1, &lmv_src->lmv_md_v1);
-		break;
-	default:
-		break;
-	}
-}
-
 static inline void lmv_le_to_cpu(union lmv_mds_md *lmv_dst,
 				 const union lmv_mds_md *lmv_src)
 {
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index e9f4e9a..b8275e1 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -2773,13 +2773,9 @@ static int lmv_unpack_md_v1(struct obd_export *exp, struct lmv_stripe_md *lsm,
 	else
 		lsm->lsm_md_hash_type = le32_to_cpu(lmm1->lmv_hash_type);
 	lsm->lsm_md_layout_version = le32_to_cpu(lmm1->lmv_layout_version);
-	fid_le_to_cpu(&lsm->lsm_md_master_fid, &lmm1->lmv_master_fid);
 	cplen = strlcpy(lsm->lsm_md_pool_name, lmm1->lmv_pool_name,
 			sizeof(lsm->lsm_md_pool_name));
 
-	if (!fid_is_sane(&lsm->lsm_md_master_fid))
-		return -EPROTO;
-
 	if (cplen >= sizeof(lsm->lsm_md_pool_name))
 		return -E2BIG;
 
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index 31d3326..b428528 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -1400,6 +1400,55 @@ void lustre_assert_wire_constants(void)
 	LASSERTF(LOV_PATTERN_CMOBD == 0x00000200UL, "found 0x%.8xUL\n",
 		(unsigned)LOV_PATTERN_CMOBD);
 
+	/* Checks for struct lmv_mds_md_v1 */
+	LASSERTF((int)sizeof(struct lmv_mds_md_v1) == 56, "found %lld\n",
+		 (long long)(int)sizeof(struct lmv_mds_md_v1));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_magic) == 0, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_magic));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_magic) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_magic));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_stripe_count) == 4, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_stripe_count));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_stripe_count) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_stripe_count));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_master_mdt_index) == 8, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_master_mdt_index));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_master_mdt_index) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_master_mdt_index));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_hash_type) == 12, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_hash_type));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_hash_type) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_hash_type));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_layout_version) == 16, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_layout_version));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_layout_version) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_layout_version));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_padding1) == 20, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_padding1));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding1) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding1));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_padding2) == 24, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_padding2));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding2) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding2));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_padding3) == 32, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_padding3));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding3) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding3));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_pool_name[16]) == 56, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_pool_name[16]));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_pool_name[16]) == 1, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_pool_name[16]));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_stripe_fids[0]) == 56, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_stripe_fids[0]));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_stripe_fids[0]) == 16, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_stripe_fids[0]));
+	CLASSERT(LMV_MAGIC_V1 == 0x0CD20CD0);
+	CLASSERT(LMV_MAGIC_STRIPE == 0x0CD40CD0);
+	CLASSERT(LMV_HASH_TYPE_MASK == 0x0000ffff);
+	CLASSERT(LMV_HASH_FLAG_MIGRATION == 0x80000000);
+	CLASSERT(LMV_HASH_FLAG_DEAD == 0x40000000);
+
 	/* Checks for struct obd_statfs */
 	LASSERTF((int)sizeof(struct obd_statfs) == 144, "found %lld\n",
 		 (long long)(int)sizeof(struct obd_statfs));
-- 
1.7.1


* [lustre-devel] [PATCH 68/80] staging: lustre: lmv: build master LMV EA dynamically build via readdir
@ 2016-08-16 20:19   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Fan Yong,
	James Simmons

From: Fan Yong <fan.yong@intel.com>

When creating a striped directory, the master object saves the slave
objects (or shards) as internal sub-directories. Each sub-directory's
name is composed of ${shard_FID}:${shard_idx}. From the name, we can
easily tell what the shard is and where it should be.

On the other hand, we need to store some information related to the
striped directory, such as the magic, hash type, shard count, and so
on. That is the LMV EA (header). We do NOT store the FID of each shard
in the LMV EA. Instead, when we need the shards' FIDs (such as for
readdir() on the client side), we can build the entire LMV EA on the
MDT (in RAM) by iterating over the sub-directory entries that are
contained in the master object of the striped directory.

This mechanism simplifies the striped directory create operation.
For a very large striped directory, logging the FID array in the LMV
EA would be troublesome. It also simplifies LFSCK verification of
striped directories, because it reduces the sources of inconsistency.

Another fix concerns lmv_master_fid in the master LMV EA header:
it is redundant information and may become one of the sources of
inconsistency. So replace it with two __u64 padding fields.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5223
Reviewed-on: http://review.whamcloud.com/10751
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/lustre/include/lustre/lustre_idl.h      |    7 +--
 drivers/staging/lustre/lustre/include/lustre_lmv.h |   30 ------------
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |    4 --
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    |   49 ++++++++++++++++++++
 4 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 05fe359..17581ba 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2494,10 +2494,9 @@ struct lmv_mds_md_v1 {
 					 * for example migrating or dead.
 					 */
 	__u32 lmv_layout_version;	/* Used for directory restriping */
-	__u32 lmv_padding;
-	struct lu_fid lmv_master_fid;	/* The FID of the master object, which
-					 * is the namespace-visible dir FID
-					 */
+	__u32 lmv_padding1;
+	__u64 lmv_padding2;
+	__u64 lmv_padding3;
 	char lmv_pool_name[LOV_MAXPOOLNAME];	/* pool name */
 	struct lu_fid lmv_stripe_fids[0];	/* FIDs for each stripe */
 };
diff --git a/drivers/staging/lustre/lustre/include/lustre_lmv.h b/drivers/staging/lustre/lustre/include/lustre_lmv.h
index 1dd3e92..085e596 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lmv.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lmv.h
@@ -48,7 +48,6 @@ struct lmv_stripe_md {
 	__u32	lsm_md_layout_version;
 	__u32	lsm_md_default_count;
 	__u32	lsm_md_default_index;
-	struct lu_fid lsm_md_master_fid;
 	char	lsm_md_pool_name[LOV_MAXPOOLNAME];
 	struct lmv_oinfo lsm_md_oinfo[0];
 };
@@ -90,23 +89,6 @@ static inline void lmv_free_memmd(struct lmv_stripe_md *lsm)
 	lmv_unpack_md(NULL, &lsm, NULL, 0);
 }
 
-static inline void lmv1_cpu_to_le(struct lmv_mds_md_v1 *lmv_dst,
-				  const struct lmv_mds_md_v1 *lmv_src)
-{
-	int i;
-
-	lmv_dst->lmv_magic = cpu_to_le32(lmv_src->lmv_magic);
-	lmv_dst->lmv_stripe_count = cpu_to_le32(lmv_src->lmv_stripe_count);
-	lmv_dst->lmv_master_mdt_index =
-		cpu_to_le32(lmv_src->lmv_master_mdt_index);
-	lmv_dst->lmv_hash_type = cpu_to_le32(lmv_src->lmv_hash_type);
-	lmv_dst->lmv_layout_version = cpu_to_le32(lmv_src->lmv_layout_version);
-
-	for (i = 0; i < lmv_src->lmv_stripe_count; i++)
-		fid_cpu_to_le(&lmv_dst->lmv_stripe_fids[i],
-			      &lmv_src->lmv_stripe_fids[i]);
-}
-
 static inline void lmv1_le_to_cpu(struct lmv_mds_md_v1 *lmv_dst,
 				  const struct lmv_mds_md_v1 *lmv_src)
 {
@@ -124,18 +106,6 @@ static inline void lmv1_le_to_cpu(struct lmv_mds_md_v1 *lmv_dst,
 			      &lmv_src->lmv_stripe_fids[i]);
 }
 
-static inline void lmv_cpu_to_le(union lmv_mds_md *lmv_dst,
-				 const union lmv_mds_md *lmv_src)
-{
-	switch (lmv_src->lmv_magic) {
-	case LMV_MAGIC_V1:
-		lmv1_cpu_to_le(&lmv_dst->lmv_md_v1, &lmv_src->lmv_md_v1);
-		break;
-	default:
-		break;
-	}
-}
-
 static inline void lmv_le_to_cpu(union lmv_mds_md *lmv_dst,
 				 const union lmv_mds_md *lmv_src)
 {
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index e9f4e9a..b8275e1 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -2773,13 +2773,9 @@ static int lmv_unpack_md_v1(struct obd_export *exp, struct lmv_stripe_md *lsm,
 	else
 		lsm->lsm_md_hash_type = le32_to_cpu(lmm1->lmv_hash_type);
 	lsm->lsm_md_layout_version = le32_to_cpu(lmm1->lmv_layout_version);
-	fid_le_to_cpu(&lsm->lsm_md_master_fid, &lmm1->lmv_master_fid);
 	cplen = strlcpy(lsm->lsm_md_pool_name, lmm1->lmv_pool_name,
 			sizeof(lsm->lsm_md_pool_name));
 
-	if (!fid_is_sane(&lsm->lsm_md_master_fid))
-		return -EPROTO;
-
 	if (cplen >= sizeof(lsm->lsm_md_pool_name))
 		return -E2BIG;
 
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index 31d3326..b428528 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -1400,6 +1400,55 @@ void lustre_assert_wire_constants(void)
 	LASSERTF(LOV_PATTERN_CMOBD == 0x00000200UL, "found 0x%.8xUL\n",
 		(unsigned)LOV_PATTERN_CMOBD);
 
+	/* Checks for struct lmv_mds_md_v1 */
+	LASSERTF((int)sizeof(struct lmv_mds_md_v1) == 56, "found %lld\n",
+		 (long long)(int)sizeof(struct lmv_mds_md_v1));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_magic) == 0, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_magic));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_magic) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_magic));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_stripe_count) == 4, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_stripe_count));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_stripe_count) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_stripe_count));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_master_mdt_index) == 8, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_master_mdt_index));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_master_mdt_index) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_master_mdt_index));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_hash_type) == 12, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_hash_type));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_hash_type) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_hash_type));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_layout_version) == 16, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_layout_version));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_layout_version) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_layout_version));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_padding1) == 20, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_padding1));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding1) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding1));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_padding2) == 24, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_padding2));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding2) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding2));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_padding3) == 32, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_padding3));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding3) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_padding3));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_pool_name[16]) == 56, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_pool_name[16]));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_pool_name[16]) == 1, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_pool_name[16]));
+	LASSERTF((int)offsetof(struct lmv_mds_md_v1, lmv_stripe_fids[0]) == 56, "found %lld\n",
+		 (long long)(int)offsetof(struct lmv_mds_md_v1, lmv_stripe_fids[0]));
+	LASSERTF((int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_stripe_fids[0]) == 16, "found %lld\n",
+		 (long long)(int)sizeof(((struct lmv_mds_md_v1 *)0)->lmv_stripe_fids[0]));
+	CLASSERT(LMV_MAGIC_V1 == 0x0CD20CD0);
+	CLASSERT(LMV_MAGIC_STRIPE == 0x0CD40CD0);
+	CLASSERT(LMV_HASH_TYPE_MASK == 0x0000ffff);
+	CLASSERT(LMV_HASH_FLAG_MIGRATION == 0x80000000);
+	CLASSERT(LMV_HASH_FLAG_DEAD == 0x40000000);
+
 	/* Checks for struct obd_statfs */
 	LASSERTF((int)sizeof(struct obd_statfs) == 144, "found %lld\n",
 		 (long long)(int)sizeof(struct obd_statfs));
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 69/80] staging: lustre: osc: Automatically increase the max_dirty_mb
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Hongchao Zhang, Li Xi, James Simmons

From: Hongchao Zhang <hongchao.zhang@intel.com>

When the RPC size or the maximum number of RPCs in flight is
increased, the actual limiting factor may become max_dirty_mb. This
patch automatically increases the max_dirty_mb value at connection
time and whenever the related values are tuned manually through the
proc file system.
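
The adjustment described above can be sketched in plain userspace C.
This is only an illustration modelled on the client_adjust_max_dirty()
helper in the diff, not the kernel code itself: the struct
client_sketch type, the SKETCH_* constants (4 KiB pages, a 32 MiB
default) and passing totalram_pages as a parameter are all
assumptions made for the example.

```c
#define SKETCH_PAGE_SHIFT		12	/* assumption: 4 KiB pages */
#define SKETCH_MAX_DIRTY_DEFAULT_MB	32	/* assumption: stand-in default */

/* Hypothetical stand-in for the three client_obd fields involved. */
struct client_sketch {
	long dirty_max_pages;
	long max_rpcs_in_flight;
	long max_pages_per_rpc;
};

static void adjust_max_dirty(struct client_sketch *cli, long totalram_pages)
{
	if (cli->dirty_max_pages <= 0) {
		/* First call: start from the default, converted to pages. */
		cli->dirty_max_pages =
			(SKETCH_MAX_DIRTY_DEFAULT_MB * 1024L * 1024L) >>
			SKETCH_PAGE_SHIFT;
	} else {
		/* Later calls: grow to cover every RPC that may be in flight. */
		long dirty_max = cli->max_rpcs_in_flight *
				 cli->max_pages_per_rpc;

		if (dirty_max > cli->dirty_max_pages)
			cli->dirty_max_pages = dirty_max;
	}

	/* Never allow more than one eighth of RAM to be dirty. */
	if (cli->dirty_max_pages > totalram_pages / 8)
		cli->dirty_max_pages = totalram_pages / 8;
}
```

In this sketch the limit only ever grows from the tunables; the single
way it shrinks is through the RAM cap at the end.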

This patch also changes the unit of "cl_dirty" and "cl_dirty_max"
in client_obd from bytes to pages, renaming them to cl_dirty_pages
and cl_dirty_max_pages.
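
As a rough illustration of what the unit change means for callers,
the conversions between pages, bytes and megabytes can be written as
small helpers. The helper names and the 4 KiB page size are
assumptions for this sketch; the patch itself open-codes the shifts
with PAGE_SHIFT.

```c
#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Bytes are still what goes on the wire (e.g. o_dirty, ocd_grant). */
static inline long pages_to_bytes(long pages)
{
	return pages << SKETCH_PAGE_SHIFT;
}

/* max_dirty_mb is expressed in MiB; one MiB is 2^(20 - shift) pages. */
static inline long mb_to_pages(long mb)
{
	return mb << (20 - SKETCH_PAGE_SHIFT);
}

/* Whole MiB covered by a page count (any remainder is dropped). */
static inline long pages_to_whole_mb(long pages)
{
	return pages >> (20 - SKETCH_PAGE_SHIFT);
}

/*
 * Page-unit form of the byte test "dirty + PAGE_SIZE <= dirty_max",
 * assuming both byte values were whole multiples of the page size.
 */
static inline int has_room_for_one_page(long dirty_pages, long max_pages)
{
	return dirty_pages + 1 <= max_pages;
}
```

Counters kept in pages need a shift only where a byte value is
reported or sent to the server, which is where the diff adds
"<< PAGE_SHIFT".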

Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4933
Reviewed-on: http://review.whamcloud.com/10446
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/obd.h     |   28 +++++++++++++++++-
 drivers/staging/lustre/lustre/ldlm/ldlm_lib.c   |   12 +++++---
 drivers/staging/lustre/lustre/osc/lproc_osc.c   |   10 ++++---
 drivers/staging/lustre/lustre/osc/osc_cache.c   |   28 +++++++++---------
 drivers/staging/lustre/lustre/osc/osc_request.c |   34 +++++++++++++---------
 drivers/staging/lustre/lustre/ptlrpc/import.c   |    1 +
 6 files changed, 74 insertions(+), 39 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index e7e03be..e91f65a 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -222,8 +222,8 @@ struct client_obd {
 	struct sptlrpc_flavor    cl_flvr_mgc;   /* fixed flavor of mgc->mgs */
 
 	/* the grant values are protected by loi_list_lock below */
-	long		     cl_dirty;	 /* all _dirty_ in bytes */
-	long		     cl_dirty_max;     /* allowed w/o rpc */
+	long		     cl_dirty_pages;	/* all _dirty_ in pages */
+	long		     cl_dirty_max_pages;/* allowed w/o rpc */
 	long		     cl_dirty_transit; /* dirty synchronous */
 	long		     cl_avail_grant;   /* bytes of credit for ost */
 	long		     cl_lost_grant;    /* lost credits (trunc) */
@@ -1225,4 +1225,28 @@ static inline int cli_brw_size(struct obd_device *obd)
 	return obd->u.cli.cl_max_pages_per_rpc << PAGE_SHIFT;
 }
 
+/*
+ * when RPC size or the max RPCs in flight is increased, the max dirty pages
+ * of the client should be increased accordingly to avoid sending fragmented
+ * RPCs over the network when the client runs out of the maximum dirty space
+ * when so many RPCs are being generated.
+ */
+static inline void client_adjust_max_dirty(struct client_obd *cli)
+{
+	/* initializing */
+	if (cli->cl_dirty_max_pages <= 0)
+		cli->cl_dirty_max_pages =
+			(OSC_MAX_DIRTY_DEFAULT * 1024 * 1024) >> PAGE_SHIFT;
+	else {
+		long dirty_max = cli->cl_max_rpcs_in_flight *
+				 cli->cl_max_pages_per_rpc;
+
+		if (dirty_max > cli->cl_dirty_max_pages)
+			cli->cl_dirty_max_pages = dirty_max;
+	}
+
+	if (cli->cl_dirty_max_pages > totalram_pages / 8)
+		cli->cl_dirty_max_pages = totalram_pages / 8;
+}
+
 #endif /* __OBD_H */
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
index ee40006..3c98ce2 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
@@ -299,12 +299,14 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg)
 	       min_t(unsigned int, LUSTRE_CFG_BUFLEN(lcfg, 2),
 		     sizeof(server_uuid)));
 
-	cli->cl_dirty = 0;
+	cli->cl_dirty_pages = 0;
 	cli->cl_avail_grant = 0;
-	/* FIXME: Should limit this for the sum of all cl_dirty_max. */
-	cli->cl_dirty_max = OSC_MAX_DIRTY_DEFAULT * 1024 * 1024;
-	if (cli->cl_dirty_max >> PAGE_SHIFT > totalram_pages / 8)
-		cli->cl_dirty_max = totalram_pages << (PAGE_SHIFT - 3);
+	/* FIXME: Should limit this for the sum of all cl_dirty_max_pages. */
+	/*
+	 * cl_dirty_max_pages may be changed at connect time in
+	 * ptlrpc_connect_interpret().
+	 */
+	client_adjust_max_dirty(cli);
 	INIT_LIST_HEAD(&cli->cl_cache_waiters);
 	INIT_LIST_HEAD(&cli->cl_loi_ready_list);
 	INIT_LIST_HEAD(&cli->cl_loi_hp_ready_list);
diff --git a/drivers/staging/lustre/lustre/osc/lproc_osc.c b/drivers/staging/lustre/lustre/osc/lproc_osc.c
index 7e83d39..9172b78 100644
--- a/drivers/staging/lustre/lustre/osc/lproc_osc.c
+++ b/drivers/staging/lustre/lustre/osc/lproc_osc.c
@@ -119,6 +119,7 @@ static ssize_t max_rpcs_in_flight_store(struct kobject *kobj,
 
 	spin_lock(&cli->cl_loi_list_lock);
 	cli->cl_max_rpcs_in_flight = val;
+	client_adjust_max_dirty(cli);
 	spin_unlock(&cli->cl_loi_list_lock);
 
 	return count;
@@ -136,10 +137,10 @@ static ssize_t max_dirty_mb_show(struct kobject *kobj,
 	int mult;
 
 	spin_lock(&cli->cl_loi_list_lock);
-	val = cli->cl_dirty_max;
+	val = cli->cl_dirty_max_pages;
 	spin_unlock(&cli->cl_loi_list_lock);
 
-	mult = 1 << 20;
+	mult = 1 << (20 - PAGE_SHIFT);
 	return lprocfs_read_frac_helper(buf, PAGE_SIZE, val, mult);
 }
 
@@ -166,7 +167,7 @@ static ssize_t max_dirty_mb_store(struct kobject *kobj,
 		return -ERANGE;
 
 	spin_lock(&cli->cl_loi_list_lock);
-	cli->cl_dirty_max = (u32)(pages_number << PAGE_SHIFT);
+	cli->cl_dirty_max_pages = pages_number;
 	osc_wake_cache_waiters(cli);
 	spin_unlock(&cli->cl_loi_list_lock);
 
@@ -244,7 +245,7 @@ static ssize_t cur_dirty_bytes_show(struct kobject *kobj,
 	int len;
 
 	spin_lock(&cli->cl_loi_list_lock);
-	len = sprintf(buf, "%lu\n", cli->cl_dirty);
+	len = sprintf(buf, "%lu\n", cli->cl_dirty_pages << PAGE_SHIFT);
 	spin_unlock(&cli->cl_loi_list_lock);
 
 	return len;
@@ -583,6 +584,7 @@ static ssize_t max_pages_per_rpc_store(struct kobject *kobj,
 	}
 	spin_lock(&cli->cl_loi_list_lock);
 	cli->cl_max_pages_per_rpc = val;
+	client_adjust_max_dirty(cli);
 	spin_unlock(&cli->cl_loi_list_lock);
 
 	return count;
diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index deaf912..c6e37c0 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -1387,7 +1387,7 @@ static int osc_completion(const struct lu_env *env, struct osc_async_page *oap,
 	       "dropped: %ld avail: %ld, reserved: %ld, flight: %d }"	      \
 	       "lru {in list: %d, left: %d, waiters: %d }" fmt,		      \
 	       __tmp->cl_import->imp_obd->obd_name,			      \
-	       __tmp->cl_dirty, __tmp->cl_dirty_max,			      \
+	       __tmp->cl_dirty_pages, __tmp->cl_dirty_max_pages,	      \
 	       atomic_read(&obd_dirty_pages), obd_max_dirty_pages,	      \
 	       __tmp->cl_lost_grant, __tmp->cl_avail_grant,		      \
 	       __tmp->cl_reserved_grant, __tmp->cl_w_in_flight,		      \
@@ -1403,7 +1403,7 @@ static void osc_consume_write_grant(struct client_obd *cli,
 	assert_spin_locked(&cli->cl_loi_list_lock);
 	LASSERT(!(pga->flag & OBD_BRW_FROM_GRANT));
 	atomic_inc(&obd_dirty_pages);
-	cli->cl_dirty += PAGE_SIZE;
+	cli->cl_dirty_pages++;
 	pga->flag |= OBD_BRW_FROM_GRANT;
 	CDEBUG(D_CACHE, "using %lu grant credits for brw %p page %p\n",
 	       PAGE_SIZE, pga, pga->pg);
@@ -1423,11 +1423,11 @@ static void osc_release_write_grant(struct client_obd *cli,
 
 	pga->flag &= ~OBD_BRW_FROM_GRANT;
 	atomic_dec(&obd_dirty_pages);
-	cli->cl_dirty -= PAGE_SIZE;
+	cli->cl_dirty_pages--;
 	if (pga->flag & OBD_BRW_NOCACHE) {
 		pga->flag &= ~OBD_BRW_NOCACHE;
 		atomic_dec(&obd_dirty_transit_pages);
-		cli->cl_dirty_transit -= PAGE_SIZE;
+		cli->cl_dirty_transit--;
 	}
 }
 
@@ -1496,7 +1496,7 @@ static void osc_free_grant(struct client_obd *cli, unsigned int nr_pages,
 
 	spin_lock(&cli->cl_loi_list_lock);
 	atomic_sub(nr_pages, &obd_dirty_pages);
-	cli->cl_dirty -= nr_pages << PAGE_SHIFT;
+	cli->cl_dirty_pages -= nr_pages;
 	cli->cl_lost_grant += lost_grant;
 	if (cli->cl_avail_grant < grant && cli->cl_lost_grant >= grant) {
 		/* borrow some grant from truncate to avoid the case that
@@ -1509,7 +1509,7 @@ static void osc_free_grant(struct client_obd *cli, unsigned int nr_pages,
 	spin_unlock(&cli->cl_loi_list_lock);
 	CDEBUG(D_CACHE, "lost %u grant: %lu avail: %lu dirty: %lu\n",
 	       lost_grant, cli->cl_lost_grant,
-	       cli->cl_avail_grant, cli->cl_dirty);
+	       cli->cl_avail_grant, cli->cl_dirty_pages << PAGE_SHIFT);
 }
 
 /**
@@ -1539,11 +1539,11 @@ static int osc_enter_cache_try(struct client_obd *cli,
 	if (rc < 0)
 		return 0;
 
-	if (cli->cl_dirty + PAGE_SIZE <= cli->cl_dirty_max &&
+	if (cli->cl_dirty_pages <= cli->cl_dirty_max_pages &&
 	    atomic_read(&obd_dirty_pages) + 1 <= obd_max_dirty_pages) {
 		osc_consume_write_grant(cli, &oap->oap_brw_page);
 		if (transient) {
-			cli->cl_dirty_transit += PAGE_SIZE;
+			cli->cl_dirty_transit++;
 			atomic_inc(&obd_dirty_transit_pages);
 			oap->oap_brw_flags |= OBD_BRW_NOCACHE;
 		}
@@ -1590,8 +1590,8 @@ static int osc_enter_cache(const struct lu_env *env, struct client_obd *cli,
 	 * of queued writes and create a discontiguous rpc stream
 	 */
 	if (OBD_FAIL_CHECK(OBD_FAIL_OSC_NO_GRANT) ||
-	    cli->cl_dirty_max < PAGE_SIZE     ||
-	    cli->cl_ar.ar_force_sync || loi->loi_ar.ar_force_sync) {
+	    !cli->cl_dirty_max_pages || cli->cl_ar.ar_force_sync ||
+	    loi->loi_ar.ar_force_sync) {
 		rc = -EDQUOT;
 		goto out;
 	}
@@ -1612,7 +1612,7 @@ static int osc_enter_cache(const struct lu_env *env, struct client_obd *cli,
 	init_waitqueue_head(&ocw.ocw_waitq);
 	ocw.ocw_oap   = oap;
 	ocw.ocw_grant = bytes;
-	while (cli->cl_dirty > 0 || cli->cl_w_in_flight > 0) {
+	while (cli->cl_dirty_pages > 0 || cli->cl_w_in_flight > 0) {
 		list_add_tail(&ocw.ocw_entry, &cli->cl_cache_waiters);
 		ocw.ocw_rc = 0;
 		spin_unlock(&cli->cl_loi_list_lock);
@@ -1667,11 +1667,11 @@ void osc_wake_cache_waiters(struct client_obd *cli)
 
 		ocw->ocw_rc = -EDQUOT;
 		/* we can't dirty more */
-		if ((cli->cl_dirty + PAGE_SIZE > cli->cl_dirty_max) ||
+		if ((cli->cl_dirty_pages > cli->cl_dirty_max_pages) ||
 		    (atomic_read(&obd_dirty_pages) + 1 > obd_max_dirty_pages)) {
 			CDEBUG(D_CACHE, "no dirty room: dirty: %ld osc max %ld, sys max %d\n",
-			       cli->cl_dirty,
-			       cli->cl_dirty_max, obd_max_dirty_pages);
+			       cli->cl_dirty_pages, cli->cl_dirty_max_pages,
+			       obd_max_dirty_pages);
 			goto wakeup;
 		}
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 90c8416..c618337 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -801,11 +801,12 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa,
 
 	oa->o_valid |= bits;
 	spin_lock(&cli->cl_loi_list_lock);
-	oa->o_dirty = cli->cl_dirty;
-	if (unlikely(cli->cl_dirty - cli->cl_dirty_transit >
-		     cli->cl_dirty_max)) {
+	oa->o_dirty = cli->cl_dirty_pages << PAGE_SHIFT;
+	if (unlikely(cli->cl_dirty_pages - cli->cl_dirty_transit >
+		     cli->cl_dirty_max_pages)) {
 		CERROR("dirty %lu - %lu > dirty_max %lu\n",
-		       cli->cl_dirty, cli->cl_dirty_transit, cli->cl_dirty_max);
+		       cli->cl_dirty_pages, cli->cl_dirty_transit,
+		       cli->cl_dirty_max_pages);
 		oa->o_undirty = 0;
 	} else if (unlikely(atomic_read(&obd_dirty_pages) -
 			    atomic_read(&obd_dirty_transit_pages) >
@@ -820,15 +821,17 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa,
 		       atomic_read(&obd_dirty_transit_pages),
 		       obd_max_dirty_pages);
 		oa->o_undirty = 0;
-	} else if (unlikely(cli->cl_dirty_max - cli->cl_dirty > 0x7fffffff)) {
+	} else if (unlikely(cli->cl_dirty_max_pages - cli->cl_dirty_pages >
+		   0x7fffffff)) {
 		CERROR("dirty %lu - dirty_max %lu too big???\n",
-		       cli->cl_dirty, cli->cl_dirty_max);
+		       cli->cl_dirty_pages, cli->cl_dirty_max_pages);
 		oa->o_undirty = 0;
 	} else {
 		long max_in_flight = (cli->cl_max_pages_per_rpc <<
 				      PAGE_SHIFT)*
 				     (cli->cl_max_rpcs_in_flight + 1);
-		oa->o_undirty = max(cli->cl_dirty_max, max_in_flight);
+		oa->o_undirty = max(cli->cl_dirty_max_pages << PAGE_SHIFT,
+				    max_in_flight);
 	}
 	oa->o_grant = cli->cl_avail_grant + cli->cl_reserved_grant;
 	oa->o_dropped = cli->cl_lost_grant;
@@ -1028,22 +1031,24 @@ static void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd)
 {
 	/*
 	 * ocd_grant is the total grant amount we're expect to hold: if we've
-	 * been evicted, it's the new avail_grant amount, cl_dirty will drop
-	 * to 0 as inflight RPCs fail out; otherwise, it's avail_grant + dirty.
+	 * been evicted, it's the new avail_grant amount, cl_dirty_pages will
+	 * drop to 0 as inflight RPCs fail out; otherwise, it's avail_grant +
+	 * dirty.
 	 *
 	 * race is tolerable here: if we're evicted, but imp_state already
-	 * left EVICTED state, then cl_dirty must be 0 already.
+	 * left EVICTED state, then cl_dirty_pages must be 0 already.
 	 */
 	spin_lock(&cli->cl_loi_list_lock);
 	if (cli->cl_import->imp_state == LUSTRE_IMP_EVICTED)
 		cli->cl_avail_grant = ocd->ocd_grant;
 	else
-		cli->cl_avail_grant = ocd->ocd_grant - cli->cl_dirty;
+		cli->cl_avail_grant = ocd->ocd_grant -
+				      (cli->cl_dirty_pages << PAGE_SHIFT);
 
 	if (cli->cl_avail_grant < 0) {
 		CWARN("%s: available grant < 0: avail/ocd/dirty %ld/%u/%ld\n",
 		      cli->cl_import->imp_obd->obd_name, cli->cl_avail_grant,
-		      ocd->ocd_grant, cli->cl_dirty);
+		      ocd->ocd_grant, cli->cl_dirty_pages << PAGE_SHIFT);
 		/* workaround for servers which do not have the patch from
 		 * LU-2679
 		 */
@@ -3014,8 +3019,9 @@ static int osc_reconnect(const struct lu_env *env,
 		long lost_grant;
 
 		spin_lock(&cli->cl_loi_list_lock);
-		data->ocd_grant = (cli->cl_avail_grant + cli->cl_dirty) ?:
-				2 * cli_brw_size(obd);
+		data->ocd_grant = (cli->cl_avail_grant +
+				   (cli->cl_dirty_pages << PAGE_SHIFT)) ?:
+				   2 * cli_brw_size(obd);
 		lost_grant = cli->cl_lost_grant;
 		cli->cl_lost_grant = 0;
 		spin_unlock(&cli->cl_loi_list_lock);
diff --git a/drivers/staging/lustre/lustre/ptlrpc/import.c b/drivers/staging/lustre/lustre/ptlrpc/import.c
index af8ffbc..c0122ef 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/import.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/import.c
@@ -1132,6 +1132,7 @@ finish:
 
 		LASSERT((cli->cl_max_pages_per_rpc <= PTLRPC_MAX_BRW_PAGES) &&
 			(cli->cl_max_pages_per_rpc > 0));
+		client_adjust_max_dirty(cli);
 	}
 
 out:
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [lustre-devel] [PATCH 69/80] staging: lustre: osc: Automatically increase the max_dirty_mb
@ 2016-08-16 20:19   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Hongchao Zhang, Li Xi, James Simmons

From: Hongchao Zhang <hongchao.zhang@intel.com>

When the RPC size or the maximum number of RPCs in flight is
increased, the actual limiting factor may become max_dirty_mb. This
patch automatically increases the max_dirty_mb value at connection
time and whenever the related values are tuned manually through the
proc file system.

This patch also changes the unit of "cl_dirty" and "cl_dirty_max"
in client_obd from bytes to pages, renaming them to cl_dirty_pages
and cl_dirty_max_pages.

Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Hongchao Zhang <hongchao.zhang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4933
Reviewed-on: http://review.whamcloud.com/10446
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/obd.h     |   28 +++++++++++++++++-
 drivers/staging/lustre/lustre/ldlm/ldlm_lib.c   |   12 +++++---
 drivers/staging/lustre/lustre/osc/lproc_osc.c   |   10 ++++---
 drivers/staging/lustre/lustre/osc/osc_cache.c   |   28 +++++++++---------
 drivers/staging/lustre/lustre/osc/osc_request.c |   34 +++++++++++++---------
 drivers/staging/lustre/lustre/ptlrpc/import.c   |    1 +
 6 files changed, 74 insertions(+), 39 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index e7e03be..e91f65a 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -222,8 +222,8 @@ struct client_obd {
 	struct sptlrpc_flavor    cl_flvr_mgc;   /* fixed flavor of mgc->mgs */
 
 	/* the grant values are protected by loi_list_lock below */
-	long		     cl_dirty;	 /* all _dirty_ in bytes */
-	long		     cl_dirty_max;     /* allowed w/o rpc */
+	long		     cl_dirty_pages;	/* all _dirty_ in pages */
+	long		     cl_dirty_max_pages;/* allowed w/o rpc */
 	long		     cl_dirty_transit; /* dirty synchronous */
 	long		     cl_avail_grant;   /* bytes of credit for ost */
 	long		     cl_lost_grant;    /* lost credits (trunc) */
@@ -1225,4 +1225,28 @@ static inline int cli_brw_size(struct obd_device *obd)
 	return obd->u.cli.cl_max_pages_per_rpc << PAGE_SHIFT;
 }
 
+/*
+ * when RPC size or the max RPCs in flight is increased, the max dirty pages
+ * of the client should be increased accordingly to avoid sending fragmented
+ * RPCs over the network when the client runs out of the maximum dirty space
+ * when so many RPCs are being generated.
+ */
+static inline void client_adjust_max_dirty(struct client_obd *cli)
+{
+	/* initializing */
+	if (cli->cl_dirty_max_pages <= 0)
+		cli->cl_dirty_max_pages =
+			(OSC_MAX_DIRTY_DEFAULT * 1024 * 1024) >> PAGE_SHIFT;
+	else {
+		long dirty_max = cli->cl_max_rpcs_in_flight *
+				 cli->cl_max_pages_per_rpc;
+
+		if (dirty_max > cli->cl_dirty_max_pages)
+			cli->cl_dirty_max_pages = dirty_max;
+	}
+
+	if (cli->cl_dirty_max_pages > totalram_pages / 8)
+		cli->cl_dirty_max_pages = totalram_pages / 8;
+}
+
 #endif /* __OBD_H */
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
index ee40006..3c98ce2 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
@@ -299,12 +299,14 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg)
 	       min_t(unsigned int, LUSTRE_CFG_BUFLEN(lcfg, 2),
 		     sizeof(server_uuid)));
 
-	cli->cl_dirty = 0;
+	cli->cl_dirty_pages = 0;
 	cli->cl_avail_grant = 0;
-	/* FIXME: Should limit this for the sum of all cl_dirty_max. */
-	cli->cl_dirty_max = OSC_MAX_DIRTY_DEFAULT * 1024 * 1024;
-	if (cli->cl_dirty_max >> PAGE_SHIFT > totalram_pages / 8)
-		cli->cl_dirty_max = totalram_pages << (PAGE_SHIFT - 3);
+	/* FIXME: Should limit this for the sum of all cl_dirty_max_pages. */
+	/*
+	 * cl_dirty_max_pages may be changed at connect time in
+	 * ptlrpc_connect_interpret().
+	 */
+	client_adjust_max_dirty(cli);
 	INIT_LIST_HEAD(&cli->cl_cache_waiters);
 	INIT_LIST_HEAD(&cli->cl_loi_ready_list);
 	INIT_LIST_HEAD(&cli->cl_loi_hp_ready_list);
diff --git a/drivers/staging/lustre/lustre/osc/lproc_osc.c b/drivers/staging/lustre/lustre/osc/lproc_osc.c
index 7e83d39..9172b78 100644
--- a/drivers/staging/lustre/lustre/osc/lproc_osc.c
+++ b/drivers/staging/lustre/lustre/osc/lproc_osc.c
@@ -119,6 +119,7 @@ static ssize_t max_rpcs_in_flight_store(struct kobject *kobj,
 
 	spin_lock(&cli->cl_loi_list_lock);
 	cli->cl_max_rpcs_in_flight = val;
+	client_adjust_max_dirty(cli);
 	spin_unlock(&cli->cl_loi_list_lock);
 
 	return count;
@@ -136,10 +137,10 @@ static ssize_t max_dirty_mb_show(struct kobject *kobj,
 	int mult;
 
 	spin_lock(&cli->cl_loi_list_lock);
-	val = cli->cl_dirty_max;
+	val = cli->cl_dirty_max_pages;
 	spin_unlock(&cli->cl_loi_list_lock);
 
-	mult = 1 << 20;
+	mult = 1 << (20 - PAGE_SHIFT);
 	return lprocfs_read_frac_helper(buf, PAGE_SIZE, val, mult);
 }
 
@@ -166,7 +167,7 @@ static ssize_t max_dirty_mb_store(struct kobject *kobj,
 		return -ERANGE;
 
 	spin_lock(&cli->cl_loi_list_lock);
-	cli->cl_dirty_max = (u32)(pages_number << PAGE_SHIFT);
+	cli->cl_dirty_max_pages = pages_number;
 	osc_wake_cache_waiters(cli);
 	spin_unlock(&cli->cl_loi_list_lock);
 
@@ -244,7 +245,7 @@ static ssize_t cur_dirty_bytes_show(struct kobject *kobj,
 	int len;
 
 	spin_lock(&cli->cl_loi_list_lock);
-	len = sprintf(buf, "%lu\n", cli->cl_dirty);
+	len = sprintf(buf, "%lu\n", cli->cl_dirty_pages << PAGE_SHIFT);
 	spin_unlock(&cli->cl_loi_list_lock);
 
 	return len;
@@ -583,6 +584,7 @@ static ssize_t max_pages_per_rpc_store(struct kobject *kobj,
 	}
 	spin_lock(&cli->cl_loi_list_lock);
 	cli->cl_max_pages_per_rpc = val;
+	client_adjust_max_dirty(cli);
 	spin_unlock(&cli->cl_loi_list_lock);
 
 	return count;
diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index deaf912..c6e37c0 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -1387,7 +1387,7 @@ static int osc_completion(const struct lu_env *env, struct osc_async_page *oap,
 	       "dropped: %ld avail: %ld, reserved: %ld, flight: %d }"	      \
 	       "lru {in list: %d, left: %d, waiters: %d }" fmt,		      \
 	       __tmp->cl_import->imp_obd->obd_name,			      \
-	       __tmp->cl_dirty, __tmp->cl_dirty_max,			      \
+	       __tmp->cl_dirty_pages, __tmp->cl_dirty_max_pages,	      \
 	       atomic_read(&obd_dirty_pages), obd_max_dirty_pages,	      \
 	       __tmp->cl_lost_grant, __tmp->cl_avail_grant,		      \
 	       __tmp->cl_reserved_grant, __tmp->cl_w_in_flight,		      \
@@ -1403,7 +1403,7 @@ static void osc_consume_write_grant(struct client_obd *cli,
 	assert_spin_locked(&cli->cl_loi_list_lock);
 	LASSERT(!(pga->flag & OBD_BRW_FROM_GRANT));
 	atomic_inc(&obd_dirty_pages);
-	cli->cl_dirty += PAGE_SIZE;
+	cli->cl_dirty_pages++;
 	pga->flag |= OBD_BRW_FROM_GRANT;
 	CDEBUG(D_CACHE, "using %lu grant credits for brw %p page %p\n",
 	       PAGE_SIZE, pga, pga->pg);
@@ -1423,11 +1423,11 @@ static void osc_release_write_grant(struct client_obd *cli,
 
 	pga->flag &= ~OBD_BRW_FROM_GRANT;
 	atomic_dec(&obd_dirty_pages);
-	cli->cl_dirty -= PAGE_SIZE;
+	cli->cl_dirty_pages--;
 	if (pga->flag & OBD_BRW_NOCACHE) {
 		pga->flag &= ~OBD_BRW_NOCACHE;
 		atomic_dec(&obd_dirty_transit_pages);
-		cli->cl_dirty_transit -= PAGE_SIZE;
+		cli->cl_dirty_transit--;
 	}
 }
 
@@ -1496,7 +1496,7 @@ static void osc_free_grant(struct client_obd *cli, unsigned int nr_pages,
 
 	spin_lock(&cli->cl_loi_list_lock);
 	atomic_sub(nr_pages, &obd_dirty_pages);
-	cli->cl_dirty -= nr_pages << PAGE_SHIFT;
+	cli->cl_dirty_pages -= nr_pages;
 	cli->cl_lost_grant += lost_grant;
 	if (cli->cl_avail_grant < grant && cli->cl_lost_grant >= grant) {
 		/* borrow some grant from truncate to avoid the case that
@@ -1509,7 +1509,7 @@ static void osc_free_grant(struct client_obd *cli, unsigned int nr_pages,
 	spin_unlock(&cli->cl_loi_list_lock);
 	CDEBUG(D_CACHE, "lost %u grant: %lu avail: %lu dirty: %lu\n",
 	       lost_grant, cli->cl_lost_grant,
-	       cli->cl_avail_grant, cli->cl_dirty);
+	       cli->cl_avail_grant, cli->cl_dirty_pages << PAGE_SHIFT);
 }
 
 /**
@@ -1539,11 +1539,11 @@ static int osc_enter_cache_try(struct client_obd *cli,
 	if (rc < 0)
 		return 0;
 
-	if (cli->cl_dirty + PAGE_SIZE <= cli->cl_dirty_max &&
+	if (cli->cl_dirty_pages <= cli->cl_dirty_max_pages &&
 	    atomic_read(&obd_dirty_pages) + 1 <= obd_max_dirty_pages) {
 		osc_consume_write_grant(cli, &oap->oap_brw_page);
 		if (transient) {
-			cli->cl_dirty_transit += PAGE_SIZE;
+			cli->cl_dirty_transit++;
 			atomic_inc(&obd_dirty_transit_pages);
 			oap->oap_brw_flags |= OBD_BRW_NOCACHE;
 		}
@@ -1590,8 +1590,8 @@ static int osc_enter_cache(const struct lu_env *env, struct client_obd *cli,
 	 * of queued writes and create a discontiguous rpc stream
 	 */
 	if (OBD_FAIL_CHECK(OBD_FAIL_OSC_NO_GRANT) ||
-	    cli->cl_dirty_max < PAGE_SIZE     ||
-	    cli->cl_ar.ar_force_sync || loi->loi_ar.ar_force_sync) {
+	    !cli->cl_dirty_max_pages || cli->cl_ar.ar_force_sync ||
+	    loi->loi_ar.ar_force_sync) {
 		rc = -EDQUOT;
 		goto out;
 	}
@@ -1612,7 +1612,7 @@ static int osc_enter_cache(const struct lu_env *env, struct client_obd *cli,
 	init_waitqueue_head(&ocw.ocw_waitq);
 	ocw.ocw_oap   = oap;
 	ocw.ocw_grant = bytes;
-	while (cli->cl_dirty > 0 || cli->cl_w_in_flight > 0) {
+	while (cli->cl_dirty_pages > 0 || cli->cl_w_in_flight > 0) {
 		list_add_tail(&ocw.ocw_entry, &cli->cl_cache_waiters);
 		ocw.ocw_rc = 0;
 		spin_unlock(&cli->cl_loi_list_lock);
@@ -1667,11 +1667,11 @@ void osc_wake_cache_waiters(struct client_obd *cli)
 
 		ocw->ocw_rc = -EDQUOT;
 		/* we can't dirty more */
-		if ((cli->cl_dirty + PAGE_SIZE > cli->cl_dirty_max) ||
+		if ((cli->cl_dirty_pages > cli->cl_dirty_max_pages) ||
 		    (atomic_read(&obd_dirty_pages) + 1 > obd_max_dirty_pages)) {
 			CDEBUG(D_CACHE, "no dirty room: dirty: %ld osc max %ld, sys max %d\n",
-			       cli->cl_dirty,
-			       cli->cl_dirty_max, obd_max_dirty_pages);
+			       cli->cl_dirty_pages, cli->cl_dirty_max_pages,
+			       obd_max_dirty_pages);
 			goto wakeup;
 		}
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 90c8416..c618337 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -801,11 +801,12 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa,
 
 	oa->o_valid |= bits;
 	spin_lock(&cli->cl_loi_list_lock);
-	oa->o_dirty = cli->cl_dirty;
-	if (unlikely(cli->cl_dirty - cli->cl_dirty_transit >
-		     cli->cl_dirty_max)) {
+	oa->o_dirty = cli->cl_dirty_pages << PAGE_SHIFT;
+	if (unlikely(cli->cl_dirty_pages - cli->cl_dirty_transit >
+		     cli->cl_dirty_max_pages)) {
 		CERROR("dirty %lu - %lu > dirty_max %lu\n",
-		       cli->cl_dirty, cli->cl_dirty_transit, cli->cl_dirty_max);
+		       cli->cl_dirty_pages, cli->cl_dirty_transit,
+		       cli->cl_dirty_max_pages);
 		oa->o_undirty = 0;
 	} else if (unlikely(atomic_read(&obd_dirty_pages) -
 			    atomic_read(&obd_dirty_transit_pages) >
@@ -820,15 +821,17 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa,
 		       atomic_read(&obd_dirty_transit_pages),
 		       obd_max_dirty_pages);
 		oa->o_undirty = 0;
-	} else if (unlikely(cli->cl_dirty_max - cli->cl_dirty > 0x7fffffff)) {
+	} else if (unlikely(cli->cl_dirty_max_pages - cli->cl_dirty_pages >
+		   0x7fffffff)) {
 		CERROR("dirty %lu - dirty_max %lu too big???\n",
-		       cli->cl_dirty, cli->cl_dirty_max);
+		       cli->cl_dirty_pages, cli->cl_dirty_max_pages);
 		oa->o_undirty = 0;
 	} else {
 		long max_in_flight = (cli->cl_max_pages_per_rpc <<
 				      PAGE_SHIFT)*
 				     (cli->cl_max_rpcs_in_flight + 1);
-		oa->o_undirty = max(cli->cl_dirty_max, max_in_flight);
+		oa->o_undirty = max(cli->cl_dirty_max_pages << PAGE_SHIFT,
+				    max_in_flight);
 	}
 	oa->o_grant = cli->cl_avail_grant + cli->cl_reserved_grant;
 	oa->o_dropped = cli->cl_lost_grant;
@@ -1028,22 +1031,24 @@ static void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd)
 {
 	/*
 	 * ocd_grant is the total grant amount we're expect to hold: if we've
-	 * been evicted, it's the new avail_grant amount, cl_dirty will drop
-	 * to 0 as inflight RPCs fail out; otherwise, it's avail_grant + dirty.
+	 * been evicted, it's the new avail_grant amount, cl_dirty_pages will
+	 * drop to 0 as inflight RPCs fail out; otherwise, it's avail_grant +
+	 * dirty.
 	 *
 	 * race is tolerable here: if we're evicted, but imp_state already
-	 * left EVICTED state, then cl_dirty must be 0 already.
+	 * left EVICTED state, then cl_dirty_pages must be 0 already.
 	 */
 	spin_lock(&cli->cl_loi_list_lock);
 	if (cli->cl_import->imp_state == LUSTRE_IMP_EVICTED)
 		cli->cl_avail_grant = ocd->ocd_grant;
 	else
-		cli->cl_avail_grant = ocd->ocd_grant - cli->cl_dirty;
+		cli->cl_avail_grant = ocd->ocd_grant -
+				      (cli->cl_dirty_pages << PAGE_SHIFT);
 
 	if (cli->cl_avail_grant < 0) {
 		CWARN("%s: available grant < 0: avail/ocd/dirty %ld/%u/%ld\n",
 		      cli->cl_import->imp_obd->obd_name, cli->cl_avail_grant,
-		      ocd->ocd_grant, cli->cl_dirty);
+		      ocd->ocd_grant, cli->cl_dirty_pages << PAGE_SHIFT);
 		/* workaround for servers which do not have the patch from
 		 * LU-2679
 		 */
@@ -3014,8 +3019,9 @@ static int osc_reconnect(const struct lu_env *env,
 		long lost_grant;
 
 		spin_lock(&cli->cl_loi_list_lock);
-		data->ocd_grant = (cli->cl_avail_grant + cli->cl_dirty) ?:
-				2 * cli_brw_size(obd);
+		data->ocd_grant = (cli->cl_avail_grant +
+				   (cli->cl_dirty_pages << PAGE_SHIFT)) ?:
+				   2 * cli_brw_size(obd);
 		lost_grant = cli->cl_lost_grant;
 		cli->cl_lost_grant = 0;
 		spin_unlock(&cli->cl_loi_list_lock);
diff --git a/drivers/staging/lustre/lustre/ptlrpc/import.c b/drivers/staging/lustre/lustre/ptlrpc/import.c
index af8ffbc..c0122ef 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/import.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/import.c
@@ -1132,6 +1132,7 @@ finish:
 
 		LASSERT((cli->cl_max_pages_per_rpc <= PTLRPC_MAX_BRW_PAGES) &&
 			(cli->cl_max_pages_per_rpc > 0));
+		client_adjust_max_dirty(cli);
 	}
 
 out:
-- 
1.7.1

* [PATCH 70/80] staging: lustre: include: fix one off errors in lustre_id.h
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, James Simmons

During inspection of another patch Dan Carpenter noticed some
off-by-one errors in lustre_idl.h. Fix the condition test for
OBIF_MAX_OID.

Signed-off-by: James Simmons <jsimmons@infradead.org>
---
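Note for readers: the sketch below illustrates the boundary case this
change is about. It assumes the limit is an exclusive bound of 1 << 32,
so that valid ids run from 0 to 2^32 - 1; the patch itself does not show
the definition of OBIF_MAX_OID. SKETCH_MAX_OID and sketch_set_oid() are
hypothetical names, not part of lustre_idl.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed for illustration: an exclusive limit, valid ids are 0 .. 2^32 - 1. */
#define SKETCH_OID_BITS 32
#define SKETCH_MAX_OID  (1ULL << SKETCH_OID_BITS)

/*
 * Hypothetical helper: store a 64-bit id into a 32-bit field.
 * With "oid > SKETCH_MAX_OID" the value SKETCH_MAX_OID itself would
 * pass the check and be truncated to 0 by the cast below.
 */
static int sketch_set_oid(uint32_t *slot, uint64_t oid)
{
	assert(slot);
	if (oid >= SKETCH_MAX_OID)
		return -EBADF;
	*slot = (uint32_t)oid;
	return 0;
}
```

Under that assumption, SKETCH_MAX_OID - 1 is the largest id the helper
accepts, and SKETCH_MAX_OID is rejected without touching the slot.
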
 .../lustre/lustre/include/lustre/lustre_idl.h      |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 17581ba..9545451 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -659,7 +659,7 @@ static inline void ostid_set_id(struct ost_id *oi, __u64 oid)
 		oi->oi_fid.f_oid = oid;
 		oi->oi_fid.f_ver = oid >> 48;
 	} else {
-		if (oid > OBIF_MAX_OID) {
+		if (oid >= OBIF_MAX_OID) {
 			CERROR("Bad %llu to set " DOSTID "\n", oid, POSTID(oi));
 			return;
 		}
@@ -684,7 +684,7 @@ static inline int fid_set_id(struct lu_fid *fid, __u64 oid)
 		fid->f_oid = oid;
 		fid->f_ver = oid >> 48;
 	} else {
-		if (oid > OBIF_MAX_OID) {
+		if (oid >= OBIF_MAX_OID) {
 			CERROR("Too large OID %#llx to set REG "DFID"\n",
 			       (unsigned long long)oid, PFID(fid));
 			return -EBADF;
-- 
1.7.1

* [PATCH 71/80] staging: lustre: llite: remove assert for acl refcount
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, James Simmons

The purpose of this assert was to ensure that lustre
was properly managing its posix_acl access. The test
is invalid because the VFS layer also takes references
on the posix_acl. In reality there is no simple way to
detect this class of mistakes.

Signed-off-by: James Simmons <jsimmons@infradead.org>
---
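Note for readers: a minimal sketch of why a "refcount == 1" check at
teardown cannot hold once a second party may also hold a reference.
struct sketch_acl and its helpers are hypothetical and use a plain int
counter for brevity; the kernel's posix_acl uses an atomic counter and
its own get/release helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reference-counted object, standing in for a posix_acl. */
struct sketch_acl {
	int  refcount;
	bool freed;
};

/* Take one more reference (e.g. a cache in another layer). */
static void sketch_acl_get(struct sketch_acl *acl)
{
	assert(acl->refcount > 0);
	acl->refcount++;
}

/* Drop one reference; returns true if this was the last one. */
static bool sketch_acl_release(struct sketch_acl *acl)
{
	assert(acl->refcount > 0);
	if (--acl->refcount == 0) {
		acl->freed = true;
		return true;
	}
	return false;
}
```

If the filesystem holds one reference and another layer holds a second,
the count is 2 when the filesystem drops its own. Releasing is still
correct, but an assertion that the count equals 1 would fire.
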
 drivers/staging/lustre/lustre/llite/llite_lib.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index da00fbd..64c8a2b 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1247,7 +1247,6 @@ void ll_clear_inode(struct inode *inode)
 
 #ifdef CONFIG_FS_POSIX_ACL
 	if (lli->lli_posix_acl) {
-		LASSERT(atomic_read(&lli->lli_posix_acl->a_refcount) == 1);
 		posix_acl_release(lli->lli_posix_acl);
 		lli->lli_posix_acl = NULL;
 	}
-- 
1.7.1

* [PATCH 72/80] staging: lustre: obd: validate open handle cookies
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, James Simmons

From: John L. Hammond <john.hammond@intel.com>

Add a const void *h_owner member to struct portals_handle. Add a const
void *owner parameter to class_handle2object() which must be matched
by the h_owner member of the handle in addition to the cookie.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3233
Reviewed-on: http://review.whamcloud.com/6938
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
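Note for readers: the lookup rule described above (a handle matches only
if both the cookie and the owner agree) can be sketched with a flat
array. struct sketch_handle and sketch_handle2object() are hypothetical;
the real class_handle2object() walks hash buckets under RCU and takes a
per-handle lock, all of which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified handle: cookie plus an opaque owner tag. */
struct sketch_handle {
	uint64_t    cookie;
	const void *owner;
	void       *object;
};

/*
 * Return the object whose handle matches BOTH cookie and owner,
 * or NULL if there is no such handle.
 */
static void *sketch_handle2object(const struct sketch_handle *tbl, size_t n,
				  uint64_t cookie, const void *owner)
{
	size_t i;

	assert(tbl || n == 0);
	for (i = 0; i < n; i++) {
		if (tbl[i].cookie != cookie || tbl[i].owner != owner)
			continue;
		return tbl[i].object;
	}
	return NULL;
}
```

A caller presenting the right cookie with the wrong owner gets NULL.
Callers that pass a NULL owner, as the two converted call sites in this
patch do, match only handles whose owner is also NULL.
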
 .../staging/lustre/lustre/include/lustre_handles.h |    3 ++-
 drivers/staging/lustre/lustre/ldlm/ldlm_lock.c     |    2 +-
 drivers/staging/lustre/lustre/obdclass/genops.c    |    2 +-
 .../lustre/lustre/obdclass/lustre_handles.c        |    4 ++--
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_handles.h b/drivers/staging/lustre/lustre/include/lustre_handles.h
index 1a63a6b..bc1dd46 100644
--- a/drivers/staging/lustre/lustre/include/lustre_handles.h
+++ b/drivers/staging/lustre/lustre/include/lustre_handles.h
@@ -66,6 +66,7 @@ struct portals_handle_ops {
 struct portals_handle {
 	struct list_head			h_link;
 	__u64				h_cookie;
+	const void			*h_owner;
 	struct portals_handle_ops	*h_ops;
 
 	/* newly added fields to handle the RCU issue. -jxiong */
@@ -83,7 +84,7 @@ struct portals_handle {
 void class_handle_hash(struct portals_handle *,
 		       struct portals_handle_ops *ops);
 void class_handle_unhash(struct portals_handle *);
-void *class_handle2object(__u64 cookie);
+void *class_handle2object(__u64 cookie, const void *owner);
 void class_handle_free_cb(struct rcu_head *rcu);
 int class_handle_init(void);
 void class_handle_cleanup(void);
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
index a91cdb4..7a34caf 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
@@ -542,7 +542,7 @@ struct ldlm_lock *__ldlm_handle2lock(const struct lustre_handle *handle,
 
 	LASSERT(handle);
 
-	lock = class_handle2object(handle->cookie);
+	lock = class_handle2object(handle->cookie, NULL);
 	if (!lock)
 		return NULL;
 
diff --git a/drivers/staging/lustre/lustre/obdclass/genops.c b/drivers/staging/lustre/lustre/obdclass/genops.c
index be25434..a739eb1 100644
--- a/drivers/staging/lustre/lustre/obdclass/genops.c
+++ b/drivers/staging/lustre/lustre/obdclass/genops.c
@@ -618,7 +618,7 @@ struct obd_export *class_conn2export(struct lustre_handle *conn)
 	}
 
 	CDEBUG(D_INFO, "looking for export cookie %#llx\n", conn->cookie);
-	export = class_handle2object(conn->cookie);
+	export = class_handle2object(conn->cookie, NULL);
 	return export;
 }
 EXPORT_SYMBOL(class_conn2export);
diff --git a/drivers/staging/lustre/lustre/obdclass/lustre_handles.c b/drivers/staging/lustre/lustre/obdclass/lustre_handles.c
index 082f530..7ca68ae 100644
--- a/drivers/staging/lustre/lustre/obdclass/lustre_handles.c
+++ b/drivers/staging/lustre/lustre/obdclass/lustre_handles.c
@@ -130,7 +130,7 @@ void class_handle_unhash(struct portals_handle *h)
 }
 EXPORT_SYMBOL(class_handle_unhash);
 
-void *class_handle2object(__u64 cookie)
+void *class_handle2object(__u64 cookie, const void *owner)
 {
 	struct handle_bucket *bucket;
 	struct portals_handle *h;
@@ -145,7 +145,7 @@ void *class_handle2object(__u64 cookie)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(h, &bucket->head, h_link) {
-		if (h->h_cookie != cookie)
+		if (h->h_cookie != cookie || h->h_owner != owner)
 			continue;
 
 		spin_lock(&h->h_lock);
-- 
1.7.1

* [PATCH 73/80] staging: lustre: lmv: build error with gcc 4.7.0 20110509
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Dmitry Eremin, James Simmons

From: Dmitry Eremin <dmitry.eremin@intel.com>

Fixed comparison between signed and unsigned indexes.

Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3775
Reviewed-on: http://review.whamcloud.com/7382
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
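Note for readers: a small sketch of the hazard behind mixed signed and
unsigned index comparisons. The commit message does not quote the
compiler diagnostic, so the exact warning is an assumption; the helper
names below are hypothetical. The explicit cast mirrors the conversion
C applies implicitly when a 32-bit int is compared with a 32-bit
unsigned count.

```c
#include <assert.h>
#include <stdint.h>

/*
 * What "i < count" does when i is a 32-bit int and count is unsigned:
 * i is converted to unsigned first, so -1 becomes 0xffffffff.
 */
static int sketch_signed_lt_unsigned(int i, uint32_t count)
{
	return (uint32_t)i < count;
}

/* Loop with an index of the same type as the count, as in the patch. */
static uint32_t sketch_count_active(const int *active, uint32_t tgt_count)
{
	uint32_t i, n = 0;

	assert(active || tgt_count == 0);
	for (i = 0; i < tgt_count; i++) {
		if (active[i])
			n++;
	}
	return n;
}
```

With matching types there is no conversion to reason about, which is
the effect of switching the loop indexes to u32 in the diff below.
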
 drivers/staging/lustre/lustre/lmv/lmv_obd.c |   34 ++++++++++++++------------
 1 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index b8275e1..3e41f49 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -130,12 +130,12 @@ static void lmv_activate_target(struct lmv_obd *lmv,
  *  -ENOTCONN: The UUID is found, but the target connection is bad (!)
  *  -EBADF   : The UUID is found, but the OBD of the wrong type (!)
  */
-static int lmv_set_mdc_active(struct lmv_obd *lmv, struct obd_uuid *uuid,
+static int lmv_set_mdc_active(struct lmv_obd *lmv, const struct obd_uuid *uuid,
 			      int activate)
 {
 	struct lmv_tgt_desc    *uninitialized_var(tgt);
 	struct obd_device      *obd;
-	int		     i;
+	u32		     i;
 	int		     rc = 0;
 
 	CDEBUG(D_INFO, "Searching in lmv %p for uuid %s (activate=%d)\n",
@@ -307,7 +307,7 @@ static int lmv_connect(const struct lu_env *env,
 static void lmv_set_timeouts(struct obd_device *obd)
 {
 	struct lmv_obd	*lmv;
-	int		    i;
+	u32 i;
 
 	lmv = &obd->u.lmv;
 	if (lmv->server_timeout == 0)
@@ -333,7 +333,7 @@ static int lmv_init_ea_size(struct obd_export *exp, int easize,
 {
 	struct obd_device   *obd = exp->exp_obd;
 	struct lmv_obd      *lmv = &obd->u.lmv;
-	int		  i;
+	u32 i;
 	int		  rc = 0;
 	int		  change = 0;
 
@@ -578,7 +578,7 @@ int lmv_check_connect(struct obd_device *obd)
 {
 	struct lmv_obd       *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc  *tgt;
-	int		   i;
+	u32 i;
 	int		   rc;
 	int		   easize;
 
@@ -693,7 +693,7 @@ static int lmv_disconnect(struct obd_export *exp)
 	struct obd_device     *obd = class_exp2obd(exp);
 	struct lmv_obd	*lmv = &obd->u.lmv;
 	int		    rc;
-	int		    i;
+	u32 i;
 
 	if (!lmv->tgts)
 		goto out_local;
@@ -822,7 +822,7 @@ static int lmv_hsm_req_count(struct lmv_obd *lmv,
 			     const struct hsm_user_request *hur,
 			     const struct lmv_tgt_desc *tgt_mds)
 {
-	int			i, nr = 0;
+	u32 i, nr = 0;
 	struct lmv_tgt_desc    *curr_tgt;
 
 	/* count how many requests must be sent to the given target */
@@ -963,10 +963,10 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp,
 	struct obd_device    *obddev = class_exp2obd(exp);
 	struct lmv_obd       *lmv = &obddev->u.lmv;
 	struct lmv_tgt_desc *tgt = NULL;
-	int		   i = 0;
+	u32 i = 0;
 	int		   rc = 0;
 	int		   set = 0;
-	int		   count = lmv->desc.ld_tgt_count;
+	u32 count = lmv->desc.ld_tgt_count;
 
 	if (count == 0)
 		return -ENOTTY;
@@ -1444,7 +1444,7 @@ static int lmv_statfs(const struct lu_env *env, struct obd_export *exp,
 	struct lmv_obd	*lmv = &obd->u.lmv;
 	struct obd_statfs     *temp;
 	int		    rc = 0;
-	int		    i;
+	u32 i;
 
 	rc = lmv_check_connect(obd);
 	if (rc)
@@ -1586,7 +1586,7 @@ static int lmv_null_inode(struct obd_export *exp, const struct lu_fid *fid)
 {
 	struct obd_device   *obd = exp->exp_obd;
 	struct lmv_obd      *lmv = &obd->u.lmv;
-	int		  i;
+	u32 i;
 	int		  rc;
 
 	rc = lmv_check_connect(obd);
@@ -1615,7 +1615,7 @@ static int lmv_find_cbdata(struct obd_export *exp, const struct lu_fid *fid,
 	struct obd_device   *obd = exp->exp_obd;
 	struct lmv_obd      *lmv = &obd->u.lmv;
 	int tgt;
-	int		  i;
+	u32 i;
 	int		  rc;
 
 	rc = lmv_check_connect(obd);
@@ -2923,7 +2923,7 @@ static int lmv_cancel_unused(struct obd_export *exp, const struct lu_fid *fid,
 	struct lmv_obd	  *lmv = &obd->u.lmv;
 	int		      rc = 0;
 	int		      err;
-	int		      i;
+	u32 i;
 
 	LASSERT(fid);
 
@@ -2966,7 +2966,7 @@ static enum ldlm_mode lmv_lock_match(struct obd_export *exp, __u64 flags,
 	struct lmv_obd	  *lmv = &obd->u.lmv;
 	enum ldlm_mode	      rc;
 	int tgt;
-	int		      i;
+	u32 i;
 
 	CDEBUG(D_INODE, "Lock match for "DFID"\n", PFID(fid));
 
@@ -3125,8 +3125,9 @@ static int lmv_quotactl(struct obd_device *unused, struct obd_export *exp,
 	struct obd_device   *obd = class_exp2obd(exp);
 	struct lmv_obd      *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc *tgt = lmv->tgts[0];
-	int		  rc = 0, i;
+	int rc = 0;
 	__u64 curspace = 0, curinodes = 0;
+	u32 i;
 
 	if (!tgt || !tgt->ltd_exp || !tgt->ltd_active ||
 	    !lmv->desc.ld_tgt_count) {
@@ -3169,7 +3170,8 @@ static int lmv_quotacheck(struct obd_device *unused, struct obd_export *exp,
 	struct obd_device   *obd = class_exp2obd(exp);
 	struct lmv_obd      *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc *tgt;
-	int		  i, rc = 0;
+	int rc = 0;
+	u32 i;
 
 	for (i = 0; i < lmv->desc.ld_tgt_count; i++) {
 		int err;
-- 
1.7.1

* [PATCH 74/80] staging: lustre: obd: implement md_read_page
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

This patch adds md_read_page, a new, more flexible
API that will replace md_readpage.

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/10761
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4906
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
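Note for readers: the new md_read_page() wrapper follows the usual
method-table pattern: check that the backend provides the operation,
bump a counter, then dispatch. The sketch below shows that pattern in
isolation. All sketch_* names are hypothetical, and returning
-EOPNOTSUPP for a missing method is an assumption, since the bodies of
EXP_CHECK_MD_OP and EXP_MD_COUNTER_INCREMENT are not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_page { int filled; };
struct sketch_dev;

/* Hypothetical per-device method table, standing in for struct md_ops. */
struct sketch_ops {
	int (*read_page)(struct sketch_dev *dev, uint64_t hash_offset,
			 struct sketch_page **ppage);
};

struct sketch_dev {
	const struct sketch_ops *ops;
	unsigned long            read_page_calls; /* stand-in for the stats counter */
	struct sketch_page       page;
};

/* Wrapper: validate the method, count the call, then dispatch. */
static int sketch_read_page(struct sketch_dev *dev, uint64_t hash_offset,
			    struct sketch_page **ppage)
{
	assert(ppage);
	if (!dev || !dev->ops || !dev->ops->read_page)
		return -EOPNOTSUPP;
	dev->read_page_calls++;
	return dev->ops->read_page(dev, hash_offset, ppage);
}

/* Trivial backend used only to exercise the wrapper. */
static int sketch_backend_read_page(struct sketch_dev *dev,
				    uint64_t hash_offset,
				    struct sketch_page **ppage)
{
	(void)hash_offset;
	dev->page.filled = 1;
	*ppage = &dev->page;
	return 0;
}
```

Keeping the check and the counter in one inline wrapper means every
caller gets them, regardless of which backend sits behind the table.
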
 drivers/staging/lustre/lustre/include/obd.h       |   13 ++++++++++++-
 drivers/staging/lustre/lustre/include/obd_class.h |   15 +++++++++++++++
 2 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index e91f65a..92eebff 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -830,6 +830,15 @@ struct md_op_data {
 	struct lustre_handle	op_lease_handle;
 };
 
+#define op_stripe_offset       op_ioepoch
+#define op_max_pages           op_valid
+
+struct md_callback {
+	int (*md_blocking_ast)(struct ldlm_lock *lock,
+			       struct ldlm_lock_desc *desc,
+			       void *data, int flag);
+};
+
 enum op_cli_flags {
 	CLI_SET_MEA	= 1 << 0,
 	CLI_RM_ENTRY	= 1 << 1,
@@ -1039,7 +1048,9 @@ struct md_ops {
 		    struct ptlrpc_request **);
 	int (*readpage)(struct obd_export *, struct md_op_data *,
 			struct page **, struct ptlrpc_request **);
-
+	int (*read_page)(struct obd_export *, struct md_op_data *,
+			 struct md_callback *cb_op, __u64 hash_offset,
+			 struct page **ppage);
 	int (*unlink)(struct obd_export *, struct md_op_data *,
 		      struct ptlrpc_request **);
 
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index 69b628b..daca5a0 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -1535,6 +1535,21 @@ static inline int md_readpage(struct obd_export *exp, struct md_op_data *opdata,
 	return rc;
 }
 
+static inline int md_read_page(struct obd_export *exp,
+			       struct md_op_data *op_data,
+			       struct md_callback *cb_op,
+			       __u64  hash_offset,
+			       struct page **ppage)
+{
+	int rc;
+
+	EXP_CHECK_MD_OP(exp, read_page);
+	EXP_MD_COUNTER_INCREMENT(exp, read_page);
+	rc = MDP(exp->exp_obd, read_page)(exp, op_data, cb_op, hash_offset,
+					  ppage);
+	return rc;
+}
+
 static inline int md_unlink(struct obd_export *exp, struct md_op_data *op_data,
 			    struct ptlrpc_request **request)
 {
-- 
1.7.1

* [PATCH 75/80] staging: lustre: llite: set op_max_pages
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, wang di,
	James Simmons

From: wang di <di.wang@intel.com>

Cache the maximum allowed pages supported by the llite
layer. This value will be used in the mdc and lmv layers.

Signed-off-by: wang di <di.wang@intel.com>
Reviewed-on: http://review.whamcloud.com/7043
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3531
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/llite/dir.c       |    1 +
 drivers/staging/lustre/lustre/llite/llite_nfs.c |    1 +
 drivers/staging/lustre/lustre/llite/statahead.c |    6 ++++++
 3 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 9c7fa8f..ed09015 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -649,6 +649,7 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx)
 			}
 		}
 	}
+	op_data->op_max_pages = sbi->ll_md_brw_pages;
 	ctx->pos = pos;
 	rc = ll_dir_read(inode, &pos, op_data, ctx);
 	pos = ctx->pos;
diff --git a/drivers/staging/lustre/lustre/llite/llite_nfs.c b/drivers/staging/lustre/lustre/llite/llite_nfs.c
index 2b65240..1e156dc 100644
--- a/drivers/staging/lustre/lustre/llite/llite_nfs.c
+++ b/drivers/staging/lustre/lustre/llite/llite_nfs.c
@@ -276,6 +276,7 @@ static int ll_get_name(struct dentry *dentry, char *name,
 		goto out;
 	}
 
+	op_data->op_max_pages = ll_i2sbi(dir)->ll_md_brw_pages;
 	inode_lock(dir);
 	rc = ll_dir_read(dir, &pos, op_data, &lgd.ctx);
 	inode_unlock(dir);
diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
index 46b8faf..454c33e 100644
--- a/drivers/staging/lustre/lustre/llite/statahead.c
+++ b/drivers/staging/lustre/lustre/llite/statahead.c
@@ -1052,6 +1052,8 @@ static int ll_statahead_thread(void *arg)
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
+	op_data->op_max_pages = ll_i2sbi(dir)->ll_md_brw_pages;
+
 	if (sbi->ll_flags & LL_SBI_AGL_ENABLED)
 		ll_start_agl(parent, sai);
 
@@ -1355,6 +1357,10 @@ static int is_first_dirent(struct inode *dir, struct dentry *dentry)
 				     LUSTRE_OPC_ANY, dir);
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
+	/**
+	 * FIXME choose the start offset of the readdir
+	 */
+	op_data->op_max_pages = ll_i2sbi(dir)->ll_md_brw_pages;
 
 	ll_dir_chain_init(&chain);
 	page = ll_get_dir_page(dir, op_data, pos, &chain);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 76/80] staging: lustre: lnet: Do not drop message when shutting down LNet
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Doug Oucharek, James Simmons

From: Doug Oucharek <doug.s.oucharek@intel.com>

There is a case in lnet_parse() where we discover that LNet
is shutting down but we continue to use the NI when we drop the
message and end up calling ko2iblnd_check_send_locked() which tries to
allocate from the Tx pool which has been cleaned up already.
This triggers a NULL pointer dereference.

This fix simply returns from lnet_parse() when we discover that LNet is
shutting down.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8106
Reviewed-on: http://review.whamcloud.com/19993
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
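The control flow described in the commit message can be sketched in
user-space C as follows. This is only an illustration under stated
assumptions: struct demo_ni, demo_commit(), demo_drop() and demo_parse()
are hypothetical stand-ins, not the real LNet types or functions. The
only detail taken from the patch is the "rc == -ESHUTDOWN" test that
returns 0 before the drop path is reached.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a network interface; not lnet_ni_t. */
struct demo_ni {
	bool shutting_down;	/* set once teardown has begun */
	int  drops_recorded;	/* work done by the drop path */
};

/* Drop path: uses per-interface resources, so it is only safe
 * while the interface is still fully set up. */
static void demo_drop(struct demo_ni *ni)
{
	assert(!ni->shutting_down);
	ni->drops_recorded++;
}

/* Assumed analogue of the call whose rc the patch inspects. */
static int demo_commit(const struct demo_ni *ni, bool bad_msg)
{
	if (ni->shutting_down)
		return -ESHUTDOWN;
	return bad_msg ? -EINVAL : 0;
}

static int demo_parse(struct demo_ni *ni, bool bad_msg)
{
	int rc = demo_commit(ni, bad_msg);

	if (rc != 0) {
		if (rc == -ESHUTDOWN)
			return 0;	/* shutting down: do nothing more */
		demo_drop(ni);		/* ordinary failure: drop as before */
		return rc;
	}
	return 0;
}
```

With shutting_down set, demo_parse() returns 0 and never calls
demo_drop(); an ordinary failure still goes through the drop path.
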
 drivers/staging/lustre/lnet/lnet/lib-move.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index 6a3f2e1..5598fa8 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -2002,6 +2002,9 @@ lnet_parse(lnet_ni_t *ni, lnet_hdr_t *hdr, lnet_nid_t from_nid,
 		       libcfs_nid2str(from_nid), libcfs_nid2str(src_nid),
 		       lnet_msgtyp2str(type), rc);
 		lnet_msg_free(msg);
+		if (rc == -ESHUTDOWN)
+			/* We are shutting down. Don't do anything more */
+			return 0;
 		goto drop;
 	}
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 77/80] staging: lustre: lnet: Correct position of lnet_ni_decref()
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Doug Oucharek, James Simmons

From: Doug Oucharek <doug.s.oucharek@intel.com>

In fix http://review.whamcloud.com/#/c/19614/, the call
to lnet_ni_decref() should have followed the routines
which are using the NI.  This patch corrects that.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8022
Reviewed-on: http://review.whamcloud.com/21001
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
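The ordering rule behind this fix (drop a reference only after the last
use of the object) can be sketched in user-space C. Everything here is
hypothetical: struct demo_obj, demo_obj_new(), demo_obj_put() and
demo_fill_reject() are illustrative names, not the real lnet_ni_t or
ko2iblnd helpers, and the plain integer refcount is an assumption made
to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object; not the real lnet_ni_t. */
struct demo_obj {
	int refcount;
	int queue_depth;
};

static struct demo_obj *demo_obj_new(int queue_depth)
{
	struct demo_obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	obj->refcount = 1;
	obj->queue_depth = queue_depth;
	return obj;
}

/* Drop one reference; returns 1 if that freed the object. */
static int demo_obj_put(struct demo_obj *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount == 0) {
		free(obj);
		return 1;
	}
	return 0;
}

/* Read what is needed from the object first, then release the
 * reference. The put comes last because it may free the object. */
static int demo_fill_reject(struct demo_obj *obj, int *depth_out)
{
	*depth_out = obj->queue_depth;	/* last use of obj */
	return demo_obj_put(obj);	/* obj must not be touched after this */
}
```

Swapping the two statements in demo_fill_reject() would read
queue_depth from memory that may already have been freed, which is the
kind of ordering problem the patch addresses.
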
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 9eb1db6..19c90fc 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -2526,9 +2526,9 @@ kiblnd_passive_connect(struct rdma_cm_id *cmid, void *priv, int priv_nob)
 
  failed:
 	if (ni) {
-		lnet_ni_decref(ni);
 		rej.ibr_cp.ibcp_queue_depth = kiblnd_msg_queue_size(version, ni);
 		rej.ibr_cp.ibcp_max_frags = kiblnd_rdma_frags(version, ni);
+		lnet_ni_decref(ni);
 	}
 
 	rej.ibr_version             = version;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 78/80] staging: lustre: lnet: make connection more stable with packet loss
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Alexander Boyko, Alexey Lyashkov, James Simmons

From: Alexander Boyko <alexander.boyko@seagate.com>

An IB network may lose the last connection handshake packet.
This problem isn't Lustre specific; it is described, for example, at
https://oss.oracle.com/pipermail/rds-devel/2007-December/000271.html
The solution is to mark the connection established if any packet
is received for it.

Signed-off-by: Alexander Boyko <alexander.boyko@seagate.com>
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
Seagate-bug-id: MRP-2883
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8303
Reviewed-on: http://review.whamcloud.com/20874
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@seagate.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 19c90fc..6cd78ea 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -3419,6 +3419,12 @@ kiblnd_qp_event(struct ib_event *event, void *arg)
 	case IB_EVENT_COMM_EST:
 		CDEBUG(D_NET, "%s established\n",
 		       libcfs_nid2str(conn->ibc_peer->ibp_nid));
+		/*
+		 * We received a packet but connection isn't established
+		 * probably handshake packet was lost, so free to
+		 * force make connection established
+		 */
+		rdma_notify(conn->ibc_cmid, IB_EVENT_COMM_EST);
 		return;
 
 	default:
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [PATCH 79/80] staging: lustre: lnet: lock improvement for ko2iblnd
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Liang Zhen,
	Doug Oucharek, James Simmons

From: Liang Zhen <liang.zhen@intel.com>

kiblnd_check_sends() takes conn::ibc_lock at the beginning and releases
it at the end. This is inefficient because most callers have to
explicitly release ibc_lock just before calling this function.

This patch changes it to kiblnd_check_sends_locked(), which expects the
caller to hold ibc_lock, and avoids the unnecessary lock dances.

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7099
Reviewed-on: http://review.whamcloud.com/20322
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
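The "lock dance" that this patch removes can be illustrated with a toy
user-space sketch. All names are hypothetical (struct demo_lock,
struct demo_conn, demo_check_sends_locked(), demo_queue_tx_old(),
demo_queue_tx_new()); the counting lock is an assumed stand-in for a
spinlock so that the number of acquisitions can be observed, and the
send logic is reduced to moving a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that counts acquisitions; a stand-in for a spinlock. */
struct demo_lock {
	bool held;
	int  acquisitions;
};

static void demo_lock_take(struct demo_lock *l)
{
	assert(!l->held);
	l->held = true;
	l->acquisitions++;
}

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct demo_conn {
	struct demo_lock lock;
	int queued;	/* messages waiting to be sent */
	int posted;	/* messages handed off for sending */
};

/* Caller must hold conn->lock, as the _locked suffix advertises. */
static void demo_check_sends_locked(struct demo_conn *conn)
{
	assert(conn->lock.held);
	conn->posted += conn->queued;
	conn->queued = 0;
}

/* Old shape: unlock after queueing, then lock again to check. */
static void demo_queue_tx_old(struct demo_conn *conn)
{
	demo_lock_take(&conn->lock);
	conn->queued++;
	demo_lock_release(&conn->lock);

	demo_lock_take(&conn->lock);	/* second round trip */
	demo_check_sends_locked(conn);
	demo_lock_release(&conn->lock);
}

/* New shape: queue and check under a single acquisition. */
static void demo_queue_tx_new(struct demo_conn *conn)
{
	demo_lock_take(&conn->lock);
	conn->queued++;
	demo_check_sends_locked(conn);
	demo_lock_release(&conn->lock);
}
```

Both shapes post the same message, but the old one acquires the lock
twice per call and the new one only once.
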
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |   46 ++++++++------------
 1 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 6cd78ea..6d1b14a 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -38,7 +38,6 @@
 
 static void kiblnd_peer_alive(struct kib_peer *peer);
 static void kiblnd_peer_connect_failed(struct kib_peer *peer, int active, int error);
-static void kiblnd_check_sends(struct kib_conn *conn);
 static void kiblnd_init_tx_msg(lnet_ni_t *ni, struct kib_tx *tx,
 				int type, int body_nob);
 static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
@@ -46,6 +45,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 static void kiblnd_queue_tx_locked(struct kib_tx *tx, struct kib_conn *conn);
 static void kiblnd_queue_tx(struct kib_tx *tx, struct kib_conn *conn);
 static void kiblnd_unmap_tx(lnet_ni_t *ni, struct kib_tx *tx);
+static void kiblnd_check_sends_locked(struct kib_conn *conn);
 
 static void
 kiblnd_tx_done(lnet_ni_t *ni, struct kib_tx *tx)
@@ -211,9 +211,9 @@ kiblnd_post_rx(struct kib_rx *rx, int credit)
 		conn->ibc_outstanding_credits++;
 	else
 		conn->ibc_reserved_credits++;
+	kiblnd_check_sends_locked(conn);
 	spin_unlock(&conn->ibc_lock);
 
-	kiblnd_check_sends(conn);
 out:
 	kiblnd_conn_decref(conn);
 	return rc;
@@ -344,8 +344,8 @@ kiblnd_handle_rx(struct kib_rx *rx)
 		    !IBLND_OOB_CAPABLE(conn->ibc_version)) /* v1 only */
 			conn->ibc_outstanding_credits++;
 
+		kiblnd_check_sends_locked(conn);
 		spin_unlock(&conn->ibc_lock);
-		kiblnd_check_sends(conn);
 	}
 
 	switch (msg->ibm_type) {
@@ -800,7 +800,7 @@ kiblnd_post_tx_locked(struct kib_conn *conn, struct kib_tx *tx, int credit)
 	      conn->ibc_noops_posted == IBLND_OOB_MSGS(ver)))) {
 		/*
 		 * OK to drop when posted enough NOOPs, since
-		 * kiblnd_check_sends will queue NOOP again when
+		 * kiblnd_check_sends_locked will queue NOOP again when
 		 * posted NOOPs complete
 		 */
 		spin_unlock(&conn->ibc_lock);
@@ -905,7 +905,7 @@ kiblnd_post_tx_locked(struct kib_conn *conn, struct kib_tx *tx, int credit)
 }
 
 static void
-kiblnd_check_sends(struct kib_conn *conn)
+kiblnd_check_sends_locked(struct kib_conn *conn)
 {
 	int ver = conn->ibc_version;
 	lnet_ni_t *ni = conn->ibc_peer->ibp_ni;
@@ -918,8 +918,6 @@ kiblnd_check_sends(struct kib_conn *conn)
 		return;
 	}
 
-	spin_lock(&conn->ibc_lock);
-
 	LASSERT(conn->ibc_nsends_posted <= kiblnd_concurrent_sends(ver, ni));
 	LASSERT(!IBLND_OOB_CAPABLE(ver) ||
 		conn->ibc_noops_posted <= IBLND_OOB_MSGS(ver));
@@ -969,8 +967,6 @@ kiblnd_check_sends(struct kib_conn *conn)
 		if (kiblnd_post_tx_locked(conn, tx, credit))
 			break;
 	}
-
-	spin_unlock(&conn->ibc_lock);
 }
 
 static void
@@ -1016,16 +1012,11 @@ kiblnd_tx_complete(struct kib_tx *tx, int status)
 	if (idle)
 		list_del(&tx->tx_list);
 
-	kiblnd_conn_addref(conn);	       /* 1 ref for me.... */
-
+	kiblnd_check_sends_locked(conn);
 	spin_unlock(&conn->ibc_lock);
 
 	if (idle)
 		kiblnd_tx_done(conn->ibc_peer->ibp_ni, tx);
-
-	kiblnd_check_sends(conn);
-
-	kiblnd_conn_decref(conn);	       /* ...until here */
 }
 
 static void
@@ -1204,9 +1195,8 @@ kiblnd_queue_tx(struct kib_tx *tx, struct kib_conn *conn)
 {
 	spin_lock(&conn->ibc_lock);
 	kiblnd_queue_tx_locked(tx, conn);
+	kiblnd_check_sends_locked(conn);
 	spin_unlock(&conn->ibc_lock);
-
-	kiblnd_check_sends(conn);
 }
 
 static int kiblnd_resolve_addr(struct rdma_cm_id *cmid,
@@ -2183,14 +2173,11 @@ kiblnd_connreq_done(struct kib_conn *conn, int status)
 		return;
 	}
 
-	/**
-	 * refcount taken by cmid is not reliable after I released the glock
-	 * because this connection is visible to other threads now, another
-	 * thread can find and close this connection right after I released
-	 * the glock, if kiblnd_cm_callback for RDMA_CM_EVENT_DISCONNECTED is
-	 * called, it can release the connection refcount taken by cmid.
-	 * It means the connection could be destroyed before I finish my
-	 * operations on it.
+	/*
+	 * +1 ref for myself, this connection is visible to other threads
+	 * now, refcount of peer:ibp_conns can be released by connection
+	 * close from either a different thread, or the calling of
+	 * kiblnd_check_sends_locked() below. See bz21911 for details.
 	 */
 	kiblnd_conn_addref(conn);
 	write_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags);
@@ -2202,10 +2189,9 @@ kiblnd_connreq_done(struct kib_conn *conn, int status)
 
 		kiblnd_queue_tx_locked(tx, conn);
 	}
+	kiblnd_check_sends_locked(conn);
 	spin_unlock(&conn->ibc_lock);
 
-	kiblnd_check_sends(conn);
-
 	/* schedule blocked rxs */
 	kiblnd_handle_early_rxs(conn);
 
@@ -3233,7 +3219,11 @@ kiblnd_check_conns(int idx)
 	 */
 	list_for_each_entry_safe(conn, temp, &checksends, ibc_connd_list) {
 		list_del(&conn->ibc_connd_list);
-		kiblnd_check_sends(conn);
+
+		spin_lock(&conn->ibc_lock);
+		kiblnd_check_sends_locked(conn);
+		spin_unlock(&conn->ibc_lock);
+
 		kiblnd_conn_decref(conn);
 	}
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 188+ messages in thread

* [lustre-devel] [PATCH 79/80] staging: lustre: lnet: lock improvement for ko2iblnd
@ 2016-08-16 20:19   ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List, Liang Zhen,
	Doug Oucharek, James Simmons

From: Liang Zhen <liang.zhen@intel.com>

kiblnd_check_sends() takes conn::ibc_lock at the begin and release
this lock at the end, this is inefficient because most use-case
needs to explicitly release ibc_lock before caling this function.

This patches changes it to kiblnd_check_sends_locked() and avoid
unnecessary lock dances.

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7099
Reviewed-on: http://review.whamcloud.com/20322
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |   46 ++++++++------------
 1 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 6cd78ea..6d1b14a 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -38,7 +38,6 @@
 
 static void kiblnd_peer_alive(struct kib_peer *peer);
 static void kiblnd_peer_connect_failed(struct kib_peer *peer, int active, int error);
-static void kiblnd_check_sends(struct kib_conn *conn);
 static void kiblnd_init_tx_msg(lnet_ni_t *ni, struct kib_tx *tx,
 				int type, int body_nob);
 static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
@@ -46,6 +45,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 static void kiblnd_queue_tx_locked(struct kib_tx *tx, struct kib_conn *conn);
 static void kiblnd_queue_tx(struct kib_tx *tx, struct kib_conn *conn);
 static void kiblnd_unmap_tx(lnet_ni_t *ni, struct kib_tx *tx);
+static void kiblnd_check_sends_locked(struct kib_conn *conn);
 
 static void
 kiblnd_tx_done(lnet_ni_t *ni, struct kib_tx *tx)
@@ -211,9 +211,9 @@ kiblnd_post_rx(struct kib_rx *rx, int credit)
 		conn->ibc_outstanding_credits++;
 	else
 		conn->ibc_reserved_credits++;
+	kiblnd_check_sends_locked(conn);
 	spin_unlock(&conn->ibc_lock);
 
-	kiblnd_check_sends(conn);
 out:
 	kiblnd_conn_decref(conn);
 	return rc;
@@ -344,8 +344,8 @@ kiblnd_handle_rx(struct kib_rx *rx)
 		    !IBLND_OOB_CAPABLE(conn->ibc_version)) /* v1 only */
 			conn->ibc_outstanding_credits++;
 
+		kiblnd_check_sends_locked(conn);
 		spin_unlock(&conn->ibc_lock);
-		kiblnd_check_sends(conn);
 	}
 
 	switch (msg->ibm_type) {
@@ -800,7 +800,7 @@ kiblnd_post_tx_locked(struct kib_conn *conn, struct kib_tx *tx, int credit)
 	      conn->ibc_noops_posted == IBLND_OOB_MSGS(ver)))) {
 		/*
 		 * OK to drop when posted enough NOOPs, since
-		 * kiblnd_check_sends will queue NOOP again when
+		 * kiblnd_check_sends_locked will queue NOOP again when
 		 * posted NOOPs complete
 		 */
 		spin_unlock(&conn->ibc_lock);
@@ -905,7 +905,7 @@ kiblnd_post_tx_locked(struct kib_conn *conn, struct kib_tx *tx, int credit)
 }
 
 static void
-kiblnd_check_sends(struct kib_conn *conn)
+kiblnd_check_sends_locked(struct kib_conn *conn)
 {
 	int ver = conn->ibc_version;
 	lnet_ni_t *ni = conn->ibc_peer->ibp_ni;
@@ -918,8 +918,6 @@ kiblnd_check_sends(struct kib_conn *conn)
 		return;
 	}
 
-	spin_lock(&conn->ibc_lock);
-
 	LASSERT(conn->ibc_nsends_posted <= kiblnd_concurrent_sends(ver, ni));
 	LASSERT(!IBLND_OOB_CAPABLE(ver) ||
 		conn->ibc_noops_posted <= IBLND_OOB_MSGS(ver));
@@ -969,8 +967,6 @@ kiblnd_check_sends(struct kib_conn *conn)
 		if (kiblnd_post_tx_locked(conn, tx, credit))
 			break;
 	}
-
-	spin_unlock(&conn->ibc_lock);
 }
 
 static void
@@ -1016,16 +1012,11 @@ kiblnd_tx_complete(struct kib_tx *tx, int status)
 	if (idle)
 		list_del(&tx->tx_list);
 
-	kiblnd_conn_addref(conn);	       /* 1 ref for me.... */
-
+	kiblnd_check_sends_locked(conn);
 	spin_unlock(&conn->ibc_lock);
 
 	if (idle)
 		kiblnd_tx_done(conn->ibc_peer->ibp_ni, tx);
-
-	kiblnd_check_sends(conn);
-
-	kiblnd_conn_decref(conn);	       /* ...until here */
 }
 
 static void
@@ -1204,9 +1195,8 @@ kiblnd_queue_tx(struct kib_tx *tx, struct kib_conn *conn)
 {
 	spin_lock(&conn->ibc_lock);
 	kiblnd_queue_tx_locked(tx, conn);
+	kiblnd_check_sends_locked(conn);
 	spin_unlock(&conn->ibc_lock);
-
-	kiblnd_check_sends(conn);
 }
 
 static int kiblnd_resolve_addr(struct rdma_cm_id *cmid,
@@ -2183,14 +2173,11 @@ kiblnd_connreq_done(struct kib_conn *conn, int status)
 		return;
 	}
 
-	/**
-	 * refcount taken by cmid is not reliable after I released the glock
-	 * because this connection is visible to other threads now, another
-	 * thread can find and close this connection right after I released
-	 * the glock, if kiblnd_cm_callback for RDMA_CM_EVENT_DISCONNECTED is
-	 * called, it can release the connection refcount taken by cmid.
-	 * It means the connection could be destroyed before I finish my
-	 * operations on it.
+	/*
+	 * +1 ref for myself, this connection is visible to other threads
+	 * now, refcount of peer:ibp_conns can be released by connection
+	 * close from either a different thread, or the calling of
+	 * kiblnd_check_sends_locked() below. See bz21911 for details.
 	 */
 	kiblnd_conn_addref(conn);
 	write_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags);
@@ -2202,10 +2189,9 @@ kiblnd_connreq_done(struct kib_conn *conn, int status)
 
 		kiblnd_queue_tx_locked(tx, conn);
 	}
+	kiblnd_check_sends_locked(conn);
 	spin_unlock(&conn->ibc_lock);
 
-	kiblnd_check_sends(conn);
-
 	/* schedule blocked rxs */
 	kiblnd_handle_early_rxs(conn);
 
@@ -3233,7 +3219,11 @@ kiblnd_check_conns(int idx)
 	 */
 	list_for_each_entry_safe(conn, temp, &checksends, ibc_connd_list) {
 		list_del(&conn->ibc_connd_list);
-		kiblnd_check_sends(conn);
+
+		spin_lock(&conn->ibc_lock);
+		kiblnd_check_sends_locked(conn);
+		spin_unlock(&conn->ibc_lock);
+
 		kiblnd_conn_decref(conn);
 	}
 }
-- 
1.7.1

* [PATCH 80/80] staging: lustre: lnet: Stop Infinite CON RACE Condition
  2016-08-16 20:18 ` [lustre-devel] " James Simmons
@ 2016-08-16 20:19   ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2016-08-16 20:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Doug Oucharek, James Simmons

From: Doug Oucharek <doug.s.oucharek@intel.com>

In current code, when a CON RACE occurs, the passive side will
let the node with the higher NID value win the race.

We have a field case where a node can have a "stuck"
connection which never goes away and is the trigger of a
never-ending loop of re-connections.

This patch introduces a counter of how many times a
connection in a connecting state has been the cause of a CON RACE
rejection. After 20 times (constant MAX_CONN_RACES_BEFORE_ABORT),
we assume the connection is stuck and let the other side (with
lower NID) win.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7646
Reviewed-on: http://review.whamcloud.com/19430
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |    2 +
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |   24 ++++++++++++++++---
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
index 078a0c3..fbc4f68 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
@@ -582,6 +582,8 @@ struct kib_peer {
 	unsigned short		ibp_connecting;
 	/* reconnect this peer later */
 	unsigned short		ibp_reconnecting:1;
+	/* counter of how many times we triggered a conn race */
+	unsigned char		ibp_races;
 	/* # consecutive reconnection attempts to this peer */
 	unsigned int		ibp_reconnected;
 	/* errno on closing this peer */
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 6d1b14a..430ff85 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -36,6 +36,8 @@
 
 #include "o2iblnd.h"
 
+#define MAX_CONN_RACES_BEFORE_ABORT 20
+
 static void kiblnd_peer_alive(struct kib_peer *peer);
 static void kiblnd_peer_connect_failed(struct kib_peer *peer, int active, int error);
 static void kiblnd_init_tx_msg(lnet_ni_t *ni, struct kib_tx *tx,
@@ -2405,23 +2407,37 @@ kiblnd_passive_connect(struct rdma_cm_id *cmid, void *priv, int priv_nob)
 			goto failed;
 		}
 
-		/* tie-break connection race in favour of the higher NID */
+		/*
+		 * Tie-break connection race in favour of the higher NID.
+		 * If we keep running into a race condition multiple times,
+		 * we have to assume that the connection attempt with the
+		 * higher NID is stuck in a connecting state and will never
+		 * recover.  As such, we pass through this if-block and let
+		 * the lower NID connection win so we can move forward.
+		 */
 		if (peer2->ibp_connecting &&
-		    nid < ni->ni_nid) {
+		    nid < ni->ni_nid && peer2->ibp_races <
+		    MAX_CONN_RACES_BEFORE_ABORT) {
+			peer2->ibp_races++;
 			write_unlock_irqrestore(g_lock, flags);
 
-			CWARN("Conn race %s\n", libcfs_nid2str(peer2->ibp_nid));
+			CDEBUG(D_NET, "Conn race %s\n",
+			       libcfs_nid2str(peer2->ibp_nid));
 
 			kiblnd_peer_decref(peer);
 			rej.ibr_why = IBLND_REJECT_CONN_RACE;
 			goto failed;
 		}
-
+		if (peer2->ibp_races >= MAX_CONN_RACES_BEFORE_ABORT)
+			CNETERR("Conn race %s: unresolved after %d attempts, letting lower NID win\n",
+				libcfs_nid2str(peer2->ibp_nid),
+				MAX_CONN_RACES_BEFORE_ABORT);
 		/**
 		 * passive connection is allowed even this peer is waiting for
 		 * reconnection.
 		 */
 		peer2->ibp_reconnecting = 0;
+		peer2->ibp_races = 0;
 		peer2->ibp_accepting++;
 		kiblnd_peer_addref(peer2);
 
-- 
1.7.1
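
The tie-break described in the commit message above can be sketched in
userspace as follows. This is a minimal model under stated assumptions,
not the kernel code: struct peer_state and reject_as_conn_race() are
hypothetical names standing in for the ibp_connecting and ibp_races
fields of struct kib_peer and for the if-block in
kiblnd_passive_connect(). Locking, reference counting and the reject
message are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as the constant the patch introduces. */
#define MAX_CONN_RACES_BEFORE_ABORT 20

/*
 * Hypothetical, simplified stand-in for the two struct kib_peer fields
 * that matter here; this is not the real kernel structure.
 */
struct peer_state {
	unsigned short connecting;	/* active connection attempts in flight */
	unsigned char  races;		/* CON RACE rejections issued so far */
};

/*
 * Decide whether an incoming (passive) connection request from remote_nid
 * should be rejected as a connection race.
 *
 * Returns true to reject, so the higher NID keeps priority, and false to
 * accept the incoming connection.
 */
static bool reject_as_conn_race(struct peer_state *peer,
				uint64_t remote_nid, uint64_t local_nid)
{
	assert(peer);

	if (peer->connecting && remote_nid < local_nid &&
	    peer->races < MAX_CONN_RACES_BEFORE_ABORT) {
		/* Normal case: the higher NID wins, count the rejection. */
		peer->races++;
		return true;
	}

	/*
	 * Either there is no race, or the race has repeated too often and
	 * the local attempt is assumed stuck. Accept and reset the history.
	 */
	peer->races = 0;
	return false;
}
```

In this model a peer that keeps racing is rejected 20 times in a row;
the next request from the lower NID is accepted and the counter starts
again from zero, which matches the ibp_races reset in the patch.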

* Re: [PATCH 57/80] staging: lustre: osc: revise unstable pages accounting
  2016-08-16 20:19   ` [lustre-devel] " James Simmons
@ 2016-10-16 15:14     ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 188+ messages in thread
From: Greg Kroah-Hartman @ 2016-10-16 15:14 UTC (permalink / raw)
  To: James Simmons
  Cc: devel, Andreas Dilger, Oleg Drokin, Linux Kernel Mailing List,
	Jinshan Xiong, Lustre Development List

Digging up an old email...

On Tue, Aug 16, 2016 at 04:19:10PM -0400, James Simmons wrote:
> From: Jinshan Xiong <jinshan.xiong@intel.com>
> 
> A few changes are made in this patch for unstable pages tracking:
> 
> 1. Remove kernel NFS unstable pages tracking because it killed
>    performance
> 2. Track unstable pages as part of LRU cache. Otherwise Lustre
>    can use much more memory than max_cached_mb
> 3. Remove obd_unstable_pages tracking to avoid using global
>    atomic counter
> 4. Make unstable pages track optional. Tracking unstable pages is
>    turned off by default, and can be controlled by
>    llite.*.unstable_stats.
> 
> Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4841
> Reviewed-on: http://review.whamcloud.com/10003
> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
> Reviewed-by: Lai Siyao <lai.siyao@intel.com>
> Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
> Signed-off-by: James Simmons <jsimmons@infradead.org>
> ---
>  drivers/staging/lustre/lustre/include/cl_object.h  |   35 +++-
>  .../staging/lustre/lustre/include/obd_support.h    |    1 -
>  drivers/staging/lustre/lustre/llite/lproc_llite.c  |   41 ++++-
>  drivers/staging/lustre/lustre/obdclass/class_obd.c |    2 -
>  drivers/staging/lustre/lustre/osc/osc_cache.c      |   96 +---------
>  drivers/staging/lustre/lustre/osc/osc_internal.h   |    2 +-
>  drivers/staging/lustre/lustre/osc/osc_page.c       |  208 +++++++++++++++++---
>  drivers/staging/lustre/lustre/osc/osc_request.c    |   13 +-
>  8 files changed, 253 insertions(+), 145 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
> index d269b32..ec6cf7c 100644
> --- a/drivers/staging/lustre/lustre/include/cl_object.h
> +++ b/drivers/staging/lustre/lustre/include/cl_object.h
> @@ -1039,23 +1039,32 @@ do {									  \
>  	}								     \
>  } while (0)
>  
> -static inline int __page_in_use(const struct cl_page *page, int refc)
> -{
> -	if (page->cp_type == CPT_CACHEABLE)
> -		++refc;
> -	LASSERT(atomic_read(&page->cp_ref) > 0);
> -	return (atomic_read(&page->cp_ref) > refc);
> -}
> -
> -#define cl_page_in_use(pg)       __page_in_use(pg, 1)
> -#define cl_page_in_use_noref(pg) __page_in_use(pg, 0)
> -
>  static inline struct page *cl_page_vmpage(struct cl_page *page)
>  {
>  	LASSERT(page->cp_vmpage);
>  	return page->cp_vmpage;
>  }
>  
> +/**
> + * Check if a cl_page is in use.
> + *
> + * Client cache holds a refcount, this refcount will be dropped when
> + * the page is taken out of cache, see vvp_page_delete().
> + */
> +static inline bool __page_in_use(const struct cl_page *page, int refc)
> +{
> +	return (atomic_read(&page->cp_ref) > refc + 1);
> +}
> +
> +/**
> + * Caller itself holds a refcount of cl_page.
> + */
> +#define cl_page_in_use(pg)	 __page_in_use(pg, 1)
> +/**
> + * Caller doesn't hold a refcount.
> + */
> +#define cl_page_in_use_noref(pg) __page_in_use(pg, 0)
> +
>  /** @} cl_page */
>  
>  /** \addtogroup cl_lock cl_lock
> @@ -2331,6 +2340,10 @@ struct cl_client_cache {
>  	 */
>  	spinlock_t		ccc_lru_lock;
>  	/**
> +	 * Set if unstable check is enabled
> +	 */
> +	unsigned int		ccc_unstable_check:1;
> +	/**
>  	 * # of unstable pages for this mount point
>  	 */
>  	atomic_t		ccc_unstable_nr;
> diff --git a/drivers/staging/lustre/lustre/include/obd_support.h b/drivers/staging/lustre/lustre/include/obd_support.h
> index 26fdff6..a11fff1 100644
> --- a/drivers/staging/lustre/lustre/include/obd_support.h
> +++ b/drivers/staging/lustre/lustre/include/obd_support.h
> @@ -54,7 +54,6 @@ extern int at_early_margin;
>  extern int at_extra;
>  extern unsigned int obd_sync_filter;
>  extern unsigned int obd_max_dirty_pages;
> -extern atomic_t obd_unstable_pages;
>  extern atomic_t obd_dirty_pages;
>  extern atomic_t obd_dirty_transit_pages;
>  extern char obd_jobid_var[];
> diff --git a/drivers/staging/lustre/lustre/llite/lproc_llite.c b/drivers/staging/lustre/lustre/llite/lproc_llite.c
> index 2f1f389..5f8e78d 100644
> --- a/drivers/staging/lustre/lustre/llite/lproc_llite.c
> +++ b/drivers/staging/lustre/lustre/llite/lproc_llite.c
> @@ -828,10 +828,45 @@ static ssize_t unstable_stats_show(struct kobject *kobj,
>  	pages = atomic_read(&cache->ccc_unstable_nr);
>  	mb = (pages * PAGE_SIZE) >> 20;
>  
> -	return sprintf(buf, "unstable_pages: %8d\n"
> -			    "unstable_mb:    %8d\n", pages, mb);
> +	return sprintf(buf, "unstable_check: %8d\n"
> +			    "unstable_pages: %8d\n"
> +			    "unstable_mb:    %8d\n",
> +			    cache->ccc_unstable_check, pages, mb);
>  }
> -LUSTRE_RO_ATTR(unstable_stats);
> +
> +static ssize_t unstable_stats_store(struct kobject *kobj,
> +				    struct attribute *attr,
> +				    const char *buffer,
> +				    size_t count)
> +{
> +	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
> +					      ll_kobj);
> +	char kernbuf[128];
> +	int val, rc;
> +
> +	if (!count)
> +		return 0;
> +	if (count < 0 || count >= sizeof(kernbuf))
> +		return -EINVAL;
> +
> +	if (copy_from_user(kernbuf, buffer, count))
> +		return -EFAULT;

It was just pointed out to me that this code has obviously never been
tested at all.

Sorry for missing this before, do you want me to revert this?  Or will
you send me a fix?

And I think I'm going to have to be stricter now, this patch did way too
much all at once.  When you have to list the different things a patch
does, that means it needs to be broken up.  This wasn't a simple one to
review, as is obvious with everyone who missed this basic issue (myself
included.)

Also, go fix your test harness, it's not being used, or is totally buggy :)

thanks,

greg k-h
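
Editorial note: the reply above does not say what is wrong with the
quoted hunk. One reading, which is an assumption and is not confirmed
anywhere in this thread, rests on two general facts: a sysfs store
callback is handed a buffer that already lives in kernel memory, so
passing it to copy_from_user() would be expected to fail, and count is
a size_t, so the test "count < 0" can never be true. The userspace
sketch below models a store-style handler that parses the buffer
directly. parse_switch() and its "enabled" output are hypothetical
names, and this is not the fix that was later applied.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical userspace model of a store-style handler. The caller's
 * buffer is ordinary readable memory, so no user-copy step is needed.
 * Returns the number of bytes consumed, or a negative errno value.
 */
static ssize_t parse_switch(const char *buffer, size_t count, int *enabled)
{
	char tmp[128];
	char *end;
	long val;

	assert(buffer && enabled);

	if (count == 0)
		return 0;
	/* size_t is unsigned, so only the upper bound needs checking. */
	if (count >= sizeof(tmp))
		return -EINVAL;

	/* The input is not guaranteed to be NUL-terminated; make it so. */
	memcpy(tmp, buffer, count);
	tmp[count] = '\0';

	errno = 0;
	val = strtol(tmp, &end, 10);
	if (end == tmp || errno != 0)
		return -EINVAL;

	/* Allow the single trailing newline that "echo" appends. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*enabled = (val != 0);
	return (ssize_t)count;
}
```

Writing "1\n" or "0" through this model sets the flag and reports the
whole input as consumed, while non-numeric input is rejected with
-EINVAL.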

* Re: [PATCH 57/80] staging: lustre: osc: revise unstable pages accounting
  2016-10-16 15:14     ` [lustre-devel] " Greg Kroah-Hartman
@ 2016-10-16 17:16       ` Oleg Drokin
  -1 siblings, 0 replies; 188+ messages in thread
From: Oleg Drokin @ 2016-10-16 17:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: James Simmons, devel, Andreas Dilger, Linux Kernel Mailing List,
	Jinshan Xiong, Lustre Development List


On Oct 16, 2016, at 11:14 AM, Greg Kroah-Hartman wrote:

> Digging up an old email...
> 
> On Tue, Aug 16, 2016 at 04:19:10PM -0400, James Simmons wrote:
>> From: Jinshan Xiong <jinshan.xiong@intel.com>
>> 
>> A few changes are made in this patch for unstable pages tracking:
>> 
>> 1. Remove kernel NFS unstable pages tracking because it killed
>>   performance
>> 2. Track unstable pages as part of LRU cache. Otherwise Lustre
>>   can use much more memory than max_cached_mb
>> 3. Remove obd_unstable_pages tracking to avoid using global
>>   atomic counter
>> 4. Make unstable pages track optional. Tracking unstable pages is
>>   turned off by default, and can be controlled by
>>   llite.*.unstable_stats.
>> 
>> Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
>> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4841
>> Reviewed-on: http://review.whamcloud.com/10003
>> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
>> Reviewed-by: Lai Siyao <lai.siyao@intel.com>
>> Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
>> Signed-off-by: James Simmons <jsimmons@infradead.org>
>> ---
>> drivers/staging/lustre/lustre/include/cl_object.h  |   35 +++-
>> .../staging/lustre/lustre/include/obd_support.h    |    1 -
>> drivers/staging/lustre/lustre/llite/lproc_llite.c  |   41 ++++-
>> drivers/staging/lustre/lustre/obdclass/class_obd.c |    2 -
>> drivers/staging/lustre/lustre/osc/osc_cache.c      |   96 +---------
>> drivers/staging/lustre/lustre/osc/osc_internal.h   |    2 +-
>> drivers/staging/lustre/lustre/osc/osc_page.c       |  208 +++++++++++++++++---
>> drivers/staging/lustre/lustre/osc/osc_request.c    |   13 +-
>> 8 files changed, 253 insertions(+), 145 deletions(-)
>> 
>> diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
>> index d269b32..ec6cf7c 100644
>> --- a/drivers/staging/lustre/lustre/include/cl_object.h
>> +++ b/drivers/staging/lustre/lustre/include/cl_object.h
>> @@ -1039,23 +1039,32 @@ do {									  \
>> 	}								     \
>> } while (0)
>> 
>> -static inline int __page_in_use(const struct cl_page *page, int refc)
>> -{
>> -	if (page->cp_type == CPT_CACHEABLE)
>> -		++refc;
>> -	LASSERT(atomic_read(&page->cp_ref) > 0);
>> -	return (atomic_read(&page->cp_ref) > refc);
>> -}
>> -
>> -#define cl_page_in_use(pg)       __page_in_use(pg, 1)
>> -#define cl_page_in_use_noref(pg) __page_in_use(pg, 0)
>> -
>> static inline struct page *cl_page_vmpage(struct cl_page *page)
>> {
>> 	LASSERT(page->cp_vmpage);
>> 	return page->cp_vmpage;
>> }
>> 
>> +/**
>> + * Check if a cl_page is in use.
>> + *
>> + * Client cache holds a refcount, this refcount will be dropped when
>> + * the page is taken out of cache, see vvp_page_delete().
>> + */
>> +static inline bool __page_in_use(const struct cl_page *page, int refc)
>> +{
>> +	return (atomic_read(&page->cp_ref) > refc + 1);
>> +}
>> +
>> +/**
>> + * Caller itself holds a refcount of cl_page.
>> + */
>> +#define cl_page_in_use(pg)	 __page_in_use(pg, 1)
>> +/**
>> + * Caller doesn't hold a refcount.
>> + */
>> +#define cl_page_in_use_noref(pg) __page_in_use(pg, 0)
>> +
>> /** @} cl_page */
>> 
>> /** \addtogroup cl_lock cl_lock
>> @@ -2331,6 +2340,10 @@ struct cl_client_cache {
>> 	 */
>> 	spinlock_t		ccc_lru_lock;
>> 	/**
>> +	 * Set if unstable check is enabled
>> +	 */
>> +	unsigned int		ccc_unstable_check:1;
>> +	/**
>> 	 * # of unstable pages for this mount point
>> 	 */
>> 	atomic_t		ccc_unstable_nr;
>> diff --git a/drivers/staging/lustre/lustre/include/obd_support.h b/drivers/staging/lustre/lustre/include/obd_support.h
>> index 26fdff6..a11fff1 100644
>> --- a/drivers/staging/lustre/lustre/include/obd_support.h
>> +++ b/drivers/staging/lustre/lustre/include/obd_support.h
>> @@ -54,7 +54,6 @@ extern int at_early_margin;
>> extern int at_extra;
>> extern unsigned int obd_sync_filter;
>> extern unsigned int obd_max_dirty_pages;
>> -extern atomic_t obd_unstable_pages;
>> extern atomic_t obd_dirty_pages;
>> extern atomic_t obd_dirty_transit_pages;
>> extern char obd_jobid_var[];
>> diff --git a/drivers/staging/lustre/lustre/llite/lproc_llite.c b/drivers/staging/lustre/lustre/llite/lproc_llite.c
>> index 2f1f389..5f8e78d 100644
>> --- a/drivers/staging/lustre/lustre/llite/lproc_llite.c
>> +++ b/drivers/staging/lustre/lustre/llite/lproc_llite.c
>> @@ -828,10 +828,45 @@ static ssize_t unstable_stats_show(struct kobject *kobj,
>> 	pages = atomic_read(&cache->ccc_unstable_nr);
>> 	mb = (pages * PAGE_SIZE) >> 20;
>> 
>> -	return sprintf(buf, "unstable_pages: %8d\n"
>> -			    "unstable_mb:    %8d\n", pages, mb);
>> +	return sprintf(buf, "unstable_check: %8d\n"
>> +			    "unstable_pages: %8d\n"
>> +			    "unstable_mb:    %8d\n",
>> +			    cache->ccc_unstable_check, pages, mb);
>> }
>> -LUSTRE_RO_ATTR(unstable_stats);
>> +
>> +static ssize_t unstable_stats_store(struct kobject *kobj,
>> +				    struct attribute *attr,
>> +				    const char *buffer,
>> +				    size_t count)
>> +{
>> +	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
>> +					      ll_kobj);
>> +	char kernbuf[128];
>> +	int val, rc;
>> +
>> +	if (!count)
>> +		return 0;
>> +	if (count < 0 || count >= sizeof(kernbuf))
>> +		return -EINVAL;
>> +
>> +	if (copy_from_user(kernbuf, buffer, count))
>> +		return -EFAULT;
> 
> It was just pointed out to me that this code has obviously never been
> tested at all.

Whoops.

> Sorry for missing this before, do you want me to revert this?  Or will
> you send me a fix?

I'll have a fix for you in a moment.

> Also, go fix your test harness, it's not being used, or is totally buggy :)

Well, at least there was no crash, right? ;)

* [lustre-devel] [PATCH 57/80] staging: lustre: osc: revise unstable pages accounting
@ 2016-10-16 17:16       ` Oleg Drokin
  0 siblings, 0 replies; 188+ messages in thread
From: Oleg Drokin @ 2016-10-16 17:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: James Simmons, devel, Andreas Dilger, Linux Kernel Mailing List,
	Jinshan Xiong, Lustre Development List


On Oct 16, 2016, at 11:14 AM, Greg Kroah-Hartman wrote:

> Digging up an old email...
> 
> On Tue, Aug 16, 2016 at 04:19:10PM -0400, James Simmons wrote:
>> From: Jinshan Xiong <jinshan.xiong@intel.com>
>> 
>> A few changes are made in this patch for unstable pages tracking:
>> 
>> 1. Remove kernel NFS unstable pages tracking because it killed
>>   performance
>> 2. Track unstable pages as part of LRU cache. Otherwise Lustre
>>   can use much more memory than max_cached_mb
>> 3. Remove obd_unstable_pages tracking to avoid using global
>>   atomic counter
>> 4. Make unstable pages track optional. Tracking unstable pages is
>>   turned off by default, and can be controlled by
>>   llite.*.unstable_stats.
>> 
>> Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
>> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4841
>> Reviewed-on: http://review.whamcloud.com/10003
>> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
>> Reviewed-by: Lai Siyao <lai.siyao@intel.com>
>> Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
>> Signed-off-by: James Simmons <jsimmons@infradead.org>
>> ---
>> drivers/staging/lustre/lustre/include/cl_object.h  |   35 +++-
>> .../staging/lustre/lustre/include/obd_support.h    |    1 -
>> drivers/staging/lustre/lustre/llite/lproc_llite.c  |   41 ++++-
>> drivers/staging/lustre/lustre/obdclass/class_obd.c |    2 -
>> drivers/staging/lustre/lustre/osc/osc_cache.c      |   96 +---------
>> drivers/staging/lustre/lustre/osc/osc_internal.h   |    2 +-
>> drivers/staging/lustre/lustre/osc/osc_page.c       |  208 +++++++++++++++++---
>> drivers/staging/lustre/lustre/osc/osc_request.c    |   13 +-
>> 8 files changed, 253 insertions(+), 145 deletions(-)
>> 
>> diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
>> index d269b32..ec6cf7c 100644
>> --- a/drivers/staging/lustre/lustre/include/cl_object.h
>> +++ b/drivers/staging/lustre/lustre/include/cl_object.h
>> @@ -1039,23 +1039,32 @@ do {									  \
>> 	}								     \
>> } while (0)
>> 
>> -static inline int __page_in_use(const struct cl_page *page, int refc)
>> -{
>> -	if (page->cp_type == CPT_CACHEABLE)
>> -		++refc;
>> -	LASSERT(atomic_read(&page->cp_ref) > 0);
>> -	return (atomic_read(&page->cp_ref) > refc);
>> -}
>> -
>> -#define cl_page_in_use(pg)       __page_in_use(pg, 1)
>> -#define cl_page_in_use_noref(pg) __page_in_use(pg, 0)
>> -
>> static inline struct page *cl_page_vmpage(struct cl_page *page)
>> {
>> 	LASSERT(page->cp_vmpage);
>> 	return page->cp_vmpage;
>> }
>> 
>> +/**
>> + * Check if a cl_page is in use.
>> + *
>> + * Client cache holds a refcount, this refcount will be dropped when
>> + * the page is taken out of cache, see vvp_page_delete().
>> + */
>> +static inline bool __page_in_use(const struct cl_page *page, int refc)
>> +{
>> +	return (atomic_read(&page->cp_ref) > refc + 1);
>> +}
>> +
>> +/**
>> + * Caller itself holds a refcount of cl_page.
>> + */
>> +#define cl_page_in_use(pg)	 __page_in_use(pg, 1)
>> +/**
>> + * Caller doesn't hold a refcount.
>> + */
>> +#define cl_page_in_use_noref(pg) __page_in_use(pg, 0)
>> +
>> /** @} cl_page */
>> 
>> /** \addtogroup cl_lock cl_lock
>> @@ -2331,6 +2340,10 @@ struct cl_client_cache {
>> 	 */
>> 	spinlock_t		ccc_lru_lock;
>> 	/**
>> +	 * Set if unstable check is enabled
>> +	 */
>> +	unsigned int		ccc_unstable_check:1;
>> +	/**
>> 	 * # of unstable pages for this mount point
>> 	 */
>> 	atomic_t		ccc_unstable_nr;
>> diff --git a/drivers/staging/lustre/lustre/include/obd_support.h b/drivers/staging/lustre/lustre/include/obd_support.h
>> index 26fdff6..a11fff1 100644
>> --- a/drivers/staging/lustre/lustre/include/obd_support.h
>> +++ b/drivers/staging/lustre/lustre/include/obd_support.h
>> @@ -54,7 +54,6 @@ extern int at_early_margin;
>> extern int at_extra;
>> extern unsigned int obd_sync_filter;
>> extern unsigned int obd_max_dirty_pages;
>> -extern atomic_t obd_unstable_pages;
>> extern atomic_t obd_dirty_pages;
>> extern atomic_t obd_dirty_transit_pages;
>> extern char obd_jobid_var[];
>> diff --git a/drivers/staging/lustre/lustre/llite/lproc_llite.c b/drivers/staging/lustre/lustre/llite/lproc_llite.c
>> index 2f1f389..5f8e78d 100644
>> --- a/drivers/staging/lustre/lustre/llite/lproc_llite.c
>> +++ b/drivers/staging/lustre/lustre/llite/lproc_llite.c
>> @@ -828,10 +828,45 @@ static ssize_t unstable_stats_show(struct kobject *kobj,
>> 	pages = atomic_read(&cache->ccc_unstable_nr);
>> 	mb = (pages * PAGE_SIZE) >> 20;
>> 
>> -	return sprintf(buf, "unstable_pages: %8d\n"
>> -			    "unstable_mb:    %8d\n", pages, mb);
>> +	return sprintf(buf, "unstable_check: %8d\n"
>> +			    "unstable_pages: %8d\n"
>> +			    "unstable_mb:    %8d\n",
>> +			    cache->ccc_unstable_check, pages, mb);
>> }
>> -LUSTRE_RO_ATTR(unstable_stats);
>> +
>> +static ssize_t unstable_stats_store(struct kobject *kobj,
>> +				    struct attribute *attr,
>> +				    const char *buffer,
>> +				    size_t count)
>> +{
>> +	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
>> +					      ll_kobj);
>> +	char kernbuf[128];
>> +	int val, rc;
>> +
>> +	if (!count)
>> +		return 0;
>> +	if (count < 0 || count >= sizeof(kernbuf))
>> +		return -EINVAL;
>> +
>> +	if (copy_from_user(kernbuf, buffer, count))
>> +		return -EFAULT;
> 
> It was just pointed out to me that this code has obviously never been
> tested at all.

Whoops.

> Sorry for missing this before, do you want me to revert this?  Or will
> you send me a fix?

I'll have a fix for you in a moment.

> Also, go fix your test harness, it's not being used, or is totally buggy :)

Well, at least there was no crash, right? ;)
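
The thread does not spell out what is wrong with the quoted handler, so the
following is only one reading, stated as an assumption: a sysfs store callback
is handed a buffer that already lives in kernel memory, so copy_from_user() on
it fails, and "count < 0" can never be true for a size_t. A minimal userspace
sketch of the same parsing step without those two problems might look like
this. The function name parse_unstable_check and the "decimal integer, non-zero
means enabled" input format are hypothetical, not taken from the patch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical userspace analogue of the body of a sysfs ->store() handler.
 * Assumption: "buffer" is ordinary, directly readable memory, so it is
 * copied with memcpy() instead of copy_from_user().
 */
static ssize_t parse_unstable_check(const char *buffer, size_t count, int *out)
{
	char kernbuf[128];
	char *end;
	long val;

	if (!count)
		return 0;
	/* size_t is unsigned, so only the upper bound needs checking. */
	if (count >= sizeof(kernbuf))
		return -EINVAL;

	/* Copy and NUL-terminate so the string parser cannot run off the end. */
	memcpy(kernbuf, buffer, count);
	kernbuf[count] = '\0';

	errno = 0;
	val = strtol(kernbuf, &end, 10);
	if (end == kernbuf || errno)
		return -EINVAL;	/* no digits, or value out of range */

	*out = (val != 0);
	return (ssize_t)count;	/* report the whole write as consumed */
}
```

Calling parse_unstable_check("1\n", 2, &v) consumes both bytes and sets v to 1,
while an oversized or non-numeric write is rejected with -EINVAL.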

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
  2016-08-16 20:18   ` [lustre-devel] " James Simmons
@ 2018-02-09  1:39     ` NeilBrown
  -1 siblings, 0 replies; 188+ messages in thread
From: NeilBrown @ 2018-02-09  1:39 UTC (permalink / raw)
  To: James Simmons, Greg Kroah-Hartman, devel, Andreas Dilger, Oleg Drokin
  Cc: wang di, Linux Kernel Mailing List, Lustre Development List


[-- Attachment #1.1: Type: text/plain, Size: 2813 bytes --]

On Tue, Aug 16 2016, James Simmons wrote:

>  
> +static inline bool
> +lsm_md_eq(const struct lmv_stripe_md *lsm1, const struct lmv_stripe_md *lsm2)
> +{
> +	int idx;
> +
> +	if (lsm1->lsm_md_magic != lsm2->lsm_md_magic ||
> +	    lsm1->lsm_md_stripe_count != lsm2->lsm_md_stripe_count ||
> +	    lsm1->lsm_md_master_mdt_index != lsm2->lsm_md_master_mdt_index ||
> +	    lsm1->lsm_md_hash_type != lsm2->lsm_md_hash_type ||
> +	    lsm1->lsm_md_layout_version != lsm2->lsm_md_layout_version ||
> +	    !strcmp(lsm1->lsm_md_pool_name, lsm2->lsm_md_pool_name))
> +		return false;

Hi James and all,
 This patch (8f18c8a48b736c2f in linux) is different from the
 corresponding patch in lustre-release (60e07b972114df).

In that patch, the last clause in the 'if' condition is

+           strcmp(lsm1->lsm_md_pool_name,
+                     lsm2->lsm_md_pool_name) != 0)

Whoever converted it to "!strcmp()" inverted the condition.  This is a
perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
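
To make the inversion concrete, here is a minimal userspace sketch. The struct
and function names (pool_md, pool_md_eq_buggy, pool_md_eq_fixed) are
hypothetical stand-ins rather than the lustre definitions; the only thing
relied on is that strcmp() returns 0 when the two strings match.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the pool-name field of struct lmv_stripe_md. */
struct pool_md {
	char pool_name[16];
};

/*
 * Shape of the staging version: "!strcmp()" is true when the names MATCH,
 * so two identical names make the function report "not equal".
 */
static bool pool_md_eq_buggy(const struct pool_md *a, const struct pool_md *b)
{
	if (!strcmp(a->pool_name, b->pool_name))
		return false;
	return true;
}

/*
 * Shape of the lustre-release version: "strcmp() != 0" is true when the
 * names DIFFER, which is the intended reason to report "not equal".
 */
static bool pool_md_eq_fixed(const struct pool_md *a, const struct pool_md *b)
{
	if (strcmp(a->pool_name, b->pool_name) != 0)
		return false;
	return true;
}
```

With two entries both named "pool0", pool_md_eq_buggy() returns false and
pool_md_eq_fixed() returns true; with "pool0" against "pool1" the results swap.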

This causes many tests in the 'sanity' test suite to return
-ENOMEM (that had me puzzled for a while!!).
This seems to suggest that no-one has been testing the mainline linux
lustre.
It also seems to suggest that there is a good chance that there
are other bugs that have crept in while no-one has really been caring.
Given that the sanity test suite doesn't complete for me, but just
hangs (in test_27z I think), that seems particularly likely.


So my real question - to anyone interested in lustre for mainline linux
- is: can we actually trust this code at all?
I'm seriously tempted to suggest that we just
  rm -r drivers/staging/lustre

drivers/staging is great for letting the community work on code that has
been "thrown over the wall" and is not openly developed elsewhere, but
that is not the case for lustre.  lustre has (or seems to have) an open
development process.  Having on-going development happen both there and
in drivers/staging seems a waste of resources.

Might it make sense to instead start cleaning up the code in
lustre-release so as to make it meet the upstream kernel standards?
Then when the time is right, the kernel code can be moved *out* of
lustre-release and *in* to linux.  Then development can continue in
Linux (just like it does with other Linux filesystems).

An added bonus of this is that there is an obvious path to getting
server support in mainline Linux.  The current situation of client-only
support seems weird given how interdependent the two are.

What do others think?  Is there any chance that the current lustre in
Linux will ever be more than a poor second-cousin to the external
lustre-release?  If there isn't, should we just discard it now and move
on?

Thanks,
NeilBrown


[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

[-- Attachment #2: Type: text/plain, Size: 169 bytes --]

_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
  2018-02-09  1:39     ` [lustre-devel] " NeilBrown
@ 2018-02-09  2:01       ` Oleg Drokin
  -1 siblings, 0 replies; 188+ messages in thread
From: Oleg Drokin @ 2018-02-09  2:01 UTC (permalink / raw)
  To: NeilBrown
  Cc: devel, Greg Kroah-Hartman, Linux Kernel Mailing List, wang di,
	Andreas Dilger, Lustre Development List


> On Feb 8, 2018, at 8:39 PM, NeilBrown <neilb@suse.com> wrote:
> 
> On Tue, Aug 16 2016, James Simmons wrote:

my that’s an old patch

> 
>> 
>> +static inline bool
>> +lsm_md_eq(const struct lmv_stripe_md *lsm1, const struct lmv_stripe_md *lsm2)
>> +{
>> +	int idx;
>> +
>> +	if (lsm1->lsm_md_magic != lsm2->lsm_md_magic ||
>> +	    lsm1->lsm_md_stripe_count != lsm2->lsm_md_stripe_count ||
>> +	    lsm1->lsm_md_master_mdt_index != lsm2->lsm_md_master_mdt_index ||
>> +	    lsm1->lsm_md_hash_type != lsm2->lsm_md_hash_type ||
>> +	    lsm1->lsm_md_layout_version != lsm2->lsm_md_layout_version ||
>> +	    !strcmp(lsm1->lsm_md_pool_name, lsm2->lsm_md_pool_name))
>> +		return false;
> 
> Hi James and all,
> This patch (8f18c8a48b736c2f in linux) is different from the
> corresponding patch in lustre-release (60e07b972114df).
> 
> In that patch, the last clause in the 'if' condition is
> 
> +           strcmp(lsm1->lsm_md_pool_name,
> +                     lsm2->lsm_md_pool_name) != 0)
> 
> Whoever converted it to "!strcmp()" inverted the condition.  This is a
> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
> 
> This causes many tests in the 'sanity' test suite to return
> -ENOMEM (that had me puzzled for a while!!).

huh? I am not seeing anything of the sort and I was running sanity
all the time until a recent pause (but going to resume).

> This seems to suggest that no-one has been testing the mainline linux
> lustre.
> It also seems to suggest that there is a good chance that there
> are other bugs that have crept in while no-one has really been caring.
> Given that the sanity test suite doesn't complete for me, but just
> hangs (in test_27z I think), that seems particularly likely.

Works for me, here’s a run from earlier today on 4.15.0:
== sanity test 27z: check SEQ/OID on the MDT and OST filesystems ===================================== 16:43:58 (1518126238)
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0169548 s, 61.8 MB/s
2+0 records in
2+0 records out
2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.02782 s, 75.4 MB/s
check file /mnt/lustre/d27z.sanity/f27z.sanity-1
FID seq 0x200000401, oid 0x4640 ver 0x0
LOV seq 0x200000401, oid 0x4640, count: 1
want: stripe:0 ost:0 oid:314/0x13a seq:0
Stopping /mnt/lustre-ost1 (opts:) on centos6-17
pdsh@fedora1: centos6-17: ssh exited with exit code 1
pdsh@fedora1: centos6-17: ssh exited with exit code 1
pdsh@fedora1: centos6-17: ssh exited with exit code 1
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/lustre-ost1
Failed to initialize ZFS library: 256
h2tcp: deprecated, use h2nettype instead
centos6-17.localnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super all -lnet -lnd -pinger 16
pdsh@fedora1: centos6-17: ssh exited with exit code 1
pdsh@fedora1: centos6-17: ssh exited with exit code 1
Started lustre-OST0000
/mnt/lustre-ost1/O/0/d26/314: parent=[0x200000401:0x4640:0x0] stripe=0 stripe_size=0 stripe_count=0
check file /mnt/lustre/d27z.sanity/f27z.sanity-2
FID seq 0x200000401, oid 0x4642 ver 0x0
LOV seq 0x200000401, oid 0x4642, count: 2
want: stripe:0 ost:1 oid:1187/0x4a3 seq:0
Stopping /mnt/lustre-ost2 (opts:) on centos6-17
pdsh@fedora1: centos6-17: ssh exited with exit code 1
pdsh@fedora1: centos6-17: ssh exited with exit code 1
pdsh@fedora1: centos6-17: ssh exited with exit code 1
Starting ost2:   -o loop /tmp/lustre-ost2 /mnt/lustre-ost2
Failed to initialize ZFS library: 256
h2tcp: deprecated, use h2nettype instead
centos6-17.localnet: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super all -lnet -lnd -pinger 16
pdsh@fedora1: centos6-17: ssh exited with exit code 1
pdsh@fedora1: centos6-17: ssh exited with exit code 1
Started lustre-OST0001
/mnt/lustre-ost2/O/0/d3/1187: parent=[0x200000401:0x4642:0x0] stripe=0 stripe_size=0 stripe_count=0
want: stripe:1 ost:0 oid:315/0x13b seq:0
got: objid=0 seq=0 parent=[0x200000401:0x4642:0x0] stripe=1
Resetting fail_loc on all nodes...done.
16:44:32 (1518126272) waiting for centos6-16 network 5 secs ...
16:44:32 (1518126272) network interface is UP
16:44:33 (1518126273) waiting for centos6-17 network 5 secs ...
16:44:33 (1518126273) network interface is UP


> So my real question - to anyone interested in lustre for mainline linux
> - is: can we actually trust this code at all?

Absolutely. Seems that you just stumbled upon a corner case that was not
being hit by people that do the testing, so you have something unique about
your setup, I guess.

> I'm seriously tempted to suggest that we just
>  rm -r drivers/staging/lustre
> 
> drivers/staging is great for letting the community work on code that has
> been "thrown over the wall" and is not openly developed elsewhere, but
> that is not the case for lustre.  lustre has (or seems to have) an open
> development process.  Having on-going development happen both there and
> in drivers/staging seems a waste of resources.

It is a bit of a waste of resources, but there are some other things here.
E.g. we cannot have any APIs with no users in the kernel.
Also some people like to have in-kernel modules coming with their distros
(there were some users that used staging client on ubuntu as their
setup).

Instead the plan was to clean up the staging client into an acceptable state,
move it out of staging, bring in all the missing features and then
drop the client (more or less) from lustre-release.

> Might it make sense to instead start cleaning up the code in
> lustre-release so as to make it meet the upstream kernel standards.
> Then when the time is right, the kernel code can be moved *out* of
> lustre-release and *in* to linux.  Then development can continue in
> Linux (just like it does with other Linux filesystems).

While we can be cleaning lustre in lustre-release, there are some things
we cannot do as easily, e.g. decoupling Lustre client from the server.
Also it would not attract any reviews from all the janitors or
(more importantly) Al Viro and other people with sharp eyes.

> An added bonus of this is that there is an obvious path to getting
> server support in mainline Linux.  The current situation of client-only
> support seems weird given how interdependent the two are.

Given the pushback Lustre client was given I have no hope Lustre server
will get into mainline in my lifetime.

> What do others think?  Is there any chance that the current lustre in
> Linux will ever be more than a poor second-cousin to the external
> lustre-release.  If there isn't, should we just discard it now and move
> on?


I think many useful cleanups and fixes came from the staging tree at
the very least.
The biggest problem with it all is that we are in staging tree so
we cannot bring it to parity much. And we are in staging tree because
there’s a whole bunch of “cleanups” requested that take a lot of effort
(in both implementing them and then in finding other ways of achieving
things that were done in old ways before).
I understand that beggars cannot be choosers and while there are people
that are grandfathered with their atrocities in current kernel tree,
we must adhere to the shining standards first before having our chance,
but the standards are not easy to adhere to in an established sizeable
codebase.

Realistically speaking I suspect if we drop Lustre from staging,
it’s unlikely there would remain any steam behind the cleanup efforts
at all.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
  2018-02-09  2:01       ` [lustre-devel] " Oleg Drokin
@ 2018-02-09  3:10         ` NeilBrown
  -1 siblings, 0 replies; 188+ messages in thread
From: NeilBrown @ 2018-02-09  3:10 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: devel, Greg Kroah-Hartman, Linux Kernel Mailing List, wang di,
	Andreas Dilger, Lustre Development List


[-- Attachment #1.1: Type: text/plain, Size: 5735 bytes --]

On Thu, Feb 08 2018, Oleg Drokin wrote:

>> On Feb 8, 2018, at 8:39 PM, NeilBrown <neilb@suse.com> wrote:
>> 
>> On Tue, Aug 16 2016, James Simmons wrote:
>
> my that’s an old patch
>
>> 
...
>> 
>> Whoever converted it to "!strcmp()" inverted the condition.  This is a
>> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
>> 
>> This causes many tests in the 'sanity' test suite to return
>> -ENOMEM (that had me puzzled for a while!!).
>
> huh? I am not seeing anything of the sort and I was running sanity
> all the time until a recent pause (but going to resume).

That does surprise me - I reproduce it every time.
I have two VMs running a SLE12-SP2 kernel with patches from
lustre-release applied.  These are the servers. They have two 3G virtual
disks each.
I have two other VMs running current mainline.  These are the clients.

I guess your 'recent pause' included the period between v4.15-rc1
(8e55b6fd0660) and v4.15-rc6 (a93639090a27) - a full month when lustre
wouldn't work at all :-(


>
>> This seems to suggest that no-one has been testing the mainline linux
>> lustre.
>> It also seems to suggest that there is a good chance that there
>> are other bugs that have crept in while no-one has really been caring.
>> Given that the sanity test suite doesn't complete for me, but just
>> hangs (in test_27z I think), that seems particularly likely.
>
> Works for me, here’s a run from earlier today on 4.15.0:

Well that's encouraging .. I haven't looked into this one yet - I'm not
even sure where to start.

>
>> So my real question - to anyone interested in lustre for mainline linux
>> - is: can we actually trust this code at all?
>
> Absolutely. Seems that you just stumbled upon a corner case that was not
> being hit by people that do the testing, so you have something unique about
> your setup, I guess.
>
>> I'm seriously tempted to suggest that we just
>>  rm -r drivers/staging/lustre
>> 
>> drivers/staging is great for letting the community work on code that has
>> been "thrown over the wall" and is not openly developed elsewhere, but
>> that is not the case for lustre.  lustre has (or seems to have) an open
>> development process.  Having on-going development happen both there and
>> in drivers/staging seems a waste of resources.
>
> It is a bit of a waste of resources, but there are some other things here.
> E.g. we cannot have any APIs with no users in the kernel.
> Also some people like to have in-kernel modules coming with their distros
> (there were some users that used staging client on ubuntu as their
> setup).
>
> Instead the plan was to clean up the staging client into acceptable state,
> move it out of staging, bring in all the missing features and then
> drop the client (more or less) from the lustre-release.

That sounds like a great plan.  Any idea why it didn't happen?
It seems there is a lot of upstream work mixed in with the clean up, and
I don't think that really helps anyone.

Is it at all realistic that the client might be removed from
lustre-release?  That might be a good goal to work towards.

>
>> Might it make sense to instead start cleaning up the code in
>> lustre-release so as to make it meet the upstream kernel standards.
>> Then when the time is right, the kernel code can be moved *out* of
>> lustre-release and *in* to linux.  Then development can continue in
>> Linux (just like it does with other Linux filesystems).
>
> While we can be cleaning lustre in lustre-release, there are some things
> we cannot do as easily, e.g. decoupling Lustre client from the server.
> Also it would not attract any reviews from all the janitor or
> (more importantly) Al Viro and other people with a sharp eyes.
>
>> An added bonus of this is that there is an obvious path to getting
>> server support in mainline Linux.  The current situation of client-only
>> support seems weird given how interdependent the two are.
>
> Given the pushback Lustre client was given I have no hope Lustre server
> will get into mainline in my lifetime.

Even if it is horrible it would be nice to have it in staging... I guess
the changes required to ext4 prohibit that... I don't suppose it can be
made to work with mainline ext4 in a reduced-functionality-and-performance
way??

I think it would be a lot easier to motivate forward progress if there
were a credible end goal of everything being in mainline.

>
>> What do others think?  Is there any chance that the current lustre in
>> Linux will ever be more than a poor second-cousin to the external
>> lustre-release.  If there isn't, should we just discard it now and move
>> on?
>
>
> I think many useful cleanups and fixes came from the staging tree at
> the very least.
> The biggest problem with it all is that we are in staging tree so
> we cannot bring it to parity much. And we are in staging tree because
> there’s a whole bunch of “cleanups” requested that take a lot of effort
> (in both implementing them and then in finding other ways of achieving
> things that were done in old ways before).

Do you have a list of requested cleanups?  I would find that to be
useful.


> I understand that beggars cannot be choosers and while there are people
> that are grandfathered with their atrocities in current kernel tree,
> we must adhere to the shining standards first before having our chance,
> but the standards are not easy to adhere to in an established sizeable
> codebase.
>
> Realistically speaking I suspect if we drop Lustre from staging,
> it’s unlikely there would remain any steam behind the cleanup efforts
> at all.

Thanks for your thoughts,
NeilBrown

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]


^ permalink raw reply	[flat|nested] 188+ messages in thread

* [lustre-devel] [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
@ 2018-02-09  3:10         ` NeilBrown
  0 siblings, 0 replies; 188+ messages in thread
From: NeilBrown @ 2018-02-09  3:10 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: devel, Greg Kroah-Hartman, Linux Kernel Mailing List, wang di,
	Andreas Dilger, Lustre Development List

On Thu, Feb 08 2018, Oleg Drokin wrote:

>> On Feb 8, 2018, at 8:39 PM, NeilBrown <neilb@suse.com> wrote:
>> 
>> On Tue, Aug 16 2016, James Simmons wrote:
>
> my that’s an old patch
>
>> 
...
>> 
>> Whoever converted it to "!strcmp()" inverted the condition.  This is a
>> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
>> 
>> This causes many tests in the 'sanity' test suite to return
>> -ENOMEM (that had me puzzled for a while!!).
>
> huh? I am not seeing anything of the sort and I was running sanity
> all the time until a recent pause (but going to resume).

That does surprise me - I reproduce it every time.
I have two VMs running a SLE12-SP2 kernel with patches from
lustre-release applied.  These are servers. They have two 3G virtual disks
each.
I have two other VMs running current mainline.  These are clients.

I guess your 'recent pause' included the period between v4.15-rc1 (8e55b6fd0660)
and v4.15-rc6 (a93639090a27) - a full month when lustre wouldn't work at
all :-(


>
>> This seems to suggest that no-one has been testing the mainline linux
>> lustre.
>> It also seems to suggest that there is a good chance that there
>> are other bugs that have crept in while no-one has really been caring.
>> Given that the sanity test suite doesn't complete for me, but just
>> hangs (in test_27z I think), that seems particularly likely.
>
> Works for me, here’s a run from earlier today on 4.15.0:

Well that's encouraging .. I haven't looked into this one yet - I'm not
even sure where to start.

>
>> So my real question - to anyone interested in lustre for mainline linux
>> - is: can we actually trust this code at all?
>
> Absolutely. Seems that you just stumbled upon a corner case that was not
> being hit by people that do the testing, so you have something unique about
> your setup, I guess.
>
>> I'm seriously tempted to suggest that we just
>>  rm -r drivers/staging/lustre
>> 
>> drivers/staging is great for letting the community work on code that has
>> been "thrown over the wall" and is not openly developed elsewhere, but
>> that is not the case for lustre.  lustre has (or seems to have) an open
>> development process.  Having on-going development happen both there and
>> in drivers/staging seems a waste of resources.
>
> It is a bit of a waste of resources, but there are some other things here.
> E.g. we cannot have any APIs with no users in the kernel.
> Also some people like to have in-kernel modules coming with their distros
> (there were some users that used staging client on ubuntu as their
> setup).
>
> Instead the plan was to clean up the staging client into acceptable state,
> move it out of staging, bring in all the missing features and then
> drop the client (more or less) from the lustre-release.

That sounds like a great plan.  Any idea why it didn't happen?
It seems there is a lot of upstream work mixed in with the clean up, and
I don't think that really helps anyone.

Is it at all realistic that the client might be removed from
lustre-release?  That might be a good goal to work towards.

>
>> Might it make sense to instead start cleaning up the code in
>> lustre-release so as to make it meet the upstream kernel standards.
>> Then when the time is right, the kernel code can be moved *out* of
>> lustre-release and *in* to linux.  Then development can continue in
>> Linux (just like it does with other Linux filesystems).
>
> While we can be cleaning lustre in lustre-release, there are some things
> we cannot do as easily, e.g. decoupling Lustre client from the server.
> Also it would not attract any reviews from all the janitor or
> (more importantly) Al Viro and other people with a sharp eyes.
>
>> An added bonus of this is that there is an obvious path to getting
>> server support in mainline Linux.  The current situation of client-only
>> support seems weird given how interdependent the two are.
>
> Given the pushback Lustre client was given I have no hope Lustre server
> will get into mainline in my lifetime.

Even if it is horrible it would be nice to have it in staging... I guess
the changes required to ext4 prohibit that... I don't suppose it can be
made to work with mainline ext4 in a reduced-functionality-and-performance
way??

I think it would be a lot easier to motivate forward progress if there
were a credible end goal of everything being in mainline.

>
>> What do others think?  Is there any chance that the current lustre in
>> Linux will ever be more than a poor second-cousin to the external
>> lustre-release.  If there isn't, should we just discard it now and move
>> on?
>
>
> I think many useful cleanups and fixes came from the staging tree at
> the very least.
> The biggest problem with it all is that we are in staging tree so
> we cannot bring it to parity much. And we are in staging tree because
> there’s a whole bunch of “cleanups” requested that take a lot of effort
> (in both implementing them and then in finding other ways of achieving
> things that were done in old ways before).

Do you have a list of requested cleanups?  I would find that to be
useful.


> I understand that beggars cannot be choosers and while there are people
> that are grandfathered with their atrocities in current kernel tree,
> we must adhere to the shining standards first before having our chance,
> but the standards are not easy to adhere to in an established sizeable
> codebase.
>
> Realistically speaking I suspect if we drop Lustre from staging,
> it’s unlikely there would remain any steam behind the cleanup efforts
> at all.

Thanks for your thoughts,
NeilBrown

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [lustre-devel] [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
  2018-02-09  3:10         ` [lustre-devel] " NeilBrown
@ 2018-02-09  3:50           ` Oleg Drokin
  -1 siblings, 0 replies; 188+ messages in thread
From: Oleg Drokin @ 2018-02-09  3:50 UTC (permalink / raw)
  To: NeilBrown
  Cc: devel, Greg Kroah-Hartman, wang di, Linux Kernel Mailing List,
	Lustre Development List


> On Feb 8, 2018, at 10:10 PM, NeilBrown <neilb@suse.com> wrote:
> 
> On Thu, Feb 08 2018, Oleg Drokin wrote:
> 
>>> On Feb 8, 2018, at 8:39 PM, NeilBrown <neilb@suse.com> wrote:
>>> 
>>> On Tue, Aug 16 2016, James Simmons wrote:
>> 
>> my that’s an old patch
>> 
>>> 
> ...
>>> 
>>> Whoever converted it to "!strcmp()" inverted the condition.  This is a
>>> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
>>> 
>>> This causes many tests in the 'sanity' test suite to return
>>> -ENOMEM (that had me puzzled for a while!!).
>> 
>> huh? I am not seeing anything of the sort and I was running sanity
>> all the time until a recent pause (but going to resume).
> 
> That does surprised me - I reproduce it every time.
> I have two VMs running a SLE12-SP2 kernel with patches from
> lustre-release applied.  These are servers. They have 2 3G virtual disks
> each.
> I have two over VMs running current mainline.  These are clients.
> 
> I guess your 'recent pause' included between v4.15-rc1 (8e55b6fd0660)
> and v4.15-rc6 (a93639090a27) - a full month when lustre wouldn't work at
> all :-(

More than that, but I am pretty sure James Simmons is running tests all the time too
(he has a different config, I only have tcp).

>>> This seems to suggest that no-one has been testing the mainline linux
>>> lustre.
>>> It also seems to suggest that there is a good chance that there
>>> are other bugs that have crept in while no-one has really been caring.
>>> Given that the sanity test suite doesn't complete for me, but just
>>> hangs (in test_27z I think), that seems particularly likely.
>> 
>> Works for me, here’s a run from earlier today on 4.15.0:
> 
> Well that's encouraging .. I haven't looked into this one yet - I'm not
> even sure where to start.

m… debug logs for example (greatly neutered in staging tree, but still useful)?
try lctl dk and see what’s in there.

>> Instead the plan was to clean up the staging client into acceptable state,
>> move it out of staging, bring in all the missing features and then
>> drop the client (more or less) from the lustre-release.
> 
> That sounds like a great plan.  Any idea why it didn't happen?

Because meeting open-ended demands is hard and certain demands sound like
“throw away your X and rewrite it from scratch" (e.g. everything IB-related).

Certain things that sound useless (like the debug subsystem in Lustre)
are very useful when you have 10k nodes in a cluster and need to selectively
pull stuff from a run to debug a complicated cross-node interaction.
I asked NFS people how they do it, and they don’t have anything that scales;
their approach usually involves reducing the problem to a much smaller set of nodes first.

> It seems there is a lot of upstream work mixed in with the clean up, and
> I don't think that really helps anyone.

I don’t understand what you mean here.

> Is it at all realistic that the client might be removed from
> lustre-release?  That might be a good goal to work towards.

Assuming we can bring the whole functionality over - sure.

Of course there’d still be some separate development place and we would
need to create patches (new features?) for like SuSE and other distros
and for testing of server features, I guess, but that could be just that -
a side branch somewhere I hope.

It’s not that we are super glad to chase every kernel that vendors put out;
of course it would be much easier if the kernels already included
a very functional Lustre client.

>>> Might it make sense to instead start cleaning up the code in
>>> lustre-release so as to make it meet the upstream kernel standards.
>>> Then when the time is right, the kernel code can be moved *out* of
>>> lustre-release and *in* to linux.  Then development can continue in
>>> Linux (just like it does with other Linux filesystems).
>> 
>> While we can be cleaning lustre in lustre-release, there are some things
>> we cannot do as easily, e.g. decoupling Lustre client from the server.
>> Also it would not attract any reviews from all the janitor or
>> (more importantly) Al Viro and other people with a sharp eyes.
>> 
>>> An added bonus of this is that there is an obvious path to getting
>>> server support in mainline Linux.  The current situation of client-only
>>> support seems weird given how interdependent the two are.
>> 
>> Given the pushback Lustre client was given I have no hope Lustre server
>> will get into mainline in my lifetime.
> 
> Even if it is horrible it would be nice to have it in staging... I guess
> the changes required to ext4 prohibit that... I don't suppose it can be
> made to work with mainline ext4 in a reduced-functionality-and-performance
> way??

We support unpatched ZFS as a server too! ;)
(and if somebody invests the time into it, there was some half-baked btrfs
backend too I think).
That said nobody here believes in any success of pushing Lustre server into
mainline.
It would just be easier to push the whole server into userspace (And there
was a project like this in the past, now abandoned because it was mostly
targeting Solaris anyway).

> I think it would be a lot easier to motivate forward progress if there
> were a credible end goal of everything being in mainline.
> 
>> 
>>> What do others think?  Is there any chance that the current lustre in
>>> Linux will ever be more than a poor second-cousin to the external
>>> lustre-release.  If there isn't, should we just discard it now and move
>>> on?
>> 
>> 
>> I think many useful cleanups and fixes came from the staging tree at
>> the very least.
>> The biggest problem with it all is that we are in staging tree so
>> we cannot bring it to parity much. And we are in staging tree because
>> there’s a whole bunch of “cleanups” requested that take a lot of effort
>> (in both implementing them and then in finding other ways of achieving
>> things that were done in old ways before).
> 
> Do you have a list of requested cleanups?  I would find that to be
> useful.

As Greg would tell you, “if you don’t know what needs to be done,
let’s just remove the whole thing from staging now”.

I assume you saw drivers/staging/lustre/TODO already, it’s only partially done.

We had a bunch of other requests from various people ranging from wholesale
removal of various parts to making sure there’s no checkpatch warnings
(Turned out rather hard to do, even though we greatly pared the numbers).

I have some patches to make Lustre a lot more monolithic too.
People want us to remove our indirections hell so the code is more readable
(I have some patches that need to be freshened up some that help here a bit,
but the work is huge.)

Other requests come out as some of the prior ones get completed due to
“you need to finish current level of cleanups so that we can see what other
cleanups are needed, the current code is too bad to see everything” pretty much.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 188+ messages in thread

* [lustre-devel] [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
@ 2018-02-09  3:50           ` Oleg Drokin
  0 siblings, 0 replies; 188+ messages in thread
From: Oleg Drokin @ 2018-02-09  3:50 UTC (permalink / raw)
  To: NeilBrown
  Cc: devel, Greg Kroah-Hartman, wang di, Linux Kernel Mailing List,
	Lustre Development List


> On Feb 8, 2018, at 10:10 PM, NeilBrown <neilb@suse.com> wrote:
> 
> On Thu, Feb 08 2018, Oleg Drokin wrote:
> 
>>> On Feb 8, 2018, at 8:39 PM, NeilBrown <neilb@suse.com> wrote:
>>> 
>>> On Tue, Aug 16 2016, James Simmons wrote:
>> 
>> my that’s an old patch
>> 
>>> 
> ...
>>> 
>>> Whoever converted it to "!strcmp()" inverted the condition.  This is a
>>> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
>>> 
>>> This causes many tests in the 'sanity' test suite to return
>>> -ENOMEM (that had me puzzled for a while!!).
>> 
>> huh? I am not seeing anything of the sort and I was running sanity
>> all the time until a recent pause (but going to resume).
> 
> That does surprised me - I reproduce it every time.
> I have two VMs running a SLE12-SP2 kernel with patches from
> lustre-release applied.  These are servers. They have 2 3G virtual disks
> each.
> I have two over VMs running current mainline.  These are clients.
> 
> I guess your 'recent pause' included between v4.15-rc1 (8e55b6fd0660)
> and v4.15-rc6 (a93639090a27) - a full month when lustre wouldn't work at
> all :-(

More than that, but I am pretty sure James Simmons is running tests all the time too
(he has a different config, I only have tcp).

>>> This seems to suggest that no-one has been testing the mainline linux
>>> lustre.
>>> It also seems to suggest that there is a good chance that there
>>> are other bugs that have crept in while no-one has really been caring.
>>> Given that the sanity test suite doesn't complete for me, but just
>>> hangs (in test_27z I think), that seems particularly likely.
>> 
>> Works for me, here’s a run from earlier today on 4.15.0:
> 
> Well that's encouraging .. I haven't looked into this one yet - I'm not
> even sure where to start.

m… debug logs for example (greatly neutered in staging tree, but still useful)?
try lctl dk and see what’s in there.

>> Instead the plan was to clean up the staging client into acceptable state,
>> move it out of staging, bring in all the missing features and then
>> drop the client (more or less) from the lustre-release.
> 
> That sounds like a great plan.  Any idea why it didn't happen?

Because meeting open-ended demands is hard and certain demands sound like
“throw away your X and rewrite it from scratch" (e.g. everything IB-related).

Certain things that sound useless (like the debug subsystem in Lustre)
are very useful when you have 10k nodes in a cluster and need to selectively
pull stuff from a run to debug a complicated cross-node interaction.
I asked NFS people how they do it, and they don’t have anything that scales;
their approach usually involves reducing the problem to a much smaller set of nodes first.

> It seems there is a lot of upstream work mixed in with the clean up, and
> I don't think that really helps anyone.

I don’t understand what you mean here.

> Is it at all realistic that the client might be removed from
> lustre-release?  That might be a good goal to work towards.

Assuming we can bring the whole functionality over - sure.

Of course there’d still be some separate development place and we would
need to create patches (new features?) for like SuSE and other distros
and for testing of server features, I guess, but that could be just that -
a side branch somewhere I hope.

It’s not that we are super glad to chase every kernel that vendors put out;
of course it would be much easier if the kernels already included
a very functional Lustre client.

>>> Might it make sense to instead start cleaning up the code in
>>> lustre-release so as to make it meet the upstream kernel standards.
>>> Then when the time is right, the kernel code can be moved *out* of
>>> lustre-release and *in* to linux.  Then development can continue in
>>> Linux (just like it does with other Linux filesystems).
>> 
>> While we can be cleaning lustre in lustre-release, there are some things
>> we cannot do as easily, e.g. decoupling Lustre client from the server.
>> Also it would not attract any reviews from all the janitor or
>> (more importantly) Al Viro and other people with a sharp eyes.
>> 
>>> An added bonus of this is that there is an obvious path to getting
>>> server support in mainline Linux.  The current situation of client-only
>>> support seems weird given how interdependent the two are.
>> 
>> Given the pushback Lustre client was given I have no hope Lustre server
>> will get into mainline in my lifetime.
> 
> Even if it is horrible it would be nice to have it in staging... I guess
> the changes required to ext4 prohibit that... I don't suppose it can be
> made to work with mainline ext4 in a reduced-functionality-and-performance
> way??

We support unpatched ZFS as a server too! ;)
(and if somebody invests the time into it, there was some half-baked btrfs
backend too I think).
That said nobody here believes in any success of pushing Lustre server into
mainline.
It would just be easier to push the whole server into userspace (And there
was a project like this in the past, now abandoned because it was mostly
targeting Solaris anyway).

> I think it would be a lot easier to motivate forward progress if there
> were a credible end goal of everything being in mainline.
> 
>> 
>>> What do others think?  Is there any chance that the current lustre in
>>> Linux will ever be more than a poor second-cousin to the external
>>> lustre-release.  If there isn't, should we just discard it now and move
>>> on?
>> 
>> 
>> I think many useful cleanups and fixes came from the staging tree at
>> the very least.
>> The biggest problem with it all is that we are in staging tree so
>> we cannot bring it to parity much. And we are in staging tree because
>> there’s a whole bunch of “cleanups” requested that take a lot of effort
>> (in both implementing them and then in finding other ways of achieving
>> things that were done in old ways before).
> 
> Do you have a list of requested cleanups?  I would find that to be
> useful.

As Greg would tell you, “if you don’t know what needs to be done,
let’s just remove the whole thing from staging now”.

I assume you saw drivers/staging/lustre/TODO already, it’s only partially done.

We had a bunch of other requests from various people ranging from wholesale
removal of various parts to making sure there’s no checkpatch warnings
(Turned out rather hard to do, even though we greatly pared the numbers).

I have some patches to make Lustre a lot more monolithic too.
People want us to remove our indirections hell so the code is more readable
(I have some patches that need to be freshened up some that help here a bit,
but the work is huge.)

Other requests come out as some of the prior ones get completed due to
“you need to finish current level of cleanups so that we can see what other
cleanups are needed, the current code is too bad to see everything” pretty much.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [lustre-devel] [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
  2018-02-09  3:50           ` Oleg Drokin
@ 2018-02-10 20:57             ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2018-02-10 20:57 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: devel, Greg Kroah-Hartman, NeilBrown, Linux Kernel Mailing List,
	wang di, Lustre Development List


> > On Feb 8, 2018, at 10:10 PM, NeilBrown <neilb@suse.com> wrote:
> > 
> > On Thu, Feb 08 2018, Oleg Drokin wrote:
> > 
> >>> On Feb 8, 2018, at 8:39 PM, NeilBrown <neilb@suse.com> wrote:
> >>> 
> >>> On Tue, Aug 16 2016, James Simmons wrote:
> >> 
> >> my that’s an old patch
> >> 
> >>> 
> > ...
> >>> 
> >>> Whoever converted it to "!strcmp()" inverted the condition.  This is a
> >>> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
> >>> 
> >>> This causes many tests in the 'sanity' test suite to return
> >>> -ENOMEM (that had me puzzled for a while!!).
> >> 
> >> huh? I am not seeing anything of the sort and I was running sanity
> >> all the time until a recent pause (but going to resume).
> > 
> > That does surprised me - I reproduce it every time.
> > I have two VMs running a SLE12-SP2 kernel with patches from
> > lustre-release applied.  These are servers. They have 2 3G virtual disks
> > each.
> > I have two over VMs running current mainline.  These are clients.
> > 
> > I guess your 'recent pause' included between v4.15-rc1 (8e55b6fd0660)
> > and v4.15-rc6 (a93639090a27) - a full month when lustre wouldn't work at
> > all :-(
> 
> More than that, but I am pretty sure James Simmons is running tests all the time too
> (he has a different config, I only have tcp).

Yes I have been testing and haven't encountered this problem. Let me try 
the fix you pointed out. 
 
> > Do you have a list of requested cleanups?  I would find that to be
> > useful.
> 
> As Greg would tell you, “if you don’t know what needs to be done,
> let’s just remove the whole thing from staging now”.
> 
> I assume you saw drivers/staging/lustre/TODO already, it’s only partially done.

Actually the complete list is at :

https://jira.hpdd.intel.com/browse/LU-9679

I need to move that to our TODO list. Sorry I have been short on cycles.


^ permalink raw reply	[flat|nested] 188+ messages in thread

* [lustre-devel] [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
@ 2018-02-10 20:57             ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2018-02-10 20:57 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: devel, Greg Kroah-Hartman, NeilBrown, Linux Kernel Mailing List,
	wang di, Lustre Development List


> > On Feb 8, 2018, at 10:10 PM, NeilBrown <neilb@suse.com> wrote:
> > 
> > On Thu, Feb 08 2018, Oleg Drokin wrote:
> > 
> >>> On Feb 8, 2018, at 8:39 PM, NeilBrown <neilb@suse.com> wrote:
> >>> 
> >>> On Tue, Aug 16 2016, James Simmons wrote:
> >> 
> > >> my that’s an old patch
> >> 
> >>> 
> > ...
> >>> 
> >>> Whoever converted it to "!strcmp()" inverted the condition.  This is a
> >>> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
> >>> 
> >>> This causes many tests in the 'sanity' test suite to return
> >>> -ENOMEM (that had me puzzled for a while!!).
> >> 
> >> huh? I am not seeing anything of the sort and I was running sanity
> >> all the time until a recent pause (but going to resume).
> > 
> > That does surprised me - I reproduce it every time.
> > I have two VMs running a SLE12-SP2 kernel with patches from
> > lustre-release applied.  These are servers. They have 2 3G virtual disks
> > each.
> > I have two over VMs running current mainline.  These are clients.
> > 
> > I guess your 'recent pause' included between v4.15-rc1 (8e55b6fd0660)
> > and v4.15-rc6 (a93639090a27) - a full month when lustre wouldn't work at
> > all :-(
> 
> More than that, but I am pretty sure James Simmons is running tests all the time too
> (he has a different config, I only have tcp).

Yes I have been testing and haven't encountered this problem. Let me try 
the fix you pointed out. 
 
> > Do you have a list of requested cleanups?  I would find that to be
> > useful.
> 
> > As Greg would tell you, “if you don’t know what needs to be done,
> > let’s just remove the whole thing from staging now”.
> >
> > I assume you saw drivers/staging/lustre/TODO already, it’s only partially done.

Actually the complete list is at :

https://jira.hpdd.intel.com/browse/LU-9679

I need to move that to our TODO list. Sorry I have been short on cycles.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
  2018-02-09  1:39     ` [lustre-devel] " NeilBrown
@ 2018-02-10 22:14       ` James Simmons
  -1 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2018-02-10 22:14 UTC (permalink / raw)
  To: NeilBrown
  Cc: devel, Andreas Dilger, Greg Kroah-Hartman,
	Linux Kernel Mailing List, Oleg Drokin, wang di,
	Lustre Development List


> > +static inline bool
> > +lsm_md_eq(const struct lmv_stripe_md *lsm1, const struct lmv_stripe_md *lsm2)
> > +{
> > +	int idx;
> > +
> > +	if (lsm1->lsm_md_magic != lsm2->lsm_md_magic ||
> > +	    lsm1->lsm_md_stripe_count != lsm2->lsm_md_stripe_count ||
> > +	    lsm1->lsm_md_master_mdt_index != lsm2->lsm_md_master_mdt_index ||
> > +	    lsm1->lsm_md_hash_type != lsm2->lsm_md_hash_type ||
> > +	    lsm1->lsm_md_layout_version != lsm2->lsm_md_layout_version ||
> > +	    !strcmp(lsm1->lsm_md_pool_name, lsm2->lsm_md_pool_name))
> > +		return false;
> 
> Hi James and all,
>  This patch (8f18c8a48b736c2f in linux) is different from the
>  corresponding patch in lustre-release (60e07b972114df).
> 
> In that patch, the last clause in the 'if' condition is
> 
> +           strcmp(lsm1->lsm_md_pool_name,
> +                     lsm2->lsm_md_pool_name) != 0)
> 
> Whoever converted it to "!strcmp()" inverted the condition.  This is a
> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
> 
> This causes many tests in the 'sanity' test suite to return
> -ENOMEM (that had me puzzled for a while!!).
> This seems to suggest that no-one has been testing the mainline linux
> lustre.
> It also seems to suggest that there is a good chance that there
> are other bugs that have crept in while no-one has really been caring.
> Given that the sanity test suite doesn't complete for me, but just
> hangs (in test_27z I think), that seems particularly likely.
> 
> 
> So my real question - to anyone interested in lustre for mainline linux
> - is: can we actually trust this code at all?
> I'm seriously tempted to suggest that we just
>   rm -r drivers/staging/lustre
> 
> drivers/staging is great for letting the community work on code that has
> been "thrown over the wall" and is not openly developed elsewhere, but
> that is not the case for lustre.  lustre has (or seems to have) an open
> development process.  Having on-going development happen both there and
> in drivers/staging seems a waste of resources.
> 
> Might it make sense to instead start cleaning up the code in
> lustre-release so as to make it meet the upstream kernel standards.
> Then when the time is right, the kernel code can be moved *out* of
> lustre-release and *in* to linux.  Then development can continue in
> Linux (just like it does with other Linux filesystems).
> 
> An added bonus of this is that there is an obvious path to getting
> server support in mainline Linux.  The current situation of client-only
> support seems weird given how interdependent the two are.
> 
> What do others think?  Is there any chance that the current lustre in
> Linux will ever be more than a poor second-cousin to the external
> lustre-release.  If there isn't, should we just discard it now and move
> on?

If you think that the OpenSFS/Intel branch (lustre-release) is the land
of milk and honey you are very wrong. Take for example the UAPI header
cleanup I pushed to the linux client several months ago. That work took
5 years to complete. I had to complete that work in the Intel branch
since it impacted our tools. This isn't the only example. I worked
alongside Intel on increasing the striping of a file beyond the 160 stripe
limit Lustre used to have. That work took 3 years to complete. If the
patch is more than one line it will normally take 1 to 2 months to land.
It is common to have patches 6 months or more in age.

This is one of the major reasons I'm involved in the upstream client
work. If lustre remains a tiny undermanned community it is doomed to
remain a niche file system. For years I have tried to recruit new
developers to help out and even gave talks at lustre conferences on
internals. That effort was met with little success. This is not the
case with the linux lustre client. We do have people contributing,
including you. So the reality is that if we removed the lustre client
it would be at least 3+ years before the code would be ready to be merged
back in. It would be another 3+ years before it left staging. Many
cleanups in the linux client which impact many lines of code have not
been ported to the Intel branch. It would take forever to get those in.
Honestly I gave up some time ago on those types of cleanups. The cleanups
done in the upstream client would have to be redone. What we really
need is to expand the community. Recently a lot of work has gone into
supporting Ubuntu for our utilities. I hope this helps to get Canonical
involved with the upstream lustre client.

The upstream client is not as bad as you think. A year ago no one in
their right mind would touch the upstream client but there are actually
sites using it today. It's not perfect but it is usable and it is improving
all the time. Yes we have quite a few bugs to squash that show up in
our test suite but the barrier to leaving staging is much much smaller
than it used to be. Once the number of bugs reported in the test suite
becomes reasonable we can start auto testing patches posted here. The
ultimate goal is that as more people join in the linux client effort
and it becomes a full member of the broader linux open source community
that we can leave the Intel lustre-release branch in the dust. I believe
the future is much closer than you think.
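
For anyone following the "!strcmp()" point quoted at the top of this
message, the inversion can be reproduced in isolation. What follows is only
a minimal sketch under stated assumptions: struct stripe_md, POOL_NAME_LEN,
md_eq_inverted() and md_eq_fixed() are hypothetical stand-ins invented for
illustration, not the real struct lmv_stripe_md or lsm_md_eq(), and only two
of the compared fields are kept.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct lmv_stripe_md. The field set
 * and the pool-name length are assumptions made for this sketch only. */
#define POOL_NAME_LEN 16

struct stripe_md {
	unsigned int stripe_count;
	char pool_name[POOL_NAME_LEN];
};

/* Shape of the staging version: "!strcmp()" is true when the two names
 * MATCH, so identical descriptors are reported as not equal. */
static bool md_eq_inverted(const struct stripe_md *a,
			   const struct stripe_md *b)
{
	if (a->stripe_count != b->stripe_count ||
	    !strcmp(a->pool_name, b->pool_name))
		return false;
	return true;
}

/* Shape of the lustre-release version: "strcmp() != 0" is true when the
 * two names DIFFER, which is the intended mismatch test. */
static bool md_eq_fixed(const struct stripe_md *a,
			const struct stripe_md *b)
{
	if (a->stripe_count != b->stripe_count ||
	    strcmp(a->pool_name, b->pool_name) != 0)
		return false;
	return true;
}
```

With two descriptors that both have a stripe count of 4 and the pool name
"pool0", md_eq_inverted() returns false and md_eq_fixed() returns true.
Change one pool name to "pool1" and the results swap, which is exactly the
inversion described in the quoted report.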

^ permalink raw reply	[flat|nested] 188+ messages in thread

* [lustre-devel] [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
@ 2018-02-10 22:14       ` James Simmons
  0 siblings, 0 replies; 188+ messages in thread
From: James Simmons @ 2018-02-10 22:14 UTC (permalink / raw)
  To: NeilBrown
  Cc: devel, Andreas Dilger, Greg Kroah-Hartman,
	Linux Kernel Mailing List, Oleg Drokin, wang di,
	Lustre Development List


> > +static inline bool
> > +lsm_md_eq(const struct lmv_stripe_md *lsm1, const struct lmv_stripe_md *lsm2)
> > +{
> > +	int idx;
> > +
> > +	if (lsm1->lsm_md_magic != lsm2->lsm_md_magic ||
> > +	    lsm1->lsm_md_stripe_count != lsm2->lsm_md_stripe_count ||
> > +	    lsm1->lsm_md_master_mdt_index != lsm2->lsm_md_master_mdt_index ||
> > +	    lsm1->lsm_md_hash_type != lsm2->lsm_md_hash_type ||
> > +	    lsm1->lsm_md_layout_version != lsm2->lsm_md_layout_version ||
> > +	    !strcmp(lsm1->lsm_md_pool_name, lsm2->lsm_md_pool_name))
> > +		return false;
> 
> Hi James and all,
>  This patch (8f18c8a48b736c2f in linux) is different from the
>  corresponding patch in lustre-release (60e07b972114df).
> 
> In that patch, the last clause in the 'if' condition is
> 
> +           strcmp(lsm1->lsm_md_pool_name,
> +                     lsm2->lsm_md_pool_name) != 0)
> 
> Whoever converted it to "!strcmp()" inverted the condition.  This is a
> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
> 
> This causes many tests in the 'sanity' test suite to return
> -ENOMEM (that had me puzzled for a while!!).
> This seems to suggest that no-one has been testing the mainline linux
> lustre.
> It also seems to suggest that there is a good chance that there
> are other bugs that have crept in while no-one has really been caring.
> Given that the sanity test suite doesn't complete for me, but just
> hangs (in test_27z I think), that seems particularly likely.
> 
> 
> So my real question - to anyone interested in lustre for mainline linux
> - is: can we actually trust this code at all?
> I'm seriously tempted to suggest that we just
>   rm -r drivers/staging/lustre
> 
> drivers/staging is great for letting the community work on code that has
> been "thrown over the wall" and is not openly developed elsewhere, but
> that is not the case for lustre.  lustre has (or seems to have) an open
> development process.  Having on-going development happen both there and
> in drivers/staging seems a waste of resources.
> 
> Might it make sense to instead start cleaning up the code in
> lustre-release so as to make it meet the upstream kernel standards.
> Then when the time is right, the kernel code can be moved *out* of
> lustre-release and *in* to linux.  Then development can continue in
> Linux (just like it does with other Linux filesystems).
> 
> An added bonus of this is that there is an obvious path to getting
> server support in mainline Linux.  The current situation of client-only
> support seems weird given how interdependent the two are.
> 
> What do others think?  Is there any chance that the current lustre in
> Linux will ever be more than a poor second-cousin to the external
> lustre-release.  If there isn't, should we just discard it now and move
> on?
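
For readers following the "!strcmp()" point quoted above, here is a minimal
userspace sketch of the inversion. It is not the kernel code: struct pool_md
and both helper names are hypothetical stand-ins for the pool-name clause of
lsm_md_eq(), kept only to show how the two spellings differ.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the pool-name field of struct lmv_stripe_md. */
struct pool_md {
	char pool_name[16];
};

/* Inverted form: "!strcmp()" is true when the names MATCH, so this
 * reports "not equal" for identical names and "equal" for different ones.
 */
static bool pool_eq_inverted(const struct pool_md *a, const struct pool_md *b)
{
	if (!strcmp(a->pool_name, b->pool_name))
		return false;
	return true;
}

/* Intended form: "strcmp() != 0" is true when the names DIFFER, so the
 * early "return false" fires only on a real mismatch.
 */
static bool pool_eq(const struct pool_md *a, const struct pool_md *b)
{
	if (strcmp(a->pool_name, b->pool_name) != 0)
		return false;
	return true;
}
```

With two structures that both carry the name "pool0", pool_eq() returns true
while pool_eq_inverted() returns false, which is the behaviour change
described in the quoted report.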

If you think that the OpenSFS/Intel branch (lustre-release) is the land
of milk and honey you are very wrong. Take for example the UAPI header
cleanup I pushed to the linux client several months ago. That work took
5 years to complete. I had to complete that work in the Intel branch
since it impacted our tools. This isn't the only example. I worked
alongside Intel on increasing the striping of a file beyond the 160 stripe
limit Lustre used to have. That work took 3 years to complete. If the
patch is more than one line it will normally take 1 to 2 months to land.
It is common to have patches 6 months or more in age.

This is one of the major reasons I'm involved in the upstream client
work. If lustre remains a tiny undermanned community it is doomed to
remain a niche file system. For years I have tried to recruit new
developers to help out and even gave talks at lustre conferences on
internals. That effort was met with little success. This is not the
case with the linux lustre client. We do have people contributing
including you. So the reality is that if we removed the lustre client
it would be at least 3+ years before the code would be ready to be merged
back in. It would be another 3+ years before it left staging. Many
cleanups in the linux client which impact many lines of code have not
been ported to the Intel branch. It would take forever to get those in.
Honestly I gave up some time ago for those types of cleanups. The cleanups
done in the upstream client would have to be redone. What we really
need is to expand the community. Recently a lot of work has gone into
supporting Ubuntu for our utilities. I hope this helps to get Canonical
involved with the upstream lustre client.

The upstream client is not as bad as you think. A year ago no one in
their right mind would touch the upstream client but there are actually
sites using it today. It's not perfect but it is usable and it is improving
all the time. Yes we have quite a few bugs to squash that show up in
our test suite but the barrier to leaving staging is much much smaller
than it used to be. Once the number of bugs reported in test suite
becomes reasonable we can start auto testing patches posted here. The
ultimate goal is that as more people join in the linux client effort
and it becomes a full member of the broader linux open source community
that we can leave the Intel lustre-release branch in the dust. I believe
the future is much closer than you think.

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [lustre-devel] [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
  2018-02-09  3:50           ` Oleg Drokin
@ 2018-02-11 23:44             ` NeilBrown
  -1 siblings, 0 replies; 188+ messages in thread
From: NeilBrown @ 2018-02-11 23:44 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: devel, Greg Kroah-Hartman, wang di, Linux Kernel Mailing List,
	Lustre Development List


[-- Attachment #1.1: Type: text/plain, Size: 10620 bytes --]

On Thu, Feb 08 2018, Oleg Drokin wrote:

>> On Feb 8, 2018, at 10:10 PM, NeilBrown <neilb@suse.com> wrote:
>> 
>> On Thu, Feb 08 2018, Oleg Drokin wrote:
>> 
>>>> On Feb 8, 2018, at 8:39 PM, NeilBrown <neilb@suse.com> wrote:
>>>> 
>>>> On Tue, Aug 16 2016, James Simmons wrote:
>>> 
>>> my that’s an old patch
>>> 
>>>> 
>> ...
>>>> 
>>>> Whoever converted it to "!strcmp()" inverted the condition.  This is a
>>>> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
>>>> 
>>>> This causes many tests in the 'sanity' test suite to return
>>>> -ENOMEM (that had me puzzled for a while!!).
>>> 
>>> huh? I am not seeing anything of the sort and I was running sanity
>>> all the time until a recent pause (but going to resume).
>> 
>> That does surprised me - I reproduce it every time.
>> I have two VMs running a SLE12-SP2 kernel with patches from
>> lustre-release applied.  These are servers. They have 2 3G virtual disks
>> each.
>> I have two over VMs running current mainline.  These are clients.
>> 
>> I guess your 'recent pause' included between v4.15-rc1 (8e55b6fd0660)
>> and v4.15-rc6 (a93639090a27) - a full month when lustre wouldn't work at
>> all :-(
>
> More than that, but I am pretty sure James Simmons is running tests all the time too
> (he has a different config, I only have tcp).
>
>>>> This seems to suggest that no-one has been testing the mainline linux
>>>> lustre.
>>>> It also seems to suggest that there is a good chance that there
>>>> are other bugs that have crept in while no-one has really been caring.
>>>> Given that the sanity test suite doesn't complete for me, but just
>>>> hangs (in test_27z I think), that seems particularly likely.
>>> 
>>> Works for me, here’s a run from earlier today on 4.15.0:
>> 
>> Well that's encouraging .. I haven't looked into this one yet - I'm not
>> even sure where to start.
>
> m… debug logs for example (greatly neutered in staging tree, but still useful)?
> try lctl dk and see what’s in there.

Debug logs seem to tell me that some message is being sent to a server
and a reply is being received, but that request we are waiting on
doesn't make progress.  I plan to dig in and learn more about how lustre
rpc works so I have a better chance of interpreting those debug logs.


>
>>> Instead the plan was to clean up the staging client into acceptable state,
>>> move it out of staging, bring in all the missing features and then
>>> drop the client (more or less) from the lustre-release.
>> 
>> That sounds like a great plan.  Any idea why it didn't happen?
>
> Because meeting open-ended demands is hard and certain demands sound like
> “throw away your X and rewrite it from scratch" (e.g. everything IB-related).

My narrow perspective on IB - from when rdma support was added to the
NFS server - is that it is broken by design and impossible to do
"right".  So different people could easily have different ideas on how
to make the best of a bad lot.
I might try to have a look.

>
> Certain things that sound useless (like the debug subsystem in Lustre)
> is very useful when you have a 10k nodes in a cluster and need to selectively
> pull stuff from a run to debug a complicated cross-node interaction.
> I asked NFS people how do they do it and they don’t have anything that scales
> and usually involves reducing the problem to a much smaller set of nodes first.

the "rpcdebug" stuff that Linux/nfs has is sometimes useful, but some parts
are changing to tracepoints and some parts have remained, which is a
little confusing.

The fact that lustre tracing seems to *always* log everything so that if
something goes wrong you can extract the last few meg(?) of logs seems
really useful.
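
The "always log, extract the tail afterwards" idea can be sketched as a
fixed-size ring buffer. This is only an illustration under my own
assumptions, not how Lustre's debug subsystem is implemented: struct
debug_ring, ring_log() and ring_dump() are hypothetical names, and the
buffer is kept tiny so the wrap-around is easy to follow.

```c
#include <stddef.h>
#include <string.h>

#define RING_SIZE 8	/* tiny for illustration; a real log would be megabytes */

struct debug_ring {
	char   buf[RING_SIZE];
	size_t head;	/* next write position */
	size_t used;	/* bytes currently retained, at most RING_SIZE */
};

/* Append a message, overwriting the oldest bytes once the ring is full. */
static void ring_log(struct debug_ring *r, const char *msg)
{
	size_t len = strlen(msg);

	for (size_t i = 0; i < len; i++) {
		r->buf[r->head] = msg[i];
		r->head = (r->head + 1) % RING_SIZE;
		if (r->used < RING_SIZE)
			r->used++;
	}
}

/* Copy the retained bytes, oldest first, into out and NUL-terminate.
 * out must have room for RING_SIZE + 1 bytes. Returns the byte count.
 */
static size_t ring_dump(const struct debug_ring *r, char *out)
{
	size_t start = (r->head + RING_SIZE - r->used) % RING_SIZE;

	for (size_t i = 0; i < r->used; i++)
		out[i] = r->buf[(start + i) % RING_SIZE];
	out[r->used] = '\0';
	return r->used;
}
```

Logging never blocks or fails here; the cost is that only the most recent
RING_SIZE bytes survive, which is the trade-off that makes dumping the tail
after a failure cheap.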

I discovered - thanks to James -
 https://jira.hpdd.intel.com/browse/LU-8980
 Add tracepoint support to Lustre

which is "closed", but I cannot find any trace of tracepoints in
drivers/staging or in lustre-release.  Maybe I'm confused.
I suspect tracepoints is a good way to go.

>
>> It seems there is a lot of upstream work mixed in with the clean up, and
>> I don't think that really helps anyone.
>
> I don’t understand what you mean here.

Just that I thought that the main point of drivers/staging is to get the
code into a mergable state, and if feature addition happens at the same
time, then priorities get blurred and goals don't get reached.

>
>> Is it at all realistic that the client might be removed from
>> lustre-release?  That might be a good goal to work towards.
>
> Assuming we can bring the whole functionality over - sure.
>
> Of course there’d still be some separate development place and we would
> need to create patches (new features?) for like SuSE and other distros
> and for testing of server features, I guess, but that could just that -
> a side branch somewhere I hope.

Of course - code doesn't go upstream until it is ready.  Lots of
development happens elsewhere.
Of course distros like SUSE would generally rather ship code that was
"ready" and so like to see it upstream.  There is usually room for
negotiation.

>
> It’s not that we are super glad to chase every kernel vendors put out,
> of course it would be much easier if the kernels already included
> a very functional Lustre client.
>
>>>> Might it make sense to instead start cleaning up the code in
>>>> lustre-release so as to make it meet the upstream kernel standards.
>>>> Then when the time is right, the kernel code can be moved *out* of
>>>> lustre-release and *in* to linux.  Then development can continue in
>>>> Linux (just like it does with other Linux filesystems).
>>> 
>>> While we can be cleaning lustre in lustre-release, there are some things
>>> we cannot do as easily, e.g. decoupling Lustre client from the server.
>>> Also it would not attract any reviews from all the janitor or
>>> (more importantly) Al Viro and other people with a sharp eyes.
>>> 
>>>> An added bonus of this is that there is an obvious path to getting
>>>> server support in mainline Linux.  The current situation of client-only
>>>> support seems weird given how interdependent the two are.
>>> 
>>> Given the pushback Lustre client was given I have no hope Lustre server
>>> will get into mainline in my lifetime.
>> 
>> Even if it is horrible it would be nice to have it in staging... I guess
>> the changes required to ext4 prohibit that... I don't suppose it can be
>> made to work with mainline ext4 in a reduced-functionality-and-performance
>> way??
>
> We support unpatched ZFS as a server too! ;)

So does that mean you would expect lustre-server to work with unpatched
ext4? In that case I won't give up hope of seeing the server in mainline
in my lifetime.  Client first though.

> (and if somebody invests the time into it, there was some half-baked btrfs
> backend too I think).
> That said nobody here believes in any success of pushing Lustre server into
> mainline.
> It would just be easier to push the whole server into userspace (And there
> was a project like this in the past, now abandoned because it was mostly
> targeting Solaris anyway).
>
>> I think it would be a lot easier to motivate forward progress if there
>> were a credible end goal of everything being in mainline.
>> 
>>> 
>>>> What do others think?  Is there any chance that the current lustre in
>>>> Linux will ever be more than a poor second-cousin to the external
>>>> lustre-release.  If there isn't, should we just discard it now and move
>>>> on?
>>> 
>>> 
>>> I think many useful cleanups and fixes came from the staging tree at
>>> the very least.
>>> The biggest problem with it all is that we are in staging tree so
>>> we cannot bring it to parity much. And we are in staging tree because
>>> there’s a whole bunch of “cleanups” requested that take a lot of effort
>>> (in both implementing them and then in finding other ways of achieving
>>> things that were done in old ways before).
>> 
>> Do you have a list of requested cleanups?  I would find that to be
>> useful.
>
> As Greg would tell you, “if you don’t know what needs to be done,
> let’s just remove the whole thing from staging now”.

Of course, but I don't expect that I will see the same things that
others see.  And if people have gone to the trouble to provide feedback,
it seems polite to record that feedback for all to see.

>
> I assume you saw drivers/staging/lustre/TODO already, it’s only partially done.

Yes - it isn't very detailed though.  Maybe I'll flesh it out with some
of the things you have said.

>
> We had a bunch of other requests from various people ranging from wholesale
> removal of various parts to making sure there’s no checkpatch warnings
> (Turned out rather hard to do, even though we greatly pared the
> numbers).

checkpatch is a useful guide, but an awful master.

% find drivers/staging/lustre/ -name '*.[ch]' | while read a; do
   ./scripts/checkpatch.pl --max-line-length=10000 --no-summary -f $a
   done|grep '^ERROR' | sort | uniq -c


     17 ERROR: Macros with complex values should be enclosed in parentheses
      2 ERROR: Macros with multiple statements should be enclosed in a do - while loop
     12 ERROR: No #include in ...include/uapi/... should use a uapi/ path prefix
      1 ERROR: space required before the open brace '{'
      8 ERROR: that open brace { should be on the previous line
      1 ERROR: trailing statements should be on next line
      1 ERROR: trailing whitespace

That isn't too bad - obviously nearly there with checkpatch.
Lots more warnings - some might be interesting.

wholesale removal - like the prng, the workqueues, and the
ll_wait_event() macro?  I can do that :-)


>
> I have some patches to make Lustre a lot more monolithic too.

Yes, it annoys me that I cannot build without modules.  I took some
steps towards fixing that and went off down a rabbit hole..
Should be fairly easy.

> People want us to remove our indirections hell so the code is more readable
> (I have some patches that need to be freshened up some that help here a bit,
> but the work is huge.)

But indirections solve all problems :-)

>
> Other requests come out as some of the prior ones get completed due to
> “you need o finish current level of cleanups so that we can see what other
> cleanups are needed, the current code is too bad to see everything” pretty much.

Thanks a lot for your helpful reply.

NeilBrown

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]


^ permalink raw reply	[flat|nested] 188+ messages in thread

* [lustre-devel] [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
@ 2018-02-11 23:44             ` NeilBrown
  0 siblings, 0 replies; 188+ messages in thread
From: NeilBrown @ 2018-02-11 23:44 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: devel, Greg Kroah-Hartman, wang di, Linux Kernel Mailing List,
	Lustre Development List

On Thu, Feb 08 2018, Oleg Drokin wrote:

>> On Feb 8, 2018, at 10:10 PM, NeilBrown <neilb@suse.com> wrote:
>> 
>> On Thu, Feb 08 2018, Oleg Drokin wrote:
>> 
>>>> On Feb 8, 2018, at 8:39 PM, NeilBrown <neilb@suse.com> wrote:
>>>> 
>>>> On Tue, Aug 16 2016, James Simmons wrote:
>>> 
>>> my that’s an old patch
>>> 
>>>> 
>> ...
>>>> 
>>>> Whoever converted it to "!strcmp()" inverted the condition.  This is a
>>>> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
>>>> 
>>>> This causes many tests in the 'sanity' test suite to return
>>>> -ENOMEM (that had me puzzled for a while!!).
>>> 
>>> huh? I am not seeing anything of the sort and I was running sanity
>>> all the time until a recent pause (but going to resume).
>> 
>> That does surprised me - I reproduce it every time.
>> I have two VMs running a SLE12-SP2 kernel with patches from
>> lustre-release applied.  These are servers. They have 2 3G virtual disks
>> each.
>> I have two over VMs running current mainline.  These are clients.
>> 
>> I guess your 'recent pause' included between v4.15-rc1 (8e55b6fd0660)
>> and v4.15-rc6 (a93639090a27) - a full month when lustre wouldn't work at
>> all :-(
>
> More than that, but I am pretty sure James Simmons is running tests all the time too
> (he has a different config, I only have tcp).
>
>>>> This seems to suggest that no-one has been testing the mainline linux
>>>> lustre.
>>>> It also seems to suggest that there is a good chance that there
>>>> are other bugs that have crept in while no-one has really been caring.
>>>> Given that the sanity test suite doesn't complete for me, but just
>>>> hangs (in test_27z I think), that seems particularly likely.
>>> 
>>> Works for me, here’s a run from earlier today on 4.15.0:
>> 
>> Well that's encouraging .. I haven't looked into this one yet - I'm not
>> even sure where to start.
>
> m… debug logs for example (greatly neutered in staging tree, but still useful)?
> try lctl dk and see what’s in there.

Debug logs seem to tell me that some message is being sent to a server
and a reply is being received, but that request we are waiting on
doesn't make progress.  I plan to dig in and learn more about how lustre
rpc works so I have a better chance of interpreting those debug logs.


>
>>> Instead the plan was to clean up the staging client into acceptable state,
>>> move it out of staging, bring in all the missing features and then
>>> drop the client (more or less) from the lustre-release.
>> 
>> That sounds like a great plan.  Any idea why it didn't happen?
>
> Because meeting open-ended demands is hard and certain demands sound like
> “throw away your X and rewrite it from scratch" (e.g. everything IB-related).

My narrow perspective on IB - from when rdma support was added to the
NFS server - is that it is broken by design and impossible to do
"right".  So different people could easily have different ideas on how
to make the best of a bad lot.
I might try to have a look.

>
> Certain things that sound useless (like the debug subsystem in Lustre)
> is very useful when you have a 10k nodes in a cluster and need to selectively
> pull stuff from a run to debug a complicated cross-node interaction.
> I asked NFS people how do they do it and they don’t have anything that scales
> and usually involves reducing the problem to a much smaller set of nodes first.

the "rpcdebug" stuff that Linux/nfs has is sometimes useful, but some parts
are changing to tracepoints and some parts have remained, which is a
little confusing.

The fact that lustre tracing seems to *always* log everything so that if
something goes wrong you can extract the last few meg(?) of logs seems
really useful.

I discovered - thanks to James -
 https://jira.hpdd.intel.com/browse/LU-8980
 Add tracepoint support to Lustre

which is "closed", but I cannot find any trace of tracepoints in
drivers/staging or in lustre-release.  Maybe I'm confused.
I suspect tracepoints is a good way to go.

>
>> It seems there is a lot of upstream work mixed in with the clean up, and
>> I don't think that really helps anyone.
>
> I don’t understand what you mean here.

Just that I thought that the main point of drivers/staging is to get the
code into a mergable state, and if feature addition happens at the same
time, then priorities get blurred and goals don't get reached.

>
>> Is it at all realistic that the client might be removed from
>> lustre-release?  That might be a good goal to work towards.
>
> Assuming we can bring the whole functionality over - sure.
>
> Of course there’d still be some separate development place and we would
> need to create patches (new features?) for like SuSE and other distros
> and for testing of server features, I guess, but that could just that -
> a side branch somewhere I hope.

Of course - code doesn't go upstream until it is ready.  Lots of
development happens elsewhere.
Of course distros like SUSE would generally rather ship code that was
"ready" and so like to see it upstream.  There is usually room for
negotiation.

>
> It’s not that we are super glad to chase every kernel vendors put out,
> of course it would be much easier if the kernels already included
> a very functional Lustre client.
>
>>>> Might it make sense to instead start cleaning up the code in
>>>> lustre-release so as to make it meet the upstream kernel standards.
>>>> Then when the time is right, the kernel code can be moved *out* of
>>>> lustre-release and *in* to linux.  Then development can continue in
>>>> Linux (just like it does with other Linux filesystems).
>>> 
>>> While we can be cleaning lustre in lustre-release, there are some things
>>> we cannot do as easily, e.g. decoupling Lustre client from the server.
>>> Also it would not attract any reviews from all the janitor or
>>> (more importantly) Al Viro and other people with a sharp eyes.
>>> 
>>>> An added bonus of this is that there is an obvious path to getting
>>>> server support in mainline Linux.  The current situation of client-only
>>>> support seems weird given how interdependent the two are.
>>> 
>>> Given the pushback Lustre client was given I have no hope Lustre server
>>> will get into mainline in my lifetime.
>> 
>> Even if it is horrible it would be nice to have it in staging... I guess
>> the changes required to ext4 prohibit that... I don't suppose it can be
>> made to work with mainline ext4 in a reduced-functionality-and-performance
>> way??
>
> We support unpatched ZFS as a server too! ;)

So does that mean you would expect lustre-server to work with unpatched
ext4? In that case I won't give up hope of seeing the server in mainline
in my lifetime.  Client first though.

> (and if somebody invests the time into it, there was some half-baked btrfs
> backend too I think).
> That said nobody here believes in any success of pushing Lustre server into
> mainline.
> It would just be easier to push the whole server into userspace (And there
> was a project like this in the past, now abandoned because it was mostly
> targeting Solaris anyway).
>
>> I think it would be a lot easier to motivate forward progress if there
>> were a credible end goal of everything being in mainline.
>> 
>>> 
>>>> What do others think?  Is there any chance that the current lustre in
>>>> Linux will ever be more than a poor second-cousin to the external
>>>> lustre-release.  If there isn't, should we just discard it now and move
>>>> on?
>>> 
>>> 
>>> I think many useful cleanups and fixes came from the staging tree at
>>> the very least.
>>> The biggest problem with it all is that we are in staging tree so
>>> we cannot bring it to parity much. And we are in staging tree because
>>> there’s a whole bunch of “cleanups” requested that take a lot of effort
>>> (in both implementing them and then in finding other ways of achieving
>>> things that were done in old ways before).
>> 
>> Do you have a list of requested cleanups?  I would find that to be
>> useful.
>
> As Greg would tell you, “if you don’t know what needs to be done,
> let’s just remove the whole thing from staging now”.

Of course, but I don't expect that I will see the same things that
others see.  And if people have gone to the trouble to provide feedback,
it seems polite to record that feedback for all to see.

>
> I assume you saw drivers/staging/lustre/TODO already, it’s only partially done.

Yes - it isn't very detailed though.  Maybe I'll flesh it out with some
of the things you have said.

>
> We had a bunch of other requests from various people ranging from wholesale
> removal of various parts to making sure there’s no checkpatch warnings
> (Turned out rather hard to do, even though we greatly pared the
> numbers).

checkpatch is a useful guide, but an awful master.

% find drivers/staging/lustre/ -name '*.[ch]' | while read a; do
   ./scripts/checkpatch.pl --max-line-length=10000 --no-summary -f $a
   done|grep '^ERROR' | sort | uniq -c


     17 ERROR: Macros with complex values should be enclosed in parentheses
      2 ERROR: Macros with multiple statements should be enclosed in a do - while loop
     12 ERROR: No #include in ...include/uapi/... should use a uapi/ path prefix
      1 ERROR: space required before the open brace '{'
      8 ERROR: that open brace { should be on the previous line
      1 ERROR: trailing statements should be on next line
      1 ERROR: trailing whitespace

That isn't too bad - obviously nearly there with checkpatch.
Lots more warnings - some might be interesting.

wholesale removal - like the prng, the workqueues, and the
ll_wait_event() macro?  I can do that :-)


>
> I have some patches to make Lustre a lot more monolithic too.

Yes, it annoys me that I cannot build without modules.  I took some
steps towards fixing that and went off down a rabbit hole..
Should be fairly easy.

> People want us to remove our indirections hell so the code is more readable
> (I have some patches that need to be freshened up some that help here a bit,
> but the work is huge.)

But indirections solve all problems :-)

>
> Other requests come out as some of the prior ones get completed due to
> “you need o finish current level of cleanups so that we can see what other
> cleanups are needed, the current code is too bad to see everything” pretty much.

Thanks a lot for your helpful reply.

NeilBrown

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [lustre-devel] [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
  2018-02-10 20:57             ` James Simmons
@ 2018-02-11 23:50               ` NeilBrown
  -1 siblings, 0 replies; 188+ messages in thread
From: NeilBrown @ 2018-02-11 23:50 UTC (permalink / raw)
  To: James Simmons, Oleg Drokin
  Cc: devel, Greg Kroah-Hartman, wang di, Linux Kernel Mailing List,
	Lustre Development List


[-- Attachment #1.1: Type: text/plain, Size: 2802 bytes --]

On Sat, Feb 10 2018, James Simmons wrote:

>> > On Feb 8, 2018, at 10:10 PM, NeilBrown <neilb@suse.com> wrote:
>> > 
>> > On Thu, Feb 08 2018, Oleg Drokin wrote:
>> > 
>> >>> On Feb 8, 2018, at 8:39 PM, NeilBrown <neilb@suse.com> wrote:
>> >>> 
>> >>> On Tue, Aug 16 2016, James Simmons wrote:
>> >> 
>> >> my that’s an old patch
>> >> 
>> >>> 
>> > ...
>> >>> 
>> >>> Whoever converted it to "!strcmp()" inverted the condition.  This is a
>> >>> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
>> >>> 
>> >>> This causes many tests in the 'sanity' test suite to return
>> >>> -ENOMEM (that had me puzzled for a while!!).
>> >> 
>> >> huh? I am not seeing anything of the sort and I was running sanity
>> >> all the time until a recent pause (but going to resume).
>> > 
>> > That does surprised me - I reproduce it every time.
>> > I have two VMs running a SLE12-SP2 kernel with patches from
>> > lustre-release applied.  These are servers. They have 2 3G virtual disks
>> > each.
>> > I have two over VMs running current mainline.  These are clients.
>> > 
>> > I guess your 'recent pause' included between v4.15-rc1 (8e55b6fd0660)
>> > and v4.15-rc6 (a93639090a27) - a full month when lustre wouldn't work at
>> > all :-(
>> 
>> More than that, but I am pretty sure James Simmons is running tests all the time too
>> (he has a different config, I only have tcp).
>
> Yes I have been testing and haven't encountered this problem. Let me try 
> the fix you pointed out. 

Yeah, I guess I overreacted a bit in suggesting that no-one can have
been testing - sorry about that.  It seemed really strange though as the
bug was so easy for me to hit.

Maybe - as you suggest in another email - it is due to some
client/server incompatibility.  I guess it is unavoidable with an fs
like lustre to have incompatible protocol changes.  Is there any
mechanism for detecting the version of other peers in the cluster and
refusing to run if versions are incompatible?

If you haven't hit the problem in testing, I suspect you aren't touching
that code path at all.  Maybe put a BUG() call in there to see :-)

>  
>> > Do you have a list of requested cleanups?  I would find that to be
>> > useful.
>> 
>> As Greg would tell you, “if you don’t know what needs to be done,
>> let’s just remove the whole thing from staging now”.
>> 
>> I assume you saw drivers/staging/lustre/TODO already, it’s only partially done.
>
> Actually the complete list is at :
>
> https://jira.hpdd.intel.com/browse/LU-9679
>
> I need to move that to our TODO list. Sorry I have been short on cycles.

Just adding that link to TODO would be a great start.  I might do that
when I next send some patches.

Thanks,
NeilBrown


[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]


^ permalink raw reply	[flat|nested] 188+ messages in thread

* [lustre-devel] [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
@ 2018-02-11 23:50               ` NeilBrown
  0 siblings, 0 replies; 188+ messages in thread
From: NeilBrown @ 2018-02-11 23:50 UTC (permalink / raw)
  To: James Simmons, Oleg Drokin
  Cc: devel, Greg Kroah-Hartman, wang di, Linux Kernel Mailing List,
	Lustre Development List

On Sat, Feb 10 2018, James Simmons wrote:

>> > On Feb 8, 2018, at 10:10 PM, NeilBrown <neilb@suse.com> wrote:
>> > 
>> > On Thu, Feb 08 2018, Oleg Drokin wrote:
>> > 
>> >>> On Feb 8, 2018, at 8:39 PM, NeilBrown <neilb@suse.com> wrote:
>> >>> 
>> >>> On Tue, Aug 16 2016, James Simmons wrote:
>> >> 
>> >> my that’s an old patch
>> >> 
>> >>> 
>> > ...
>> >>> 
>> >>> Whoever converted it to "!strcmp()" inverted the condition.  This is a
>> >>> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
>> >>> 
>> >>> This causes many tests in the 'sanity' test suite to return
>> >>> -ENOMEM (that had me puzzled for a while!!).
>> >> 
>> >> huh? I am not seeing anything of the sort and I was running sanity
>> >> all the time until a recent pause (but going to resume).
>> > 
>> > That does surprise me - I reproduce it every time.
>> > I have two VMs running a SLE12-SP2 kernel with patches from
>> > lustre-release applied.  These are servers. They have 2 3G virtual disks
>> > each.
>> > I have two other VMs running current mainline.  These are clients.
>> > 
>> > I guess your 'recent pause' included between v4.15-rc1 (8e55b6fd0660)
>> > and v4.15-rc6 (a93639090a27) - a full month when lustre wouldn't work at
>> > all :-(
>> 
>> More than that, but I am pretty sure James Simmons is running tests all the time too
>> (he has a different config, I only have tcp).
>
> Yes I have been testing and haven't encountered this problem. Let me try 
> the fix you pointed out. 

Yeah, I guess I overreacted a bit in suggesting that no-one can have
been testing - sorry about that.  It seemed really strange though as the
bug was so easy for me to hit.

Maybe - as you suggest in another email - it is due to some
client/server incompatibility.  I guess it is unavoidable with an fs
like lustre to have incompatible protocol changes.  Is there any
mechanism for detecting the version of other peers in the cluster and
refusing to run if versions are incompatible?

If you haven't hit the problem in testing, I suspect you aren't touching
that code path at all.  Maybe put a BUG() call in there to see :-)

>  
>> > Do you have a list of requested cleanups?  I would find that to be
>> > useful.
>> 
>> As Greg would tell you, “if you don’t know what needs to be done,
>> let’s just remove the whole thing from staging now”.
>> 
>> I assume you saw drivers/staging/lustre/TODO already, it’s only partially done.
>
> Actually the complete list is at :
>
> https://jira.hpdd.intel.com/browse/LU-9679
>
> I need to move that to our TODO list. Sorry I have been short on cycles.

Just adding that link to TODO would be a great start.  I might do that
when I next send some patches.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [lustre-devel] [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
  2018-02-11 23:50               ` NeilBrown
@ 2018-02-12  0:06                 ` Oleg Drokin
  -1 siblings, 0 replies; 188+ messages in thread
From: Oleg Drokin @ 2018-02-12  0:06 UTC (permalink / raw)
  To: NeilBrown
  Cc: devel@driverdev.osuosl.org SUBSYSTEM, Greg Kroah-Hartman,
	Linux Kernel Mailing List, Lustre Development List


> On Feb 11, 2018, at 6:50 PM, NeilBrown <neilb@suse.com> wrote:
> 
> Maybe - as you suggest in another email - it is due to some
> client/server incompatibility.  I guess it is unavoidable with an fs
> like lustre to have incompatible protocol changes.  Is there any
> mechanism for detecting the version of other peers in the cluster and
> refusing to run if versions are incompatible?

Yes, client and server exchange “feature bits” at connect time
and only use the subset of features that both can understand.
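
As a rough illustration of the negotiation described above, each side can advertise a bitmask and both then restrict themselves to the intersection. This is only a sketch: the `FEAT_*` names, `negotiate_features()` and `feature_enabled()` are hypothetical and are not Lustre's actual connect flags or API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, invented for this sketch. */
#define FEAT_STRIPED_DIR  (UINT64_C(1) << 0)
#define FEAT_OPEN_BY_FID  (UINT64_C(1) << 1)
#define FEAT_LARGE_EA     (UINT64_C(1) << 2)

/* Features both peers understand: the bitwise AND of the two masks. */
static uint64_t negotiate_features(uint64_t client_mask, uint64_t server_mask)
{
	return client_mask & server_mask;
}

/* True only if every bit in 'wanted' survived negotiation. */
static bool feature_enabled(uint64_t negotiated, uint64_t wanted)
{
	return (negotiated & wanted) == wanted;
}
```

With this shape, a newer client talking to an older server simply never sees the newer bits in the negotiated mask, so it falls back to the older behaviour instead of refusing to run.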

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [lustre-devel] [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
  2018-02-11 23:44             ` NeilBrown
@ 2018-02-12  0:52               ` Oleg Drokin
  -1 siblings, 0 replies; 188+ messages in thread
From: Oleg Drokin @ 2018-02-12  0:52 UTC (permalink / raw)
  To: NeilBrown
  Cc: devel, Greg Kroah-Hartman, wang di, Linux Kernel Mailing List,
	Lustre Development List


> On Feb 11, 2018, at 6:44 PM, NeilBrown <neilb@suse.com> wrote:
> 
> On Thu, Feb 08 2018, Oleg Drokin wrote:
>> 
>> Certain things that sound useless (like the debug subsystem in Lustre)
>> is very useful when you have a 10k nodes in a cluster and need to selectively
>> pull stuff from a run to debug a complicated cross-node interaction.
>> I asked NFS people how do they do it and they don’t have anything that scales
>> and usually involves reducing the problem to a much smaller set of nodes first.
> 
> the "rpcdebug" stuff that Linux/nfs has is sometimes useful, but some parts
> are changing to tracepoints and some parts have remained, which is a
> little confusing.
> 
> The fact that lustre tracing seems to *always* log everything so that if
> something goes wrong you can extract that last few meg(?) of logs seems
> really useful.

Not really. Lustre also has a bitmask for logs (since otherwise all those prints
are pretty cpu taxing), but what makes those logs better is:
the size is unlimited, not constrained by dmesg buffer size.
You can capture those logs from a crashdump (something I really wish
somebody would implement for tracepoint buffers, but alas, I have not
found anything for this yet - we have a crash plugin to extract lustre
debug logs from a kernel crashdump).
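
The mask-gated logging mentioned above can be sketched roughly as follows. This is a userspace illustration only: `dbg_mask`, the `DBG_*` bits and `dbg_log()` are hypothetical names, not the Lustre debug API, and the small fixed array is used purely for brevity and says nothing about how Lustre sizes its own buffer.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical debug categories, invented for this sketch. */
#define DBG_RPC    (1u << 0)
#define DBG_LOCK   (1u << 1)
#define DBG_INODE  (1u << 2)

#define DBG_SLOTS     8
#define DBG_LINE_LEN  128

static unsigned int dbg_mask = DBG_RPC;        /* categories being recorded */
static char dbg_buf[DBG_SLOTS][DBG_LINE_LEN];  /* in-memory log, oldest slot reused */
static unsigned int dbg_count;                 /* total messages recorded */

/* Record a message only if its category is enabled; returns 1 if recorded. */
static int dbg_log(unsigned int category, const char *fmt, ...)
{
	va_list ap;

	if (!(dbg_mask & category))
		return 0;	/* early exit: disabled categories cost no formatting */

	va_start(ap, fmt);
	vsnprintf(dbg_buf[dbg_count % DBG_SLOTS], DBG_LINE_LEN, fmt, ap);
	va_end(ap);
	dbg_count++;
	return 1;
}
```

Because the messages live in ordinary memory rather than in the dmesg buffer, a tool that knows the layout can in principle pull them out of a crash dump afterwards, which is the property the reply above highlights.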
>>> 
>>> Even if it is horrible it would be nice to have it in staging... I guess
>>> the changes required to ext4 prohibit that... I don't suppose it can be
>>> made to work with mainline ext4 in a reduced-functionality-and-performance
>>> way??
>> 
>> We support unpatched ZFS as a server too! ;)
> 
> So does that mean you would expect lustre-server to work with unpatched
> ext4? In that case I won't give up hope of seeing the server in mainline
> in my lifetime.  Client first though.

While unpatched ext4 might in theory be possible, currently it does not export
everything we need from the transaction/fs control perspective.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 188+ messages in thread

* Re: [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe
  2018-02-09  1:39     ` [lustre-devel] " NeilBrown
@ 2018-02-12  7:41       ` Dan Carpenter
  -1 siblings, 0 replies; 188+ messages in thread
From: Dan Carpenter @ 2018-02-12  7:41 UTC (permalink / raw)
  To: NeilBrown
  Cc: James Simmons, Greg Kroah-Hartman, devel, Andreas Dilger,
	Oleg Drokin, wang di, Linux Kernel Mailing List,
	Lustre Development List

On Fri, Feb 09, 2018 at 12:39:18PM +1100, NeilBrown wrote:
> On Tue, Aug 16 2016, James Simmons wrote:
> 
> >  
> > +static inline bool
> > +lsm_md_eq(const struct lmv_stripe_md *lsm1, const struct lmv_stripe_md *lsm2)
> > +{
> > +	int idx;
> > +
> > +	if (lsm1->lsm_md_magic != lsm2->lsm_md_magic ||
> > +	    lsm1->lsm_md_stripe_count != lsm2->lsm_md_stripe_count ||
> > +	    lsm1->lsm_md_master_mdt_index != lsm2->lsm_md_master_mdt_index ||
> > +	    lsm1->lsm_md_hash_type != lsm2->lsm_md_hash_type ||
> > +	    lsm1->lsm_md_layout_version != lsm2->lsm_md_layout_version ||
> > +	    !strcmp(lsm1->lsm_md_pool_name, lsm2->lsm_md_pool_name))
> > +		return false;
> 
> Hi James and all,
>  This patch (8f18c8a48b736c2f in linux) is different from the
>  corresponding patch in lustre-release (60e07b972114df).
> 
> In that patch, the last clause in the 'if' condition is
> 
> +           strcmp(lsm1->lsm_md_pool_name,
> +                     lsm2->lsm_md_pool_name) != 0)
> 
> Whoever converted it to "!strcmp()" inverted the condition.  This is a
> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!

People think that "if (!strcmp()) " is preferred kernel style but it's
not.

	if (foo != NULL) {

The != NULL is a double negative.  I don't think it adds anything.
Some kernel developers like this style because it's explicit about the
type.  I have never seen any bugs caused by this format or solved by
this format.  Anyway checkpatch complains.

	if (ret != 0) {

In this situation "ret" is not a number, it's an error code.  The != 0
is a double negative and complicated to think about.  Btw, I sort of
prefer "if (ret)" to "if (ret < 0)", not because of style but it's
easier for Smatch.  No subsystems are totally consistent so the (by
definition inconsistent) "if (ret < 0)" checks cause false positives in
Smatch.

	if (len != 0)

This is OK.  "len" is a number.

	if (strcmp(one, two) != 0) {

With strcmp() I really prefer == 0 and != 0 because it works like this:

	strcmp(one, two) == 0  --> means one == two
	strcmp(one, two) < 0   --> means one < two
	strcmp(one, two) != 0  --> means one != two

Either style is accepted in the kernel but I think == 0 just makes so
much sense.  I mostly see bugs from this when people are "fixing" the
style from == 0 to !strcmp() so my sample is very biased.  Normally, if
the original author writes the code any bugs are caught in testing so
either way is going to be bug free.

But the only thing that checkpatch complains about is == NULL and
!= NULL.
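
To make the inversion discussed in this thread concrete, here is a minimal sketch. The struct and function names are hypothetical stand-ins, not the real lmv_stripe_md code; only the shape of the comparison follows the patch quoted above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a layout descriptor; not the Lustre struct. */
struct pool_md {
	unsigned int magic;
	char pool_name[16];
};

/*
 * Buggy form: "!strcmp()" is true when the names MATCH, so two
 * identical descriptors are reported as different.
 */
static bool pool_md_eq_buggy(const struct pool_md *a, const struct pool_md *b)
{
	if (a->magic != b->magic ||
	    !strcmp(a->pool_name, b->pool_name))
		return false;
	return true;
}

/* Intended form: "!= 0" reads as "names differ", like the other clauses. */
static bool pool_md_eq(const struct pool_md *a, const struct pool_md *b)
{
	if (a->magic != b->magic ||
	    strcmp(a->pool_name, b->pool_name) != 0)
		return false;
	return true;
}
```

In a chain of "!=" clauses joined by "||", the "!= 0" spelling keeps every clause reading as "this field differs", which is why the "!strcmp()" rewrite was easy to get backwards.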

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 188+ messages in thread

end of thread, other threads:[~2018-02-12  7:41 UTC | newest]

Thread overview: 188+ messages
2016-08-16 20:18 [PATCH 00/80] staging: lustre: majority of missing fixes for 2.6 release James Simmons
2016-08-16 20:18 ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 01/80] staging: lustre: llite: add md_op_data parameter to ll_get_dir_page James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 02/80] staging: lustre: llite: remove comment from ll_dir_read James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 03/80] staging: lustre: llite: style cleanup for llite_internal.h James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 04/80] staging: lustre: llite: pass inode to ll_release_page James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 05/80] staging: lustre: llite: change remove parameter to bool James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 06/80] staging: lustre: mdc: don't take rpc lock for readdir case James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 07/80] staging: lustre: lmv: remove unused lmv_get_mea function James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 08/80] staging: lustre: lmv: remove duplicate MAX_HASH_* James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 09/80] staging: lustre: lmv: change handling of lmv striping information James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 10/80] staging: lustre: lmv: remove lmv_get_easize James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 11/80] staging: lustre: lmv: replace obd_free_memmd with lmv_free_memmd James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 12/80] staging: lustre: create striped directory James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 13/80] staging: lustre: llite: fix "getdirstripe" to show stripe info James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 14/80] staging: lustre: delete striped directory James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 15/80] staging: lustre: obdclass: fix lmd_parse() to handle comma-separated NIDs James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 16/80] staging: lustre: obdclass: bug fixes for lu_device_type handling James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 17/80] staging: lustre: add ability to migrate inodes James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 18/80] staging: lustre: lmv: fix issue found by Klocwork Insight tool James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 19/80] staging: lustre: libcfs: Only dump log once per sec. to avoid EEXIST James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 20/80] staging: lustre: llite: enable clients to inject error for lfsck James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 21/80] staging: lustre: osc: allow to call brw_commit() multiple times James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 22/80] staging: lustre: llite: a few fixes for migration James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 23/80] staging: lustre: mdc: fixup MDS_SWAP_LAYOUTS ELC handling James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 24/80] staging: lustre: don't need to const __u64 parameters for lustre_idl.h James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 25/80] staging: lustre: const correct FID/OSTID/... helpers James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 26/80] staging: lustre: use bool for several function in lustre_idl.h/lustre_fid.h James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 27/80] staging: lustre: simplify inline functions in lustre_fid.h James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 28/80] staging: lustre: lmv: access lum_stripe_offset as little endian James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 29/80] staging: lustre: lmv: lookup remote migrating object in LMV James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 30/80] staging: lustre: lmv: Ensure lmv_intent_lookup cleans up reqp James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 31/80] staging: lustre: llite: avoid a deadlock in page write James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 32/80] staging: lustre: lov: handle the case of stripe size is not power 2 James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 33/80] staging: lustre: lmv: cleanup req in lmv_getattr_name() James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 34/80] staging: lustre: lmv: rename request to preq " James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 35/80] staging: lustre: obdclass: unified flow control interfaces James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 36/80] staging: lustre: reorder LOV_MAGIC_* definition James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 37/80] staging: lustre: ldlm: flock completion fixes James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 38/80] staging: lustre: move ioctls to lustre_ioctl.h James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 39/80] staging: lustre: llite: add error handler in inode prepare phase James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 40/80] staging: lustre: ptlrpc: Early replies need to honor at_max James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2018-02-09  1:39   ` NeilBrown
2018-02-09  1:39     ` [lustre-devel] " NeilBrown
2018-02-09  2:01     ` Oleg Drokin
2018-02-09  2:01       ` [lustre-devel] " Oleg Drokin
2018-02-09  3:10       ` NeilBrown
2018-02-09  3:10         ` [lustre-devel] " NeilBrown
2018-02-09  3:50         ` Oleg Drokin
2018-02-09  3:50           ` Oleg Drokin
2018-02-10 20:57           ` James Simmons
2018-02-10 20:57             ` James Simmons
2018-02-11 23:50             ` NeilBrown
2018-02-11 23:50               ` NeilBrown
2018-02-12  0:06               ` Oleg Drokin
2018-02-12  0:06                 ` Oleg Drokin
2018-02-11 23:44           ` NeilBrown
2018-02-11 23:44             ` NeilBrown
2018-02-12  0:52             ` Oleg Drokin
2018-02-12  0:52               ` Oleg Drokin
2018-02-10 22:14     ` James Simmons
2018-02-10 22:14       ` [lustre-devel] " James Simmons
2018-02-12  7:41     ` Dan Carpenter
2018-02-12  7:41       ` [lustre-devel] " Dan Carpenter
2016-08-16 20:18 ` [PATCH 42/80] staging: lustre: llite: validate names James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 43/80] staging: lustre: llite: fix inconsistencies of root squash feature James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 44/80] staging: lustre: Remove static declaration in anonymous union James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 45/80] staging: lustre: llite: Fix the deadlock in balance_dirty_pages() James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:18 ` [PATCH 46/80] staging: lustre: llite: Change readdir BRW metrics James Simmons
2016-08-16 20:18   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 47/80] staging: lustre: uapi: reduce scope of lustre_idl.h James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 48/80] staging: lustre: llite: a few fixes about readdir of striped dir James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 49/80] staging: lustre: lmv: validate lock with correct stripe FID James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 50/80] staging: lustre: lov: new pattern flag for partially repaired file James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 51/80] staging: lustre: lmv: Match MDT where the FID locates first James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 52/80] staging: lustre: llite: use the correct mode for striped directory James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 53/80] staging: lustre: obd: rename lsr_padding to lsr_valid James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 54/80] staging: lustre: llite: set dir LOV xattr length variable James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 55/80] staging: lustre: mdt: add mbo_ prefix to members of struct mdt_body James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 56/80] staging: lustre: clio: Reduce memory overhead of per-page allocation James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 57/80] staging: lustre: osc: revise unstable pages accounting James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-10-16 15:14   ` Greg Kroah-Hartman
2016-10-16 15:14     ` [lustre-devel] " Greg Kroah-Hartman
2016-10-16 17:16     ` Oleg Drokin
2016-10-16 17:16       ` [lustre-devel] " Oleg Drokin
2016-08-16 20:19 ` [PATCH 58/80] staging: lustre: mdc: always use D_INFO for debug info when mdc_put_rpc_lock fails James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 59/80] staging: lustre: fld: add fld description documentation James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 60/80] staging: lustre: ldlm: improve ldlm_lock_create() return value James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 61/80] staging: lustre: obdclass: compile issues with variable not being initialized James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 62/80] staging: lustre: obd: limit lu_object cache James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 63/80] staging: lustre: fid: do open-by-fid by default James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 64/80] staging: lustre: ptlrpc: add OBD_CONNECT_UNLINK_CLOSE flag James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 65/80] staging: lustre: llog: keep llog ctxt indices constant James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 66/80] staging: lustre: lmv: try all stripes for unknown hash functions James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 67/80] staging: lustre: ptlrpc: request gets stuck in UNREGISTERING phase James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 68/80] staging: lustre: lmv: build master LMV EA dynamically build via readdir James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 69/80] staging: lustre: osc: Automatically increase the max_dirty_mb James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 70/80] staging: lustre: include: fix one off errors in lustre_id.h James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 71/80] staging: lustre: llite: remove assert for acl refcount James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 72/80] staging: lustre: obd: validate open handle cookies James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 73/80] staging: lustre: lmv: build error with gcc 4.7.0 20110509 James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 74/80] staging: lustre: obd: implement md_read_page James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 75/80] staging: lustre: llite: set op_max_pages James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 76/80] staging: lustre: lnet: Do not drop message when shutting down LNet James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 77/80] staging: lustre: lnet: Correct position of lnet_ni_decref() James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 78/80] staging: lustre: lnet: make connection more stable with packet loss James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 79/80] staging: lustre: lnet: lock improvement for ko2iblnd James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
2016-08-16 20:19 ` [PATCH 80/80] staging: lustre: lnet: Stop Infinite CON RACE Condition James Simmons
2016-08-16 20:19   ` [lustre-devel] " James Simmons
