From: Yonghong Song <yhs@fb.com>
To: Jiong Wang <jiong.wang@netronome.com>,
"alexei.starovoitov@gmail.com" <alexei.starovoitov@gmail.com>
Cc: "bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"oss-drivers@netronome.com" <oss-drivers@netronome.com>
Subject: Re: [LLVM PATCH] bpf: fix wrong truncation elimination when there is back-edge/loop
Date: Sat, 12 Oct 2019 15:07:59 +0000 [thread overview]
Message-ID: <9f010f55-4025-b27c-f05e-2ae0e00c1492@fb.com> (raw)
In-Reply-To: <1570864740-16857-1-git-send-email-jiong.wang@netronome.com>
On 10/12/19 12:18 AM, Jiong Wang wrote:
> Currently, BPF backend is doing truncation elimination. If one truncation
> is performed on a value defined by narrow loads, then it could be redundant
> given BPF loads zero extend the destination register implicitly.
>
> When the definition of the truncated value is a merging value (PHI node)
> that could come from different code paths, then checks need to be done on
> all possible code paths.
>
> Above described optimization was introduced as r306685, however it doesn't
> work when there is back-edge, for example when loop is used inside BPF
> code.
>
> For example for the following code, a zero-extended value should be stored
> into b[i], but the "and reg, 0xffff" is wrongly eliminated which then
> generates corrupted data.
>
> void cal1(unsigned short *a, unsigned long *b, unsigned int k)
> {
> unsigned short e;
>
> e = *a;
> for (unsigned int i = 0; i < k; i++) {
> b[i] = e;
> e = ~e;
> }
> }
>
> The reason is r306685 was trying to do the PHI node checks inside isel
> DAG2DAG phase, and the checks are done on MachineInstr. This is actually
> wrong, because MachineInstr is being built during isel phase and the
> associated information is not completed yet. A quick search shows none
> target other than BPF is access MachineInstr info during isel phase.
>
> For an PHI node, when you reached it during isel phase, it may have all
> predecessors linked, but not successors. It seems successors are linked to
> PHI node only when doing SelectionDAGISel::FinishBasicBlock and this
> happens later than PreprocessISelDAG hook.
>
> Previously, BPF program doesn't allow loop, there is probably the reason
> why this bug was not exposed.
Thanks for the fix. Yes, previously BPF programs do not allow loops
so I did an implementation in SelectionDAGISel phase taking advantage of
this fact. I do agree a later phase with SSA available is better
to handle all cases (with or without loops).
Will look at the patch detail a little bit later.
>
> This patch therefore fixes the bug by the following approach:
> - The existing truncation elimination code and the associated
> "load_to_vreg_" records are removed.
> - Instead, implement truncation elimination using MachineSSA pass, this
> is where all information are built, and keep the pass together with other
> similar peephole optimizations inside BPFMIPeephole.cpp. Redundant move
> elimination logic is updated accordingly.
> - Unit testcase included + no compilation errors for kernel BPF selftest.
>
> Reported-by: David Beckett <david.beckett@netronome.com>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
> lib/Target/BPF/BPF.h | 2 +
> lib/Target/BPF/BPFISelDAGToDAG.cpp | 169 +++---------------------------
> lib/Target/BPF/BPFMIPeephole.cpp | 187 +++++++++++++++++++++++++++++++++-
> lib/Target/BPF/BPFTargetMachine.cpp | 12 ++-
> test/CodeGen/BPF/remove_truncate_6.ll | 80 +++++++++++++++
> 5 files changed, 283 insertions(+), 167 deletions(-)
> create mode 100644 test/CodeGen/BPF/remove_truncate_6.ll
>
> diff --git a/lib/Target/BPF/BPF.h b/lib/Target/BPF/BPF.h
> index ba21503..6e4f35f 100644
> --- a/lib/Target/BPF/BPF.h
> +++ b/lib/Target/BPF/BPF.h
> @@ -20,12 +20,14 @@ ModulePass *createBPFAbstractMemberAccess(BPFTargetMachine *TM);
> FunctionPass *createBPFISelDag(BPFTargetMachine &TM);
> FunctionPass *createBPFMISimplifyPatchablePass();
> FunctionPass *createBPFMIPeepholePass();
> +FunctionPass *createBPFMIPeepholeTruncElimPass();
> FunctionPass *createBPFMIPreEmitPeepholePass();
> FunctionPass *createBPFMIPreEmitCheckingPass();
>
> void initializeBPFAbstractMemberAccessPass(PassRegistry&);
> void initializeBPFMISimplifyPatchablePass(PassRegistry&);
> void initializeBPFMIPeepholePass(PassRegistry&);
> +void initializeBPFMIPeepholeTruncElimPass(PassRegistry&);
> void initializeBPFMIPreEmitPeepholePass(PassRegistry&);
> void initializeBPFMIPreEmitCheckingPass(PassRegistry&);
> }
> diff --git a/lib/Target/BPF/BPFISelDAGToDAG.cpp b/lib/Target/BPF/BPFISelDAGToDAG.cpp
> index 85fa1f2..f2be0ff 100644
> --- a/lib/Target/BPF/BPFISelDAGToDAG.cpp
> +++ b/lib/Target/BPF/BPFISelDAGToDAG.cpp
> @@ -45,9 +45,7 @@ class BPFDAGToDAGISel : public SelectionDAGISel {
>
> public:
> explicit BPFDAGToDAGISel(BPFTargetMachine &TM)
> - : SelectionDAGISel(TM), Subtarget(nullptr) {
> - curr_func_ = nullptr;
> - }
> + : SelectionDAGISel(TM), Subtarget(nullptr) {}
>
> StringRef getPassName() const override {
> return "BPF DAG->DAG Pattern Instruction Selection";
> @@ -92,14 +90,8 @@ private:
> val_vec_type &Vals, int Offset);
> bool getConstantFieldValue(const GlobalAddressSDNode *Node, uint64_t Offset,
> uint64_t Size, unsigned char *ByteSeq);
> - bool checkLoadDef(unsigned DefReg, unsigned match_load_op);
> -
> // Mapping from ConstantStruct global value to corresponding byte-list values
> std::map<const void *, val_vec_type> cs_vals_;
> - // Mapping from vreg to load memory opcode
> - std::map<unsigned, unsigned> load_to_vreg_;
> - // Current function
> - const Function *curr_func_;
> };
> } // namespace
>
> @@ -325,32 +317,13 @@ void BPFDAGToDAGISel::PreprocessLoad(SDNode *Node,
> }
>
> void BPFDAGToDAGISel::PreprocessISelDAG() {
> - // Iterate through all nodes, interested in the following cases:
> + // Iterate through all nodes, interested in the following case:
> //
> // . loads from ConstantStruct or ConstantArray of constructs
> // which can be turns into constant itself, with this we can
> // avoid reading from read-only section at runtime.
> //
> - // . reg truncating is often the result of 8/16/32bit->64bit or
> - // 8/16bit->32bit conversion. If the reg value is loaded with
> - // masked byte width, the AND operation can be removed since
> - // BPF LOAD already has zero extension.
> - //
> - // This also solved a correctness issue.
> - // In BPF socket-related program, e.g., __sk_buff->{data, data_end}
> - // are 32-bit registers, but later on, kernel verifier will rewrite
> - // it with 64-bit value. Therefore, truncating the value after the
> - // load will result in incorrect code.
> -
> - // clear the load_to_vreg_ map so that we have a clean start
> - // for this function.
> - if (!curr_func_) {
> - curr_func_ = FuncInfo->Fn;
> - } else if (curr_func_ != FuncInfo->Fn) {
> - load_to_vreg_.clear();
> - curr_func_ = FuncInfo->Fn;
> - }
> -
> + // . Removing redundant AND for intrinsic narrow loads.
> for (SelectionDAG::allnodes_iterator I = CurDAG->allnodes_begin(),
> E = CurDAG->allnodes_end();
> I != E;) {
> @@ -358,8 +331,6 @@ void BPFDAGToDAGISel::PreprocessISelDAG() {
> unsigned Opcode = Node->getOpcode();
> if (Opcode == ISD::LOAD)
> PreprocessLoad(Node, I);
> - else if (Opcode == ISD::CopyToReg)
> - PreprocessCopyToReg(Node);
> else if (Opcode == ISD::AND)
> PreprocessTrunc(Node, I);
> }
> @@ -491,36 +462,6 @@ bool BPFDAGToDAGISel::fillConstantStruct(const DataLayout &DL,
> return true;
> }
>
> -void BPFDAGToDAGISel::PreprocessCopyToReg(SDNode *Node) {
> - const RegisterSDNode *RegN = dyn_cast<RegisterSDNode>(Node->getOperand(1));
> - if (!RegN || !Register::isVirtualRegister(RegN->getReg()))
> - return;
> -
> - const LoadSDNode *LD = dyn_cast<LoadSDNode>(Node->getOperand(2));
> - if (!LD)
> - return;
> -
> - // Assign a load value to a virtual register. record its load width
> - unsigned mem_load_op = 0;
> - switch (LD->getMemOperand()->getSize()) {
> - default:
> - return;
> - case 4:
> - mem_load_op = BPF::LDW;
> - break;
> - case 2:
> - mem_load_op = BPF::LDH;
> - break;
> - case 1:
> - mem_load_op = BPF::LDB;
> - break;
> - }
> -
> - LLVM_DEBUG(dbgs() << "Find Load Value to VReg "
> - << Register::virtReg2Index(RegN->getReg()) << '\n');
> - load_to_vreg_[RegN->getReg()] = mem_load_op;
> -}
> -
> void BPFDAGToDAGISel::PreprocessTrunc(SDNode *Node,
> SelectionDAG::allnodes_iterator &I) {
> ConstantSDNode *MaskN = dyn_cast<ConstantSDNode>(Node->getOperand(1));
> @@ -534,112 +475,26 @@ void BPFDAGToDAGISel::PreprocessTrunc(SDNode *Node,
> // which the generic optimizer doesn't understand their results are
> // zero extended.
> SDValue BaseV = Node->getOperand(0);
> - if (BaseV.getOpcode() == ISD::INTRINSIC_W_CHAIN) {
> - unsigned IntNo = cast<ConstantSDNode>(BaseV->getOperand(1))->getZExtValue();
> - uint64_t MaskV = MaskN->getZExtValue();
> -
> - if (!((IntNo == Intrinsic::bpf_load_byte && MaskV == 0xFF) ||
> - (IntNo == Intrinsic::bpf_load_half && MaskV == 0xFFFF) ||
> - (IntNo == Intrinsic::bpf_load_word && MaskV == 0xFFFFFFFF)))
> - return;
> -
> - LLVM_DEBUG(dbgs() << "Remove the redundant AND operation in: ";
> - Node->dump(); dbgs() << '\n');
> -
> - I--;
> - CurDAG->ReplaceAllUsesWith(SDValue(Node, 0), BaseV);
> - I++;
> - CurDAG->DeleteNode(Node);
> -
> - return;
> - }
> -
> - // Multiple basic blocks case.
> - if (BaseV.getOpcode() != ISD::CopyFromReg)
> + if (BaseV.getOpcode() != ISD::INTRINSIC_W_CHAIN)
> return;
>
> - unsigned match_load_op = 0;
> - switch (MaskN->getZExtValue()) {
> - default:
> - return;
> - case 0xFFFFFFFF:
> - match_load_op = BPF::LDW;
> - break;
> - case 0xFFFF:
> - match_load_op = BPF::LDH;
> - break;
> - case 0xFF:
> - match_load_op = BPF::LDB;
> - break;
> - }
> + unsigned IntNo = cast<ConstantSDNode>(BaseV->getOperand(1))->getZExtValue();
> + uint64_t MaskV = MaskN->getZExtValue();
>
> - const RegisterSDNode *RegN =
> - dyn_cast<RegisterSDNode>(BaseV.getNode()->getOperand(1));
> - if (!RegN || !Register::isVirtualRegister(RegN->getReg()))
> + if (!((IntNo == Intrinsic::bpf_load_byte && MaskV == 0xFF) ||
> + (IntNo == Intrinsic::bpf_load_half && MaskV == 0xFFFF) ||
> + (IntNo == Intrinsic::bpf_load_word && MaskV == 0xFFFFFFFF)))
> return;
> - unsigned AndOpReg = RegN->getReg();
> - LLVM_DEBUG(dbgs() << "Examine " << printReg(AndOpReg) << '\n');
> -
> - // Examine the PHI insns in the MachineBasicBlock to found out the
> - // definitions of this virtual register. At this stage (DAG2DAG
> - // transformation), only PHI machine insns are available in the machine basic
> - // block.
> - MachineBasicBlock *MBB = FuncInfo->MBB;
> - MachineInstr *MII = nullptr;
> - for (auto &MI : *MBB) {
> - for (unsigned i = 0; i < MI.getNumOperands(); ++i) {
> - const MachineOperand &MOP = MI.getOperand(i);
> - if (!MOP.isReg() || !MOP.isDef())
> - continue;
> - Register Reg = MOP.getReg();
> - if (Register::isVirtualRegister(Reg) && Reg == AndOpReg) {
> - MII = &MI;
> - break;
> - }
> - }
> - }
> -
> - if (MII == nullptr) {
> - // No phi definition in this block.
> - if (!checkLoadDef(AndOpReg, match_load_op))
> - return;
> - } else {
> - // The PHI node looks like:
> - // %2 = PHI %0, <%bb.1>, %1, <%bb.3>
> - // Trace each incoming definition, e.g., (%0, %bb.1) and (%1, %bb.3)
> - // The AND operation can be removed if both %0 in %bb.1 and %1 in
> - // %bb.3 are defined with a load matching the MaskN.
> - LLVM_DEBUG(dbgs() << "Check PHI Insn: "; MII->dump(); dbgs() << '\n');
> - unsigned PrevReg = -1;
> - for (unsigned i = 0; i < MII->getNumOperands(); ++i) {
> - const MachineOperand &MOP = MII->getOperand(i);
> - if (MOP.isReg()) {
> - if (MOP.isDef())
> - continue;
> - PrevReg = MOP.getReg();
> - if (!Register::isVirtualRegister(PrevReg))
> - return;
> - if (!checkLoadDef(PrevReg, match_load_op))
> - return;
> - }
> - }
> - }
>
> - LLVM_DEBUG(dbgs() << "Remove the redundant AND operation in: "; Node->dump();
> - dbgs() << '\n');
> + LLVM_DEBUG(dbgs() << "Remove the redundant AND operation in: ";
> + Node->dump(); dbgs() << '\n');
>
> I--;
> CurDAG->ReplaceAllUsesWith(SDValue(Node, 0), BaseV);
> I++;
> CurDAG->DeleteNode(Node);
> -}
> -
> -bool BPFDAGToDAGISel::checkLoadDef(unsigned DefReg, unsigned match_load_op) {
> - auto it = load_to_vreg_.find(DefReg);
> - if (it == load_to_vreg_.end())
> - return false; // The definition of register is not exported yet.
>
> - return it->second == match_load_op;
> + return;
> }
>
> FunctionPass *llvm::createBPFISelDag(BPFTargetMachine &TM) {
> diff --git a/lib/Target/BPF/BPFMIPeephole.cpp b/lib/Target/BPF/BPFMIPeephole.cpp
> index fafd2f7..2b52ce0 100644
> --- a/lib/Target/BPF/BPFMIPeephole.cpp
> +++ b/lib/Target/BPF/BPFMIPeephole.cpp
> @@ -71,7 +71,7 @@ void BPFMIPeephole::initialize(MachineFunction &MFParm) {
> MF = &MFParm;
> MRI = &MF->getRegInfo();
> TII = MF->getSubtarget<BPFSubtarget>().getInstrInfo();
> - LLVM_DEBUG(dbgs() << "*** BPF MachineSSA peephole pass ***\n\n");
> + LLVM_DEBUG(dbgs() << "*** BPF MachineSSA ZEXT Elim peephole pass ***\n\n");
> }
>
> bool BPFMIPeephole::isMovFrom32Def(MachineInstr *MovMI)
> @@ -186,7 +186,8 @@ bool BPFMIPeephole::eliminateZExtSeq(void) {
> } // end default namespace
>
> INITIALIZE_PASS(BPFMIPeephole, DEBUG_TYPE,
> - "BPF MachineSSA Peephole Optimization", false, false)
> + "BPF MachineSSA Peephole Optimization For ZEXT Eliminate",
> + false, false)
>
> char BPFMIPeephole::ID = 0;
> FunctionPass* llvm::createBPFMIPeepholePass() { return new BPFMIPeephole(); }
> @@ -253,12 +254,16 @@ bool BPFMIPreEmitPeephole::eliminateRedundantMov(void) {
> // enabled. The special type cast insn MOV_32_64 involves different
> // register class on src (i32) and dst (i64), RA could generate useless
> // instruction due to this.
> - if (MI.getOpcode() == BPF::MOV_32_64) {
> + unsigned Opcode = MI.getOpcode();
> + if (Opcode == BPF::MOV_32_64 ||
> + Opcode == BPF::MOV_rr || Opcode == BPF::MOV_rr_32) {
> Register dst = MI.getOperand(0).getReg();
> - Register dst_sub = TRI->getSubReg(dst, BPF::sub_32);
> Register src = MI.getOperand(1).getReg();
>
> - if (dst_sub != src)
> + if (Opcode == BPF::MOV_32_64)
> + dst = TRI->getSubReg(dst, BPF::sub_32);
> +
> + if (dst != src)
> continue;
>
> ToErase = &MI;
> @@ -281,3 +286,175 @@ FunctionPass* llvm::createBPFMIPreEmitPeepholePass()
> {
> return new BPFMIPreEmitPeephole();
> }
> +
> +STATISTIC(TruncElemNum, "Number of truncation eliminated");
> +
> +namespace {
> +
> +struct BPFMIPeepholeTruncElim : public MachineFunctionPass {
> +
> + static char ID;
> + const BPFInstrInfo *TII;
> + MachineFunction *MF;
> + MachineRegisterInfo *MRI;
> +
> + BPFMIPeepholeTruncElim() : MachineFunctionPass(ID) {
> + initializeBPFMIPeepholeTruncElimPass(*PassRegistry::getPassRegistry());
> + }
> +
> +private:
> + // Initialize class variables.
> + void initialize(MachineFunction &MFParm);
> +
> + bool eliminateTruncSeq(void);
> +
> +public:
> +
> + // Main entry point for this pass.
> + bool runOnMachineFunction(MachineFunction &MF) override {
> + if (skipFunction(MF.getFunction()))
> + return false;
> +
> + initialize(MF);
> +
> + return eliminateTruncSeq();
> + }
> +};
> +
> +static bool TruncSizeCompatible(int TruncSize, unsigned opcode)
> +{
> + if (TruncSize == 1)
> + return opcode == BPF::LDB || opcode == BPF::LDB32;
> +
> + if (TruncSize == 2)
> + return opcode == BPF::LDH || opcode == BPF::LDH32;
> +
> + if (TruncSize == 4)
> + return opcode == BPF::LDW || opcode == BPF::LDW32;
> +
> + return false;
> +}
> +
> +// Initialize class variables.
> +void BPFMIPeepholeTruncElim::initialize(MachineFunction &MFParm) {
> + MF = &MFParm;
> + MRI = &MF->getRegInfo();
> + TII = MF->getSubtarget<BPFSubtarget>().getInstrInfo();
> + LLVM_DEBUG(dbgs() << "*** BPF MachineSSA TRUNC Elim peephole pass ***\n\n");
> +}
> +
> +// Reg truncating is often the result of 8/16/32bit->64bit or
> +// 8/16bit->32bit conversion. If the reg value is loaded with
> +// masked byte width, the AND operation can be removed since
> +// BPF LOAD already has zero extension.
> +//
> +// This also solved a correctness issue.
> +// In BPF socket-related program, e.g., __sk_buff->{data, data_end}
> +// are 32-bit registers, but later on, kernel verifier will rewrite
> +// it with 64-bit value. Therefore, truncating the value after the
> +// load will result in incorrect code.
> +bool BPFMIPeepholeTruncElim::eliminateTruncSeq(void) {
> + MachineInstr* ToErase = nullptr;
> + bool Eliminated = false;
> +
> + for (MachineBasicBlock &MBB : *MF) {
> + for (MachineInstr &MI : MBB) {
> + // The second insn to remove if the eliminate candidate is a pair.
> + MachineInstr *MI2 = nullptr;;
> + Register DstReg, SrcReg;
> + MachineInstr *DefMI;
> + int TruncSize = -1;
> +
> + // If the previous instruction was marked for elimination, remove it now.
> + if (ToErase) {
> + ToErase->eraseFromParent();
> + ToErase = nullptr;
> + }
> +
> + // AND A, 0xFFFFFFFF will be turned into SLL/SRL pair due to immediate
> + // for BPF ANDI is i32, and this case only happens on ALU64.
> + if (MI.getOpcode() == BPF::SRL_ri &&
> + MI.getOperand(2).getImm() == 32) {
> + SrcReg = MI.getOperand(1).getReg();
> + MI2 = MRI->getVRegDef(SrcReg);
> + DstReg = MI.getOperand(0).getReg();
> +
> + if (!MI2 ||
> + MI2->getOpcode() != BPF::SLL_ri ||
> + MI2->getOperand(2).getImm() != 32)
> + continue;
> +
> + // Update SrcReg.
> + SrcReg = MI2->getOperand(1).getReg();
> + DefMI = MRI->getVRegDef(SrcReg);
> + if (DefMI)
> + TruncSize = 4;
> + } else if (MI.getOpcode() == BPF::AND_ri ||
> + MI.getOpcode() == BPF::AND_ri_32) {
> + SrcReg = MI.getOperand(1).getReg();
> + DstReg = MI.getOperand(0).getReg();
> + DefMI = MRI->getVRegDef(SrcReg);
> +
> + if (!DefMI)
> + continue;
> +
> + int64_t imm = MI.getOperand(2).getImm();
> + if (imm == 0xff)
> + TruncSize = 1;
> + else if (imm == 0xffff)
> + TruncSize = 2;
> + }
> +
> + if (TruncSize == -1)
> + continue;
> +
> + // The definition is PHI node, check all inputs.
> + if (DefMI->isPHI()) {
> + bool CheckFail = false;
> +
> + for (unsigned i = 1, e = DefMI->getNumOperands(); i < e; i += 2) {
> + MachineOperand &opnd = DefMI->getOperand(i);
> + if (!opnd.isReg())
> + return false;
> +
> + MachineInstr *PhiDef = MRI->getVRegDef(opnd.getReg());
> + if (!PhiDef || PhiDef->isPHI() ||
> + !TruncSizeCompatible(TruncSize, PhiDef->getOpcode())) {
> + CheckFail = true;
> + break;
> + }
> + }
> +
> + if (CheckFail)
> + continue;
> + } else if (!TruncSizeCompatible(TruncSize, DefMI->getOpcode())) {
> + continue;
> + }
> +
> + BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(BPF::MOV_rr), DstReg)
> + .addReg(SrcReg);
> +
> + if (MI2)
> + MI2->eraseFromParent();
> +
> + // Mark it to ToErase, and erase in the next iteration.
> + ToErase = &MI;
> + TruncElemNum++;
> + Eliminated = true;
> + }
> + }
> +
> + return Eliminated;
> +}
> +
> +} // end default namespace
> +
> +INITIALIZE_PASS(BPFMIPeepholeTruncElim, "bpf-mi-trunc-elim",
> + "BPF MachineSSA Peephole Optimization For TRUNC Eliminate",
> + false, false)
> +
> +char BPFMIPeepholeTruncElim::ID = 0;
> +FunctionPass* llvm::createBPFMIPeepholeTruncElimPass()
> +{
> + return new BPFMIPeepholeTruncElim();
> +}
> diff --git a/lib/Target/BPF/BPFTargetMachine.cpp b/lib/Target/BPF/BPFTargetMachine.cpp
> index d940ac9..0c4f2c7 100644
> --- a/lib/Target/BPF/BPFTargetMachine.cpp
> +++ b/lib/Target/BPF/BPFTargetMachine.cpp
> @@ -36,6 +36,7 @@ extern "C" void LLVMInitializeBPFTarget() {
> PassRegistry &PR = *PassRegistry::getPassRegistry();
> initializeBPFAbstractMemberAccessPass(PR);
> initializeBPFMIPeepholePass(PR);
> + initializeBPFMIPeepholeTruncElimPass(PR);
> }
>
> // DataLayout: little or big endian
> @@ -115,15 +116,16 @@ void BPFPassConfig::addMachineSSAOptimization() {
> TargetPassConfig::addMachineSSAOptimization();
>
> const BPFSubtarget *Subtarget = getBPFTargetMachine().getSubtargetImpl();
> - if (Subtarget->getHasAlu32() && !DisableMIPeephole)
> - addPass(createBPFMIPeepholePass());
> + if (!DisableMIPeephole) {
> + if (Subtarget->getHasAlu32())
> + addPass(createBPFMIPeepholePass());
> + addPass(createBPFMIPeepholeTruncElimPass());
> + }
> }
>
> void BPFPassConfig::addPreEmitPass() {
> - const BPFSubtarget *Subtarget = getBPFTargetMachine().getSubtargetImpl();
> -
> addPass(createBPFMIPreEmitCheckingPass());
> if (getOptLevel() != CodeGenOpt::None)
> - if (Subtarget->getHasAlu32() && !DisableMIPeephole)
> + if (!DisableMIPeephole)
> addPass(createBPFMIPreEmitPeepholePass());
> }
> diff --git a/test/CodeGen/BPF/remove_truncate_6.ll b/test/CodeGen/BPF/remove_truncate_6.ll
> new file mode 100644
> index 0000000..6577afb
> --- /dev/null
> +++ b/test/CodeGen/BPF/remove_truncate_6.ll
> @@ -0,0 +1,80 @@
> +; RUN: llc < %s -march=bpf -verify-machineinstrs | FileCheck %s
> +; RUN: llc < %s -march=bpf -mattr=+alu32 -verify-machineinstrs | FileCheck --check-prefix=CHECK-32 %s
> +;
> +; void cal1(unsigned short *a, unsigned long *b, unsigned int k)
> +; {
> +; unsigned short e;
> +;
> +; e = *a;
> +; for (unsigned int i = 0; i < k; i++) {
> +; b[i] = e;
> +; e = ~e;
> +; }
> +; }
> +;
> +; void cal2(unsigned short *a, unsigned int *b, unsigned int k)
> +; {
> +; unsigned short e;
> +;
> +; e = *a;
> +; for (unsigned int i = 0; i < k; i++) {
> +; b[i] = e;
> +; e = ~e;
> +; }
> +; }
> +
> +; Function Attrs: nofree norecurse nounwind optsize
> +define dso_local void @cal1(i16* nocapture readonly %a, i64* nocapture %b, i32 %k) local_unnamed_addr #0 {
> +entry:
> + %cmp8 = icmp eq i32 %k, 0
> + br i1 %cmp8, label %for.cond.cleanup, label %for.body.preheader
> +
> +for.body.preheader: ; preds = %entry
> + %0 = load i16, i16* %a, align 2
> + %wide.trip.count = zext i32 %k to i64
> + br label %for.body
> +
> +for.cond.cleanup: ; preds = %for.body, %entry
> + ret void
> +
> +for.body: ; preds = %for.body, %for.body.preheader
> + %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
> + %e.09 = phi i16 [ %0, %for.body.preheader ], [ %neg, %for.body ]
> + %conv = zext i16 %e.09 to i64
> + %arrayidx = getelementptr inbounds i64, i64* %b, i64 %indvars.iv
> +; CHECK: r{{[0-9]+}} &= 65535
> +; CHECK-32: r{{[0-9]+}} &= 65535
> + store i64 %conv, i64* %arrayidx, align 8
> + %neg = xor i16 %e.09, -1
> + %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
> + %exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
> + br i1 %exitcond, label %for.cond.cleanup, label %for.body
> +}
> +
> +; Function Attrs: nofree norecurse nounwind optsize
> +define dso_local void @cal2(i16* nocapture readonly %a, i32* nocapture %b, i32 %k) local_unnamed_addr #0 {
> +entry:
> + %cmp8 = icmp eq i32 %k, 0
> + br i1 %cmp8, label %for.cond.cleanup, label %for.body.preheader
> +
> +for.body.preheader: ; preds = %entry
> + %0 = load i16, i16* %a, align 2
> + %wide.trip.count = zext i32 %k to i64
> + br label %for.body
> +
> +for.cond.cleanup: ; preds = %for.body, %entry
> + ret void
> +
> +for.body: ; preds = %for.body, %for.body.preheader
> + %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
> + %e.09 = phi i16 [ %0, %for.body.preheader ], [ %neg, %for.body ]
> + %conv = zext i16 %e.09 to i32
> + %arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
> +; CHECK: r{{[0-9]+}} &= 65535
> +; CHECK-32: w{{[0-9]+}} &= 65535
> + store i32 %conv, i32* %arrayidx, align 4
> + %neg = xor i16 %e.09, -1
> + %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
> + %exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
> + br i1 %exitcond, label %for.cond.cleanup, label %for.body
> +}
>
next prev parent reply other threads:[~2019-10-12 15:08 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-12 7:18 [LLVM PATCH] bpf: fix wrong truncation elimination when there is back-edge/loop Jiong Wang
2019-10-12 15:07 ` Yonghong Song [this message]
2019-10-15 2:41 ` Yonghong Song
2019-10-15 10:03 ` Jiong Wang
2019-10-16 15:33 ` Jiong Wang
2019-10-16 19:03 ` Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9f010f55-4025-b27c-f05e-2ae0e00c1492@fb.com \
--to=yhs@fb.com \
--cc=alexei.starovoitov@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=jiong.wang@netronome.com \
--cc=oss-drivers@netronome.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).